data integrity issue in DDD 17 dataset #10
Thanks for pointing out some data integrity problems.
First of all, have you tried DDD20 (the Resilio Sync folder called
DDD17-fordfocus)? It was collected with more care for data integrity and
much more rigorous checking of the files. Please check the DDD20 webpage:
https://sites.google.com/view/davis-driving-dataset-2020/home
The DDD20 Ford Focus recordings are in folders organized by day (e.g. aug01).
Tobi
DDD20 is simply too large for me to store, so I downloaded the DDD17 dataset instead.
Oh, I see. You do not want to pay for the personal copy of Resilio to
enable selective sync? I agree that $60 is a lot for one-time use...
Unfortunately we cannot host this nearly 1TB via other channels right
now. What we can do is put a few samples from the entire dataset on
gdrive. Which recordings would be best?
Tobi
Well, I'm not asking for a few samples from DDD20. Because of practical issues, I would like to stick with DDD17, and I have found some integrity issues in it. I want to know whether these issues can be resolved. For example, will sorting by timestamps solve the problem? Or is there a deeper cause that makes the entire recording invalid? If the latter, perhaps you could mention it on the DDD17 homepage, or simply remove the invalid recordings, to spare people an unnecessary download of invalid files.
We will take a look at the DDD17 files again... it might take some time
because the Python environment is not set up on any of our computers
right now.
Regarding rec1487858093.hdf5, I cannot see it either in my sync of the
folder. I will see if it is on a backup or the root server at the lab.
Regarding the other recordings: in general there is no guarantee that
timestamps of records increase monotonically in time. It is a pain, but
a fact... The packets were written by the code to the hdf5 file during
acquisition, but apparently they can appear slightly out of order; at
least that is my experience with rosbag recordings. Does the data look
OK in the viewer? I'm not sure what we did to sort them for the original
dataset paper (ICLR paper). You can try to sort them as I do in the jAER
RosbagFileInputStream (it should be a lot easier in Python).
Sorry about the hassles.
Tobi
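The per-packet sort suggested above could look roughly like this in Python. This is a minimal sketch on synthetic data: the array names and the numpy-based approach are assumptions for illustration, not the actual jAER or DDD17 code.

```python
import numpy as np

def sort_by_timestamp(timestamps, data):
    """Return timestamps and data reordered so timestamps increase
    monotonically. A stable sort preserves the original relative order
    of packets that share a timestamp."""
    order = np.argsort(timestamps, kind="stable")
    return timestamps[order], data[order]

# Hypothetical packet timestamps that arrived slightly out of order,
# as can happen when packets are written during acquisition.
ts = np.array([100, 102, 101, 105, 104])
payload = np.array(["a", "b", "c", "d", "e"])
ts_sorted, payload_sorted = sort_by_timestamp(ts, payload)
```

Using `kind="stable"` matters if several packets carry the same timestamp, since it keeps their write order intact.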
Thanks. The DDD17 dataset is stored on a headless server, so I cannot view it. By original dataset paper, do you mean the ICML workshop paper "DDD17: End-To-End DAVIS Driving Dataset"? I don't see an ICLR paper.
I'm attempting a restore from the remote S3 backup to see if it contains
the missing run5 file... fingers crossed for this 35GB file.
Unfortunately, I recycled the HDD with the source copies. We might have
a copy somewhere else, but it could be lost.
Very unfortunately, this file seems to be lost forever unless someone
else out there still has a copy of it... I don't have any on any of my
computers or HDDs
On 14.01.22 07:47, youkaichao wrote:
Hi, thanks for your valuable efforts in providing such a large
dataset. When I try to use the DDD 17 dataset, I encountered several
integrity issues:
1. I downloaded the DDD17 dataset via Resilio Sync. It seems
run5/rec1487858093.hdf5 is missing. Is it a problem on the
server side or the client side? If it is a client-side issue, I
can re-download that file.
Hi, thanks for your valuable efforts in providing such a large dataset. When I tried to use the DDD17 dataset, I encountered several integrity issues.
Below is a list of recordings with data integrity issues:
run3/rec1487355090.hdf5
run3/rec1487356509.hdf5
run3/rec1487417411.hdf5
run3/rec1487419513.hdf5
run3/rec1487424147.hdf5
run3/rec1487427200.hdf5
run3/rec1487430438.hdf5
run3/rec1487433587.hdf5
run3/rec1487594667.hdf5
run3/rec1487600962.hdf5
run5/rec1487849663.hdf5
run5/rec1487860613.hdf5
run5/rec1487864316.hdf5
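The kind of non-monotonic timestamps discussed above can be detected with a short check. The sketch below runs on synthetic data and is an illustration under that assumption, not the script actually used to produce the list of affected recordings.

```python
import numpy as np

def find_nonmonotonic(timestamps):
    """Return the indices at which a timestamp is smaller than its
    predecessor, i.e. the points where monotonicity is violated.
    An empty result means the sequence is already non-decreasing."""
    ts = np.asarray(timestamps)
    return np.flatnonzero(np.diff(ts) < 0) + 1

# Synthetic stand-in for a recording's packet timestamps;
# the value at index 3 jumps backwards, so it would be flagged.
ts = [10, 11, 12, 9, 13]
bad = find_nonmonotonic(ts)
```

A recording could then be flagged as having integrity issues whenever `find_nonmonotonic` returns a non-empty index array.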