Splitting test/train/val and representative datasets, and convert to tfrecords #1510

therealpurplemana · 2024-05-28T22:38:56Z

Hi, thanks for your great project. I'm using it to export data from cvat.ai, manipulate it, and re-export it in TensorFlow format.

In my specific case, I'm combining homogeneous datasets by adding sources to a project which I exported from cvat.ai (so I can prune out incompletely labeled datasets). Then I run:

!datum transform --project ./tfdata -t split -- -t detection --subset train:.7 --subset val:.15 --subset test:.15

After that, I export it with:

!datum project export -p ./tfdata --format tf_detection_api -o ./final-export-tf_detection_api-detection -- --save-media

(For the segmentation export I also add --save-masks.)

This produces a new folder with /annotations and /images subfolders: the annotations are organized into train/test/val.json, and the /images folder holds the corresponding splits nicely packaged as TFRecords. Oddly, there's also a default.tfrecord, but it was pretty small, so I just deleted it.

Now, I also need a 20% representative dataset from my original dataset -- how do I "undo" the splits in my project? Or am I thinking about this incorrectly?

Currently, I need to delete the project, recreate it, re-add my sources, re-split into 20/80%, and then export again, and copy over the TFRecord.

Curious if there's an easier way to do this either through CLI or Python.

jihyeonyi · 2024-05-30T23:56:41Z

Hi @therealpurplemana, thank you for your interest in our project.
Datumaro offers a version control feature, but it requires commits of the project.
Alternatively, you could combine all subsets into a single dataset and then re-split them as needed.
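
The "combine, then re-split" idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's official recipe. `resplit_pooled` is a hypothetical, library-agnostic helper that pools item IDs from every existing subset and cuts them by ratio. `resplit_with_datumaro` assumes Datumaro's documented Python API (`Dataset.import_from`, the `random_split` transform, which joins all subsets before splitting, and `Dataset.export`); the paths, the source format, and the subset names `representative` and `rest` are placeholders. Note that `random_split` is not task-aware the way `split -t detection` is, so check both against your installed version.

```python
import random
from typing import Dict, List, Sequence, Tuple


def resplit_pooled(
    subsets: Dict[str, Sequence[str]],
    ratios: Sequence[Tuple[str, float]],
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Pool all existing subsets into one list, then split it again by ratio."""
    if abs(sum(ratio for _, ratio in ratios) - 1.0) > 1e-9:
        raise ValueError("subset ratios must sum to 1")

    # "Undo" the old split: ignore subset names and gather every item ID.
    pooled = sorted(item for items in subsets.values() for item in items)

    # Shuffle with a fixed seed so the new split is reproducible.
    random.Random(seed).shuffle(pooled)

    result: Dict[str, List[str]] = {}
    start = 0
    cumulative = 0.0
    for index, (name, ratio) in enumerate(ratios):
        cumulative += ratio
        # The last subset takes whatever remains, so no item is dropped.
        is_last = index == len(ratios) - 1
        end = len(pooled) if is_last else round(cumulative * len(pooled))
        result[name] = pooled[start:end]
        start = end
    return result


def resplit_with_datumaro(src_dir: str, src_format: str, dst_dir: str) -> None:
    """Assumed Datumaro equivalent; src_dir, src_format and dst_dir are placeholders."""
    import datumaro as dm  # imported lazily so the helper above has no dependency

    dataset = dm.Dataset.import_from(src_dir, src_format)
    # random_split joins all subsets into one before splitting again.
    dataset.transform(
        "random_split",
        splits=[("representative", 0.2), ("rest", 0.8)],
    )
    dataset.export(dst_dir, "tf_detection_api", save_media=True)
```

For example, calling `resplit_pooled` on a 70/15/15 split of 100 items with `[("representative", 0.2), ("rest", 0.8)]` yields 20 and 80 items that together cover the original 100, so the project does not have to be deleted and recreated just to change the ratios.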

therealpurplemana · 2024-05-31T17:35:09Z

Thank you.

github-actions bot assigned jihyeonyi May 28, 2024

therealpurplemana closed this as completed May 31, 2024
