Data Management Tips

Working with large video datasets can be challenging. We have compiled some tools and suggestions to make this easier.

Organizing metadata.

We recommend naming video files with metadata.

This helps keep videos organized, especially if the SRT files are stored elsewhere.
Keeping naming conventions consistent across the dataset makes it easier to query specific videos.
For example, a video file name could follow the pattern "YYYYMMDD-species-location-videoidXX.mp4", where "YYYYMMDD" is the date the video was collected, "species" is the animal type in the video, and "location" is the geographical location where the footage was collected. The "videoidXX" is the ID label automatically generated by the camera. We highly recommend retaining this information, as it can be used to reference other automatically generated metadata, such as the SRT file associated with the video or telemetry data.
Labelling videos with metadata is also helpful when the video files need to be moved around different systems for storage or analysis, as discussed below.
See rename.sh for an example of renaming all video files in a directory. Note: this script was written for a collection of videos where the metadata (species and date) was stored in the video file path.
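
As an illustration of why a consistent convention helps with querying, the sketch below parses and rebuilds names in the "YYYYMMDD-species-location-videoidXX.mp4" pattern. It is not part of rename.sh. The names VideoMeta, parse_video_name, and build_video_name are hypothetical, and the sketch assumes that no field contains a hyphen.

```python
from datetime import date, datetime
from pathlib import Path
from typing import NamedTuple


class VideoMeta(NamedTuple):
    """Metadata fields encoded in a video file name (hypothetical helper)."""
    collected: date
    species: str
    location: str
    video_id: str


def parse_video_name(filename: str) -> VideoMeta:
    """Split 'YYYYMMDD-species-location-videoidXX.mp4' into its fields.

    Assumes the four fields are separated by hyphens and that no field
    contains a hyphen itself.
    """
    parts = Path(filename).stem.split("-")
    if len(parts) != 4:
        raise ValueError(f"unexpected file name: {filename!r}")
    day, species, location, video_id = parts
    collected = datetime.strptime(day, "%Y%m%d").date()
    return VideoMeta(collected, species, location, video_id)


def build_video_name(meta: VideoMeta, suffix: str = ".mp4") -> str:
    """Rebuild the file name from its metadata fields."""
    return (
        f"{meta.collected:%Y%m%d}-{meta.species}-"
        f"{meta.location}-{meta.video_id}{suffix}"
    )
```

With names in this form, selecting all videos of one species or from one date range becomes a simple filter over the parsed fields, for example `[p for p in paths if parse_video_name(p.name).species == "zebra"]`.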

We recommend downgrading the raw video files to a lower resolution before uploading them to CVAT.

Analyzing long or high-resolution videos in CVAT can be challenging. Larger files may buffer, creating delays in the manual annotation process. Some instances of CVAT may not allow 4K or 5K videos to be uploaded.
We downgraded the raw videos from 5K to 1080p and then uploaded the 1080p version to CVAT. These lower-resolution videos ran much faster than the original raw videos while still having sufficient resolution to detect the animals manually.
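
Downscaling of this kind is commonly done with ffmpeg. The sketch below shows one way to build such a command from Python. It assumes ffmpeg is installed and on the PATH, and it is an illustration rather than a copy of the project's own script. The function names and file paths are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_downscale_command(src: PathLike, dst: PathLike, height: int = 1080) -> List[str]:
    """Return an ffmpeg command that scales a video to the given height.

    'scale=-2:HEIGHT' keeps the aspect ratio and rounds the width to an
    even number, which many video encoders require.
    """
    return [
        "ffmpeg",
        "-i", str(src),
        "-vf", f"scale=-2:{height}",
        "-c:a", "copy",  # copy the audio stream without re-encoding
        str(dst),
    ]


def downscale(src: PathLike, dst: PathLike, height: int = 1080) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_downscale_command(src, dst, height), check=True)
```

Calling `downscale("raw/video.mp4", "cvat/video_1080p.mp4")` would write a 1080p copy and leave the raw file untouched, so the original resolution remains available for later steps.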
See downgrade.sh for the script.
Important note: if videos are downgraded for Step 2A (performing detections to create tracks), the resulting annotations must be scaled back up to the original resolution for Step 2B (creating mini-scenes).
If the detections are done on a lower-resolution video, the annotations must be rescaled to match the original frame size before creating mini-scenes. Preserving the highest-resolution videos possible for the mini-scenes is essential, because it gives the behavior detection model a higher success rate in detecting behavior in the video.
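
The rescaling amounts to multiplying each coordinate by the ratio between the original and downgraded frame sizes. The sketch below is a minimal illustration. It assumes boxes are stored as (x1, y1, x2, y2) pixel coordinates, which may differ from the annotation format you export. The function name scale_box and the frame sizes in the usage note are hypothetical, so substitute the real dimensions of your videos.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Size = Tuple[int, int]                   # (width, height) in pixels


def scale_box(box: Box, from_size: Size, to_size: Size) -> Box:
    """Map a box drawn on a from_size frame onto a to_size frame."""
    x1, y1, x2, y2 = box
    sx = to_size[0] / from_size[0]  # horizontal scale factor
    sy = to_size[1] / from_size[1]  # vertical scale factor
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)
```

For example, `scale_box((100, 50, 200, 150), (1920, 1080), (3840, 2160))` doubles every coordinate, because the target frame is twice as wide and twice as tall as the frame the box was drawn on.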

We recommend using a tool like Globus to transfer large datasets across systems.

You may want to keep some of your data on a different server than you use to run CVAT.
The CVAT instance must have at least 20% memory free to operate.
To prevent out-of-memory issues, we kept our raw video data on a different server and moved the data back and forth to the server with our CVAT instance using Globus.
To adopt this workflow, you should install Globus Connect Personal onto your server and provide it access to the directory for the host volume mounted to CVAT.
If you are comfortable working with command-line tools, rsync will also work. For details, see the rsync tutorial.
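
For the rsync route, a transfer can be scripted along the following lines. This is a sketch under assumptions: rsync must be installed on both machines, the destination must be reachable over SSH, and the host name and paths in the usage note are placeholders. The function names are hypothetical.

```python
import subprocess
from typing import List


def build_rsync_command(src_dir: str, dest: str, dry_run: bool = False) -> List[str]:
    """Return an rsync command that copies the contents of src_dir to dest.

    -a preserves permissions and timestamps, -v lists transferred files,
    --partial keeps partially transferred files so an interrupted copy of
    a large video can resume, and --progress reports per-file progress.
    """
    cmd = ["rsync", "-av", "--partial", "--progress"]
    if dry_run:
        cmd.append("--dry-run")  # show what would be copied without copying
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    cmd += [src_dir.rstrip("/") + "/", dest]
    return cmd


def sync(src_dir: str, dest: str, dry_run: bool = False) -> None:
    """Run the transfer; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(src_dir, dest, dry_run), check=True)
```

A call such as `sync("/data/videos_1080p", "user@cvat-host:/path/to/cvat/share", dry_run=True)` previews the transfer. Drop `dry_run=True` once the file list looks right.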