A dataset can be used by accessing `DatasetCatalog` for its data, or `MetadataCatalog` for its metadata (class names, etc).
This document explains how to set up the builtin datasets so they can be used by the above APIs.
Use Custom Datasets gives a deeper dive on how to use `DatasetCatalog` and `MetadataCatalog`, and how to add new datasets to them.
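As a quick illustration, a registered dataset and its metadata can be queried as shown below. The name `coco_2017_train` is only an example of a builtin detectron2 dataset; the zero-shot splits used by ZegFormer are registered under their own names.

```python
from detectron2.data import DatasetCatalog, MetadataCatalog

# "coco_2017_train" is an example name; it must already be set up on disk.
dataset_dicts = DatasetCatalog.get("coco_2017_train")   # list[dict], one per image
metadata = MetadataCatalog.get("coco_2017_train")       # class names, colors, ...
print(len(dataset_dicts), metadata.thing_classes[:5])
```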
ZegFormer has builtin support for a few datasets.
The datasets are assumed to exist in a directory specified by the environment variable `DETECTRON2_DATASETS`.
Under this directory, detectron2 will look for datasets in the structure described below, if needed.
```
$DETECTRON2_DATASETS/
  coco/
  ADE20K_2021_17_01/
```
You can set the location for builtin datasets by `export DETECTRON2_DATASETS=/path/to/datasets`.
If left unset, the default is `./datasets` relative to your current working directory.
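If you want to check from inside Python where the datasets will be looked up, the root can be resolved the same way detectron2's builtin registration does. This is a minimal sketch, assuming the default fallback of `./datasets`:

```python
import os

# Sketch of how the dataset root is resolved; detectron2's builtin
# registration falls back to "datasets" when the variable is unset.
root = os.path.expanduser(os.getenv("DETECTRON2_DATASETS", "datasets"))
print("datasets will be looked up under:", os.path.abspath(root))
```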
Prepare data for COCO-Stuff:
```
coco/
  coco_stuff/
    annotations/
      train2017/
        000000144874.png
        ...
      val2017/
        000000213035.png
        ...
    images/
      train2017/
        000000189148.jpg
        ...
      val2017/
        000000213547.jpg
        ...
    word_vectors/
      fasttext.pkl
      glove.pkl
      word2vec.pkl
    # below are generated by prepare_coco_stuff_sem_seg.py
    split/
      seen_cls.npy
      val_cls.npy
      novel_cls.npy
      seen_classnames.json
      unseen_classnames.json
      all_classnames.json
      ...
    annotations_detectron2/
      train2017/
      val2017_unseen/
```
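Once everything is in place, a quick sanity check of the layout can save a failed run later. The snippet below only checks that the expected directories exist; it is a convenience sketch (not one of the provided scripts), and it assumes the nesting shown above:

```python
import os

# Hypothetical helper: verify the COCO-Stuff layout described above.
root = os.path.join(os.getenv("DETECTRON2_DATASETS", "datasets"), "coco")
expected = [
    "coco_stuff/annotations/train2017",
    "coco_stuff/annotations/val2017",
    "coco_stuff/images/train2017",
    "coco_stuff/images/val2017",
    "coco_stuff/word_vectors",
    "coco_stuff/split",
]
for rel in expected:
    path = os.path.join(root, rel)
    print(("ok      " if os.path.isdir(path) else "MISSING ") + path)
```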
Get the COCO (2017) images from https://cocodataset.org/
```
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
```
Get the COCO-Stuff annotations from https://github.com/nightrome/cocostuff.
```
wget http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip
```
Unzip `train2017.zip`, `val2017.zip`, and `stuffthingmaps_trainval2017.zip`, then put them in the correct locations listed above.

Split the classes into seen and unseen for training and testing.
```
python datasets/coco-stuff/create_cocostuff_class_names_json.py
```
Generate the labels for training and testing.
```
python datasets/coco-stuff/prepare_coco_stuff_sem_seg_seen.py
python datasets/coco-stuff/prepare_coco_stuff_sem_seg_unseen.py
python datasets/coco-stuff/prepare_coco_stuff_sem_seg_val_all.py
```
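Conceptually, the seen/unseen label generation amounts to remapping each COCO-Stuff label map so that only the classes in the chosen split keep a valid id. The sketch below is NOT the provided script; the file paths follow the layout above, and the use of 255 as the ignore label is an assumption:

```python
import numpy as np
from PIL import Image

# Conceptual sketch: keep only seen-class ids, map everything else to the
# ignore label (255). The 255 convention and exact remapping are assumptions.
seen_cls = np.load("datasets/coco/coco_stuff/split/seen_cls.npy")

label = np.asarray(Image.open("datasets/coco/coco_stuff/annotations/train2017/000000144874.png"))
remapped = np.full_like(label, 255)
for new_id, old_id in enumerate(seen_cls):
    remapped[label == old_id] = new_id

Image.fromarray(remapped).save("000000144874_seen.png")
```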
Prepare data for ADE20k-Full:
Download the data of ADE20k-Full from https://groups.csail.mit.edu/vision/datasets/ADE20K/request_data/
```
ADE20K_2021_17_01/
  images/
  images_detectron2_freq/
  annotations_detectron2_freq/
  index_ade20k.pkl
  index_ade20k.mat
  objects.txt
  ADE20K_275_pure_class.json
  ADE20K_572_pure_class.json
  ADE20K_847_pure_class.json
```
The `ADE20K_275_pure_class.json`, `ADE20K_572_pure_class.json`, `ADE20K_847_pure_class.json`, `images_detectron2_freq`, and `annotations_detectron2_freq` are generated by the following scripts:
```
python datasets/ade20k-full-frequency-split/create_ade-frequency_json.py
python datasets/ade20k-full-frequency-split/prepare_ade20k_full_frequency_all_val.py
python datasets/ade20k-full-frequency-split/prepare_ade20k_full_frequency_seen.py
python datasets/ade20k-full-frequency-split/prepare_ade20k_full_frequency_unseen_val.py
```
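If you want to confirm the download before running the scripts, `index_ade20k.pkl` can be inspected with a plain pickle load. This is just a sketch; the exact contents depend on the ADE20K release, so the code prints the keys rather than assuming a schema:

```python
import pickle

# Quick inspection of the ADE20K-Full index file; key names vary by release.
with open("datasets/ADE20K_2021_17_01/index_ade20k.pkl", "rb") as f:
    index = pickle.load(f)

print(type(index))
if isinstance(index, dict):
    print(sorted(index.keys()))
```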
Prepare data for PASCAL VOC:
We follow CaGNet to set up the training and testing data of PASCAL VOC. We also provide a copy on Google Drive for convenience.
```
VOCZERO/
  images/
    train/
      2011_003261.jpg
      ...
    val/
      2011_003145.jpg
      ...
  annotations/
    train/
      2011_003255.png
      ...
    val/
      2011_003103.png
      ...
  all_classnames.json
  seen_classnames.json
  unseen_classnames.json
  annotations_detectron2/
    train_seen/
```
Generate the class-name JSON files and the labels for training and testing:
```
python datasets/pascal/create_voc_class_names_json.py
python datasets/pascal/prepare_pascal_voc_seen.py
python datasets/pascal/prepare_pascal_voc_unseen_val.py
python datasets/pascal/prepare_pascal_voc_val_all.py
```
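After the scripts above have produced the `annotations_detectron2/` folders, the image/label pairs are ordinary semantic-segmentation data and can be loaded with detectron2's generic loader. The registration below is only a sketch; ZegFormer ships its own registration code, and the dataset name `voc_zero_train_seen` is made up for illustration:

```python
import os
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.data.datasets import load_sem_seg

# Sketch only: the name and the exact directory pair are illustrative assumptions.
root = os.path.join(os.getenv("DETECTRON2_DATASETS", "datasets"), "VOCZERO")
DatasetCatalog.register(
    "voc_zero_train_seen",
    lambda: load_sem_seg(
        gt_root=os.path.join(root, "annotations_detectron2/train_seen"),
        image_root=os.path.join(root, "images/train"),
        gt_ext="png",
        image_ext="jpg",
    ),
)
MetadataCatalog.get("voc_zero_train_seen").set(ignore_label=255)
```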