InsightFace provides a variety of labeled face datasets preprocessed to 112x112 size.
The unzipped dataset looks as follows:
example: faces_webface_112x112
├── train.rec
├── train.idx
├── train.lst
├── agedb_30.bin
├── calfw.bin
├── cfp_ff.bin
├── cfp_fp.bin
├── cplfw.bin
└── lfw.bin
train.rec contains all the training dataset images. The rec format combines all data into a single file while still allowing indexed access. A rec file is a good fit when one does not want to create millions of individual image files in storage.
We provide training code that uses this rec file directly, without converting it to jpg format. If you want to convert to jpg images and train on those instead, refer to the next section.
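To make "a single file with indexed access" concrete, here is a minimal toy sketch of the idea. It is not the actual MXNet RecordIO layout, and the helper names write_records and read_record are hypothetical. It only illustrates how many image blobs can live in one file while a separate list of byte offsets (a role that train.idx appears to play for train.rec) lets a reader jump straight to any record.

```python
import os
import struct
import tempfile


def write_records(path, blobs):
    """Write every blob into one file; return the byte offset of each record."""
    offsets = []
    with open(path, "wb") as f:
        for blob in blobs:
            offsets.append(f.tell())
            f.write(struct.pack("<I", len(blob)))  # 4-byte little-endian length header
            f.write(blob)
    return offsets


def read_record(path, offsets, i):
    """Seek directly to record i without scanning the records before it."""
    with open(path, "rb") as f:
        f.seek(offsets[i])
        (length,) = struct.unpack("<I", f.read(4))
        return f.read(length)


# Toy stand-ins for encoded image bytes (assumption: real records hold JPEG data).
tmp_dir = tempfile.mkdtemp()
rec_path = os.path.join(tmp_dir, "toy.rec")
images = [b"face-0-bytes", b"face-1", b"face-2-longer-bytes"]

index = write_records(rec_path, images)
second = read_record(rec_path, index, 1)
```

One file on disk replaces three here; at dataset scale the same trick avoids millions of small files while keeping random access cheap.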
- Download the dataset from the InsightFace link
- Unzip it to a desired location, DATASET_ROOT, ex) /data/.
- The result folder we will call DATASET_NAME, ex) faces_webface_112x112.
- For preprocessing run
  python convert.py --rec_path <DATASET_ROOT>/<DATASET_NAME> --make_validation_memfiles
- During training,
  - turn on the option --use_mxrecord
  - set --data_root equal to DATASET_ROOT
  - set --train_data_path to the DATASET_NAME.
  - set --val_data_path to the DATASET_NAME.
- Note that you cannot turn on the --train_data_subset option when training from the rec file. To use it, you must first expand the dataset to images (refer to the section below).
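The flag settings above can be sketched as a small helper that assembles the data-related arguments. This is only an illustration: build_data_flags is a hypothetical name, not part of the repository, and it assumes --use_mxrecord and --train_data_subset are simple on/off switches, as the wording "turn on" suggests.

```python
def build_data_flags(dataset_root, dataset_name, use_mxrecord, train_data_subset=False):
    """Assemble the data-related training flags described in the list above."""
    # The README states that --train_data_subset cannot be used with the rec file.
    if use_mxrecord and train_data_subset:
        raise ValueError(
            "--train_data_subset requires image files; run without --use_mxrecord"
        )
    flags = [
        "--data_root", dataset_root,
        "--train_data_path", dataset_name,
        "--val_data_path", dataset_name,
    ]
    if use_mxrecord:
        flags.append("--use_mxrecord")  # assumed to be an on/off switch
    if train_data_subset:
        flags.append("--train_data_subset")  # assumed to be an on/off switch
    return flags


rec_flags = build_data_flags("/data/", "faces_webface_112x112", use_mxrecord=True)
print(" ".join(rec_flags))
```

Raising early on the unsupported combination surfaces the restriction before a long training job is launched.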
Another option is to extract all images from the InsightFace train.rec file. The extracted images follow a directory-as-label structure, so you can swap in your own dataset as long as it is organized the same way.
- Download the dataset from the InsightFace link
- Unzip it to a desired location, DATASET_ROOT, ex) /data/.
- The result folder we will call DATASET_NAME, ex) faces_webface_112x112.
- For preprocessing run
  python convert.py --rec_path <DATASET_ROOT>/<DATASET_NAME> --make_image_files --make_validation_memfiles
- During training,
  - do not turn on the option --use_mxrecord
  - The rest is the same.
  - set --data_root equal to DATASET_ROOT
  - set --train_data_path to the DATASET_NAME.
  - set --val_data_path to the DATASET_NAME.
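The two preprocessing commands differ only in the --make_image_files flag. A minimal sketch of building either variant is shown below; build_convert_command is a hypothetical helper name, while the flags themselves are the ones listed in this README.

```python
import shlex


def build_convert_command(dataset_root, dataset_name, make_image_files):
    """Return the convert.py invocation as an argument list."""
    # Avoid a double slash when DATASET_ROOT is written with a trailing "/".
    rec_path = f"{dataset_root.rstrip('/')}/{dataset_name}"
    cmd = ["python", "convert.py", "--rec_path", rec_path]
    if make_image_files:
        cmd.append("--make_image_files")  # only needed when expanding to images
    cmd.append("--make_validation_memfiles")
    return cmd


image_cmd = build_convert_command("/data/", "faces_webface_112x112", make_image_files=True)
print(shlex.join(image_cmd))
```

The resulting list can be passed to subprocess.run from the repository root if you prefer to script the conversion rather than type it by hand.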
If you want to use your own custom training dataset, prepare the images in a folder-as-label structure (one folder per label) and change --data_root and --train_data_path accordingly. The custom dataset should be located at <data_root>/<train_data_path>.
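As a quick sanity check on a custom dataset, the folder-as-label layout can be scanned with the standard library before training. The sketch below mirrors the general convention of ImageFolder-style datasets (sorted folder names become integer labels). It is not the repository's own loader, and scan_image_folder, my_faces, person_a and person_b are hypothetical names used only for the demo.

```python
import os
import tempfile


def scan_image_folder(root, extensions=(".jpg", ".jpeg", ".png")):
    """Map each subfolder of root to an integer label and list (path, label) pairs."""
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            if fname.lower().endswith(extensions):
                samples.append((os.path.join(folder, fname), class_to_idx[name]))
    return class_to_idx, samples


# Build a tiny throwaway dataset at <data_root>/<train_data_path> for the demo.
data_root = tempfile.mkdtemp()
train_dir = os.path.join(data_root, "my_faces")
for person, files in {"person_a": ["0.jpg", "1.jpg"], "person_b": ["0.jpg"]}.items():
    os.makedirs(os.path.join(train_dir, person))
    for fname in files:
        open(os.path.join(train_dir, person, fname), "wb").close()  # empty placeholder file

class_to_idx, samples = scan_image_folder(train_dir)
print(class_to_idx, len(samples))
```

If the number of classes or samples printed here does not match your expectations, the directory layout is the first thing to check.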
- Sample run scripts are provided in scripts
- EX) Run bash script/run_ir50_ms1mv2.sh after changing the --data_root and --train_data_path to fit your needs.
- If you are using an ImageFolder dataset, then remove --use_mxrecord.
- [IMPORTANT] Once the training script has started, check if your image color channel is correct by looking at the sample stored in <RUN_DIR>/training_samples.
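A common cause of wrong colors is a red/blue channel swap, for example when images are read in BGR order (as OpenCV does) but treated as RGB. The toy sketch below, using plain nested lists rather than a real image library, shows what such a swap does. The helper names and the pixel values are made up for illustration. Skin tones typically have more red than blue, so faces that look bluish in the saved samples hint at swapped channels.

```python
def swap_red_blue(image):
    """Reverse the channel order of every pixel (RGB <-> BGR)."""
    return [[(c2, c1, c0) for (c0, c1, c2) in row] for row in image]


def mean_channels(image):
    """Average value of each of the three channels over all pixels."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


# Made-up 2x2 patch of skin-like pixels in RGB order.
rgb_patch = [
    [(220, 170, 140), (210, 160, 130)],
    [(225, 175, 150), (215, 165, 135)],
]
bgr_patch = swap_red_blue(rgb_patch)

print(mean_channels(rgb_patch))  # first channel dominates when the order is right
print(mean_channels(bgr_patch))  # last channel dominates after the swap
```

Applying the swap a second time restores the original order, which is also the usual fix once the mismatch is confirmed.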