- Obtain access to the MIMIC-CXR-JPG Database on PhysioNet and download the dataset. We recommend downloading from the GCP bucket:

      gcloud auth login
      mkdir MIMIC-CXR-JPG
      gsutil -m rsync -d -r gs://mimic-cxr-jpg-2.0.0.physionet.org MIMIC-CXR-JPG

- To obtain demographic information for each patient, you will also need access to MIMIC-IV. Download `core/patients.csv.gz` and `core/admissions.csv.gz` and place both files in the `MIMIC-CXR-JPG` directory.
- Move the `MIMIC-CXR-JPG` folder into your data directory, or create a symbolic link to it there.
- Run `python -m scripts.process_data mimic_cxr --data_path <data_path>`.
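The "move or create a symbolic link" step recurs for every dataset in these instructions. A minimal sketch of the symlink variant follows; the helper name `link_dataset` and the example paths are hypothetical and not part of this repository.

```python
from pathlib import Path


def link_dataset(source: Path, data_dir: Path, link_name: str) -> Path:
    """Create data_dir/link_name as a symlink to source and return the link path.

    Hypothetical helper for illustration; it is not provided by the repository.
    """
    source = source.resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {source}")
    data_dir.mkdir(parents=True, exist_ok=True)
    link = data_dir / link_name
    if link.is_symlink() or link.exists():
        # Leave an existing entry alone rather than overwrite it.
        if link.resolve() != source:
            raise FileExistsError(f"{link} already exists and points elsewhere")
        return link
    link.symlink_to(source, target_is_directory=True)
    return link
```

For example, `link_dataset(Path("/downloads/MIMIC-CXR-JPG"), Path("/data"), "MIMIC-CXR-JPG")` would expose the download as `/data/MIMIC-CXR-JPG`, assuming `/data` is the directory later passed as `--data_path`.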
- Download the downsampled CheXpert dataset and extract it.
- Register for an account and download the CheXpert demographics data here. Place `CHEXPERT DEMO.xlsx` in your CheXpert directory.
- Move the `CheXpert-v1.0-small` folder into your data directory under the name `chexpert`, or create a symbolic link named `chexpert` there that points to it.
- Run `python -m scripts.process_data chexpert --data_path <data_path>`.
- Download the `images` folder and the `Data_Entry_2017_v2020.csv` file from this link. Move the CSV file into the parent directory of the `images` folder.
- Move that parent folder into your data directory under the name `ChestXray8`, or create a symbolic link named `ChestXray8` there that points to it.
- Run `python -m scripts.process_data nih --data_path <data_path>`.
- We use a resized version of PadChest, which can be downloaded here.
- Extract `images-224.tar`.
- Move the resulting folder into your data directory under the name `PadChest`, or create a symbolic link named `PadChest` there that points to it. This directory should contain the folder `images-224` and the file `PADCHEST_chest_x_ray_images_labels_160K_01.02.19.csv`.
- Run `python -m scripts.process_data padchest --data_path <data_path>`.
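Before running the processing script, it can help to confirm that the directory has the layout described above. The sketch below checks only the two entries these steps name; `missing_entries` and `PADCHEST_EXPECTED` are hypothetical names, and the script's actual requirements may go beyond this check.

```python
from pathlib import Path

# Entries named in the PadChest steps; other datasets would need their own lists.
PADCHEST_EXPECTED = [
    "images-224",
    "PADCHEST_chest_x_ray_images_labels_160K_01.02.19.csv",
]


def missing_entries(dataset_dir: Path, expected: list[str]) -> list[str]:
    """Return the expected names that are absent from dataset_dir.

    Symbolic links are followed, so a linked dataset folder is checked as well.
    """
    return [name for name in expected if not (dataset_dir / name).exists()]
```

Calling `missing_entries(Path("<data_path>") / "PadChest", PADCHEST_EXPECTED)` should return an empty list once both the folder and the CSV are in place.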
- Obtain access to the VinDr-CXR dataset on PhysioNet and download the dataset.
- Move the downloaded folder into your data directory under the name `vindr-cxr`, or create a symbolic link named `vindr-cxr` there that points to it.
- Run `python -m scripts.process_data vindr --data_path <data_path>`.
- Download the SIIM dataset from Kaggle:

      kaggle datasets download -d jesperdramsch/siim-acr-pneumothorax-segmentation-data

- Move the dataset folder into your data directory under the name `SIIM`, or create a symbolic link named `SIIM` there that points to it.
- Run `python -m scripts.process_data siim --data_path <data_path>`.
- Download the ISIC 2020 Challenge dataset (the JPEG zip file and the metadata v2 file). Unzip the zip file.
- Move the parent folder into your data directory under the name `ISIC`, or create a symbolic link named `ISIC` there that points to it. This folder should contain the file `ISIC_2020_Training_GroundTruth_v2.csv` and the folder `train`.
- Run `python -m scripts.process_data isic --data_path <data_path>`.
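A partially unzipped archive is easy to miss, so a quick consistency check between the metadata file and the `train` folder may be worthwhile. The sketch below makes two assumptions not stated in these instructions: that the CSV has an `image_name` column and that each image is stored as `train/<image_name>.jpg`. Both are exposed as parameters in case your copy differs, and the function name is hypothetical.

```python
import csv
from pathlib import Path


def images_missing_from_train(
    isic_dir: Path,
    name_column: str = "image_name",  # assumed column name
    suffix: str = ".jpg",             # assumed image extension
) -> list[str]:
    """List image names from the ground-truth CSV that have no file in train/."""
    csv_path = isic_dir / "ISIC_2020_Training_GroundTruth_v2.csv"
    train_dir = isic_dir / "train"
    with csv_path.open(newline="") as handle:
        names = [row[name_column] for row in csv.DictReader(handle)]
    return [name for name in names if not (train_dir / f"{name}{suffix}").is_file()]
```

An empty result suggests that every row in the metadata has a matching image; a non-empty one lists the names to re-extract.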
- Download the ODIR dataset. Unzip the training images.
- Move the parent folder into your data directory under the name `ODIR`, or create a symbolic link named `ODIR` there that points to it.
- Run `python -m scripts.process_data odir --data_path <data_path>`.
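Once all datasets are in place, the eight processing commands can be run in sequence. The following is a minimal sketch of such a driver, assuming it is launched from the repository root so that `scripts.process_data` is importable; `build_commands` and `run_all` are hypothetical helpers, and only the dataset keys and the `--data_path` flag come from the commands above.

```python
import subprocess
import sys

# Dataset keys exactly as they appear in the processing commands above.
DATASETS = ["mimic_cxr", "chexpert", "nih", "padchest", "vindr", "siim", "isic", "odir"]


def build_commands(data_path: str, datasets: list[str] = DATASETS) -> list[list[str]]:
    """Build one argv list per dataset, using the current Python interpreter."""
    return [
        [sys.executable, "-m", "scripts.process_data", name, "--data_path", data_path]
        for name in datasets
    ]


def run_all(data_path: str, datasets: list[str] = DATASETS) -> None:
    """Run each processing command in order, stopping at the first failure."""
    for cmd in build_commands(data_path, datasets):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Passing a shorter list, for example `run_all("<data_path>", ["nih", "isic"])`, processes only the datasets you have downloaded.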