Skip to content

Latest commit

 

History

History
81 lines (44 loc) · 3.63 KB

DataSources.md

File metadata and controls

81 lines (44 loc) · 3.63 KB

Downloading Datasets

MIMIC-CXR

  1. Obtain access to the MIMIC-CXR-JPG Database on PhysioNet and download the dataset. We recommend downloading from the GCP bucket:
gcloud auth login
mkdir MIMIC-CXR-JPG
gsutil -m rsync -d -r gs://mimic-cxr-jpg-2.0.0.physionet.org MIMIC-CXR-JPG
  1. In order to obtain demographic information for each patient, you will need to obtain access to MIMIC-IV. Download core/patients.csv.gz and core/admissions.csv.gz and place the files in the MIMIC-CXR-JPG directory.

  2. Move or create a symbolic link to the MIMIC-CXR-JPG folder from your data directory.

  3. Run python -m scripts.process_data mimic_cxr --data_path <data_path>.

CheXpert

  1. Download the downsampled CheXpert dataset and extract it.

  2. Register for an account and download the CheXpert demographics data here. Place the CHEXPERT DEMO.xlsx in your CheXpert directory.

  3. Move or create a symbolic link to the CheXpert-v1.0-small folder named chexpert in your data directory.

  4. Run python -m scripts.process_data chexpert --data_path <data_path>.

ChestX-ray8 (NIH)

  1. Download the images folder and the Data_Entry_2017_v2020.csv file from this link. Move the csv file into the parent directory of the images folder.

  2. Move or create a symbolic link to the parent folder named ChestXray8 in your data directory.

  3. Run python -m scripts.process_data nih --data_path <data_path>.

PadChest

  1. We use a resized version of PadChest, which can be downloaded here.

  2. Unzip images-224.tar.

  3. Move or create a symbolic link to this folder named PadChest in your data directory. This directory should contain the folder images-224 and the file PADCHEST_chest_x_ray_images_labels_160K_01.02.19.csv.

  4. Run python -m scripts.process_data padchest --data_path <data_path>.

VinDr-CXR

  1. Obtain access to the VinDr-CXR dataset on PhysioNet and download the dataset.

  2. Move or create a symbolic link to this folder named vindr-cxr in your data directory.

  3. Run python -m scripts.process_data vindr --data_path <data_path>.

SIIM

  1. Download the SIIM dataset from Kaggle.
kaggle datasets download -d jesperdramsch/siim-acr-pneumothorax-segmentation-data
  1. Move or create a symbolic link to this folder named SIIM in your data directory.

  2. Run python -m scripts.process_data siim --data_path <data_path>.

ISIC

  1. Download the ISIC 2020 Challenge dataset (the JPEG zip file and the metadata v2 file). Unzip the zip file.

  2. Move or create a symbolic link to the parent folder named ISIC in your data directory. This folder should contain the file ISIC_2020_Training_GroundTruth_v2.csv and the folder train.

  3. Run python -m scripts.process_data isic --data_path <data_path>.

ODIR

  1. Download the ODIR dataset. Unzip the training images.

  2. Move or create a symbolic link to the parent folder named ODIR in your data directory.

  3. Run python -m scripts.process_data odir --data_path <data_path>.