bulk2sc is the first framework that provides a solid foundation for generating single-cell data from bulk RNA-seq datasets by learning cell type distributions from single-cell reference data. bulk2sc consists of three components, scGMVAE, Bulk Encoder, and genVAE, which are visualized in the following figure:
Below, we show four UMAPs that illustrate the cell type clusters at different stages of bulk2sc: raw input data, reparameterized latent representation from GMM parameters
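To make the idea of a "reparameterized latent representation from GMM parameters" more concrete, here is a minimal NumPy sketch of the reparameterization trick applied to a Gaussian mixture. It assumes, hypothetically, that each cell type k is described by a diagonal Gaussian with mean `means[k]` and log-variance `log_vars[k]`, plus a mixture weight `proportions[k]`. The function name and array layout are illustrative and are not taken from the bulk2sc code.

```python
import numpy as np


def sample_cells_from_gmm(means, log_vars, proportions, n_cells, rng=None):
    """Draw latent cells from per-cell-type Gaussians via the reparameterization trick.

    Hypothetical layout (not bulk2sc's actual API):
      means, log_vars: shape (K, D), one row per cell type.
      proportions:     shape (K,), non-negative and summing to 1.
    Returns (z, labels) with shapes (n_cells, D) and (n_cells,).
    """
    rng = np.random.default_rng() if rng is None else rng
    means = np.asarray(means, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    proportions = np.asarray(proportions, dtype=float)

    # Assign each generated cell to a cell type according to the mixture weights.
    labels = rng.choice(len(proportions), size=n_cells, p=proportions)

    # Reparameterization: z = mu_k + sigma_k * eps, with eps ~ N(0, I).
    eps = rng.standard_normal((n_cells, means.shape[1]))
    sigma = np.exp(0.5 * log_vars[labels])
    z = means[labels] + sigma * eps
    return z, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_means = np.array([[0.0, 0.0], [5.0, 5.0]])  # two toy cell types in 2-D
    toy_log_vars = np.zeros((2, 2))                 # unit variance everywhere
    z, labels = sample_cells_from_gmm(toy_means, toy_log_vars, [0.7, 0.3], 1000, rng)
    print(z.shape)  # (1000, 2)
```

Drawing `eps` separately from the mean and standard deviation is what keeps sampling differentiable with respect to `means` and `log_vars` during training; at generation time the same formula simply produces new latent cells for each cell type.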
For a quick start, you can download the PBMC 3K data from the 10X Genomics website and the pre-trained Bulk Encoder and scDecoder weights from Google Drive here. To run the pre-trained model, place the unzipped files inside the bulk2sc directory and run:
cd bulk2sc
python main.py
To train with custom data, you will first need to:
0. If cell type annotations are needed, run scType.R to assign cell types to your cells. You will need to modify the script for your specific data and filenames.
- Modify parameters in utils.py.
- Modify main.py to adjust filepath.
- Run
python main.py
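Before training on custom data, it can help to check how cells are spread across the annotated cell types, since rare types give a per-cell-type mixture model little data to learn from. The sketch below assumes the annotations (for example, from scType.R) have been exported as a plain list of per-cell labels; that format, the function name, and the example labels are hypothetical and not something bulk2sc prescribes.

```python
from collections import Counter


def cell_type_proportions(labels):
    """Return {cell_type: fraction of cells}, ordered from most to least common."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cell labels given")
    return {cell_type: n / total for cell_type, n in counts.most_common()}


if __name__ == "__main__":
    # Illustrative labels only; real ones would come from your annotation step.
    example_labels = ["CD4+ T", "CD4+ T", "B", "NK", "CD4+ T", "B"]
    for cell_type, frac in cell_type_proportions(example_labels).items():
        print(f"{cell_type}: {frac:.2f}")
```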