Pytorch Implementation of Ab-initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference (CryoSPIN), NeurIPS 2024

shekshaa/semi-amortized-cryoem

CryoSPIN ❄️߷

CryoSPIN: Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference
Shayan Shekarforoush, David Lindell, Marcus Brubaker, David Fleet
NeurIPS 2024

TL;DR: We develop a new approach to ab-initio cryo-EM 3D reconstruction that uses semi-amortization to accelerate pose convergence and a multi-head encoder to handle pose uncertainty.

Dependencies

The code is tested on Python 3.9 and PyTorch 1.12 with CUDA 11.3. Run the following commands to create a compatible (mini)conda environment called cryoSPIN:

# create env
conda create -n cryoSPIN python=3.9
conda activate cryoSPIN

# pytorch
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

# other libraries
pip install scipy==1.9.3 scikit-image==0.22.0 tqdm PyYAML matplotlib kornia notebook tensorboard numpy==1.23.4
pip install starfile==0.4.5 mrcfile

The code also depends on PyTorch3D. Install its latest stable version by running:

pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
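After installation, it can help to confirm that the pinned versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `report` are hypothetical and not part of this repository, and the pins are copied from the commands above.

```python
from importlib import metadata

# Versions pinned by the install commands in this README.
PINNED = {
    "torch": "1.12.1+cu113",
    "torchvision": "0.13.1+cu113",
    "torchaudio": "0.12.1",
    "scipy": "1.9.3",
    "scikit-image": "0.22.0",
    "numpy": "1.23.4",
    "starfile": "0.4.5",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def report(pins=PINNED):
    """Compare installed versions against the pins; return one row per package."""
    rows = []
    for name, expected in pins.items():
        found = installed_version(name)
        if found is None:
            status = "missing"
        elif found == expected:
            status = "ok"
        else:
            status = "mismatch"
        rows.append((name, expected, found, status))
    return rows


if __name__ == "__main__":
    for name, expected, found, status in report():
        print(f"{name:14s} expected={expected:14s} found={found} [{status}]")
```

A "mismatch" row is not necessarily an error, since only the versions listed above are the ones the code was tested with.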

We also use the EMAN2 command-line interface to align the predicted map with the ground truth before computing the Fourier Shell Correlation (FSC) and resolution.
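For readers unfamiliar with FSC, the quantity itself is straightforward to compute once two maps are aligned. The sketch below is an illustrative NumPy version, assuming two already-aligned cubic volumes of equal size; the function name `fourier_shell_correlation` is hypothetical and this is not the evaluation code used in this repository, which relies on EMAN2 for alignment.

```python
import numpy as np


def fourier_shell_correlation(vol_a, vol_b):
    """FSC per integer-radius shell for two aligned cubic volumes."""
    vol_a, vol_b = np.asarray(vol_a), np.asarray(vol_b)
    if vol_a.shape != vol_b.shape or vol_a.ndim != 3 or len(set(vol_a.shape)) != 1:
        raise ValueError("expected two cubic volumes of the same shape")
    d = vol_a.shape[0]

    # Centered 3D Fourier transforms of both volumes.
    f_a = np.fft.fftshift(np.fft.fftn(vol_a))
    f_b = np.fft.fftshift(np.fft.fftn(vol_b))

    # Integer frequency index along each axis, centered like the spectra.
    k = np.fft.fftshift(np.fft.fftfreq(d, d=1.0 / d))
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

    # Keep shells strictly below Nyquist (radius d // 2).
    n_shells = d // 2
    keep = shell < n_shells
    idx = shell[keep]

    # Per-shell cross-correlation and power, accumulated with bincount.
    cross = np.bincount(idx, weights=(f_a * np.conj(f_b)).real[keep], minlength=n_shells)
    pow_a = np.bincount(idx, weights=(np.abs(f_a) ** 2)[keep], minlength=n_shells)
    pow_b = np.bincount(idx, weights=(np.abs(f_b) ** 2)[keep], minlength=n_shells)
    return cross / np.sqrt(pow_a * pow_b)
```

The resolution is then commonly reported as the spatial frequency at which this curve first drops below a chosen threshold.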

Synthetic Data

To reproduce the results, you first need to synthetically generate 2D projections from the given 3D density maps. The density maps of Spliceosome, Spike Protein, and Heat Shock Protein (HSP) are stored in the mrcfiles folder. For each dataset, we provide a config file that primarily defines the path to the density map (.mrc), the number of projections, and the image size (e.g. 128). See the config file for more parameters. To generate data, run generate_data.py with the corresponding config file, e.g. for HSP:

python generate_data.py --config ./configs/mrc2star_hsp.yaml

As a result, the dataset will be stored locally in ./synthetic_data/hsp/. This folder contains a star file called data.star, which stores the metadata (such as CTF parameters), and a folder called Particles, which stores the particle images in several .mrcs files.
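Before training, a quick check that the generated folder has the layout described above can save a failed run. The following is a minimal sketch using only the standard library; `summarize_dataset` is a hypothetical helper, not a script shipped with this repository, and it only inspects file names, not file contents.

```python
from pathlib import Path


def summarize_dataset(root):
    """Check for data.star and list the .mrcs stacks under Particles/."""
    root = Path(root)
    star_path = root / "data.star"
    particles_dir = root / "Particles"

    # Collect particle stacks only if the Particles folder exists.
    stacks = sorted(particles_dir.glob("*.mrcs")) if particles_dir.is_dir() else []

    return {
        "star_exists": star_path.is_file(),
        "num_stacks": len(stacks),
        "stacks": [p.name for p in stacks],
    }


if __name__ == "__main__":
    # Path taken from the HSP example above.
    print(summarize_dataset("./synthetic_data/hsp/"))
```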

Once the synthetic data is ready, you can run the semi-amortized method,

python train_semi-amortized.py --config ./configs/train_synth.yaml --save_path path/to/save/logs

which will write the reconstruction logs to path/to/save/logs. You can run TensorBoard on that directory (e.g. tensorboard --logdir path/to/save/logs) to inspect curves such as the reconstruction error and the mean/median rotation errors.
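To make the rotation-error curves concrete, a rotation error can be read as the angle of the relative rotation between a predicted and a ground-truth pose. The sketch below assumes poses are given as 3x3 rotation matrices already expressed in a common global frame; the repository's own metric may differ (for example, by first aligning the two sets of poses), and the helper names are hypothetical.

```python
import math
from statistics import mean, median


def rotation_angle_deg(r_a, r_b):
    """Angle (degrees) of the relative rotation between two 3x3 matrices."""
    # trace(r_a^T r_b) equals the sum of elementwise products.
    trace = sum(r_a[i][j] * r_b[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def rot_z(deg):
    """Rotation about the z-axis, used here only to build toy poses."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Toy example: three predictions compared against the identity pose.
predicted = [rot_z(10), rot_z(25), rot_z(90)]
ground_truth = [rot_z(0)] * 3
errors = [rotation_angle_deg(p, g) for p, g in zip(predicted, ground_truth)]
mean_error, median_error = mean(errors), median(errors)
```

The median is less sensitive than the mean to a few badly mis-estimated poses, which is one reason to look at both curves.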

Several hyperparameters are defined within the training config file (e.g. train_synth.yaml). An important one is num_rotations, which determines the number of heads of the CNN encoder during the auto-encoding stage. Moreover, epochs_amortized and epochs_unamortized specify the number of epochs spent on the auto-encoding and auto-decoding stages, respectively. We provide a brief description of each hyperparameter in the config file.
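One way to read epochs_amortized and epochs_unamortized is as a two-stage schedule: the first block of epochs uses the encoder (auto-encoding), and the remaining epochs optimize poses directly (auto-decoding). The sketch below only illustrates that interpretation; `stage_for_epoch` is a hypothetical helper, the example values are made up, and the actual training loop in train_semi-amortized.py may organize this differently.

```python
def stage_for_epoch(epoch, epochs_amortized, epochs_unamortized):
    """Return which stage a zero-based epoch index falls into."""
    total = epochs_amortized + epochs_unamortized
    if epoch < 0 or epoch >= total:
        raise ValueError(f"epoch {epoch} is outside the schedule of {total} epochs")
    # Auto-encoding comes first, followed by auto-decoding.
    return "auto-encoding" if epoch < epochs_amortized else "auto-decoding"


# Illustrative values only; see the config file for the real defaults.
schedule = [stage_for_epoch(e, epochs_amortized=3, epochs_unamortized=2) for e in range(5)]
```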

Experimental Data

To evaluate on real data, we use the 80S ribosome dataset (EMPIAR-10028). The particle images originally have size D=360 with a pixel size of Apix=1.34 Å. We downsample them to D=128. To ensure reproducibility, we provide the corresponding particles and metadata on Zenodo. Please download the data and place it in the real_data folder.
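Downsampling changes the effective pixel size, which in turn bounds the best resolution the reconstruction can reach. The arithmetic below is a small sketch that assumes the downsampling keeps the full field of view (so the box covers the same physical extent with fewer pixels); the README does not state the method used, and `downsampled_apix` is a hypothetical helper.

```python
def downsampled_apix(apix, d_original, d_new):
    """Pixel size (Å/pixel) after resizing a box from d_original to d_new pixels."""
    return apix * d_original / d_new


# Values from the text: D=360 at 1.34 Å/pixel, downsampled to D=128.
apix_128 = downsampled_apix(1.34, 360, 128)

# The Nyquist limit is two pixels per period, i.e. twice the pixel size.
nyquist_resolution = 2.0 * apix_128

print(f"effective pixel size: {apix_128:.3f} Å, Nyquist limit: {nyquist_resolution:.3f} Å")
```

Under this assumption the downsampled images have a pixel size of about 3.77 Å, so reported resolutions cannot be finer than roughly 7.54 Å.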

Once the data is ready, you can run the semi-amortized method using the config file train_real.yaml:

python train_semi-amortized.py --config ./configs/train_real.yaml --save_path path/to/save/logs

Similarly, the results will be saved in path/to/save/logs.

Citation

@article{shekarforoush2024improving,
  title={Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference},
  author={Shekarforoush, Shayan and Lindell, David B and Brubaker, Marcus A and Fleet, David J},
  journal={arXiv preprint arXiv:2406.10455},
  year={2024}
}

Acknowledgement

This research was supported in part by the Province of Ontario, the Government of Canada, through NSERC, CIFAR, and the Canada First Research Excellence Fund for the Vision, Science to Applications (VISTA) programme, and by companies sponsoring the Vector Institute.
