GitHub - psp3dcg/LightRoseTTA: The official pytorch implementation of "LightRoseTTA: High-efficient and Accurate Protein Structure Prediction Using an Ultra-Lightweight Deep Graph Model"

LightRoseTTA - PyTorch

Installation

Clone the repository

git clone https://github.com/psp3dcg/LightRoseTTA.git
cd LightRoseTTA

Create the conda environment using the LightRoseTTA-env.yml file

# create conda environment for LightRoseTTA
conda env create -f LightRoseTTA-env.yml
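After creating the environment (and activating it with `conda activate` followed by the environment name defined in the .yml file), a quick import check can confirm it is usable before starting any long job. This is a minimal sketch: it assumes the environment provides torch (the repository is a PyTorch implementation) and numpy, and the module list is illustrative rather than read from LightRoseTTA-env.yml.

```python
import importlib
import importlib.util


def check_modules(names):
    """Return {module_name: bool} telling whether each top-level module can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumption: LightRoseTTA-env.yml installs torch and numpy.
    status = check_modules(["torch", "numpy"])
    for name, ok in status.items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
    if status["torch"]:
        torch = importlib.import_module("torch")
        print("CUDA available:", torch.cuda.is_available())
```

`find_spec` only locates the module without importing it, so the check stays fast even for large packages.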

Download and install the biological packages

$ bash install_bio_packages.sh
Then copy the installed biological packages (blast, csblast, psipred) into your_path/LightRoseTTA/msa_feat.
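The copy step can be scripted as below. This sketch assumes that install_bio_packages.sh leaves blast, csblast and psipred as three folders inside one directory (called `src_dir` here, a hypothetical name); check where the script actually places them before running it. `copy_bio_packages` is an illustrative helper, not part of the repository.

```python
import shutil
from pathlib import Path

BIO_PACKAGES = ("blast", "csblast", "psipred")


def copy_bio_packages(src_dir, lightrosetta_dir, names=BIO_PACKAGES):
    """Copy each package folder from src_dir into <lightrosetta_dir>/msa_feat."""
    src_dir = Path(src_dir)
    dest_root = Path(lightrosetta_dir) / "msa_feat"
    dest_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = src_dir / name
        if not src.is_dir():
            raise FileNotFoundError(f"expected package folder not found: {src}")
        dest = dest_root / name
        # dirs_exist_ok (Python 3.8+) allows re-running the copy over an existing folder
        shutil.copytree(src, dest, dirs_exist_ok=True)
        copied.append(dest)
    return copied
```

Example: `copy_bio_packages("/path/to/bio_package", "your_path/LightRoseTTA")`.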

Download the Uniref30 (46 GB), BFD (272 GB) and pdb100 (over 100 GB) datasets

$ python download_datasets.py
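Together the three datasets need at least 46 + 272 + 100 = 418 GB, so it is worth checking free disk space before starting the download. A minimal sketch, assuming the README's "G" means decimal gigabytes; which directory download_datasets.py writes to is not stated here, so pass the location you intend to use.

```python
import shutil

# Sizes quoted in the README, read as decimal gigabytes. pdb100 is listed as
# "over 100G", so 100 is only a lower bound and the total is a minimum.
DATASET_SIZES_GB = {"Uniref30": 46, "BFD": 272, "pdb100": 100}


def required_gb(sizes=DATASET_SIZES_GB):
    """Minimum total download size in GB."""
    return sum(sizes.values())


def has_enough_space(path, needed_gb=None):
    """Return (enough, free_gb) for the filesystem that holds `path`."""
    needed = required_gb() if needed_gb is None else needed_gb
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed, free_gb


if __name__ == "__main__":
    enough, free = has_enough_space(".")
    print(f"need >= {required_gb()} GB, free {free:.0f} GB, ok={enough}")
```

Unpacked archives can take more room than the download itself, so treat 418 GB as a floor.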

Test datasets

Download the test datasets from Google Drive.

Testing steps

# run the testing python file
python test_script.py [FASTA_folder_path] [data_write_path] [Uniref30_dataset_path] [pdb100_dataset_path] [BFD_dataset_path] [model_file_path]

The six positional arguments are:
- FASTA_folder_path: path to the folder containing the input FASTA files
- data_write_path: path to the folder where the generated data and predictions are written
- Uniref30_dataset_path: path to the Uniref30 dataset
- pdb100_dataset_path: path to the pdb100 dataset
- BFD_dataset_path: path to the BFD dataset
- model_file_path: path to the model weights file (.pth)
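test_script.py takes a folder of FASTA files rather than a single file. If your sequences are not yet in that form, the helper below writes one file per sequence. It assumes one sequence per file and the ".fasta" extension (the extension used in the training steps); compare with the files in example/Orphan25_fasta for the exact convention the script expects. `write_fasta_folder` and the sequence names are hypothetical.

```python
from pathlib import Path


def write_fasta_folder(sequences, folder, line_width=60):
    """Write one FASTA file per sequence as <folder>/<name>.fasta."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, seq in sequences.items():
        # Drop whitespace and normalise to upper-case one-letter codes
        seq = "".join(seq.split()).upper()
        lines = [f">{name}"] + [seq[i:i + line_width] for i in range(0, len(seq), line_width)]
        path = folder / f"{name}.fasta"
        path.write_text("\n".join(lines) + "\n")
        paths.append(path)
    return paths
```

For example, `write_fasta_folder({"query1": "MKTAYIAK..."}, "./my_fasta")` produces ./my_fasta/query1.fasta, and ./my_fasta can then be passed as FASTA_folder_path.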

For example,
# for general proteins
python test_script.py ./Orphan25_fasta ./Orphan25_data ./Uniref30_2020_06 ./pdb100_2021Mar03 ./BFD ./weights/LightRoseTTA.pth

# for antibodies
python test_script.py ./Antibody_fasta ./Antibody_data ./Uniref30_2020_06 ./pdb100_2021Mar03 ./BFD ./weights/LightRoseTTA-Ab.pth
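The commands above can also be driven from Python, which helps when looping over several FASTA folders. This sketch only assembles the positional arguments in the order of the usage line and checks that the input paths exist; `build_test_command` and `missing_inputs` are hypothetical helpers, not part of the repository.

```python
import sys
from pathlib import Path


def build_test_command(fasta_dir, out_dir, uniref30, pdb100, bfd, model_file,
                       script="test_script.py"):
    """Assemble argv for test_script.py; argument order follows the README usage line."""
    args = [fasta_dir, out_dir, uniref30, pdb100, bfd, model_file]
    return [sys.executable, script] + [str(a) for a in args]


def missing_inputs(fasta_dir, uniref30, pdb100, bfd, model_file):
    """List input paths that do not exist (data_write_path is excluded: it is an output)."""
    return [str(p) for p in (fasta_dir, uniref30, pdb100, bfd, model_file)
            if not Path(p).exists()]


# Example (general proteins), mirroring the first command above:
# cmd = build_test_command("./Orphan25_fasta", "./Orphan25_data", "./Uniref30_2020_06",
#                          "./pdb100_2021Mar03", "./BFD", "./weights/LightRoseTTA.pth")
# subprocess.run(cmd, check=True)
```

Using `sys.executable` runs the script with the same interpreter as the caller (for example the activated conda environment) instead of whichever `python` comes first on PATH. For antibodies, pass ./weights/LightRoseTTA-Ab.pth as `model_file`.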

The predicted "*.pdb" files are written to "data_write_path/predict_pdb" (e.g. Orphan25_data/predict_pdb).
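A quick sanity check on the results is to count the C-alpha atoms in each predicted file and compare that with the length of the input sequence. This sketch assumes standard fixed-column PDB records (atom name in columns 13-16) and one model per file; files with several MODEL blocks would be over-counted. `summarize_predictions` is a hypothetical helper.

```python
from pathlib import Path


def summarize_predictions(data_write_path):
    """Map each predicted PDB file name to its number of C-alpha atoms."""
    pred_dir = Path(data_write_path) / "predict_pdb"
    summary = {}
    for pdb in sorted(pred_dir.glob("*.pdb")):
        n_ca = 0
        with pdb.open() as fh:
            for line in fh:
                # PDB fixed columns: record name in 1-6, atom name in 13-16
                if line.startswith("ATOM") and line[12:16].strip() == "CA":
                    n_ca += 1
        summary[pdb.name] = n_ca
    return summary
```

Example: `summarize_predictions("./Orphan25_data")`.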

Training steps

# prepare the training data using LightRoseTTA_preprocess_train_data
(a) download LightRoseTTA_preprocess_train_data.zip and unzip it
(b) prepare the ".fasta" files and the corresponding ".pdb" files
(c) cd into the "LightRoseTTA_preprocess_train_data" folder and generate the data by following its README.md
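Before running the preprocessing in step (c), it can save time to confirm that every ".fasta" file from step (b) has a matching ".pdb" file. This sketch assumes each pair shares a file stem (e.g. 1abc.fasta and 1abc.pdb) and sits in two separate folders; the actual naming convention is defined by the preprocessing README, and `pair_fasta_pdb` is a hypothetical helper.

```python
from pathlib import Path


def pair_fasta_pdb(fasta_dir, pdb_dir):
    """Return (pairs, stems_without_pdb, stems_without_fasta), matching files by stem."""
    fasta = {p.stem: p for p in Path(fasta_dir).glob("*.fasta")}
    pdb = {p.stem: p for p in Path(pdb_dir).glob("*.pdb")}
    pairs = [(fasta[k], pdb[k]) for k in sorted(fasta.keys() & pdb.keys())]
    no_pdb = sorted(fasta.keys() - pdb.keys())
    no_fasta = sorted(pdb.keys() - fasta.keys())
    return pairs, no_pdb, no_fasta
```

Any stems reported in the second or third list point to inputs that would be missing a structure or a sequence.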

# run the training python file
(a) download LightRoseTTA_train_data.zip and unzip it
(b) cd into the "LightRoseTTA_preprocess_train_data" folder
(c) python LightRoseTTA_train.py -dataset [training_data_path]
Note: training_data_path should contain a "raw" folder and a "processed" folder.
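A small pre-flight check can catch a wrongly unzipped dataset before training starts. The raw/processed split resembles the layout PyTorch Geometric's `Dataset` class expects under its root directory, but whether LightRoseTTA relies on that class is an assumption. `missing_subdirs` and `build_train_command` are hypothetical helpers; only the `-dataset` flag comes from the command above.

```python
import sys
from pathlib import Path

REQUIRED_SUBDIRS = ("raw", "processed")


def missing_subdirs(training_data_path):
    """Return the required sub-folders that are absent under training_data_path."""
    root = Path(training_data_path)
    return [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]


def build_train_command(training_data_path, script="LightRoseTTA_train.py"):
    """argv for step (c); the -dataset flag is the one shown in the README."""
    return [sys.executable, script, "-dataset", str(training_data_path)]
```

If `missing_subdirs(path)` returns a non-empty list, fix the folder layout first; otherwise the command from `build_train_command(path)` can be passed to `subprocess.run(..., check=True)`.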

References

X. Wang et al., "LightRoseTTA: High-efficient and Accurate Protein Structure Prediction Using an Ultra-Lightweight Deep Graph Model," bioRxiv, doi:10.1101/2023.11.20.566676 (2023).

Folders and files

atom_graph
confs
example/Orphan25_fasta
model
msa_feat
utils
weights
LightRoseTTA-env.yml
LightRoseTTA_preprocess_train_data.zip
LightRoseTTA_test.py
LightRoseTTA_training_code.zip
README.md
data_pipeline.py
download_datasets.py
generate_LightRoseTTA_data.py
generate_pt_file.py
install_bio_packages.sh
test_script.py
