DODO: reDesign AlphaFOld2 Disordered regiOns

What is DODO?

DODO is a Python package and command-line utility that takes an AF2 structure and redesigns its disordered regions so they look more like IDRs. To be clear, the work DeepMind did to make AlphaFold2 is AMAZING, and I do not mean to take away from it in ANY WAY. However, when visualizing proteins for presentations and similar uses, it would be nifty to be able to make the IDRs look more 'IDR-like'. DODO does just that! It identifies the IDRs in the structure, predicts the end-to-end distance of each IDR from its sequence using ALBATROSS (see https://github.com/idptools/sparrow), and rebuilds the structure so that each IDR has approximately the correct overall dimensions (see example below). If that visualization doesn't work for you, other options let you make the IDRs more compact or more expanded than predicted from sequence. In addition, you can make a PDB with multiple IDR conformations in a single 'structure' while keeping the folded domains fixed; when opened in VMD, this looks like a simulation trajectory (to be very clear, it is NOT equivalent to an actual simulation trajectory, but it is really nice for visualizations).
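The key quantity here is the end-to-end distance of each IDR, i.e. the distance between its first and last alpha carbon. ALBATROSS predicts that value from sequence; the minimal sketch below (an illustration, not DODO's code) only measures it for a list of CA coordinates, which is how a rebuilt IDR could be checked against a predicted value. The example segment is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def end_to_end_distance(ca_coords: Sequence[Coord]) -> float:
    """Distance between the first and last alpha carbon (Å for PDB coordinates)."""
    if len(ca_coords) < 2:
        raise ValueError("an IDR needs at least two alpha carbons")
    return math.dist(ca_coords[0], ca_coords[-1])


# Hypothetical fully extended 4-residue segment: CAs 3.8 Å apart along x
segment = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
print(end_to_end_distance(segment))
```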

Current Limitations

Rebuilding employs a simple random walk approach, so the actual IDR conformation is completely random and not scientifically useful. However, it is still quite useful for visualization of the protein overall.
At the moment, the rebuilt IDRs contain only alpha carbons, whereas folded regions contain all atoms. This is mainly due to the difficulty of placing all the atoms back after dramatically changing the IDR. I'm working on all-atom IDR generation, which I hope to bring in the future (but this is not a guarantee, because it's tricky business and, to be honest, it is not really necessary for visualization).
Because the rebuilt IDRs only have alpha carbons, the bonds between them trigger an 'unusual bond' warning in VMD.
Some visualization modes don't work in VMD: Licorice seems to work fine, but Tube and Trace do not. I'm working on this.
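To make the random-walk limitation concrete, here is a rough sketch of a CA-only random walk (an illustration of the general technique, not DODO's actual code). Each new alpha carbon is placed 3.8 Å from the previous one in a random direction, positions that clash with earlier residues are rejected, and several walks are tried to get close to a target end-to-end distance. The 4.0 Å clash cutoff and the keep-the-closest acceptance rule are assumptions; the attempts_per_coord and attempts_per_region parameters only borrow the names and defaults of DODO's arguments.

```python
import math
import random
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]

CA_CA = 3.8    # typical distance between consecutive alpha carbons (Å)
MIN_SEP = 4.0  # assumed clash cutoff between non-adjacent alpha carbons (Å)


def random_unit_vector(rng: random.Random) -> Coord:
    """Uniformly random direction: normalize three Gaussian samples."""
    while True:
        v = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        norm = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
        if norm > 1e-12:
            return (v[0] / norm, v[1] / norm, v[2] / norm)


def random_walk(n_residues: int, start: Coord, rng: random.Random,
                attempts_per_coord: int = 2000) -> Optional[List[Coord]]:
    """Grow a CA-only chain one 3.8 Å step at a time, rejecting clashing positions."""
    chain = [start]
    for _ in range(n_residues - 1):
        for _ in range(attempts_per_coord):
            d = random_unit_vector(rng)
            last = chain[-1]
            candidate = (last[0] + CA_CA * d[0], last[1] + CA_CA * d[1], last[2] + CA_CA * d[2])
            # Skip chain[-1]: it is bonded to the candidate and sits exactly CA_CA away.
            if all(math.dist(candidate, prev) >= MIN_SEP for prev in chain[:-1]):
                chain.append(candidate)
                break
        else:
            return None  # no clash-free position found for this residue
    return chain


def build_idr(n_residues: int, target_re: float, tolerance: float, rng: random.Random,
              start: Coord = (0.0, 0.0, 0.0),
              attempts_per_region: int = 20) -> Optional[List[Coord]]:
    """Keep the walk whose end-to-end distance is closest to target_re; stop early within tolerance."""
    best, best_err = None, math.inf
    for _ in range(attempts_per_region):
        chain = random_walk(n_residues, start, rng)
        if chain is None:
            continue
        err = abs(math.dist(chain[0], chain[-1]) - target_re)
        if err < best_err:
            best, best_err = chain, err
        if best_err <= tolerance:
            break
    return best


if __name__ == "__main__":
    rng = random.Random(42)
    idr = build_idr(30, target_re=35.0, tolerance=3.0, rng=rng)
    print(len(idr), round(math.dist(idr[0], idr[-1]), 1))
```

Because each conformation is just one random draw, two runs with different seeds give very different IDRs with similar overall size, which is exactly why the result is good for pictures but carries no structural meaning.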

How can you use DODO?

DODO is currently usable in Python and from the command-line.

Installation

If you are having any issues installing DODO, please keep reading this section, because we have listed a few known install issues! If you're having a problem that isn't listed here, please let me know!

Note - to install DODO, you first need to have cython and numpy installed. To install cython and numpy, simply run:

pip install cython numpy

Once you have cython and numpy installed, you should be able to install DODO.

To install DODO, run the following command from terminal:

pip install git+https://github.com/idptools/dodo.git

Additional known install failures

Over time we have discovered some ways that installing DODO can fail. One is related to your version of setuptools. Try running the following and then attempt to install DODO:

pip install setuptools --upgrade

Another failure that we are aware of involves wheel. To fix it, run the following:

pip install wheel --upgrade
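Because these failures depend on what is already installed, it can help to check the prerequisites from the same Python interpreter that pip installs into before retrying. This is a small standard-library sketch (importlib.metadata, Python 3.8+), not part of DODO; the README doesn't state minimum versions, so it only reports what is installed.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names as published on PyPI
PREREQUISITES = ["Cython", "numpy", "setuptools", "wheel"]


def check_prerequisites(names: Iterable[str] = PREREQUISITES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in check_prerequisites().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```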

DODO Python Functions

First import build from DODO.

from dodo import build

You can build new structures from an existing PDB or just have DODO download the structure from the AF2 database. You can also generate PDBs of IDRs from sequence alone!

Generating a structure from the name alone

To have DODO download a structure by protein name and alter its disordered regions, use the pdb_from_name() function. There are two required arguments unless you set graph=True: 1. the protein name as a string, and 2. out_path, where to save the PDB. If you set graph=True, you don't need to specify out_path, and DODO will instead show your structure in a 3D graph using matplotlib.

build.pdb_from_name('human p53', out_path='/Users/your_user_name/Desktop/my_cool_proteins/my_protein.pdb')

Additional usage:

All arguments for build.pdb_from_name() are as follows:
protein_name - required. The name of your protein as a string. Specifying the organism increases your chance of success.

out_path - optional if you set graph=True; otherwise, omitting it raises an exception. Where to save your protein structure file, including the file name.

mode - optional. Default: 'predicted'. The 'predicted' option predicts the end-to-end distance of your disordered regions from sequence and then makes the IDRs fit within that distance. Additional options are super_compact, compact, normal, expanded, super_expanded, and max_expansion. These are pretty self-explanatory.

num_models - optional. Default: 1. The number of IDR models to make for your protein. The folded domains stay in the same location in all models, whereas the IDRs vary.

linear_placement - optional. Default: False. Whether to place the folded domains linearly for visualization.

just_fds - optional. Default: False. Setting just_fds to True saves each folded domain (FD) as an individual PDB, named using the protein name from out_path plus the FD's residue range, formatted as protname_resStart_resEnd.pdb.

beta_for_FD_IDR - optional. Default: False. Whether to set the beta (B-factor) values so that all IDRs = 0 and all FDs = 100, for visualization.

include_FD_atoms - optional. Default: True. Whether to include all atoms for the FDs. Only CA for IDRs for now.

CONECT_lines - optional. Default: True. Whether to include CONECT lines in the generated PDB. This generally makes visualization better.

verbose - optional. Default: True. Whether to show progress as structure is being made.

use_metapredict - optional. Default: False. Whether to use metapredict to predict the IDRs and folded regions. Although metapredict is fairly accurate, it doesn't get the exact boundaries for some regions and fails to predict small loops within large folded regions. The default is instead to use the number of atoms neighboring each atom in the AF2 structure, which is slower but works better.

graph - optional. Default: False. Setting this to True will pull up a really rough looking structure of your protein using the 3D graphing functionality in matplotlib. This is something I made when developing this to quickly look at structures. You shouldn't use this, but you can if you want. It's kind of fun TBH.

attempts_per_region - optional. Default: 20. Number of times to try and make each region.

attempts_per_coord - optional. Default: 2000. Number of times to try to generate each coordinate for each alpha carbon in the structure.
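Several of these options correspond to standard PDB records: num_models to MODEL/ENDMDL blocks, beta_for_FD_IDR to the B-factor column, and CONECT_lines to CONECT records. The sketch below writes a CA-only, multi-model PDB that way. It illustrates the file format and is not DODO's writer; the function names, chain ID, and occupancy of 1.00 are assumptions.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# One-letter to three-letter residue names for the 20 standard amino acids
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def ca_atom_line(serial: int, resname: str, chain: str, resseq: int,
                 xyz: Coord, beta: float) -> str:
    """Fixed-column PDB ATOM record (78 characters) for a single alpha carbon."""
    x, y, z = xyz
    return (f"ATOM  {serial:5d}  CA  {resname:>3s} {chain:1s}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{beta:6.2f}" + " " * 10 + f"{'C':>2s}")


def write_multimodel_pdb(path: str, sequence: str, models: Sequence[Sequence[Coord]],
                         folded: Sequence[bool], chain: str = "A",
                         conect: bool = True) -> List[str]:
    """Write one MODEL block per conformation; B-factor is 100 for folded residues, 0 for IDRs."""
    lines: List[str] = []
    for model_number, coords in enumerate(models, start=1):
        if len(coords) != len(sequence) or len(folded) != len(sequence):
            raise ValueError("need one coordinate and one folded flag per residue")
        lines.append(f"MODEL     {model_number:4d}")
        for i, (aa, xyz) in enumerate(zip(sequence, coords), start=1):
            beta = 100.0 if folded[i - 1] else 0.0
            lines.append(ca_atom_line(i, THREE_LETTER[aa], chain, i, xyz, beta))
        lines.append("ENDMDL")
    if conect:
        # Bond consecutive CAs; serials repeat in every model, so one set follows the last ENDMDL
        for i in range(1, len(sequence)):
            lines.append(f"CONECT{i:5d}{i + 1:5d}")
    lines.append("END")
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return lines
```

VMD loads each MODEL block of a file like this as a separate frame, and coloring by Beta then separates folded domains (100) from IDRs (0).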

Modifying the IDR from an existing PDB file

You can also have DODO alter a pre-existing AF2 PDB file. The AF2 file should ideally have all-atom information (though this isn't required). There are two required arguments if not graphing: path_to_pdb, the path to your PDB file as a string, and out_path, the path and file name where your file will be saved. If you set graph=True, you don't need to specify out_path.

build.pdb_from_pdb('/Users/your_user_name/Desktop/my_AF2_pdb.pdb', out_path='/Users/your_user_name/Desktop/my_AF2_PDB_DODO.pdb')
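The use_metapredict entry above describes DODO's default way of finding IDRs in an existing structure: counting how many atoms neighbor each atom. A simplified sketch of that idea (not DODO's code) reads the alpha carbons from a PDB file's fixed-width ATOM records, counts CA neighbors within a cutoff, and groups low-count residues into contiguous regions. Using only CA atoms, the 10 Å cutoff, and the neighbor threshold of 8 are assumptions for illustration; DODO counts neighbors over all atoms.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def read_ca_coords(pdb_text: str) -> Dict[int, Coord]:
    """Map residue number -> CA coordinate, reading only the first model."""
    cas: Dict[int, Coord] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # ignore any later models
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resseq = int(line[22:26])  # PDB columns 23-26
            cas[resseq] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return cas


def neighbor_counts(cas: Dict[int, Coord], cutoff: float = 10.0) -> Dict[int, int]:
    """Number of other CAs within `cutoff` Å of each residue's CA (simple O(n^2) loop)."""
    residues = sorted(cas)
    counts = {r: 0 for r in residues}
    for i, a in enumerate(residues):
        for b in residues[i + 1:]:
            if math.dist(cas[a], cas[b]) <= cutoff:
                counts[a] += 1
                counts[b] += 1
    return counts


def flag_disordered(counts: Dict[int, int], min_neighbors: int = 8) -> List[int]:
    """Residues with few neighbors are treated as candidate IDR residues."""
    return [r for r in sorted(counts) if counts[r] < min_neighbors]


def contiguous_regions(residues: List[int]) -> List[Tuple[int, int]]:
    """Collapse sorted residue numbers into inclusive (start, end) ranges."""
    regions: List[List[int]] = []
    for r in residues:
        if regions and r == regions[-1][1] + 1:
            regions[-1][1] = r
        else:
            regions.append([r, r])
    return [(start, end) for start, end in regions]
```

Chaining read_ca_coords, neighbor_counts, flag_disordered, and contiguous_regions on the text of a PDB file gives candidate IDR ranges; a practical tool would likely also need to drop very short regions and smooth noisy boundaries.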

Additional usage:

All arguments for build.pdb_from_pdb() are as follows:
path_to_pdb - required. The filepath to your pdb as a string.