Find native simulations where the structure of each data point has an RMSD no greater than a max RMSD (when compared to frame0).
Input: (a) logfile (project/run/clone/last_time_in_ps/RMSD); (b) max RMSD (Angstroms).
Output: native_sims_<MAX_RMSD>A.txt, containing project/run/clone/last_time_in_ps.
Logic: For each simulation (having a unique project/run/clone), if every data point has an RMSD no greater than MAX_RMSD, the simulation is considered native. In that case, output the simulation's project, run, clone, and final timeframe (ps).
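The filter described above can be sketched as follows. This is only an illustration in Python, not the original script: the function name `find_native_sims` and the row layout (project, run, clone, time in ps, RMSD) are assumptions, and the sample rows are made up.

```python
from collections import defaultdict

def find_native_sims(rows, max_rmsd):
    """Return {(project, run, clone): last_time_ps} for native simulations.

    rows: iterable of (project, run, clone, time_ps, rmsd) tuples
    (assumed layout). A simulation is native only if every one of its
    data points has rmsd <= max_rmsd.
    """
    worst = defaultdict(float)  # largest RMSD seen per simulation
    last_time = {}              # largest time (ps) seen per simulation
    for project, run, clone, time_ps, rmsd in rows:
        key = (project, run, clone)
        worst[key] = max(worst[key], rmsd)
        last_time[key] = max(last_time.get(key, time_ps), time_ps)
    return {k: t for k, t in last_time.items() if worst[k] <= max_rmsd}

# Made-up sample data: clone 0 stays under 3.0 A, clone 1 does not.
rows = [
    (1, 0, 0, 0, 0.0), (1, 0, 0, 100, 2.1), (1, 0, 0, 200, 2.8),
    (1, 0, 1, 0, 0.0), (1, 0, 1, 100, 4.6),
]
native = find_native_sims(rows, max_rmsd=3.0)
```

Tracking only the worst RMSD per simulation avoids holding every data point in memory, which helps when the logfile is large.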
Given a list of native simulations, extract atomic contact data to an outfile.
Input: (a) list of native simulations (proj/run/clone/last_time_in_ps); (b) concatenated all_contact.con.
Output: atomic contacts data from native sims only.
Logic:
- Read in list of native simulations (project/run/clone)
- Read in concatenated contacts file
- If project/run/clone in this file matches one from native simulations list, print to output all contacts info
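A minimal sketch of the matching step, assuming the first three whitespace-separated fields of each line in the concatenated contacts file are project, run, and clone (the actual column layout is not stated here). The name `filter_native_contacts` and the sample lines are hypothetical.

```python
def filter_native_contacts(contact_lines, native_sims):
    """Yield contact lines that belong to a native simulation.

    contact_lines: iterable of text lines from the concatenated file.
    native_sims: set of (project, run, clone) string tuples.
    Assumes project/run/clone are the first three fields of each line.
    """
    for line in contact_lines:
        fields = line.split()
        if len(fields) >= 3 and tuple(fields[:3]) in native_sims:
            yield line

# Made-up sample data.
native_sims = {("1", "0", "0")}
contact_lines = [
    "1 0 0 100 12 48 3.9",
    "1 0 1 100 12 48 5.2",
    "1 0 0 200 12 50 4.1",
]
kept = list(filter_native_contacts(contact_lines, native_sims))
```

Storing the native list as a set keeps each lookup constant-time, so the contacts file can be streamed line by line.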
Input: (a) native simulation atomic contact data; (b) list of native simulations (proj/run/clone/last_time_in_ps).
Output: Unique contacts with percent of time this contact appears in native sims (collectively), mean atomic distances, and their standard deviations.
Logic:
- Calculate the total number of frames in all native simulations
- Read the list of native simulations (proj/run/clone/last_time_in_ps) and import the times
- Sum them up
- Divide by 100
- Import atomic contact data from native simulations
- Store this data in a hash/dictionary whose keys are i-j atom numbers and whose corresponding values are arrays of atomic distances
- For each i-j atom pair's atomic distances, find the following statistical quantities:
- mean atomic distance,
- standard deviation of the atomic distance,
- percentage (total number of distances divided by total number of frames in all native simulations)
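The statistics step might look like the sketch below. Several details are assumptions rather than facts from the scripts: the divisor 100 is read as a frame spacing of 100 ps, the percentage is scaled by 100 to give a percent, and the population standard deviation is used (the original may use the sample form). The name `summarize_contacts` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def summarize_contacts(contacts, native_times_ps, ps_per_frame=100):
    """Return {(i, j): (mean_distance, std_dev, percent_of_frames)}.

    contacts: iterable of (i, j, distance) from native simulations.
    native_times_ps: final time (ps) of each native simulation.
    ps_per_frame: assumed meaning of the "divide by 100" step.
    """
    total_frames = sum(native_times_ps) / ps_per_frame
    distances = defaultdict(list)  # (i, j) -> list of distances
    for i, j, d in contacts:
        distances[(i, j)].append(d)
    summary = {}
    for pair, ds in distances.items():
        percent = 100.0 * len(ds) / total_frames
        summary[pair] = (mean(ds), pstdev(ds), percent)
    return summary

# Made-up sample: two native sims of 200 ps each, so 4 frames in total.
contacts = [(1, 5, 3.0), (1, 5, 5.0), (2, 9, 4.0)]
summary = summarize_contacts(contacts, native_times_ps=[200, 200])
```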
Input: (a) native sim contacts; (b) secondary structure keys file.
Output: Same as input, with secondary structure info added.
Logic:
- Import structure keys from file
- ??? Read in the output from 01.b.v3 and add on secondary structure info for each line based on i-j atom pair
Input: TBD
Output: TBD
Logic: TBD
- Input
- a file with $i-$j contact, mean distance, mean + 2 * std dev, & structure info (output of 01.c)
- concatenated contacts file
- $P = 25
- $distance = 4.5
- $distanceNC = 6.0
- Output
- categorized contacts for all sim & time frames
- list of native contacts
- list of excluded contacts
- Read the file with summarized info of contacts from native sims.
- save $i-$j pair, percentage, mean distance, mean + 2*stddev, and 2nd structures.
- if the appearance percentage of a contact is smaller than $P OR the distance is greater than $distanceNC (6.0), save that contact to an excluded list
- else, consider that contact native
- Read concatenated contacts file, for all contacts of a given timestamp:
- if an $i-$j distance is smaller than $distance (4.5), and the $i-$j pair is on the native list (and not on the excluded list), and the $i-$j distance is smaller than mean distance + 2 stddev (from the native sims contacts list), count it toward the number of NC (S1, S2, L1, L2, T)
- else, consider that contact non-native
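The two passes above can be sketched as follows, in Python for illustration only. The constants mirror `$P`, `$distance`, and `$distanceNC` from the text; the function names, the shape of `summary`, and the choice to compare the mean distance against `$distanceNC` are assumptions.

```python
from collections import Counter

P_MIN = 25.0       # $P: minimum appearance percentage
DISTANCE = 4.5     # $distance: per-frame contact cutoff
DISTANCE_NC = 6.0  # $distanceNC: cutoff applied to the summarized distance

def split_native_excluded(summary):
    """summary: {(i, j): (percent, mean_d, mean_plus_2sd, structure)}.

    Returns (native, excluded), where native maps each pair to
    (mean_plus_2sd, structure) and excluded is a set of pairs.
    """
    native, excluded = {}, set()
    for pair, (percent, mean_d, cutoff, structure) in summary.items():
        if percent < P_MIN or mean_d > DISTANCE_NC:
            excluded.add(pair)
        else:
            native[pair] = (cutoff, structure)
    return native, excluded

def count_native_contacts(frame_contacts, native):
    """Count native contacts per structure class for one timestamp.

    frame_contacts: iterable of (i, j, distance).
    """
    counts = Counter()
    for i, j, d in frame_contacts:
        entry = native.get((i, j))
        if entry is not None and d < DISTANCE and d < entry[0]:
            counts[entry[1]] += 1
    return counts

# Made-up sample data.
summary = {
    (1, 5): (80.0, 3.9, 4.4, "S1"),  # kept as native
    (2, 9): (10.0, 4.0, 4.8, "L1"),  # excluded: percentage below P_MIN
    (3, 7): (60.0, 6.5, 7.0, "T"),   # excluded: distance above DISTANCE_NC
}
native, excluded = split_native_excluded(summary)
counts = count_native_contacts([(1, 5, 4.0), (2, 9, 3.0), (4, 8, 3.5)], native)
```

Any contact that fails one of the three tests is simply not counted here; a fuller version would also record it as non-native.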
Logic for summarizing native contact information across all data (implemented in the original "03" script written by A. Radcliff and K. Nguyen):
- Initializes all variables
- Reads in native contacts data from 01.c output
- in tandem: fills in hashmap for all {i,j} pairs
- defines whether each pair is on the excluded or native contact list (by comparing to $P)
- Reads in all_contact_P$proj data file
- checks the timestamp and extracts P/R/C/T
- foreach contact
- checks the distance and compares it to $D = [mean + 2SD]
- (in each frame) assigns secondary/tertiary structure
- output line states contact status
- Every time a new timestamp is found, print the previous frame info
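The "print the previous frame when a new timestamp appears" pattern can be sketched as a small generator. This is an illustrative Python version, not the original "03" script; the name `frames_by_timestamp` and the record shape are assumptions. Note the final flush after the loop: without it, the last frame in the file would never be emitted.

```python
def frames_by_timestamp(records):
    """Group consecutive records that share a timestamp.

    records: iterable of (timestamp, contact) in file order, where
    timestamp is e.g. a (P, R, C, T) tuple.
    Yields (timestamp, [contacts]) each time the timestamp changes,
    then once more at the end for the last frame.
    """
    current, bucket = None, []
    for ts, contact in records:
        if current is not None and ts != current:
            yield current, bucket  # new timestamp found: emit previous frame
            bucket = []
        current = ts
        bucket.append(contact)
    if current is not None:
        yield current, bucket      # flush the last frame

# Made-up sample data: two frames of one simulation.
records = [
    ((1, 0, 0, 100), "12-48"),
    ((1, 0, 0, 100), "12-50"),
    ((1, 0, 0, 200), "12-48"),
]
frames = list(frames_by_timestamp(records))
```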