Skip to content

Commit

Permalink
Merge pull request #131 from BackofenLab/intarna-2
Browse files Browse the repository at this point in the history
v2.3.1
  • Loading branch information
martin-raden authored Nov 23, 2018
2 parents 1f974bf + f7b21f7 commit dc2f77c
Show file tree
Hide file tree
Showing 13 changed files with 487 additions and 58 deletions.
21 changes: 21 additions & 0 deletions ChangeLog
Original file line number Diff line number Diff line change
@@ -1,4 +1,25 @@

version 2.3.1

181108 Martin Raden :
+ IntaRNA/PredictionTrackerSpotProbAll : exhaustively computes and prints all
spot probabilities (ie for all intermolecular index pairs)
* bin/CommandLineParsing :
+ registration and documentation of PredictionTrackerSpotProbAll usage
* README.md :
* exhaustive spot prob calculation documented

180907 Martin Raden :
* IntaRNA/OutputHandlerCsv :
+ 'hybridDP|Bfull' output for full sequence lengths
* IntaRNA/Interaction :
* dotBar() :
* dotBracket() :
+ "fullLength" argument to trigger full-length or interaction-site-only
output
* README.md :
+ 'hybridD(P|B)full' documentation

version 2.3.0

180906 Martin Raden :
Expand Down
83 changes: 62 additions & 21 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -81,8 +81,8 @@ The following topics are covered by this documentation:
- [Energy parameters and temperature](#energy)
- [Additional output files](#outFiles)
- [Minimal energy profiles](#profileMinE)
- [Spot probability profiles](#profileSpotProb) using partition functions
- [Minimal energy for all intermolecular index pairs](#pairMinE)
- [Spot probability profiles](#profileSpotProb) using partition functions
- [Interaction probabilities for interaction spots of interest](#spotProb)
- [Accessibility and unpaired probabilities](#accessibility)
- [Local versus global unpaired probabilities](#accLocalGlobal)
Expand Down Expand Up @@ -609,7 +609,7 @@ This information can be incorporated into IntaRNA predictions by providing
*explicit seed information*. To this end, the `--seedTQ` parameter can be used.
It takes a comma-separated list of seed string encodings in the format
`startTbpsT&startQbpsQ`, which is in the same format as the IntaRNA `hybridDB`
output (see below), i.e. e.g. `--seedTQ='4|||.|&7||.||'`
output (see [below](#outModeCsv)), i.e. e.g. `--seedTQ='4|||.|&7||.||'`
(ensure you quote the seed encoding to avoid a shell interpretation of the pipe symbol '|')
to encode a seed interaction like
the following
Expand Down Expand Up @@ -772,8 +772,10 @@ are
- `end1` : end index of hybrid in seq1
- `start2` : start index of hybrid in seq2
- `end2` : end index of hybrid in seq2
- `hybridDP` : hybrid in VRNA dot-bracket notation
- `hybridDB` : hybrid in dot-bar notation
- `hybridDP` : hybrid in VRNA dot-bracket notation (interaction sites only)
- `hybridDPfull` : hybrid in VRNA dot-bracket notation (full sequence length)
- `hybridDB` : hybrid in dot-bar notation (interactin sites only)
- `hybridDBfull` : hybrid in dot-bar notation (full sequence length)
- `E` : overall hybridization energy
- `ED1` : ED value of seq1
- `ED2` : ED value of seq2
Expand Down Expand Up @@ -974,23 +976,6 @@ mfe interaction close to the 5'-end of the molecule.



<br />
<a name="profileSpotProb" />

### Spot probability profiles

Similarly to (minimal energy profiles)[#profileMinE], it is also possible to
compute position-wise probabilities how likely a position is covered by an
interaction, i.e. its *spot probability*. To the end, we compute for each
position $i$ the partition function $Zi$ of all interactions covering $i$.
Given the overall partition function *Z* including all possible interactions,
the position-speficit spot probability for *i* is given by *Zi/Z*.

Such profiles can be generated using `--out=qSpotProb:MYPROFILEFILE.csv` or
`--out=tSpotProb:...` for the query/target sequence respectively and independently.



<br />
<a name="pairMinE" />

Expand Down Expand Up @@ -1025,6 +1010,26 @@ mfe-site in the second sequence and are thus less likely to occure.



<br />
<a name="profileSpotProb" />

### Spot probability profiles

Similarly to [minimal energy profiles](#profileMinE), it is also possible to
compute position-wise probabilities how likely a position is covered by an
interaction, i.e. its *spot probability*. To the end, we compute for each
position *i* the partition function *Zi* of all interactions covering *i*.
Given the overall partition function *Z* including all possible interactions,
the position-speficit spot probability for *i* is given by *Zi/Z*.

Such profiles can be generated using `--out=qSpotProb:MYPROFILEFILE.csv` or
`--out=tSpotProb:...` for the query/target sequence respectively and independently.

Note, instead of a file you can also write the profile to stream using `STDOUT`
or `STDERR` instead of a file name.



<br />
<a name="spotProb" />

Expand Down Expand Up @@ -1068,6 +1073,42 @@ Nevertheless, since the Boltzmann probabilities are dominated by the low(est)
energy interactions, we consider the probability estimates as meaningful!


#### Spot probabilities for all intermolecular index pairs

If interested in all intermolecular index pair combinations (all spots), you
can exclude the list from the call and only specify the output file/stream by
`--out="spotProb:MYSPOTPROBFILE.csv"`. The resulting semicolon-separated table provides the spot
probability for each index pair combination, as shown below.

```[bash]
--> IntaRNA --out="spotProb:STDOUT" -t AAACACCCCCGGUGGUUUGG -q AAACACCCCCGGUGGUUUGG --energy=B -m E --noSeed --out=/dev/null
spotProb;A_1;A_2;A_3;C_4;A_5;C_6;C_7;C_8;C_9;C_10;G_11;G_12;U_13;G_14;G_15;U_16;U_17;U_18;G_19;G_20
A_1;3.29634e-07;1.04259e-06;1.88581e-06;3.92703e-06;6.01889e-06;1.09014e-05;1.98091e-05;3.26691e-05;4.97285e-05;6.40822e-05;0.00114721;0.00220701;0.00540899;0.00237187;0.00360255;0.00701572;0.00700194;0.00417576;0;0
A_2;1.04259e-06;3.29756e-06;6.59127e-06;1.23155e-05;2.15023e-05;3.65021e-05;6.65094e-05;0.000109528;0.000164207;0.000210854;0.00272534;0.00524301;0.00906912;0.00467934;0.00760322;0.0122582;0.0114086;0.00602621;0;0
A_3;1.88581e-06;6.59127e-06;1.46527e-05;2.91728e-05;5.38079e-05;9.40555e-05;0.000179154;0.000297828;0.000439376;0.000558681;0.005502;0.0104376;0.0143365;0.00670991;0.0118697;0.0159918;0.0123526;0.00593264;0;0
C_4;3.92703e-06;1.23155e-05;2.91728e-05;7.35161e-05;0.000143682;0.000297768;0.000618853;0.00101766;0.00145303;0.0017689;0.0102492;0.0153404;0.015348;0.0123563;0.0147842;0.0126304;0.00887796;0.00667471;0.00973898;0.0075106
A_5;6.01889e-06;2.15023e-05;5.38079e-05;0.000143682;0.00027948;0.00055287;0.00115804;0.00206338;0.00313617;0.00403553;0.0127348;0.0189567;0.022699;0.0126698;0.0146316;0.0153814;0.0102196;0.00793081;0.00690787;0.00474379
C_6;1.09014e-05;3.65021e-05;9.40555e-05;0.000297768;0.00055287;0.00127294;0.00278079;0.00507586;0.00801924;0.0102464;0.0234918;0.0277373;0.0219681;0.0173597;0.0181085;0.0126097;0.00712746;0.00444397;0.0150453;0.0139076
C_7;1.98091e-05;6.65094e-05;0.000179154;0.000618853;0.00115804;0.00278079;0.00611847;0.0114976;0.0187521;0.0245813;0.0392287;0.0365652;0.0245582;0.0257423;0.0194922;0.00880615;0.00691222;0.00878555;0.0299224;0.0270425
C_8;3.26691e-05;0.000109528;0.000297828;0.00101766;0.00206338;0.00507586;0.0114976;0.0224763;0.0380841;0.051392;0.0716197;0.0587535;0.0346146;0.0429812;0.0266964;0.00717352;0.00914256;0.0183812;0.0585854;0.0521578
C_9;4.97285e-05;0.000164207;0.000439376;0.00145303;0.00313617;0.00801924;0.0187521;0.0380841;0.0671115;0.0930764;0.115935;0.0830931;0.045061;0.0572892;0.0333122;0.0068722;0.0139344;0.0357869;0.0859975;0.0736521
C_10;6.40822e-05;0.000210854;0.000558681;0.0017689;0.00403553;0.0102464;0.0245813;0.051392;0.0930764;0.12986;0.14737;0.0865541;0.0493791;0.0589481;0.030814;0.00654186;0.0174543;0.0481296;0.085893;0.0665433
G_11;0.00114721;0.00272534;0.005502;0.0102492;0.0127348;0.0234918;0.0392287;0.0716197;0.115935;0.14737;0.136251;0.0740332;0.0512369;0.0438958;0.0218223;0.0117975;0.0219606;0.0500723;0.0522355;0.033636
G_12;0.00220701;0.00524301;0.0104376;0.0153404;0.0189567;0.0277373;0.0365652;0.0587535;0.0830931;0.0865541;0.0740332;0.0495326;0.0356528;0.0244903;0.0175123;0.0140905;0.0164346;0.0207754;0.0228082;0.0156495
U_13;0.00540899;0.00906912;0.0143365;0.015348;0.022699;0.0219681;0.0245582;0.0346146;0.045061;0.0493791;0.0512369;0.0356528;0.0251168;0.0207634;0.017577;0.0111258;0.010028;0.012619;0.0218789;0.0161306
G_14;0.00237187;0.00467934;0.00670991;0.0123563;0.0126698;0.0173597;0.0257423;0.0429812;0.0572892;0.0589481;0.0438958;0.0244903;0.0207634;0.0122356;0.00753315;0.0078345;0.00965907;0.0128517;0.0109001;0.0059406
G_15;0.00360255;0.00760322;0.0118697;0.0147842;0.0146316;0.0181085;0.0194922;0.0266964;0.0333122;0.030814;0.0218223;0.0175123;0.017577;0.00753315;0.00840878;0.00965183;0.00775696;0.0062456;0.00442768;0.00259566
U_16;0.00701572;0.0122582;0.0159918;0.0126304;0.0153814;0.0126097;0.00880615;0.00717352;0.0068722;0.00654186;0.0117975;0.0140905;0.0111258;0.0078345;0.00965183;0.00685222;0.00365991;0.00195865;0.00765631;0.00707759
U_17;0.00700194;0.0114086;0.0123526;0.00887796;0.0102196;0.00712746;0.00691222;0.00914256;0.0139344;0.0174543;0.0219606;0.0164346;0.010028;0.00965907;0.00775696;0.00365991;0.00362107;0.00625881;0.0137988;0.0117224
U_18;0.00417576;0.00602621;0.00593264;0.00667471;0.00793081;0.00444397;0.00878555;0.0183812;0.0357869;0.0481296;0.0500723;0.0207754;0.012619;0.0128517;0.0062456;0.00195865;0.00625881;0.0161844;0.0212005;0.0133051
G_19;0;0;0;0.00973898;0.00690787;0.0150453;0.0299224;0.0585854;0.0859975;0.085893;0.0522355;0.0228082;0.0218789;0.0109001;0.00442768;0.00765631;0.0137988;0.0212005;0.0158313;0.00807077
G_20;0;0;0;0.0075106;0.00474379;0.0139076;0.0270425;0.0521578;0.0736521;0.0665433;0.033636;0.0156495;0.0161306;0.0059406;0.00259566;0.00707759;0.0117224;0.0133051;0.00807077;0.00396313
```

This data can be visualized in heatmaps as discussed for the [minimal energy heatmap](#pairMinE).



<br />
<a name="accessibility" />

Expand Down
2 changes: 1 addition & 1 deletion configure.ac
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

AC_PREREQ([2.65])
# 5 argument version only available with aclocal >= 2.64
AC_INIT([IntaRNA], [2.3.0], [], [intaRNA], [http://www.bioinf.uni-freiburg.de] )
AC_INIT([IntaRNA], [2.3.1], [], [intaRNA], [http://www.bioinf.uni-freiburg.de] )

# minimal required version of the boost library
BOOST_REQUIRED_VERSION=1.50.0
Expand Down
57 changes: 43 additions & 14 deletions src/IntaRNA/Interaction.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -131,36 +131,65 @@ operator= ( const InteractionRange & range )

std::string
Interaction::
dotBar( const Interaction & i )
dotBar( const Interaction & i, const bool fullLength )
{
#if INTARNA_IN_DEBUG_MODE
if (!i.isValid())
throw std::runtime_error("Interaction::dotBar("+toString(i)+") not valid!");
#endif
// compile overall dot-bracket representation
return toString(i.basePairs.begin()->first +1)
+ dotSomething(i.basePairs.begin(), i.basePairs.end(), true, '|')
+"&"
+toString(i.basePairs.rbegin()->second +1)
+ dotSomething(i.basePairs.rbegin(), i.basePairs.rend(), false, '|')
;

// check whether to do full length output
if (fullLength) {
// compile dot-bar representation for full sequences
return "1"
+ std::string(i.basePairs.begin()->first, '.' ) // leading unpaired s1
+ dotSomething(i.basePairs.begin(), i.basePairs.end(), true, '|') // s1 structure
+ std::string(i.s1->size() - i.basePairs.rbegin()->first -1, '.' ) // trailing unpaired s1
+ "&"
+ "1"
+ std::string(i.basePairs.rbegin()->second, '.' ) // trailing unpaired s2
+ dotSomething(i.basePairs.rbegin(), i.basePairs.rend(), false, '|') // s2 structure
+ std::string( i.s2->size() - i.basePairs.begin()->second -1, '.' ) // leading unpaired s2
;
} else {
// compile dot-bar representation for interacting subsequences only
return toString(i.basePairs.begin()->first +1)
+ dotSomething(i.basePairs.begin(), i.basePairs.end(), true, '|')
+"&"
+toString(i.basePairs.rbegin()->second +1)
+ dotSomething(i.basePairs.rbegin(), i.basePairs.rend(), false, '|')
;
}
}

////////////////////////////////////////////////////////////////////////////

std::string
Interaction::
dotBracket( const Interaction & i, const char symOpen, const char symClose )
dotBracket( const Interaction & i, const char symOpen, const char symClose, const bool fullLength )
{
#if INTARNA_IN_DEBUG_MODE
if (!i.isValid())
throw std::runtime_error("Interaction::dotBracket("+toString(i)+") not valid!");
#endif
// compile overall dot-bracket representation
return dotSomething(i.basePairs.begin(), i.basePairs.end(), true, symOpen)
+"&"
+ dotSomething(i.basePairs.rbegin(), i.basePairs.rend(), false, symClose)
;

if (fullLength) {
// compile dot-bracket representation for full sequence lengths
return std::string(i.basePairs.begin()->first, '.' ) // leading unpaired s1
+ dotSomething(i.basePairs.begin(), i.basePairs.end(), true, symOpen) // s1 structure
+ std::string(i.s1->size() - i.basePairs.rbegin()->first -1, '.' ) // trailing unpaired s1
+"&"
+ std::string(i.s2->size() - i.basePairs.rbegin()->second -1, '.' ) // leading unpaired s2
+ dotSomething(i.basePairs.rbegin(), i.basePairs.rend(), false, symClose) // s2 structure
+ std::string(i.basePairs.rbegin()->second, '.' ) // trailing unpaired s2
;
} else {
// compile dot-bracket representation for interacting subsequences only
return dotSomething(i.basePairs.begin(), i.basePairs.end(), true, symOpen)
+"&"
+ dotSomething(i.basePairs.rbegin(), i.basePairs.rend(), false, symClose)
;
}
}

////////////////////////////////////////////////////////////////////////////
Expand Down
9 changes: 7 additions & 2 deletions src/IntaRNA/Interaction.h
Original file line number Diff line number Diff line change
Expand Up @@ -200,11 +200,14 @@ class Interaction {
* needed yielding still a valid encoding (for the swapped sequences).
*
* @param i interaction to encode
* @param fullLength whether or not full length base pair annotation is to
* be printed
*
* @return the dot-bracket encoding
*/
static
std::string
dotBar( const Interaction & i );
dotBar( const Interaction & i, const bool fullLength = false );

/**
* Produces a dot-bracket encoding of the interaction in VRNA style in the
Expand All @@ -221,12 +224,14 @@ class Interaction {
* @param i interaction to encode
* @param symOpen the symbol to be used for base pairs in dot-bracket-1
* @param symClose the symbol to be used for base pairs in dot-bracket-2
* @param fullLength whether or not full length base pair annotation is to
* be printed
*
* @return the VRNA-styled dot-bracket encoding
*/
static
std::string
dotBracket( const Interaction & i, const char symOpen = '(', const char symClose = ')');
dotBracket( const Interaction & i, const char symOpen = '(', const char symClose = ')', const bool fullLength = false);


protected:
Expand Down
2 changes: 2 additions & 0 deletions src/IntaRNA/Makefile.am
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ libIntaRNA_a_HEADERS = \
PredictionTrackerPairMinE.h \
PredictionTrackerProfileMinE.h \
PredictionTrackerSpotProb.h \
PredictionTrackerSpotProbAll.h \
PredictionTrackerProfileSpotProb.h \
Predictor.h \
PredictorMaxProb.h \
Expand Down Expand Up @@ -99,6 +100,7 @@ libIntaRNA_a_SOURCES = \
PredictionTrackerPairMinE.cpp \
PredictionTrackerProfileMinE.cpp \
PredictionTrackerSpotProb.cpp \
PredictionTrackerSpotProbAll.cpp \
PredictionTrackerProfileSpotProb.cpp \
PredictorMaxProb.cpp \
PredictorMfe.cpp \
Expand Down
8 changes: 8 additions & 0 deletions src/IntaRNA/OutputHandlerCsv.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,14 @@ add( const Interaction & i )
outTmp <<Interaction::dotBar( i );
break;

case hybridDPfull:
outTmp <<Interaction::dotBracket( i, '(', ')', true );
break;

case hybridDBfull:
outTmp <<Interaction::dotBar( i, true );
break;

case E:
outTmp <<i.energy;
break;
Expand Down
4 changes: 4 additions & 0 deletions src/IntaRNA/OutputHandlerCsv.h
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,8 @@ class OutputHandlerCsv : public OutputHandler
end2, //!< end index of hybrid in seq2
hybridDP, //!< hybrid in VRNA dot-bracket notation
hybridDB, //!< hybrid in dot-bar notation
hybridDPfull, //!< hybrid in VRNA dot-bracket notation for full sequence lengths
hybridDBfull, //!< hybrid in dot-bar notation for full sequence lengths
E, //!< overall hybridization energy
ED1, //!< ED value of seq1
ED2, //!< ED value of seq2
Expand Down Expand Up @@ -101,6 +103,8 @@ class OutputHandlerCsv : public OutputHandler
colType2string[subseqDB] = "subseqDB";
colType2string[hybridDP] = "hybridDP";
colType2string[hybridDB] = "hybridDB";
colType2string[hybridDPfull] = "hybridDPfull";
colType2string[hybridDBfull] = "hybridDBfull";
colType2string[E] = "E";
colType2string[ED1] = "ED1";
colType2string[ED2] = "ED2";
Expand Down
14 changes: 7 additions & 7 deletions src/IntaRNA/PredictionTrackerPairMinE.h
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,8 @@ class PredictionTrackerPairMinE: public PredictionTracker
public:

/**
* Constructs a PredictionTracker that collected for each positions of a
* sequence the minimal energy of any interaction covering this position.
* Constructs a PredictionTracker that collected for index pair of two
* sequences the minimal energy of any interaction covering this position.
*
* Note, if a filename is provided, its stream is closed on destruction of
* this object!
Expand All @@ -45,8 +45,8 @@ class PredictionTrackerPairMinE: public PredictionTracker
);

/**
* Constructs an PredictionTracker that collected for each positions of a
* sequence the minimal energy of any interaction covering this position.
* Constructs a PredictionTracker that collected for index pair of two
* sequences the minimal energy of any interaction covering this position.
*
* Note, the stream is NOT closed nor deleted on destruction of this object!
*
Expand All @@ -63,13 +63,13 @@ class PredictionTrackerPairMinE: public PredictionTracker
);

/**
* destruction: write the profile(s) to the according streams.
* destruction: write the pair information to the according stream.
*/
virtual ~PredictionTrackerPairMinE();


/**
* Updates the profile information for each Predictor.updateOptima() call.
* Updates the minE information for each Predictor.updateOptima() call.
*
* @param i1 the index of the first sequence interacting with i2
* @param j1 the index of the first sequence interacting with j2
Expand Down Expand Up @@ -106,7 +106,7 @@ class PredictionTrackerPairMinE: public PredictionTracker
E2dMatrix pairMinE;

/**
* Writes profile data to stream.
* Writes minE data to stream.
*
* @param out the output stream to write to
* @param pairMinE the minE data to write
Expand Down
Loading

0 comments on commit dc2f77c

Please sign in to comment.