Input files
Two types of input files, or series of files, are needed to create an optimization problem in FALCON. One describes the network structure; the other describes the experimental conditions and their associated experimental data. Both can be in Excel format (.xls or .xlsx) or in text format (.txt). Networks can also be described in .sif format (Simple Interaction File, used by Cytoscape and others). Data files can also be in .csv format, in which case three separate files are needed: one for the inputs, one for the outputs, and one for the measurement errors on the outputs.
In the text format, the network model is defined by a tab-separated table in which each line defines one interaction. The first three columns define the interaction itself, similar to the simple interaction file (.sif) format: the first column gives the name of the source node, the second the type of interaction (‘->’ for activation and ‘-|’ for inhibition), and the third the name of the target node. The fourth and fifth columns hold additional arguments for each interaction. The fourth column assigns the name of the parameter associated with the interaction. The fifth column defines the type of interaction, linear or non-linear, using Boolean gate codes: ‘A’ for the AND gate, ‘O’ for the OR gate, and ‘N’ for no Boolean gate, which represent the convergence, redundancy, and addition of signals, respectively.
In the Excel format, each network interaction is described in the same manner as in the text format. The columns have titles, which are not included in the model description.
The network should be a fully specified Dynamic Bayesian Network, i.e. a directed cyclic graph in which three types of interactions are possible: additive interactions with one or more partners, and Boolean interactions with two or more partners, which can be of either the OR or the AND type. In all cases, the interactions can be activating or inhibiting. Boolean functions with more than two inputs can be specified, as the toolbox automatically expands them to their simple form.
Model definition of the Toy example containing AND gates (A) in Excel format:
Input | Reaction | Output | parameter | gate |
---|---|---|---|---|
A | -> | M | k1 | N |
B | -> | M | k2 | N |
C | -| | M | ki | N |
M | -> | T | kA | A |
N | -> | T | kA | A |
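The five-column layout described above can be illustrated with a small parser. This is a minimal sketch in Python, not part of FALCON: the names `Interaction` and `parse_network` are hypothetical, the sample text is the Toy example table re-typed as tab-separated lines, and any columns beyond the fifth are simply ignored here.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    source: str     # name of the source node
    sign: str       # '->' activation, '-|' inhibition
    target: str     # name of the target node
    parameter: str  # name of the parameter tied to this interaction
    gate: str       # 'A' (AND), 'O' (OR), 'N' (no Boolean gate)


def parse_network(text: str) -> list:
    """Parse tab-separated interaction lines into Interaction records."""
    interactions = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) < 5:
            raise ValueError(f"line {line_no}: expected at least 5 columns")
        # Assumption: only the first five columns are read in this sketch.
        source, sign, target, parameter, gate = fields[:5]
        if sign not in ("->", "-|"):
            raise ValueError(f"line {line_no}: unknown interaction type {sign!r}")
        if gate not in ("A", "O", "N"):
            raise ValueError(f"line {line_no}: unknown gate code {gate!r}")
        interactions.append(Interaction(source, sign, target, parameter, gate))
    return interactions


# The Toy example from the table above, written as tab-separated text.
TOY_NETWORK = (
    "A\t->\tM\tk1\tN\n"
    "B\t->\tM\tk2\tN\n"
    "C\t-|\tM\tki\tN\n"
    "M\t->\tT\tkA\tA\n"
    "N\t->\tT\tkA\tA\n"
)

toy = parse_network(TOY_NETWORK)
and_gate_inputs = [i.source for i in toy if i.gate == "A"]
```

Here `and_gate_inputs` collects the source nodes that enter the AND gate on `T`; the two rows share the parameter name `kA`, as in the table.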
In the .sif format, the name of each parameter is generated automatically from the names of the input and output nodes, all interactions are considered additive (no Boolean gate), and no High-Low constraints are considered. The model can be exported to the .xlsx format to be modified outside of Matlab.
In the text format (see Toy example), the data file is organized into three tab-separated columns. The first column defines the experimental condition, i.e. the combination of inputs applied to the system. Each input name and its associated state value (from 0 to 1) are separated by a comma (,), and successive input assignments are likewise separated by commas. The second column defines the experimental readouts of the output nodes in the same format as the inputs. If a measure of variation of the data is available, e.g. standard deviations (S.D.), it can be given in the third column in the same format as the previous columns. The third column can be omitted if the variance of the data is not known, e.g. in a single-replicate experiment.
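One reading of this layout is that each column is a comma-separated list alternating node names and values. The sketch below parses a single data line under that assumption. It is written in Python and is not FALCON code: the function names are hypothetical, and the sample line is constructed from the Toy example values rather than copied from a FALCON file.

```python
def parse_assignments(field: str) -> dict:
    """Turn 'A,1,B,0' into {'A': 1.0, 'B': 0.0}."""
    tokens = [token.strip() for token in field.split(",")]
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected name,value pairs, got {field!r}")
    # Even positions hold node names, odd positions hold their values.
    return {name: float(value) for name, value in zip(tokens[0::2], tokens[1::2])}


def parse_data_line(line: str):
    """Split one data line into (inputs, outputs, errors).

    errors is None when the optional third column is absent.
    """
    columns = line.rstrip("\n").split("\t")
    if len(columns) not in (2, 3):
        raise ValueError("expected 2 or 3 tab-separated columns")
    inputs = parse_assignments(columns[0])
    outputs = parse_assignments(columns[1])
    errors = parse_assignments(columns[2]) if len(columns) == 3 else None
    return inputs, outputs, errors


# Hypothetical line: condition with A on, readouts for M and T, and their S.D.
sample_line = "A,1,B,0,C,0,N,0\tM,0.6,T,0\tM,0.05,T,0.05"
inputs, outputs, errors = parse_data_line(sample_line)
```

Because `float("NaN")` is valid in Python, a missing readout written as `NaN` is read as a not-a-number value rather than raising an error.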
In the Excel format, there are three datasheets: ‘input’, ‘output’, and ‘error’. The tables have headers with the names of the nodes. The first column of the ‘input’ sheet contains the name of the experimental condition. The same data assignment format also applies to the ‘output’ and ‘error’ datasheets.
Note: missing data points in the data table must be filled with “NaN” (Not-a-Number) so that they are excluded from the calculation of the fitting cost and are not displayed on the plots.
a. Input (sheet1)
A | B | C | N |
---|---|---|---|
0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 |
1 | 1 | 0 | 0 |
1 | 1 | 1 | 0 |
1 | 1 | 1 | 1 |
b. Output (sheet2)
M | T |
---|---|
0 | 0 |
0.6 | 0 |
1 | 0 |
0.5 | 0.5 |
0.5 | 0.5 |
c. Error (sheet3)
M | T |
---|---|
0.05 | 0.05 |
0.05 | 0.05 |
0.05 | 0.05 |
0.05 | 0.05 |
0.05 | 0.05 |
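To illustrate how “NaN” entries can be left out of a fitting cost, the following Python sketch sums squared residuals over the measured points only. The squared-error form is an assumption made for illustration, since FALCON's actual cost function is not described in this section. The function name is hypothetical, the simulated values are made up, and one Toy example readout has been replaced by NaN to show the masking.

```python
import math


def masked_squared_error(measured, simulated):
    """Sum squared residuals over entries whose measurement is not NaN.

    Returns (total, used), where used is the number of points that counted.
    """
    total = 0.0
    used = 0
    for measured_row, simulated_row in zip(measured, simulated):
        for meas, sim in zip(measured_row, simulated_row):
            if math.isnan(meas):
                continue  # missing data point: contributes nothing to the cost
            total += (meas - sim) ** 2
            used += 1
    return total, used


nan = float("nan")

# Output sheet of the Toy example (columns M, T), with one value set to NaN
# for illustration only.
measured = [
    [0.0, 0.0],
    [0.6, 0.0],
    [1.0, nan],
    [0.5, 0.5],
    [0.5, 0.5],
]

# Hypothetical model predictions for the same conditions.
simulated = [
    [0.0, 0.0],
    [0.5, 0.0],
    [1.0, 0.3],
    [0.5, 0.5],
    [0.5, 0.5],
]

cost, points_used = masked_squared_error(measured, simulated)
```

Only nine of the ten table cells enter the sum, so the prediction of 0.3 for the missing readout has no effect on `cost`.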
The .csv format follows the .xls format, with the three sheets (inputs, outputs, errors) stored in three separate .csv files. In this case, the variable MeasFile should be a cell array of three strings giving the names of the input, output, and error files, in that order.
The analysis of differential regulation between cell lines implies the construction of parallel networks, which are then contextualized simultaneously. This allows different degrees of linkage between the parameter subsets, notably the inclusion of different assumptions about the shape of the parameter space in the form of a regularization of the objective function. When such a network structure is created, users have the option to fix one or more parameters between the contexts, meaning that the fixed parameters are forced to be equal across all contexts. To do this, an additional .xls file listing the corresponding interactions is passed to the FalconMakeGlobalModel function. When no interaction needs to be fixed, an empty .xls file is passed.
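The idea of fixing parameters across contexts can be pictured as a naming problem: each context gets its own copy of every parameter, except the fixed ones, which are shared. The Python sketch below is a conceptual illustration only and does not reproduce how FalconMakeGlobalModel works internally. The function name, the context labels, and the `<parameter>_<context>` naming scheme are all hypothetical.

```python
def build_global_parameters(parameters, contexts, fixed):
    """Map each (context, parameter) pair to its name in the combined model.

    Fixed parameters keep a single shared name; all others get one copy
    per context.
    """
    unknown = set(fixed) - set(parameters)
    if unknown:
        raise ValueError(f"fixed parameters not in the model: {sorted(unknown)}")
    mapping = {}
    for context in contexts:
        for name in parameters:
            if name in fixed:
                mapping[(context, name)] = name  # shared across contexts
            else:
                mapping[(context, name)] = f"{name}_{context}"  # hypothetical naming
    return mapping


# Parameters of the Toy example network, two hypothetical cell lines,
# and kA fixed to be equal in both.
toy_parameters = ["k1", "k2", "ki", "kA"]
toy_contexts = ["cellLine1", "cellLine2"]
mapping = build_global_parameters(toy_parameters, toy_contexts, fixed={"kA"})

free_parameter_names = sorted(set(mapping.values()))
```

With one of the four parameters fixed and two contexts, the combined model in this sketch has seven distinct parameter names instead of eight. Passing an empty `fixed` set corresponds to the case where no interaction is fixed.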