When the input proteome contains two or more very similar sequences that Diamond aligns well (bitscore > 100) to the same BiGG gene (model.gene), only one of them is retained for subsequent processing.
In my opinion, this is caused by the following line in carveme/reconstruction/scoring.py:
The x.iloc[0] call keeps just one input protein per BiGG gene.
Suppose the Diamond output for two of my input proteins is the following:
| query | target | bitscore | evalue | identity | ppos | qcovhsp | scovhsp |
|---|---|---|---|---|---|---|---|
| gene_26222 | iCN900.CD630_31350 | 288.0 | 2.620000e-97 | 47.7 | 70.7 | 96.0 | 98.3 |
| gene_27280 | iCN900.CD630_31350 | 283.0 | 1.240000e-95 | 47.9 | 70.5 | 99.7 | 98.3 |
gene_26222 will be retained for subsequent processing, while gene_27280 will be discarded (bitscore 288.0 wins over 283.0).
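To make the selection step concrete, here is a minimal plain-Python emulation of "sort by score, then take x.iloc[0] within each BiGG gene group", run on the two hits from the table. It is not the actual CarveMe code; the function name best_hit_per_target and the min_score parameter (mirroring the bitscore > 100 cut-off mentioned above) are hypothetical.

```python
# Diamond hits from the table above: (query, target BiGG gene, bitscore)
hits = [
    ("gene_26222", "iCN900.CD630_31350", 288.0),
    ("gene_27280", "iCN900.CD630_31350", 283.0),
]


def best_hit_per_target(hits, min_score=100.0):
    """Keep only the highest-scoring query for each BiGG gene (target).

    Hypothetical helper emulating "sort by score, then x.iloc[0]" per group;
    min_score is an assumed bitscore cut-off.
    """
    best = {}
    for query, target, score in hits:
        if score <= min_score:
            continue  # drop weak alignments
        # Replace the stored hit only if this one scores strictly higher
        if target not in best or score > best[target][1]:
            best[target] = (query, score)
    return {target: query for target, (query, _) in best.items()}


print(best_hit_per_target(hits))
# {'iCN900.CD630_31350': 'gene_26222'}
```

Only gene_26222 survives, even though gene_27280 aligns almost as well.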
As a consequence, when the GPR of the underlying reaction is constructed, only gene_26222 appears in it.
In my opinion, it would be better to replace 'gene_26222' in the GPR with '(gene_26222 or gene_27280)'.
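A sketch of the proposed behaviour, again in plain Python and under the same assumptions: instead of keeping one query per BiGG gene, every query above the (assumed) bitscore cut-off is kept and the alternatives are joined with "or". The helpers all_hits_per_target and gpr_term are hypothetical illustrations, not CarveMe functions.

```python
from collections import defaultdict

# Diamond hits from the table above: (query, target BiGG gene, bitscore)
hits = [
    ("gene_26222", "iCN900.CD630_31350", 288.0),
    ("gene_27280", "iCN900.CD630_31350", 283.0),
]


def all_hits_per_target(hits, min_score=100.0):
    """Group every query aligning to a BiGG gene above min_score (assumed cut-off)."""
    groups = defaultdict(list)
    for query, target, score in hits:
        if score > min_score:
            groups[target].append((score, query))
    # Best-scoring query first within each group
    return {t: [q for _, q in sorted(g, reverse=True)] for t, g in groups.items()}


def gpr_term(queries):
    """Render a GPR fragment: a single gene, or a parenthesised OR of alternatives."""
    if len(queries) == 1:
        return queries[0]
    return "(" + " or ".join(queries) + ")"


mapping = all_hits_per_target(hits)
print(gpr_term(mapping["iCN900.CD630_31350"]))
# (gene_26222 or gene_27280)
```

Because the alternatives are wrapped in parentheses, the fragment can stand in for the single model gene wherever it occurs in a larger GPR, including inside an "and" complex, without changing the rule's logic.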
lazzarigioele changed the title from "CarveMe builds wrong GPRs in particular cases" to "CarveMe ignores highly similar genes during the building of GPRs" on Jun 15, 2023.
Using carveme v1.5.2.