CarveMe ignores highly similar genes during the building of GPRs #180

lazzarigioele · 2023-06-07T09:20:31Z

Using carveme v1.5.2.

When the input proteome contains 2 or more very similar sequences, each of them well aligned (bitscore > 100) by Diamond on the same BiGG_gene (model.gene), only 1 of them is retained for subsequent processing.
This in my opinion is due to the following line contained in carveme/reconstruction/scoring.py :

gene2gene = annotation.query('score > 100') \
                          .sort_values(by='score', ascending=False) \
                          .groupby('BiGG_gene', as_index=False).apply(lambda x: x.iloc[0])

The x.iloc[0] selects just 1 of the input proteins per each BiGG_gene.
Suppose the Diamond output for 2 of my input proteins is the following:

query	target	bitscore	evalue	identity	ppos	qcovhsp	scovhsp
gene_26222	iCN900.CD630_31350	288.0	2.620000e-97	47.7	70.7	96.0	98.3
gene_27280	iCN900.CD630_31350	283.0	1.240000e-95	47.9	70.5	99.7	98.3

gene_26222 will be retained for subsequent processing, while gene_27280 will be discarded (bitscore 288.0 wins over 283.0).
The consequence is that, when the GPR of the underlying reaction is constructed , only gene_26222 will appear.
In my opinion, it would be best to replace ‘gene_26222' in the GPR with '(gene_26222 or gene_27280)' .

lazzarigioele changed the title ~~CarveMe builds wrong GPRs in particular cases~~ CarveMe ignores highly similar genes during the building of GPRs Jun 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CarveMe ignores highly similar genes during the building of GPRs #180

CarveMe ignores highly similar genes during the building of GPRs #180

lazzarigioele commented Jun 7, 2023 •

edited

Loading

CarveMe ignores highly similar genes during the building of GPRs #180

CarveMe ignores highly similar genes during the building of GPRs #180

Comments

lazzarigioele commented Jun 7, 2023 • edited Loading

lazzarigioele commented Jun 7, 2023 •

edited

Loading