CarveMe ignores highly similar genes during the building of GPRs #180

Open
lazzarigioele opened this issue Jun 7, 2023 · 0 comments
lazzarigioele commented Jun 7, 2023

Using carveme v1.5.2.

When the input proteome contains two or more very similar sequences, each aligned well (bitscore > 100) by Diamond to the same BiGG_gene (model.gene), only one of them is retained for subsequent processing.
In my opinion, this is due to the following statement in carveme/reconstruction/scoring.py:

gene2gene = annotation.query('score > 100') \
                          .sort_values(by='score', ascending=False) \
                          .groupby('BiGG_gene', as_index=False).apply(lambda x: x.iloc[0])

Because the rows are sorted by descending score, x.iloc[0] selects only one input protein (the highest-scoring one) for each BiGG_gene.
Suppose the Diamond output for two of my input proteins is the following:

query       target              bitscore  evalue        identity  ppos  qcovhsp  scovhsp
gene_26222  iCN900.CD630_31350  288.0     2.620000e-97  47.7      70.7  96.0     98.3
gene_27280  iCN900.CD630_31350  283.0     1.240000e-95  47.9      70.5  99.7     98.3

gene_26222 will be retained for subsequent processing, while gene_27280 will be discarded (bitscore 288.0 wins over 283.0).
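
The selection described above can be reproduced in isolation with the two hits from the table. This is only a minimal sketch, not CarveMe's actual code path: the `annotation` DataFrame is rebuilt by hand, the column name `query_gene` is an assumption (only `BiGG_gene` and `score` appear in the quoted statement), and `drop_duplicates` is used as a stand-in that should keep the same row as `groupby(...).apply(lambda x: x.iloc[0])`.

```python
import pandas as pd

# Hand-built stand-in for the annotation table; "query_gene" is an assumed column name.
annotation = pd.DataFrame({
    "query_gene": ["gene_26222", "gene_27280"],
    "BiGG_gene": ["iCN900.CD630_31350", "iCN900.CD630_31350"],
    "score": [288.0, 283.0],
})

# Keep only the best-scoring hit per BiGG_gene, mirroring the effect of x.iloc[0]
# after sorting by descending score.
best_hit = (
    annotation.query("score > 100")
    .sort_values(by="score", ascending=False)
    .drop_duplicates(subset="BiGG_gene", keep="first")
)

kept = best_hit["query_gene"].tolist()
print(kept)  # only gene_26222 survives; gene_27280 is dropped
```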
As a consequence, when the GPR of the underlying reaction is constructed, only gene_26222 appears in it.
In my opinion, it would be better to replace 'gene_26222' in the GPR with '(gene_26222 or gene_27280)'.
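
One possible way to express this suggestion is to keep every hit above the threshold and join the input genes of each BiGG_gene with "or". The sketch below is an illustration under assumptions, not a patch for scoring.py: `or_clause` is a hypothetical helper, `query_gene` is an assumed column name, and how the rest of the scoring code would consume such a mapping is not addressed here.

```python
import pandas as pd

# Hand-built stand-in for the annotation table; "query_gene" is an assumed column name.
annotation = pd.DataFrame({
    "query_gene": ["gene_26222", "gene_27280"],
    "BiGG_gene": ["iCN900.CD630_31350", "iCN900.CD630_31350"],
    "score": [288.0, 283.0],
})


def or_clause(genes):
    """Hypothetical helper: join one or more gene IDs into a GPR 'or' clause."""
    unique = sorted(set(genes))
    if len(unique) == 1:
        return unique[0]
    return "(" + " or ".join(unique) + ")"


# Keep all hits above the threshold instead of only the best one per BiGG_gene.
hits = annotation.query("score > 100")
gene_map = hits.groupby("BiGG_gene")["query_gene"].apply(or_clause).to_dict()

print(gene_map["iCN900.CD630_31350"])
```

With the two hits from the table, the clause for iCN900.CD630_31350 contains both input genes rather than gene_26222 alone.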

lazzarigioele changed the title from "CarveMe builds wrong GPRs in particular cases" to "CarveMe ignores highly similar genes during the building of GPRs" on Jun 15, 2023