Hello,
Technical part
While working on the reconstruction of metabolic networks from public genomes of the NCBI database, I found that gapseq (version 1.2, with the subcommand doall) uses regions of the genome that are tagged as pseudogenes during reaction inference. I identified this by using the genome sequence as input to gapseq and by comparing (here, by searching for overlap) the region predicted to be associated with a reaction to the genes present at the same location in the GenBank file of the organism. When the corresponding gene has a pseudo qualifier, I considered the reaction to be associated with a pseudogene.
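The overlap check described above can be sketched roughly as follows. This is only a minimal illustration, not the script actually used: the `Region`, `Hit`, and `Gene` types and the function names are hypothetical, and the sketch assumes the gapseq hit coordinates and the GenBank gene features (with their pseudo status) have already been extracted into plain Python objects, with `start <= end` on every region.

```python
from typing import NamedTuple


class Region(NamedTuple):
    contig: str
    start: int  # inclusive; assumed start <= end (normalize reverse-strand hits first)
    end: int    # inclusive


class Hit(NamedTuple):
    reaction: str   # reaction identifier reported for this genomic hit
    region: Region


class Gene(NamedTuple):
    locus_tag: str
    region: Region
    is_pseudo: bool  # True when the GenBank feature carries a pseudo qualifier


def overlaps(a: Region, b: Region) -> bool:
    """Return True when two regions share at least one position on the same contig."""
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def reactions_on_pseudogenes(hits, genes):
    """Map each reaction to the locus tags of pseudogenes overlapping its hits."""
    pseudogenes = [g for g in genes if g.is_pseudo]
    flagged = {}
    for hit in hits:
        tags = [g.locus_tag for g in pseudogenes if overlaps(hit.region, g.region)]
        if tags:
            flagged.setdefault(hit.reaction, []).extend(tags)
    return flagged


# Hypothetical example data: one functional gene and one pseudogene on contig "c1".
genes = [
    Gene("G1", Region("c1", 100, 500), False),
    Gene("G2", Region("c1", 600, 900), True),
]
hits = [
    Hit("rxnA", Region("c1", 120, 480)),   # overlaps the functional gene only
    Hit("rxnB", Region("c1", 850, 1000)),  # overlaps the pseudogene
    Hit("rxnC", Region("c2", 600, 900)),   # same coordinates, different contig
]
flagged = reactions_on_pseudogenes(hits, genes)
print(flagged)
```

Any overlap, even of a single position, counts as a match here; a stricter criterion (for example a minimum overlap fraction) would reduce the number of flagged reactions.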
There is a lot of variation: some species have no matches with pseudogene regions, while others have hundreds of reactions associated with these regions.
It seems logical to find them, as pseudogene regions still contain sequences similar to those of functional genes, which can match when tblasting them. In my previous team, we encountered a similar issue when developing the AuCoMe method.
Do you think it would be possible to identify and label these reactions as associated with pseudogenes? Or at least to emit a warning when a genome sequence file is used as input?
Thoughts on pseudogenes and metabolism
For me, this raises the question of whether or not to take these regions into account. Yes, they have been identified (often automatically) as pseudogene regions, but these predictions should be taken with caution, especially for two reasons:
- the notion of pseudo-pseudogenes, where predicted pseudogenes show an activity. For example: (1) regulation activity (Pink et al. 2011), (2) translation activity (Prieto-Godino et al. 2016) or (3) protein expression (Feng et al. 2022).
- the notion of protogenes, where genes can arise from genomic sequence and could be misinterpreted as pseudogenes (Carvunis et al. 2012).
So I think it would be interesting to label these reactions, as they can indicate (1) a lost (or inactive) function, (2) a modified function that can still be performed or (3) a potential future active function. But maybe they should not be present in the model that will be used to make predictions (such as with Flux Balance Analysis), given the uncertainty about them?
Best regards,
Arnaud Belcour.