Find homologs in human and mouse training data #32

Open
1 of 5 tasks
teresa-m opened this issue Oct 27, 2021 · 2 comments

teresa-m commented Oct 27, 2021

Idea: find a few examples of homologous RRIs in the human and mouse training data. A mouse homolog x can then be tested to see whether it is correctly predicted by the 'human model', and vice versa.

TODO:

  • Isolate the interaction partners within the positive training instances
  • Convert IDs
  • Search for homologs: http://www.informatics.jax.org/homology.shtml
  • Compare the lists of the two organisms to find which homologs are present in both
  • Check the RRI and the hybrid/interacting sequences
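
The comparison step could be sketched roughly as follows. This is a minimal sketch, assuming the homology table has already been reduced to a dict mapping mouse gene IDs to human gene IDs and that each positive instance is a pair of gene IDs. The function name `conserved_pairs`, the input layout and the toy IDs in the usage note are hypothetical and not taken from the training data.

```python
def conserved_pairs(human_rris, mouse_rris, mouse_to_human):
    """Return mouse RRIs whose homologous pair also occurs as a human RRI.

    human_rris / mouse_rris: iterables of (gene_a, gene_b) tuples.
    mouse_to_human: dict mouse gene ID -> human homolog gene ID
                    (assumed one-to-one for this sketch).
    """
    # Interactions are treated as unordered, so each pair is a frozenset.
    human_set = {frozenset(pair) for pair in human_rris}
    conserved = []
    for a, b in mouse_rris:
        # Skip pairs where either partner has no known homolog.
        if a not in mouse_to_human or b not in mouse_to_human:
            continue
        mapped = frozenset((mouse_to_human[a], mouse_to_human[b]))
        if mapped in human_set:
            # Keep the mouse pair together with its sorted human counterpart.
            conserved.append(((a, b), tuple(sorted(mapped))))
    return conserved
```

With toy input such as `conserved_pairs([("hA", "hB")], [("mB", "mA")], {"mA": "hA", "mB": "hB"})`, the mouse pair is reported together with its human counterpart. The hybrid/interacting sequences of the reported pairs would still need to be checked by hand.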

pavanvidem commented Oct 27, 2021

First, it is better to find some evolutionarily conserved interactions from the literature. For example, the U1 snRNA and MALAT1 lncRNA interaction is conserved between human and mouse (see the original PARIS paper). A handful of such interactions is enough to show that the models are robust enough to detect cross-species conserved interactions.


domonik commented Mar 7, 2022

Find homologs via:
https://rest.ensembl.org/
https://rest.ensembl.org/documentation/info/homology_ensemblgene

This is only possible for genes, not for transcripts, so the most likely transcript has to be selected, e.g. by pairwise alignment.
All transcripts of a gene can be extracted via the API:
https://rest.ensembl.org/documentation/info/overlap_id
as in the example provided there, with the flag
feature=transcript
The transcript sequence can then be extracted:
https://rest.ensembl.org/documentation/info/sequence_id
Make sure to get the spliced version via:
/sequence/id?type=cdna
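
The transcript steps above could look roughly like this. It is a sketch under assumptions, not a tested pipeline: the helper names are made up, the overlap records are assumed to carry `id` and `Parent` keys, and `difflib.SequenceMatcher` is only a stand-in for a real pairwise alignment score.

```python
import json
from difflib import SequenceMatcher
from urllib.request import Request, urlopen

SERVER = "https://rest.ensembl.org"


def overlap_url(gene_id):
    # All transcript features overlapping the gene (feature=transcript).
    return f"{SERVER}/overlap/id/{gene_id}?feature=transcript"


def cdna_url(transcript_id):
    # type=cdna requests the spliced transcript sequence.
    return f"{SERVER}/sequence/id/{transcript_id}?type=cdna"


def fetch_json(url):
    # Network call; not needed for the offline helpers below.
    request = Request(url, headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


def transcripts_of(gene_id, overlap_records):
    # Assumption: each record has "id" and "Parent" keys. Overlapping
    # transcripts that belong to other genes are dropped.
    return [r["id"] for r in overlap_records if r.get("Parent") == gene_id]


def most_similar(query_seq, candidates):
    # candidates: dict transcript ID -> cDNA sequence.
    # SequenceMatcher.ratio() stands in for a pairwise alignment score.
    return max(
        candidates,
        key=lambda tid: SequenceMatcher(None, query_seq, candidates[tid]).ratio(),
    )
```

In use, `fetch_json(overlap_url(gene_id))` would feed `transcripts_of`, each transcript's cDNA would be fetched via `cdna_url`, and `most_similar` would pick the transcript closest to the sequence from the training data. For long sequences a proper aligner is likely the better choice.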

Disadvantage:
This uses a lot of API calls and might take some time.
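
One way to reduce the number of calls might be to batch the sequence requests. The sketch below assumes that the sequence endpoint accepts POST requests with a JSON body of the form `{"ids": [...]}` and that about 50 IDs per request are allowed. Both points are assumptions that should be checked against the endpoint documentation, and the function names are hypothetical.

```python
import json


def batches(ids, size=50):
    # size=50 is an assumed per-request limit, not a verified one.
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def post_payloads(transcript_ids, size=50):
    # One JSON body per POST request, each holding up to `size` IDs.
    return [json.dumps({"ids": chunk}) for chunk in batches(transcript_ids, size)]
```

For 120 transcript IDs this gives three request bodies instead of 120 single GET calls.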
