bionty Ontology .from_values behaves strangely #2255
To reproduce:
!lamin init --storage run-tests --schema bionty,wetlab
import lamindb as ln
import bionty as bt
mouse = bt.Organism.from_source(name="mouse").save()
bt.Gene(ensembl_gene_id="ENSMUSG00000035310", organism=mouse).save()
bt.Gene.from_source(ensembl_gene_id="ENSG00000139618").save()
# should return 1 val
bt.Gene.from_values(["ENSMUSG00000035310"], field=bt.Gene.ensembl_gene_id, organism=mouse)
# RecordList([Gene(uid='6KCoBil6asUg', ensembl_gene_id='ENSMUSG00000035310', created_by_id=1, organism_id=1, created_at=2024-12-19 10:34:19 UTC)])
# should also return 1 val
bt.Gene.from_values(["ENSG00000139618"], field=bt.Gene.ensembl_gene_id, organism=mouse)
# RecordList([Gene(uid='1DQZiYQ1wP6x', symbol='BRCA2', ensembl_gene_id='ENSG00000139618', ncbi_gene_ids='675', biotype='protein_coding', synonyms='FANCD|FANCD1|FAD|BRCC2|FACD|XRCC11|FAD1', description='BRCA2 DNA repair associated ', created_by_id=1, source_id=11, organism_id=2, created_at=2024-12-19 10:30:49 UTC)])
# but this should now return 2 val
bt.Gene.from_values(["ENSMUSG00000035310", "ENSG00000139618"], field=bt.Gene.ensembl_gene_id, organism=mouse)
# RecordList([Gene(uid='6KCoBil6asUg', ensembl_gene_id='ENSMUSG00000035310', created_by_id=1, organism_id=1, created_at=2024-12-19 10:34:19 UTC)])
This example itself is wrong: ENSMUSG00000035310 is a mouse gene, but ENSG00000139618 is a human gene. So bt.Gene.from_values(["ENSMUSG00000035310", "ENSG00000139618"], field=bt.Gene.ensembl_gene_id, organism=mouse) returns only the single mouse gene because you specified the organism. There's no bug.
This is a bug. Previously, the source was determined by the source of the existing records. 'lactocyte' is a term in a newer version of the ontology that is set as the default but is not linked to the existing records. I fixed it here: #2310
"This example itself is wrong, ENSMUSG00000035310 is a mouse gene, but ENSG00000139618 is a human gene. So bt.Gene.from_values(["ENSMUSG00000035310", "ENSG00000139618"], field=bt.Gene.ensembl_gene_id, organism=mouse) will only return a single mouse gene because you specified the organism. There's no bug."
Is there a way that Lukas could have been made aware of this through the API? I fear that's difficult, but many users might not readily spot such differences by eye.
There's a specific parameter
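One way to make the organism mismatch discussed above visible before querying is to inspect the Ensembl ID prefixes: ENSG marks human genes and ENSMUSG marks mouse genes. The following is only a minimal sketch in plain Python. The helper names (organism_of, ids_not_matching) and the two-entry prefix table are hypothetical and not part of bionty or lamindb, and IDs with a version suffix such as ".1" are not handled.

```python
# Hypothetical pre-check, not part of bionty/lamindb.
# Assumption: only these two Ensembl gene-ID prefixes are mapped.
PREFIX_TO_ORGANISM = {"ENSMUSG": "mouse", "ENSG": "human"}


def organism_of(gene_id):
    """Return the organism implied by the ID prefix, or 'unknown'."""
    # Strip the trailing digits to isolate the alphabetic prefix.
    prefix = gene_id.rstrip("0123456789")
    return PREFIX_TO_ORGANISM.get(prefix, "unknown")


def ids_not_matching(ensembl_ids, organism):
    """Return the IDs whose prefix implies a different organism."""
    return [g for g in ensembl_ids if organism_of(g) != organism]


ids = ["ENSMUSG00000035310", "ENSG00000139618"]
mismatched = ids_not_matching(ids, "mouse")
if mismatched:
    print(f"These IDs do not look like mouse genes: {mismatched}")
```

Run on the IDs from the report, this flags ENSG00000139618 as the value that a mouse-restricted query would not be expected to match.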
Also occurred here: add_new_from_var_index leads to unique key value error because it attempts to save values with no source again #2293

This causes problems for curation.

Example during census curation in laminlabs/cellxgene:
results in
So this returns the 2 cell types that are in the instance ("adipocyte", "subcutaneous adipocyte"). However, the curator code clearly assumes that .from_values returns both existing and public records.
When there are no existing records, only public ones, .from_values returns them:
results in
These are from public.
So I believe we have a problem: .from_values returns public records only when the provided list contains only public records. If they are mixed with existing records, then only the existing ones are returned, and thus curators can't add public records.
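The difference between the behavior the curator code assumes and the behavior reported above can be illustrated with a toy model. This is a sketch in plain Python, not lamindb's implementation: the function names, the sets standing in for the instance and the public ontology, and the use of "lactocyte" as a public-only value are all assumptions made for illustration.

```python
# Toy model of the two lookup behaviors; NOT lamindb's implementation.
# Assumption: 'existing' stands in for records saved in the instance,
# 'public' for terms available from the public ontology.
existing = {"adipocyte", "subcutaneous adipocyte"}
public = existing | {"lactocyte"}


def lookup_expected(values, existing, public):
    """Behavior the curator code assumes: existing plus public records."""
    from_instance = [v for v in values if v in existing]
    from_public = [v for v in values if v not in existing and v in public]
    return from_instance, from_public


def lookup_reported(values, existing, public):
    """Behavior described in the report: public records are returned
    only when none of the values match an existing record."""
    from_instance = [v for v in values if v in existing]
    if from_instance:
        return from_instance, []
    return [], [v for v in values if v in public]


mixed = ["adipocyte", "lactocyte"]
print(lookup_expected(mixed, existing, public))
print(lookup_reported(mixed, existing, public))
```

With the mixed input, the expected variant returns the public-only term alongside the existing one, while the reported variant drops it, which is why a curator relying on the result cannot add the public record.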