-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Add Curator for tiledbsoma stores #2228
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2228 +/- ##
==========================================
+ Coverage 92.71% 92.90% +0.18%
==========================================
Files 55 55
Lines 6947 7159 +212
==========================================
+ Hits 6441 6651 +210
- Misses 506 508 +2 ☔ View full report in Codecov by Sentry. |
74f6188
to
fcef038
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good from my end. The bigger questions are for @sunnyosun and @Zethson about conventions for the user-facing API design and code organizations.
This is another thread.
@sunnyosun should also review this PR though.
lamindb/_curate.py
Outdated
in `.standardize` or `.add_new_from`, see the output of `.var_index`. | ||
categoricals: A dictionary mapping ``.obs`` columns to a registry field. | ||
obs_columns: The registry field for mapping the ``.obs`` columns. | ||
using_key: A reference LaminDB instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's please not expose using_key
for any new Curator class. It should disappear from all.
(Unless you tell me we absolutely need this @Zethson @sunnyosun, I continue to think that this is a horribly complicated design choice and a very bad name for an argument. I'll make a new GitHub issue on this topic or try to find where we previously discussed this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, let's hide it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's useful to extend the CxG curator. By default, it uses the laminlabs/cellxgene
instance but when extending it (e.g. with perturbations) you want to now not curate against laminlabs/cellxgene
but your own instance.
I've also never liked the name but one of you wanted it renamed from using
to using_key
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the description of the argument, put it to the bottom.
This is impressive, Sergei! All my comments are minor. Last request is to populate the PR description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I recognized quite many parts of the code and we can look out for ways to consolidate them in the future I hope.
lamindb/_curate.py
Outdated
in `.standardize` or `.add_new_from`, see the output of `.var_index`. | ||
categoricals: A dictionary mapping ``.obs`` columns to a registry field. | ||
obs_columns: The registry field for mapping the ``.obs`` columns. | ||
using_key: A reference LaminDB instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, let's hide it.
lamindb/_curate.py
Outdated
obs_columns: The registry field for mapping the names of the `.obs` columns. | ||
organism: The organism name. | ||
sources: A dictionary mapping `.obs` columns to Source records. | ||
exclude: A dictionary mapping column names to values to exclude. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exclude: A dictionary mapping column names to values to exclude. | |
exclude: A dictionary mapping column names to values to exclude from validation. | |
When specific :class:~bionty.Source instances are pinned and may lack default values (e.g., "unknown" or "na"), using the exclude parameter ensures they are not validated. |
@Koncopd . Written on Github, so please check syntax + formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
lamindb/_curate.py
Outdated
|
||
Examples: | ||
>>> import bionty as bt | ||
>>> curate = ln.Curator.from_tiledbsoma( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> curate = ln.Curator.from_tiledbsoma( | |
>>> curator = ln.Curator.from_tiledbsoma( |
Thought we're using that everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
This PR adds
SOMACurator
class to curatetiledbsoma
stores.