✨ Add Curator for tiledbsoma stores #2228

Merged
merged 31 commits into from
Dec 3, 2024

Conversation

Member

@Koncopd Koncopd commented Nov 28, 2024

This PR adds a SOMACurator class for curating tiledbsoma stores.

import bionty as bt
import lamindb as ln

# Create a curator for an existing tiledbsoma store.
curator = ln.Curator.from_tiledbsoma(
    "./curate.tiledbsoma",
    {"RNA": ("var_id", bt.Gene.symbol)},
    categoricals={"cell_type": bt.CellType.name},
    organism="human",
)
curator.validate()
curator.standardize("all")
curator.add_new_from("all")
curator.save_artifact()
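
To illustrate the validate, standardize, add-new sequence that the snippet above calls, here is a toy sketch in plain Python. It does not use lamindb or tiledbsoma: `ToyCurator`, its registry set, and its synonym map are hypothetical stand-ins chosen for illustration, and the sketch makes no claim about how SOMACurator works internally.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCurator:
    """Hypothetical stand-in for a curator; not lamindb's SOMACurator."""

    values: list[str]    # labels found in one column
    registry: set[str]   # names that are already registered
    synonyms: dict[str, str] = field(default_factory=dict)  # synonym -> name

    def non_validated(self) -> list[str]:
        # Labels unknown to the registry, deduplicated in first-seen order.
        return [v for v in dict.fromkeys(self.values) if v not in self.registry]

    def validate(self) -> bool:
        # True only if every label is known to the registry.
        return not self.non_validated()

    def standardize(self) -> None:
        # Replace known synonyms with their registered name.
        self.values = [self.synonyms.get(v, v) for v in self.values]

    def add_new(self) -> None:
        # Register whatever is still unknown after standardization.
        self.registry.update(self.non_validated())


curator = ToyCurator(
    values=["T cell", "B-cell", "my new type"],
    registry={"T cell", "B cell"},
    synonyms={"B-cell": "B cell"},
)
first_check = curator.validate()     # two labels are unknown at this point
curator.standardize()                # "B-cell" is mapped to "B cell"
remaining = curator.non_validated()  # only the genuinely new label is left
curator.add_new()
final_check = curator.validate()
```

The order matters in this toy version: standardizing first means that only labels without a known synonym are added as new entries.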

@Koncopd Koncopd marked this pull request as draft November 28, 2024 17:31
codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 99.05660% with 2 lines in your changes missing coverage. Please review.

Project coverage is 92.90%. Comparing base (4c27162) to head (c6472e9).
Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lamindb/_curate.py | 99.05% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2228      +/-   ##
==========================================
+ Coverage   92.71%   92.90%   +0.18%     
==========================================
  Files          55       55              
  Lines        6947     7159     +212     
==========================================
+ Hits         6441     6651     +210     
- Misses        506      508       +2     


@Koncopd Koncopd force-pushed the curate_soma branch 3 times, most recently from 74f6188 to fcef038 Compare December 2, 2024 10:01
@Koncopd Koncopd marked this pull request as ready for review December 2, 2024 17:30
Member

@falexwolf falexwolf left a comment

This is good from my end. The bigger questions are for @sunnyosun and @Zethson about conventions for the user-facing API design and code organization.

This is another thread.

@sunnyosun should also review this PR though.

in `.standardize` or `.add_new_from`, see the output of `.var_index`.
categoricals: A dictionary mapping ``.obs`` columns to a registry field.
obs_columns: The registry field for mapping the ``.obs`` columns.
using_key: A reference LaminDB instance.
Member

let's please not expose using_key for any new Curator class. It should disappear from all.

(Unless you tell me we absolutely need this @Zethson @sunnyosun, I continue to think that this is a horribly complicated design choice and a very bad name for an argument. I'll make a new GitHub issue on this topic or try to find where we previously discussed this).

Member

Agreed, let's hide it.

Member

@Zethson Zethson Dec 2, 2024

I think it's useful for extending the CxG curator. By default, it uses the laminlabs/cellxgene instance, but when extending it (e.g. with perturbations) you want to curate against your own instance rather than laminlabs/cellxgene.

I've also never liked the name but one of you wanted it renamed from using to using_key.

Member Author

@Koncopd Koncopd Dec 3, 2024

Removed the description of the argument and moved the argument to the bottom.

@falexwolf
Member

This is impressive, Sergei!

All my comments are minor. Last request is to populate the PR description.

Member

@Zethson Zethson left a comment

Awesome! I recognized quite a few parts of the code, and I hope we can look for ways to consolidate them in the future.


obs_columns: The registry field for mapping the names of the `.obs` columns.
organism: The organism name.
sources: A dictionary mapping `.obs` columns to Source records.
exclude: A dictionary mapping column names to values to exclude.
Member

@Zethson Zethson Dec 3, 2024

Suggested change
- exclude: A dictionary mapping column names to values to exclude.
+ exclude: A dictionary mapping column names to values to exclude from validation.
+ When specific :class:`~bionty.Source` instances are pinned and may lack default values (e.g., "unknown" or "na"), using the exclude parameter ensures they are not validated.

@Koncopd, I wrote this on GitHub, so please check syntax and formatting.

Member Author

added
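
The intent of `exclude` described in this thread, skipping placeholder values such as "unknown" or "na" during validation, can be sketched in plain Python. The function `non_validated` and its arguments are hypothetical names used only for illustration; the sketch does not reflect how lamindb implements the parameter.

```python
def non_validated(column_values, known_names, exclude=()):
    """Return the values missing from known_names, ignoring excluded ones.

    Hypothetical helper for illustration; not part of lamindb.
    """
    excluded = set(exclude)
    return sorted(
        {v for v in column_values if v not in excluded and v not in known_names}
    )


cell_types = ["T cell", "unknown", "na", "mystery cell"]
known = {"T cell", "B cell"}

# Without exclusions, the placeholder values are reported as not validated.
without_exclude = non_validated(cell_types, known)

# With exclusions, only the genuinely unrecognized label is reported.
with_exclude = non_validated(cell_types, known, exclude=["unknown", "na"])
```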


Examples:
>>> import bionty as bt
>>> curate = ln.Curator.from_tiledbsoma(
Member

Suggested change
- >>> curate = ln.Curator.from_tiledbsoma(
+ >>> curator = ln.Curator.from_tiledbsoma(

Thought we're using that everywhere.

Member Author

fixed

@Koncopd Koncopd merged commit 3487e24 into main Dec 3, 2024
16 checks passed
@Koncopd Koncopd deleted the curate_soma branch December 3, 2024 10:38
4 participants