🎨 Centralize a small test dataset for curation & queries #2234

falexwolf · 2024-11-29T11:05:02Z

This adds `ln.core.datasets.small_dataset1` and uses it for additional tests of how `FeatureManager` handles pandas and numpy types.

```python
from __future__ import annotations

from typing import Any, Literal

import anndata as ad
import pandas as pd


def small_dataset1(
    format: Literal["df", "anndata"],
) -> tuple[pd.DataFrame, dict[str, Any]] | ad.AnnData:
    # define the data in the dataset
    # it's a mix of numerical measurements and observation-level metadata
    dataset_dict = {
        "CD8A": [1, 2, 3],
        "CD4": [3, 4, 5],
        "CD14": [5, 6, 7],
        "cell_medium": ["DMSO", "IFNG", "DMSO"],
        "sample_note": ["was ok", "looks naah", "pretty! 🤩"],
        "cell_type_by_expert": ["B cell", "T cell", "T cell"],
        "cell_type_by_model": ["B cell", "T cell", "T cell"],
    }
    # define the dataset-level metadata
    metadata = {
        "temperature": 21.6,
        "study": "Candidate marker study 1",
        "date_of_study": "2024-12-01",
        "study_note": "We had a great time performing this study and the results look compelling.",
    }
    # the dataset as DataFrame
    dataset_df = pd.DataFrame(dataset_dict, index=["sample1", "sample2", "sample3"])
    # the dataset as AnnData: the first 3 columns (marker measurements) form X,
    # the remaining columns become obs, and the dataset-level metadata goes to uns
    dataset_ad = ad.AnnData(
        dataset_df.iloc[:, :3], obs=dataset_df.iloc[:, 3:], uns=metadata
    )
    if format == "df":
        return dataset_df, metadata
    else:
        return dataset_ad
```
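The dataset deliberately mixes integer measurements, categorical-looking strings, free-text notes, a float, and a date-like string. This means a feature manager has to tell several value types apart. The sketch below uses plain Python to show which coarse labels those values fall into. The helpers `infer_value_dtype` and `infer_column_dtype` and their labels (`"int"`, `"float"`, `"date"`, …) are hypothetical illustrations. They are not lamindb's `FeatureManager` API or its dtype vocabulary.

```python
from datetime import date
from typing import Any


def infer_value_dtype(value: Any) -> str:
    # Hypothetical helper (not lamindb): map one Python value to a coarse label.
    # bool is a subclass of int, so it must be checked first.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        # Treat ISO-formatted strings such as "2024-12-01" as dates
        try:
            date.fromisoformat(value)
            return "date"
        except ValueError:
            return "str"
    return "object"


def infer_column_dtype(values: list[Any]) -> str:
    # Hypothetical helper: a column gets one label only if all values agree
    labels = {infer_value_dtype(v) for v in values}
    return labels.pop() if len(labels) == 1 else "object"


# Dataset-level metadata copied from small_dataset1
metadata = {
    "temperature": 21.6,
    "study": "Candidate marker study 1",
    "date_of_study": "2024-12-01",
    "study_note": "We had a great time performing this study and the results look compelling.",
}
metadata_dtypes = {key: infer_value_dtype(val) for key, val in metadata.items()}
print(metadata_dtypes)
```

Applied to the columns of `dataset_dict`, `infer_column_dtype([1, 2, 3])` yields `"int"` and `infer_column_dtype(["DMSO", "IFNG", "DMSO"])` yields `"str"`. Deciding whether a string column such as `cell_medium` is categorical needs more than a per-value check, which is part of what the tests here exercise.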

@sunnyosun, @Zethson, @Koncopd -- my hope is that we can keep iterating on this dataset and re-use it across test scenarios relating to curation.

It's in fact pretty hard to come up with good test datasets that cover different cases comprehensively.

codecov · 2024-11-29T13:21:47Z

Codecov Report

Attention: Patch coverage is 76.31579% with 9 lines in your changes missing coverage. Please review.

Project coverage is 92.88%. Comparing base (c54f99f) to head (6daad43).
Report is 15 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lamindb/core/_feature_manager.py | 69.23% | 8 Missing ⚠️ |
| lamindb/_feature.py | 91.66% | 1 Missing ⚠️ |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #2234      +/-   ##
==========================================
+ Coverage   92.36%   92.88%   +0.51%     
==========================================
  Files          54       54              
  Lines        6566     6687     +121     
==========================================
+ Hits         6065     6211     +146     
+ Misses        501      476      -25
```
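The percentages in the report follow from the hit and line counts. Below is a quick arithmetic check. One point is an inference from the numbers, not documented Codecov behavior: the report appears to truncate rather than round to two decimals. For example, 6065/6566 is 92.3698…%, shown as 92.36%, and 11/12 is 91.666…%, shown as 91.66%. The number of changed lines in the patch is also not stated in the report. It is back-solved from the 9 missing lines and the 76.31579% patch coverage.

```python
import math
from fractions import Fraction

# Line counts copied from the Codecov diff (base c54f99f vs. head 6daad43)
BASE_HITS, BASE_LINES = 6065, 6566
HEAD_HITS, HEAD_LINES = 6211, 6687


def pct_truncated(ratio: Fraction) -> float:
    # Convert an exact ratio to percent, truncated (not rounded) to two decimals
    return math.floor(ratio * 10000) / 100


head = Fraction(HEAD_HITS, HEAD_LINES)
base = Fraction(BASE_HITS, BASE_LINES)
print(pct_truncated(head), pct_truncated(base), pct_truncated(head - base))
# → 92.88 92.36 0.51

# Patch coverage: 8 + 1 missing lines at 76.31579% implies the number of
# changed lines Codecov tracked (inferred, not stated in the report)
missing = 8 + 1
patch_lines = round(missing / (1 - 0.7631579))
patch_pct = round(100 * (patch_lines - missing) / patch_lines, 5)
print(patch_lines, patch_pct)
# → 38 76.31579
```

The 38 changed lines are consistent with the per-file table: 69.23% of 26 lines leaves 8 missing, and 91.66% of 12 lines leaves 1 missing.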


github-actions · 2024-11-29T13:28:38Z

🚀 Deployed on https://6749c16be9ef8a88390602a0--lamindb-qnwk.netlify.app

falexwolf added 3 commits November 29, 2024 12:04

🎨 Refactor test datasets · 01b73d8

♻️ Better pandas type treatment · e875f9b

💚 Fix · 6daad43

falexwolf changed the title ~~🎨 Refactor test datasets~~ 🎨 Centralize a small test dataset for curation Nov 29, 2024

github-actions bot temporarily deployed to pull request November 29, 2024 13:28

falexwolf merged commit 7019150 into main Nov 29, 2024
15 of 16 checks passed

falexwolf deleted the pandastypes branch November 29, 2024 13:45

falexwolf changed the title ~~🎨 Centralize a small test dataset for curation~~ 🎨 Centralize a small test dataset for curation & queries Nov 30, 2024
