Merge branch 'master' into benchmark_support

scverse · Feb 22, 2024 · f8368c6
2 parents d69a90c + 48b495d
Showing 11 changed files with 60 additions and 187 deletions.
diff --git a/.azure-pipelines.yml b/.azure-pipelines.yml
@@ -52,21 +52,21 @@ jobs:
 
   - script: |
       python -m pip install --upgrade pip
-      pip install wheel coverage
+      pip install wheel
       pip install .[dev,$(TEST_EXTRA)]
     displayName: 'Install dependencies'
     condition: eq(variables['DEPENDENCIES_VERSION'], 'latest')
 
   - script: |
       python -m pip install --pre --upgrade pip
-      pip install --pre wheel coverage
+      pip install --pre wheel
       pip install --pre .[dev,$(TEST_EXTRA)]
       pip install -v "anndata[dev,test] @ git+https://github.com/scverse/anndata"
     displayName: 'Install dependencies release candidates'
     condition: eq(variables['DEPENDENCIES_VERSION'], 'pre-release')
 
   - script: |
-      python -m pip install pip wheel tomli packaging pytest-cov
+      python -m pip install pip wheel tomli packaging
       pip install `python3 ci/scripts/min-deps.py pyproject.toml --extra dev test`
       pip install --no-deps .
     displayName: 'Install dependencies minimum version'
@@ -81,8 +81,7 @@ jobs:
     condition: eq(variables['TEST_TYPE'], 'standard')
 
   - script: |
-      coverage run -m pytest
-      coverage xml
+      pytest --cov --cov-report=xml --cov-context=test
     displayName: 'PyTest (coverage)'
     condition: eq(variables['TEST_TYPE'], 'coverage')
 

diff --git a/docs/release-notes/1.10.0.md b/docs/release-notes/1.10.0.md
@@ -1,40 +1,37 @@
-### 1.10.0 {small}`the future`
+### 1.10.0rc1 {small}`2024-02-22`
 
 ```{rubric} Features
 ```
 
+* {func}`~scanpy.pp.scrublet` and {func}`~scanpy.pp.scrublet_simulate_doublets` were moved from {mod}`scanpy.external.pp` to {mod}`scanpy.pp`. The `scrublet` implementation is now maintained as part of scanpy {pr}`2703` {smaller}`P Angerer`
+* {func}`scanpy.pp.pca`, {func}`scanpy.pp.scale`, {func}`scanpy.pl.embedding`, and {func}`scanpy.experimental.pp.normalize_pearson_residuals_pca` now support a `mask` parameter {pr}`2272` {smaller}`C Bright, T Marcella, & P Angerer`
+* Enhanced dask support for some internal utilities, paving the way for more extensive dask support {pr}`2696` {smaller}`P Angerer`
+* {func}`scanpy.pp.highly_variable_genes` supports dask for the default `seurat` and `cell_ranger` flavors {pr}`2809` {smaller}`P Angerer`
+* New function {func}`scanpy.get.aggregate` which allows grouped aggregations over your data. Useful for pseudobulking! {pr}`2590` {smaller}`Isaac Virshup` {smaller}`Ilan Gold` {smaller}`Jon Bloom`
+* {func}`scanpy.pp.neighbors` now has a `transformer` argument allowing the use of different ANN/ KNN libraries {pr}`2536` {smaller}`P Angerer`
+* {func}`scanpy.experimental.pp.highly_variable_genes` using `flavor='pearson_residuals'` now uses numba for variance computation and is faster {pr}`2612` {smaller}`S Dicks & P Angerer`
+* {func}`scanpy.tl.leiden` now offers `igraph`'s implementation of the leiden algorithm via  via `flavor` when set to `igraph`. `leidenalg`'s implementation is still default, but discouraged.  {pr}`2815` {smaller}`I Gold`
+* {func}`scanpy.pp.highly_variable_genes` has new flavor `seurat_v3_paper` that is in its implementation consistent with the paper description in Stuart et al 2018. {pr}`2792` {smaller}`E Roellin`
 * {func}`scanpy.datasets.blobs` now accepts a `random_state` argument {pr}`2683` {smaller}`E Roellin`
 * {func}`scanpy.pp.pca` and {func}`scanpy.pp.regress_out` now accept a layer argument {pr}`2588` {smaller}`S Dicks`
 * {func}`scanpy.pp.subsample` with `copy=True` can now be called in backed mode {pr}`2624` {smaller}`E Roellin`
-* {func}`scanpy.pp.neighbors` now has a `transformer` argument allowing for more flexibility {pr}`2536` {smaller}`P Angerer`
-* {func}`scanpy.experimental.pp.highly_variable_genes` using `flavor='pearson_residuals'`
-  now uses numba for variance computation {pr}`2612` {smaller}`S Dicks & P Angerer`
 * {func}`scanpy.external.pp.harmony_integrate` now runs with 64 bit floats improving reproducibility {pr}`2655` {smaller}`S Dicks`
-* {func}`~scanpy.pp.scrublet` and {func}`~scanpy.pp.scrublet_simulate_doublets` were moved from {mod}`scanpy.external.pp` to {mod}`scanpy.pp`.
-  The `scrublet` implementation is now maintained as part of scanpy {pr}`2703` {smaller}`P Angerer`
-* Enhanced dask support for some internal utilities, paving the way for more extensive dask support {pr}`2696` {smaller}`P Angerer`
-* {func}`scanpy.pp.pca`, {func}`scanpy.pp.scale`, {func}`scanpy.pl.embedding`, and {func}`scanpy.experimental.pp.normalize_pearson_residuals_pca`
-  now support a `mask` parameter {pr}`2272` {smaller}`C Bright, T Marcella, & P Angerer`
-* New function {func}`scanpy.get.aggregate` which allows grouped aggregations over your data. Useful for pseudobulking! {pr}`2590` {smaller}`Isaac Virshup` {smaller}`Ilan Gold` {smaller}`Jon Bloom`
 * {func}`scanpy.tl.rank_genes_groups` no longer warns that it's default was changed from t-test_overestim_var to t-test {pr}`2798` {smaller}`L Heumos`
-* {func}`scanpy.tl.leiden` now offers `igraph`'s implementation of the leiden algorithm via  via `flavor` when set to `igraph`.  `leidenalg`'s implementation is still default, but discouraged.  {pr}`2815` {smaller}`I Gold`
-* {func}`scanpy.pp.highly_variable_genes` has new flavor `seurat_v3_paper` that is in its implementation consistent with the paper description in Stuart et al 2018. {pr}`2792` {smaller}`E Roellin`
-* {func}`scanpy.pp.highly_variable_genes` supports dask for the default `seurat` and `cell_ranger` flavors {pr}`2809` {smaller}`P Angerer`
-* Auto conversion of strings to collections in `scanpy.pp.calculate_qc_metrics` {pr}`2859` {smaller}`N Teyssier`
+* `scanpy.pp.calculate_qc_metrics` now allows `qc_vars` to be passed as a string {pr}`2859` {smaller}`N Teyssier`
 
 ```{rubric} Docs
 ```
+
+* Re-add search-as-you-type, this time via `readthedocs-sphinx-search` {pr}`2805` {smaller}`P Angerer`
 * Fixed a lot of broken usage examples {pr}`2605` {smaller}`P Angerer`
 * Improved harmonization of return field of `sc.pp` and `sc.tl` functions {pr}`2742` {smaller}`E Roellin`
-* Re-add search-as-you-type, this time via `readthedocs-sphinx-search` {pr}`2805` {smaller}`P Angerer`
 * Improved docs for `percent_top` argument of {func}`~scanpy.pp.calculate_qc_metrics` {pr}`2849` {smaller}`I Virshup`
 
 ```{rubric} Bug fixes
 ```
 
 * Updated {func}`~scanpy.read_visium` such that it can read spaceranger 2.0 files {smaller}`L Lehner`
-* Fix {func}`~scanpy.pp.normalize_total` {pr}`2466` {smaller}`P Angerer`
-* Fix testing package build {pr}`2468` {smaller}`P Angerer`
+* Fix {func}`~scanpy.pp.normalize_total` for dask {pr}`2466` {smaller}`P Angerer`
 * Fix setting `sc.settings.verbosity` in some cases {pr}`2605` {smaller}`P Angerer`
 * Fix all remaining pandas warnings {pr}`2789` {smaller}`P Angerer`
 * Fix some annoying plotting warnings around violin plots {pr}`2844` {smaller}`P Angerer`
@@ -45,13 +42,12 @@
 ```
 
 * Scanpy is now tested against python 3.12 {pr}`2863` {smaller}`ivirshup`
-
-```{rubric} Ecosystem
-```
+* Fix testing package build {pr}`2468` {smaller}`P Angerer`
 
 ```{rubric} Deprecations
 ```
 
 * Dropped support for Python 3.8. [More details here](https://numpy.org/neps/nep-0029-deprecation_policy.html). {pr}`2695` {smaller}`P Angerer`
 * Deprecated specifying large numbers of function parameters by position as opposed to by name/keyword in all public APIs.
   e.g. prefer `sc.tl.umap(adata, min_dist=0.1, spread=0.8)` over `sc.tl.umap(adata, 0.1, 0.8)` {pr}`2702` {smaller}`P Angerer`
+* Dropped support for `umap<0.5` for performance reasons. {pr}`2870` {smaller}`P Angerer`
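
The positional-parameter deprecation in the release notes above can be illustrated with a small sketch. The decorator `warn_on_positional` and the function `umap_like` are hypothetical stand-ins, not scanpy's implementation, and the choice of `FutureWarning` is an assumption. The sketch only shows how a call with extra positional arguments can keep working while emitting a warning.

```python
import functools
import warnings


def warn_on_positional(max_positional: int):
    """Hypothetical helper: warn if more than `max_positional` args are positional."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > max_positional:
                # Assumption: FutureWarning is used here purely for illustration.
                warnings.warn(
                    f"{func.__name__}: pass parameters after the first "
                    f"{max_positional} by keyword",
                    FutureWarning,
                    stacklevel=2,
                )
            # The call still goes through unchanged; only a warning is added.
            return func(*args, **kwargs)

        return wrapper

    return decorator


@warn_on_positional(1)
def umap_like(adata, min_dist=0.5, spread=1.0):
    # Stand-in for a public API function; returns its settings for inspection.
    return {"min_dist": min_dist, "spread": spread}
```

With this sketch, `umap_like(adata, min_dist=0.1, spread=0.8)` runs silently, while `umap_like(adata, 0.1, 0.8)` returns the same settings but raises the warning.
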
diff --git a/pyproject.toml b/pyproject.toml
@@ -63,7 +63,7 @@ dependencies = [
     "natsort",
     "joblib",
     "numba>=0.56",
-    "umap-learn>=0.3.10",
+    "umap-learn>=0.5,!=0.5.0",
     "pynndescent>=0.5",
     "packaging>=21.3",
     "session-info",
@@ -87,6 +87,7 @@ test-min = [
     "pytest>=7.4.2",
     "pytest-nunit",
     "pytest-mock",
+    "pytest-cov",
     "profimp",
 ]
 test = [
@@ -159,7 +160,6 @@ addopts = [
     "--import-mode=importlib",
     "--strict-markers",
     "--doctest-modules",
-    "-pscanpy.testing._pytest",
 ]
 testpaths = ["scanpy"]
 norecursedirs = ["scanpy/tests/_images"]
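
The new `umap-learn>=0.5,!=0.5.0` constraint combines a lower bound with an exclusion. A minimal sketch of how such a specifier evaluates, assuming the `packaging` library (already listed in the dependencies above) is installed; the helper name `is_allowed` and the candidate versions are illustrative only:

```python
from packaging.specifiers import SpecifierSet

# The specifier string is copied from the pyproject.toml change above.
spec = SpecifierSet(">=0.5,!=0.5.0")


def is_allowed(version: str) -> bool:
    """Return True if `version` satisfies both clauses of the specifier."""
    return spec.contains(version)


for candidate in ["0.4.6", "0.5.0", "0.5.1"]:
    print(candidate, is_allowed(candidate))
```

Both clauses must hold at once, so a release below 0.5 fails the lower bound and 0.5.0 itself is rejected by the exclusion.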

diff --git a/scanpy/_utils/__init__.py b/scanpy/_utils/__init__.py
@@ -84,10 +84,6 @@ def set_igraph_random_state(random_state: int):
 
 
 def check_versions():
-    from .._compat import pkg_version
-
-    umap_version = pkg_version("umap-learn")
-
     if version.parse(anndata_version) < version.parse("0.6.10"):
         from .. import __version__
 
@@ -96,15 +92,6 @@ def check_versions():
             f"not {anndata_version}.\nRun `pip install anndata -U --no-deps`."
         )
 
-    if umap_version < version.parse("0.3.0"):
-        from . import __version__
-
-        # make this a warning, not an error
-        # it might be useful for people to still be able to run it
-        logg.warning(
-            f"Scanpy {__version__} needs umap " f"version >=0.3.0, not {umap_version}."
-        )
-
 
 def getdoc(c_or_f: Callable | type) -> str | None:
     if getattr(c_or_f, "__doc__", None) is None:

diff --git a/scanpy/neighbors/_connectivity.py b/scanpy/neighbors/_connectivity.py
@@ -123,7 +123,7 @@ def umap(
         from umap.umap_ import fuzzy_simplicial_set
 
     X = coo_matrix(([], ([], [])), shape=(n_obs, 1))
-    connectivities = fuzzy_simplicial_set(
+    connectivities, _sigmas, _rhos = fuzzy_simplicial_set(
         X,
         n_neighbors,
         None,
@@ -134,8 +134,4 @@ def umap(
         local_connectivity=local_connectivity,
     )
 
-    if isinstance(connectivities, tuple):
-        # In umap-learn 0.4, this returns (result, sigmas, rhos)
-        connectivities = connectivities[0]
-
     return connectivities.tocsr()
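
The change above replaces a runtime `isinstance` check with direct tuple unpacking, which is only safe once the minimum supported version always returns a tuple. A minimal sketch of the two patterns, using hypothetical stand-in functions (`old_style_api`, `new_style_api`) rather than the real `fuzzy_simplicial_set`:

```python
def old_style_api(n):
    # Hypothetical stand-in: an older API that returns the result alone.
    return [n]


def new_style_api(n):
    # Hypothetical stand-in: a newer API that returns (result, sigmas, rhos).
    return [n], [0.0], [0.0]


def first_result(returned):
    """Compatibility shim in the style of the removed branch."""
    if isinstance(returned, tuple):
        return returned[0]
    return returned


# The shim accepts both return shapes.
shim_old = first_result(old_style_api(3))
shim_new = first_result(new_style_api(3))

# Once a three-element tuple is guaranteed, plain unpacking is enough.
result, _sigmas, _rhos = new_style_api(3)
```

Direct unpacking also fails loudly if the return shape ever changes, whereas the shim would pass an unexpected value through silently.
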
diff --git a/scanpy/preprocessing/_docs.py b/scanpy/preprocessing/_docs.py
@@ -26,7 +26,7 @@
     By default uses them if they have been determined beforehand.
 
     .. deprecated:: 1.10.0
-       Use `mask` instead
+       Use `mask_var` instead
 """
 
 doc_obs_qc_args = """\

diff --git a/scanpy/tests/conftest.py b/scanpy/tests/conftest.py
@@ -7,6 +7,8 @@
 
 import pytest
 
+pytest_plugins = ["scanpy.testing._pytest"]
+
 # just import for the IMPORTED check
 import scanpy as _sc  # noqa: F401
 

diff --git a/scanpy/tests/test_aggregated.py b/scanpy/tests/test_aggregated.py
@@ -4,6 +4,7 @@
 import numpy as np
 import pandas as pd
 import pytest
+from packaging.version import Version
 from scipy.sparse import csr_matrix
 
 import scanpy as sc
@@ -113,7 +114,6 @@ def test_aggregate_vs_pandas(metric, array_type):
             .groupby(["louvain", "percent_mito_binned"], observed=True)
             .agg(metric)
         )
-    # TODO: figure out the axis names
     expected.index = expected.index.to_frame().apply(
         lambda x: "_".join(map(str, x)), axis=1
     )
@@ -124,6 +124,13 @@ def test_aggregate_vs_pandas(metric, array_type):
     result_df.index.name = None
     result_df.columns.name = None
 
+    if Version(pd.__version__) < Version("2"):
+        # Order of results returned by groupby changed in pandas 2
+        assert expected.shape == result_df.shape
+        assert expected.index.isin(result_df.index).all()
+
+        expected = expected.loc[result_df.index]
+
     pd.testing.assert_frame_equal(result_df, expected, check_dtype=False, atol=1e-5)
 
 

diff --git a/scanpy/tests/test_pca.py b/scanpy/tests/test_pca.py
@@ -382,7 +382,7 @@ def test_mask_order_warning(request):
         UserWarning,
         match="When using a mask parameter with anndata<0.9 on a dense array",
     ):
-        sc.pp.pca(adata, mask=mask)
+        sc.pp.pca(adata, mask_var=mask)
 
 
 def test_mask_defaults(array_type, float_dtype):