updates sparse scale #2942
Conversation
Codecov Report

Additional details and impacted files:

@@            Coverage Diff             @@
##              main    #2942      +/-   ##
==========================================
+ Coverage   75.49%   75.51%   +0.02%
==========================================
  Files         116      117       +1
  Lines       12911    12955      +44
==========================================
+ Hits         9747     9783      +36
- Misses       3164     3172       +8
It seems like I run into segfaults with the CSC implementation. I could not reproduce these on my PC, even after running the tests many times. Since the CSR kernel seems to be working and CSR is anyway better for row-based subsets, I would propose that we transform the CSC matrix to CSR.
Have you looked at the implementation of `inplace_column_scale`? WDYT about just scaling by one when the value is masked out?
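A minimal sketch of that "scale by one" idea, assuming a CSR matrix, a per-gene `std` vector, and a boolean per-cell mask; the variable names are illustrative, not the PR's:

```python
import numpy as np
from scipy import sparse

X = sparse.random(5, 4, density=0.5, format="csr")
std = np.ones(X.shape[1])                      # hypothetical per-gene standard deviation
mask_obs = np.array([True, False, True, True, False])

# Per-nonzero scale factor: 1/std for values in masked-in rows, 1 elsewhere,
# so masked-out values pass through unchanged.
row_of_nnz = np.repeat(np.arange(X.shape[0]), np.diff(X.indptr))
factor = np.where(mask_obs[row_of_nnz], 1.0 / std[X.indices], 1.0)
X.data *= factor
```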
That could also work. But this would require a bit of a rewrite. I think the current solution is simpler and also really fast.
My thinking on this right now is that the performance benefit is quite good, and I also think we could get even faster, plus a bit cleaner, if we instead modified scale array to use something like what I suggest here to accept a mask argument:

```python
from scipy import sparse
import numpy as np
from operator import mul, truediv


def broadcast_csr_by_vec(X, vec, op, axis):
    if axis == 0:
        new_data = op(X.data, np.repeat(vec, np.diff(X.indptr)))
    elif axis == 1:
        new_data = op(X.data, vec.take(X.indices, mode="clip"))
    return X._with_data(new_data)
```

Which I think would be something like:

```python
def broadcast_csr_by_vec(X, vec, op, axis, row_mask: None | np.ndarray):
    if row_mask is not None:
        vec = np.where(row_mask, vec, 1)
    if axis == 0:
        new_data = op(X.data, np.repeat(vec, np.diff(X.indptr)))
    elif axis == 1:
        new_data = op(X.data, vec.take(X.indices, mode="clip"))
    return X._with_data(new_data)
```

Or, since we're doing numba already, we could just write out the operation with a check to see if we're on a masked row (which should be even faster since we're not allocating anything extra). I think either of these solutions would be simpler since we do the masking all in one place and don't have to have a second update step.
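For concreteness, a rough sketch of that "write it out in numba" alternative; the kernel name, signature, and usage below are invented here rather than taken from the PR:

```python
import numba
import numpy as np
from scipy import sparse


@numba.njit
def scale_csr_rows_masked(indptr, indices, data, inv_std, row_mask):
    # inv_std: per-column 1/std; row_mask: per-row bool.
    # Rows outside the mask are skipped, i.e. implicitly scaled by one,
    # and no temporary arrays are allocated.
    for r in range(indptr.shape[0] - 1):
        if not row_mask[r]:
            continue
        for k in range(indptr[r], indptr[r + 1]):
            data[k] *= inv_std[indices[k]]


X = sparse.random(200, 50, density=0.1, format="csr")
scale_csr_rows_masked(
    X.indptr, X.indices, X.data,
    np.ones(X.shape[1]), np.ones(X.shape[0], dtype=bool),
)
```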
I really like your idea. But I feel like this will complicate the clipping function as well, since we would need to subset there as well then.
I think it's not so bad. I think you can use similar logic. The NumPy version is something like:

```python
if axis == 0:
    data_mask = np.repeat(row_mask, np.diff(X.indptr))
elif axis == 1:
    data_mask = row_mask.take(X.indices, mode="clip")
X.data[(X.data > max_value) & data_mask] = max_value
```

right? For numba, I'd just include the clipping in the inner loop so it's still a single pass.
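Extending the earlier sketch, folding the clipping into the same inner loop might look like this; again a hypothetical kernel, not the one merged in the PR:

```python
import numba


@numba.njit
def scale_and_clip_csr_rows(indptr, indices, data, inv_std, row_mask, max_value):
    # Single pass: scale each stored value, then clip it, only for masked-in rows.
    for r in range(indptr.shape[0] - 1):
        if not row_mask[r]:
            continue
        for k in range(indptr[r], indptr[r + 1]):
            v = data[k] * inv_std[indices[k]]
            data[k] = v if v <= max_value else max_value
```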
I would be open to doing this if you move the whole sparse part out of `scale_array` and then keep it in `scale_sparse`.
Absolutely. The sparse definition being in `scale_array` is a big part of what makes the current code a mess.
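The refactor being discussed would roughly amount to the following dispatch; the signatures here are simplified guesses, not scanpy's actual ones:

```python
from scipy import sparse


def scale_array(X, **kwargs):   # placeholder for the dense implementation
    ...


def scale_sparse(X, **kwargs):  # placeholder for the sparse implementation
    ...


def scale(X, *, zero_center=True, max_value=None, mask_obs=None):
    # Route sparse input to its own helper instead of handling it inside scale_array.
    kwargs = dict(zero_center=zero_center, max_value=max_value, mask_obs=mask_obs)
    if sparse.issparse(X):
        return scale_sparse(X, **kwargs)
    return scale_array(X, **kwargs)
```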
@ivirshup I don't know where the crash in the build is coming from; I changed nothing in those parts. However, I rewrote the sparse logic and to me it's now a lot better.
I think it was a bugged
Ok, looks good now.
scanpy/preprocessing/_simple.py (Outdated)

```python
elif isspmatrix_csc(X) and mask_obs is None:
    return scale_array(
        X,
        zero_center=zero_center,
        copy=copy,
        max_value=max_value,
        return_mean_std=return_mean_std,
        mask_obs=mask_obs,
    )
```
I'm a little confused by what this branch ends up doing. Does this not hit the numba kernel?

Btw, I think you can have `if mask is None:` blocks in numba code where the entire block is just excluded from the compiled code, since it's a compile-time constant.
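A small demonstration of that point, assuming the mask is passed as an optional argument: when `mask` is `None`, numba specializes the function for `NoneType`, the `mask is None` check becomes a compile-time constant, and the other branch is pruned from the compiled code. Names here are illustrative only.

```python
import numba
import numpy as np


@numba.njit
def scale_dense(data, factor, mask):
    out = np.empty_like(data)
    for i in range(data.shape[0]):
        if mask is None:
            # Only this branch is compiled when mask is passed as None.
            out[i] = data[i] * factor[i]
        else:
            out[i] = data[i] * factor[i] if mask[i] else data[i]
    return out


x = np.arange(4.0)
f = np.full(4, 2.0)
scale_dense(x, f, None)                                   # unmasked specialization
scale_dense(x, f, np.array([True, False, True, False]))   # masked specialization
```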
No, that doesn't hit the numba kernel. For a CSC matrix it doesn't make sense to hit the numba kernel: if we use a mask_obs, we need to transform into CSR for the mask_obs subset used in the mean/var calculation. I think what you suggest is fair, that we also use the base implementation for non-masked CSR.

> blocks in numba code where the entire block is just excluded from the compiled code since it's a compile time constant.

I'll look into this and adjust the code based on this.
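For context on why the CSR conversion comes up here: row subsetting for the mean/variance computation is cheap on CSR but not on CSC, so a CSC input would be converted first. A hedged illustration, not the PR's code:

```python
import numpy as np
from scipy import sparse

X = sparse.random(1000, 200, density=0.1, format="csc")
mask_obs = np.random.rand(X.shape[0]) > 0.5

X_csr = X.tocsr()                 # row slicing is efficient on CSR
masked = X_csr[mask_obs]
mean = np.asarray(masked.mean(axis=0)).ravel()
var = np.asarray(masked.power(2).mean(axis=0)).ravel() - mean**2
```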
So I updated the kernel to use two compile-time constants.
scanpy/preprocessing/_simple.py (Outdated)

```python
@numba.njit()
def _scale_sparse_numba(indptr, indices, data, *, std, mask_obs, has_mask, clip):
    def _loop_scale(cell_ix):
        for j in numba.prange(indptr[cell_ix], indptr[cell_ix + 1]):
```
Is this multithreaded if you don't pass `parallel=True` to `njit`?
It's slower, because of the way the memory access happens. I tested it. So no, it's not multi-threaded.
Ok, after running some more tests: it's not the memory access but the compile time. The speedup only happens for very large matrices on the second run, so I don't think it's worth it.
The compiled versions should get cached, so it's a one-time cost per install. No?
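For reference, numba's on-disk caching is opt-in: passing `cache=True` to `njit` stores the compiled machine code next to the module, so the compilation cost is paid once per install rather than once per process. A toy example, not the PR's kernel:

```python
import numba
import numpy as np


@numba.njit(cache=True)
def square_sum(x):
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return total


square_sum(np.arange(1_000_000.0))  # first call compiles and writes the cache;
                                    # later processes load it from disk instead
```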
@Intron7, could you open an issue with the follow-up things you wanted to investigate here? I think this is good to merge as it gets roughly a 100x speed-up, and we can do comparisons on top of this in follow-ups.
Co-authored-by: Severin Dicks <[email protected]>
This would fix #2941
I created some numba.njit() kernels that perform in-place substitutions based on the assumption that we only change existing values and don't add new ones (where all the scipy overhead comes from).
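To illustrate the "only change existing values" point (a hedged sketch, not the PR's kernels): mutating `X.data` in place leaves the sparsity structure (`indptr`/`indices`) untouched, whereas going through scipy operations allocates whole new matrices.

```python
import numpy as np
from scipy import sparse

X = sparse.random(1000, 500, density=0.1, format="csr")
inv_std = np.ones(X.shape[1])

X.data *= inv_std[X.indices]        # in place: only the stored values change
# X = X.multiply(inv_std).tocsr()   # allocating route: builds a brand-new matrix
```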
Benchmarks for 90k cells and 25k genes:

CSR: old 23 s | new 1 s | 23x
CSC: old 61 s | new 1.6 s | 36x