(fix): cache indptr for backed sparse matrices #1266

Merged
merged 21 commits into from
Jan 11, 2024

Conversation

ilan-gold
Contributor

@ilan-gold ilan-gold commented Dec 14, 2023

@ilan-gold ilan-gold added this to the 0.10.4 milestone Dec 14, 2023
@ilan-gold ilan-gold changed the title (fix): cache indptr (fix): cache indptr for backed sparse matrices Dec 14, 2023

codecov bot commented Dec 14, 2023

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (ab43f8d) 85.26% compared to head (b448e88) 85.31%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1266      +/-   ##
==========================================
+ Coverage   85.26%   85.31%   +0.05%     
==========================================
  Files          34       34              
  Lines        5462     5490      +28     
==========================================
+ Hits         4657     4684      +27     
- Misses        805      806       +1     
Flag Coverage Δ
gpu-tests 51.94% <61.76%> (+0.08%) ⬆️


Files Coverage Δ
anndata/tests/helpers.py 96.13% <100.00%> (+0.18%) ⬆️
anndata/_core/sparse_dataset.py 93.77% <94.73%> (-0.07%) ⬇️

@ilan-gold
Contributor Author

ilan-gold commented Dec 15, 2023

I have spent way too long staring at the error (and trying various things to figure out why it is happening, including causing my Python to segfault several times). I cannot make heads or tails of why the error is happening. If I add an assert statement to check whether self.indptr == self.group['indptr'] before https://github.com/scverse/anndata/pull/1266/files#diff-b160b916be368b6665c7eb0b265a2d19b28c96ef86ef8a814788209feac57a66L439, it fails sometimes (the tests are randomized, so I can't say exactly which case triggers it). To me, this indicates that something outside the class is changing the underlying indptr, but I have no idea what or why. I have checked the access count (via AccessTrackingStore) and it's at 2: one for setting and one for getting (I edited locally to look for setting). So this further indicates that the change comes from something that is reopening the group. Genuinely at a loss; the concat_on_disk code is foreign to me, but maybe I will have more success on Monday.
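
The check described above can be sketched in isolation. This is a minimal illustration, not anndata's code: `CountingStore` and `CachedSparse` are hypothetical names, and a plain dict subclass stands in for both `AccessTrackingStore` and the on-disk group. It only shows how a cached `indptr` can go stale when something else writes to the group after the cache is filled.

```python
from functools import cached_property


class CountingStore(dict):
    """Hypothetical dict-backed stand-in for an on-disk group that counts reads and writes per key."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.reads = {}
        self.writes = {}

    def __getitem__(self, key):
        self.reads[key] = self.reads.get(key, 0) + 1
        return super().__getitem__(key)

    def __setitem__(self, key, value):
        self.writes[key] = self.writes.get(key, 0) + 1
        super().__setitem__(key, value)


class CachedSparse:
    """Hypothetical wrapper that caches indptr the first time it is read."""

    def __init__(self, group):
        self.group = group

    @cached_property
    def indptr(self):
        # Read from the store once and keep an in-memory copy.
        return list(self.group["indptr"])

    def cache_is_fresh(self):
        # The same comparison as the assert in the comment above.
        return self.indptr == list(self.group["indptr"])


store = CountingStore(indptr=[0, 2, 3])
matrix = CachedSparse(store)

first = matrix.indptr                 # fills the cache (one read)
fresh_before = matrix.cache_is_fresh()

# Something outside the wrapper rewrites indptr in the group.
store["indptr"] = [0, 2, 3, 5]
fresh_after = matrix.cache_is_fresh()
```

In this sketch the comparison passes until the external write happens and fails afterwards, while the write counter records exactly one set. A failing assert combined with an unchanged access count would therefore point at a writer that bypasses the tracked store, for example code that reopens the group through a different handle.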

@ilan-gold ilan-gold self-assigned this Dec 29, 2023
@ilan-gold ilan-gold requested review from ivirshup and removed request for flying-sheep December 29, 2023 23:25
@flying-sheep flying-sheep modified the milestones: 0.10.4, 0.10.5 Jan 4, 2024
Member

@ivirshup ivirshup left a comment

Looks basically good. Made a few minor suggestions. A little demonstration of effect would be nice as well.

I'm assuming the error you are talking about above no longer occurs?

anndata/_core/sparse_dataset.py (outdated)
Comment on lines 301 to 303
    @property
    def group(self):
        return self._group
Member

I would be okay with this being made public, but could you add a docstring + type hint for this?

Contributor Author

Yes.

Contributor Author

@ilan-gold ilan-gold Jan 10, 2024

So the type hint appears to be unnecessary. You'll notice that some of the builds failed because of documentation issues - I honestly don't know why. The other references to Group have no issue, but this one causes things to break. VSCode can infer the type and the docs are fine without it.

Contributor Author

@ilan-gold ilan-gold Jan 10, 2024

Sorry, I lied. The docs don't work, but the type hints do work, e.g., in VSCode. I was looking at the wrong Group.

Contributor Author

Yeah, I have no idea why this is breaking. Extremely strange stuff. I will look again tomorrow.

Member

This looks like it's working. Can we resolve?

Contributor Author

Well I did not push the change, so the docs are building because of that, unless this works for you locally.

anndata/_core/sparse_dataset.py (outdated, resolved)
anndata/tests/test_backed_sparse.py (outdated, resolved)
docs/release-notes/0.10.4.md (outdated, resolved)
anndata/_core/sparse_dataset.py (resolved)
Member

@ivirshup ivirshup left a comment

I think everything is addressed apart from a little type hinting. Not totally sure I understand the problems you ran into, but asked some questions in comments on the code.


    shape: tuple[int, int]
    """Shape of the matrix."""

    @property
    def group(self):
Member

Suggested change
-     def group(self):
+     def group(self) -> GroupStorageType:

Is this the change that was giving you problems with the docs?

Contributor Author

Yes!
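
For reference, a public property carrying both the requested docstring and the suggested return annotation could look like the sketch below. It makes assumptions: `GroupStorageType` is redefined here as a hypothetical stand-in alias so the example runs on its own, and `BackedSparse` is an illustrative class name, not the class touched in this PR.

```python
from typing import Any, Mapping

# Hypothetical stand-in alias for illustration only; not anndata's definition.
GroupStorageType = Mapping[str, Any]


class BackedSparse:
    """Illustrative wrapper around a storage group."""

    def __init__(self, group: GroupStorageType) -> None:
        self._group = group

    @property
    def group(self) -> GroupStorageType:
        """The underlying storage group for this matrix."""
        return self._group


matrix = BackedSparse({"indptr": [0, 1]})

# The annotation and docstring are both available to tooling at runtime.
annotation = BackedSparse.group.fget.__annotations__["return"]
docstring = BackedSparse.group.__doc__
```

Editors read the annotation directly, which matches the observation in this thread that the type hint works in VSCode even when the documentation build cannot resolve the referenced name.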

anndata/_core/sparse_dataset.py (outdated, resolved)
@ivirshup ivirshup merged commit 4a1caa9 into main Jan 11, 2024
14 checks passed
@ivirshup ivirshup deleted the ig/cache_indptr branch January 11, 2024 14:49
meeseeksmachine pushed a commit to meeseeksmachine/anndata that referenced this pull request Jan 11, 2024
ivirshup pushed a commit that referenced this pull request Jan 11, 2024
@flying-sheep flying-sheep mentioned this pull request Jan 26, 2024
3 tasks
Development

Successfully merging this pull request may close these issues.

SparseDataset inefficient loading of indptr
3 participants