Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zarr Tile Sink #1446

Merged
merged 42 commits into from
Mar 21, 2024
Merged

Zarr Tile Sink #1446

merged 42 commits into from
Mar 21, 2024

Conversation

annehaley
Copy link
Collaborator

@annehaley annehaley commented Jan 29, 2024

This PR makes changes to the Zarr tile source to allow the user to write new N-dimensional files. This includes the addition of new, addTile, and write methods.

This PR also rewrites test/test_sink to copy data from TestFileTileSource rather than randomly generate data.

TODOs:

  • Improve band bookkeeping
  • Allow user to crop data upon write

Future Work:

@annehaley annehaley marked this pull request as ready for review February 9, 2024 16:35
@manthey
Copy link
Member

manthey commented Mar 5, 2024

I was playing with this a little, trying to replace a custom data collector with this branch. I was expecting the class properties sizeX and sizeY properties (and frames, bandCount, dtype) to update based on the data collected. Also, adding data is slow, because (a) we are using np.pad and (b) we are calling create_dataset on every change. If create_dataset is being called just to rechunk things, we can pick a chunking of 512 in x and y, a largish value in samples (bands) like 16, and 1 for all other axes.

# Edit OME metadata
self._zarr.attrs.update({
'multiscales': [{
'version': '0.5-dev',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked ome zarr docs for version information. Are we more conformant to a dev version than the latest official version?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just following what I found here: https://ngff.openmicroscopy.org/latest/#multiscale-md, but I'm not entirely confident that this is all correct.

@manthey
Copy link
Member

manthey commented Mar 18, 2024

With #1477, there are now some tests that can be enabled. Uncomment the two parametered tests on lines
https://github.com/girder/large_image/blob/master/test/test_examples.py#L90-L91 ; they should pass for this branch before we consider it mergeable.

axes[0:0] = [k]
self._axes = {k: i for i, k in enumerate(axes)}
while len(tile.shape) < len(axes):
tile = np.expand_dims(tile, axis=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurred to me this could be done without an explicit while loop. I think the equivalent would be:

if len(tile.shape) < len(axes):
    tile = np.expand_dims(tile, axis=tuple(range(len(axes) - len(tile.shape))))
if mask is not None and len(mask.shape) < len(axes):
    mask = np.expand_dims(mask, axis=tuple(range(len(axes) - len(mask.shape))))

I'm not sure that is clearer, though.

self._addLock = threading.RLock()
self._framecount = 0
self._mm_x = 0
self._mm_y = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add self._levels = [] here and a test that just does

sink = large_image_source_zarr.new()
assert sink.metadata['levels'] == 0
assert sink.getRegion(format=TILE_FORMAT_NUMPY)[0].shape[:2] == (0, 0)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, added in 157c731

axes = [x.lower() for x in axes]
if axes[-1] != 's':
axes.append('s')
if mask is not None and len(axes) - 1 == len(mask.shape):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be outdented (not in the if axes[-1] clause).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Moved in 62e6572

@manthey
Copy link
Member

manthey commented Mar 19, 2024

Let's add a mask test:

def testAddTileWithMask():
    sink = large_image_source_zarr.new()
    tile0 = np.random.random((10, 10))
    sink.addTile(tile0, 0, 0)
    orig = sink.getRegion(format='numpy')[0]
    tile1 = np.random.random((10, 10))
    sink.addTile(tile1, 0, 0, mask=np.random.random((10, 10)) > 0.5)
    cur = sink.getRegion(format='numpy')[0]
    assert (tile0 == orig[:, :, 0]).all()
    assert not (tile1 == orig[:, :, 0]).all()
    assert not (tile0 == cur[:, :, 0]).all()
    assert not (tile1 == cur[:, :, 0]).all()

This will fail because of caching. In the vips source we have
https://github.com/girder/large_image/blob/master/sources/vips/large_image_source_vips/__init__.py#L125

and do self._cacheValue = str(uuid.uuid4()) as part of _initNew and when we invalidate the image (when we set self._levels = None).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants