simple code suggestions to improve #50

Open
hvgazula opened this issue Mar 20, 2024 · 8 comments
Comments

@hvgazula
Collaborator

pixel_counts = {label:count for label,count in zip(unique,counts)}

dict(zip(unique, counts))
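A quick check that the two forms build the same mapping. The label array here is made up for illustration and is not taken from the dataset:

```python
import numpy as np

# Hypothetical label slice; the values are for illustration only.
label_slice = np.array([[0, 0, 1], [2, 1, 0]])
unique, counts = np.unique(label_slice, return_counts=True)

# Original form: explicit dict comprehension.
via_comprehension = {label: count for label, count in zip(unique, counts)}

# Suggested form: dict() consumes the (label, count) pairs directly.
via_dict = dict(zip(unique, counts))

# Optional: .tolist() converts NumPy scalars to plain Python ints,
# which is convenient if the dict is later serialised.
via_plain_ints = dict(zip(unique.tolist(), counts.tolist()))
```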

https://github.com/sabeenlohawala/tissue_labeling/blob/8611ada4596771e1ab25cb02faf7be557509593b/scripts/mit_kwyk_data.py#L180C1-L181C85

shapes, pixel_counts = zip(*shapes_and_pixel_counts)

for i in range(label_vol.shape[d]):
    # get the slice
    if d == 0:
        feature_slice = feature_vol[i, :, :]
        label_slice = label_vol[i, :, :]
    elif d == 1:
        feature_slice = feature_vol[:, i, :]
        label_slice = label_vol[:, i, :]
    elif d == 2:
        feature_slice = feature_vol[:, :, i]
        label_slice = label_vol[:, :, i]
    # discard slices with < 20% brain (> 80% background)
    count_background = np.sum(label_slice == 0)
    if count_background > 0.8 * (label_slice.shape[0] * label_slice.shape[1]):
        continue
    # pad slices
    pad_rows = max(0,max_shape[0] - label_slice.shape[0])
    pad_cols = max(0,max_shape[1] - label_slice.shape[1])
    # padding for each side
    pad_top = pad_rows // 2
    pad_bottom = pad_rows - pad_top
    pad_left = pad_cols // 2
    pad_right = pad_cols - pad_left
    padded_feature_slice = np.pad(feature_slice, ((pad_top, pad_bottom), (pad_left, pad_right)), mode='constant', constant_values=0)
    padded_label_slice = np.pad(label_slice, ((pad_top, pad_bottom), (pad_left, pad_right)), mode='constant', constant_values=0)
    # save .npy files
    feature_slice_filename = f"{os.path.basename(feature).split('.')[0]}_{slice_idx:03d}.npy"
    label_slice_filename = f"{os.path.basename(label).split('.')[0]}_{slice_idx:03d}.npy"
    np.save(os.path.join(feature_slice_dest_dir,feature_slice_filename), padded_feature_slice[np.newaxis,:])
    np.save(os.path.join(label_slice_dest_dir,label_slice_filename), padded_label_slice[np.newaxis,:])
    # Done: get pixel_counts
    if get_pixel_counts:
        unique,counts = np.unique(padded_label_slice,return_counts = True)
        pixel_counts.update({label:count for label,count in zip(unique,counts)})
    # increase slice_idx
    slice_idx += 1

Run this example and tell me if the above cannot be improved in the same way

import numpy as np
a = np.random.rand(10, 5, 3)
b = list(map(sum, a))  # sum can be any function
print(len(b), b[0].shape)
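
One way to apply the same idea to the loop above is to move axis `d` to the front, so that iterating over the volume yields the slices directly and the `if d == ...` chain goes away. This is only a sketch: `iter_slices` is a hypothetical helper, not a function in the repository, and the 0.8 background threshold is copied from the quoted code.

```python
import numpy as np

def iter_slices(feature_vol, label_vol, d, max_background=0.8):
    """Yield (feature_slice, label_slice) pairs along axis d.

    Hypothetical helper: skips slices whose background (label 0)
    pixel count exceeds max_background of the slice area.
    """
    # np.moveaxis returns a view with axis d first, so iterating
    # over it yields vol[i, :, :], vol[:, i, :] or vol[:, :, i]
    # depending on d.
    features = np.moveaxis(feature_vol, d, 0)
    labels = np.moveaxis(label_vol, d, 0)
    for feature_slice, label_slice in zip(features, labels):
        count_background = np.count_nonzero(label_slice == 0)
        if count_background > max_background * label_slice.size:
            continue
        yield feature_slice, label_slice
```

Padding and saving could then happen in the body of a `for feature_slice, label_slice in iter_slices(...)` loop, unchanged from the quoted code.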
@hvgazula
Collaborator Author

hvgazula commented Mar 20, 2024

label_vol = (utils.load_volume(label, im_only=True)).astype('int32')

uint16 will do for the label vols

edit: please see the table at the bottom of this page
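
To make the size difference concrete, here is a small sketch. The volume shape is an assumption chosen for illustration, not a figure from the dataset.

```python
import numpy as np

# Assumed volume shape, for illustration only.
shape = (64, 64, 64)
as_int32 = np.zeros(shape, dtype=np.int32)
as_uint16 = as_int32.astype(np.uint16)

# uint16 uses 2 bytes per voxel instead of 4.
bytes_int32 = as_int32.nbytes
bytes_uint16 = as_uint16.nbytes

# uint16 holds 0..65535; check the label range before casting,
# since negative or larger values would wrap around silently.
uint16_range = (np.iinfo(np.uint16).min, np.iinfo(np.uint16).max)
```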

@hvgazula
Collaborator Author

hvgazula commented Mar 20, 2024

all_keys = {key for d in pixel_counts for key in d.keys()}

couldn't this be written as {*d.keys() for d in pixel_counts}?

update: iterable unpacking cannot be used in comprehension
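
A comprehension cannot contain `*` unpacking, but the same set of keys can be built by unpacking in a function call instead. A minimal sketch, assuming `pixel_counts` is a list of dicts as the original comprehension implies (the sample values are made up):

```python
# Made-up per-volume pixel counts, for illustration only.
pixel_counts = [{0: 10, 1: 4}, {0: 7, 2: 3}, {5: 1}]

# Original form: nested set comprehension.
all_keys = {key for d in pixel_counts for key in d.keys()}

# Alternative: iterating a dict yields its keys, so each dict can
# be passed straight to set.union via argument unpacking.
all_keys_union = set().union(*pixel_counts)
```

Starting from `set()` keeps the call valid even when `pixel_counts` is empty.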

@hvgazula
Collaborator Author

max_rows = max(max_dims[0], max_dims[1])
max_cols = max(max_dims[1], max_dims[2])

Not sure if I agree with this. What if the middle value is the largest? You will end up with a square and that's unnecessary. Am I missing something?

@hvgazula
Collaborator Author

if mode == 'train':
for item in pixel_counts:
train_pixel_counts += item

does this have to be done within the context manager?

@hvgazula
Collaborator Author

hvgazula commented Mar 20, 2024

Also, pixel_counts is a dict. Could you not simply write sum(pixel_counts)? That said, I am not sure yet why the keys are added and not the values.
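
For reference, a sketch of how the two readings behave. It assumes, without confirmation from the repository, that each `item` in the quoted loop is a `collections.Counter`; the sample values are made up.

```python
from collections import Counter

# Assumed structure: one Counter of label -> pixel count per volume.
pixel_counts = [Counter({0: 10, 1: 4}), Counter({0: 7, 2: 3})]

# Loop form from the quoted code.
train_pixel_counts = Counter()
for item in pixel_counts:
    train_pixel_counts += item

# Equivalent one-liner: give sum() an empty Counter as the start value.
total = sum(pixel_counts, Counter())

# By contrast, sum() over a single dict adds up its keys, not its values.
keys_total = sum({0: 10, 1: 4})
```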

@sabeenlohawala
Owner

https://github.com/sabeenlohawala/tissue_labeling/blob/8f9b20506740c2364051e9ca6975efd7f7ace38b/scripts/mit_kwyk_data.py#L293C9-L298C436

pixel_counts = pool.starmap(

Using list(map(...)) slows down the computation, but removing list results in an error thrown at line 411: 'map' object is not subscriptable.
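
The error arises because `map` returns a lazy iterator in Python 3, which cannot be indexed. A small sketch of the behaviour, with a toy function standing in for the real per-volume work:

```python
def count_pixels(x):
    # Toy stand-in for the real per-volume function.
    return x * 2

# map() does no work until it is iterated, and it cannot be indexed.
lazy = map(count_pixels, range(3))
try:
    lazy[0]
except TypeError as err:
    message = str(err)

# Materialising the iterator makes indexing possible.
results = list(map(count_pixels, range(3)))
first = results[0]
```

`multiprocessing.Pool.starmap`, as used in the quoted line, already returns a list, so its result can be indexed without an extra `list(...)` call.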

@sabeenlohawala
Owner

max_rows = max(max_dims[0], max_dims[1])
max_cols = max(max_dims[1], max_dims[2])

Not sure if I agree with this. What if the middle value is the largest? You will end up with a square and that's unnecessary. Am I missing something?

slice[i,:,:] → shape is dim[1] x dim[2]
slice[:,i,:] → shape is dim[0] x dim[2]
slice[:,:,i] → shape is dim[0] x dim[1]
Therefore, in order for all slices to be the same shape, slice shape should be (max(dim[0], dim[1]), max(dim[1], dim[2]))
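
The reasoning above could be captured in a small helper with a docstring. `get_max_slice_shape` is a hypothetical name used for this sketch, not necessarily the function that was later added to the repository.

```python
def get_max_slice_shape(max_dims):
    """Return a 2D shape that fits a slice taken along any axis.

    For a volume of shape (d0, d1, d2):
      vol[i, :, :] has shape (d1, d2)
      vol[:, i, :] has shape (d0, d2)
      vol[:, :, i] has shape (d0, d1)
    Rows are therefore d0 or d1, and columns are d1 or d2.
    """
    max_rows = max(max_dims[0], max_dims[1])
    max_cols = max(max_dims[1], max_dims[2])
    return max_rows, max_cols
```

When the middle dimension is the largest, the result is square. That is still the smallest shape that fits all three orientations, because the middle dimension appears as rows in one slicing and as columns in another.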

@hvgazula
Collaborator Author

hvgazula commented Mar 21, 2024

cool..please create a separate function with this docstring so people like me will know why 😄

sabeenlohawala added a commit that referenced this issue Mar 21, 2024
Convert label vol and slices to int16

Add helper functions and docstrings

Remove unnecessary comments