Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adaptahop support improvement #551

Merged
merged 35 commits into from
May 10, 2021
Merged

Conversation

cphyc
Copy link
Contributor

@cphyc cphyc commented Jan 17, 2020

This PR improves the support of adaptahop in pynbody (see #546 for the original PR).

It changes the following behaviour:

  • the halo data is not cached any more, as it was causing memory issues,
  • to mitigate the computational cost of the aforementioned change, I added an OpenMP-parallelized binary search algorithm that finds items of a sorted array into another sorted array. This improves by a factor of ~2xNCPU the performance compared to the np.searchsorted function. This was used to generate an IndexedSubSnap from a list of particle ids,
  • a new derived array has been defined, namely iord_argsort which is the result of np.argsort(sim['iord']),
  • implement get_group_array for adaptahop, to make it usable with https://github.com/pynbody/tangos (see TypeError error when linking tangos#112).

@apontzen
Copy link
Member

Looks good! I just had one query flagged inline...

@cphyc cphyc changed the title Adaptahop support Adaptahop support improvement Jan 20, 2020
# - obtained from get_group_array masking
# are the same (in term of sets)
assert len(np.setdiff1d(iord[mask], h[halo_id]['iord'])) == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you also be checking the reverse? I think setdiff1d is asymmetric

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now done.

for halo_id, halo in self._halos.items():
fpu.seek(halo.properties['file_offset'])
fpu.skip(1) # number of particles
particle_ids = fpu.read_vector('i')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After #584 is merged, this should be fixed to take into account cases where particle_ids is not sorted and/or is not a 32bit integer.

@cphyc cphyc force-pushed the adaptahop-support branch 3 times, most recently from 910f512 to 4d760de Compare October 21, 2020 14:44
@cphyc cphyc force-pushed the adaptahop-support branch from 4d760de to 7311915 Compare May 5, 2021 08:51
@cphyc
Copy link
Contributor Author

cphyc commented May 5, 2021

@apontzen do you think this is ready to go?

@apontzen
Copy link
Member

apontzen commented May 5, 2021

Is there any way to test this? Happy to add a small file to the testdata tar if needed.

@cphyc
Copy link
Contributor Author

cphyc commented May 5, 2021

I can generate one of about ~50Mio (snapshot + simulation), would that suffice?

@apontzen
Copy link
Member

apontzen commented May 5, 2021

Sounds good, thank you!

@cphyc
Copy link
Contributor Author

cphyc commented May 6, 2021

We can use the following halo catalog (very minimalistic, only two halos and linked to the output_00080 dataset already present in the testdata tar).
tree_bricks080.zip

@cphyc cphyc force-pushed the adaptahop-support branch from 3292f6a to 074da84 Compare May 10, 2021 09:10
@apontzen
Copy link
Member

Thanks!

@apontzen apontzen merged commit 61721cf into pynbody:master May 10, 2021
@cphyc cphyc deleted the adaptahop-support branch May 10, 2021 10:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants