Specify the Census search space with an obs_value_filter parameter in find_nearest_obs #1305

Open
pablo-gar opened this issue Oct 23, 2024 · 1 comment
@pablo-gar
Contributor

Description

Currently, find_nearest_obs finds the Census cells closest to the user's data, searching across all Census cells. It would be a great addition to be able to constrain the search to a user-defined subset of Census cells.

For consistency with the rest of the Census API, a simple obs_value_filter parameter (as used in the get_anndata API) could be added to limit the search to Census cells that meet the filter criteria.

Context

The find_nearest_obs functionality is great and works as intended. However, I am often interested in searching for the most similar cells within a specific subset of Census defined by a biological context, especially when I know my query cells come from the same or a similar biological context.

Impact

Because the search space cannot currently be restricted, find_nearest_obs cannot be used to its full extent.

Ideal behavior

A parameter obs_value_filter in find_nearest_obs to limit the search to Census cells meeting the filter criteria.
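
To make the requested semantics concrete, here is a minimal, self-contained sketch of a nearest-neighbour search restricted to cells that pass a metadata filter. It is not how find_nearest_obs is implemented: it uses a brute-force search over toy data, and the function name `nearest_obs_filtered`, the `obs_filter` callable, and the example embeddings are all hypothetical.

```python
import heapq
import math


def nearest_obs_filtered(query, embeddings, obs, obs_filter, k=2):
    """Brute-force k-nearest search limited to cells whose metadata passes obs_filter.

    Hypothetical helper for illustration only.
    query:      embedding vector of one query cell
    embeddings: dict mapping a cell id to its embedding vector
    obs:        dict mapping a cell id to its metadata row (a dict)
    obs_filter: callable taking a metadata row and returning True to keep the cell
    """
    candidates = (
        (math.dist(query, embeddings[joinid]), joinid)
        for joinid, row in obs.items()
        if obs_filter(row)  # cells failing the filter never enter the search space
    )
    return [(joinid, dist) for dist, joinid in heapq.nsmallest(k, candidates)]


# Toy data: four cells with 2-D embeddings and two metadata columns.
embeddings = {0: (0.0, 0.0), 1: (0.1, 0.0), 2: (1.0, 1.0), 3: (2.0, 2.0)}
obs = {
    0: {"tissue_general": "blood", "is_primary_data": True},
    1: {"tissue_general": "lung", "is_primary_data": False},
    2: {"tissue_general": "lung", "is_primary_data": True},
    3: {"tissue_general": "lung", "is_primary_data": True},
}
query = (0.0, 0.1)

# Unrestricted search: every cell is a candidate.
unrestricted = nearest_obs_filtered(query, embeddings, obs, lambda row: True, k=1)

# Restricted search: only primary lung cells are candidates.
lung_only = nearest_obs_filtered(
    query,
    embeddings,
    obs,
    lambda row: row["tissue_general"] == "lung" and row["is_primary_data"],
    k=1,
)
```

In this toy example the unrestricted search returns the blood cell (id 0), while the restricted search returns the closest primary lung cell (id 2), which is the behaviour an obs_value_filter parameter would be expected to provide.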

@pablo-gar pablo-gar changed the title Add an obs_value_filter parameter to find_nearest_obs Specify the Census search space with an obs_value_filter parameter in find_nearest_obs Oct 23, 2024
@ivirshup
Collaborator

Hmm, will have to check with @cathystoli to see if we're still accepting feature requests from you 😉


I think this makes a ton of sense as a feature. I've noticed that we probably want to filter out all cells with is_primary_data == False from queries, since otherwise you just end up getting a bunch of cells with the exact same embedding.

I've consulted with the TileDB folks and I think this should be quite doable.

  • This notebook demonstrates metadata_df_filter_fn, which does essentially what you want.
  • Ideally we would be able to use the same query string you are suggesting, but we need to figure out how to wrap it as a function that can be applied per row.
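
One possible way to wrap a query string as a per-row function is sketched below using only the Python standard library. This is an assumption-laden illustration, not a proposal for the actual implementation: `make_row_filter` is a hypothetical name, the supported grammar is a small Python-like subset (comparisons, `in`, `and`, `or`, `not`) that may differ from the real obs_value_filter grammar, and the resulting callable takes a plain dict per row, which may not match what metadata_df_filter_fn expects.

```python
import ast
import operator

# Comparison operators supported by this sketch.
_COMPARE = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}


def make_row_filter(expr):
    """Compile a filter string into a predicate over one metadata row (a dict).

    Hypothetical helper: walks the parsed expression instead of calling eval(),
    so only the node types handled below are ever evaluated.
    """
    tree = ast.parse(expr, mode="eval").body

    def evaluate(node, row):
        if isinstance(node, ast.BoolOp):
            results = (evaluate(value, row) for value in node.values)
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not evaluate(node.operand, row)
        if isinstance(node, ast.Compare):
            left = evaluate(node.left, row)
            for op, right_node in zip(node.ops, node.comparators):
                op_fn = _COMPARE.get(type(op))
                if op_fn is None:
                    raise ValueError(f"unsupported operator: {type(op).__name__}")
                right = evaluate(right_node, row)
                if not op_fn(left, right):
                    return False
                left = right  # supports chained comparisons such as 1 < x < 5
            return True
        if isinstance(node, ast.Name):
            return row[node.id]  # a bare name is treated as a column lookup
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, (ast.List, ast.Tuple)):
            return [evaluate(element, row) for element in node.elts]
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return lambda row: bool(evaluate(tree, row))


# Usage on toy metadata rows (column names follow the Census obs schema).
keep = make_row_filter("tissue_general == 'lung' and is_primary_data == True")
rows = [
    {"soma_joinid": 0, "tissue_general": "lung", "is_primary_data": True},
    {"soma_joinid": 1, "tissue_general": "lung", "is_primary_data": False},
    {"soma_joinid": 2, "tissue_general": "blood", "is_primary_data": True},
]
kept_ids = [row["soma_joinid"] for row in rows if keep(row)]
```

Evaluating row by row in pure Python would be slow at Census scale; a vectorized evaluation over a whole metadata table would likely be preferable in practice, but the sketch shows that the string-to-function step itself is small.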

cc: @mlin

@cathystoli cathystoli added P0 Priority 0 - Critical, fix ASAP! Priority backlog items 2024-q4 labels Oct 28, 2024