Allow subsetting TOL classifier #59
Adds methods to TreeOfLifeClassifier to allow subsetting the embeddings. The get_label_data() method returns a dataframe of the labels for the TOL txt embeddings. The create_taxa_filter() method creates a filter (a boolean array) for the txt embeddings based on a taxon and a set of values. The apply_filter() method restricts the classifier to the embeddings selected by a filter, which must be a boolean array with the same length as the txt embeddings. Part of #56
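To make the filtering idea concrete, here is a minimal sketch of a boolean filter over label rows and embeddings. It is not the code from this PR: plain Python lists stand in for the dataframe and the embedding array, and the names `make_taxa_filter`, `subset`, `labels`, and `embeddings`, as well as the sample taxa, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical label rows, one per text embedding (stand-in for the label dataframe).
labels: List[Dict[str, str]] = [
    {"kingdom": "Animalia", "class": "Aves", "species": "Turdus migratorius"},
    {"kingdom": "Animalia", "class": "Mammalia", "species": "Ursus arctos"},
    {"kingdom": "Plantae", "class": "Magnoliopsida", "species": "Quercus alba"},
]

# Hypothetical text embeddings, aligned row-for-row with `labels`.
embeddings: List[List[float]] = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]


def make_taxa_filter(
    rows: Sequence[Dict[str, str]], taxon: str, values: Iterable[str]
) -> List[bool]:
    """Return one boolean per row: True where the row's `taxon` is in `values`."""
    wanted = set(values)
    return [row[taxon] in wanted for row in rows]


def subset(items: Sequence[T], mask: Sequence[bool]) -> List[T]:
    """Keep only the items whose mask entry is True."""
    if len(mask) != len(items):
        raise ValueError("filter must have the same length as the embeddings")
    return [item for item, keep in zip(items, mask) if keep]


# Build a filter for two classes, then apply it to labels and embeddings alike.
mask = make_taxa_filter(labels, "class", ["Aves", "Mammalia"])
kept_labels = subset(labels, mask)
kept_embeddings = subset(embeddings, mask)
```

The matching here is an exact string comparison, so a value that is spelled differently from the stored label simply does not match. Applying the same mask to both the labels and the embeddings keeps the two aligned after subsetting.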
A few suggestions for clarity. The notebook was very helpful for visualizing the update.
Co-authored-by: Elizabeth Campolongo <[email protected]>
Nice update!
Should there be a disclaimer somewhere that misspelled taxa in the training data may adversely affect the filter behavior? This would impact the example in the notebook:
I don't think we need a special disclaimer for that. Otherwise we would also need disclaimers for all the synonyms and other names that aren't properly consolidated to a canonical name. Those who want to understand the predictions in detail will invariably have to familiarize themselves with the paper and with the data and processing code repositories.