New feature to filter datasets by column and its corresponding values #53

Open
wants to merge 1 commit into
base: master

mjcarbonell

What existing problem does the pull request solve?
This pull request adds a filtering feature to the compile function of the Smart Drift object. Previously, every data point was included, which cluttered the visualizations. With this update, users can filter the dataset by a column and a set of its values, so the visualizations focus on the most relevant data and are easier to interpret.

Test Plan
An example of compiling a dataset with filters is shown below, where a dataset with 300+ countries is reduced to 6 countries.

sd.compile(
    full_validation=True,  # Optional: to save time, leave the default value False. If True, analyze the consistency of modalities between columns.
    date_compile_auc="01/01/2022",  # Optional: useful when computing the drift for a date other than today
    datadrift_file="datadrift_auc.csv",  # Optional: name of the CSV file that contains the performance history of data drift
    filter_column="name",  # Optional: name of the column to filter on
    filter_values=["France", "Ottomans", "Austria", "Poland", "Brandenburg", "Bohemia"],  # Optional: values of the column above to keep
)
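
For reviewers, the filtering semantics described above can be sketched with plain pandas. This is a minimal illustration, not the code in this pull request: the helper name `filter_by_column`, its handling of a missing column, and the sample data are assumptions made for the example. Only the `filter_column` and `filter_values` parameter names come from the call above.

```python
import pandas as pd


def filter_by_column(df, filter_column=None, filter_values=None):
    """Hypothetical helper: keep rows whose `filter_column` value is in `filter_values`."""
    # No filter requested: return the data unchanged.
    if filter_column is None or filter_values is None:
        return df
    # Assumption: fail loudly when the column does not exist.
    if filter_column not in df.columns:
        raise KeyError(f"Column {filter_column!r} not found in dataset")
    # Series.isin builds a boolean mask; rows outside the list are dropped.
    mask = df[filter_column].isin(filter_values)
    return df[mask].reset_index(drop=True)


# Made-up sample data, for illustration only.
df_current = pd.DataFrame(
    {
        "name": ["France", "Spain", "Austria", "Poland", "Sweden"],
        "value": [1, 2, 3, 4, 5],
    }
)

filtered = filter_by_column(df_current, "name", ["France", "Austria", "Poland"])
```

In a drift comparison, the same filter would presumably have to be applied to both the baseline and the current dataset so that the two remain comparable.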

Description

This pull request addresses cluttered visualizations (Issue #51) by letting users focus on specific column values in the dataset.

Type of Change

New feature (non-breaking change which adds functionality)

How Has This Been Tested?

The new feature was tested with several datasets of varying sizes and complexities. Filters were applied to exclude specific ranges, outliers, and categories. The filtered data produced clearer visualizations that matched expected outcomes.

Test Configuration:

OS: Windows
Python version: [e.g., 3.9]
