New feature to filter datasets by column and its corresponding values #53

mjcarbonell · 2023-12-26T21:34:44Z

What existing problem does the pull request solve?
The pull request introduces a filtering feature to the compile method of the SmartDrift object. Previously, users experienced cluttered visualizations due to the inclusion of all data points. This update allows users to apply filters to the dataset to refine the visualizations and focus on the most relevant data, leading to clearer insights and a more streamlined analysis process.

Test Plan
An example of compiling a dataset with filters is shown below. It starts from a dataset with 300+ countries and narrows it down to 6 countries.

sd.compile(
    full_validation=True,  # Optional: to save time, leave the default False value. If True, analyze consistency on modalities between columns.
    date_compile_auc="01/01/2022",  # Optional: useful when computing the drift for a time that is not now
    datadrift_file="datadrift_auc.csv",  # Optional: name of the csv file that contains the performance history of data drift
    filter_column="name",  # Optional: name of the column you wish to filter
    filter_values=["France", "Ottomans", "Austria", "Poland", "Brandenburg", "Bohemia"],  # Optional: names of the values from the column you chose above that you wish to filter.
)
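
For reviewers, here is a minimal sketch of the kind of row selection the two new parameters describe, assuming the dataset is a pandas DataFrame. The helper name filter_by_column and the sample data are hypothetical and only illustrate the idea; this is not the code added in this pull request.

```python
import pandas as pd


def filter_by_column(df, filter_column=None, filter_values=None):
    """Keep only the rows whose `filter_column` value is in `filter_values`.

    Hypothetical helper for illustration; not the code from this pull request.
    """
    # With no filter requested, return the data unchanged.
    if filter_column is None or filter_values is None:
        return df
    # Fail early with a clear message if the column does not exist.
    if filter_column not in df.columns:
        raise KeyError(f"Column {filter_column!r} not found in the dataset")
    # Boolean mask: True for rows whose value is one of the requested values.
    mask = df[filter_column].isin(filter_values)
    return df[mask].reset_index(drop=True)


# Made-up sample data, much smaller than the 300+ country dataset above.
countries = pd.DataFrame(
    {
        "name": ["France", "Spain", "Austria", "Poland", "Sweden"],
        "value": [1, 2, 3, 4, 5],
    }
)
filtered = filter_by_column(countries, "name", ["France", "Austria", "Poland"])
```

Leaving both arguments at None returns the dataset untouched, which matches the intent that the filter stays optional and existing calls to compile keep working.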

Description

This change addresses cluttered visualizations (Issue #51) and makes it possible to focus on specific column values in the dataset.

Type of Change

New feature (non-breaking change which adds functionality)

How Has This Been Tested?

The new feature was tested with several datasets of varying sizes and complexities. Filters were applied to exclude specific ranges, outliers, and categories. The filtered data produced clearer visualizations that matched expected outcomes.

Test Configuration:

OS: Windows
Python version: [e.g., 3.9]

new feature to filter by column

a7c7d9d
