perf: ⚡️ Enhanced + updated code snippets & README doc #448
✅ Closes: #447
This pull request includes extensive additions to the `pysnippets/Scrubs` directory, introducing several new Python scripts for data preprocessing and cleaning, along with detailed documentation. The changes are organized into separate scripts, each serving a specific purpose in the data cleaning pipeline.

Documentation and Structure:

- `pysnippets/Scrubs/README.md`: Added a comprehensive README that describes the purpose of the directory and details the functionality of each script.

New Scripts for Data Preprocessing:

- `pysnippets/Scrubs/backup.py`: Introduced a script to handle saving cleaned dataframes to disk, including functions for fixing dates, imputing categorical variables, and compressing numeric columns.
- `pysnippets/Scrubs/clean.py`: Added functions to clean dataframes by removing unwanted columns and rows, handling zero and near-zero variance columns, and dropping columns with excessive missing values.
- `pysnippets/Scrubs/clip.py`: Introduced a script for managing categorical variables by reducing the number of levels based on frequency and coverage thresholds.
- `pysnippets/Scrubs/compress.py`: Added functions to compress numeric data types and encode categorical variables, with persistence of the encoders.
- `pysnippets/Scrubs/dummies.py`: Created functions to generate dummy variables for categorical data and integrate them into dataframes.
- `pysnippets/Scrubs/pipeline.py`: Developed a script to orchestrate the entire data cleaning and preprocessing pipeline, including feature engineering and data aggregation.

Utility Functions:

- `pysnippets/Scrubs/utils.py`: Added utility functions, including a decorator that measures the execution time of functions.

@UTSAVS26, kindly review this PR. I request that you assign level 3 to this PR.
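For readers unfamiliar with the steps listed under `backup.py`, here is a minimal sketch of date fixing, categorical imputation, and saving to disk with pandas. It is not the code in this PR: the function names `fix_dates`, `impute_categorical`, and `save_frame`, the `"Missing"` placeholder level, and the choice of pickle as the on-disk format are all hypothetical.

```python
import pandas as pd


def fix_dates(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Hypothetical helper: parse columns as datetimes; unparseable entries become NaT."""
    out = df.copy()
    for col in columns:
        # errors="coerce" turns bad strings and None into NaT instead of raising.
        out[col] = pd.to_datetime(out[col], errors="coerce")
    return out


def impute_categorical(
    df: pd.DataFrame, columns: list[str], fill_value: str = "Missing"
) -> pd.DataFrame:
    """Hypothetical helper: fill gaps in categorical columns with an explicit level."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(fill_value)
    return out


def save_frame(df: pd.DataFrame, path: str) -> None:
    """Hypothetical helper: pickle keeps dtypes such as datetime64 and int8 intact."""
    df.to_pickle(path)
```

Using an explicit placeholder level rather than the mode keeps "value was missing" visible to downstream models.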
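The column pruning described for `clean.py` can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the name `drop_low_information_columns` and both thresholds are hypothetical, and "near-zero variance" is approximated by the share of the most frequent non-null value (the script may instead use another rule, such as a frequency ratio between the two most common values).

```python
import pandas as pd


def drop_low_information_columns(
    df: pd.DataFrame,
    max_missing_frac: float = 0.5,
    max_dominant_frac: float = 0.95,
) -> pd.DataFrame:
    """Hypothetical sketch: drop sparse, constant, and near-constant columns."""
    keep = []
    for col in df.columns:
        series = df[col]
        # Too many missing values: drop.
        if series.isna().mean() > max_missing_frac:
            continue
        non_null = series.dropna()
        # Zero variance: at most one distinct non-null value.
        if non_null.nunique() <= 1:
            continue
        # Near-zero variance (assumed rule): one value dominates the non-null rows.
        dominant_frac = non_null.value_counts(normalize=True).iloc[0]
        if dominant_frac >= max_dominant_frac:
            continue
        keep.append(col)
    return df[keep]
```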
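One plausible reading of the level reduction in `clip.py` is shown below: keep the most frequent levels until they jointly cover a target fraction of the non-null rows, skip any level rarer than a minimum share, and map everything else to a catch-all label. The function name `clip_levels`, its parameters, and the `"Other"` label are assumptions for illustration only.

```python
import pandas as pd


def clip_levels(
    series: pd.Series,
    min_freq: float = 0.05,
    coverage: float = 0.9,
    other_label: str = "Other",
) -> pd.Series:
    """Hypothetical sketch: collapse rare categorical levels into one label."""
    # Shares of each level among non-null values, sorted most frequent first.
    shares = series.value_counts(normalize=True)
    keep = []
    covered = 0.0
    for level, share in shares.items():
        # Stop once enough rows are covered; because shares are sorted,
        # every later level is also below min_freq once one is.
        if covered >= coverage or share < min_freq:
            break
        keep.append(level)
        covered += share
    # Kept levels and missing values pass through; the rest become other_label.
    return series.where(series.isin(keep) | series.isna(), other_label)
```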
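The two ideas behind `compress.py`, numeric downcasting and categorical encoding with persisted encoders, can be sketched with pandas' `to_numeric(downcast=...)` and `Categorical` codes. The helper names are hypothetical, and storing encoders as JSON label lists is an assumption; the PR may persist them differently (for example with pickle or joblib).

```python
import json

import numpy as np
import pandas as pd


def compress_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: downcast numeric columns to the smallest fitting dtype."""
    out = df.copy()
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include=[np.floating]).columns:
        # downcast="float" goes no smaller than float32.
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


def encode_categoricals(
    df: pd.DataFrame, columns: list[str]
) -> tuple[pd.DataFrame, dict[str, list[str]]]:
    """Hypothetical helper: replace labels with integer codes; return the encoders."""
    out = df.copy()
    encoders: dict[str, list[str]] = {}
    for col in columns:
        labels = sorted(out[col].dropna().unique().tolist())
        encoders[col] = labels
        # Label i becomes code i; missing values become -1.
        out[col] = pd.Categorical(out[col], categories=labels).codes
    return out, encoders


def save_encoders(encoders: dict[str, list[str]], path: str) -> None:
    """Hypothetical helper: persist encoders so the same mapping can be reapplied."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(encoders, fh)
```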
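Generating dummy variables and merging them back into the frame, as described for `dummies.py`, maps directly onto `pandas.get_dummies` with its `columns` argument, which replaces the listed columns with indicator columns and keeps the rest. The wrapper name `add_dummies` is hypothetical.

```python
import pandas as pd


def add_dummies(
    df: pd.DataFrame, columns: list[str], drop_first: bool = False
) -> pd.DataFrame:
    """Hypothetical wrapper: swap categorical columns for 0/1 indicator columns."""
    # Untouched columns come first, then one "<col>_<level>" column per level.
    return pd.get_dummies(df, columns=columns, drop_first=drop_first, dtype=int)
```

Passing `drop_first=True` removes one level per variable, which avoids perfectly collinear indicators when the result feeds a linear model.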
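A timing decorator like the one added to `utils.py` typically wraps the call between two `time.perf_counter()` readings. The sketch below is an assumption about its shape, not the PR's code: the name `timed`, printing to stdout, and the `last_elapsed` attribute are all hypothetical.

```python
import functools
import time


def timed(func):
    """Hypothetical decorator: report how long each call to func takes."""

    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            elapsed = time.perf_counter() - start
            wrapper.last_elapsed = elapsed
            print(f"{func.__name__} took {elapsed:.4f} s")

    return wrapper
```

Usage: decorate a function with `@timed` and call it normally; the return value is passed through unchanged.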