perf: ⚡️ Enhanced + updated code snippets & README doc #448

Merged (1 commit) on Nov 10, 2024

yashksaini-coder

Closes: #447

This pull request adds several new Python scripts for data preprocessing and cleaning to the pysnippets/Scrubs directory, along with a README that documents them. Each script covers one stage of the data cleaning pipeline.

Documentation and Structure:

  • pysnippets/Scrubs/README.md: Added a comprehensive README file that describes the purpose of the directory and details the functionality of each script.

New Scripts for Data Preprocessing:

  • pysnippets/Scrubs/backup.py: Introduced a script to handle saving cleaned dataframes to disk, including functions for fixing dates, imputing categorical variables, and compressing numeric columns.
  • pysnippets/Scrubs/clean.py: Added a script with functions to clean dataframes by removing unwanted columns and rows, handling zero and near-zero variance columns, and dropping columns with excessive missing values.
  • pysnippets/Scrubs/clip.py: Introduced a script for managing categorical variables by reducing the number of levels based on frequency and coverage thresholds.
  • pysnippets/Scrubs/compress.py: Added functions to compress numeric data types and encode categorical variables, with persistence of encoders.
  • pysnippets/Scrubs/dummies.py: Created functions to generate dummy variables for categorical data and integrate them into dataframes.
  • pysnippets/Scrubs/pipeline.py: Developed a script to orchestrate the entire data cleaning and preprocessing pipeline, including feature engineering and data aggregation.
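
To illustrate the kind of level reduction described for clip.py, here is a minimal sketch in plain Python. It is not the code from this PR: the function name `clip_levels`, its parameters (`min_count`, `coverage`, `other`), and their defaults are all assumptions made for illustration. It keeps the most frequent levels until they cover a target share of rows and folds everything else into a placeholder level.

```python
from collections import Counter


def clip_levels(values, min_count=2, coverage=0.9, other="OTHER"):
    """Collapse rare categorical levels into a single placeholder level.

    Hypothetical helper (not taken from the PR). A level is kept only if it
    occurs at least `min_count` times and the levels kept before it cover
    less than `coverage` of all rows.
    """
    counts = Counter(values)
    total = len(values)
    keep = set()
    covered = 0
    # most_common() yields levels from most to least frequent.
    for level, n in counts.most_common():
        if n < min_count or covered / total >= coverage:
            break
        keep.add(level)
        covered += n
    # Every level that was not kept is mapped to the placeholder.
    return [v if v in keep else other for v in values]


colors = ["red"] * 5 + ["blue"] * 3 + ["green"] + ["teal"]
clipped = clip_levels(colors, min_count=2, coverage=0.9)
print(Counter(clipped))  # Counter({'red': 5, 'blue': 3, 'OTHER': 2})
```

With a dataframe column, the same counting step could be done with pandas `Series.value_counts()`; the list-based version above just keeps the two thresholds easy to follow.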

Utility Functions:


@UTSAVS26 kindly review this PR. I request that it be given level 3.


👋 Thank you for opening this pull request! We're excited to review your contribution. Please give us a moment, and we'll get back to you shortly!

Feel free to join our community on Discord to discuss more!

github-actions bot requested a review from UTSAVS26 on November 10, 2024 00:11
UTSAVS26 merged commit b07a77e into UTSAVS26:main on Nov 10, 2024
2 of 6 checks passed
Successfully merging this pull request may close these issues.

Enhance: Python scrubs code snippets