INDE 577 - Data Science and Machine Learning

This repository is a thorough summary of Rice University's "Data Science and Machine Learning" graduate-level course in Industrial Engineering.

What is Machine Learning?

Machine learning is the development of computer programs or systems that learn patterns from observed data, use them to make predictions about new data, and adapt as more data is observed. It is especially useful for tasks that would otherwise require extensive manual tuning or long sets of hand-written rules. In practice, machine learning can solve intricate problems such as:

Facilitating interaction: Powering chatbots with natural language understanding, and enabling image recognition and speech recognition so users can interact with apps through voice commands.
Detecting anomalies: Identifying instances of fraud, spam, hate speech, and even tumors.
Enhancing user experience: Improving recommendations, search outcomes, notifications, and advertising personalization.
Optimizing content delivery: Anticipating future actions to preload content, minimizing delays, visually summarizing complex data, and projecting future earnings.
Generating insights: Analyzing data, evaluating feature importance, summarizing documents, and more.

Course Description:

This graduate course is a comprehensive, holistic introduction to data science and machine learning, focusing on essential algorithms, data science methodologies, and the complete data processing lifecycle. Topics include:

Data Science Practices

Python Programming
Jupyter Notebooks
Visual Studio Code
Version Control with Git and GitHub
Data Visualization
Model Building, Validation, and Error Analysis
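The model building, validation, and error analysis workflow listed above can be sketched with scikit-learn. This is a minimal illustration, not code from the course notebooks. The synthetic data from make_classification stands in for a real dataset, and the choices of LogisticRegression, an 80/20 split, and 5 folds are illustrative assumptions.

```python
# Sketch: holdout validation, k-fold cross-validation, and simple error analysis.
# Assumption: synthetic data from make_classification stands in for a real dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out 20% of the rows as a test set the model never sees during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation on the training set estimates generalization
# performance without touching the test set.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Fit on the full training set, then evaluate once on the held-out test set.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

# Error analysis: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
print(cm)
```

Cross-validation is used here for model selection and tuning. The test set is scored only once, so the reported test accuracy remains an honest estimate of performance on new data.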

Supervised Learning
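Supervised learning means learning a mapping from labeled input/output pairs. A minimal NumPy sketch fits an ordinary least-squares linear regression. The synthetic data and "true" coefficients are assumptions chosen for the example, and this is not code taken from the repository's notebooks.

```python
# Sketch: supervised learning as fitting a model to labeled (x, y) pairs.
# Assumption: ordinary least-squares linear regression on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic labeled data: y = 3 + 2*x1 - 1*x2 + small Gaussian noise.
n = 200
X = rng.normal(size=(n, 2))
true_w = np.array([2.0, -1.0])
true_b = 3.0
y = true_b + X @ true_w + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the intercept is learned as an ordinary weight.
X_design = np.column_stack([np.ones(n), X])

# Solve min_w ||X_design @ w - y||^2 with NumPy's least-squares solver.
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b_hat, w_hat = coef[0], coef[1:]

# Use the learned parameters to predict labels for new, unseen inputs.
X_new = np.array([[0.0, 0.0], [1.0, 1.0]])
y_new = b_hat + X_new @ w_hat

print("intercept:", round(b_hat, 2), "weights:", np.round(w_hat, 2))
print("predictions:", np.round(y_new, 2))
```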

Unsupervised Learning
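Unsupervised learning works without labels. A minimal NumPy sketch of k-means clustering (Lloyd's algorithm) groups two synthetic, unlabeled blobs. The function name `kmeans`, the farthest-point initialization, and the data are illustrative assumptions, not the repository's implementation.

```python
# Sketch: unsupervised learning with k-means clustering (Lloyd's algorithm).
# Assumption: two synthetic Gaussian blobs; the algorithm never sees any labels.
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Cluster the rows of X into k groups; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization (a simplified, deterministic relative of
    # k-means++): start from one random row, then repeatedly add the row that
    # is farthest from every centroid chosen so far.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dists = np.linalg.norm(X[:, None, :] - np.array(centroids)[None, :, :], axis=2)
        centroids.append(X[dists.min(axis=1).argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points,
        # keeping the old position if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Two synthetic, unlabeled blobs centered near (0, 0) and (5, 5).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
])
centroids, labels = kmeans(X, k=2)
print(np.round(centroids, 2))
print(np.bincount(labels))
```

In practice, scikit-learn's `KMeans` class would normally be used. The from-scratch version only makes the assign/update loop explicit.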

This repository contains notes on each of the ML topics, as well as Jupyter Notebook files that explore datasets and apply each topic. Data science tools used in this repository include Python (version 3.6 or later), pandas, Matplotlib, seaborn, NumPy, and scikit-learn. The course also introduced students to powerful AI assistant tools such as ChatGPT, GPT-4, and GitHub Copilot.

Dataset Used

The notebooks in this repository use this Spotify dataset from Kaggle (https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset). I love music, and Spotify is a company I'd like to work for in the future, so this dataset felt like a natural choice for the project!
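A dataset like this is typically loaded and lightly cleaned with pandas before analysis. The following is a minimal sketch, not the repository's `clean_data.py`. The column names (`track_id`, `track_name`, `popularity`, `danceability`, `energy`, `track_genre`) are assumed to match the Kaggle dataset, and the rows are made up so the example runs without the CSV file.

```python
# Sketch: loading and lightly cleaning a Spotify tracks table with pandas.
# Assumptions: column names follow the Kaggle dataset; the rows below are made up.
import io

import pandas as pd

# In practice this would be pd.read_csv("spotify_data.csv"); a tiny inline
# CSV keeps the example self-contained.
raw_csv = """track_id,track_name,popularity,danceability,energy,track_genre
a1,Song A,73,0.68,0.46,acoustic
a1,Song A,73,0.68,0.46,pop
b2,Song B,55,0.42,0.17,acoustic
c3,,0,0.44,0.36,acoustic
d4,Song D,88,0.74,0.80,pop
"""
df = pd.read_csv(io.StringIO(raw_csv))

# Drop rows missing a track name, then keep one row per track_id
# (in this made-up sample, track a1 is listed under two genres).
clean = (
    df.dropna(subset=["track_name"])
    .drop_duplicates(subset="track_id", keep="first")
    .reset_index(drop=True)
)

# Quick summaries of the kind used in exploratory analysis.
genre_popularity = clean.groupby("track_genre")["popularity"].mean()
print(clean.shape)
print(genre_popularity)
print(clean[["danceability", "energy"]].corr())
```

Whether to deduplicate by `track_id` depends on the analysis. For genre-level questions, keeping every (track, genre) row may be the better choice.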

Repository Files

supervised-learning
unsupervised-learning
LICENSE
README.md
clean_data.py
spotify_data.csv
GitHub repository: gbd2/INDE-577