gbd2/INDE-577

INDE 577 - Data Science and Machine Learning

This repository is a thorough summary of Rice University's "Data Science and Machine Learning" graduate-level course in Industrial Engineering.

What is Machine Learning?

Machine learning is the development of computer programs or systems that learn patterns from observed data, use them to make predictions about new data, and adapt as more data is observed. It is especially useful for simplifying tasks that would otherwise require extensive manual tuning or long lists of hand-written rules. In practice, machine learning can solve intricate problems such as:

  • Facilitating interaction: Powering chatbots with natural language understanding, and enabling image and speech recognition so users can interact with apps through pictures or voice commands.
  • Detecting anomalies: Identifying instances of fraud, spam, hate speech, and even tumors.
  • Enhancing user experience: Improving recommendations, search outcomes, notifications, and advertising personalization.
  • Optimizing content delivery: Anticipating future actions to preload content and minimize delays, visually summarizing complex data, and projecting future earnings.
  • Generating insights: Analyzing data, evaluating feature importance, summarizing documents, and more.
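
The core idea of learning from observed data and then predicting on data the model has never seen can be sketched with scikit-learn. This is a minimal illustration under assumptions: the data are synthetic (generated with `make_classification`) rather than taken from this repository, and logistic regression is just one example of a model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data stands in for real observations (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)

# Split into "observed" data the model learns from and "new" data it never sees.
X_seen, X_new, y_seen, y_new = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression()
model.fit(X_seen, y_seen)  # learn a decision rule from observed data

predictions = model.predict(X_new)  # apply the learned rule to unseen data
accuracy = np.mean(predictions == y_new)
print(f"Accuracy on unseen data: {accuracy:.2f}")
```

Instead of hand-writing rules, the program infers them from labeled examples; the held-out split checks whether those rules generalize.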

Course Description

This graduate course is a comprehensive, holistic introduction to data science and machine learning, focusing on essential algorithms, data science methodologies, and the complete data processing lifecycle. Topics include:

Data Science Practices

  • Python Programming
  • Jupyter Notebooks
  • Visual Studio Code
  • Version Control with Git and GitHub
  • Data Visualization
  • Model Building, Validation, and Error Analysis
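
The "Model Building, Validation, and Error Analysis" workflow can be sketched in a few lines of scikit-learn. This is an illustrative example, not code from the course: it assumes scikit-learn's bundled breast cancer dataset and a scaled logistic regression pipeline, chosen only because they run without any downloads.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set that is never used during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Model building: scaler + classifier in one pipeline, so the scaler is
# fit only on the training folds during cross-validation (no leakage).
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Validation: 5-fold cross-validation on the training data only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Error analysis: refit on all training data, then inspect test-set mistakes.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
print(cm)
print(classification_report(y_test, y_pred))
```

Keeping the test set out of cross-validation means the final confusion matrix is an honest estimate of how the model errs on new data.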

Supervised Learning

Unsupervised Learning

This repository contains notes on each ML topic, along with Jupyter Notebooks that explore datasets and apply each technique. The data science tools used include Python (version 3.6 and above), pandas, matplotlib, seaborn, NumPy, and scikit-learn. The course also introduced students to AI assistant tools such as ChatGPT, GPT-4, and GitHub Copilot.

Dataset Used

The notebooks in this repository use [this Spotify dataset](https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset). I love music, and Spotify is a company I'd like to work for in the future, so this dataset was a natural choice for the project!
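
As a starting point, the dataset can be loaded and lightly cleaned with pandas. This is a sketch under stated assumptions: the local file name `dataset.csv`, the column names (`track_id`, `popularity`, `danceability`, `energy`, `valence`, `tempo`), and the cleaning steps are assumptions about the Kaggle download rather than details taken from this repository's notebooks, and `load_tracks` is a hypothetical helper.

```python
from pathlib import Path

import pandas as pd

# Assumed local file name after downloading from Kaggle; adjust as needed.
CSV_PATH = Path("dataset.csv")

# Assumed subset of numeric audio features present in the file.
AUDIO_FEATURES = ["popularity", "danceability", "energy", "valence", "tempo"]


def load_tracks(path):
    """Hypothetical helper: read the tracks CSV and apply minimal cleaning."""
    df = pd.read_csv(path)
    # Drop any saved row index, which pandas names "Unnamed: 0", etc.
    df = df.drop(columns=[c for c in df.columns if c.startswith("Unnamed")])
    # Drop rows with missing values, then keep one row per track_id.
    df = df.dropna().drop_duplicates(subset="track_id")
    return df.reset_index(drop=True)


if CSV_PATH.exists():
    tracks = load_tracks(CSV_PATH)
    print(tracks.shape)
    print(tracks[AUDIO_FEATURES].describe())
```

Deduplicating on `track_id` assumes the same track can appear in several rows (for example, under different genres); whether that is the right choice depends on the analysis.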