This repository is a thorough summary of Rice University's "Data Science and Machine Learning" graduate-level course in Industrial Engineering.
Machine learning is the development of computer programs and systems that learn from observed data so they can make predictions about new data and adapt as more data arrives. It is especially useful for tasks that would otherwise require extensive manual tuning or long lists of hand-written rules. In practice, machine learning can solve intricate problems such as:
- Facilitating interaction: Giving chatbots natural language understanding, and enabling image recognition and speech recognition so users can control apps with voice commands.
- Detecting anomalies: Identifying instances of fraud, spam, hate speech, and even tumors.
- Enhancing user experience: Improving recommendations, search results, notifications, and advertising personalization.
- Optimizing content delivery: Anticipating user actions to preload content and minimize delays, visually summarizing complex data, and forecasting future earnings.
- Generating insights: Analyzing data, evaluating feature importance, summarizing documents, and more.
The course is a comprehensive journey through data science and machine learning. It serves as a holistic graduate-level introduction, focusing on essential algorithms, data science methodologies, and the complete data processing lifecycle. Topics include:
- Python Programming
- Jupyter Notebooks
- Visual Studio Code
- Version Control with Git and GitHub
- Data Visualization
- Model Building, Validation, and Error Analysis
- The Perceptron
- Gradient Descent
- Linear Regression
- Logistic Regression
- Neural Networks
- k-Nearest Neighbors
- Decision Trees
- Ensemble Learning
This repository contains notes on each of the ML topics, as well as Jupyter Notebook files that explore datasets and apply each topic. Data science tools used in this repository include Python (version 3.6 and above), pandas, matplotlib, seaborn, NumPy, and scikit-learn. The course also introduced students to powerful AI assistants such as ChatGPT, GPT-4, and GitHub Copilot.
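As a rough illustration of how these tools fit together for model building, validation, and error analysis, here is a minimal sketch using NumPy and scikit-learn. It is not taken from the course notebooks: the data is synthetic (a hypothetical stand-in for a real dataset), and logistic regression is chosen only because it is one of the topics listed above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: 200 samples with 2 features;
# the label is 1 when the two features sum to a positive number.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Validation: hold out 25% of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Model building: fit a logistic regression classifier on the training split only.
model = LogisticRegression()
model.fit(X_train, y_train)

# Error analysis: compare predictions against the held-out labels.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Keeping the test split untouched until the final evaluation is what makes the reported accuracy an estimate of performance on new data rather than a measure of how well the model memorized its training set.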
The notebooks in this repository use [this Spotify dataset](https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset). I love music, and Spotify is somewhere I'd like to work in the future, so it only seemed natural to me to use this dataset for the project!