Course materials for a weekly Python and Data Science class in Hong Kong, run in partnership with Venturenix Lab since 2018
Instructors: Anthony Lo and Gawain Chin
Part 1 | Part 2 |
---|---|
Lesson 1: Python Basics | Lesson 7: Introduction to Data Science |
Lesson 2: Functions and Your First Application | Lesson 8: Data Manipulation and Visualization |
Lesson 3: Intensive Code Training | Lesson 9: Black-Box Machine Learning |
Lesson 4: Data Structure and Complexity | Lesson 10: Linear Models and Gradient Descent |
Lesson 5: Web Scraping and OOP | Lesson 11: Logistic Regression and SVM |
Lesson 6: Data Manipulation | Lesson 12: Model Evaluation and Regularization |
Lesson 13: Ensemble Learning and Tree-Based Models | |
Lesson 14: Kaggle competition | |
Lesson 15: Clustering and Dimensionality Reduction | |
Lesson 16: Recommender System | |
Lesson 17: Natural Language Processing | |
Lesson 18: TBD | |
- Setting up the Python environment
- Data Types: Integers, Floats, Booleans, Strings
- Variable assignments
- Type Conversion
- Operators: Arithmetic, Comparison, Logical, Bitwise
- Control Flows: If-elif-else, While Loop, For loop
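The following small sketch uses each Lesson 1 topic once: type conversion, arithmetic, comparison, logical and bitwise operators, if-elif-else, and both loop types. The function `describe_number` is a hypothetical name chosen for illustration.

```python
def describe_number(text):
    """Convert a string to an int and describe it using basic operators."""
    n = int(text)                 # type conversion: str -> int
    half = n / 2                  # arithmetic: `/` always returns a float
    is_even = n % 2 == 0          # comparison produces a bool
    low_bit = n & 1               # bitwise AND: 1 for odd numbers, 0 for even

    if n < 0:                     # if-elif-else
        sign = "negative"
    elif n == 0:
        sign = "zero"
    else:
        sign = "positive"

    digit_sum = 0                 # for loop over the digits of |n|
    for ch in str(abs(n)):
        digit_sum += int(ch)

    steps, m = 0, abs(n)          # while loop: halve until 0 (counts the bits of |n|)
    while m > 0:
        m //= 2
        steps += 1

    return {
        "sign": sign,
        "even": is_even and low_bit == 0,   # logical AND of two equivalent checks
        "half": half,
        "digit_sum": digit_sum,
        "bits": steps,
    }

print(describe_number("12"))
# {'sign': 'positive', 'even': True, 'half': 6.0, 'digit_sum': 3, 'bits': 4}
```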
- Set up your Python environment if you had any setup issues during the lesson
- Join the class Slack
- Complete the L1 Homework before the next lesson
- Python2 vs Python3: Difference between Python2 and Python3
- Python Cheat Sheet: Quick Python Syntax look up
- Functions: Input arguments, function return
- Local Variable vs Global Variable
- Classwork: Write a game (Refer to class notes)
- Game 2: Tic-Tac-Toe
- Python random module: For homework 1
- Python Numpy Module: For homework 2
- Numpy cheatsheet: Quick Numpy lookup
- Introduction to GitHub
- Review Python Basics and functions
- Review Game 2 Homework
- Complete L3 Homework
- Create your own GitHub account and explore the open source world
- ⭐ Star this Data Science repo to get the latest materials!
- Introduction to GitHub: A beginner's guide to GitHub
- Data structures (List, Set, Dictionary, Tuple)
- Mutable vs Immutable
- Understanding time complexity and space complexity
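The sketch below illustrates mutability, aliasing, dictionary lookup, and the difference in membership-test cost between a list (O(n) scan) and a set (O(1) on average). The exact timings depend on the machine. Only the relative gap matters.

```python
import timeit

# Mutable vs immutable: a list can change in place, a tuple cannot.
nums_list = [1, 2, 3]
nums_list.append(4)               # OK: lists are mutable
nums_tuple = (1, 2, 3)
try:
    nums_tuple[0] = 99            # tuples are immutable, so this raises TypeError
except TypeError as err:
    print("TypeError:", err)

# Aliasing: two names bound to the same mutable object see each other's changes.
a = [1, 2]
b = a
b.append(3)
print(a)                          # [1, 2, 3]

# Dictionaries map hashable keys to values; .get() supplies a default for missing keys.
ages = {"alice": 30}
ages["bob"] = 25
print(ages.get("carol", "unknown"))   # unknown

# Time complexity: `in` scans a list in O(n) but is O(1) on average for a set.
data_list = list(range(100_000))
data_set = set(data_list)
t_list = timeit.timeit(lambda: 99_999 in data_list, number=200)
t_set = timeit.timeit(lambda: 99_999 in data_set, number=200)
print(f"list: {t_list:.4f}s  set: {t_set:.4f}s")
```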
- Complete L4 Homework
- Official documentation on data structures
- Cheat sheet on the time complexity of various operations
- Web Scraping overview
- Python web scraping tools: Requests and Beautiful Soup
- Classwork: Hands-on crawling exercise
- Web scraping homework
- Requests documentation
- Beautiful Soup documentation
- Web Scraping II
- Introduction to Python Class Objects
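Here is a minimal sketch of Python classes covering instance attributes, a class attribute, methods, `__repr__`, and inheritance. The `BankAccount` and `SavingsAccount` classes are hypothetical examples, not part of the course code.

```python
class BankAccount:
    """State lives in attributes; behaviour lives in methods."""

    interest_rate = 0.01  # class attribute shared by all instances

    def __init__(self, owner, balance=0.0):
        self.owner = owner        # instance attributes
        self.balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount
        return self.balance

    def __repr__(self):
        return f"{type(self).__name__}(owner={self.owner!r}, balance={self.balance})"


class SavingsAccount(BankAccount):
    """Inheritance: reuse BankAccount, override one attribute, add one method."""

    interest_rate = 0.03

    def add_interest(self):
        self.balance += self.balance * self.interest_rate
        return self.balance


acct = SavingsAccount("Ann", 100.0)
acct.deposit(100.0)
acct.add_interest()
print(acct)   # SavingsAccount(owner='Ann', balance=206.0)
```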
- Pandas Basics with Case Study
- Flight Delay Dataset: Create your own tables with Pandas
- What is Data Science?
- Essential Skills of a Data Scientist
- Foundation of Probability
- Permutation vs Combination
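The following sketch contrasts permutations (order matters, n!/(n-k)!) with combinations (order does not matter, n!/(k!(n-k)!)) and then applies combinations to a simple probability. It assumes Python 3.8+ for `math.perm` and `math.comb`.

```python
import math
from itertools import combinations, permutations

items = ["A", "B", "C", "D"]
k = 2

# Permutations: ordered selections
print(math.perm(4, 2))                      # 12
print(len(list(permutations(items, k))))    # 12

# Combinations: unordered selections
print(math.comb(4, 2))                      # 6
print(len(list(combinations(items, k))))    # 6

# Probability that two cards drawn from a 52-card deck are both aces:
# (ways to choose 2 of the 4 aces) / (ways to choose any 2 cards)
p_two_aces = math.comb(4, 2) / math.comb(52, 2)
print(round(p_two_aces, 4))                 # 0.0045
```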
- Probabilities and Statistics Refresher From Stanford
- Case Study: Titanic Dataset
- Understand Machine Learning Workflow
- First EDA Training
- Visualization: Matplotlib, Seaborn
- One-hot encoding of categorical variables
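One-hot encoding turns one categorical column into several 0/1 indicator columns, one per category. This dependency-free sketch uses a hypothetical `one_hot` helper on sample values shaped like the Titanic `Embarked` column. In class, `pd.get_dummies` does the same job on a DataFrame.

```python
def one_hot(values):
    """Return (categories, rows) where each row is a 0/1 indicator list."""
    categories = sorted(set(values))                 # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1                            # exactly one 1 per row
        rows.append(row)
    return categories, rows

cats, encoded = one_hot(["S", "C", "S", "Q"])
print(cats)     # ['C', 'Q', 'S']
print(encoded)  # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```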
- Matplotlib Official Guide for Visualization
- Seaborn Official Guide: Another great package to create beautiful charts
- Pandas get_dummies function
- Your First Machine Learning Experience
- EDA on Advertising Dataset
- Understand the X and Y Relationship
- Train your first linear regression model with scikit-learn
- Scikit-learn: User Guide for machine learning
- Build Linear Regression from Scratch
- Learn the theory behind gradient descent
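The sketch below builds linear regression from scratch and fits it by batch gradient descent on mean squared error. It assumes NumPy is installed. The data are synthetic (true slope 3, intercept 2, plus small noise), and the learning rate and epoch count are illustrative choices.

```python
import numpy as np

def fit_linear_gd(X, y, lr=0.1, epochs=2000):
    """Fit y ≈ X @ w + b by batch gradient descent on MSE = mean((y_hat - y)^2)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        error = X @ w + b - y                 # residuals of the current model
        grad_w = (2 / n) * X.T @ error        # dMSE/dw
        grad_b = (2 / n) * error.sum()        # dMSE/db
        w -= lr * grad_w                      # step against the gradient
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.05, size=100)
w, b = fit_linear_gd(X, y)
print(w, b)  # both close to the true values 3.0 and 2.0
```

If the learning rate is too large, the updates overshoot and diverge. If it is too small, many more epochs are needed to converge.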
- Learn the concept behind the logistic regression and its cost function
- Understand different types of classification models and how they differ from linear regression
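Logistic regression differs from linear regression in two ways. It passes the linear score through a sigmoid to get a probability, and it minimizes the log loss (binary cross-entropy) instead of squared error. This is a minimal NumPy sketch on a hypothetical toy dataset. The learning rate and epoch count are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p, eps=1e-12):
    """Binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p))."""
    p = np.clip(p, eps, 1 - eps)              # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logistic_gd(X, y, lr=0.5, epochs=3000):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        # The log-loss gradient has the same form as linear regression's: X^T (p - y) / n
        w -= lr * X.T @ (p - y) / n
        b -= lr * np.mean(p - y)
    return w, b

# Toy 1-D data: label 1 when x > 0
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = fit_logistic_gd(X, y)
probs = sigmoid(X @ w + b)
pred = (probs >= 0.5).astype(int)             # threshold probabilities into classes
print(pred, log_loss(y, probs))
```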
- Tutorial on linear regression and logistic regression
- Loss function
- Logistic regression: please read p16-p19.
- Model Evaluation Techniques: Training Set, Validation Set, Test Set
- Understand the concept: Overfitting and Underfitting
- Classification Metrics: Accuracy, Confusion Matrix, F1 Score, True Positive, False Positive, True Negative, False Negative
- Regularization Concepts: Ridge (L2) and Lasso (L1)
- Introduction to Tree-Based Models: Decision Trees
- Tree Construction Concepts: Edges, Nodes, and Splitting
- Ensemble Learning: Bagging and Boosting
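A decision tree chooses each split by measuring how much it reduces node impurity. This sketch computes entropy, Gini impurity and information gain for one hypothetical candidate split. scikit-learn's `DecisionTreeClassifier` selects between these two impurity measures with `criterion="gini"` or `criterion="entropy"`.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 - sum(p^2) over class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]           # a candidate split of the parent node
right = ["yes"] + ["no"] * 4
print(entropy(parent))                               # 1.0
print(gini(parent))                                  # 0.5
print(round(information_gain(parent, left, right), 3))   # 0.278
```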
- Review tree-based models and submit an entry to one Kaggle competition using a tree-based method
- Workshop lesson to work on Kaggle competition together
- Understand the end-to-end machine learning flow and apply it to the Kaggle competition
- Use different algorithms to explore and compare model performance
- NA
- NA
- Unsupervised Learning Concepts
- Clustering Algorithm (e.g. K-Means)
- Dimensionality reduction (e.g. PCA)
- Case Study: Eigenface
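Below is a minimal NumPy sketch of K-Means (Lloyd's algorithm) and of PCA computed through SVD, run on two synthetic, well-separated blobs. The initialization here simply picks random data points as starting centroids. scikit-learn's `KMeans` defaults to the smarter k-means++ initialization. The function names `kmeans` and `pca` are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random points as initial centroids
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):   # converged
            break
        centroids = new_centroids
    return labels, centroids

def pca(X, n_components):
    """Project mean-centred data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal([0, 0], 0.3, size=(50, 2)),   # blob around (0, 0)
    rng.normal([5, 5], 0.3, size=(50, 2)),   # blob around (5, 5)
])
labels, centroids = kmeans(X, k=2)
print(np.round(centroids, 1))
print(pca(X, 1).shape)   # (100, 1)
```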
- Self implementation of K-Means clustering
- Elbow for K-Means: link1, link2
- Example using K-means for customer segmentation
- A very comprehensive study material on SVD/PCA.
- Scikit-learn official documentation on the different types of clustering.
- An introduction to the DBSCAN algorithm and its Implementation in Python on KDnuggets.
- For more details on DBSCAN, read the original paper.
- t-SNE: paper, introduction, sklearn, visual
- Hierarchical clustering: Introduction
- Understand Recommender Systems
- Content-Based vs Collaborative Filtering
- Create a movie profile from genres: each column is a 0/1 indicator for one genre
- Use NumPy to calculate the similarity matrix (m x m)
- Normalize each row by its norm (A/||A||)
- Obtain the similarity matrix as M dot M^T
- Write a function that takes a movie ID and returns the top K most similar movies, with filters for min score, max score, min rating, min total ratings, and time range
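The homework steps above can be sketched in NumPy: build a 0/1 genre matrix, L2-normalize each row, and take `M_norm @ M_norm.T` to get an m x m cosine-similarity matrix. The movie IDs, genres and the `top_k_similar` helper are hypothetical. Only a minimum-score filter is shown. The rating and time-range filters need rating data that this toy example does not have.

```python
import numpy as np

# Hypothetical movie-by-genre indicator matrix: rows = movies, columns = genres
genres = ["Action", "Comedy", "Drama", "Romance"]
movie_ids = [101, 102, 103, 104]
M = np.array([
    [1, 0, 1, 0],   # 101: Action, Drama
    [1, 0, 1, 1],   # 102: Action, Drama, Romance
    [0, 1, 0, 1],   # 103: Comedy, Romance
    [0, 1, 0, 0],   # 104: Comedy
], dtype=float)

# Normalize each row by its L2 norm so the dot product becomes cosine similarity
norms = np.linalg.norm(M, axis=1, keepdims=True)
M_norm = M / np.where(norms == 0, 1, norms)   # guard against movies with no genre

S = M_norm @ M_norm.T   # (m x m): S[i, j] = cosine similarity of movies i and j

def top_k_similar(movie_id, k=2, min_score=0.1):
    """Return up to k (movie_id, score) pairs most similar to movie_id, excluding itself."""
    i = movie_ids.index(movie_id)
    results = []
    for j in np.argsort(-S[i]):                 # most similar first
        if j == i or S[i, j] < min_score:       # skip the movie itself and weak matches
            continue
        results.append((movie_ids[j], round(float(S[i, j]), 3)))
        if len(results) == k:
            break
    return results

print(top_k_similar(103))   # [(104, 0.707), (102, 0.408)]
```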
- Factorization Machines: the FM paper by Rendle
- xlearn package for FM and FFM: link
- (Optional) Wide & Deep Learning: the deep learning RecSys architecture by Google
- Introduction to NLP
- Tokenization, TF-IDF, Word Embedding
- NLP Package Overview: NLTK
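This dependency-free sketch shows tokenization and TF-IDF, where a term scores high when it is frequent in one document but rare across the corpus. It uses the plain formula tf = count / document length and idf = ln(N / df). The regex tokenizer is a simple stand-in for NLTK's tokenizers. scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2 normalization by default, so its numbers will differ.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep runs of letters only."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
for row in tf_idf(docs):
    print(sorted(row.items(), key=lambda kv: -kv[1])[:2])   # two highest-scoring terms
```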
- NLP exercise and Recsys exercise
- NLTK: official website
- Word2vec visualization: link
- Jieba (Chinese NLP): github
- Santander Customer Transaction Prediction
- Microsoft Malware Prediction
- TalkingData Mobile User Demographics
- (Exercise) Titanic: Machine Learning from Disaster
- (Exercise) Digit Recognizer
- Python Official Tutorial: A comprehensive tutorial from the official Python documentation.
- Google's Python Class: Materials prepared by Google, including hours of lecture videos and exercises.
- W3resource: A series of useful exercises (with solutions) for Python Beginners.
- Python2 vs Python3: Difference between Python2 and Python3
- Object-Oriented Programming: Introduction to OOP in Python
- Time complexity: Understand the Big-O cost of standard Python operations
- Pandas Tutorial: 10 minutes to pandas
- Numpy Tutorial: Official Quickstart tutorial
- Python Environment: Environment setup guide, e.g. environment variables
- (Optional) PEP 8: Style Guide for Python
- Matplotlib: Basic introduction
- Seaborn Tutorial: An easy-to-use data visualization tool built on top of Matplotlib
- Requests documentation
- Beautiful Soup documentation
- Selenium: https://selenium-python.readthedocs.io
- Scrapy
- (Optional) CS229 notes of loss functions
- Scikit-learn documentation: User Guide and Tutorials
- Tutorial on Linear Regression: This blog describes the basics of linear regression.
- Tutorial on Logistic Regression: This blog describes the basics of logistic regression.
- CS229 notes on logistic regression: read p16-p19
- Tutorial on Support Vector Machine: Read this for SVM basics
- Lecture by Professor Patrick Winston, including the math
- Information Gain
- Decision Tree Intro From ESL Chapter 9
- Unsupervised Learning Overview
- Unsupervised Intro From ESL Chapter 14
- Elbow for K-Means: link1, link2
- Example using K-means for customer segmentation
- A very comprehensive study material on SVD/PCA.
- xLearn: the library we used in class for RecSys
- Factorization Machines: the FM paper by Rendle
- (Optional) Wide & Deep Learning: the deep learning RecSys architecture by Google
- NLTK: official website
- Word2vec visualization: link
- Word2vec Overview
- Jieba (Chinese NLP): github