Python and Data Science Source by Team-Ant, partnered with Venturenix Lab

shinhchung/data_science_course

 
 


Python and Data Science Course Repository

Course materials for a weekly Python and data science class in Hong Kong, run in partnership with Venturenix Lab since 2018.

Instructors: Anthony Lo and Gawain Chin
| Part 1 | Part 2 |
| --- | --- |
| Lesson 1: Python Basics | Lesson 7: Introduction to Data Science |
| Lesson 2: Functions and Your First Application | Lesson 8: Data Manipulation and Visualization |
| Lesson 3: Intensive Code Training | Lesson 9: Black box machine learning |
| Lesson 4: Data Structure and Complexity | Lesson 10: Linear models and gradient descent |
| Lesson 5: Web Scraping and OOP | Lesson 11: Logistic Regression and SVM |
| Lesson 6: Data Manipulation | Lesson 12: Model Evaluation and Regularization |
| | Lesson 13: Ensemble Learning and Tree based Models |
| | Lesson 14: Kaggle competition |
| | Lesson 15: Clustering and Dimensionality Reduction |
| | Lesson 16: Recommender System |
| | Lesson 17: Natural Language Processing |
| | Lesson 18: TBD |

Lesson 1: Python Basics

  • Setting up Python Environment
  • Data Types: Integers, Floats, Booleans, Strings
  • Variable assignments
  • Type Conversion
  • Operators: Arithmetic, Comparison, Logical, Bitwise
  • Control Flow: if-elif-else, while loop, for loop
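
The topics above can be sketched in a few lines. This is only an illustrative example, not taken from the class notes; all variable names and values are made up for the illustration.

```python
# Type conversion between the basic data types
count = int("42")        # str -> int
price = float("3.5")     # str -> float
label = str(count)       # int -> str
truthy = bool("")        # an empty string converts to False

# For loop with if-elif-else: label each number by its sign
labels = []
for n in (-1, 0, 5):
    if n < 0:
        labels.append("negative")
    elif n == 0:
        labels.append("zero")
    else:
        labels.append("positive")

# While loop with arithmetic and comparison operators: sum 1..10
total = 0
i = 1
while i <= 10:
    total += i
    i += 1

print(count, price, label, truthy, labels, total)
```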
Homework:
  • Finish setting up your Python environment if you had any setup issues during the lesson
  • Join the class Slack
  • Complete the L1 Homework before the next lesson
Resources

Lesson 2: Functions and Your First Application

  • Functions: input arguments and return values
  • Local Variable vs Global Variable
  • Classwork: Write a game (Refer to class notes)
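
As a rough illustration of function arguments, return values, and the local/global distinction, consider the sketch below. The function names (`add_tax`, `increment`, `shadow`) are hypothetical and are not part of the class game.

```python
counter = 0  # module-level (global) variable


def add_tax(price, rate=0.1):
    """Take input arguments and return a value; `total` is local to the function."""
    total = price * (1 + rate)
    return total


def increment():
    """Rebind the global `counter`; this requires the `global` keyword."""
    global counter
    counter += 1
    return counter


def shadow():
    """Assigning without `global` creates a new local variable instead."""
    counter = 100  # local name; the global `counter` is untouched
    return counter
```

Calling `shadow()` leaves the module-level `counter` unchanged, while each call to `increment()` raises it by one.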
Homework:
Resources

Lesson 3: Intensive Code Training

  • Introduction to GitHub
  • Review Python Basics and functions
  • Review Game 2 Homework
Homework:
  • Complete L3 Homework
  • Create your own GitHub account and explore the open source world
  • ⭐ Star this Data Science repo to get the latest materials!
Resources

Lesson 4: Data Structure and Complexity

  • Data structures (List, Set, Dictionary, Tuple)
  • Mutable vs Immutable
  • Understanding time complexity and space complexity
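
A small sketch of the ideas above, under the assumption that a duplicate check is a reasonable example of a time/space trade-off (the example itself is not from the course notes): a list is mutable and shared through aliases, a tuple rejects item assignment, and the same task can cost O(n^2) time or O(n) time depending on the data structure used.

```python
# Mutable: a list can change in place, and every alias sees the change
a = [1, 2, 3]
b = a
b.append(4)  # `a` and `b` are the same object

# Immutable: a tuple does not support item assignment
t = (1, 2, 3)
try:
    t[0] = 99
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True


def has_duplicates_quadratic(items):
    """Compare every pair: O(n^2) time, O(1) extra space."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Track seen items in a set: O(n) average time, O(n) extra space.

    Items must be hashable for this version to work.
    """
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```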
Homework:
Resources

Lesson 5: Web Scraping and OOP

  • Web Scraping overview
  • Python web scraping tools: requests and Beautiful Soup
  • Classwork: Hands-on crawling exercise
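
The core scraping step, pulling structured data out of HTML, can be sketched as follows. Note the substitution: the lesson uses requests and Beautiful Soup, whereas this sketch uses the standard library's `html.parser` on an inline HTML string so that it runs without installing anything or touching the network. The `LinkCollector` class and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Made-up page content standing in for a downloaded response body
html = """
<ul>
  <li><a href="/lesson1">Python Basics</a></li>
  <li><a href="/lesson2">Functions</a></li>
</ul>
"""

parser = LinkCollector()
parser.feed(html)
links = parser.links
print(links)
```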
Homework:
  • Web scraping homework
Resources

Lesson 6: Data Manipulation

  • Web Scraping II
  • Introduction to Python Class Objects
  • Pandas Basics with Case study
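
For the class-object topic above, a minimal sketch might look like this. The `Flight` class, its attributes, and the sample values are hypothetical and chosen only to echo the flight-delay theme of the homework; they are not the course's own code or data.

```python
class Flight:
    """A hypothetical flight record with scheduled and actual departure times."""

    def __init__(self, code, scheduled, actual):
        self.code = code
        self.scheduled = scheduled  # minutes after midnight
        self.actual = actual        # minutes after midnight

    def delay(self):
        """Return the delay in minutes; early departures count as zero."""
        return max(0, self.actual - self.scheduled)

    def __repr__(self):
        return f"Flight({self.code!r}, delay={self.delay()} min)"


late = Flight("AB123", scheduled=600, actual=625)
early = Flight("CD456", scheduled=700, actual=690)
print(late, early)
```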
Homework:
  • Flight Delay Dataset: Create your own tables with Pandas
Resources

Lesson 7: Introduction to Data Science

  • What is Data Science?
  • Essential Skills of a Data Scientist
  • Foundations of Probability
  • Permutation vs Combination
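
To make the permutation/combination distinction concrete, here is a short sketch using the standard library (`math.perm` and `math.comb` require Python 3.8 or later). The values of `n` and `k` and the sample items are arbitrary.

```python
import math
from itertools import combinations, permutations

n, k = 5, 3
items = "ABCDE"

# Permutations: order matters, n! / (n - k)!
n_perm = math.perm(n, k)
# Combinations: order does not matter, n! / (k! * (n - k)!)
n_comb = math.comb(n, k)

# Enumerating the arrangements gives the same counts
all_perms = list(permutations(items, k))
all_combs = list(combinations(items, k))

# Each combination can be ordered in k! ways
assert n_perm == n_comb * math.factorial(k)

print(n_perm, n_comb)  # 60 10
```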
Homework:
Resources

Lesson 8: Data Manipulation and Visualization

  • Case Study: Titanic Dataset
  • Understand Machine Learning Workflow
  • First EDA Training
  • Visualization: Matplotlib, Seaborn
Homework:
  • One-hot encoding of categorical variables
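
One-hot encoding turns a categorical column into one 0/1 column per category. The sketch below does this in plain Python to show the mechanics; the `one_hot` helper and the sample values are hypothetical. In practice `pandas.get_dummies` performs the same transformation on a DataFrame.

```python
def one_hot(values):
    """Return (categories, rows) with one 0/1 indicator per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if value == cat else 0 for cat in categories] for value in values]
    return categories, rows


# Illustrative values only, not read from the actual dataset
embarked = ["S", "C", "S", "Q"]
categories, encoded = one_hot(embarked)

print(categories)  # ['C', 'Q', 'S']
for row in encoded:
    print(row)
```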
Resources

Lesson 9: Black box machine learning

  • Your First machine learning experience
  • EDA on Advertising Dataset
  • Understand the X and Y Relationship
Homework:
  • Train your first linear regression model with scikit-learn
Resources

Lesson 10: Linear models and gradient descent

  • Build Linear Regression from Scratch
  • Learn the theory behind gradient descent
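
A minimal sketch of linear regression trained by gradient descent, assuming a single feature, mean squared error as the loss, and a fixed learning rate. The function name `fit_line`, the hyperparameters, and the toy data are illustrative choices, not the course's implementation.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every sample under the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 with no noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```

A learning rate that is too large makes the updates diverge, so `lr` usually needs tuning to the scale of the features.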
Resources

Lesson 11: Logistic Regression and SVM

  • Learn the concepts behind logistic regression and its cost function
  • Understand the different types of classification models and how they differ from linear regression
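
As a sketch of the pieces named above, the sigmoid maps a linear score to a probability, and the usual cost function for logistic regression is binary cross-entropy (log loss). The function names and the clipping constant `eps` are assumptions made for this illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to (0, 1), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0]
confident = log_loss(labels, [0.9, 0.1])
hesitant = log_loss(labels, [0.6, 0.4])
print(confident < hesitant)  # True
```

Unlike linear regression, whose output is unbounded, the sigmoid output can be read as a class probability and thresholded (commonly at 0.5) to produce a label.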
Resources

Lesson 12: Model Evaluation and Regularization

  • Model Evaluation Techniques: Training Set, Validation Set, Test Set
  • Understand the concept: Overfitting and Underfitting
  • Classification Metrics: Accuracy, Confusion Matrix, F1 Score, True Positive, False Positive, True Negative, False Negative
  • Regularization Concepts: Ridge (L2) and Lasso (L1)
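
The classification metrics listed above all derive from the four confusion-matrix counts. The sketch below assumes binary labels encoded as 0 and 1; the helper names and the sample predictions are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(confusion_counts(y_true, y_pred))  # (3, 1, 3, 1)
print(classification_metrics(y_true, y_pred))
```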
Homework:
Resources

Lesson 13: Ensemble Learning and Tree based Models

  • Introduction to Tree-Based Models: Decision Tree
  • Tree Construction Concepts: Edges and Nodes, Splitting
  • Ensemble Learning: Bagging and Boosting
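
To illustrate the splitting concept, the sketch below scores candidate thresholds on one numeric feature by weighted Gini impurity and picks the lowest. Gini impurity is assumed here as the split criterion (entropy is another common choice); the helper names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


def best_threshold(values, labels):
    """Try each distinct value as a threshold and keep the purest split."""
    candidates = sorted(set(values))
    return min(candidates, key=lambda t: split_impurity(values, labels, t))


# Toy data: one feature and a binary target
ages = [5, 12, 30, 45, 60]
target = [1, 1, 0, 0, 0]
print(best_threshold(ages, target))  # 12
```

A decision tree repeats this search over every feature at each node, then recurses into the two resulting subsets.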
Homework:
  • Revise the tree-based models and make one Kaggle competition submission using a tree-based method
Resources

Lesson 14: Kaggle competition

  • Workshop lesson to work on a Kaggle competition together
  • Understand the end-to-end machine learning flow and apply it to the Kaggle competition
  • Use different algorithms to explore and review model performance
Homework:
  • NA
Resources
  • NA

Lesson 15: Clustering and Dimensionality Reduction

  • Unsupervised Learning Concepts
  • Clustering Algorithm (e.g. K-Means)
  • Dimensionality reduction (e.g. PCA)
  • Case Study: Eigenface
Homework:
  • Implement K-Means clustering yourself
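
As a starting point for the homework, here is one possible sketch of K-Means (Lloyd's algorithm) in plain Python. It makes a simplifying assumption that is worth replacing in a real solution: the first `k` points are used as initial centroids instead of a random or k-means++ initialization. Function names and the sample points are illustrative.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100):
    """Alternate assignment and update steps until assignments stop changing."""
    centroids = [tuple(p) for p in points[:k]]  # simplification: first k points
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


points = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (11, 10)]
centroids, assignments = kmeans(points, k=2)
print(assignments)  # [0, 0, 1, 1, 0, 1]
```

Because the result depends on the initial centroids, practical implementations usually run several random initializations and keep the best one.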
Resources

Lesson 16: Recommender System

  • Understand Recommender Systems
  • Content-Based vs Collaborative Filtering
Homework:
  • Create a movie profile by genre: each column is a 0/1 indicator for one genre
  • Use numpy to calculate the similarity matrix (m x m)
    • Normalize each row by its norm (A/|A|, etc.)
    • Obtain the similarity matrix as M dot Mt (M times its transpose)
  • Write a function that takes a movie id and returns the top K most similar movies, with the following parameters:
    • min score, max score, min rating, min total rating, time range
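
The similarity steps above can be sketched with numpy as follows. The genre list, movie ids, and profile matrix are hypothetical, and only the score filters are shown: the rating and time-range filters depend on movie metadata that is not described here, so they are left out.

```python
import numpy as np

# Hypothetical data: 4 movies described by 3 genre indicator columns
genres = ["Action", "Comedy", "Drama"]
movie_ids = [101, 102, 103, 104]
M = np.array(
    [
        [1, 0, 0],
        [1, 1, 0],
        [0, 0, 1],
        [1, 0, 0],
    ],
    dtype=float,
)

# Normalize each row by its Euclidean norm (A / |A|)
norms = np.linalg.norm(M, axis=1, keepdims=True)
norms[norms == 0] = 1.0  # avoid dividing by zero for movies with no genre
M_norm = M / norms

# Cosine similarity between every pair of movies: an m x m matrix
S = M_norm @ M_norm.T


def top_k_similar(movie_id, k=2, min_score=0.0, max_score=1.0):
    """Return up to k (movie_id, score) pairs, most similar first."""
    idx = movie_ids.index(movie_id)
    scores = S[idx].copy()
    scores[idx] = -np.inf  # never recommend the movie itself
    order = np.argsort(-scores, kind="stable")
    hits = [
        (movie_ids[j], float(scores[j]))
        for j in order
        if min_score <= scores[j] <= max_score + 1e-9  # tolerate rounding
    ]
    return hits[:k]


print(top_k_similar(101))
```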
Resources

Lesson 17: Natural Language Processing

  • Introduction to NLP
  • Tokenization, Tf-idf, Word Embedding
  • NLP Package Overview: NLTK
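
Tokenization and TF-IDF can be sketched without any NLP package, as below. This uses a simple regex tokenizer and the unsmoothed definition idf = log(N / df); libraries such as NLTK offer more careful tokenizers, and other toolkits often use smoothed idf variants, so numbers will differ between implementations. The function names and sample documents are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return weights


docs = ["the cat sat", "the dog sat down", "the cat ran"]
weights = tf_idf(docs)
print(weights[1])
```

A term that occurs in every document, such as "the" here, gets a weight of zero, which is the intended effect: it carries no information for telling the documents apart.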
Homework:
  • NLP exercise and Recsys exercise
Resources

Lesson 18: TBD

Homework:
Resources

Past Kaggle by students

Python Resources


Cheat sheet


Data Visualization


Web Scraping


Basic Linear Algebra, Statistics and Calculus


Loss function

Supervised Learning

Linear Regression
  • Tutorial on Linear Regression: This blog describes the basics of linear regression.
Logistic Regression
  • Tutorial on Logistic Regression: This blog describes the basics of logistic regression.
  • CS229 notes on Logistic Regression (read pp. 16-19)
SVM
Decision Tree
Ensemble Learning
kNN

Unsupervised Learning

  • Unsupervised Learning Overview

K-Means
Dimensionality Reduction
  • Comprehensive study material on SVD/PCA.

Recommender System


Natural Language Processing


Reinforcement Learning


Deep Learning
