
Machine Learning Mentorship

This repository contains all the material and code you need to get started with the mentorship programme. A few administrative points:

  1. The length of the mentorship is around 5 weeks.

  2. We assume you have some prior knowledge of programming.

  3. For help with the course, you can contact your mentor. A better option is to open an issue on this repository, so that others can see your question and the mentors don't have to answer the same question more than once.

  4. All your code will be pushed to GitHub, so if you haven't already, create a GitHub account. Fork and clone this repository and create your own folder (refer to the sample folder with my name).

  5. Create a README.md in your folder where you can keep track of your progress over the course of the mentorship. The mentors will use this README.md as a progress tracker (refer to the sample README.md provided).

Don't be afraid to ask any questions (however irrelevant you think they may be). The mentors are here to help you every step of the way.

Prerequisites

  1. Language: We'll be using Python 3 throughout this course, so familiarise yourself with the language and learn how to install packages using pip.

  2. Libraries (Installation):

    a. NumPy: Used for matrix computations.

    b. Pandas: Used for data analysis.

    c. Matplotlib: Used for data visualization.

  3. Tools:

    a. Jupyter Notebook

    b. git: You'll be using GitHub for all your code/assignment submission, so learn the basics of git: pull, push, add, commit.
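Once the libraries above are installed (for example with `pip install numpy pandas matplotlib`), a short script like the one below can confirm that each one imports correctly and illustrates what it is used for. This is only a sanity-check sketch: the small arrays are arbitrary illustration values, and the `Agg` backend is chosen so the script also runs on machines without a display (remove that line inside Jupyter if you want plots shown inline).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Print the installed version of each library.
for lib in (np, pd, matplotlib):
    print(lib.__name__, lib.__version__)

# NumPy: matrix computations, e.g. a 2x2 matrix product.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
product = a @ b
print(product)  # values: [[19, 22], [43, 50]]

# Pandas: data analysis, e.g. the mean of a column.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8]})
print(df["y"].mean())  # 5.0

# Matplotlib: visualization, saved to a PNG file instead of shown on screen.
fig, ax = plt.subplots()
ax.plot(df["x"], df["y"])
fig.savefig("check_plot.png")
plt.close(fig)
```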

Resources

Since everyone prefers a different approach to learning, we'll try our best to accommodate each style. Every topic has multiple levels of resources:

  1. Articles/Blogs: These give a detailed explanation of each topic along with the relevant mathematics.

  2. Code: If you prefer to learn by reading code, we'll link practical implementations of the topic (wherever appropriate).

  3. Lectures: We'll link free online YouTube lectures (wherever appropriate).

We recommend using either the Lectures or the Articles to get a solid grasp of the concepts, and using the Code as a reference during the assignments. Please note that we don't tolerate any plagiarism.

At the end of each week you will be given a set of tasks to complete. These could be either reports or coding assignments. All submissions will happen via GitHub.

Detailed Breakdown

WEEK 1

Basics of Python:

a. Python Fundamentals

b. Variables

c. Data Types

d. Operators

e. Conditions and Loops

f. Python Functions

g. Python Data Structures
  1. Code
  2. Article/Tutorial
  3. Lecture
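As a quick preview of topics a to g, the short script below touches variables, data types, operators, conditions, loops, functions and the built-in data structures in one place. The names and the grading thresholds are made up purely for illustration.

```python
# Variables and data types
name = "mentee"          # str
weeks = 5                # int
progress = 0.0           # float
is_enrolled = True       # bool

# Operators: arithmetic, comparison and logical
days = weeks * 7
is_long = days > 30 and is_enrolled

# Data structures: list, tuple, dict, set
topics = ["numpy", "pandas", "matplotlib"]
point = (3, 4)
scores = {"task1": 8, "task2": 9}
unique_tags = {"python", "ml", "python"}  # duplicates are dropped in a set

# Functions with conditions
def grade(score):
    """Return a letter grade for a score out of 10 (hypothetical thresholds)."""
    if score >= 9:
        return "A"
    elif score >= 7:
        return "B"
    else:
        return "C"

# Loops: build a new dict from an existing one
grades = {}
for task, score in scores.items():
    grades[task] = grade(score)

print(days, is_long)       # 35 True
print(len(unique_tags))    # 2
print(grades)              # {'task1': 'B', 'task2': 'A'}
```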

Python for Machine Learning

  1. Lecture

Tasks:

Submit your Python code in the Task 1 folder.

1.1 Python If-Else

1.2 Word Order

1.3 Time Delta

1.4 Matrix Script (Optional)


WEEK 2

NumPy

  1. Article and Code

  2. Lecture

Pandas

  1. Article and Code

  2. Lecture

Matplotlib

  1. Tutorial

  2. Article and Code

  3. Article/Blog

Tasks:

Submit your code in the Task 2 folder.

2.1 Download this dataset and perform the following tasks:

  1. Load the data (both train and test)

  2. Print the shape and display the data (using .head())

  3. Check if there are missing values in the data and replace them with "NaN"
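The three steps of 2.1 can be sketched with pandas as below. This assumes the dataset comes as CSV files (the paths `train.csv` and `test.csv` are placeholders, not the real file names) and that missing entries may be marked with a placeholder string such as `"?"`; that marker is an assumption, so check how your dataset actually encodes missing values. Note that `pd.read_csv` already parses empty fields as `NaN`.

```python
import numpy as np
import pandas as pd

def load_and_inspect(path, missing_marker="?"):
    """Load a CSV, show its shape and first rows, and normalise missing values to NaN.

    `missing_marker` is a hypothetical placeholder; adjust it to whatever the
    dataset uses for missing entries.
    """
    # 1. Load the data; empty fields become NaN automatically.
    df = pd.read_csv(path)

    # 2. Print the shape and display the first five rows.
    print(df.shape)
    print(df.head())

    # 3. Replace the placeholder marker with NaN, then count missing values per column.
    df = df.replace(missing_marker, np.nan)
    print(df.isnull().sum())
    return df

# Placeholder paths: replace them with the files you downloaded.
# train = load_and_inspect("train.csv")
# test = load_and_inspect("test.csv")
```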

2.2 Use this dataset and draw a line plot similar to the one shown here
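A minimal Matplotlib sketch of a line plot drawn from a DataFrame. Since the reference figure isn't reproduced here, the column names passed as `x_col` and `y_col`, the styling and the output filename are all placeholders; adapt them to match the target plot.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file; remove this line in Jupyter to show plots inline
import matplotlib.pyplot as plt
import pandas as pd

def line_plot(df, x_col, y_col, title="", out_path=None):
    """Draw y_col against x_col as a line plot and return the Axes."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(df[x_col], df[y_col], marker="o", linestyle="-", label=y_col)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.set_title(title)
    ax.legend()
    ax.grid(True)
    if out_path is not None:
        fig.savefig(out_path, bbox_inches="tight")
    return ax

# Hypothetical usage: "year" and "value" stand in for the real column names.
# df = pd.read_csv("data.csv")
# line_plot(df, "year", "value", title="Value over time", out_path="line_plot.png")
```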


WEEK 3

  1. Introduction to Machine Learning

  2. Introduction to Supervised and Unsupervised Learning

  3. Linear Regression

  4. Multivariate Regression

  5. Lecture (Recommended)
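Both simple and multivariate linear regression fit predictions y_hat = X·w + b by minimising the mean squared error between y_hat and the true targets; the only difference is how many feature columns X has. The sketch below, assuming only NumPy, fits the weights with batch gradient descent and cross-checks them against the closed-form least-squares solution from `np.linalg.lstsq`. The learning rate, iteration count and synthetic data (generated from y = 3·x1 - 2·x2 + 1 plus small noise) are illustrative choices.

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.1, n_iters=5000):
    """Fit y ≈ X @ w + b by batch gradient descent on the mean squared error.

    X has shape (n_samples, n_features): one column gives simple linear
    regression, several columns give multivariate regression.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        error = X @ w + b - y                       # residuals, shape (n_samples,)
        grad_w = (2 / n_samples) * (X.T @ error)    # d(MSE)/dw
        grad_b = (2 / n_samples) * error.sum()      # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    return X @ w + b

# Synthetic data: y = 3*x1 - 2*x2 + 1 plus small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + rng.normal(0, 0.05, size=200)

w, b = fit_linear_regression(X, y)

# Closed-form check: append a column of ones for the intercept and solve least squares.
X_aug = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(w, b)   # close to [3, -2] and 1
print(coef)   # should agree with (w, b) to several decimals
```

Gradient descent with a single learning rate converges well here because both features are on the same scale; on real data such as the Air Quality task, standardising the features first usually helps.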

Tasks:

Submit your code in the Task 3 folder.

Air Quality Prediction


WEEK 4

  1. Introduction to Logistic Regression

  2. Lecture

  3. Linear Regression vs Logistic Regression

  4. K-Nearest Neighbours
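Logistic regression passes the same linear combination used in linear regression through a sigmoid, so the output is a probability between 0 and 1 instead of an unbounded value, and it is trained by minimising the log (cross-entropy) loss rather than the squared error. K-Nearest Neighbours, by contrast, learns no parameters: it labels a point by majority vote among its k closest training points. The sketch below implements both with NumPy only; the learning rate, iteration count, k=3 and the toy clusters are illustrative assumptions, not settings for the Diabetes task.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, n_iters=2000):
    """Binary logistic regression (labels 0/1) trained by gradient descent on the log loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)              # predicted P(y = 1 | x)
        grad_w = X.T @ (p - y) / n_samples  # gradient of the mean log loss
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_logistic(X, w, b, threshold=0.5):
    return (sigmoid(X @ w + b) >= threshold).astype(int)

def predict_knn(X_train, y_train, X_query, k=3):
    """Label each query point by majority vote among its k nearest training points."""
    preds = []
    for x in X_query:
        dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
        nearest = np.argsort(dists)[:k]               # indices of the k closest points
        votes = np.bincount(y_train[nearest], minlength=2)
        preds.append(int(np.argmax(votes)))           # ties go to the smaller label
    return np.array(preds)

# Usage on two well-separated toy clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)), rng.normal(2.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b = fit_logistic_regression(X, y)
X_new = np.array([[-2.0, -2.0], [2.0, 2.0]])
print(predict_logistic(X_new, w, b))   # [0 1]
print(predict_knn(X, y, X_new, k=3))   # [0 1]
```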

Tasks:

Submit your code in the Task 4 folder.

Diabetes Prediction

4.1 Using Logistic Regression

4.2 Using KNN


WEEK 5

Introduction to K-Means Clustering

  1. Article/Blog

  2. Article (A bit long but quite useful)
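K-Means partitions the data into k clusters by repeating two steps: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The NumPy sketch below implements this standard (Lloyd's) version with random initialisation. It omits the k-means++ initialisation and multiple restarts that library implementations typically add, so on harder data it can settle in a poor local optimum. The seed, iteration cap and toy blobs are illustrative.

```python
import numpy as np

def assign_clusters(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # shape (n, k)
    return np.argmin(dists, axis=1)

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and update steps until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    # Initialise with k different rows of X chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        labels = assign_clusters(X, centroids)
        # Update step: each centroid moves to the mean of its points (kept if its cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return centroids, assign_clusters(X, centroids)

# Usage on two toy blobs centred at (-5, -5) and (5, 5).
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal([-5, -5], 0.5, size=(50, 2)),
    rng.normal([5, 5], 0.5, size=(50, 2)),
])
centroids, labels = kmeans(X, k=2)
print(centroids)  # roughly (-5, -5) and (5, 5), in either order
```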

Support Vector Machine

  1. Article

  2. Code

  3. Lecture
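A linear Support Vector Machine looks for the hyperplane w·x + b = 0 that separates two classes with the widest margin. One simple way to train it, and the one sketched below, is subgradient descent on the regularised hinge loss lam/2·||w||² + mean(max(0, 1 - y·(w·x + b))), with labels encoded as -1/+1. This is a from-scratch illustration only: libraries such as scikit-learn solve the same problem with dedicated solvers and also support kernels, which this sketch does not. The regularisation strength `lam`, learning rate, iteration count and toy clusters are arbitrary choices.

```python
import numpy as np

def fit_linear_svm(X, y, lam=0.01, lr=0.1, n_iters=1000):
    """Linear soft-margin SVM trained by full-batch subgradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))).
    Labels in y must be -1 or +1.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        margins = y * (X @ w + b)
        active = margins < 1  # points inside the margin or misclassified
        # Only active points contribute to the hinge-loss subgradient.
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n_samples
        grad_b = -y[active].sum() / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_svm(X, w, b):
    """Return +1 or -1 depending on which side of the hyperplane each point lies."""
    return np.where(X @ w + b >= 0, 1, -1)

# Usage on two toy clusters labelled -1 and +1.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w, b = fit_linear_svm(X, y)
print(predict_svm(np.array([[-2.0, -2.0], [2.0, 2.0]]), w, b))
```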

Final Task:

Submit your code in the Task 5 folder.

Goal: Using data collected from the 1994 U.S. Census, build a model that accurately predicts whether an individual earns more than $50,000.

Dataset: UCI Machine Learning Repository

Dataset Description: This dataset consists of approximately 32,000 data points, each with 13 features. It is a modified version of the dataset published in the paper “Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid” by Ron Kohavi.

Features

  • age : Age

  • workclass : Working Class (Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked)

  • education_level : Level of Education (Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool)

  • education-num : Number of educational years completed

  • marital-status : Marital status (Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse)

  • occupation : Work Occupation (Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces)

  • relationship : Relationship Status (Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried)

  • race : Race (White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black)

  • sex : Sex (Female, Male)

  • capital-gain : Monetary Capital Gains

  • capital-loss : Monetary Capital Losses

  • hours-per-week : Average Hours Per Week Worked

  • native-country : Native Country (United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands)

Target Variable

  • income : Income Class (<=50K, >50K)
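Before training a model on this data, the categorical features listed above have to be turned into numbers and the `income` target into 0/1 labels. A minimal pandas sketch, assuming the data is loaded into a DataFrame whose column names match the feature list above (the file name `census.csv` is a placeholder): one-hot encode the categorical columns with `pd.get_dummies` and map `>50K` to 1 and `<=50K` to 0. Scaling the numeric columns and choosing a model are left to you.

```python
import pandas as pd

# Categorical columns, taken from the feature list above.
CATEGORICAL = [
    "workclass", "education_level", "marital-status", "occupation",
    "relationship", "race", "sex", "native-country",
]

def preprocess(df):
    """Split a census DataFrame into numeric features X and a 0/1 target y."""
    # Strip stray whitespace (e.g. " >50K") before comparing the target labels.
    y = (df["income"].str.strip() == ">50K").astype(int)
    # One-hot encode every categorical column; numeric columns pass through unchanged.
    X = pd.get_dummies(df.drop(columns=["income"]), columns=CATEGORICAL, dtype=int)
    return X, y

# df = pd.read_csv("census.csv")  # placeholder path
# X, y = preprocess(df)
```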

After the programme

Machine learning is a vast field, and since the mentorship programme was limited to four to five weeks, we covered only the basic algorithms. Some other important algorithms are listed below, and I've provided a basic introduction/implementation blog for each. Feel free to explore these topics further.

  • Decision Trees and Random Forest
    1. Introductory Blog
    2. Implementation
    3. In-depth
  • Perceptrons and Neural Networks
    1. Blog
    2. In-depth
  • Artificial Neural Network
    1. Blog
    2. Implementation
    3. In-depth
  • Convolutional Neural Network
    1. Blog 1
    2. Blog 2
  • Transfer Learning

  • Recurrent Neural Network

  • Generative Adversarial Networks

  • Deep Convolutional GANs (DCGANs)