Adult Data Set (UCI)

Problem Setting

A polling institute wants to estimate an individual’s income from their personal data (see einkommen.train). To this end, 30,000 individuals were interviewed about the features summarized below. For some individuals, not all features are available. Crucially, the income is known for only 5,000 of the interviewees.
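Since income is known for only part of the data, a natural first step is to separate labeled rows from unlabeled ones and count missing feature values. The sketch below does this on a tiny inline sample rather than on einkommen.train itself. The column names, the header row, the `?` marker for unknown values, and the income labels are all assumptions made for illustration and may not match the real file.

```python
import csv
import io

# Tiny inline stand-in for einkommen.train. The header, the column names,
# the "?" missing-value marker and the income labels are ASSUMPTIONS.
SAMPLE = """age,workclass,education,income
39,State-gov,Bachelors,<=50K
50,?,Bachelors,>50K
38,Private,HS-grad,?
53,Private,?,?
28,Private,Bachelors,?
"""
MISSING = "?"  # assumed marker for an unknown value

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Rows with a known income can be used for training and evaluation;
# the remaining rows are the ones whose income has to be predicted.
labeled = [r for r in rows if r["income"] != MISSING]
unlabeled = [r for r in rows if r["income"] == MISSING]

# Count missing entries per feature column (the target is excluded).
features = [c for c in rows[0] if c != "income"]
missing_per_feature = {c: sum(r[c] == MISSING for r in rows) for c in features}

print(len(labeled), "labeled /", len(unlabeled), "unlabeled")
print(missing_per_feature)
```

On the real data, the same split would be expected to yield roughly 5,000 labeled and 25,000 unlabeled rows, given the numbers stated above.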

Steps:

  • Data Integration
  • Feature Representation
  • EDA Pairplot
  • Correlation of Numeric Attributes
  • Missing Value Representation
  • Data Cleaning, convert categorical variables to numerical
  • Check missing values
  • Feature Selection
  • Model Selection and Evaluation
    • 'Logistic Regression'
    • 'Random Forest'
    • 'Neural Network'
    • 'GaussianNB'
    • 'DecisionTreeClassifier'
    • 'SVM'
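
The cleaning, encoding, and model comparison steps above can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic stand-in data, not the repository's actual code. The feature names (`age`, `hours`, `sector`), the hyperparameters, the median imputation, and the choice of 5-fold cross-validated accuracy as the metric are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT einkommen.train); all names are hypothetical.
rng = np.random.default_rng(0)
n = 300
age = rng.integers(18, 70, size=n).astype(float)
hours = rng.integers(10, 60, size=n).astype(float)
sector = rng.choice(["private", "public", "self"], size=n)
y = (age + hours + rng.normal(0, 10, size=n) > 80).astype(int)
age[rng.random(n) < 0.05] = np.nan  # inject some missing values

# Convert the categorical variable to numerical columns via one-hot encoding.
X_cat = OneHotEncoder().fit_transform(sector.reshape(-1, 1)).toarray()
X = np.hstack([np.column_stack([age, hours]), X_cat])

# Check missing values per column before modelling.
print("missing per column:", np.isnan(X).sum(axis=0))

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                    random_state=0),
    "GaussianNB": GaussianNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
}

# Imputation and scaling sit inside the pipeline so that they are
# re-fitted on the training part of every cross-validation fold.
scores = {}
for name, model in models.items():
    pipe = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), model)
    scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:24s} {acc:.3f}")
```

Keeping the preprocessing inside each pipeline avoids leaking statistics from the validation folds into training. With only part of the real data labeled, the same comparison would be run on the labeled rows only.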