Dataset Information
Link - http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
1) ID number
2-31) Ten real-valued features are computed for each cell nucleus:
- radius (mean of distances from the center to points on the perimeter)
- texture (standard deviation of gray-scale values)
- perimeter
- area
- smoothness (local variation in radius lengths)
- compactness (perimeter^2 / area - 1.0)
- concavity (severity of concave portions of the contour)
- concave points (number of concave portions of the contour)
- symmetry
- fractal dimension ("coastline approximation" - 1)
The mean, standard error and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features.
Target: Diagnosis (M = malignant, B = benign)
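The way ten per-nucleus measurements become 30 per-image features can be sketched in a few lines of Python. This is only an illustration: the column labels (`radius_mean`, `radius_se`, ...), the helper names `summarize` and `compactness`, and the sample radii are hypothetical, and the standard error is assumed to be the sample standard deviation divided by sqrt(n), which the dataset description does not spell out.

```python
import math
import statistics

BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
STATISTICS = ["mean", "se", "worst"]


def summarize(values):
    """Collapse one feature's per-nucleus values into (mean, se, worst)."""
    mean = statistics.fmean(values)
    # Assumption: standard error = sample standard deviation / sqrt(n).
    se = statistics.stdev(values) / math.sqrt(len(values))
    # "Worst" is the mean of the three largest values.
    worst = statistics.fmean(sorted(values)[-3:])
    return mean, se, worst


def compactness(perimeter, area):
    """Compactness as defined in the feature list: perimeter^2 / area - 1.0."""
    return perimeter ** 2 / area - 1.0


# Hypothetical column labels: one per (statistic, feature) pair, 3 x 10 = 30.
feature_columns = [
    f"{name}_{stat}" for stat in STATISTICS for name in BASE_FEATURES
]

# Made-up radii for the nuclei in one image, purely for illustration.
radii = [10.0, 12.0, 11.0, 15.0, 14.0]
radius_mean, radius_se, radius_worst = summarize(radii)

print(len(feature_columns))
print(radius_mean, radius_se, radius_worst)
```

As a sanity check on the compactness formula, a perfect circle of any radius gives 4*pi - 1, the smallest value the formula can take; more irregular outlines score higher.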
A RandomForestClassifier trained on PCA-transformed features gave better accuracy than a RandomForestClassifier trained on the original features. A likely reason is that Principal Component Analysis (PCA) keeps only the components that explain most of the variance. Many of the 30 features are strongly correlated (for example radius, perimeter and area), so the retained components carry most of the information with far less redundancy.
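The comparison can be reproduced roughly with scikit-learn, which ships a copy of this dataset as `load_breast_cancer`. The sketch below is not the original experiment: the 75/25 stratified split, the `StandardScaler` step, the 95% explained-variance threshold, the number of trees and the random seeds are all assumptions made for illustration. Which model comes out ahead can change with the split and the seed, so the gap is worth checking with cross-validation before drawing conclusions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# scikit-learn's bundled copy of the dataset: 30 features, no ID column.
X, y = load_breast_cancer(return_X_y=True)

# Assumed split: 75% train / 25% test, stratified on the diagnosis.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: random forest on the original 30 features.
baseline = RandomForestClassifier(n_estimators=200, random_state=0)
baseline.fit(X_train, y_train)

# PCA variant: standardize, keep enough components to explain 95% of the
# variance (assumed threshold), then fit the same kind of forest.
pca_model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
pca_model.fit(X_train, y_train)

baseline_accuracy = baseline.score(X_test, y_test)
pca_accuracy = pca_model.score(X_test, y_test)
n_components_kept = pca_model.named_steps["pca"].n_components_

print(f"Random forest on raw features: {baseline_accuracy:.3f}")
print(f"Random forest on {n_components_kept} PCA components: {pca_accuracy:.3f}")
```

Scaling before PCA matters here because the features live on very different scales (area is in the hundreds, smoothness is below 1); without it the first components would be dominated by the largest-valued features.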