Commit f981168 · Vehicle Insurance Claim Fraud Detection

1 parent: 2017fde
Showing 8 changed files with 30,733 additions and 0 deletions.
Changed files:

* Finacial Domain/Vehicle Insurance Claim Fraud Detection/Dataset/fraud_oracle.csv (+15,421 lines)
* Finacial Domain/Vehicle Insurance Claim Fraud Detection/Images/newplot (1).png (binary, +53.8 KB)
* Finacial Domain/Vehicle Insurance Claim Fraud Detection/Images/newplot (2).png (binary, +59.7 KB)
* Finacial Domain/Vehicle Insurance Claim Fraud Detection/Images/newplot (3).png (binary, +24.3 KB)
* Finacial Domain/Vehicle Insurance Claim Fraud Detection/Images/newplot.png (binary, +25.1 KB)
* ...Domain/Vehicle Insurance Claim Fraud Detection/Model/VehicleInsuranceCLASSIFICATION.ipynb (+9,544 lines)
* Finacial Domain/Vehicle Insurance Claim Fraud Detection/Model/VehicleInsuranceEDA.ipynb (+5,687 lines)
* Finacial Domain/Vehicle Insurance Claim Fraud Detection/README.md (+81 lines)
# Vehicle Insurance Claim Fraud Detection

## GOAL

Detect fraudulent vehicle insurance claims so that insurance firms can flag suspicious claims for proper re-verification.

## DATASET

https://www.kaggle.com/shivamb/vehicle-claim-fraud-detection
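As a first step the CSV is loaded and inspected. A minimal sketch of that load-and-inspect step is below; the real file lives at `Finacial Domain/Vehicle Insurance Claim Fraud Detection/Dataset/fraud_oracle.csv`, and here a tiny synthetic sample with the Kaggle dataset's target column (`FraudFound_P`) stands in for it so the snippet is self-contained.

```python
# Sketch of the initial load-and-inspect step. Replace the StringIO sample
# with pd.read_csv("Finacial Domain/Vehicle Insurance Claim Fraud Detection/"
# "Dataset/fraud_oracle.csv") to run on the real data.
import io
import pandas as pd

sample_csv = io.StringIO(
    "Month,Sex,PolicyType,Age,FraudFound_P\n"
    "Jan,Male,Sedan - Liability,31,0\n"
    "Feb,Female,Sedan - All Perils,27,1\n"
    "Mar,Male,Utility - Collision,45,0\n"
)
df = pd.read_csv(sample_csv)

print(df.shape)                           # rows x columns
print(df.isna().sum().sum())              # total missing values
print(df["FraudFound_P"].value_counts())  # class balance of the target
```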
## DESCRIPTION

Vehicle insurance fraud involves conspiring to make false or exaggerated claims involving property damage or personal injury following an accident. This project detects fraudulent claims, which helps insurance firms verify them properly. Starting with cleaning and EDA, I train several classifiers (including an SGD classifier) and finalize the one with the highest accuracy.
## WORK DONE

* Analyzed the data, searched for missing values, and separated categorical and continuous features.
* Removed unnecessary columns.
* Converted some column data to meaningful numerical values.
* Grouped and dealt with the many object-type columns.
* Built the relation between PolicyType and its two related columns.
* Recreated the BasePolicy column and compared it with PolicyType, which brought some insightful results.
* Assigned representative values to data that was given as ranges.
* Performed exploratory data analysis.
* Split the dataset into train and test data.
* Trained models with the following algorithms:
    * Support Vector Classifier
    * Naive Bayes Classifier
    * KNN Classifier
    * Decision Tree Classifier
    * Random Forest Classifier
    * XGBoost Classifier
    * Logistic Regression
* Evaluated the performance of the model with the highest accuracy.
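The encode-and-split steps above can be sketched as follows. This is a minimal illustration on a tiny synthetic frame (column names mirror the dataset; the values and the one-hot encoding choice are illustrative, not the notebook's exact preprocessing):

```python
# Sketch: convert object-type columns to numeric, then split train/test.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "Sex": ["Male", "Female", "Male", "Female"] * 5,
    "BasePolicy": ["Liability", "All Perils", "Collision", "Liability"] * 5,
    "Age": [31, 27, 45, 52] * 5,
    "FraudFound_P": [0, 1, 0, 0] * 5,
})

# One-hot encode the categorical columns; keep Age as-is.
X = pd.get_dummies(df.drop(columns="FraudFound_P"))
y = df["FraudFound_P"]

# Hold out 25% of the rows for testing, stratified on the target
# (useful here because fraud labels are imbalanced).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```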
## MODELS USED

1. **Support Vector Classifier** - A support vector machine represents the training data as points in space separated into categories by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall. It is effective in high-dimensional spaces and uses a subset of training points in the decision function, so it is also memory efficient.
2. **Naive Bayes Classifier** - The Naive Bayes algorithm is based on Bayes' theorem with the assumption of independence between every pair of features. This algorithm requires a small amount of training data to estimate the necessary parameters, and Naive Bayes classifiers are extremely fast compared to more sophisticated methods.
3. **K-nearest neighbours Classifier** - A simple algorithm to understand that can be used for classification. The class is computed from a simple majority vote of the k nearest neighbours of each point. It is simple to implement, robust to noisy training data, and effective when the training data is large.
4. **Decision Tree Classifier** - Given data of attributes together with their classes, a decision tree produces a sequence of rules that can be used to classify the data. Decision trees are simple to understand and visualise, require little data preparation, and can handle both numerical and categorical data.
5. **Random Forest Classifier** - A random forest fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. The sub-sample size is the same as the original input sample size, but the samples are drawn with replacement. This reduces over-fitting, and a random forest is more accurate than a single decision tree in most cases.
6. **XGBoost Classifier** - XGBoost is a popular gradient-boosting library that supports GPU training, distributed computing, and parallelization. It is precise, adapts well to many types of data and problems, has excellent documentation, and is overall very easy to use.
7. **Logistic Regression** - Logistic regression is a machine learning algorithm for classification. The probabilities describing the possible outcomes of a single trial are modelled using a logistic function. It is most useful for understanding the influence of several independent variables on a single outcome variable.
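Training the listed classifiers and comparing their test accuracies can be sketched with a simple fit-score loop. This uses a synthetic dataset rather than the project's features, and substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost so the sketch needs no extra install:

```python
# Sketch: fit each candidate model and rank them by test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "Support Vector": SVC(),
    "Naive Bayes": GaussianNB(),
    "K Nearest Neighbours": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print the leaderboard, best model first.
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} {acc:.4f}")
```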
## LIBRARIES NEEDED

* Numpy
* Pandas
* Matplotlib
* Seaborn
* Scikit-learn
* Plotly
## ACCURACIES

| **Model** | Accuracy |
| --- | --- |
| 1. Support Vector | 94.033722 % |
| 2. Naive Bayes | 90.920882 % |
| 3. K Nearest Neighbours | 94.033722 % |
| 4. Decision Tree | 89.775184 % |
| 5. Random Forest | 94.163424 % |
| 6. XGBoost | 93.623000 % |
| 7. Logistic Regression | 94.033722 % |
## CONCLUSION

Random Forest had the highest accuracy (94.16 %), narrowly ahead of Support Vector, K Nearest Neighbours, and Logistic Regression, which tied at 94.03 %. Through this project, I learned how to apply various classification algorithms. The dataset is fairly large, with more than 15,000 records and over 30 columns, which helped the models reach this level of accuracy.