By Dwann Wilson, Kanish Mohan, Kyle Williams, and Marcellus Smith
We've used a dataset that simulates user behavior metrics such as their demographics, where the ad was located, and the amount of times they clicked on an ad. We used ML models to determine what factors makes people more inclined to click on an ad.
-
Pandas
-
Scikit-Learn
- train_test_spilt
- OneHotEncoder
- OrdinalEncoder
- StandardScaler
- PCA
- KMeans
- LinearRegression
- RandomForestClassifier
- r2_score
- classification_report
-
imblearn
- SMOTE
In this project, we focus on predicting click-through rates (CTR) for advertisements using machine learning. Effective advertising relies on understanding audience preferences.
Our approach involves feature engineering, where we transform raw data by applying techniques like Principal Component Analysis (PCA) for dimensionality reduction and one-hot encoding for categorical features (such as Age and Location). Additionally, we create a new target variable called ‘Ad Encounter’ to indicate ad clicks.
By separating features from the target variable, we train machine learning models to predict ad success based on factors like Age, Income, and Ad characteristics.
Based on the results of the principal component analysis Age, and Income have a high loading, possibly implicating that they are significant variables in the dataset.
train test split was run on the data solely analyzing click through rate (CTR) on the full and select variables and then scaled the select data in order to perform linear regression on both sets of data. R2 values range from 0 to1 with a 1 indicating that the regression model predictions fit the data perfectly.
The r2 values of -0.005 and -0.002. This indicates that the linear regression model does not explain the variability of the data. Overall, the changes in the dependent variable CTR are not completely explained by the changes in the independent variables, demonstrating that a Regression Model cannot predict the click through rate of an ad.
According to a blog post from Neil Patel (n.d.), it takes at least 2 clicks to convert a user into a customer. We can try to use a classification model to predict if an ad is succesful at getting the user to click at least 2 times.
The code uses a slice of the DataFrame that only contains rows with the amount of clicks being from 0-3. After doing a train-test-split, the X data is scaled and encoded and the y data is the clicks column encoded as "Yes" for rows with 2-3 clicks and "No" for rows with 0-1 clicks. I used the Random Forest Classifier as my model, and I calculated the maximum depth that produces the most accurate results looping through max depths 1-15. The model trained with a max depth of 8 had an accuracy score of 85%, but had no accuracy for determining ads that weren't clicked at least two times. I made several attemps to resample the data, and I when I did it with the SMOTE sampler it produced an accuracy of 81%, but with a higher accuracy for "No" results.
A classification model can be used to predict if it will convert a user into a customer, but more data is needed to improve the model's ability to differentiate effective ads from non-effective ads.
This is an analysis of the data to make meaningful observations.
My Code is showing what Ad Topic, Gender, Location and Age that are most desirable using the clicks column as the Y value and splitting the X and Y data to find the most common trends.
We found that more men over the age of 30 years old who live in Rural , Suburban and Urban under both test data and train data found they are more intersted into the topics of Finance, Travel, and Fashion vs females.