Cryptocurrency analysis using unsupervised machine learning.

stephperillo/Cryptocurrencies

Cryptocurrencies

Overview

Accountability Accounting, a prominent investment bank, is interested in offering a new cryptocurrency investment portfolio for its customers. This project uses unsupervised learning models to group cryptocurrencies that are currently on the trading market to create a classification system for this new investment.

The crypto_data.csv dataset was sourced from CryptoCompare. The K-means clustering algorithm was used to group the cryptocurrencies.

Results

Deliverable 1: Preprocessing the Data for PCA (Principal Component Analysis)

I filtered the dataset to keep only the cryptocurrencies that are currently being traded. Then I removed any rows that contained at least one null value. The crypto_df DataFrame was further filtered to keep only rows where coins have been mined.
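The three filtering steps above can be sketched with pandas as follows. This is a minimal illustration on a made-up four-row table, not the project's actual code: the column names IsTrading, TotalCoinsMined, and TotalCoinSupply and the ticker-style index are assumptions about how crypto_data.csv is laid out.

```python
import pandas as pd

# Tiny stand-in for crypto_data.csv; column names and values are assumptions.
crypto_df = pd.DataFrame(
    {
        "CoinName": ["Alpha", "Beta", "Gamma", "Delta"],
        "Algorithm": ["SHA-256", "Scrypt", "X11", "Scrypt"],
        "IsTrading": [True, False, True, True],
        "ProofType": ["PoW", "PoW", "PoS", "PoW/PoS"],
        "TotalCoinsMined": [100.0, 50.0, None, 0.0],
        "TotalCoinSupply": [1000.0, 500.0, 200.0, 300.0],
    },
    index=["ALP", "BET", "GAM", "DEL"],
)

# 1. Keep only coins that are currently being traded.
crypto_df = crypto_df[crypto_df["IsTrading"]]
# 2. Remove rows that contain at least one null value.
crypto_df = crypto_df.dropna()
# 3. Keep only coins that have actually been mined.
crypto_df = crypto_df[crypto_df["TotalCoinsMined"] > 0]

print(crypto_df)
```

In this toy table only the first coin survives: one coin is not trading, one has a missing value, and one has no mined coins.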

I created a new DataFrame names_df that holds only the cryptocurrency names and used the crypto_df DataFrame index as its index:

names

Next, I removed the CoinName column from the crypto_df DataFrame, since it is not used by the clustering algorithm.

This is what crypto_df looks like at this point:

crypto_df

In the next preprocessing step, I used the get_dummies() method to create dummy (one-hot encoded) variables for the two text features, Algorithm and ProofType, and stored the resulting data in a new DataFrame named X.
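As a rough sketch of what this encoding does, the example below applies pandas' get_dummies() to a small made-up table. The sample values and the numeric column names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample rows; the real crypto_df has many more coins.
crypto_df = pd.DataFrame(
    {
        "Algorithm": ["SHA-256", "Scrypt", "Scrypt"],
        "ProofType": ["PoW", "PoW", "PoS"],
        "TotalCoinsMined": [100.0, 50.0, 25.0],
        "TotalCoinSupply": [1000.0, 500.0, 250.0],
    },
    index=["ALP", "BET", "GAM"],
)

# Each distinct text value becomes its own indicator column,
# e.g. Algorithm_Scrypt and ProofType_PoW; numeric columns pass through.
X = pd.get_dummies(crypto_df, columns=["Algorithm", "ProofType"])

print(X.columns.tolist())
```

Two numeric columns plus two Algorithm values and two ProofType values give six columns in total, which is why the feature count grows quickly on the full dataset.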

Finally, I used StandardScaler's fit_transform() method to standardize the features in X. StandardScaler standardizes features by removing the mean and scaling to unit variance.[1] The fit_transform() method first computes each feature's mean and standard deviation from X, then applies the resulting transformation to X.
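A small sketch of this standardization on an invented three-row array (the numbers are illustrative only): after scaling, every column should have a mean of roughly zero and a standard deviation of roughly one.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix with two columns on very different scales.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

# fit_transform() learns each column's mean and standard deviation,
# then subtracts the mean and divides by the standard deviation.
X_scaled = StandardScaler().fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```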

After preprocessing was completed, there were 532 cryptocurrencies to group.

Deliverable 2: Reducing Data Dimensions Using PCA (Principal Component Analysis)

In this deliverable, I applied the Principal Component Analysis algorithm to the preprocessed data from Deliverable 1. This reduced the number of dimensions to three principal components.
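The reduction to three components might look like the sketch below. It runs on random stand-in data rather than the real scaled features, and the column labels "PC 1" to "PC 3" and the index values are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Random stand-in for the scaled feature matrix (20 coins, 6 features).
rng = np.random.default_rng(0)
X_scaled = rng.normal(size=(20, 6))
index = [f"COIN{i}" for i in range(20)]  # hypothetical ticker index

# Project the data onto its first three principal components.
pca = PCA(n_components=3)
pcs = pca.fit_transform(X_scaled)

pcs_df = pd.DataFrame(pcs, columns=["PC 1", "PC 2", "PC 3"], index=index)

# Fraction of the original variance kept by the three components.
print(pca.explained_variance_ratio_.sum())
```

Checking explained_variance_ratio_ is a quick way to see how much information the three components retain.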

The result is the pcs_df DataFrame:

PCS

Deliverable 3: Clustering Cryptocurrencies Using K-means

Next I created an elbow curve using hvPlot to find the best value for K from pcs_df.

elbow_curve

The elbow of the curve, where the slope changes sharply, occurs at K = 4, which makes four the optimal number of clusters. If K is too high, the model is at risk of overfitting, which would give undue importance to patterns within this dataset that are not found in other, similar datasets.
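The values behind an elbow curve are the K-means inertia (within-cluster sum of squared distances) for each candidate K. The sketch below computes them on synthetic points drawn around four invented centers; it does not reproduce the hvPlot chart, and the range of K values tested is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 3-D points around four made-up centers (stand-in for pcs_df).
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0], [8, 8, 8], [-8, 8, 0], [8, -8, 0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 3)) for c in centers])

# Fit K-means for K = 1..10 and record the inertia of each fit.
k_values = list(range(1, 11))
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(points)
    inertia.append(model.inertia_)

for k, value in zip(k_values, inertia):
    print(k, round(value, 1))
```

On data like this, the inertia drops steeply up to K = 4 and then flattens, which is the bend an elbow plot makes visible.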

Then I ran the K-means algorithm to predict the clusters. This algorithm grouped the data around four centroids. Centroids denote the mean position of all the points in each cluster.
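To illustrate the point about centroids, the sketch below fits K-means with four clusters on synthetic data and compares one fitted centroid with the mean of the points assigned to it. The data and the random_state and n_init settings are assumptions for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 3-D points around four made-up centers (stand-in for pcs_df).
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0], [8, 8, 8], [-8, 8, 0], [8, -8, 0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 3)) for c in centers])

# Fit the model and get one cluster label per point.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
predictions = model.fit_predict(points)

# A centroid is the mean position of the points in its cluster.
mean_of_cluster_0 = points[predictions == 0].mean(axis=0)
print(model.cluster_centers_[0])
print(mean_of_cluster_0)
```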

I concatenated crypto_df and pcs_df to create clustered_df and added the predictions to it as a new column, Class.
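A minimal sketch of this step, assuming both DataFrames share the same index; the sample values, the "PC" column labels, and the hard-coded predictions are invented for illustration.

```python
import pandas as pd

index = ["ALP", "BET", "GAM"]  # hypothetical shared index

crypto_df = pd.DataFrame(
    {"Algorithm": ["SHA-256", "Scrypt", "X11"], "TotalCoinsMined": [100.0, 50.0, 25.0]},
    index=index,
)
pcs_df = pd.DataFrame(
    {"PC 1": [0.1, -0.3, 0.2], "PC 2": [1.0, 0.4, -0.7], "PC 3": [-0.5, 0.0, 0.5]},
    index=index,
)
predictions = [0, 2, 0]  # stand-in for the K-means output

# Join the two tables side by side; rows are aligned on the index.
clustered_df = pd.concat([crypto_df, pcs_df], axis=1)
# Store the predicted cluster for each coin.
clustered_df["Class"] = predictions

print(clustered_df.head(10))
```

Because pd.concat aligns on the index, the two frames need matching index labels for the rows to line up.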

The first 10 results of clustered_df:

clustered_df

Deliverable 4: Visualizing Cryptocurrencies Results

I created a 3D scatterplot using Plotly Express scatter_3d() showing the four clusters from clustered_df:

3d

If you hover over a point on the graph, a pop up displays the Coin Name, Algorithm, Total Coins Mined, and Total Coin Supply:

hover

I created a table showing the tradable cryptocurrencies using hvplot.table():

hvplot

To plot the data on a 2D scatterplot, I first scaled it with MinMaxScaler, which rescales each feature to fall between a given minimum and maximum value. In this case, I scaled the data to the range zero to one.
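The rescaling can be sketched as below on three invented rows. The column names TotalCoinsMined and TotalCoinSupply and the name plot_df are assumptions for the example.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up values standing in for two columns of clustered_df.
clustered_df = pd.DataFrame(
    {
        "TotalCoinsMined": [0.0, 50.0, 200.0],
        "TotalCoinSupply": [100.0, 1000.0, 400.0],
    },
    index=["ALP", "BET", "GAM"],
)
columns = ["TotalCoinsMined", "TotalCoinSupply"]

# Default feature_range is (0, 1): each column's minimum maps to 0
# and its maximum maps to 1.
scaled = MinMaxScaler().fit_transform(clustered_df[columns])
plot_df = pd.DataFrame(scaled, columns=columns, index=clustered_df.index)

print(plot_df)
```

Scaling both columns to the same range keeps a few very large coins from compressing every other point into one corner of the plot.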

scatterplot

This scatterplot shows the different classes and how they vary in the number of coins mined versus the total coin supply for each cryptocurrency. It is also interactive and displays the CoinName when you hover over a data point.

Summary

This analysis and its visualizations resulted in four different groups of cryptocurrencies based on this dataset. Accountability Accounting will be able to take a closer look at the differences in groups.

Footnotes

  1. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler
