Cryptocurrencies

Overview

Accountability Accounting, a prominent investment bank, is interested in offering a new cryptocurrency investment portfolio for its customers. This project uses unsupervised learning models to group cryptocurrencies that are currently on the trading market to create a classification system for this new investment.

The dataset, crypto_data.csv, was sourced from CryptoCompare. A K-means clustering algorithm was used to group the cryptocurrencies.

Results

Deliverable 1: Preprocessing the Data for PCA (Principal Component Analysis)

I processed the dataset to keep only the cryptocurrencies that are currently being traded. Then I removed any rows that contained at least one null value. Finally, I filtered the crypto_df DataFrame to keep only rows for coins that have been mined.
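A minimal sketch of these three filters on a made-up four-row DataFrame. The column names IsTrading and TotalCoinsMined, and all the sample values, are assumptions for illustration; the notebook's actual columns are not shown in this README.

```python
import pandas as pd

# Tiny stand-in for crypto_data.csv; column names and values are assumptions.
crypto_df = pd.DataFrame(
    {
        "CoinName": ["Alpha", "Beta", "Gamma", "Delta"],
        "Algorithm": ["SHA-256", "Scrypt", "X11", "Scrypt"],
        "IsTrading": [True, False, True, True],
        "ProofType": ["PoW", "PoW", "PoS", None],
        "TotalCoinsMined": [21.0, 5.0, 0.0, 100.0],
        "TotalCoinSupply": ["21", "84", "10", "100"],
    },
    index=["ALP", "BET", "GAM", "DEL"],
)

# 1. Keep only the coins that are currently being traded.
crypto_df = crypto_df[crypto_df["IsTrading"]]
# 2. Remove rows that contain at least one null value.
crypto_df = crypto_df.dropna()
# 3. Keep only the coins that have actually been mined.
crypto_df = crypto_df[crypto_df["TotalCoinsMined"] > 0]

print(crypto_df)
```

In this toy data only the first coin survives: the second is not trading, the third has no mined coins, and the fourth has a null ProofType.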

I created a new DataFrame names_df that holds only the cryptocurrency names and used the crypto_df DataFrame index as its index:

Next I removed the CoinName column from the crypto_df DataFrame, since it is not used by the clustering algorithm.
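The two steps above could look roughly like this. The sample rows are invented; only the names crypto_df, names_df, and CoinName come from this write-up.

```python
import pandas as pd

# Made-up sample rows standing in for the filtered crypto_df.
crypto_df = pd.DataFrame(
    {
        "CoinName": ["Alpha", "Beta"],
        "Algorithm": ["SHA-256", "Scrypt"],
        "ProofType": ["PoW", "PoS"],
    },
    index=["ALP", "BET"],
)

# Double brackets return a DataFrame (not a Series) that keeps crypto_df's index.
names_df = crypto_df[["CoinName"]].copy()

# Drop the name column so only clustering features remain.
crypto_df = crypto_df.drop(columns="CoinName")
```

Because names_df shares the index of crypto_df, the names can be joined back onto the clustered results later.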

This is what crypto_df looks like at this point:

In the next step in preprocessing, I used the get_dummies() method to create variables for the two text features, Algorithm and ProofType, and stored the resulting data in a new DataFrame named X.
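As a rough illustration of the encoding step, here is get_dummies() applied to three invented rows. The numeric column and its values are assumptions.

```python
import pandas as pd

# Invented rows; only the Algorithm and ProofType column names come from the text.
crypto_df = pd.DataFrame(
    {
        "Algorithm": ["SHA-256", "Scrypt", "Scrypt"],
        "ProofType": ["PoW", "PoW", "PoS"],
        "TotalCoinsMined": [21.0, 84.0, 10.0],
    },
    index=["ALP", "BET", "GAM"],
)

# Each distinct text value becomes its own indicator column;
# numeric columns pass through unchanged.
X = pd.get_dummies(crypto_df, columns=["Algorithm", "ProofType"])

print(X.columns.tolist())
```

Two algorithms and two proof types yield four indicator columns, plus the untouched numeric column.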

Finally, I used StandardScaler's fit_transform() method to standardize the features in X. StandardScaler standardizes features by removing the mean and scaling to unit variance.¹ The fit_transform() method first computes each feature's mean and standard deviation (the fit step) and then applies the scaling to the data (the transform step).
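To make the standardization concrete, this sketch compares fit_transform() with the per-column formula z = (x - mean) / std, using a small invented array in place of X.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented feature matrix standing in for X (3 samples, 2 features).
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

scaler = StandardScaler()
# fit: learn each column's mean and standard deviation; transform: apply them.
X_scaled = scaler.fit_transform(X)

# The same result by hand, using the population standard deviation (ddof=0).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, manual))
```

After scaling, every column has mean 0 and unit variance, so features with large raw values do not dominate the distance calculations used by PCA and K-means.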

After preprocessing was completed, there were 532 cryptocurrencies to group.

Deliverable 2: Reducing Data Dimensions Using PCA (Principal Component Analysis)

In this deliverable, I applied the Principal Component Analysis algorithm to the preprocessed data from Deliverable 1. This reduced the number of dimensions to three principal components.
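A sketch of the reduction step, assuming random data in place of the scaled features. The column labels "PC 1" to "PC 3" and the index values are illustrative assumptions, not necessarily the names used in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Random stand-in for the scaled feature matrix (20 coins, 6 features).
rng = np.random.default_rng(0)
X_scaled = rng.normal(size=(20, 6))
index = [f"COIN{i}" for i in range(20)]

# Project the data onto its first three principal components.
pca = PCA(n_components=3)
pcs = pca.fit_transform(X_scaled)

# Wrap the result in a DataFrame that reuses the original index.
pcs_df = pd.DataFrame(pcs, columns=["PC 1", "PC 2", "PC 3"], index=index)

# Fraction of the total variance retained by the three components.
print(pca.explained_variance_ratio_.sum())
```

Checking explained_variance_ratio_ is one way to judge how much information survives the reduction to three dimensions.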

The result is the pcs_df DataFrame:

Deliverable 3: Clustering Cryptocurrencies Using K-means

Next I created an elbow curve using hvPlot to find the best value for K from pcs_df.

The elbow of the curve, the point where the slope changes sharply, occurs at K = 4, so four is a reasonable number of clusters. If K is too high, the model risks overfitting: it gives undue weight to patterns in this dataset that would not appear in other, similar datasets.
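The values behind an elbow curve can be computed as below. This sketch uses synthetic points drawn around four invented centers rather than pcs_df, and it stops at the inertia list instead of plotting it; the range of K from 1 to 10 is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 3D points scattered around four made-up centers.
rng = np.random.default_rng(0)
centers = np.array(
    [[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 10]], dtype=float
)
points = np.vstack(
    [c + rng.normal(scale=0.5, size=(25, 3)) for c in centers]
)

# Fit K-means for each candidate K and record the inertia
# (sum of squared distances from each point to its nearest centroid).
k_values = list(range(1, 11))
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(points)
    inertia.append(model.inertia_)

for k, value in zip(k_values, inertia):
    print(k, round(value, 1))
```

Plotting inertia against K gives the elbow curve: inertia drops steeply until K reaches the natural number of groups and flattens afterwards.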

Then I ran the K-means algorithm to predict the clusters. This algorithm grouped the data around four centroids. Centroids denote the mean position of all the points in each cluster.

I added these predictions as a new column, Class, and concatenated crypto_df and pcs_df to create clustered_df.
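Roughly, the prediction and concatenation steps might look like this. The eight sample rows and the "PC" column labels are invented for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

index = ["A", "B", "C", "D", "E", "F", "G", "H"]

# Invented feature rows standing in for crypto_df.
crypto_df = pd.DataFrame(
    {
        "Algorithm": ["SHA-256", "SHA-256", "Scrypt", "Scrypt",
                      "X11", "X11", "PoS", "PoS"],
        "TotalCoinsMined": [21.0, 18.0, 84.0, 60.0, 9.0, 7.0, 100.0, 90.0],
    },
    index=index,
)

# Invented principal components: four tight pairs of points.
pcs_df = pd.DataFrame(
    {
        "PC 1": [0.0, 0.1, 5.0, 5.1, 0.0, 0.0, 5.0, 5.0],
        "PC 2": [0.0, 0.0, 5.0, 5.0, 5.0, 5.1, 0.0, 0.0],
        "PC 3": [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.1],
    },
    index=index,
)

# Fit K-means with four centroids and get a cluster label for each row.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
predictions = model.fit_predict(pcs_df)

# Join the original features and the components side by side on the
# shared index, then attach the predicted labels.
clustered_df = pd.concat([crypto_df, pcs_df], axis=1)
clustered_df["Class"] = predictions

print(clustered_df.head(10))
```

The concatenation relies on both DataFrames sharing the same index, which is why the earlier steps preserve it.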

The first 10 results of clustered_df:

Deliverable 4: Visualizing Cryptocurrencies Results

I created a 3D scatterplot using Plotly Express scatter_3d() showing the four clusters from clustered_df:

If you hover over a point on the graph, a pop-up displays the coin name, algorithm, total coins mined, and total coin supply:

I created a table showing the tradable cryptocurrencies using hvplot.table():

I wanted to plot the data on a 2D scatterplot, so I first scaled it with MinMaxScaler, a tool that rescales each feature to fall between a given minimum and maximum value. In this case, I scaled the data to the range from zero to one.
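For illustration, here is MinMaxScaler applied to two invented columns. That these are the two columns scaled in the notebook is an assumption based on the scatterplot described next.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Invented values; column names are assumptions.
clustered_df = pd.DataFrame(
    {
        "TotalCoinSupply": [21.0, 84.0, 1000.0],
        "TotalCoinsMined": [18.0, 60.0, 500.0],
    },
    index=["ALP", "BET", "GAM"],
)

columns = ["TotalCoinSupply", "TotalCoinsMined"]

# The default feature_range is (0, 1): each column is mapped with
# (x - column_min) / (column_max - column_min).
scaled = MinMaxScaler().fit_transform(clustered_df[columns])

# Rebuild a DataFrame so the scaled values keep their labels and index.
plot_df = pd.DataFrame(scaled, columns=columns, index=clustered_df.index)

print(plot_df)
```

Scaling both axes to the same range keeps a few very large coins from compressing the rest of the points into one corner of the plot.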

This scatterplot shows the different classes and how they vary in the number of coins mined versus the total coin supply for each cryptocurrency. It is also interactive and displays the CoinName when you hover over a data point.

Summary

This analysis and its visualizations resulted in four different groups of cryptocurrencies based on this dataset. Accountability Accounting will be able to take a closer look at the differences in groups.

¹ https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler
