Accountability Accounting, a prominent investment bank, is interested in offering a new cryptocurrency investment portfolio to its customers. This analysis produces a report on which cryptocurrencies are on the trading market and how they can be grouped into a classification system for the new investment. The data must first be preprocessed to fit the machine learning models. Because there is no known output to predict, unsupervised learning is used: a clustering algorithm groups the cryptocurrencies, and data visualizations share the findings.
Steps for analysis:
- Preprocessing the Data for PCA
- Reducing Data Dimensions Using PCA
- Clustering Cryptocurrencies Using K-means
- Visualizing Cryptocurrencies Results
- Read data into a DataFrame
- Drop the "IsTrading" column
- Remove the rows that have at least one null value
- Create a new DataFrame that only holds the names of the cryptocurrencies
- Use the get_dummies() method to create dummy variables for the two text features, "Algorithm" and "ProofType", and store the results in a new DataFrame
# Use get_dummies() to create variables for text features.
X = pd.get_dummies(crypto_df, columns=['Algorithm', 'ProofType'])
X.head()
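The preprocessing steps above, from loading through encoding, can be sketched end to end on a small made-up table. The input file is not named in this write-up, so the sample rows, the `csv_text` variable, and the `Ticker` index column are hypothetical stand-ins. Dropping `CoinName` from the feature table is also an assumption, made here because StandardScaler cannot scale text columns.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real data file.
csv_text = """Ticker,CoinName,Algorithm,IsTrading,ProofType,TotalCoinsMined,TotalCoinSupply
AAA,Alpha Coin,Scrypt,True,PoW,100.0,1000
BBB,Beta Coin,X11,True,PoS,250.0,5000
CCC,Gamma Coin,SHA-256,True,PoW,,21000
DDD,Delta Coin,Scrypt,True,PoW/PoS,75.5,800
"""

# Read data into a DataFrame (index column name is an assumption).
crypto_df = pd.read_csv(io.StringIO(csv_text), index_col="Ticker")

# Drop the "IsTrading" column.
crypto_df = crypto_df.drop(columns=["IsTrading"])

# Remove the rows that have at least one null value (row CCC here).
crypto_df = crypto_df.dropna()

# Keep the coin names in their own DataFrame.
names_df = crypto_df[["CoinName"]]

# Assumption: remove the names from the features so only
# numeric and encodable columns remain.
crypto_df = crypto_df.drop(columns=["CoinName"])

# One-hot encode the two text features.
X = pd.get_dummies(crypto_df, columns=["Algorithm", "ProofType"])
print(X.shape)
```

Each distinct value of "Algorithm" and "ProofType" becomes its own column, so the width of `X` grows with the number of distinct algorithms and proof types in the data.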
- Then, use the StandardScaler fit_transform() function to standardize the features from the new DataFrame
# Standardize the data with StandardScaler().
crypto_scaled = StandardScaler().fit_transform(X)
print(crypto_scaled[0:5])
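To make the effect of the scaling step concrete, the sketch below compares StandardScaler with the formula it applies per column, z = (x - mean) / std, using the population standard deviation. The small `features` array is made up for illustration and is not taken from the project data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: three rows, three numeric columns.
features = np.array([
    [100.0, 1000.0, 1.0],
    [250.0, 5000.0, 0.0],
    [75.5, 800.0, 1.0],
])

scaled = StandardScaler().fit_transform(features)

# The same result by hand: subtract each column's mean and divide
# by its population standard deviation (NumPy's default, ddof=0).
manual = (features - features.mean(axis=0)) / features.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

After scaling, every column has mean 0 and standard deviation 1, which keeps large-valued columns such as coin supply from dominating the PCA step.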
- Apply PCA to reduce the dimensions to three principal components
# Use PCA to reduce the dimensions to three principal components.
pca = PCA(n_components=3)
crypto_pca = pca.fit_transform(crypto_scaled)
crypto_pca
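One check that may be worth running after the reduction is how much of the original variance the three components keep. The sketch below reads scikit-learn's `explained_variance_ratio_` attribute from a fitted PCA. The input is synthetic random data (an assumption for illustration), so the printed ratios say nothing about the real crypto dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded feature matrix: 50 rows, 8 columns.
rng = np.random.default_rng(0)
synthetic = rng.normal(size=(50, 8))
scaled = StandardScaler().fit_transform(synthetic)

pca = PCA(n_components=3)
components = pca.fit_transform(scaled)

# Fraction of total variance captured by each component, and their sum.
ratios = pca.explained_variance_ratio_
retained = ratios.sum()
print(components.shape)
print(ratios, retained)
```

A low retained fraction would suggest that three components discard much of the structure in the data, which is worth knowing before interpreting the clusters.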
- Create a new DataFrame that uses the same index as the previous DataFrame, with columns named "pc1", "pc2", and "pc3"
# Create a DataFrame with the three principal components.
pcs_df = pd.DataFrame(data=crypto_pca, columns=['pc1', 'pc2', 'pc3'], index=crypto_df.index)
pcs_df.head(10)
- Using the previous DataFrame, create an elbow curve using hvPlot and a for loop to find the best value for K
- Run the K-means algorithm to make predictions of the K clusters for the cryptocurrencies’ data
# Initialize the K-Means model.
model = KMeans(n_clusters=4, random_state=0)
# Fit the model
model.fit(pcs_df)
# Predict clusters
predictions = model.predict(pcs_df)
predictions
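The elbow-curve step listed before the model fit has no code in this write-up, so here is a minimal sketch of how it might look. It runs on synthetic blobs from `make_blobs` rather than the real `pcs_df`. The names `pcs_demo`, `elbow_df`, and `plot_elbow` are hypothetical, and the hvPlot import is placed inside the plotting function only so the inertia loop runs without hvPlot installed.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 3-column data with four well-separated groups,
# standing in for the principal components DataFrame.
points, _ = make_blobs(n_samples=200, centers=4, n_features=3, random_state=0)
pcs_demo = pd.DataFrame(points, columns=["pc1", "pc2", "pc3"])

# Fit K-means for K = 1..10 and record the inertia of each fit.
k_values = list(range(1, 11))
inertia = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(pcs_demo)
    inertia.append(km.inertia_)

elbow_df = pd.DataFrame({"k": k_values, "inertia": inertia})
print(elbow_df)


def plot_elbow(df):
    """Return an hvPlot line chart of inertia against K."""
    import hvplot.pandas  # noqa: F401  (registers the .hvplot accessor)

    return df.hvplot.line(
        x="k", y="inertia", xticks=list(df["k"]), title="Elbow Curve"
    )
```

The best K is usually read off where the curve bends and further clusters stop reducing inertia by much. Calling `plot_elbow(elbow_df)` in a notebook would display the chart.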
- Create a new DataFrame by concatenating the crypto_df and pcs_df DataFrames column-wise, so that rows are aligned on their shared index
# Create a new DataFrame including predicted clusters and cryptocurrency features.
# Concatenate the crypto_df and pcs_df DataFrames along the columns axis.
clustered_df = pd.concat([crypto_df, pcs_df], axis=1)
- Add another column named "Class" that will hold the predictions
- Create a 3D scatter plot using the Plotly Express scatter_3d() function to plot the clusters from the clustered_df DataFrame along the three principal components. Pass the CoinName and Algorithm columns to the hover_name and hover_data parameters, respectively, so each data point shows the CoinName and Algorithm on hover.
- Create an hvPlot scatter plot with x="TotalCoinsMined", y="TotalCoinSupply", and by="Class", and have it show the CoinName when you hover over a data point.
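The last three steps above can be sketched as follows. The rows, the cluster labels in `predictions`, and the names `crypto_demo`, `pcs_demo`, `clustered_demo`, `plot_clusters_3d`, and `plot_supply_scatter` are all made up for illustration. The sketch also assumes CoinName is available as a column of the combined DataFrame. The plotting libraries are imported inside the functions only so the DataFrame steps run without them installed.

```python
import pandas as pd

# Hypothetical rows standing in for crypto_df and pcs_df.
tickers = ["AAA", "BBB", "CCC", "DDD"]
crypto_demo = pd.DataFrame(
    {
        "CoinName": ["Alpha Coin", "Beta Coin", "Gamma Coin", "Delta Coin"],
        "Algorithm": ["Scrypt", "X11", "SHA-256", "Scrypt"],
        "TotalCoinsMined": [100.0, 250.0, 18.5, 75.5],
        "TotalCoinSupply": [1000.0, 5000.0, 21.0, 800.0],
    },
    index=tickers,
)
pcs_demo = pd.DataFrame(
    {
        "pc1": [0.1, -0.3, 1.2, 0.2],
        "pc2": [0.5, 0.4, -0.9, 0.6],
        "pc3": [-0.2, 0.0, 0.3, -0.1],
    },
    index=tickers,
)
predictions = [0, 1, 2, 0]  # made-up cluster labels

# Concatenate column-wise; rows are matched on the shared index.
clustered_demo = pd.concat([crypto_demo, pcs_demo], axis=1)

# Add the "Class" column that holds the predictions.
clustered_demo["Class"] = predictions


def plot_clusters_3d(df):
    """3D scatter of the principal components, colored by class."""
    import plotly.express as px

    # Integer labels give a continuous color scale in Plotly Express.
    return px.scatter_3d(
        df, x="pc1", y="pc2", z="pc3", color="Class", symbol="Class",
        hover_name="CoinName", hover_data=["Algorithm"],
    )


def plot_supply_scatter(df):
    """2D scatter of mined coins against total supply, grouped by class."""
    import hvplot.pandas  # noqa: F401  (registers the .hvplot accessor)

    return df.hvplot.scatter(
        x="TotalCoinsMined", y="TotalCoinSupply", by="Class",
        hover_cols=["CoinName"],
    )
```

In a notebook, `plot_clusters_3d(clustered_demo).show()` and `plot_supply_scatter(clustered_demo)` would render the two charts.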
Cryptocurrencies are increasing in popularity and complexity, and the ability to understand them and market them to a client will be key to any financial institution's growth. As more people look to invest in crypto, knowing which currencies are on the market and which ones would benefit a specific client will put an institution in a strong position to become an industry leader.