Create a Race Bar in Python

Tutorial Prepared by: Joi Anderson (@joicodes) • View on Notion • Updated: Sept. 27, 2020

Step 1: Curate a Data File 📊

Finding an interesting data set:

A data set is a collection of data.

Data sets are created in many different ways. Some are based on human observations or surveys, like the U.S. Census. Others may be machine-generated, like satellite forecast data.

Data sets are most commonly shared as spreadsheets or CSV (comma-separated values) files. Let's aim to find a data set that is formatted as a CSV.

Here is a list of sources for interesting data sets to explore:

👉🏽 For this workshop, we will be using: Hot 100 singles (between 1/1/2000 and 12/28/2019)

Understanding Your Data:

Before starting your analysis of the data set, let's take the time to understand the data we are working with. So let's take a look at the data:

Observations:

About: New Hot 100 singles from January 1, 2000 to December 28, 2019
Data Source: Web-scraped from Billboard.com
Size: 7,850 rows of data (i.e. 7,850 songs)
The first row of my data contains column names.
Columns:
- Week - The week the song entered the Billboard Hot 100
- EnterPosition - The position at which the song entered the Billboard Hot 100
- Song - Name of the song
- Performer - Name of the performer, including any featured artists on the song
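To see how "the first row contains column names" plays out in code, here is a minimal sketch using Python's built-in csv module. The rows are made-up placeholders in the same four-column shape, not real chart data, and the ISO date format for Week (2000-01-01) is an assumption; check your own file. read_rows is a hypothetical helper name.

```python
import csv
import io

# Made-up placeholder rows in the same four-column shape as hot100.csv.
# These are NOT real chart entries, and the ISO date format for Week is an assumption.
SAMPLE_CSV = """Week,EnterPosition,Song,Performer
2000-01-01,45,Song A,Artist X
2000-01-08,80,Song B,Artist Y Featuring Artist X
"""

EXPECTED_COLUMNS = ["Week", "EnterPosition", "Song", "Performer"]


def read_rows(text):
    """Parse CSV text; DictReader takes the column names from the first row."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


rows = read_rows(SAMPLE_CSV)
print(len(rows), rows[0]["Performer"])  # 2 Artist X
```

With the real file, you could open it in a `with open("hot100.csv", newline="") as f:` block and pass `f.read()` to read_rows. Note that csv gives every value back as a string; pandas (Step 2) converts numbers for you.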

Download Data

Export the Google Sheets file as a CSV:

File > Download > Comma Separated Values (.csv, current sheet)

Rename the file hot100.csv and add it to your repository.

👀 Here is how your data looks as a raw CSV file: Preview

Step 2: Using Pandas 🐼

Meet Pandas (Python Data Analysis Library)

pandas is a Python library that gives you a set of tools for data analysis. If you want to work with large data sets, pandas is going to be your best friend. 👯‍♀️

Image from: Python Awesome

To install pandas, in your Terminal write:

pip3 install pandas

After it installs, we can import it into our main.py file:

import pandas as pd

Loading our data from CSV file

Now that we've imported pandas, we are ready to read the CSV file into Python using read_csv() from pandas:

data_frame = pd.read_csv("hot100.csv")
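If the Week column holds dates, it can help to let read_csv parse them with its parse_dates parameter, so weeks later sort chronologically instead of alphabetically (as text, "1/8/2000" would sort after "1/15/2000"). A minimal sketch, assuming Week contains date strings; the rows are made-up placeholders, not real chart data.

```python
import io

import pandas as pd

# Placeholder rows (NOT real chart data). Assumes Week holds dates like 2000-01-01.
SAMPLE_CSV = """Week,EnterPosition,Song,Performer
2000-01-08,80,Song B,Artist Y
2000-01-01,45,Song A,Artist X
"""

# parse_dates turns the Week strings into datetime64 values instead of plain text.
data_frame = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["Week"])
print(data_frame.dtypes)
```

With the real file this would be `pd.read_csv("hot100.csv", parse_dates=["Week"])`.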

To see if it worked, we can see the first few rows of the data by adding the following to our code:

print( data_frame.head() )

head() gives us a snapshot of our data by displaying the first five rows of the data set.

You should see a table printed to the terminal like this:

We can also see the last five rows of the data by using tail():

print( data_frame.tail() )
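Both head() and tail() take an optional number of rows (the default is 5), and the shape attribute reports (rows, columns). A small sketch on made-up placeholder rows, not real chart data:

```python
import io

import pandas as pd

# Placeholder rows shaped like hot100.csv (NOT real chart data).
SAMPLE_CSV = """Week,EnterPosition,Song,Performer
2000-01-01,45,Song A,Artist X
2000-01-01,80,Song B,Artist Y
2000-01-08,12,Song C,Artist X
2000-01-15,99,Song D,Artist Z
"""

data_frame = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(data_frame.head(2))  # first two rows
print(data_frame.tail(1))  # last row only
print(data_frame.shape)    # (4, 4)
```

On the real hot100.csv, data_frame.shape is a quick way to confirm you loaded all the rows you expect.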

Step 3: Install Bar Chart Race 🏁

Meet Bar Chart Race

bar_chart_race is an open-source Python library for creating animated bar and line chart races. It is built on top of two popular Python data visualization libraries, matplotlib and plotly, and it makes creating racing chart animations much simpler!

👉🏽 See repo

To install bar_chart_race, in your Terminal write:

pip3 install bar_chart_race

After it installs, we can import it into our main.py file:

import bar_chart_race as bcr

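Install Dependency

To save the animation as an .mp4 file, bar_chart_race needs ffmpeg. With Homebrew (macOS), in your Terminal write:

brew install ffmpeg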

If you want to create a GIF animation instead, install ImageMagick and Ghostscript:

brew install imagemagick
brew install ghostscript
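Before rendering, you can check from Python that these tools are actually on your PATH. A minimal sketch using shutil.which; missing_tools is a hypothetical helper, and the command names are assumptions to verify on your system (ImageMagick 7 installs "magick", while older versions use "convert"; Ghostscript's command is usually "gs").

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# ffmpeg is needed for .mp4 output; "magick" (ImageMagick) and "gs" (Ghostscript)
# only matter if you want a GIF. Older ImageMagick versions ship "convert" instead.
print(missing_tools(["ffmpeg", "magick", "gs"]))
```

An empty list means everything was found.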

Step 4: Prepare Data for Bar Chart 🔧

Transform the data into 'wide' form

In order to create a racing bar chart, our data set must be in 'wide' form where:

Each row represents a single period of time
Each column holds the value for a particular category
The index contains the time component
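Here is a tiny hand-built example of wide form: one row per week, one column per category, and the week as the index. The artists and counts are made-up placeholders, not real chart data.

```python
import pandas as pd

# Made-up cumulative counts for two placeholder artists (NOT real chart data).
wide = pd.DataFrame(
    {"Artist X": [1, 1, 2], "Artist Y": [0, 1, 1]},
    index=pd.to_datetime(["2000-01-01", "2000-01-08", "2000-01-15"]),
)
wide.index.name = "Week"  # the index holds the time component
print(wide)
```

Each row is a snapshot in time, and each frame of the race animation is drawn from one row.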

To transform our data set into wide form:

The index would be the week: the Week column
Each column would be an artist who had a Hot 100 hit: the Performer column
Each value would be the cumulative number of that artist's songs up to that week

Here is a rough sketch of how it would look:

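We can transform the data into wide form by creating a pivot table with pandas. It counts each performer's songs per week, and cumsum() then turns those weekly counts into running totals: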

wide_data = data_frame.pivot_table(index='Week', columns='Performer', aggfunc='count', fill_value=0).cumsum()
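To see what this line produces, here is the same call on three made-up placeholder rows (not real chart data). Because no values column is given, pivot_table counts every remaining column, so the result has two header levels: the counted column name on top and the performer below.

```python
import pandas as pd

# Placeholder long-form rows shaped like hot100.csv (NOT real chart data).
long_df = pd.DataFrame({
    "Week": ["2000-01-01", "2000-01-01", "2000-01-08"],
    "EnterPosition": [45, 80, 12],
    "Song": ["Song A", "Song B", "Song C"],
    "Performer": ["Artist X", "Artist Y", "Artist X"],
})

# Count songs per (Week, Performer), fill missing combinations with 0,
# then cumsum() adds each week's counts to all earlier weeks.
wide = long_df.pivot_table(index="Week", columns="Performer",
                           aggfunc="count", fill_value=0).cumsum()

print(wide.columns.nlevels)  # 2
# Running total for Artist X: 1 song after week one, 2 after week two.
print(wide[("Song", "Artist X")].tolist())
```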

Here is what wide_data.head() will print:

If you want to see the full output, check it out here.

Remove header

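Because we didn't tell pivot_table() which column to count, it counted every remaining column (EnterPosition and Song). That adds an extra header level above the performer names, which we don't need.

We can remove this header level by using droplevel():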

wide_data.columns = wide_data.columns.droplevel(0)
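A small sketch of what droplevel(0) does to two-level column labels like the pivot table's. The artist names and numbers are made-up placeholders.

```python
import pandas as pd

# Two-level columns: counted column name on top, placeholder performer below.
cols = pd.MultiIndex.from_product([["EnterPosition", "Song"], ["Artist X", "Artist Y"]])
wide = pd.DataFrame([[1, 1, 1, 1], [2, 1, 2, 1]], columns=cols)

# Drop the top level, leaving only the performer names.
wide.columns = wide.columns.droplevel(0)
print(wide.columns.tolist())  # ['Artist X', 'Artist Y', 'Artist X', 'Artist Y']
```

Notice that every performer now appears twice, once per counted column.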

Remove duplicate columns

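After dropping that level, each performer appears once for every column that was counted, so some columns are duplicated. We can keep only the first occurrence of each: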

wide_data = wide_data.loc[:,~wide_data.columns.duplicated()]
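Here is the duplicate filter on a small placeholder frame. columns.duplicated() marks every repeat of a name after its first occurrence, and ~ flips that into a "keep" mask for .loc.

```python
import pandas as pd

# Flat columns where each placeholder artist appears twice, as after droplevel(0).
wide = pd.DataFrame([[1, 1, 1, 1], [2, 1, 2, 1]],
                    columns=["Artist X", "Artist Y", "Artist X", "Artist Y"])

mask = ~wide.columns.duplicated()  # True for the first occurrence of each name
print(mask.tolist())  # [True, True, False, False]

wide = wide.loc[:, mask]
print(wide.columns.tolist())  # ['Artist X', 'Artist Y']
```

As a design note: passing values='Song' to pivot_table would count only the Song column, so the result would already have single-level, non-duplicated columns and neither droplevel() nor this filter would be needed.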

Create a subset

There are hundreds of artists with Billboard Hot 100 hits. Our graph would be wayyyy too big if we decided to make all artists race. Let's shorten our table to 5 columns to compare.

Rather than deleting the columns we aren't using, we can create a subset containing only the columns we need:

Let's choose 5 Performers (i.e. 5 columns of data) to race and store them in a list:

columns = [ "Mariah Carey", "Michael Jackson", "Drake", "Rihanna", "Lady Gaga"]

Using that list of column names, we can select just those columns from wide_data:

sub_dataset = wide_data[columns]
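Selecting columns by name raises a KeyError if any name doesn't exactly match a column in wide_data, for example because of a spelling difference. Here is a sketch of a hypothetical helper, select_performers, that reports all missing names at once; the test frame uses placeholder artists.

```python
import pandas as pd


def select_performers(wide_data, performers):
    """Return only the requested performer columns, failing clearly if any are absent."""
    missing = [name for name in performers if name not in wide_data.columns]
    if missing:
        raise KeyError(f"not found in wide_data.columns: {missing}")
    return wide_data[performers]


# Placeholder frame (NOT real chart data).
toy = pd.DataFrame({"Artist X": [1, 2], "Artist Y": [0, 1], "Artist Z": [1, 1]})
print(select_performers(toy, ["Artist Z", "Artist X"]))
```

In main.py this would be used as `sub_dataset = select_performers(wide_data, columns)`.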

Let's print the first few rows of sub_dataset to see what data it contains:

print(sub_dataset.head())
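Because the Performer column also lists featured artists, a column such as "Drake" in wide_data only counts songs whose credit is exactly "Drake". If you would rather count every song whose credit mentions a name, here is an alternative sketch. It is a different technique from the pivot table (substring matching on the original data_frame), and cumulative_counts_by_name is a hypothetical helper. Substring matching can also pick up other names that contain the same text, so check the results.

```python
import pandas as pd


def cumulative_counts_by_name(data_frame, names):
    """Cumulative weekly count of songs whose Performer credit mentions each name."""
    weeks = sorted(data_frame["Week"].unique())
    result = pd.DataFrame(index=weeks)
    result.index.name = "Week"
    for name in names:
        # regex=False matches the name literally; the match is case-sensitive.
        mentions = data_frame["Performer"].str.contains(name, regex=False)
        per_week = mentions.groupby(data_frame["Week"]).sum()
        result[name] = per_week.reindex(weeks, fill_value=0).cumsum()
    return result


# Placeholder rows (NOT real chart data); the "Featuring" wording is made up.
sample = pd.DataFrame({
    "Week": ["2000-01-01", "2000-01-01", "2000-01-08"],
    "Performer": ["Artist X", "Artist Y Featuring Artist X", "Artist Y"],
})
print(cumulative_counts_by_name(sample, ["Artist X", "Artist Y"]))
```

The result has the same wide shape as sub_dataset, so it can be passed to bar_chart_race in the same way.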

Now that we have our data ready... let the games begin!

Step 5: Create Your Animation 🏁

Create .mp4 with Racing Bar Chart Animation

bcr.bar_chart_race(sub_dataset, filename='hot100.mp4')

Check out your video

Once your program has finished, check your repo for hot100.mp4 and watch your 5 artists race!

Which artists did you choose? Were you surprised about who won?

Here is mine (watch at 5x speed):

https://youtu.be/mgFmybMTnXs

Check the docs for Bar Chart Race to customize your animation!
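As one hedged example of customizing, here is a sketch. prepare_for_race and render are hypothetical helper names; n_bars, period_length, steps_per_period and title are bar_chart_race keyword arguments, but verify them against the docs for your installed version. Sorting the weeks chronologically guards against text dates in the wrong order, and dropping weeks where no count changed is an optional choice that shortens the video, at the cost of the week label jumping ahead.

```python
import pandas as pd


def prepare_for_race(sub_dataset):
    """Sort weeks chronologically and drop weeks where no artist's count changed."""
    df = sub_dataset.copy()
    df.index = pd.to_datetime(df.index)
    df = df.sort_index()
    # The first row has no previous week, so fillna(1) marks it as changed.
    changed = df.diff().fillna(1).ne(0).any(axis=1)
    return df[changed]


def render(sub_dataset, filename="hot100.mp4"):
    # Imported here so prepare_for_race works even without bar_chart_race installed.
    import bar_chart_race as bcr

    bcr.bar_chart_race(
        df=prepare_for_race(sub_dataset),
        filename=filename,
        n_bars=5,
        period_length=250,    # milliseconds spent on each week
        steps_per_period=10,  # interpolated frames between weeks
        title="Cumulative Billboard Hot 100 entries",
    )
```

Calling render(sub_dataset) in main.py would then write hot100.mp4 (ffmpeg required).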
