This project involves data wrangling several Gapminder datasets into one long-format CSV. This data is then analyzed with Pandas, Numpy, Plot.ly + Cufflinks, as well as Seaborn.
All work is in jupyter notebooks.
While all data wrangling is painful, the initial matching of region & sub-region values (e.g. 'Europe' & 'Western Europe') to country names was so particularly painful that I've documented it iin a separate file, to not distract from the flow of the data analysis.
For the same reasons, the wrangling into long format & batch-merging of multiple Gapminder datasets is also done in a separate notebook.
/gapminder:
-
analyze_CO2.ipynb: Loads
data/final_df.csv
and analyzes first onlyCO2_pc
, thenCO2_pc
by region & sub-region, then does multi-variate analyses with variables for energy, income, and standard-of-living. -
analyze_CO2.html: HTML export of above notebook.
-
make_df.ipynb: batch converts csv & xls files into dataframes, cleans them and converts to long format. It then merges them, then does partial-string matching to ensure all countries have values for the region & sub-region columns. Outputs
final_df.csv
. -
make_df.html: HTML export of above notebook.
-
/data:
- /merge_regions: contains jupyter notebook & .csv files. The notebook documents the merging of regions data with the Gapminder CO2 dataset. Outputs
countries_with_regions.csv
, which is used bymake_df.ipynb
. - /originals: contains original .csv/.xls files downloaded from gapminder. Input for
make_df.ipynb
. - final_df.csv: output of
make_df.ipynb
- /merge_regions: contains jupyter notebook & .csv files. The notebook documents the merging of regions data with the Gapminder CO2 dataset. Outputs