Hello, world! #1

Open
MichaelTiemannOSC opened this issue Nov 3, 2021 · 12 comments

Comments

@MichaelTiemannOSC
Contributor

Just a note saying that we now have two notebooks in this one pipeline. One processes a Shell report from 2020 (over 600 rows of data) and the other a DPDHL report from 2020 (almost 300 rows of data).

The DPDHL script is very much a hacked version of the script processing Shell's data. The next step is to port functionality back from DPDHL to Shell so that Shell becomes a unified script handling two similarly-shaped reports. If that proves reasonably feasible, we should create a new generic ingestion pipeline with a name relevant to the generic shapes it's prepared to handle. With GitHub branches, new report variations of that fundamental shape can be addressed, then merged back in. With luck, we can create a generic script that can process hundreds if not thousands of spreadsheets with minimal effort (and CPU overhead).

Here is the shape of the tidy data it produces:

Variable | Notes | Category | Segmentation | Unit | Year | Value

Variable = the specific datapoint being observed
Notes = row-specific note about the observation
Category = Top-level grouping, such as "Emissions" or "Energy". In the case of DPDHL we preserve Category:Subcategory:SubSubCategory as concatenated text in the Category, when appropriate.
Segmentation = For a category that can be sliced in various ways, a description of the slicing (e.g. 'by country', 'by fuel source', 'by business')
Unit = the unit of measurement (some work needed for percentage and so-called pure numbers that are really "number of buildings" or some such)
Year = the year of the measurement
Value = the measured value
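
As a rough illustration of the shape described above, here is a minimal Python sketch of one tidy row. The seven field names come from the list above; the `Observation` class, the `join_category` helper, and every sample value are hypothetical placeholders, not taken from the actual notebooks or reports.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Observation:
    """One tidy row: a single value for one variable in one year."""
    variable: str                 # the specific datapoint being observed
    notes: Optional[str]          # row-specific note, if any
    category: str                 # top-level grouping, possibly concatenated
    segmentation: Optional[str]   # how the category is sliced, if at all
    unit: str                     # unit of measurement
    year: int                     # year of the measurement
    value: float                  # the measured value


def join_category(*levels: str) -> str:
    """Concatenate non-empty levels as Category:Subcategory:SubSubCategory."""
    return ":".join(level.strip() for level in levels if level and level.strip())


# Placeholder values only; they do not come from any real report.
row = Observation(
    variable="Example variable",
    notes=None,
    category=join_category("Emissions", "Example subcategory"),
    segmentation="by country",
    unit="example unit",
    year=2020,
    value=1.0,
)
print(asdict(row))
```

Keeping the concatenation in one helper means a report without subcategories simply yields the bare top-level category.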

@erikerlandson @caldeirav @MichaelClifford @oindrillac @ChristianMeyndt @idemir-ids @HeatherAck

@hbaltzell if you could find the next 5-10 spreadsheets shaped like

https://reports.shell.com/sustainability-report/2020/our-performance-data/greenhouse-gas-and-energy-data.html

https://reporting-hub.dpdhl.com/downloads/2020/4/DPDHL-ESG-Statbook-2020-en.xls

@MichaelTiemannOSC
Contributor Author

Added @Shreyanand

@hbaltzell

hbaltzell commented Nov 3, 2021 via email

@MichaelTiemannOSC
Contributor Author

Just for fun, I added a 3rd script to handle 10 years of Unilever's Emissions/Energy data (162 rows):

https://www.unilever.com/planet-and-society/sustainability-reporting-centre/sustainability-performance-data/

@hbaltzell

Michael, I had assembled about 15 examples of corporate spreadsheets in a folder called "Corp ESG spreadsheets". I can't find it on the Google drive. I would be glad to upload it again if you point me to the location. In it I also had a sample spreadsheet for CDP, GRI, and EEI-AGA. The latter is the Edison Electric Institute and the American Gas Association. They created an ESG reporting template for their members that many of them use, and it is often posted on their websites, so if you created a script for that, you could get as many as 50 utilities. This would overlap with data that we already have from RMI, but it also has other ESG data. LMK what you would like me to do with these or we can have a call.

@MichaelTiemannOSC
Contributor Author

I did find that directory, and I've been using that to guide me to the new 2020 (and soon 2021) reports. Virtually all the companies I've looked at thus far have updated and improved their reporting, making it all more regular (and thus easier to parse for my purposes).

The name of the folder to which you refer is "Corporate ESG spreadsheets".

@hbaltzell

hbaltzell commented Nov 4, 2021 via email

@MichaelTiemannOSC
Contributor Author

I have a slightly different agenda, which is to find corporate reports that are similarly shaped to Shell, DPDHL, Unilever, AEP, etc. The BHP Billiton report is an example of something that is not so similarly shaped (it mixes WIDE and LONG, making it more challenging to interpret). But if we can collect WIDE-form reports (dates in columns, left to right) from major companies and build toward a single script that reads consistently from dozens of sources, we'd have something to say.
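
To make the WIDE-form idea concrete, here is a minimal sketch of unpivoting a WIDE table (one column per year) into the tidy Variable/.../Year/Value shape. It uses only the Python standard library. The `wide_to_tidy` function, the rule that any four-digit header is a year column, and the sample row are all assumptions for illustration, not the logic of the existing notebooks.

```python
import re

# Assumption: a header made of exactly four digits is a year column.
YEAR_PATTERN = re.compile(r"^\d{4}$")

ID_COLUMNS = ("Variable", "Notes", "Category", "Segmentation", "Unit")


def wide_to_tidy(rows, id_columns=ID_COLUMNS):
    """Turn WIDE rows (dicts with one key per year) into tidy records."""
    tidy = []
    for row in rows:
        # Copy the descriptive columns, defaulting to "" when absent.
        ids = {col: row.get(col, "") for col in id_columns}
        for header, cell in row.items():
            if not YEAR_PATTERN.match(str(header)):
                continue  # not a year column
            if cell in ("", None):
                continue  # no value reported for that year
            tidy.append({**ids, "Year": int(header), "Value": float(cell)})
    return tidy


# Placeholder row; the numbers are invented for the example.
wide_rows = [
    {
        "Variable": "Example variable",
        "Category": "Emissions",
        "Unit": "example unit",
        "2019": "10.5",
        "2020": "9.0",
        "2021": "",
    }
]

for record in wide_to_tidy(wide_rows):
    print(record)
```

A report that mixes WIDE and LONG layouts would need extra handling before a function like this could apply.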

We could also start a front-end that handles LONG data. That was actually the first thing I tackled with Vale SE. If we can find 10-20 like that (but not BHP yet), that would be good, too.

Once we have a strong ingestion engine, we can go about developing a sector-based approach.

@hbaltzell

OK, when I get some time I'll have a look around

@MichaelTiemannOSC MichaelTiemannOSC pinned this issue Nov 4, 2021
@hbaltzell

hbaltzell commented Nov 9, 2021 via email

@MichaelTiemannOSC
Contributor Author

Yes, I've found another half-dozen, but the pickings are slim. I have enough to keep me busy for the short term. Thanks!

@HeatherAck
Contributor

@HeatherAck
Contributor

3 participants