Hello, world! #1

MichaelTiemannOSC · 2021-11-03T16:15:34Z

Just a note saying that we now have two notebooks in this one pipeline. One processes a Shell report from 2020 (over 600 rows of data) and the other a DPDHL report from 2020 (almost 300 rows of data).

The DPDHL script is very much a hacked version of the script processing Shell's data. The next step is to port functionality back from DPDHL to Shell so that Shell becomes a unified script handling two similarly-shaped reports. If that proves reasonably feasible, we should create a new generic ingestion pipeline with a name relevant to the generic shapes it's prepared to handle. With GitHub branches, new report variations of that fundamental shape can be addressed, then merged back in. With luck, we can create a generic script that can process hundreds if not thousands of spreadsheets with minimal effort (and CPU overhead).

Here is the shape of the tidy data it produces:

Variable | Notes | Category | Segmentation | Unit | Year | Value

Variable = the specific datapoint being observed
Notes = row-specific note about the observation
Category = Top-level grouping, such as "Emissions" or "Energy". In the case of DPDHL we preserve Category:Subcategory:SubSubCategory as concatenated text in the Category, when appropriate.
Segmentation = For a category that can be sliced in various ways, a description of the slicing (e.g. 'by country', 'by fuel source', 'by business')
Unit = the unit of measurement (some work needed for percentage and so-called pure numbers that are really "number of buildings" or some such)
Year = the year of the measurement
Value = the measured value
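
To make the shape concrete, here is a minimal sketch of one tidy row as a Python record, plus a helper for the concatenated Category text. The names `Observation` and `join_category` are hypothetical and are not taken from the notebooks, and the sample values are placeholders rather than reported figures.

```python
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    """One row of the tidy output, in the column order listed above."""
    variable: str           # the specific datapoint being observed
    notes: str              # row-specific note about the observation
    category: str           # top-level grouping, possibly "A:B:C"
    segmentation: str       # how the category is sliced, e.g. "by country"
    unit: str               # unit of measurement
    year: int               # year of the measurement
    value: Optional[float]  # measured value, None when the cell is empty


def join_category(*levels: str) -> str:
    """Concatenate the non-empty category levels with ':' separators."""
    return ":".join(level.strip() for level in levels if level and level.strip())


# Placeholder row for illustration only; these are not reported figures.
example = Observation(
    variable="Example metric",
    notes="",
    category=join_category("Emissions", "Subcategory", ""),
    segmentation="by country",
    unit="kt",
    year=2020,
    value=None,
)
```

Empty levels are dropped, so a report with only a top-level category still yields a clean Category string.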

@erikerlandson @caldeirav @MichaelClifford @oindrillac @ChristianMeyndt @idemir-ids @HeatherAck

@hbaltzell if you could find the next 5-10 spreadsheets shaped like

https://reports.shell.com/sustainability-report/2020/our-performance-data/greenhouse-gas-and-energy-data.html

https://reporting-hub.dpdhl.com/downloads/2020/4/DPDHL-ESG-Statbook-2020-en.xls

MichaelTiemannOSC · 2021-11-03T18:57:39Z

Added @Shreyanand

hbaltzell · 2021-11-03T20:02:52Z

Michael, great to see progress on this. Do you think we could put someone on the case of hunting the web (with code) to find more spreadsheets with this type of data? Maybe Mike Platt? Is he still involved? Are you able to see whether there are these types of spreadsheets in the S&P repository?

MichaelTiemannOSC · 2021-11-04T01:38:09Z

Just for fun, I added a 3rd script to handle 10 years of Unilever's Emissions/Energy data (162 rows):

https://www.unilever.com/planet-and-society/sustainability-reporting-centre/sustainability-performance-data/

hbaltzell · 2021-11-04T03:21:26Z

Michael, I had assembled about 15 examples of corporate spreadsheets in a folder called "Corp ESG spreadsheets". I can't find it on the Google drive. I would be glad to upload again if you point me to the location. In this I also had a sample spreadsheet for CDP, GRI, and EEI-AGA. The latter is the Edison Electric Institute and the American Gas Association. They created an ESG reporting template for their members that many of them use, and it is often posted on their websites, so if you created a script for that, you could get as many as 50 utilities. This would overlap with data that we already have from RMI, but it also has other ESG data. LMK what you would like me to do with these or we can have a call.

MichaelTiemannOSC · 2021-11-04T07:59:29Z

I did find that directory, and I've been using that to guide me to the new 2020 (and soon 2021) reports. Virtually all the companies I've looked at thus far have updated and improved their reporting, making it all more regular (and thus easier to parse for my purposes).

The name of the folder to which you refer is "Corporate ESG spreadsheets".

hbaltzell · 2021-11-04T12:12:16Z

Ok, I can hunt for more, but we should create a company list and set priorities for what sectors we want. Since we already have a lot of data on utilities, maybe we should look at the other priority sectors that the ITR team will focus on. Also, maybe there will be a way to check the results against the data vault.


MichaelTiemannOSC · 2021-11-04T13:14:08Z

I have a slightly different agenda, which is to find corporate reports that are similarly shaped to Shell, DPDHL, Unilever, AEP, etc. The BHP Billiton report is an example of something that is not so similarly shaped (it mixes WIDE and LONG, making it more challenging to interpret). But if we can collect WIDE-form reports (dates in columns, left to right) from major companies and build toward a single script that reads consistently from dozens of sources, we'd have something to say.
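
As a rough illustration of what reading a WIDE-form report involves, the sketch below turns rows with one column per year into one record per (row, year) pair. It is plain Python so it runs without dependencies; in a pandas notebook, `DataFrame.melt` would do the same job. The function name `wide_to_long`, the year-header pattern, and the sample row are assumptions made for illustration, not the logic of the actual scripts.

```python
import re
from typing import Any, Dict, Iterable, Iterator, List

# Assumption: year columns are headed by a bare four-digit year such as "2020".
YEAR_HEADER = re.compile(r"(19|20)\d{2}")


def wide_to_long(rows: Iterable[Dict[str, Any]],
                 id_columns: List[str]) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-empty year cell of each wide-form row."""
    for row in rows:
        ids = {col: row.get(col, "") for col in id_columns}
        for header, cell in row.items():
            header_text = str(header).strip()
            if not YEAR_HEADER.fullmatch(header_text):
                continue  # not a year column
            if cell is None or cell == "":
                continue  # skip empty cells
            yield {**ids, "Year": int(header_text), "Value": cell}


# Placeholder data for illustration only; these are not reported figures.
wide = [{"Variable": "Example metric", "Unit": "kt", "2019": 1.0, "2020": 2.0}]
long_rows = list(wide_to_long(wide, ["Variable", "Unit"]))
```

A report that mixes WIDE and LONG, as BHP's does, would need extra handling before a step like this applies.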

We could also start a front-end that handles LONG data. That was actually the first thing I tackled with Vale SE. If we can find 10-20 like that (but not BHP yet), that would be good, too.

Once we have a strong ingestion engine, we can go about developing a sector-based approach.

hbaltzell · 2021-11-04T13:24:13Z

OK, when I get some time I'll have a look around

hbaltzell · 2021-11-09T01:10:17Z

Michael, I've been hunting around for more spreadsheets but without much success. There are lots of data tables in PDFs, of course, and I can share more of those, but I bet they're already in the S&P repository. I still hope someone on the team can figure out how to search corporate websites for downloadable xls or csv files (which we could then run a search routine on to see which contain sustainability data). Beyond that, one suggestion is to put out a request to the ESG analysts among our members: the asset managers and, of course, S&P and LSEG. Their analysts would probably know off the tops of their heads which companies publish data in these formats rather than PDF. You could even run a Google survey or similar to collect names or, even better, have these analysts provide the URLs or files. This would be a simple collaborative effort if we can't find this type of data by machine.
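
For the machine-search idea, a first step could look like the sketch below: given the HTML of a page that has already been downloaded, list the links that point at xls, xlsx, or csv files. It uses only the Python standard library. The names `SpreadsheetLinkParser` and `find_spreadsheet_links` are hypothetical, the example.com URLs are placeholders, and fetching pages or checking files for sustainability data is left out.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse

SPREADSHEET_EXTENSIONS = (".xls", ".xlsx", ".csv")


class SpreadsheetLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links that end in a spreadsheet extension."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)  # resolve relative links
        # Compare only the path so query strings do not hide the extension.
        if urlparse(absolute).path.lower().endswith(SPREADSHEET_EXTENSIONS):
            self.links.append(absolute)


def find_spreadsheet_links(html: str, base_url: str) -> List[str]:
    """Return spreadsheet links found in the given HTML, in document order."""
    parser = SpreadsheetLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Placeholder page for illustration only.
sample_html = '<a href="/files/data.XLSX">data</a> <a href="report.pdf">report</a>'
found = find_spreadsheet_links(sample_html, "https://example.com/esg/")
```

Pages that build their download links with JavaScript would not be covered by this approach.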

MichaelTiemannOSC · 2021-11-09T01:12:40Z

Yes, I've found another half-dozen, but the pickings are slim. I have enough to keep me busy for the short term. Thanks!

HeatherAck · 2021-11-09T14:29:24Z

I found a few as well, see attached

entire_abb_csr18.xls
bp-esg-datasheet-2020.xlsx
daimler_sr_2019_kpis_environmental_protection.xls
210223-esg-datapack-2020-excel.xlsx
sap-2020-5year-summary-and-chart-generator-data.xlsx
shape_future_st_sr20.xls
Data_Library_2016_2020.xlsx.xlsx

HeatherAck · 2021-11-09T14:30:26Z

in_depth_sustainability_reporting_castellum_ar19.xls

MichaelTiemannOSC pinned this issue Nov 4, 2021

MichaelTiemannOSC mentioned this issue Nov 17, 2021

Code review and next steps #3

Open
