OH missing earliest years from PDF #561

Open
stucka opened this issue Sep 14, 2023 · 0 comments
Labels
easy An easy task. These are great to start with.

stucka commented Sep 14, 2023

The Ohio scraper has been rebuilt and most of the archives were consolidated into a single CSV for download.

However, the CSV that Big Local News had been hosting contained badly parsed data from the 2015 and 2016 PDFs, including a bunch of junk characters. We could use someone to parse the two PDFs into CSV format so we can add them to our archival data.

The original PDFs are included in the ZIP, as is the then-consolidated snapshot of the CSV:

https://storage.googleapis.com/bln-data-public/warn-layoffs/oh_2015-2022.zip
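
For anyone picking this up, here is a rough sketch of how one might check what is in the ZIP once it has been downloaded. It uses only Python's standard `zipfile` module. The helper name `split_members` is made up for illustration, and the file names inside the archive are not assumed; the function just sorts members by extension.

```python
import zipfile


def split_members(archive):
    """Return (pdf_names, csv_names) for a ZIP given as a path or file-like object.

    Hypothetical helper: it only inspects member names, it does not extract anything.
    """
    with zipfile.ZipFile(archive) as zf:
        # Skip directory entries, which end with a slash
        names = [n for n in zf.namelist() if not n.endswith("/")]
    pdfs = sorted(n for n in names if n.lower().endswith(".pdf"))
    csvs = sorted(n for n in names if n.lower().endswith(".csv"))
    return pdfs, csvs
```

With the archive saved locally as `oh_2015-2022.zip`, calling `split_members("oh_2015-2022.zip")` should list the PDFs that need parsing separately from the consolidated CSV snapshot.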

The current scraper is grabbing 2017-2022 from a CSV similar to the one in the ZIP file here, except that the 2015, 2016, and 2023 data have been purged from it.
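
As a possible starting point for the PDF-to-CSV work, the sketch below shows one way to strip junk characters from extracted table cells and write clean CSV text. Several things here are assumptions and not part of the project: the function names are hypothetical, `pdfplumber` is just one third-party library that could do the extraction (the issue does not name a tool), and the PDFs may not have tables that it detects cleanly. The cleaning and CSV steps use only the standard library.

```python
import csv
import io
import re
import unicodedata


def clean_cell(value):
    """Normalize one extracted cell: drop junk characters, collapse whitespace."""
    if value is None:
        return ""
    # NFKC folds ligatures and non-breaking spaces into plain equivalents
    text = unicodedata.normalize("NFKC", str(value))
    # Newlines and tabs inside a cell become single spaces
    text = re.sub(r"\s+", " ", text)
    # Drop control/format/private-use characters and the U+FFFD replacement char
    text = "".join(
        ch for ch in text
        if unicodedata.category(ch)[0] != "C" and ch != "\ufffd"
    )
    return re.sub(r" +", " ", text).strip()


def rows_to_csv(rows):
    """Clean rows (lists of cells) and return CSV text, skipping empty rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        cleaned = [clean_cell(cell) for cell in row]
        if any(cleaned):
            writer.writerow(cleaned)
    return out.getvalue()


def extract_rows(pdf_path):
    """Yield raw table rows from a PDF.

    Assumption: pdfplumber is installed and finds the tables in these PDFs.
    """
    import pdfplumber  # third-party; imported lazily so the cleaning code runs without it

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                yield from table
```

Something like `rows_to_csv(extract_rows(path_to_pdf))` would then give CSV text for one PDF. The output would still need to be checked by hand against the PDFs, since repeated page headers and rows split across pages are not handled here.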

@stucka stucka added the easy An easy task. These are great to start with. label Sep 14, 2023