Hey to anyone using this! Starting April 2nd, 2020, I will be abandoning this repo for a private, forked version of it. Every so often, I might make some updates to this repo if they're general enough, but I'll probably forget to, if I'm honest. There's still good stuff here, but I no longer want to deal with the hassle of constantly guessing what should be public and what should be private.
Thanks!
This is most of the code from my personal web-scraping Python projects. Feel free to use the source code to learn from, but if you borrow anything, please credit me.
This code is mostly for me. That means I'm still working on it, playing around with it, and I understand things about it that you, a stranger, may not. Although I "try" to document things and be clear, do not assume any of the code works in the way that you think it will.
Additionally, much of this code is aimed at scraping particular websites, so please do not start running stuff willy-nilly. For example, if you want the data from the website I scraped for the manga_updates.py/manga_project project, please just help yourself to what I've collected (everything_json.json and everything_json_issues_slimmed.json).
Please do not try to scrape it again yourself. They don't need multiple people bombarding their site with my code.
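If you just want to poke at the collected data, something like the sketch below should be enough to get started. It assumes everything_json.json is ordinary UTF-8 JSON sitting in your working directory. The helper names (load_collected, summarize) are made up for this example and are not part of the repo, and since the file's layout isn't described here, it only reports the top-level shape.

```python
import json
from pathlib import Path


def load_collected(path):
    """Load one of the collected JSON files and return the parsed object."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def summarize(data):
    """Describe the top-level shape without assuming any particular schema."""
    if isinstance(data, (dict, list)):
        return f"{type(data).__name__} with {len(data)} top-level entries"
    return type(data).__name__


if __name__ == "__main__":
    # Assumed location: adjust to wherever the file sits in your checkout.
    path = Path("everything_json.json")
    if path.exists():
        print(summarize(load_collected(path)))
    else:
        print(f"{path} not found; point this at your copy of the file")
```

The same call should work for everything_json_issues_slimmed.json by swapping the path.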
If you want to see how it's done by running through the code yourself, please limit yourself to ~20 requests per run.
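One way to hold yourself to that limit is to wrap whatever function does the fetching in a small budget counter. This is only a sketch and not something that exists in this repo: BudgetedFetcher, RequestBudgetExceeded, and the one-second pause between requests are all assumptions made for illustration.

```python
import time


class RequestBudgetExceeded(RuntimeError):
    """Raised when a run tries to go past its request allowance."""


class BudgetedFetcher:
    """Wrap a fetch function so a single run cannot exceed max_requests."""

    def __init__(self, fetch, max_requests=20, delay_seconds=1.0, sleep=time.sleep):
        self._fetch = fetch            # any callable that takes a URL
        self.max_requests = max_requests
        self.delay_seconds = delay_seconds
        self._sleep = sleep            # injectable so tests do not really wait
        self.count = 0

    def get(self, url):
        if self.count >= self.max_requests:
            raise RequestBudgetExceeded(
                f"already made {self.count} requests; limit is {self.max_requests}"
            )
        if self.count > 0:
            # Pause between consecutive requests to stay gentle on the site.
            self._sleep(self.delay_seconds)
        self.count += 1
        return self._fetch(url)


if __name__ == "__main__":
    # Stand-in fetch function so this demo makes no network traffic at all.
    fetcher = BudgetedFetcher(lambda url: f"fake page for {url}", delay_seconds=0.0)
    for page in range(25):
        try:
            fetcher.get(f"https://example.com/page/{page}")
        except RequestBudgetExceeded as error:
            print(f"Stopping: {error}")
            break
```

In real use you would pass in your own fetch function (for instance requests.get, if that is what your script already uses) instead of the lambda, and route every request in the run through the same BudgetedFetcher instance.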