mutual-relevance-scraper

A simple PRAW script that scrapes concatenated posts, both relevant and irrelevant to one another, depth-first from random Reddit comment pools. To use it, first set the environment variables REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET, then run it like so:

$ python data.py --length 0.1G --encoding utf-8 -o annotations.txt

Note: although you can redirect stdout instead, using --opath lets data.py remember where it left off and account for progress already made.
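To make the resume behaviour concrete, here is a minimal sketch of how a size target such as `0.1G` could be turned into a byte count and reduced by whatever an existing output file already holds. This is not taken from data.py: the helper names `parse_size` and `remaining_bytes`, the accepted suffixes, and the choice of 1024-based units are all assumptions made for illustration.

```python
import os
import re

# Hypothetical unit table: data.py's real parsing rules are not shown in this README,
# and 1024-based units are an assumption.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert a string such as '0.1G' into a whole number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def remaining_bytes(target, opath):
    """Return how many bytes are still needed, counting what opath already contains."""
    already = os.path.getsize(opath) if os.path.exists(opath) else 0
    return max(target - already, 0)
```

Redirected stdout gives a script no file path to inspect, which may be why an explicit output path is what makes this kind of accounting possible.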

data.py — print supervised fastText annotations to stdout
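For readers unfamiliar with the format, supervised fastText input is plain text with one example per line, where each label is a token prefixed with `__label__`. The sketch below shows how a concatenated pair of posts might be written in that form. The function names and the label names `relevant` and `irrelevant` are hypothetical; the labels data.py actually emits are not stated here.

```python
def to_fasttext_line(label, text):
    """Format one example as '__label__<label> <text>' on a single line."""
    # fastText reads one example per line, so collapse newlines and repeated spaces.
    flat = " ".join(text.split())
    return f"__label__{label} {flat}"


def annotate_pair(first, second, relevant):
    """Concatenate two posts and label whether they are relevant to one another."""
    # Label names are assumptions for illustration only.
    label = "relevant" if relevant else "irrelevant"
    return to_fasttext_line(label, f"{first} {second}")


if __name__ == "__main__":
    print(annotate_pair("Is PRAW\nrate limited?", "Yes, it  is.", True))
```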

TODO: measure toxicity vs. supportiveness. Some replies are constructive and some aren't; we should be able to measure either the absence of attacks or the presence of positive features in responses, though fastText may not be the most accurate tool for this.

kavorite/mutual-relevance-scraper