I started working on this project as a simple demo, with the hope that it would also yield some useful insights into the job market. The current aim is to scrape data from relevant job postings on CareerBeacon and then present the data visually.
- Scrapy: Open-source Python framework for scraping web pages https://scrapy.org/
- Scrapy-Splash: Plugin for Scrapy to render JS-based pages https://github.com/scrapy-plugins/scrapy-splash (a minimal setup is sketched below)
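Since the listing pages are JS-rendered, the two libraries fit together roughly like this. A minimal sketch, assuming a local Splash instance (e.g. started via Docker on port 8050); the spider name, start URL, and selector are placeholders, and the middleware settings come from the scrapy-splash README:

```python
# settings.py entries (per the scrapy-splash README):
#   SPLASH_URL = 'http://localhost:8050'
#   DOWNLOADER_MIDDLEWARES = {
#       'scrapy_splash.SplashCookiesMiddleware': 723,
#       'scrapy_splash.SplashMiddleware': 725,
#       'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
#   }
#   SPIDER_MIDDLEWARES = {'scrapy_splash.SplashDeduplicateArgsMiddleware': 100}
#   DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

import scrapy
from scrapy_splash import SplashRequest

class JobSpider(scrapy.Spider):
    name = 'jobs'  # placeholder name

    def start_requests(self):
        # Render the JS-heavy listing page through Splash before parsing
        yield SplashRequest('https://www.careerbeacon.com/en/search',
                            self.parse, args={'wait': 2})

    def parse(self, response):
        # Placeholder selector; the real page markup needs inspecting
        for title in response.css('.job-title::text').getall():
            yield {'title': title.strip()}
```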
- Clean up spider code
- Remove nested loops if possible
- Add comments
- Format Scrapy output as JSON-style documents for use in MongoDB
- Separate functionality in the single spider into other Python files (e.g. use the pipeline file for exporting tasks)
- Add functionality (in the pipeline) to reject poorly formatted results or duds that slipped through the scrape (a pipeline sketch follows this list)
- Implement wait times in the spider to avoid accidentally DoS'ing a site (throttling settings are sketched below)
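The export and rejection items above could live in one item pipeline. A rough sketch, assuming pymongo and a local MongoDB instance; the database/collection names and the validation rule are placeholders:

```python
# pipelines.py: validate scraped items, then store them as documents in MongoDB
import pymongo
from scrapy.exceptions import DropItem

class MongoExportPipeline:
    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Read connection details from settings so the spider stays decoupled
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI', 'mongodb://localhost:27017'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'jobscraper'),
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Reject duds: items missing the fields the visualizations will rely on
        if not item.get('title') or not item.get('description'):
            raise DropItem(f'Poorly formatted item dropped: {item!r}')
        self.db['postings'].insert_one(dict(item))
        return item
```

The pipeline would then be enabled through `ITEM_PIPELINES` in settings.py.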
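For the wait times, Scrapy's built-in throttling should be enough. These are real Scrapy settings, but the values are guesses to tune later:

```python
# settings.py: polite-crawling knobs (illustrative values)
DOWNLOAD_DELAY = 2                    # seconds between requests to the same domain
CONCURRENT_REQUESTS_PER_DOMAIN = 1    # no parallel hammering of one site
AUTOTHROTTLE_ENABLED = True           # back off automatically if the server slows down
ROBOTSTXT_OBEY = True
```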
- Plan out use of other libraries
- MongoDB for results storage and ease of access
- Accessing MongoDB with NodeJS
- Angular & Express, React, or Electron (desktop) for the front end
- Using D3 to represent keyword frequency visually (e.g. most commonly requested skills), or any other useful visualizations (a frequency-counting sketch follows this list)
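As a stepping stone to the D3 visualization, keyword frequencies can be computed straight from the stored results. A sketch using pymongo; the collection name, field name, and skill list are placeholders:

```python
# keyword_freq.py: count skill keywords across stored postings (hypothetical schema)
from collections import Counter

import pymongo

SKILLS = ['python', 'javascript', 'sql', 'react', 'docker']  # placeholder keyword list

client = pymongo.MongoClient('mongodb://localhost:27017')
postings = client['jobscraper']['postings']

counts = Counter()
for doc in postings.find({}, {'description': 1}):
    text = doc.get('description', '').lower()
    counts.update(skill for skill in SKILLS if skill in text)

# This is the shape of data the front end would serve to D3, e.g. as JSON
print(counts.most_common())
```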
- After the stack is figured out using low-volume scrape results, scale up to scraping the site's full set of results
- Also implement user input (see the spider-argument sketch at the end) for ...
- Job search keyword
- Title keywords to accept / reject in search results
- Option to scrape other websites for similar data (probably not though)
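The user-input items could enter as spider arguments at launch time. A minimal sketch; the argument names and the filtering rule are placeholders:

```python
# Launch with, e.g.:
#   scrapy crawl jobs -a keyword="data analyst" -a reject="senior,manager"
import scrapy

class JobSpider(scrapy.Spider):
    name = 'jobs'  # placeholder name

    def __init__(self, keyword='', reject='', *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.keyword = keyword
        # Comma-separated title keywords to reject
        self.reject = [w.strip().lower() for w in reject.split(',') if w.strip()]

    def keep(self, title):
        # Drop results whose titles contain any rejected keyword
        return not any(word in title.lower() for word in self.reject)
```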