Gavsum/JobScrape
JobScrape Project

I started this project as a simple demo and to (hopefully) yield some useful insights into the job market. The current aim is to scrape data from relevant job postings on CareerBeacon and then present the data visually.

Libraries in Use So Far

  • Scrapy

TODO

  • Clean up spider code
    • Remove nested loops if possible
    • Add comments
  • Format Scrapy output into a JSON-like structure for use in MongoDB
  • Separate functionality in the single spider into other Python files (e.g. use the pipeline file for exporting tasks)
  • Add functionality (in the pipeline) to reject poorly formatted results, or duds that were scraped
  • Implement wait times in the spider to avoid accidentally DoS'ing a site
  • Plan out use of other libraries
    • MongoDB for results storage and ease of access
    • Accessing MongoDB with NodeJS
    • Angular & Express, React, or Electron (desktop) for the front end
    • D3 to represent keyword frequency visually (e.g. most commonly requested skills), or any other useful visualizations
  • After the stack is figured out using low-volume scrape results, scale up to scraping an entire site's worth of results
  • Implement user input for ...
    • Job search keyword
    • Title keywords to accept / reject in search results
    • Option to scrape other websites for similar data (probably not though)
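One way to approach the "reject poorly formatted results" item above is a small validation helper that a Scrapy pipeline could call before exporting. This is only a sketch: the field names (`title`, `company`, `url`) are assumptions, since the project's actual item schema isn't shown here.

```python
# Sketch of a validation helper for a Scrapy pipeline.
# The required field names are assumptions, not the project's real schema.
REQUIRED_FIELDS = ("title", "company", "url")

def is_valid_posting(item):
    """Return True if the scraped item has all required fields,
    each a non-empty string after stripping whitespace."""
    for field in REQUIRED_FIELDS:
        value = item.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    return True
```

In a real pipeline, `process_item` would raise `scrapy.exceptions.DropItem` when `is_valid_posting` returns False, so duds never reach the exporter.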
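For the wait-times item, Scrapy already ships throttling settings, so no hand-rolled sleep logic is needed. A possible `settings.py` fragment is below; the values are illustrative examples, not tuned for any particular site.

```python
# Illustrative Scrapy settings.py fragment; values are examples, not tuned.
DOWNLOAD_DELAY = 2.0             # seconds between requests to the same domain
RANDOMIZE_DOWNLOAD_DELAY = True  # jitter the delay to look less bot-like
CONCURRENT_REQUESTS_PER_DOMAIN = 1
AUTOTHROTTLE_ENABLED = True      # adapt the delay to server response times
AUTOTHROTTLE_START_DELAY = 2.0
ROBOTSTXT_OBEY = True            # respect the site's robots.txt
```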
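For the D3 keyword-frequency idea, the counting step could happen in Python before handing JSON to the front end. A minimal sketch, assuming postings arrive as plain-text strings (the sample data and keyword list are made up):

```python
from collections import Counter

def keyword_frequencies(postings, keywords):
    """Count how many postings mention each keyword (case-insensitive).

    postings: iterable of posting body strings
    keywords: iterable of keywords to look for
    Returns a dict mapping keyword -> number of postings containing it.
    """
    counts = Counter()
    for text in postings:
        lowered = text.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                counts[kw] += 1
    return dict(counts)
```

The resulting dict serializes directly to JSON, which is the shape a D3 bar chart of most-requested skills would consume.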

About

Job posting web scraper project
