This project was developed as the final assignment for a Parallel Computing class at HUS. It focuses on using parallel computing techniques to speed up web scraping: large amounts of data are extracted from web pages in less time by dividing the workload among multiple processes running in parallel. Parallel computing has become increasingly important for web scraping as the amount of data available on the web continues to grow rapidly. This project aims to demonstrate those benefits and to provide a practical example of their implementation.
More specifically, the project scrapes job listings related to the Java programming language from the itviec.com website. The extracted data includes job titles, company names, locations, and other relevant information. Parallelizing the scraping process makes it faster and more efficient, allowing a larger volume of data to be extracted in a shorter period of time. The time library is used to measure how long it takes to scrape all available job listings, and every scraped job is saved to a .csv file.
Python libraries used:
- requests
- BeautifulSoup
- multiprocessing
- time (optional)
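A minimal sketch of the approach described above: a multiprocessing.Pool fans page fetches out across worker processes, each worker downloads one results page with requests and parses it with BeautifulSoup, and the combined rows are written to a .csv file while the time library measures the total duration. The listing URL, page range, and CSS selectors (`div.job`, `h3.title`, `span.company`, `span.location`) are assumptions for illustration, not the actual structure of itviec.com, and would need adjusting against the live site.

```python
import csv
import time
from multiprocessing import Pool

import requests
from bs4 import BeautifulSoup

# Assumed listing URL -- the real itviec.com Java-jobs URL may differ.
BASE_URL = "https://itviec.com/it-jobs/java"


def parse_jobs(html):
    """Extract (title, company, location) tuples from one listing page.

    The CSS selectors below are placeholders for illustration; the real
    page layout must be inspected to find the correct ones.
    """
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for card in soup.select("div.job"):
        title = card.select_one("h3.title")
        company = card.select_one("span.company")
        location = card.select_one("span.location")
        jobs.append((
            title.get_text(strip=True) if title else "",
            company.get_text(strip=True) if company else "",
            location.get_text(strip=True) if location else "",
        ))
    return jobs


def scrape_page(page):
    """Fetch one results page and return its parsed job rows."""
    resp = requests.get(BASE_URL, params={"page": page}, timeout=10)
    resp.raise_for_status()
    return parse_jobs(resp.text)


if __name__ == "__main__":
    start = time.time()
    # Divide the pages among 4 worker processes running in parallel.
    with Pool(processes=4) as pool:
        results = pool.map(scrape_page, range(1, 6))  # pages 1..5

    # Flatten the per-page lists and write everything to a .csv file.
    rows = [row for page_rows in results for row in page_rows]
    with open("java_jobs.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "company", "location"])
        writer.writerows(rows)

    print(f"Scraped {len(rows)} jobs in {time.time() - start:.1f}s")
```

Because fetching a page is I/O-bound and parsing is CPU-bound, `pool.map` keeps all workers busy: each process handles its own request/parse cycle independently, which is where the speedup over a sequential loop comes from.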