dba-scraper

A Python 3 based web scraper.

This code is an update to the previous scraper written in Python 2.7.

There are some differences in library imports and calls compared with the Python 2.7 version. To start, set up the Python 3 virtual environment:

$ python3 -m venv env

$ source env/bin/activate
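To confirm that the scraper is running inside the virtual environment rather than the system Python, a quick check like the one below can help. It is a minimal sketch and not part of the repository; the helper name `in_virtualenv` is hypothetical. It relies on the standard behaviour of `venv`: inside an environment, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base installation.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside an environment created with `python3 -m venv`."""
    # In a venv, sys.prefix is the env directory; sys.base_prefix is the base Python.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}, virtualenv active: {in_virtualenv()}")
```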

We use urllib3 to handle the page requests, so install urllib3:

(env)$ pip install urllib3
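A page request with urllib3 might look roughly like the sketch below. It is an illustration, not the code in `dbascape.py`: the function name `fetch_html`, the User-Agent string, the timeouts and the assumption that pages are UTF-8 encoded are all choices made for this example.

```python
import urllib3

# One PoolManager can be reused for every request; it keeps connections
# to the same host alive between page downloads.
http = urllib3.PoolManager()


def fetch_html(url: str) -> str:
    """Download a page with urllib3 and return its body as text (hypothetical helper)."""
    response = http.request(
        "GET",
        url,
        # Assumption: a browser-like User-Agent; some sites reject unknown clients.
        headers={"User-Agent": "Mozilla/5.0 (dba-scraper example)"},
        timeout=urllib3.Timeout(connect=5.0, read=10.0),
    )
    if response.status != 200:
        raise RuntimeError(f"GET {url} returned HTTP {response.status}")
    # response.data holds the raw bytes; assume the page is UTF-8 encoded.
    return response.data.decode("utf-8", errors="replace")
```

Calling `fetch_html("https://www.dba.dk/saelger/privat/dba/5683282/?page=1")` would return the raw HTML of that page, which can then be handed to Beautiful Soup.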

Install the Beautiful Soup package:

(env)$ pip install beautifulsoup4
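Once a page has been downloaded, Beautiful Soup turns the HTML into a searchable tree. The sketch below pulls the text out of every element matching a CSS selector. The helper name `extract_texts` and the `li.ad` markup in the example are hypothetical; DBA's real markup is not described in this README, so the selector would need to be adapted to the actual page.

```python
from bs4 import BeautifulSoup


def extract_texts(html: str, selector: str) -> list[str]:
    """Return the stripped text of every element matching a CSS selector."""
    # "html.parser" is Python's built-in parser, so no extra package is needed.
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(selector)]


if __name__ == "__main__":
    sample = '<ul><li class="ad">Sofa</li><li class="ad"> Lamp </li></ul>'
    print(extract_texts(sample, "li.ad"))
```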

The URL we are scraping is from the Danish website DBA:

https://www.dba.dk/saelger/privat/dba/5683282/?page=1

This is the first page of listings for this advertiser.
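Since the URL ends in `?page=1`, later pages can presumably be reached by changing that query parameter. The sketch below builds the URLs for pages 1 to N under that assumption; the names `BASE_URL`, `page_url` and `page_urls` are hypothetical, and the number of pages must be known or discovered separately.

```python
from urllib.parse import urlencode

# Seller listing URL from this README; only the "page" parameter changes.
BASE_URL = "https://www.dba.dk/saelger/privat/dba/5683282/"


def page_url(page: int) -> str:
    """Build the listing URL for one page (assumes the ?page=N pattern)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return f"{BASE_URL}?{urlencode({'page': page})}"


def page_urls(last_page: int) -> list[str]:
    """Build the URLs for pages 1..last_page inclusive."""
    return [page_url(n) for n in range(1, last_page + 1)]
```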

Selenium

I have added Selenium to this project so that it can scrape web pages that rely on JavaScript. On these pages, the initial HTML loads first, and JavaScript then requests the ad content, so the ads are not present in the HTML that a plain HTTP request returns.

Therefore, Beautiful Soup alone is not enough: the page must first be rendered in a browser controlled by Selenium, and only then can its content be parsed.
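One way to combine the two is sketched below: Selenium opens the page in headless Chrome, waits until the JavaScript has inserted the ad content, and returns the rendered HTML. This is an assumption-laden illustration, not the repository's implementation. The function name `fetch_rendered_html` is hypothetical, `ready_selector` is a placeholder for whatever CSS selector matches an ad on the real page, and the code assumes Chrome and a compatible driver are available.

```python
def fetch_rendered_html(url: str, ready_selector: str, timeout: float = 15.0) -> str:
    """Load a JavaScript-driven page in headless Chrome and return the rendered HTML."""
    # Imported here so the rest of the scraper still loads where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until JavaScript has added at least one element matching the selector;
        # raises TimeoutException if nothing appears within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ready_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait fails
```

The returned string can be parsed exactly like a urllib3 response, for example with `BeautifulSoup(html, "html.parser")`. Waiting on a specific element, rather than sleeping for a fixed time, returns as soon as the ads have loaded.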

motethansen/dba-scraper

Folders and files

Latest commit

History

Repository files navigation

dba-scraper

Selenium

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages