scrapews

A news scraper written in Python using the requests and lxml packages.

from scrapews.scrapers import NewYorkTimes


ny_scraper = NewYorkTimes()

ny_scraper.scrape()
ny_scraper.send_to_server()

print(ny_scraper.data.get('articles'))

Idea

The core idea of the scrapews scraper is to request the HTML of a news site and use XPath expressions to extract the primary information about each article, such as its title, description and URL.
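As a rough illustration of XPath-based extraction, here is a minimal sketch using lxml. It is not the project's actual code: the sample HTML, the XPath expressions, the `extract_articles` helper and the `base_url` value are all assumptions made for this example. In real use the page source would come from something like `requests.get(url).text` rather than an inline string.

```python
from lxml import html

# Hypothetical page source; a real scraper would download this with requests.
SAMPLE_HTML = """
<html><body>
  <article>
    <h2><a href="/2024/01/01/first-story.html">First story</a></h2>
    <p class="summary">Summary of the first story.</p>
  </article>
  <article>
    <h2><a href="/2024/01/02/second-story.html">Second story</a></h2>
    <p class="summary">Summary of the second story.</p>
  </article>
</body></html>
"""


def extract_articles(page_source, base_url="https://example.com"):
    """Return a list of dicts with title, description and url (hypothetical helper)."""
    tree = html.fromstring(page_source)
    articles = []
    # One <article> element per news item in this made-up markup.
    for node in tree.xpath("//article"):
        # string(...) makes XPath return text instead of a list of nodes.
        title = node.xpath("string(.//h2/a)").strip()
        description = node.xpath("string(.//p[@class='summary'])").strip()
        href = node.xpath("string(.//h2/a/@href)")
        articles.append(
            {"title": title, "description": description, "url": base_url + href}
        )
    return articles


articles = extract_articles(SAMPLE_HTML)
for article in articles:
    print(article["title"], article["url"])
```

Each news site needs its own XPath expressions, since the markup around titles and links differs from one site to another.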

Combined with a RESTful API service, the scraper can be used to feed a content aggregator app, for example.
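A sketch of what handing scraped articles to such an API could look like is shown below. The endpoint URL, the payload shape and the `build_articles_request` helper are assumptions for illustration, not the behaviour of the project's send_to_server method. The standard library's urllib is used here in place of requests so that the request can be built and inspected without being sent.

```python
import json
from urllib.request import Request


def build_articles_request(articles, endpoint):
    """Build (but do not send) a JSON POST request carrying the articles."""
    body = json.dumps({"articles": articles}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical article and endpoint, for illustration only.
sample_articles = [
    {
        "title": "First story",
        "description": "Summary of the first story.",
        "url": "https://example.com/first-story.html",
    }
]
request = build_articles_request(sample_articles, "https://example.com/api/articles/")

# Passing the request to urllib.request.urlopen would actually send it.
print(request.get_method(), request.full_url)
```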

See the base_scraper class for a closer look at how the code works.

Installation

First, clone this repo

git clone https://github.com/mateusvictor/scrapews.git

Change into the project directory

cd scrapews/

Create a virtualenv in the project directory

python -m venv venv

Activate the virtualenv (the command below is for Windows; on Linux or macOS, run source venv/bin/activate instead)

venv\Scripts\activate.bat

Install the project dependencies

pip install -r requirements.txt
