scrapews

A news scraper made in Python using the packages requests and lxml.

from scrapews.scrapers import NewYorkTimes


ny_scraper = NewYorkTimes()

ny_scraper.scrape()
ny_scraper.send_to_server()

print(ny_scraper.data.get('articles'))

Idea

The core idea of the scrapews scraper is to request the HTML of a news site and use XPath expressions to extract the primary information about each article, such as its title, description and URL.
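The extraction step can be sketched as follows. This is a minimal illustration, not the project's actual code: scrapews uses requests and lxml, while this sketch uses the standard library's xml.etree.ElementTree (which supports only a limited XPath subset and needs well-formed markup) so that it runs without dependencies. The sample markup, the XPath expressions and the extract_articles helper are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded news page.
SAMPLE_HTML = """
<html>
  <body>
    <article>
      <h2><a href="https://example.com/a1">First headline</a></h2>
      <p class="summary">Short description of the first story.</p>
    </article>
    <article>
      <h2><a href="https://example.com/a2">Second headline</a></h2>
      <p class="summary">Short description of the second story.</p>
    </article>
  </body>
</html>
"""


def extract_articles(html_text):
    """Return a list of dicts with the title, description and url of each article."""
    root = ET.fromstring(html_text.strip())
    articles = []
    # One XPath expression selects the article nodes; relative
    # expressions then pick the fields out of each node.
    for node in root.findall(".//article"):
        link = node.find("./h2/a")
        summary = node.find("./p[@class='summary']")
        articles.append({
            "title": link.text.strip(),
            "description": summary.text.strip(),
            "url": link.get("href"),
        })
    return articles


articles = extract_articles(SAMPLE_HTML)
for article in articles:
    print(article)
```

Real news pages are rarely well-formed XML, which is one reason a forgiving HTML parser such as lxml is the better fit in practice.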

Combined with a RESTful API service, the scraper can be used to feed a content aggregator app, for example.
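As a rough sketch of how scraped articles might be handed to such an API, the snippet below builds a JSON POST request with the standard library. The endpoint URL, the payload shape and the build_post_request helper are assumptions for illustration; the project's own send_to_server method may work differently.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the URL of your own API service.
API_URL = "https://example.com/api/articles/"


def build_post_request(articles, url=API_URL):
    """Build (but do not send) a JSON POST request carrying the articles."""
    body = json.dumps({"articles": articles}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_post_request([
    {
        "title": "First headline",
        "description": "Short description of the first story.",
        "url": "https://example.com/a1",
    },
])

print(request.get_method(), request.full_url)
# Passing the request to urllib.request.urlopen() would actually send it.
```

Building the request separately from sending it keeps the payload easy to inspect before any network traffic happens.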

Check out the base_scraper class for a better understanding of the code.

Installation

  • First, clone this repo
git clone https://github.com/mateusvictor/scrapews.git
  • Change into the project directory
cd scrapews/
  • Create a virtualenv in the project directory
python -m venv venv
  • Activate the virtualenv (the command below is for Windows; on Linux or macOS, use source venv/bin/activate)
venv\Scripts\activate.bat
  • Install the project dependencies
pip install -r requirements.txt