duiliuliu/simple-spiders

simple Spider

python -> 3.4+ coverage -> 37% build -> passing

     _              _       ___       _    _
 ___<_>._ _ _  ___ | | ___ / __> ___ <_> _| | ___  _ _
<_-<| || ' ' || . \| |/ ._>\__ \| . \| |/ . |/ ._>| '_>
/__/|_||_|_|_||  _/|_|\___.<___/|  _/|_|\___|\___.|_|
              |_|               |_|

Chinese version (中文)

Overview

A simple web crawling framework. Document

Getting Started

pip install sspider

Create a project.py to suit your needs:

   >>> from sspider import Spider, Request
   >>> # Create the request object
   >>> request = Request('get', 'https://movie.douban.com/subject/27202819/reviews')
   >>> # Create the spider object
   >>> spider = Spider()
   >>> # Run the spider
   >>> spider.run(request)
   ...
   >>> # Save the crawl results
   >>> spider.write('test.txt')

python project.py

Ctrl-C to stop

Referenced Document

Referenced Libraries

  • requests is used as the htmlDownloader
  • lxml is used as the default htmlParser
  • csv is used to export results as CSV files
  • xlwt is used to export results as Excel (.xls) files
  • xlsxwriter is used to export results as Excel (.xlsx) files
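
To illustrate the CSV export mentioned in the list above, here is a minimal sketch that uses only the standard-library csv module. It does not show how sspider implements its exporter. The helper name rows_to_csv and the sample records are hypothetical and chosen only for illustration.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row.

    Hypothetical helper for illustration; not part of sspider.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up records standing in for scraped results.
reviews = [
    {"author": "alice", "rating": 5},
    {"author": "bob", "rating": 3},
]

csv_text = rows_to_csv(reviews, ["author", "rating"])
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch free of file-system side effects. To write to disk instead, open the target file with newline="" and pass it to csv.DictWriter in place of the buffer.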

Project structure


License

This project is released as open source under a license agreement. If you modify it, please keep your release open source and credit the original author. Thank you for your respect.

To use this project for commercial purposes, please contact me (@pengr) separately to obtain commercial authorization.
