_ _ ___ _ _
___<_>._ _ _ ___ | | ___ / __> ___ <_> _| | ___ _ _
<_-<| || ' ' || . \| |/ ._>\__ \| . \| |/ . |/ ._>| '_>
/__/|_||_|_|_|| _/|_|\___.<___/| _/|_|\___|\___.|_|
|_| |_|
A simple web crawling framework.
Document
pip install sspider
You should construct project.py to suit your needs:
>>> from sspider import Spider, Request
>>> # Create the Request object
>>> request = Request('get', 'https://movie.douban.com/subject/27202819/reviews')
>>> # Create the Spider object
>>> spider = Spider()
>>> # Run the spider
>>> spider.run(request)
>>> # Save the crawl results
>>> spider.write('test.txt')
python project.py
Ctrl-C to stop
- Project document
- Blog document
- Using requests as htmlDownloader
- Using lxml as default htmlParser
- Using csv to export results as CSV files
- Using xlwt to export results as Excel (.xls) files
- Using xlsxwriter to export results as Excel (.xlsx) files
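
As a rough illustration of the CSV export feature listed above, here is a minimal sketch that uses only the standard-library csv module. It is not sspider's actual implementation: the function name export_csv, the field names, and the sample rows are all hypothetical, chosen only to show how crawled records could be written out as CSV.

```python
import csv
import io


def export_csv(rows, fieldnames, out):
    """Write a list of dict rows to a file-like object as CSV.

    Hypothetical helper for illustration; sspider's own export code may differ.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical crawl results (made-up sample data, not real output).
rows = [
    {"title": "Review A", "rating": "5"},
    {"title": "Review B, part 2", "rating": "3"},
]

# Write to an in-memory buffer; a real script would pass an open file instead.
buffer = io.StringIO()
export_csv(rows, ["title", "rating"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, open the target with open('test.csv', 'w', newline='') and pass that file object as out. Fields that contain commas are quoted automatically by the csv module.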
This project is published as open source under agreement. If you modify it, please keep your release open source and credit the original author. Thank you for your respect.
If you want to use this project for commercial purposes, please contact me (@pengr) separately to obtain commercial authorization.