simpleSpider
A simple web crawler framework (see the detailed documentation).
pip install sspider
Write a project.py script tailored to your needs. For example:
>>> from sspider import Spider, Request
>>> # Create a Request object
>>> request = Request('get', 'https://movie.douban.com/subject/27202819/reviews')
>>> # Create a Spider object
>>> spider = Spider()
>>> # Run the spider
>>> spider.run(request)
...
>>> # Save the crawl results
>>> spider.write('test.txt')
Then run the script:
python project.py
Press Ctrl-C to stop.
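To make the create-request, run, write flow above concrete, here is a toy, dependency-free model of it. It is an illustration under stated assumptions, not sspider's implementation: the names `ToySpider` and `fetch` are hypothetical, the `Request` dataclass is only a stand-in for sspider's own class, and the downloader is passed in as a plain function so the sketch runs offline. It also shows one way a script can treat Ctrl-C (`KeyboardInterrupt`) as "stop crawling but keep what was collected".

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    # Hypothetical stand-in mirroring Request('get', url) from the example above
    method: str
    url: str


class ToySpider:
    """Toy model of the run/write workflow; NOT sspider's real implementation."""

    def __init__(self, fetch: Callable[[Request], str]):
        # fetch is a hypothetical downloader: takes a Request, returns page text
        self.fetch = fetch
        self.results: List[str] = []

    def run(self, *requests: Request) -> None:
        try:
            for request in requests:
                self.results.append(self.fetch(request))
        except KeyboardInterrupt:
            # Ctrl-C stops crawling, but results gathered so far are kept
            pass

    def write(self, path: str) -> None:
        # One result per line, in the order the requests were processed
        with open(path, "w", encoding="utf-8") as f:
            for item in self.results:
                f.write(item + "\n")


def fake_fetch(request: Request) -> str:
    # Offline stand-in for an HTTP download (hypothetical)
    return f"{request.method.upper()} {request.url}"


spider = ToySpider(fake_fetch)
spider.run(Request("get", "https://example.com/a"), Request("get", "https://example.com/b"))
print(spider.results)
```

Injecting the downloader keeps the fetching step swappable; in sspider that role is filled by requests.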
- Uses requests as the HTML downloader
- Uses lxml as the default HTML parser
- Uses csv to export results as CSV files
- Uses xlwt to export results as Excel (.xls) files
- Uses xlsxwriter to export results as Excel (.xlsx) files
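The CSV export mentioned in the list can be sketched with Python's standard csv module. This is a minimal sketch assuming scraped items are dicts sharing the same keys; the helper name `export_csv` and the field names `title` and `score` are hypothetical and not part of sspider's API.

```python
import csv
import io
from typing import Dict, List, TextIO


def export_csv(rows: List[Dict[str, str]], out: TextIO) -> None:
    """Write dict records as CSV, header row first (hypothetical helper)."""
    if not rows:
        return
    # Column order follows the keys of the first record
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


buf = io.StringIO()
export_csv([{"title": "A", "score": "8.5"}, {"title": "B, C", "score": "7"}], buf)
print(buf.getvalue())
```

When writing to a real file instead of a StringIO, open it with `newline=""` as the csv documentation recommends, so the writer's own line endings are not translated twice.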
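This project is released under an open-source license. If you modify it, please keep your version open source and credit the original author. Thank you for your respect.
If you need to use this project for commercial purposes, please contact me (@pengr) separately to obtain a commercial license.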