A web parser focused on extracting product information from Japanese ACG figure sites.
pip install figure_parser
from pprint import pprint
import requests as rq
from bs4 import BeautifulSoup
from figure_parser.exceptions import FigureParserException
from figure_parser.factories import GeneralBs4ProductFactory
factory = GeneralBs4ProductFactory.create_factory()
url = "https://www.goodsmile.info/ja/product/11246/PA+15+%E9%AB%98%E6%A0%A1%E8%83%B8%E3%82%AD%E3%83%A5%E3%83%B3%E7%89%A9%E8%AA%9E.html"
resp = rq.get(url)
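try:
    # Parse the fetched page into a product object and print its fields.
    product = factory.create_product(resp.url, BeautifulSoup(resp.content, 'lxml'))
    pprint(product.dict())
except FigureParserException as e:
    # Raised by figure_parser when the page cannot be parsed.
    print(e)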
{'adult': False,
'category': 'フィギュア',
'copyright': '© SUNBORN Network Technology Co., Ltd. © SUNBORN Japan Co., '
'Ltd.',
'distributer': 'グッドスマイルカンパニー',
'images': ['https://images.goodsmile.info/cgm/images/product/20210521/11246/85011/large/346a0402da0a835b6969105e77c7bf7f.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85012/large/e1fb5ad64d58498477611082c7219759.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85013/large/cad59d379e0ac60b8d386eee93253502.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85014/large/4e4957b4783cc9b8cc6e6101aaf346b3.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85015/large/9bf879603be71259f2d673a84d1b3b2a.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85016/large/f464915a47d744441a0574e97016e8d0.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85017/large/f8ae4c2ebfb05d3b3c2c9a427d9dd9af.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85018/large/d03ebd90e1fd832e5909deba3c78432c.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85019/large/1a4421435c14c53857d5125c0f3da4aa.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85020/large/eede36da01b9ab86ba35a3e5f30a8394.jpg',
'https://images.goodsmile.info/cgm/images/product/20210521/11246/85021/large/8afccdd56243497830857ec612374266.jpg'],
'jan': None,
'manufacturer': 'Phat!',
'name': 'PA-15 高校胸キュン物語',
'og_image': 'http://images.goodsmile.info/cgm/images/product/20210521/11246/85023/medium/b1a1a49e9bb72ebd95670ca757e22735.jpg',
'order_period': {'end': datetime.datetime(2021, 7, 7, 21, 0),
'start': datetime.datetime(2021, 5, 27, 12, 0)},
'paintworks': ['緋色 (scarlet)'],
'releaser': 'ファット・カンパニー',
'releases': [{'announced_at': None,
'price': 19800,
'release_date': datetime.date(2022, 12, 1),
'tax_including': True}],
'rerelease': False,
'scale': 7,
'sculptors': ['Phat!'],
'series': 'ドールズフロントライン',
'size': 280,
'thumbnail': 'http://images.goodsmile.info/cgm/images/product/20210521/11246/85023/medium/b1a1a49e9bb72ebd95670ca757e22735.jpg',
'url': 'https://www.goodsmile.info/ja/product/11246/PA+15+%E9%AB%98%E6%A0%A1%E8%83%B8%E3%82%AD%E3%83%A5%E3%83%B3%E7%89%A9%E8%AA%9E.html'}
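The dictionary above is plain Python data, so it can be post-processed without further library calls. The following is a minimal sketch, assuming a dict shaped like the output shown; the `summarize` and `order_is_open` helpers are hypothetical and not part of figure_parser, and treating the last entry of `releases` as the most recent one is an assumption.

```python
import datetime

# A hand-copied subset of the output shown above, for illustration only.
product = {
    "name": "PA-15 高校胸キュン物語",
    "manufacturer": "Phat!",
    "order_period": {
        "start": datetime.datetime(2021, 5, 27, 12, 0),
        "end": datetime.datetime(2021, 7, 7, 21, 0),
    },
    "releases": [
        {
            "announced_at": None,
            "price": 19800,
            "release_date": datetime.date(2022, 12, 1),
            "tax_including": True,
        }
    ],
}


def summarize(data: dict) -> str:
    """Build a one-line summary from a product dict (hypothetical helper)."""
    # Assumption: the last entry in "releases" is the most recent release.
    latest = data["releases"][-1]
    tax = "tax included" if latest["tax_including"] else "tax excluded"
    return (
        f"{data['name']} by {data['manufacturer']}: "
        f"{latest['price']} ({tax}), "
        f"release {latest['release_date'].isoformat()}"
    )


def order_is_open(data: dict, at: datetime.datetime) -> bool:
    """Return True if `at` falls inside the order period (hypothetical helper)."""
    period = data["order_period"]
    return period["start"] <= at <= period["end"]


print(summarize(product))
print(order_is_open(product, datetime.datetime(2021, 6, 1)))
```

Both helpers read only keys that appear in the sample output, so they would need guards for products whose `releases` list is empty or whose `order_period` values are `None`.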
This project uses Poetry as its package manager.
Install dependencies
poetry install
Activate the virtual environment
poetry shell
Install pre-commit
pre-commit install
Generate a new parser (the name should be in snake_case)
python cli.py generate new_site
After generating the new site, the test data can be found here.
Run the tests and report coverage
tox
coverage combine
coverage report -m
Type check
mypy
Lint the code
isort -e .
black .
If you add or update dependencies
poetry export --without-hashes --dev -f requirements.txt --output requirements.txt
If you use the Makefile, it provides several useful commands.
clean-test-cache    Clean cache of test.
cov-report          Show the coverage of tests.
format              Format the code.
freeze              Export the requirements.txt file.
help                Show this help message.
install             Install requirements of project.
lint                Lint the code.
test                Run the tests.
type-check          Type check with mypy.