new utils #17

gustavorps · 2017-09-14T10:47:00Z


```python
from urllib.parse import urlparse


def url_is_from_any_domain(url, domains):
    """Return True if the url belongs to any of the given domains.

    Reference: https://github.com/scrapy/scrapy/blob/7e8453cf1ec992e5df5cebfeda08552c58e7c9bc/scrapy/utils/url.py#L28
    """
    # Scrapy's parse_url() is a thin wrapper around urlparse().
    host = urlparse(url).netloc.lower()
    if not host:
        return False
    domains = [d.lower() for d in domains]
    # Match the domain itself or any subdomain of it (a dot boundary is required).
    return any(host == d or host.endswith('.%s' % d) for d in domains)


def url_is_from_a_spider(url, spider):
    """Return True if the url belongs to the given spider."""
    # The spider's name is treated as a domain alongside its allowed_domains.
    return url_is_from_any_domain(
        url, [spider.name] + list(getattr(spider, 'allowed_domains', [])))
```
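A quick self-contained check of the matching rules, as a sketch that assumes the standard-library `urllib.parse.urlparse` in place of Scrapy's `parse_url`. The `SimpleNamespace` object is a hypothetical stand-in for a spider that carries only the two attributes the helper reads. Note the port case: because the check compares `netloc` rather than the bare hostname, a URL with an explicit port does not match.

```python
from types import SimpleNamespace
from urllib.parse import urlparse


def url_is_from_any_domain(url, domains):
    """Return True if the url's netloc equals, or is a subdomain of, any domain."""
    host = urlparse(url).netloc.lower()
    if not host:
        return False
    domains = [d.lower() for d in domains]
    # Require a dot boundary so "evilexample.com" does not match "example.com".
    return any(host == d or host.endswith("." + d) for d in domains)


def url_is_from_a_spider(url, spider):
    """Return True if the url belongs to the spider's name or allowed_domains."""
    return url_is_from_any_domain(
        url, [spider.name] + list(getattr(spider, "allowed_domains", []))
    )


# Hypothetical spider stand-in: only the attributes the helper reads.
spider = SimpleNamespace(name="example.com", allowed_domains=["example.org"])

print(url_is_from_any_domain("http://WWW.Example.COM/page", ["example.com"]))  # True
print(url_is_from_any_domain("http://evilexample.com/", ["example.com"]))      # False
print(url_is_from_any_domain("http://example.com:8080/", ["example.com"]))     # False (port is part of netloc)
print(url_is_from_a_spider("https://shop.example.org/item", spider))           # True
print(url_is_from_a_spider("https://example.net/", spider))                    # False
```

If port-insensitive matching is wanted, comparing `urlparse(url).hostname` instead of `netloc` would drop the port and any user info; that is a behavior change from the referenced Scrapy code, not what it does.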

gustavorps added the enhancement label Sep 14, 2017