A Python library for scraping/checking/fetching/storing proxies. 🎭

gagan1510/greendeck-proxygrabber

greendeck-proxygrabber 🎭

This package is developed by Greendeck

Install from pip

https://pypi.org/project/greendeck-proxygrabber/

pip install greendeck-proxygrabber


WHAT'S NEW?

Added proxy grabbing support for 4 new regions to the proxy service, proxy grabber, and proxy scraper.


👉 What is proxy service?

Proxy service keeps a MongoDB database stocked with up-to-date, working proxies and refreshes it at a fixed interval.

👉 How to use?

# import the service class
from greendeck_proxygrabber import ProxyService
service = ProxyService(MONGO_URI = 'mongodb://127.0.0.1:27017',
                       update_time = 300,
                       pool_limit = 1000,
                       update_count = 200,
                       database_name = 'proxy_pool',
                       collection_name_http = 'http',
                       collection_name_https = 'https',
                       country_code = 'ALL'
                       )

This creates a service object.

Args
  • MONGO_URI = MongoDB connection URI
  • update_time = Time after which proxies will be updated (in seconds)
  • pool_limit = Number of stored proxies after which the service switches from inserting to updating
  • update_count = Number of proxies to request from the grabber at a time
  • database_name = MongoDB database name to store proxies in
  • collection_name_http = Collection name to store http proxies in
  • collection_name_https = Collection name to store https proxies in
  • country_code = ISO code of one of the supported regions

Supported regions:

  • Combined Regions: ALL
  • United States: US
  • Germany: DE
  • Great Britain: GB
  • France: FR
  • Czech Republic: CZ
  • Netherlands: NL
  • India: IN

Starting the service

service.start()

Starting service gives the following output:

MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Service...

This runs until interrupted and pushes/updates proxies in MongoDB every {update_time} seconds.
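
To illustrate how a pool_limit could interact with periodic updates, here is a minimal in-memory sketch. It assumes that "updating" means replacing the oldest stored proxies once the pool is full; the README does not confirm this, and merge_into_pool is a hypothetical name, not the library's implementation.

```python
from collections import deque
from typing import Deque, Iterable


def merge_into_pool(pool: Deque[str], fresh: Iterable[str], pool_limit: int) -> Deque[str]:
    """Add fresh proxies to the pool, evicting the oldest once pool_limit is reached.

    Assumption: a full pool is "updated" by dropping its oldest entries.
    """
    for proxy in fresh:
        if proxy in pool:
            continue  # already stored, nothing to do
        if len(pool) >= pool_limit:
            pool.popleft()  # pool is full: drop the oldest entry first
        pool.append(proxy)
    return pool


# Placeholder values stand in for real "host:port" strings.
pool = deque(["a", "b"])
merge_into_pool(pool, ["b", "c", "d"], pool_limit=3)
```

With a limit of 3, the pool above ends up holding "b", "c" and "d": "b" is skipped as a duplicate, "c" is inserted, and "d" replaces the oldest entry "a".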

👉 What is proxy to mongo?

Proxy to mongo grabs a set of valid proxies from the Internet and stores them in the MongoDB database of your choice. You can schedule it to insert or update proxies in your pool database, e.g. run it from Airflow or any other task scheduler.

👉 How to use?

# import the ProxyToMongo class
from greendeck_proxygrabber import ProxyToMongo
MONGO_URI = 'mongodb://127.0.0.1:27017'
service = ProxyToMongo( MONGO_URI = MONGO_URI,
                        pool_limit = 1000,
                        length_proxy = 200,
                        database_name='proxy_pool',
                        collection_name_http='http',
                        collection_name_https='https',
                        country_code='DE'
                        )

This creates a service object.

Args
  • pool_limit = Total number of proxies to keep in MongoDB; pass None if you don't want to update
  • length_proxy = Number of proxies to fetch at once
  • database_name = MongoDB database name to store proxies in
  • collection_name_http = Collection name to store http proxies in
  • collection_name_https = Collection name to store https proxies in
  • country_code = ISO code of one of the supported regions

Supported regions:

  • Combined Regions: ALL
  • United States: US
  • Germany: DE
  • Great Britain: GB
  • France: FR
  • Czech Republic: CZ
  • Netherlands: NL
  • India: IN

Calling the ProxyToMongo grabber

service.get_quick_proxy()

Starting Grabber gives the following output:

MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Grabber...

This will run forever and will push/update proxies in mongodb after every {update_time} seconds.

👉 How to use Proxy Grabber Class?

# import ProxyGrabber class
from greendeck_proxygrabber import ProxyGrabber
# initialize ProxyGrabber object
grabber = ProxyGrabber(len_proxy_list, country_code, timeout)

The default values of these arguments are:

len_proxy_list = 10
country_code = 'ALL'
timeout = 2

Currently the program only supports proxies of combined regions

Getting checked, running proxies

The grab_proxy() method fetches checked, working proxies.

grabber.grab_proxy()

This returns a dictionary of the following structure:

{
    'https': [< list of https proxies >],
    'http': [< list of http proxies >],
    'region': 'ALL' # default for now
}
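
A dictionary of this shape can be turned into the proxies mapping that HTTP clients such as requests accept. The sketch below assumes each entry is a bare "host:port" string, which the README does not specify; pick_proxies is a hypothetical helper and the sample addresses are placeholders.

```python
import random
from typing import Optional

# Sample data shaped like the dictionary above. The addresses are placeholders
# from the documentation range 192.0.2.0/24, not real proxies.
result = {
    "https": ["192.0.2.10:8080", "192.0.2.11:3128"],
    "http": ["192.0.2.20:80"],
    "region": "ALL",
}


def pick_proxies(grabbed: dict, rng: Optional[random.Random] = None) -> dict:
    """Pick one proxy per scheme and return a requests-style proxies mapping."""
    rng = rng or random.Random()
    chosen = {}
    for scheme in ("http", "https"):
        candidates = grabbed.get(scheme) or []
        if candidates:
            # Assumes entries are bare "host:port" strings without a scheme prefix.
            chosen[scheme] = f"http://{rng.choice(candidates)}"
    return chosen


proxies = pick_proxies(result)
# e.g. requests.get("https://example.com", proxies=proxies, timeout=5)
```

Schemes with an empty list are simply left out of the mapping, so the client falls back to a direct connection for them.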

Getting an unchecked list of proxies

The proxy_scraper() method of ScrapeProxy fetches unchecked proxies. It returns 200 proxies of both the http and https types.

from greendeck_proxygrabber import ScrapeProxy
proxies_http, proxies_https = ScrapeProxy.proxy_scraper()

This returns a list of http proxies followed by a list of https proxies.

proxies_http = [< list of http proxies >]
proxies_https = [< list of https proxies >]

Filtering invalid proxies from a list of proxies

The proxy_checker_https and proxy_checker_http methods of the ProxyChecker class validate proxies.

Given a list of proxies, each method checks whether every proxy is valid and returns a list of the valid proxies from the list fed to it.

from greendeck_proxygrabber import ProxyChecker
valid_proxies_http = ProxyChecker.proxy_checker_http(proxy_list = proxy_list_http, timeout = 2)
valid_proxies_https = ProxyChecker.proxy_checker_https(proxy_list = proxy_list_https, timeout = 2)
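
For readers curious how such a check might work in principle, here is a standard-library sketch. It is not ProxyChecker's implementation: is_proxy_alive, filter_valid, the test URL and the worker count are all assumptions made for illustration, and proxies are assumed to be "host:port" strings.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def is_proxy_alive(proxy: str, timeout: float = 2.0,
                   test_url: str = "http://example.com") -> bool:
    """Send one request through the proxy; any error counts as invalid."""
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    )
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.status == 200
    except Exception:
        return False


def filter_valid(proxies: Iterable[str],
                 check: Callable[[str], bool] = is_proxy_alive,
                 workers: int = 20) -> List[str]:
    """Check proxies concurrently and keep the valid ones in input order."""
    proxy_list = list(proxies)
    if not proxy_list:
        return []
    with ThreadPoolExecutor(max_workers=min(workers, len(proxy_list))) as pool:
        results = list(pool.map(check, proxy_list))
    return [proxy for proxy, ok in zip(proxy_list, results) if ok]
```

Checking in a thread pool keeps the total time close to a single timeout rather than one timeout per dead proxy, and passing check as a parameter makes the filter easy to test without network access.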

👉 How to build your own pip package

In the parent directory, run:

  • python setup.py sdist bdist_wheel
  • twine upload dist/*

MADE WITH 🐍 BY Gagan