This is an automated web scraping tool using Selenium to access and scrape all the audiobooks on audible.com.


shahriar-rahman/Web-Scraping-Audible-Using-Selenium-Webdriver


===========================================================================

Web Scraping Audible using Selenium Webdriver

An automated script that scrapes Audible product information for a user search. It first walks through the result pages using pagination and collects the relevant fields from each page. Once scraping finishes, it saves the collected data in several file formats (CSV, XML, XLSX, and JSON).
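The pagination step can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: `scrape_all_pages`, `scrape_page`, and `go_to_next` are hypothetical names, and the loop is written against plain callables so that it runs without a browser. In a Selenium script, `scrape_page` would read the product rows from the current page, and `go_to_next` would click the "next" button and report whether one was available.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def scrape_all_pages(
    scrape_page: Callable[[], List[Record]],
    go_to_next: Callable[[], bool],
    max_pages: int = 100,
) -> List[Record]:
    """Collect records page by page until there is no next page.

    scrape_page: returns the records visible on the current page.
    go_to_next:  moves to the next page; returns False on the last page.
    max_pages:   safety cap so a broken "next" button cannot loop forever.
    """
    records: List[Record] = []
    for _ in range(max_pages):
        records.extend(scrape_page())
        if not go_to_next():
            break
    return records


# Stand-in for a paginated site: three pages of made-up rows.
_fake_pages = [
    [{"title": "Book A"}, {"title": "Book B"}],
    [{"title": "Book C"}],
    [{"title": "Book D"}],
]
_position = {"page": 0}


def _fake_scrape_page() -> List[Record]:
    return _fake_pages[_position["page"]]


def _fake_go_to_next() -> bool:
    if _position["page"] + 1 >= len(_fake_pages):
        return False
    _position["page"] += 1
    return True


all_records = scrape_all_pages(_fake_scrape_page, _fake_go_to_next)
print(len(all_records))
```

Keeping the loop independent of Selenium makes the stopping rule easy to check on its own; the `max_pages` cap is an added safeguard, not something the original description mentions.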

[Image: audible.gif]



◘ Introduction


For regular Audible customers, keeping track of a large number of audiobooks is a recurring chore. The main goal of this project is to collect relevant information about the audiobooks that Audible features as best sellers, so that users can be notified of potential new purchases.

The data contains the titles of the audiobooks sorted by best rating, their respective authors, regular prices, and release dates. With this information, a customer can plan ahead and act as soon as a better deal is announced; such offers are usually available for a limited time only.






◘ Methodologies & Technologies applied

  • Webdriver and Expected Conditions
  • System queue, Implicit and Explicit Waits
  • Chrome and Chrome Options
  • Pagination
  • DataFrame Storage and Manipulation
  • Saving data in CSV, Excel, JSON, and XML formats
  • PyCharm IDE 2023.1 Community Edition
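As an illustration of how Chrome Options and an explicit wait fit together, here is a minimal sketch. It is not taken from the repository: `chrome_arguments`, `build_driver`, and `wait_for_all` are hypothetical helper names, and the chosen Chrome flags are assumptions. The driver is started through Webdriver Manager (installed as `webdriver-manager`), which appears in the module requirements of this project.

```python
from typing import List


def chrome_arguments(headless: bool = True) -> List[str]:
    """Command-line flags passed to Chrome (illustrative choices)."""
    args = ["--window-size=1920,1080"]
    if headless:
        args.append("--headless=new")
    return args


def build_driver(headless: bool = True):
    """Start Chrome with the flags above.

    Selenium and webdriver-manager are imported here rather than at module
    level so the helpers can be imported without a browser stack installed.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)


def wait_for_all(driver, css_selector: str, timeout: float = 10.0):
    """Explicit wait: block until at least one matching element is present."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.CSS_SELECTOR, css_selector)
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located(locator)
    )
```

A typical call sequence would be `driver = build_driver()`, then `driver.get(url)`, then `rows = wait_for_all(driver, selector)`, followed by `driver.quit()` when done. The URL and CSS selector are left as placeholders because they depend on the live page markup.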



◘ Flowchart of the proposed Scraping process

[Figure: flowchart of the proposed scraping process]



◘ Project Organization


├── LICENSE
├── Makefile             <- Makefile with various commands.
├── README.md            <- The top-level README for developers using this project.
├── scraping_data
│   ├── csv              <- Data in CSV format, compatible with pandas DataFrames.
│   ├── excel            <- Data in XLSX format for data analysis.
│   ├── xml              <- Data in XML format.
│   └── json             <- Data in JSON format.
│
├── img                  <- Contains project image files.
│
├── requirements.txt     <- The requirements file for reproducing the analysis environment, e.g.
│                           generated with `pip freeze > requirements.txt`.
│
├── setup.py             <- Makes the project pip installable (pip install -e .) so src can be imported.
├── src                  <- Source code for use in this project.
│   ├── __init__.py      <- Makes src a Python module.
│   │
│   ├── main             <- Contains scripts for automating web scraping using Selenium.
│   │   └── selenium_audible.py
│
└── tox.ini              <- tox file with settings for running tox; see tox.readthedocs.io
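The `scraping_data` layout above suggests one output folder per format. A minimal sketch of writing a pandas DataFrame into that layout might look like the following. The function name `save_all_formats`, the file stem `audible`, the column names, and the sample rows are all made up for illustration; the repository's script may organise this differently.

```python
from pathlib import Path
from typing import Dict, Iterable

import pandas as pd


def save_all_formats(
    df: pd.DataFrame,
    out_dir,
    stem: str = "audible",
    formats: Iterable[str] = ("csv", "excel", "xml", "json"),
) -> Dict[str, Path]:
    """Write df once per requested format, each into its own subfolder."""
    out_dir = Path(out_dir)
    written: Dict[str, Path] = {}
    for fmt in formats:
        folder = out_dir / fmt
        folder.mkdir(parents=True, exist_ok=True)
        if fmt == "csv":
            path = folder / f"{stem}.csv"
            df.to_csv(path, index=False)
        elif fmt == "excel":
            path = folder / f"{stem}.xlsx"
            df.to_excel(path, index=False)  # requires openpyxl
        elif fmt == "xml":
            path = folder / f"{stem}.xml"
            # parser="etree" uses the standard library instead of lxml
            df.to_xml(path, index=False, parser="etree")
        elif fmt == "json":
            path = folder / f"{stem}.json"
            df.to_json(path, orient="records", indent=2)
        else:
            raise ValueError(f"unsupported format: {fmt}")
        written[fmt] = path
    return written


# Made-up sample rows using the fields named in the introduction.
books = pd.DataFrame(
    [
        {"title": "Book A", "author": "Author One", "price": 14.95, "release_date": "2021-03-01"},
        {"title": "Book B", "author": "Author Two", "price": 24.50, "release_date": "2022-11-15"},
        {"title": "Book C", "author": "Author Three", "price": 9.99, "release_date": "2020-07-30"},
    ]
)
```

Calling `save_all_formats(books, "scraping_data")` would then create `scraping_data/csv/audible.csv`, `scraping_data/excel/audible.xlsx`, and so on. Column names are kept free of spaces because `to_xml` turns them into XML element names.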



◘ Module Requirements

  • Python 3.11
  • Selenium 4.8.3
  • Webdriver Manager 3.8.6
  • Pandas 2.0.0
  • openpyxl 3.1.2



◘ Installation (using pip)

To install the required packages on the local machine, follow these steps:

  1. Open a terminal and install Selenium with pip:
> pip install selenium
  2. To install the pandas library, type:
> pip install pandas
  3. openpyxl is a Python library for reading and writing Excel files (xlsx/xlsm/xltx/xltm). Install it with:
> pip install openpyxl
  4. Webdriver Manager, listed under Module Requirements, can be installed the same way:
> pip install webdriver-manager



◘ Import Packages

To import the dependencies, open the preferred IDE or notebook:

  1. For pandas, run the following command:
import pandas
  2. time is a built-in Python module and can be imported by typing:
import time
  3. Then, for Selenium, type the following command:
import selenium
  4. Lastly, import the webdriver from the Selenium package:
from selenium import webdriver



◘ Installing setup.py

  1. Using the setup.py file requires the setuptools module. Install it by running the following command:
pip install setuptools
  2. Once setuptools is installed, use the setup.py file to build source and wheel distributions of the package by running the following command:
python setup.py sdist bdist_wheel
  3. To install the package in editable mode, run the following command from the project root:
pip install -e .
  4. This installs the package and any of its dependencies that are not already installed on the system. Once the package is installed, it can be imported like any other module. For example, if setup.py names the package my_package:
import my_package



◘ Supplementary Resources

For more details, visit the following links:



◘ License

This is free and unencumbered software released into the public domain. Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.



===========================================================================
