YiraBot 1.0.7.3
YiraBot Release Notes: Version 1.0.7.3
Upgrade YiraBot using pip:
pip install --upgrade yirabot
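To confirm which version is installed after upgrading, Python's standard importlib.metadata module can read the installed distribution's version. This is a small sketch that assumes the distribution name is yirabot, as in the pip command above. installed_version is a hypothetical helper name and is not part of YiraBot.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="yirabot"):
    """Hypothetical helper: return the installed version string, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_version()
    if current is None:
        print("yirabot is not installed")
    else:
        # After a successful upgrade this should report 1.0.7.3 or later.
        print(f"yirabot {current}")
```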
Overview
YiraBot 1.0.7.3 adds support for crawling pages that sit behind a login. With the new 'session' command, YiraBot, a Python library for web crawling, can authenticate with a website and then extract data from pages that require login credentials.
New Features
Introducing the Session Command
- Crawl Protected Pages: The new 'session' command lets YiraBot log in to a website and crawl pages that require user authentication.
- User-Friendly Interaction: Interactive prompts collect the login details step by step, making content behind login screens easier to reach.
Preparing to Use the Session Command
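- Gathering Form Input Details: Find the name attributes of the login form's username and password input fields by inspecting the login page's HTML (for example, with your browser's developer tools). They are often 'username' and 'password', but many sites use other names.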
- Understanding and Obtaining the Success Redirect URL:
  - What It Is: The success redirect URL is the page the website sends you to after a successful login.
  - How to Get It: Log in to the website manually in your browser and note the URL of the page you land on. This is the success redirect URL YiraBot asks for.
  - Why It's Needed: YiraBot confirms that a login succeeded by comparing the URL of the page it lands on after submitting your credentials with the success redirect URL you provide.
- Limitations with Advanced Authentication Methods: The session command may not work with websites using two-factor authentication, CAPTCHAs, or dynamic forms relying on JavaScript.
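Inspecting the HTML by hand works, but a short script can also list the name attribute of every input inside a login form. The sketch below uses Python's standard-library html.parser; it is not part of YiraBot, and FormInputCollector, list_form_input_names, and the sample HTML are hypothetical. It only sees inputs present in the HTML the server sends, so forms built by JavaScript (one of the limitations noted above) will not show up.

```python
from html.parser import HTMLParser


class FormInputCollector(HTMLParser):
    """Collects (name, type) pairs for named <input> elements inside <form> tags.

    Illustrative helper only; not part of YiraBot's API.
    """

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form and attributes.get("name"):
            # HTML treats an input with no type attribute as type="text".
            self.inputs.append((attributes["name"], attributes.get("type") or "text"))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def list_form_input_names(html):
    """Hypothetical helper: return (name, type) for each named input in a form."""
    collector = FormInputCollector()
    collector.feed(html)
    collector.close()
    return collector.inputs


if __name__ == "__main__":
    login_page = """
    <form action="/login" method="post">
      <input type="text" name="username">
      <input type="password" name="password">
      <input type="hidden" name="csrf_token" value="abc123">
      <button type="submit">Log in</button>
    </form>
    """
    for name, input_type in list_form_input_names(login_page):
        print(f"{name} ({input_type})")
```

For the sample page this prints username (text), password (password), and csrf_token (hidden). The field with type="password" is the password field, and the text or email field beside it is usually the username field; those two names are what the session command prompts for.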
Session Command Usage
- Begin a session for crawling protected pages:
yirabot session
- Follow the prompts to enter the login URL, the expected success redirect URL, and the names of the username and password input fields.
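As a rough illustration of what a form-based login does with the values gathered by these prompts, the sketch below posts the credentials to the login URL with Python's standard-library urllib, keeps cookies in a cookie jar, and compares the URL it finally lands on with the success redirect URL. It assumes a plain HTML form submitted by POST and treats URLs as equal when they differ only in scheme/host case, a trailing slash, or a fragment; that normalization is a design choice of this sketch, not necessarily how YiraBot compares URLs. log_in and urls_match are hypothetical names, and this is not YiraBot's own code.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def urls_match(landed_url, expected_url):
    """Compare two URLs, ignoring scheme/host case, a trailing slash, and any fragment."""

    def normalize(url):
        parts = urllib.parse.urlsplit(url)
        path = parts.path.rstrip("/") or "/"
        return (parts.scheme.lower(), parts.netloc.lower(), path, parts.query)

    return normalize(landed_url) == normalize(expected_url)


def log_in(login_url, success_url, username_field, password_field,
           username, password, timeout=10):
    """Hypothetical helper: POST the login form, then check where we landed.

    Returns (succeeded, opener). The opener holds the cookie jar, so requests
    made through it afterwards carry the session cookies set at login.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    # Field names are the input "name" attributes from the login page's HTML.
    payload = urllib.parse.urlencode({
        username_field: username,
        password_field: password,
    }).encode("utf-8")

    # Passing data makes this a POST. urllib follows 301/302/303 redirects
    # after a POST; geturl() returns the URL of the final page.
    with opener.open(login_url, data=payload, timeout=timeout) as response:
        landed_url = response.geturl()

    return urls_match(landed_url, success_url), opener
```

If the site redirects to the expected page, log_in returns (True, opener), and opener.open(protected_url) then fetches a protected page with the session cookies attached. A site that shows its error on the login page itself (no redirect) leaves the landed URL equal to the login URL, so the check returns False; an HTTP error status raises urllib.error.HTTPError instead.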
End-User Benefits
- Broader Access to Web Content: Users can now extract data from websites that require login, including subscription-based platforms and private applications.
- Streamlined Data Collection: After logging in, you choose a crawl type for the protected page, so authenticated content can be collected and analyzed alongside the public sources you already crawl with YiraBot.
Example Usage
yirabot session
- Enter the requested URLs, field names, and credentials when prompted.
- Select your preferred type of crawl for the authenticated page.
With Version 1.0.7.3, YiraBot extends its data extraction to websites that require a login.