Releases: OwenOrcan/YiraBot-Crawler
YiraBot 1.0.9
YiraBot Release Notes: Version 1.0.9
We are thrilled to announce the release of YiraBot 1.0.9, the most significant update to date. This version marks a monumental leap in the evolution of YiraBot, featuring a complete code rewrite and introducing a plethora of enhancements and new features. Upgrade to the latest version to experience the unparalleled speed and efficiency of YiraBot.
Upgrade YiraBot using pip:
pip install --upgrade yirabot
Overview
YiraBot 1.0.9 sets a new standard in web crawling and SEO analysis with an astounding 86% increase in performance speed. This update is not just about speed; it introduces a new Python module, enabling users to integrate YiraBot's powerful features directly into their Python scripts. With this release, we've streamlined the installation process by reducing the required packages from 10 to just 4, making it lighter and more efficient.
New Python Module
In response to community feedback, YiraBot 1.0.9 introduces a Python module, allowing for more versatile use of YiraBot in programming projects. Here’s how you can get started:
from yirabot import YiraBot
bot = YiraBot()
For detailed documentation on using the new Python module, please refer to the README file.
CLI Enhancements
New -mobile Flag
A new -mobile flag has been added, enabling YiraBot to use a mobile user agent while crawling. This feature is crucial for testing mobile responsiveness and SEO performance from a mobile perspective.
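As a rough illustration of what crawling with a mobile user agent means, the sketch below switches the User-Agent header on a request. It uses only the Python standard library; the user-agent strings and the build_request helper are assumptions made for this example and are not taken from YiraBot's source.

```python
from urllib.request import Request

# Example user-agent strings (assumptions; YiraBot's actual values may differ).
MOBILE_USER_AGENT = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1"
)
DESKTOP_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, mobile=False):
    """Build a request whose User-Agent header depends on the mobile switch."""
    user_agent = MOBILE_USER_AGENT if mobile else DESKTOP_USER_AGENT
    return Request(url, headers={"User-Agent": user_agent})


# urllib stores header names capitalized as "User-agent".
request = build_request("https://example.com", mobile=True)
print(request.get_header("User-agent"))
```

A server that serves different markup to phones will respond to this request with its mobile variant, which is what makes the flag useful for mobile SEO checks.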
Multiple Commands
YiraBot now supports the execution of multiple commands in a single run, further enhancing its flexibility and usability for comprehensive SEO analysis and web crawling.
Overall Improvements
- Performance: Complete code rewrite leading to an 86% increase in processing speed.
- Reduced Dependencies: The number of required packages has been dramatically reduced from 10 to 4, streamlining the installation process.
- License Change: The software license has been changed from MIT to GPL-3.0. This change means that any derivative work must also be open-sourced under the GPL-3.0 license, ensuring that improvements and modifications to YiraBot are shared with the community. Developers integrating YiraBot into their projects should consider the implications of this license change, as it requires any modifications or derivative works to be distributed under the same license, promoting a more open and collaborative development environment.
This update is a testament to our commitment to providing a state-of-the-art tool that meets the evolving needs of developers, SEO specialists, and content creators. We are excited to see how you leverage YiraBot 1.0.9 in your projects and workflows.
For any questions or feedback, please refer to the documentation or reach out!
Happy crawling!
YiraBot 1.0.8
YiraBot Release Notes: Version 1.0.8
Upgrade YiraBot using pip:
pip install --upgrade yirabot
Overview
YiraBot 1.0.8 introduces a significant focus on SEO analysis. The enhanced SEO command incorporates comprehensive features to streamline and optimize your website's SEO performance.
Enhanced SEO Analysis Command
The updated seo command (formerly check) now includes extensive features for a thorough SEO assessment:
- Title and Meta Description Length Analysis: Analyzes and compares the length of titles and meta descriptions against SEO standards, offering tailored feedback.
- Keyword Extraction: Identifies the most prominent keywords used on the page, aiding in content optimization strategies.
- Heading Hierarchy Analysis: Evaluates the heading structure and provides feedback on any header usage errors or inconsistencies.
- Mobile Responsiveness Check: Assesses the website's mobile compatibility, a crucial aspect of modern SEO.
- Social Media Integration Discovery: Identifies and displays the website's connections with various social media platforms.
- Website Language Identification: Detects the primary language of the website, essential for targeted SEO tactics.
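To make two of the checks above more concrete, here is a minimal sketch of title and meta description length analysis and heading hierarchy analysis, written with the standard library only. The character ranges are commonly cited guidelines chosen for this example, and the SeoParser, length_feedback, heading_issues, and analyze names are hypothetical; none of this is YiraBot's actual implementation.

```python
from html.parser import HTMLParser

# Commonly cited guideline ranges (assumptions, not YiraBot's documented thresholds).
TITLE_RANGE = (30, 60)
DESCRIPTION_RANGE = (50, 160)


class SeoParser(HTMLParser):
    """Collect the title, meta description, and heading levels of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.heading_levels = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def length_feedback(label, text, bounds):
    """Compare a text length against a (low, high) character range."""
    low, high = bounds
    length = len(text.strip())
    if length < low:
        return f"{label} is too short ({length} characters, aim for {low}-{high})"
    if length > high:
        return f"{label} is too long ({length} characters, aim for {low}-{high})"
    return f"{label} length is fine ({length} characters)"


def heading_issues(levels):
    """Report a missing or repeated h1 and any skipped heading level."""
    issues = []
    if levels.count(1) == 0:
        issues.append("no h1 heading found")
    elif levels.count(1) > 1:
        issues.append("more than one h1 heading found")
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            issues.append(f"h{previous} is followed by h{current}, skipping a level")
    return issues


def analyze(html):
    """Run both checks on an HTML string and return the feedback."""
    parser = SeoParser()
    parser.feed(html)
    return {
        "title": length_feedback("Title", parser.title, TITLE_RANGE),
        "description": length_feedback("Meta description", parser.description, DESCRIPTION_RANGE),
        "headings": heading_issues(parser.heading_levels),
    }
```

Calling analyze() on the HTML of a page returns one feedback string each for the title and description, plus a list of heading problems that is empty when the hierarchy is consistent.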
Command Renaming for Improved User Experience
For better clarity and user experience, we have renamed the following commands:
- check is now seo
- crawl-content has been updated to scrape
Example Usage
For SEO analysis using the new seo command:
yirabot seo <url>
Developer Side Changes
To enhance YiraBot's performance and code maintainability:
- Code Organization: The codebase has been restructured, segregating different functionalities into multiple files for improved organization and scalability.
YiraBot 1.0.7.3
YiraBot Release Notes: Version 1.0.7.3
Upgrade YiraBot using pip:
pip install --upgrade yirabot
Overview
YiraBot 1.0.7.3 brings a pivotal update to enhance web crawling capabilities, especially for accessing protected pages. This release focuses on enabling users to effectively extract data from websites requiring login credentials, broadening the scope of YiraBot as a comprehensive Python library for web crawling.
New Features
Introducing the Session Command
- Crawl Protected Pages: The new 'session' command empowers YiraBot to navigate and crawl pages that necessitate user authentication.
- User-Friendly Interaction: Designed to simplify the process of accessing and crawling content behind login screens.
How to Use the Session Command
- Gathering Form Input Details: Retrieve the names of the login form input fields (usually 'username' and 'password') by inspecting the HTML of the login page.
- Understanding and Obtaining the Success Redirect URL:
  - What Is It: The success redirect URL is the page you are directed to after successfully logging in. It's where you land after entering your credentials on the website.
  - How to Get It: To obtain this URL, manually log into the website and note the URL of the page you land on after the login process. This is the success redirect URL needed by YiraBot.
  - Why It's Needed: YiraBot uses this URL to verify successful login by comparing the post-login landing page with the provided success redirect URL.
- Limitations with Advanced Authentication Methods: The session command may not work with websites using two-factor authentication, CAPTCHAs, or dynamic forms relying on JavaScript.
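The steps above can be pictured with the following sketch: post the form fields to the login URL while keeping cookies, then compare the landing URL with the expected success redirect URL. It relies only on the standard library. The function names and the URL comparison rule (ignoring a trailing slash and host letter case) are assumptions for illustration, not a description of how the session command is implemented.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode, urlsplit
from urllib.request import HTTPCookieProcessor, Request, build_opener


def build_login_request(login_url, fields):
    """Encode the login form fields (input name -> value) as a POST body."""
    body = urlencode(fields).encode("utf-8")
    return Request(login_url, data=body, method="POST")


def same_page(landed_url, expected_url):
    """Compare two URLs, ignoring a trailing slash and letter case in the host."""

    def normalize(url):
        parts = urlsplit(url)
        return (
            parts.scheme.lower(),
            parts.netloc.lower(),
            parts.path.rstrip("/"),
            parts.query,
        )

    return normalize(landed_url) == normalize(expected_url)


def login(login_url, success_url, fields):
    """Return an opener holding the session cookies, or None if login is unconfirmed."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    with opener.open(build_login_request(login_url, fields), timeout=10) as response:
        landed_url = response.geturl()  # final URL after any redirects
    return opener if same_page(landed_url, success_url) else None
```

If login() returns an opener, later requests made through it carry the session cookies, so protected pages can be fetched. A site that relies on JavaScript, CAPTCHAs, or two-factor authentication will not land on the expected URL with this approach, which matches the limitation noted above.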
Session Command Usage
- Begin a session for crawling protected pages:
yirabot session
- Follow the prompts to input the login URL, expected success redirect URL, and the input field names for username and password.
End-User Benefits
- Broader Access to Web Content: Users can now extract data from websites that require login, including subscription-based platforms and private applications.
- Streamlined Data Collection: Enhances the ability to collect and analyze data from a variety of online sources.
Example Usage
yirabot session
- Input the requested URLs and credentials as prompted.
- Select your preferred type of crawl for the authenticated page.
With Version 1.0.7.3, YiraBot continues to advance, making web data extraction more accessible and accommodating the diverse needs of its user base.
YiraBot 1.0.7
YiraBot Release Notes: Version 1.0.7
Overview
In this latest update, YiraBot Version 1.0.7, the focus is on refining and enhancing the command-line interface (CLI) aspect of the tool. This version marks a significant step in YiraBot's journey, further solidifying its position as a robust and versatile Python library for web crawling and data extraction. With an emphasis on user experience, efficiency, and versatility, Version 1.0.7 introduces a suite of new features and improvements that cater to a wide range of web crawling needs. Whether it's for data analysis, SEO audits, or content aggregation, YiraBot now offers more powerful and user-friendly options for professionals and enthusiasts alike. This update not only streamlines existing functionalities but also introduces new commands and features that enhance the tool's adaptability to various web environments and user requirements.
New Features
Dynamic Delay for Server Load Management
- Dynamic Delay: Introduces a dynamic delay mechanism in crawling processes. This feature adjusts the crawling speed based on the server's response time, minimizing server overwhelm.
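One simple way to realize such a delay is to pause for a multiple of the server's last response time, clamped to fixed bounds. The sketch below shows that idea; the factor, the bounds, and all function names are assumptions for illustration, since the notes do not state the formula YiraBot uses.

```python
import time


def next_delay(response_seconds, factor=2.0, minimum=0.5, maximum=10.0):
    """Scale the pause with the server's last response time, within fixed bounds."""
    return max(minimum, min(maximum, response_seconds * factor))


def timed_fetch(fetch, url):
    """Call fetch(url) and return (result, elapsed seconds)."""
    start = time.monotonic()
    result = fetch(url)
    return result, time.monotonic() - start


def polite_crawl(urls, fetch, sleep=time.sleep):
    """Fetch each URL in turn, pausing longer when the server responds slowly."""
    results = []
    for url in urls:
        result, elapsed = timed_fetch(fetch, url)
        results.append(result)
        sleep(next_delay(elapsed))
    return results
```

Here fetch is any callable that downloads a URL. A slow response is treated as a sign of load, so the crawler backs off; a fast response keeps the pause at the minimum.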
Enhanced Data Extraction and Storage
- JSON Data Extraction: Added functionality to extract crawl data into JSON format. This can be activated using the -json flag during the crawl command.
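For readers who want a feel for what JSON export of crawl data involves, here is a minimal sketch using the standard json module. The save_as_json helper and the shape of the data are hypothetical; the notes do not specify the keys or the file name YiraBot writes.

```python
import json
from pathlib import Path


def save_as_json(data, path):
    """Write crawl data (a dict of plain Python values) to a JSON file."""
    target = Path(path)
    target.write_text(
        json.dumps(data, indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return target


# Example call with made-up crawl data:
# save_as_json({"title": "Example Domain", "links": ["https://example.com/a"]}, "crawl.json")
```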
Advanced Crawling Commands
- Check Command: A new check command is implemented, enabling YiraBot to crawl through a website and identify any broken links or potential issues.
- Get-HTML Command: The get-html command is introduced to create an exact HTML copy of a website, which is then saved as an HTML file.
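Broken-link detection of the kind the check command performs can be sketched as follows: collect the links on a page, resolve them against the page URL, and flag any that fail or return an HTTP error status. This is a standard-library illustration under assumed names (LinkCollector, fetch_status, find_broken_links); it does not reproduce YiraBot's own logic.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href> attributes of a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and not href.startswith(("#", "mailto:", "javascript:")):
                self.links.append(urljoin(self.base_url, href))


def fetch_status(url):
    """Return the HTTP status code for url, or None if the request fails outright."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as response:
            return response.status
    except HTTPError as error:
        return error.code
    except OSError:  # DNS failures, refused connections, timeouts
        return None


def find_broken_links(html, base_url, status_of=fetch_status):
    """Return (link, status) pairs for links that fail or answer with 4xx/5xx."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    broken = []
    for link in collector.links:
        status = status_of(link)
        if status is None or status >= 400:
            broken.append((link, status))
    return broken
```

The status_of parameter exists so the function can be tried with a stand-in lookup instead of live requests. Some servers reject HEAD requests, so a fuller version might retry with GET.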
Performance Improvements
- Increased Speed: YiraBot's overall performance and crawling speed have been significantly improved, offering a faster and more efficient web crawling experience.
Example Usage
yirabot check example.com
yirabot get-html example.com
yirabot crawl example.com -json
YiraBot 1.0.6
YiraBot Release Notes: Version 1.0.6
Introduction
YiraBot transitions from a command-line tool to a versatile Python library, enabling integration into various projects.
New Features
Python Library Integration
- YiraBot is now available as a Python module, allowing for seamless incorporation into scripts.
Enhanced Web Crawling Methods
(All of these methods are for Python module usage.)
- get_html(url): Retrieves the HTML content of a webpage.
- crawl(url): Performs comprehensive crawling of a webpage.
- crawl_content(url): Extracts detailed content like paragraphs, headings, and lists.
- is_allowed_by_robots_txt(url): Checks if crawling a URL is allowed by the site's robots.txt.
- parse_sitemap(url): Parses the sitemap of a website for URL discovery.
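The last two methods correspond to well-known techniques that the standard library already supports. The sketch below shows a robots.txt permission check and sitemap URL extraction working on text that has already been downloaded. The function names are hypothetical stand-ins, and the code illustrates the general technique rather than the internals of is_allowed_by_robots_txt or parse_sitemap.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check a URL against the rules in an already downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def sitemap_urls(sitemap_xml):
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

Both helpers take the file contents rather than a URL, which keeps fetching and parsing separate.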
Usage Examples
- Import and initialize YiraBot in your script:
from yirabot import Yirabot

bot = Yirabot()
content = bot.crawl("https://example.com")

# Get all data in key-value format.
for item in content:
    print(item, content[item])
YiraBot 1.0.5
YiraBot Release Notes: Version 1.0.5
Release Highlights:
New Command Implementation
- Get Content Command: A new command has been introduced, enabling YiraBot to retrieve specific web content more efficiently.
Modernization and Design Improvements
- Modernized Interface: The interface of YiraBot has been upgraded to offer a more modern and user-friendly experience.
- Enhanced Progress Bar: The progress bar is now equipped with additional functionality, providing clearer and more detailed feedback during operations.
- Design Overhaul: A significant design upgrade has been implemented, enhancing both the visual appeal and usability.
Code and Functionality Enhancements
- Code Organization: The codebase has been reorganized, separating the code into different files for better clarity and maintenance.
- File Writing Style: File writing style has been improved for enhanced readability and structure.
- Error Handling: Error handling mechanisms have been strengthened for increased robustness and reliability of operations.
User Experience and Performance Fixes
- HTTPS Simplification: The requirement to type 'https' in URLs has been removed, streamlining the web crawling process.
- Sitemap Parser Fix: Corrections have been made to the sitemap parser, ensuring more accurate and efficient crawling of websites.
- File Writing Check: The redundant checks during file writing have been identified and rectified.
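The HTTPS simplification listed above amounts to filling in a scheme when the user omits one. A minimal sketch of that idea, with a hypothetical with_scheme helper that is not taken from YiraBot's code:

```python
def with_scheme(url, default_scheme="https"):
    """Prefix the URL with a scheme when the user did not type one."""
    url = url.strip()
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


print(with_scheme("example.com"))
print(with_scheme("http://example.com"))
```

A URL that already carries a scheme is returned unchanged, so explicit http:// addresses keep working.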
The latest update aims to significantly elevate the performance, usability, and dependability of YiraBot, ensuring a more seamless and efficient web crawling experience. Your continued support and feedback are much appreciated.
YiraBot 1.0.4
Release Notes for YiraBot v1.0.4
Introduction
YiraBot, the proficient webpage crawler, announces its latest update with version 1.0.4. This update introduces key features and improvements, enhancing the tool's functionality in web crawling and data extraction.
What's New in v1.0.4
1. Data Extraction to File Feature:
- A pivotal addition in this release is the ability to directly export crawled data into a file, facilitated by a new flag option.
- Command Example:
yirabot crawl https://example.com -file
- The newly introduced -file flag, when used at the end of a crawl command, automatically saves the crawled data into a file, streamlining data management and usage.
2. Enhanced Crawling Efficiency:
- Significant improvements have been made to the crawling algorithm, ensuring faster and more efficient data retrieval from webpages.
3. User Interface Improvements:
- Updates to the user interface provide clearer and more informative feedback during the crawling process, aiding in better progress monitoring and output understanding.
4. Bug Fixes and Performance Enhancements:
- This version includes various bug fixes and minor performance enhancements, aimed at improving stability and user experience.
Getting Started
The new features in YiraBot v1.0.4 are designed for easy integration into users' existing workflows. The addition of the -file flag does not require any complex setup or configuration.
Feedback and Support
YiraBot values user input for continuous development and enhancement. Users encountering issues or having suggestions for future versions are encouraged to reach out to the support team.
Version 1.0.4 focuses on enriching YiraBot's web crawling capabilities and user-friendly data handling.
Detailed usage instructions and information about the new features can be found in the updated documentation accompanying this release.