
web-crawler

This web-crawler allows you to crawl websites and extract information from their HTML content.

Installation

  1. Clone the repository:
git clone https://github.com/saikiranreddy201/web-crawler.git
  2. Navigate to the project directory:
cd web-crawler
  3. Install the dependencies:
npm install

Usage

  1. Run the server.js file using the command below:
npm run dev
  2. Then navigate to http://localhost:3000/index.html and provide the URL of the website you want to crawl.

The crawler will start fetching and parsing web pages based on the provided configuration.

  3. View the results:
  • By default, the crawler will output the crawled files to the public folder.
  • You can modify the code in server.js to save the data to a file or integrate it with a database, if desired.
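The core crawl-and-extract loop can be sketched as below. This is a minimal illustration under assumptions, not the repository's actual server.js: the extractLinks helper and the in-memory queue are hypothetical, and a real crawler would likely use an HTML parser such as cheerio instead of a regex.

```javascript
// Extract absolute links from raw HTML using a simple regex
// (illustrative only; a proper HTML parser is more robust).
function extractLinks(html, baseUrl) {
  const links = [];
  const re = /href="([^"]+)"/g;
  let match;
  while ((match = re.exec(html)) !== null) {
    try {
      // Resolve relative URLs against the page's own URL.
      links.push(new URL(match[1], baseUrl).href);
    } catch {
      // Skip malformed URLs.
    }
  }
  return links;
}

// Breadth-first crawl up to maxPages, tracking visited URLs.
// Uses the global fetch available in Node 18+.
async function crawl(startUrl, maxPages = 10) {
  const visited = new Set();
  const queue = [startUrl];
  while (queue.length > 0 && visited.size < maxPages) {
    const url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);
    const res = await fetch(url);
    const html = await res.text();
    for (const link of extractLinks(html, url)) {
      if (!visited.has(link)) queue.push(link);
    }
  }
  return [...visited];
}
```

For example, `crawl("https://example.com", 5)` would visit at most five pages reachable from the start URL.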

Customization

Feel free to customize and extend this web crawler to suit your specific requirements. You can modify the crawling logic, data extraction methods, or add additional features as needed.

Contributing

Contributions are welcome! If you encounter any issues or have ideas for improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License.
