# web-crawler

This web crawler allows you to crawl websites and extract information from their HTML content.

## Installation

1. Clone the repository:

   ```sh
   git clone https://github.com/saikiranreddy201/web-crawler.git
   ```

2. Navigate to the project directory:

   ```sh
   cd web-crawler
   ```

3. Install the dependencies:

   ```sh
   npm install
   ```

## Usage

1. Start the server (`server.js`):

   ```sh
   npm run dev
   ```

2. Navigate to `http://localhost:3000/index.html` and provide the URL of the website you want to crawl.

   The crawler will start fetching and parsing web pages based on the provided configuration.

3. View the results:

   - By default, the crawler writes the crawled files to the `public` folder.
   - You can modify the code in `server.js` to save the data to a file or integrate it with a database, if desired.

## Customization

Feel free to customize and extend this web crawler to suit your specific requirements. You can modify the crawling logic, data extraction methods, or add additional features as needed.
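One common customization is bounding how far the crawler follows links. The sketch below shows a depth-limited breadth-first traversal; it is a suggestion, not the project's actual logic, and `fetchLinks` is a hypothetical stand-in for whatever fetch-and-parse routine `server.js` uses (injected here so the traversal is easy to test in isolation).

```javascript
// Depth-limited breadth-first crawl. `fetchLinks(url)` is expected to
// return (a promise of) the URLs linked from that page.
async function crawlBfs(startUrl, fetchLinks, maxDepth = 2) {
  const visited = new Set([startUrl]); // avoid re-fetching pages
  let frontier = [startUrl];           // pages at the current depth

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      for (const link of await fetchLinks(url)) {
        if (!visited.has(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next; // descend one level
  }
  return [...visited];
}
```

Because `fetchLinks` is a parameter, you can swap in a real implementation (e.g. one built on `fetch` plus HTML parsing) without touching the traversal code.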

## Contributing

Contributions are welcome! If you encounter any issues or have ideas for improvements, please open an issue or submit a pull request.

## License

This project is licensed under the MIT License.