This web crawler allows you to crawl websites and extract information from their HTML content.
- Clone the repository:
git clone https://github.com/saikiranreddy201/web-crawler.git
- Navigate to the project directory:
cd web-crawler
- Install the dependencies:
npm install
- Start the development server:
npm run dev
- Then navigate to
localhost:3000/index.html
in your browser and enter the URL of the website you want to crawl.
The crawler will start fetching and parsing web pages based on the provided configuration.
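The fetch-and-parse step can be sketched roughly as below. This is an illustrative, simplified version, not the actual code from server.js; the function name `extractLinks` and the regex-based link extraction are assumptions for demonstration (the real project may use a proper HTML parser).

```javascript
// Hypothetical sketch of one crawl step: given a page's HTML,
// collect the href values of its anchor tags.
function extractLinks(html) {
  const links = [];
  // Match <a ... href="..."> (or single-quoted) and capture the URL.
  const re = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Example: parse a small HTML snippet.
const sample =
  '<a href="/about">About</a> <a href="https://example.com">Ext</a>';
const found = extractLinks(sample);
// found is ["/about", "https://example.com"]
```

A real crawler would fetch each discovered link in turn (respecting depth limits and avoiding revisits) rather than stopping after one page.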
- View the results:
  - By default, the crawler outputs the crawled files to the public folder.
  - You can modify the code in
server.js
to save the data to a file or integrate it with a database, if desired.
Feel free to customize and extend this web crawler to suit your specific requirements: modify the crawling logic, change the data extraction methods, or add new features as needed.
Contributions are welcome! If you encounter any issues or have ideas for improvements, please open an issue or submit a pull request.
This project is licensed under the MIT License.