# web-crawler

This web crawler allows you to crawl websites and extract information from their HTML content.

## Installation

1. Clone the repository:

   ```sh
   git clone https://github.com/saikiranreddy201/web-crawler.git
   ```

2. Navigate to the project directory:

   ```sh
   cd web-crawler
   ```

3. Install the dependencies:

   ```sh
   npm install
   ```

## Usage

1. Start the server (`server.js`):

   ```sh
   npm run dev
   ```

2. Navigate to `http://localhost:3000/index.html` and provide the URL of the website you want to crawl.

   The crawler will start fetching and parsing web pages based on the provided configuration.

3. View the results:

   - By default, the crawler writes the crawled files to the `public` folder.
   - You can modify the code in `server.js` to save the data to a file or integrate it with a database, if desired.

## Customization

Feel free to customize and extend this web crawler to suit your specific requirements. You can modify the crawling logic, data extraction methods, or add additional features as needed.
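One common customization is bounding how far the crawler follows links. The sketch below shows a depth-limited breadth-first traversal; it is a suggestion, not the project's actual logic, and `fetchLinks` is a hypothetical stand-in for whatever fetch-and-parse routine `server.js` uses (injected here so the traversal is easy to test in isolation).

```javascript
// Depth-limited breadth-first crawl. `fetchLinks(url)` is expected to
// return (a promise of) the URLs linked from that page.
async function crawlBfs(startUrl, fetchLinks, maxDepth = 2) {
  const visited = new Set([startUrl]); // avoid re-fetching pages
  let frontier = [startUrl];           // pages at the current depth

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const url of frontier) {
      for (const link of await fetchLinks(url)) {
        if (!visited.has(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next; // descend one level
  }
  return [...visited];
}
```

Because `fetchLinks` is a parameter, you can swap in a real implementation (e.g. one built on `fetch` plus HTML parsing) without touching the traversal code.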

## Contributing

Contributions are welcome! If you encounter any issues or have ideas for improvements, please open an issue or submit a pull request.

## License

This project is licensed under the MIT License.