This web crawler allows you to crawl websites and extract information from their HTML content.
- Clone the repository:
git clone https://github.com/saikiranreddy201/web-crawler.git
- Navigate to the project directory:
cd web-crawler
- Install the dependencies:
npm install
- Start the development server:
npm run dev
- Then navigate to
localhost:3000/index.html
in your browser and enter the URL of the website you want to crawl.
The crawler will start fetching and parsing web pages based on the provided configuration.
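The fetch-and-parse step can be sketched roughly as below. This is an illustrative, simplified version, not the actual code from server.js; the function name `extractLinks` and the regex-based link extraction are assumptions for demonstration (the real project may use a proper HTML parser).

```javascript
// Hypothetical sketch of one crawl step: given a page's HTML,
// collect the href values of its anchor tags.
function extractLinks(html) {
  const links = [];
  // Match <a ... href="..."> (or single-quoted) and capture the URL.
  const re = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Example: parse a small HTML snippet.
const sample =
  '<a href="/about">About</a> <a href="https://example.com">Ext</a>';
const found = extractLinks(sample);
// found is ["/about", "https://example.com"]
```

A real crawler would fetch each discovered link in turn (respecting depth limits and avoiding revisits) rather than stopping after one page.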
- View the results:
  - By default, the crawler outputs the crawled files to the public folder.
  - You can modify the code in
server.js
to save the data to a file or integrate it with a database, if desired.
Feel free to customize and extend this web crawler to suit your specific requirements: modify the crawling logic, change the data extraction methods, or add new features as needed.
Contributions are welcome! If you encounter any issues or have ideas for improvements, please open an issue or submit a pull request.
This project is licensed under the MIT License.