-
I ended up using Redis SUBSCRIBE as a buffered stream for URL input, with the Go binary acting as the consumer. There's probably a simpler way to do this, but I'm happy it's working, and the memory footprint seems smaller than when I used Puppeteer.
-
Yes, one goroutine per URL. There's no need for Redis on a single machine; use Redis or another DB for a distributed system. The question is too general to answer in detail here, but you can find the specifics by googling.
-
I'm new to concurrency. Say I want to implement a page crawler using the page pool, like the example at https://github.com/go-rod/rod/blob/master/examples_test.go#L532, but modified to accept an array of URLs and do the crawling. Is it best practice to just create a goroutine for every URL I want to crawl, or do I need to implement another pool for this?
Looking at the process in htop, it doesn't actually create a hundred threads as I thought it would, and it looks like Go uses all the cores of the current CPU. Is there a way to limit how many cores the process uses? Thanks 🙏