Make the crawler concurrent #91
Comments
This was implemented, then removed because it led to the library website dropping requests.
Did the implementation have an upper bound on the number of parallel requests being made? AFAIR, no. IMO, using WaitGroups to limit the number of concurrent workers to 2-3 should improve the performance significantly.
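Something along these lines would do it (a rough sketch only; `fetchPaper` is a hypothetical stand-in for the real fetch-parse-download step, the buffered channel does the actual limiting, and the WaitGroup just waits for everything to finish):

```go
package main

import (
	"fmt"
	"sync"
)

// fetchPaper stands in for the real crawler step
// (fetch details, parse, download); hypothetical.
func fetchPaper(url string) error {
	fmt.Println("fetching", url)
	return nil
}

// crawlAll runs fetchPaper over all URLs with at most maxWorkers
// goroutines in flight at once.
func crawlAll(urls []string, maxWorkers int) {
	var wg sync.WaitGroup
	sem := make(chan struct{}, maxWorkers)

	for _, url := range urls {
		wg.Add(1)
		sem <- struct{}{} // blocks once maxWorkers goroutines are running
		go func(u string) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := fetchPaper(u); err != nil {
				fmt.Println("failed:", u, err)
			}
		}(url)
	}
	wg.Wait()
}

func main() {
	crawlAll([]string{"paper1", "paper2", "paper3"}, 2)
}
```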
I don't remember exactly.
@rajivharlalka or @harshkhandeparkar please update the state of this issue to be reflected on the kanban.
@shikharish what should be the status of this? |
It is not needed as of now. We only need to run the crawler once or twice a semester, so it's very low priority.
Is it hard to do?
Not at all.
Then just finish it off maybe?
No point in keeping hanging issues if they can be solved in a few minutes.
@shikharish updates?
Did some testing, and it turns out even using 2 goroutines leads to 1-2 dropped requests. Increasing it further to 6 goroutines makes it 3-4 dropped requests. Should we skip this one for now?
Try to implement a retry function. Also, how many requests are you able to make concurrently? Even if it is more than one, that's a win.
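A minimal retry sketch, in case it helps (`fetchWithRetry` is hypothetical; the real request function would be plugged in as `fetch`):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// fetchWithRetry retries fetch up to attempts times, sleeping a little
// longer after each failure, so dropped requests get another chance.
func fetchWithRetry(url string, attempts int, fetch func(string) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fetch(url); err == nil {
			return nil
		}
		time.Sleep(time.Duration(i+1) * time.Second) // simple linear backoff
	}
	return fmt.Errorf("giving up on %s after %d attempts: %w", url, attempts, err)
}

func main() {
	// flaky is a stand-in for the real request; fails twice, then succeeds.
	calls := 0
	flaky := func(url string) error {
		calls++
		if calls < 3 {
			return errors.New("request dropped")
		}
		return nil
	}
	fmt.Println(fetchWithRetry("paper1", 5, flaky))
}
```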
Halting this until we have time to look at it comfortably.
Is your feature request related to a problem? Please describe.
Currently the crawler sequentially fetches each paper's details, parses it, and downloads the paper. This could be made a lot faster using goroutines, as sketched below.
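A sketch of what the concurrent pipeline could look like (the stage functions are hypothetical stand-ins for the crawler's real stages; it assumes the `golang.org/x/sync/errgroup` package, whose `SetLimit` caps in-flight goroutines):

```go
package main

import (
	"fmt"

	"golang.org/x/sync/errgroup"
)

// Hypothetical stand-ins for the crawler's real stages.
func fetchDetails(id string) (string, error)    { return "details-" + id, nil }
func parsePaper(details string) (string, error) { return "url-" + details, nil }
func download(url string) error                 { fmt.Println("downloading", url); return nil }

// crawl runs the fetch-parse-download pipeline for each paper
// concurrently, capped at 3 papers in flight.
func crawl(paperIDs []string) error {
	var g errgroup.Group
	g.SetLimit(3) // keep the load on the library site low

	for _, id := range paperIDs {
		id := id // capture loop variable (needed before Go 1.22)
		g.Go(func() error {
			details, err := fetchDetails(id)
			if err != nil {
				return err
			}
			url, err := parsePaper(details)
			if err != nil {
				return err
			}
			return download(url)
		})
	}
	return g.Wait() // returns the first non-nil error once all finish
}

func main() {
	if err := crawl([]string{"a", "b", "c"}); err != nil {
		fmt.Println("crawl failed:", err)
	}
}
```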