
Gist Searching in GitHub Provider is now Rate Limited (and doesn't appear to be affected by OAuth authentication) #823

Open
CMCDragonkai opened this issue Oct 14, 2024 · 7 comments
Labels
bug Something isn't working


@CMCDragonkai
Member

Describe the bug

In our discovery system, we use the getIdentityData method from the GitHub provider, which looks up gists via https://gist.github.com/search.

It seems that a secondary rate limit has recently been applied to this page (https://docs.github.com/en/rest/using-the-rest-api/troubleshooting-the-rest-api?apiVersion=2022-11-28), and authenticating the requests does not get around it. At the moment the requests are made unauthenticated, because it is essentially a public page that we index.

Gists are not currently searchable through the official GitHub API, so gist search now seems effectively impossible to index programmatically. This is pretty bad, especially because it is a secondary rate limit.


I tried requests like:

curl -H "Authorization: token <TOKEN>" "https://gist.github.com/search?q=user%3ACDeltakai+filename%3Acryptolink.txt+Cryptolink+between+Polykey+Keynode+and+Github+Identity&s=updated&o=desc"

It made no difference: the response is still 429 Too Many Requests.
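Since the same 429 comes back with or without a token, retrying immediately only makes things worse. GitHub's REST API documentation on secondary rate limits recommends: honour retry-after if present; otherwise, if x-ratelimit-remaining is 0, wait until x-ratelimit-reset; otherwise wait at least one minute and back off exponentially, giving up after a fixed number of retries. A minimal TypeScript sketch of that policy follows. It assumes the response exposes the same headers as the REST API (not confirmed for the gist.github.com search page), and retryDelayMs and MAX_ATTEMPTS are hypothetical names.

```typescript
// Minimal header accessor; the global fetch Headers object satisfies this shape.
interface HeaderReader {
  get(name: string): string | null;
}

const MIN_WAIT_MS = 60_000; // "wait at least one minute" fallback
const MAX_ATTEMPTS = 5; // hypothetical retry budget

/**
 * Returns how long to wait (ms) before retrying a rate-limited response,
 * or null when the retry budget is exhausted and the caller should give up.
 * `attempt` is 0 for the first retry.
 */
function retryDelayMs(
  headers: HeaderReader,
  attempt: number,
  nowMs: number = Date.now(),
): number | null {
  if (attempt >= MAX_ATTEMPTS) return null;

  // 1. retry-after is given in whole seconds.
  const retryAfter = headers.get('retry-after');
  if (retryAfter !== null && /^\d+$/.test(retryAfter)) {
    return Number(retryAfter) * 1000;
  }

  // 2. Primary limit exhausted: wait until the reset time (UTC epoch seconds).
  const remaining = headers.get('x-ratelimit-remaining');
  const reset = headers.get('x-ratelimit-reset');
  if (remaining === '0' && reset !== null && /^\d+$/.test(reset)) {
    return Math.max(0, Number(reset) * 1000 - nowMs);
  }

  // 3. No guidance from the server: one minute, doubling on each repeated failure.
  return MIN_WAIT_MS * 2 ** attempt;
}
```

If the search page sends none of these headers, the sketch simply falls back to the one-minute exponential backoff.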

The only other option right now is to switch to the gists REST API. Because it has no search feature, we would basically have to index over all of a user's gists through the API, but we could use the since parameter to do this efficiently without repeating work: https://docs.github.com/en/rest/gists/gists?apiVersion=2022-11-28#list-gists-for-a-user. Effectively, we would only go over the new gists, which represent new claims. The timestamp acts like a cursor.
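A minimal TypeScript sketch of the since-cursor approach against GET /users/{username}/gists, paging with per_page and page and advancing the cursor to the newest updated_at seen. It assumes a fetchPage callback that performs the (authenticated) request and returns the parsed JSON array; it is injected so the paging logic can be exercised offline. listGistsSince, claimGists and FetchPage are hypothetical names, and the cryptolink.txt filter is taken from the gist search query in this issue.

```typescript
// Only the fields of a gist object used here (from GET /users/{username}/gists).
interface GistSummary {
  id: string;
  updated_at: string; // ISO 8601, e.g. "2024-10-14T00:00:00Z"
  files: Record<string, { filename: string; raw_url: string }>;
}

// Injected page fetcher: takes a full API URL, returns that page of gists.
type FetchPage = (url: string) => Promise<GistSummary[]>;

async function listGistsSince(
  username: string,
  since: string | undefined,
  fetchPage: FetchPage,
  perPage = 100,
): Promise<{ gists: GistSummary[]; cursor: string | undefined }> {
  const gists: GistSummary[] = [];
  let cursor = since;
  for (let page = 1; ; page++) {
    const params = new URLSearchParams({ per_page: String(perPage), page: String(page) });
    if (since !== undefined) params.set('since', since);
    const url = `https://api.github.com/users/${encodeURIComponent(username)}/gists?${params.toString()}`;
    const batch = await fetchPage(url);
    for (const gist of batch) {
      gists.push(gist);
      // UTC ISO 8601 strings in the same format sort lexicographically by time.
      if (cursor === undefined || gist.updated_at > cursor) cursor = gist.updated_at;
    }
    // A short page means there are no further pages.
    if (batch.length < perPage) break;
  }
  return { gists, cursor };
}

// Keep only gists that contain a claim file.
function claimGists(gists: GistSummary[]): GistSummary[] {
  return gists.filter((g) => Object.values(g.files).some((f) => f.filename === 'cryptolink.txt'));
}
```

Because since filters on a gist's last update time, edited gists reappear on the next pass, and the boundary gist may be returned again if the comparison is inclusive, so deduplicating by gist id is prudent. Following the Link response header instead of stopping on a short page would save one empty request when the last page is exactly full.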

To Reproduce

  1. Try discovering any identity. You may see WARN:polykey.PolykeyAgent.task v0pocinl3mpo0195g4m2kd1t8k0:Failed - Reason: ErrorProviderCall: Provider responded with 429 Too Many Requests in the agent logs.

Expected behavior

Discovery should work as before, without errors.


Notify maintainers

@tegefaulkes

@CMCDragonkai CMCDragonkai added the bug Something isn't working label Oct 14, 2024
@CMCDragonkai
Member Author

This is a critical issue for the entire social identity discovery system.

@CMCDragonkai
Member Author

The sustainable solution is to switch to the GitHub gists API and use the since parameter as a cursor.

This means the graph has to be extended with metadata, such as the since cursor, for each identity vertex.
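One way to picture that extra vertex metadata, as a minimal TypeScript sketch: a per-identity record holding an opaque cursor string whose meaning is left to the provider, plus a commit step that only advances the cursor once a discovery pass has succeeded. IdentityMetaStore, IdentityDiscoveryMeta and the in-memory Map are hypothetical stand-ins, not Polykey's actual gestalt graph API.

```typescript
// Hypothetical per-identity discovery metadata kept alongside an identity vertex.
interface IdentityDiscoveryMeta {
  // Opaque to the graph; the provider decides what it means
  // (for the GitHub gists API this would be the newest updated_at seen).
  cursor?: string | undefined;
  lastDiscoveredAt: number; // epoch milliseconds
}

// In-memory stand-in for whatever persistent store backs the graph.
class IdentityMetaStore {
  private readonly meta = new Map<string, IdentityDiscoveryMeta>();

  // JSON-encode the pair so IDs containing separators cannot collide.
  private key(providerId: string, identityId: string): string {
    return JSON.stringify([providerId, identityId]);
  }

  getCursor(providerId: string, identityId: string): string | undefined {
    return this.meta.get(this.key(providerId, identityId))?.cursor;
  }

  // Call only after a discovery pass has fully succeeded, so a pass that
  // fails midway (e.g. on a 429) is retried from the previous cursor.
  commit(providerId: string, identityId: string, cursor: string | undefined, nowMs: number = Date.now()): void {
    this.meta.set(this.key(providerId, identityId), { cursor, lastDiscoveredAt: nowMs });
  }
}
```

Keeping the cursor opaque lets each provider choose its own representation without the graph needing to understand it.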

@CMCDragonkai
Member Author

Technically, this metadata could be defined by the GitHubProvider.

@CMCDragonkai
Member Author

I think GitHub is effectively preventing gists from being scraped.

@CMCDragonkai
Member Author

We should also record this rate limit information as part of each provider's metadata, so we can track how often we hit the API: https://docs.github.com/en/rest/rate-limit/rate-limit?apiVersion=2022-11-28
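A minimal TypeScript sketch of capturing that information per provider, assuming it is read from the x-ratelimit-limit, x-ratelimit-remaining, x-ratelimit-used, x-ratelimit-reset and x-ratelimit-resource headers GitHub attaches to REST API responses (GET /rate_limit reports the same limit, used, remaining and reset values as JSON). RateLimitSnapshot, readRateLimit, shouldDefer and the reserve parameter are hypothetical.

```typescript
// Minimal header accessor; the global fetch Headers object satisfies this shape.
interface HeaderReader {
  get(name: string): string | null;
}

// Hypothetical per-provider snapshot of the primary rate limit.
interface RateLimitSnapshot {
  resource: string | null; // e.g. "core", "search"
  limit: number;
  remaining: number;
  used: number;
  resetAtMs: number; // converted from UTC epoch seconds
}

function readRateLimit(headers: HeaderReader): RateLimitSnapshot | null {
  const num = (name: string): number | null => {
    const v = headers.get(name);
    return v !== null && /^\d+$/.test(v) ? Number(v) : null;
  };
  const limit = num('x-ratelimit-limit');
  const remaining = num('x-ratelimit-remaining');
  const used = num('x-ratelimit-used');
  const reset = num('x-ratelimit-reset');
  // Responses without these headers (e.g. plain web pages) yield no snapshot.
  if (limit === null || remaining === null || used === null || reset === null) return null;
  return {
    resource: headers.get('x-ratelimit-resource'),
    limit,
    remaining,
    used,
    resetAtMs: reset * 1000,
  };
}

// Defer further calls while at or below a reserve and the window has not reset yet.
function shouldDefer(snapshot: RateLimitSnapshot | null, nowMs: number, reserve = 0): boolean {
  return snapshot !== null && snapshot.remaining <= reserve && nowMs < snapshot.resetAtMs;
}
```

These counters describe the primary rate limit, so they may not warn ahead of a secondary limit like the one reported here; they would still help pace the gists API calls that the since-cursor approach makes.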

@CMCDragonkai
Member Author

To work around this for now, you can run identities discover <NODEID> instead. This creates a gestalt around a single NodeID, which allows sharing. The vault share command should do this automatically if it has not already been done.
