Add tarpit server to throttle API abuse #5

Open
nvkelso opened this issue Jun 6, 2017 · 3 comments

@nvkelso
Member

nvkelso commented Jun 6, 2017

While at Mapzen we occasionally saw spikes of requests that were answered with 429 (Too Many Requests, returned either for exceeding the free limits for an API key or, historically, for an IP address). The rejected clients kept sending so much traffic our way that our Fastly request costs were affected.

Fastly pricing [1] (in NA) is $0.0075 per 10,000 requests.
We're getting about 30 million requests per day.

Our monthly cost of 429 requests:
0.0075 / 10000 * 30000000 * 30 = $675.00 for 1 month

[1] https://www.fastly.com/pricing
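
The arithmetic above can be reproduced with a short script, which makes it easy to rerun if the price or the traffic changes. This is only a sketch: the figures are the ones quoted in this comment, while the function and constant names are made up for illustration.

```python
# Figures quoted in the comment above; all names here are illustrative only.
PRICE_PER_10K_REQUESTS_USD = 0.0075
REQUESTS_PER_DAY = 30_000_000
DAYS_PER_MONTH = 30


def monthly_request_cost(requests_per_day, price_per_10k, days=DAYS_PER_MONTH):
    """Return the request charge in USD for `days` days of traffic."""
    return requests_per_day * days / 10_000 * price_per_10k


cost = monthly_request_cost(REQUESTS_PER_DAY, PRICE_PER_10K_REQUESTS_USD)
print(f"${cost:.2f} per month")
```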

@zerebubuth
Member

It's worth noting that the tarpit will only work if we have a lot of sequential requests. Something like a browser downloading more than 4 tiles at once and queueing them, or a curl script downloading a bunch of tiles one after the other.

Before embarking on implementation, we should quickly check that we're getting a large number of requests (say, 8 or more) per IP for these 429 cases. If that's not the case, we'll need to look for a different remedy.

@nvkelso
Member Author

nvkelso commented Jun 6, 2017

@zerebubuth now that we don't allow keyless requests, can we just test for the same API key in the time window?

@zerebubuth
Member

I think we might be talking about different things. I'm saying that the tarpit is only effective against certain types of request pattern where a single client is going to wait for the response to one request before making another. We use the fact that they're waiting for a response to slow them down by delaying the response deliberately.

For example, if a single client is requesting things in serial then it looks something like this:

client1: request tile 1
client1: wait for response
client1: do something with the response
client1: request tile 2
etc...

In this case we can slow them down. However, it is less effective if they are performing the requests asynchronously, or if a large number of different clients are all making a small number of requests, e.g.:

client1: request tile 1
client2: request tile 2
client1: wait for response
client2: wait for response
client1: do something with the response
client2: do something with the response

In this case, we can still slow client1 and client2 down, but it won't reduce the number of requests hitting us. This is because the request rate depends on the number of clients (e.g. each making one request), rather than on the number of requests each one makes.
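
To make the difference between the two patterns concrete, here is a small back-of-the-envelope model. It is a sketch under stated assumptions, not a measurement: every client is assumed to be strictly serial, all clients start at the same moment, and the 100 ms service time, 5 s tarpit delay and 60 s window are invented numbers.

```python
def requests_in_window(clients, requests_per_client, service_ms, tarpit_ms, window_ms):
    """Count requests that reach the server within `window_ms` milliseconds.

    Each client sends one request, waits `service_ms + tarpit_ms` for the
    response, then sends the next. Clients run independently from t=0.
    """
    round_trip_ms = service_ms + tarpit_ms
    # Request k (0-based) is sent at k * round_trip_ms, so the number sent
    # before the window closes is ceil(window_ms / round_trip_ms).
    slots = -(-window_ms // round_trip_ms)
    return clients * min(requests_per_client, slots)


# One client fetching 1000 tiles one after the other.
serial_plain = requests_in_window(1, 1000, 100, 0, 60_000)
serial_tarpit = requests_in_window(1, 1000, 100, 5_000, 60_000)

# 600 different clients each fetching a single tile.
spread_plain = requests_in_window(600, 1, 100, 0, 60_000)
spread_tarpit = requests_in_window(600, 1, 100, 5_000, 60_000)

print(serial_plain, serial_tarpit)  # 600 12
print(spread_plain, spread_tarpit)  # 600 600
```

In this toy model the delay cuts the serial client from 600 requests per minute to 12, while the 600 one-shot clients are all delayed but still produce 600 requests.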

It seems unlikely that the clients would be completely asynchronous or requesting a very small number of tiles. However, it's an easy check to see what the average count per IP is for 429s, although it needs to be done against the raw logs, since we don't have that information in the analytics Redshift.
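
A rough sketch of that check might look like the following. The raw log format isn't described in this thread, so the layout here (client IP in the first whitespace-separated field, HTTP status in the second) is purely an assumption, and the sample lines use documentation-range IPs rather than real data.

```python
from collections import Counter


def count_429s_per_ip(lines):
    """Count 429 responses per client IP.

    ASSUMPTION: each line is whitespace-separated with the client IP in the
    first field and the HTTP status in the second. Adjust the indices to
    match the real raw log format.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "429":
            counts[fields[0]] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "192.0.2.1 429 /tile/1",
    "192.0.2.1 429 /tile/2",
    "192.0.2.1 429 /tile/3",
    "198.51.100.7 429 /tile/1",
    "198.51.100.7 200 /tile/2",
]

counts = count_429s_per_ip(sample)
average = sum(counts.values()) / len(counts) if counts else 0.0
print(f"{len(counts)} IPs, {average:.1f} 429s per IP on average")
```

Against real logs, `sample` would be replaced by iterating over the opened log file.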

For the implementation of this, we wouldn't necessarily have to look at the IP address; instead we could have some per-API-key 429 counter which trips the tarpit protection. I'm just saying we should check to see if the fix will be effective (it seems likely) before spending the time to implement it.
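
One possible shape for such a counter, as a minimal sketch only: the class name, the threshold of 8, the 60 second window and the 5 second delay are all placeholder assumptions, and nothing here reflects an existing implementation.

```python
import time
from collections import defaultdict, deque


class TarpitTracker:
    """Track recent 429s per API key and decide when to delay responses.

    ASSUMPTION: threshold, window and delay are placeholder values; real
    ones would come from looking at the raw logs first.
    """

    def __init__(self, threshold=8, window_seconds=60.0, delay_seconds=5.0,
                 clock=time.monotonic):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.delay_seconds = delay_seconds
        self.clock = clock
        self._events = defaultdict(deque)  # api_key -> timestamps of 429s

    def _expire(self, events, now):
        # Drop 429 timestamps that have fallen out of the sliding window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()

    def record_429(self, api_key):
        """Note that a request for `api_key` was just answered with 429."""
        now = self.clock()
        events = self._events[api_key]
        events.append(now)
        self._expire(events, now)

    def delay_for(self, api_key):
        """Return how long (seconds) to hold the next response for this key."""
        events = self._events.get(api_key)
        if not events:
            return 0.0
        self._expire(events, self.clock())
        return self.delay_seconds if len(events) >= self.threshold else 0.0
```

The server would call `record_429` whenever it rejects a request and sleep for `delay_for(api_key)` before answering the next one. Passing the clock in as a parameter keeps the sketch testable without real waiting.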

@nvkelso nvkelso transferred this issue from another repository Feb 21, 2019