A77: xDS Server-Side Rate Limiting #414
base: master
Thanks for writing this up! Overall, this looks good, but I have a few comments about making it clearer to readers.
Please let me know if you have any questions. Thanks!
generates the `onClientCall` handlers (aka interceptors in Java and Go, and
filters in C++).

##### RLQS Cache
I think we decided that this wasn't going to be an RLQS-specific cache but rather the general-purpose mechanism described in A83, right? If so, please reword this accordingly, and update the diagram above.
Yes, I meant the A83-based cache here, and I've pointed that out in several other places. This section was supposed to describe how the persistent cache is used by RLQS. I see how this can be confusing for the reader. Will update accordingly.
malicious Control Plane, leading to such potential exploits as:

1. Leaking customer's Application Default Credentials OAuth token.
2. Causing MalOut/DDoS by sending bad data from the compromised RLQS (f.e. set
I don't think this bullet is relevant. Nothing that we're talking about here protects us against a compromised RLQS server, just against the control plane pointing us at the wrong RLQS server. And if the attacker has control over the xDS server, they can just as easily insert a fault injection config that fails 100% of requests, so they don't need RLQS at all.
##### RLQS Filter State

RLQS Filter State contains the business logic for rate limiting, and the current
I think we should introduce this concept using something like this:
"""
In order to retain filter state across LDS/RDS updates, the actual logic for the RLQS filter will be moved into a separate object called RLQS Filter State, which will be stored in the persistent filter state mechanism described in gRFC A83. The key in the persistent filter state will be the RLQS xDS HTTP filter config, which ensures that two RLQS filter instances with the same config will share filter state but two RLQS filter instances with different configs will each have their own filter state.
"""
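To make the keying behavior concrete, here is a minimal sketch. It is not the A83 API: `RlqsFilterConfig`, `RlqsFilterState`, `PersistentFilterState`, the field names, and the example values are all hypothetical stand-ins chosen for illustration. It only shows that using the full filter config as the key makes equal configs share state and different configs get separate state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RlqsFilterConfig:
    """Hypothetical, hashable stand-in for the RLQS xDS HTTP filter config."""
    rlqs_server: str
    domain: str
    deny_status: int  # illustrative field; values below are arbitrary


@dataclass
class RlqsFilterState:
    """Hypothetical holder for state that should survive LDS/RDS updates."""
    config: RlqsFilterConfig
    buckets: dict = field(default_factory=dict)


class PersistentFilterState:
    """Toy stand-in for a persistent filter state store (assumption, not A83)."""

    def __init__(self):
        self._states = {}

    def get_or_create(self, config):
        # The full filter config is the key: equal configs share one state.
        state = self._states.get(config)
        if state is None:
            state = RlqsFilterState(config)
            self._states[config] = state
        return state


store = PersistentFilterState()
c1 = RlqsFilterConfig("rlqs.example.com", "domain-a", deny_status=14)
c1_again = RlqsFilterConfig("rlqs.example.com", "domain-a", deny_status=14)
c2 = RlqsFilterConfig("rlqs.example.com", "domain-a", deny_status=8)

shared = store.get_or_create(c1) is store.get_or_create(c1_again)
separate = store.get_or_create(c1) is not store.get_or_create(c2)
```

In this sketch `c2` differs from `c1` only in the deny status, yet it still gets a fresh state object, which is the trade-off of keying on the whole config.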
With this proposal, the filter state is lost if change is made to the filter
config, including updates to inconsequential fields such as deny response
status. Additional logic can be introduced to handle updates to such fields
Suggest starting the second sentence with "If this becomes a problem, ".
* Go: [`Activation.ResolveName(string)`](https://github.com/google/cel-go/blob/3f12ecad39e2eb662bcd82b6391cfd0cb4cb1c5e/interpreter/activation.go#L30)
* Java: [`CelVariableResolver`](https://javadoc.io/doc/dev.cel/runtime/0.6.0/dev/cel/runtime/CelVariableResolver.html)

### Persistent Filter Cache
I think most of this section should be moved to gRFC A83 as part of #465. The examples will probably need to change as part of that.
Also, we should clarify exactly what the "xDS HTTP filter" object is here in Java and Go, since that's different than what we have in C-core.
status. Additional logic can be introduced to handle updates to such fields
while preserving the filter state.

### Multithreading
This section should be part of the description of the Bucket Map object.
##### RLQS HTTP Filter
One criterion I like to use for a gRFC is that it should be detailed enough that a junior engineer familiar with the existing gRPC code base can implement the design correctly based solely on what's in the document. To make this design that clear, I would like to see this document break down the structure in two different ways.
First, I would like to see a list of objects, each of which lists its data members and explains when those data members are accessed. The description should be language-agnostic, and we don't want it to be any more detailed than it needs to be -- we need only enough here to make the intended structure clear. For example, the "RLQS Filter State" object will include the following data members:
- matcher tree: From filter config, initialized at instantiation, constant. Used to identify the bucket map entry for each data plane RPC.
- bucket map: Accessed on each data plane RPC, when we get a response from the RLQS server, and when report timers fire.
- RLQS client: Accessed when we get the first data plane RPC for a given bucket and when a report timer fires. Notifies RLQS Filter State of responses received from the RLQS server.
- report timer: Created/modified when we get an RLQS response for a given bucket, or when a previous timer fires.
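As a rough illustration of the level of detail meant here, the data members above could be written down as follows. This is only a sketch: the class name, field names, and types are hypothetical, and keeping one report timer deadline per bucket is an assumption rather than something the design has settled.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class RlqsFilterStateSketch:
    """Hypothetical layout of the RLQS Filter State data members."""

    # From filter config, set at instantiation, treated as constant.
    # Maps RPC attributes to a bucket id for each data plane RPC.
    matcher_tree: Callable[[Dict[str, str]], str]

    # Accessed on each data plane RPC, on RLQS responses, and when
    # report timers fire.
    bucket_map: Dict[str, Any] = field(default_factory=dict)

    # Accessed on the first data plane RPC for a bucket and when a report
    # timer fires; notifies this object of RLQS server responses.
    rlqs_client: Optional[Any] = None

    # Assumption: one timer deadline (seconds) per bucket id, created or
    # modified on an RLQS response or when a previous timer fires.
    report_timers: Dict[str, float] = field(default_factory=dict)


sketch = RlqsFilterStateSketch(
    matcher_tree=lambda attrs: attrs.get("tenant", "default"))
```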
I'd like to see this list for each of the following objects:
- RLQS xDS HTTP Filter, channel-level (i.e., in Java, this is the object that stores the persistent filter state)
- RLQS xDS HTTP Filter, call-level (i.e., in Java, this is the actual interceptor that runs for each data plane RPC)
- RLQS Filter State
- RLQS Client
- RLQS Bucket Map
Second, I would like to see a high-level walkthrough of each event we expect the filter to handle and what actions it will take for each one. For example:
When processing each data plane RPC, the RLQS filter will ask the RLQS filter state for a rate-limit decision for the RPC. The RLQS filter state uses the matcher tree from the filter config to determine which bucket to use for the RPC. It then looks for that bucket in the bucket map, creating it if it doesn't already exist. If a new bucket was created, it sends a request on the RLQS stream informing the server of the bucket's creation. Finally, it returns the resulting rate-limit decision based on the bucket contents.
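The data plane RPC flow described above can be sketched roughly like this. All names here (`FakeRlqsStream`, `Bucket`, `RlqsFilterStateFlow`, `rate_limit`) are hypothetical, the matcher tree is reduced to a plain function, and allowing RPCs until the server assigns a quota is an assumption made only to keep the example small. Synchronization is omitted.

```python
class FakeRlqsStream:
    """Hypothetical stand-in for the RLQS stream; records what was sent."""

    def __init__(self):
        self.sent = []

    def send_bucket_created(self, bucket_id):
        self.sent.append(bucket_id)


class Bucket:
    """Hypothetical bucket holding a decision and usage counters."""

    def __init__(self):
        # Assumption: allow until the RLQS server assigns a quota.
        self.allow = True
        self.num_allowed = 0
        self.num_denied = 0

    def decide(self):
        if self.allow:
            self.num_allowed += 1
        else:
            self.num_denied += 1
        return self.allow


class RlqsFilterStateFlow:
    def __init__(self, matcher, stream):
        self._matcher = matcher  # stands in for the matcher tree
        self._stream = stream
        self._buckets = {}

    def rate_limit(self, attrs):
        # 1. Use the matcher to pick the bucket for this RPC.
        bucket_id = self._matcher(attrs)
        # 2. Look the bucket up, creating it if it does not exist yet.
        bucket = self._buckets.get(bucket_id)
        if bucket is None:
            bucket = Bucket()
            self._buckets[bucket_id] = bucket
            # 3. Tell the RLQS server about the newly created bucket.
            self._stream.send_bucket_created(bucket_id)
        # 4. Return the decision based on the bucket contents.
        return bucket.decide()


stream = FakeRlqsStream()
flow = RlqsFilterStateFlow(lambda attrs: attrs.get("tenant", "default"), stream)
first = flow.rate_limit({"tenant": "a"})
second = flow.rate_limit({"tenant": "a"})
```

Only the first RPC for bucket `a` produces a message on the stream; the second one reuses the existing bucket.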
I think the set of events that we need to describe are these:
- When receiving an LDS/RDS update
- When processing each data plane RPC
- When receiving a response from the RLQS server
- When a report timer fires
destroyed, `RlqsFilterState(c2)` is still used by two onCallHandlers, so it's
preserved in RLQS Cache.

##### Future considerations
This sub-section can stay in this gRFC, since this is about the decision to use the full filter config as the persistent filter state key.
Each gRPC implementation needs to consider what synchronization primitives are
available in their language to minimize the thread lock time.

### Code Samples
These code snippets should be part of the descriptions of the RLQS filter (interceptor) and RLQS Filter State objects, respectively.
Rate Limit Quota Service (RLQS) gRFC.
Rendered version: https://github.com/sergiitk/grfc-proposal/blob/a77-rlqs/A77-xds-rate-limiting-rlqs.md