-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ADR about server side filtering #729
Conversation
24ee112
to
d4cee9d
Compare
@leplatrem Thanks for putting this up together! I agree with you that "option 2" appears to be the most reasonable choice at the moment for its low complexity & cost and its ability to meet our present business needs. Longer term, from a consumer's perspective, I'd be curious about the viability of "option 4" specifically regarding how to handle the increased signing workload for server-side filtering. If the scalability problem is intrinsically hard to solve, would it be reasonable for us to opt out content signing for certain use cases to work around that challenging? An interesting observation we have made from building user-facing content serving features in Firefox over the past years, we've been gradually shifting our mental model from error prevention to error tolerance & rapid responding. Granted it will lose certain security protections w/o content signing, though that lose could be totally acceptable if we have other mechanisms in place for emergency/incident handling. As an RS consumer, having such flexibility seems quite useful and compelling in certain cases especially if it'd unlock other capabilities. |
In order to choose our solution we considered the following criteria: | ||
|
||
- **Complexity**: Low → High: how complex is the solution | ||
- **User experience**: Bad → Good: how adapted to users is the solution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to take into account 2 user experiences here. The remote-settings admins, and everybody that uses firefox (and other potential remote-settings clients).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean developer experience? Or client API complexity?
I could be wrong but I think Firefox users should not be impacted by this internal implementation details. Do you have something concrete in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's a good question. Maybe I'm overthinking it but I suppose there is:
- remote-settings admins - How hard is it to change these records and understand what is happening in the background.
- client developers - How hard is it to use and understand these enhancements. We do occasionally run into a latency question because of CDN caching, and this is adding one more layer on top of that.
- end users - Is my computer running better because of these changes? Am I using more or less power/bandwidth? etc. I guess at an individual level the impact will be minimal though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this makes sense.
I rephrased some parts but didn't find justified to create a criteria for each of these. Let me know if you don't agree, I'd be happy to adjust more until it's good enough
For example, if records have `regions` and `locales` fields: | ||
|
||
``` | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What could collection config look like in this example to know how the records should be pivoted? Something like?
{
...
pivotBy: [ "regions", "locales" ]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean with records should be pivoted?
How would this pivotBy
field be used?
We could add a few lines below to show the client code required to leverage this option.
It would look like this:
ChromeUtils.defineESModuleGetters(lazy, {
Region: "resource://gre/modules/Region.sys.mjs",
});
const region = lazy.Region.home || "default";
const locale = Services.locale.appLocaleAsBCP47.substr(0, 2);
client = RemoteSettings(`shops-${region}-${locale}`);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, maybe this was a bad assumption on my part. I assumed there would be a generic automated process that looks for these collections that need to create/update the read-only virtual collections. And that process would need to know how to disperse those records to the virtual collections when it runs.
In my head this is basically a generic ETL process that takes the records from the collection, and applies some "pivot table" like logic to disperse these records out.
I realize now there could be multiple processes that have their own custom logic in them, one for each client or collection. But I still like the idea of having a paved road for a generic process where we could list which properties we want to create collections for and it would just run with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, you were looking for a generic way to create and populate all the sub-collections, and on the client side "route" them to the right sub-collection.
With the proposed solution here, the sub-collections are hardcoded in the cronjobs manifests, and the clients have hardcoded logics to pick the proper sub-collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly that. But you were thinking about storing that config in the cronjobs. I guess the end result is the same and this is down to small implementation details. Although if we put the config in the collection properties, admins could see it (and even play with it in dev) 🤔 . That could allow for some self-service.
Either way, we don't need to decide it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The generic/automatic/self-serve solution is neat, but unlike the proposed option, it requires some (minor) development.
Shall we add another entry in this doc and evaluate?
The configuration of this "pivot" would require 2 aspects to be configured: the list of fields, and the pattern of the side collections.
For example pivotBy: ["region", "locale"]
, and targetPattern: "shop-{region}-{locale}"
.
I think it would still require us to create the side collections manually, and the cronjob would just "ignore" the records if they no existing collection matches (eg. if there's no shop-oceania-de
collection, do not put this record anywhere)
- **Cost of operation**: Low. Clients download only the subset of data, saving bandwidth, and signing datasets on each publication is relatively cheap. | ||
|
||
|
||
### Option 4 - Implement dynamic signatures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have a hybrid approach here. Where we create the result on the first request and cache it. However if many clients make the request at the same time, we would need to implement a queue so we don't do the same work for each of them.
I think option 2 is our best value without major changes. Option 4 feels like it's not quite complete but could be great. We could setup a custom sync endpoint where the client provides info on it's current state, and a signed bundle is returned with exactly what it needs to be brought up to date. Something similar to couchDB's _changes endpoint, but with a signature attached. But we'd need some additional infrastructure to prevent competing requests from being processed at the same time. |
Should I add more details?
This could be one idea of improvement, but I feel that I does not relate directly to server side filtering. |
You're right, that is a more generic solution and doesn't apply directly to server side filtering. We can leave that out for now. |
Yes! I would define virtual here, because with Option2 they are real/concrete collections |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loving it
Yeah virtual is the wrong word here. Maybe "staging" or "calculated" or "pre-filtered" or something? |
Calculated or computed seems better 👌 |
Shall I merge? Did you want to add a diagram or more options? We can also always iterate... |
Added the diagram to the decision outcome section. If this looks good, let's merge it and start getting this built out. |
No description provided.