Add ADR about server side filtering #729

leplatrem · 2025-01-03T12:01:54Z

No description provided.

docs/adr/adr_004_server_side_filtering.md

ncloudioj · 2025-01-07T20:59:10Z

@leplatrem Thanks for putting this up together!

I agree with you that "option 2" appears to be the most reasonable choice at the moment for its low complexity & cost and its ability to meet our present business needs.

Longer term, from a consumer's perspective, I'd be curious about the viability of "option 4" specifically regarding how to handle the increased signing workload for server-side filtering. If the scalability problem is intrinsically hard to solve, would it be reasonable for us to opt out content signing for certain use cases to work around that challenging? An interesting observation we have made from building user-facing content serving features in Firefox over the past years, we've been gradually shifting our mental model from error prevention to error tolerance & rapid responding. Granted it will lose certain security protections w/o content signing, though that lose could be totally acceptable if we have other mechanisms in place for emergency/incident handling. As an RS consumer, having such flexibility seems quite useful and compelling in certain cases especially if it'd unlock other capabilities.

alexcottner · 2025-01-08T21:35:52Z

docs/adr/adr_004_server_side_filtering.md

+In order to choose our solution we considered the following criteria:
+
+- **Complexity**: Low → High: how complex is the solution
+- **User experience**: Bad → Good: how adapted to users is the solution


I think we need to take into account 2 user experiences here. The remote-settings admins, and everybody that uses firefox (and other potential remote-settings clients).

Do you mean developer experience? Or client API complexity?

I could be wrong but I think Firefox users should not be impacted by this internal implementation details. Do you have something concrete in mind?

Oh that's a good question. Maybe I'm overthinking it but I suppose there is:

remote-settings admins - How hard is it to change these records and understand what is happening in the background.

client developers - How hard is it to use and understand these enhancements. We do occasionally run into a latency question because of CDN caching, and this is adding one more layer on top of that.

end users - Is my computer running better because of these changes? Am I using more or less power/bandwidth? etc. I guess at an individual level the impact will be minimal though.

All of this makes sense.
I rephrased some parts but didn't find justified to create a criteria for each of these. Let me know if you don't agree, I'd be happy to adjust more until it's good enough

alexcottner · 2025-01-08T21:58:03Z

docs/adr/adr_004_server_side_filtering.md

+For example, if records have `regions` and `locales` fields:
+
+```
+{


What could collection config look like in this example to know how the records should be pivoted? Something like?

{ ... pivotBy: [ "regions", "locales" ] }

What do you mean with records should be pivoted?
How would this pivotBy field be used?

We could add a few lines below to show the client code required to leverage this option.
It would look like this:

ChromeUtils.defineESModuleGetters(lazy, { Region: "resource://gre/modules/Region.sys.mjs", }); const region = lazy.Region.home || "default"; const locale = Services.locale.appLocaleAsBCP47.substr(0, 2); client = RemoteSettings(`shops-${region}-${locale}`);

Sorry, maybe this was a bad assumption on my part. I assumed there would be a generic automated process that looks for these collections that need to create/update the read-only virtual collections. And that process would need to know how to disperse those records to the virtual collections when it runs.

In my head this is basically a generic ETL process that takes the records from the collection, and applies some "pivot table" like logic to disperse these records out.

I realize now there could be multiple processes that have their own custom logic in them, one for each client or collection. But I still like the idea of having a paved road for a generic process where we could list which properties we want to create collections for and it would just run with it.

If I understand correctly, you were looking for a generic way to create and populate all the sub-collections, and on the client side "route" them to the right sub-collection.

With the proposed solution here, the sub-collections are hardcoded in the cronjobs manifests, and the clients have hardcoded logics to pick the proper sub-collection.

Yes, exactly that. But you were thinking about storing that config in the cronjobs. I guess the end result is the same and this is down to small implementation details. Although if we put the config in the collection properties, admins could see it (and even play with it in dev) 🤔 . That could allow for some self-service.

Either way, we don't need to decide it here.

The generic/automatic/self-serve solution is neat, but unlike the proposed option, it requires some (minor) development.
Shall we add another entry in this doc and evaluate?
The configuration of this "pivot" would require 2 aspects to be configured: the list of fields, and the pattern of the side collections.
For example pivotBy: ["region", "locale"], and targetPattern: "shop-{region}-{locale}".
I think it would still require us to create the side collections manually, and the cronjob would just "ignore" the records if they no existing collection matches (eg. if there's no shop-oceania-de collection, do not put this record anywhere)

alexcottner · 2025-01-08T22:02:38Z

docs/adr/adr_004_server_side_filtering.md

+- **Cost of operation**:  Low. Clients download only the subset of data, saving bandwidth, and signing datasets on each publication is relatively cheap.
+
+
+### Option 4 - Implement dynamic signatures


We could have a hybrid approach here. Where we create the result on the first request and cache it. However if many clients make the request at the same time, we would need to implement a queue so we don't do the same work for each of them.

alexcottner · 2025-01-08T22:29:51Z

I think option 2 is our best value without major changes.

Option 4 feels like it's not quite complete but could be great. We could setup a custom sync endpoint where the client provides info on it's current state, and a signed bundle is returned with exactly what it needs to be brought up to date. Something similar to couchDB's _changes endpoint, but with a signature attached. But we'd need some additional infrastructure to prevent competing requests from being processed at the same time.

leplatrem · 2025-01-09T09:45:33Z

Option 4 feels like it's not quite complete but could be great.

Should I add more details?

We could setup a custom sync endpoint where the client provides info on it's current state, and a signed bundle is returned with exactly what it needs to be brought up to date.

This could be one idea of improvement, but I feel that I does not relate directly to server side filtering.

alexcottner · 2025-01-10T17:49:25Z

This could be one idea of improvement, but I feel that I does not relate directly to server side filtering.

You're right, that is a more generic solution and doesn't apply directly to server side filtering. We can leave that out for now.

alexcottner · 2025-01-10T18:03:54Z

Because I like diagrams, does this look right?

leplatrem · 2025-01-10T18:39:54Z

Because I like diagrams, does this look right?

Yes! I would define virtual here, because with Option2 they are real/concrete collections

alexcottner

Loving it

alexcottner · 2025-01-10T20:14:23Z

Because I like diagrams, does this look right?

Yes! I would define virtual here, because with Option2 they are real/concrete collections

Yeah virtual is the wrong word here. Maybe "staging" or "calculated" or "pre-filtered" or something?

leplatrem · 2025-01-13T10:16:45Z

Because I like diagrams, does this look right?

Yes! I would define virtual here, because with Option2 they are real/concrete collections

Yeah virtual is the wrong word here. Maybe "staging" or "calculated" or "pre-filtered" or something?

Calculated or computed seems better 👌

leplatrem · 2025-01-13T10:21:22Z

Loving it

Shall I merge? Did you want to add a diagram or more options? We can also always iterate...

alexcottner · 2025-01-13T19:23:39Z

Added the diagram to the decision outcome section. If this looks good, let's merge it and start getting this built out.

leplatrem added the documentation label Jan 3, 2025

Add ADR about server side filtering

d4cee9d

leplatrem force-pushed the adr-server-side-filtering branch from 24ee112 to d4cee9d Compare January 7, 2025 17:16

Minor tweak

b5e0e10

leplatrem marked this pull request as ready for review January 7, 2025 17:34

leplatrem requested a review from a team as a code owner January 7, 2025 17:34

ncloudioj reviewed Jan 7, 2025

View reviewed changes

docs/adr/adr_004_server_side_filtering.md Show resolved Hide resolved

Mention multiple conditions

4d772bc

alexcottner reviewed Jan 8, 2025

View reviewed changes

Add client code snippet and details about hardcoded/manual steps

3d39d3e

alexcottner approved these changes Jan 10, 2025

View reviewed changes

Adding diagram to decision outcome section

bdb3839

leplatrem merged commit d0a7b01 into main Jan 14, 2025
6 checks passed

leplatrem deleted the adr-server-side-filtering branch January 14, 2025 11:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ADR about server side filtering #729

Add ADR about server side filtering #729

leplatrem commented Jan 3, 2025

ncloudioj commented Jan 7, 2025

alexcottner Jan 8, 2025

leplatrem Jan 9, 2025

alexcottner Jan 10, 2025

leplatrem Jan 13, 2025

alexcottner Jan 8, 2025 •

edited

Loading

leplatrem Jan 9, 2025

alexcottner Jan 10, 2025

leplatrem Jan 10, 2025

alexcottner Jan 10, 2025

leplatrem Jan 10, 2025 •

edited

Loading

alexcottner Jan 8, 2025

alexcottner commented Jan 8, 2025

leplatrem commented Jan 9, 2025

alexcottner commented Jan 10, 2025

alexcottner commented Jan 10, 2025

leplatrem commented Jan 10, 2025

alexcottner left a comment

alexcottner commented Jan 10, 2025

leplatrem commented Jan 13, 2025

leplatrem commented Jan 13, 2025

alexcottner commented Jan 13, 2025

		- Cost of operation: Low. Clients download only the subset of data, saving bandwidth, and signing datasets on each publication is relatively cheap.


		### Option 4 - Implement dynamic signatures

Add ADR about server side filtering #729

Add ADR about server side filtering #729

Conversation

leplatrem commented Jan 3, 2025

ncloudioj commented Jan 7, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexcottner Jan 8, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leplatrem Jan 10, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexcottner commented Jan 8, 2025

leplatrem commented Jan 9, 2025

alexcottner commented Jan 10, 2025

alexcottner commented Jan 10, 2025

leplatrem commented Jan 10, 2025

alexcottner left a comment

Choose a reason for hiding this comment

alexcottner commented Jan 10, 2025

leplatrem commented Jan 13, 2025

leplatrem commented Jan 13, 2025

alexcottner commented Jan 13, 2025

alexcottner Jan 8, 2025 •

edited

Loading

leplatrem Jan 10, 2025 •

edited

Loading