Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for using a dynamic variable resolver and a Value::DynamicCollection that can dynamically resolve nested values of collection types. #58

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

chirino
Copy link
Contributor

@chirino chirino commented Jul 2, 2024

This is an improved version of #47 it has been updated to include the suggestions given in #47.

@chirino
Copy link
Contributor Author

chirino commented Jul 3, 2024

Added a slightly modified version of the benchmark @clarkmcc provided in #47
This variation lets you easily switch between static or dynamic resolvers. It constructs the Context within the iteration loop since that better compares the overhead of populating the context in the static case.

Dynamic resolution seems to be faster, about 20%:

     Running benches/runtime.rs (target/release/deps/runtime-6a18d2a40aded4e1)
variable resolution/variable_resolution_1
                        time:   [49.821 ns 49.889 ns 49.968 ns]
                        change: [-32.108% -31.969% -31.813%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
variable resolution/variable_resolution_10
                        time:   [490.61 ns 495.56 ns 501.14 ns]
                        change: [-23.252% -22.414% -21.410%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
variable resolution/variable_resolution_100
                        time:   [5.6754 µs 5.7097 µs 5.7500 µs]
                        change: [-19.484% -18.817% -18.102%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  8 (8.00%) high mild
  1 (1.00%) high severe

@clarkmcc
Copy link
Owner

clarkmcc commented Jul 3, 2024

It's likely faster because we're comparing apples to oranges. In this benchmark you're really seeing the difference between a hashmap lookup to produce a Value::Null and skipping the hashmap altogether (as is the case with dynamic resolution). The problem with this is it doesn't accurately capture the overhead of only dynamic resolution. To me this just tells me that it's faster to return Ok(Value::Null) than it is to hashmap.get(variable) which makes perfect sense.

To properly benchmark the overhead of dynamic resolution, the hashmap lookup should serve as the control in the experiment. This means the dynamic resolution should also include the same hashmap lookup.

For the benchmark, what happens in the dynamic resolver doesn't really matter so much as that it does the same thing as the standard resolver. In practice, the dynamic resolver is almost always going to be slower than the standard resolver because, at least in your use-case, you'll be making HTTP requests in the dynamic resolver. So the benchmark should control for the hashmap in my opinion.

I believe these adjustments more accurately control for the resolver's overhead.

Index: interpreter/benches/runtime.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/interpreter/benches/runtime.rs b/interpreter/benches/runtime.rs
--- a/interpreter/benches/runtime.rs	(revision 38c263c5ea544cd8b2509853b9fe5bbe62741eaf)
+++ b/interpreter/benches/runtime.rs	(date 1720022327640)
@@ -1,5 +1,5 @@
 use cel_interpreter::context::Context;
-use cel_interpreter::{Program, Value};
+use cel_interpreter::{ExecutionError, Program, Value};
 use criterion::{black_box, criterion_group, criterion_main, Criterion};
 use std::collections::HashMap;
 
@@ -71,7 +71,7 @@
     let sizes = vec![1, 10, 100];
 
     // flip this bool to compare the performance of dynamic resolver vs static resolvers
-    let use_dynamic_resolver = true;
+    let use_dynamic_resolver = false;
 
     for size in sizes {
         let mut expr = String::new();
@@ -86,13 +86,22 @@
         group.bench_function(format!("variable_resolution_{}", size).as_str(), |b| {
             let mut ctx = Context::default();
             if use_dynamic_resolver {
-                ctx.set_dynamic_resolver(move |_, _| Ok(Value::Null));
+                let mut variables: HashMap<String, Value> = HashMap::default();
+                for i in 0..size {
+                    variables.insert(format!("var_{i}", i = i), Value::Null);
+                }
+                ctx.set_dynamic_resolver(move |_, v| {
+                    variables
+                        .get(v)
+                        .cloned()
+                        .ok_or(ExecutionError::UndeclaredReference(v.to_string().into()))
+                });
             } else {
                 for i in 0..size {
                     ctx.add_variable_from_value(&format!("var_{i}", i = i), Value::Null);
                 }
-            }
-            b.iter(|| program.execute(&ctx).unwrap())
+            };
+            b.iter(|| program.execute(&ctx))
         });
     }
     group.finish();
@@ -100,8 +109,8 @@
 
 criterion_group!(
     benches,
-    criterion_benchmark,
-    map_macro_benchmark,
+    // criterion_benchmark,
+    // map_macro_benchmark,
     variable_resolution_benchmark
 );

When measuring only the overhead of the dynamic resolver, we get ~50% performance regression

Benchmarking variable resolution/variable_resolution_1: Collecting 100 samples in estimated 5.0004 s (51Mvariable resolution/variable_resolution_1
                        time:   [95.403 ns 95.649 ns 95.907 ns]
                        change: [+53.123% +54.927% +56.336%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe
Benchmarking variable resolution/variable_resolution_10: Collecting 100 samples in estimated 5.0029 s (5.variable resolution/variable_resolution_10
                        time:   [980.32 ns 986.39 ns 992.87 ns]
                        change: [+47.167% +48.708% +50.222%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking variable resolution/variable_resolution_100: Collecting 100 samples in estimated 5.0526 s (4variable resolution/variable_resolution_100
                        time:   [11.307 µs 11.360 µs 11.416 µs]
                        change: [+51.871% +53.270% +54.583%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

@chirino
Copy link
Contributor Author

chirino commented Jul 4, 2024

So the use case of using http to resolve values was just an illustration. My actual use case looks more like I get a gRPC request with a large payload document. I want to run a CEL expression against that document. Converting that whole document to cel Values seems like more overhead than dynamically picking out the fields of that document based on the expression.

@clarkmcc
Copy link
Owner

clarkmcc commented Jul 4, 2024

That sounds like a more legitimate use-case.

How are you envisioning we'd preserve the spec's Field Selection requirements with a dynamic resolver present? In order for the dynamic resolver to be useful, you want the full path of the expression before you do any resolving. But that doesn't align with the spec and would result in different error behaviors depending on whether the dynamic resolver is added or not.

If we did follow the spec on field selection, then the dynamic resolver would need to traverse each layer of the object converting each layer to a CEL value, which doesn't really get you the wins that you're looking for.

@chirino
Copy link
Contributor Author

chirino commented Jul 4, 2024

I'll try to update the test_deep_dynamic_resolver test in this PR to be a better example of the usecase. Maybe you can help figure out the edge cases where if fails the spec expectations.

@chirino chirino force-pushed the any-value branch 6 times, most recently from 24220a3 to 60cbf7d Compare July 5, 2024 15:06
@chirino chirino changed the title Add support for using a dynamic variable resolver and an Any variant to the Value type Add support for using a dynamic variable resolver and a Value::DynamicCollection that can dynamically resolve nested values of collection types. Jul 5, 2024
@chirino
Copy link
Contributor Author

chirino commented Jul 5, 2024

Ok.. the latest commits change things a bit, instead of a Value::Any we add a Value::DynamicCollection which holds a resolver closure. This made it a bit simpler to write the more realistic deep resolver test case. This also further simplified the resovler function signature.

@chirino
Copy link
Contributor Author

chirino commented Jul 8, 2024

Added a feature that integrates with Parc types which are like Arcs but which let you access deep references of the object via reference projections.

@clarkmcc
Copy link
Owner

clarkmcc commented Jul 8, 2024

That's pretty awesome actually! But I am concerned that they're pretty convoluted to use. Do you think a derive macro could generate this for a struct?

@chirino
Copy link
Contributor Author

chirino commented Jul 8, 2024

derive macro might be possible, but that should likely be a separate PR.

@chirino
Copy link
Contributor Author

chirino commented Jul 8, 2024

One thing I might want to do part of this PR is see if it's possible to provide a blanket impl a From<Vec where T: DynamicCollection which should simplify the vec indexing cases.

@clarkmcc
Copy link
Owner

clarkmcc commented Jul 8, 2024

derive macro might be possible, but that should likely be a separate PR.

Could be. I'm just looking at what it takes to use, and I don't see this feature being very useful if it requires all this manual boilerplate. Imo they're pretty tightly coupled. But if we can get a separate PR up at the same time then that works for me. What I don't want is to introduce a feature that is unwieldy to use and then never get around to cleaning up the user experience.

One thing I might want to do part of this PR is see if it's possible to provide a blanket impl a From<Vec where T: DynamicCollection which should simplify the vec indexing cases.

👍

@chirino
Copy link
Contributor Author

chirino commented Jul 9, 2024

@clarkmcc ok I think the last bits are in there. Could we get this in without the macro? Our project could take advantage of this as is.

@clarkmcc
Copy link
Owner

clarkmcc commented Jul 9, 2024

Let me play around with the PR now that it's ready to go, it's big enough I can't once-over it here in Github 😁

@chirino
Copy link
Contributor Author

chirino commented Jul 9, 2024

ok thx.

…cCollection variant to the Value type

* includes a benchmark
* users can implement a DynamicCollection trait to aid in struct conversions to Values
* added pared feature which integrates the pared dependency so that you can use Arc projection refs to documents avoiding heavy clones when using a dynamic resolver.

Signed-off-by: Hiram Chirino <[email protected]>
@chirino
Copy link
Contributor Author

chirino commented Jul 16, 2024

squashed and rebased to avoid merge conflicts.

@clarkmcc
Copy link
Owner

Thanks, I took a look at this on Saturday but didn't finish going through it. Still WIP

@chirino
Copy link
Contributor Author

chirino commented Aug 2, 2024

any updates?

@clarkmcc
Copy link
Owner

clarkmcc commented Aug 2, 2024

I'm working through some ideas based on #73. Seems like the right solution could solve #73, this, and what you're proposing here -- allowing CEL to get arc references to fields on Rust structs and skip serializing the objects altogether.

In other words, I'm not sure about this PR as it currently stands, but I want to take a lot of what you have here to improve this project on several different fronts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants