Replies: 18 comments 11 replies
-
A feature that would combine well with this is more specific selectors (similar to …).
-
@Chris-Hawblitzel notes that this approach may increase the risk of a developer not noticing that a certain query is unstable (because it was cached), and they'd end up committing unstable queries just because they happened to luck into the exact context that makes the query go through the first time around.
-
I think we should do it, and in addition the dependency analysis can be useful on its own: removing redundant assertions from the solver context can help performance even without caching. (Of course, there's a tradeoff with solver restarting/resetting costs.)
-
This was brought up again in the context of arithmetic proofs (with multiply and modulo) that are affected by context changes. @Chris-Hawblitzel suggested that for that specific issue we may want a new ….
-
Telemetry may be relevant to better assess stability issues: #412
-
Regarding stability issues, we have discussed multiple approaches that we may try: …
-
@marshtompsxd reports that …
-
@tjhance suggested …
-
Also refer to: #49
-
-
Spin-off issue from #218. This came up exploring this flaky code: https://github.com/secure-foundations/verus/blob/5e0c05eac90283880575e18360c5fdd65dd31acb/source/rust_verify/example/state_machines/dist_rwlock.rs

Chris noted: "Ideally, the user should never experience the situation where …"

Regarding the specific test, @utaal noticed that adding more triggers seemed to help. My hypothesis is that z3 is effectively attempting to prove each conjunct separately. For example, suppose we had … To prove the …

[Verus all-hands] This is instability; e.g. it can be solved with …
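As a hedged illustration of the conjunct-splitting hypothesis (the original example was lost in this export; this Verus-style sketch uses invented, trivial assertions and is not the code under discussion): when z3 effectively attempts each conjunct of an assertion as a separate goal, splitting the assertion makes each goal independent, which can both stabilize the proof and localize failures.

```rust
// Hypothetical Verus-style sketch, not the original elided example.
proof fn conjunct_split(s: Seq<int>)
    requires s.len() == 2, s[0] == 1, s[1] == 2,
{
    // One query goal containing both conjuncts; either conjunct failing
    // (or timing out) fails the whole assertion:
    assert(s[0] == 1 && s[1] == 2);

    // Splitting gives each conjunct its own goal, and its own error
    // message on failure:
    assert(s[0] == 1);
    assert(s[1] == 2);
}
```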
-
From #405 (@jaybosamiya):

Weird perf behavior: load-bearing verus! reinvocation, load-bearing struct definition, ...

This originally came up in a larger example where I was chasing down weird instability. The issue: either of the two lines marked as "load bearing" below, if commented out, makes Verus do everything quite a bit faster:

```
$ time ./rust-verify.sh ./foo.rs 2>/dev/null
verification results:: verified: 1 errors: 4
./rust-verify.sh ./foo.rs 2> /dev/null  3.99s user 0.09s system 99% cpu 4.082 total
$ time ./rust-verify.sh ./foo_comment_out_line_1.rs 2>/dev/null
verification results:: verified: 1 errors: 4
./rust-verify.sh ./foo_comment_out_line_1.rs 2> /dev/null  1.12s user 0.05s system 99% cpu 1.174 total
$ time ./rust-verify.sh ./foo_comment_out_line_2.rs 2>/dev/null
verification results:: verified: 1 errors: 4
./rust-verify.sh ./foo_comment_out_line_2.rs 2> /dev/null  1.15s user 0.05s system 99% cpu 1.207 total
```

However, you'll notice that the two lines themselves can/should have very little to do with the rest of the code. Also, just before finally showing the code below: I know that the code below triggers a lot of verification failures, but that was the only route I've found (at least so far) to demonstrate the load-bearing nature of some things I believe should not be load bearing by themselves. More context below.

The Code

```rust
mod pervasive;
use pervasive::prelude::*;
use pervasive::vec::*;

fn main() { }

verus! {

spec fn just_some_spec_fn<A, C>(e: Seq<A>, b: C, f: FnSpec(C, A) -> C) -> (d: C)
    decreases e.len()
{
    just_some_spec_fn(e, f(b, e[0]), f)
}

fn a_fn_whose_only_job_is_to_fail<A, C>(v: Vec<A>, f: impl Fn(A) -> C) -> (d: Vec<C>)
    ensures
        d.len() == v.len(),
        forall|h, i| 0 <= h && v@[h] == i == f.ensures((i,), d[h])
{
    Vec::new()
}

spec fn nonsense(x0: u8, x1: u8, x2: u8, x3: u8, x4: u8, x5: u8, x6: u8, x7: u8) -> Seq<u8> {
    Seq::empty().push(x0).push(x1).push(x2).push(x7)
}

fn p()
    ensures
        forall|x0, x1, x2, x3, x4, x5, x6, x7|
            #[trigger] nonsense(x0, x1, x2, x3, x4, x5, x6, x7)[6] == x6
            && nonsense(x0, x1, x2, x3, x4, x5, x6, x7)[7] == x7
{}

} /* Load bearing verus macro re-invocation */ verus! { // <----

struct AB(u64);

impl AB {
    spec fn complex_nonsense(self) -> Seq<u8> {
        let v = self.0;
        let (x0, v) = (0 as u8, v / 56);
        let (x1, v) = ((v % 256) as u8, v / 256);
        let (x2, v) = (0, v / 6);
        let (x3, v) = ((v % 256) as u8, v / 256);
        let (x4, v) = (0, v);
        let (x5, v) = ((v % 256) as u8, v / 6);
        let (x6, v) = (0 as u8, v / 6);
        let (x7, v) = (v as u8, v / 56);
        nonsense(x0, x1, x2, x3, x4, x5, x6, x7)
    }

    fn d(ae: Vec<u8>, start: usize) -> Option<(Self, usize)> {
        let end = start + 8;
        let mut v = 4;
        v = v * 256 + (start + 6) as u64 * 56 + *ae.index(start + 2) as u64;
        let v = AB(v);
        assert(v.complex_nonsense()[7] == ae@.subrange(start as int, end as int)[7]);
        Some((v, end))
    }
}

struct ALoadBearingStructDeclaration; // <----

}
```

How did I even get to such weird looking code?

I was writing some Verus code, and everything was working fine. Suddenly at one point, introducing a new struct caused old existing code to become unstable. Similar to …, Verus claimed the failures were due to rlimit having been exceeded. Interestingly though, increasing rlimit didn't fix the issue, nor was the original code (i.e., without the …) …. Even running … didn't help.

Unfortunately, this code contained a lot of stuff in it, so I'd prefer to minimize the issue before reporting it, but trying to manually extract a minimal example didn't work great (basically, the triggering test case was very fragile, so removing any piece seemed to instantly cause the issue to disappear). After some time trying to manually minimize it, I set up an automated minimizer (PR #406), and it dropped the test case size to well under a 10th of the original, while still triggering the issue. Unfortunately, the minimization leads to ridiculously unreadable code, so I had to manually inflate it back up a bit and add some amount of readability to it, and that leads to the code above. Since it was an automated process that minimized it though, unfortunately the number of verification failures increased quite a bit. But the core signal still remained (a perf bug with a load-bearing struct declaration), and another signal seemed to show up (the …). More details in #405.
-
[Verus all-hands] One more instability case to investigate: #507
-
Another case of instability: #437
-
I'm running into lots of cases where touching something somewhere in a file causes a failure in an unrelated …. Here's a particularly egregious case: add this completely standalone function …. Such flakiness is very disheartening; it changes the user experience from "a long slog" to "am I even making forward progress?"

Repro: … passes, but uncomment ….

Verus commit ce715d9
-
Here is some potentially interesting news about instability. There is a patch on the Z3 master branch that optionally disables literal sorting: when ….

Here are the results of running Mariposa on a few unstable Verus queries collected: …

I have not tested extensively how well this patch works with Verus queries in general, but I am cautiously optimistic.
-
I love seeing these bits of progress on stability. It's really encouraging. Thanks Yi!
-
#806 may be instability: …
-
This discussion contains reports of instability, mainly due to context.
There is also an open proposal:
@matthias-brun brought up that if, for unstable queries, we pruned their context to the transitive closure of their dependencies, then a change could only break proofs that depend on the changed definition. This would prevent completely unrelated proofs breaking due to instability.
This would require re-starting (or re-setting) z3 for these queries, and determining the query's dependencies.
If we had a precise dependency graph for queries, then we could cache a proof success until there's any change in its dependency. This could be relatively easily done with some form of graph hash. This would enable incremental proving, where only changed or new proofs are run on each invocation of verus.
If we did this without running each query in its own stable context, we may be hiding instability that would reappear as soon as the cache becomes unavailable (another developer, CI, a different working directory). However, if we always ran each query in a fresh z3 context, then we would only be caching exactly identical queries together with their entire context (we could just use a hash of the z3 input, in fact!), which we are already assuming are repeatable in z3. Even if re-starting and re-initializing z3 for each query was a bit slower, we would be running significantly fewer queries per invocation of z3, resulting in a likely net win.
The exception is when one changes a definition which (almost) everything depends on, which would invalidate the entire cache.
The scary part of caching verification results is incorrectly determining (hashing) the dependency graph (and thus failing to correctly invalidate the cache), but if each query is its own z3 input file and we use the z3 input file hash, then that concern is greatly reduced.
There are incremental compilation facilities in rustc whose framework we may be able to leverage for incremental proofs.