
Optimize interaction with the SMT solver #482

Open
facundominguez opened this issue Aug 18, 2021 · 23 comments

@facundominguez
Collaborator

facundominguez commented Aug 18, 2021

AmericanFlag.hs takes nearly 20 seconds to verify, while z3 AmericanFlag.hs.smt2 takes only 2 seconds. It turns out most of the time is spent in Language.Fixpoint.Smt.Interface.command, which is called 280000 times! Each invocation takes nearly 50 microseconds, which I conjecture is going to be difficult to speed up.

Maybe SMT solvers can be linked in as libraries to cut down on that, or maybe it would be possible to batch a few commands and see whether just reducing the number of exchanges with the SMT solver reduces the overall time.

Any thoughts?

@facundominguez
Collaborator Author

facundominguez commented Aug 18, 2021

Found a couple of solver-agnostic APIs:
https://github.com/makaimann/smt-switch
https://github.com/sosy-lab/java-smt (calling the JVM from Haskell is a bit slow though).

@facundominguez
Collaborator Author

facundominguez commented Aug 19, 2021

There's also the chance that after studying the example we might find a way to verify it with fewer calls to the SMT solver.

I tried some batching that removed some hFlush calls on the pipe to the SMT solver. This produced a speed-up (down to 70% of the original time), but apparently we need a different solution if we want to get rid of all of the overhead.
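
For the record, here is a minimal sketch of the idea, assuming a plain handle-based pipe to the solver. The names are hypothetical (this is not the actual Language.Fixpoint.Smt.Interface API), and replies are assumed to fit on one line:

```haskell
import System.IO (Handle, hFlush, hGetLine, hPutStrLn)

type SmtCommand = String  -- raw SMT-LIB text

-- One round trip per command: write, flush, block on the reply.
-- The flush (and the context switch it forces) is paid once per command.
sendEach :: Handle -> Handle -> [SmtCommand] -> IO [String]
sendEach hIn hOut = mapM $ \c -> do
  hPutStrLn hIn c
  hFlush hIn
  hGetLine hOut

-- Batched variant: write everything, flush once, then collect all the
-- replies. Only valid when no command depends on an earlier reply.
sendBatch :: Handle -> Handle -> [SmtCommand] -> IO [String]
sendBatch hIn hOut cs = do
  mapM_ (hPutStrLn hIn) cs
  hFlush hIn
  mapM (const (hGetLine hOut)) cs
```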

@ranjitjhala
Member

Hi @facundominguez -- sorry for not replying, have been thinking this over. The data about AmericanFlag is very interesting -- I didn't realize we get a 50 µs hit per call -- if my math is right, that's 280000 * 50 / 1000000 = 14s (!) just spent in the interface? Does the 50 µs include the serialization time? Or is it only the invocation?

The batching sounds very promising! The catch in general is that later queries can depend on earlier results, but still, there could be significant gains as your numbers show -- does AmericanFlag drop to about 10s with your edits?

I have been wary of in-memory APIs because the .smt logs are so useful for debugging but this is a lot of slowdown...

@facundominguez
Collaborator Author

facundominguez commented Aug 20, 2021

> Does the 50 µs include the serialization time?

Yes, it does include serialization.

> Does AmericanFlag drop to about 10s with your edits?

AmericanFlag overall goes from 18 seconds to 14 seconds. The time spent in the command function goes down to 10 seconds.

> I have been wary of in-memory APIs because the .smt logs are so useful for debugging but this is a lot of slowdown...

Perhaps LF could log just the same with a modest amount of work.
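
A minimal sketch of how that could work: even with batching or an in-memory API, each command could still be rendered to SMT-LIB text and appended to the .smt log before it is executed. The Command type and render function below are made up for illustration:

```haskell
import System.IO (Handle, hPutStrLn)

data Command          -- abstract solver command
render :: Command -> String
render = undefined    -- stand-in for the real SMT-LIB pretty-printer

-- Wrap any executor (pipe-based or in-memory) so that every command is
-- still appended to the .smt log before running.
withSmtLog :: Handle -> (Command -> IO a) -> Command -> IO a
withSmtLog logH exec cmd = do
  hPutStrLn logH (render cmd)
  exec cmd
```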

The main drawbacks of batching are:

  1. It is limited to cases where intermediate results don't need to influence subsequent queries.
  2. It really makes the code harder to read by splitting each function into a query-generation part and a result-collection part (see the sketch at the end of this comment).
  3. Both the query-generation and result-collection parts need to be kept in sync.
  4. Not all of the overhead can be removed: we still need to serialize queries and deserialize results.

The prospect just doesn't excite me :)
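
To make drawback 2 concrete, here is a hypothetical sketch of the split; every type and helper below is made up for illustration:

```haskell
-- Placeholder types and stubs, just to show the shape of the code.
data Solver
data Expr
type SmtCommand  = String
type SmtResponse = String

roundTrip      :: Solver -> SmtCommand -> IO SmtResponse
roundTrip      = undefined
roundTripBatch :: Solver -> [SmtCommand] -> IO [SmtResponse]
roundTripBatch = undefined
mkQuery        :: Expr -> SmtCommand
mkQuery        = undefined
interpResult   :: SmtResponse -> Bool
interpResult   = undefined

-- Direct style: one function builds the query and interprets the
-- result, at the cost of one round trip per call.
checkValid :: Solver -> Expr -> IO Bool
checkValid s e = interpResult <$> roundTrip s (mkQuery e)

-- Batched style: the same logic split into a query-generation half and
-- a result-collection half that must be kept in sync at every call site.
checkValidBatch :: Solver -> [Expr] -> IO [Bool]
checkValidBatch s es = map interpResult <$> roundTripBatch s (map mkQuery es)
```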

@facundominguez
Collaborator Author

Here's a flamegraph. While some overhead can be attributed to the synchronization between processes, it also shows that there are quite a few other areas to improve. It was created with SCCs for all top-level functions in liquid-fixpoint.
americanflag.svg.zip

@facundominguez
Collaborator Author

Here's the flamegraph after the improvements of #492.

@facundominguez
Collaborator Author

With #493, the time of AmericanFlag is down to about 9 seconds.

Of that, liquid-fixpoint spends 7 seconds, and from here I'm finding it difficult to bring it down further.

  • The SMT solver spends about 2.5 seconds.
  • The SMT interactions take about 1 second.
  • HashMap lookups take about 1.5 seconds.
  • Environment reduction takes 1 second, and it still improves performance overall.
  • Elaboration, unification, and other expression manipulations seem to take the rest.

I have added batching in #493. If we switched to using the FFI to talk to the SMT solver, we might save a second, two seconds, or nothing. It depends on how much overhead deserialization takes on the SMT side, and how costly it is to build the expressions in the FFI AST.
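
For comparison, an in-process interaction through the Haskell bindings to Z3 would look roughly like this. This is a toy sketch assuming the z3 package's Z3.Monad interface; no text is serialized, but every term has to be built through the C API:

```haskell
import Z3.Monad

-- Toy validity check done in-process: to show that x > 0 implies
-- x + 1 > 0, assert the negation of the implication and expect Unsat.
example :: IO Result
example = evalZ3 $ do
  x    <- mkFreshIntVar "x"
  zero <- mkInteger 0
  one  <- mkInteger 1
  xp1  <- mkAdd [x, one]
  assert =<< mkGt x zero    -- hypothesis: x > 0
  assert =<< mkLe xp1 zero  -- negated goal: x + 1 <= 0
  check                     -- Unsat means the implication is valid
```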

I can't figure out what else to trim down, though. It is looking like we would need to reduce the number of lookups or manipulations that we do in the common paths, e.g. who needs sort checking or elaboration? :) But I don't think removing those is possible.

@ranjitjhala
Member

ranjitjhala commented Sep 21, 2021 via email

@facundominguez
Collaborator Author

facundominguez commented Sep 21, 2021

#480 brought AmericanFlag from 28 to 14 seconds, and #473 from there to 13 seconds. I think #492 and #493 are bringing it down to 9 seconds.

@facundominguez
Collaborator Author

facundominguez commented Sep 25, 2021

I dived some more into the FFI path, and it's looking laborious. smt-switch apparently doesn't support parametric datatypes, and there is no good documentation on defining them with the z3 bindings either. Add to that the modest expected improvement and the hassle of maintaining the FFI interface, and it doesn't look so attractive anymore.

There is a new hope though. I found that removing logical qualifiers from AmericanFlag.hs.fq speeds up LF from the current 7 seconds to 2 seconds. Maybe there is some room to either identify irrelevant qualifiers (a brute-force sketch below), or serve the qualifiers in an order that optimizes the time in the successful case.
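
One brute-force way to identify irrelevant qualifiers would be a greedy minimization pass. A sketch, where checkSafe is a hypothetical wrapper that runs fixpoint and reports whether the verification result is unchanged:

```haskell
import Language.Fixpoint.Types (Qualifier)  -- assuming this import

-- Try dropping each qualifier in turn, keeping the drop whenever the
-- verification result is unchanged. This costs one fixpoint run per
-- qualifier, so it is only attractive as an offline analysis.
minimizeQuals :: ([Qualifier] -> IO Bool) -> [Qualifier] -> IO [Qualifier]
minimizeQuals checkSafe = go []
  where
    go kept []       = pure (reverse kept)
    go kept (q : qs) = do
      stillSafe <- checkSafe (reverse kept ++ qs)  -- run without q
      if stillSafe
        then go kept qs        -- q was irrelevant: drop it
        else go (q : kept) qs  -- q is needed: keep it
```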

One odd fact in this space is that, out of 250 qualifiers, removing only two accounts for most of the time reduction. These are:

```
qualif Auto(VV##0 : int, x : e##xo): ((VV##0 = (lexSize x))) // SourcePos {sourceName = "benchmarks/vector-algorithms-0.5.4.2/Data/Vector/Algorithms/AmericanFlag.hs", sourceLine = Pos 62, sourceColumn = Pos 3}
qualif Auto(VV##0 : int, x : e##xo): ((VV##0 < (lexSize x))) // SourcePos {sourceName = "benchmarks/vector-algorithms-0.5.4.2/Data/Vector/Algorithms/AmericanFlag.hs", sourceLine = Pos 65, sourceColumn = Pos 3}
```

Any ideas of what is special about these?

@ranjitjhala
Member

ranjitjhala commented Sep 25, 2021 via email

@ranjitjhala
Member

ranjitjhala commented Sep 25, 2021 via email

@facundominguez
Collaborator Author

facundominguez commented Sep 25, 2021

The result is unchanged when eliminating those qualifiers.

These are the stats with no modifications:

```
# Sub Constraints     :   243
# WF  Constraints     :   102
# Constraints         :   208
# Refine Iterations   :   746
# SMT Brackets        :   655
# SMT Queries (Valid) : 53451
# SMT Queries (Total) : 70183
# Sliced Constraints  :   208
# Target Constraints  :    91
```

These are the stats with the two qualifiers removed:

```
# Sub Constraints     :   243
# WF  Constraints     :   102
# Constraints         :   208
# Refine Iterations   :   746
# SMT Brackets        :   655
# SMT Queries (Valid) : 30893
# SMT Queries (Total) : 41639
# Sliced Constraints  :   208
# Target Constraints  :    91
```

And these are the stats with as many qualifiers as I could remove without getting unsafe constraints (nearly 100 qualifiers survived):

```
# Sub Constraints     :   243
# WF  Constraints     :   102
# Constraints         :   208
# Refine Iterations   :   746
# SMT Brackets        :   582
# SMT Queries (Valid) : 20162
# SMT Queries (Total) : 23806
# Sliced Constraints  :   208
# Target Constraints  :    91
```

The runtime decreases linearly with the number of queries, at about 1 second per 10000 queries (e.g. the 70183 queries of the unmodified run account for about 7 seconds).

@ranjitjhala
Member

Wow, so those two irrelevant qualifiers are responsible for nearly 25K SMT queries!

@facundominguez
Collaborator Author

facundominguez commented Sep 26, 2021

Will the second parameter of the qualifier, x : e##xo, be substituted with values of any sort? This could perhaps explain a long list of qualifier instantiations created by replacing x with various local values that then need to be discarded by calls to the SMT solver.

@ranjitjhala
Member

ranjitjhala commented Sep 26, 2021 via email

@facundominguez
Collaborator Author

They originate from the specification of typeclass methods.

```haskell
class Lexicographic e where
  terminate :: e -> Int -> Bool
  size      :: e -> Int
  index     :: Int -> e -> Int

{-@ measure lexSize :: a -> Int                                                  @-}
{-@ assume size  :: (Lexicographic e) => x:e -> {v:Nat | v = (lexSize x)}        @-}
{-@ assume index :: (Lexicographic e) => Int -> x:e -> {v:Nat | v < (lexSize x)} @-}
```

https://github.com/ucsd-progsys/liquidhaskell/blob/cecfe71415014456391e8c2d5d7c4f912ebd4458/benchmarks/vector-algorithms-0.5.4.2/Data/Vector/Algorithms/AmericanFlag.hs#L70-L72

@ranjitjhala
Member

ranjitjhala commented Sep 26, 2021 via email

@ranjitjhala
Member

ranjitjhala commented Sep 26, 2021 via email

@ranjitjhala
Member

Here's an idea that may address this problem without requiring any support for typeclasses/constraints in fixpoint:

restrict measures (e.g. lexSize) ONLY to the set of monomorphic sorts that they already appear at in the rest of the constraints.

Right now the sort for lexSize is forall a. a -> int, which is causing this blowup.

Instead, walk over the constraints -- after elaboration -- to find all existing terms
of the form lexSize [@t] e and only allow lexSize to be subsequently instantiated at
the sorts t.

This could greatly shrink the sorts at which valid instantiations are allowed, and shouldn't affect the constraint solving...
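
A rough sketch of that restriction, using liquid-fixpoint's Expr, Sort, and Symbol types together with a made-up traversal appsOf that returns each measure application with the elaborated sort of its argument:

```haskell
import Data.List (nub)
import Language.Fixpoint.Types (Expr, Sort, Symbol)  -- assuming these imports

-- Made-up traversal: every (measure, argument sort) pair occurring in
-- an elaborated expression.
appsOf :: Expr -> [(Symbol, Sort)]
appsOf = undefined

-- The monomorphic sorts at which a measure (e.g. lexSize) already
-- appears anywhere in the constraints.
measureSorts :: Symbol -> [Expr] -> [Sort]
measureSorts m es = nub [ t | e <- es, (f, t) <- appsOf e, f == m ]

-- Only allow subsequent instantiations of the measure at those sorts.
allowInst :: Symbol -> [Expr] -> Sort -> Bool
allowInst m es t = t `elem` measureSorts m es
```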

@facundominguez
Collaborator Author

> Oddly though those two qualifiers seem to come from somewhere else: as they respectively check if lexSize equals or is greater than 0?

Aren't the comparisons made with VV##0 rather than 0? I'm failing to notice why they should come from elsewhere.

> Here's an idea that may address this problem without requiring any support for typeclasses/constraints in fixpoint

Worth considering. I'll try removing only the qualifiers that have unconstrained parameters and see how that goes.

@ranjitjhala
Member

> Aren't the comparisons made with VV##0 rather than 0? I'm failing to notice why they should come from elsewhere.

Oops, my apologies, you are correct!

@facundominguez
Collaborator Author

facundominguez commented Sep 28, 2021

Removing only the qualifiers with unrestricted parameters gives:

```
# Sub Constraints     :   243
# WF  Constraints     :   102
# Constraints         :   208
# Refine Iterations   :   746
# SMT Brackets        :   655
# SMT Queries (Valid) : 26631
# SMT Queries (Total) : 31949
# Sliced Constraints  :   208
# Target Constraints  :    91
```

Which isn't too bad.

My great discovery today was --minimizeqs, which yields the following 10 qualifiers:

```
qualif Auto(v : bool, n : int): ((((n + 1) >= maxPassesN) => v)) // SourcePos {sourceName = "AmericanFlag.hs.fq", sourceLine = Pos 1066, sourceColumn = Pos 8}
qualif Cmp(v : @(0), x : @(0)): ((v > x)) // SourcePos {sourceName = "AmericanFlag.hs.fq", sourceLine = Pos 1106, sourceColumn = Pos 8}
qualif Cmp(v : @(0), x : @(0)): ((v >= x)) // SourcePos {sourceName = "AmericanFlag.hs.fq", sourceLine = Pos 1108, sourceColumn = Pos 8}
qualif CmpZ(v : @(0)): ((v > 0)) // SourcePos {sourceName = "AmericanFlag.hs.fq", sourceLine = Pos 1128, sourceColumn = Pos 8}
qualif CmpZ(v : @(0)): ((v >= 0)) // SourcePos {sourceName = "AmericanFlag.hs.fq", sourceLine = Pos 1130, sourceColumn = Pos 8}
qualif EqSiz(x : @(0), y : @(1)): (((vsize x) = (vsize y))) // SourcePos {sourceName = "AmericanFlag.hs.fq", sourceLine = Pos 1142, sourceColumn = Pos 8}
qualif NonEmpty(v : @(0)): ((0 < (vsize v))) // SourcePos {sourceName = "AmericanFlag.hs.fq", sourceLine = Pos 1168, sourceColumn = Pos 8}
qualif OkIdx(v : int, x : @(0)): ((v < (vsize x))) // SourcePos {sourceName = "AmericanFlag.hs.fq", sourceLine = Pos 1170, sourceColumn = Pos 8}
qualif OkIdx(v : int, x : @(0)): ((v <= (vsize x))) // SourcePos {sourceName = "AmericanFlag.hs.fq", sourceLine = Pos 1172, sourceColumn = Pos 8}
qualif Plus(v : int, x : int, y : int): (((v + x) = y)) // SourcePos {sourceName = "AmericanFlag.hs.fq", sourceLine = Pos 1179, sourceColumn = Pos 8}
```

With these, verification takes only 1 second in LF! (Not counting the parsing overhead, which I've been ignoring all along.)

This sends me into thinking whether there could be a practical way to cache these 10 qualifiers and use them as a starting point whenever the user modifies the code (a sketch follows the stats below).

```
# Sub Constraints     :  243
# WF  Constraints     :  102
# Constraints         :  208
# Refine Iterations   :  612
# SMT Brackets        :  405
# SMT Queries (Valid) : 6371
# SMT Queries (Total) : 7939
# Sliced Constraints  :  208
# Target Constraints  :   91
```
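
A sketch of that caching idea, with made-up helpers readQualCache, writeQualCache, runFixpoint, and minimizeQuals standing in for real persistence and solver runs. An Unsafe answer on the cached set only means the cache is stale, so we fall back to the full set and refresh:

```haskell
import Control.Monad (when)
import Language.Fixpoint.Types (Qualifier)  -- assuming this import

-- Made-up helpers standing in for real persistence and solver runs.
readQualCache  :: FilePath -> IO (Maybe [Qualifier])
readQualCache  = undefined
writeQualCache :: FilePath -> [Qualifier] -> IO ()
writeQualCache = undefined
runFixpoint    :: [Qualifier] -> IO Bool        -- True when the result is Safe
runFixpoint    = undefined
minimizeQuals  :: [Qualifier] -> IO [Qualifier] -- e.g. via --minimizeqs
minimizeQuals  = undefined

-- Try the cached minimal qualifiers first; fall back to the full set
-- when they no longer suffice, and re-minimize to refresh the cache.
solveWithCache :: FilePath -> [Qualifier] -> IO Bool
solveWithCache cache full = do
  cached <- readQualCache cache
  fast   <- maybe (pure False) runFixpoint cached
  if fast
    then pure True
    else do
      safe <- runFixpoint full
      when safe (minimizeQuals full >>= writeQualCache cache)
      pure safe
```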
