forked from chriseth/solidity_isola
-
Notifications
You must be signed in to change notification settings - Fork 0
/
smt_fv.tex
506 lines (456 loc) · 18.2 KB
/
smt_fv.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
\section{SMT-based Solidity Verification}
\label{section:smt}
SMT solvers are powerful tools to prove satisfiability of formulas in different
logics which often have the necessary expressiveness to model software in a
straightforward manner~\cite{Komuravelli13,Alt17,Donaldson11,Beyer11}.
We translate Solidity contracts and their functions into SMT formulas using a
combination of different quantifier-free theories.
%
We shall name the translated formulas the \emph{SMT encoding} of the Solidity
program.
%
The goal of the translation from Solidity to SMT formulas is to verify safety
properties from the Solidity program by performing queries to the SMT solver.
\subsection{SMT Encoding}
The SMT encoding is computed during a depth-first traversal of the
abstract syntax tree (AST) of the
Solidity program and thus roughly follows the execution order.
%
For now, each function is analyzed in isolation and thus the
context regarding the SMT solver (contract storage, local variables, etc.)
is cleared before each function of a contract is visited.
%
There are five types of formulas that are encoded from Solidity inside each function.
%
Three of them, \emph{Control-flow}, \emph{Type constraint} and \emph{Variable assignment}
are simply translated as SMT constraints.
%
The \emph{Branch conditions} are the conditions of the current branch of
execution and thus grow and shrink as we traverse the AST.
%
The last, \emph{Verification Target}, creates a formula consisting of the
verification goal conjoined with the previously mentioned constraints,
including the current branch conditions, and
queries the SMT solver for satisfiability.
%
The different types of encoding are described below.
\begin{paragraph}{Branch conditions.}
For an if-statement \code{if ($c$) $T$ else $F$}, we add
$c$ to the branch conditions during the visit of $T$.
After that, we replace $c$ by $\neg c$
for the visit of $F$ and also remove that when we are
finished with the if-statement.
\end{paragraph}
\begin{paragraph}{Control-flow.}
These constraints model conditional termination of execution.
A \code{require($r$)} statement (and similar for \code{assert($r$)})
terminates execution if $r$ evaluates to $\mathrm{false}$, but
of course only if it is executed. Thus,
we add a constraint $b \to r$, where $b$ is the conjunction of
the current branch conditions. Note that due to the implication,
we can keep this constraint even when we leave the current branch.
\end{paragraph}
\begin{paragraph}{Type constraint.}
A variable declaration leads to a correspondent SMT variable that is assigned
the default value of the declared type.
%
For example, Boolean variables are assigned $\mathrm{false}$, and integer variables are
assigned 0.
%
Function parameters are initialized with a range of valid values for the given
type, since their value is unknown. For instance, a parameter \code{uint32 x}
is initialized as $0 \le x < 2^{32}$ (32 bits), a parameter \code{int256 y}
is initialized as $-2^{255} \le y < 2^{255}$, and a parameter \code{address
a} is assigned the range $0 \le a < 2^{(8*20)}$ (20 bytes).
%
The encoder currently supports Boolean and the various sizes of Integer
variables.
\end{paragraph}
\begin{paragraph}{Variable assignment.}
The encoding of a variable assignment follows the \emph{Single Static Assignment}
(SSA) where each assignment to a program variable introduces a new SMT variable
that is assigned to only once.
%
When a program variable is modified inside different branches of execution,
a new variable is created after the branch to re-combine the different
values after the branches.
We use the if-then-else-function \code{ite} to assign the value
\code{ite(c, $x_1$, $x_2$)} (if-then-else), where $c$ is the branch
condition and $x_1$ and $x_2$ are the two SSA variables corresponding to $x$
at the ends of the branches (cf.\ the $\phi$ function in SSA).
\end{paragraph}
\begin{paragraph}{Verification target.}
Every arithmetic operation is checked against underflow and overflow according
to the type of the values, and an example is given if there is an underflow or
overflow.
%
We also check whether branch conditions are constant, warning the user about
unreachable blocks or trivial conditions.
%
The conditions in calls to \code{assert} represent target postconditions that the
Solidity programmer wants to ensure at runtime and are verified statically.
%
If it is possible to disprove the assertion provided that the control flow
can reach it (i.e.\ the current branch conditions are satisfiable),
the user is given a counterexample.
%
In contrast, \code{require} conditions are meant to be used as filters for
unwanted input values when they are unknown, for example, in public functions,
acting like preconditions for the rest of the scope.
%
Therefore, failing calls to \code{require} are not treated as errors and are
just checked for triviality and reachability.
\end{paragraph}
Figure~\ref{figure:solidity_encoding_1} shows on the left a Solidity sample
that requires all five types of encoding, shown on the right, in order to
verify the intended properties.
%
Since the variables \code{uint256 a} and \code{uint256 b} are function
parameters, they are initialized (lines 1 and 2) with the valid range of values
for their type (\code{uint256}).
%
If \code{a = 0}, the \code{require} condition about \code{b} is used as a
precondition when verifying the assertion in the end of the function (line 3).
%
The next two assignments to \code{b} create the new SSA variables
$b_1$ and $b_2$ (line 4).
%
Variable $b_3$ encodes the second and third conditions, and $b_4$
encodes the first condition (lines 5 and 6).
%
Finally, $b_4$ is used in the assertion check (line 7).
%
Note that the nested control-flow is implicitly encoded in the $ite$
variables $b_3$ and $b_4$.
%
We can see that the target assertion is safe within its function.
\begin{figure}
\label{figure:solidity_encoding_1}
\noindent\begin{minipage}{.48\textwidth}
\begin{verbatim}
contract C
{
function f(uint256 a, uint256 b)
{
if (a == 0)
require(b <= 100);
else if (a == 1)
b = 1000;
else
b = 10000;
assert(b <= 100000);
}
}
\end{verbatim}
\end{minipage}\hfill
\begin{minipage}{.48\textwidth}
1. $a_0 \ge 0 \land a_0 < 2^{256} \land \phantom{x}$\\
2. $b_0 \ge 0 \land b_0 < 2^{256} \land \phantom{x}$\\
3. $(a_0 = 0) \rightarrow (b_0 \le 100) \, \land$\\
4. $b_1 = 1000 \land b_2 = 10000$\\
5. $b_3 = ite(a == 1, b_1, b_2) \land \phantom{x}$\\
6. $b_4 = ite(a == 0, b_0, b_3) \land \phantom{x}$\\
7. $\neg b_4 \le 100000$
\end{minipage}
\caption{SMT encoding of an assertion check.}
\end{figure}
As described above, the component performs several local checks during a single
run, therefore it is critical that the used SMT solver supports
incremental checking.
%
Moreover, we do not abstract difficult operations such as multiplication
between variables, and rather try to give precise answers when possible.
%
Therefore we combine various quantifier-free theories, such as Linear
Arithmetics, Uninterpreted Functions and Nonlinear Arithmetics.
%
Solidity has integrated Z3~\cite{Z3} and CVC4~\cite{CVC4} via their C++ APIs.
%
The two SMT solvers are used together to increase solving power.
%
This has been important especially for the programs that require Nonlinear
reasoning, since often one solver is able to prove a property that the other
cannot.
%
The component is also able to generate \code{smtlib2}~\cite{SMTLIB}
formulas in order to interface with additional solvers.
\subsection{Specific Examples}
Even though the current implementation of the SMT module supports a small
subset of Solidity, it can already be used to detect flaws that might be
overlooked by the user.
%
We present now a few examples of buggy code that the compiler is able to
detect regarding constant conditions, overflow, and assertion checking.
The following loop is infinite because the author of the code
forgot to increment the loop variablee \code{i}.
%
In that case, the user receives a message about the loop's condition being
always true for the case where \code{owners.length} is not zero.
\begin{verbatim}
for (uint i = 0; i < owners.length;)
{
// ...
}
\end{verbatim}
Another type of problem that the compiler finds automatically is unreachable
code.
%
In the following control-flow expressions, it warns the user that the condition
in the \code{else if} is unreachable.
\begin{verbatim}
if (a >= 7) { ... }
else if (a >= 10) { ... }
\end{verbatim}
Arithmetic operations should be checked against overflow, especially when
parameters of public functions are used.
%
The code below may easily lead to an overflow, which the tool reports with a
counterexample.
%
The overflow can be prevented with a \code{require} statement.
\begin{verbatim}
function addFunds(uint256 _amount) {
// require((_amount + funds) >= funds);
funds += _amount;
}
\end{verbatim}
One of the most important features is the ability to check safety properties
statically, by using Solidity's \code{assert}.
%
The following example code uses an \code{assert} to check the equivalence of
two computations, once written using control-flow statements, once as a direct
Boolean formula.
\begin{verbatim}
function f(bool a, bool b) public pure {
bool c;
if (a) {
if (b) c = false;
else c = true;
}
else {
if (b) c = true;
else c = false;
}
assert(c != ((a && !b) || (!a && b)));
}
\end{verbatim}
Note that the assertion will be reported to fail with the valuation
\code{a = false, b = false, c = false}.
%
The safe condition would be
\code{assert(c == ((a \&\& !b) || (!a \&\& b)));}.
\subsection{Future Plans}
We introduce now the features that we intend to implement in the SMT module, as
well as discuss arising research problems where we present simple examples that
highlight how the new features will work.
Our current implementation plans for the component involve supporting
a larger subset of the language, including more complex data structures
such as \code{mapping}.
%
This is especially important for cases such as token contracts, where
properties such as funds leakage and wrong balance could be used as targets.
%
The component is meant to be built as a Bounded Model Checker, unrolling loops
up to a constant bound and automatically detecting bounds when possible.
%
We also intend to introduce a loop pre and postconditions syntax to help the
unbounded case.
\begin{paragraph}{Range restriction of real life values.}
Some Solidity environment variables have a 256 bit unsigned integer type,
although the range of their values is much more restricted in practice.
For instance, the UNIX timestamp of the current block in seconds,
\texttt{block.timestamp} will not exceed 64 bits for the next
500 billion years. To reduce the false positives rate for
overflows, it makes sense to restrict the value range for these
variables in the SMT encoding. It is an open question how to do
this properly, since a straightforward hard cap at some point
could create undesired artefacts around that point.
%
Another environment variable that could have a similar
behavior is \texttt{block.number}.
\end{paragraph}
\begin{paragraph}{Revert after Error.}
Errors are irrelevant if they result in a state change reversion
(Sec.~\ref{section:smart_contracts}). The user should be warned
about failing checks such as overflow only if they do not result in a state reversion.
%
One popular example is the SafeMath~\cite{SafeMath} contract which
is commonly used to turn wrapping arithmetics into overflow-checked arithmetics:
\begin{verbatim}
function add(uint256 a, uint256 b) internal pure
returns (uint256) {
uint256 c = a + b;
require(c >= a);
return c;
}
\end{verbatim}
Although the tool detects an overflow in the computation of \code{a + b},
the overflow will result in a truncation of \code{c} in two's complement and thus
any execution that contains the overflow will revert at the \code{require}.
%
In this case the user should not be warned of the error, since no erroneous cases
exist in accepted executions.
\end{paragraph}
\begin{paragraph}{Aliasing.}
In many languages, complex data structures are only assigned by
reference, creating two names for the same object and thus changes
performed via one name also affect references via the second name.
This is of course a big challenge for formal verification and
is known as the \emph{aliasing problem}.
This is also the case for some aspects of Solidity, but data stored
in storage does not have this problem: The structure of
storage is determined at compile-time, and all objects are
statically allocated; while arrays can grow, their position in
storage is fixed at compile-time. Because of that, the aliasing problem
is not an issue, as long as we can assume that there are no hash
collisions in keccak256 and dynamic arrays are small enough.
\end{paragraph}
\begin{figure}[h]
\begin{verbatim}
contract C
{
uint a;
constructor () public {}
function a1() public { a = 1; }
function a2() public { a = 2; }
function a3() public { a = 3; }
function a4() public { a = 4; }
function plusA(uint x) public view returns (uint) {
require(x < 1000);
return a + x;
}
}
\end{verbatim}
\caption{Contract with a storage variable invariant.}
\label{fig:invariant_a}
\end{figure}
\begin{paragraph}{Multi-transaction invariants.}
One of the most interesting aspects we intend to research and support is
multi-transaction invariants.
%
The ultimate goal is to compute invariants for state variables (resident in the
contract's storage) considering any arbitrary number of calls to the contract.
%
This would enable these invariants to be used as preconditions whenever they
are accessed.
%
Fig.~\ref{fig:invariant_a} presents an example contract with a state variable
\code{a} which can be assigned differently depending on which public function
is called.
%
We can see that if we consider all possible paths, \code{a} is never greater
than 4, so the invariant $a \le 4$ holds.
%
Currently, without the discovery of the invariant, the SMT module reports an
overflow case in the \code{return} statement of function \code{plusA}.
%
If the invariant is used as a pre-condition of the function, by adding
\code{require(a <= 4)}, for example, no overflow is reported.
%
The SMT component should in the future be able to automatically infer these
invariants.
\end{paragraph}
\begin{paragraph}{Post-constructor invariants.}
A special and restricted case of multi-transaction invariants usage are
contracts where a state variable is assigned in the constructor and never
modified again.
%
A common example is contract \code{Token} from Sec.~\ref{section:solidity}.
%
We can see from the constructor that the \code{totalSupply} of tokens is 10000,
which is also the initial amount of tokens given to the deployer of the
contract.
%
The only way to move tokens is via the function \code{transfer}, which
decreases a certain amount of tokens from one account, if it owns enough, and
increases the same amount in another account.
%
We can modify function \code{transfer} to use the invariant about state
variable \code{totalSupply}:
\begin{verbatim}
function transfer(address _to, uint256 _value) public {
require(balances[msg.sender] >= _value);
uint256 sumBefore = balances[msg.sender] + balances[_to];
totalSupply -= sumBefore;
balances[msg.sender] -= _value;
balances[_to] += _value;
uint256 sumAfter = balances[msg.sender] + balances[_to];
totalSupply += sumAfter;
assert(sumBefore == sumAfter);
assert(totalSupply == 10000);
}
\end{verbatim}
As we can see, the number of total tokens never changes and the invariant
$\mathrm{totalSupply} = 10000$ holds in the beginning of any function of the
contract.
%
Similarly to the previous example, it is not possible to prove the last
assertion without the knowledge about the invariant.
\end{paragraph}
\begin{paragraph}{Modifiers as pre and postconditions.}
An orthogonal approach to automatically inferred invariants is to provide a
good syntax so that Solidity programmers can explicitly state pre and
postconditions of functions.
%
Modifiers (Sec.\ref{section:solidity}) are a natural candidate for that, given
their ability to behave as patterns that wrap functions.
%
In the following code, the modifier \code{safeBalance} states pre and
postconditions for the \code{transfer} function in the \code{Token} contract
(Sec.~\ref{section:solidity}), ensuring that the concrete value of
\code{totalSupply} does not change after a token transfer.
\begin{verbatim}
modifier safeBalance {
require(totalSupply == 10000);
_;
assert(totalSupply == 10000);
}
function transfer(address _to, uint256 _value) safeBalance {
...
}
\end{verbatim}
\end{paragraph}
\begin{paragraph}{Function abstraction.}
If modifiers are used as pre and postconditions as described above, it could be
possible to abstract functions based on these modifiers.
%
Let \code{zeroAccount} be a function from contract \code{Token} that transfers
all the tokens that an account holds to another one of their choice.
%
Function \code{zeroAccount} should also be sure that the \code{totalSupply} did
not change.
\begin{verbatim}
function zeroAccount(address _to) {
transfer(_to, balance[msg.sender]);
assert(totalSupply == 10000);
}
\end{verbatim}
One approach to analyze \code{zeroAccount} is to abstract function
\code{transfer} by encoding only its modifiers and ignoring its body when
trying to prove the assertion.
%
This query is much cheaper for the SMT solver, and in many cases (as it is in
this one) it might be enough to prove the assertion.
\end{paragraph}
\begin{paragraph}{Effective Callback Freeness.}
The idea of \emph{Effective Callback Freeness} was recently introduced by
\cite{Grossman}.
%
A smart contract $C$ is effectively callback free, if any state change caused
by a callback in $C$ can also be caused by an execution that does not have this
callback.
%
Straightforward examples include a contract that uses a mutex mechanism to
disallow state changes if the function is called as a callback, and the general
pattern where all functions perform state changes before they call other
contracts.
%
The authors show that most of the contracts deployed on Ethereum have this
property.
%
This is a powerful property, since it means that any invariant computed for a
contract's state variables still holds even after calling external contracts
with unknown behavior.
%
We intend to study how to integrate this approach to our static analysis.
\end{paragraph}