[cfe-dev] [GSoC 2018] Using the Z3 SMT Solver to Validate Bugs Reported by the Clang Static Analyzer

Tue Jun 26 16:49:27 PDT 2018

Hi,

Indeed, just removing the taint check in evalBinOp fixes the constraints
being dropped and a number of test cases now print the expected output.
There are some crashes as well, as mentioned in D28953.

Em qua, 27 de jun de 2018 às 00:30, Dominic Chen <d.c.ddcc at gmail.com>
escreveu:

> The reason symbols are getting dropped, is because, currently, CSA doesn’t
> generate symbols for things that RangedConstraintManager doesn’t support,
> unless taint tracking is enabled. The querying that I mentioned is just a
> performance optimization, to check the capabilities of the
> ConstraintManager at runtime and avoid generating unsupported symbols.
>
> If, as Artem suggests, the new policy is to generate accurate symbols,
> even if the underlying ConstraintManager doesn’t support reasoning over
> them, then the solution is to identify all the places where unknown values
> (over-approximations) are being generated, and replace them with the
> correct SVal.
>
> In your patch below, removing the first part is correct. In fact, I have
> the same change in D35450. But, the second isn’t, because at that point you
> already have an unknown value, and must propagate it to maintain
> correctness of the analysis.
>
> The reason I mention D28955 and D35450, is because I’ve already started on
> some of these changes, so you don’t have to do everything. I’ll try to
> rebase and update them this weekend.
>
> Dominic
>
> On Jun 26, 2018, at 2:37 PM, Mikhail Ramalho <mikhail.ramalho at gmail.com>
> wrote:
>
> Hi,
>
> Another solution, since you're using Z3 already, is to implement runtime
>> support for querying the underlying ConstraintManager's about the types
>> of constraints that it supports (e.g. canReasonAboutFoo() ). Then, you
>> can use this to generate the appropriate SVal's at runtime, which could
>> include support for symbolic extension/truncation, remainder, shifts, etc.
>>
>>
> I see, but for the project I proposed to check why the constraints were
> being dropped, regardless of having the SMT solver enabled or not.
>
> ~
>
> Anyway, I removed 4 lines of code and it seems to be working:
>
> diff --git a/lib/StaticAnalyzer/Core/SValBuilder.cpp
> b/lib/StaticAnalyzer/Core/SValBuilder.cpp
> index 137e9a7..39051a8 100644
> --- a/lib/StaticAnalyzer/Core/SValBuilder.cpp
> +++ b/lib/StaticAnalyzer/Core/SValBuilder.cpp
> @@ -377,9 +377,6 @@ SVal SValBuilder::makeSymExprValNN(ProgramStateRef
> State,
>                                     BinaryOperator::Opcode Op,
>                                     NonLoc LHS, NonLoc RHS,
>                                     QualType ResultTy) {
> -  if (!State->isTainted(RHS) && !State->isTainted(LHS))
> -    return UnknownVal();
> -
>    const SymExpr *symLHS = LHS.getAsSymExpr();
>    const SymExpr *symRHS = RHS.getAsSymExpr();
>    // TODO: When the Max Complexity is reached, we should conjure a symbol
> @@ -406,9 +403,6 @@ SVal SValBuilder::evalBinOp(ProgramStateRef state,
> BinaryOperator::Opcode op,
>    if (lhs.isUndef() || rhs.isUndef())
>      return UndefinedVal();
>
> -  if (lhs.isUnknown() || rhs.isUnknown())
> -    return UnknownVal();
> -
>    if (lhs.getAs<nonloc::LazyCompoundVal>() ||
>        rhs.getAs<nonloc::LazyCompoundVal>()) {
>      return UnknownVal();
>
> The static analyzer now generates the following constraints for each path:
>
> 0U % (reg_$0<unsigned int width>): { [1, 4294967295] }
>
> and
>
> 0U % (reg_$0<unsigned int width>): { [0, 0] }
>
> And Z3 is able to remove the false positive.
>
> Now I need to understand the impact of the changes, specially the taint
> check removal.
>
> Thank you,
>
>
>>
>> Dominic
>>
>> On 6/26/2018 1:23 PM, Mikhail Ramalho via cfe-dev wrote:
>> > Hi all,
>> >
>> > Hi guys,
>> >
>> > I'm investigating the constraints being dropped, here's what I got so
>> far.
>> >
>> > First of all, I'm using the following program:
>> >
>> > void foo(unsigned width)
>> > {
>> >   int base;
>> >   int i = 0;
>> >
>> >   if (!(i % width))
>> >     base = 1;
>> >
>> >   if(base == 1)
>> >     abort();
>> > }
>> >
>> > I started by looking at ExprEngine::processBranch, where
>> >
>> > SVal X = PrevState->getSVal(Condition, PredI->getLocationContext());
>> >
>> > returns Unknown for the remainder condition. The getSVal ends up
>> > looking in a map for the value result, so I found that bindExpr fills
>> > that map.
>> >
>> > Going back a bit, into ExprEngine::Visit, when evaluating a
>> > Stmt::BinaryOperatorClass (regardless of the eagerly assume flag)
>> > ExprEngine::VisitBinaryOperator is called, which eventually calls
>> > evalBinOp and, since it doesn't understand remainder, it returns
>> > unknown and BindExpr is never called.
>> >
>> > Back to ExprEngine::processBranch, when the symbol is unknown the
>> > following piece of code is called:
>> >
>> >     // If the condition is still unknown, give up.
>> >     if (X.isUnknownOrUndef()) {
>> >       builder.generateNode(PrevState, true, PredI);
>> >       builder.generateNode(PrevState, false, PredI);
>> >       continue;
>> >     }
>> >
>> > and the condition is simply ignored. When the result is defined, it
>> > creates two nodes assuming the constraints.
>> >
>> > ~
>> >
>> > My idea is when the SVal is undef or unknown, instead of generating
>> > two nodes with no  knowledge about the constraints, we could create
>> > two ranged constraints, like:
>> >
>> > i % width: [0,0]
>> >
>> > and
>> >
>> > i % width: [1,1]
>> >
>> > for each path. That way we can keep the constraints with reasonable
>> > values.
>> >
>> > What do you think? It feels like this will break stuff further down
>> > the line, but I'll know for sure if I implement the change.
>> >
>> > ~
>> >
>> > Artem's response:
>> >
>> > Yep, i strongly believe that any UnknownVal should be treated as a
>> > synonym of "not implemented".
>> >
>> > In *this* example you might also notice that there's no symbol for 'i',
>> > but it's a concrete integer 0 instead. So you can evaluate the whole
>> > remainder to 0, unless 'width' is also equal to 0 (in which case the
>> > answer would be UndefinedVal). Also note that when enabled, DivZero
>> > checker will refute the 'width == 0' branch.
>> >
>> > In the general case you have no choice but to produce an IntSymExpr (if
>> > 'i' is a concrete integer other than 0) or a SymSymExpr (if 'i' is a
>> > symbol).
>> >
>> >
>> > Em qui, 24 de mai de 2018 às 18:15, Mikhail Ramalho
>> > <mikhail.ramalho at gmail.com <mailto:mikhail.ramalho at gmail.com>>
>> escreveu:
>> >
>> >
>> >       Total Time      time.analyzer.time.sys (s) (mean)
>> >     time.analyzer.time.user (s) (mean)        time.analyzer.time.wall
>> (s)
>> >     (mean)
>> >
>> >       Reported bugs
>> >     Tmux      99.2    0.076   27.253  7.656
>> >
>> >       32
>> >     Tmux + z3         152.88  0.074   56.251  11.505
>> >
>> >       32
>> >     Ratio     154.11%         97.37%  206.40%         150.27%
>> >       Diff    0
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >     Redis     173.69  0.057   7.083   7.271
>> >
>> >       146
>> >     Redis + z3        193.43  0.057   7.621   7.728
>> >
>> >       140
>> >     Ratio     111.37%         100.00%         107.60%         106.29%
>>
>> >       Diff    6
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >     OpenSSL   264.93  0.042   3.31    3.412
>> >
>> >       204
>> >     OpenSSL + z3      213.53  0.035   3.099   3.152
>> >
>> >       204
>> >     Ratio     80.60%  83.33%  93.63%  92.38%
>> >       Diff    0
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >     Twin      143.17  0.067   6.593   6.696
>> >
>> >       138
>> >     Twin + z3         133.83  0.06    6.79    6.882
>> >
>> >       138
>> >     Ratio     93.48%  89.55%  102.99%         102.78%
>> >       Diff    0
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >     Git + z3  333.9   0.075   8.52    8.67
>> >
>> >       96
>> >     Git + z3  289.59  0.062   7.924   8.023
>> >
>> >       90
>> >     Ratio     86.73%  82.67%  93.00%  92.54%
>> >       Diff    6
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >     Postgresql        889.35  0.079   8.482   8.631
>> >
>> >       676
>> >     Postgresql + z3   902.86  0.077   9.694   9.863
>> >
>> >       676
>> >     Ratio     101.52%         97.47%  114.29%         114.27%
>> >       Diff    0
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >     Sqlite3   1206.3  0.262   368.446         370.786
>> >
>> >       200
>> >     Sqlite3 + z3      1260.85         0.43    407.763         409.688
>>
>> >
>> >       199
>> >     Ratio     104.52%         164.12%         110.67%         110.49%
>>
>> >       Diff    1
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >     Average   104.62%         102.07%         118.37%         109.86%
>>
>> >
>> >
>> >
>> >
>> >
>> >     2018-05-24 15:11 GMT+01:00 Mikhail Ramalho
>> >     <mikhail.ramalho at gmail.com <mailto:mikhail.ramalho at gmail.com>>:
>> >
>> >         Hi all,
>> >
>> >         This is my first report to the community, comparing the
>> >         results with and without the Z3 refutation when analyzing a
>> >         number of projects.
>> >
>> >         ~
>> >
>> >         First of all, I'd like to thank Réka Kovács as the first
>> >         version of the refutation using Z3 was created by her
>> >         (https://reviews.llvm.org/D45517)! Thank you very much!
>> >
>> >         After applying patch D45517, you can use the refutation check
>> >         by using -analyzer-config crosscheck-with-z3=true. Obviously,
>> >         you need a version of clang built with Z3.
>> >
>> >         ~
>> >
>> >         I'm currently analyzing 7 C projects (unfortunately, there's a
>> >         bug preventing us from analyzing FFmpeg):
>> >
>> >         1. Tmux
>> >         2. Redis
>> >         3. OpenSSL
>> >         4. Twin
>> >         5. Git
>> >         6. Postgresql
>> >         7. Sqlite3
>> >
>> >         In short, the refutation check slows down the verification by
>> >         ~20%. It removed 6 FPs from Redis, 6 FPs from git and 1 FP
>> >         from Sqlite3 (FP means false positive). Surprisingly enough,
>> >         some analysis were faster with the crosscheck, but I'm not
>> >         sure why (maybe ccache?).
>> >
>> >         Attached is a spreadsheet (report1.ods) with some number
>> >         (total time, average time per check, # of reported bugs) and a
>> >         txt with all the raw data from the analysis (raw.txt). I'll
>> >         add these data to google drive for the next report.
>> >
>> >         In order to generate the raw data, you need to use a version
>> >         of clang with assertions enabled, call scan-build.py with
>> >         '-analyzer-config serialize-stats=true' and you need to apply
>> >         patch https://reviews.llvm.org/D43134.
>> >
>> >         Thank you very much,
>> >
>> >
>> >         2018-05-01 15:27 GMT+01:00 Mikhail Ramalho
>> >         <mikhail.ramalho at gmail.com <mailto:mikhail.ramalho at gmail.com>>:
>> >
>> >             Hello all,
>> >
>> >             My proposal for GSoC 2018 [0] about Bug Validation in the
>> >             Clang Static Analyzer using the Z3 SMT Solver was accepted.
>> >
>> >             I'll work with George Karpenkov to improve the bug reports
>> >             that the static analyzer produces by reducing the number
>> >             of false bugs.
>> >
>> >             Thank you,
>> >
>> >             [0]
>> https://docs.google.com/document/d/1-zNSv0l4WyoxYpJUAw8LFnQq_TY4AGjIpPu1VPkmO-g
>> >
>> >             --
>> >
>> >             Mikhail Ramalho.
>> >
>> >
>> >
>> >
>> >         --
>> >
>> >         Mikhail Ramalho.
>> >
>> >
>> >
>> >
>> >     --
>> >
>> >     Mikhail Ramalho.
>> >
>> >
>> >
>> > --
>> >
>> > Mikhail Ramalho.
>> >
>> >
>> > _______________________________________________
>> > cfe-dev mailing list
>> > cfe-dev at lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>
>>
>
> --
>
> Mikhail Ramalho.
>
>
>

-- 

Mikhail Ramalho.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180627/455ab460/attachment.html>