[cfe-dev] [GSoC 2018] Using the Z3 SMT Solver to Validate Bugs Reported by the Clang Static Analyzer

Tue Jun 26 11:37:13 PDT 2018

Hi,

Another solution, since you're using Z3 already, is to implement runtime
> support for querying the underlying ConstraintManager's about the types
> of constraints that it supports (e.g. canReasonAboutFoo() ). Then, you
> can use this to generate the appropriate SVal's at runtime, which could
> include support for symbolic extension/truncation, remainder, shifts, etc.
>
>
I see, but for the project I proposed to check why the constraints were
being dropped, regardless of having the SMT solver enabled or not.

~

Anyway, I removed 4 lines of code and it seems to be working:

diff --git a/lib/StaticAnalyzer/Core/SValBuilder.cpp
b/lib/StaticAnalyzer/Core/SValBuilder.cpp
index 137e9a7..39051a8 100644
--- a/lib/StaticAnalyzer/Core/SValBuilder.cpp
+++ b/lib/StaticAnalyzer/Core/SValBuilder.cpp
@@ -377,9 +377,6 @@ SVal SValBuilder::makeSymExprValNN(ProgramStateRef
State,
                                    BinaryOperator::Opcode Op,
                                    NonLoc LHS, NonLoc RHS,
                                    QualType ResultTy) {
-  if (!State->isTainted(RHS) && !State->isTainted(LHS))
-    return UnknownVal();
-
   const SymExpr *symLHS = LHS.getAsSymExpr();
   const SymExpr *symRHS = RHS.getAsSymExpr();
   // TODO: When the Max Complexity is reached, we should conjure a symbol
@@ -406,9 +403,6 @@ SVal SValBuilder::evalBinOp(ProgramStateRef state,
BinaryOperator::Opcode op,
   if (lhs.isUndef() || rhs.isUndef())
     return UndefinedVal();

-  if (lhs.isUnknown() || rhs.isUnknown())
-    return UnknownVal();
-
   if (lhs.getAs<nonloc::LazyCompoundVal>() ||
       rhs.getAs<nonloc::LazyCompoundVal>()) {
     return UnknownVal();

The static analyzer now generates the following constraints for each path:

0U % (reg_$0<unsigned int width>): { [1, 4294967295] }

and

0U % (reg_$0<unsigned int width>): { [0, 0] }

And Z3 is able to remove the false positive.

Now I need to understand the impact of the changes, specially the taint
check removal.

Thank you,


>
> Dominic
>
> On 6/26/2018 1:23 PM, Mikhail Ramalho via cfe-dev wrote:
> > Hi all,
> >
> > Hi guys,
> >
> > I'm investigating the constraints being dropped, here's what I got so
> far.
> >
> > First of all, I'm using the following program:
> >
> > void foo(unsigned width)
> > {
> >   int base;
> >   int i = 0;
> >
> >   if (!(i % width))
> >     base = 1;
> >
> >   if(base == 1)
> >     abort();
> > }
> >
> > I started by looking at ExprEngine::processBranch, where
> >
> > SVal X = PrevState->getSVal(Condition, PredI->getLocationContext());
> >
> > returns Unknown for the remainder condition. The getSVal ends up
> > looking in a map for the value result, so I found that bindExpr fills
> > that map.
> >
> > Going back a bit, into ExprEngine::Visit, when evaluating a
> > Stmt::BinaryOperatorClass (regardless of the eagerly assume flag)
> > ExprEngine::VisitBinaryOperator is called, which eventually calls
> > evalBinOp and, since it doesn't understand remainder, it returns
> > unknown and BindExpr is never called.
> >
> > Back to ExprEngine::processBranch, when the symbol is unknown the
> > following piece of code is called:
> >
> >     // If the condition is still unknown, give up.
> >     if (X.isUnknownOrUndef()) {
> >       builder.generateNode(PrevState, true, PredI);
> >       builder.generateNode(PrevState, false, PredI);
> >       continue;
> >     }
> >
> > and the condition is simply ignored. When the result is defined, it
> > creates two nodes assuming the constraints.
> >
> > ~
> >
> > My idea is when the SVal is undef or unknown, instead of generating
> > two nodes with no  knowledge about the constraints, we could create
> > two ranged constraints, like:
> >
> > i % width: [0,0]
> >
> > and
> >
> > i % width: [1,1]
> >
> > for each path. That way we can keep the constraints with reasonable
> > values.
> >
> > What do you think? It feels like this will break stuff further down
> > the line, but I'll know for sure if I implement the change.
> >
> > ~
> >
> > Artem's response:
> >
> > Yep, i strongly believe that any UnknownVal should be treated as a
> > synonym of "not implemented".
> >
> > In *this* example you might also notice that there's no symbol for 'i',
> > but it's a concrete integer 0 instead. So you can evaluate the whole
> > remainder to 0, unless 'width' is also equal to 0 (in which case the
> > answer would be UndefinedVal). Also note that when enabled, DivZero
> > checker will refute the 'width == 0' branch.
> >
> > In the general case you have no choice but to produce an IntSymExpr (if
> > 'i' is a concrete integer other than 0) or a SymSymExpr (if 'i' is a
> > symbol).
> >
> >
> > Em qui, 24 de mai de 2018 às 18:15, Mikhail Ramalho
> > <mikhail.ramalho at gmail.com <mailto:mikhail.ramalho at gmail.com>> escreveu:
> >
> >
> >       Total Time      time.analyzer.time.sys (s) (mean)
> >     time.analyzer.time.user (s) (mean)        time.analyzer.time.wall (s)
> >     (mean)
> >
> >       Reported bugs
> >     Tmux      99.2    0.076   27.253  7.656
> >
> >       32
> >     Tmux + z3         152.88  0.074   56.251  11.505
> >
> >       32
> >     Ratio     154.11%         97.37%  206.40%         150.27%
> >       Diff    0
> >
> >
> >
> >
> >
> >
> >
> >
> >     Redis     173.69  0.057   7.083   7.271
> >
> >       146
> >     Redis + z3        193.43  0.057   7.621   7.728
> >
> >       140
> >     Ratio     111.37%         100.00%         107.60%         106.29%
>
> >       Diff    6
> >
> >
> >
> >
> >
> >
> >
> >
> >     OpenSSL   264.93  0.042   3.31    3.412
> >
> >       204
> >     OpenSSL + z3      213.53  0.035   3.099   3.152
> >
> >       204
> >     Ratio     80.60%  83.33%  93.63%  92.38%
> >       Diff    0
> >
> >
> >
> >
> >
> >
> >
> >
> >     Twin      143.17  0.067   6.593   6.696
> >
> >       138
> >     Twin + z3         133.83  0.06    6.79    6.882
> >
> >       138
> >     Ratio     93.48%  89.55%  102.99%         102.78%
> >       Diff    0
> >
> >
> >
> >
> >
> >
> >
> >
> >     Git + z3  333.9   0.075   8.52    8.67
> >
> >       96
> >     Git + z3  289.59  0.062   7.924   8.023
> >
> >       90
> >     Ratio     86.73%  82.67%  93.00%  92.54%
> >       Diff    6
> >
> >
> >
> >
> >
> >
> >
> >
> >     Postgresql        889.35  0.079   8.482   8.631
> >
> >       676
> >     Postgresql + z3   902.86  0.077   9.694   9.863
> >
> >       676
> >     Ratio     101.52%         97.47%  114.29%         114.27%
> >       Diff    0
> >
> >
> >
> >
> >
> >
> >
> >
> >     Sqlite3   1206.3  0.262   368.446         370.786
> >
> >       200
> >     Sqlite3 + z3      1260.85         0.43    407.763         409.688
>
> >
> >       199
> >     Ratio     104.52%         164.12%         110.67%         110.49%
>
> >       Diff    1
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >     Average   104.62%         102.07%         118.37%         109.86%
>
> >
> >
> >
> >
> >
> >     2018-05-24 15:11 GMT+01:00 Mikhail Ramalho
> >     <mikhail.ramalho at gmail.com <mailto:mikhail.ramalho at gmail.com>>:
> >
> >         Hi all,
> >
> >         This is my first report to the community, comparing the
> >         results with and without the Z3 refutation when analyzing a
> >         number of projects.
> >
> >         ~
> >
> >         First of all, I'd like to thank Réka Kovács as the first
> >         version of the refutation using Z3 was created by her
> >         (https://reviews.llvm.org/D45517)! Thank you very much!
> >
> >         After applying patch D45517, you can use the refutation check
> >         by using -analyzer-config crosscheck-with-z3=true. Obviously,
> >         you need a version of clang built with Z3.
> >
> >         ~
> >
> >         I'm currently analyzing 7 C projects (unfortunately, there's a
> >         bug preventing us from analyzing FFmpeg):
> >
> >         1. Tmux
> >         2. Redis
> >         3. OpenSSL
> >         4. Twin
> >         5. Git
> >         6. Postgresql
> >         7. Sqlite3
> >
> >         In short, the refutation check slows down the verification by
> >         ~20%. It removed 6 FPs from Redis, 6 FPs from git and 1 FP
> >         from Sqlite3 (FP means false positive). Surprisingly enough,
> >         some analysis were faster with the crosscheck, but I'm not
> >         sure why (maybe ccache?).
> >
> >         Attached is a spreadsheet (report1.ods) with some number
> >         (total time, average time per check, # of reported bugs) and a
> >         txt with all the raw data from the analysis (raw.txt). I'll
> >         add these data to google drive for the next report.
> >
> >         In order to generate the raw data, you need to use a version
> >         of clang with assertions enabled, call scan-build.py with
> >         '-analyzer-config serialize-stats=true' and you need to apply
> >         patch https://reviews.llvm.org/D43134.
> >
> >         Thank you very much,
> >
> >
> >         2018-05-01 15:27 GMT+01:00 Mikhail Ramalho
> >         <mikhail.ramalho at gmail.com <mailto:mikhail.ramalho at gmail.com>>:
> >
> >             Hello all,
> >
> >             My proposal for GSoC 2018 [0] about Bug Validation in the
> >             Clang Static Analyzer using the Z3 SMT Solver was accepted.
> >
> >             I'll work with George Karpenkov to improve the bug reports
> >             that the static analyzer produces by reducing the number
> >             of false bugs.
> >
> >             Thank you,
> >
> >             [0]
> https://docs.google.com/document/d/1-zNSv0l4WyoxYpJUAw8LFnQq_TY4AGjIpPu1VPkmO-g
> >
> >             --
> >
> >             Mikhail Ramalho.
> >
> >
> >
> >
> >         --
> >
> >         Mikhail Ramalho.
> >
> >
> >
> >
> >     --
> >
> >     Mikhail Ramalho.
> >
> >
> >
> > --
> >
> > Mikhail Ramalho.
> >
> >
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>

-- 

Mikhail Ramalho.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180626/a6b040b4/attachment.html>