[cfe-dev] [GSoC 2018] Using the Z3 SMT Solver to Validate Bugs Reported by the Clang Static Analyzer

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Tue Jun 26 16:03:15 PDT 2018


There's no reason to "handle" unknown/undefined values in evalBinOp() 
family of methods. The only thing you need to do is to avoid producing 
unknown values in the first place, so that you didn't need to perform 
calculations over them in evalBinOp(). The earlier you eliminate the 
Unknown, the more accurate information you'll be able to provide to the 
solver.

Undefined values are values that result from undefined behavior 
according to the language standard. You don't need to handle Undefined 
values because there's a checker that'll interrupt the analysis whenever 
the code tries to perform arithmetic on undefined values, so your new 
handling code will never be executed in practice.

Unknown values represent values of expressions that the analyzer doesn't 
know how to compute. You don't need to handle Unknown values because 
instead you can always prevent them from appearing by fixing the part of 
the analyzer that produces them - just teach the analyzer how to compute 
them instead. Just get rid of them and you won't need to handle them. If 
getting rid of a particular source of Unknowns is suddenly hard (i.e. 
because certain semantics of the language are hard to model), as a 
fallback you can still replace the Unknown with a conjured symbol at 
that source of Unknowns.

Additionally, there are already multiple unsystematic places in the 
analyzer where we "symbolicate" Unknowns, such as 
ExprEngine::VisitBinaryOperator:

     67       if (RightV.isUnknown()) {
     68         unsigned Count = currBldrCtx->blockCount();
     69         RightV = svalBuilder.conjureSymbolVal(nullptr, 
B->getRHS(), LCtx,
     70                                               Count);
     71       }

Which is, again, merely a workaround for producing unknowns in the first 
place, and the proper fix is to avoid producing them.

On 6/26/18 3:17 PM, Mikhail Ramalho wrote:
> Investigated the constraint being dropped and found that it's because 
> of the pre-conditions enforced by evalBinOp.
>
> If any of the arguments is unknown or undefined, the constraint is 
> dropped as the code only handles Loc/NonLoc types. I had two ideas 
> about that:
>
> - Change evalBinOpLL, evalBinOpNN and evalBinOpLN to take SVals are 
> arguments (and obviously change the names): The good thing is that the 
> solution would be handled in the same place BUT these methods are 
> already quite big, and handling unknown/undef would just make them bigger.
>
> - Create new evalBinOpLU, evalBinOpNU, evalBinOpUL, evalBinOpUN in the 
> SValBuilder and implement them in SimpleSValBuilder: it would make the 
> code more modular and easier to understand, but will require some 
> refactoring or evalBinOpLL, evalBinOpNN and evalBinOpLN will end up 
> with duplicated code.
>
> Any other suggestion?
>
> Also, a couple of questions:
>
> 1. What's the difference between a unknown and a undefined symbol when 
> analyzing a program?
>
> From what I understood, an unknown symbol is the same as "I don't 
> understand this expression" or arguments that we don't its value and 
> undefined is, well, undefined values (e.g., uninitialized variables). 
> But when building constraints, does it make any difference? I mean, 
> from an SMT point of view, they would be the same, free variables.
>
> Thank you,
>
>
> Em ter, 26 de jun de 2018 às 19:54, Artem Dergachev 
> <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>> escreveu:
>
>     I believe that even if the constraint manager can't "handle"
>     constraints
>     on a symbol, symbols should still be emitted. Because "can handle"
>     here
>     means "can simplify the symbol properly", not "can track constraints
>     over the symbol". The constraint manager would still at least be
>     able to
>     handle the symbol as an opaque symbol and track constraints over it.
>
>     In other words, it's still beneficial to know that "(a % b) == 0"
>     even
>     if we've no idea what "%" means, because when we see "(a % b)" next
>     time, even if we've still no idea what it means, we'll know that it's
>     zero anyway.
>
>
>     On 6/26/18 10:49 AM, Dominic Chen via cfe-dev wrote:
>     > Another solution, since you're using Z3 already, is to implement
>     runtime
>     > support for querying the underlying ConstraintManager's about
>     the types
>     > of constraints that it supports (e.g. canReasonAboutFoo() ).
>     Then, you
>     > can use this to generate the appropriate SVal's at runtime,
>     which could
>     > include support for symbolic extension/truncation, remainder,
>     shifts, etc.
>     >
>     > I have two stale patches that implement this, D28955 and D35450. The
>     > first might solve your problem with remainders; see the changes to
>     > SimpleSValBuilder.
>     >
>     > Dominic
>     >
>     > On 6/26/2018 1:23 PM, Mikhail Ramalho via cfe-dev wrote:
>     >> Hi all,
>     >>
>     >> Hi guys,
>     >>
>     >> I'm investigating the constraints being dropped, here's what I
>     got so far.
>     >>
>     >> First of all, I'm using the following program:
>     >>
>     >> void foo(unsigned width)
>     >> {
>     >>    int base;
>     >>    int i = 0;
>     >>
>     >>    if (!(i % width))
>     >>      base = 1;
>     >>
>     >>    if(base == 1)
>     >>      abort();
>     >> }
>     >>
>     >> I started by looking at ExprEngine::processBranch, where
>     >>
>     >> SVal X = PrevState->getSVal(Condition,
>     PredI->getLocationContext());
>     >>
>     >> returns Unknown for the remainder condition. The getSVal ends up
>     >> looking in a map for the value result, so I found that bindExpr
>     fills
>     >> that map.
>     >>
>     >> Going back a bit, into ExprEngine::Visit, when evaluating a
>     >> Stmt::BinaryOperatorClass (regardless of the eagerly assume flag)
>     >> ExprEngine::VisitBinaryOperator is called, which eventually calls
>     >> evalBinOp and, since it doesn't understand remainder, it returns
>     >> unknown and BindExpr is never called.
>     >>
>     >> Back to ExprEngine::processBranch, when the symbol is unknown the
>     >> following piece of code is called:
>     >>
>     >>      // If the condition is still unknown, give up.
>     >>      if (X.isUnknownOrUndef()) {
>     >>        builder.generateNode(PrevState, true, PredI);
>     >>        builder.generateNode(PrevState, false, PredI);
>     >>        continue;
>     >>      }
>     >>
>     >> and the condition is simply ignored. When the result is defined, it
>     >> creates two nodes assuming the constraints.
>     >>
>     >> ~
>     >>
>     >> My idea is when the SVal is undef or unknown, instead of generating
>     >> two nodes with no  knowledge about the constraints, we could create
>     >> two ranged constraints, like:
>     >>
>     >> i % width: [0,0]
>     >>
>     >> and
>     >>
>     >> i % width: [1,1]
>     >>
>     >> for each path. That way we can keep the constraints with reasonable
>     >> values.
>     >>
>     >> What do you think? It feels like this will break stuff further down
>     >> the line, but I'll know for sure if I implement the change.
>     >>
>     >> ~
>     >>
>     >> Artem's response:
>     >>
>     >> Yep, i strongly believe that any UnknownVal should be treated as a
>     >> synonym of "not implemented".
>     >>
>     >> In *this* example you might also notice that there's no symbol
>     for 'i',
>     >> but it's a concrete integer 0 instead. So you can evaluate the
>     whole
>     >> remainder to 0, unless 'width' is also equal to 0 (in which
>     case the
>     >> answer would be UndefinedVal). Also note that when enabled, DivZero
>     >> checker will refute the 'width == 0' branch.
>     >>
>     >> In the general case you have no choice but to produce an
>     IntSymExpr (if
>     >> 'i' is a concrete integer other than 0) or a SymSymExpr (if 'i'
>     is a
>     >> symbol).
>     >>
>     >>
>     >> Em qui, 24 de mai de 2018 às 18:15, Mikhail Ramalho
>     >> <mikhail.ramalho at gmail.com <mailto:mikhail.ramalho at gmail.com>
>     <mailto:mikhail.ramalho at gmail.com
>     <mailto:mikhail.ramalho at gmail.com>>> escreveu:
>     >>
>     >>
>     >>              Total Time      time.analyzer.time.sys (s) (mean)
>     >>      time.analyzer.time.user (s) (mean) time.analyzer.time.wall (s)
>     >>      (mean)
>     >>
>     >>              Reported bugs
>     >>      Tmux    99.2    0.076   27.253  7.656
>     >>
>     >>              32
>     >>      Tmux + z3       152.88  0.074   56.251  11.505
>     >>
>     >>              32
>     >>      Ratio   154.11%         97.37%  206.40%  150.27%
>     >>              Diff    0
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>      Redis   173.69  0.057   7.083   7.271
>     >>
>     >>              146
>     >>      Redis + z3      193.43  0.057   7.621   7.728
>     >>
>     >>              140
>     >>      Ratio   111.37%         100.00%         107.60%      
>      106.29%
>     >>              Diff    6
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>      OpenSSL         264.93  0.042   3.31    3.412
>     >>
>     >>              204
>     >>      OpenSSL + z3    213.53  0.035   3.099   3.152
>     >>
>     >>              204
>     >>      Ratio   80.60%  83.33%  93.63%  92.38%
>     >>              Diff    0
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>      Twin    143.17  0.067   6.593   6.696
>     >>
>     >>              138
>     >>      Twin + z3       133.83  0.06    6.79    6.882
>     >>
>     >>              138
>     >>      Ratio   93.48%  89.55%  102.99%         102.78%
>     >>              Diff    0
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>      Git + z3        333.9   0.075   8.52    8.67
>     >>
>     >>              96
>     >>      Git + z3        289.59  0.062   7.924   8.023
>     >>
>     >>              90
>     >>      Ratio   86.73%  82.67%  93.00%  92.54%
>     >>              Diff    6
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>      Postgresql      889.35  0.079   8.482   8.631
>     >>
>     >>              676
>     >>      Postgresql + z3         902.86  0.077   9.694  9.863
>     >>
>     >>              676
>     >>      Ratio   101.52%         97.47%  114.29%  114.27%
>     >>              Diff    0
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>      Sqlite3         1206.3  0.262   368.446  370.786
>     >>
>     >>              200
>     >>      Sqlite3 + z3    1260.85         0.43    407.763      
>      409.688
>     >>
>     >>              199
>     >>      Ratio   104.52%         164.12%         110.67%      
>      110.49%
>     >>              Diff    1
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>      Average         104.62%         102.07%  118.37%       
>      109.86%
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>      2018-05-24 15:11 GMT+01:00 Mikhail Ramalho
>     >>      <mikhail.ramalho at gmail.com
>     <mailto:mikhail.ramalho at gmail.com>
>     <mailto:mikhail.ramalho at gmail.com
>     <mailto:mikhail.ramalho at gmail.com>>>:
>     >>
>     >>          Hi all,
>     >>
>     >>          This is my first report to the community, comparing the
>     >>          results with and without the Z3 refutation when
>     analyzing a
>     >>          number of projects.
>     >>
>     >>          ~
>     >>
>     >>          First of all, I'd like to thank Réka Kovács as the first
>     >>          version of the refutation using Z3 was created by her
>     >>          (https://reviews.llvm.org/D45517)! Thank you very much!
>     >>
>     >>          After applying patch D45517, you can use the
>     refutation check
>     >>          by using -analyzer-config crosscheck-with-z3=true.
>     Obviously,
>     >>          you need a version of clang built with Z3.
>     >>
>     >>          ~
>     >>
>     >>          I'm currently analyzing 7 C projects (unfortunately,
>     there's a
>     >>          bug preventing us from analyzing FFmpeg):
>     >>
>     >>          1. Tmux
>     >>          2. Redis
>     >>          3. OpenSSL
>     >>          4. Twin
>     >>          5. Git
>     >>          6. Postgresql
>     >>          7. Sqlite3
>     >>
>     >>          In short, the refutation check slows down the
>     verification by
>     >>          ~20%. It removed 6 FPs from Redis, 6 FPs from git and 1 FP
>     >>          from Sqlite3 (FP means false positive). Surprisingly
>     enough,
>     >>          some analysis were faster with the crosscheck, but I'm not
>     >>          sure why (maybe ccache?).
>     >>
>     >>          Attached is a spreadsheet (report1.ods) with some number
>     >>          (total time, average time per check, # of reported
>     bugs) and a
>     >>          txt with all the raw data from the analysis (raw.txt).
>     I'll
>     >>          add these data to google drive for the next report.
>     >>
>     >>          In order to generate the raw data, you need to use a
>     version
>     >>          of clang with assertions enabled, call scan-build.py with
>     >>          '-analyzer-config serialize-stats=true' and you need
>     to apply
>     >>          patch https://reviews.llvm.org/D43134.
>     >>
>     >>          Thank you very much,
>     >>
>     >>
>     >>          2018-05-01 15:27 GMT+01:00 Mikhail Ramalho
>     >>          <mikhail.ramalho at gmail.com
>     <mailto:mikhail.ramalho at gmail.com>
>     <mailto:mikhail.ramalho at gmail.com
>     <mailto:mikhail.ramalho at gmail.com>>>:
>     >>
>     >>              Hello all,
>     >>
>     >>              My proposal for GSoC 2018 [0] about Bug Validation
>     in the
>     >>              Clang Static Analyzer using the Z3 SMT Solver was
>     accepted.
>     >>
>     >>              I'll work with George Karpenkov to improve the bug
>     reports
>     >>              that the static analyzer produces by reducing the
>     number
>     >>              of false bugs.
>     >>
>     >>              Thank you,
>     >>
>     >>              [0]
>     https://docs.google.com/document/d/1-zNSv0l4WyoxYpJUAw8LFnQq_TY4AGjIpPu1VPkmO-g
>     >>
>     >>              --
>     >>
>     >>              Mikhail Ramalho.
>     >>
>     >>
>     >>
>     >>
>     >>          --
>     >>
>     >>          Mikhail Ramalho.
>     >>
>     >>
>     >>
>     >>
>     >>      --
>     >>
>     >>      Mikhail Ramalho.
>     >>
>     >>
>     >>
>     >> --
>     >>
>     >> Mikhail Ramalho.
>     >>
>     >>
>     >> _______________________________________________
>     >> cfe-dev mailing list
>     >> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>     >> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>     >
>     > _______________________________________________
>     > cfe-dev mailing list
>     > cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>     > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>
> -- 
>
> Mikhail Ramalho.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180626/43e9282d/attachment.html>


More information about the cfe-dev mailing list