[cfe-dev] [GSoC 2018] Using the Z3 SMT Solver to Validate Bugs Reported by the Clang Static Analyzer
Mikhail Ramalho via cfe-dev
cfe-dev at lists.llvm.org
Tue Jun 26 16:49:27 PDT 2018
Hi,
Indeed, just removing the taint check in evalBinOp fixes the constraints
being dropped and a number of test cases now print the expected output.
There are some crashes as well, as mentioned in D28953.
Em qua, 27 de jun de 2018 às 00:30, Dominic Chen <d.c.ddcc at gmail.com>
escreveu:
> The reason symbols are getting dropped, is because, currently, CSA doesn’t
> generate symbols for things that RangedConstraintManager doesn’t support,
> unless taint tracking is enabled. The querying that I mentioned is just a
> performance optimization, to check the capabilities of the
> ConstraintManager at runtime and avoid generating unsupported symbols.
>
> If, as Artem suggests, the new policy is to generate accurate symbols,
> even if the underlying ConstraintManager doesn’t support reasoning over
> them, then the solution is to identify all the places where unknown values
> (over-approximations) are being generated, and replace them with the
> correct SVal.
>
> In your patch below, removing the first part is correct. In fact, I have
> the same change in D35450. But, the second isn’t, because at that point you
> already have an unknown value, and must propagate it to maintain
> correctness of the analysis.
>
> The reason I mention D28955 and D35450, is because I’ve already started on
> some of these changes, so you don’t have to do everything. I’ll try to
> rebase and update them this weekend.
>
> Dominic
>
> On Jun 26, 2018, at 2:37 PM, Mikhail Ramalho <mikhail.ramalho at gmail.com>
> wrote:
>
> Hi,
>
> Another solution, since you're using Z3 already, is to implement runtime
>> support for querying the underlying ConstraintManager's about the types
>> of constraints that it supports (e.g. canReasonAboutFoo() ). Then, you
>> can use this to generate the appropriate SVal's at runtime, which could
>> include support for symbolic extension/truncation, remainder, shifts, etc.
>>
>>
> I see, but for the project I proposed to check why the constraints were
> being dropped, regardless of having the SMT solver enabled or not.
>
> ~
>
> Anyway, I removed 4 lines of code and it seems to be working:
>
> diff --git a/lib/StaticAnalyzer/Core/SValBuilder.cpp
> b/lib/StaticAnalyzer/Core/SValBuilder.cpp
> index 137e9a7..39051a8 100644
> --- a/lib/StaticAnalyzer/Core/SValBuilder.cpp
> +++ b/lib/StaticAnalyzer/Core/SValBuilder.cpp
> @@ -377,9 +377,6 @@ SVal SValBuilder::makeSymExprValNN(ProgramStateRef
> State,
> BinaryOperator::Opcode Op,
> NonLoc LHS, NonLoc RHS,
> QualType ResultTy) {
> - if (!State->isTainted(RHS) && !State->isTainted(LHS))
> - return UnknownVal();
> -
> const SymExpr *symLHS = LHS.getAsSymExpr();
> const SymExpr *symRHS = RHS.getAsSymExpr();
> // TODO: When the Max Complexity is reached, we should conjure a symbol
> @@ -406,9 +403,6 @@ SVal SValBuilder::evalBinOp(ProgramStateRef state,
> BinaryOperator::Opcode op,
> if (lhs.isUndef() || rhs.isUndef())
> return UndefinedVal();
>
> - if (lhs.isUnknown() || rhs.isUnknown())
> - return UnknownVal();
> -
> if (lhs.getAs<nonloc::LazyCompoundVal>() ||
> rhs.getAs<nonloc::LazyCompoundVal>()) {
> return UnknownVal();
>
> The static analyzer now generates the following constraints for each path:
>
> 0U % (reg_$0<unsigned int width>): { [1, 4294967295] }
>
> and
>
> 0U % (reg_$0<unsigned int width>): { [0, 0] }
>
> And Z3 is able to remove the false positive.
>
> Now I need to understand the impact of the changes, specially the taint
> check removal.
>
> Thank you,
>
>
>>
>> Dominic
>>
>> On 6/26/2018 1:23 PM, Mikhail Ramalho via cfe-dev wrote:
>> > Hi all,
>> >
>> > Hi guys,
>> >
>> > I'm investigating the constraints being dropped, here's what I got so
>> far.
>> >
>> > First of all, I'm using the following program:
>> >
>> > void foo(unsigned width)
>> > {
>> > int base;
>> > int i = 0;
>> >
>> > if (!(i % width))
>> > base = 1;
>> >
>> > if(base == 1)
>> > abort();
>> > }
>> >
>> > I started by looking at ExprEngine::processBranch, where
>> >
>> > SVal X = PrevState->getSVal(Condition, PredI->getLocationContext());
>> >
>> > returns Unknown for the remainder condition. The getSVal ends up
>> > looking in a map for the value result, so I found that bindExpr fills
>> > that map.
>> >
>> > Going back a bit, into ExprEngine::Visit, when evaluating a
>> > Stmt::BinaryOperatorClass (regardless of the eagerly assume flag)
>> > ExprEngine::VisitBinaryOperator is called, which eventually calls
>> > evalBinOp and, since it doesn't understand remainder, it returns
>> > unknown and BindExpr is never called.
>> >
>> > Back to ExprEngine::processBranch, when the symbol is unknown the
>> > following piece of code is called:
>> >
>> > // If the condition is still unknown, give up.
>> > if (X.isUnknownOrUndef()) {
>> > builder.generateNode(PrevState, true, PredI);
>> > builder.generateNode(PrevState, false, PredI);
>> > continue;
>> > }
>> >
>> > and the condition is simply ignored. When the result is defined, it
>> > creates two nodes assuming the constraints.
>> >
>> > ~
>> >
>> > My idea is when the SVal is undef or unknown, instead of generating
>> > two nodes with no knowledge about the constraints, we could create
>> > two ranged constraints, like:
>> >
>> > i % width: [0,0]
>> >
>> > and
>> >
>> > i % width: [1,1]
>> >
>> > for each path. That way we can keep the constraints with reasonable
>> > values.
>> >
>> > What do you think? It feels like this will break stuff further down
>> > the line, but I'll know for sure if I implement the change.
>> >
>> > ~
>> >
>> > Artem's response:
>> >
>> > Yep, i strongly believe that any UnknownVal should be treated as a
>> > synonym of "not implemented".
>> >
>> > In *this* example you might also notice that there's no symbol for 'i',
>> > but it's a concrete integer 0 instead. So you can evaluate the whole
>> > remainder to 0, unless 'width' is also equal to 0 (in which case the
>> > answer would be UndefinedVal). Also note that when enabled, DivZero
>> > checker will refute the 'width == 0' branch.
>> >
>> > In the general case you have no choice but to produce an IntSymExpr (if
>> > 'i' is a concrete integer other than 0) or a SymSymExpr (if 'i' is a
>> > symbol).
>> >
>> >
>> > Em qui, 24 de mai de 2018 às 18:15, Mikhail Ramalho
>> > <mikhail.ramalho at gmail.com <mailto:mikhail.ramalho at gmail.com>>
>> escreveu:
>> >
>> >
>> > Total Time time.analyzer.time.sys (s) (mean)
>> > time.analyzer.time.user (s) (mean) time.analyzer.time.wall
>> (s)
>> > (mean)
>> >
>> > Reported bugs
>> > Tmux 99.2 0.076 27.253 7.656
>> >
>> > 32
>> > Tmux + z3 152.88 0.074 56.251 11.505
>> >
>> > 32
>> > Ratio 154.11% 97.37% 206.40% 150.27%
>> > Diff 0
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > Redis 173.69 0.057 7.083 7.271
>> >
>> > 146
>> > Redis + z3 193.43 0.057 7.621 7.728
>> >
>> > 140
>> > Ratio 111.37% 100.00% 107.60% 106.29%
>>
>> > Diff 6
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > OpenSSL 264.93 0.042 3.31 3.412
>> >
>> > 204
>> > OpenSSL + z3 213.53 0.035 3.099 3.152
>> >
>> > 204
>> > Ratio 80.60% 83.33% 93.63% 92.38%
>> > Diff 0
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > Twin 143.17 0.067 6.593 6.696
>> >
>> > 138
>> > Twin + z3 133.83 0.06 6.79 6.882
>> >
>> > 138
>> > Ratio 93.48% 89.55% 102.99% 102.78%
>> > Diff 0
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > Git + z3 333.9 0.075 8.52 8.67
>> >
>> > 96
>> > Git + z3 289.59 0.062 7.924 8.023
>> >
>> > 90
>> > Ratio 86.73% 82.67% 93.00% 92.54%
>> > Diff 6
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > Postgresql 889.35 0.079 8.482 8.631
>> >
>> > 676
>> > Postgresql + z3 902.86 0.077 9.694 9.863
>> >
>> > 676
>> > Ratio 101.52% 97.47% 114.29% 114.27%
>> > Diff 0
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > Sqlite3 1206.3 0.262 368.446 370.786
>> >
>> > 200
>> > Sqlite3 + z3 1260.85 0.43 407.763 409.688
>>
>> >
>> > 199
>> > Ratio 104.52% 164.12% 110.67% 110.49%
>>
>> > Diff 1
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > Average 104.62% 102.07% 118.37% 109.86%
>>
>> >
>> >
>> >
>> >
>> >
>> > 2018-05-24 15:11 GMT+01:00 Mikhail Ramalho
>> > <mikhail.ramalho at gmail.com <mailto:mikhail.ramalho at gmail.com>>:
>> >
>> > Hi all,
>> >
>> > This is my first report to the community, comparing the
>> > results with and without the Z3 refutation when analyzing a
>> > number of projects.
>> >
>> > ~
>> >
>> > First of all, I'd like to thank Réka Kovács as the first
>> > version of the refutation using Z3 was created by her
>> > (https://reviews.llvm.org/D45517)! Thank you very much!
>> >
>> > After applying patch D45517, you can use the refutation check
>> > by using -analyzer-config crosscheck-with-z3=true. Obviously,
>> > you need a version of clang built with Z3.
>> >
>> > ~
>> >
>> > I'm currently analyzing 7 C projects (unfortunately, there's a
>> > bug preventing us from analyzing FFmpeg):
>> >
>> > 1. Tmux
>> > 2. Redis
>> > 3. OpenSSL
>> > 4. Twin
>> > 5. Git
>> > 6. Postgresql
>> > 7. Sqlite3
>> >
>> > In short, the refutation check slows down the verification by
>> > ~20%. It removed 6 FPs from Redis, 6 FPs from git and 1 FP
>> > from Sqlite3 (FP means false positive). Surprisingly enough,
>> > some analysis were faster with the crosscheck, but I'm not
>> > sure why (maybe ccache?).
>> >
>> > Attached is a spreadsheet (report1.ods) with some number
>> > (total time, average time per check, # of reported bugs) and a
>> > txt with all the raw data from the analysis (raw.txt). I'll
>> > add these data to google drive for the next report.
>> >
>> > In order to generate the raw data, you need to use a version
>> > of clang with assertions enabled, call scan-build.py with
>> > '-analyzer-config serialize-stats=true' and you need to apply
>> > patch https://reviews.llvm.org/D43134.
>> >
>> > Thank you very much,
>> >
>> >
>> > 2018-05-01 15:27 GMT+01:00 Mikhail Ramalho
>> > <mikhail.ramalho at gmail.com <mailto:mikhail.ramalho at gmail.com>>:
>> >
>> > Hello all,
>> >
>> > My proposal for GSoC 2018 [0] about Bug Validation in the
>> > Clang Static Analyzer using the Z3 SMT Solver was accepted.
>> >
>> > I'll work with George Karpenkov to improve the bug reports
>> > that the static analyzer produces by reducing the number
>> > of false bugs.
>> >
>> > Thank you,
>> >
>> > [0]
>> https://docs.google.com/document/d/1-zNSv0l4WyoxYpJUAw8LFnQq_TY4AGjIpPu1VPkmO-g
>> >
>> > --
>> >
>> > Mikhail Ramalho.
>> >
>> >
>> >
>> >
>> > --
>> >
>> > Mikhail Ramalho.
>> >
>> >
>> >
>> >
>> > --
>> >
>> > Mikhail Ramalho.
>> >
>> >
>> >
>> > --
>> >
>> > Mikhail Ramalho.
>> >
>> >
>> > _______________________________________________
>> > cfe-dev mailing list
>> > cfe-dev at lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>
>>
>
> --
>
> Mikhail Ramalho.
>
>
>
--
Mikhail Ramalho.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180627/455ab460/attachment.html>
More information about the cfe-dev
mailing list