[cfe-dev] [GSoC 2018] Using the Z3 SMT Solver to Validate Bugs Reported by the Clang Static Analyzer
Mikhail Ramalho via cfe-dev
cfe-dev at lists.llvm.org
Tue Jun 26 11:37:13 PDT 2018
Hi,
Another solution, since you're using Z3 already, is to implement runtime
> support for querying the underlying ConstraintManager's about the types
> of constraints that it supports (e.g. canReasonAboutFoo() ). Then, you
> can use this to generate the appropriate SVal's at runtime, which could
> include support for symbolic extension/truncation, remainder, shifts, etc.
>
>
I see, but for the project I proposed to check why the constraints were
being dropped, regardless of having the SMT solver enabled or not.
~
Anyway, I removed 4 lines of code and it seems to be working:
diff --git a/lib/StaticAnalyzer/Core/SValBuilder.cpp
b/lib/StaticAnalyzer/Core/SValBuilder.cpp
index 137e9a7..39051a8 100644
--- a/lib/StaticAnalyzer/Core/SValBuilder.cpp
+++ b/lib/StaticAnalyzer/Core/SValBuilder.cpp
@@ -377,9 +377,6 @@ SVal SValBuilder::makeSymExprValNN(ProgramStateRef
State,
BinaryOperator::Opcode Op,
NonLoc LHS, NonLoc RHS,
QualType ResultTy) {
- if (!State->isTainted(RHS) && !State->isTainted(LHS))
- return UnknownVal();
-
const SymExpr *symLHS = LHS.getAsSymExpr();
const SymExpr *symRHS = RHS.getAsSymExpr();
// TODO: When the Max Complexity is reached, we should conjure a symbol
@@ -406,9 +403,6 @@ SVal SValBuilder::evalBinOp(ProgramStateRef state,
BinaryOperator::Opcode op,
if (lhs.isUndef() || rhs.isUndef())
return UndefinedVal();
- if (lhs.isUnknown() || rhs.isUnknown())
- return UnknownVal();
-
if (lhs.getAs<nonloc::LazyCompoundVal>() ||
rhs.getAs<nonloc::LazyCompoundVal>()) {
return UnknownVal();
The static analyzer now generates the following constraints for each path:
0U % (reg_$0<unsigned int width>): { [1, 4294967295] }
and
0U % (reg_$0<unsigned int width>): { [0, 0] }
And Z3 is able to remove the false positive.
Now I need to understand the impact of the changes, specially the taint
check removal.
Thank you,
>
> Dominic
>
> On 6/26/2018 1:23 PM, Mikhail Ramalho via cfe-dev wrote:
> > Hi all,
> >
> > Hi guys,
> >
> > I'm investigating the constraints being dropped, here's what I got so
> far.
> >
> > First of all, I'm using the following program:
> >
> > void foo(unsigned width)
> > {
> > int base;
> > int i = 0;
> >
> > if (!(i % width))
> > base = 1;
> >
> > if(base == 1)
> > abort();
> > }
> >
> > I started by looking at ExprEngine::processBranch, where
> >
> > SVal X = PrevState->getSVal(Condition, PredI->getLocationContext());
> >
> > returns Unknown for the remainder condition. The getSVal ends up
> > looking in a map for the value result, so I found that bindExpr fills
> > that map.
> >
> > Going back a bit, into ExprEngine::Visit, when evaluating a
> > Stmt::BinaryOperatorClass (regardless of the eagerly assume flag)
> > ExprEngine::VisitBinaryOperator is called, which eventually calls
> > evalBinOp and, since it doesn't understand remainder, it returns
> > unknown and BindExpr is never called.
> >
> > Back to ExprEngine::processBranch, when the symbol is unknown the
> > following piece of code is called:
> >
> > // If the condition is still unknown, give up.
> > if (X.isUnknownOrUndef()) {
> > builder.generateNode(PrevState, true, PredI);
> > builder.generateNode(PrevState, false, PredI);
> > continue;
> > }
> >
> > and the condition is simply ignored. When the result is defined, it
> > creates two nodes assuming the constraints.
> >
> > ~
> >
> > My idea is when the SVal is undef or unknown, instead of generating
> > two nodes with no knowledge about the constraints, we could create
> > two ranged constraints, like:
> >
> > i % width: [0,0]
> >
> > and
> >
> > i % width: [1,1]
> >
> > for each path. That way we can keep the constraints with reasonable
> > values.
> >
> > What do you think? It feels like this will break stuff further down
> > the line, but I'll know for sure if I implement the change.
> >
> > ~
> >
> > Artem's response:
> >
> > Yep, i strongly believe that any UnknownVal should be treated as a
> > synonym of "not implemented".
> >
> > In *this* example you might also notice that there's no symbol for 'i',
> > but it's a concrete integer 0 instead. So you can evaluate the whole
> > remainder to 0, unless 'width' is also equal to 0 (in which case the
> > answer would be UndefinedVal). Also note that when enabled, DivZero
> > checker will refute the 'width == 0' branch.
> >
> > In the general case you have no choice but to produce an IntSymExpr (if
> > 'i' is a concrete integer other than 0) or a SymSymExpr (if 'i' is a
> > symbol).
> >
> >
> > Em qui, 24 de mai de 2018 às 18:15, Mikhail Ramalho
> > <mikhail.ramalho at gmail.com <mailto:mikhail.ramalho at gmail.com>> escreveu:
> >
> >
> > Total Time time.analyzer.time.sys (s) (mean)
> > time.analyzer.time.user (s) (mean) time.analyzer.time.wall (s)
> > (mean)
> >
> > Reported bugs
> > Tmux 99.2 0.076 27.253 7.656
> >
> > 32
> > Tmux + z3 152.88 0.074 56.251 11.505
> >
> > 32
> > Ratio 154.11% 97.37% 206.40% 150.27%
> > Diff 0
> >
> >
> >
> >
> >
> >
> >
> >
> > Redis 173.69 0.057 7.083 7.271
> >
> > 146
> > Redis + z3 193.43 0.057 7.621 7.728
> >
> > 140
> > Ratio 111.37% 100.00% 107.60% 106.29%
>
> > Diff 6
> >
> >
> >
> >
> >
> >
> >
> >
> > OpenSSL 264.93 0.042 3.31 3.412
> >
> > 204
> > OpenSSL + z3 213.53 0.035 3.099 3.152
> >
> > 204
> > Ratio 80.60% 83.33% 93.63% 92.38%
> > Diff 0
> >
> >
> >
> >
> >
> >
> >
> >
> > Twin 143.17 0.067 6.593 6.696
> >
> > 138
> > Twin + z3 133.83 0.06 6.79 6.882
> >
> > 138
> > Ratio 93.48% 89.55% 102.99% 102.78%
> > Diff 0
> >
> >
> >
> >
> >
> >
> >
> >
> > Git + z3 333.9 0.075 8.52 8.67
> >
> > 96
> > Git + z3 289.59 0.062 7.924 8.023
> >
> > 90
> > Ratio 86.73% 82.67% 93.00% 92.54%
> > Diff 6
> >
> >
> >
> >
> >
> >
> >
> >
> > Postgresql 889.35 0.079 8.482 8.631
> >
> > 676
> > Postgresql + z3 902.86 0.077 9.694 9.863
> >
> > 676
> > Ratio 101.52% 97.47% 114.29% 114.27%
> > Diff 0
> >
> >
> >
> >
> >
> >
> >
> >
> > Sqlite3 1206.3 0.262 368.446 370.786
> >
> > 200
> > Sqlite3 + z3 1260.85 0.43 407.763 409.688
>
> >
> > 199
> > Ratio 104.52% 164.12% 110.67% 110.49%
>
> > Diff 1
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Average 104.62% 102.07% 118.37% 109.86%
>
> >
> >
> >
> >
> >
> > 2018-05-24 15:11 GMT+01:00 Mikhail Ramalho
> > <mikhail.ramalho at gmail.com <mailto:mikhail.ramalho at gmail.com>>:
> >
> > Hi all,
> >
> > This is my first report to the community, comparing the
> > results with and without the Z3 refutation when analyzing a
> > number of projects.
> >
> > ~
> >
> > First of all, I'd like to thank Réka Kovács as the first
> > version of the refutation using Z3 was created by her
> > (https://reviews.llvm.org/D45517)! Thank you very much!
> >
> > After applying patch D45517, you can use the refutation check
> > by using -analyzer-config crosscheck-with-z3=true. Obviously,
> > you need a version of clang built with Z3.
> >
> > ~
> >
> > I'm currently analyzing 7 C projects (unfortunately, there's a
> > bug preventing us from analyzing FFmpeg):
> >
> > 1. Tmux
> > 2. Redis
> > 3. OpenSSL
> > 4. Twin
> > 5. Git
> > 6. Postgresql
> > 7. Sqlite3
> >
> > In short, the refutation check slows down the verification by
> > ~20%. It removed 6 FPs from Redis, 6 FPs from git and 1 FP
> > from Sqlite3 (FP means false positive). Surprisingly enough,
> > some analysis were faster with the crosscheck, but I'm not
> > sure why (maybe ccache?).
> >
> > Attached is a spreadsheet (report1.ods) with some number
> > (total time, average time per check, # of reported bugs) and a
> > txt with all the raw data from the analysis (raw.txt). I'll
> > add these data to google drive for the next report.
> >
> > In order to generate the raw data, you need to use a version
> > of clang with assertions enabled, call scan-build.py with
> > '-analyzer-config serialize-stats=true' and you need to apply
> > patch https://reviews.llvm.org/D43134.
> >
> > Thank you very much,
> >
> >
> > 2018-05-01 15:27 GMT+01:00 Mikhail Ramalho
> > <mikhail.ramalho at gmail.com <mailto:mikhail.ramalho at gmail.com>>:
> >
> > Hello all,
> >
> > My proposal for GSoC 2018 [0] about Bug Validation in the
> > Clang Static Analyzer using the Z3 SMT Solver was accepted.
> >
> > I'll work with George Karpenkov to improve the bug reports
> > that the static analyzer produces by reducing the number
> > of false bugs.
> >
> > Thank you,
> >
> > [0]
> https://docs.google.com/document/d/1-zNSv0l4WyoxYpJUAw8LFnQq_TY4AGjIpPu1VPkmO-g
> >
> > --
> >
> > Mikhail Ramalho.
> >
> >
> >
> >
> > --
> >
> > Mikhail Ramalho.
> >
> >
> >
> >
> > --
> >
> > Mikhail Ramalho.
> >
> >
> >
> > --
> >
> > Mikhail Ramalho.
> >
> >
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>
--
Mikhail Ramalho.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180626/a6b040b4/attachment.html>
More information about the cfe-dev
mailing list