[cfe-dev] static analyzer status?

Fri Sep 14 17:17:08 PDT 2018

On 9/13/18 9:37 PM, Lou Wynn wrote:
>
> Thank you for the updates!
>
> I looked through the project report of using Z3 to remove false 
> positives. It is very promising. It's already in Clang 7.
>
> But my problem is finding specific bugs that the static analyzer 
> cannot find. For example, integer overflow which involves multiple 
> symbols in an expression. I guess that the current static analyzer 
> cannot handle this because Z3 only takes findings of the analyzer. The 
> analyzer does not handle multiple symbols, so Z3 has no chance to see 
> the entire expression even though Z3 can process multiple symbols.
>

There's one more trick that we didn't try yet: include the state that 
shouldn't be feasible in the report.

For instance, normally we report division by zero only when the 
denominator *must* be zero on the current path. But what we can do is 
emit report from the checker anyway when the denominator is not known to 
be zero (but may potentially be zero), and only actually display the 
report to the user if Z3 agrees that the non-zero state is in fact 
infeasible (contains self-contradictory constraints, even if our 
constraint manager doesn't realize it).

That's one of the potential approaches to finding more bugs with the 
help of Z3 refutation machinery.

Also for integer overflows you may encounter a completely different 
problem that is currently in a worse shape than constraint solving, and 
it's integral cast representation. Static analyzer currently models 
casts by ignoring them, so the solver doesn't ever get a hold on this 
information. You'll need to lift this restriction, but it'll immediately 
upset the existing solver and a lot of other entities in the analyzer, 
so you'll have to make them prepared for seeing casted symbols. This may 
involve implementing the trick i mentioned above in all checkers, 
because otherwise many checkers will fail to find most of their bugs.

> I can rephrase this to taint propagation and integer expression by 
> saying that an expression involving a tainted value is likely to cause 
> integer overflow. What is the best way to implement this checker if I 
> use this strategy? I've noticed that there is a taint propagation 
> checker but haven't figured out how to use it in another checker. Is 
> there any example code that uses it?
>

Now, the way we treat taint, we don't ever actually remove taint from 
symbols, but instead we consult both taint information and normal 
program state information before we emit the bug. For example, if the 
denominator is tainted and was not checked to be non-zero on the current 
path, we can report the bug without making sure that the denominator 
must be zero on the current path, just knowing that it may be zero is 
enough. By finding such path we already know that the attacker can forge 
the denominator to be zero and bypass all checks.

In other words, taint problems are "per-path" "may" problems, while 
normal problems are "per-path" "must" problems. And for that purpose the 
existing refutation scheme is enough to solve all the problems. You 
still have problems with casts though.

> Another thought of combining the static analysis and Z3 is developing 
> another static analyzer which doesn't use symbolic execution, just 
> abstract interpretation. It'll be more scalable but probably cause 
> many false positives, then use the current Z3 integration to remove 
> false positives. If I'll go this route, where should I start with?
>

In its current shape the static analyzer is, like, 20 man-years of work. 
Part of that is because it's source-based; if you try to analyze, say, 
LLVM IR instead of Clang AST, you might reduce the amount of work you 
need to do (but it'll be trickier to explain the bugs you find to the 
user in terms of the original source code), but that's still a huge 
investment.

You may also try to re-use transfer functions from our static analyzer 
in your analyzer (i.e., only augment the static analyzer with a "state 
merge" operation). This might work, and that's something we consider 
trying some day, but there are a lot of known and unknown unknowns here, 
so i wouldn't outright recommend rushing in this direction either.

> Thanks,
> Lou
> On 09/13/2018 05:31 PM, Artem Dergachev wrote:
>> There were slight improvements, but our ad-hoc constraint solver 
>> quickly becomes unmaintainable (algorithmic complexity exponentially 
>> explodes) while we try to squeeze more features into it.
>>
>> There was also an attempt to use Z3, i.e. a full-featured theorem 
>> prover, instead of our ad-hoc solver. Z3 supports everything, but 
>> makes the analyzer significantly slower (imagine 10x-20x). This is 
>> very experimental and was never supported, because it seems to be a 
>> dead end due to a huge performance hit.
>>
>> Finally, last GSoC there was an attempt to use both constraint 
>> solvers: use ad-hoc solver during analysis (to quickly eliminate 
>> infeasible paths and report bugs), and then cross-check with Z3 only 
>> when the actual bug report is emitted. This fixes the problem when it 
>> comes to eliminating false positives, but it doesn't allow the 
>> analyzer to find new classes of bugs. This completely avoids any 
>> performance problems and looks very promising, and while this is 
>> still not officially supported, we'll probably be looking more into 
>> this to see if we'll be able to ship this somehow, probably with a 
>> different SMT or SAT solver if we run into problems with Z3. See 
>> http://lists.llvm.org/pipermail/cfe-dev/2018-August/058912.html for 
>> more details.
>>
>> So if your main problem is false positives there's much more hope to 
>> see a solution available soon-ish than if your main problem is being 
>> able to find these specific bugs.
>>
>> On 9/13/18 3:00 PM, Lou Wynn via cfe-dev wrote:
>>>
>>> Hi,
>>>
>>> I have watched the Building a Checker in 24 Hours slide 
>>> (http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf). It 
>>> mentioned that one limitation of the Constraint Solver is that it 
>>> can't handle multiple symbols (page 83). The talk was given in 2012. 
>>> I'm wondering if this limitation has been removed now in 2018.
>>>
>>> -- 
>>> Thanks,
>>> Lou
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180914/dc257a42/attachment.html>