[cfe-dev] [StaticAnalyzer] False positive with loop handling

Tue Oct 25 12:37:12 PDT 2016

Let me add a concrete data point regarding performance of SMT solvers for 
static analysis. See slide 45 here: 
http://web.ist.utl.pt/nuno.lopes/pres/smt_school16.pptx
Basically it shows that SMT solvers are many orders of magnitude slower than 
specialized abstract domains. Even simplex implemented in Prolog (CLPQ) is 
faster than Z3 for simple queries.
STP is faster for smaller & simpler queries than Z3, but you'll still see 
this big gap.
LLVM itself has multiple abstract domains (e.g., ConstantRange) that can be 
used by clang static analyzer. That would probably require implementing a 
combined constraint solver in clang, so that it can use a pipeline of 
solvers.

Nuno

-----Original Message-----
From: Anna Zaks via cfe-dev
Sent: Tuesday, October 25, 2016 2:44 AM
Subject: Re: [cfe-dev] [StaticAnalyzer] False positive with loop handling

> On Oct 21, 2016, at 2:29 PM, Dominic Chen via cfe-dev 
> <cfe-dev at lists.llvm.org> wrote:
>
> Thanks for writing the workbook! I wish I had seen it earlier. Do you
> know why the STP solver backend was never merged in? Looking through the
> code, it seems that the status was fairly experimental?

The STP backend was experimental quality but could serve as a good starting 
point to explore adding more powerful solvers. (If choosing now, I’d prefer 
CVC4.) The main reason it was not merged is that the community asked not to 
introduce a dependency on the STP library in the build system and the build 
system integration has not been finished. Feel free to resurrect the patches 
if you are interested. Someone tried to do that a couple of months ago…

As Artem has mentioned, just replacing the range-based solver with SMT will 
make the analyzer much slower. One way of addressing the issue would be to 
run with the faster solver and use SMT on the second slower pass to refute 
false positives. This would be a much more challenging project than what the 
STP patch was trying to achieve.

Anna.
>
> Dominic
>
> On 10/21/2016 5:10 AM, Artem Dergachev wrote:
>> Yep, we're using an extremely simple solver; a powerful solver should
>> significantly increase precision, and probably significantly hurt
>> performance. The analyzer often explores thousands of paths through
>> the same point of the program, and many bugs are found in thousands of
>> copies (on different paths on which they occur, and they get
>> deduplicated by end-of-path point), which would put the solver under
>> heavy stress.
>>
>> There's an experimental work described in this thread (i didn't watch
>> this closely):
>> http://clang-developers.42468.n3.nabble.com/Static-Analyzer-symbolic-expressions-in-a-program-td4030798.html
>>
>> As for documentation, apart from the official website docs and
>> must-reads in /docs/analyzer/, there's my fanboy's workbook at
>> https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf
>>
>> On 10/20/16 11:36 PM, Dominic Chen wrote:
>>> Ah, so it's an issue with the constraint manager. I couldn't find much
>>> documentation on the internals of the static analyzer, so it wasn't
>>> clear how the components interacted with one another.
>>>
>>> More generally, why does the static analyzer implement its own
>>> constraint solver? I noticed that there also used to be a
>>> BasicConstraintManager, but was removed quite a while back for not being
>>> useful in practice. Would replacing the internal constraint solver with
>>> a wrapper around e.g. CVC4 or Z3 significantly increase the precision of
>>> the analysis?
>>>
>>> Thanks,
>>>
>>> Dominic
>>>
>>> On 10/20/2016 6:42 AM, Artem Dergachev wrote:
>>>> A false positive due to a known issue.
>>>>
>>>> This happens because "nc = na + nb - 1" is too complicated for our
>>>> extremely simple and fast solver (RangeConstraintManager) to handle.
>>>> Because of that, we need to *conjure* a completely new symbol for
>>>> "nc", and then it assumes that "nc" may be equal to 2 even when "na"
>>>> and "nb" are equal to 1. Essentially, we can construct a correct
>>>> SymSymExpr for "nc", but we'd need to do some bigger work to let the
>>>> solver solve the equations.
>>>>
>>>> I'm all for producing more SymSymExpr objects and simplifying them
>>>> (through an alpha-renaming-like procedure) whenever their components
>>>> get simplified out, just need time to implement this.
>>>>
>>>> On 10/19/16 10:29 PM, Dominic Chen via cfe-dev wrote:
>>>>> Hello,
>>>>>
>>>>> I've been trying to track down a false positive with symbolic
>>>>> execution of an array assignment within a loop, but I'm having some
>>>>> trouble figuring out the root cause. In the attached loop.c file, the
>>>>> analyzer identifies the LHS of the assignment equal operation on line
>>>>> 11 to be a garbage value, but this is actually initialized on line 6.
>>>>>
>>>>> Initially, I thought that this was an issue with dead symbols being
>>>>> incorrectly purged, because the trimmed ExplodedGraph shows that the
>>>>> analyzer identifies the array "c" to be an ElementRegion "c:
>>>>> &element{c, 0 S64b,int}" in an earlier node before
>>>>> PreStmtPurgeDeadSymbols. But this is unchanged even after eliminating
>>>>> calls to collectNode(), so I'm confused if the graph is representing
>>>>> the actual symbolic execution exploration state, or just simply the
>>>>> ordering of explored states from the worklist?
>>>>>
>>>>> The other place that I've been looking at is the branch condition
>>>>> handling inside ExprEngine::processBranch(). When the symbolic
>>>>> execution engine assumes either of the false/true branches, does that
>>>>> also entail the body of the loop?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Dominic