[cfe-dev] static analyzer status?

Fri Sep 14 18:03:57 PDT 2018

Forgot to answer about re-using taint across checkers. It's quite 
trivial because taint tracking is done directly through API of 
ProgramState, which is available everywhere. So you can just say 
"State->isTainted(Sym)" any time and that's it, it'll pick up info that 
was put there by the taint propagation checker.

There's actually a plan to remove this API from ProgramState and turn it 
into a usual program state trait, and then expose it through a header, 
so that not to clutter ProgramState API with special getters for all 
sorts of stuff you can put in there. I might look into this soon, but it 
shouldn't obstruct your progress.

On 9/14/18 5:17 PM, Artem Dergachev wrote:
>
> On 9/13/18 9:37 PM, Lou Wynn wrote:
>>
>> Thank you for the updates!
>>
>> I looked through the project report of using Z3 to remove false 
>> positives. It is very promising. It's already in Clang 7.
>>
>> But my problem is finding specific bugs that the static analyzer 
>> cannot find. For example, integer overflow which involves multiple 
>> symbols in an expression. I guess that the current static analyzer 
>> cannot handle this because Z3 only takes findings of the analyzer. 
>> The analyzer does not handle multiple symbols, so Z3 has no chance to 
>> see the entire expression even though Z3 can process multiple symbols.
>>
>
> There's one more trick that we didn't try yet: include the state that 
> shouldn't be feasible in the report.
>
> For instance, normally we report division by zero only when the 
> denominator *must* be zero on the current path. But what we can do is 
> emit report from the checker anyway when the denominator is not known 
> to be zero (but may potentially be zero), and only actually display 
> the report to the user if Z3 agrees that the non-zero state is in fact 
> infeasible (contains self-contradictory constraints, even if our 
> constraint manager doesn't realize it).
>
> That's one of the potential approaches to finding more bugs with the 
> help of Z3 refutation machinery.
>
> Also for integer overflows you may encounter a completely different 
> problem that is currently in a worse shape than constraint solving, 
> and it's integral cast representation. Static analyzer currently 
> models casts by ignoring them, so the solver doesn't ever get a hold 
> on this information. You'll need to lift this restriction, but it'll 
> immediately upset the existing solver and a lot of other entities in 
> the analyzer, so you'll have to make them prepared for seeing casted 
> symbols. This may involve implementing the trick i mentioned above in 
> all checkers, because otherwise many checkers will fail to find most 
> of their bugs.
>
>> I can rephrase this to taint propagation and integer expression by 
>> saying that an expression involving a tainted value is likely to 
>> cause integer overflow. What is the best way to implement this 
>> checker if I use this strategy? I've noticed that there is a taint 
>> propagation checker but haven't figured out how to use it in another 
>> checker. Is there any example code that uses it?
>>
>
> Now, the way we treat taint, we don't ever actually remove taint from 
> symbols, but instead we consult both taint information and normal 
> program state information before we emit the bug. For example, if the 
> denominator is tainted and was not checked to be non-zero on the 
> current path, we can report the bug without making sure that the 
> denominator must be zero on the current path, just knowing that it may 
> be zero is enough. By finding such path we already know that the 
> attacker can forge the denominator to be zero and bypass all checks.
>
> In other words, taint problems are "per-path" "may" problems, while 
> normal problems are "per-path" "must" problems. And for that purpose 
> the existing refutation scheme is enough to solve all the problems. 
> You still have problems with casts though.
>
>> Another thought of combining the static analysis and Z3 is developing 
>> another static analyzer which doesn't use symbolic execution, just 
>> abstract interpretation. It'll be more scalable but probably cause 
>> many false positives, then use the current Z3 integration to remove 
>> false positives. If I'll go this route, where should I start with?
>>
>
> In its current shape the static analyzer is, like, 20 man-years of 
> work. Part of that is because it's source-based; if you try to 
> analyze, say, LLVM IR instead of Clang AST, you might reduce the 
> amount of work you need to do (but it'll be trickier to explain the 
> bugs you find to the user in terms of the original source code), but 
> that's still a huge investment.
>
> You may also try to re-use transfer functions from our static analyzer 
> in your analyzer (i.e., only augment the static analyzer with a "state 
> merge" operation). This might work, and that's something we consider 
> trying some day, but there are a lot of known and unknown unknowns 
> here, so i wouldn't outright recommend rushing in this direction either.
>
>> Thanks,
>> Lou
>> On 09/13/2018 05:31 PM, Artem Dergachev wrote:
>>> There were slight improvements, but our ad-hoc constraint solver 
>>> quickly becomes unmaintainable (algorithmic complexity exponentially 
>>> explodes) while we try to squeeze more features into it.
>>>
>>> There was also an attempt to use Z3, i.e. a full-featured theorem 
>>> prover, instead of our ad-hoc solver. Z3 supports everything, but 
>>> makes the analyzer significantly slower (imagine 10x-20x). This is 
>>> very experimental and was never supported, because it seems to be a 
>>> dead end due to a huge performance hit.
>>>
>>> Finally, last GSoC there was an attempt to use both constraint 
>>> solvers: use ad-hoc solver during analysis (to quickly eliminate 
>>> infeasible paths and report bugs), and then cross-check with Z3 only 
>>> when the actual bug report is emitted. This fixes the problem when 
>>> it comes to eliminating false positives, but it doesn't allow the 
>>> analyzer to find new classes of bugs. This completely avoids any 
>>> performance problems and looks very promising, and while this is 
>>> still not officially supported, we'll probably be looking more into 
>>> this to see if we'll be able to ship this somehow, probably with a 
>>> different SMT or SAT solver if we run into problems with Z3. See 
>>> http://lists.llvm.org/pipermail/cfe-dev/2018-August/058912.html for 
>>> more details.
>>>
>>> So if your main problem is false positives there's much more hope to 
>>> see a solution available soon-ish than if your main problem is being 
>>> able to find these specific bugs.
>>>
>>> On 9/13/18 3:00 PM, Lou Wynn via cfe-dev wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have watched the Building a Checker in 24 Hours slide 
>>>> (http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf). It 
>>>> mentioned that one limitation of the Constraint Solver is that it 
>>>> can't handle multiple symbols (page 83). The talk was given in 
>>>> 2012. I'm wondering if this limitation has been removed now in 2018.
>>>>
>>>> -- 
>>>> Thanks,
>>>> Lou
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180914/f260a4a3/attachment.html>