[cfe-dev] static analyzer status?
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Fri Sep 14 18:03:57 PDT 2018
Forgot to answer about re-using taint across checkers. It's quite
trivial because taint tracking is done directly through API of
ProgramState, which is available everywhere. So you can just say
"State->isTainted(Sym)" any time and that's it, it'll pick up info that
was put there by the taint propagation checker.
There's actually a plan to remove this API from ProgramState and turn it
into a usual program state trait, and then expose it through a header,
so that not to clutter ProgramState API with special getters for all
sorts of stuff you can put in there. I might look into this soon, but it
shouldn't obstruct your progress.
On 9/14/18 5:17 PM, Artem Dergachev wrote:
>
> On 9/13/18 9:37 PM, Lou Wynn wrote:
>>
>> Thank you for the updates!
>>
>> I looked through the project report of using Z3 to remove false
>> positives. It is very promising. It's already in Clang 7.
>>
>> But my problem is finding specific bugs that the static analyzer
>> cannot find. For example, integer overflow which involves multiple
>> symbols in an expression. I guess that the current static analyzer
>> cannot handle this because Z3 only takes findings of the analyzer.
>> The analyzer does not handle multiple symbols, so Z3 has no chance to
>> see the entire expression even though Z3 can process multiple symbols.
>>
>
> There's one more trick that we didn't try yet: include the state that
> shouldn't be feasible in the report.
>
> For instance, normally we report division by zero only when the
> denominator *must* be zero on the current path. But what we can do is
> emit report from the checker anyway when the denominator is not known
> to be zero (but may potentially be zero), and only actually display
> the report to the user if Z3 agrees that the non-zero state is in fact
> infeasible (contains self-contradictory constraints, even if our
> constraint manager doesn't realize it).
>
> That's one of the potential approaches to finding more bugs with the
> help of Z3 refutation machinery.
>
> Also for integer overflows you may encounter a completely different
> problem that is currently in a worse shape than constraint solving,
> and it's integral cast representation. Static analyzer currently
> models casts by ignoring them, so the solver doesn't ever get a hold
> on this information. You'll need to lift this restriction, but it'll
> immediately upset the existing solver and a lot of other entities in
> the analyzer, so you'll have to make them prepared for seeing casted
> symbols. This may involve implementing the trick i mentioned above in
> all checkers, because otherwise many checkers will fail to find most
> of their bugs.
>
>> I can rephrase this to taint propagation and integer expression by
>> saying that an expression involving a tainted value is likely to
>> cause integer overflow. What is the best way to implement this
>> checker if I use this strategy? I've noticed that there is a taint
>> propagation checker but haven't figured out how to use it in another
>> checker. Is there any example code that uses it?
>>
>
> Now, the way we treat taint, we don't ever actually remove taint from
> symbols, but instead we consult both taint information and normal
> program state information before we emit the bug. For example, if the
> denominator is tainted and was not checked to be non-zero on the
> current path, we can report the bug without making sure that the
> denominator must be zero on the current path, just knowing that it may
> be zero is enough. By finding such path we already know that the
> attacker can forge the denominator to be zero and bypass all checks.
>
> In other words, taint problems are "per-path" "may" problems, while
> normal problems are "per-path" "must" problems. And for that purpose
> the existing refutation scheme is enough to solve all the problems.
> You still have problems with casts though.
>
>> Another thought of combining the static analysis and Z3 is developing
>> another static analyzer which doesn't use symbolic execution, just
>> abstract interpretation. It'll be more scalable but probably cause
>> many false positives, then use the current Z3 integration to remove
>> false positives. If I'll go this route, where should I start with?
>>
>
> In its current shape the static analyzer is, like, 20 man-years of
> work. Part of that is because it's source-based; if you try to
> analyze, say, LLVM IR instead of Clang AST, you might reduce the
> amount of work you need to do (but it'll be trickier to explain the
> bugs you find to the user in terms of the original source code), but
> that's still a huge investment.
>
> You may also try to re-use transfer functions from our static analyzer
> in your analyzer (i.e., only augment the static analyzer with a "state
> merge" operation). This might work, and that's something we consider
> trying some day, but there are a lot of known and unknown unknowns
> here, so i wouldn't outright recommend rushing in this direction either.
>
>> Thanks,
>> Lou
>> On 09/13/2018 05:31 PM, Artem Dergachev wrote:
>>> There were slight improvements, but our ad-hoc constraint solver
>>> quickly becomes unmaintainable (algorithmic complexity exponentially
>>> explodes) while we try to squeeze more features into it.
>>>
>>> There was also an attempt to use Z3, i.e. a full-featured theorem
>>> prover, instead of our ad-hoc solver. Z3 supports everything, but
>>> makes the analyzer significantly slower (imagine 10x-20x). This is
>>> very experimental and was never supported, because it seems to be a
>>> dead end due to a huge performance hit.
>>>
>>> Finally, last GSoC there was an attempt to use both constraint
>>> solvers: use ad-hoc solver during analysis (to quickly eliminate
>>> infeasible paths and report bugs), and then cross-check with Z3 only
>>> when the actual bug report is emitted. This fixes the problem when
>>> it comes to eliminating false positives, but it doesn't allow the
>>> analyzer to find new classes of bugs. This completely avoids any
>>> performance problems and looks very promising, and while this is
>>> still not officially supported, we'll probably be looking more into
>>> this to see if we'll be able to ship this somehow, probably with a
>>> different SMT or SAT solver if we run into problems with Z3. See
>>> http://lists.llvm.org/pipermail/cfe-dev/2018-August/058912.html for
>>> more details.
>>>
>>> So if your main problem is false positives there's much more hope to
>>> see a solution available soon-ish than if your main problem is being
>>> able to find these specific bugs.
>>>
>>> On 9/13/18 3:00 PM, Lou Wynn via cfe-dev wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have watched the Building a Checker in 24 Hours slide
>>>> (http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf). It
>>>> mentioned that one limitation of the Constraint Solver is that it
>>>> can't handle multiple symbols (page 83). The talk was given in
>>>> 2012. I'm wondering if this limitation has been removed now in 2018.
>>>>
>>>> --
>>>> Thanks,
>>>> Lou
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180914/f260a4a3/attachment.html>
More information about the cfe-dev
mailing list