[cfe-dev] [GSoC 2017] Improve loop execution model of the Clang Static Analyzer

Sat Mar 25 07:33:31 PDT 2017

Hello,

thank you for your replies!

I collected my ideas about it but at the end of the email I put my concerns
in perspective which dependency would be important and maybe it should be
done first. I would gladly take it for the summer since it was mentioned in
the GSoC project page earlier.

My initial thoughts in details about handling loops in a better way:
First, we have a control number N which determines how many steps to unroll
during the analysis of a loop. Instead of this, in my opinion it would be
better to have a number M which determines the maximum times a loop can
branch the ExplodedGraph. In this case simulating a simple (non-branching)
loop to the end would not be too complex. This means we could analyze the
following example without a problem:

vector<int> v;
>
> for(int i = 1; i<SIZE;++i)
>
> {
>
>                 v[i] = f(i);
>
> }
>

But we would not waste much time to analyze a complex loop for „nothing”:

for(auto it = begin();it!=end();it++ )

{
>
>                 if(it->complexfun1()) …
>                 if(it->complexfun2()) …
>                 else …
>
> }
>

And this could be *Milestone1*. (That is quite optional but I think this
would be a great improvement.)

The next step is the LoopWidening so whenever we stop the simulation -
since the loop would result in a too complex/huge ExplodedGraph - we would
continue the analysis and invalidate only the MemRegions of which binded
values can be changed in the loop. In order to avoid false positives, however,
an important restriction would be that whenever we would invalidate too
much MemRegions then it is just better to stop the analyzing process. It
can be tricky to say when exactly it is too much but determining it would
be part of the project.

So how the invalidation should work? I believe a great approch would be the
following: we would iterate over the statements of the loop's body and they
would be handled one by one. Implementing an ASTVisitor could solve this
and the statements which are not handled at the moment would result in the
„conservative analysis” so it would cut the processing of the given path.

This way it can be an extensible and incremental solution since new
callbacks can be implemented in an easy way to cover new statements. (It
could work like in the ASTImporter.)

*Milestone2 *could be implementing the ASTVisitor described above.

The next step could be implementing the following nodes: function calls
(CallExpr) of which parameters are const or passed by value, and *the
assignement operator* in some trivial cases (no pointer/reference/array on
the left side).
So whenever we stop the analysis of a loop which contains only those nodes,
we have the information which regions to invalidate. This way the analysis
can be continued safely.

For example:

int f(int);
> int h(int,int);
> int a,b,c = 0;
> for(int i =0;i<100;i++)
>
> {
>
>                 a = f(i);
>                 b = h(a,i);
>
> }
>
>
> int d = 12/c; // The analyzer should be able to find this and report the
> divByZero bug since the foor loop contains only those statements which are
> implemented.
>

However, in the next example:

int f(int);
> int h(int*,int);
> int a,b,c = 0;
> for(int i =0;i<100;i++)
>
> {
>
>                 a = f(i);
>                 b = h(&a,i);
>
> }
>
> int d = 12/c;
>

Here we have a function call which is not implemented (pointer parameter)
so we just end this path’s analysis and not spot the bug after the loop.
This would be *Milestone3*.

After these fundamentals I would investigate which statements are commonly
used in a loop, testing this feature on open source projects and I would
aim to implement these visitor callbacks which are important according to
these tests.

AAAAnd that is the point where I think we have a problem because usage of
pointers and function calls by reference is quite common in a loop. In the
naive and conservative way all of the *typeT *variable’s regions should be
invalidated whenever we visit an assignment of which left hand side
variable is *typeT*. *Moreover it could be casted to another type so maybe
it should result invalidating all MemRegions. That is why I think that a
points-to analysis would be critical in order to make this whole approach
work on analysis of real projects. Because of that I am considering to
apply for the project “Implement points-to analysis” which I have seen it
on http://llvm.org/OpenProjects.html 2 weeks ago but it mysteriously
disappeared.

My questions:
1. What do you think of this whole approach? (Please let me know if the
whole idea is broken because I missed something.)
2. Do you agree that arguing about pointer's possible values is important
for achieving this?
3. If that is the case could I still apply for that project or is it
removed because of good reasons?

Cheers,
Peter

2017-03-16 13:25 GMT+01:00 Sean Eveson <eveson.sean at gmail.com>:

> Hello Péter,
>
> Sounds good to me. I'd also be interested in knowing the specifics of what
> you are planning to do.
>
> Very happy to help at any point.
>
> Thanks,
>
> Sean Eveson
> SN Systems - Sony Interactive Entertainment Group
>
> On Thu, Mar 16, 2017 at 12:49 AM, Anna Zaks <ganna at apple.com> wrote:
>
>> Hi Péter,
>>
>> Better modeling of loops is definitely an interesting topic. My main
>> concern would be having a specific proposal on what improvements you could
>> complete in the GSoC timeframe. The next step here is to write down more
>> specifics about what you intend to work on. Please, let me, Devin, and
>> Artem know if you’d like to talk to us while figuring this out.
>>
>> Cheers!
>> Anna.
>>
>>
>> On Mar 13, 2017, at 9:43 AM, Péter Szécsi <peterszecsi95 at gmail.com>
>> wrote:
>>
>> Hello,
>>
>> I would love to work on a Clang Static Analyzer project during the summer
>> by joining GSoC.
>> Let me introduce myself:
>> I'm Peter Szecsi, a third-year BSc student at Eötvös Loránd University,
>> Budapest. It would be my first Summer of Code project but I have already
>> contributed to clang:
>> During the last summer I have uploaded 2 patches:
>> - An accepted and merged clang-tidy checker [1]
>> - An accepted Clang SA checker [2]
>> Since then I am working on cross translational unit analysis as a
>> university project (and because of that I`ve send some patches about the
>> ASTImporter [3]). Furthermore I participated in the preparations of a talk
>> that was accepted at the EuroLLVM conference. [4]
>> I found SA interesting since I like to work on algorithmic challenges and
>> enjoy participating in programming competitions and there is a lot of
>> algorithms in the Static Analyzer which could be optimized/made more
>> precise by heuristics.
>>
>> That is the reason why I would be most interested in project "Implement
>> generalized loop execution modeling" from the Clang SA Open Projects page
>> [5]. Hopefully it is a suitable project for GSoC and it is possible to
>> achieve useful improvements during this time frame.
>> I have read the discussion at the "loop widening" patch [6] and the most
>> important enhancements (and milestones in the project) would be
>> invalidating only the possible modified values. In addition to that I was
>> thinking to have some heuristics which would approximate if a loop is too
>> complex to even use this kind of strategy because it would still lead to
>> invalidate most of the values and result in a lot of false positives.
>> Another small heuristic could be: when we already reached the maximum loop
>> unroll number then in case we are sure that there is only a few loopsteps
>> ahead we could unroll it to the end since probably this algorithm will not
>> consume too much time/memory and the result state will not be much more
>> complex.
>>
>> What do you think about these ideas?
>>
>> (I have CC'd everyone from the "loop widening" patch discussion.)
>>
>> Best regards,
>> Peter
>>
>> [1] https://reviews.llvm.org/D22507
>> [2] https://reviews.llvm.org/D24246
>> [3] https://reviews.llvm.org/D29612, https://reviews.llvm.org/D30876,
>> https://reviews.llvm.org/D30877
>> [4] http://llvm.org/devmtg/2017-03//2017/02/20/accepted-sessions.html#7
>> [5] http://clang-analyzer.llvm.org/open_projects.html
>> [6] https://reviews.llvm.org/D12358
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170325/602b5ea4/attachment.html>