<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Mar 25, 2017, at 7:33 AM, Péter Szécsi <<a href="mailto:peterszecsi95@gmail.com" class="">peterszecsi95@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><p class="MsoNormal"><span lang="EN-US" class="">Hello,<br class=""></span></p><p class="MsoNormal"><span lang="EN-US" class="">
<br class="">
thank you for your replies!<br class=""></span></p><p class="MsoNormal"><span lang="EN-US" class="">I
collected my ideas about it but at the end of the email I put my
concerns in perspective which dependency would be important and maybe it should be done first. I would gladly
take it for the summer since it was mentioned in the GSoC project page earlier.<br class=""></span></p><p class="MsoNormal"><span lang="EN-US" class=""><br class=""></span></p><p class="MsoNormal"><span lang="EN-US" class="">My initial thoughts
in details about handling loops in a better way:<br class=""></span></p></div></div></blockquote>Thanks for the writeup!</div><div>It would be very useful if you could prepend each idea with motivation explaining which problem you are trying to address.</div><div><br class=""></div><div>One of the issues we’d like to solve with generalized modeling of loops is that the analyzer looses coverage since it only simulates the first N iterations of the loop. This means that it does not cover paths that represent all other iterations such as the last iteration of the loop. One problem is that the loops that are known by the analyzer to have a constant bound will end exploration of a path. Note that the main reason to have a bound in the first place is that the analyzer usually does not know how many iterations of the loop to “execute” and will keep going forever. It will create all these branches:</div><div> - simulate that exactly one iteration is executed</div><div><div> - simulate that exactly two iterations are executed</div><div><div> - simulate that exactly three iterations are executed</div><div> -….</div><div><blockquote type="cite" class=""><div dir="ltr" class=""></div></blockquote></div></div><div><blockquote type="cite" class=""><div dir="ltr" class=""></div></blockquote></div></div><div><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><p class="MsoNormal"><span lang="EN-US" class=""> First, we have a control number N which determines how many steps to unroll during the analysis of a loop. Instead of this, in my
opinion it would be better to have a number M which determines the maximum times a loop can branch the ExplodedGraph.</span></p></div></div></blockquote>Could you explain how this number will be computed and give examples on what it will be at least for the cases below? And more importantly, how will this number be used? (I don’t think the intent here is that M will replace N in all cases, correct?)<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><p class="MsoNormal"><span lang="EN-US" class=""> In this case simulating a simple (non-branching) loop to the end would not be too complex. This means we could analyze the following example without a problem:<br class=""></span></p><p class="MsoNormal"><span lang="EN-US" class=""><br class=""></span></p></div></div></blockquote>Are you saying M=0 for the case below? Is SIZE assumed to be a constant?<br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class="">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><p class="MsoNormal"><span lang="EN-US" class="">vector<int>
v;</span></p><p class="MsoNormal"><span lang="EN-US" class="">for(int i =
1; i<SIZE;++i)</span></p><p class="MsoNormal"><span lang="EN-US" class="">{</span></p><p class="MsoNormal"><span lang="EN-US" class=""><span class=""> </span>v[i] = f(i);</span></p><p class="MsoNormal"><span lang="EN-US" class="">}</span> </p></blockquote></div></div></blockquote><blockquote type="cite" class=""><div dir="ltr" class=""><p class="MsoNormal"><span lang="EN-US" class="">But we
would not waste much time to analyze a complex loop for „nothing”:<br class=""></span></p><p class="MsoNormal"><span lang="EN-US" class=""><br class=""></span></p><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span lang="EN-US" class="">
for(auto it = begin();it!=end();it++</span> <span lang="EN-US" class="">)</span> </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span lang="EN-US" class="">{</span><p class="MsoNormal"><span lang="EN-US" class=""><span class=""> </span>if(it->complexfun1()) …<br class="">
<span class=""> </span>if(it->complexfun2()) …<br class="">
<span class=""> </span>else …</span></p><p class="MsoNormal"><span lang="EN-US" class="">}</span></p></blockquote><div class=""> </div><p class="MsoNormal"><span lang="EN-US" class="">And this
could be <b class="">Milestone1</b>. (That is quite optional but I think this would be a great
improvement.)</span></p><p class="MsoNormal"><span lang="EN-US" class=""><br class=""></span></p><p class="MsoNormal"><span lang="EN-US" class="">The next
step is the LoopWidening so whenever we stop the simulation - since the loop would
result in a too complex/huge ExplodedGraph - we would continue the analysis
and invalidate only the MemRegions of which binded values can be changed in the loop.
I</span><span lang="EN-US" class=""><span lang="EN-US" class=""><span lang="EN-US" class=""></span>n order to avoid false positives<span class="">, </span>h</span>owever, an important restriction would be </span><span lang="EN-US" class=""><span lang="EN-US" class="">that whenever we would invalidate too much MemRegions then it is just better to
stop the analyzing process</span>. It can be tricky to say when exactly it is too much but
determining it would be part of the project.</span></p><p class="MsoNormal"><span lang="EN-US" class=""><br class=""></span></p><p class="MsoNormal"><span lang="EN-US" class="">
So how the invalidation should work? I believe a great approch would be the following: we would iterate over
the statements of the loop's body and they would be handled one by one. Implementing
an ASTVisitor could solve this and the statements which are not handled at the
moment would result in the „conservative analysis” so it would cut the processing of the
given path.</span></p><p class="MsoNormal"><span lang="EN-US" class="">This way it
can be an extensible and incremental solution since new callbacks can be implemented in an
easy way to cover new statements. (It could work like in the ASTImporter.)</span></p><p class="MsoNormal"><span lang="EN-US" class=""><b class="">Milestone2
</b>could be implementing the ASTVisitor described above.</span></p><p class="MsoNormal"><span lang="EN-US" class="">The next step could be implementing the following nodes: function calls (CallExpr) of which parameters are const or
passed by value, and <i class="">the assignement
operator</i> in some trivial cases (no pointer/reference/array on the left side).<br class="">So whenever we stop the analysis of a loop which contains only those nodes, we have the
information which regions to invalidate. This way the analysis can be continued
safely.</span></p><p class="MsoNormal"><span lang="EN-US" class="">For example:</span></p><p class="MsoNormal"><span lang="EN-US" class=""><br class=""></span></p>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><p class="MsoNormal"><span lang="EN-US" class="">int f(int);<br class="">
int h(int,int);<br class="">
int a,b,c = 0;<br class="">
for(int i =0;i<100;i++)</span></p><p class="MsoNormal"><span lang="EN-US" class="">{</span></p><p class="MsoNormal"><span lang="EN-US" class=""><span class=""> </span>a = f(i);<br class="">
<span class=""> </span>b = h(a,i);</span></p><p class="MsoNormal"><span lang="EN-US" class="">}</span></p><p class="MsoNormal"><span lang="EN-US" class=""><br class=""></span></p><p class="MsoNormal"><span lang="EN-US" class="">int d = 12/c;
// The analyzer should be able to find this and report the divByZero bug
since the foor loop contains only those statements which are implemented.</span></p></blockquote><p class="MsoNormal"><span lang="EN-US" class=""><br class=""></span></p><p class="MsoNormal"><span lang="EN-US" class="">
However, in the next example:</span></p><p class="MsoNormal"><span lang="EN-US" class=""><br class=""></span></p>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><p class="MsoNormal"><span lang="EN-US" class="">int f(int);<br class="">
int h(int*,int);<br class="">
int a,b,c = 0;<br class="">
for(int i =0;i<100;i++)</span></p><p class="MsoNormal"><span lang="EN-US" class="">{</span></p><p class="MsoNormal"><span lang="EN-US" class=""><span class=""> </span>a = f(i);<br class="">
<span class=""> </span>b = h(&a,i);</span></p><p class="MsoNormal"><span lang="EN-US" class="">}</span></p><p class="MsoNormal"><span lang="EN-US" class="">int d = 12/c;</span></p></blockquote><div class=""> </div><p class="MsoNormal"><span lang="EN-US" class="">Here we
have a function call which is not implemented (pointer parameter) so we just end this path’s
analysis and not spot the bug after the loop.<br class="">
This would be <b class="">Milestone3</b>.</span></p><p class="MsoNormal"><span lang="EN-US" class="">After these
fundamentals I would investigate which statements are commonly used in a loop,
testing this feature on open source projects and I would aim to implement these visitor
callbacks which are important according to these tests.</span></p><p class="MsoNormal"><span lang="EN-US" class=""><br class=""></span></p><p class="MsoNormal"><span lang="EN-US" class="">
AAAAnd that is the point where I think we have a problem because usage of
pointers and function calls by reference is quite common in a loop. In the naive
and conservative way all of the <i class="">typeT </i>variable’s
regions should be invalidated whenever we visit an assignment of which left hand
side variable is <i class="">typeT*. </i>Moreover it
could be casted to another type so maybe it should result invalidating all MemRegions.
That is why I think that a points-to analysis would be critical in order to
make this whole approach work on analysis of real projects. Because of that I am
considering to apply for the project “Implement points-to analysis” which I
have seen it on <a href="http://llvm.org/OpenProjects.html" target="_blank" class="">http://llvm.org/OpenProjects.<wbr class="">html</a>
2 weeks ago but it mysteriously disappeared.</span></p><p class="MsoNormal"><span lang="EN-US" class=""><br class=""></span></p><p class="MsoNormal"><span lang="EN-US" class="">My
questions:<br class="">
1. What do you think of this whole approach? (Please let me know if the whole idea is broken because I missed something.)<br class="">
2. Do you agree that arguing about pointer's possible values is important for achieving this?<br class="">
3. If that is the case could I still apply for that project or is it removed
because of good reasons?</span></p><p class="MsoNormal"><b class=""><span lang="EN-US" class=""><br class=""></span></b></p><p class="MsoNormal"><span lang="EN-US" class="">Cheers,<br class=""></span></p><span lang="EN-US" class="">Peter</span></div><div class="gmail_extra"><br class=""><div class="gmail_quote">2017-03-16 13:25 GMT+01:00 Sean Eveson <span dir="ltr" class=""><<a href="mailto:eveson.sean@gmail.com" target="_blank" class="">eveson.sean@gmail.com</a>></span>:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="">Hello Péter,<div class=""><br class=""></div><div class="">Sounds good to me. I'd also be interested in knowing the specifics of what you are planning to do.</div><div class=""><br class=""></div><div class="">Very happy to help at any point.</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="gmail_extra"><br clear="all" class=""><div class=""><div class="m_5146538026118419111gmail_signature" data-smartmail="gmail_signature"><div dir="ltr" class=""><span style="font-family: 'Segoe UI', 'Segoe UI Web Regular', 'Segoe UI Symbol', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 13px; line-height: 18.85px;" class="">Sean Eveson</span><br style="font-family: 'Segoe UI', 'Segoe UI Web Regular', 'Segoe UI Symbol', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 13px; line-height: 18.85px;" class=""><span style="font-family: 'Segoe UI', 'Segoe UI Web Regular', 'Segoe UI Symbol', 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 13px; line-height: 18.85px;" class="">SN Systems - Sony Interactive Entertainment Group</span><br class=""></div></div></div><div class=""><div class="h5">
<br class=""><div class="gmail_quote">On Thu, Mar 16, 2017 at 12:49 AM, Anna Zaks <span dir="ltr" class=""><<a href="mailto:ganna@apple.com" target="_blank" class="">ganna@apple.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class="">Hi Péter,<br class=""><div class=""><br class=""></div><div class="">Better modeling of loops is definitely an interesting topic. My main concern would be having a specific proposal on what improvements you could complete in the GSoC timeframe. The next step here is to write down more specifics about what you intend to work on. Please, let me, Devin, and Artem know if you’d like to talk to us while figuring this out.</div><div class=""><br class=""></div><div class="">Cheers!</div><span class="m_5146538026118419111HOEnZb"><font color="#888888" class=""><div class="">Anna.</div></font></span><div class=""><div class="m_5146538026118419111h5"><div class=""><br class=""></div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Mar 13, 2017, at 9:43 AM, Péter Szécsi <<a href="mailto:peterszecsi95@gmail.com" target="_blank" class="">peterszecsi95@gmail.com</a>> wrote:</div><br class="m_5146538026118419111m_-1237608075928966407Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hello,<span class="m_5146538026118419111m_-1237608075928966407gmail-gI"><span class=""></span></span><span class="m_5146538026118419111m_-1237608075928966407gmail-gI"><span class=""> </span></span><br class=""><br class="">I would love to work on a Clang Static Analyzer project during the summer by joining GSoC.<br class="">Let me introduce myself:<br class="">I'm Peter Szecsi, a third-year BSc student at Eötvös Loránd University, Budapest. It would be my first Summer of Code project but I have already contributed to clang:<br class="">During the last summer I have uploaded 2 patches:<br class="">- An accepted and merged clang-tidy checker [1] <br class="">- An accepted Clang SA checker [2]<br class="">Since then I am working on cross translational unit analysis as a university project (and because of that I`ve send some patches about the ASTImporter [3]). Furthermore I participated in the preparations of a talk that was accepted at the EuroLLVM conference. [4]<br class="">I found SA interesting since I like to work on algorithmic challenges and enjoy participating in programming competitions and there is a lot of algorithms in the Static Analyzer which could be optimized/made more precise by heuristics.<br class=""><br class="">That is the reason why I would be most interested in project "Implement generalized loop execution modeling" from the Clang SA Open Projects page [5]. Hopefully it is a suitable project for GSoC and it is possible to achieve useful improvements during this time frame. <br class="">I have read the discussion at the "loop widening" patch [6] and the most important enhancements (and milestones in the project) would be invalidating only the possible modified values. In addition to that I was thinking to have some heuristics which would approximate if a loop is too complex to even use this kind of strategy because it would still lead to invalidate most of the values and result in a lot of false positives. Another small heuristic could be: when we already reached the maximum loop unroll number then in case we are sure that there is only a few loopsteps ahead we could unroll it to the end since probably this algorithm will not consume too much time/memory and the result state will not be much more complex. <br class=""><br class="">What do you think about these ideas?<br class=""><br class="">(I have CC'd everyone from the "loop widening" patch discussion.)<br class=""><br class="">Best regards,<br class="">Peter<br class=""><br class="">[1] <a href="https://reviews.llvm.org/D22507" target="_blank" class="">https://reviews.llvm.org/D2250<wbr class="">7</a><br class="">[2] <a href="https://reviews.llvm.org/D24246" target="_blank" class="">https://reviews.llvm.org/D2424<wbr class="">6</a><br class="">[3] <a href="https://reviews.llvm.org/D29612" target="_blank" class="">https://reviews.llvm.org/D2961<wbr class="">2</a>, <a href="https://reviews.llvm.org/D30876" target="_blank" class="">https://reviews.llvm.org/D3087<wbr class="">6</a>, <a href="https://reviews.llvm.org/D30877" target="_blank" class="">https://reviews.llvm.org/D3087<wbr class="">7</a><br class="">[4] <a href="http://llvm.org/devmtg/2017-03//2017/02/20/accepted-sessions.html#7" target="_blank" class="">http://llvm.org/devmtg/2017-03<wbr class="">//2017/02/20/accepted-sessions<wbr class="">.html#7</a><br class="">[5] <a href="http://clang-analyzer.llvm.org/open_projects.html" target="_blank" class="">http://clang-analyzer.llvm.org<wbr class="">/open_projects.html</a><br class="">[6] <a href="https://reviews.llvm.org/D12358" target="_blank" class="">https://reviews.llvm.org/D1235<wbr class="">8</a><br class=""></div>
</div></blockquote></div><br class=""></div></div></div></div></blockquote></div><br class=""></div></div></div></div>
</blockquote></div><br class=""></div>
</blockquote></div><br class=""></body></html>