<div dir="ltr">Hello everyone,<br><br><div>I am working on improving the loop modeling strategy of the Clang Static Analyzer.<br></div><div>In
the current state of the analyzer it handles loops quite simple. It
unrolls it 4 time by default and than cut the analysis of that path
where the loop would have been unrolled more times.<br><br></div><div>One
motivation, if not anything else, can be that there were already
questions on this behaviour in the cfe-dev mailing list, even it was
thought as a bug.<br></div><div>If we want to have a motivation goal
which can be expressed in a more objective way than the coverage of the
analysis is the closest what can show the impact of the modeling. Why?<br></div><div>Because the naive loop handling (described above) can easily lead to unsimulated/unchecked code lines. A small example for that:<br></div><div><br></div><div><span style="font-family:monospace,monospace">int f(){</span></div><div><span style="font-family:monospace,monospace"> int a[6];</span><br></div><div><font face="monospace,monospace"> for(int i = 0; i < 6; i++)</font></div><div><font face="monospace,monospace"> a[i] = i+2;</font></div><div><font face="monospace,monospace"> //complex body</font></div><div><font face="monospace,monospace">}<br></font></div><div><font face="monospace,monospace"><br></font></div><div>In
this example the analysis will be stopped at the 4th step of the loop
and it would not be continued after the loop. So this cause a lot of
coverage loss in this case. For this problem I experimented with
identifying specific known bound loops which would be worth to
completely unroll. (eg. the above loop in <span style="font-family:monospace,monospace">f()</span>)<br><br></div><div>At the moment a loop has to fulfill the following requirements to be worth unrolled:<br>- Currently only forStmts can be considered.<br>- The bound has to be an integer literal. (eg. i < 5, i >= MACRO_TO_LITERAL)<br>- The counter variable (i) has not escaped before/in the body of the loop and<br> changed only in the increment statement corresponding to the loop. It also<br> has to be initialized by a literal in the corresponding initStmt.<br> - Does not contain goto, switch and returnStmt.<br>These version is marked as <b>Unroll</b> in the statistics.<br><br></div><div>In
addition to that I run the measurements with a twist. So whenever an
unrolled loop is creates new branches then we consider it as a normal
(non-unroll) loop in the further analysis. It is done because of the
exponential behaviour of the ExplodedGraph (symbolic execution) it could
create a lot of new paths which will be analysed but has only a few
difference to the one step ahead analyzed path. This is marked as <b>Unroll_v2</b> in the statistics.<br></div><div><br></div><div>Time measurements: (avg of 5 run)<br></div><div>
<table cellspacing="0" border="0">
<colgroup width="80" span="3"></colgroup>
<colgroup width="97"></colgroup>
<colgroup width="80" span="3"></colgroup>
<tbody><tr>
<td valign="bottom" height="18" align="center"><font color="#000000"><br></font></td>
<td valign="bottom" align="center"><font color="#000000">curl </font></td>
<td valign="bottom" align="center"><font color="#000000">libpng</font></td>
<td valign="bottom" align="center"><font color="#000000">vim</font></td>
<td valign="bottom" align="center"><font color="#000000">bitcoin</font></td>
<td valign="bottom" align="center"><font color="#000000">ffmpeg</font></td>
<td valign="bottom" align="center"><font color="#000000">xerces</font></td>
</tr>
<tr>
<td style="text-align:right" valign="bottom" height="18"><font color="#000000">Normal</font></td>
<td valign="bottom" align="center"><font color="#000000">50.168</font></td>
<td valign="bottom" align="center"><font color="#000000">62.128</font></td>
<td valign="bottom" align="center"><font color="#000000">180.404</font></td>
<td valign="bottom" align="center"><font color="#000000">197.385</font></td>
<td valign="bottom" align="center"><font color="#000000">501.14</font></td>
<td valign="bottom" align="center"><font color="#000000">227.6075</font></td>
</tr>
<tr>
<td style="text-align:right" valign="bottom" height="18"><font color="#000000">Unroll</font></td>
<td valign="bottom" align="center"><font color="#000000">54.562</font></td>
<td valign="bottom" align="center"><font color="#000000">64.186</font></td>
<td valign="bottom" align="center"><font color="#000000">205.468</font></td>
<td valign="bottom" align="center"><font color="#000000">1767.475</font></td>
<td valign="bottom" align="center"><font color="#000000">1127.076</font></td>
<td valign="bottom" align="center"><font color="#000000">248.5375</font></td>
</tr>
<tr>
<td style="text-align:right" valign="bottom" height="19"><font color="#000000">Unroll v2</font></td>
<td valign="bottom" align="center"><font color="#000000">52.892</font></td>
<td valign="bottom" align="center"><font color="#000000">63.866</font></td>
<td valign="bottom" align="center"><font color="#000000">203.476</font></td>
<td valign="bottom" align="center"><font color="#000000">198.025</font></td>
<td valign="bottom" align="center"><font color="#000000">693.486</font></td>
<td valign="bottom" align="center"><font color="#000000">237.87</font></td>
</tr>
</tbody></table>
</div><div><br></div><div>Coverage measurements (% of reachable basic blocks statistics)<br></div><div>
<table cellspacing="0" border="0">
<colgroup width="80" span="3"></colgroup>
<colgroup width="97"></colgroup>
<colgroup width="80" span="3"></colgroup>
<tbody><tr>
<td valign="bottom" height="20" align="center"><font color="#000000"><br></font></td>
<td valign="bottom" align="center"><font color="#000000">curl </font></td>
<td valign="bottom" align="center"><font color="#000000">libpng</font></td>
<td valign="bottom" align="center"><font color="#000000">vim</font></td>
<td valign="bottom" align="center"><font color="#000000">bitcoin</font></td>
<td valign="bottom" align="center"><font color="#000000">ffmpeg</font></td>
<td valign="bottom" align="center"><font color="#000000">xerces</font></td>
</tr>
<tr>
<td style="text-align:right" valign="bottom" height="20"><font color="#000000">Normal</font></td>
<td valign="bottom" align="center"><font color="#000000">58.05</font></td>
<td valign="bottom" align="center"><font color="#000000">55.91</font></td>
<td valign="bottom" align="center"><font color="#000000">51.13</font></td>
<td valign="bottom" align="center"><font color="#000000">68.58</font></td>
<td valign="bottom" align="center"><font color="#000000">49.78</font></td>
<td valign="bottom" align="center"><font color="#000000">74.86</font></td>
</tr>
<tr>
<td style="text-align:right" valign="bottom" height="20"><font color="#000000">Unroll</font></td>
<td valign="bottom" align="center"><font color="#000000">69.1</font></td>
<td valign="bottom" align="center"><font color="#000000">56.04</font></td>
<td valign="bottom" align="center"><font color="#000000">51.41</font></td>
<td valign="bottom" align="center"><font color="#000000">68.7</font></td>
<td valign="bottom" align="center"><font color="#000000">62.18</font></td>
<td valign="bottom" align="center"><font color="#000000">74.79</font></td>
</tr>
<tr>
<td style="text-align:right" valign="bottom" height="20"><font color="#000000">Unroll v2</font></td>
<td valign="bottom" align="center"><font color="#000000">69.19</font></td>
<td valign="bottom" align="center"><font color="#000000">56.04</font></td>
<td valign="bottom" align="center"><font color="#000000">51.61</font></td>
<td valign="bottom" align="center"><font color="#000000">68.83</font></td>
<td valign="bottom" align="center"><font color="#000000">52.41</font></td>
<td valign="bottom" align="center"><font color="#000000">74.8</font></td>
</tr>
</tbody></table>
</div><div><br></div><div>Summary table which contains the deviation (percentage) to the normal run.<br></div><div>
<table cellspacing="0" border="0">
<colgroup width="170"></colgroup>
<colgroup width="111"></colgroup>
<colgroup width="80"></colgroup>
<colgroup width="97"></colgroup>
<colgroup width="80" span="3"></colgroup>
<tbody><tr>
<td valign="bottom" height="18" align="center"><font color="#000000"><br></font></td>
<td valign="bottom" align="center"><font color="#000000">curl </font></td>
<td valign="bottom" align="center"><font color="#000000">libpng</font></td>
<td valign="bottom" align="center"><font color="#000000">vim</font></td>
<td valign="bottom" align="center"><font color="#000000">bitcoin</font></td>
<td valign="bottom" align="center"><font color="#000000">ffmpeg</font></td>
<td valign="bottom" align="center"><font color="#000000">xerces</font></td>
</tr>
<tr>
<td style="text-align:right" valign="bottom" height="18"><font color="#000000">Unroll – time</font></td>
<td valign="bottom" align="center"><font color="#000000">8.76</font></td>
<td valign="bottom" align="center"><font color="#000000">3.31</font></td>
<td valign="bottom" align="center"><font color="#000000">13.89</font></td>
<td valign="bottom" align="center"><font color="#000000">795.445</font></td>
<td valign="bottom" align="center"><font color="#000000">124.9</font></td>
<td valign="bottom" align="center"><font color="#000000">9.19</font></td>
</tr>
<tr>
<td style="text-align:right" valign="bottom" height="20"><font color="#000000">Unroll – coverage</font></td>
<td valign="bottom" align="center"><font color="#000000">11.05</font></td>
<td valign="bottom" align="center"><font color="#000000">0.13</font></td>
<td valign="bottom" align="center"><font color="#000000">0.28</font></td>
<td valign="bottom" align="center"><font color="#000000">0.12</font></td>
<td valign="bottom" align="center"><font color="#000000">12.4</font></td>
<td valign="bottom" align="center"><font color="#000000">-0.07</font></td>
</tr>
<tr>
<td style="text-align:right" valign="bottom" height="20"><font color="#000000">Unroll v2 - time</font></td>
<td valign="bottom" align="center"><font color="#000000">5.43</font></td>
<td valign="bottom" align="center"><font color="#000000">2.8</font></td>
<td valign="bottom" align="center"><font color="#000000">12.79</font></td>
<td valign="bottom" align="center"><font color="#000000">0.324</font></td>
<td valign="bottom" align="center"><font color="#000000">38.38</font></td>
<td valign="bottom" align="center"><font color="#000000">4.5</font></td>
</tr>
<tr>
<td style="text-align:right" valign="bottom" height="18"><font color="#000000">Unroll v2 – cov</font></td>
<td valign="bottom" align="center"><font color="#000000">11.14</font></td>
<td valign="bottom" align="center"><font color="#000000">0.13</font></td>
<td valign="bottom" align="center"><font color="#000000">0.48</font></td>
<td valign="bottom" align="center"><font color="#000000">0.25</font></td>
<td valign="bottom" align="center"><font color="#000000">2.63</font></td>
<td valign="bottom" align="center"><font color="#000000">-0.06</font></td>
</tr>
</tbody></table>
</div><div>The most outstanding (negative) difference is
that the analysis time of project bitcoin increased and took 8x more
time than in the normal case. That was single handedly caused by one
benchmark file which contains a lot of for loop which makes ~10,000,000
step.<br>(An improvement can be to not allow unrolling loops which takes
N or more steps where N could be a well chosen number - I am running
some measurements at the moment with the version N = 128).</div><div><br></div><div><br></div><div>The number of findings:<br>
<table cellspacing="0" border="0">
<colgroup width="89"></colgroup>
<colgroup width="111"></colgroup>
<colgroup width="80"></colgroup>
<colgroup width="97"></colgroup>
<colgroup width="80" span="3"></colgroup>
<tbody><tr>
<td valign="bottom" height="18" align="center"><font color="#000000"><br></font></td>
<td valign="bottom" align="center"><font color="#000000">curl </font></td>
<td valign="bottom" align="center"><font color="#000000">libpng</font></td>
<td valign="bottom" align="center"><font color="#000000">vim</font></td>
<td valign="bottom" align="center"><font color="#000000">bitcoin</font></td>
<td valign="bottom" align="center"><font color="#000000">ffmpeg</font></td>
<td valign="bottom" align="center"><font color="#000000">xerces</font></td>
</tr>
<tr>
<td valign="bottom" height="18" align="right"><font color="#000000">Normal</font></td>
<td valign="bottom" align="center"><font color="#000000">35</font></td>
<td valign="bottom" align="center"><font color="#000000">32</font></td>
<td valign="bottom" align="center"><font color="#000000">81</font></td>
<td valign="bottom" align="center"><font color="#000000">10</font></td>
<td valign="bottom" align="center"><font color="#000000">375</font></td>
<td valign="bottom" align="center"><font color="#000000">52</font></td>
</tr>
<tr>
<td valign="bottom" height="18" align="right"><font color="#000000">Unroll</font></td>
<td valign="bottom" align="center"><font color="#000000">27</font></td>
<td valign="bottom" align="center"><font color="#000000">33</font></td>
<td valign="bottom" align="center"><font color="#000000">81</font></td>
<td valign="bottom" align="center"><font color="#000000">10</font></td>
<td valign="bottom" align="center"><font color="#000000">367</font></td>
<td valign="bottom" align="center"><font color="#000000">48</font></td>
</tr>
<tr>
<td valign="bottom" height="18" align="right"><font color="#000000">Unroll v2</font></td>
<td valign="bottom" align="center"><font color="#000000">27</font></td>
<td valign="bottom" align="center"><font color="#000000">32</font></td>
<td valign="bottom" align="center"><font color="#000000">81</font></td>
<td valign="bottom" align="center"><font color="#000000">10</font></td>
<td valign="bottom" align="center"><font color="#000000">363</font></td>
<td valign="bottom" align="center"><font color="#000000">52</font></td>
</tr>
</tbody></table>
<br></div><div>Most of the time it hasn't changed the founded bugs. On FFmpeg there are new finding which come from the fact that we are able to unroll loops (and analyze the codelines after them). However the unrolling resulted some loss of the findings as well. Most of the time (curl, ffmpeg) these loss was because these bugs were founded in complex functions on long paths which analysis exhausted the maximum limit of the nodes in the ExplodedGraph even in the normal analysis. It happened "faster" with the unroll feature since we so these paths were not analyzed thus the bugs were not found.<br><br></div><div>In
conclusion the unrolling of the above defined specific loops can have a
positive impact on the coverage (% of reachable basic blocks), however,
it comes with the price that it affects the running time. In most
projects (all except curl) the time increase % was higher than the
coverage increase %.<br></div><div><br></div><div>So all in all, I think it is a feature which can be useful for a more <span class="m_8795779713071489574gmail-m_-981880197780097369gmail-m_-8185922259641316962gmail-prop m_8795779713071489574gmail-m_-981880197780097369gmail-m_-8185922259641316962gmail-prop_content m_8795779713071489574gmail-m_-981880197780097369gmail-m_-8185922259641316962gmail-freetext">thoroughgoing analysis but should not be used by default at the time.<br></span></div><div><br></div><div>What do you think?<br></div><div><br></div><div>Cheers,<br></div>Peter<br><br>Note: if you are interested in the detailed results they are available
on the following site: <a href="http://cc.elte.hu/szepet/loop_modeling/unrolling/" target="_blank">http://cc.elte.hu/szepet/loop_<wbr>modeling/unrolling/</a></div>