<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 23 May 2019 at 21:58, Kristóf Umann <<a href="mailto:dkszelethus@gmail.com">dkszelethus@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Please let me know if I didn't address something you mentioned!</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 23 May 2019 at 02:29, Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" target="_blank">noqnoqneo@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
My primary question is, how do you plan to use the data that you
obtain via your analysis?</div></blockquote><div><br></div><div> Gather relevant statements. Keep ExplodedNodes in the bugpath for which PathDiagnosticLocation::getStmt() is in the set (or, something like that at least). But honestly, I didn't think too much about that yet :^) I guess we could just gather ExplodedNodes rather than statements, we'll see.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
Like, do you plan to backtrack the value of all relevant variables
and/or expressions in the final bug report that were also
encountered along the bug path? If yes, then is it possible that the
slicing criterion gets updated in the process and you'll have to
re-run the CFG-based analysis to take it into account?<br></div></blockquote><div><br></div><div>No, the slicing criterion is what it is, and will not change. The set of relevant statements to the slicing criterion define the actual program slice, which is basically the thing we're going for in this project. When it turns out that a value of a variable directly affects a variable in the slicing criterion (either through data or control dependency), we just add it to the set of relevant variables, and then if something in the set of relevant variables is affected (some statements or variables turn out to be a control/data dependencies to them), then it's added to the respective relevancy sets. Should we only traverse the basic blocks the analyzer visited, I think we can pull off the slicing with a single pass.<br></div><div></div><div></div></div></div></blockquote><div><br></div><div>We'll see about that single pass thing though. I'm gathering some tricky examples in the document I shared and working on a neat algorithm.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
> What would also be really great is to assist this traversal
with the information the analyzer already has, essentially only
inspecting basic blocks the analyzer actually visits.<br>
<br>
You mean visits on the current bug path or visits on the equivalence
class of bugs or visits in general during analysis?<br>
<br>
Regardless, for many problems ideally we should traverse basic
blocks that the analyzer *didn't* visit (eg., if the function didn't
initialize the variable on the current path, we're interested in
parts of the code in which it *did* initialize the variable, even
though we didn't necessarily have time to visit them).
</div></blockquote><div><br></div><div>Well I mean slicing by definition isn't intended to do that. The entire point of it is to gather the smallest slice related to the slicing criterion, and my project aims to fill in the gaps where we don't provide enough information.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
It actually sounds as if all problems that we're trying to solve
here can be classified into "must-have-happened" problems (eg.,
"variable must-have been declared without initialization", "variable
must-have been set to nullptr", "memory must-have been allocated")
and "should-not-have-happened" problems (eg., "variable
should-not-have been initialized", "null value should-not-have been
overwritten", "pointer should-not-have escaped").<br>
<br>
For must-have-happened problems, i'm recently under an impression
that we should suppress the bug report entirely when we fail to
solve them (i.e., if fail to explain to the user where exactly does
this happen, then the report is impossible to understand anyway).
This is currently a very big problem for our null dereference
checker on some codebases, especially because it uses this tracking
for suppressions as well (aka inlined defensive checks), so when it
fails to track the variable it's likely a legit false positive as
well, not simply a hard-to-understand report.<br></div></blockquote><div><br></div><div>I think this set calculating approach inherently gathers far more information, allowing us to make better judgement on whether we should suppress the report.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
For should-not-have-happened problems i'm much more confused. We're
talking about looking for places where it *could* have happened and
then trying to explain to the user why none of them have actually
happened. I'm not sure what are the consequences of failing to
explain to the user why didn't a particular piece of code do
something, because i've no idea how do users intuitively figure out
which code *could* have done these things and which clearly
couldn't.<br></div></blockquote><div><br></div><div>Im really at a loss here :) Can you provide some example as to what a problematic "must-have-happened" bug report would look like, and how would you like to see it improved? Same for "should-not-have-happened". Because as I understand it, and I might be wrong, you have problems with this:</div><div><br><font face="courier new, monospace">int a; // declaring a without initializing<br><br></font></div><div><font face="courier new, monospace">if (rand()) // assuming that the condition is false</font></div><div><font face="courier new, monospace"> a = 5;</font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">print(a); // a is undef</font></div><div><br>And prefer to see this:<br><br><font face="courier new, monospace">int a; // declaring a without initializing<br><br></font></div><div><font face="courier new, monospace">if (rand()) // should this condition be false, a's value will be indeterministic</font></div><div><font face="courier new, monospace"> a = 5; // writing to a skipped</font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">print(a); // a is undef</font></div><div><br></div><div>and I just don't see how this would be possible with slicing at all. Also, I can't see how this would scale to real-world production code.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">
<br>
<div class="gmail-m_-2008566417967606901m_-7889642444734106153gmail-m_-8947792374282452487moz-cite-prefix">On 5/22/19 4:11 PM, Kristóf Umann
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr"><span id="gmail-m_-2008566417967606901m_-7889642444734106153gmail-m_-8947792374282452487gmail-docs-internal-guid-3c74b999-7fff-e99a-77f6-4bd5385fb78f">
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Hi!</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">I'd like to share some thoughts about my GSoC project, "Enhancing bug reporting with static backward program slicing"[1].</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">My proposal is very specific about </span><span style="font-family:Arial;background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">what</span><span style="font-family:Arial;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"> I'm aiming to improve on, but vague on the </span><span style="font-family:Arial;background-color:transparent;font-style:italic;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">how</span><span style="font-family:Arial;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"> part of it. This mail aims to add clarify some of this.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Program slicing is essentially a technique of discovering data and control dependencies to the slicing criterion, which is a (statement, {set of variables}) pair. Fortunately, tools for control dependencies, namely, dominator set calculations are already implemented, but seems to be unstable with clang's CFG. It would be a great tool if I were able to fix it.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">While my proposal states that I plan to implement an AST-based solution, I'm actually not sure that this is the optimal approach. We could instead inspect CFG blocks in a backwards manner (coupled with the analyzer's call stack), and gather relevant variable in the meanwhile.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">What would also be really great is to assist this traversal with the information the analyzer already has, essentially only inspecting basic blocks the analyzer actually visits.</span></p>
<br>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Here’s my idea for an algorithm (from the point of the slicing criterion already being constructed):</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">
</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><a href="https://docs.google.com/document/d/1Lx867o3meyQsj0WKOSWMdosSBdw2MUq1aIxyPJM6iLU/edit?usp=sharing" target="_blank">https://docs.google.com/document/d/1Lx867o3meyQsj0WKOSWMdosSBdw2MUq1aIxyPJM6iLU/edit?usp=sharing</a>
</span></p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><br>
</p>
<p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt">Please
note that this is a veeery rough sketch, I didn't think
about every edge case that exists, but I think it's an
okay start.</p>
<p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><br>
</p>
</span><span id="gmail-m_-2008566417967606901m_-7889642444734106153gmail-m_-8947792374282452487gmail-docs-internal-guid-5954bf5e-7fff-362e-5b73-2faf8cd77d3f"><span style="font-family:Arial;background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">[1] <a href="https://docs.google.com/document/d/1ci1BCAKojPlqIxIw1J_K2dnATA3z01PuwR_vHJS55TI/edit" target="_blank">https://docs.google.com/document/d/1ci1BCAKojPlqIxIw1J_K2dnATA3z01PuwR_vHJS55TI/edit</a></span></span> <br class="gmail-m_-2008566417967606901m_-7889642444734106153gmail-m_-8947792374282452487gmail-Apple-interchange-newline">
</div>
</div>
</div>
</blockquote>
<br>
</div>
</blockquote></div></div>
</blockquote></div></div>