[cfe-dev] [analyzer] Rough sketch of the algorithm for "Enhancing bug reporting with static backward program slicing"

Thu May 23 13:25:27 PDT 2019

On Thu, 23 May 2019 at 21:58, Kristóf Umann <dkszelethus at gmail.com> wrote:

> Please let me know if I didn't address something you mentioned!
>
> On Thu, 23 May 2019 at 02:29, Artem Dergachev <noqnoqneo at gmail.com> wrote:
>
>> My primary question is, how do you plan to use the data that you obtain
>> via your analysis?
>>
>
>  Gather relevant statements. Keep ExplodedNodes in the bugpath for which
> PathDiagnosticLocation::getStmt() is in the set (or, something like that at
> least). But honestly, I didn't think too much about that yet :^) I guess we
> could just gather ExplodedNodes rather than statements, we'll see.
>
>
>> Like, do you plan to backtrack the value of all relevant variables and/or
>> expressions in the final bug report that were also encountered along the
>> bug path? If yes, then is it possible that the slicing criterion gets
>> updated in the process and you'll have to re-run the CFG-based analysis to
>> take it into account?
>>
>
> No, the slicing criterion is what it is, and will not change. The set of
> relevant statements to the slicing criterion define the actual program
> slice, which is basically the thing we're going for in this project. When
> it turns out that a value of a variable directly affects a variable in the
> slicing criterion (either through data or control dependency), we just add
> it to the set of relevant variables, and then if something in the set of
> relevant variables is affected (some statements or variables turn out to be
> a control/data dependencies to them), then it's added to the respective
> relevancy sets. Should we only traverse the basic blocks the analyzer
> visited, I think we can pull off the slicing with a single pass.
>

We'll see about that single pass thing though. I'm gathering some tricky
examples in the document I shared and working on a neat algorithm.

>
>
>> > What would also be really great is to assist this traversal with the
>> information the analyzer already has, essentially only inspecting basic
>> blocks the analyzer actually visits.
>>
>> You mean visits on the current bug path or visits on the equivalence
>> class of bugs or visits in general during analysis?
>>
>> Regardless, for many problems ideally we should traverse basic blocks
>> that the analyzer *didn't* visit (eg., if the function didn't initialize
>> the variable on the current path, we're interested in parts of the code in
>> which it *did* initialize the variable, even though we didn't necessarily
>> have time to visit them).
>>
>
> Well I mean slicing by definition isn't intended to do that. The entire
> point of it is to gather the smallest slice related to the slicing
> criterion, and my project aims to fill in the gaps where we don't provide
> enough information.
>
> It actually sounds as if all problems that we're trying to solve here can
>> be classified into "must-have-happened" problems (eg., "variable must-have
>> been declared without initialization", "variable must-have been set to
>> nullptr", "memory must-have been allocated") and "should-not-have-happened"
>> problems (eg., "variable should-not-have been initialized", "null value
>> should-not-have been overwritten", "pointer should-not-have escaped").
>>
>> For must-have-happened problems, i'm recently under an impression that we
>> should suppress the bug report entirely when we fail to solve them (i.e.,
>> if fail to explain to the user where exactly does this happen, then the
>> report is impossible to understand anyway). This is currently a very big
>> problem for our null dereference checker on some codebases, especially
>> because it uses this tracking for suppressions as well (aka inlined
>> defensive checks), so when it fails to track the variable it's likely a
>> legit false positive as well, not simply a hard-to-understand report.
>>
>
> I think this set calculating approach inherently gathers far more
> information, allowing us to make better judgement on whether we should
> suppress the report.
>
>
>> For should-not-have-happened problems i'm much more confused. We're
>> talking about looking for places where it *could* have happened and then
>> trying to explain to the user why none of them have actually happened. I'm
>> not sure what are the consequences of failing to explain to the user why
>> didn't a particular piece of code do something, because i've no idea how do
>> users intuitively figure out which code *could* have done these things and
>> which clearly couldn't.
>>
>
> Im really at a loss here :) Can you provide some example as to what a
> problematic "must-have-happened" bug report would look like, and how would
> you like to see it improved? Same for "should-not-have-happened". Because
> as I understand it, and I might be wrong, you have problems with this:
>
> int a; // declaring a without initializing
>
> if (rand()) // assuming that the condition is false
>   a = 5;
>
> print(a); // a is undef
>
> And prefer to see this:
>
> int a; // declaring a without initializing
>
> if (rand()) // should this condition be false, a's value will be
> indeterministic
>   a = 5; // writing to a skipped
>
> print(a); // a is undef
>
> and I just don't see how this would be possible with slicing at all. Also,
> I can't see how this would scale to real-world production code.
>
>>
>> On 5/22/19 4:11 PM, Kristóf Umann wrote:
>>
>> Hi!
>>
>> I'd like to share some thoughts about my GSoC project, "Enhancing bug
>> reporting with static backward program slicing"[1].
>>
>> My proposal is very specific about what I'm aiming to improve on, but
>> vague on the how part of it. This mail aims to add clarify some of this.
>>
>> Program slicing is essentially a technique of discovering data and
>> control dependencies to the slicing criterion, which is a (statement, {set
>> of variables}) pair. Fortunately, tools for control dependencies, namely,
>> dominator set calculations are already implemented, but seems to be
>> unstable with clang's CFG. It would be a great tool if I were able to fix
>> it.
>>
>> While my proposal states that I plan to implement an AST-based solution,
>> I'm actually not sure that this is the optimal approach. We could instead
>> inspect CFG blocks in a backwards manner (coupled with the analyzer's call
>> stack), and gather relevant variable in the meanwhile.
>>
>> What would also be really great is to assist this traversal with the
>> information the analyzer already has, essentially only inspecting basic
>> blocks the analyzer actually visits.
>>
>> Here’s my idea for an algorithm (from the point of the slicing criterion
>> already being constructed):
>>
>>
>> https://docs.google.com/document/d/1Lx867o3meyQsj0WKOSWMdosSBdw2MUq1aIxyPJM6iLU/edit?usp=sharing
>>
>>
>> Please note that this is a veeery rough sketch, I didn't think about
>> every edge case that exists, but I think it's an okay start.
>>
>>
>> [1]
>> https://docs.google.com/document/d/1ci1BCAKojPlqIxIw1J_K2dnATA3z01PuwR_vHJS55TI/edit
>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190523/b6923010/attachment.html>