[cfe-dev] [analyzer] Rough sketch of the algorithm for "Enhancing bug reporting with static backward program slicing"

Wed May 22 17:29:48 PDT 2019

My primary question is, how do you plan to use the data that you obtain 
via your analysis?

Like, do you plan to backtrack the value of all relevant variables 
and/or expressions in the final bug report that were also encountered 
along the bug path? If yes, then is it possible that the slicing 
criterion gets updated in the process and you'll have to re-run the 
CFG-based analysis to take it into account?

 > What would also be really great is to assist this traversal with the 
information the analyzer already has, essentially only inspecting basic 
blocks the analyzer actually visits.

You mean visits on the current bug path or visits on the equivalence 
class of bugs or visits in general during analysis?

Regardless, for many problems ideally we should traverse basic blocks 
that the analyzer *didn't* visit (eg., if the function didn't initialize 
the variable on the current path, we're interested in parts of the code 
in which it *did* initialize the variable, even though we didn't 
necessarily have time to visit them).

It actually sounds as if all problems that we're trying to solve here 
can be classified into "must-have-happened" problems (eg., "variable 
must-have been declared without initialization", "variable must-have 
been set to nullptr", "memory must-have been allocated") and 
"should-not-have-happened" problems (eg., "variable should-not-have been 
initialized", "null value should-not-have been overwritten", "pointer 
should-not-have escaped").

For must-have-happened problems, i'm recently under an impression that 
we should suppress the bug report entirely when we fail to solve them 
(i.e., if fail to explain to the user where exactly does this happen, 
then the report is impossible to understand anyway). This is currently a 
very big problem for our null dereference checker on some codebases, 
especially because it uses this tracking for suppressions as well (aka 
inlined defensive checks), so when it fails to track the variable it's 
likely a legit false positive as well, not simply a hard-to-understand 
report.

For should-not-have-happened problems i'm much more confused. We're 
talking about looking for places where it *could* have happened and then 
trying to explain to the user why none of them have actually happened. 
I'm not sure what are the consequences of failing to explain to the user 
why didn't a particular piece of code do something, because i've no idea 
how do users intuitively figure out which code *could* have done these 
things and which clearly couldn't.

On 5/22/19 4:11 PM, Kristóf Umann wrote:
>
> Hi!
>
>
> I'd like to share some thoughts about my GSoC project, "Enhancing bug 
> reporting with static backward program slicing"[1].
>
>
> My proposal is very specific about whatI'm aiming to improve on, but 
> vague on the howpart of it. This mail aims to add clarify some of this.
>
>
> Program slicing is essentially a technique of discovering data and 
> control dependencies to the slicing criterion, which is a (statement, 
> {set of variables}) pair. Fortunately, tools for control dependencies, 
> namely, dominator set calculations are already implemented, but seems 
> to be unstable with clang's CFG. It would be a great tool if I were 
> able to fix it.
>
>
> While my proposal states that I plan to implement an AST-based 
> solution, I'm actually not sure that this is the optimal approach. We 
> could instead inspect CFG blocks in a backwards manner (coupled with 
> the analyzer's call stack), and gather relevant variable in the meanwhile.
>
>
> What would also be really great is to assist this traversal with the 
> information the analyzer already has, essentially only inspecting 
> basic blocks the analyzer actually visits.
>
>
> Here’s my idea for an algorithm (from the point of the slicing 
> criterion already being constructed):
>
> https://docs.google.com/document/d/1Lx867o3meyQsj0WKOSWMdosSBdw2MUq1aIxyPJM6iLU/edit?usp=sharing 
>
>
>
> Please note that this is a veeery rough sketch, I didn't think about 
> every edge case that exists, but I think it's an okay start.
>
>
> [1] 
> https://docs.google.com/document/d/1ci1BCAKojPlqIxIw1J_K2dnATA3z01PuwR_vHJS55TI/edit 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190522/dee16939/attachment.html>