[cfe-dev] [analyzer][GSoC] Problem Statement: Improving BugReporter with static backward program slicing

Thu Apr 4 13:22:46 PDT 2019

I kinda gathered some understanding of the problem and the solution and 
trying to come up with some sort of a plan, and for that purpose i'm 
trying to step back from "we should totally do this cool technique" 
towards a more "adult", "boring" point of view that consists in asking, 
"what *do* we need to do?".

For now i'm seeing that the work on this problem can be decomposed into 
three chunks:

(1) Collect slicing criteria from various sources,
(2) Perform slicing to discover data and control dependencies,
(3) Use information about dependencies to improve the bug report.

Chunk (3) sounds straightforward: we just enable or disable path pieces 
based on information we already have. I don't fully understand how it'll 
go, but i don't see any unknowns here.

Chunk (1) sounds like a large amount of routine, repetitive work: 
convert sources of slicing criteria, such as checkers, to the new system 
one-by-one. There is certain amount of known unknowns here: some sources 
may be tricky to take advantage of (eg., liveness-related sources), some 
may put stress on the checkers.

Chunk (2) is the research-intensive part: it has an easy 80% solution 
(AST matching based detection that has already been implemented and 
proved useful for one of the checkers) and an idea to improve upon it by 
implementing a more sophisticated analysis.

Because chunk (2) is research-intensive, we need experimental data that 
we currently don't have. In order to avoid spending a lot of time on 
something that isn't going to work, we should try to gather that data as 
soon as possible in our plan.

Because chunk (1) is repetitive, we don't need to implement it 
completely; we can pick one or two sources and get them right.

Therefore here's the plan that emerges:
i. Pick one or two slicing criteria sources and implement them.
ii. Implement a simple AST-based algorithm for discovering dependencies.
iii. Apply the information produced by that algorithm to the bug report.
iv. Run the updated analyzer on real-world codebases and see what 
problems remain.
v. Implement solutions to these problems.

Solutions on step v. may or may not be about coding a more sophisticated 
slicing algorithm. They might as well be improvements in chunk (1), i.e. 
discovering more slicing criteria, or they may be about a "horizontal" 
improvement in the existing primitive algorithm (eg., cover more forms 
of control flow or different variants of data dependencies, which should 
have been a separate chunk).

That's roughly how i imagine it.

On 4/4/19 10:32 AM, Kristóf Umann wrote:
> > Instead we can also have a look at the execution path on the 
> ExplodedGraph that goes through the true-branch and see if the value 
> remains null when it exits the if-branch. Kristof, you're saying that 
> you plan to do the slicing over the ExplodedGraph - is that what you 
> mean? This could work as long as the other branch is actually 
> *explored* by the analyzer. If it isn't, it'll be almost impossible to 
> detect, and i don't know how often would this happen, but there's a 
> good chance it's going to be much more rare than us having problems 
> with such highlighting right now.
> >
> > We can also come up with a completely separate CFG-based analysis 
> for this purpose. This is probably the most precise thing to do, 
> because the problem is essentially an all-paths problem (which is why 
> loss of coverage in the ExplodedGraph would bite us). I cannot 
> estimate how difficult such analysis would be to implement (probably 
> pretty scary), but i think Kristof wasn't looking in this direction.
>
> I guess it depends on what we want to do.
>
> Static slicing isn't defined for the ExplodedGraph, I'll have show 
> that we can transform an ExplodedGraph into a CFG, or define the 
> algorithm on it. From afar, it's very hard to tell how difficult this 
> would be. When looking for a long-term solution, I think it would be 
> better to find an approach where we won't have to worry about whether 
> the analyzer explored everything or not. On the other hand, we could 
> eventually turn this backward static slicing into a backward 
> /conditional/ slicing with the constraints the analyzer gathered, 
> should we chose to go with an ExplodedGraph based algorithm.
>
> So we'll have to balance these two. Greater accuracy with a chance 
> that the slice will be too big (CFG), or not as accurate but resulting 
> in a smaller slice (ExplodedGraph). I am not yet sure which one I'd 
> like to pursue, but the latter would probably fit our needs better.
>
> > Id suggest first starting with a syntax-based approach (1) because 
> it sounds to me that the approach that we chose for identifying 
> dependencies on paths that aren't part of the bug path is orthogonal 
> to actually making use of that information to produce notes. Once we 
> learn how to make use of this information, we'll have a taste of this 
> medicine and see if it works. Then we'll see if we need to transition 
> to (2) or (3) depending on such evaluation.
>
> I'll be honest, I don't really understand what you mean here. Would 
> you like to see a more down-to-earth syntax-based solution as an 
> exercise "for the big thing"? If so, I'm not sure what you mean 
> specifically.
>
>
> On Thu, 4 Apr 2019 at 01:06, Artem Dergachev <noqnoqneo at gmail.com 
> <mailto:noqnoqneo at gmail.com>> wrote:
>
>     I mean, assigning 3 to 'n' :)
>
>     On 4/3/19 4:05 PM, Artem Dergachev wrote:
>>     *Assigning 'n' to '3'*
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190404/7a3ad7e1/attachment.html>