[cfe-dev] [analyzer] Rough sketch of the algorithm for "Enhancing bug reporting with static backward program slicing"

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Fri May 31 13:43:39 PDT 2019


Nice point, indeed this may be of interest. I think it'd be of much more 
interest if the Flag was initially initialized to 0. If we fail to 
justify that, our warning would still be well-justified, but in some 
cases it might be a nice piece of info to add indeded.

What i was actually trying to say is "hey, our approach is probably 
still not entirely perfect, but i've no idea what 'perfect' is and it 
shouldn't really be preventing us from trying out your solution; we 
don't have anything better anyway" :)

On 5/31/19 2:47 AM, Kristóf Umann wrote:
>
>
> On Fri, 31 May 2019 at 03:12, Artem Dergachev <noqnoqneo at gmail.com 
> <mailto:noqnoqneo at gmail.com>> wrote:
>
>
>
>     On 5/30/19 6:04 PM, Kristóf Umann wrote:
>>
>>
>>     On Fri, 31 May 2019, 02:55 Artem Dergachev, <noqnoqneo at gmail.com
>>     <mailto:noqnoqneo at gmail.com>> wrote:
>>
>>         Interesting.
>>
>>         Condition on line 12 would not be our control dependency -
>>         neither in example 1 nor in example 2, simply because there's
>>         nothing interesting within either branches of that
>>         if-statement. The 'new int' assignment would have been
>>         interesting if we solved the should-not-have-happened problem
>>         of "x should not have been overwritten", but for now we don't
>>         do that.
>>
>>
>>     Thats my game plan exactly :)
>>
>>
>>         At the same time, condition on line 16 would still be a
>>         control dependency of our report in both examples, because
>>         this is where the actual crash happens. So we will have a
>>         note on line 05 in both cases, even if it's completely
>>         unnecessary in the second example.
>>
>>
>>     Why would it be? We'd never come across the nullptr dereference
>>     if it wasn't for flag having its value reset on line 14.
>
>     This note is unnecessary because it's not that surprising that the
>     flag can be true in this case. It simply follows from the presence
>     of the check: if the flag is always false, then this if-statement
>     is dead code, which is a bug on its own.
>
>
> I somewhat disagree. I think the bug report is clearer when we argue 
> about why we assume (and do not know) the value of flag. But I guess 
> we'd really need to see the impact of this before reasoning about it. 
> I don't immediately see any major roadblocks that would prevent me 
> from implementing this as an experiment within an hour or so, and 
> gather some experience with it.
>
>     In example 1 the real thing that's worth conveying is that the
>     value of the flag changes between line 12 and line 16 (and also
>     between line 10 and line 12).
>
>
>     But i don't see an immediate way to formalize this kind of reasoning.
>
>
> I hope it'll work out with the combined solution to the 
> "must-have-happened" and "should-not-have-happened" cases, but it's 
> hard to foresee it this early on.
>
>>
>>         We would indeed never have a note on line 10. This is, in
>>         fact, a dead store, this value is never used, so we wouldn't
>>         be able to track anything back to it.
>>
>>
>>     I messed up the numbering, I did in fact mean 11 when I wrote 10 :)
>>
>>
>>         On 5/30/19 4:48 PM, Kristóf Umann wrote:
>>>         Allow me to elaborate.
>>>
>>>         *Example 1.:*
>>>
>>>         01 int flag;
>>>
>>>         02 bool coin();
>>>
>>>         03
>>>
>>>         04 void foo() {
>>>
>>>         05   flag = coin(); // no note
>>>
>>>         06 }
>>>
>>>         07
>>>
>>>         08 int main() {
>>>
>>>         09   int *x = 0; // x initialized to 0
>>>
>>>         10   flag = 1;
>>>
>>>         11   foo();
>>>
>>>         12   if (flag) // assumed false
>>>
>>>         13     x = new int;
>>>
>>>         14   foo();
>>>
>>>         15
>>>
>>>         16   if (flag) // assumed true
>>>
>>>         17     *x = 5; // warn
>>>
>>>         18 }
>>>
>>>
>>>         *Example 2.: (note the change on line 15)*
>>>
>>>         01 int flag;
>>>
>>>         02 bool coin();
>>>
>>>         03
>>>
>>>         04 void foo() {
>>>
>>>         05   flag = coin(); // no note
>>>
>>>         06 }
>>>
>>>         07
>>>
>>>         08 int main() {
>>>
>>>         09   int *x = 0;
>>>
>>>         10   flag = 1;
>>>
>>>         11   foo();
>>>
>>>         12   if (flag) // assumed false
>>>
>>>         13     x = new int;
>>>
>>>         14   foo();
>>>
>>>         15   x = 0; // x set to 0
>>>
>>>         16   if (flag) // assumed true
>>>
>>>         17     *x = 5; // warn
>>>
>>>         18 }
>>>
>>>
>>>         By tweaking trackExpressionValue a bit:
>>>
>>>         bool trackExpressionValue(const ExplodedNode *N,
>>>
>>>                                  const Stmt *S, BugReport &R,
>>>
>>>                                  bool IsArg=false,
>>>
>>>                                  bool EnableNullFPSuppression=true);
>>>
>>>
>>>         Should a block Bin the bug path be a control dependency of
>>>         N, track the expression value in the condition of B. With
>>>         that, I believe we should get a note for statement 14. For
>>>         statement 10, we still wouldn’t get a note saying that
>>>         flag’s value was invalidated (I fear), but it’s more of a
>>>         should-not-have-happened case. For Example 2 however, we’d
>>>         get a pretty much perfect bug report.
>>>
>>>         On Fri, 31 May 2019 at 00:16, Kristóf Umann
>>>         <dkszelethus at gmail.com <mailto:dkszelethus at gmail.com>> wrote:
>>>
>>>
>>>
>>>             On Mon, 27 May 2019 at 19:48, Artem Dergachev
>>>             <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>> wrote:
>>>
>>>
>>>
>>>                 On 5/27/19 10:12 AM, Gábor Horváth wrote:
>>>                 > Hi!
>>>                 >
>>>                 > I wanted to share some of my points as well but
>>>                 had little time to do so.
>>>                 >
>>>                 > Artem identified two kinds of issues. One of which
>>>                 has all the
>>>                 > information available in the bug path (in his
>>>                 terminology:
>>>                 > must-have-happened) and the other which has not (
>>>                 > should-not-have-happened). The goal is the same in
>>>                 both of the cases,
>>>                 > identify all dependencies including control and
>>>                 data dependencies. The
>>>                 > main difference is that in one case we could use
>>>                 all the information
>>>                 > available in the bug path while in the other we
>>>                 cannot.
>>>                 >
>>>                 > 1. must-have-happened:
>>>                 > Since the bug-path contains all the information we
>>>                 need, I do agree
>>>                 > with Artem using bug path visitors might be
>>>                 sufficient. I do not know
>>>                 > how reliable `trackExpressionValue` is in
>>>                 identifying data
>>>                 > dependencies, but revising it could be one step in
>>>                 GSoC. For example,
>>>                 > after quickly skimming through the implementation
>>>                 I am not sure if it
>>>                 > is able to track the source of a constant value
>>>                 that was constant
>>>                 > folded during the symbolic execution from
>>>                 different values.
>>>
>>>                 It should be able to, because that's how inlined
>>>                 defensive check
>>>                 suppressions work.
>>>
>>>
>>>             I recently finished (not yet published) my work on
>>>             implementing control dependencies for clang's CFG via
>>>             LLVM's IDFCalculator. Theoretically speaking, by simply
>>>             tracking the variables in the block that is a control
>>>             dependency of the block that is being tracked, we should
>>>             be able to get a proper bug report for Example 1? From
>>>             afar, it seems reasonable.
>>>
>>>                 > Only if we had a way to test this functionality in
>>>                 isolation quickly
>>>                 > :) I think making that possible would also be a
>>>                 valuable addition.
>>>                 >
>>>                 > For control dependencies, it is great that LLVM
>>>                 already has some
>>>                 > algorithms to calculate dominator sets and it
>>>                 might be possible to
>>>                 > make that work for the Clang CFG.
>>>                 >
>>>                 > In my opinion, if you only tackle these problems
>>>                 by the end of the
>>>                 > summer your project already brought great value to
>>>                 the community.
>>>                 >
>>>                 > 2. should-not-have-happened
>>>                 > This category cannot be solved only using visitors
>>>                 for sure. So we
>>>                 > might end up with an implementation for this
>>>                 problem that is
>>>                 > completely separate from the previous one (maybe
>>>                 except for getting
>>>                 > control dependencies?). As Artem pointed out, we
>>>                 might query some
>>>                 > information from the analyzer e.g. which functions
>>>                 were inlined. One
>>>                 > option is to use a slicing algorithm here.
>>>                 However, we should make
>>>                 > sure that the way a function is conservatively
>>>                 evaluated by the
>>>                 > slicing algorithm and the analyzer is compatible.
>>>                 E.g.: if the
>>>                 > analyzer does not invalidate a memory region
>>>                 during evaluating the
>>>                 > call, the statement should not be considered as
>>>                 relevant for the
>>>                 > variable in question. We have several challenges
>>>                 here, the analyzer
>>>                 > reasons about memory regions while the slicing
>>>                 algorithm will probably
>>>                 > not. Also, we need to do something with pointers,
>>>                 since we do not want
>>>                 > to implement a full-fledged pointer analysis. We
>>>                 also need to decide
>>>                 > how to represent arrays, structures etc.
>>>                 >
>>>                 > I skimmed through the algorithm in the google
>>>                 docs, and I think there
>>>                 > are some other details we need to work out. For
>>>                 example, you never
>>>                 > remove anything from the set of relevant
>>>                 variables. Consider the
>>>                 > following example:
>>>                 > x = 5;
>>>                 > y = x + 2;
>>>                 > y = 2;
>>>                 > z = y + 3;
>>>                 >
>>>                 > Whit the slicing criterion (z = y + 3). Here, once
>>>                 y is overwritten
>>>                 > with 2, it is no longer relevant, sox should not
>>>                 end up in the slice.
>>>                 > The algorithm you proposed will not handle this
>>>                 correctly (yet). Also
>>>                 > I am not sure if we actually need
>>>                 therelevant_variable_map or having a
>>>                 > set will be sufficient.
>>>                 >
>>>                 > Regards,
>>>                 > Gabor
>>>                 >
>>>                 > On Mon, 27 May 2019 at 12:52, Kristóf Umann
>>>                 <dkszelethus at gmail.com <mailto:dkszelethus at gmail.com>
>>>                 > <mailto:dkszelethus at gmail.com
>>>                 <mailto:dkszelethus at gmail.com>>> wrote:
>>>                 >
>>>                 >     + Ádám Balogh
>>>                 >
>>>                 >     On Fri, 24 May 2019 at 23:31, Artem Dergachev
>>>                 <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>
>>>                 >     <mailto:noqnoqneo at gmail.com
>>>                 <mailto:noqnoqneo at gmail.com>>> wrote:
>>>                 >
>>>                 >         Ok, i think i have a little bit of clarity.
>>>                 >
>>>                 >         Let's first talk about must-have-happened
>>>                 problems. For instance,
>>>                 >
>>>                 >           int *foo() {
>>>                 >               return coin() ? new int : nullptr;
>>>                 // needs note
>>>                 >           }
>>>                 >           void bar() {
>>>                 >             int *x = foo();
>>>                 >             *x = 1; // warning: null dereference
>>>                 >           }
>>>                 >
>>>                 >         In this case trackExpressionValue solves the
>>>                 >  "must-have-happened" problem of "x must-have been
>>>                 assigned a
>>>                 >         null pointer value" by displaying a note
>>>                 when foo() returns
>>>                 >         nullptr. This currently more or less works
>>>                 correctly - there
>>>                 >         are a lot of bugs but the overall idea behind
>>>                 >  trackExpressionValue is correct.
>>>                 >
>>>                 >         Example 1 in your document is another
>>>                 example of a
>>>                 >         must-have-happened problem: in order for
>>>                 the report to be
>>>                 >         valid, we need to show that 'flag'
>>>                 must-have-changed between
>>>                 >         line 17 and line 13. That's something that
>>>                 the Static Analyzer
>>>                 >         currently cannot handle and you plan to
>>>                 improve upon it.
>>>                 >
>>>                 >         Hʏᴘᴏᴛʜᴇsɪs1. All "must-have-happened"
>>>                 problems should be
>>>                 >         solved with a bug visitor. We don't need
>>>                 any new analysis
>>>                 >         algorithm for that.
>>>                 >
>>>                 >         In particular, i believe that Example 1
>>>                 can be solved by
>>>                 >         extending trackExpressionValue() with a
>>>                 bug visitor that
>>>                 >         detects control-flow-dependencies via this
>>>                 approach of yours:
>>>                 >
>>>                 >         > "Check whether the statement is a
>>>                 control dependency of
>>>                 >         Statement 18 with dominator sets: 18
>>>                 doesn't post dominate 17,
>>>                 >         but every statement in between them does,
>>>                 meaning that 17 is a
>>>                 >         control dependency of 18."
>>>                 >
>>>                 >         I.e., while ascending through
>>>                 ExplodedGraph, we pick
>>>                 >         interesting statements and perform this
>>>                 domination check on
>>>                 >         their predecessor nodes until we reach the
>>>                 control dependency,
>>>                 >         then we track the control dependency
>>>                 recursively by adding
>>>                 >         more visitors.
>>>                 >
>>>                 >         I think this is the first thing we should
>>>                 try on our GSoC.
>>>                 >
>>>                 >         The reason why i believe that Hypothesis 1
>>>                 is true is that all
>>>                 >         the information that we need is fully
>>>                 contained in the bug
>>>                 >         path. If something must have happened on
>>>                 the path, then it's
>>>                 >         probably in the path and we can figure it
>>>                 out by looking at
>>>                 >         the path. If something must have happened
>>>                 on the path and it's
>>>                 >         not in the path, then why did we emit a
>>>                 warning in the first
>>>                 >         place?
>>>                 >
>>>                 >         ==============
>>>                 >
>>>                 >         Now, to should-not-have-happened problems.
>>>                 The main motivating
>>>                 >         example is:
>>>                 >
>>>                 >           void foo(int *y) {
>>>                 >               if (coin()) // needs note
>>>                 >                 *y = 1;
>>>                 >           }
>>>                 >           void bar() {
>>>                 >             int x;
>>>                 >             foo(&x);
>>>                 >             use(x); // warning: use of
>>>                 uninitialized value
>>>                 >           }
>>>                 >
>>>                 >         In order to make the warning
>>>                 comprehensible, we have to
>>>                 >         explain to the user why do we think that
>>>                 'x' is uninitialized,
>>>                 >         as it clearly "should-not-have-been"
>>>                 initialized for the
>>>                 >         warning to be correct. For that purpose it
>>>                 is absolutely
>>>                 >         essential to display the execution path
>>>                 within the inlined
>>>                 >         call to foo().
>>>                 >
>>>                 >         One way to achieve that would be to
>>>                 display *all* execution
>>>                 >         path and leave it up to the user to see
>>>                 that the event that
>>>                 >  should-not-have-happened has indeed never happened.
>>>                 >
>>>                 >         That, however, would be overwhelming, so
>>>                 we use "path pruning"
>>>                 >         that cuts away execution paths in inlined
>>>                 calls that are known
>>>                 >         to have no interesting events happening.
>>>                 Which, in turn, makes
>>>                 >         us incapable of solving
>>>                 should-not-have-happened problems, as
>>>                 >         it's the whole point of the problem to
>>>                 display execution path
>>>                 >         in inlined functions in which nothing has
>>>                 happened.
>>>                 >
>>>                 >         In order to figure this out, we need to
>>>                 learn how to
>>>                 >         discriminate between inlined calls in
>>>                 which nothing
>>>                 >         interesting *could* have happened in the
>>>                 first place and
>>>                 >         inlined calls in which something
>>>                 interesting could have
>>>                 >         happened but *didn't*.
>>>                 >
>>>                 >         NoStoreFuncVisitor solves this problem for
>>>                 the example above.
>>>                 >         The visitor notices that the local
>>>                 variable is passed by
>>>                 >         non-const pointer into the inlined call
>>>                 and the call
>>>                 >         syntactically contains assignments through
>>>                 this pointer,
>>>                 >         therefore this call *could* have
>>>                 initialized the value. The
>>>                 >         visitor then suppresses pruning of this
>>>                 call by emitting an
>>>                 >         interesting note within the call
>>>                 ("returning without
>>>                 >         initializing '*y'"...).
>>>                 >
>>>                 >         However, AST-based analysis in
>>>                 NoStoreFuncVisitor is very
>>>                 >         primitive and easy to trick. A more
>>>                 sophisticated analysis is
>>>                 >         clearly necessary.
>>>                 >
>>>                 >         The second thing we can try in our GSoC is
>>>                 to come up with a
>>>                 >         better analysis specifically for the
>>>                 uninitialized value
>>>                 >         checker, because we already have a place
>>>                 to stick it into and
>>>                 >         we know how exactly do we want to consume
>>>                 the result of such
>>>                 >         analysis and evaluate it.
>>>                 >
>>>                 >         The third thing we can try in our GSoC is
>>>                 to come up with more
>>>                 >         such analysis for other checkers and other
>>>                 >  should-have-happened problems and see if a
>>>                 reusable framework
>>>                 >         for creating such analyses can be implemented.
>>>                 >
>>>                 >
>>>                 >         On 5/23/19 1:25 PM, Kristóf Umann wrote:
>>>                 >>
>>>                 >>
>>>                 >>         On Thu, 23 May 2019 at 21:58, Kristóf Umann
>>>                 >>         <dkszelethus at gmail.com
>>>                 <mailto:dkszelethus at gmail.com>
>>>                 <mailto:dkszelethus at gmail.com
>>>                 <mailto:dkszelethus at gmail.com>>> wrote:
>>>                 >>
>>>                 >>             Please let me know if I didn't
>>>                 address something you
>>>                 >>             mentioned!
>>>                 >>
>>>                 >>             On Thu, 23 May 2019 at 02:29, Artem
>>>                 Dergachev
>>>                 >>             <noqnoqneo at gmail.com
>>>                 <mailto:noqnoqneo at gmail.com>
>>>                 <mailto:noqnoqneo at gmail.com
>>>                 <mailto:noqnoqneo at gmail.com>>> wrote:
>>>                 >>
>>>                 >>                 My primary question is, how do
>>>                 you plan to use the
>>>                 >>                 data that you obtain via your
>>>                 analysis?
>>>                 >>
>>>                 >>
>>>                 >>              Gather relevant statements. Keep
>>>                 ExplodedNodes in the
>>>                 >>             bugpath for which
>>>                 PathDiagnosticLocation::getStmt() is in
>>>                 >>             the set (or, something like that at
>>>                 least). But honestly,
>>>                 >>             I didn't think too much about that
>>>                 yet :^) I guess we
>>>                 >>             could just gather ExplodedNodes
>>>                 rather than statements,
>>>                 >>             we'll see.
>>>                 >>
>>>                 >>                 Like, do you plan to backtrack
>>>                 the value of all
>>>                 >>  relevant variables and/or expressions in the final
>>>                 >>                 bug report that were also
>>>                 encountered along the bug
>>>                 >>                 path? If yes, then is it possible
>>>                 that the slicing
>>>                 >>  criterion gets updated in the process and you'll
>>>                 have
>>>                 >>                 to re-run the CFG-based analysis
>>>                 to take it into account?
>>>                 >>
>>>                 >>
>>>                 >>             No, the slicing criterion is what it
>>>                 is, and will not
>>>                 >>             change. The set of relevant
>>>                 statements to the slicing
>>>                 >>             criterion define the actual program
>>>                 slice, which is
>>>                 >>             basically the thing we're going for
>>>                 in this project. When
>>>                 >>             it turns out that a value of a
>>>                 variable directly affects
>>>                 >>             a variable in the slicing criterion
>>>                 (either through data
>>>                 >>             or control dependency), we just add
>>>                 it to the set of
>>>                 >>             relevant variables, and then if
>>>                 something in the set of
>>>                 >>             relevant variables is affected (some
>>>                 statements or
>>>                 >>             variables turn out to be a
>>>                 control/data dependencies to
>>>                 >>             them), then it's added to the
>>>                 respective relevancy sets.
>>>                 >>             Should we only traverse the basic
>>>                 blocks the analyzer
>>>                 >>             visited, I think we can pull off the
>>>                 slicing with a
>>>                 >>             single pass.
>>>                 >>
>>>                 >>
>>>                 >>         We'll see about that single pass thing
>>>                 though. I'm gathering
>>>                 >>         some tricky examples in the document I
>>>                 shared and working on
>>>                 >>         a neat algorithm.
>>>                 >>
>>>                 >>                 > What would also be really great
>>>                 is to assist this
>>>                 >>  traversal with the information the analyzer already
>>>                 >>                 has, essentially only inspecting
>>>                 basic blocks the
>>>                 >>  analyzer actually visits.
>>>                 >>
>>>                 >>                 You mean visits on the current
>>>                 bug path or visits on
>>>                 >>                 the equivalence class of bugs or
>>>                 visits in general
>>>                 >>                 during analysis?
>>>                 >>
>>>                 >>  Regardless, for many problems ideally we should
>>>                 >>  traverse basic blocks that the analyzer *didn't*
>>>                 >>                 visit (eg., if the function
>>>                 didn't initialize the
>>>                 >>  variable on the current path, we're interested in
>>>                 >>                 parts of the code in which it
>>>                 *did* initialize the
>>>                 >>  variable, even though we didn't necessarily have
>>>                 time
>>>                 >>                 to visit them).
>>>                 >>
>>>                 >>
>>>                 >>             Well I mean slicing by definition
>>>                 isn't intended to do
>>>                 >>             that. The entire point of it is to
>>>                 gather the smallest
>>>                 >>             slice related to the slicing
>>>                 criterion, and my project
>>>                 >>             aims to fill in the gaps where we
>>>                 don't provide enough
>>>                 >>  information.
>>>                 >>
>>>                 >>                 It actually sounds as if all
>>>                 problems that we're
>>>                 >>                 trying to solve here can be
>>>                 classified into
>>>                 >>  "must-have-happened" problems (eg., "variable
>>>                 >>  must-have been declared without initialization",
>>>                 >>  "variable must-have been set to nullptr", "memory
>>>                 >>  must-have been allocated") and
>>>                 >>  "should-not-have-happened" problems (eg., "variable
>>>                 >>  should-not-have been initialized", "null value
>>>                 >>  should-not-have been overwritten", "pointer
>>>                 >>  should-not-have escaped").
>>>                 >>
>>>                 >>                 For must-have-happened problems,
>>>                 i'm recently under
>>>                 >>                 an impression that we should
>>>                 suppress the bug report
>>>                 >>  entirely when we fail to solve them (i.e., if
>>>                 fail to
>>>                 >>                 explain to the user where exactly
>>>                 does this happen,
>>>                 >>                 then the report is impossible to
>>>                 understand anyway).
>>>                 >>                 This is currently a very big
>>>                 problem for our null
>>>                 >>  dereference checker on some codebases, especially
>>>                 >>                 because it uses this tracking for
>>>                 suppressions as
>>>                 >>                 well (aka inlined defensive
>>>                 checks), so when it fails
>>>                 >>                 to track the variable it's likely
>>>                 a legit false
>>>                 >>  positive as well, not simply a
>>>                 hard-to-understand report.
>>>                 >>
>>>                 >>
>>>                 >>             I think this set calculating approach
>>>                 inherently gathers
>>>                 >>             far more information, allowing us to
>>>                 make better
>>>                 >>             judgement on whether we should
>>>                 suppress the report.
>>>                 >>
>>>                 >>                 For should-not-have-happened
>>>                 problems i'm much more
>>>                 >>  confused. We're talking about looking for places
>>>                 >>                 where it *could* have happened
>>>                 and then trying to
>>>                 >>                 explain to the user why none of
>>>                 them have actually
>>>                 >>  happened. I'm not sure what are the consequences of
>>>                 >>                 failing to explain to the user
>>>                 why didn't a
>>>                 >>  particular piece of code do something, because i've
>>>                 >>                 no idea how do users intuitively
>>>                 figure out which
>>>                 >>                 code *could* have done these
>>>                 things and which clearly
>>>                 >>  couldn't.
>>>                 >>
>>>                 >>
>>>                 >>             Im really at a loss here :) Can you
>>>                 provide some example
>>>                 >>             as to what a problematic
>>>                 "must-have-happened" bug report
>>>                 >>             would look like, and how would you
>>>                 like to see it
>>>                 >>             improved? Same for
>>>                 "should-not-have-happened". Because as
>>>                 >>             I understand it, and I might be
>>>                 wrong, you have problems
>>>                 >>             with this:
>>>                 >>
>>>                 >>             int a; // declaring a without
>>>                 initializing
>>>                 >>
>>>                 >>             if (rand()) // assuming that the
>>>                 condition is false
>>>                 >>             a = 5;
>>>                 >>
>>>                 >>             print(a); // a is undef
>>>                 >>
>>>                 >>             And prefer to see this:
>>>                 >>
>>>                 >>             int a; // declaring a without
>>>                 initializing
>>>                 >>
>>>                 >>             if (rand()) // should this condition
>>>                 be false, a's value
>>>                 >>             will be indeterministic
>>>                 >>             a = 5; // writing to a skipped
>>>                 >>
>>>                 >>             print(a); // a is undef
>>>                 >>
>>>                 >>             and I just don't see how this would
>>>                 be possible with
>>>                 >>             slicing at all. Also, I can't see how
>>>                 this would scale to
>>>                 >>             real-world production code.
>>>                 >>
>>>                 >>
>>>                 >>                 On 5/22/19 4:11 PM, Kristóf Umann
>>>                 wrote:
>>>                 >>>
>>>                 >>>                 Hi!
>>>                 >>>
>>>                 >>>
>>>                 >>>                 I'd like to share some thoughts
>>>                 about my GSoC
>>>                 >>>  project, "Enhancing bug reporting with static
>>>                 >>>  backward program slicing"[1].
>>>                 >>>
>>>                 >>>
>>>                 >>>                 My proposal is very specific
>>>                 about whatI'm aiming to
>>>                 >>>  improve on, but vague on the howpart of it. This
>>>                 >>>  mail aims to add clarify some of this.
>>>                 >>>
>>>                 >>>
>>>                 >>>  Program slicing is essentially a technique of
>>>                 >>>  discovering data and control dependencies to the
>>>                 >>>  slicing criterion, which is a (statement, {set of
>>>                 >>>  variables}) pair. Fortunately, tools for control
>>>                 >>>  dependencies, namely, dominator set
>>>                 calculations are
>>>                 >>>  already implemented, but seems to be unstable with
>>>                 >>>  clang's CFG. It would be a great tool if I were
>>>                 able
>>>                 >>>                 to fix it.
>>>                 >>>
>>>                 >>>
>>>                 >>>  While my proposal states that I plan to
>>>                 implement an
>>>                 >>>  AST-based solution, I'm actually not sure that this
>>>                 >>>                 is the optimal approach. We
>>>                 could instead inspect
>>>                 >>>                 CFG blocks in a backwards manner
>>>                 (coupled with the
>>>                 >>>  analyzer's call stack), and gather relevant
>>>                 variable
>>>                 >>>                 in the meanwhile.
>>>                 >>>
>>>                 >>>
>>>                 >>>  What would also be really great is to assist this
>>>                 >>>  traversal with the information the analyzer already
>>>                 >>>  has, essentially only inspecting basic blocks the
>>>                 >>>  analyzer actually visits.
>>>                 >>>
>>>                 >>>
>>>                 >>>  Here’s my idea for an algorithm (from the point of
>>>                 >>>                 the slicing criterion already
>>>                 being constructed):
>>>                 >>>
>>>                 >>>
>>>                 https://docs.google.com/document/d/1Lx867o3meyQsj0WKOSWMdosSBdw2MUq1aIxyPJM6iLU/edit?usp=sharing
>>>                 >>>
>>>                 >>>
>>>                 >>>
>>>                 >>>  Please note that this is a veeery rough sketch, I
>>>                 >>>  didn't think about every edge case that exists, but
>>>                 >>>                 I think it's an okay start.
>>>                 >>>
>>>                 >>>
>>>                 >>>                 [1]
>>>                 >>>
>>>                 https://docs.google.com/document/d/1ci1BCAKojPlqIxIw1J_K2dnATA3z01PuwR_vHJS55TI/edit
>>>                 >>>
>>>                 >>
>>>                 >
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190531/ace2d4ad/attachment.html>


More information about the cfe-dev mailing list