[cfe-dev] [analyzer][GSoC] Problem Statement: Improving BugReporter with static backward program slicing

Wed Apr 3 15:45:05 PDT 2019

If i understand correctly (hmm, there seem to be image problems again), 
in this bug report it's about doing trackNullOrUndefValue() over `n` 
whenever we track the uninitialized value back to a load from an array 
with index `n` - i think it easily gets us straight to `a = 3` and it 
doesn't require introducing any new analyses.

On 4/3/19 3:35 PM, Kristóf Umann wrote:
> Thank you for giving this this much thought!!! I reply somewhat slowly 
> because I'm trying to keep things fairly formal and precise, an area 
> in which I still have to improve on.
>
> So, in short, Example 1. was a poor choice. You're right, the /value 
> /of <dynamically allocated int object> isn't affected by what points 
> to it -- using the traditional Weiser algorithm wouldn't contain 
> statement 5 and 6. In a sense, your mutex code is similar to Example 
> 1, we're not interested, at least in this case, what the actual value 
> of the mutex is.
>
> For now, I'm only planning to use this technique to reason about 
> /values /of certain variables. Example 1. wanted to demonstrate how 
> BugReporter is prone to make a too short of a bugpath -- however, the 
> following example can show this as well:
>
> *asd.cpp*
> 1  void useInt(int);
> 2
> 3  int getInt(int x) {
> 4    int a;
> 5
> 6    if (x > 0)
> 7      a = 3;
> 8    else
> 9      a = 2;
> 10
> 11   return a;
> 12 }
> 13
> 14 int g();
> 15
> 16 int main() {
> 17   int arr[10];
> 18
> 19   for (int i = 0; i < 3; ++i)
> 20     arr[i] = 0;
> 21
> 22   int x = g();
> 23   int n = getInt(x);
> 24  useInt(arr[n]);
> 25 }
>
> image.png
>
> The attached ExplodedGraph demonstrates that the analyzer assumed 
> that n == 3, but the bugreport doesn't mention that.
>
> Alright, with my /original point /being made properly, let's address 
> the raised questions.
>
> *Q: *My primary question is how much do you think this will put stress 
> on the checkers. Like, how much more difficult would it be to write 
> checkers when we require them to include the slicing criterion with 
> their bug reports? How much of it would we (i.e., BugReporter) be able 
> to infer ourselves?
> *
> *
> *A: *As stated in this response, this approach would only consider the 
> value of variables. This could essentially be regarded as a more 
> general, better approach to bugreporter::trackNullOrUndefValue() and 
> similar functions. It would be great however to make this general 
> enough to eventually also handle whatever properties (e.g. lockedness) 
> regions might have. My worry is that it would offer little gain for a 
> lot of chore, the current BugReporterVisitor system (especially with 
> your easier-to-implement NoteTag system) gets the job done just fine. 
> Most of it, as I know, boils down to "Let's find the specific 
> ExplodedNode where the property of this region was changed", and a 
> simple traversal of the bugpath is all you need for that.
>
> With that being said, I plan to construct the set of variables from 
> the interesting symbol set, and this API is already present in the 
> code. We'll see how that goes though :^).
>
> However, liveness of variables is a very interesting topic. I really 
> need to do some research on this before elaborating more, but I'll try 
> to see how we could tackle this with backward slicing.
>
> *Q:* Instead we can also have a look at the execution path on the 
> ExplodedGraph that goes through the true-branch and see if the value 
> remains null when it exits the if-branch. Kristof, you're saying that 
> you plan to do the slicing over the ExplodedGraph - is that what you 
> mean? This could work as long as the other branch is actually 
> *explored* by the analyzer. If it isn't, it'll be almost impossible to 
> detect, and i don't know how often would this happen, but there's a 
> good chance it's going to be much more rare than us having problems 
> with such highlighting right now.
>
> We can also come up with a completely separate CFG-based analysis for 
> this purpose. This is probably the most precise thing to do, because 
> the problem is essentially an all-paths problem (which is why loss of 
> coverage in the ExplodedGraph would bite us). I cannot estimate how 
> difficult such analysis would be to implement (probably pretty scary), 
> but i think Kristof wasn't looking in this direction.
>
> *A: *Well, I originally imagined it to implement this on the 
> ExplodedGraph, but if achievable, a CFG based solution would indeed be 
> the ideal thing. This is some great feedback -- Again, I'll follow up 
> with this after some more research.
>
> -----------------
>
> In the meanwhile (pretty much before I'm done with the proposal), I 
> won't have the time to interact with Phabricator much though :^)
>
> On Wed, 3 Apr 2019 at 23:40, Gábor Horváth <xazax.hun at gmail.com 
> <mailto:xazax.hun at gmail.com>> wrote:
>
>     I just realized after reading your answer how ambiguous my email
>     was. When I said it is possible to trick the NoStoreFuncVisitor
>     what I meant was to prevent it from emitting notes and thus
>     pruning interesting parts of the bug path. But I am glad that we
>     are actually on the same page regarding the value of detecting
>     control dependencies. The example 4 you provided is a very
>     compelling one :)
>
>     On Wed, 3 Apr 2019 at 23:16, Artem Dergachev <noqnoqneo at gmail.com
>     <mailto:noqnoqneo at gmail.com>> wrote:
>
>         The thing in NoStoreFuncVisitor for IVars looks like it's just
>         saying "let's see if there's a data dependency on at least one
>         statement in this call". Given that the statement found by
>         such AST matcher was not executed on the bug's execution path,
>         there must be at least one control flow dependency in this
>         function as well.
>
>         I think this pattern-matching may only have false negatives,
>         i.e. it's possible to write into a variable/field/ivar without
>         producing an assignement-operator to a DeclRefExpr for that
>         variable/field/ivar, but it's not possible to write an
>         assignment into a DeclRefExpr for that variable/field/ivar
>         without modifying the variable/field/ivar in the process. A
>         more sophisticated analysis may find situations when such
>         write is always done via an aliasing pointer, but i cannot
>         think of anything else. Well, there's also the self-assignment
>         cornercase, i.e. `x = x`, but that's esoteric.
>
>         ------------------------
>
>         Ok, i think i understand: one of the great goals for this
>         project would be to correctly highlight control flow
>         dependencies, which we currently don't do. I.e.:
>
>         *Example 4.*
>
>         01  int flag;
>         02
>         03  bool coin();
>         04
>         05  void foo() {
>         06    flag = coin();
>         07  }
>         08
>         09  void bar() {
>         10    int *x = 0;
>         11    // Set the flag to true.
>         12    flag = true;
>         13    foo();
>         14    if (flag) { // Taking false branch... wait, what?
>         15      x = new int;
>         16    }
>         17    foo();
>         18    if (flag) { // Now it's taking true branch again?!
>         19      *x = 1; // Null dereference.
>         20    }
>         21  }
>
>         Because we wouldn't highlight the data dependency on line 06
>         of the control flow dependency on lines 14 and 18, the
>         positive may be incomprehensible.
>
>         This one's tricky to do with a visitor because it's hard to
>         figure out that the true-branch of the if-statement on line 14
>         would have updated `x`, given that we didn't execute it.
>
>         (1) Again, we can do a syntactic match over the true-branch,
>         like we did in NoStoreFuncVisitor, to see if there are any
>         data depenencies within it (there are, line 15). Again, this
>         may fail to cover the tricky cases (what if this branch
>         instead calls a function that initializes `x`?).
>
>         (2) Instead we can also have a look at the execution path on
>         the ExplodedGraph that goes through the true-branch and see if
>         the value remains null when it exits the if-branch. Kristof,
>         you're saying that you plan to do the slicing over the
>         ExplodedGraph - is that what you mean? This could work as long
>         as the other branch is actually *explored* by the analyzer. If
>         it isn't, it'll be almost impossible to detect, and i don't
>         know how often would this happen, but there's a good chance
>         it's going to be much more rare than us having problems with
>         such highlighting right now.
>
>         (3) We can also come up with a completely separate CFG-based
>         analysis for this purpose. This is probably the most precise
>         thing to do, because the problem is essentially an all-paths
>         problem (which is why loss of coverage in the ExplodedGraph
>         would bite us). I cannot estimate how difficult such analysis
>         would be to implement (probably pretty scary), but i think
>         Kristof wasn't looking in this direction.
>
>         Yay, i finally understand how to solve these problems we have!
>
>         I'd suggest first starting with a syntax-based approach (1)
>         because it sounds to me that the approach that we chose for
>         identifying dependencies on paths that aren't part of the bug
>         path is orthogonal to actually making use of that information
>         to produce notes. Once we learn how to make use of this
>         information, we'll have a taste of this medicine and see if it
>         works. Then we'll see if we need to transition to (2) or (3)
>         depending on such evaluation.
>
>         What i just described sounds like a fairly realistic plan to
>         me. Of course, the question still remains about how
>         checker-specific would the analysis be. If i learned anything,
>         it's "don't estimate the power of the esoteric checker
>         contracts". But the idea that we can often get away with just
>         tracking interesting regions and symbols and consuming
>         trackExpressionValue() hints sounds like a good thing to try.
>
>
>
>         On 4/2/19 11:23 PM, Gábor Horváth wrote:
>>         Yeah, this particular example would be fixed, but we do not
>>         know how noisy it would be in general. One way to make it a
>>         bit less noisy could be something like what the
>>         NoStoreFuncVisitor already does for IVars. We could also add
>>         some syntactic checks to see if the method/function actually
>>         has the possibility to escape the region.
>>
>>         I am not familiar with Obj-C, but I suspect it would be
>>         possible to trick NoStoreFuncVisitor, or at least the
>>         syntactic checks in `potentiallyWritesIntoIvar `.
>>         I think the main value of slicing here could be to get rid of
>>         those fully syntactic heuristics and replace them by
>>         something smarter. Does this make sense?
>>
>>         On Wed, 3 Apr 2019 at 02:25, Artem Dergachev
>>         <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>> wrote:
>>
>>             One more question about Example 1. Do you think this
>>             particular example could be solved by developing some
>>             sort of NoStoreFuncVisitor but for MallocChecker? I.e.,
>>             in addition to emitting notes like "returning without
>>             initializing variable `x`", it could emit notes like
>>             "returning without escaping pointer `p`". If such visitor
>>             is developed, what would be the differences between your
>>             proposal and whatever behavior we'll get with such
>>             visitor? In other words, is it possible to construct an
>>             example with the "use of uninitialized variable" checker
>>             that, like Example 1, has vital pieces of information
>>             missing, given that this checker already uses
>>             NoStoreFuncVisitor?
>>
>>             I mean, it's most likely a terrible idea to develop such
>>             visitor for MallocChecker, because the only reason this
>>             visitor works more or less reliably is the existence of
>>             const-qualifiers. For MallocChecker they don't exist, so
>>             it's going to be hard to figure out if a function was
>>             *supposed* to escape the pointer.
>>
>>             On 4/2/19 5:14 PM, Artem Dergachev wrote:
>>>             Ahaaaaa, ooooookkkkk, iiiiiiii seeeeeeeeee!
>>>
>>>             Sounds fascinating, must-have and hard.
>>>
>>>             My primary question is how much do you think this will
>>>             put stress on the checkers. Like, how much more
>>>             difficult would it be to write checkers when we require
>>>             them to include the slicing criterion with their bug
>>>             reports? How much of it would we (i.e., BugReporter) be
>>>             able to infer ourselves?
>>>
>>>             ------------------------
>>>
>>>             Say, let's take Example 1. You describe the slicing
>>>             criterion as:
>>>
>>>                 (13, <dynamically allocated int object>)
>>>
>>>             The value of the dynamically allocated into object
>>>             remains the same regardless of whether the object is
>>>             stored in global_ptr or not, so the slice doesn't need
>>>             to include line 5 or 6. Therefore i think that the
>>>             slicing criterion you proposed is not what we're looking
>>>             for, and the real slicing criterion we're looking for is:
>>>
>>>                 (13, <liveness of <dynamically allocated int object>>)
>>>
>>>             Like, we're mapping every dynamically allocated object
>>>             to an imaginary heap location that represents its
>>>             current liveness, and include that imaginary variable,
>>>             rather than the object itself, in the slicing criterion.
>>>             Line 6 would affect liveness of the object on line 13
>>>             (if it's stored into a global, it's alive; otherwise
>>>             it's not), therefore it'll be part of the slice. So, am
>>>             i understanding correctly that you're proposing a
>>>             checker API that'll look like this:
>>>
>>>               BugReport *R = new BugReport(Node, ...);
>>>             R->addLivenessBasedSlicingCriterion(HeapSym);
>>>
>>>             ?
>>>
>>>             I guess we can infer the statement of the slicing
>>>             criterion from the Node, but i'm not entirely sure; see
>>>             also below.
>>>
>>>             I'd actually love it if you elaborate this example
>>>             further because it's fairly interesting. Like, we know
>>>             that the assignment affects the liveness information,
>>>             but how would the slicing algorithm figure this out? Do
>>>             you have a step-by-step description of how the algorithm
>>>             behaves in this case?
>>>
>>>             ------------------------
>>>
>>>             Let's take another example i just came up with. Consider
>>>             alpha.unix.PthreadLock - a checker that finds various
>>>             bugs with mutexes, such as double locks or double
>>>             unlocks. Consider code:
>>>
>>>             1  pthread_mutex_t mtx1, mtx2;
>>>             2
>>>             3  void foo(bool x1, bool x2) {
>>>             4    if (x1)
>>>             5      pthread_mutex_lock(&mtx1);
>>>             6    if (x2)
>>>             7      pthread_mutex_lock(&mtx1);
>>>             8    // ...
>>>             9  }
>>>
>>>             In this example we'll report a double lock bug due to a
>>>             copy-paste error: line 7 should probably lock &mtx2
>>>             rather than &mtx1.
>>>
>>>             The whole program is relevant to the report. Without
>>>             line 5, there would only be a single lock. It
>>>             transitions the mutex object from state "unknown" to
>>>             state "locked".
>>>
>>>             I don't think the Analyzer would currently do a bad job
>>>             at explaining the bug; it's pretty straightforward.
>>>
>>>             What would be the slicing criterion in this case? As far
>>>             as i understand, it'll be
>>>
>>>                 (7, <locked-ness of mtx1>)
>>>
>>>             In this case "lockedness" is, again, an "imaginary heap
>>>             location" that contains metadata for the mutex. More
>>>             specifically, it is a GDM map value. How would a checker
>>>             API look in this case? How would it even describe to the
>>>             BugReporter what to look at, given that currently only
>>>             the checker understands how to read its GDM values?
>>>
>>>             My random guess is:
>>>
>>>               BugReport *R = new BugReport(Node, ...);
>>>             R->addGDMBasedSlicingCriterion<MutexState>(MutexMR);
>>>
>>>             ------------------------
>>>
>>>             So it sounds to me that your project is, like many other
>>>             awesome projects, brings in, as a dependency, replacing
>>>             GDM with a better, smarter data map that the analyzer
>>>             core *can* introspect and manipulate without the direct
>>>             help from the checkers. It might be that your project
>>>             actually requires relatively little amounts of such
>>>             introspection - i.e., you might be able to get away with
>>>             "it's some GDM map and the value for this key has
>>>             changed, therefore this statement is a relevant data
>>>             dependency" for quite a lot of checkers.
>>>
>>>             That's the subject i was also bringing up as part of
>>>             https://reviews.llvm.org/D59861 - "this was the plan
>>>             that is part of the even-more-long-term plan of
>>>             completely deprecating the void*-based Generic Data
>>>             Map...". This was also something that became a blocker
>>>             on my old attempt to introduce summary-based analysis.
>>>             It required stuffing fairly large chunks of code into
>>>             every checker that would teach the checker how to
>>>             collect summary information and apply it when the
>>>             summary is applied, and this new boilerplate was fairly
>>>             brain-damaging to implement and hard to get right or
>>>             even to explain. This problem also blocks other
>>>             projects, such as state merging / loop widening (on one
>>>             branch the mutex is locked, on the other branch it is
>>>             unlocked, hey checker please teach me how to merge this).
>>>
>>>             The variety of slicing criteria (that is a consequence
>>>             of the variety of checkers that we have) sounds pretty
>>>             scary to me. Some of them are really complicated, eg.
>>>             the liveness criterion. Making sure that the generic
>>>             algorithm works correctly with at least a significant
>>>             chunk of them is going to be fun. Making sure it can
>>>             actually handle arbitrary criterions required by the
>>>             checkers without having every checker come with its own
>>>             boilerplate for slicing also sounds hard. And it'll be
>>>             very bad if we have to tell our checker developers "hey,
>>>             by the way, you also need to know what backward slicing
>>>             is and write down part of the algorithm in order to ever
>>>             get your checker enabled by default" - i'm already
>>>             feeling pretty bad explaining dead symbols and pointer
>>>             escapes (which are pretty much must-have in most
>>>             checkers), these are definitely examples of a
>>>             boilerplate that a smarter version of GDM could handle
>>>             automatically. In order to conquer the world, i think we
>>>             should stick to our "writing a checker in 24 hours"
>>>             utopia: writing a checker should be as easy as writing
>>>             down the transfer functions for relevant statements. In
>>>             my opinion, we should take it as an important constraint.
>>>
>>>
>>>
>>>
>>>             On 4/2/19 12:21 PM, Kristóf Umann wrote:
>>>>             Somehow the images and the attached files were left
>>>>             out, please find them here:
>>>>
>>>>             On Tue, 2 Apr 2019 at 21:16, Kristóf Umann
>>>>             <dkszelethus at gmail.com <mailto:dkszelethus at gmail.com>>
>>>>             wrote:
>>>>
>>>>                 Hi!
>>>>
>>>>                 In this letter, I'd like to describe a particular
>>>>                 problem in the Clang StaticAnalyzer's BugReporter
>>>>                 class that I'd like to tackle in a Google Summer of
>>>>                 Code project this summer. I'll show real-world
>>>>                 examples on some of its shortcomings, and propose a
>>>>                 potential solution using static backward program
>>>>                 slicing. At last, I'll ask some specific questions.
>>>>                 I'd love to hear any and all feedback on this!
>>>>
>>>>                 This is a /problem statement, /not a /proposal/. I
>>>>                 plan to send out a formal proposal by Friday (but
>>>>                 not later then Saturday), that will contain more
>>>>                 details on both the problem and the solution. I
>>>>                 don't introduce myself or my previous works within
>>>>                 the project, that will also be detailed in my
>>>>                 upcoming letter. I also plan to incorporate the
>>>>                 feedback I'll receive to this letter.
>>>>
>>>>                 *---=== BugReporter constructs bad reports ===---*
>>>>                 *
>>>>                 *
>>>>                 *What does the BugReporter do?*
>>>>
>>>>                 After the Static Analyzer found an error, the
>>>>                 BugReporter receives an ExplodedNode, which,
>>>>                 accompanied by its predecessors, contains all the
>>>>                 information needed to reproduce that error. This
>>>>                 /bugpath /is then shortened with a variety of
>>>>                 heuristics, such as removing unrelated function
>>>>                 calls, unrelated branches and so on. BugReporter by
>>>>                 the end of this process will construct a
>>>>                 PathDiagnostic object for each report, that is,
>>>>                 ideally, minimal.
>>>>                 *
>>>>                 *
>>>>                 *Example 1.*
>>>>
>>>>                 Consider the following code example:
>>>>
>>>>                 1 // leak.cpp
>>>>                 2 int *global_ptr;
>>>>                 3
>>>>                 4 void save_ext(int storeIt, int *ptr) {
>>>>                 5   if (storeIt)
>>>>                 6     global_ptr = ptr;
>>>>                 7 }
>>>>                 8
>>>>                 9 void test(int b) {
>>>>                 10  int *myptr = new int;
>>>>                 11  save_ext(b, myptr);
>>>>                 12  delete global_ptr;
>>>>                 13 }
>>>>
>>>>                 It's clear that if test is invoked with b's value
>>>>                 set to true, there is no error in the program.
>>>>                 However, should b be false, we'll leak memory.
>>>>
>>>>                 $ clang -cc1 -analyze
>>>>                 -analyzer-checker=core,cplusplus leak.cpp
>>>>                 image.png
>>>>
>>>>                 The Static Analyzer is able to catch this error,
>>>>                 but fails to mention the call to save_ext entirely,
>>>>                 despite the error only occurring because the
>>>>                 analyzer assumed that storeIt is false. I've also
>>>>                 attached the exploded graph leak.svg that
>>>>                 demonstrates this.
>>>>
>>>>                 *Example 2.*
>>>>
>>>>                 Consider the following code example:
>>>>
>>>>                 1 // divbyzero.cpp
>>>>                 2 void f() {
>>>>                 3   int i = 0;
>>>>                 4   (void) (10 / i);
>>>>                 5 }
>>>>                 6
>>>>                 7 void g() { f(); }
>>>>                 8 void h() { g(); }
>>>>                 9 void j() { h(); }
>>>>                 10 void k() { j(); }
>>>>
>>>>                 Its clear that a call to f will result in a
>>>>                 division by zero error.
>>>>
>>>>                 $ clang -cc1 -analyze -analyzer-checker=core
>>>>                 divbyzero.cpp
>>>>                 image.png
>>>>                 Again, the Static Analyzer is plenty smart enough
>>>>                 to catch this, but the constructed bug report is
>>>>                 littered with a lot of useless information -- it
>>>>                 would be enough to only show the body of f, and,
>>>>                 optionally where f itself was called. For the sake
>>>>                 of completeness, I've attached divbyzero.svg that
>>>>                 contains the exploded graph for the above code snippet.
>>>>
>>>>                 The above examples demonstrate that BugReporter
>>>>                 sometimes reduces the bugpath too much or too little.
>>>>
>>>>                 *---=== Solving the above problem with static
>>>>                 backward program slicing ===---*
>>>>
>>>>                 *What is static backward program slicing?*
>>>>
>>>>                 A /program slice/ consists of the parts of a
>>>>                 program that (potentially) affect the values
>>>>                 computed at some point of interest, called the
>>>>                 slicing criterion. /*Program slicing*/ is a
>>>>                 decomposition technique that elides program
>>>>                 components not relevant to the slicing criterion
>>>>                 (which is a pair of (statement, set of variables)),
>>>>                 creating a program slice[1][2]. *Static *slicing
>>>>                 preserves the meaning of the variable(s) in the
>>>>                 slicing criterion for all possible inputs[1].
>>>>                 *Backward *slices answer the question “what program
>>>>                 components might affect a selected computation?”[1]
>>>>
>>>>                 While statement-minimal slices are not necessarily
>>>>                 unique[3], Weisel developed a popular algorithm
>>>>                 that constructs one. In essence, his fix-point
>>>>                 algorithm constructs sets of /relevant
>>>>                 variables///for each edge in between node /i/ and
>>>>                 node /j/ in a CFG graph, from which he constructs
>>>>                 /relevant statements/. The fix-point of the
>>>>                 relevant statements set is the slice itself.
>>>>
>>>>                 One of the characteristic of his algorithm is that
>>>>                 the resulting program slice will be executable.
>>>>                 However, our problem doesn't require the code to be
>>>>                 executable, so we could use a more "aggressive"
>>>>                 approach that creates a smaller slice. An
>>>>                 improvement to his algorithm is presented in [4].
>>>>                 *
>>>>                 *
>>>>                 *How does this relate to BugReporter?*
>>>>
>>>>                 We can show that using Weiser's algorithm, issues
>>>>                 raised in Example 1. and Example 2. can be improved
>>>>                 upon.
>>>>
>>>>                 For example 1., the statement-minimal program slice
>>>>                 with the criterion (13, <dynamically allocated int
>>>>                 object>) will contain the statements 5 and 6, and
>>>>                 for example 2., the statement-minimal program slice
>>>>                 with the criterion (4,i) won't contain anything but
>>>>                 statements 3 and 4. For the latter, we can even
>>>>                 improve the algorithm to also contain statement 7,
>>>>                 where a call to f is made.
>>>>
>>>>                 The analyzer, as stated earlier, gives BugReporter
>>>>                 an ExplodedNode, from which the slicing criterion
>>>>                 must be constructed. The statement corresponding to
>>>>                 this node, coupled with the interesting regions the
>>>>                 checker itself marked could be used to construct
>>>>                 this slicing criterion.
>>>>
>>>>                 *Challenges*
>>>>
>>>>                 While the algorithm Weiser developed along with the
>>>>                 improvements made by others are interprocedural, I
>>>>                 would imagine that in implementation, it would be a
>>>>                 challenging step from an intraprocedural prototype.
>>>>
>>>>                 Several articles also describe pointers,
>>>>                 references, and dynamically allocated regions, as
>>>>                 well as gotos and other tricky parts of the
>>>>                 language, but I still expect to see some skeletons
>>>>                 falling out of the closet when implementing this
>>>>                 for C++, not only C.
>>>>
>>>>                 *Drawbacks*
>>>>
>>>>                 Static slicing, as an algorithm that doesn't know
>>>>                 anything about input values, suffers from the same
>>>>                 issues that all static techniques do, meaning that
>>>>                 without heuristics, it'll have to do very rough
>>>>                 guesses, possibly leaving a far too big program
>>>>                 slice. However, with the symbolic values the
>>>>                 analyzer holds, this could be improved, turning
>>>>                 this into /conditioned slicing, /as described in
>>>>                 [1]. This however is only planned as a followup
>>>>                 work after GSoC.
>>>>
>>>>                 For this reason, this project would be developed as
>>>>                 an alternative approach to some of the techniques
>>>>                 used in BugReporter, as an optional off-by-default
>>>>                 analyzer feature.
>>>>
>>>>                 *---=== Questions ===---*
>>>>
>>>>                 What do you think of this approach?
>>>>                 Do you think that implementing this algorithm is
>>>>                 achievable, but tough enough task for GSoC?
>>>>                 Would you prefer to see a general program slicing
>>>>                 library, or an analyzer-specific implementation?
>>>>                 Traversing the ExplodedGraph would be far easier in
>>>>                 terms of what I want to achieve, but a more general
>>>>                 approach that traverses the CFG (like
>>>>                 llvm::DominatorTree[5]) could be beneficial to more
>>>>                 developers, but possibly at the cost of not being
>>>>                 able to improve the prototype with the symbolic
>>>>                 value information the analyzer holds.
>>>>
>>>>                 */References, links/*
>>>>
>>>>                 [1] Gallagher, Keith, and David Binkley. "Program
>>>>                 slicing." /2008 Frontiers of Software Maintenance/.
>>>>                 [2] Tip, Frank. /A survey of program slicing
>>>>                 techniques/. Centrum voor Wiskunde en Informatica,
>>>>                 1994.
>>>>                 [3] Weiser, Mark. "Program slicing." /Proceedings
>>>>                 of the 5th international conference on Software
>>>>                 engineering/. IEEE Press, 1981.
>>>>                 [4] Binkley, David. "Precise executable
>>>>                 interprocedural slices." /ACM Letters on
>>>>                 Programming Languages and Systems (LOPLAS)/ 2.1-4
>>>>                 (1993): 31-45.
>>>>                 [5]
>>>>                 http://llvm.org/doxygen/classllvm_1_1DominatorTree.html
>>>>
>>>>                 Link to previous GSoC related letter I sent:
>>>>                 http://lists.llvm.org/pipermail/cfe-dev/2019-February/061464.html
>>>>
>>>>                 Cheers,
>>>>                 Kristóf Umann
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190403/b5ce8067/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 126147 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190403/b5ce8067/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 171712 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190403/b5ce8067/attachment-0001.png>