<div dir="ltr"><div>Thank you for giving this this much thought!!! I reply somewhat slowly because I'm trying to keep things fairly formal and precise, an area in which I still have to improve on.</div><div dir="ltr"><div class="gmail_quote"><div><div><br></div><div>So, in short, Example 1. was a poor choice. You're right, the <i>value </i>of <font face="monospace, monospace"><dynamically allocated int object></font> isn't affected by what points to it -- using the traditional Weiser algorithm wouldn't contain statement 5 and 6. In a sense, your mutex code is similar to Example 1, we're not interested, at least in this case, what the actual value of the mutex is.</div></div><div><br></div><div>For now, I'm only planning to use this technique to reason about <i>values </i>of certain variables. Example 1. wanted to demonstrate how BugReporter is prone to make a too short of a bugpath -- however, the following example can show this as well:<br><br></div><div><b>asd.cpp</b></div><div><div><font face="monospace, monospace">1 void useInt(int);</font></div><div><font face="monospace, monospace">2 </font></div><div><font face="monospace, monospace">3 int getInt(int x) {</font></div><div><font face="monospace, monospace">4 int a;</font></div><div><font face="monospace, monospace">5</font></div><div><font face="monospace, monospace">6 if (x > 0)</font></div><div><font face="monospace, monospace">7 a = 3;</font></div><div><font face="monospace, monospace">8 else</font></div><div><font face="monospace, monospace">9 a = 2;</font></div><div><font face="monospace, monospace">10</font></div><div><font face="monospace, monospace">11 return a;</font></div><div><font face="monospace, monospace">12 }</font></div><div><font face="monospace, monospace">13</font></div><div><font face="monospace, monospace">14 int g();</font></div><div><font face="monospace, monospace">15</font></div><div><font face="monospace, monospace">16 int main() {</font></div><div><font face="monospace, monospace">17 int arr[10];</font></div><div><font face="monospace, monospace">18</font></div><div><font face="monospace, monospace">19 for (int i = 0; i < 3; ++i)</font></div><div><font face="monospace, monospace">20 arr[i] = 0;</font></div><div><font face="monospace, monospace">21</font></div><div><font face="monospace, monospace">22 int x = g();</font></div><div><font face="monospace, monospace">23 int n = getInt(x);</font></div><div><font face="monospace, monospace">24 useInt(arr[n]);</font></div><div><font face="monospace, monospace">25 }</font></div></div><div><font face="monospace, monospace"><br></font></div><div><div><img src="cid:ii_ju1o03pw0" alt="image.png" width="560" height="555"><br></div></div><div><br></div><div>The attached ExplodedGraph demonstrates that the analyzer assumed that<font face="monospace, monospace"> n == 3</font>, but the bugreport doesn't mention that.</div><div><br></div><div>Alright, with my <i>original point </i>being made properly, let's address the raised questions.</div><div><div><br></div></div><div><b>Q: </b>My primary question is how much do you think this will put stress on the checkers. Like, how much more difficult would it be to write checkers when we require them to include the slicing criterion with their bug reports? How much of it would we (i.e., BugReporter) be able to infer ourselves?</div><div><b><br></b></div><div><b>A: </b>As stated in this response, this approach would only consider the value of variables. This could essentially be regarded as a more general, better approach to <font face="monospace, monospace">bugreporter::trackNullOrUndefValue()</font> and similar functions. It would be great however to make this general enough to eventually also handle whatever properties (e.g. lockedness) regions might have. My worry is that it would offer little gain for a lot of chore, the current <font face="monospace, monospace">BugReporterVisitor</font> system (especially with your easier-to-implement <font face="monospace, monospace">NoteTag</font> system) gets the job done just fine. Most of it, as I know, boils down to "Let's find the specific <font face="monospace, monospace">ExplodedNode</font> where the property of this region was changed", and a simple traversal of the bugpath is all you need for that.</div><div><br></div><div>With that being said, I plan to construct the set of variables from the interesting symbol set, and this API is already present in the code. We'll see how that goes though :^).</div><div><br></div><div>However, liveness of variables is a very interesting topic. I really need to do some research on this before elaborating more, but I'll try to see how we could tackle this with backward slicing.</div><div><br><b>Q:</b> Instead we can also have a look at the execution path on the ExplodedGraph that goes through the true-branch and see if the value remains null when it exits the if-branch. Kristof, you're saying that you plan to do the slicing over the ExplodedGraph - is that what you mean? This could work as long as the other branch is actually *explored* by the analyzer. If it isn't, it'll be almost impossible to detect, and i don't know how often would this happen, but there's a good chance it's going to be much more rare than us having problems with such highlighting right now.</div><div><div><br></div><div>We can also come up with a completely separate CFG-based analysis for this purpose. This is probably the most precise thing to do, because the problem is essentially an all-paths problem (which is why loss of coverage in the ExplodedGraph would bite us). I cannot estimate how difficult such analysis would be to implement (probably pretty scary), but i think Kristof wasn't looking in this direction.<br></div><br class="gmail-Apple-interchange-newline"></div><div><b>A: </b>Well, I originally imagined it to implement this on the ExplodedGraph, but if achievable, a CFG based solution would indeed be the ideal thing. This is some great feedback -- Again, I'll follow up with this after some more research.<br><br></div><div>-----------------</div><div><br></div><div>In the meanwhile (pretty much before I'm done with the proposal), I won't have the time to interact with Phabricator much though :^)<br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 3 Apr 2019 at 23:40, Gábor Horváth <<a href="mailto:xazax.hun@gmail.com">xazax.hun@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I just realized after reading your answer how ambiguous my email was. When I said it is possible to trick the NoStoreFuncVisitor what I meant was to prevent it from emitting notes and thus pruning interesting parts of the bug path. But I am glad that we are actually on the same page regarding the value of detecting control dependencies. The example 4 you provided is a very compelling one :)</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 3 Apr 2019 at 23:16, Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" target="_blank">noqnoqneo@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
The thing in NoStoreFuncVisitor for IVars looks like it's just
saying "let's see if there's a data dependency on at least one
statement in this call". Given that the statement found by such AST
matcher was not executed on the bug's execution path, there must be
at least one control flow dependency in this function as well.<br>
<br>
I think this pattern-matching may only have false negatives, i.e.
it's possible to write into a variable/field/ivar without producing
an assignement-operator to a DeclRefExpr for that
variable/field/ivar, but it's not possible to write an assignment
into a DeclRefExpr for that variable/field/ivar without modifying
the variable/field/ivar in the process. A more sophisticated
analysis may find situations when such write is always done via an
aliasing pointer, but i cannot think of anything else. Well, there's
also the self-assignment cornercase, i.e. `x = x`, but that's
esoteric.<br>
<br>
------------------------<br>
<br>
Ok, i think i understand: one of the great goals for this project
would be to correctly highlight control flow dependencies, which we
currently don't do. I.e.:<br>
<br>
<b>Example 4.</b><br>
<br>
01 int flag;<br>
02 <br>
03 bool coin();<br>
04 <br>
05 void foo() {<br>
06 flag = coin();<br>
07 }<br>
08 <br>
09 void bar() {<br>
10 int *x = 0;<br>
11 // Set the flag to true.<br>
12 flag = true;<br>
13 foo();<br>
14 if (flag) { // Taking false branch... wait, what?<br>
15 x = new int;<br>
16 }<br>
17 foo();<br>
18 if (flag) { // Now it's taking true branch again?!<br>
19 *x = 1; // Null dereference.<br>
20 }<br>
21 }<br>
<br>
Because we wouldn't highlight the data dependency on line 06 of the
control flow dependency on lines 14 and 18, the positive may be
incomprehensible.<br>
<br>
This one's tricky to do with a visitor because it's hard to figure
out that the true-branch of the if-statement on line 14 would have
updated `x`, given that we didn't execute it.<br>
<br>
(1) Again, we can do a syntactic match over the true-branch, like we
did in NoStoreFuncVisitor, to see if there are any data depenencies
within it (there are, line 15). Again, this may fail to cover the
tricky cases (what if this branch instead calls a function that
initializes `x`?). <br>
<br>
(2) Instead we can also have a look at the execution path on the
ExplodedGraph that goes through the true-branch and see if the value
remains null when it exits the if-branch. Kristof, you're saying
that you plan to do the slicing over the ExplodedGraph - is that
what you mean? This could work as long as the other branch is
actually *explored* by the analyzer. If it isn't, it'll be almost
impossible to detect, and i don't know how often would this happen,
but there's a good chance it's going to be much more rare than us
having problems with such highlighting right now.<br>
<br>
(3) We can also come up with a completely separate CFG-based
analysis for this purpose. This is probably the most precise thing
to do, because the problem is essentially an all-paths problem
(which is why loss of coverage in the ExplodedGraph would bite us).
I cannot estimate how difficult such analysis would be to implement
(probably pretty scary), but i think Kristof wasn't looking in this
direction.<br>
<br>
Yay, i finally understand how to solve these problems we have!<br>
<br>
I'd suggest first starting with a syntax-based approach (1) because
it sounds to me that the approach that we chose for identifying
dependencies on paths that aren't part of the bug path is orthogonal
to actually making use of that information to produce notes. Once we
learn how to make use of this information, we'll have a taste of
this medicine and see if it works. Then we'll see if we need to
transition to (2) or (3) depending on such evaluation.<br>
<br>
What i just described sounds like a fairly realistic plan to me. Of
course, the question still remains about how checker-specific would
the analysis be. If i learned anything, it's "don't estimate the
power of the esoteric checker contracts". But the idea that we can
often get away with just tracking interesting regions and symbols
and consuming trackExpressionValue() hints sounds like a good thing
to try.<br>
<br>
<br>
<br>
<div class="gmail-m_1040859510329759042gmail-m_3651988806800932803moz-cite-prefix">On 4/2/19 11:23 PM, Gábor Horváth
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Yeah, this particular example would be fixed, but we do not
know how noisy it would be in general. One way to make it a
bit less noisy could be something like what the
NoStoreFuncVisitor already does for IVars. We could also add
some syntactic checks to see if the method/function actually
has the possibility to escape the region. <br>
</div>
<div><br>
</div>
<div>I am not familiar with Obj-C, but I suspect it would be
possible to trick NoStoreFuncVisitor, or at least the
syntactic checks in `<span class="gmail-m_1040859510329759042gmail-m_3651988806800932803gmail-pl-en">potentiallyWritesIntoIvar</span>
`.</div>
<div>I think the main value of slicing here could be to get rid
of those fully syntactic heuristics and replace them by
something smarter. Does this make sense?<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, 3 Apr 2019 at 02:25,
Artem Dergachev <<a href="mailto:noqnoqneo@gmail.com" target="_blank">noqnoqneo@gmail.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> One more question about Example 1.
Do you think this particular example could be solved by
developing some sort of NoStoreFuncVisitor but for
MallocChecker? I.e., in addition to emitting notes like
"returning without initializing variable `x`", it could
emit notes like "returning without escaping pointer `p`".
If such visitor is developed, what would be the
differences between your proposal and whatever behavior
we'll get with such visitor? In other words, is it
possible to construct an example with the "use of
uninitialized variable" checker that, like Example 1, has
vital pieces of information missing, given that this
checker already uses NoStoreFuncVisitor?<br>
<br>
I mean, it's most likely a terrible idea to develop such
visitor for MallocChecker, because the only reason this
visitor works more or less reliably is the existence of
const-qualifiers. For MallocChecker they don't exist, so
it's going to be hard to figure out if a function was
*supposed* to escape the pointer.<br>
<br>
<div class="gmail-m_1040859510329759042gmail-m_3651988806800932803gmail-m_444863209060437944moz-cite-prefix">On
4/2/19 5:14 PM, Artem Dergachev wrote:<br>
</div>
<blockquote type="cite"> Ahaaaaa, ooooookkkkk, iiiiiiii
seeeeeeeeee!<br>
<br>
Sounds fascinating, must-have and hard.<br>
<br>
My primary question is how much do you think this will
put stress on the checkers. Like, how much more
difficult would it be to write checkers when we require
them to include the slicing criterion with their bug
reports? How much of it would we (i.e., BugReporter) be
able to infer ourselves?<br>
<br>
------------------------<br>
<br>
Say, let's take Example 1. You describe the slicing
criterion as:<br>
<br>
(13, <dynamically allocated int object>)<br>
<br>
The value of the dynamically allocated into object
remains the same regardless of whether the object is
stored in global_ptr or not, so the slice doesn't need
to include line 5 or 6. Therefore i think that the
slicing criterion you proposed is not what we're looking
for, and the real slicing criterion we're looking for
is:<br>
<br>
(13, <liveness of <dynamically allocated int
object>>)<br>
<br>
Like, we're mapping every dynamically allocated object
to an imaginary heap location that represents its
current liveness, and include that imaginary variable,
rather than the object itself, in the slicing criterion.
Line 6 would affect liveness of the object on line 13
(if it's stored into a global, it's alive; otherwise
it's not), therefore it'll be part of the slice. So, am
i understanding correctly that you're proposing a
checker API that'll look like this:<br>
<br>
BugReport *R = new BugReport(Node, ...);<br>
R->addLivenessBasedSlicingCriterion(HeapSym);<br>
<br>
?<br>
<br>
I guess we can infer the statement of the slicing
criterion from the Node, but i'm not entirely sure; see
also below.<br>
<br>
I'd actually love it if you elaborate this example
further because it's fairly interesting. Like, we know
that the assignment affects the liveness information,
but how would the slicing algorithm figure this out? Do
you have a step-by-step description of how the algorithm
behaves in this case?<br>
<br>
------------------------<br>
<br>
Let's take another example i just came up with. Consider
alpha.unix.PthreadLock - a checker that finds various
bugs with mutexes, such as double locks or double
unlocks. Consider code:<br>
<br>
1 pthread_mutex_t mtx1, mtx2;<br>
2<br>
3 void foo(bool x1, bool x2) {<br>
4 if (x1)<br>
5 pthread_mutex_lock(&mtx1);<br>
6 if (x2)<br>
7 pthread_mutex_lock(&mtx1);<br>
8 // ...<br>
9 }<br>
<br>
In this example we'll report a double lock bug due to a
copy-paste error: line 7 should probably lock &mtx2
rather than &mtx1.<br>
<br>
The whole program is relevant to the report. Without
line 5, there would only be a single lock. It
transitions the mutex object from state "unknown" to
state "locked".<br>
<br>
I don't think the Analyzer would currently do a bad job
at explaining the bug; it's pretty straightforward.<br>
<br>
What would be the slicing criterion in this case? As far
as i understand, it'll be<br>
<br>
(7, <locked-ness of mtx1>)<br>
<br>
In this case "lockedness" is, again, an "imaginary heap
location" that contains metadata for the mutex. More
specifically, it is a GDM map value. How would a checker
API look in this case? How would it even describe to the
BugReporter what to look at, given that currently only
the checker understands how to read its GDM values?<br>
<br>
My random guess is:<br>
<br>
BugReport *R = new BugReport(Node, ...);<br>
R->addGDMBasedSlicingCriterion<MutexState>(MutexMR);<br>
<br>
------------------------<br>
<br>
So it sounds to me that your project is, like many other
awesome projects, brings in, as a dependency, replacing
GDM with a better, smarter data map that the analyzer
core *can* introspect and manipulate without the direct
help from the checkers. It might be that your project
actually requires relatively little amounts of such
introspection - i.e., you might be able to get away with
"it's some GDM map and the value for this key has
changed, therefore this statement is a relevant data
dependency" for quite a lot of checkers.<br>
<br>
That's the subject i was also bringing up as part of <a class="gmail-m_1040859510329759042gmail-m_3651988806800932803gmail-m_444863209060437944moz-txt-link-freetext" href="https://reviews.llvm.org/D59861" target="_blank">https://reviews.llvm.org/D59861</a>
- "this was the plan that is part of the
even-more-long-term plan of completely deprecating the
void*-based Generic Data Map...". This was also
something that became a blocker on my old attempt to
introduce summary-based analysis. It required stuffing
fairly large chunks of code into every checker that
would teach the checker how to collect summary
information and apply it when the summary is applied,
and this new boilerplate was fairly brain-damaging to
implement and hard to get right or even to explain. This
problem also blocks other projects, such as state
merging / loop widening (on one branch the mutex is
locked, on the other branch it is unlocked, hey checker
please teach me how to merge this).<br>
<br>
The variety of slicing criteria (that is a consequence
of the variety of checkers that we have) sounds pretty
scary to me. Some of them are really complicated, eg.
the liveness criterion. Making sure that the generic
algorithm works correctly with at least a significant
chunk of them is going to be fun. Making sure it can
actually handle arbitrary criterions required by the
checkers without having every checker come with its own
boilerplate for slicing also sounds hard. And it'll be
very bad if we have to tell our checker developers "hey,
by the way, you also need to know what backward slicing
is and write down part of the algorithm in order to ever
get your checker enabled by default" - i'm already
feeling pretty bad explaining dead symbols and pointer
escapes (which are pretty much must-have in most
checkers), these are definitely examples of a
boilerplate that a smarter version of GDM could handle
automatically. In order to conquer the world, i think we
should stick to our "writing a checker in 24 hours"
utopia: writing a checker should be as easy as writing
down the transfer functions for relevant statements. In
my opinion, we should take it as an important
constraint.<br>
<br>
<br>
<br>
<br>
<div class="gmail-m_1040859510329759042gmail-m_3651988806800932803gmail-m_444863209060437944moz-cite-prefix">On
4/2/19 12:21 PM, Kristóf Umann wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">Somehow the images and the attached
files were left out, please find them here:</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 2 Apr
2019 at 21:16, Kristóf Umann <<a href="mailto:dkszelethus@gmail.com" target="_blank">dkszelethus@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr"><span style="background-color:rgb(255,255,255)"><font color="#000000">Hi!<br>
<br>
</font></span>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">In this
letter, I'd like to describe a
particular problem in the
Clang StaticAnalyzer's <font face="monospace, monospace">BugReporter
</font>class that I'd like to
tackle in a Google Summer of
Code project this summer. I'll
show real-world examples on
some of its shortcomings, and
propose a potential solution
using static backward program
slicing. At last, I'll ask
some specific questions. I'd
love to hear any and all
feedback on this!</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">This is a <i>problem
statement, </i>not a <i>proposal</i>.
I plan to send out a formal
proposal by Friday (but not
later then Saturday), that
will contain more details on
both the problem and the
solution. I don't introduce
myself or my previous works
within the project, that will
also be detailed in my
upcoming letter. I also plan
to incorporate the feedback
I'll receive to this letter.</font></span></div>
</div>
<div dir="ltr"><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><b><font style="background-color:rgb(255,255,255)" size="4" color="#000000">---===
BugReporter constructs bad
reports ===---</font></b></div>
<div><b><font style="background-color:rgb(255,255,255)" size="4" color="#000000"><br>
</font></b></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><b>What does the
BugReporter do?</b><br>
<br>
After the Static Analyzer found
an error, the <font face="monospace, monospace">BugReporter
</font>receives an <font face="monospace, monospace">ExplodedNode</font>,
which, accompanied by its
predecessors, contains all the
information needed to reproduce
that error. This <i>bugpath </i>is
then shortened with a variety of
heuristics, such as removing
unrelated function calls,
unrelated branches and so on. <font face="monospace, monospace">BugReporter
</font>by the end of this
process will construct a <font face="monospace, monospace">PathDiagnostic
</font>object for each report,
that is, ideally, minimal.</font></span></div>
<div><b style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></b></div>
<div><b style="background-color:rgb(255,255,255)"><font color="#000000">Example 1.</font></b></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">Consider the
following code example:<br>
<br>
</font></span>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">1 // leak.cpp</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">2 int
*global_ptr;</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">3 </font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">4 void
save_ext(int storeIt, int *ptr)
{</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">5 if
(storeIt)</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">6
global_ptr = ptr;</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">7 }</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">8</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">9 void test(int
b) {</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">10 int *myptr
= new int;</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">11 save_ext(b,
myptr);</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">12 delete
global_ptr;</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">13 }</font></div>
</div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div dir="ltr">
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">It's clear
that if test is invoked with <font face="monospace, monospace">b</font>'s
value set to true, there is no
error in the program. However,
should b be false, we'll leak
memory.</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<span style="font-family:monospace,monospace;background-color:rgb(255,255,255)"><font color="#000000">$ clang -cc1
-analyze
-analyzer-checker=core,cplusplus
leak.cpp</font></span></div>
<div dir="ltr">
<div><img src="cid:169e5205a23cb971f161" alt="image.png" width="472" height="168"><br>
</div>
</div>
<div dir="ltr"><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000"><br>
</font>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">The Static
Analyzer is able to catch this
error, but fails to mention
the call to <font face="monospace, monospace">save_ext</font> entirely,
despite the error only
occurring because the analyzer
assumed that <font face="monospace, monospace">storeIt</font> is
false. I've also attached the
exploded graph <font face="monospace, monospace">leak.svg </font><font face="arial, helvetica,
sans-serif">that
demonstrates this.</font></font></span></div>
<div><font style="background-color:rgb(255,255,255)" face="arial, helvetica,
sans-serif" color="#000000"><br>
</font></div>
<div>
<div><b style="background-color:rgb(255,255,255)"><font color="#000000">Example 2.</font></b></div>
</div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">Consider the
following code example:<br>
<br>
<font face="monospace,
monospace">1 //
divbyzero.cpp</font><br>
</font></span>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">2 void f() {</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">3 int i =
0;</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">4 (void)
(10 / i);</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">5 }</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">6</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">7 void g() {
f(); }</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">8 void h() {
g(); }</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">9 void j() {
h(); }</font></div>
<div><font style="background-color:rgb(255,255,255)" face="monospace, monospace" color="#000000">10 void k() {
j(); }</font></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br class="gmail-m_1040859510329759042gmail-m_3651988806800932803gmail-m_444863209060437944m_6421736142923902491gmail-m_-3454961580602141710gmail-m_-6979117027997739172gmail-m_4426039388547674777gmail-m_5439174890474816058gmail-m_6038659214782379119gmail-Apple-interchange-newline">
Its clear that a call to f
will result in a division by
zero error.</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<span style="background-color:rgb(255,255,255)"><font color="#000000"><span style="font-family:monospace,monospace">$
clang -cc1 -analyze
-analyzer-checker=core </span><span style="font-family:monospace,monospace">divbyzero.cpp</span></font></span></div>
<div><img src="cid:169e5205a24cb971f162" alt="image.png" width="334" height="472"><br>
</div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">Again, the
Static Analyzer is plenty
smart enough to catch this,
but the constructed bug report
is littered with a lot of
useless information -- it
would be enough to only show
the body of f, and, optionally
where f itself was called. For
the sake of completeness, I've
attached <font face="monospace, monospace">divbyzero.svg</font> that
contains the exploded graph
for the above code snippet.<br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">The above
examples demonstrate that <font face="monospace, monospace">BugReporter
</font>sometimes reduces the
bugpath too much or too
little.</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><b><font style="background-color:rgb(255,255,255)" size="4" color="#000000">---===
Solving the above problem with
static backward program
slicing ===---</font></b></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><b style="background-color:rgb(255,255,255)"><font color="#000000">What is static
backward program slicing?</font></b></div>
<div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">A <i>program
slice</i> consists of the
parts of a program that
(potentially) affect the
values computed at some
point of interest, called
the slicing criterion. <i><b>Program
slicing</b></i> is a
decomposition technique that
elides program components
not relevant to the slicing
criterion (which is a pair
of (statement, set of
variables)), creating a
program slice[1][2]. <b>Static </b>slicing
preserves the meaning of the
variable(s) in the slicing
criterion for all possible
inputs[1]. <b>Backward </b>slices
answer the question “what
program components might
affect a selected
computation?”[1]</font></span></div>
</div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">While
statement-minimal slices are
not necessarily unique[3],
Weisel developed a popular
algorithm that constructs one.
In essence, his fix-point
algorithm constructs sets of <i>relevant
variables</i><i> </i>for
each edge in between node <i>i</i> and
node <i>j</i> in a CFG graph,
from which he constructs <i>relevant
statements</i>. The
fix-point of the relevant
statements set is the slice
itself.</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">One of the
characteristic of his
algorithm is that the
resulting program slice will
be executable. However, our
problem doesn't require the
code to be executable, so we
could use a more "aggressive"
approach that creates a
smaller slice. An improvement
to his algorithm is presented
in [4].</font></span></div>
<div>
<div><b style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></b></div>
<div><b style="background-color:rgb(255,255,255)"><font color="#000000">How does
this relate to BugReporter?</font></b></div>
</div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">We can show
that using Weiser's algorithm,
issues raised in Example 1.
and Example 2. can be improved
upon.</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">For example
1., the statement-minimal
program slice with the
criterion (13, <font face="monospace, monospace"><dynamically
allocated int object></font>)
will contain the statements 5
and 6, and for example 2., the
statement-minimal program
slice with the criterion (4,<font face="monospace, monospace">
i</font>) won't contain
anything but statements 3 and
4. For the latter, we can even
improve the algorithm to also
contain statement 7, where a
call to f is made.</font></span></div>
<span style="background-color:rgb(255,255,255)"><font color="#000000"><br class="gmail-m_1040859510329759042gmail-m_3651988806800932803gmail-m_444863209060437944m_6421736142923902491gmail-m_-3454961580602141710gmail-m_-6979117027997739172gmail-m_4426039388547674777gmail-Apple-interchange-newline">
The analyzer, as stated earlier,
gives <font face="monospace,
monospace">BugReporter </font>an
<font face="monospace,
monospace">ExplodedNode</font>,
from which the slicing criterion
must be constructed. The
statement corresponding to this
node, coupled with the
interesting regions the checker
itself marked could be used to
construct this slicing
criterion.</font></span>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><b>Challenges</b><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">While the
algorithm Weiser developed
along with the improvements
made by others are
interprocedural, I would
imagine that in
implementation, it would be a
challenging step from an
intraprocedural prototype.</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">Several
articles also describe
pointers, references, and
dynamically allocated regions,
as well as gotos and other
tricky parts of the language,
but I still expect to see some
skeletons falling out of the
closet when implementing this
for C++, not only C.</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><b style="background-color:rgb(255,255,255)"><font color="#000000">Drawbacks</font></b></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">Static
slicing, as an algorithm that
doesn't know anything about
input values, suffers from the
same issues that all static
techniques do, meaning that
without heuristics, it'll have
to do very rough guesses,
possibly leaving a far too big
program slice. However, with
the symbolic values the
analyzer holds, this could be
improved, turning this into <i>conditioned
slicing, </i>as described in
[1]. This however is only
planned as a followup work
after GSoC.</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">For this
reason, this project would be
developed as an alternative
approach to some of the
techniques used in <font face="monospace, monospace">BugReporter</font>,
as an optional off-by-default
analyzer feature.</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><font size="4" color="#000000"><b style="background-color:rgb(255,255,255)">---=== Questions ===---</b></font></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">What do you
think of this approach?</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">Do you think
that implementing this
algorithm is achievable, but
tough enough task for GSoC?</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">Would you
prefer to see a general
program slicing library, or an
analyzer-specific
implementation? Traversing the
<font face="monospace,
monospace">ExplodedGraph </font>would
be far easier in terms of what
I want to achieve, but a more
general approach that
traverses the CFG (like <font face="monospace, monospace">llvm::DominatorTree</font><font face="arial, helvetica,
sans-serif">[5]) could be
beneficial to more
developers, but possibly at
the cost of not being able
to improve the prototype
with the symbolic value
information the analyzer
holds.</font></font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><b><i style="background-color:rgb(255,255,255)"><font color="#000000">References,
links</font></i></b></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">[1] <span style="font-size:13px;font-family:Arial,sans-serif">Gallagher,
Keith, and David Binkley.
"Program slicing." </span><i style="font-size:13px;font-family:Arial,sans-serif">2008 Frontiers of
Software Maintenance</i><span style="font-size:13px;font-family:Arial,sans-serif">.</span></font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><span style="font-size:13px;font-family:Arial,sans-serif">[2] </span><span style="font-size:13px;font-family:Arial,sans-serif">Tip, Frank. </span><i style="font-size:13px;font-family:Arial,sans-serif">A survey of program
slicing techniques</i><span style="font-size:13px;font-family:Arial,sans-serif">. Centrum voor
Wiskunde en Informatica,
1994.</span></font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><span style="font-size:13px;font-family:Arial,sans-serif">[3] </span><span style="font-size:13px;font-family:Arial,sans-serif">Weiser, Mark.
"Program slicing." </span><i style="font-size:13px;font-family:Arial,sans-serif">Proceedings of the
5th international conference
on Software engineering</i><span style="font-size:13px;font-family:Arial,sans-serif">. IEEE Press, 1981.</span></font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><span style="font-size:13px;font-family:Arial,sans-serif">[4] </span><span style="font-size:13px;font-family:Arial,sans-serif">Binkley, David.
"Precise executable
interprocedural slices." </span><i style="font-size:13px;font-family:Arial,sans-serif">ACM Letters on
Programming Languages and
Systems (LOPLAS)</i><span style="font-size:13px;font-family:Arial,sans-serif"> 2.1-4
(1993): 31-45.</span></font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><span style="font-size:13px;font-family:Arial,sans-serif">[5] </span><font face="Arial, sans-serif"><a href="http://llvm.org/doxygen/classllvm_1_1DominatorTree.html" target="_blank">http://llvm.org/doxygen/classllvm_1_1DominatorTree.html</a></font></font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">Link to
previous GSoC related letter I
sent: <a href="http://lists.llvm.org/pipermail/cfe-dev/2019-February/061464.html" target="_blank">http://lists.llvm.org/pipermail/cfe-dev/2019-February/061464.html</a></font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000"><br>
</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">Cheers,</font></span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000">Kristóf Umann</font></span></div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</div>
</blockquote>
<br>
</div>
</blockquote></div></div>
</blockquote></div>