<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
If i understand correctly (hmm, there seem to be image problems
again), in this bug report it's about doing trackNullOrUndefValue()
over `n` whenever we track the uninitialized value back to a load
from an array with index `n` - i think it easily gets us straight to
`a = 3` and it doesn't require introducing any new analyses.<br>
<br>
<div class="moz-cite-prefix">On 4/3/19 3:35 PM, Kristóf Umann wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAGcXOD5kqur+_Q1aX8FPBZ8Jv51gWsU1TKpZVury7kM1SzzEiw@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div>Thank you for giving this this much thought!!! I reply
somewhat slowly because I'm trying to keep things fairly
formal and precise, an area in which I still have to improve
on.</div>
<div dir="ltr">
<div class="gmail_quote">
<div>
<div><br>
</div>
<div>So, in short, Example 1. was a poor choice. You're
right, the <i>value </i>of <font face="monospace,
monospace"><dynamically allocated int object></font> isn't
affected by what points to it -- using the traditional
Weiser algorithm wouldn't contain statement 5 and 6. In
a sense, your mutex code is similar to Example 1, we're
not interested, at least in this case, what the actual
value of the mutex is.</div>
</div>
<div><br>
</div>
<div>For now, I'm only planning to use this technique to
reason about <i>values </i>of certain variables. Example
1. wanted to demonstrate how BugReporter is prone to make
a too short of a bugpath -- however, the following example
can show this as well:<br>
<br>
</div>
<div><b>asd.cpp</b></div>
<div>
<div><font face="monospace, monospace">1 void
useInt(int);</font></div>
<div><font face="monospace, monospace">2 </font></div>
<div><font face="monospace, monospace">3 int getInt(int
x) {</font></div>
<div><font face="monospace, monospace">4 int a;</font></div>
<div><font face="monospace, monospace">5</font></div>
<div><font face="monospace, monospace">6 if (x > 0)</font></div>
<div><font face="monospace, monospace">7 a = 3;</font></div>
<div><font face="monospace, monospace">8 else</font></div>
<div><font face="monospace, monospace">9 a = 2;</font></div>
<div><font face="monospace, monospace">10</font></div>
<div><font face="monospace, monospace">11 return a;</font></div>
<div><font face="monospace, monospace">12 }</font></div>
<div><font face="monospace, monospace">13</font></div>
<div><font face="monospace, monospace">14 int g();</font></div>
<div><font face="monospace, monospace">15</font></div>
<div><font face="monospace, monospace">16 int main() {</font></div>
<div><font face="monospace, monospace">17 int arr[10];</font></div>
<div><font face="monospace, monospace">18</font></div>
<div><font face="monospace, monospace">19 for (int i =
0; i < 3; ++i)</font></div>
<div><font face="monospace, monospace">20 arr[i] = 0;</font></div>
<div><font face="monospace, monospace">21</font></div>
<div><font face="monospace, monospace">22 int x = g();</font></div>
<div><font face="monospace, monospace">23 int n =
getInt(x);</font></div>
<div><font face="monospace, monospace">24
useInt(arr[n]);</font></div>
<div><font face="monospace, monospace">25 }</font></div>
</div>
<div><font face="monospace, monospace"><br>
</font></div>
<div>
<div><img src="cid:ii_ju1o03pw0" alt="image.png"
moz-do-not-send="true" width="560" height="555"><br>
</div>
</div>
<div><br>
</div>
<div>The attached ExplodedGraph demonstrates that the
analyzer assumed that<font face="monospace, monospace"> n
== 3</font>, but the bugreport doesn't mention that.</div>
<div><br>
</div>
<div>Alright, with my <i>original point </i>being made
properly, let's address the raised questions.</div>
<div>
<div><br>
</div>
</div>
<div><b>Q: </b>My primary question is how much do you think
this will put stress on the checkers. Like, how much more
difficult would it be to write checkers when we require
them to include the slicing criterion with their bug
reports? How much of it would we (i.e., BugReporter) be
able to infer ourselves?</div>
<div><b><br>
</b></div>
<div><b>A: </b>As stated in this response, this approach
would only consider the value of variables. This could
essentially be regarded as a more general, better approach
to <font face="monospace, monospace">bugreporter::trackNullOrUndefValue()</font> and
similar functions. It would be great however to make this
general enough to eventually also handle whatever
properties (e.g. lockedness) regions might have. My worry
is that it would offer little gain for a lot of chore, the
current <font face="monospace, monospace">BugReporterVisitor</font> system
(especially with your easier-to-implement <font
face="monospace, monospace">NoteTag</font> system) gets
the job done just fine. Most of it, as I know, boils down
to "Let's find the specific <font face="monospace,
monospace">ExplodedNode</font> where the property of
this region was changed", and a simple traversal of the
bugpath is all you need for that.</div>
<div><br>
</div>
<div>With that being said, I plan to construct the set of
variables from the interesting symbol set, and this API is
already present in the code. We'll see how that goes
though :^).</div>
<div><br>
</div>
<div>However, liveness of variables is a very interesting
topic. I really need to do some research on this before
elaborating more, but I'll try to see how we could tackle
this with backward slicing.</div>
<div><br>
<b>Q:</b> Instead we can also have a look at the execution
path on the ExplodedGraph that goes through the
true-branch and see if the value remains null when it
exits the if-branch. Kristof, you're saying that you plan
to do the slicing over the ExplodedGraph - is that what
you mean? This could work as long as the other branch is
actually *explored* by the analyzer. If it isn't, it'll be
almost impossible to detect, and i don't know how often
would this happen, but there's a good chance it's going to
be much more rare than us having problems with such
highlighting right now.</div>
<div>
<div><br>
</div>
<div>We can also come up with a completely separate
CFG-based analysis for this purpose. This is probably
the most precise thing to do, because the problem is
essentially an all-paths problem (which is why loss of
coverage in the ExplodedGraph would bite us). I cannot
estimate how difficult such analysis would be to
implement (probably pretty scary), but i think Kristof
wasn't looking in this direction.<br>
</div>
<br class="gmail-Apple-interchange-newline">
</div>
<div><b>A: </b>Well, I originally imagined it to implement
this on the ExplodedGraph, but if achievable, a CFG based
solution would indeed be the ideal thing. This is some
great feedback -- Again, I'll follow up with this after
some more research.<br>
<br>
</div>
<div>-----------------</div>
<div><br>
</div>
<div>In the meanwhile (pretty much before I'm done with the
proposal), I won't have the time to interact with
Phabricator much though :^)<br>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, 3 Apr 2019 at 23:40,
Gábor Horváth <<a href="mailto:xazax.hun@gmail.com"
moz-do-not-send="true">xazax.hun@gmail.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>I just realized after reading your answer how ambiguous
my email was. When I said it is possible to trick the
NoStoreFuncVisitor what I meant was to prevent it from
emitting notes and thus pruning interesting parts of the
bug path. But I am glad that we are actually on the same
page regarding the value of detecting control
dependencies. The example 4 you provided is a very
compelling one :)</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, 3 Apr 2019 at
23:16, Artem Dergachev <<a
href="mailto:noqnoqneo@gmail.com" target="_blank"
moz-do-not-send="true">noqnoqneo@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> The thing in NoStoreFuncVisitor
for IVars looks like it's just saying "let's see if
there's a data dependency on at least one statement in
this call". Given that the statement found by such AST
matcher was not executed on the bug's execution path,
there must be at least one control flow dependency in
this function as well.<br>
<br>
I think this pattern-matching may only have false
negatives, i.e. it's possible to write into a
variable/field/ivar without producing an
assignement-operator to a DeclRefExpr for that
variable/field/ivar, but it's not possible to write an
assignment into a DeclRefExpr for that
variable/field/ivar without modifying the
variable/field/ivar in the process. A more
sophisticated analysis may find situations when such
write is always done via an aliasing pointer, but i
cannot think of anything else. Well, there's also the
self-assignment cornercase, i.e. `x = x`, but that's
esoteric.<br>
<br>
------------------------<br>
<br>
Ok, i think i understand: one of the great goals for
this project would be to correctly highlight control
flow dependencies, which we currently don't do. I.e.:<br>
<br>
<b>Example 4.</b><br>
<br>
01 int flag;<br>
02 <br>
03 bool coin();<br>
04 <br>
05 void foo() {<br>
06 flag = coin();<br>
07 }<br>
08 <br>
09 void bar() {<br>
10 int *x = 0;<br>
11 // Set the flag to true.<br>
12 flag = true;<br>
13 foo();<br>
14 if (flag) { // Taking false branch... wait,
what?<br>
15 x = new int;<br>
16 }<br>
17 foo();<br>
18 if (flag) { // Now it's taking true branch
again?!<br>
19 *x = 1; // Null dereference.<br>
20 }<br>
21 }<br>
<br>
Because we wouldn't highlight the data dependency on
line 06 of the control flow dependency on lines 14 and
18, the positive may be incomprehensible.<br>
<br>
This one's tricky to do with a visitor because it's
hard to figure out that the true-branch of the
if-statement on line 14 would have updated `x`, given
that we didn't execute it.<br>
<br>
(1) Again, we can do a syntactic match over the
true-branch, like we did in NoStoreFuncVisitor, to see
if there are any data depenencies within it (there
are, line 15). Again, this may fail to cover the
tricky cases (what if this branch instead calls a
function that initializes `x`?). <br>
<br>
(2) Instead we can also have a look at the execution
path on the ExplodedGraph that goes through the
true-branch and see if the value remains null when it
exits the if-branch. Kristof, you're saying that you
plan to do the slicing over the ExplodedGraph - is
that what you mean? This could work as long as the
other branch is actually *explored* by the analyzer.
If it isn't, it'll be almost impossible to detect, and
i don't know how often would this happen, but there's
a good chance it's going to be much more rare than us
having problems with such highlighting right now.<br>
<br>
(3) We can also come up with a completely separate
CFG-based analysis for this purpose. This is probably
the most precise thing to do, because the problem is
essentially an all-paths problem (which is why loss of
coverage in the ExplodedGraph would bite us). I cannot
estimate how difficult such analysis would be to
implement (probably pretty scary), but i think Kristof
wasn't looking in this direction.<br>
<br>
Yay, i finally understand how to solve these problems
we have!<br>
<br>
I'd suggest first starting with a syntax-based
approach (1) because it sounds to me that the approach
that we chose for identifying dependencies on paths
that aren't part of the bug path is orthogonal to
actually making use of that information to produce
notes. Once we learn how to make use of this
information, we'll have a taste of this medicine and
see if it works. Then we'll see if we need to
transition to (2) or (3) depending on such evaluation.<br>
<br>
What i just described sounds like a fairly realistic
plan to me. Of course, the question still remains
about how checker-specific would the analysis be. If i
learned anything, it's "don't estimate the power of
the esoteric checker contracts". But the idea that we
can often get away with just tracking interesting
regions and symbols and consuming
trackExpressionValue() hints sounds like a good thing
to try.<br>
<br>
<br>
<br>
<div
class="gmail-m_1040859510329759042gmail-m_3651988806800932803moz-cite-prefix">On
4/2/19 11:23 PM, Gábor Horváth wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Yeah, this particular example would be fixed,
but we do not know how noisy it would be in
general. One way to make it a bit less noisy
could be something like what the
NoStoreFuncVisitor already does for IVars. We
could also add some syntactic checks to see if
the method/function actually has the possibility
to escape the region. <br>
</div>
<div><br>
</div>
<div>I am not familiar with Obj-C, but I suspect
it would be possible to trick
NoStoreFuncVisitor, or at least the syntactic
checks in `<span
class="gmail-m_1040859510329759042gmail-m_3651988806800932803gmail-pl-en">potentiallyWritesIntoIvar</span>
`.</div>
<div>I think the main value of slicing here could
be to get rid of those fully syntactic
heuristics and replace them by something
smarter. Does this make sense?<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, 3 Apr
2019 at 02:25, Artem Dergachev <<a
href="mailto:noqnoqneo@gmail.com"
target="_blank" moz-do-not-send="true">noqnoqneo@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"> One more question
about Example 1. Do you think this
particular example could be solved by
developing some sort of NoStoreFuncVisitor
but for MallocChecker? I.e., in addition to
emitting notes like "returning without
initializing variable `x`", it could emit
notes like "returning without escaping
pointer `p`". If such visitor is developed,
what would be the differences between your
proposal and whatever behavior we'll get
with such visitor? In other words, is it
possible to construct an example with the
"use of uninitialized variable" checker
that, like Example 1, has vital pieces of
information missing, given that this checker
already uses NoStoreFuncVisitor?<br>
<br>
I mean, it's most likely a terrible idea to
develop such visitor for MallocChecker,
because the only reason this visitor works
more or less reliably is the existence of
const-qualifiers. For MallocChecker they
don't exist, so it's going to be hard to
figure out if a function was *supposed* to
escape the pointer.<br>
<br>
<div
class="gmail-m_1040859510329759042gmail-m_3651988806800932803gmail-m_444863209060437944moz-cite-prefix">On
4/2/19 5:14 PM, Artem Dergachev wrote:<br>
</div>
<blockquote type="cite"> Ahaaaaa,
ooooookkkkk, iiiiiiii seeeeeeeeee!<br>
<br>
Sounds fascinating, must-have and hard.<br>
<br>
My primary question is how much do you
think this will put stress on the
checkers. Like, how much more difficult
would it be to write checkers when we
require them to include the slicing
criterion with their bug reports? How much
of it would we (i.e., BugReporter) be able
to infer ourselves?<br>
<br>
------------------------<br>
<br>
Say, let's take Example 1. You describe
the slicing criterion as:<br>
<br>
(13, <dynamically allocated int
object>)<br>
<br>
The value of the dynamically allocated
into object remains the same regardless of
whether the object is stored in global_ptr
or not, so the slice doesn't need to
include line 5 or 6. Therefore i think
that the slicing criterion you proposed is
not what we're looking for, and the real
slicing criterion we're looking for is:<br>
<br>
(13, <liveness of <dynamically
allocated int object>>)<br>
<br>
Like, we're mapping every dynamically
allocated object to an imaginary heap
location that represents its current
liveness, and include that imaginary
variable, rather than the object itself,
in the slicing criterion. Line 6 would
affect liveness of the object on line 13
(if it's stored into a global, it's alive;
otherwise it's not), therefore it'll be
part of the slice. So, am i understanding
correctly that you're proposing a checker
API that'll look like this:<br>
<br>
BugReport *R = new BugReport(Node, ...);<br>
R->addLivenessBasedSlicingCriterion(HeapSym);<br>
<br>
?<br>
<br>
I guess we can infer the statement of the
slicing criterion from the Node, but i'm
not entirely sure; see also below.<br>
<br>
I'd actually love it if you elaborate this
example further because it's fairly
interesting. Like, we know that the
assignment affects the liveness
information, but how would the slicing
algorithm figure this out? Do you have a
step-by-step description of how the
algorithm behaves in this case?<br>
<br>
------------------------<br>
<br>
Let's take another example i just came up
with. Consider alpha.unix.PthreadLock - a
checker that finds various bugs with
mutexes, such as double locks or double
unlocks. Consider code:<br>
<br>
1 pthread_mutex_t mtx1, mtx2;<br>
2<br>
3 void foo(bool x1, bool x2) {<br>
4 if (x1)<br>
5 pthread_mutex_lock(&mtx1);<br>
6 if (x2)<br>
7 pthread_mutex_lock(&mtx1);<br>
8 // ...<br>
9 }<br>
<br>
In this example we'll report a double lock
bug due to a copy-paste error: line 7
should probably lock &mtx2 rather than
&mtx1.<br>
<br>
The whole program is relevant to the
report. Without line 5, there would only
be a single lock. It transitions the mutex
object from state "unknown" to state
"locked".<br>
<br>
I don't think the Analyzer would currently
do a bad job at explaining the bug; it's
pretty straightforward.<br>
<br>
What would be the slicing criterion in
this case? As far as i understand, it'll
be<br>
<br>
(7, <locked-ness of mtx1>)<br>
<br>
In this case "lockedness" is, again, an
"imaginary heap location" that contains
metadata for the mutex. More specifically,
it is a GDM map value. How would a checker
API look in this case? How would it even
describe to the BugReporter what to look
at, given that currently only the checker
understands how to read its GDM values?<br>
<br>
My random guess is:<br>
<br>
BugReport *R = new BugReport(Node, ...);<br>
R->addGDMBasedSlicingCriterion<MutexState>(MutexMR);<br>
<br>
------------------------<br>
<br>
So it sounds to me that your project is,
like many other awesome projects, brings
in, as a dependency, replacing GDM with a
better, smarter data map that the analyzer
core *can* introspect and manipulate
without the direct help from the checkers.
It might be that your project actually
requires relatively little amounts of such
introspection - i.e., you might be able to
get away with "it's some GDM map and the
value for this key has changed, therefore
this statement is a relevant data
dependency" for quite a lot of checkers.<br>
<br>
That's the subject i was also bringing up
as part of <a
class="gmail-m_1040859510329759042gmail-m_3651988806800932803gmail-m_444863209060437944moz-txt-link-freetext"
href="https://reviews.llvm.org/D59861"
target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D59861</a>
- "this was the plan that is part of the
even-more-long-term plan of completely
deprecating the void*-based Generic Data
Map...". This was also something that
became a blocker on my old attempt to
introduce summary-based analysis. It
required stuffing fairly large chunks of
code into every checker that would teach
the checker how to collect summary
information and apply it when the summary
is applied, and this new boilerplate was
fairly brain-damaging to implement and
hard to get right or even to explain. This
problem also blocks other projects, such
as state merging / loop widening (on one
branch the mutex is locked, on the other
branch it is unlocked, hey checker please
teach me how to merge this).<br>
<br>
The variety of slicing criteria (that is a
consequence of the variety of checkers
that we have) sounds pretty scary to me.
Some of them are really complicated, eg.
the liveness criterion. Making sure that
the generic algorithm works correctly with
at least a significant chunk of them is
going to be fun. Making sure it can
actually handle arbitrary criterions
required by the checkers without having
every checker come with its own
boilerplate for slicing also sounds hard.
And it'll be very bad if we have to tell
our checker developers "hey, by the way,
you also need to know what backward
slicing is and write down part of the
algorithm in order to ever get your
checker enabled by default" - i'm already
feeling pretty bad explaining dead symbols
and pointer escapes (which are pretty much
must-have in most checkers), these are
definitely examples of a boilerplate that
a smarter version of GDM could handle
automatically. In order to conquer the
world, i think we should stick to our
"writing a checker in 24 hours" utopia:
writing a checker should be as easy as
writing down the transfer functions for
relevant statements. In my opinion, we
should take it as an important constraint.<br>
<br>
<br>
<br>
<br>
<div
class="gmail-m_1040859510329759042gmail-m_3651988806800932803gmail-m_444863209060437944moz-cite-prefix">On
4/2/19 12:21 PM, Kristóf Umann wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">Somehow the images and
the attached files were left out,
please find them here:</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Tue, 2 Apr 2019 at 21:16, Kristóf
Umann <<a
href="mailto:dkszelethus@gmail.com"
target="_blank"
moz-do-not-send="true">dkszelethus@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr"><span
style="background-color:rgb(255,255,255)"><font
color="#000000">Hi!<br>
<br>
</font></span>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">In
this letter, I'd
like to describe
a particular
problem in the
Clang
StaticAnalyzer's
<font
face="monospace,
monospace">BugReporter
</font>class
that I'd like to
tackle in a
Google Summer of
Code project
this summer.
I'll show
real-world
examples on some
of its
shortcomings,
and propose a
potential
solution using
static backward
program slicing.
At last, I'll
ask some
specific
questions. I'd
love to hear any
and all feedback
on this!</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">This
is a <i>problem
statement, </i>not
a <i>proposal</i>.
I plan to send
out a formal
proposal by
Friday (but not
later then
Saturday), that
will contain
more details on
both the problem
and the
solution. I
don't introduce
myself or my
previous works
within the
project, that
will also be
detailed in my
upcoming letter.
I also plan to
incorporate the
feedback I'll
receive to this
letter.</font></span></div>
</div>
<div dir="ltr"><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><b><font
style="background-color:rgb(255,255,255)"
size="4"
color="#000000">---===
BugReporter
constructs bad
reports ===---</font></b></div>
<div><b><font
style="background-color:rgb(255,255,255)"
size="4"
color="#000000"><br>
</font></b></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><b>What
does the
BugReporter do?</b><br>
<br>
After the Static
Analyzer found an
error, the <font
face="monospace,
monospace">BugReporter
</font>receives an
<font
face="monospace,
monospace">ExplodedNode</font>,
which, accompanied
by its
predecessors,
contains all the
information needed
to reproduce that
error. This <i>bugpath </i>is
then shortened
with a variety of
heuristics, such
as removing
unrelated function
calls, unrelated
branches and so
on. <font
face="monospace,
monospace">BugReporter
</font>by the end
of this process
will construct a <font
face="monospace,
monospace">PathDiagnostic
</font>object for
each report, that
is, ideally,
minimal.</font></span></div>
<div><b
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></b></div>
<div><b
style="background-color:rgb(255,255,255)"><font
color="#000000">Example
1.</font></b></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">Consider
the following code
example:<br>
<br>
</font></span>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">1
// leak.cpp</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">2
int *global_ptr;</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">3 </font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">4
void save_ext(int
storeIt, int *ptr)
{</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">5
if (storeIt)</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">6
global_ptr =
ptr;</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">7
}</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">8</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">9
void test(int b) {</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">10
int *myptr = new
int;</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">11
save_ext(b,
myptr);</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">12
delete
global_ptr;</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">13
}</font></div>
</div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div dir="ltr">
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">It's
clear that if
test is invoked
with <font
face="monospace,
monospace">b</font>'s
value set to
true, there is
no error in the
program.
However, should
b be false,
we'll leak
memory.</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<span
style="font-family:monospace,monospace;background-color:rgb(255,255,255)"><font
color="#000000">$
clang -cc1
-analyze
-analyzer-checker=core,cplusplus
leak.cpp</font></span></div>
<div dir="ltr">
<div><img
src="cid:part7.74E3A644.1EE15463@gmail.com"
alt="image.png"
class=""
width="472"
height="168"><br>
</div>
</div>
<div dir="ltr"><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000"><br>
</font>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">The
Static Analyzer
is able to catch
this error, but
fails to mention
the call to <font
face="monospace, monospace">save_ext</font> entirely, despite the error
only
occurring because
the analyzer
assumed that <font
face="monospace, monospace">storeIt</font> is false. I've also attached
the exploded
graph <font
face="monospace,
monospace">leak.svg </font><font
face="arial,
helvetica,
sans-serif">that
demonstrates
this.</font></font></span></div>
<div><font
style="background-color:rgb(255,255,255)"
face="arial,
helvetica,
sans-serif"
color="#000000"><br>
</font></div>
<div>
<div><b
style="background-color:rgb(255,255,255)"><font
color="#000000">Example 2.</font></b></div>
</div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">Consider
the following
code example:<br>
<br>
<font
face="monospace,
monospace">1
//
divbyzero.cpp</font><br>
</font></span>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">2
void f() {</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">3
int i = 0;</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">4
(void) (10 /
i);</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">5
}</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">6</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">7
void g() { f();
}</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">8
void h() { g();
}</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">9
void j() { h();
}</font></div>
<div><font
style="background-color:rgb(255,255,255)"
face="monospace,
monospace"
color="#000000">10
void k() { j();
}</font></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br
class="gmail-m_1040859510329759042gmail-m_3651988806800932803gmail-m_444863209060437944m_6421736142923902491gmail-m_-3454961580602141710gmail-m_-6979117027997739172gmail-m_4426039388547674777gmail-m_5439174890474816058gmail-m_6038659214782379119gmail-Apple-interchange-newline">
Its clear that
a call to f
will result in
a division by
zero error.</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<span
style="background-color:rgb(255,255,255)"><font
color="#000000"><span
style="font-family:monospace,monospace">$ clang -cc1 -analyze
-analyzer-checker=core </span><span
style="font-family:monospace,monospace">divbyzero.cpp</span></font></span></div>
<div><img
src="cid:part8.EB0923F3.A9A23BE1@gmail.com"
alt="image.png"
class=""
width="334"
height="472"><br>
</div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">Again,
the Static
Analyzer is
plenty smart
enough to catch
this, but the
constructed bug
report is
littered with a
lot of useless
information --
it would be
enough to only
show the body of
f, and,
optionally where
f itself was
called. For the
sake of
completeness,
I've attached <font
face="monospace, monospace">divbyzero.svg</font> that contains the
exploded graph
for the above
code snippet.<br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">The
above examples
demonstrate that
<font
face="monospace,
monospace">BugReporter
</font>sometimes
reduces the
bugpath too much
or too little.</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><b><font
style="background-color:rgb(255,255,255)"
size="4"
color="#000000">---===
Solving the
above problem
with static
backward program
slicing ===---</font></b></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><b
style="background-color:rgb(255,255,255)"><font
color="#000000">What
is static
backward program
slicing?</font></b></div>
<div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">A <i>program slice</i> consists of the parts of a
program that
(potentially)
affect the
values
computed at
some point of
interest,
called the
slicing
criterion. <i><b>Program
slicing</b></i> is
a
decomposition
technique that
elides program
components not
relevant to
the slicing
criterion
(which is a
pair of
(statement,
set of
variables)),
creating a
program
slice[1][2]. <b>Static </b>slicing
preserves the
meaning of the
variable(s) in
the slicing
criterion for
all possible
inputs[1]. <b>Backward </b>slices
answer the
question “what
program
components
might affect a
selected
computation?”[1]</font></span></div>
</div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">While
statement-minimal slices are not necessarily unique[3], Weisel developed
a popular
algorithm that
constructs one.
In essence, his
fix-point
algorithm
constructs sets
of <i>relevant
variables</i><i> </i>for
each edge in
between node <i>i</i> and
node <i>j</i> in
a CFG graph,
from which he
constructs <i>relevant
statements</i>.
The fix-point of
the relevant
statements set
is the slice
itself.</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">One
of the
characteristic
of his algorithm
is that the
resulting
program slice
will be
executable.
However, our
problem doesn't
require the code
to be
executable, so
we could use a
more
"aggressive"
approach that
creates a
smaller slice.
An improvement
to his algorithm
is presented in
[4].</font></span></div>
<div>
<div><b
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></b></div>
<div><b
style="background-color:rgb(255,255,255)"><font
color="#000000">How does this relate to BugReporter?</font></b></div>
</div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">We
can show that
using Weiser's
algorithm,
issues raised in
Example 1. and
Example 2. can
be improved
upon.</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">For
example 1., the
statement-minimal program slice with the criterion (13, <font
face="monospace,
monospace"><dynamically
allocated int
object></font>)
will contain the
statements 5 and
6, and for
example 2., the
statement-minimal program slice with the criterion (4,<font
face="monospace,
monospace"> i</font>)
won't contain
anything but
statements 3 and
4. For the
latter, we can
even improve the
algorithm to
also contain
statement 7,
where a call to
f is made.</font></span></div>
<span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br
class="gmail-m_1040859510329759042gmail-m_3651988806800932803gmail-m_444863209060437944m_6421736142923902491gmail-m_-3454961580602141710gmail-m_-6979117027997739172gmail-m_4426039388547674777gmail-Apple-interchange-newline">
The analyzer, as
stated earlier,
gives <font
face="monospace,
monospace">BugReporter
</font>an <font
face="monospace,
monospace">ExplodedNode</font>,
from which the
slicing criterion
must be
constructed. The
statement
corresponding to
this node, coupled
with the
interesting
regions the
checker itself
marked could be
used to construct
this slicing
criterion.</font></span>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><b>Challenges</b><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">While
the algorithm
Weiser developed
along with the
improvements
made by others
are
interprocedural,
I would imagine
that in
implementation,
it would be a
challenging step
from an
intraprocedural
prototype.</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">Several
articles also
describe
pointers,
references, and
dynamically
allocated
regions, as well
as gotos and
other tricky
parts of the
language, but I
still expect to
see some
skeletons
falling out of
the closet when
implementing
this for C++,
not only C.</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><b
style="background-color:rgb(255,255,255)"><font
color="#000000">Drawbacks</font></b></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">Static
slicing, as an
algorithm that
doesn't know
anything about
input values,
suffers from the
same issues that
all static
techniques do,
meaning that
without
heuristics,
it'll have to do
very rough
guesses,
possibly leaving
a far too big
program slice.
However, with
the symbolic
values the
analyzer holds,
this could be
improved,
turning this
into <i>conditioned
slicing, </i>as
described in
[1]. This
however is only
planned as a
followup work
after GSoC.</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">For
this reason,
this project
would be
developed as an
alternative
approach to some
of the
techniques used
in <font
face="monospace,
monospace">BugReporter</font>,
as an optional
off-by-default
analyzer
feature.</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><font size="4"
color="#000000"><b
style="background-color:rgb(255,255,255)">---=== Questions ===---</b></font></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">What
do you think of
this approach?</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">Do
you think that
implementing
this algorithm
is achievable,
but tough enough
task for GSoC?</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">Would
you prefer to
see a general
program slicing
library, or an
analyzer-specific
implementation?
Traversing the <font
face="monospace, monospace">ExplodedGraph </font>would be far easier in
terms of what I
want to achieve,
but a more
general approach
that traverses
the CFG (like <font
face="monospace, monospace">llvm::DominatorTree</font><font face="arial,
helvetica,
sans-serif">[5])
could be
beneficial to
more
developers,
but possibly
at the cost of
not being able
to improve the
prototype with
the symbolic
value
information
the analyzer
holds.</font></font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><b><i
style="background-color:rgb(255,255,255)"><font
color="#000000">References, links</font></i></b></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">[1] <span
style="font-size:13px;font-family:Arial,sans-serif">Gallagher, Keith,
and David
Binkley.
"Program
slicing." </span><i
style="font-size:13px;font-family:Arial,sans-serif">2008 Frontiers of
Software
Maintenance</i><span
style="font-size:13px;font-family:Arial,sans-serif">.</span></font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><span
style="font-size:13px;font-family:Arial,sans-serif">[2] </span><span
style="font-size:13px;font-family:Arial,sans-serif">Tip,
Frank. </span><i
style="font-size:13px;font-family:Arial,sans-serif">A survey of program
slicing
techniques</i><span
style="font-size:13px;font-family:Arial,sans-serif">. Centrum voor
Wiskunde en
Informatica,
1994.</span></font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><span
style="font-size:13px;font-family:Arial,sans-serif">[3] </span><span
style="font-size:13px;font-family:Arial,sans-serif">Weiser,
Mark. "Program
slicing." </span><i
style="font-size:13px;font-family:Arial,sans-serif">Proceedings of the
5th
international
conference on
Software
engineering</i><span
style="font-size:13px;font-family:Arial,sans-serif">. IEEE Press, 1981.</span></font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><span
style="font-size:13px;font-family:Arial,sans-serif">[4] </span><span
style="font-size:13px;font-family:Arial,sans-serif">Binkley,
David.
"Precise
executable
interprocedural
slices." </span><i
style="font-size:13px;font-family:Arial,sans-serif">ACM Letters on
Programming
Languages and
Systems
(LOPLAS)</i><span
style="font-size:13px;font-family:Arial,sans-serif"> 2.1-4 (1993):
31-45.</span></font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><span
style="font-size:13px;font-family:Arial,sans-serif">[5] </span><font
face="Arial,
sans-serif"><a
href="http://llvm.org/doxygen/classllvm_1_1DominatorTree.html"
target="_blank"
moz-do-not-send="true">http://llvm.org/doxygen/classllvm_1_1DominatorTree.html</a></font></font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">Link
to previous GSoC
related letter I
sent: <a
href="http://lists.llvm.org/pipermail/cfe-dev/2019-February/061464.html"
target="_blank" moz-do-not-send="true">http://lists.llvm.org/pipermail/cfe-dev/2019-February/061464.html</a></font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000"><br>
</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">Cheers,</font></span></div>
<div><span
style="background-color:rgb(255,255,255)"><font
color="#000000">Kristóf
Umann</font></span></div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</div>
</blockquote>
<br>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
<br>
</body>
</html>