<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
I believe in our current situation we should simply generate a sink
whenever we discover symbolic pointer aliasing, because RegionStore
always implicitly assumes that there's no aliasing. Eg., consider:<br>
<br>
void example1(int *x, int *y) {<br>
*x = 1; // (SymRegion{reg_$1<x>}, 0, direct): 1 S32b<br>
*y = 2; // (SymRegion{reg_$2<y>}, 0, direct): 2 S32b<br>
if (x == y) {<br>
// ???<br>
}<br>
}<br>
<br>
The correct way to model the then-branch of the if-statement is to
figure out which of the bindings is *newer* and keep it; in this
case it'll mean dropping the binding of 'x' and keeping the binding
of 'y'. This would mean that the value of the common pointee of 'x'
and 'y' would be equal to 2. However, in its current shape
RegionStore simply does not track history of bindings. We can make
it track the history, but it'll be a ridiculously complex data
structure - and even the current simple structure is so complicated
and messy that nobody understands how it works.<br>
<br>
All right, suppose we make a new Store implementation that'll handle
aliasing perfectly and is otherwise perfect in every way. Suppose we
also have a perfect constraint solver that knows that in the
following code<br>
<br>
void example2(int *x, int *y) {<br>
if (*x > 5)<br>
if (*y < 10)<br>
if (x == y) {<br>
// ???<br>
}<br>
}<br>
<br>
the valid constraint range for the common pointee of 'x' and 'y'
would be [6, 9] (this won't be immediately solved by the Z3 solver
because heap shape information still needs to be stuffed into it).
Now, how about these examples?:<br>
<br>
void example3(int *x, int *y) {<br>
delete x;<br>
if (x == y) {<br>
// ???<br>
}<br>
}<br>
<br>
void example4(int *x, int *y) {<br>
delete x;<br>
delete y;<br>
if (x == y) {<br>
// ???<br>
}<br>
}<br>
<br>
In example 3 the common pointee would be deleted. It'd cause a
warning if 'y' is used later. In example 4 it's an outright
retroactive double-free. We need to modify MallocChecker to handle
both of these cases correctly.<br>
<br>
If the same happens to RetainCountChecker, we'll need to *add*
reference counts of the two pointers together. That's already a
large variety of very non-trivial operations that we need to
perform. How about aliasing file descriptors (which aren't even
necessarily pointers)? How about aliasing two mutexes one of which
is locked and the other is unlocked? How much completely non-trivial
and non-reusable code do we need to write in dozens of checkers to
solve this problem, given that we can barely solve it for
RegionStore?<br>
<br>
===========<br>
<br>
What we *can* do, however, is to not only generate the sink, but
*restart* the analysis from the beginning, assuming that the two
pointers point to the same memory from the start. Say, force the
SymbolManager that's owned by this analysis's ExprEngine to produce
reg_$1<x> whenever it's asked for reg_$2<y>. This would
make sure that we don't make wrong assumptions in the first place,
so that we didn't have to undo them in the middle of the analysis.
This wouldn't require much checker-side support, so it'll be much
more in the spirit of the Analyzer. And it's also much easier to
implement. As usual, we'll need some heuristics to make sure we
don't exponentially explode on the number of different combinations
of pointers that can potentially alias.<br>
<br>
<br>
<div class="moz-cite-prefix">On 05.06.2019 05:23, Gábor Horváth via
cfe-dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAPRL4a1DMn3ZAdCF+-X1_C5s_a8n39G0O6w3T1TzTEU7fH6_8A@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div>
<div>Hi!<br>
<br>
</div>
This is an excellent point and it is not easy to solve. One of
the reason why your original idea is hard to implement is the
following. Let's look at the code snippet below:<br>
</div>
<div><span style="font-family:courier new,monospace">void f(int
*p, int *q) {<br>
</span></div>
<div><span style="font-family:courier new,monospace"> int *p2 =
p + 2;<br>
</span></div>
<div><span style="font-family:courier new,monospace"> if (p ==
q) {<br>
</span></div>
<div><span style="font-family:courier new,monospace"> // Here
we learned something new about p2 as well.<br>
</span></div>
<div><span style="font-family:courier new,monospace"> // So
we might need to update all the symbolic expressions and
bindings, not just p.<br>
</span></div>
<div><span style="font-family:courier new,monospace"> }<br>
</span></div>
<div><span style="font-family:courier new,monospace">}<br>
</span><br>
</div>
<div>I think your second idea should work well, but I have no
idea about the performance implications. As a starting point
we could implement your second solution and do some benchmarks
both performance wise and looking at the results from some
open source projects to see what happens to the false positive
rate.<br>
<br>
</div>
<div>Out of curiosity, do we have the same problems with
invalidations?<br>
<br>
</div>
<div>I am sure Artem will have some more to add but as far as I
know he is busy with WWDC this week. <br>
</div>
<div><br>
</div>
<div>Regards,<br>
</div>
<div>Gabor<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 4 Jun 2019 at 14:53,
Ádám Balogh via cfe-dev <<a
href="mailto:cfe-dev@lists.llvm.org" moz-do-not-send="true">cfe-dev@lists.llvm.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="EN-US">
<div class="gmail-m_8770444531635733315WordSection1">
<p class="MsoNormal">Hello,</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">This is not the first time that we
are facing the following problem.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Given a code that clears all
non-sentinel elements from a circular double-linked list
with sentinel element:</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">typedef struct ListNode {</p>
<p class="MsoNormal"> struct ListNode *next;</p>
<p class="MsoNormal"> struct ListNode *prev;</p>
<p class="MsoNormal"> void *data;</p>
<p class="MsoNormal">} ListNode;</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">void ListClear(ListNode *list) {</p>
<p class="MsoNormal"> while (list->next != list) {</p>
<p class="MsoNormal"> ListNode *node = list->next;</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"> node->next->prev =
node->prev;</p>
<p class="MsoNormal"> node->prev->next =
node->next;</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"> node->next = NULL;</p>
<p class="MsoNormal"> node->prev = NULL;</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"> free(node);</p>
<p class="MsoNormal"> }</p>
<p class="MsoNormal">}</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">unix.Malloc Use after free is
reported in the second iteration of the loop for `node`.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">This is not necessarily a false
positive if analyzed top level. The function that clears
the nodes of the list implicitly assumes that it is a
consistent circular double-linked list with a sentinel
node. However this invariant is not ensured anywhere
inside the function. It is thus not known that the
`prev` pointer of the first non-sentinel node points
back to the sentinel node, neither is it known that
applying a `prev` and a `next` after each other in any
order leads back to the original node. Without these
invariants the analyzer cannot know that after unlinking
and freeing the first non-sentinel element and then
taking again the first non-sentinel element is not equal
to the original sentinel element. So it assumes that
they are equal and used again after freeing its memory
space. Thus from the perspective of the standalone
function it can be a use-after-free situation.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">So I fixed the function by inserting
assertions to ensure the invariant:</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">void ListClear(ListNode *list) {</p>
<p class="MsoNormal"> while (list->next != list) {</p>
<p class="MsoNormal"> ListNode *node = list->next;</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"> assert(node->prev == list);</p>
<p class="MsoNormal"> assert(node->next->prev ==
node);</p>
<p class="MsoNormal"> assert(node->prev->next ==
node);</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"> node->next->prev =
node->prev;</p>
<p class="MsoNormal"> node->prev->next =
node->next;</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"> node->next = NULL;</p>
<p class="MsoNormal"> node->prev = NULL;</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"> free(node);</p>
<p class="MsoNormal"> }</p>
<p class="MsoNormal">}</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">However, it did not help because the
analyzer cannot handle equality of pointers correctly.
If we have an assumption that two pointers are equal the
analyzer still creates two symbolic regions and records
a constraint that the symbols of the symbolic regions
are equal. Later this constraint can be used for
deciding whether they are equal or not but it does not
handle them as the same region. So if we change the
pointer `node->prev->next` to `node->next` then
`node->prev->next` is changed but `list->next`
not even if we assumed that they are the same. That is
the reason the analyzer takes the freed region again in
the next iteration of the loop and reports an error.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">A strange thing is that the
constraint manager rearranges the comparison of the two
symbols (of the symbolic regions) to a difference
without `aggressive-binary-operation-simplification`
disabled. (It should not do it even if this option is
enabled since the ranges of symbolic values are not
constrained to `MAX/4` of the type.)</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">The question is how to fix this. I
think that if two pointers are the same, then they must
point to exactly the same memory region which means
exactly the same symbol if it is a symbolic region. Thus
instead of assigning `$reg2` to `node->prev` and
recording whether the range of `$reg2-$reg0` is `[0..0]`
the analyzer should assign `$reg0` to `node->prev`
which is the region of `list`. Similarly,
`node->next->prev` should not become `$reg7` where
the range of `$reg7-$reg1` is `[0..0]` and
`node->prev->next ` should not become `$reg11`
where the range of `$reg11-$reg1` is `[0..0]` but both
should become `$reg1`. This way when we unlink `node`
`node->next->prev` becomes `reg0` (thus the region
of `list`) and `node->prev->next` e.g. `$reg2`.
This way in the next iteration `node->next` is
`$reg2` instead of the freed `$reg1` so the false
warning goes away. Of course I know this is not easy to
implement. If both pointers already have regions
assigned then which one to chose and how to replace the
other one?</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Another solution could be iterating
all the existing regions upon bindings and do the same
binding to the regions equal to the affected region. (I
hope I managed to express myself clearly enough here.)</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">This is an old weakness of the
Analyzer which should be solved eventually because it
causes false positives. A new constraint manager does
not help because the problem is not there.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Any ideas?</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Regards,</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"><span lang="HU">Ádám</span></p>
<p class="MsoNormal"><span lang="HU"> </span></p>
<p class="MsoNormal"> </p>
<p class="MsoNormal"> </p>
</div>
</div>
_______________________________________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@lists.llvm.org" target="_blank"
moz-do-not-send="true">cfe-dev@lists.llvm.org</a><br>
<a
href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev"
rel="noreferrer" target="_blank" moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
</blockquote>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
cfe-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a>
</pre>
</blockquote>
<br>
</body>
</html>