<div dir="ltr"><div><div><div><div><div><div>> <i>It can be done in some cases. Take a look at DereferenceChecker.cpp when it generates an error report. There it walks up the ExplodedGraph to determine which variable/array/field a null pointer value got loaded from. This is used by the diagnostic machinery.<br><br></i></div>I guess you mean bugreporter::trackNullOrUndefValue. I've been looking at its source code, but I don't even understand yet what it's meant to do, let alone how it works. I'll keep working on it, though. Thanks!<br>
</div><div><br>> <i>I think the terminology is confusing. SVals don’t alias each other. That concept doesn’t even apply here.<br><br></i></div>You're correct, of course. I still get confused about what is an SVal and what is a MemRegion, even after Jordan's explanation.<br>
For example, when I have a Foo* fp, and 'fp' appears somewhere (e.g. in a null-checking condition), then:<br> - I get an SVal that dumps as '&fp'<br></div> - I can call .getAsRegion() on it to get a MemRegion that dumps as 'fp'<br>
</div> - I can then call ProgramState::getSVal() on this MemRegion to get e.g. '&SymRegion{conj_$3{struct Foo *}}'<br><br></div>As far as I can tell, this last one is the 'value' of the fp variable, that is, whatever was last bound to it; then '&fp' is the memory region in which the value of the variable is stored, and finally 'fp' is a pointer value to that region. Is this roughly correct?<br>
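<div>A toy model of the three objects listed above may help keep them apart. This is a hypothetical sketch, not the Clang API: <tt>Region</tt>, <tt>LocValue</tt> and <tt>Store</tt> are made-up names. A region is a named piece of storage (like the VarRegion for fp), a location value merely refers to a region (like the '&fp' SVal), and the store maps each region to whatever was last bound to it (roughly what ProgramState::getSVal() returns for a region).</div>

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a MemRegion: a named piece of storage,
// e.g. the VarRegion for the variable 'fp'.
struct Region {
  std::string name;
};

// Hypothetical stand-in for a location SVal such as '&fp':
// a value that merely refers to a region.
struct LocValue {
  const Region *region;
  // Analogue of SVal::getAsRegion(): recover the region the value refers to.
  const Region *getAsRegion() const { return region; }
};

// Hypothetical stand-in for the store inside a ProgramState:
// maps each region to the (printed) value currently bound to it.
class Store {
  std::map<const Region *, std::string> bindings;

public:
  void bind(const Region *r, const std::string &value) { bindings[r] = value; }

  // Analogue of ProgramState::getSVal(region): the value last bound to
  // the region, or "unknown" if nothing has been bound yet.
  std::string getValue(const Region *r) const {
    auto it = bindings.find(r);
    return it == bindings.end() ? "unknown" : it->second;
  }
};
```

<div>Under these assumptions, after <tt>store.bind(&fp, ...)</tt> the location value still names the storage 'fp', while <tt>store.getValue(lv.getAsRegion())</tt> returns the pointer value bound to it: the lvalue and the rvalue are different objects.</div>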
<br>> <i>Currently the analyzer doesn’t reason about the condition accurately because ‘p’ and ‘q’ are essentially assumed not to alias, since they will refer to two different symbolic pieces of memory. To solve this problem we would need the ability to “unify” two memory regions along a path. That’s a complicated problem, and nobody has gotten to implementing it yet.</i><br><br></div>Out of curiosity, conceptually, what makes this complicated? I realize you must hate this question, and the answer will probably be evident once I start studying how the Static Analyzer is implemented, but I just couldn't refrain from asking. Feel free to ignore me here. :)<br>
<div><br>> <i>This is essentially an “all paths” problem, and we have a few checkers, such as the IdempotentOperations checker, which try to address this kind of problem.</i><div><i>Roughly speaking, the checker has to be implemented in two stages: [...]<br><br></i></div><div>Thank you very much, I implemented your approach, and it worked! It was quite simple once I followed your idea, and I'm actually a bit embarrassed that I had to ask for help with this. I was thinking along the lines of traversing the ExplodedGraph in checkEndAnalysis, which I suppose would have worked as well, except that it would have been far more complicated.<br>
<br></div><div>Once again, thank you very much for your help!<br></div><div><br>Gabor<br></div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2013/7/24 Ted Kremenek <span dir="ltr"><<a href="mailto:kremenek@apple.com" target="_blank">kremenek@apple.com</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div class="im">On Jul 24, 2013, at 7:36 AM, Gábor Kozár <<a href="mailto:kozargabor@gmail.com" target="_blank">kozargabor@gmail.com</a>> wrote:<br>
</div><div><div class="im"><br><blockquote type="cite"><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
Yes, I realize the Static Analyzer will recognize that the line p = q means that 'p' and 'q' now have the same value, but it does not copy the user state (REGISTER_*_WITH_PROGRAMSTATE data), which makes total sense, since it does not know how that data is structured. Therefore, I need to record the fact that the aliasing has happened using checkBind. This is where I encountered the problem described in my original e-mail.<br>
<br></div><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
Reading through the<span> </span><a href="http://clang-analyzer.llvm.org/checker_dev_manual.html" target="_blank">checker dev manual</a>, I found the relevant section:<br><i>"When<span> </span><tt>x</tt><span> </span>is evaluated, we first construct an<span> </span><tt>SVal</tt><span> </span>that represents the lvalue of<span> </span><tt>x</tt>, in this case it is an<span> </span><tt>SVal</tt><span> </span>that references the<span> </span><tt>MemRegion</tt><span> </span>for<span> </span><tt>x</tt>. Afterwards, when we do the lvalue-to-rvalue conversion, we get a new<span> </span><tt>SVal</tt>, which references the value<span> </span><b>currently bound</b><span> </span>to<span> </span><tt>x</tt>. That value is symbolic; it's whatever<span> </span><tt>x</tt><span> </span>was bound to at the start of the function."<br>
<br></i></div><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
I need the reverse: not the value currently bound to x, but the SVal that references the MemRegion for x itself. I searched the documentation, but all I could find was ProgramState::getLValue, none of whose overloads give me what I need. So how could I achieve this?<br>
</div></blockquote><div><br></div></div><div>It can be done in some cases. Take a look at DereferenceChecker.cpp when it generates an error report. There it walks up the ExplodedGraph to determine which variable/array/field a null pointer value got loaded from. This is used by the diagnostic machinery.</div>
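<div>The backward walk described above can be pictured with a toy graph. This is a hypothetical simplification, not DereferenceChecker's code or the ExplodedNode API: each <tt>ToyNode</tt> records its predecessor and, if its step was a load, which symbol it produced and which region it came from. Starting at the error node, the sketch follows predecessors until it finds the load that produced the offending value.</div>

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified exploded-graph node: each node remembers its
// predecessor and, if the step was a load, which region the value came from.
struct ToyNode {
  const ToyNode *pred;     // previous node on this path (nullptr at the root)
  int loadedValue;         // symbol produced by this step (-1 if none)
  std::string loadedFrom;  // region the value was loaded from ("" if not a load)
};

// Walk backwards from the error node to the step that loaded 'value'
// and return the name of the region it came from ("" if not found).
std::string findLoadOrigin(const ToyNode *node, int value) {
  for (; node != nullptr; node = node->pred)
    if (node->loadedValue == value && !node->loadedFrom.empty())
      return node->loadedFrom;
  return "";
}
```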
<div class="im"><br><blockquote type="cite"><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<br></div><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
Back to the aliasing problem: I think it would be very useful if the Static Analyzer exposed functions to retrieve aliasing information. For example, I would like to ask whether a given SVal is an alias of another.</div>
</blockquote>
<div><br></div></div><div>I think the terminology is confusing. SVals don’t alias each other. That concept doesn’t even apply here.</div><div><br></div><div>For example, suppose I had the expression:</div><div><br></div>
<div> x + y</div><div><br></div><div>and x and y were each bound to the value 1. When the analyzer evaluates “x” and “y” separately, we get an SVal of 1 for each subexpression. Those SVals are the same value, but they don’t alias each other. Aliasing doesn’t even make sense here.</div>
<div><br></div><div>In the end, SVals are just values. They can wrap different kinds of values, be it references to symbolic pieces of memory, actual constants such as 1 and 2, addresses of goto labels, and so on.</div>
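<div>To make "SVals are just values" concrete, here is a heavily simplified, hypothetical stand-in built on <tt>std::variant</tt>; the real SVal class hierarchy is much richer. The only meaningful question about two such values is whether they are equal, as with the two 1s in <tt>x + y</tt>.</div>

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical, simplified value kinds, loosely mirroring the kinds listed
// above; these are not the analyzer's real SVal classes.
struct ConcreteInt { long value; };
struct SymbolicRegionRef { int symbolId; };   // reference to symbolic memory
struct GotoLabelAddr { std::string label; };  // address of a goto label

using ToySVal = std::variant<ConcreteInt, SymbolicRegionRef, GotoLabelAddr>;

// Two toy values are equal when they hold the same kind and the same payload.
// Equality is all we can ask about; "aliasing" is not a property of values.
bool sameValue(const ToySVal &a, const ToySVal &b) {
  if (a.index() != b.index()) return false;
  if (auto *x = std::get_if<ConcreteInt>(&a))
    return x->value == std::get<ConcreteInt>(b).value;
  if (auto *x = std::get_if<SymbolicRegionRef>(&a))
    return x->symbolId == std::get<SymbolicRegionRef>(b).symbolId;
  return std::get<GotoLabelAddr>(a).label == std::get<GotoLabelAddr>(b).label;
}
```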
<div>
<br></div><div>What you want to know is whether two pointer variables, say “p” and “q”, point to the same piece of memory. In the static analyzer they will each be bound to a value representing its pointer value. If “p” and “q” refer to MemRegions that are completely disjoint (say two separate VarRegions) then they cannot alias each other. If they both refer to two SymbolicRegions then they *may* alias each other.</div>
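<div>The reasoning above amounts to a conservative alias query over region kinds. The sketch below is hypothetical (the <tt>ToyRegion</tt> and <tt>AliasResult</tt> types are made up for illustration), and it treats the mixed VarRegion/SymbolicRegion case as "may alias", which is an assumption the paragraph above does not cover.</div>

```cpp
#include <cassert>

// Hypothetical, simplified region kinds: a VarRegion is the storage of a
// particular declared variable; a SymbolicRegion is memory whose identity
// the analyzer does not know (e.g. whatever foo() returned).
enum class RegionKind { Var, Symbolic };

struct ToyRegion {
  RegionKind kind;
  int id;  // identifies the variable or the symbol
};

enum class AliasResult { NoAlias, MustAlias, MayAlias };

// Conservative alias query following the reasoning above.
AliasResult alias(const ToyRegion &a, const ToyRegion &b) {
  if (a.kind == b.kind && a.id == b.id)
    return AliasResult::MustAlias;  // literally the same region
  if (a.kind == RegionKind::Var && b.kind == RegionKind::Var)
    return AliasResult::NoAlias;    // two distinct variables never overlap
  // Two SymbolicRegions may be the same memory. The mixed Var/Symbolic
  // case is also treated as "may alias" here (an assumption of this
  // sketch: a symbolic pointer could, e.g., hold the variable's address).
  return AliasResult::MayAlias;
}
```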
<div><br></div><div>Right now most of the analyzer assumes that two SymbolicRegions always refer to separate chunks of memory. That’s an optimistic assumption (that a compiler could never make for optimization), but it works well in practice. There are opportunities to improve the analyzer here. Suppose we saw something like:</div>
<div><br></div><div> int *p = foo();</div><div> int *q = bar();</div><div><br></div><div> if (p == q) { … }</div><div><br></div><div>Currently the analyzer doesn’t reason about the condition accurately because ‘p’ and ‘q’ are essentially assumed not to alias, since they will refer to two different symbolic pieces of memory. To solve this problem we would need the ability to “unify” two memory regions along a path. That’s a complicated problem, and nobody has gotten to implementing it yet.</div>
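<div>As a rough illustration of what "unifying" two regions along a path could mean, the hypothetical sketch below keeps a per-path union-find over symbol ids: assuming <tt>p == q</tt> on one branch merges their classes, while the other branch keeps its own copy. It deliberately leaves out what presumably makes the real problem hard, such as merging everything the state already records about each region (bindings, constraints, checker data).</div>

```cpp
#include <cassert>
#include <map>

// Hypothetical per-path record of which symbolic regions have been
// "unified" (assumed to be the same memory). Each path carries its own
// copy, so merging on the true branch of 'p == q' does not leak into the
// false branch. A plain union-find over symbol ids.
class PathUnification {
  std::map<int, int> parent;  // missing entry == the symbol is its own root

  int find(int s) {
    auto it = parent.find(s);
    if (it == parent.end() || it->second == s) return s;
    int root = find(it->second);
    parent[s] = root;  // path compression
    return root;
  }

public:
  // Called when the path assumes the two regions are equal.
  void unify(int a, int b) {
    int ra = find(a), rb = find(b);
    if (ra != rb) parent[ra] = rb;
  }

  bool sameRegion(int a, int b) { return find(a) == find(b); }
};
```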
<div><br></div><div>But I think you aren’t concerned about this problem. I think you are more thinking of the following scenario:</div><div><br></div><div> int *p = foo();</div><div> ...</div><div> int *q = p;</div><div>
…</div><div><br></div><div>In this case, ‘p’ and ‘q’ alias. Given two variables, it’s easy to determine whether they currently alias by checking whether they resolve to the same MemRegion. You can even chop off region offsets if you want to make things more coarse-grained.</div>
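<div>A minimal sketch of the two checks described above, under stated assumptions: the <tt>Reg</tt> type is hypothetical, sub-regions (fields, elements) point at their parent, and "chopping off offsets" means walking up to the outermost region, in the spirit of <tt>MemRegion::getBaseRegion()</tt> in Clang.</div>

```cpp
#include <cassert>
#include <string>

// Hypothetical region tree: sub-regions (fields, array elements) point at
// their parent; a base region (e.g. a variable or symbolic region) has none.
struct Reg {
  std::string name;
  const Reg *parent = nullptr;
};

// Walk up to the outermost region, dropping field/element offsets.
const Reg *baseOf(const Reg *r) {
  while (r->parent != nullptr) r = r->parent;
  return r;
}

// Precise check: both pointers resolve to exactly the same region.
bool pointsToSame(const Reg *a, const Reg *b) { return a == b; }

// Coarse-grained check: same region once offsets are chopped off.
bool pointsIntoSameBase(const Reg *a, const Reg *b) {
  return baseOf(a) == baseOf(b);
}
```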
<div><br></div><div>But my point is that for your checker this isn’t relevant. The analyzer already reasons about all of this for you. Your checker doesn’t really care about ‘p’ or ‘q’, but rather about what they point to. For example, suppose you had:</div>
<div><br></div><div> int *p = foo();</div><div> …</div><div> int *r = p;</div><div> …</div><div> int *q = r;</div><div> …</div><div> *r = 1;</div><div> …</div><div> if (q) { … }</div><div><br></div><div>In this case, we have ‘p’, ‘q’, and ‘r’ all aliasing each other. The analyzer engine tracks all of this for you without you doing anything. What’s important here is that the pointer value loaded from ‘q’ at the if statement is already constrained to be non-null, since that same value was dereferenced earlier through ‘r’. That’s all that matters. The aliasing is irrelevant.</div>
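<div>The point that facts attach to the value rather than to the variable can be shown with a hypothetical toy state (not the analyzer's real data structures): variables are bound to symbols, and the non-null fact recorded when <tt>*r</tt> is dereferenced lives on the symbol, so it is visible through <tt>q</tt> as well.</div>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical toy of the idea above: variables are bound to symbols, and
// facts such as "non-null" are recorded on the symbol, not the variable.
struct ToyState {
  std::map<std::string, int> binding;  // variable -> symbol it holds
  std::set<int> knownNonNull;          // symbols constrained to be non-null

  void assign(const std::string &var, int sym) { binding[var] = sym; }
  void copy(const std::string &dst, const std::string &src) {
    binding[dst] = binding.at(src);    // 'dst = src' copies the symbol
  }
  // Dereferencing '*var' only continues on paths where the value is non-null.
  void dereference(const std::string &var) { knownNonNull.insert(binding.at(var)); }
  bool isKnownNonNull(const std::string &var) const {
    return knownNonNull.count(binding.at(var)) != 0;
  }
};
```

<div>Mapped onto the example: <tt>assign("p", 1)</tt>, <tt>copy("r", "p")</tt>, <tt>copy("q", "r")</tt>, then <tt>dereference("r")</tt>; afterwards <tt>isKnownNonNull("q")</tt> holds without any alias bookkeeping.</div>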
<div class="im"><div><br></div><blockquote type="cite"><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
My original thought was that this alias analysis could be implemented as a checker itself, which would dispatch an event when it detects aliasing, but I do not see how it could expose methods to the other checkers (e.g. for them to be able to ask "is x an alias of y in this ProgramState?"). Does this make sense?<br>
</div></blockquote><div><br></div></div>I really don’t see how aliasing is relevant.</div><div><div class="im"><br><blockquote type="cite" dir="auto"><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<br>><span> </span><i>A null check might not always be redundant. On some paths the null check may be redundant and on others it won’t be. Thus there is a dominance relationship here that needs to be checked. Essentially, all paths need to show that the pointer is always non-null before the pointer is checked. Otherwise you’ll get false positives. Doing this correctly is hard, because not all paths are guaranteed to be traced. You’ll need to handle that too.<br>
<br></i></div><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
Yes, this is a problem, but I have absolutely no clue as to how it could be solved. I would naturally want to keep the number of false positives to a minimum, even at the cost of some bugs going undetected.<br></div></blockquote>
<div><br></div></div><div>It’s a real problem, and unless you have a solution it probably makes the checker unusable.</div><div><br></div><div>This is essentially an “all paths” problem, and we have a few checkers, such as the IdempotentOperations checker, which try to address this kind of problem.</div>
<div><br></div><div>Roughly speaking, the checker has to be implemented in two stages:</div><div><br></div><div>(1) Keep on the side a map from “condition checks” to a tri-state: { always null, may-be-null, no value }. The “no value” is the default, and it essentially means you have no data for a given condition.</div>
<div><br></div><div>For example, suppose you had:</div><div><br></div><div> if ( pointer value )</div><div> …</div><div> if ( pointer value )</div><div><br></div><div>You would have a side-map mapping from each of these IfStmts to the tri-state.</div>
<div><br></div><div>(2) Monitor checks of the pointer value using the analyzer visitor interface. If the pointer value is null and the current map value is not “may-be-null”, mark it “always null”. Otherwise, mark it “may-be-null”, which essentially means that the given condition statement is no longer in contention for the warning.</div>
<div><br></div><div>(3) After the analyzer finishes exploring paths, go over this map and find the entries that are marked “always null”. Those are the places to emit a warning.</div><div><br></div></div></div></blockquote>
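<div>The three steps can be sketched as a small side-map. This is a hypothetical illustration: <tt>CheckId</tt> stands in for a pointer to the IfStmt, <tt>observe()</tt> would be called from whatever checker callback sees the condition, and the sketch does not address the caveat that not every path is guaranteed to be explored.</div>

```cpp
#include <cassert>
#include <map>
#include <vector>

// The tri-state from step (1); NoValue is listed first so that a
// value-initialized entry means "no data for this condition yet".
enum class CheckState { NoValue, AlwaysNull, MayBeNull };

// Hypothetical identifier for a condition check (e.g. an IfStmt pointer).
using CheckId = int;

class NullCheckTracker {
  std::map<CheckId, CheckState> states;  // missing entry == NoValue

public:
  // Step (2): called each time a path reaches the check.
  void observe(CheckId check, bool pointerIsNull) {
    CheckState &s = states[check];  // inserts NoValue on first visit
    if (pointerIsNull && s != CheckState::MayBeNull)
      s = CheckState::AlwaysNull;
    else
      s = CheckState::MayBeNull;    // out of contention for the warning
  }

  // Step (3): after exploration, report the checks that stayed AlwaysNull.
  std::vector<CheckId> checksToReport() const {
    std::vector<CheckId> result;
    for (const auto &entry : states)
      if (entry.second == CheckState::AlwaysNull) result.push_back(entry.first);
    return result;
  }
};
```

<div>Once a single path sees a different outcome at a condition, that condition stays “may-be-null” for good, which matches the requirement that the warning holds on all explored paths.</div>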
</div><br></div>