[cfe-dev] GSOC Static Analyzer Proposal

Fri Apr 12 13:53:14 PDT 2013

On Fri, Apr 12, 2013 at 1:08 PM, Adam Schnitzer <adamschn at umich.edu> wrote:

> Richard, This is very helpful. Thank you for sharing.
>
> As John pointed out, looking at multiple modification of an lvalue over a
> sequence
> point seems interesting. And it doesn't look like something that UBSan is
> currently
> getting. Although it might be difficult to get all violations of that
> type, catching
> some simpler ones seems like a good goal.
>

There are two different things you might be interested in here. One is
easy, the other is useful ;-)

The easy (but not very useful) one is catching undefined behavior due to
unsequenced reads/writes to the same location (this is undefined behavior,
and is Object Model item 4 in the document). This is "easy" because (1) we
already have implementation experience (in the static check for this bug)
of finding unsequenced operations, and we would "just" need to generate the
code to check whether any of the unsequenced and potentially-conflicting
operations are actually operating on the same location. Unfortunately, I
don't expect this to catch very many bugs which the static check misses;
this will probably only catch cases where the modified objects are the same
but not "obviously" the same, such as:

  int n;
  int &f() { return n; }
  int g(int, int);
  int k = g(f()++, n);

However, providing the sequencing information to the IR can allow further
optimizations (for instance, a sequencing-based alias analysis can discover
that "f()"s result does not alias "n" in the above code), so there may be
value in this beyond detecting UB bugs.

The useful (but not easy) one is catching cases where the operations are
indeterminately sequenced, rather than unsequenced (this happens when the
accesses are not directly part of the same full-expression, for instance if
one of them is within a function call, and does not provoke undefined
behavior). It's hard because:
1) runtime instrumentation to catch this is hard (this is similar to what
TSan checks for), and
2) removing false positives is hard (most cases of indeterminately
sequenced accesses are benign, because the end result of the program is the
same either way).

Example false positive, to give you a taste of why this is hard:

std::map<int, string*> stuff;
string build(int);
string &get(int k) {
  string *&p = stuff[k];
  if (!p) p = new string(build(k));
  return *p;
}
void f(string &a, string &b);
void g() {
  f(get(1), get(2)); // indeterminately-sequenced modifications to 'stuff'
...
  // ... but this is almost certainly fine, if the results of the 'build'
calls
  // don't depend on the order in which they're called
}

One reasonable heuristic for detecting issues here would be to provide a
mechanism to randomize the evaluation order (subject to the sequencing
rules and with a seed for debugability), and then either (a) just run the
tests for the program, or (b) run the program in parallel with multiple
different orders and check that the "observable behavior" (library IO
calls, etc) is the same.

[As an aside, thinking about "sequence points" is likely to prove unhelpful
to you. The "sequence points" rules are known to be broken, and the
"sequenced before" rules in C11 and C++11 are intended to describe how the
"sequenced before" rules were really meant to work.]

> Adam
>
> On Fri, Apr 12, 2013 at 2:23 PM, Richard Smith <richard at metafoo.co.uk>wrote:
>
>> On Wed, Apr 10, 2013 at 2:07 PM, Adam Schnitzer <adamschn at umich.edu>wrote:
>>
>>> John and Sean,
>>>
>>> Thank you very much for the feedback. I have a better idea of scope and
>>> where to focus.
>>>
>>> John, I think you're absolutely right, with -fsanitize=undefined and
>>> others, more behavior is being caught at runtime/compile time. I will start
>>> working on a list of behaviors for which no diagnostics currently exist,
>>> and select a subset to focus on.
>>>
>>
>> I made such a list when I started UBSan, and have (mostly) kept it
>> up-to-date with what we currently catch:
>>
>>
>> https://docs.google.com/document/d/1o7Xw6dohIuHLOve3hxtTjCrrUbd2NHUgmq_WGrao6js/edit#heading=h.pgx8h2ru49tm
>>
>> This only covers core language undefined behavior; the standard library
>> is another country ;)
>>
>>  Adam
>>>
>>> On Wed, Apr 10, 2013 at 1:54 PM, John Regehr <regehr at cs.utah.edu> wrote:
>>>
>>>> I would like to work on improving support for C++ in the static
>>>>> analyzer. Specifically, I think it
>>>>> would be valuable to improve the checkers for undefined behavior
>>>>> including those already suggested.
>>>>>
>>>>
>>>> I'd be happy to provide feedback on a more specific version of this
>>>> part of the proposal.
>>>>
>>>> In particular, a useful starting point (maybe this already exists?)
>>>> would be a list of all C/C++ undefined behaviors broken down by whether
>>>> Clang/LLVM...
>>>>
>>>> - can reliably provide a compile-time diagnostic
>>>>
>>>> - can reliably provide a runtime diagnostic
>>>>
>>>> - cannot provide any diagnostic, but implements a predictable behavior
>>>>
>>>> - cannot provide any diagnostic and also implements unpredictable
>>>> behavior
>>>>
>>>> Obviously the last category is the interesting place for future work.
>>>>
>>>> John
>>>>
>>>
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20130412/9f00742d/attachment.html>