[cfe-dev] C sequence-point analysis

Richard Smith richard at metafoo.co.uk
Fri Nov 15 12:46:38 PST 2013


On Fri, Nov 15, 2013 at 9:21 AM, Lukas Hellebrandt <kamikazecz at gmail.com>wrote:

> Hi all,
>
> I'm trying to write a tool for detecting undefined behavior in C
> regarding sequence points and side effects.
>
> I'm not sure whether it should be a Clang plugin or LLVM run or
> something completely different (I'm really new to both Clang and LLVM)
> and that's what I need advice with.
>
> For my work, I need to use both AST and alias analysis
>
> Clang plugin:
>         +relatively easy to use
>         +access to AST with all the needed info EXCEPT alias analysis
> (right?)
>         -no alias analysis, I'd need to write one myself
>
> LLVM run:
>         +built-in alias analysis (I'd like to use it, writing my own alias
> analysis is not really what my work is all about)
>         -I do NOT have access to AST
>         -I don't know it at all (but I'm ready to learn it if it shows up
> to be
> the best option)
>
> The big PROBLEM is: a behavior that is undefined in C (and which Clang
> has access to) might be (and in my case WILL be) well defined in LLVM
> (for example, i=i++; is undefined in C but in LLVM code it will be
> already well defined and the result will depend on Clang behavior).
>
> So I thought I could use both, in Clang create a list of rules, for
> example "On line L, there is an undefined behavior if X aliases with Y"
> and then SOMEHOW dig this info from LLVM run.
>
> Is this a good idea? Is there a way (other than output to file from
> Clang and then read it in LLVM) to give this set of rules to LLVM? I'd
> also be glad for any other idea, not necessarily including LLVM and
> Clang, to solve this problem.


I've thought about this a bit from the point of view of adding a check to
-fsanitize=undefined. We already detect cases where it's "locally" obvious
that the expression will have undefined behavior, for instance, we'll warn
on your example of "i = i++;".

>From the ubsan perspective, I think the thing to do here is to form a list
of pairs of subexpressions within the full-expression where, if the two
subexpressions refer to the same object, we will have unsequenced accesses
(in the presence of ?:, &&, and ||, we'll need to be careful to only check
these if the relevant subexpression was actually evaluated). Then, perhaps
at the end of the full-expression, we emit a check that no such pair
evaluated to the same address.

Building the set of pairs is a little tricky, but the code implementing
-Wunsequenced can probably be reused to form this set.

That said, we have a representational weakness in LLVM here: we have no way
to express that the language semantics allow us to freely reorder two
memory operations. If we could emit somehow (perhaps as metadata) the
information that we have about full-expressions, we could use that as input
to an LLVM alias analysis -- then we could use the same representation both
to drive a check and to drive optimizations. In the long term, that's
probably the superior approach.


(Not-fully-baked ideas follow...) We could imagine this working something
like the following. Given:

  (a++, b) = c ? d++ : e

We divide it into separately-sequenced portions:

 full expression: seq 0
 a++: seq 1 for load, seq 2 for store
 b: seq 3
 c: seq 4
 d++: seq 5 for load, seq 6 for store
 e: seq 7
 assignment: seq 8

We know that:

  seq 2 and seq 3 are sequenced after seq 1
  seq 4 and seq 5 are sequenced after seq 3
  seq 6 is sequenced after seq 5
  seq 8 is sequenced after seq 3, 5, and 7
  seq 1-8 are nested within seq 0 (so are by default not sequenced with
other things inside seq 0)

We could then express this as metadata as:

  %a = load @a, !seq !seq1
  %a.1 = add %a, 1
  store @a, %a.1, !seq !seq2
  ; ...

!seq0 = !{}
!seq1 = !{seq0} ; parent is 0
!seq2 = !{seq0, !{seq1}} ; parent is 0, sequenced after 1
!seq3 = !{seq0, !{seq1}} ; parent is 0, sequenced after 1
; ...

... with the semantics that memory operations A and B are known not to
alias if at least one is a load, and they have the same parent, and neither
is (transitively) sequenced before the other.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20131115/e5fbe029/attachment.html>


More information about the cfe-dev mailing list