[cfe-dev] C sequence-point analysis

Richard Smith richard at metafoo.co.uk
Mon Nov 18 10:54:59 PST 2013


On Mon, Nov 18, 2013 at 5:17 AM, Lukas Hellebrandt <kamikazecz at gmail.com>wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Hi, Richard,
>
> Thanks for confirming my thoughts! That's what I meant, make a set of
> rules using Clang and then check them by LLVM.
>
> As I said, I am relatively new to Clang and have never used LLVM
> before. So, what's the best way to do this? Do I make a Clang plugin?
> I can do that (I already did a part of the analyzer, however, without
> any alias analysis; and I never did any change to the AST). And if
> yes, how do I send some (meta)data to LLVM? The only way I can think
> of is to save them to a file. And then, do I run a LLVM pass? Just
> asking if I understood correctly. My problem is now more about using
> Clang together with LLVM (as you confirmed my approach was a good
> idea, thanks for that again!).


To clarify, I'm suggesting that you teach Clang's IR generation to emit
code to catch these bugs when the compiled program is run. That shouldn't
require any alias analysis or similar at the point where you insert the
checks (if you wanted to be a bit cleverer, you could emit metadata from
Clang and use an LLVM pass to emit the checks, and avoid emitting checks
for cases where alias analysis could prove they're unnecessary, but the
optimizer should remove the redundant checks anyway, so such cleverness
should hopefully be unnecessary).

We have deliberately avoided having any checks in Clang that operate by
inspecting the IR post-optimization, since such checks tend to be very
unstable across changes to the compiler, and that generates pain for users
of the checks (a seemingly-unrelated change to the code could cause the
optimizer to make a different decision and expose a problem, leading to an
unexpected build break for a -Werror build).


> *****************************
> Lukas Hellebrandt
> kamikazecz at gmail.com
> *****************************
>
> On 11/15/2013 09:46 PM, Richard Smith wrote:
> > On Fri, Nov 15, 2013 at 9:21 AM, Lukas Hellebrandt
> > <kamikazecz at gmail.com <mailto:kamikazecz at gmail.com>> wrote:
> >
> > Hi all,
> >
> > I'm trying to write a tool for detecting undefined behavior in C
> > regarding sequence points and side effects.
> >
> > I'm not sure whether it should be a Clang plugin or LLVM run or
> > something completely different (I'm really new to both Clang and
> > LLVM) and that's what I need advice with.
> >
> > For my work, I need to use both AST and alias analysis
> >
> > Clang plugin: +relatively easy to use +access to AST with all the
> > needed info EXCEPT alias analysis (right?) -no alias analysis, I'd
> > need to write one myself
> >
> > LLVM run: +built-in alias analysis (I'd like to use it, writing my
> > own alias analysis is not really what my work is all about) -I do
> > NOT have access to AST -I don't know it at all (but I'm ready to
> > learn it if it shows up to be the best option)
> >
> > The big PROBLEM is: a behavior that is undefined in C (and which
> > Clang has access to) might be (and in my case WILL be) well defined
> > in LLVM (for example, i=i++; is undefined in C but in LLVM code it
> > will be already well defined and the result will depend on Clang
> > behavior).
> >
> > So I thought I could use both, in Clang create a list of rules,
> > for example "On line L, there is an undefined behavior if X aliases
> > with Y" and then SOMEHOW dig this info from LLVM run.
> >
> > Is this a good idea? Is there a way (other than output to file
> > from Clang and then read it in LLVM) to give this set of rules to
> > LLVM? I'd also be glad for any other idea, not necessarily
> > including LLVM and Clang, to solve this problem.
> >
> >
> > I've thought about this a bit from the point of view of adding a
> > check to -fsanitize=undefined. We already detect cases where it's
> > "locally" obvious that the expression will have undefined behavior,
> > for instance, we'll warn on your example of "i = i++;".
> >
> > From the ubsan perspective, I think the thing to do here is to form
> > a list of pairs of subexpressions within the full-expression where,
> > if the two subexpressions refer to the same object, we will have
> > unsequenced accesses (in the presence of ?:, &&, and ||, we'll need
> > to be careful to only check these if the relevant subexpression was
> > actually evaluated). Then, perhaps at the end of the
> > full-expression, we emit a check that no such pair evaluated to the
> > same address.
> >
> > Building the set of pairs is a little tricky, but the code
> > implementing -Wunsequenced can probably be reused to form this
> > set.
> >
> > That said, we have a representational weakness in LLVM here: we
> > have no way to express that the language semantics allow us to
> > freely reorder two memory operations. If we could emit somehow
> > (perhaps as metadata) the information that we have about
> > full-expressions, we could use that as input to an LLVM alias
> > analysis -- then we could use the same representation both to drive
> > a check and to drive optimizations. In the long term, that's
> > probably the superior approach.
> >
> >
> > (Not-fully-baked ideas follow...) We could imagine this working
> > something like the following. Given:
> >
> > (a++, b) = c ? d++ : e
> >
> > We divide it into separately-sequenced portions:
> >
> > full expression: seq 0 a++: seq 1 for load, seq 2 for store b: seq
> > 3 c: seq 4 d++: seq 5 for load, seq 6 for store e: seq 7
> > assignment: seq 8
> >
> > We know that:
> >
> > seq 2 and seq 3 are sequenced after seq 1 seq 4 and seq 5 are
> > sequenced after seq 3 seq 6 is sequenced after seq 5 seq 8 is
> > sequenced after seq 3, 5, and 7 seq 1-8 are nested within seq 0 (so
> > are by default not sequenced with other things inside seq 0)
> >
> > We could then express this as metadata as:
> >
> > %a = load @a, !seq !seq1 %a.1 = add %a, 1 store @a, %a.1, !seq
> > !seq2 ; ...
> >
> > !seq0 = !{} !seq1 = !{seq0} ; parent is 0 !seq2 = !{seq0, !{seq1}}
> > ; parent is 0, sequenced after 1 !seq3 = !{seq0, !{seq1}} ; parent
> > is 0, sequenced after 1 ; ...
> >
> > ... with the semantics that memory operations A and B are known not
> > to alias if at least one is a load, and they have the same parent,
> > and neither is (transitively) sequenced before the other.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.13 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iF4EAREIAAYFAlKKE28ACgkQHFHs/Czs0u61uAD+M+vsA0kFpGS9YztJkuN9Jkqb
> 8a8q4RfFdCJiyku75QwBAKrshaFAfM98UMpymLrEbJn6q60UK9hw9wkny6u0M17S
> =GLxY
> -----END PGP SIGNATURE-----
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20131118/5ab838fb/attachment.html>


More information about the cfe-dev mailing list