[PATCH] More precise aliasing for char arrays

Richard Smith richard at metafoo.co.uk
Thu Jun 26 04:10:33 PDT 2014


On Wed, Jun 25, 2014 at 7:34 PM, Arthur O'Dwyer <arthur.j.odwyer at gmail.com>
wrote:

> On Wed, Jun 25, 2014 at 3:26 PM, Sanjin Sijaric <ssijaric at codeaurora.org>
> wrote:
> >
> >>     int *p;
> >>     typedef struct {
> >>       char a;
> >>       char b[100];
> >>       char c;
> >>     } S;
> >>
> >>     S x;
> >>
> >>     void func1 (char d) {
> >>       for (int i = 0; i < 100; i++) {
> >>         x.b[i] += 1;
> >>         d = *p;
> >>         x.a += d;
> >>       }
> >>     }
> >>
> >> It seems like you want the compiler to hoist the read of `*p` above the
> >> write to `x.b[i]`.
> >> But that isn't generally possible, is it? because the caller might have
> >> executed
> >>
> >>    p = &x.b[3];
> >>
> >> before the call to func1.
> >
> > Here, "p" is a pointer to int, whereas b is a char array.  Wouldn't
> > "p = &x.b[3];" break ANSI aliasing rules?
>
> I was under the impression that that was the entire point of "omnipotent
> char"!
>

I think that's backwards from the intent: if you swap over 'int' and 'char'
in the example, we cannot do the reordering, because p could point to (some
byte of) one of the ints.
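
To make the swapped case concrete, here is a minimal sketch of what I mean.
The name func1_swapped is made up for illustration and is not code from the
patch; the only change from the original test is that 'int' and 'char' trade
places.

```c
char *p;          /* char pointer: may legally point into any object */
typedef struct {
  int a;
  int b[100];
  int c;
} S;

S x;              /* static storage, so all members start at zero */

/* Hypothetical swapped variant of the test case from the thread. */
void func1_swapped(int d) {
  for (int i = 0; i < 100; i++) {
    x.b[i] += 1;  /* may modify the byte that p points to */
    d = *p;       /* so this load has to stay after the store */
    x.a += d;
  }
}
```

If the caller points p at the low-order byte of x.b[3] and x starts out
zeroed, the loop as written leaves x.a == 97 (the load sees 1 from i == 3
onwards), whereas loading *p once before the loop would leave x.a == 0.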

With the test as-is, we *can* reorder the *p load (and even move it out of
the loop):
  -- *p cannot alias x.b[i]: if 'x.b[i] += 1' has defined behavior, then x is
an object of type S, x.b is an object of type char[100], and 0 <= i < 100, so
that store does not alias any int object.
  -- *p cannot alias x.a: if 'x.a += d' has defined behavior, then x is an
object of type S, so a store to S::a cannot alias any int object.
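
As a rough illustration, the transformation this reasoning permits looks like
the hand-written sketch below. func1 is the test case from the thread;
func1_hoisted and the local 'loaded' are names I made up here, and this is not
what the optimizer literally emits.

```c
int *p;
typedef struct {
  char a;
  char b[100];
  char c;
} S;

S x;

/* Original loop from the thread. */
void func1(char d) {
  for (int i = 0; i < 100; i++) {
    x.b[i] += 1;
    d = *p;
    x.a += d;
  }
}

/* Hypothetical hand-hoisted form: one load of *p, moved out of the loop. */
void func1_hoisted(char d) {
  (void)d;                   /* the incoming value is overwritten anyway */
  char loaded = (char)*p;    /* assumes p points to a real int object */
  for (int i = 0; i < 100; i++) {
    x.b[i] += 1;             /* store to a char inside x, never to an int */
    x.a += loaded;
  }
}
```

The two functions agree whenever p points to an actual int object, which is
the only case where the original has defined behavior.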

I think this kind of analysis should probably be covered by
-fstruct-path-tbaa, not enabled by default, though. (I'm a little surprised
that -fstruct-path-tbaa doesn't get this right today...)

> See lines 106-108 in the very file you're changing:
>
> 00106     // Character types are special and can alias anything.
> 00107     // In C++, this technically only includes "char" and "unsigned char",
> 00108     // and not "signed char". In C, it includes all three. For now,
> 00109     // the risk of exploiting this detail in C++ seems likely to outweigh
> 00110     // the benefit.
>
> Source: http://clang.llvm.org/doxygen/CodeGenTBAA_8cpp_source.html
>
> –Arthur