[PATCH] D32199: [TBAASan] A TBAA Sanitizer (Clang)

Hal Finkel via cfe-commits cfe-commits at lists.llvm.org
Mon May 1 12:16:19 PDT 2017


On 05/01/2017 12:49 PM, Daniel Berlin wrote:
>
>
> On Fri, Apr 21, 2017 at 4:03 AM, Hal Finkel via Phabricator 
> <reviews at reviews.llvm.org <mailto:reviews at reviews.llvm.org>> wrote:
>
>     hfinkel added a comment.
>
>     In https://reviews.llvm.org/D32199#732737
>     <https://reviews.llvm.org/D32199#732737>, @rsmith wrote:
>
>     > In https://reviews.llvm.org/D32199#732189
>     <https://reviews.llvm.org/D32199#732189>, @hfinkel wrote:
>     >
>     > > In https://reviews.llvm.org/D32199#731472
>     <https://reviews.llvm.org/D32199#731472>, @rsmith wrote:
>     > >
>     > > > 1. C's "effective type" rule allows writes to set the type
>     pretty much unconditionally, unless the storage is for a variable
>     with a declared type
>     > >
>     > >
>     > > To come back to this point: We don't really implement these
>     rules now, and it is not clear that we will. The problem here is
>     that, if we take the specification literally, then we can't use
>     our current TBAA at all. The problem is that if we have:
>     > >
>     > >   write x, !tbaa "int"
>     > >   read x, !tbaa "int"
>     > >   write x, !tbaa "float"
>     > >
>     > >
>     > > TBAA will currently tell us that the "float" write aliases
>     with neither the preceding read nor the preceding write.
>     >
>     >
>     > Right, C's TBAA rules do not (in general) permit a store to be
>     reordered before a memory operation of a different type, they only
>     allow loads to be moved before stores. (Put another way, they do
>     not tell you that pointers point to distinct memory locations,
>     just that a stored value cannot be observed by a load of a
>     different type.) You get the more general "distinct memory
>     locations" result only for objects of a declared type.
>     >
>     > C++ is similar, except that (because object lifetimes do not
>     currently begin magically due to a store) you /can/ reorder stores
>     past a memory operation of a different type if you know no
>     object's lifetime began in between. (But currently we do not
>     record all lifetime events in IR, so we can't do that today. Also,
>     we may be about to lose the property that you can statically
>     determine a small number of places that might start an object
>     lifetime.)
>     >
>     > > Also, a strict reading of C's access rules seems to rule out
>     the premise underlying our struct-path TBAA entirely. So long as
>     I'm accessing a value using a struct that has some member,
>     including recursively, with that type, then it's fine. The
>     matching of the relative offsets is a sufficient, but not
>     necessary, condition for well-defined access. C++ has essentially
>     the same language (and, thus, potentially the same problem).
>     >
>     > I agree this rule is garbage, but it's not as permissive as I
>     think you're suggesting. The rule says that you can use an lvalue
>     of struct type to access memory of struct field type. In C this
>     happens during struct assignment, for instance. It does *not*
>     permit using an lvalue of struct field type to access unrelated
>     fields of the same struct. So C appears to allow this nonsense:
>     >
>     >   char *p = malloc(8);
>     >   *(int*)p = 0;
>     >   *(int*)(p + 4) = 0;
>     >   struct S {int n; float f;} s = *(struct S*)p; // use lvalue of
>     type `struct S` to access object of effective type `int`, to
>     initialize a `float`
>     >
>     >
>     > but not this nonsense:
>     >
>     >   float q = ((struct S*)p)->f; // ub, cannot use lvalue of type
>     `float` to access object of effective type `int`
>     >
>     >
>     > ... which just means that we can't make much use of TBAA when
>     emitting struct copies in C.
>     >
>     > In C++, on the other hand, the rule is even more garbage, since
>     there is no way to perform a memory access with a glvalue of class
>     type. (The closest you get is that a defaulted union
>     construction/assignment copies the object representation, but
>     that's expressed in terms of copying a sequence of unsigned chars,
>     and in any case those are member functions and so already require
>     an object of the correct type to exist.) See wg21.link/cwg2051
>
>
>     Our struct-path TBAA does the following:
>
>       struct X { int a, b; };
>       X x { 50, 100 };
>       X *o = (X*) (((int*) &x) + 1);
>
>       int a_is_b = o->a; // This is UB (or so we say)?
>
>
> This is UB.
> A good resource for this stuff is 
> http://www.cl.cam.ac.uk/~pes20/cerberus/ 
> <http://www.cl.cam.ac.uk/%7Epes20/cerberus/> which has a long document 
> where they exlpore all of these and what various compilers do, along 
> with what the standard seems to say.

http://www.cl.cam.ac.uk/~pes20/cerberus/notes30-full.pdf is 172 pages, 
and so I may have missed it, but I don't see this case. Also, I'd really 
like to see where the standard says this is UB. I don't see it.

Thanks again,
Hal

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20170501/24733a08/attachment-0001.html>


More information about the cfe-commits mailing list