[PATCH] D32199: [TBAASan] A TBAA Sanitizer (Clang)

Daniel Berlin via cfe-commits cfe-commits at lists.llvm.org
Mon May 1 12:28:17 PDT 2017


On Mon, May 1, 2017 at 12:16 PM, Hal Finkel <hfinkel at anl.gov> wrote:

>
> On 05/01/2017 12:49 PM, Daniel Berlin wrote:
>
>
>
> On Fri, Apr 21, 2017 at 4:03 AM, Hal Finkel via Phabricator <
> reviews at reviews.llvm.org> wrote:
>
>> hfinkel added a comment.
>>
>> In https://reviews.llvm.org/D32199#732737, @rsmith wrote:
>>
>> > In https://reviews.llvm.org/D32199#732189, @hfinkel wrote:
>> >
>> > > In https://reviews.llvm.org/D32199#731472, @rsmith wrote:
>> > >
>> > > > 1. C's "effective type" rule allows writes to set the type pretty
>> much unconditionally, unless the storage is for a variable with a declared
>> type
>> > >
>> > >
>> > > To come back to this point: We don't really implement these rules
>> now, and it is not clear that we will. The problem here is that, if we take
>> the specification literally, then we can't use our current TBAA at all. The
>> problem is that if we have:
>> > >
>> > >   write x, !tbaa "int"
>> > >   read x, !tbaa "int"
>> > >   write x, !tbaa "float"
>> > >
>> > >
>> > > TBAA will currently tell us that the "float" write aliases with
>> neither the preceding read nor the preceding write.
>> >
>> >
>> > Right, C's TBAA rules do not (in general) permit a store to be
>> reordered before a memory operation of a different type, they only allow
>> loads to be moved before stores. (Put another way, they do not tell you
>> that pointers point to distinct memory locations, just that a stored value
>> cannot be observed by a load of a different type.) You get the more general
>> "distinct memory locations" result only for objects of a declared type.
>> >
>> > C++ is similar, except that (because object lifetimes do not currently
>> begin magically due to a store) you /can/ reorder stores past a memory
>> operation of a different type if you know no object's lifetime began in
>> between. (But currently we do not record all lifetime events in IR, so we
>> can't do that today. Also, we may be about to lose the property that you
>> can statically determine a small number of places that might start an
>> object lifetime.)
>> >
>> > > Also, a strict reading of C's access rules seems to rule out the
>> premise underlying our struct-path TBAA entirely. So long as I'm accessing
>> a value using a struct that has some member, including recursively, with
>> that type, then it's fine. The matching of the relative offsets is a
>> sufficient, but not necessary, condition for well-defined access. C++ has
>> essentially the same language (and, thus, potentially the same problem).
>> >
>> > I agree this rule is garbage, but it's not as permissive as I think
>> you're suggesting. The rule says that you can use an lvalue of struct type
>> to access memory of struct field type. In C this happens during struct
>> assignment, for instance. It does *not* permit using an lvalue of struct
>> field type to access unrelated fields of the same struct. So C appears to
>> allow this nonsense:
>> >
>> >   char *p = malloc(8);
>> >   *(int*)p = 0;
>> >   *(int*)(p + 4) = 0;
>> >   struct S {int n; float f;} s = *(struct S*)p; // use lvalue of type
>> `struct S` to access object of effective type `int`, to initialize a `float`
>> >
>> >
>> > but not this nonsense:
>> >
>> >   float q = ((struct S*)p)->f; // ub, cannot use lvalue of type `float`
>> to access object of effective type `int`
>> >
>> >
>> > ... which just means that we can't make much use of TBAA when emitting
>> struct copies in C.
>> >
>> > In C++, on the other hand, the rule is even more garbage, since there
>> is no way to perform a memory access with a glvalue of class type. (The
>> closest you get is that a defaulted union construction/assignment copies
>> the object representation, but that's expressed in terms of copying a
>> sequence of unsigned chars, and in any case those are member functions and
>> so already require an object of the correct type to exist.) See
>> wg21.link/cwg2051
>>
>>
>> Our struct-path TBAA does the following:
>>
>>   struct X { int a, b; };
>>   X x { 50, 100 };
>>   X *o = (X*) (((int*) &x) + 1);
>>
>>   int a_is_b = o->a; // This is UB (or so we say)?
>>
>
> This is UB.
> A good resource for this stuff is http://www.cl.cam.ac.uk/~pes20/cerberus/
> which has a long document where they exlpore all of these and what various
> compilers do, along with what the standard seems to say.
>
>
> http://www.cl.cam.ac.uk/~pes20/cerberus/notes30-full.pdf is 172 pages,
> and so I may have missed it, but I don't see this case.
>

Let me find it for you. it may only be in one of the other versions.


> Also, I'd really like to see where the standard says this is UB. I don't
> see it.
>
>
So you believe that you can index into an object randomly by pointer
arithmetic and pull out a different field?

For starters, this is  illegal because you don't know where the padding
bytes are.
You cannot assume that X.a + 1 == X.b
"Implementation alignment requirements might cause two adjacent members not
to be allocated immediately after each other;"

See 9.2.14
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20170501/065191af/attachment.html>


More information about the cfe-commits mailing list