[llvm-dev] [RFC] Design of a TBAA sanitizer

Wed Apr 12 15:49:07 PDT 2017

On 04/12/2017 05:09 AM, Stephen Kell wrote:
>>>> A design goal of a TBAA sanitizer is to limit the shadow-memory overhead
>>>> of the implementation. ASan, for example, uses 1 bit per byte. Here
>>>> we're hoping to keep the overhead down to 2 bits per byte for the TBAA
>>>> sanitizing. We might be able to do this, while handling all common types
>>>> on the fast path, if we use both alignment and type information.
>>> Slightly provocative question, but are you sure that byte-scale shadow
>>> memory is a good fit?
>> Am I sure? No. But I *think* that is because I'd like to avoid false
>> positives, and that means dealing with places where we dynamically change
>> the type of part of an allocation (via placement new or whatever). This
>> certainly seems necessary to deal with some C++ containers at least.
> (Sorry for the delayed response. Short summary is "I agree with you" but
> wanted to pick some nits/details anyway. :-)
>
> Can you elaborate on the C++ containers thing? If it's just std::variant
> and things like std::any that use aligned_storage_t, then see below.

Those are what come to mind (although in general it is legal to 
partially end the lifetime of an allocated array by placement-newing 
over parts of it).

>
> The dynamically-changing-types thing seems to work okay in both cases.
> In my runtime it is possible to change the recorded type of a heap
> allocation, and to create new types at run time. So there are levers to
> deal with the "part of an allocation" thing, though it does get a bit
> ugly. (At present, a bit more engineering is necessary to support this
> trick for stack objects, e.g. to do placement new on them... but I see
> no real problem supporting it.)
>
> I can see the simplicity benefits of shadow memory, so I'm not arguing
> too hard against it -- just feeling my way around the issues....
>   
>>> Just so I'm understanding: is this talking about shadowing memory with
>>> its "leaf" type only, or with its "top-level" type? So if I have, say,
>>> the following usually-64-bit type:
>>>
>>> struct compound { int x; float y; }
>>>
>>> ... would we paint the shadow area with an alternating pattern of int
>>> (0110 0110) and floats (0110 1001)? Or would we be using the "all other
>>> types" thing since the memory is actually "compound"?
>> I intended to imply that we'd fill with the alternating pattern indicating
>> int and float.
> Understood.
>
> I'm now scratching my head about whether that sacrifices the ability to
> detect UB where the problematic "access" is of a composite type. Passing
> a struct to a function is a kind of access that comes to mind. (I'm
> wavering about "object" versus "subobject", a distinction which I have
> never properly understood in C11....)

Yea. I'm currently concerned that this compact scheme does not really 
work because it can't capture the semantics contained in our struct-path 
TBAA (which is, I believe, equivalent to your point here), which means 
that we might not catch violations of properties on which the optimizer 
might otherwise rely.

>
>>>> For pointers, this scheme would consider all pointers to be the same
>>>> (regardless of pointee type). Doing otherwise would mostly requiring
>>>> putting pointer-type checking on the slow path (i.e. access via a
>>>> pointer pointer), and that could add considerable overhead. We might,
>>>> however, split out function pointers from other pointers. We could
>>>> provide a compile-time option to control the granularity of pointer-type
>>>> checks.
>>> Aha, the pointers-to-pointers problem. Treating all pointers the same
>>> feels like a mistake because, say, aliasing a T* through a void** is
>>> frequently done and really quite useful (although technically UB, if I
>>> understand correctly; is this exploited in LLVM?
>> I don't know if it is exploited by LLVM, although as I recall, all
>> implementations of posix_memalign and similar functions potentially run
>> afoul of the rules (because they need to store the new pointer as a void*
>> regardless of what it actually is).
> I think returning a pointer like posix_memalign() does it is okay -- the
> client just has to declare the written-to pointer as a void*. If they
> want a different type of pointer, they have to do a cast/assignment
> after the call. So it's clients that are in danger of UB here... they
> might use it the sloppy way, by casting a T** to void**. Of course,
> *not* doing this gets tedious in some cases, e.g. when calling generic
> linked-structure routines written in C.
>
>> I'm not sure how strict we can be here
>> in practice. Currently, AFAIKT, Clang emits TBAA metadata which does not
>> differentiate between pointer types (it calls them all "any pointer").
> Interesting -- I should take a look at this (thanks).
>
>>> ), whereas aliasing a T*
>>> through an arbitrary S** is probably a bug.
>>>
>>> Just for contrast: I'm familiar with this problem, and my way around
>>> this comes in a few parts. I check pointer creation (casts) only; checks
>>> on pointer use (dereference) are unnecessary. Usually it takes a while
>>> before people believe me on this, but the short story is that
>>> pointer-creation checks enforce an invariant that no bad-type pointer
>>> gets into circulation. So dereferences are always okay (until the first
>>> warning, at least; execution can continue after a warning, in a
>>> best-effort no-guarantees fashion). This also greatly helps performance.
>> I'm sure this is true, but the problem is that C/C++ don't actually outlaw
>> such pointer casts. As you also mention below, it only becomes a problem if
>> such pointers are used to access an object of some incompatible type. If the
>> sanitizer has false positives then it is not nearly as useful to me, even
>> though the technique you describe might actually find more bugs.
> Fair point. For what it's worth, I do also care a lot about minimising
> false positives, and there's a trick I should have mentioned: issuing a
> "trap pointer" when a bad cast occurs, to delay warnings until it's
> used. On x86-64 this just means flipping some high-order bits so we get
> a non-canonical address that can be losslessly converted back to the
> real address. Trap bits are erased for pointer comparisons, or on casts
> back to a good type. If a trap pointer is used (definitely a bug!), the
> segfault handler generates a sensible warning message, decodes the
> faulting instruction, "de-traps" the relevant pointer in the saved
> register file, and resumes the program. (This could work a bit better
> than it currently does; happy to elaborate.)

This makes sense. Thanks for mentioning this.

>
>>>> Thoughts?
>>> My high-level thought is the following question. Is the plan fixed on
>>> developing a tool specifically for C/C++ effective-type rules, i.e.
>>> focused on catching/fixing those particular cases of UB and with the
>>> goal of improving user code performance? Or is there any interest in the
>>> overlapping but distinct problem of helping users detect and debug
>>> bad-pointer problems of a type flavour (i.e. those that are not bounds
>>> errors or temporal errors)?
>> My goal is the former, although I certainly believe there is also value in
>> the latter.
> Understood.
>
>>> Plenty of pointer bugs are not UB... e.g. for any heap allocation
>>> ("object without a declared type", in C11-speak) I'm allowed to scribble
>>> over it, thereby changing its effective type. It's only UB if I read it
>>> with its previous effective type.
>> Yes.
>>
>>>   But the scribbling itself is fairly
>>> likely to be the bug,
>> Agreed. However, there are also legitimate use cases, and implementations of
>> standard types (variant, any, etc.) may actually do this.
> I agree that there is some risk of false positives here, but I think the
> range of problematic code is really small and does not include the cases
> you mention.
>
> The core issue is whether type-changing writes should need to be flagged
> specially in the code somehow, allowing them to be instrumented
> appropriately, or whether arbitrary writes must be able to silently
> change effective types. A 100% faithful adherence to standard C does
> require the latter. However, I don't think C++ std::variant or std::any
> are examples of code that do this, because they pretty much have to use
> unions and/or placement new, which are both ways of doing the
> flagging.

Good point.

>
> I do know some C code that relies on silently type-changing writes. In
> fact in SPEC CPU2006 alone there are two cases -- one in bzip2, one in
> lbm. The lbm case is bonkers and really should be a union. In the bzip2
> case it is arguably sane; my tentative fix is to add a realloc() on the
> heap array with a different type but the same size -- basically my
> hacked-up C version of signalling a "placement new".
>
> In general, relying on the "no declared type" property of heap storage
> is pretty fragile, because code suddenly becomes UB if you refactor it
> so that a heap object becomes a stack or static object. So asking users
> to manually suppress these warnings doesn't seem unreasonable to me,
> just as writing your own memcpy() would do... although no doubt opinions
> differ.
>
> (I admit I haven't tried much C++ or standard-library code yet with my
> system. Also, for full disclosure: currently I do not have great support
> for unions... although it errs on the side of avoiding false positives,
> and I think I've figured out how it can be made to work precisely.)
>
>>>   because programmers rarely want to change the type
>>> of an allocation mid-lifetime. Meanwhile, many pointer-type problems are
>>> too indirect or complex for the compiler to actually do TBAA-derived
>>> optimisation on, leaving little to be gained from the UB/TBAA point of
>>> view... but again, plenty from the debugging perspective. I suspect the
>>> biggest value of any tool, even if geared specifically towards TBAA,
>>> will turn out to be largely in debugging scenarios more general than
>>> effective-type UB bugs.
>> My motivation is to help people who are currently forced to build their code
>> with -fno-strict-aliasing figure out what needs to be fixed in their code so
>> they don't need to do that.
> Thanks for the clarification.
>
>> This was good feedback. Your work is indeed focused on a slightly different
>> problem. It could be interesting and useful to have both.
> I will happily agree with this, despite the above nitpicks. :-) And (no
> obligation but) I'd love to be directed towards any further
> problematic/awkward code not covered by the above.

Me too :-) - We might find out...

  -Hal

>
> Stephen

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory