[llvm-dev] RFC: Allowing @llvm.objectsize to be more conservative with null.

Sanjoy Das via llvm-dev llvm-dev at lists.llvm.org
Mon Jan 2 16:02:33 PST 2017


On Mon, Jan 2, 2017 at 3:22 PM, Sanjoy Das
<sanjoy at playingwithpointers.com> wrote:
> Hi George,
>
> On Mon, Jan 2, 2017 at 1:18 PM, George Burgess IV
> <george.burgess.iv at gmail.com> wrote:
>>> Continuing from above: in other words, since your use case is always
>>> setting the `null-is-unknown` bit, can you re-define @llvm.objectsize
>>> to have that semantic without adding a new parameter?
>>>
>>> That's not backward compatible, but there's a simple but conservative
>>> update strategy.
>>
>> That would almost definitely be the simplest way forward, but I'm unsure how
>> happy this would make other users of objectsize. In particular, our
>> objectsize sanitizer will no longer catch any null-related issues like:
>>
>> struct Foo { int a; };
>> int getA(struct Foo *f) { return f->a; }
>> int getA2() { struct Foo *f = 0; return f->a; }
>
> I guess that's another sign that the spec is wrong. :)
>
> However, maybe we can use a trick here -- instead of lowering the
> check to assert(ObjectSize(f) UGE 4) how about assert((1 +
> ObjectSize(f)) UGT 4)?  The addition should always constant fold, and
> if `f` is null then we'll assert((1 + (-1)) UGT 4) ==> assert(false).

Never mind; that won't work.  We'll also fail the assert for
unanalyzable values.

>
> -- Sanjoy
>
>>
>> // getA2 now traps rather than calling a special ubsan-check-failed function
>>
>> Grepping through, that's the only other direct user of objectsize that I can
>> find. If we're okay with that loss in accuracy (and don't believe that we
>> have other objectsize users that care), I'm happy to just alter objectsize's
>> default behavior and see how it does.
>>
>>> However, none of this matters if we're not in a position to change the
>>> specification.
>>
>> Sadly, we're not. :/
>>
>> On Mon, Jan 2, 2017 at 11:45 AM, Sanjoy Das <sanjoy at playingwithpointers.com>
>> wrote:
>>>
>>> Hi George,
>>>
>>> On Mon, Jan 2, 2017 at 10:41 AM, George Burgess IV
>>> <george.burgess.iv at gmail.com> wrote:
>>> > Thanks for the comments!
>>> >
>>> >> Have you considered changing our existing behavior to match GCC's
>>> >> builtin_object_size instead of adding a new parameter
>>> >
>>> > Yup! My issue with turning `i1 %min` into `i8 %flags` is that
>>> > __builtin_object_size would get lowered like:
>>> >
>>> > __builtin_object_size(p, 0) -> @llvm.objectsize(i8* %p, i8 2)
>>> > __builtin_object_size(p, 1) -> @llvm.objectsize(i8* %p, i8 2)
>>> > __builtin_object_size(p, 2) -> @llvm.objectsize(i8* %p, i8 3)
>>> > (__builtin_object_size(p, 3) doesn't actually get lowered)
>>> >
>>> > ...Which might be a bit surprising to some people. If we think that's a
>>> > non-issue, I'm happy to take the simpler approach and use an i8.
>>>
>>> What I had in mind was:
>>>
>>>  __builtin_object_size(p, 0) -> @llvm.objectsize(i8* %p, i1 0)
>>>  __builtin_object_size(p, 1) -> @llvm.objectsize(i8* %p, i1 0)
>>>  __builtin_object_size(p, 2) -> @llvm.objectsize(i8* %p, i1 1)
>>>  __builtin_object_size(p, 3) -> @llvm.objectsize(i8* %p, i1 1)
>>>
>>> and changing the specification of @llvm.objectsize to match.
>>>
>>> >> We'll have to have some <min>-awareness either in clang (to decide if
>>> >> the <null-is-unknown> bit needs to be set) or in the middle end.  What
>>> >> is your plan here?
>>> >
>>> > My plan was just to always set the `null-is-unknown` bit when lowering a
>>> > call to __builtin_object_size in clang. If `min` is true, we treat
>>> > unknown
>>> > and null values identically in @llvm.objectsize, so whether
>>> > `null-is-unknown` is set in that case shouldn't matter.
>>>
>>> Continuing from above: in other words, since your use case is always
>>> setting the `null-is-unknown` bit, can you re-define @llvm.objectsize
>>> to have that semantic without adding a new parameter?
>>>
>>> That's not backward compatible, but there's a simple but conservative
>>> update strategy.
>>>
>>> >> However, since Malloc can return null:
>>> >
>>> > I think I was unclear: this behavior would only exist if
>>> > @llvm.objectsize
>>> > was actually handed null. I wasn't planning on changing how we'd handle
>>> > memory allocation functions that may return null (GCC gives back 2 for
>>> > __builtin_object_size(malloc(2), 0)). In other words, this
>>> > `null-is-unknown`
>>> > bit only makes the objectsize evaluator see `T* null` as though it was
>>> > `call
>>> > T* @opaque_user_defined_function()`, nothing else.
>>>
>>> I meant to say that it looks like the semantics of ObjectSizeMin(X) is
>>> that "return the conservative minimum object size for all values that
>>> X may take at runtime" (resp. ObjectSizeMax(X)).
>>>
>>> For instance this:
>>>
>>> void i(int c, volatile int* sink) {
>>>   void* mem1 = malloc(20);
>>>   void* mem2 = malloc(40);
>>>
>>>   *sink = __builtin_object_size(c ? mem1 : mem2, 0);
>>>   *sink = __builtin_object_size(c ? mem1 : mem2, 2);
>>> }
>>>
>>> is lowered to
>>>
>>>         movl    $40, (%rsi)
>>>         movl    $20, (%rsi)
>>>         ret
>>>
>>> by GCC.
>>>
>>> Applying the same logic to malloc(N), since it returns a location that
>>> has N dereferenceable bytes or NULL, it follows that
>>> ObjectSizeMin(malloc(N)) should return the smaller of
>>> ObjectSizeMin(NULL) and ObjectSizeMin(MemoryLocOfNBytes) == the
>>> smaller of ObjectSizeMin(NULL) and N.  Given that we want the result
>>> of ObjectSizeMin(malloc(N)) == UMIN(ObjectSizeMin(NULL), N) to be N,
>>> we'd want ObjectSizeMin(NULL) to be (unsigned)-1 for consistency.
>>>
>>> However, none of this matters if we're not in a position to change the
>>> specification.
>>>
>>> -- Sanjoy
>>>
>>>
>>> > This is also consistent with how @llvm.objectsize already acts: if the
>>> > pointer it's given is the result of a call to `malloc(N)`, it'll return
>>> > N
>>> > regardless of the value of `min`.
>>> >
>>> > But yeah, I agree that some of the features of __builtin_object_size
>>> > aren't
>>> > entirely intuitive. :)
>>> >
>>> > On Mon, Jan 2, 2017 at 12:01 AM, Sanjoy Das
>>> > <sanjoy at playingwithpointers.com>
>>> > wrote:
>>> >>
>>> >> Hi George,
>>> >>
>>> >> Have you considered changing our existing behavior to match GCC's
>>> >> builtin_object_size instead of adding a new parameter?  That may be
>>> >> simpler overall.  There's also a clear upgrade strategy -- fold every
>>> >> old style call to "<min> ? 0 : 1".
>>> >>
>>> >> You probably already know this, but GCC folds
>>> >> builtin_object_size(0, 0) to -1 and builtin_object_size(0, 2) to 0.
>>> >> We'll have to have some <min>-awareness either in clang (to decide if
>>> >> the <null-is-unknown> bit needs to be set) or in the middle end.  What
>>> >> is your plan here?
>>> >>
>>> >> I also found gcc's choice of folding builtin_object_size(0, 2) to 0 and
>>> >> builtin_object_size(0, 0) to -1 somewhat odd; I'd have expected the
>>> >> inverse.  This follows from the following "intuitive" rules
>>> >>
>>> >> ObjSizeMax(X) = UMAX(ObjSizeMax(A), ObjSizeMax(B))
>>> >> ObjSizeMin(X) = UMIN(ObjSizeMin(A), ObjSizeMin(B))
>>> >>
>>> >> (where X is a value that can either be A or B at runtime)
>>> >>
>>> >> and that we want to fold
>>> >>
>>> >> ObjSizeMax(Malloc(N)) = ObjSizeMin(Malloc(N)) = N
>>> >>
>>> >> However, since Malloc can return null:
>>> >>
>>> >> ObjSizeMax(Malloc(N)) = UMAX(N, ObjSizeMax(NULL)) = N
>>> >> ObjSizeMin(Malloc(N)) = UMIN(N, ObjSizeMin(NULL)) = N
>>> >>
>>> >> and for that to be true ObjSizeMax(NULL) =
>>> >> builtin_object_size(NULL, 0) needs to be 0 and ObjSizeMin(NULL) =
>>> >> builtin_object_size(NULL, 2) needs to be (unsigned)-1.
>>> >>
>>> >> However, I'm not sure if it is up to us to change that; given the very
>>> >> motivation of this thread is GCC compatibility.
>>> >>
>>> >> -- Sanjoy
>>> >>
>>> >> On Sun, Jan 1, 2017 at 10:03 PM, George Burgess IV via llvm-dev
>>> >> <llvm-dev at lists.llvm.org> wrote:
>>> >> > Happy New Year ping. :)
>>> >> >
>>> >> > Will ping again on Wednesday. If I don't get comments by EOD
>>> >> > Thursday,
>>> >> > I'll
>>> >> > assume everyone's OK with this and put together a patch.
>>> >> >
>>> >> > On Wed, Dec 21, 2016 at 11:44 AM, George Burgess IV
>>> >> > <george.burgess.iv at gmail.com> wrote:
>>> >> >>
>>> >> >> tl;dr: We'd like to add a bit to @llvm.objectsize to make it
>>> >> >> optionally
>>> >> >> be
>>> >> >> conservative when it's handed a null pointer.
>>> >> >>
>>> >> >> Happy Wednesday!
>>> >> >>
>>> >> >> We're trying to fix PR23277, which is a bug about how clang+LLVM
>>> >> >> treat
>>> >> >> __builtin_object_size when it's given a null pointer. For
>>> >> >> compatibility
>>> >> >> with
>>> >> >> GCC, clang would like to be able to hand back a conservative result
>>> >> >> (e.g.
>>> >> >> (size_t)-1), but the LLVM intrinsic that clang lowers
>>> >> >> __builtin_object_size(p, N) to, @llvm.objectsize, only hands back 0
>>> >> >> when
>>> >> >> given a null pointer. Because it's often assumed that
>>> >> >> __builtin_object_size
>>> >> >> always folds to a constant, handling this only in clang may be
>>> >> >> tricky:
>>> >> >> if we
>>> >> >> emit the IR equivalent to ((p) ? __builtin_object_size(p, N) :
>>> >> >> -1ULL)
>>> >> >> and
>>> >> >> LLVM can't fold away the null check, we've failed to emit a
>>> >> >> constant.
>>> >> >>
>>> >> >> So, the best path forward that I can see is to add a "null is
>>> >> >> unknown
>>> >> >> size" bit to @llvm.objectsize, much like the "min" bit it already
>>> >> >> has.
>>> >> >> If
>>> >> >> this bit is true, null would be treated like an opaque pointer.
>>> >> >> Otherwise,
>>> >> >> @llvm.objectsize would act no differently than it does today.
>>> >> >>
>>> >> >> If we like this approach, I'm assuming it would be best to have this
>>> >> >> bit
>>> >> >> as a third argument to @llvm.objectsize, rather than making the
>>> >> >> second
>>> >> >> argument an i8 and using it as a bag of bits.
>>> >> >>
>>> >> >> All thoughts/comments/alternative approaches/etc. highly
>>> >> >> appreciated.
>>> >> >> :)
>>> >> >>
>>> >> >> Thanks (and Happy Holidays)!
>>> >> >> George
>>> >> >
>>> >> >
>>> >> >
>>> >> > _______________________________________________
>>> >> > LLVM Developers mailing list
>>> >> > llvm-dev at lists.llvm.org
>>> >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Sanjoy Das
>>> >> http://playingwithpointers.com
>>> >
>>> >
>>
>>


More information about the llvm-dev mailing list