[llvm-dev] RFC: Allowing @llvm.objectsize to be more conservative with null.

George Burgess IV via llvm-dev llvm-dev at lists.llvm.org
Mon Jan 2 13:18:59 PST 2017


> Continuing from above: in other words, since your use case is always
> setting the `null-is-unknown` bit, can you re-define @llvm.objectsize
> to have that semantic without adding a new parameter?
>
> That's not backward compatible, but there's a simple but conservative
> update strategy.

That would almost definitely be the simplest way forward, but I'm unsure
how happy this would make other users of objectsize. In particular, our
objectsize sanitizer will no longer catch any null-related issues like:

struct Foo { int a; };
int getA(struct Foo *f) { return f->a; }
int getA2() { struct Foo *f = 0; return f->a; }

// getA2 now traps rather than calling a special ubsan-check-failed function

Grepping through, that's the only other direct user of objectsize that I
can find. If we're okay with that loss in accuracy (and don't believe that
we have other objectsize users that care), I'm happy to just alter
objectsize's default behavior and see how it does.

> However, none of this matters if we're not in a position to change the
> specification.

Sadly, we're not. :/

On Mon, Jan 2, 2017 at 11:45 AM, Sanjoy Das <sanjoy at playingwithpointers.com>
wrote:

> Hi George,
>
> On Mon, Jan 2, 2017 at 10:41 AM, George Burgess IV
> <george.burgess.iv at gmail.com> wrote:
> > Thanks for the comments!
> >
> >> Have you considered changing our existing behavior to match GCC's
> >> builtin_object_size instead of adding a new parameter
> >
> > Yup! My issue with turning `i1 %min` into `i8 %flags` is that
> > __builtin_object_size would get lowered like:
> >
> > __builtin_object_size(p, 0) -> @llvm.objectsize(i8* %p, i8 2)
> > __builtin_object_size(p, 1) -> @llvm.objectsize(i8* %p, i8 2)
> > __builtin_object_size(p, 2) -> @llvm.objectsize(i8* %p, i8 3)
> > (__builtin_object_size(p, 3) doesn't actually get lowered)
> >
> > ...Which might be a bit surprising to some people. If we think that's a
> > non-issue, I'm happy to take the simpler approach and use an i8.
>
> What I had in mind was:
>
>  __builtin_object_size(p, 0) -> @llvm.objectsize(i8* %p, i1 0)
>  __builtin_object_size(p, 1) -> @llvm.objectsize(i8* %p, i1 0)
>  __builtin_object_size(p, 2) -> @llvm.objectsize(i8* %p, i1 1)
>  __builtin_object_size(p, 3) -> @llvm.objectsize(i8* %p, i1 1)
>
> and changing the specification of @llvm.objectsize to match.
>
> >> We'll have to have some <min>-awareness either in clang (to decide if
> >> the <null-is-unknown> bit needs to be set) or in the middle end.  What
> >> is your plan here?
> >
> > My plan was just to always set the `null-is-unknown` bit when lowering a
> > call to __builtin_object_size in clang. If `min` is true, we treat
> unknown
> > and null values identically in @llvm.objectsize, so whether
> > `null-is-unknown` is set in that case shouldn't matter.
>
> Continuing from above: in other words, since your use case is always
> setting the `null-is-unknown` bit, can you re-define @llvm.objectsize
> to have that semantic without adding a new parameter?
>
> That's not backward compatible, but there's a simple but conservative
> update strategy.
>
> >> However, since Malloc can return null:
> >
> > I think I was unclear: this behavior would only exist if @llvm.objectsize
> > was actually handed null. I wasn't planning on changing how we'd handle
> > memory allocation functions that may return null (GCC gives back 2 for
> > __builtin_object_size(malloc(2), 0)). In other words, this
> `null-is-unknown`
> > bit only makes the objectsize evaluator see `T* null` as though it was
> `call
> > T* @opaque_user_defined_function()`, nothing else.
>
> I meant to say that it looks like the semantics of ObjectSizeMin(X) is
> that "return the conservative minimum object size for all values that
> X may take at runtime" (resp. ObjectSizeMax(X)).
>
> For instance this:
>
> void i(int c, volatile int* sink) {
>   void* mem1 = malloc(20);
>   void* mem2 = malloc(40);
>
>   *sink = __builtin_object_size(c ? mem1 : mem2, 0);
>   *sink = __builtin_object_size(c ? mem1 : mem2, 2);
> }
>
> is lowered to
>
>         movl    $40, (%rsi)
>         movl    $20, (%rsi)
>         ret
>
> by GCC.
>
> Applying the same logic to malloc(N), since it returns a location that
> has N dereferenceable bytes or NULL, it follows that
> ObjectSizeMin(malloc(N)) should return the smaller of
> ObjectSizeMin(NULL) and ObjectSizeMin(MemoryLocOfNBytes) == the
> smaller of ObjectSizeMin(NULL) and N.  Given that we want the result
> of ObjectSizeMin(malloc(N)) == UMIN(ObjectSizeMin(NULL), N) to be N,
> we'd want ObjectSizeMin(NULL) to be (unsigned)-1 for consistency.
>
> However, none of this matters if we're not in a position to change the
> specification.
>
> -- Sanjoy
>
>
> > This is also consistent with how @llvm.objectsize already acts: if the
> > pointer it's given is the result of a call to `malloc(N)`, it'll return N
> > regardless of the value of `min`.
> >
> > But yeah, I agree that some of the features of __builtin_object_size
> aren't
> > entirely intuitive. :)
> >
> > On Mon, Jan 2, 2017 at 12:01 AM, Sanjoy Das <
> sanjoy at playingwithpointers.com>
> > wrote:
> >>
> >> Hi George,
> >>
> >> Have you considered changing our existing behavior to match GCC's
> >> builtin_object_size instead of adding a new parameter?  That may be
> >> simpler overall.  There's also a clear upgrade strategy -- fold every
> >> old style call to "<min> ? 0 : 1".
> >>
> >> You probably already know this, but GCC folds
> >> builtin_object_size(0, 0) to -1 and builtin_object_size(0, 2) to 0.
> >> We'll have to have some <min>-awareness either in clang (to decide if
> >> the <null-is-unknown> bit needs to be set) or in the middle end.  What
> >> is your plan here?
> >>
> >> I also found gcc's choice of folding builtin_object_size(0, 2) to 0 and
> >> builtin_object_size(0, 0) to -1 somewhat odd; I'd have expected the
> >> inverse.  This follows from the following "intuitive" rules
> >>
> >> ObjSizeMax(X) = UMAX(ObjSizeMax(A), ObjSizeMax(B))
> >> ObjSizeMin(X) = UMIN(ObjSizeMin(A), ObjSizeMin(B))
> >>
> >> (where X is a value that can either be A or B at runtime)
> >>
> >> and that we want to fold
> >>
> >> ObjSizeMax(Malloc(N)) = ObjSizeMin(Malloc(N)) = N
> >>
> >> However, since Malloc can return null:
> >>
> >> ObjSizeMax(Malloc(N)) = UMAX(N, ObjSizeMax(NULL)) = N
> >> ObjSizeMin(Malloc(N)) = UMIN(N, ObjSizeMin(NULL)) = N
> >>
> >> and for that to be true ObjSizeMax(NULL) =
> >> builtin_object_size(NULL, 0) needs to be 0 and ObjSizeMin(NULL) =
> >> builtin_object_size(NULL, 2) needs to be (unsigned)-1.
> >>
> >> However, I'm not sure if it is up to us to change that; given the very
> >> motivation of this thread is GCC compatibility.
> >>
> >> -- Sanjoy
> >>
> >> On Sun, Jan 1, 2017 at 10:03 PM, George Burgess IV via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >> > Happy New Year ping. :)
> >> >
> >> > Will ping again on Wednesday. If I don't get comments by EOD Thursday,
> >> > I'll
> >> > assume everyone's OK with this and put together a patch.
> >> >
> >> > On Wed, Dec 21, 2016 at 11:44 AM, George Burgess IV
> >> > <george.burgess.iv at gmail.com> wrote:
> >> >>
> >> >> tl;dr: We'd like to add a bit to @llvm.objectsize to make it
> optionally
> >> >> be
> >> >> conservative when it's handed a null pointer.
> >> >>
> >> >> Happy Wednesday!
> >> >>
> >> >> We're trying to fix PR23277, which is a bug about how clang+LLVM
> treat
> >> >> __builtin_object_size when it's given a null pointer. For
> compatibility
> >> >> with
> >> >> GCC, clang would like to be able to hand back a conservative result
> >> >> (e.g.
> >> >> (size_t)-1), but the LLVM intrinsic that clang lowers
> >> >> __builtin_object_size(p, N) to, @llvm.objectsize, only hands back 0
> >> >> when
> >> >> given a null pointer. Because it's often assumed that
> >> >> __builtin_object_size
> >> >> always folds to a constant, handling this only in clang may be
> tricky:
> >> >> if we
> >> >> emit the IR equivalent to ((p) ? __builtin_object_size(p, N) : -1ULL)
> >> >> and
> >> >> LLVM can't fold away the null check, we've failed to emit a constant.
> >> >>
> >> >> So, the best path forward that I can see is to add a "null is unknown
> >> >> size" bit to @llvm.objectsize, much like the "min" bit it already
> has.
> >> >> If
> >> >> this bit is true, null would be treated like an opaque pointer.
> >> >> Otherwise,
> >> >> @llvm.objectsize would act no differently than it does today.
> >> >>
> >> >> If we like this approach, I'm assuming it would be best to have this
> >> >> bit
> >> >> as a third argument to @llvm.objectsize, rather than making the
> second
> >> >> argument an i8 and using it as a bag of bits.
> >> >>
> >> >> All thoughts/comments/alternative approaches/etc. highly appreciated.
> >> >> :)
> >> >>
> >> >> Thanks (and Happy Holidays)!
> >> >> George
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > LLVM Developers mailing list
> >> > llvm-dev at lists.llvm.org
> >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >> >
> >>
> >>
> >>
> >> --
> >> Sanjoy Das
> >> http://playingwithpointers.com
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170102/47503063/attachment.html>


More information about the llvm-dev mailing list