[llvm] r176408 - recommit r172363 & r171325 (reverted in r172756)

Sun Mar 3 15:38:51 PST 2013

----- Original Message -----
> From: "Arnold Schwaighofer" <aschwaighofer at apple.com>
> To: "Nuno Lopes" <nunoplopes at sapo.pt>
> Cc: llvm-commits at cs.uiuc.edu
> Sent: Sunday, March 3, 2013 5:13:10 PM
> Subject: Re: [llvm] r176408 - recommit r172363 & r171325 (reverted in r172756)
> 
> 
> On Mar 3, 2013, at 4:17 PM, Nuno Lopes <nunoplopes at sapo.pt> wrote:
> 
> >> I have to agree with Shuxin here. The LangRef says about the
> >> semantics of "llvm.objectsize.i32/i64":
> >> 
> >> "The llvm.objectsize intrinsic is designed to provide information
> >> to the optimizers to determine at compile time whether a) an
> >> operation (like memcpy) will overflow a buffer that corresponds
> >> to an object, or b) that a runtime check for overflow isn’t
> >> necessary. An object in this context means an allocation of a
> >> specific class, structure, array, or other object."
> >> 
> >> "The llvm.objectsize intrinsic is lowered to a constant
> >> representing the size of the object concerned. If the size cannot
> >> be determined at compile time,llvm.objectsize returns i32/i64 -1
> >> or 0 (depending on the min argument)."
> >> 
> >> Emphasis: "An object in this context means an allocation of a
> >> specific class, structure, array, or other object."
> >> 
> >> To me this can be clearly interpreted as a requirement to point to
> >> one "identified underlying object" in LLVM terms. A front end
> >> could now assume that if "llvm.objectsize.i32/i64" returns a
> >> value other than the failurer value it is guaranteed that the
> >> pointer value points to one underlying object and not two.
> > 
> > Let me give an example:
> > 
> > %p1 = malloc(42);
> > %p2 = malloc(42);
> > %p3 = phi(%p1, %p2);
> > %size = @llvm.objectsize(%p3);
> > 
> > The patch I commited enables instcombine to replace %size  with the
> > constant 42.
> 
> Yes, and that is the problem. I claim that it is questionable whether
> the semantics of "llvm.objectsize" does allow you to do so.
> 
> "The llvm.objectsize intrinsic is lowered to a constant representing
> the size of THE object concerned"
>  - http://llvm.org/docs/LangRef.html#llvm-objectsize-intrinsic
> 
> There is no mentioning of more than one object. I believe you could
> interpret this as a contract that llvm.objectsize will only return a
> valid size value for one identified underlying object.
> 
> "An object in this context means AN allocation of A specific class,
> structure, array, or other object."
>  - http://llvm.org/docs/LangRef.html#llvm-objectsize-intrinsic
> 
> Again, no mentioning of a set of objects. To me this reads as a clear
> specification of what the object is. ONE and not a set.
> 
> If you think that the semantics of "llvm.objectsize" should be
> different in that it allows for a set of underlying objects, again,
> I think we have to change the LangRef. We might be breaking out of
> tree code that interpreted the semantics as I did, though.
> 
> What I am concerned with is external code that took my interpretation
> of what llvm.objectsize means:
> 
> Imagine a code generator taking my view and generating the following
> code:
> 
> %p1 = malloc(42);
> %p2 = malloc(42);
> %p3 = phi(%p1, %p2);
> %size = @llvm.objectsize(%p3, true);
> 
> if (%size != 0) {
>   free(%p1)
> } else {
>    free(%p1)
>    free(%p2)
> }

I'll comment from the sidelines that, while I can see this is a theoretical possibility, we should not use it as a justification for inhibiting a useful optimization. Nevertheless, this seems to be a solvable problem. For example, we could add a third parameter to the objectsize intrinsic indicating whether ambiguous object sizes are allowed, and we could auto-upgrade the two argument form to the three argument form such that the new argument defaults to false.

 -Hal

> 
> Now you have a dangling pointer p2. Or imagine code that initialize a
> class with valid values …
> 
> > I fail to see why that transformation is incorrect.  While it's
> > true that we don't know to which object %p3 will point to at run
> > time, we know that its size will be 42.  Of course, if we had
> > %p2=malloc(3)', then we couldn't do this constant propagation.
> > For the analysis of objectsize, it's safe to look to a superset of
> > the objects pointed-to by the input pointer.
> > 
> 
> It is true that we can possibly determine an "common size of a
> pointer pointing to a set of underlying objects" which is not
> necessarily the same thing as "llvm.objectsize".
> 
> The problem is with what the semantics of llvm.objectsize is. What
> does it return if the pointer could point to two objects? I believe
> that from the text in
> "http://llvm.org/docs/LangRef.html#llvm-objectsize-intrinsic" you
> cannot replace it with a constant if you have several objects or at
> least that it is a feasible interpretation of the text.
> 
> > 
> > Shuxin: I'm not breaking any assumption here.  If you notice, we
> > already do a similar analysis for select instructions. Phi nodes
> > were not processed because nobody (I mean, me) implemented the
> > code.
> 
> Then, existing code also breaks the semantics of llvm.objectsize and
> has to be fixed. At least if you take the view that my
> interpretation of llvm.objectsize is a feasible one.
> 
> Again, there are two issues here:
> 
> What is the semantics of "llvm.objectsize"?
> 
> What size can we compute for a pointer that might point to a set of
> underlying objects.
> 
> Those might be two separate things, depending on which view you take
> on the semantics of "llvm.objectsize".
> 
> 
> > I agree BasicAA could use more comments. I'll add a comment there.
> > 
> > Nuno
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>