[llvm] r176408 - recommit r172363 & r171325 (reverted in r172756)

Sun Mar 3 15:13:10 PST 2013

On Mar 3, 2013, at 4:17 PM, Nuno Lopes <nunoplopes at sapo.pt> wrote:

>> I have to agree with Shuxin here. The LangRef says about the semantics of "llvm.objectsize.i32/i64":
>> 
>> "The llvm.objectsize intrinsic is designed to provide information to the optimizers to determine at compile time whether a) an operation (like memcpy) will overflow a buffer that corresponds to an object, or b) that a runtime check for overflow isn’t necessary. An object in this context means an allocation of a specific class, structure, array, or other object."
>> 
>> "The llvm.objectsize intrinsic is lowered to a constant representing the size of the object concerned. If the size cannot be determined at compile time,llvm.objectsize returns i32/i64 -1 or 0 (depending on the min argument)."
>> 
>> Emphasis: "An object in this context means an allocation of a specific class, structure, array, or other object."
>> 
>> To me this can be clearly interpreted as a requirement to point to one "identified underlying object" in LLVM terms. A front end could now assume that if "llvm.objectsize.i32/i64" returns a value other than the failurer value it is guaranteed that the pointer value points to one underlying object and not two.
> 
> Let me give an example:
> 
> %p1 = malloc(42);
> %p2 = malloc(42);
> %p3 = phi(%p1, %p2);
> %size = @llvm.objectsize(%p3);
> 
> The patch I commited enables instcombine to replace %size  with the constant 42.

Yes, and that is the problem. I claim that it is questionable whether the semantics of "llvm.objectsize" does allow you to do so.

"The llvm.objectsize intrinsic is lowered to a constant representing the size of THE object concerned"
 - http://llvm.org/docs/LangRef.html#llvm-objectsize-intrinsic

There is no mentioning of more than one object. I believe you could interpret this as a contract that llvm.objectsize will only return a valid size value for one identified underlying object.

"An object in this context means AN allocation of A specific class, structure, array, or other object."
 - http://llvm.org/docs/LangRef.html#llvm-objectsize-intrinsic

Again, no mentioning of a set of objects. To me this reads as a clear specification of what the object is. ONE and not a set.

If you think that the semantics of "llvm.objectsize" should be different in that it allows for a set of underlying objects, again, I think we have to change the LangRef. We might be breaking out of tree code that interpreted the semantics as I did, though.

What I am concerned with is external code that took my interpretation of what llvm.objectsize means:

Imagine a code generator taking my view and generating the following code:

%p1 = malloc(42);
%p2 = malloc(42);
%p3 = phi(%p1, %p2);
%size = @llvm.objectsize(%p3, true);

if (%size != 0) {
  free(%p1)
} else {
   free(%p1)
   free(%p2)
}

Now you have a dangling pointer p2. Or imagine code that initialize a class with valid values …

> I fail to see why that transformation is incorrect.  While it's true that we don't know to which object %p3 will point to at run time, we know that its size will be 42.  Of course, if we had %p2=malloc(3)', then we couldn't do this constant propagation.
> For the analysis of objectsize, it's safe to look to a superset of the objects pointed-to by the input pointer.
> 

It is true that we can possibly determine an "common size of a pointer pointing to a set of underlying objects" which is not necessarily the same thing as "llvm.objectsize".

The problem is with what the semantics of llvm.objectsize is. What does it return if the pointer could point to two objects? I believe that from the text in "http://llvm.org/docs/LangRef.html#llvm-objectsize-intrinsic" you cannot replace it with a constant if you have several objects or at least that it is a feasible interpretation of the text.

> 
> Shuxin: I'm not breaking any assumption here.  If you notice, we already do a similar analysis for select instructions. Phi nodes were not processed because nobody (I mean, me) implemented the code.

Then, existing code also breaks the semantics of llvm.objectsize and has to be fixed. At least if you take the view that my interpretation of llvm.objectsize is a feasible one.

Again, there are two issues here:

What is the semantics of "llvm.objectsize"?

What size can we compute for a pointer that might point to a set of underlying objects.

Those might be two separate things, depending on which view you take on the semantics of "llvm.objectsize".

> I agree BasicAA could use more comments. I'll add a comment there.
> 
> Nuno