[LLVMdev] Question about intrinsic function llvm.objectsize

Wed Feb 27 11:29:53 PST 2013

On 2/27/13 11:21 AM, Arnold Schwaighofer wrote:
> On Feb 27, 2013, at 12:37 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>
>> Hi, Nuno and Arnold:
>>
>>    Thank you all for the input.
>>
>>    Let me coin a term, say "clique" for this discussion to avoid unnecessary confusion.
>>    A clique is statically or dynamically allocated type-free stretch of memory.  A "clique"
>>     1) is maximal in the sense that a clique dose not have any enclosing data structure that can
>>        completely cover or, partially overlap with it.
>>       e.g1 x = malloc(100)
>>            the clique has 100 byte
>>
>>       e.g.  foo( T t;  xxxx = t.y.z; } // the corresponding clique accessed by the load is t of sizeof(T).
>>       e.g . bar(T* t) {  xxxx = t->y.z; } // we have no idea which clique the load is accessing  as we don't
>>                                                        // know if the *t has enclosing data structure or not.
>>
>>     2). a clique's address cannot be derived from another clique.
>>        This feature can be used for disambiguation.  Consider two access, one access's corresponding
>>        clique has N byte, the other one has M byte, where M > N. It is impossible that these accesses alias.
>>          ….
> This is not how LLVM disambiguates in the basicaa code fragment you quote below.
>
> The code fragment asserts that something can be assumed "no alias" if the access size of an memory access (i.e. a load or store) is bigger than the underlying object of the other accessed object.
>
> For you example below:
>
> V1Size = sizeof(mydata)
> V2Size = say 4
>
> assuming that GetUnderlyingObject for addr1 and addr2 returns mydata:
This assumption is wrong. The GetUnderlyingObject() return phi node 
instead of &mydata.

>
> getObjectSize(O1) = sizeof(mydata)
> getObjectSize(O2) = sizeof(mydata)
>
> this should be
> isObjectSmallerThan(O2, V1Size) == false
> isObjectSmallerThan(O1, V2Size) == false
>
> ??
>
>>      In llvm, the "object pointed to by ptr p" seems to take on two meanings. Suppose the corresponding
>> clique of the "object", regardless which context it is in, is C.
>>      a): the object refers to the stretch of memory staring from p to the end of the C.
>>      b): the entire clique C.
>>
>>   In the context of llvm.objectsize(). As you guys pointed out, the "object" take meaning of a).
>> If would not be useful if it take meaning  b) as  the compiler is not able to use this info to convert
>> run-time testing into a static test.
>>
>>   In the context of memory disambiguation, the "object" take meaning of b) in order to take advantage of
>> clique's 2nd feature.
>>
>>   Following is the real example I'm investigating where the "object" take on meaning b) leads to miscompilation.
>>
>> -----------------------------------------
>> struct { T1* p1, T2 *p2, int xyz} mydata;
>>
>>     addr1 = phi(&mydata, &mydata); // all operands have same value.
>>     addr2 = phi(&mydata.xyz, &mydata.xyz);
>>
>>     memcpy(addr1, ... sizeof(mydata))  // mem1
>>      = load addr2->syz                        // mem2
>> --------------------------------------------------------
>>
>> ( But for the phi node, getUnderlyingObject() would return "mydata" as "object" for both memory accesses. )
>>   getObjectSize(mem2) return 4 instead of sizeof(mydata). Now that the "object" only have 4 bytes,
>>   the "object" of another access, which has sizeof(mydata) byte, should not cover this object, and hence
>>   mistakenly give "noalias".
>>
>>
>>
>>
>>
>> On 2/27/13 7:41 AM, Arnold Schwaighofer wrote:
>>> On Feb 27, 2013, at 4:05 AM, Nuno Lopes <nunoplopes at sapo.pt> wrote:
>>>
>>>> Hi,
>>>>
>>>> Regarding the definition of object for @llvm.objectsize, it is identical to gcc's __builtin_object_size().  So it's not wrong; it's just the way it was defined to be.
>>>>
>>>> Regarding the BasicAA's usage of these functions, I'm unsure.  It seems to me that isObjectSmallerThan() also expects the same definition, but I didn't review the code carefully.
>>>> When you do a load from a certain memory address, basicAA is interested to know if that load will overflow the buffer bounds, which means that no aliasing can occur (or it's an UB operation).
>>>> Therefore I don't think basicAA cares about the size of the whole object, but just the remaining part of it (size-offset).  But again, I could be wrong here.
>>> No, it cares about the size of the whole object in the code quoted below.
>>> const Value *O1 = GetUnderlyingObject(V1, TD);
>>>
>>> But that is fine. See answers below.
>>>
>>>> Quoting Shuxin Yang <shuxin.llvm at gmail.com>:
>>>>
>>>>> Hi,
>>>>>
>>>>>   In the following instruction sequence, llvm.objectsize.i64(p) returns 6 (the entire *.ll is attached to the mail).
>>>>> Is this correct? Shouldn't the "object" refer to the entire block of memory being allocated?
>>>>>
>>> No, the ""llvm.objectsize intrinsic" is defined as the size of the object pointed to by the first argument, in your case "p+50".
>>> To implement the llvm.objectsize intrinsic we call getObjectSize on the "p+50" pointer.
>>>
>>> See http://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html for intended use.
>>>
>>>>>   (char*) p = malloc(56)
>>>>>   llvm.objectisize.i32(p+50);
>>>>>
>>>>> Thanks
>>>>> Shuxin
>>>>>
>>>>>
>>>>>
>>>>>   This question is related to PR14988 (failure in bootstrap build with LTO).
>>>>> Part of the reason is that
>>>>> the compiler interpret the pretty vague team "object" in different ways -- the "object" suggested by
>>>>> line 359 @ figure 1 is just a part of the "object" implied by Figure 2.
>>>>>
>>> Well, it depends on which object you are talking about an object pointed to by a pointer or an "underlying object". Both are valid things to talk about in varying contexts.
>>>
>>>>>   I try to fix the problem PR14988 by replacing line 359 with "Size = Offset.getZextValue()"; It does fix the problem.\
>>> That does not sound right. The size of an object pointed to by a pointer is the size of the underlying object minus any offsets.
>>>
>>>>> However, I'm wondering in what situations the Offset != 0. So I put an assertion right before line 359,
>>> The offset is not null if the "Ptr" passed does not point to an underlying object.
>>>
>>>>> saying "assert(Offset != 0)". It catches two cases in SingleSource test suite. I investigate one of them,
>>>>> the assertion is triggered when the llvm::getObjectSize() is called by instcombine (instead of alias analyzer)
>>> Yes, this observation makes sense. In this case there is an offset (you are passing "ptr+offset") and you want this behavior (llvm.objectsize is defined as such). See below.
>>>>> in an attempt to replace llvm.objectsize() with a constant. I think the way  llvm::getObjectSize() interpret "object" is wrong.
>>>>>
>>> No, it is behaving as expected depending on where you call it from. See below.
>>>>> Figure 1
>>>>> cat -n lib/Analysis/MemoryBuiltins.cpp
>>>>>   344 bool llvm::getObjectSize(const Value *Ptr, uint64_t &Size, const DataLayout *TD,
>>>>>   345                          const TargetLibraryInfo *TLI, bool RoundToAlign) {
>>>>>   346   if (!TD)
>>>>>   347     return false;
>>>>>   348
>>>>>   349   ObjectSizeOffsetVisitor Visitor(TD, TLI, Ptr->getContext(), RoundToAlign);
>>>>>   350   SizeOffsetType Data = Visitor.compute(const_cast<Value*>(Ptr));
>>>>>   351   if (!Visitor.bothKnown(Data))
>>>>>   352     return false;
>>>>>   353
>>>>>   354   APInt ObjSize = Data.first, Offset = Data.second;
>>>>>   355   // check for overflow
>>>>>   356   if (Offset.slt(0) || ObjSize.ult(Offset))
>>>>>   357     Size = 0;
>>>>>   358   else
>>>>>   359     Size = (ObjSize - Offset).getZExtValue(); ??? What the hack is "object"??????
>>>>>   360   return true;
>>>>>   361 }
>>>>>
>>> The object is the object pointed to by "Ptr"; depending on where you call the "getObjectSize" function from these might be different things: an underlying object or an object "based on" (http://llvm.org/docs/LangRef.html#pointer-aliasing-rules) an underlying object.
>>>
>>> In the "llvm.objectsize" context we pass an object "based on p" to getObjectSize: "p+50". In the basicaa context, we wanna know whether an access is beyond the bounds of an underlying object (undefined behavior land) so we pass the underlying object (which in your example would be the "p" returned from malloc) to the getObjectSize function.
>>>
>>> In the first case (passing "p+50" to getObjectSize) ObjSize should be 56 and the Offset will be 50 yielding 6 in the second case your ObjSize will be 56 and the offset is zero because basicaa passed the underlying object "p".
>>>
>>> I have to agree though the term "object" is somewhat confusing.
>>>
>>>>> Figure 2
>>>>> cat -n lib/Analysis/BasicAliasAnalysis.cpp
>>>>> 1205   // If the size of one access is larger than the entire object on the other
>>>>> 1206   // side, then we know such behavior is undefined and can assume no alias.
>>>>> 1207   if (TD)
>>>>> 1208     if ((V1Size != UnknownSize && isObjectSmallerThan(O2, V1Size, *TD, *TLI)) ||
>>>>> 1209         (V2Size != UnknownSize && isObjectSmallerThan(O1, V2Size, *TD, *TLI)))
>>>>> 1210       return NoAlias;
>>> Best,
>>> Arnold