[llvm-dev] [RFC] Introducing the maxobjsize attribute

Mon Oct 19 17:41:43 PDT 2020

On 10/16/2020 9:53 AM, Johannes Doerfert wrote:
>
> On 10/16/20 11:33 AM, Philip Reames wrote:
>>
>> On 10/15/20 9:57 PM, Johannes Doerfert wrote:
>>>
>>> On 10/15/20 10:50 PM, Philip Reames wrote:
>>>>
>>>> On 10/15/20 6:30 PM, Johannes Doerfert wrote:
>>>>>
>>>>> On 10/15/20 6:00 PM, Philip Reames via llvm-dev wrote:
>>>>>>
>>>>>> On 10/13/20 9:35 AM, Atmn Patel via llvm-dev wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> We've prepared a new attribute `maxobjsize(<n>)` that tracks the 
>>>>>>> maximum size of the object that a pointer points to. This 
>>>>>>> attribute will be deduced through the Attributor framework and 
>>>>>>> it is used for aliasing queries. The `maxobjsize` of an object, 
>>>>>>> and number of `dereferenceable` bytes can be used as upper and 
>>>>>>> lower bounds on the object size, and if there is no overlap, we 
>>>>>>> can determine that the underlying objects cannot alias. 
>>>>>>> Basically, an object that is at most N bytes long is not 
>>>>>>> aliasing one that is at least N+1 bytes long.
>>>>>>
>>>>>> This is commingling two separate concerns.  At minimum, a wording 
>>>>>> clarification is needed, it's possible the proposed use case does 
>>>>>> not work.
>>>>>>
>>>>>> Deferenceability is the amount of space which can be accessed 
>>>>>> without a runtime fault.  Of key importance is the 
>>>>>> dereferenceability is disconnected from object size. There may be 
>>>>>> space beyond an object which is dereferenceable, but outside the 
>>>>>> object.
>>>>>>
>>>>>> As a simple example, imagine an allocator which allocates 32 byte 
>>>>>> blocks of memory, 32 byte aligned.  If the actual object 
>>>>>> allocated is only 16 bytes, the pointer is still known to be 32 
>>>>>> byte aligned and deref for 32 bytes.  The contents past the 
>>>>>> object are simply unspecified.
>>>>>>
>>>>>> Saying that a 32 byte derefenceable pointer doesn't alias one 
>>>>>> with a maximum object size of 16 bytes would be wrong and lead to 
>>>>>> miscompiles in practice.
>>>>>
>>>>> We already perform exactly this deduction in BasicAA right now, 
>>>>> except that max object size is not made explicit.
>>>>> You just assumed the allocated object is 16 bytes and therefore it 
>>>>> will imply `maxobjsize(16)` while the underlying memory region is 
>>>>> `dereferenceable(32)`.
>>>>> You cannot have both in our object-driven model.
>>>>
>>>> Then, as I stated, at minimum, you have a wording problem. In my 
>>>> example, I can reasonable state the object size is 16 bytes. If 
>>>> it's not legal to also state "maxobjectsize(16)" you need to either 
>>>> a) pick another attribute name, or b) be very very pedantic about 
>>>> defining the terminology.  I'll note that neither the original 
>>>> email or your response does the later.
>>>>
>>>> I'll also note that the existing LangRef wording for 
>>>> dereferenceability says nothing about object sizes.  You seem to be 
>>>> implying the opposite, but I don't follow your claim as it doesn't 
>>>> seem to match the actual wording.
>>>
>>> The problem you describe, at least the way I understand it, is not 
>>> with maxobjsize but with the current (and here proposed) used of 
>>> dereferenceable.
>>>
>>> We do (for a while now) use dereferenceable as a lower bound of an 
>>> object without specifying so in the lang ref. (I don't oppose that 
>>> we do though).
>>
>> If the implementation of the optimizer disagrees with the 
>> specification (LangRef), we should fix one or the other.  In 
>> particular, we should do that *before* extending the problematic 
>> implementation.  In this particular case, I believe the specification 
>> is correct and we may need to revisit problematic optimizations.  
>> (Well, given current knowledge; I may change my mind with more info.)
>>
>> Can you give a pointer to the specific bit of optimizer reasoning 
>> you're concerned by?  I think it would help to discuss this 
>> concretely instead of abstractly.
>>
>>> At the same time we would (conceptually) extend dereferenceable for 
>>> a pointer if we know a second object is placed right behind the first.
>>> I don't think that can ever happen if the addresses are not given by 
>>> the user though.
>> It can.  Consider the vectorizor deciding to version a loop for 
>> alignment purposes.  I'm not 100% sure we actually do this today 
>> (haven't checked), but it's definitely a common enough transform to 
>> be considered.  Now, this is a slightly different case as we'd have a 
>> contextual fact about the larger deref, but not a base fact on the 
>> object, but I'd really not want to split that hair. Combining that 
>> hair splitting with (say) the returned attribute on some call down 
>> the dominated path would get error prone real fast.
>>>
>>> Going back to the allocator example, the allocator should not say it 
>>> allocated 32 bytes or it should say the object it allocated is 32 
>>> bytes.
>>> Having only one of the two breaks our model. I'm unsure what to 
>>> change for this though because I think maxobjsize in this example is 
>>> 32 if the
>>> allocator returns 32 usable bytes, or the dereferenceable is and 
>>> will be 16 if the bytes are not usable.
>> I disagree with this.  The size of the object is the size of the 
>> object.  This may for instance be a call to malloc with a runtime 
>> commutable size.  It's entirely reasonable and expected to both 
>> support "minimum dereferenceable span is 32 bytes", and "can allocate 
>> objects of arbitrary size".
>
> I honestly do not understand your points. I don't say allocators 
> cannot allocate objects of arbitrary size. I say that if you allocate 
> an object of size X it has X dereferenceable bytes and maxobjsize is 
> X. It is wrong to mark you allocator (manually since we don't do it 
> for you) the way you described it earlier, e.g., that the "object" is 
> smaller than the "allocated memory that is visible/usable". That 
> doesn't mean you cannot allocate more memory, but it cannot be exposed 
> outside the allocator. (FWIW, similarly, `noalias` for allocator 
> return values only works because we cannot look into them.)
>
> I also did not follow the vectorizer comment. The vectorizer, nor any 
> other pass, can assume relative ordering between objects. That would 
> break everything. It can extend or merge objects/allocations but that 
> is going to impact both dereferenceable as well as maxobjsize.
>
> Can you provide a small example where you think things go wrong, with 
> or without this attribute? I know you asked me for an example but as I 
> said, the issue I described should not occur in practice as we don't 
> allow users to specify the addresses of objects in IR (only later).

Johannes,

I'm going to have to reverse this on you.  I have expressed concern with 
the design.  Please treat those concerns as blocking review comments.  
(i.e. please do not land until reconciled)  I have tried to explain in 
writing, and that doesn't seem to be working.  Let's talk offline.

Philip

>
> ~ Johannes
>
>
>>> Maybe it's just late and I miss your point :(
>>>
>>> ~ Johannes
>>>
>>>
>>>>
>>>>>
>>>>> ~ Johannes
>>>>>
>>>>>
>>>>>>>
>>>>>>> These changes are in:
>>>>>>> - D87975 - [IR] Introduce MaxObjSize Attribute
>>>>>>> - D87978 - [Attributor] Adds deduction for the MaxObjSize Attribute
>>>>>>> - D88353 - [BasicAA] Integrate MaxobjSize for NoAlias
>>>>>>>
>>>>>>> These are the Statistics changes for CTMark *without* the actual 
>>>>>>> deduction (https://reviews.llvm.org/D88353#2301597):
>>>>>>> CHANGED: branch-folder                NumHoist  438 ->      431 
>>>>>>> (  -1.598%)
>>>>>>> CHANGED: codegenprepare               NumBlocksElim 16093 ->     
>>>>>>>  15885 (  -1.292%)
>>>>>>> CHANGED: codegenprepare               NumExtsMoved 6373 ->       
>>>>>>> 6439 (  +1.036%)
>>>>>>> CHANGED: gvn  IsValueFullyAvailableInBlockNumSpeculationsMax 
>>>>>>> 6746 ->       6858 (    +1.660%)
>>>>>>> CHANGED: gvn                          NumGVNInstr 78434 ->     
>>>>>>>  79330 (  +1.142%)
>>>>>>> CHANGED: instcombine                  NumReassoc  22830 ->     
>>>>>>>  23213 (  +1.678%)
>>>>>>> CHANGED: instsimplify                 NumSimplified 21278 ->     
>>>>>>>  21495 (  +1.020%)
>>>>>>> CHANGED: licm                         NumPromoted 407 ->       
>>>>>>>  497 ( +22.113%)
>>>>>>> CHANGED: loop-rotate  NumNotRotatedDueToHeaderSize 37 ->         
>>>>>>> 35 (    -5.405%)
>>>>>>> CHANGED: loop-simplify                NumNested   126 ->       
>>>>>>>  128 (  +1.587%)
>>>>>>> CHANGED: machinelicm                  NumPostRAHoisted  131 ->   
>>>>>>>      134 (  +2.290%)
>>>>>>> CHANGED: memory-builtins  ObjectVisitorLoad           96077 ->   
>>>>>>>    97496 (  +1.477%)
>>>>>>> CHANGED: regalloc NumDCEFoldedLoads              38 ->         
>>>>>>> 37 (  -2.632%)
>>>>>>> CHANGED: regalloc                     NumLaneConflicts 4408 ->   
>>>>>>>     4332 (  -1.724%)
>>>>>>> CHANGED: regalloc NumReloadsRemoved            1062 ->       
>>>>>>> 1050 (  -1.130%)
>>>>>>> CHANGED: regalloc                     NumSnippets  1168 ->       
>>>>>>> 1152 (  -1.370%)
>>>>>>> CHANGED: regalloc                     NumSpillsRemoved  672 ->   
>>>>>>>      665 (  -1.042%)
>>>>>>> CHANGED: stack-slot-coloring          NumDead  14 ->   18 ( 
>>>>>>> +28.571%)
>>>>>>> CHANGED: twoaddressinstruction  NumConvertedTo3Addr             
>>>>>>>     27054 ->  26695 (  -1.327%)
>>>>>>>
>>>>>>> These are the Statistic Changes in CTMark w/O3 before/after 
>>>>>>> these patches (https://reviews.llvm.org/D87978#2307622):
>>>>>>> CHANGED: codegenprepare               NumExtsMoved 3631 ->       
>>>>>>> 3699 (  +1.873%)
>>>>>>> CHANGED: dse                          NumFastOther  192 ->       
>>>>>>>  194 (  +1.042%)
>>>>>>> CHANGED: gvn  IsValueFullyAvailableInBlockNumSpeculationsMax 
>>>>>>> 4958 ->       5060 (    +2.057%)
>>>>>>> CHANGED: gvn                          NumGVNInstr 46657 ->     
>>>>>>>  47534 (  +1.880%)
>>>>>>> CHANGED: jump-threading               NumDupes 91 ->   92 ( 
>>>>>>>  +1.099%)
>>>>>>> CHANGED: licm                         NumMovedLoads  6272 ->     
>>>>>>>   6344 (  +1.148%)
>>>>>>> CHANGED: licm                         NumPromoted 381 ->       
>>>>>>>  438 ( +14.961%)
>>>>>>> CHANGED: loop-rotate  NumNotRotatedDueToHeaderSize 31 ->         
>>>>>>> 29 (    -6.452%)
>>>>>>> CHANGED: machinelicm                  NumPostRAHoisted   88 ->   
>>>>>>>       89 (  +1.136%)
>>>>>>> CHANGED: memdep NumCacheNonLocalPtr 1005887 ->  1016671 (  +1.072%)
>>>>>>> CHANGED: memory-builtins  ObjectVisitorLoad           62048 ->   
>>>>>>>    63473 (  +2.297%)
>>>>>>> CHANGED: peephole-opt                 NumCmps 532 ->  526 ( 
>>>>>>>  -1.128%)
>>>>>>> CHANGED: regalloc NumDCEFoldedLoads              27 ->         
>>>>>>> 26 (  -3.704%)
>>>>>>> CHANGED: regalloc                     NumLocalSplits 1891 ->     
>>>>>>>   1870 (  -1.111%)
>>>>>>>
>>>>>>> Feedback Welcome.
>>>>>>>
>>>>>>> Atmn and Johannes
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> llvm-dev at lists.llvm.org
>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev