[llvm-dev] [RFC] Introducing the maxobjsize attribute

Fri Oct 16 09:33:29 PDT 2020

On 10/15/20 9:57 PM, Johannes Doerfert wrote:
>
> On 10/15/20 10:50 PM, Philip Reames wrote:
>>
>> On 10/15/20 6:30 PM, Johannes Doerfert wrote:
>>>
>>> On 10/15/20 6:00 PM, Philip Reames via llvm-dev wrote:
>>>>
>>>> On 10/13/20 9:35 AM, Atmn Patel via llvm-dev wrote:
>>>>> Hi All,
>>>>>
>>>>> We've prepared a new attribute `maxobjsize(<n>)` that tracks the 
>>>>> maximum size of the object that a pointer points to. This 
>>>>> attribute will be deduced through the Attributor framework and it 
>>>>> is used for aliasing queries. The `maxobjsize` of an object, and 
>>>>> number of `dereferenceable` bytes can be used as upper and lower 
>>>>> bounds on the object size, and if there is no overlap, we can 
>>>>> determine that the underlying objects cannot alias. Basically, an 
>>>>> object that is at most N bytes long is not aliasing one that is at 
>>>>> least N+1 bytes long.
>>>>
>>>> This is commingling two separate concerns.  At minimum, a wording 
>>>> clarification is needed, it's possible the proposed use case does 
>>>> not work.
>>>>
>>>> Deferenceability is the amount of space which can be accessed 
>>>> without a runtime fault.  Of key importance is the 
>>>> dereferenceability is disconnected from object size.  There may be 
>>>> space beyond an object which is dereferenceable, but outside the 
>>>> object.
>>>>
>>>> As a simple example, imagine an allocator which allocates 32 byte 
>>>> blocks of memory, 32 byte aligned.  If the actual object allocated 
>>>> is only 16 bytes, the pointer is still known to be 32 byte aligned 
>>>> and deref for 32 bytes.  The contents past the object are simply 
>>>> unspecified.
>>>>
>>>> Saying that a 32 byte derefenceable pointer doesn't alias one with 
>>>> a maximum object size of 16 bytes would be wrong and lead to 
>>>> miscompiles in practice.
>>>
>>> We already perform exactly this deduction in BasicAA right now, 
>>> except that max object size is not made explicit.
>>> You just assumed the allocated object is 16 bytes and therefore it 
>>> will imply `maxobjsize(16)` while the underlying memory region is 
>>> `dereferenceable(32)`.
>>> You cannot have both in our object-driven model.
>>
>> Then, as I stated, at minimum, you have a wording problem.  In my 
>> example, I can reasonable state the object size is 16 bytes. If it's 
>> not legal to also state "maxobjectsize(16)" you need to either a) 
>> pick another attribute name, or b) be very very pedantic about 
>> defining the terminology.  I'll note that neither the original email 
>> or your response does the later.
>>
>> I'll also note that the existing LangRef wording for 
>> dereferenceability says nothing about object sizes.  You seem to be 
>> implying the opposite, but I don't follow your claim as it doesn't 
>> seem to match the actual wording.
>
> The problem you describe, at least the way I understand it, is not 
> with maxobjsize but with the current (and here proposed) used of 
> dereferenceable.
>
> We do (for a while now) use dereferenceable as a lower bound of an 
> object without specifying so in the lang ref. (I don't oppose that we 
> do though).

If the implementation of the optimizer disagrees with the specification 
(LangRef), we should fix one or the other.  In particular, we should do 
that *before* extending the problematic implementation.  In this 
particular case, I believe the specification is correct and we may need 
to revisit problematic optimizations.  (Well, given current knowledge; I 
may change my mind with more info.)

Can you give a pointer to the specific bit of optimizer reasoning you're 
concerned by?  I think it would help to discuss this concretely instead 
of abstractly.

> At the same time we would (conceptually) extend dereferenceable for a 
> pointer if we know a second object is placed right behind the first.
> I don't think that can ever happen if the addresses are not given by 
> the user though.
It can.  Consider the vectorizor deciding to version a loop for 
alignment purposes.  I'm not 100% sure we actually do this today 
(haven't checked), but it's definitely a common enough transform to be 
considered.  Now, this is a slightly different case as we'd have a 
contextual fact about the larger deref, but not a base fact on the 
object, but I'd really not want to split that hair.  Combining that hair 
splitting with (say) the returned attribute on some call down the 
dominated path would get error prone real fast.
>
> Going back to the allocator example, the allocator should not say it 
> allocated 32 bytes or it should say the object it allocated is 32 bytes.
> Having only one of the two breaks our model. I'm unsure what to change 
> for this though because I think maxobjsize in this example is 32 if the
> allocator returns 32 usable bytes, or the dereferenceable is and will 
> be 16 if the bytes are not usable.
I disagree with this.  The size of the object is the size of the 
object.  This may for instance be a call to malloc with a runtime 
commutable size.  It's entirely reasonable and expected to both support 
"minimum dereferenceable span is 32 bytes", and "can allocate objects of 
arbitrary size".
>
> Maybe it's just late and I miss your point :(
>
> ~ Johannes
>
>
>>
>>>
>>> ~ Johannes
>>>
>>>
>>>>>
>>>>> These changes are in:
>>>>> - D87975 - [IR] Introduce MaxObjSize Attribute
>>>>> - D87978 - [Attributor] Adds deduction for the MaxObjSize Attribute
>>>>> - D88353 - [BasicAA] Integrate MaxobjSize for NoAlias
>>>>>
>>>>> These are the Statistics changes for CTMark *without* the actual 
>>>>> deduction (https://reviews.llvm.org/D88353#2301597):
>>>>> CHANGED: branch-folder                NumHoist  438 ->      431 ( 
>>>>>  -1.598%)
>>>>> CHANGED: codegenprepare               NumBlocksElim 16093 ->     
>>>>>  15885 (  -1.292%)
>>>>> CHANGED: codegenprepare               NumExtsMoved 6373 ->       
>>>>> 6439 (  +1.036%)
>>>>> CHANGED: gvn  IsValueFullyAvailableInBlockNumSpeculationsMax 6746 
>>>>> ->       6858 (    +1.660%)
>>>>> CHANGED: gvn                          NumGVNInstr 78434 ->     
>>>>>  79330 (  +1.142%)
>>>>> CHANGED: instcombine                  NumReassoc  22830 ->     
>>>>>  23213 (  +1.678%)
>>>>> CHANGED: instsimplify                 NumSimplified 21278 ->     
>>>>>  21495 (  +1.020%)
>>>>> CHANGED: licm                         NumPromoted 407 ->       
>>>>>  497 ( +22.113%)
>>>>> CHANGED: loop-rotate  NumNotRotatedDueToHeaderSize 37 ->         
>>>>> 35 (    -5.405%)
>>>>> CHANGED: loop-simplify                NumNested   126 ->       
>>>>>  128 (  +1.587%)
>>>>> CHANGED: machinelicm                  NumPostRAHoisted    131 ->   
>>>>>      134 (  +2.290%)
>>>>> CHANGED: memory-builtins              ObjectVisitorLoad           
>>>>> 96077 ->      97496 (  +1.477%)
>>>>> CHANGED: regalloc                     NumDCEFoldedLoads           
>>>>>    38 ->         37 (  -2.632%)
>>>>> CHANGED: regalloc                     NumLaneConflicts   4408 ->   
>>>>>     4332 (  -1.724%)
>>>>> CHANGED: regalloc                     NumReloadsRemoved           
>>>>>  1062 ->       1050 (  -1.130%)
>>>>> CHANGED: regalloc                     NumSnippets  1168 ->       
>>>>> 1152 (  -1.370%)
>>>>> CHANGED: regalloc                     NumSpillsRemoved    672 ->   
>>>>>      665 (  -1.042%)
>>>>> CHANGED: stack-slot-coloring          NumDead  14 ->   18 ( +28.571%)
>>>>> CHANGED: twoaddressinstruction        NumConvertedTo3Addr         
>>>>>           27054 ->      26695 (  -1.327%)
>>>>>
>>>>> These are the Statistic Changes in CTMark w/O3 before/after these 
>>>>> patches (https://reviews.llvm.org/D87978#2307622):
>>>>> CHANGED: codegenprepare               NumExtsMoved 3631 ->       
>>>>> 3699 (  +1.873%)
>>>>> CHANGED: dse                          NumFastOther  192 ->       
>>>>>  194 (  +1.042%)
>>>>> CHANGED: gvn  IsValueFullyAvailableInBlockNumSpeculationsMax 4958 
>>>>> ->       5060 (    +2.057%)
>>>>> CHANGED: gvn                          NumGVNInstr 46657 ->     
>>>>>  47534 (  +1.880%)
>>>>> CHANGED: jump-threading               NumDupes 91 ->   92 (  +1.099%)
>>>>> CHANGED: licm                         NumMovedLoads  6272 ->       
>>>>> 6344 (  +1.148%)
>>>>> CHANGED: licm                         NumPromoted 381 ->       
>>>>>  438 ( +14.961%)
>>>>> CHANGED: loop-rotate  NumNotRotatedDueToHeaderSize 31 ->         
>>>>> 29 (    -6.452%)
>>>>> CHANGED: machinelicm                  NumPostRAHoisted     88 ->   
>>>>>       89 (  +1.136%)
>>>>> CHANGED: memdep                       NumCacheNonLocalPtr         
>>>>>         1005887 ->    1016671 (  +1.072%)
>>>>> CHANGED: memory-builtins              ObjectVisitorLoad           
>>>>> 62048 ->      63473 (  +2.297%)
>>>>> CHANGED: peephole-opt                 NumCmps 532 ->  526 (  -1.128%)
>>>>> CHANGED: regalloc                     NumDCEFoldedLoads           
>>>>>    27 ->         26 (  -3.704%)
>>>>> CHANGED: regalloc                     NumLocalSplits 1891 ->       
>>>>> 1870 (  -1.111%)
>>>>>
>>>>> Feedback Welcome.
>>>>>
>>>>> Atmn and Johannes
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev