[LLVMdev] RFC: GEP as canonical form for pointer addressing

Philip Reames listmail at philipreames.com
Wed Feb 19 10:15:37 PST 2014


On 02/18/2014 01:02 PM, Andrew Trick wrote:
> On Feb 18, 2014, at 12:58 PM, David Chisnall
> <David.Chisnall at cl.cam.ac.uk> wrote:
>> On 18 Feb 2014, at 19:51, Philip Reames
>> <listmail at philipreames.com> wrote:
>>> On 02/17/2014 02:53 PM, Andrew Trick wrote:
>>>> On Feb 17, 2014, at 2:31 AM, David Chisnall
>>>> <David.Chisnall at cl.cam.ac.uk> wrote:
>>>>> On 15 Feb 2014, at 23:55, Andrew Trick
>>>>> <atrick at apple.com> wrote:
>>>>>> On Feb 14, 2014, at 5:18 PM, Philip Reames
>>>>>> <listmail at philipreames.com> wrote:
>>>>>>
>>>>> Not directly related, but our canonical form for loops involving 
>>>>> pointers[1] turns a loop that contains a GEP with the loop 
>>>>> induction variable into a GEP with the increment inside the loop. 
>>>>> This has two annoying properties for code generation:
>>>>>
>>>>> - The GEP with the induction variable as the offset maps cleanly 
>>>>> to CPU addressing modes and so we generate better code if we don't 
>>>>> do this canonicalisation, and therefore end up trying to undo it 
>>>>> in the back end (yuck).
>>>>>
>>>>> - If the source is the start of an object, then this behaviour is 
>>>>> GC-hostile because it means that IR that contains a pointer to an 
>>>>> object start now only contains a pointer to the middle, requiring 
>>>>> the GC to deal with inner pointers.
>>>>>
>>>>> It would be nice if we could have canonical forms such that if the 
>>>>> front end ensures that there are no inner pointers without 
>>>>> pointers to the object's start in the IR, the optimisers don't 
>>>>> break this.
>>>>>
>>>>> David
>>>>>
>>>>> [1] Are canonical forms actually documented anywhere, or are they 
>>>>> simply undocumented implicit contracts?
>>>> I would say whatever form is currently generated by IR passes is 
>>>> defined as canonical. It’s not easy to specify. At some points in 
>>>> the pipeline (early and late) it’s fine to permit multiple forms of 
>>>> the same expression as long as it’s canonical-enough for the 
>>>> downstream analysis.
>>>>
>>>> If some pass is generating a suboptimal form, it’s good to question 
>>>> whether it’s really necessary for any analysis. If not, we should 
>>>> change it. Without a test case, I can’t say what issue you’re 
>>>> running into above.
>>> David, do you happen to have a test case on hand?  I know I've seen 
>>> this before, but my attempt to write out a quick example from memory 
>>> failed.
>>
>> I don't have one to hand - it's something I see in things with fairly 
>> complex loop structures.  I'll try to find one next time I'm chasing 
>> performance issues.
>
> If you’re running LSR, all bets are off. When Philip says he wants 
> canonical IR before CodeGenPrepare, I take that to mean anything 
> scheduled in TargetPassConfig::addIRPasses, including LSR, constant 
> hoisting, etc.
>
> -Andy
This is exactly what I meant.  Thanks for the clarification.

Philip
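
For readers following the quoted discussion, the two loop shapes David
describes can be sketched at the source level in C. This is only an
illustration under stated assumptions: the function names are hypothetical,
it is not a test case from this thread, and it does not claim to reproduce
what any particular pass emits. The first function addresses each element
as base plus the induction variable (the shape that maps onto a
base-plus-index addressing mode); the second advances a pointer inside the
loop, so the value carried around the loop is a pointer into the middle of
the object.

```c
#include <stddef.h>

/* Indexed form (hypothetical example): the address of each element is
 * computed from the unchanged base pointer plus the induction variable,
 * so a pointer to the start of the object stays live in the loop. */
long sum_indexed(const long *base, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += base[i];
    return total;
}

/* Pointer-increment form (hypothetical example): the loop carries p,
 * which points into the interior of the object after the first
 * iteration. If base has no later use, only interior and one-past-end
 * pointers remain live inside the loop. */
long sum_bumped(const long *base, size_t n) {
    long total = 0;
    const long *end = base + n;
    for (const long *p = base; p != end; ++p)
        total += *p;
    return total;
}
```

Both functions compute the same result; the difference is only in which
pointer values are live across iterations, which is what matters for the
addressing-mode and inner-pointer concerns raised above.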
