[llvm-dev] LSR

Wed Apr 19 09:25:15 PDT 2017

On 04/18/2017 12:28 AM, Jonas Paulsson wrote:
> Hi Hal,
>>
>> No, LSR won't add new PHIs. This is a long-standing deficiency (and 
>> also, in part, prevents it from properly handling pre/post-inc 
>> addressing modes, which motivates some target-specific passes such as 
>> lib/Target/PowerPC/PPCLoopPreIncPrep.cpp).
>>
> I do find in this regression that gcc manages to use four address 
> registers, while llvm uses just one, with a lot of extra address 
> building instructions as a result, where the gcc loop looks very 
> clean. I suspect that this could be a needed feature. Would you 
> recommend doing this (splitting PHIs) as a "LSRPrep" pass, perhaps in 
> the target, or would you try to extend LSR itself?

I recall thinking that the "right" way to do this is to extend LSR 
itself. This way the cost of adding extra PHIs could be weighed against 
addressing modes, register pressure, and other factors. I have not, 
however, looked enough into the details to say exactly how this would 
work - if I had, I probably would have done it myself ;) -- cc'ing Andy 
in case he'd like to chime in.
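
To make the PHI question concrete, here is a minimal C sketch of the two 
loop shapes being compared. It is only an illustration: the function 
names and the four-array loop are hypothetical, not taken from Jonas's 
regression, and an optimizing compiler is free to turn either form into 
the other. The first form has a single induction variable, so each 
address has to be formed as base + index; the second carries four 
pointer induction variables, each of which would be its own PHI in the 
IR and its own address register in the output.

```c
#include <stddef.h>

/* Single induction variable: one PHI (i); every address is formed as
 * base + i inside the loop body. */
long sum_single_iv(const long *a, const long *b,
                   const long *c, const long *d, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] + b[i] + c[i] + d[i];
    return sum;
}

/* Split induction variables: four pointers are carried around the loop
 * (four PHIs), each advanced by one element per iteration. */
long sum_split_iv(const long *a, const long *b,
                  const long *c, const long *d, size_t n)
{
    long sum = 0;
    for (const long *end = a + n; a != end; ++a, ++b, ++c, ++d)
        sum += *a + *b + *c + *d;
    return sum;
}
```

Both functions return the same value; the second simply trades three 
extra loop-carried registers for simpler per-access addressing, which 
is the kind of trade-off a cost model inside LSR would have to weigh.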

>
>>>
>>>
>>> I also see that LSR is thinking in terms of increments between the 
>>> memory accesses. In the loop I am working with it's disappointing to 
>>> see that before each memory access, the base address is loaded into 
>>> a register, and then the offset is added, and then the access, which 
>>> is 3 instructions. It should have been just an add/sub after the 
>>> previous access before the memory access, per LSR's intentions. I 
>>> wonder where this is supposed to be handled: In some sort of target 
>>> pre-isel pass that chains the GEPs? Or is this just folded more 
>>> often on other targets?
>>
>> As I recall, it does not do this now (although this is also needed 
>> for handling pre/post-inc addressing modes properly).
>>
> Same question - It might be simpler to do a separate post-LSR GEP 
> handling (in CodeGenPrepare, perhaps?), but I suspect it would also be 
> possible to extend LSR to do this instead?

Splitting is what, in practice, we have now. Targets have "fixup/prep" 
passes to account for the fact that LSR won't add new PHIs. This seems 
simpler, but it is probably also suboptimal: it works reasonably well 
for targets with simpler addressing modes, like PPC (although there are 
still issues), but I don't see that it would work well for targets with 
complicated ones.
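
For the address-chaining question above, a rough C sketch of the 
difference follows. The stride and offsets are made-up values chosen 
for illustration, and the function names are hypothetical. The first 
function rebuilds every address from the base, which corresponds to 
the load-base / add-offset / access pattern described; the second 
keeps one running pointer and bumps it by the distance to the next 
access, which is the shape that maps onto a single add/sub (or a 
pre/post-increment addressing mode) between accesses.

```c
#include <stddef.h>

/* Hypothetical layout: three accesses per iteration, STRIDE elements
 * between iterations. */
enum { STRIDE = 8, OFF0 = 0, OFF1 = 3, OFF2 = 5 };

/* Each address is rebuilt from the base pointer. */
long sum_rebuilt(const long *base, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += *(base + i * STRIDE + OFF0);
        sum += *(base + i * STRIDE + OFF1);
        sum += *(base + i * STRIDE + OFF2);
    }
    return sum;
}

/* Chained form: one running pointer, advanced by the delta between
 * consecutive accesses. */
long sum_chained(const long *base, size_t n)
{
    long sum = 0;
    const long *p = base + OFF0;
    for (size_t i = 0; i < n; ++i) {
        sum += *p;
        p += OFF1 - OFF0;      /* step to the second access */
        sum += *p;
        p += OFF2 - OFF1;      /* step to the third access */
        sum += *p;
        p += STRIDE - OFF2;    /* step to the next iteration's first access */
    }
    return sum;
}
```

The two functions read the same elements in the same order, assuming 
the buffer holds at least n * STRIDE elements; only the way each 
address is derived differs.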

  -Hal

>
> /Jonas
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory