[llvm] r207301 - [X86] Implement TargetLowering::getScalingFactorCost hook.

Quentin Colombet qcolombet at apple.com
Fri Apr 25 20:15:16 PDT 2014



Sent from my iPhone

> On Apr 25, 2014, at 19:53, Chandler Carruth <chandlerc at google.com> wrote:
> 
> 
>> On Fri, Apr 25, 2014 at 7:11 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>> What "allocation" means here (on second thought, that is not super clear) is the number of ports an instruction uses.
> 
> This is a fine explanation. =] It would be good to put it into the x86 implementation of getScalingFactorCost, maybe with examples?
Sure, I'll update that on Monday.
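
To make the intent concrete, here is a minimal standalone sketch of the idea. It is not the code from the commit: AddrModeSketch, isLegalScaleSketch and scalingFactorCostSketch are hypothetical names, and the legality check is deliberately simplified. The only thing it takes from the hook itself is the convention that a negative value means "unsupported" and a non-negative value is the cost.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for an addressing mode of the form
//   BaseReg + Scale * IndexReg
// Scale == 0 means there is no index register, i.e. the plain 'reg' form.
struct AddrModeSketch {
  bool HasBaseReg;
  std::int64_t Scale;
};

// Simplifying assumption for this sketch: only scales 0, 1, 2, 4 and 8
// are accepted. A real target has more rules than this.
inline bool isLegalScaleSketch(std::int64_t Scale) {
  return Scale == 0 || Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8;
}

// Returns a negative value when the mode is not supported. Otherwise
// returns the number of extra allocations the scaled form needs:
//   'reg'                 -> 0
//   'reg1 + reg2 * scale' -> 1 (even when scale == 1)
inline int scalingFactorCostSketch(const AddrModeSketch &AM) {
  if (!isLegalScaleSketch(AM.Scale))
    return -1;
  return AM.Scale != 0 ? 1 : 0;
}
```

With this sketch, scalingFactorCostSketch(AddrModeSketch{true, 0}) is 0, while any legal mode that uses a second register, including scale = 1, costs 1.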
>  
>> 
>> Regarding the benchmarks, the numbers are unchanged on the llvm test suite + specs.
>> 
>> In fact, this commit is just the first step toward performance improvements. This hook is not yet sufficient to make LSR prefer addressing modes of the form 'reg' over those of the form 'reg1 + reg2 * scale'.
>> Currently we still prefer 'reg1 + reg2 * scale' to 'reg' in many cases, in particular with scale = 1, which is wrong performance-wise. I am working on fixing this.
> 
> Makes sense.
>  
>> 
>> With my current prototype, I see up to a 30% speedup on small kernels.
> 
> 
> Cool.
> 
> My only concern is the cases where not doing the scale requires more instructions.
Theoretically, this should not happen because the scaling factor has very little weight in the rating of the formulae. I.e., it discriminates after almost everything else, so the unscaled version should require as many instructions as the scaled version.
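
As a rough illustration of "discriminates after almost everything else", here is a small sketch of a lexicographic comparison in which the scale cost is only a tie-breaker. This is not the actual LSR cost model: FormulaCostSketch, its three fields and isCheaper are hypothetical, and the real rating has more components.

```cpp
#include <tuple>

// Hypothetical, reduced cost record for one candidate formula.
struct FormulaCostSketch {
  unsigned NumRegs;   // registers the formula needs
  unsigned NumInsts;  // instructions the formula needs
  unsigned ScaleCost; // extra cost of using a scaled index register
};

// Lexicographic comparison: ScaleCost is looked at only when every
// earlier field is equal, so it can never justify adding instructions.
inline bool isCheaper(const FormulaCostSketch &A, const FormulaCostSketch &B) {
  return std::tie(A.NumRegs, A.NumInsts, A.ScaleCost) <
         std::tie(B.NumRegs, B.NumInsts, B.ScaleCost);
}
```

Under this ordering, a scaled formula with fewer instructions still beats an unscaled one with more instructions; the unscaled form wins only when everything else is equal.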

> In particular, I have seen a lot of performance problems in the past[1] which stemmed essentially from using lea to do address computations so that the addressing mode operand was simpler. Just want to make sure the LSR and other users of this will be sufficiently conservative.
I completely share your point of view!
In fact, I saw that happen with my prototype, but not with the current patch :).
> 
> [1]: So, fun story here. The fact that LLVM so aggressively forms complex addressing modes may explain why this used to be such a big problem for me. It would use every single part of the addressing mode in structuring the loop body, and then during instruction selection we would fail in a large number of cases to match that as an actual addressing mode. I spent a bunch of time teaching the instruction selection layer for x86 to reconstitute every addressing mode it could in order to fix this. This may even become relevant, because while I *tried* to only do these heroics for cases that were strictly better (i.e., fewer instructions total), I could have messed it up, and we might re-form complex addressing modes even when unnecessary.
> 
Good to know, thanks!