[llvm-commits] [PATCH] X86: Turn cmovs into branches when profitable.

Wed Apr 25 23:04:47 PDT 2012

Hi Benjamin,

You are right. LLVM likes to canonicalize to select instructions, and that can really hurt us on some modern CPUs. I've wanted a way to undo LLVM selects for a while.

That said, I'm not sure this is the right approach. I think it's better to turn selects back into control flow at the LLVM IR level. I'm envisioning something that's done around codegen prep time. IMHO, there are a few potential benefits:

1. It will be target independent. I don't want to duplicate this kind of pattern for different targets.
2. It will be possible to use better and more sophisticated heuristics (look at how the MI-level if-converter computes profitability). It should be able to take advantage of profile info if it's available.
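
To make the select-versus-control-flow distinction concrete, here is a rough source-level sketch. The function names are made up for illustration, and the chosen comparison only approximates the ucomisd/cmovbel sequence quoted below. A compiler is free to lower either form either way; the point is only what a de-select transformation would conceptually turn into what.

```cpp
#include <cstdint>

// Select form: both candidate values are already available and the result is
// picked by the condition. This is the shape LLVM tends to canonicalize to,
// and it commonly ends up as a compare followed by a cmov on x86.
int32_t pick_select(const double *p, double x, int32_t a, int32_t b) {
    return (x <= *p) ? b : a;
}

// Control-flow form: the same computation written as an explicit branch.
// If the branch is predicted correctly, an out-of-order CPU can keep going
// without waiting for the load and compare to finish.
int32_t pick_branch(const double *p, double x, int32_t a, int32_t b) {
    int32_t result = a;
    if (x <= *p) {
        result = b;
    }
    return result;
}
```

Both functions compute the same value; only the shape handed to the backend differs.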

I also think the isel approach feels wrong. Isel really shouldn't use these complex predicates to drive isel decisions. It hurts compile time and it just generally goes against the design. Expanding pseudo instructions into control flow is also kinda yucky. We should only use it when there isn't a better design.

Can I interest you in writing an LLVM IR de-select pass? :) We'd be more than happy to help you with performance benchmarking and analysis.

Evan

On Apr 25, 2012, at 1:45 PM, Benjamin Kramer <benny.kra at googlemail.com> wrote:

> This came up when a change in block placement formed a cmov and slowed down a
> hot loop by 50%:
> 
>   ucomisd    (%rdi), %xmm0
>   cmovbel    %edx, %esi
> 
> cmov is a really bad choice in this context because it doesn't get branch
> prediction. If we emit it as a branch, an out-of-order CPU can do a better job
> (if the branch is predicted right) and avoid waiting for the slow load+compare
> instruction to finish. Of course it won't help if the branch is unpredictable,
> but those are really rare in practice.
> 
> As a heuristic the attached patch turns all cmovs that have one use and a direct memory
> operand into branches. cmovs usually save some code size, so disable the transform
> in -Os mode and on Atom, which is an in-order microarchitecture and unlikely to benefit
> from this.
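
For illustration only, the heuristic described in the paragraph above could be sketched as a simple predicate. The struct and function names here are hypothetical and are not taken from the attached patch or from any LLVM API; they just restate the stated conditions (one use, a direct memory operand, not -Os, not an in-order core such as Atom).

```cpp
// Hypothetical, simplified description of a cmov candidate (not an LLVM type).
struct CmovCandidate {
    unsigned num_uses;               // number of uses of the cmov result
    bool has_direct_memory_operand;  // a direct memory operand is involved
};

// Hypothetical summary of the compilation context.
struct CodegenContext {
    bool optimize_for_size;  // true in -Os mode
    bool in_order_core;      // true for in-order targets such as Atom
};

// Returns true when the heuristic from the text would turn the cmov into a branch.
bool shouldTurnCmovIntoBranch(const CmovCandidate &c, const CodegenContext &ctx) {
    // cmovs usually save code size, and in-order cores are unlikely to benefit.
    if (ctx.optimize_for_size || ctx.in_order_core) {
        return false;
    }
    return c.num_uses == 1 && c.has_direct_memory_operand;
}
```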
> 
> Test suite shows a 7% improvement on richards_benchmark on x86_64/westmere. I'm
> not aware of any significant execution time regressions. The results depend a
> lot on the microarchitecture used, so YMMV. I'm already starting to pull my hair out
> because the numbers fluctuate a lot between different machines.
> 
> There is probably still more to get out of this, as LLVM really likes setcc and
> cmov, sometimes forming long chains of them which are deadly on modern CPUs.
> 
> Thanks to Chandler Carruth for the initial test case and providing me with
> test-suite numbers that were slightly more stable than mine :)
> 
> <cmov-into-branch.patch>
> 
> 
> - Ben
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits