[llvm-commits] [PATCH] X86: Turn cmovs into branches when profitable.

Thu Apr 26 12:30:16 PDT 2012

On 26.04.2012, at 08:04, Evan Cheng wrote:

> Hi Benjamin,
> 
> You are right. LLVM likes to canonicalize to select instructions and that can really hurt us on some modern CPUs. I've wanted a way to undo LLVM selects for a while. 
> 
> That said, I'm not sure this is the right approach. I think it's better to turn selects back into control flow at the LLVM IR level. I envision something that's done around codegen prep time. IMHO, there are a few potential benefits. 
> 
> 1. It will be target independent. I don't want to duplicate this kind of pattern for different targets. 
> 2. It will be possible to use better and more sophisticated heuristics (look at how the MI level if-converter computes the profitability). It should be able to take advantage of profile info if it's available. 
> 
> I also think the isel approach feels wrong. Isel really shouldn't use these complex predicates to drive isel decisions. It hurts compile time and it just generally goes against the design. Expanding pseudo instructions into control flow is also kinda yucky. We should only use it when there isn't a better design. 
> 
> Can I interest you in writing an LLVM IR de-select pass? :) We'd be more than happy to help you with performance benchmarking and analysis. 

My first idea was to do it during selection DAG formation, but that was even uglier. Doing it at the IR level makes sense, though.
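
For readers following along, here is a minimal source-level sketch of the two forms under discussion. It is not taken from either patch; the function names are made up for illustration, and the compiler is of course free to lower either function either way. The shape (a compare against a loaded value feeding a two-way choice) is the same as in the ucomisd/cmov pair quoted further down.

```cpp
#include <cassert>

// Hypothetical example functions; the names are illustrative only.

// Select form: both candidate values are already available and one is picked.
// At the IR level this is typically a select, which x86 can lower to a cmov.
int pickSelect(const double *p, double x, int a, int b) {
    return (x <= *p) ? a : b;
}

// Branch form: the same choice written as explicit control flow. With a
// correctly predicted branch, later work need not wait for the load+compare.
int pickBranch(const double *p, double x, int a, int b) {
    if (x <= *p)
        return a;
    return b;
}

// Both forms must agree for every input; only the lowering differs.
void checkEquivalence() {
    const double limit = 2.0;
    const double inputs[] = {1.0, 2.0, 3.0};
    for (double x : inputs)
        assert(pickSelect(&limit, x, 10, 20) == pickBranch(&limit, x, 10, 20));
}
```

A de-select pass would in effect rewrite the first form into the second when a heuristic says the branch is likely to be cheaper.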

Attached is a basic pass that is run by codegen the same way CodeGenPrepare runs. It uses the (dumb) cmp-with-load heuristic I came up with for the x86 backend patch. I didn't run the test-suite this time (no non-noisy builder at hand) but the improvement on richards_benchmark is still measurable. Testing the patch is easy: just pass -enable-select2branch to llc.
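
As a rough illustration of what a cmp-with-load heuristic can look like, here is a sketch over a toy IR. Everything in it (the Op enum, the Inst struct, shouldConvertToBranch) is hypothetical and is not LLVM's API or the code in the attached diff; the exact conditions the patch checks may differ. The assumption made here is: convert a select when its condition is a single-use compare that reads directly from a single-use load.

```cpp
#include <cassert>
#include <vector>

// Toy IR for illustration only; these are NOT LLVM classes.
enum class Op { Load, Cmp, Select, Other };

struct Inst {
    Op op;
    std::vector<int> operands; // indices of the defining instructions
    int numUses;               // how many instructions use this value
};

// True if instruction `idx` is a load whose only user is the caller's compare.
static bool isSingleUseLoad(const std::vector<Inst> &f, int idx) {
    return f[idx].op == Op::Load && f[idx].numUses == 1;
}

// Assumed heuristic: operand 0 of a select is its condition. Convert when the
// condition is a single-use compare with at least one single-use load operand.
// Operand indices are assumed to be valid positions in `f`.
bool shouldConvertToBranch(const std::vector<Inst> &f, int sel) {
    const Inst &s = f[sel];
    if (s.op != Op::Select || s.operands.empty())
        return false;
    const Inst &c = f[s.operands[0]];
    if (c.op != Op::Cmp || c.numUses != 1)
        return false;
    for (int o : c.operands)
        if (isSingleUseLoad(f, o))
            return true;
    return false;
}

// Small self-check: a select fed by cmp(value, load) qualifies; the same
// select no longer qualifies once the load is replaced by something else.
void selfCheck() {
    std::vector<Inst> f = {
        {Op::Load, {}, 1},          // 0: load
        {Op::Other, {}, 3},         // 1: some other value
        {Op::Cmp, {1, 0}, 1},       // 2: cmp %1, %0
        {Op::Select, {2, 1, 1}, 1}, // 3: select %2, %1, %1
    };
    assert(shouldConvertToBranch(f, 3));
    f[0].op = Op::Other;
    assert(!shouldConvertToBranch(f, 3));
}
```

The single-use checks keep the sketch conservative: if the load or compare has other users, the slow load+compare has to be waited on anyway, so the assumed benefit of the branch is less clear.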

Some questions remain open. Should it be merged into CodeGenPrepare? We need some kind of target hook to avoid doing the optimization on CPUs that are unlikely to benefit, like Atom and pre-A9 ARM cores. Of course benchmarking is key; it's just hard to dig up the hardware ;)

Coming up with good heuristics is hard. I'd love to use BranchProbabilityInfo here, but it doesn't understand selects. We also need some clever way to break up long select chains.

- Ben

-------------- next part --------------
A non-text attachment was scrubbed...
Name: select2branch-1.diff
Type: application/octet-stream
Size: 9126 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120426/d7a5f8ff/attachment.obj>
-------------- next part --------------

> 
> Evan
> 
> On Apr 25, 2012, at 1:45 PM, Benjamin Kramer <benny.kra at googlemail.com> wrote:
> 
>> This came up when a change in block placement formed a cmov and slowed down a
>> hot loop by 50%:
>> 
>>  ucomisd    (%rdi), %xmm0
>>  cmovbel    %edx, %esi
>> 
>> cmov is a really bad choice in this context because it doesn't get branch
>> prediction. If we emit it as a branch, an out-of-order CPU can do a better job
>> (if the branch is predicted right) and avoid waiting for the slow load+compare
>> instruction to finish. Of course it won't help if the branch is unpredictable,
>> but those are really rare in practice.
>> 
>> As a heuristic, the attached patch turns all cmovs that have one use and a direct memory
>> operand into branches. cmovs usually save some code size, so the transform is disabled
>> in -Os mode and on Atom, which is an in-order microarchitecture and unlikely to benefit
>> from this.
>> 
>> Test suite shows a 7% improvement on richards_benchmark on x86_64/westmere. I'm
>> not aware of any significant execution time regressions. The results depend a
>> lot on the microarchitecture used, so YMMV. I'm already starting to pull my hair out
>> because numbers fluctuate a lot between different machines.
>> 
>> There is probably still more to get out of this, as LLVM really likes setcc and
>> cmov, sometimes forming long chains of them which are deadly on modern CPUs.
>> 
>> Thanks to Chandler Carruth for the initial test case and providing me with
>> test-suite numbers that were slightly more stable than mine :)
>> 
>> <cmov-into-branch.patch>
>> 
>> 
>> - Ben
>