[llvm-commits] [PATCH] X86: Turn cmovs into branches when profitable.

Wed Apr 25 13:45:18 PDT 2012

This came up when a change in block placement formed a cmov and slowed down a
hot loop by 50%:

	ucomisd	(%rdi), %xmm0
	cmovbel	%edx, %esi

cmov is a really bad choice in this context because it doesn't get branch
prediction: its result depends on the flags, so everything that uses it has to
wait for the compare. If we emit it as a branch, an out-of-order CPU can do a
better job (if the branch is predicted correctly) and avoid waiting for the slow
load+compare instruction to finish. Of course it won't help if the branch is
unpredictable, but those are really rare in practice.
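
For illustration, here is roughly the kind of source that can end up as the
sequence above. This is only a sketch, not the original test case: the function
names are made up, and the compiler is free to pick a cmov or a branch for
either form.

```cpp
// Hypothetical example, not taken from the patch or its test case.
// The condition depends on a load, so a cmov cannot produce its result
// until the load and the compare have finished.
int select_ternary(const double *p, double x, int a, int b) {
  // A select on a loaded value: a typical candidate for ucomisd + cmov.
  return (x > *p) ? a : b;
}

// Same result written with explicit control flow. If this is emitted as a
// branch and the branch is predicted correctly, the CPU can continue with
// the predicted value while the load is still in flight.
int select_branch(const double *p, double x, int a, int b) {
  int r = a;
  if (!(x > *p))
    r = b;
  return r;
}
```

Both functions compute the same value; the only difference that matters here is
which instruction sequence the backend picks for them.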

As a heuristic, the attached patch turns all cmovs that have one use and a direct
memory operand into branches. cmovs usually save some code size, so the transform
is disabled in -Os mode. It is also disabled on Atom, which is an in-order
microarchitecture and unlikely to benefit from this.
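
The decision described above boils down to something like the following. This
is a standalone sketch of the heuristic as stated in this mail, not the code in
the patch: the struct and function names are invented for illustration and do
not exist in LLVM.

```cpp
// Hypothetical stand-ins for what the compiler knows about a cmov candidate
// and about the compilation target. None of these names exist in LLVM.
struct CMovCandidate {
  bool hasOneUse;               // the cmov has exactly one use
  bool hasDirectMemoryOperand;  // a direct memory operand is involved
};

struct CodeGenContext {
  bool optForSize;  // compiling with -Os
  bool isAtom;      // targeting Atom (in-order)
};

// Returns true if the cmov should be emitted as a branch instead.
bool shouldTurnCMovIntoBranch(const CMovCandidate &c,
                              const CodeGenContext &ctx) {
  // cmovs usually save code size, and an in-order core is unlikely to
  // benefit, so keep the cmov in both cases.
  if (ctx.optForSize || ctx.isAtom)
    return false;
  return c.hasOneUse && c.hasDirectMemoryOperand;
}
```

Keeping the size and target checks in front means the transform never fires for
-Os or Atom, regardless of what the candidate looks like.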

The test suite shows a 7% improvement on richards_benchmark on x86_64/westmere.
I'm not aware of any significant execution time regressions. The results depend
a lot on the microarchitecture, so YMMV. I'm already starting to pull my hair out
because the numbers fluctuate a lot between different machines.

There is probably still more to get out of this, as LLVM really likes setcc and
cmov, sometimes forming long chains of them which are deadly on modern CPUs.

Thanks to Chandler Carruth for the initial test case and providing me with
test-suite numbers that were slightly more stable than mine :)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cmov-into-branch.patch
Type: application/octet-stream
Size: 6187 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120425/129210e2/attachment.obj>
-------------- next part --------------

- Ben