[PATCH] D14496: X86: More efficient codegen for 64-bit compare-and-branch
David Kreitzer via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 9 09:40:49 PST 2015
DavidKreitzer added a comment.
(1) The subl/sbbl sequence cannot distinguish between greater-than and equal-to, so it doesn't work for ==/!= without an additional instruction, at which point it's no better than the current sequence. This also means you have to be careful about how you order the operands depending on the condition. For (a < b), you'd generate what Michael showed:
test(long long, long long):
        movl 4(%esp), %eax
        subl 12(%esp), %eax
        movl 8(%esp), %edx
        sbbl 16(%esp), %edx
        jge ..B1.3
        movl $1, %eax
        ret
..B1.3:
        movl $2, %eax
        ret
For (a >= b), you can simply replace the "jge" with "jl". For (a > b), you have to reverse the sense of the subtraction like this:
test(long long, long long):
        movl 12(%esp), %eax
        subl 4(%esp), %eax
        movl 16(%esp), %edx
        sbbl 8(%esp), %edx
        jge ..B1.3
        movl $1, %eax
        ret
..B1.3:
        movl $2, %eax
        ret
(a <= b) would have the same operand order but with "jl" instead of "jge".
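To make the flag behavior in (1) concrete, here is a minimal C sketch that models the flags left by a subl/sbbl pair. It is only an illustration: the names flags_t, sub_sbb_flags, jl_taken, jge_taken and check_examples are hypothetical and are not part of the patch or of LLVM. It assumes the usual x86 rules that jl is taken when SF != OF, jge when SF == OF, and that ZF after sbbl reflects only the sbbl result (the high half).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags a jl/jge (or je/jne) would inspect after the sbbl. Hypothetical model. */
typedef struct {
    bool zf; /* zero flag: set when the sbbl result (high half only) is 0 */
    bool sf; /* sign flag: top bit of the sbbl result */
    bool of; /* overflow flag: signed overflow of the high-half subtraction */
} flags_t;

/* Models "subl lo(b), lo(a); sbbl hi(b), hi(a)", i.e. a - b in two 32-bit halves. */
flags_t sub_sbb_flags(int64_t a, int64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)((uint64_t)a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)((uint64_t)b >> 32);

    uint32_t borrow = a_lo < b_lo;       /* CF produced by the subl */
    uint32_t hi = a_hi - b_hi - borrow;  /* value produced by the sbbl */

    flags_t f;
    f.zf = (hi == 0);
    f.sf = (hi >> 31) != 0;
    /* Signed overflow: operand signs differ and the result sign differs from a_hi. */
    f.of = (((a_hi ^ b_hi) & (a_hi ^ hi)) >> 31) != 0;
    return f;
}

bool jl_taken(flags_t f)  { return f.sf != f.of; }
bool jge_taken(flags_t f) { return f.sf == f.of; }

void check_examples(void)
{
    /* (a < b): subtract b from a, then test with jl/jge. */
    assert(jl_taken(sub_sbb_flags(1, 2)));
    assert(jge_taken(sub_sbb_flags(2, 1)));
    assert(jge_taken(sub_sbb_flags(5, 5)));

    /* (a > b): reverse the subtraction (b - a) and test the same way. */
    assert(jl_taken(sub_sbb_flags(3, 7)));   /* models 7 > 3 */
    assert(jge_taken(sub_sbb_flags(5, 5)));  /* 5 > 5 is false */

    /* ZF is set both for a > b (1 vs 0) and for a == b (5 vs 5),
       so it cannot be used to decide ==/!=. */
    assert(sub_sbb_flags(1, 0).zf && sub_sbb_flags(5, 5).zf);
}
```

In this model, 1 - 0 and 5 - 5 both leave ZF set and SF == OF after the sbbl, so the flags cannot separate greater-than from equal-to. That is why (a > b) needs the operands swapped rather than a different condition code.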
(2) I think there is no clear winner between the current 1-branch implementation of ==/!= and the proposed 2-branch implementation. The branch prediction effect will be data dependent, and the context will determine whether the extra branch or the longer dependence chain of the current implementation is more harmful. For example, one situation where the 1-branch implementation will shine is when the compare operands almost always compare unequal, but the lower bits often compare equal.
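As a rough illustration of the trade-off in (2), the two shapes can be sketched in C. This rests on assumptions not spelled out above: that the 1-branch form combines both halves (for example with xor/xor/or) before a single test, and that the 2-branch form tests the low halves first and the high halves second. The function names eq_one_branch and eq_two_branch are hypothetical, and a compiler is free to lower either function differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1-branch shape (assumed): fold both halves into one value, test once.
   Longer dependence chain, but only one branch to predict. */
bool eq_one_branch(uint64_t a, uint64_t b)
{
    uint32_t lo = (uint32_t)a ^ (uint32_t)b;                  /* xorl */
    uint32_t hi = (uint32_t)(a >> 32) ^ (uint32_t)(b >> 32);  /* xorl */
    return (lo | hi) == 0;                                    /* orl + one jcc */
}

/* 2-branch shape (assumed): early exit on the low halves, then test the
   high halves. Shorter chains, but two branches to predict. */
bool eq_two_branch(uint64_t a, uint64_t b)
{
    if ((uint32_t)a != (uint32_t)b)   /* first branch: low halves */
        return false;
    return (uint32_t)(a >> 32) == (uint32_t)(b >> 32); /* second: high halves */
}
```

With inputs that almost always differ but often share their low 32 bits, the first branch of the 2-branch shape goes either way depending on the data, while the single branch of the 1-branch shape nearly always sees "not equal".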
(3) ICC uses both eax and edx to give the post-RA scheduler more flexibility, which is sometimes useful, sometimes not. (In Michael's code snippet, it clearly didn't accomplish anything.)
http://reviews.llvm.org/D14496