[PATCH] D14496: X86: More efficient codegen for 64-bit compare-and-branch

Mon Nov 9 09:40:49 PST 2015

DavidKreitzer added a comment.

(1) The subl/sbbl sequence cannot distinguish between greater-than and equal-to, so it doesn't work for ==/!= without an additional instruction, at which point it's no better than the current sequence. This also means you have to be careful about how you order the operands depending on the condition. For (a < b), you'd generate what Michael showed:

  test(long long, long long):
          movl      4(%esp), %eax 
          subl      12(%esp), %eax
          movl      8(%esp), %edx 
          sbbl      16(%esp), %edx
          jge       ..B1.3        
          movl      $1, %eax      
          ret                     
  ..B1.3:                         
          movl      $2, %eax      
          ret

For (a >= b), you can simply replace the "jge" with "jl". For (a > b), you have to reverse the sense of the subtraction like this:

  test(long long, long long):
          movl      12(%esp), %eax
          subl      4(%esp), %eax
          movl      16(%esp), %edx 
          sbbl      8(%esp), %edx
          jge       ..B1.3        
          movl      $1, %eax      
          ret                     
  ..B1.3:                         
          movl      $2, %eax      
          ret

(a <= b) would use the same reversed operand order as (a > b), but with "jl" instead of "jge".
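To make the flag behavior concrete, here is a rough C model of the subl/sbbl pair and the four relational cases above. It is only a sketch: the names (flags64, sub_sbb, cmp_lt, and so on) are made up for illustration and are not anything in LLVM or ICC. The model assumes the usual x86 rules that "jl" is taken when SF != OF and that ZF after sbbl reflects only the high-half result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags left by the final sbbl (hypothetical model type, for illustration). */
typedef struct {
    bool sf; /* sign of the high-half result */
    bool of; /* signed overflow of the high-half subtraction */
    bool zf; /* zero flag: reflects the HIGH half only */
} flags64;

/* Model "subl lo(b), lo(a); sbbl hi(b), hi(a)", i.e. a - b in two halves. */
static flags64 sub_sbb(int64_t a, int64_t b)
{
    uint32_t alo = (uint32_t)a, ahi = (uint32_t)((uint64_t)a >> 32);
    uint32_t blo = (uint32_t)b, bhi = (uint32_t)((uint64_t)b >> 32);

    uint32_t borrow = alo < blo;      /* CF produced by subl */
    uint32_t hi = ahi - bhi - borrow; /* result of sbbl */

    flags64 f;
    f.sf = (hi >> 31) != 0;
    /* Overflow: operand signs differ and the result sign differs from ahi. */
    f.of = (((ahi ^ bhi) & (ahi ^ hi)) >> 31) != 0;
    f.zf = (hi == 0);
    return f;
}

/* (a < b): compute a - b, condition is "l" (SF != OF). */
static bool cmp_lt(int64_t a, int64_t b)
{
    flags64 f = sub_sbb(a, b);
    return f.sf != f.of;
}

/* (a >= b): same subtraction, opposite condition. */
static bool cmp_ge(int64_t a, int64_t b) { return !cmp_lt(a, b); }

/* (a > b): reversed subtraction b - a, condition "l". */
static bool cmp_gt(int64_t a, int64_t b) { return cmp_lt(b, a); }

/* (a <= b): reversed subtraction b - a, condition "ge". */
static bool cmp_le(int64_t a, int64_t b) { return !cmp_lt(b, a); }

/* A few spot checks, including the equality blind spot from point (1). */
static void check_examples(void)
{
    assert(cmp_lt(-1, 0) && !cmp_lt(0, -1));
    assert(cmp_gt(INT64_MAX, INT64_MIN));
    /* a = 1, b = 0: the high halves cancel, so ZF is set although a != b. */
    assert(sub_sbb(1, 0).zf);
}
```

The last check is the reason ==/!= needs an extra instruction: the low-half result is thrown away, so nothing in the final flags says whether it was zero.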

(2) I think there is no clear winner between the current 1-branch implementation of ==/!= and the proposed 2-branch implementation. The branch prediction effect will be data dependent, and the context will determine whether the extra branch or the longer dependence chain of the current implementation is more harmful. For example, one situation where the 1-branch implementation will shine is when the compare operands almost always compare unequal, but the lower bits often compare equal.
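As a rough illustration of the two shapes being compared, here they are written out in C on the 32-bit halves. This is a sketch under assumptions: it takes the current 1-branch sequence to be the usual xor/xor/or combination, it assumes the 2-branch version tests the low halves first, and the function names are hypothetical. A compiler is of course free to turn either function into something else, so read it as a description of the machine-code shape, not as a way to get that code.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1-branch shape: combine both halves, then test once. */
static bool eq_one_branch(uint32_t alo, uint32_t ahi,
                          uint32_t blo, uint32_t bhi)
{
    /* xor, xor, or: a longer dependence chain before the branch. */
    uint32_t diff = (alo ^ blo) | (ahi ^ bhi);
    return diff == 0; /* the single conditional branch */
}

/* 2-branch shape: test the low halves, exit early, then test the high. */
static bool eq_two_branch(uint32_t alo, uint32_t ahi,
                          uint32_t blo, uint32_t bhi)
{
    if (alo != blo) /* first branch: outcome depends on the low bits alone */
        return false;
    return ahi == bhi; /* second branch */
}
```

With operands that are almost always unequal but whose low halves often match, the single branch in eq_one_branch goes the same way nearly every time, while the first branch in eq_two_branch flips with the data.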

(3) ICC uses both eax and edx to give the post-RA scheduler more flexibility, which is sometimes useful, sometimes not.  (In Michael's code snippet, it clearly didn't accomplish anything.)

http://reviews.llvm.org/D14496