[PATCH] D22941: Fix confusion over x86_64 CMOV semantics in order to avoid unnecessary zero extensions

Thu Jul 28 14:37:10 PDT 2016

DavidKreitzer created this revision.
DavidKreitzer added reviewers: sunfish, mkuper, aaboud.
DavidKreitzer added a subscriber: llvm-commits.

We noticed this issue while working on something unrelated.

The check for X86ISD::CMOV was added long ago in this revision:
------------------------------------------------------------------------
r81814 | djg | 2009-09-14 17:14:11 -0700 (Mon, 14 Sep 2009) | 3 lines

On x86-64, the 32-bit cmov doesn't actually clear the high 32-bit of
its result if the condition is false.

------------------------------------------------------------------------
But that statement is incorrect. The 32-bit CMOVs clear the high 32 bits of the result regardless of whether the condition is true or false. That is easily verifiable, and other compilers, including MSVC and the Intel compiler, take advantage of this behavior to avoid unnecessary 32-bit to 64-bit zero extensions.
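
As a rough illustration of the kind of code affected (a hypothetical C++ function, not taken from the patch or its test), consider selecting between two 32-bit values and widening the result to 64 bits. A compiler may lower the select to a 32-bit CMOV, in which case the widening step is where a separate zero-extending move can be avoided.

```cpp
#include <cstdint>

// Hypothetical example: the name and signature are illustrative only.
// Truncate both inputs to 32 bits, pick one, then zero-extend to 64 bits.
std::uint64_t select_and_widen(std::uint64_t a, std::uint64_t b, bool p) {
    std::uint32_t lo_a = static_cast<std::uint32_t>(a);  // truncate
    std::uint32_t lo_b = static_cast<std::uint32_t>(b);  // truncate
    std::uint32_t picked = p ? lo_a : lo_b;               // candidate for a 32-bit CMOV
    return static_cast<std::uint64_t>(picked);            // zero-extend
}
```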

The latest architecture manuals from both Intel and AMD support this change, though I wonder if an earlier documentation bug caused the confusion. At any rate, the latest AMD manual says, "In 64-bit mode, CMOVcc with a 32-bit operand size will clear the upper 32 bits of the destination register even if the condition is false." And the latest Intel manual describes the behavior in pseudo-code as:

Operation

temp ← SRC

IF condition TRUE
  THEN
    DEST ← temp;
  FI;
ELSE
  IF (OperandSize = 32 and IA-32e mode active)
    THEN
      DEST[63:32] ← 0;
  FI;
FI;
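
For anyone who prefers to read that as ordinary code, here is a minimal C++ sketch of the same behavior. The function name and the idea of passing the full 64-bit register as a value are my own assumptions for illustration; this models the documented semantics and is not code from LLVM or from either manual.

```cpp
#include <cstdint>

// Hypothetical model of a 32-bit CMOVcc in 64-bit mode.
// `dest` is the full 64-bit destination register before the instruction,
// `src` is the 32-bit source operand. Returns the register afterwards.
std::uint64_t cmov32(std::uint64_t dest, std::uint32_t src, bool condition) {
    if (condition) {
        // DEST <- temp: a 32-bit register write zero-extends to 64 bits.
        return static_cast<std::uint64_t>(src);
    }
    // Condition false: the low 32 bits are kept and DEST[63:32] <- 0.
    return dest & 0xFFFFFFFFull;
}
```

In both branches the upper half of the result is zero, which is the property the def32 predicate relies on.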

Not surprisingly, there was no significant performance impact from this change (measured on cpu2000, et al.).


https://reviews.llvm.org/D22941

Files:
  lib/Target/X86/X86InstrCompiler.td
  test/CodeGen/X86/cmov.ll

Index: lib/Target/X86/X86InstrCompiler.td
===================================================================

--- lib/Target/X86/X86InstrCompiler.td
+++ lib/Target/X86/X86InstrCompiler.td
@@ -1289,15 +1289,13 @@
 
 // Any instruction that defines a 32-bit result leaves the high half of the
 // register. Truncate can be lowered to EXTRACT_SUBREG. CopyFromReg may
-// be copying from a truncate. And x86's cmov doesn't do anything if the
-// condition is false. But any other 32-bit operation will zero-extend
+// be copying from a truncate. Any other 32-bit operation will zero-extend
 // up to 64 bits.
 def def32 : PatLeaf<(i32 GR32:$src), [{
   return N->getOpcode() != ISD::TRUNCATE &&
          N->getOpcode() != TargetOpcode::EXTRACT_SUBREG &&
          N->getOpcode() != ISD::CopyFromReg &&
-         N->getOpcode() != ISD::AssertSext &&
-         N->getOpcode() != X86ISD::CMOV;
+         N->getOpcode() != ISD::AssertSext;
 }]>;
 
 // In the case of a 32-bit def that is known to implicitly zero-extend,
Index: test/CodeGen/X86/cmov.ll
===================================================================
--- test/CodeGen/X86/cmov.ll
+++ test/CodeGen/X86/cmov.ll
@@ -33,16 +33,17 @@
 }
 
 
-; x86's 32-bit cmov doesn't clobber the high 32 bits of the destination
-; if the condition is false. An explicit zero-extend (movl) is needed
-; after the cmov.
+; x86's 32-bit cmov zeroes the high 32 bits of the destination. Make
+; sure CodeGen takes advantage of that to avoid an unnecessary
+; zero-extend (movl) after the cmov.
 
 declare void @bar(i64) nounwind
 
 define void @test3(i64 %a, i64 %b, i1 %p) nounwind {
 ; CHECK-LABEL: test3:
 ; CHECK:      cmov{{n?}}el %[[R1:e..]], %[[R2:e..]]
-; CHECK-NEXT: movl    %[[R2]], %{{e..}}
+; CHECK-NOT:  movl
+; CHECK:      call
 
   %c = trunc i64 %a to i32
   %d = trunc i64 %b to i32

