[llvm] 642eed3 - [x86] fix miscompile in buildvector v16i8 lowering

Tue Jul 7 10:02:40 PDT 2020

Author: Sanjay Patel
Date: 2020-07-07T13:02:31-04:00
New Revision: 642eed37134db4aca953704d1e4ae856af675f51

URL: https://github.com/llvm/llvm-project/commit/642eed37134db4aca953704d1e4ae856af675f51
DIFF: https://github.com/llvm/llvm-project/commit/642eed37134db4aca953704d1e4ae856af675f51.diff

LOG: [x86] fix miscompile in buildvector v16i8 lowering

In the test based on PR46586:
https://bugs.llvm.org/show_bug.cgi?id=46586
...we are inserting 16-bits into the high element of the vector, shuffling it
to element 0, and extracting 32-bits. But xmm1 was never initialized, so the
top 16-bits of the extract are undef without this patch.

(It seems like we could do better than this by recognizing that we only demand
a subsection of the build vector, but I want to make sure we fix the
miscompile 1st.)

This path is only used for pre-SSE4.1, and simpler patterns get squashed
somewhere along the way, so the test still includes a 'urem' as it did in the
original test from the bug report.

Differential Revision: https://reviews.llvm.org/D83319

Added: 
    

Modified: 
    llvm/lib/Target/X86/X86ISelLowering.cpp
    llvm/test/CodeGen/X86/buildvec-insertvec.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 4821dd44e01f..575f358361b1 100644

--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -8002,10 +8002,11 @@ static SDValue LowerBuildVectorv16i8(SDValue Op, unsigned NonZeros,
         Elt = NextElt;
     }
 
-    // If our first insertion is not the first index then insert into zero
-    // vector to break any register dependency else use SCALAR_TO_VECTOR.
+    // If our first insertion is not the first index or zeros are needed, then
+    // insert into zero vector. Otherwise, use SCALAR_TO_VECTOR (leaves high
+    // elements undefined).
     if (!V) {
-      if (i != 0)
+      if (i != 0 || NumZero)
         V = getZeroVector(MVT::v8i16, Subtarget, DAG, dl);
       else {
         V = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, MVT::v4i32, Elt);

diff  --git a/llvm/test/CodeGen/X86/buildvec-insertvec.ll b/llvm/test/CodeGen/X86/buildvec-insertvec.ll
index 3922450b0f21..e428ae8d5919 100644
--- a/llvm/test/CodeGen/X86/buildvec-insertvec.ll
+++ b/llvm/test/CodeGen/X86/buildvec-insertvec.ll
@@ -784,12 +784,13 @@ define <4 x i32> @ossfuzz5688(i32 %a0) {
   ret <4 x i32> %5
 }
 
-; FIXME: If we do not define all bytes that are extracted, this is a miscompile.
+; If we do not define all bytes that are extracted, this is a miscompile.
 
 define i32 @PR46586(i8* %p, <4 x i32> %v) {
 ; SSE2-LABEL: PR46586:
 ; SSE2:       # %bb.0:
 ; SSE2-NEXT:    movzbl 3(%rdi), %eax
+; SSE2-NEXT:    pxor %xmm1, %xmm1
 ; SSE2-NEXT:    pinsrw $6, %eax, %xmm1
 ; SSE2-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[3,1,2,3]
 ; SSE2-NEXT:    movd %xmm1, %eax