[PATCH] D83319: [x86] fix miscompile in buildvector v16i8 lowering

Tue Jul 7 09:16:30 PDT 2020

spatel created this revision.
spatel added reviewers: craig.topper, RKSimon, lebedev.ri.
Herald added subscribers: hiraditya, mcrosier.
Herald added a project: LLVM.

In the test based on PR46586:
https://bugs.llvm.org/show_bug.cgi?id=46586
...we are inserting 16-bits into the high element of the vector, shuffling it to element 0, and extracting 32-bits. But xmm1 was never initialized, so the top 16-bits of the extract are undef without this patch.

(It seems like we could do better than this by recognizing that we only demand a subsection of the build vector, but I want to make sure we fix the miscompile 1st.)

This path is only used for pre-SSE4.1, and simpler patterns get squashed somewhere along the way, so the test still includes a 'urem' as it did in the original test from the bug report.


https://reviews.llvm.org/D83319

Files:
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/test/CodeGen/X86/buildvec-insertvec.ll


Index: llvm/test/CodeGen/X86/buildvec-insertvec.ll
===================================================================

--- llvm/test/CodeGen/X86/buildvec-insertvec.ll
+++ llvm/test/CodeGen/X86/buildvec-insertvec.ll
@@ -784,12 +784,13 @@
   ret <4 x i32> %5
 }
 
-; FIXME: If we do not define all bytes that are extracted, this is a miscompile.
+; If we do not define all bytes that are extracted, this is a miscompile.
 
 define i32 @PR46586(i8* %p, <4 x i32> %v) {
 ; SSE2-LABEL: PR46586:
 ; SSE2:       # %bb.0:
 ; SSE2-NEXT:    movzbl 3(%rdi), %eax
+; SSE2-NEXT:    pxor %xmm1, %xmm1
 ; SSE2-NEXT:    pinsrw $6, %eax, %xmm1
 ; SSE2-NEXT:    pshufd {{.*#+}} xmm1 = xmm1[3,1,2,3]
 ; SSE2-NEXT:    movd %xmm1, %eax
Index: llvm/lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- llvm/lib/Target/X86/X86ISelLowering.cpp
+++ llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -8002,10 +8002,11 @@
         Elt = NextElt;
     }
 
-    // If our first insertion is not the first index then insert into zero
-    // vector to break any register dependency else use SCALAR_TO_VECTOR.
+    // If our first insertion is not the first index or zeros are needed, then
+    // insert into zero vector. Otherwise, use SCALAR_TO_VECTOR (leaves high
+    // elements undefined).
     if (!V) {
-      if (i != 0)
+      if (i != 0 || NumZero)
         V = getZeroVector(MVT::v8i16, Subtarget, DAG, dl);
       else {
         V = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, MVT::v4i32, Elt);


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D83319.276090.patch
Type: text/x-patch
Size: 1521 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200707/52e62933/attachment.bin>