[llvm] r214011 - [x86] Fix PR20355 (for real). There are many layers to this bug.

Fri Jul 25 20:46:57 PDT 2014

Author: chandlerc
Date: Fri Jul 25 22:46:57 2014
New Revision: 214011

URL: http://llvm.org/viewvc/llvm-project?rev=214011&view=rev
Log:
[x86] Fix PR20355 (for real). There are many layers to this bug.

The tale starts with r212808 which attempted to fix inversion of the low
and high bits when lowering MUL_LOHI. Sadly, that commit did not include
any positive test cases, and just removed some operations from a test
case where the actual logic being changed isn't fully visible from the
test.

What this commit did was two things. First, it reversed the low and high
results in the formation of the MERGE_VALUES node for the multiple
results. This is entirely correct.

Second it changed the shuffles for extracting the low and high
components from the i64 results of the multiplies to extract them
assuming a big-endian-style encoding of the multiply results. This
second change is wrong. There is no big-endian encoding in x86, the
results of the multiplies are normal v2i64s: when cast to v4i32, the low
i32s are at offsets 0 and 2, and the high i32s are at offsets 1 and 3.

However, the first change wasn't enough to actually fix the bug, which
is (I assume) why the second change was also made. There was another bug
in the MERGE_VALUES formation: we weren't using a VTList, and so were
getting a single result node! When grabbing the *second* result from the
node, we got... well.. colud be anything. I think this *appeared* to
invert things, but had to be causing other problems as well.

Fortunately, I fixed the MERGE_VALUES issue in r213931, so we should
have been fine, right? NOOOPE! Because the core bug was never addressed,
the test in vector-idiv failed when I fixed the MERGE_VALUES node.
Because there are essentially no docs for this node, I had to guess at
how to fix it and tried swapping the operands, restoring the order of
the original code before r212808. While this "fixed" the test case (in
that we produced the write instructions) we were still extracting the
wrong elements of the i64s, and thus PR20355 was still broken.

This commit essentially reverts the big-endian-style extraction part of
r212808 and goes back to the original masks which were correct. Now that
the MERGE_VALUES node formation is also correct, everything works. I've
also included a more detailed test from PR20355 to make sure this stays
fixed.

Modified:
    llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
    llvm/trunk/test/CodeGen/X86/vector-idiv.ll

Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=214011&r1=214010&r2=214011&view=diff
==============================================================================

--- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Fri Jul 25 22:46:57 2014
@@ -15444,29 +15444,16 @@ static SDValue LowerMUL_LOHI(SDValue Op,
                              DAG.getNode(Opcode, dl, MulVT, Odd0, Odd1));
 
   // Shuffle it back into the right order.
-  // The internal representation is big endian.
-  // In other words, a i64 bitcasted to 2 x i32 has its high part at index 0
-  // and its low part at index 1.
-  // Moreover, we have: Mul1 = <ae|cg> ; Mul2 = <bf|dh>
-  // Vector index                0 1   ;          2 3
-  // We want      <ae|bf|cg|dh>
-  // Vector index   0  2  1  3
-  // Since each element is seen as 2 x i32, we get:
-  // high_mask[i] = 2 x vector_index[i]
-  // low_mask[i] = 2 x vector_index[i] + 1
-  // where vector_index = {0, Size/2, 1, Size/2 + 1, ...,
-  //                       Size/2 - 1, Size/2 + Size/2 - 1}
-  // where Size is the number of element of the final vector.
   SDValue Highs, Lows;
   if (VT == MVT::v8i32) {
-    const int HighMask[] = {0, 8, 2, 10, 4, 12, 6, 14};
+    const int HighMask[] = {1, 9, 3, 11, 5, 13, 7, 15};
     Highs = DAG.getVectorShuffle(VT, dl, Mul1, Mul2, HighMask);
-    const int LowMask[] = {1, 9, 3, 11, 5, 13, 7, 15};
+    const int LowMask[] = {0, 8, 2, 10, 4, 12, 6, 14};
     Lows = DAG.getVectorShuffle(VT, dl, Mul1, Mul2, LowMask);
   } else {
-    const int HighMask[] = {0, 4, 2, 6};
+    const int HighMask[] = {1, 5, 3, 7};
     Highs = DAG.getVectorShuffle(VT, dl, Mul1, Mul2, HighMask);
-    const int LowMask[] = {1, 5, 3, 7};
+    const int LowMask[] = {1, 4, 2, 6};
     Lows = DAG.getVectorShuffle(VT, dl, Mul1, Mul2, LowMask);
   }
 
@@ -15484,9 +15471,9 @@ static SDValue LowerMUL_LOHI(SDValue Op,
     Highs = DAG.getNode(ISD::SUB, dl, VT, Highs, Fixup);
   }
 
-  // THe first result of MUL_LOHI is actually the high value, followed by the
-  // low value.
-  SDValue Ops[] = {Highs, Lows};
+  // The first result of MUL_LOHI is actually the low value, followed by the
+  // high value.
+  SDValue Ops[] = {Lows, Highs};
   return DAG.getMergeValues(Ops, dl);
 }
 

Modified: llvm/trunk/test/CodeGen/X86/vector-idiv.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-idiv.ll?rev=214011&r1=214010&r2=214011&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-idiv.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-idiv.ll Fri Jul 25 22:46:57 2014
@@ -220,3 +220,42 @@ define <2 x i16> @test12() {
 ; AVX-LABEL: test12:
 ; AVX: xorps
 }
+
+define <4 x i32> @PR20355(<4 x i32> %a) {
+; SSE-LABEL: PR20355:
+; SSE:         movdqa {{.*}}, %[[X1:xmm[0-9]+]]
+; SSE-NEXT:    movdqa %[[X1]], %[[X2:xmm[0-9]+]]
+; SSE-NEXT:    psrad  $31, %[[X2]]
+; SSE-NEXT:    pand   %xmm0, %[[X2]]
+; SSE-NEXT:    movdqa %xmm0, %[[X3:xmm[0-9]+]]
+; SSE-NEXT:    psrad  $31, %[[X3]]
+; SSE-NEXT:    pand   %[[X1]], %[[X3]]
+; SSE-NEXT:    paddd  %[[X2]], %[[X3]]
+; SSE-NEXT:    pshufd {{.*}} # [[X4:xmm[0-9]+]] = xmm0[1,0,3,0]
+; SSE-NEXT:    pmuludq %[[X1]], %xmm0
+; SSE-NEXT:    pshufd {{.*}} # [[X1]] = [[X1]][1,0,3,0]
+; SSE-NEXT:    pmuludq %[[X4]], %[[X1]]
+; SSE-NEXT:    shufps {{.*}} # xmm0 = xmm0[1,3],[[X1]][1,3]
+; SSE-NEXT:    pshufd {{.*}} # [[X5:xmm[0-9]+]] = xmm0[0,2,1,3]
+; SSE-NEXT:    psubd  %[[X3]], %[[X5]]
+; SSE-NEXT:    movdqa %[[X5]], %xmm0
+; SSE-NEXT:    psrld  $31, %xmm0
+; SSE-NEXT:    paddd  %[[X5]], %xmm0
+; SSE-NEXT:    retq
+;
+; SSE41-LABEL: PR20355:
+; SSE41:         movdqa {{.*}}, %[[X1:xmm[0-9]+]]
+; SSE41-NEXT:    pshufd {{.*}} # [[X2:xmm[0-9]+]] = xmm0[1,0,3,0]
+; SSE41-NEXT:    pmuldq %[[X1]], %xmm0
+; SSE41-NEXT:    pshufd {{.*}} # [[X1]] = [[X1]][1,0,3,0]
+; SSE41-NEXT:    pmuldq %[[X2]], %[[X1]]
+; SSE41-NEXT:    shufps {{.*}} # xmm0 = xmm0[1,3],[[X1]][1,3]
+; SSE41-NEXT:    pshufd {{.*}} # [[X3:xmm[0-9]+]] = xmm0[0,2,1,3]
+; SSE41-NEXT:    movdqa %[[X3]], %xmm0
+; SSE41-NEXT:    psrld  $31, %xmm0
+; SSE41-NEXT:    paddd  %[[X3]], %xmm0
+; SSE41-NEXT:    retq
+entry:
+  %sdiv = sdiv <4 x i32> %a, <i32 3, i32 3, i32 3, i32 3>
+  ret <4 x i32> %sdiv
+}