[llvm] r214011 - [x86] Fix PR20355 (for real). There are many layers to this bug.

Mon Jul 28 12:46:07 PDT 2014

Hi Chandler,

I think we still have a bug in the current implementation.
See my inline comments.

Also, I think it would be nice to update the comments on the masks value instead of removing them.
When I reversed engineered this code, it took me ages to figure out what were these magic values.

What do you think?

Thanks,
-Quentin

On Jul 25, 2014, at 8:46 PM, Chandler Carruth <chandlerc at gmail.com> wrote:

> Author: chandlerc
> Date: Fri Jul 25 22:46:57 2014
> New Revision: 214011
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=214011&view=rev
> Log:
> [x86] Fix PR20355 (for real). There are many layers to this bug.
> 
> The tale starts with r212808 which attempted to fix inversion of the low
> and high bits when lowering MUL_LOHI. Sadly, that commit did not include
> any positive test cases, and just removed some operations from a test
> case where the actual logic being changed isn't fully visible from the
> test.
> 
> What this commit did was two things. First, it reversed the low and high
> results in the formation of the MERGE_VALUES node for the multiple
> results. This is entirely correct.
> 
> Second it changed the shuffles for extracting the low and high
> components from the i64 results of the multiplies to extract them
> assuming a big-endian-style encoding of the multiply results. This
> second change is wrong. There is no big-endian encoding in x86, the
> results of the multiplies are normal v2i64s: when cast to v4i32, the low
> i32s are at offsets 0 and 2, and the high i32s are at offsets 1 and 3.
Weird, I made a quick test with v2i32 to i64 and I saw the opposite but I may have done the wrong check.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test2.ll
Type: application/octet-stream
Size: 820 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140728/6d037ce7/attachment.obj>
-------------- next part --------------

> 
> However, the first change wasn't enough to actually fix the bug, which
> is (I assume) why the second change was also made. There was another bug
> in the MERGE_VALUES formation: we weren't using a VTList, and so were
> getting a single result node! When grabbing the *second* result from the
> node, we got... well.. colud be anything. I think this *appeared* to
> invert things, but had to be causing other problems as well.
> 
> Fortunately, I fixed the MERGE_VALUES issue in r213931, so we should
> have been fine, right? NOOOPE! Because the core bug was never addressed,
> the test in vector-idiv failed when I fixed the MERGE_VALUES node.
> Because there are essentially no docs for this node, I had to guess at
> how to fix it and tried swapping the operands, restoring the order of
> the original code before r212808. While this "fixed" the test case (in
> that we produced the write instructions) we were still extracting the
> wrong elements of the i64s, and thus PR20355 was still broken.
> 
> This commit essentially reverts the big-endian-style extraction part of
> r212808 and goes back to the original masks which were correct. Now that
> the MERGE_VALUES node formation is also correct, everything works. I've
> also included a more detailed test from PR20355 to make sure this stays
> fixed.
> 
> Modified:
>    llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>    llvm/trunk/test/CodeGen/X86/vector-idiv.ll
> 
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=214011&r1=214010&r2=214011&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Fri Jul 25 22:46:57 2014
> @@ -15444,29 +15444,16 @@ static SDValue LowerMUL_LOHI(SDValue Op,
>                              DAG.getNode(Opcode, dl, MulVT, Odd0, Odd1));
> 
>   // Shuffle it back into the right order.
> -  // The internal representation is big endian.
> -  // In other words, a i64 bitcasted to 2 x i32 has its high part at index 0
> -  // and its low part at index 1.
> -  // Moreover, we have: Mul1 = <ae|cg> ; Mul2 = <bf|dh>
> -  // Vector index                0 1   ;          2 3
> -  // We want      <ae|bf|cg|dh>
> -  // Vector index   0  2  1  3
> -  // Since each element is seen as 2 x i32, we get:
> -  // high_mask[i] = 2 x vector_index[i]
> -  // low_mask[i] = 2 x vector_index[i] + 1
> -  // where vector_index = {0, Size/2, 1, Size/2 + 1, ...,
> -  //                       Size/2 - 1, Size/2 + Size/2 - 1}
> -  // where Size is the number of element of the final vector.
>   SDValue Highs, Lows;
>   if (VT == MVT::v8i32) {
> -    const int HighMask[] = {0, 8, 2, 10, 4, 12, 6, 14};
> +    const int HighMask[] = {1, 9, 3, 11, 5, 13, 7, 15};
>     Highs = DAG.getVectorShuffle(VT, dl, Mul1, Mul2, HighMask);
> -    const int LowMask[] = {1, 9, 3, 11, 5, 13, 7, 15};
> +    const int LowMask[] = {0, 8, 2, 10, 4, 12, 6, 14};
>     Lows = DAG.getVectorShuffle(VT, dl, Mul1, Mul2, LowMask);
>   } else {
> -    const int HighMask[] = {0, 4, 2, 6};
> +    const int HighMask[] = {1, 5, 3, 7};
>     Highs = DAG.getVectorShuffle(VT, dl, Mul1, Mul2, HighMask);
> -    const int LowMask[] = {1, 5, 3, 7};
> +    const int LowMask[] = {1, 4, 2, 6};
It should be 0, 4, 2, 6 not *1*, 4, 2, 6 ;-)

>     Lows = DAG.getVectorShuffle(VT, dl, Mul1, Mul2, LowMask);
>   }
> 
> @@ -15484,9 +15471,9 @@ static SDValue LowerMUL_LOHI(SDValue Op,
>     Highs = DAG.getNode(ISD::SUB, dl, VT, Highs, Fixup);
>   }
> 
> -  // THe first result of MUL_LOHI is actually the high value, followed by the
> -  // low value.
> -  SDValue Ops[] = {Highs, Lows};
> +  // The first result of MUL_LOHI is actually the low value, followed by the
> +  // high value.
> +  SDValue Ops[] = {Lows, Highs};
>   return DAG.getMergeValues(Ops, dl);
> }
> 
> 
> Modified: llvm/trunk/test/CodeGen/X86/vector-idiv.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-idiv.ll?rev=214011&r1=214010&r2=214011&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/vector-idiv.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/vector-idiv.ll Fri Jul 25 22:46:57 2014
> @@ -220,3 +220,42 @@ define <2 x i16> @test12() {
> ; AVX-LABEL: test12:
> ; AVX: xorps
> }
> +
> +define <4 x i32> @PR20355(<4 x i32> %a) {
> +; SSE-LABEL: PR20355:
> +; SSE:         movdqa {{.*}}, %[[X1:xmm[0-9]+]]
> +; SSE-NEXT:    movdqa %[[X1]], %[[X2:xmm[0-9]+]]
> +; SSE-NEXT:    psrad  $31, %[[X2]]
> +; SSE-NEXT:    pand   %xmm0, %[[X2]]
> +; SSE-NEXT:    movdqa %xmm0, %[[X3:xmm[0-9]+]]
> +; SSE-NEXT:    psrad  $31, %[[X3]]
> +; SSE-NEXT:    pand   %[[X1]], %[[X3]]
> +; SSE-NEXT:    paddd  %[[X2]], %[[X3]]
> +; SSE-NEXT:    pshufd {{.*}} # [[X4:xmm[0-9]+]] = xmm0[1,0,3,0]
> +; SSE-NEXT:    pmuludq %[[X1]], %xmm0
> +; SSE-NEXT:    pshufd {{.*}} # [[X1]] = [[X1]][1,0,3,0]
> +; SSE-NEXT:    pmuludq %[[X4]], %[[X1]]
> +; SSE-NEXT:    shufps {{.*}} # xmm0 = xmm0[1,3],[[X1]][1,3]
> +; SSE-NEXT:    pshufd {{.*}} # [[X5:xmm[0-9]+]] = xmm0[0,2,1,3]
> +; SSE-NEXT:    psubd  %[[X3]], %[[X5]]
> +; SSE-NEXT:    movdqa %[[X5]], %xmm0
> +; SSE-NEXT:    psrld  $31, %xmm0
> +; SSE-NEXT:    paddd  %[[X5]], %xmm0
> +; SSE-NEXT:    retq
> +;
> +; SSE41-LABEL: PR20355:
> +; SSE41:         movdqa {{.*}}, %[[X1:xmm[0-9]+]]
> +; SSE41-NEXT:    pshufd {{.*}} # [[X2:xmm[0-9]+]] = xmm0[1,0,3,0]
> +; SSE41-NEXT:    pmuldq %[[X1]], %xmm0
> +; SSE41-NEXT:    pshufd {{.*}} # [[X1]] = [[X1]][1,0,3,0]
> +; SSE41-NEXT:    pmuldq %[[X2]], %[[X1]]
> +; SSE41-NEXT:    shufps {{.*}} # xmm0 = xmm0[1,3],[[X1]][1,3]
> +; SSE41-NEXT:    pshufd {{.*}} # [[X3:xmm[0-9]+]] = xmm0[0,2,1,3]
> +; SSE41-NEXT:    movdqa %[[X3]], %xmm0
> +; SSE41-NEXT:    psrld  $31, %xmm0
> +; SSE41-NEXT:    paddd  %[[X3]], %xmm0
> +; SSE41-NEXT:    retq
> +entry:
> +  %sdiv = sdiv <4 x i32> %a, <i32 3, i32 3, i32 3, i32 3>
> +  ret <4 x i32> %sdiv
> +}
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits