[llvm] r214011 - [x86] Fix PR20355 (for real). There are many layers to this bug.

Fri Jul 25 22:09:14 PDT 2014

----- Original Message -----
> From: "Chandler Carruth" <chandlerc at gmail.com>
> To: llvm-commits at cs.uiuc.edu
> Sent: Friday, July 25, 2014 10:46:57 PM
> Subject: [llvm] r214011 - [x86] Fix PR20355 (for real). There are many layers	to this bug.
> 
> Author: chandlerc
> Date: Fri Jul 25 22:46:57 2014
> New Revision: 214011
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=214011&view=rev
> Log:
> [x86] Fix PR20355 (for real). There are many layers to this bug.
> 
> The tale starts with r212808 which attempted to fix inversion of the
> low
> and high bits when lowering MUL_LOHI. Sadly, that commit did not
> include
> any positive test cases, and just removed some operations from a test
> case where the actual logic being changed isn't fully visible from
> the
> test.
> 
> What this commit did was two things. First, it reversed the low and
> high
> results in the formation of the MERGE_VALUES node for the multiple
> results. This is entirely correct.
> 
> Second it changed the shuffles for extracting the low and high
> components from the i64 results of the multiplies to extract them
> assuming a big-endian-style encoding of the multiply results. This
> second change is wrong. There is no big-endian encoding in x86, the
> results of the multiplies are normal v2i64s: when cast to v4i32, the
> low
> i32s are at offsets 0 and 2, and the high i32s are at offsets 1 and
> 3.
> 
> However, the first change wasn't enough to actually fix the bug,
> which
> is (I assume) why the second change was also made. There was another
> bug
> in the MERGE_VALUES formation: we weren't using a VTList, and so were
> getting a single result node! When grabbing the *second* result from
> the
> node, we got... well.. colud be anything. I think this *appeared* to
> invert things, but had to be causing other problems as well.
> 
> Fortunately, I fixed the MERGE_VALUES issue in r213931, so we should
> have been fine, right? NOOOPE! Because the core bug was never
> addressed,
> the test in vector-idiv failed when I fixed the MERGE_VALUES node.
> Because there are essentially no docs for this node, 

Can you please fix this lack of documentation problem too?

Thanks again,
Hal

>I had to guess
> at
> how to fix it and tried swapping the operands, restoring the order of
> the original code before r212808. While this "fixed" the test case
> (in
> that we produced the write instructions) we were still extracting the
> wrong elements of the i64s, and thus PR20355 was still broken.
> 
> This commit essentially reverts the big-endian-style extraction part
> of
> r212808 and goes back to the original masks which were correct. Now
> that
> the MERGE_VALUES node formation is also correct, everything works.
> I've
> also included a more detailed test from PR20355 to make sure this
> stays
> fixed.
> 
> Modified:
>     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>     llvm/trunk/test/CodeGen/X86/vector-idiv.ll
> 
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=214011&r1=214010&r2=214011&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Fri Jul 25 22:46:57
> 2014
> @@ -15444,29 +15444,16 @@ static SDValue LowerMUL_LOHI(SDValue Op,
>                               DAG.getNode(Opcode, dl, MulVT, Odd0,
>                               Odd1));
>  
>    // Shuffle it back into the right order.
> -  // The internal representation is big endian.
> -  // In other words, a i64 bitcasted to 2 x i32 has its high part at
> index 0
> -  // and its low part at index 1.
> -  // Moreover, we have: Mul1 = <ae|cg> ; Mul2 = <bf|dh>
> -  // Vector index                0 1   ;          2 3
> -  // We want      <ae|bf|cg|dh>
> -  // Vector index   0  2  1  3
> -  // Since each element is seen as 2 x i32, we get:
> -  // high_mask[i] = 2 x vector_index[i]
> -  // low_mask[i] = 2 x vector_index[i] + 1
> -  // where vector_index = {0, Size/2, 1, Size/2 + 1, ...,
> -  //                       Size/2 - 1, Size/2 + Size/2 - 1}
> -  // where Size is the number of element of the final vector.
>    SDValue Highs, Lows;
>    if (VT == MVT::v8i32) {
> -    const int HighMask[] = {0, 8, 2, 10, 4, 12, 6, 14};
> +    const int HighMask[] = {1, 9, 3, 11, 5, 13, 7, 15};
>      Highs = DAG.getVectorShuffle(VT, dl, Mul1, Mul2, HighMask);
> -    const int LowMask[] = {1, 9, 3, 11, 5, 13, 7, 15};
> +    const int LowMask[] = {0, 8, 2, 10, 4, 12, 6, 14};
>      Lows = DAG.getVectorShuffle(VT, dl, Mul1, Mul2, LowMask);
>    } else {
> -    const int HighMask[] = {0, 4, 2, 6};
> +    const int HighMask[] = {1, 5, 3, 7};
>      Highs = DAG.getVectorShuffle(VT, dl, Mul1, Mul2, HighMask);
> -    const int LowMask[] = {1, 5, 3, 7};
> +    const int LowMask[] = {1, 4, 2, 6};
>      Lows = DAG.getVectorShuffle(VT, dl, Mul1, Mul2, LowMask);
>    }
>  
> @@ -15484,9 +15471,9 @@ static SDValue LowerMUL_LOHI(SDValue Op,
>      Highs = DAG.getNode(ISD::SUB, dl, VT, Highs, Fixup);
>    }
>  
> -  // THe first result of MUL_LOHI is actually the high value,
> followed by the
> -  // low value.
> -  SDValue Ops[] = {Highs, Lows};
> +  // The first result of MUL_LOHI is actually the low value,
> followed by the
> +  // high value.
> +  SDValue Ops[] = {Lows, Highs};
>    return DAG.getMergeValues(Ops, dl);
>  }
>  
> 
> Modified: llvm/trunk/test/CodeGen/X86/vector-idiv.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-idiv.ll?rev=214011&r1=214010&r2=214011&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/vector-idiv.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/vector-idiv.ll Fri Jul 25 22:46:57
> 2014
> @@ -220,3 +220,42 @@ define <2 x i16> @test12() {
>  ; AVX-LABEL: test12:
>  ; AVX: xorps
>  }
> +
> +define <4 x i32> @PR20355(<4 x i32> %a) {
> +; SSE-LABEL: PR20355:
> +; SSE:         movdqa {{.*}}, %[[X1:xmm[0-9]+]]
> +; SSE-NEXT:    movdqa %[[X1]], %[[X2:xmm[0-9]+]]
> +; SSE-NEXT:    psrad  $31, %[[X2]]
> +; SSE-NEXT:    pand   %xmm0, %[[X2]]
> +; SSE-NEXT:    movdqa %xmm0, %[[X3:xmm[0-9]+]]
> +; SSE-NEXT:    psrad  $31, %[[X3]]
> +; SSE-NEXT:    pand   %[[X1]], %[[X3]]
> +; SSE-NEXT:    paddd  %[[X2]], %[[X3]]
> +; SSE-NEXT:    pshufd {{.*}} # [[X4:xmm[0-9]+]] = xmm0[1,0,3,0]
> +; SSE-NEXT:    pmuludq %[[X1]], %xmm0
> +; SSE-NEXT:    pshufd {{.*}} # [[X1]] = [[X1]][1,0,3,0]
> +; SSE-NEXT:    pmuludq %[[X4]], %[[X1]]
> +; SSE-NEXT:    shufps {{.*}} # xmm0 = xmm0[1,3],[[X1]][1,3]
> +; SSE-NEXT:    pshufd {{.*}} # [[X5:xmm[0-9]+]] = xmm0[0,2,1,3]
> +; SSE-NEXT:    psubd  %[[X3]], %[[X5]]
> +; SSE-NEXT:    movdqa %[[X5]], %xmm0
> +; SSE-NEXT:    psrld  $31, %xmm0
> +; SSE-NEXT:    paddd  %[[X5]], %xmm0
> +; SSE-NEXT:    retq
> +;
> +; SSE41-LABEL: PR20355:
> +; SSE41:         movdqa {{.*}}, %[[X1:xmm[0-9]+]]
> +; SSE41-NEXT:    pshufd {{.*}} # [[X2:xmm[0-9]+]] = xmm0[1,0,3,0]
> +; SSE41-NEXT:    pmuldq %[[X1]], %xmm0
> +; SSE41-NEXT:    pshufd {{.*}} # [[X1]] = [[X1]][1,0,3,0]
> +; SSE41-NEXT:    pmuldq %[[X2]], %[[X1]]
> +; SSE41-NEXT:    shufps {{.*}} # xmm0 = xmm0[1,3],[[X1]][1,3]
> +; SSE41-NEXT:    pshufd {{.*}} # [[X3:xmm[0-9]+]] = xmm0[0,2,1,3]
> +; SSE41-NEXT:    movdqa %[[X3]], %xmm0
> +; SSE41-NEXT:    psrld  $31, %xmm0
> +; SSE41-NEXT:    paddd  %[[X3]], %xmm0
> +; SSE41-NEXT:    retq
> +entry:
> +  %sdiv = sdiv <4 x i32> %a, <i32 3, i32 3, i32 3, i32 3>
> +  ret <4 x i32> %sdiv
> +}
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory