[llvm] r220777 - [x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros.

Fri Oct 31 17:16:52 PDT 2014

Hi Robert,

This code breaks our internal buildbot.
I have attached a reduced test case.

Could you take a look please?

To reproduce:
Without r220777 patch:
llc -mtriple=x86_64-apple-macosx -mattr=+avx < tmp.ll -o old.s
With r220777 patch:
llc -mtriple=x86_64-apple-macosx -mattr=+avx < tmp.ll -o new.s

diff -U 10 old.s new.s

--- old.s	2014-10-31 16:57:37.000000000 -0700
+++ new.s	2014-10-31 16:57:32.000000000 -0700
@@ -48,21 +48,21 @@
 	vshufps	$-35, %xmm3, %xmm4, %xmm3 ## xmm3 = xmm4[1,3],xmm3[1,3]
 	vshufps	$-40, %xmm3, %xmm3, %xmm3 ## xmm3 = xmm3[0,2,1,3]
 	vpsrld	$31, %xmm3, %xmm4
 	vpaddd	%xmm4, %xmm3, %xmm3
 	vpmulld	LCPI0_1(%rip), %xmm3, %xmm3
 	vpsubd	%xmm3, %xmm0, %xmm0
 	vpxor	%xmm3, %xmm3, %xmm3
 	vpcmpeqd	%xmm3, %xmm0, %xmm0
 	vpslld	$31, %xmm0, %xmm0                    ## <--- This shouldn't be here if we use vpand
 	vblendvps	%xmm0, %xmm1, %xmm2, %xmm1
-	vblendvps	%xmm0, LCPI0_2(%rip), %xmm3, %xmm0
+	vpand	LCPI0_2(%rip), %xmm0, %xmm0
 	vmovdqa	LCPI0_3(%rip), %xmm2    ## xmm2 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
 	vpshufb	%xmm2, %xmm0, %xmm0
 	vmovq	%xmm0, (%rdi)
 	vpshufb	%xmm2, %xmm1, %xmm0
 	vmovq	%xmm0, (%rsi)
 	retq
 	.cfi_endproc

The vpand is wrong because it uses the blend mask instead of the original mask.
The right code is either with the blend instruction or with the vpand instruction, but without the shift, like highlighted in the assembly.

The underlying problem is that the optimization now kicks in cases where the input mask has been modified to generate blend. In other words, the input mask does not have the nice <ty 0xffff..ff, …> form that would make this transformation valid. Previously the presence of SETCC ensured that the input mask had the correct input.

In this example, instead of masking with 0xffff, we will mask with 0x80 and that does not work.

You end up with this modified input mask because of this code (in X86ISelLowering.cpp line 22626):
    if (TLO.ShrinkDemandedConstant(Cond, DemandedMask) ||
        (TLI.SimplifyDemandedBits(Cond, DemandedMask, KnownZero, KnownOne,
                                  TLO) &&                             ///     <<<<<< This is the culprit optimization.
         // Don't optimize vector of constants. Those are handled by
         // the generic code and all the bits must be properly set for
         // the generic optimizer.
         !ISD::isBuildVectorOfConstantSDNodes(TLO.New.getNode())))

That is the second time this optimization bits us (r220242), so if you think this is the place to be fixed, go for it :).
IMHO, I believe we should directly generate a blend here as we broke the generic assumption, but since we are style legalizing, this may have weird effects!

Thanks,
-Quentin

> On Oct 28, 2014, at 8:59 AM, Robert Khasanov <rob.khasanov at gmail.com> wrote:
> 
> Author: rkhasanov
> Date: Tue Oct 28 10:59:40 2014
> New Revision: 220777
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=220777&view=rev
> Log:
> [x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros.
> This transformation worked if selector is produced by SETCC, however SETCC is needed only if we consider to swap operands. So I replaced SETCC check for this case.
> Added tests for vselect of <X x i1> values.
> 
> Modified:
>    llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>    llvm/trunk/test/CodeGen/X86/avx512-select.ll
>    llvm/trunk/test/CodeGen/X86/vselect-avx.ll
> 
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=220777&r1=220776&r2=220777&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Tue Oct 28 10:59:40 2014
> @@ -22479,22 +22479,22 @@ static SDValue PerformSELECTCombine(SDNo
>       return DAG.getNode(Opc, DL, VT, LHS, RHS);
>   }
> 
> -  // Simplify vector selection if the selector will be produced by CMPP*/PCMP*.
> -  if (N->getOpcode() == ISD::VSELECT && Cond.getOpcode() == ISD::SETCC &&
> -      // Check if SETCC has already been promoted
> -      TLI.getSetCCResultType(*DAG.getContext(), VT) == CondVT &&
> -      // Check that condition value type matches vselect operand type
> -      CondVT == VT) { 
> -
> +  // Simplify vector selection if condition value type matches vselect
> +  // operand type
> +  if (N->getOpcode() == ISD::VSELECT && CondVT == VT) {
>     assert(Cond.getValueType().isVector() &&
>            "vector select expects a vector selector!");
> 
>     bool TValIsAllOnes = ISD::isBuildVectorAllOnes(LHS.getNode());
>     bool FValIsAllZeros = ISD::isBuildVectorAllZeros(RHS.getNode());
> 
> -    if (!TValIsAllOnes && !FValIsAllZeros) {
> -      // Try invert the condition if true value is not all 1s and false value
> -      // is not all 0s.
> +    // Try invert the condition if true value is not all 1s and false value
> +    // is not all 0s.
> +    if (!TValIsAllOnes && !FValIsAllZeros &&
> +        // Check if the selector will be produced by CMPP*/PCMP*
> +        Cond.getOpcode() == ISD::SETCC &&
> +        // Check if SETCC has already been promoted
> +        TLI.getSetCCResultType(*DAG.getContext(), VT) == CondVT) {
>       bool TValIsAllZeros = ISD::isBuildVectorAllZeros(LHS.getNode());
>       bool FValIsAllOnes = ISD::isBuildVectorAllOnes(RHS.getNode());
> 
> 
> Modified: llvm/trunk/test/CodeGen/X86/avx512-select.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-select.ll?rev=220777&r1=220776&r2=220777&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/avx512-select.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/avx512-select.ll Tue Oct 28 10:59:40 2014
> @@ -48,3 +48,47 @@ define <16 x double> @select04(<16 x dou
>   %sel = select <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <16 x double> %a, <16 x double> %b
>   ret <16 x double> %sel
> }
> +
> +; CHECK-LABEL: select05
> +; CHECK: kmovw   %esi, %k0
> +; CHECK-NEXT: kmovw   %edi, %k1
> +; CHECK-NEXT: korw    %k1, %k0, %k0
> +; CHECK-NEXT: kmovw   %k0, %eax
> +define i8 @select05(i8 %a.0, i8 %m) {
> +  %mask = bitcast i8 %m to <8 x i1>
> +  %a = bitcast i8 %a.0 to <8 x i1>
> +  %r = select <8 x i1> %mask, <8 x i1> <i1 -1, i1 -1, i1 -1, i1 -1, i1 -1, i1 -1, i1 -1, i1 -1>, <8 x i1> %a
> +  %res = bitcast <8 x i1> %r to i8
> +  ret i8 %res;
> +}
> +
> +; CHECK-LABEL: select06
> +; CHECK: kmovw   %esi, %k0
> +; CHECK-NEXT: kmovw   %edi, %k1
> +; CHECK-NEXT: kandw    %k1, %k0, %k0
> +; CHECK-NEXT: kmovw   %k0, %eax
> +define i8 @select06(i8 %a.0, i8 %m) {
> +  %mask = bitcast i8 %m to <8 x i1>
> +  %a = bitcast i8 %a.0 to <8 x i1>
> +  %r = select <8 x i1> %mask, <8 x i1> %a, <8 x i1> zeroinitializer
> +  %res = bitcast <8 x i1> %r to i8
> +  ret i8 %res;
> +}
> +
> +; CHECK-LABEL: select07
> +; CHECK-DAG:  kmovw   %edx, %k0
> +; CHECK-DAG:  kmovw   %edi, %k1
> +; CHECK-DAG:  kmovw   %esi, %k2
> +; CHECK: kandw %k0, %k1, %k1
> +; CHECK-NEXT: knotw    %k0, %k0
> +; CHECK-NEXT: kandw    %k0, %k2, %k0
> +; CHECK-NEXT: korw %k0, %k1, %k0
> +; CHECK-NEXT: kmovw   %k0, %eax
> +define i8 @select07(i8 %a.0, i8 %b.0, i8 %m) {
> +  %mask = bitcast i8 %m to <8 x i1>
> +  %a = bitcast i8 %a.0 to <8 x i1>
> +  %b = bitcast i8 %b.0 to <8 x i1>
> +  %r = select <8 x i1> %mask, <8 x i1> %a, <8 x i1> %b
> +  %res = bitcast <8 x i1> %r to i8
> +  ret i8 %res;
> +}
> 
> Modified: llvm/trunk/test/CodeGen/X86/vselect-avx.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vselect-avx.ll?rev=220777&r1=220776&r2=220777&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/vselect-avx.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/vselect-avx.ll Tue Oct 28 10:59:40 2014
> @@ -14,8 +14,8 @@ target triple = "x86_64-apple-macosx"
> ; <rdar://problem/18675020>
> 
> ; CHECK-LABEL: test:
> -; CHECK: vmovdqa {{.*#+}} xmm1 = [65533,124,125,14807]
> -; CHECK: vmovdqa {{.*#+}} xmm1 = [65535,0,0,65535]
> +; CHECK: vmovdqa {{.*#+}} xmm0 = [65535,0,0,65535]
> +; CHECK: vmovdqa {{.*#+}} xmm2 = [65533,124,125,14807]
> ; CHECK: ret
> define void @test(<4 x i16>* %a, <4 x i16>* %b) {
> body:
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits