[llvm] r220777 - [x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros.

Quentin Colombet qcolombet at apple.com
Wed Nov 5 18:41:26 PST 2014


Hi Robert,

I went ahead and fixed the root cause of the issue in r221429.
You can keep optimizing VSELECTs now without worrying about the “messed-up” condition :).

And BTW, sorry, it seems I didn’t attach the test case the first time. You will find it in r221429, in vselect-avx.ll (test3), in case you are interested.

Thanks,
-Quentin

> On Nov 4, 2014, at 10:11 AM, Quentin Colombet <qcolombet at apple.com> wrote:
> 
> Hi Robert,
> 
> Any thoughts on the reported failure?
> 
> Thanks,
> -Quentin
> 
> On Oct 31, 2014, at 5:39 PM, Adrian Prantl <aprantl at apple.com> wrote:
> 
>> Hi Robert,
>> 
>> I temporarily reverted the patch in r221028 while this is being investigated, so we can get the bots running over the weekend again.
>> 
>> thanks,
>> adrian
>> 
>> 
>>> On Oct 31, 2014, at 5:16 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>>> 
>>> Hi Robert,
>>> 
>>> This code breaks our internal buildbot.
>>> I have attached a reduced test case.
>>> 
>>> Could you take a look please?
>>> 
>>> To reproduce:
>>> Without r220777 patch:
>>> llc -mtriple=x86_64-apple-macosx -mattr=+avx < tmp.ll -o old.s
>>> With r220777 patch:
>>> llc -mtriple=x86_64-apple-macosx -mattr=+avx < tmp.ll -o new.s
>>> 
>>> diff -U 10 old.s new.s
>>> --- old.s	2014-10-31 16:57:37.000000000 -0700
>>> +++ new.s	2014-10-31 16:57:32.000000000 -0700
>>> @@ -48,21 +48,21 @@
>>> 	vshufps	$-35, %xmm3, %xmm4, %xmm3 ## xmm3 = xmm4[1,3],xmm3[1,3]
>>> 	vshufps	$-40, %xmm3, %xmm3, %xmm3 ## xmm3 = xmm3[0,2,1,3]
>>> 	vpsrld	$31, %xmm3, %xmm4
>>> 	vpaddd	%xmm4, %xmm3, %xmm3
>>> 	vpmulld	LCPI0_1(%rip), %xmm3, %xmm3
>>> 	vpsubd	%xmm3, %xmm0, %xmm0
>>> 	vpxor	%xmm3, %xmm3, %xmm3
>>> 	vpcmpeqd	%xmm3, %xmm0, %xmm0
>>> 	vpslld	$31, %xmm0, %xmm0                    ## <--- This shouldn't be here if we use vpand
>>> 	vblendvps	%xmm0, %xmm1, %xmm2, %xmm1
>>> -	vblendvps	%xmm0, LCPI0_2(%rip), %xmm3, %xmm0
>>> +	vpand	LCPI0_2(%rip), %xmm0, %xmm0
>>> 	vmovdqa	LCPI0_3(%rip), %xmm2    ## xmm2 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
>>> 	vpshufb	%xmm2, %xmm0, %xmm0
>>> 	vmovq	%xmm0, (%rdi)
>>> 	vpshufb	%xmm2, %xmm1, %xmm0
>>> 	vmovq	%xmm0, (%rsi)
>>> 	retq
>>> 	.cfi_endproc
>>> 
>>> The vpand is wrong because it uses the blend mask instead of the original mask.
>>> The right code would use either the blend instruction, or the vpand instruction without the shift, as highlighted in the assembly.
>>> 
>>> The underlying problem is that the optimization now kicks in for cases where the input mask has already been modified to generate a blend. In other words, the input mask no longer has the nice <ty 0xffff..ff, …> form that would make this transformation valid. Previously, the presence of SETCC ensured that the input mask had the correct form.
>>> 
>>> In this example, instead of masking with 0xffff, we will mask with 0x80 and that does not work.
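>>>
>>> To make that concrete, here is a small standalone model of a single 32-bit lane in plain C++ (not LLVM code; the values are only illustrative). The select-as-AND trick needs the condition lane to be all ones; a mask with only the sign bit set, which is what vpslld $31 produces from a compare result, breaks it:
>>>
>>> #include <cstdint>
>>> #include <cstdio>
>>>
>>> int main() {
>>>   uint32_t t = 0x00001234u;          // "true" value of one 32-bit lane
>>>   uint32_t cmpMask = 0xFFFFFFFFu;    // lane of a PCMP* result: all ones when true
>>>   uint32_t blendMask = 0x80000000u;  // same lane after vpslld $31: sign bit only
>>>
>>>   // Modeling vselect(cond, t, 0) as a bitwise AND with the condition lane.
>>>   printf("AND with compare mask: 0x%08x\n", (unsigned)(cmpMask & t));   // 0x00001234, correct
>>>   printf("AND with blend mask:   0x%08x\n", (unsigned)(blendMask & t)); // 0x00000000, wrong
>>>   return 0;
>>> }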
>>> 
>>> You end up with this modified input mask because of this code (in X86ISelLowering.cpp line 22626):
>>>  if (TLO.ShrinkDemandedConstant(Cond, DemandedMask) ||
>>>      (TLI.SimplifyDemandedBits(Cond, DemandedMask, KnownZero, KnownOne,
>>>                                TLO) &&                             ///     <<<<<< This is the culprit optimization.
>>>       // Don't optimize vector of constants. Those are handled by
>>>       // the generic code and all the bits must be properly set for
>>>       // the generic optimizer.
>>>       !ISD::isBuildVectorOfConstantSDNodes(TLO.New.getNode())))
>>> 
>>> That is the second time this optimization bites us (r220242), so if you think this is the place to fix it, go for it :).
>>> IMHO, we should directly generate a blend here, as we broke the generic assumption, but since we are still legalizing, this may have weird effects!
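>>>
>>> For what it's worth, the reason a blend would stay correct while the AND does not is that vblendvps selects each element based only on the sign bit of the corresponding mask element, whereas the AND needs every bit of the lane set. A rough per-lane model in plain C++ (illustrative only; the helper name is made up):
>>>
>>> #include <cassert>
>>> #include <cstdint>
>>>
>>> // One 32-bit lane of vblendvps: the selection depends only on the sign bit
>>> // of the mask element, so a mask with just bit 31 set (as after vpslld $31)
>>> // still selects the right value.
>>> static uint32_t blendv_lane(uint32_t mask, uint32_t t, uint32_t f) {
>>>   return (mask & 0x80000000u) ? t : f;
>>> }
>>>
>>> int main() {
>>>   uint32_t shiftedMask = 0x80000000u;              // sign bit only
>>>   assert(blendv_lane(shiftedMask, 7u, 9u) == 7u);  // the blend still picks the true value
>>>   assert((shiftedMask & 7u) == 0u);                // the AND fold loses it
>>>   return 0;
>>> }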
>>> 
>>> Thanks,
>>> -Quentin
>>> 
>>>> On Oct 28, 2014, at 8:59 AM, Robert Khasanov <rob.khasanov at gmail.com> wrote:
>>>> 
>>>> Author: rkhasanov
>>>> Date: Tue Oct 28 10:59:40 2014
>>>> New Revision: 220777
>>>> 
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=220777&view=rev
>>>> Log:
>>>> [x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros.
>>>> This transformation previously fired only if the selector was produced by SETCC; however, SETCC is needed only when we consider swapping the operands, so I moved the SETCC check into that case.
>>>> Added tests for vselect of <X x i1> values.
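>>>>
>>>> (Illustrative aside, not part of the original log: per lane, and assuming the condition lane is all ones or all zeros, the folds this enables are vselect(c, all-ones, f) == c | f and vselect(c, t, all-zeros) == c & t. A tiny standalone C++ check of those identities:)
>>>>
>>>> #include <cassert>
>>>> #include <cstdint>
>>>>
>>>> // One lane of a vector select whose mask lane is all ones or all zeros.
>>>> static uint32_t sel(uint32_t c, uint32_t t, uint32_t f) { return c ? t : f; }
>>>>
>>>> int main() {
>>>>   uint32_t conds[] = {0x00000000u, 0xFFFFFFFFu};  // all-zeros / all-ones lanes
>>>>   for (uint32_t c : conds) {
>>>>     uint32_t t = 0x12345678u, f = 0x0000BEEFu;
>>>>     assert(sel(c, 0xFFFFFFFFu, f) == (c | f));    // true value all ones  -> OR
>>>>     assert(sel(c, t, 0x00000000u) == (c & t));    // false value all zeros -> AND
>>>>   }
>>>>   return 0;
>>>> }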
>>>> 
>>>> Modified:
>>>> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>>> llvm/trunk/test/CodeGen/X86/avx512-select.ll
>>>> llvm/trunk/test/CodeGen/X86/vselect-avx.ll
>>>> 
>>>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=220777&r1=220776&r2=220777&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>>>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Tue Oct 28 10:59:40 2014
>>>> @@ -22479,22 +22479,22 @@ static SDValue PerformSELECTCombine(SDNo
>>>>    return DAG.getNode(Opc, DL, VT, LHS, RHS);
>>>> }
>>>> 
>>>> -  // Simplify vector selection if the selector will be produced by CMPP*/PCMP*.
>>>> -  if (N->getOpcode() == ISD::VSELECT && Cond.getOpcode() == ISD::SETCC &&
>>>> -      // Check if SETCC has already been promoted
>>>> -      TLI.getSetCCResultType(*DAG.getContext(), VT) == CondVT &&
>>>> -      // Check that condition value type matches vselect operand type
>>>> -      CondVT == VT) { 
>>>> -
>>>> +  // Simplify vector selection if condition value type matches vselect
>>>> +  // operand type
>>>> +  if (N->getOpcode() == ISD::VSELECT && CondVT == VT) {
>>>>  assert(Cond.getValueType().isVector() &&
>>>>         "vector select expects a vector selector!");
>>>> 
>>>>  bool TValIsAllOnes = ISD::isBuildVectorAllOnes(LHS.getNode());
>>>>  bool FValIsAllZeros = ISD::isBuildVectorAllZeros(RHS.getNode());
>>>> 
>>>> -    if (!TValIsAllOnes && !FValIsAllZeros) {
>>>> -      // Try invert the condition if true value is not all 1s and false value
>>>> -      // is not all 0s.
>>>> +    // Try invert the condition if true value is not all 1s and false value
>>>> +    // is not all 0s.
>>>> +    if (!TValIsAllOnes && !FValIsAllZeros &&
>>>> +        // Check if the selector will be produced by CMPP*/PCMP*
>>>> +        Cond.getOpcode() == ISD::SETCC &&
>>>> +        // Check if SETCC has already been promoted
>>>> +        TLI.getSetCCResultType(*DAG.getContext(), VT) == CondVT) {
>>>>    bool TValIsAllZeros = ISD::isBuildVectorAllZeros(LHS.getNode());
>>>>    bool FValIsAllOnes = ISD::isBuildVectorAllOnes(RHS.getNode());
>>>> 
>>>> 
>>>> Modified: llvm/trunk/test/CodeGen/X86/avx512-select.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-select.ll?rev=220777&r1=220776&r2=220777&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/avx512-select.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/avx512-select.ll Tue Oct 28 10:59:40 2014
>>>> @@ -48,3 +48,47 @@ define <16 x double> @select04(<16 x dou
>>>> %sel = select <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <16 x double> %a, <16 x double> %b
>>>> ret <16 x double> %sel
>>>> }
>>>> +
>>>> +; CHECK-LABEL: select05
>>>> +; CHECK: kmovw   %esi, %k0
>>>> +; CHECK-NEXT: kmovw   %edi, %k1
>>>> +; CHECK-NEXT: korw    %k1, %k0, %k0
>>>> +; CHECK-NEXT: kmovw   %k0, %eax
>>>> +define i8 @select05(i8 %a.0, i8 %m) {
>>>> +  %mask = bitcast i8 %m to <8 x i1>
>>>> +  %a = bitcast i8 %a.0 to <8 x i1>
>>>> +  %r = select <8 x i1> %mask, <8 x i1> <i1 -1, i1 -1, i1 -1, i1 -1, i1 -1, i1 -1, i1 -1, i1 -1>, <8 x i1> %a
>>>> +  %res = bitcast <8 x i1> %r to i8
>>>> +  ret i8 %res;
>>>> +}
>>>> +
>>>> +; CHECK-LABEL: select06
>>>> +; CHECK: kmovw   %esi, %k0
>>>> +; CHECK-NEXT: kmovw   %edi, %k1
>>>> +; CHECK-NEXT: kandw    %k1, %k0, %k0
>>>> +; CHECK-NEXT: kmovw   %k0, %eax
>>>> +define i8 @select06(i8 %a.0, i8 %m) {
>>>> +  %mask = bitcast i8 %m to <8 x i1>
>>>> +  %a = bitcast i8 %a.0 to <8 x i1>
>>>> +  %r = select <8 x i1> %mask, <8 x i1> %a, <8 x i1> zeroinitializer
>>>> +  %res = bitcast <8 x i1> %r to i8
>>>> +  ret i8 %res;
>>>> +}
>>>> +
>>>> +; CHECK-LABEL: select07
>>>> +; CHECK-DAG:  kmovw   %edx, %k0
>>>> +; CHECK-DAG:  kmovw   %edi, %k1
>>>> +; CHECK-DAG:  kmovw   %esi, %k2
>>>> +; CHECK: kandw %k0, %k1, %k1
>>>> +; CHECK-NEXT: knotw    %k0, %k0
>>>> +; CHECK-NEXT: kandw    %k0, %k2, %k0
>>>> +; CHECK-NEXT: korw %k0, %k1, %k0
>>>> +; CHECK-NEXT: kmovw   %k0, %eax
>>>> +define i8 @select07(i8 %a.0, i8 %b.0, i8 %m) {
>>>> +  %mask = bitcast i8 %m to <8 x i1>
>>>> +  %a = bitcast i8 %a.0 to <8 x i1>
>>>> +  %b = bitcast i8 %b.0 to <8 x i1>
>>>> +  %r = select <8 x i1> %mask, <8 x i1> %a, <8 x i1> %b
>>>> +  %res = bitcast <8 x i1> %r to i8
>>>> +  ret i8 %res;
>>>> +}
>>>> 
>>>> Modified: llvm/trunk/test/CodeGen/X86/vselect-avx.ll
>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vselect-avx.ll?rev=220777&r1=220776&r2=220777&view=diff
>>>> ==============================================================================
>>>> --- llvm/trunk/test/CodeGen/X86/vselect-avx.ll (original)
>>>> +++ llvm/trunk/test/CodeGen/X86/vselect-avx.ll Tue Oct 28 10:59:40 2014
>>>> @@ -14,8 +14,8 @@ target triple = "x86_64-apple-macosx"
>>>> ; <rdar://problem/18675020>
>>>> 
>>>> ; CHECK-LABEL: test:
>>>> -; CHECK: vmovdqa {{.*#+}} xmm1 = [65533,124,125,14807]
>>>> -; CHECK: vmovdqa {{.*#+}} xmm1 = [65535,0,0,65535]
>>>> +; CHECK: vmovdqa {{.*#+}} xmm0 = [65535,0,0,65535]
>>>> +; CHECK: vmovdqa {{.*#+}} xmm2 = [65533,124,125,14807]
>>>> ; CHECK: ret
>>>> define void @test(<4 x i16>* %a, <4 x i16>* %b) {
>>>> body:
>>>> 
>>>> 
>>> 
>>> 
>> 
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits




