[llvm] r220777 - [x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros.

Quentin Colombet qcolombet at apple.com
Fri Nov 7 11:10:48 PST 2014


Hi Robert,

On Nov 7, 2014, at 10:56 AM, Robert Khasanov <rob.khasanov at gmail.com> wrote:

> Hi Quentin,
> 
> I didn't have time to look on the issue this week. Sorry for that. 
> I understand the problem, but I am not sure that understand your solution fully, it seems for me to be very conservative..  Did you check your test with -mattr=+avx512f flag?

No, I didn’t. I have assumed, maybe wrongly, that if the generated code did not change for anything before avx2 and make check passed without regressions then the avx512 impact should remain minimal.
Regarding the patch being conservative, it may indeed be possible to have some of the VSELECT optimization to kick on the SHRUNKBLEND instruction. I did a quick check on those before committing the fix and I did not see any obvious candidate we were missing.


> I guess that vpand w/o shift will be created but not sure.. I'm worrying how it affects on AVX-512.

None of the AVX-512 test cases showed regressions, so I assumed it was okay. If you see any cases where we regressed, let me know.

Thanks for the follow-up.
-Quentin

> 
> 
> 2014-11-06 5:41 GMT+03:00 Quentin Colombet <qcolombet at apple.com>:
> Hi Robert,
> 
> I went ahead and fixed the root cause of the issue in r221429.
> You can keep optimize VSELECTs now without worrying about the “messed-up” condition :).
> 
> And BTW, sorry, it seems I didn’t attached the test case the first time. You will find it in r221429 select-avx.ll test3, in case you are interested.
> 
> Thanks,
> -Quentin
> 
> > On Nov 4, 2014, at 10:11 AM, Quentin Colombet <qcolombet at apple.com> wrote:
> >
> > Hi Robert,
> >
> > Any thoughts on the reported failure?
> >
> > Thanks,
> > -Quentin
> >
> > On Oct 31, 2014, at 5:39 PM, Adrian Prantl <aprantl at apple.com> wrote:
> >
> >> Hi Robert,
> >>
> >> I temporarily reverted the patch in r221028 while this is being investigated, so we can get the bots running over the weekend again.
> >>
> >> thanks,
> >> adrian
> >>
> >>
> >>> On Oct 31, 2014, at 5:16 PM, Quentin Colombet <qcolombet at apple.com> wrote:
> >>>
> >>> Hi Robert,
> >>>
> >>> This code breaks our internal buildbot.
> >>> I have attached a reduced test case.
> >>>
> >>> Could you take a look please?
> >>>
> >>> To reproduce:
> >>> Without r220777 patch:
> >>> llc -mtriple=x86_64-apple-macosx -mattr=+avx < tmp.ll -o old.s
> >>> With r220777 patch:
> >>> llc -mtriple=x86_64-apple-macosx -mattr=+avx < tmp.ll -o new.s
> >>>
> >>> diff -U 10 old.s new.s
> >>> --- old.s   2014-10-31 16:57:37.000000000 -0700
> >>> +++ new.s   2014-10-31 16:57:32.000000000 -0700
> >>> @@ -48,21 +48,21 @@
> >>>     vshufps $-35, %xmm3, %xmm4, %xmm3 ## xmm3 = xmm4[1,3],xmm3[1,3]
> >>>     vshufps $-40, %xmm3, %xmm3, %xmm3 ## xmm3 = xmm3[0,2,1,3]
> >>>     vpsrld  $31, %xmm3, %xmm4
> >>>     vpaddd  %xmm4, %xmm3, %xmm3
> >>>     vpmulld LCPI0_1(%rip), %xmm3, %xmm3
> >>>     vpsubd  %xmm3, %xmm0, %xmm0
> >>>     vpxor   %xmm3, %xmm3, %xmm3
> >>>     vpcmpeqd        %xmm3, %xmm0, %xmm0
> >>>     vpslld  $31, %xmm0, %xmm0                    ## <--- This shouldn't be here if we use vpand
> >>>     vblendvps       %xmm0, %xmm1, %xmm2, %xmm1
> >>> -   vblendvps       %xmm0, LCPI0_2(%rip), %xmm3, %xmm0
> >>> +   vpand   LCPI0_2(%rip), %xmm0, %xmm0
> >>>     vmovdqa LCPI0_3(%rip), %xmm2    ## xmm2 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
> >>>     vpshufb %xmm2, %xmm0, %xmm0
> >>>     vmovq   %xmm0, (%rdi)
> >>>     vpshufb %xmm2, %xmm1, %xmm0
> >>>     vmovq   %xmm0, (%rsi)
> >>>     retq
> >>>     .cfi_endproc
> >>>
> >>> The vpand is wrong because it uses the blend mask instead of the original mask.
> >>> The right code is either with the blend instruction or with the vpand instruction, but without the shift, like highlighted in the assembly.
> >>>
> >>> The underlying problem is that the optimization now kicks in cases where the input mask has been modified to generate blend. In other words, the input mask does not have the nice <ty 0xffff..ff, …> form that would make this transformation valid. Previously the presence of SETCC ensured that the input mask had the correct input.
> >>>
> >>> In this example, instead of masking with 0xffff, we will mask with 0x80 and that does not work.
> >>>
> >>> You end up with this modified input mask because of this code (in X86ISelLowering.cpp line 22626):
> >>>  if (TLO.ShrinkDemandedConstant(Cond, DemandedMask) ||
> >>>      (TLI.SimplifyDemandedBits(Cond, DemandedMask, KnownZero, KnownOne,
> >>>                                TLO) &&                             ///     <<<<<< This is the culprit optimization.
> >>>       // Don't optimize vector of constants. Those are handled by
> >>>       // the generic code and all the bits must be properly set for
> >>>       // the generic optimizer.
> >>>       !ISD::isBuildVectorOfConstantSDNodes(TLO.New.getNode())))
> >>>
> >>> That is the second time this optimization bits us (r220242), so if you think this is the place to be fixed, go for it :).
> >>> IMHO, I believe we should directly generate a blend here as we broke the generic assumption, but since we are style legalizing, this may have weird effects!
> >>>
> >>> Thanks,
> >>> -Quentin
> >>>
> >>>> On Oct 28, 2014, at 8:59 AM, Robert Khasanov <rob.khasanov at gmail.com> wrote:
> >>>>
> >>>> Author: rkhasanov
> >>>> Date: Tue Oct 28 10:59:40 2014
> >>>> New Revision: 220777
> >>>>
> >>>> URL: http://llvm.org/viewvc/llvm-project?rev=220777&view=rev
> >>>> Log:
> >>>> [x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros.
> >>>> This transformation worked if selector is produced by SETCC, however SETCC is needed only if we consider to swap operands. So I replaced SETCC check for this case.
> >>>> Added tests for vselect of <X x i1> values.
> >>>>
> >>>> Modified:
> >>>> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> >>>> llvm/trunk/test/CodeGen/X86/avx512-select.ll
> >>>> llvm/trunk/test/CodeGen/X86/vselect-avx.ll
> >>>>
> >>>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=220777&r1=220776&r2=220777&view=diff
> >>>> ==============================================================================
> >>>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> >>>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Tue Oct 28 10:59:40 2014
> >>>> @@ -22479,22 +22479,22 @@ static SDValue PerformSELECTCombine(SDNo
> >>>>    return DAG.getNode(Opc, DL, VT, LHS, RHS);
> >>>> }
> >>>>
> >>>> -  // Simplify vector selection if the selector will be produced by CMPP*/PCMP*.
> >>>> -  if (N->getOpcode() == ISD::VSELECT && Cond.getOpcode() == ISD::SETCC &&
> >>>> -      // Check if SETCC has already been promoted
> >>>> -      TLI.getSetCCResultType(*DAG.getContext(), VT) == CondVT &&
> >>>> -      // Check that condition value type matches vselect operand type
> >>>> -      CondVT == VT) {
> >>>> -
> >>>> +  // Simplify vector selection if condition value type matches vselect
> >>>> +  // operand type
> >>>> +  if (N->getOpcode() == ISD::VSELECT && CondVT == VT) {
> >>>>  assert(Cond.getValueType().isVector() &&
> >>>>         "vector select expects a vector selector!");
> >>>>
> >>>>  bool TValIsAllOnes = ISD::isBuildVectorAllOnes(LHS.getNode());
> >>>>  bool FValIsAllZeros = ISD::isBuildVectorAllZeros(RHS.getNode());
> >>>>
> >>>> -    if (!TValIsAllOnes && !FValIsAllZeros) {
> >>>> -      // Try invert the condition if true value is not all 1s and false value
> >>>> -      // is not all 0s.
> >>>> +    // Try invert the condition if true value is not all 1s and false value
> >>>> +    // is not all 0s.
> >>>> +    if (!TValIsAllOnes && !FValIsAllZeros &&
> >>>> +        // Check if the selector will be produced by CMPP*/PCMP*
> >>>> +        Cond.getOpcode() == ISD::SETCC &&
> >>>> +        // Check if SETCC has already been promoted
> >>>> +        TLI.getSetCCResultType(*DAG.getContext(), VT) == CondVT) {
> >>>>    bool TValIsAllZeros = ISD::isBuildVectorAllZeros(LHS.getNode());
> >>>>    bool FValIsAllOnes = ISD::isBuildVectorAllOnes(RHS.getNode());
> >>>>
> >>>>
> >>>> Modified: llvm/trunk/test/CodeGen/X86/avx512-select.ll
> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-select.ll?rev=220777&r1=220776&r2=220777&view=diff
> >>>> ==============================================================================
> >>>> --- llvm/trunk/test/CodeGen/X86/avx512-select.ll (original)
> >>>> +++ llvm/trunk/test/CodeGen/X86/avx512-select.ll Tue Oct 28 10:59:40 2014
> >>>> @@ -48,3 +48,47 @@ define <16 x double> @select04(<16 x dou
> >>>> %sel = select <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <16 x double> %a, <16 x double> %b
> >>>> ret <16 x double> %sel
> >>>> }
> >>>> +
> >>>> +; CHECK-LABEL: select05
> >>>> +; CHECK: kmovw   %esi, %k0
> >>>> +; CHECK-NEXT: kmovw   %edi, %k1
> >>>> +; CHECK-NEXT: korw    %k1, %k0, %k0
> >>>> +; CHECK-NEXT: kmovw   %k0, %eax
> >>>> +define i8 @select05(i8 %a.0, i8 %m) {
> >>>> +  %mask = bitcast i8 %m to <8 x i1>
> >>>> +  %a = bitcast i8 %a.0 to <8 x i1>
> >>>> +  %r = select <8 x i1> %mask, <8 x i1> <i1 -1, i1 -1, i1 -1, i1 -1, i1 -1, i1 -1, i1 -1, i1 -1>, <8 x i1> %a
> >>>> +  %res = bitcast <8 x i1> %r to i8
> >>>> +  ret i8 %res;
> >>>> +}
> >>>> +
> >>>> +; CHECK-LABEL: select06
> >>>> +; CHECK: kmovw   %esi, %k0
> >>>> +; CHECK-NEXT: kmovw   %edi, %k1
> >>>> +; CHECK-NEXT: kandw    %k1, %k0, %k0
> >>>> +; CHECK-NEXT: kmovw   %k0, %eax
> >>>> +define i8 @select06(i8 %a.0, i8 %m) {
> >>>> +  %mask = bitcast i8 %m to <8 x i1>
> >>>> +  %a = bitcast i8 %a.0 to <8 x i1>
> >>>> +  %r = select <8 x i1> %mask, <8 x i1> %a, <8 x i1> zeroinitializer
> >>>> +  %res = bitcast <8 x i1> %r to i8
> >>>> +  ret i8 %res;
> >>>> +}
> >>>> +
> >>>> +; CHECK-LABEL: select07
> >>>> +; CHECK-DAG:  kmovw   %edx, %k0
> >>>> +; CHECK-DAG:  kmovw   %edi, %k1
> >>>> +; CHECK-DAG:  kmovw   %esi, %k2
> >>>> +; CHECK: kandw %k0, %k1, %k1
> >>>> +; CHECK-NEXT: knotw    %k0, %k0
> >>>> +; CHECK-NEXT: kandw    %k0, %k2, %k0
> >>>> +; CHECK-NEXT: korw %k0, %k1, %k0
> >>>> +; CHECK-NEXT: kmovw   %k0, %eax
> >>>> +define i8 @select07(i8 %a.0, i8 %b.0, i8 %m) {
> >>>> +  %mask = bitcast i8 %m to <8 x i1>
> >>>> +  %a = bitcast i8 %a.0 to <8 x i1>
> >>>> +  %b = bitcast i8 %b.0 to <8 x i1>
> >>>> +  %r = select <8 x i1> %mask, <8 x i1> %a, <8 x i1> %b
> >>>> +  %res = bitcast <8 x i1> %r to i8
> >>>> +  ret i8 %res;
> >>>> +}
> >>>>
> >>>> Modified: llvm/trunk/test/CodeGen/X86/vselect-avx.ll
> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vselect-avx.ll?rev=220777&r1=220776&r2=220777&view=diff
> >>>> ==============================================================================
> >>>> --- llvm/trunk/test/CodeGen/X86/vselect-avx.ll (original)
> >>>> +++ llvm/trunk/test/CodeGen/X86/vselect-avx.ll Tue Oct 28 10:59:40 2014
> >>>> @@ -14,8 +14,8 @@ target triple = "x86_64-apple-macosx"
> >>>> ; <rdar://problem/18675020>
> >>>>
> >>>> ; CHECK-LABEL: test:
> >>>> -; CHECK: vmovdqa {{.*#+}} xmm1 = [65533,124,125,14807]
> >>>> -; CHECK: vmovdqa {{.*#+}} xmm1 = [65535,0,0,65535]
> >>>> +; CHECK: vmovdqa {{.*#+}} xmm0 = [65535,0,0,65535]
> >>>> +; CHECK: vmovdqa {{.*#+}} xmm2 = [65533,124,125,14807]
> >>>> ; CHECK: ret
> >>>> define void @test(<4 x i16>* %a, <4 x i16>* %b) {
> >>>> body:
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> llvm-commits mailing list
> >>>> llvm-commits at cs.uiuc.edu
> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>
> >>>
> >>> _______________________________________________
> >>> llvm-commits mailing list
> >>> llvm-commits at cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141107/c68470be/attachment.html>


More information about the llvm-commits mailing list