[llvm] r220777 - [x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros.

Robert Khasanov rob.khasanov at gmail.com
Fri Nov 7 10:56:14 PST 2014


Hi Quentin,

I didn't have time to look into the issue this week. Sorry about that.
I understand the problem, but I am not sure I fully understand your solution;
it seems very conservative to me. Did you check your test with the
-mattr=+avx512f flag? I guess a vpand without the shift will be created, but
I'm not sure. I'm worried about how this affects AVX-512.


2014-11-06 5:41 GMT+03:00 Quentin Colombet <qcolombet at apple.com>:

> Hi Robert,
>
> I went ahead and fixed the root cause of the issue in r221429.
> You can keep optimizing VSELECTs now without worrying about the “messed-up”
> condition :).
>
> And BTW, sorry, it seems I didn’t attach the test case the first time.
> You will find it in r221429 select-avx.ll test3, in case you are interested.
>
> Thanks,
> -Quentin
>
> > On Nov 4, 2014, at 10:11 AM, Quentin Colombet <qcolombet at apple.com>
> wrote:
> >
> > Hi Robert,
> >
> > Any thoughts on the reported failure?
> >
> > Thanks,
> > -Quentin
> >
> > On Oct 31, 2014, at 5:39 PM, Adrian Prantl <aprantl at apple.com> wrote:
> >
> >> Hi Robert,
> >>
> >> I temporarily reverted the patch in r221028 while this is being
> investigated, so we can get the bots running over the weekend again.
> >>
> >> thanks,
> >> adrian
> >>
> >>
> >>> On Oct 31, 2014, at 5:16 PM, Quentin Colombet <qcolombet at apple.com>
> wrote:
> >>>
> >>> Hi Robert,
> >>>
> >>> This code breaks our internal buildbot.
> >>> I have attached a reduced test case.
> >>>
> >>> Could you take a look please?
> >>>
> >>> To reproduce:
> >>> Without r220777 patch:
> >>> llc -mtriple=x86_64-apple-macosx -mattr=+avx < tmp.ll -o old.s
> >>> With r220777 patch:
> >>> llc -mtriple=x86_64-apple-macosx -mattr=+avx < tmp.ll -o new.s
> >>>
> >>> diff -U 10 old.s new.s
> >>> --- old.s   2014-10-31 16:57:37.000000000 -0700
> >>> +++ new.s   2014-10-31 16:57:32.000000000 -0700
> >>> @@ -48,21 +48,21 @@
> >>>     vshufps $-35, %xmm3, %xmm4, %xmm3 ## xmm3 = xmm4[1,3],xmm3[1,3]
> >>>     vshufps $-40, %xmm3, %xmm3, %xmm3 ## xmm3 = xmm3[0,2,1,3]
> >>>     vpsrld  $31, %xmm3, %xmm4
> >>>     vpaddd  %xmm4, %xmm3, %xmm3
> >>>     vpmulld LCPI0_1(%rip), %xmm3, %xmm3
> >>>     vpsubd  %xmm3, %xmm0, %xmm0
> >>>     vpxor   %xmm3, %xmm3, %xmm3
> >>>     vpcmpeqd        %xmm3, %xmm0, %xmm0
> >>>     vpslld  $31, %xmm0, %xmm0                    ## <--- This
> shouldn't be here if we use vpand
> >>>     vblendvps       %xmm0, %xmm1, %xmm2, %xmm1
> >>> -   vblendvps       %xmm0, LCPI0_2(%rip), %xmm3, %xmm0
> >>> +   vpand   LCPI0_2(%rip), %xmm0, %xmm0
> >>>     vmovdqa LCPI0_3(%rip), %xmm2    ## xmm2 =
> [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
> >>>     vpshufb %xmm2, %xmm0, %xmm0
> >>>     vmovq   %xmm0, (%rdi)
> >>>     vpshufb %xmm2, %xmm1, %xmm0
> >>>     vmovq   %xmm0, (%rsi)
> >>>     retq
> >>>     .cfi_endproc
> >>>
> >>> The vpand is wrong because it uses the blend mask instead of the
> original mask.
> >>> The right code is either with the blend instruction or with the vpand
> instruction, but without the shift, like highlighted in the assembly.
> >>>
> >>> The underlying problem is that the optimization now kicks in for cases
> where the input mask has been modified to generate a blend. In other words,
> the input mask no longer has the nice <ty 0xffff..ff, …> form that would
> make this transformation valid. Previously, the presence of SETCC ensured
> that the input mask had the correct form.
> >>>
> >>> In this example, instead of masking with 0xffff, we will mask with
> 0x80 and that does not work.
> >>>
> >>> You end up with this modified input mask because of this code (in
> X86ISelLowering.cpp line 22626):
> >>>  if (TLO.ShrinkDemandedConstant(Cond, DemandedMask) ||
> >>>      (TLI.SimplifyDemandedBits(Cond, DemandedMask, KnownZero, KnownOne,
> >>>                                TLO) &&
>  ///     <<<<<< This is the culprit optimization.
> >>>       // Don't optimize vector of constants. Those are handled by
> >>>       // the generic code and all the bits must be properly set for
> >>>       // the generic optimizer.
> >>>       !ISD::isBuildVectorOfConstantSDNodes(TLO.New.getNode())))
> >>>
> >>> That is the second time this optimization bites us (r220242), so if you
> think this is the place to fix it, go for it :).
> >>> IMHO, we should directly generate a blend here since we broke the
> generic assumption, but because we are still legalizing, this may have
> weird effects!
> >>>
> >>> Thanks,
> >>> -Quentin
> >>>
> >>>> On Oct 28, 2014, at 8:59 AM, Robert Khasanov <rob.khasanov at gmail.com>
> wrote:
> >>>>
> >>>> Author: rkhasanov
> >>>> Date: Tue Oct 28 10:59:40 2014
> >>>> New Revision: 220777
> >>>>
> >>>> URL: http://llvm.org/viewvc/llvm-project?rev=220777&view=rev
> >>>> Log:
> >>>> [x86] Simplify vector selection if condition value type matches
> vselect value type and true value is all ones or false value is all zeros.
> >>>> This transformation worked only if the selector was produced by SETCC;
> however, SETCC is needed only when we consider swapping the operands. So I
> moved the SETCC check into that case.
> >>>> Added tests for vselect of <X x i1> values.
> >>>>
> >>>> Modified:
> >>>> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> >>>> llvm/trunk/test/CodeGen/X86/avx512-select.ll
> >>>> llvm/trunk/test/CodeGen/X86/vselect-avx.ll
> >>>>
> >>>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> >>>> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=220777&r1=220776&r2=220777&view=diff
> >>>>
> ==============================================================================
> >>>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> >>>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Tue Oct 28 10:59:40
> 2014
> >>>> @@ -22479,22 +22479,22 @@ static SDValue PerformSELECTCombine(SDNo
> >>>>    return DAG.getNode(Opc, DL, VT, LHS, RHS);
> >>>> }
> >>>>
> >>>> -  // Simplify vector selection if the selector will be produced by
> CMPP*/PCMP*.
> >>>> -  if (N->getOpcode() == ISD::VSELECT && Cond.getOpcode() ==
> ISD::SETCC &&
> >>>> -      // Check if SETCC has already been promoted
> >>>> -      TLI.getSetCCResultType(*DAG.getContext(), VT) == CondVT &&
> >>>> -      // Check that condition value type matches vselect operand type
> >>>> -      CondVT == VT) {
> >>>> -
> >>>> +  // Simplify vector selection if condition value type matches
> vselect
> >>>> +  // operand type
> >>>> +  if (N->getOpcode() == ISD::VSELECT && CondVT == VT) {
> >>>>  assert(Cond.getValueType().isVector() &&
> >>>>         "vector select expects a vector selector!");
> >>>>
> >>>>  bool TValIsAllOnes = ISD::isBuildVectorAllOnes(LHS.getNode());
> >>>>  bool FValIsAllZeros = ISD::isBuildVectorAllZeros(RHS.getNode());
> >>>>
> >>>> -    if (!TValIsAllOnes && !FValIsAllZeros) {
> >>>> -      // Try invert the condition if true value is not all 1s and
> false value
> >>>> -      // is not all 0s.
> >>>> +    // Try invert the condition if true value is not all 1s and
> false value
> >>>> +    // is not all 0s.
> >>>> +    if (!TValIsAllOnes && !FValIsAllZeros &&
> >>>> +        // Check if the selector will be produced by CMPP*/PCMP*
> >>>> +        Cond.getOpcode() == ISD::SETCC &&
> >>>> +        // Check if SETCC has already been promoted
> >>>> +        TLI.getSetCCResultType(*DAG.getContext(), VT) == CondVT) {
> >>>>    bool TValIsAllZeros = ISD::isBuildVectorAllZeros(LHS.getNode());
> >>>>    bool FValIsAllOnes = ISD::isBuildVectorAllOnes(RHS.getNode());
> >>>>
> >>>>
> >>>> Modified: llvm/trunk/test/CodeGen/X86/avx512-select.ll
> >>>> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-select.ll?rev=220777&r1=220776&r2=220777&view=diff
> >>>>
> ==============================================================================
> >>>> --- llvm/trunk/test/CodeGen/X86/avx512-select.ll (original)
> >>>> +++ llvm/trunk/test/CodeGen/X86/avx512-select.ll Tue Oct 28 10:59:40
> 2014
> >>>> @@ -48,3 +48,47 @@ define <16 x double> @select04(<16 x dou
> >>>> %sel = select <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true,
> i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1
> false, i1 false, i1 false, i1 false>, <16 x double> %a, <16 x double> %b
> >>>> ret <16 x double> %sel
> >>>> }
> >>>> +
> >>>> +; CHECK-LABEL: select05
> >>>> +; CHECK: kmovw   %esi, %k0
> >>>> +; CHECK-NEXT: kmovw   %edi, %k1
> >>>> +; CHECK-NEXT: korw    %k1, %k0, %k0
> >>>> +; CHECK-NEXT: kmovw   %k0, %eax
> >>>> +define i8 @select05(i8 %a.0, i8 %m) {
> >>>> +  %mask = bitcast i8 %m to <8 x i1>
> >>>> +  %a = bitcast i8 %a.0 to <8 x i1>
> >>>> +  %r = select <8 x i1> %mask, <8 x i1> <i1 -1, i1 -1, i1 -1, i1 -1,
> i1 -1, i1 -1, i1 -1, i1 -1>, <8 x i1> %a
> >>>> +  %res = bitcast <8 x i1> %r to i8
> >>>> +  ret i8 %res;
> >>>> +}
> >>>> +
> >>>> +; CHECK-LABEL: select06
> >>>> +; CHECK: kmovw   %esi, %k0
> >>>> +; CHECK-NEXT: kmovw   %edi, %k1
> >>>> +; CHECK-NEXT: kandw    %k1, %k0, %k0
> >>>> +; CHECK-NEXT: kmovw   %k0, %eax
> >>>> +define i8 @select06(i8 %a.0, i8 %m) {
> >>>> +  %mask = bitcast i8 %m to <8 x i1>
> >>>> +  %a = bitcast i8 %a.0 to <8 x i1>
> >>>> +  %r = select <8 x i1> %mask, <8 x i1> %a, <8 x i1> zeroinitializer
> >>>> +  %res = bitcast <8 x i1> %r to i8
> >>>> +  ret i8 %res;
> >>>> +}
> >>>> +
> >>>> +; CHECK-LABEL: select07
> >>>> +; CHECK-DAG:  kmovw   %edx, %k0
> >>>> +; CHECK-DAG:  kmovw   %edi, %k1
> >>>> +; CHECK-DAG:  kmovw   %esi, %k2
> >>>> +; CHECK: kandw %k0, %k1, %k1
> >>>> +; CHECK-NEXT: knotw    %k0, %k0
> >>>> +; CHECK-NEXT: kandw    %k0, %k2, %k0
> >>>> +; CHECK-NEXT: korw %k0, %k1, %k0
> >>>> +; CHECK-NEXT: kmovw   %k0, %eax
> >>>> +define i8 @select07(i8 %a.0, i8 %b.0, i8 %m) {
> >>>> +  %mask = bitcast i8 %m to <8 x i1>
> >>>> +  %a = bitcast i8 %a.0 to <8 x i1>
> >>>> +  %b = bitcast i8 %b.0 to <8 x i1>
> >>>> +  %r = select <8 x i1> %mask, <8 x i1> %a, <8 x i1> %b
> >>>> +  %res = bitcast <8 x i1> %r to i8
> >>>> +  ret i8 %res;
> >>>> +}
> >>>>
> >>>> Modified: llvm/trunk/test/CodeGen/X86/vselect-avx.ll
> >>>> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vselect-avx.ll?rev=220777&r1=220776&r2=220777&view=diff
> >>>>
> ==============================================================================
> >>>> --- llvm/trunk/test/CodeGen/X86/vselect-avx.ll (original)
> >>>> +++ llvm/trunk/test/CodeGen/X86/vselect-avx.ll Tue Oct 28 10:59:40
> 2014
> >>>> @@ -14,8 +14,8 @@ target triple = "x86_64-apple-macosx"
> >>>> ; <rdar://problem/18675020>
> >>>>
> >>>> ; CHECK-LABEL: test:
> >>>> -; CHECK: vmovdqa {{.*#+}} xmm1 = [65533,124,125,14807]
> >>>> -; CHECK: vmovdqa {{.*#+}} xmm1 = [65535,0,0,65535]
> >>>> +; CHECK: vmovdqa {{.*#+}} xmm0 = [65535,0,0,65535]
> >>>> +; CHECK: vmovdqa {{.*#+}} xmm2 = [65533,124,125,14807]
> >>>> ; CHECK: ret
> >>>> define void @test(<4 x i16>* %a, <4 x i16>* %b) {
> >>>> body:
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> llvm-commits mailing list
> >>>> llvm-commits at cs.uiuc.edu
> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>
> >>>
> >>
> >
> >
>
>