[llvm] r209643 - Convert some X86 blendv* intrinsics into IR.

Tue May 27 10:00:12 PDT 2014

On Tue, May 27, 2014 at 5:30 PM, Filipe Cabecinhas <me at filcab.net> wrote:
> Andrea: I just noticed. It wasn't on-list, Nadav replied only to me. I
> should have noticed and asked him to reply via a more public channel.
>
> Filipe

Ah I see, no problem then! :-)

>
>
> On Tue, May 27, 2014 at 9:29 AM, Filipe Cabecinhas <me at filcab.net> wrote:
>>
>> Thanks for the comments.
>> Unfortunately I had the reverse select instruction. I'm committing the fix
>> as soon as the tests finish running.
>>
>> Daniel, David: It didn't trigger warnings on the default compile of clang
>> for me, but you made me look again, so I added the parentheses just in case.
>> I also ran clang-format, which I had forgotten.
>>
>> Itia, Andrea: Indeed, I was doing the inverse select mask. I'll fix it in
>> the next commit.

Thanks!
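
For anyone following along, here is a small standalone sketch of why the operand order matters. It uses plain C++ rather than the LLVM API, and the names blendv, vselect, Lanes and Mask are made up for illustration. As I understand the instruction's definition, blendv takes a lane from its second operand when the top bit of the mask lane is set, whereas an IR select takes from its first value operand when the condition is true, so the two value operands have to be swapped in the translation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for vector values; not LLVM types.
using Lanes = std::vector<int>;
using Mask = std::vector<bool>;

// Models blendv: a set top bit in the mask lane picks the SECOND operand.
Lanes blendv(const Lanes &A, const Lanes &B, const Mask &TopBit) {
  assert(A.size() == B.size() && A.size() == TopBit.size() &&
         "lane count mismatch");
  Lanes R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = TopBit[I] ? B[I] : A[I];
  return R;
}

// Models an IR vector select: a true condition picks the FIRST value operand.
Lanes vselect(const Mask &Cond, const Lanes &X, const Lanes &Y) {
  assert(X.size() == Y.size() && X.size() == Cond.size() &&
         "lane count mismatch");
  Lanes R(X.size());
  for (std::size_t I = 0; I < X.size(); ++I)
    R[I] = Cond[I] ? X[I] : Y[I];
  return R;
}
```

Under this model, blendv(A, B, M) matches vselect(M, B, A), while vselect(M, A, B) gives the inverse blend.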

>>
>> Andrea: The patch was reviewed on-list but not on phab, it seems.
>>
>> Thanks,
>>
>>   Filipe
>>
>>
>>
>>
>> On Tue, May 27, 2014 at 9:22 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>> On Tue, May 27, 2014 at 3:08 AM, Daniel Jasper <djasper at google.com> wrote:
>>> >
>>> >
>>> >
>>> > On Tue, May 27, 2014 at 5:42 AM, Filipe Cabecinhas <me at filcab.net> wrote:
>>> >>
>>> >> Author: filcab
>>> >> Date: Mon May 26 22:42:20 2014
>>> >> New Revision: 209643
>>> >>
>>> >> URL: http://llvm.org/viewvc/llvm-project?rev=209643&view=rev
>>> >> Log:
>>> >> Convert some X86 blendv* intrinsics into IR.
>>> >>
>>> >> Summary:
>>> >> Implemented an InstCombine transformation that takes a blendv* intrinsic
>>> >> call and translates it into an IR select, if the mask is constant.
>>> >>
>>> >> This will eventually get lowered into blends with immediates if possible,
>>> >> or pblendvb (with an option to further optimize if we can transform the
>>> >> pblendvb into a blend+immediate instruction, depending on the selector).
>>> >> It will also enable optimizations by the IR passes, which give up on
>>> >> sight of the intrinsic.
>>> >>
>>> >> Both the transformation and the lowering of its result to asm got shiny
>>> >> new tests.
>>> >>
>>> >> The transformation is a bit convoluted because of blendvp[sd]'s definition:
>>> >>
>>> >> Its mask is a floating point value! This forces us to convert it and get
>>> >> the highest bit. I suppose this happened because the mask has type
>>> >> __m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin.
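
To make the "convert it and get the highest bit" step concrete, here is a minimal sketch in plain C++ with no LLVM API involved. The helper names topBitOfFloat and selectorsFromMask are hypothetical, and the sketch assumes a 32-bit IEEE-754 float. It reinterprets each float lane as an integer and keeps only bit 31, which is all the blendv instructions look at.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: returns the top (sign) bit of a float's bit pattern.
bool topBitOfFloat(float F) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits)); // bit-for-bit copy, no conversion
  return (Bits >> 31) != 0;
}

// Hypothetical helper: turns a constant float mask into one bool per lane,
// loosely mirroring the <N x i1> selector the transformation builds.
std::vector<bool> selectorsFromMask(const std::vector<float> &Mask) {
  assert(!Mask.empty() && "a blend mask needs at least one lane");
  std::vector<bool> Selectors;
  Selectors.reserve(Mask.size());
  for (float Lane : Mask)
    Selectors.push_back(topBitOfFloat(Lane));
  return Selectors;
}
```

Note that only the sign bit counts: negative zero selects just like any other negative value, and the magnitude is ignored.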
>>> >>
>>> >> I will send an email to llvm-dev to discuss if we want to change this or
>>> >> not.
>>> >>
>>> >> Reviewers: grosbach, delena, nadav
>>> >>
>>> >> Differential Revision: http://reviews.llvm.org/D3859
>>> >>
>>> >> Added:
>>> >>     llvm/trunk/test/CodeGen/X86/avx2-blend.ll
>>> >>     llvm/trunk/test/Transforms/InstCombine/blend_x86.ll
>>> >> Modified:
>>> >>     llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
>>> >>     llvm/trunk/test/CodeGen/X86/avx-blend.ll
>>> >>     llvm/trunk/test/CodeGen/X86/sse41-blend.ll
>>> >>
>>> >> Modified: llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
>>> >> URL:
>>> >>
>>> >> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp?rev=209643&r1=209642&r2=209643&view=diff
>>> >>
>>> >>
>>> >> ==============================================================================
>>> >> --- llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp (original)
>>> >> +++ llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp Mon May 26 22:42:20 2014
>>> >> @@ -718,6 +718,41 @@ Instruction *InstCombiner::visitCallInst
>>> >>      break;
>>> >>    }
>>> >>
>>> >> +  case Intrinsic::x86_sse41_pblendvb:
>>> >> +  case Intrinsic::x86_sse41_blendvps:
>>> >> +  case Intrinsic::x86_sse41_blendvpd:
>>> >> +  case Intrinsic::x86_avx_blendv_ps_256:
>>> >> +  case Intrinsic::x86_avx_blendv_pd_256:
>>> >> +  case Intrinsic::x86_avx2_pblendvb: {
>>> >> +    // Convert blendv* to vector selects if the mask is constant.
>>> >> +    // This optimization is convoluted because the intrinsic is defined as
>>> >> +    // getting a vector of floats or doubles for the ps and pd versions.
>>> >> +    // FIXME: That should be changed.
>>> >> +    Value *Mask = II->getArgOperand(2);
>>> >> +    if (auto C = dyn_cast<ConstantDataVector>(Mask)) {
>>> >> +      auto Tyi1 = Builder->getInt1Ty();
>>> >> +      auto SelectorType = cast<VectorType>(Mask->getType());
>>> >> +      auto EltTy = SelectorType->getElementType();
>>> >> +      unsigned Size = SelectorType->getNumElements();
>>> >> +      unsigned BitWidth = EltTy->isFloatTy() ? 32 : (EltTy->isDoubleTy() ? 64 : EltTy->getIntegerBitWidth());
>>> >> +      assert(BitWidth == 64 || BitWidth == 32 || BitWidth == 8 && "Wrong arguments for variable blend intrinsic");
>>> >
>>> >
>>> > This assert is bad in that it triggers Clang's operator precedence
>>>
>>> Are you sure this triggered /Clang's/ warning? I was pretty sure
>>> Clang's warning had a suppression for this case, since it doesn't
>>> actually change the semantics of the condition.
>>>
>>> > warning and works correctly more or less by accident (consider the
>>> > precedence of || and && - with the string implicitly evaluating to true).
>>> > Fixed in r209648.
>>> >
>>> > Also, please adhere to LLVM coding standards (most importantly the 80
>>> > column limit).
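
A quick standalone illustration of the precedence point, as a sketch only (the function names are invented for the example). Because && binds tighter than ||, the unparenthesized condition groups the message string with the last comparison only. Since a string literal is a non-null pointer and therefore true, both groupings happen to agree for every input, which is why the original assert behaved correctly anyway.

```cpp
#include <cassert>

// Hypothetical: the condition as originally written. It parses as
//   BitWidth == 64 || BitWidth == 32 || (BitWidth == 8 && "...")
bool acceptsUnparenthesized(unsigned BitWidth) {
  return BitWidth == 64 || BitWidth == 32 ||
         BitWidth == 8 && "Wrong arguments for variable blend intrinsic";
}

// Hypothetical: the intended grouping, with the message applied to the
// whole disjunction.
bool acceptsParenthesized(unsigned BitWidth) {
  return (BitWidth == 64 || BitWidth == 32 || BitWidth == 8) &&
         "Wrong arguments for variable blend intrinsic";
}

// The explicit form as it would appear inside an assert.
void checkBitWidth(unsigned BitWidth) {
  assert((BitWidth == 64 || BitWidth == 32 || BitWidth == 8) &&
         "Wrong arguments for variable blend intrinsic");
  (void)BitWidth; // avoid an unused-parameter warning when NDEBUG is set
}
```

Some compilers may still warn about the first function when parentheses warnings are enabled. The parenthesized form states the intent and avoids the question entirely.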
>>> >
>>> >>
>>> >> +      SmallVector<Constant*, 32> Selectors;
>>> >> +      for (unsigned I = 0; I < Size; ++I) {
>>> >> +        // The intrinsics only read the top bit
>>> >> +        uint64_t Selector;
>>> >> +        if (BitWidth == 8)
>>> >> +          Selector = C->getElementAsInteger(I);
>>> >> +        else
>>> >> +          Selector = C->getElementAsAPFloat(I).bitcastToAPInt().getZExtValue();
>>> >> +        Selectors.push_back(ConstantInt::get(Tyi1, Selector >> (BitWidth - 1)));
>>> >> +      }
>>> >> +      auto NewSelector = ConstantVector::get(Selectors);
>>> >> +      return SelectInst::Create(NewSelector, II->getArgOperand(0), II->getArgOperand(1), "blendv");
>>> >> +    } else {
>>> >> +      break;
>>> >> +    }
>>> >> +  }
>>> >> +
>>> >>    case Intrinsic::x86_avx_vpermilvar_ps:
>>> >>    case Intrinsic::x86_avx_vpermilvar_ps_256:
>>> >>    case Intrinsic::x86_avx_vpermilvar_pd:
>>> >>
>>> >> Modified: llvm/trunk/test/CodeGen/X86/avx-blend.ll
>>> >> URL:
>>> >>
>>> >> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-blend.ll?rev=209643&r1=209642&r2=209643&view=diff
>>> >>
>>> >>
>>> >> ==============================================================================
>>> >> --- llvm/trunk/test/CodeGen/X86/avx-blend.ll (original)
>>> >> +++ llvm/trunk/test/CodeGen/X86/avx-blend.ll Mon May 26 22:42:20 2014
>>> >> @@ -135,3 +135,26 @@ define <2 x double> @testb(<2 x double>
>>> >>    %min = select <2 x i1> %min_is_x, <2 x double> %x, <2 x double> %y
>>> >>    ret <2 x double> %min
>>> >>  }
>>> >> +
>>> >> +; If we can figure out a blend has a constant mask, we should emit the
>>> >> +; blend instruction with an immediate mask
>>> >> +define <4 x double> @constant_blendvpd_avx(<4 x double> %xy, <4 x double> %ab) {
>>> >> +; CHECK-LABEL: constant_blendvpd_avx:
>>> >> +; CHECK-NOT: mov
>>> >> +; CHECK: vblendpd
>>> >> +; CHECK: ret
>>> >> +  %1 = select <4 x i1> <i1 false, i1 false, i1 true, i1 false>, <4 x double> %xy, <4 x double> %ab
>>> >> +  ret <4 x double> %1
>>> >> +}
>>> >> +
>>> >> +define <8 x float> @constant_blendvps_avx(<8 x float> %xyzw, <8 x float> %abcd) {
>>> >> +; CHECK-LABEL: constant_blendvps_avx:
>>> >> +; CHECK-NOT: mov
>>> >> +; CHECK: vblendps
>>> >> +; CHECK: ret
>>> >> +  %1 = select <8 x i1> <i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false, i1 true>, <8 x float> %xyzw, <8 x float> %abcd
>>> >> +  ret <8 x float> %1
>>> >> +}
>>> >> +
>>> >> +declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x float>, <8 x float>)
>>> >> +declare <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double>, <4 x double>, <4 x double>)
>>> >>
>>> >> Added: llvm/trunk/test/CodeGen/X86/avx2-blend.ll
>>> >> URL:
>>> >>
>>> >> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx2-blend.ll?rev=209643&view=auto
>>> >>
>>> >>
>>> >> ==============================================================================
>>> >> --- llvm/trunk/test/CodeGen/X86/avx2-blend.ll (added)
>>> >> +++ llvm/trunk/test/CodeGen/X86/avx2-blend.ll Mon May 26 22:42:20 2014
>>> >> @@ -0,0 +1,11 @@
>>> >> +; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=core-avx2 | FileCheck %s
>>> >> +
>>> >> +define <32 x i8> @constant_pblendvb_avx2(<32 x i8> %xyzw, <32 x i8> %abcd) {
>>> >> +; CHECK-LABEL: constant_pblendvb_avx2:
>>> >> +; CHECK: vmovdqa
>>> >> +; CHECK: vpblendvb
>>> >> +  %1 = select <32 x i1> <i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 true, i1 false>, <32 x i8> %xyzw, <32 x i8> %abcd
>>> >> +  ret <32 x i8> %1
>>> >> +}
>>> >> +
>>> >> +declare <32 x i8> @llvm.x86.avx2.pblendvb(<32 x i8>, <32 x i8>, <32 x i8>)
>>> >>
>>> >> Modified: llvm/trunk/test/CodeGen/X86/sse41-blend.ll
>>> >> URL:
>>> >>
>>> >> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse41-blend.ll?rev=209643&r1=209642&r2=209643&view=diff
>>> >>
>>> >>
>>> >> ==============================================================================
>>> >> --- llvm/trunk/test/CodeGen/X86/sse41-blend.ll (original)
>>> >> +++ llvm/trunk/test/CodeGen/X86/sse41-blend.ll Mon May 26 22:42:20 2014
>>> >> @@ -88,3 +88,35 @@ entry:
>>> >>    store double %extract214vector_func.i, double addrspace(1)* undef, align 8
>>> >>    ret void
>>> >>  }
>>> >> +
>>> >> +; If we can figure out a blend has a constant mask, we should emit the
>>> >> +; blend instruction with an immediate mask
>>> >> +define <2 x double> @constant_blendvpd(<2 x double> %xy, <2 x double> %ab) {
>>> >> +; In this case, we emit a simple movss
>>> >> +; CHECK-LABEL: constant_blendvpd
>>> >> +; CHECK: movsd
>>> >> +; CHECK: ret
>>> >> +  %1 = select <2 x i1> <i1 true, i1 false>, <2 x double> %xy, <2 x double> %ab
>>> >> +  ret <2 x double> %1
>>> >> +}
>>> >> +
>>> >> +define <4 x float> @constant_blendvps(<4 x float> %xyzw, <4 x float> %abcd) {
>>> >> +; CHECK-LABEL: constant_blendvps
>>> >> +; CHECK-NOT: mov
>>> >> +; CHECK: blendps $7
>>> >> +; CHECK: ret
>>> >> +  %1 = select <4 x i1> <i1 false, i1 false, i1 false, i1 true>, <4 x float> %xyzw, <4 x float> %abcd
>>> >> +  ret <4 x float> %1
>>> >> +}
>>> >> +
>>> >> +define <16 x i8> @constant_pblendvb(<16 x i8> %xyzw, <16 x i8> %abcd) {
>>> >> +; CHECK-LABEL: constant_pblendvb:
>>> >> +; CHECK: movaps
>>> >> +; CHECK: pblendvb
>>> >> +; CHECK: ret
>>> >> +  %1 = select <16 x i1> <i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 true, i1 false>, <16 x i8> %xyzw, <16 x i8> %abcd
>>> >> +  ret <16 x i8> %1
>>> >> +}
>>> >> +declare <16 x i8> @llvm.x86.sse41.pblendvb(<16 x i8>, <16 x i8>, <16 x i8>)
>>> >> +declare <4 x float> @llvm.x86.sse41.blendvps(<4 x float>, <4 x float>, <4 x float>)
>>> >> +declare <2 x double> @llvm.x86.sse41.blendvpd(<2 x double>, <2 x double>, <2 x double>)
>>> >>
>>> >> Added: llvm/trunk/test/Transforms/InstCombine/blend_x86.ll
>>> >> URL:
>>> >>
>>> >> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/blend_x86.ll?rev=209643&view=auto
>>> >>
>>> >>
>>> >> ==============================================================================
>>> >> --- llvm/trunk/test/Transforms/InstCombine/blend_x86.ll (added)
>>> >> +++ llvm/trunk/test/Transforms/InstCombine/blend_x86.ll Mon May 26 22:42:20 2014
>>> >> @@ -0,0 +1,56 @@
>>> >> +; RUN: opt < %s -instcombine -mtriple=x86_64-apple-macosx -mcpu=core-avx2 -S | FileCheck %s
>>> >> +
>>> >> +define <2 x double> @constant_blendvpd(<2 x double> %xy, <2 x double> %ab) {
>>> >> +; CHECK-LABEL: @constant_blendvpd
>>> >> +; CHECK: select <2 x i1> <i1 true, i1 false>
>>> >> +  %1 = tail call <2 x double> @llvm.x86.sse41.blendvpd(<2 x double> %xy, <2 x double> %ab, <2 x double> <double 0xFFFFFFFFE0000000, double 0.000000e+00>)
>>> >> +  ret <2 x double> %1
>>> >> +}
>>> >> +
>>> >> +define <4 x float> @constant_blendvps(<4 x float> %xyzw, <4 x float> %abcd) {
>>> >> +; CHECK-LABEL: @constant_blendvps
>>> >> +; CHECK: select <4 x i1> <i1 false, i1 false, i1 false, i1 true>
>>> >> +  %1 = tail call <4 x float> @llvm.x86.sse41.blendvps(<4 x float> %xyzw, <4 x float> %abcd, <4 x float> <float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0xFFFFFFFFE0000000>)
>>> >> +  ret <4 x float> %1
>>> >> +}
>>> >> +
>>> >> +define <16 x i8> @constant_pblendvb(<16 x i8> %xyzw, <16 x i8> %abcd) {
>>> >> +; CHECK-LABEL: @constant_pblendvb
>>> >> +; CHECK: select <16 x i1> <i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 true, i1 false>
>>> >> +  %1 = tail call <16 x i8> @llvm.x86.sse41.pblendvb(<16 x i8> %xyzw, <16 x i8> %abcd, <16 x i8> <i8 0, i8 0, i8 255, i8 0, i8 255, i8 255, i8 255, i8 0, i8 0, i8 0, i8 255, i8 0, i8 255, i8 255, i8 255, i8 0>)
>>> >> +  ret <16 x i8> %1
>>> >> +}
>>> >> +
>>> >> +define <4 x double> @constant_blendvpd_avx(<4 x double> %xy, <4 x double> %ab) {
>>> >> +; CHECK-LABEL: @constant_blendvpd_avx
>>> >> +; CHECK: select <4 x i1> <i1 true, i1 false, i1 true, i1 false>
>>> >> +  %1 = tail call <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double> %xy, <4 x double> %ab, <4 x double> <double 0xFFFFFFFFE0000000, double 0.000000e+00, double 0xFFFFFFFFE0000000, double 0.000000e+00>)
>>> >> +  ret <4 x double> %1
>>> >> +}
>>> >> +
>>> >> +define <8 x float> @constant_blendvps_avx(<8 x float> %xyzw, <8 x float> %abcd) {
>>> >> +; CHECK-LABEL: @constant_blendvps_avx
>>> >> +; CHECK: select <8 x i1> <i1 false, i1 false, i1 false, i1 true, i1 false, i1 false, i1 false, i1 true>
>>> >> +  %1 = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %xyzw, <8 x float> %abcd, <8 x float> <float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0xFFFFFFFFE0000000, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0xFFFFFFFFE0000000>)
>>> >> +  ret <8 x float> %1
>>> >> +}
>>> >> +
>>> >> +define <32 x i8> @constant_pblendvb_avx2(<32 x i8> %xyzw, <32 x i8> %abcd) {
>>> >> +; CHECK-LABEL: @constant_pblendvb_avx2
>>> >> +; CHECK: select <32 x i1> <i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 true, i1 false>
>>> >> +  %1 = tail call <32 x i8> @llvm.x86.avx2.pblendvb(<32 x i8> %xyzw, <32 x i8> %abcd,
>>> >> +        <32 x i8> <i8 0, i8 0, i8 255, i8 0, i8 255, i8 255, i8 255, i8 0,
>>> >> +                   i8 0, i8 0, i8 255, i8 0, i8 255, i8 255, i8 255, i8 0,
>>> >> +                   i8 0, i8 0, i8 255, i8 0, i8 255, i8 255, i8 255, i8 0,
>>> >> +                   i8 0, i8 0, i8 255, i8 0, i8 255, i8 255, i8 255, i8 0>)
>>> >> +  ret <32 x i8> %1
>>> >> +}
>>> >> +
>>> >> +declare <16 x i8> @llvm.x86.sse41.pblendvb(<16 x i8>, <16 x i8>, <16 x i8>)
>>> >> +declare <4 x float> @llvm.x86.sse41.blendvps(<4 x float>, <4 x float>, <4 x float>)
>>> >> +declare <2 x double> @llvm.x86.sse41.blendvpd(<2 x double>, <2 x double>, <2 x double>)
>>> >> +
>>> >> +declare <32 x i8> @llvm.x86.avx2.pblendvb(<32 x i8>, <32 x i8>, <32 x i8>)
>>> >> +declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x float>, <8 x float>)
>>> >> +declare <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double>, <4 x double>, <4 x double>)
>>> >> +
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> llvm-commits mailing list
>>> >> llvm-commits at cs.uiuc.edu
>>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits