[llvm] r209643 - Convert some X86 blendv* intrinsics into IR.

Tue May 27 05:42:41 PDT 2014

On Tue, May 27, 2014 at 11:43 AM, Ilia Filippov <ili.filippov at gmail.com> wrote:
> Are you sure, that the order of arguments is right now? Your changes now
> translate this intrinsic:
> %blend = call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %old, <8 x
> float> %new, <8 x float> <float 0xFFFFFFFFE0000000, float
> 0xFFFFFFFFE0000000, float 0.000000e+00, float 0.000000e+00, float
> 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00>)
> into this instruction
> %blend = select <8 x i1> <i1 true, i1 true, i1 false, i1 false, i1 false, i1
> false, i1 false, i1 false>, <8 x float> %old, <8 x float> %new
>
> But before changes the first two elements will be from "new" and after
> changes the first two elements will be from "old".

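Ilia is right.

The last two operands of the select instruction are definitely in the
wrong order: blendv takes an element from its second source when the
top bit of the mask element is set, whereas select takes its first
value operand when the condition is true.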

Another example:

define <2 x double> @constant_blendvpd(<2 x double> %xy, <2 x double> %ab) {
  %1 = tail call <2 x double> @llvm.x86.sse41.blendvpd(<2 x double>
%xy, <2 x double> %ab, <2 x double> <double 0xFFFFFFFFE0000000, double
0.000000e+00>)
  ret <2 x double> %1
}

Before this patch (with -mcpu=corei7 -march=x86-64 ) llc produced the
following sequence:
  movapd %xmm0, %xmm2
  movsd .LCPI0_0(%rip), %xmm0     // with .LCPI0_0 pointing to '.quad -536870912'
  blendvpd %xmm1, %xmm2  // %xmm2 = <%xmm1[0], %xmm2[1]>
  movapd %xmm2, %xmm0

That sequence is correct: %xmm0 ends up holding the vector <ab[0], xy[1]>.

With this patch we get instead:
  movsd %xmm0, %xmm1  // %xmm1 = <%xmm0[0], %xmm1[1]>
  movaps %xmm1, %xmm0

So, basically <%xy[0], %ab[1]>.
It should have been the other way round (i.e. movsd %xmm1, %xmm0).
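
For anyone who wants to check the lane arithmetic, here is a minimal C++
sketch of the two semantics as I understand them. It is a scalar model, not
LLVM code: blendvpd, vselect and topBit are hypothetical helper names
introduced only for this illustration, and the mask is kept as raw 64-bit
patterns so that the NaN payload is never touched.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V2 = std::array<double, 2>;
using Mask2 = std::array<std::uint64_t, 2>;

// blendvpd only reads the top (sign) bit of each 64-bit mask element.
static bool topBit(std::uint64_t Bits) { return (Bits >> 63) != 0; }

// Model of blendvpd(A, B, Mask): lane I takes B[I] when the top bit of
// the mask lane is set, and A[I] otherwise.
static V2 blendvpd(const V2 &A, const V2 &B, const Mask2 &Mask) {
  V2 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = topBit(Mask[I]) ? B[I] : A[I];
  return R;
}

// Model of IR "select Cond, T, F": lane I takes T[I] when Cond[I] is
// true, and F[I] otherwise.
static V2 vselect(const std::array<bool, 2> &Cond, const V2 &T, const V2 &F) {
  V2 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = Cond[I] ? T[I] : F[I];
  return R;
}
```

With xy = <1, 2>, ab = <10, 20> and the mask <0xFFFFFFFFE0000000, 0> from
the test above, blendvpd(xy, ab, mask) gives <10, 2>. vselect(Cond, xy, ab)
gives <1, 20>, while vselect(Cond, ab, xy) matches the blend.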

As a side note: revision 209643 adds extra test cases to the files
avx-blend.ll, sse41-blend.ll and avx2-blend.ll.
However, those tests are unrelated to the change committed at revision
209643: they feed select instructions straight to llc, so they never
exercise the new folding rule added to the instruction combiner.

>
>
> 2014-05-27 14:08 GMT+04:00 Daniel Jasper <djasper at google.com>:
>
>>
>>
>>
>> On Tue, May 27, 2014 at 5:42 AM, Filipe Cabecinhas <me at filcab.net> wrote:
>>>
>>> Author: filcab
>>> Date: Mon May 26 22:42:20 2014
>>> New Revision: 209643
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=209643&view=rev
>>> Log:
>>> Convert some X86 blendv* intrinsics into IR.
>>>
>>> Summary:
>>> Implemented an InstCombine transformation that takes a blendv* intrinsic
>>> call and translates it into an IR select, if the mask is constant.
>>>
>>> This will eventually get lowered into blends with immediates if possible,
>>> or pblendvb (with an option to further optimize if we can transform the
>>> pblendvb into a blend+immediate instruction, depending on the selector).
>>> It will also enable optimizations by the IR passes, which give up on
>>> sight of the intrinsic.
>>>
>>> Both the transformation and the lowering of its result to asm got shiny
>>> new tests.
>>>
>>> The transformation is a bit convoluted because of blendvp[sd]'s
>>> definition:
>>>
>>> Its mask is a floating point value! This forces us to convert it and get
>>> the highest bit. I suppose this happened because the mask has type
>>> __m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin.
>>>
>>> I will send an email to llvm-dev to discuss if we want to change this or
>>> not.
>>>
>>> Reviewers: grosbach, delena, nadav
>>>
>>> Differential Revision: http://reviews.llvm.org/D3859
>>>
>>> Added:
>>>     llvm/trunk/test/CodeGen/X86/avx2-blend.ll
>>>     llvm/trunk/test/Transforms/InstCombine/blend_x86.ll
>>> Modified:
>>>     llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
>>>     llvm/trunk/test/CodeGen/X86/avx-blend.ll
>>>     llvm/trunk/test/CodeGen/X86/sse41-blend.ll
>>>
>>> Modified: llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp?rev=209643&r1=209642&r2=209643&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp (original)
>>> +++ llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp Mon May 26
>>> 22:42:20 2014
>>> @@ -718,6 +718,41 @@ Instruction *InstCombiner::visitCallInst
>>>      break;
>>>    }
>>>
>>> +  case Intrinsic::x86_sse41_pblendvb:
>>> +  case Intrinsic::x86_sse41_blendvps:
>>> +  case Intrinsic::x86_sse41_blendvpd:
>>> +  case Intrinsic::x86_avx_blendv_ps_256:
>>> +  case Intrinsic::x86_avx_blendv_pd_256:
>>> +  case Intrinsic::x86_avx2_pblendvb: {
>>> +    // Convert blendv* to vector selects if the mask is constant.
>>> +    // This optimization is convoluted because the intrinsic is defined
>>> as
>>> +    // getting a vector of floats or doubles for the ps and pd versions.
>>> +    // FIXME: That should be changed.
>>> +    Value *Mask = II->getArgOperand(2);
>>> +    if (auto C = dyn_cast<ConstantDataVector>(Mask)) {
>>> +      auto Tyi1 = Builder->getInt1Ty();
>>> +      auto SelectorType = cast<VectorType>(Mask->getType());
>>> +      auto EltTy = SelectorType->getElementType();
>>> +      unsigned Size = SelectorType->getNumElements();
>>> +      unsigned BitWidth = EltTy->isFloatTy() ? 32 : (EltTy->isDoubleTy()
>>> ? 64 : EltTy->getIntegerBitWidth());
>>> +      assert(BitWidth == 64 || BitWidth == 32 || BitWidth == 8 && "Wrong
>>> arguments for variable blend intrinsic");
>>
>>
>> This assert is bad in that it triggers Clang's operator precedence
>> warning and works correctly more or less by accident (consider the
>> precedence of || and && - with the string implicitly evaluating to true).
>> Fixed in r209648.
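
To spell out the precedence point: && binds tighter than ||, so the message
string only attaches to the last comparison. A small C++ sketch, with
hypothetical helper names (asParsed, asIntended) and the message passed as a
pointer so both groupings can be compared, might look like this:

```cpp
#include <cassert>

// How the committed condition groups: the message is and-ed with the
// last comparison only.
static bool asParsed(unsigned BitWidth, const char *Msg) {
  return BitWidth == 64 || BitWidth == 32 ||
         (BitWidth == 8 && Msg != nullptr);
}

// The grouping that was presumably intended: the message is and-ed
// with the whole width check.
static bool asIntended(unsigned BitWidth, const char *Msg) {
  return (BitWidth == 64 || BitWidth == 32 || BitWidth == 8) &&
         Msg != nullptr;
}
```

Because a string literal is never a null pointer, both forms agree for every
width, which is why the original assert happened to work. They would only
differ if the message could be null.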
>>
>> Also, please adhere to LLVM coding standards (most importantly the 80
>> column limit).
>>
>>>
>>> +      SmallVector<Constant*, 32> Selectors;
>>> +      for (unsigned I = 0; I < Size; ++I) {
>>> +        // The intrinsics only read the top bit
>>> +        uint64_t Selector;
>>> +        if (BitWidth == 8)
>>> +          Selector = C->getElementAsInteger(I);
>>> +        else
>>> +          Selector =
>>> C->getElementAsAPFloat(I).bitcastToAPInt().getZExtValue();
>>> +        Selectors.push_back(ConstantInt::get(Tyi1, Selector >> (BitWidth
>>> - 1)));
>>> +      }
>>> +      auto NewSelector = ConstantVector::get(Selectors);
>>> +      return SelectInst::Create(NewSelector, II->getArgOperand(0),
>>> II->getArgOperand(1), "blendv");
>>> +    } else {
>>> +      break;
>>> +    }
>>> +  }
>>> +
>>>    case Intrinsic::x86_avx_vpermilvar_ps:
>>>    case Intrinsic::x86_avx_vpermilvar_ps_256:
>>>    case Intrinsic::x86_avx_vpermilvar_pd:
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/avx-blend.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-blend.ll?rev=209643&r1=209642&r2=209643&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/avx-blend.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/avx-blend.ll Mon May 26 22:42:20 2014
>>> @@ -135,3 +135,26 @@ define <2 x double> @testb(<2 x double>
>>>    %min = select <2 x i1> %min_is_x, <2 x double> %x, <2 x double> %y
>>>    ret <2 x double> %min
>>>  }
>>> +
>>> +; If we can figure out a blend has a constant mask, we should emit the
>>> +; blend instruction with an immediate mask
>>> +define <4 x double> @constant_blendvpd_avx(<4 x double> %xy, <4 x
>>> double> %ab) {
>>> +; CHECK-LABEL: constant_blendvpd_avx:
>>> +; CHECK-NOT: mov
>>> +; CHECK: vblendpd
>>> +; CHECK: ret
>>> +  %1 = select <4 x i1> <i1 false, i1 false, i1 true, i1 false>, <4 x
>>> double> %xy, <4 x double> %ab
>>> +  ret <4 x double> %1
>>> +}
>>> +
>>> +define <8 x float> @constant_blendvps_avx(<8 x float> %xyzw, <8 x float>
>>> %abcd) {
>>> +; CHECK-LABEL: constant_blendvps_avx:
>>> +; CHECK-NOT: mov
>>> +; CHECK: vblendps
>>> +; CHECK: ret
>>> +  %1 = select <8 x i1> <i1 false, i1 false, i1 false, i1 true, i1 false,
>>> i1 false, i1 false, i1 true>, <8 x float> %xyzw, <8 x float> %abcd
>>> +  ret <8 x float> %1
>>> +}
>>> +
>>> +declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x
>>> float>, <8 x float>)
>>> +declare <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double>, <4 x
>>> double>, <4 x double>)
>>>
>>> Added: llvm/trunk/test/CodeGen/X86/avx2-blend.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx2-blend.ll?rev=209643&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/avx2-blend.ll (added)
>>> +++ llvm/trunk/test/CodeGen/X86/avx2-blend.ll Mon May 26 22:42:20 2014
>>> @@ -0,0 +1,11 @@
>>> +; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=core-avx2 | FileCheck
>>> %s
>>> +
>>> +define <32 x i8> @constant_pblendvb_avx2(<32 x i8> %xyzw, <32 x i8>
>>> %abcd) {
>>> +; CHECK-LABEL: constant_pblendvb_avx2:
>>> +; CHECK: vmovdqa
>>> +; CHECK: vpblendvb
>>> +  %1 = select <32 x i1> <i1 false, i1 false, i1 true, i1 false, i1 true,
>>> i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 true,
>>> i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 true,
>>> i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 true,
>>> i1 true, i1 true, i1 false>, <32 x i8> %xyzw, <32 x i8> %abcd
>>> +  ret <32 x i8> %1
>>> +}
>>> +
>>> +declare <32 x i8> @llvm.x86.avx2.pblendvb(<32 x i8>, <32 x i8>, <32 x
>>> i8>)
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/sse41-blend.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse41-blend.ll?rev=209643&r1=209642&r2=209643&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/sse41-blend.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/sse41-blend.ll Mon May 26 22:42:20 2014
>>> @@ -88,3 +88,35 @@ entry:
>>>    store double %extract214vector_func.i, double addrspace(1)* undef,
>>> align 8
>>>    ret void
>>>  }
>>> +
>>> +; If we can figure out a blend has a constant mask, we should emit the
>>> +; blend instruction with an immediate mask
>>> +define <2 x double> @constant_blendvpd(<2 x double> %xy, <2 x double>
>>> %ab) {
>>> +; In this case, we emit a simple movss
>>> +; CHECK-LABEL: constant_blendvpd
>>> +; CHECK: movsd
>>> +; CHECK: ret
>>> +  %1 = select <2 x i1> <i1 true, i1 false>, <2 x double> %xy, <2 x
>>> double> %ab
>>> +  ret <2 x double> %1
>>> +}
>>> +
>>> +define <4 x float> @constant_blendvps(<4 x float> %xyzw, <4 x float>
>>> %abcd) {
>>> +; CHECK-LABEL: constant_blendvps
>>> +; CHECK-NOT: mov
>>> +; CHECK: blendps $7
>>> +; CHECK: ret
>>> +  %1 = select <4 x i1> <i1 false, i1 false, i1 false, i1 true>, <4 x
>>> float> %xyzw, <4 x float> %abcd
>>> +  ret <4 x float> %1
>>> +}
>>> +
>>> +define <16 x i8> @constant_pblendvb(<16 x i8> %xyzw, <16 x i8> %abcd) {
>>> +; CHECK-LABEL: constant_pblendvb:
>>> +; CHECK: movaps
>>> +; CHECK: pblendvb
>>> +; CHECK: ret
>>> +  %1 = select <16 x i1> <i1 false, i1 false, i1 true, i1 false, i1 true,
>>> i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1 true,
>>> i1 true, i1 true, i1 false>, <16 x i8> %xyzw, <16 x i8> %abcd
>>> +  ret <16 x i8> %1
>>> +}
>>> +declare <16 x i8> @llvm.x86.sse41.pblendvb(<16 x i8>, <16 x i8>, <16 x
>>> i8>)
>>> +declare <4 x float> @llvm.x86.sse41.blendvps(<4 x float>, <4 x float>,
>>> <4 x float>)
>>> +declare <2 x double> @llvm.x86.sse41.blendvpd(<2 x double>, <2 x
>>> double>, <2 x double>)
>>>
>>> Added: llvm/trunk/test/Transforms/InstCombine/blend_x86.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/blend_x86.ll?rev=209643&view=auto
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/InstCombine/blend_x86.ll (added)
>>> +++ llvm/trunk/test/Transforms/InstCombine/blend_x86.ll Mon May 26
>>> 22:42:20 2014
>>> @@ -0,0 +1,56 @@
>>> +; RUN: opt < %s -instcombine -mtriple=x86_64-apple-macosx
>>> -mcpu=core-avx2 -S | FileCheck %s
>>> +
>>> +define <2 x double> @constant_blendvpd(<2 x double> %xy, <2 x double>
>>> %ab) {
>>> +; CHECK-LABEL: @constant_blendvpd
>>> +; CHECK: select <2 x i1> <i1 true, i1 false>
>>> +  %1 = tail call <2 x double> @llvm.x86.sse41.blendvpd(<2 x double> %xy,
>>> <2 x double> %ab, <2 x double> <double 0xFFFFFFFFE0000000, double
>>> 0.000000e+00>)
>>> +  ret <2 x double> %1
>>> +}
>>> +
>>> +define <4 x float> @constant_blendvps(<4 x float> %xyzw, <4 x float>
>>> %abcd) {
>>> +; CHECK-LABEL: @constant_blendvps
>>> +; CHECK: select <4 x i1> <i1 false, i1 false, i1 false, i1 true>
>>> +  %1 = tail call <4 x float> @llvm.x86.sse41.blendvps(<4 x float> %xyzw,
>>> <4 x float> %abcd, <4 x float> <float 0.000000e+00, float 0.000000e+00,
>>> float 0.000000e+00, float 0xFFFFFFFFE0000000>)
>>> +  ret <4 x float> %1
>>> +}
>>> +
>>> +define <16 x i8> @constant_pblendvb(<16 x i8> %xyzw, <16 x i8> %abcd) {
>>> +; CHECK-LABEL: @constant_pblendvb
>>> +; CHECK: select <16 x i1> <i1 false, i1 false, i1 true, i1 false, i1
>>> true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1
>>> true, i1 true, i1 true, i1 false>
>>> +  %1 = tail call <16 x i8> @llvm.x86.sse41.pblendvb(<16 x i8> %xyzw, <16
>>> x i8> %abcd, <16 x i8> <i8 0, i8 0, i8 255, i8 0, i8 255, i8 255, i8 255, i8
>>> 0, i8 0, i8 0, i8 255, i8 0, i8 255, i8 255, i8 255, i8 0>)
>>> +  ret <16 x i8> %1
>>> +}
>>> +
>>> +define <4 x double> @constant_blendvpd_avx(<4 x double> %xy, <4 x
>>> double> %ab) {
>>> +; CHECK-LABEL: @constant_blendvpd_avx
>>> +; CHECK: select <4 x i1> <i1 true, i1 false, i1 true, i1 false>
>>> +  %1 = tail call <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double>
>>> %xy, <4 x double> %ab, <4 x double> <double 0xFFFFFFFFE0000000, double
>>> 0.000000e+00, double 0xFFFFFFFFE0000000, double 0.000000e+00>)
>>> +  ret <4 x double> %1
>>> +}
>>> +
>>> +define <8 x float> @constant_blendvps_avx(<8 x float> %xyzw, <8 x float>
>>> %abcd) {
>>> +; CHECK-LABEL: @constant_blendvps_avx
>>> +; CHECK: select <8 x i1> <i1 false, i1 false, i1 false, i1 true, i1
>>> false, i1 false, i1 false, i1 true>
>>> +  %1 = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>
>>> %xyzw, <8 x float> %abcd, <8 x float> <float 0.000000e+00, float
>>> 0.000000e+00, float 0.000000e+00, float 0xFFFFFFFFE0000000, float
>>> 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float
>>> 0xFFFFFFFFE0000000>)
>>> +  ret <8 x float> %1
>>> +}
>>> +
>>> +define <32 x i8> @constant_pblendvb_avx2(<32 x i8> %xyzw, <32 x i8>
>>> %abcd) {
>>> +; CHECK-LABEL: @constant_pblendvb_avx2
>>> +; CHECK: select <32 x i1> <i1 false, i1 false, i1 true, i1 false, i1
>>> true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1
>>> true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1
>>> true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 true, i1 false, i1
>>> true, i1 true, i1 true, i1 false>
>>> +  %1 = tail call <32 x i8> @llvm.x86.avx2.pblendvb(<32 x i8> %xyzw, <32
>>> x i8> %abcd,
>>> +        <32 x i8> <i8 0, i8 0, i8 255, i8 0, i8 255, i8 255, i8 255, i8
>>> 0,
>>> +                   i8 0, i8 0, i8 255, i8 0, i8 255, i8 255, i8 255, i8
>>> 0,
>>> +                   i8 0, i8 0, i8 255, i8 0, i8 255, i8 255, i8 255, i8
>>> 0,
>>> +                   i8 0, i8 0, i8 255, i8 0, i8 255, i8 255, i8 255, i8
>>> 0>)
>>> +  ret <32 x i8> %1
>>> +}
>>> +
>>> +declare <16 x i8> @llvm.x86.sse41.pblendvb(<16 x i8>, <16 x i8>, <16 x
>>> i8>)
>>> +declare <4 x float> @llvm.x86.sse41.blendvps(<4 x float>, <4 x float>,
>>> <4 x float>)
>>> +declare <2 x double> @llvm.x86.sse41.blendvpd(<2 x double>, <2 x
>>> double>, <2 x double>)
>>> +
>>> +declare <32 x i8> @llvm.x86.avx2.pblendvb(<32 x i8>, <32 x i8>, <32 x
>>> i8>)
>>> +declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x
>>> float>, <8 x float>)
>>> +declare <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double>, <4 x
>>> double>, <4 x double>)
>>> +
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits