r274110 - [AVX512] Zero extend cmp intrinsic return value.

Mon Jul 4 00:19:48 PDT 2016

I've modified the indices we use for the zero vector in r274484. This gives
better codegen as it now looks like a concat to the backend. So now we just
end up emitting unnecessary kshiftlw/kshiftrw pairs instead of converting
to a wider vector op and back.
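
To illustrate why the choice of indices does not change the result, here is a
minimal standalone sketch that models shufflevector semantics for the
<4 x i1> case in plain C++. It is not the clang code. The helper names
(shuffleToMask8, repeatedIndices, concatIndices) are hypothetical, and the
sequential 4..7 pattern is an assumption about what "looks like a concat"
means, since the r274484 diff is not shown in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Mask4 = std::array<bool, 4>;    // stands in for <4 x i1>
using Idx8 = std::array<unsigned, 8>; // shuffle mask with 8 result lanes

// Models shufflevector on two 4-element inputs: an index below 4 selects
// a[index], an index in [4, 8) selects b[index - 4]. The 8 result lanes are
// packed into a byte, mirroring the bitcast to i8 in the commit.
inline std::uint8_t shuffleToMask8(const Mask4 &a, const Mask4 &b,
                                   const Idx8 &idx) {
  std::uint8_t out = 0;
  for (std::size_t i = 0; i < idx.size(); ++i) {
    assert(idx[i] < 8 && "shuffle index out of range");
    const bool bit = idx[i] < 4 ? a[idx[i]] : b[idx[i] - 4];
    if (bit)
      out |= static_cast<std::uint8_t>(1u << i);
  }
  return out;
}

// Indices matching the CHECK line added in r274110 (NumElts == 4):
// every upper lane reads element 0 of the zero vector.
inline Idx8 repeatedIndices() { return Idx8{{0, 1, 2, 3, 4, 4, 4, 4}}; }

// Assumed concat-style indices: the upper lanes read the zero vector's
// elements in order, so the shuffle is a plain concatenation.
inline Idx8 concatIndices() { return Idx8{{0, 1, 2, 3, 4, 5, 6, 7}}; }
```

Because the second operand is all zeros, both index patterns yield the same
mask value. Only the shape the backend has to pattern-match differs.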

~Craig

On Sun, Jul 3, 2016 at 1:32 PM, Craig Topper <craig.topper at gmail.com> wrote:

> Also, should we change the AutoUpgrade code for the cmp intrinsics in the
> backend to use zero instead of undef as well?
>
> ~Craig
>
> On Sun, Jul 3, 2016 at 12:11 AM, Breger, Igor <igor.breger at intel.com>
> wrote:
>
>> Hello Craig,
>>
>> Thanks a lot for pointing it out to me. I am familiar with this problem;
>> we are planning to improve CodeGen to handle this case in the near
>> future.
>>
>>
>>
>> Regards,
>>
>> Igor
>>
>>
>>
>> *From:* Craig Topper [mailto:craig.topper at gmail.com]
>> *Sent:* Saturday, July 02, 2016 08:46
>> *To:* Breger, Igor; Eric Christopher via cfe-commits
>> *Subject:* Re: r274110 - [AVX512] Zero extend cmp intrinsic return value.
>>
>>
>>
>> This change now codegens to something really awful. Can you take a look?
>>
>>
>>
>>         .section        __TEXT,__text,regular,pure_instructions
>>         .section        __TEXT,__literal8,8byte_literals
>>         .p2align        3
>> LCPI0_0:
>>         .quad   -1
>>         .section        __TEXT,__const
>>         .p2align        6
>> LCPI0_1:
>>         .quad   0
>>         .quad   1
>>         .quad   2
>>         .quad   3
>>         .quad   8
>>         .quad   8
>>         .quad   8
>>         .quad   8
>>         .section        __TEXT,__text,regular,pure_instructions
>>         .globl  _test_mm_cmpeq_epu32_mask
>>         .p2align        4, 0x90
>> _test_mm_cmpeq_epu32_mask:
>>         vpcmpeqd        %xmm1, %xmm0, %k1
>>         vpbroadcastq    LCPI0_0(%rip), %zmm0 {%k1} {z}
>>         vpxord          %zmm1, %zmm1, %zmm1
>>         vmovdqa64       LCPI0_1(%rip), %zmm2
>>         vpermt2q        %zmm1, %zmm2, %zmm0
>>         vpsllq          $63, %zmm0, %zmm0
>>         vptestmq        %zmm0, %zmm0, %k0
>>         kmovw           %k0, %eax
>>         retq
>>
>>
>> ~Craig
>>
>>
>>
>> On Wed, Jun 29, 2016 at 1:14 AM, Igor Breger via cfe-commits <
>> cfe-commits at lists.llvm.org> wrote:
>>
>> Author: ibreger
>> Date: Wed Jun 29 03:14:17 2016
>> New Revision: 274110
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=274110&view=rev
>> Log:
>> [AVX512]  Zero extend cmp intrinsic return value.
>>
>> Differential Revision: http://reviews.llvm.org/D21746
>>
>> Modified:
>>     cfe/trunk/lib/CodeGen/CGBuiltin.cpp
>>     cfe/trunk/test/CodeGen/avx512vl-builtins.c
>>
>> Modified: cfe/trunk/lib/CodeGen/CGBuiltin.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGBuiltin.cpp?rev=274110&r1=274109&r2=274110&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/lib/CodeGen/CGBuiltin.cpp (original)
>> +++ cfe/trunk/lib/CodeGen/CGBuiltin.cpp Wed Jun 29 03:14:17 2016
>> @@ -6460,8 +6460,8 @@ static Value *EmitX86MaskedCompare(CodeG
>>        Indices[i] = i;
>>      for (unsigned i = NumElts; i != 8; ++i)
>>        Indices[i] = NumElts;
>> -    Cmp = CGF.Builder.CreateShuffleVector(Cmp, UndefValue::get(Cmp->getType()),
>> -                                          Indices);
>> +    Cmp = CGF.Builder.CreateShuffleVector(
>> +        Cmp, llvm::Constant::getNullValue(Cmp->getType()), Indices);
>>    }
>>    return CGF.Builder.CreateBitCast(Cmp,
>>                                     IntegerType::get(CGF.getLLVMContext(),
>>
>> Modified: cfe/trunk/test/CodeGen/avx512vl-builtins.c
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGen/avx512vl-builtins.c?rev=274110&r1=274109&r2=274110&view=diff
>>
>> ==============================================================================
>> --- cfe/trunk/test/CodeGen/avx512vl-builtins.c (original)
>> +++ cfe/trunk/test/CodeGen/avx512vl-builtins.c Wed Jun 29 03:14:17 2016
>> @@ -8,6 +8,7 @@
>>  __mmask8 test_mm_cmpeq_epu32_mask(__m128i __a, __m128i __b) {
>>    // CHECK-LABEL: @test_mm_cmpeq_epu32_mask
>>    // CHECK: icmp eq <4 x i32> %{{.*}}, %{{.*}}
>> +  // CHECK: shufflevector <4 x i1> %{{.*}}, <4 x i1> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 4, i32 4, i32 4>
>>    return (__mmask8)_mm_cmpeq_epu32_mask(__a, __b);
>>  }
>>
>
>