[PATCH][AVX512] Add 512b variable bit shift intrinsics

Demikhovsky, Elena elena.demikhovsky at intel.com
Thu Dec 11 00:28:26 PST 2014


You are right. This instruction does not have broadcast semantics.
You can commit the patch.

Thank you

-  Elena


-----Original Message-----
From: Cameron McInally [mailto:cameron.mcinally at nyu.edu] 
Sent: Wednesday, December 10, 2014 19:43
To: Demikhovsky, Elena
Subject: Re: [PATCH][AVX512] Add 512b variable bit shift intrinsics

Hey Elena,

On Wed, Dec 10, 2014 at 7:07 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
> Hi Cameron,
>
> Please pass avx512vl_i32_info here:
> defm D : avx512_var_shift_sizes<opc, OpcodeStr#"d", OpNode,
> +                                 avx512vl_i32_info >, EVEX_CD8<32, CD8VQ>;
>
> and then
> +multiclass avx512_var_shift_sizes<bits<8> opc, string OpcodeStr, SDNode OpNode,
> +                                  AVX512VLVectorVTInfo _> {
> +  defm Z : avx512_var_shift<opc, OpcodeStr, OpNode, _.info512>, EVEX_V512;
> +}
>
> Please add one test (not for each, just one) for load folding and for broadcast folding.

As it stands now, should the AVX512_maskable multiclass produce broadcast folding patterns? AFAICT, we only produce the non-masking, masking, and zero-masking patterns. A simple test shows that the broadcast is not folded into the instruction:

define <8 x i64> @test_x86_avx512_psrlv_q_broadcast(<8 x i64> %a0, i64* %ptr) {
  ; CHECK-LABEL: test_x86_avx512_psrlv_q_broadcast
  ; CHECK: vpsrlvq (%
  %tmp = load i64* %ptr
  %j.0 = insertelement <8 x i64> undef, i64 %tmp, i32 0
  %j.1 = insertelement <8 x i64> %j.0, i64 %tmp, i32 1
  %j.2 = insertelement <8 x i64> %j.1, i64 %tmp, i32 2
  %j.3 = insertelement <8 x i64> %j.2, i64 %tmp, i32 3
  %j.4 = insertelement <8 x i64> %j.3, i64 %tmp, i32 4
  %j.5 = insertelement <8 x i64> %j.4, i64 %tmp, i32 5
  %j.6 = insertelement <8 x i64> %j.5, i64 %tmp, i32 6
  %j.7 = insertelement <8 x i64> %j.6, i64 %tmp, i32 7
  %res = call <8 x i64> @llvm.x86.avx512.mask.psrlv.q(<8 x i64> %a0,
              <8 x i64> %j.7, <8 x i64> zeroinitializer, i8 -1)
  ret <8 x i64> %res
}
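
For comparison, a plain load-folding test for the same intrinsic might look like the sketch below. The function name, the alignment, and the CHECK line are assumptions for illustration and are not taken from the patch; the expectation is only that a full 512-bit vector load used as the shift-count operand can be folded into the instruction's memory operand.

```llvm
; Hypothetical load-folding counterpart of the broadcast test above.
declare <8 x i64> @llvm.x86.avx512.mask.psrlv.q(<8 x i64>, <8 x i64>, <8 x i64>, i8) nounwind readnone

define <8 x i64> @test_x86_avx512_psrlv_q_memop(<8 x i64> %a0, <8 x i64>* %ptr) {
  ; CHECK-LABEL: test_x86_avx512_psrlv_q_memop
  ; CHECK: vpsrlvq (%
  ; Load all eight shift counts as one 512-bit vector.
  %b = load <8 x i64>* %ptr, align 64
  ; All-ones mask (i8 -1): every lane takes the shifted result.
  %res = call <8 x i64> @llvm.x86.avx512.mask.psrlv.q(<8 x i64> %a0,
              <8 x i64> %b, <8 x i64> zeroinitializer, i8 -1)
  ret <8 x i64> %res
}
```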

I've attached an updated patch with your other suggestions.

-Cameron