[PATCH] D46957: [x86] Lower some trunc + shuffle patterns to vpmov[q|d][b|w]

Mikhail Dvoretckii via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 4 04:49:01 PDT 2018


mike.dvoretsky commandeered this revision.
mike.dvoretsky edited reviewers, added: GBuella; removed: mike.dvoretsky.
mike.dvoretsky added a comment.

Taking over at @gbuella's request. I will finish the patterns that are currently implemented, but I don't have much hope for the masked versions. Since the mask applies only to the lower elements and the upper elements must be zeroed out, lowering the masked versions of these intrinsics would require not simple selects (see PR34877), but patterns like

  define <8 x i16> @trunc_v4i64_to_v4i16_return_v2i64_1(<4 x i64> %vec, i8 zeroext %k, <2 x i64> %dest) nounwind {
    %truncated = trunc <4 x i64> %vec to <4 x i16>
    %dst = bitcast <2 x i64> %dest to <8 x i16>
    %dst_select = shufflevector <8 x i16> %dst, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
    %mask = trunc i8 %k to i4
    %mask_vec = bitcast i4 %mask to <4 x i1>
    %res = select <4 x i1> %mask_vec, <4 x i16> %truncated, <4 x i16> %dst_select
    %result = shufflevector <4 x i16> %res, <4 x i16> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
    ret <8 x i16> %result
  }

or

  define <8 x i16> @trunc_v4i64_to_v4i16_return_v2i64_2(<4 x i64> %vec, i8 zeroext %k, <2 x i64> %dest) nounwind {
    %truncated = trunc <4 x i64> %vec to <4 x i16>
    %res = shufflevector <4 x i16> %truncated, <4 x i16> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
    %mask = xor i8 %k, -1
    %mask1 = and i8 %mask, 15
    %mask_vec = bitcast i8 %mask1 to <8 x i1>
    %dst = bitcast <2 x i64> %dest to <8 x i16>
    %result = select <8 x i1> %mask_vec, <8 x i16> %dst, <8 x i16> %res
    ret <8 x i16> %result
  }

and I feel that both of these are too complex.
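
For reference, the behaviour that both patterns have to express can be written as a scalar model. This is only a sketch in Python: the function name and the list-of-integers representation are illustrative assumptions, not anything from the patch. It assumes merge masking, with bit i of %k controlling lane i.

```python
def masked_trunc_v4i64_to_v8i16(vec, k, dest):
    """Scalar model of the masked truncation described above (hypothetical helper).

    vec  -- four 64-bit lane values (the <4 x i64> source)
    k    -- 8-bit mask; only the low 4 bits are meaningful
    dest -- eight 16-bit lane values (the pass-through <8 x i16>)
    """
    assert len(vec) == 4 and len(dest) == 8
    result = []
    for i in range(4):
        truncated = vec[i] & 0xFFFF  # trunc i64 -> i16 keeps the low 16 bits
        # Mask bit set: take the truncated lane; clear: keep the pass-through lane.
        result.append(truncated if (k >> i) & 1 else dest[i])
    # The upper four lanes are zeroed regardless of the mask or dest.
    result.extend([0] * 4)
    return result
```

The upper lanes ignore both the mask and %dest, which is why a single select over the truncated value is not enough.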


https://reviews.llvm.org/D46957




