[PATCH] D103820: [X86] Prefer vpmovq2m over vpternlogd + vpcmpgtq

Mon Jun 7 09:59:08 PDT 2021

xbolva00 added a comment.

In D103820#2803120 <https://reviews.llvm.org/D103820#2803120>, @davezarzycki wrote:

> Actually, wait. Something weird is going on at the mid-level. These two functions should generate the same optimized IR, right?
>
>   typedef int V __attribute__((vector_size(64)));
>   
>   V lt_zero_x_y(V mask, V x, V y) { return mask <  0 ? x : y; }
>   V ge_zero_y_x(V mask, V x, V y) { return mask >= 0 ? y : x; }

Just curious why, with

  typedef int V __attribute__((vector_size(4)));

we produce

  define dso_local i32 @_Z11lt_zero_x_yDv1_iS_S_(i32 %0, i32 %1, i32 %2) local_unnamed_addr #0 {
    %4 = insertelement <1 x i32> poison, i32 %0, i32 0
    %5 = insertelement <1 x i32> poison, i32 %1, i32 0
    %6 = insertelement <1 x i32> poison, i32 %2, i32 0
    %7 = icmp sgt <1 x i32> %4, <i32 -1>
    %8 = select <1 x i1> %7, <1 x i32> %6, <1 x i32> %5
    %9 = extractelement <1 x i32> %8, i32 0
    ret i32 %9
  }

Why not scalarize it at the IR level?
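
For reference, a minimal sketch of the scalar computation that the <1 x i32> IR above boils down to once the single lane is extracted. The function names are hypothetical, and the IR in the comment is hand-written to illustrate the expected shape, not actual compiler output.

```cpp
// Source-level form of the one-element case: pick x when mask is negative.
constexpr int lt_zero_x_y_scalar(int mask, int x, int y) {
  return mask < 0 ? x : y;
}

// Form used by the IR above: `icmp sgt mask, -1`, then select y (true) or x (false).
// A scalarized version would presumably look like this (hand-written, an assumption):
//   %4 = icmp sgt i32 %0, -1
//   %5 = select i1 %4, i32 %2, i32 %1
//   ret i32 %5
constexpr int sgt_m1_y_x_scalar(int mask, int x, int y) {
  return mask > -1 ? y : x;
}

// Both forms agree for negative, zero, and positive masks.
static_assert(lt_zero_x_y_scalar(-5, 1, 2) == sgt_m1_y_x_scalar(-5, 1, 2), "negative mask");
static_assert(lt_zero_x_y_scalar(0, 1, 2) == sgt_m1_y_x_scalar(0, 1, 2), "zero mask");
static_assert(lt_zero_x_y_scalar(7, 1, 2) == sgt_m1_y_x_scalar(7, 1, 2), "positive mask");
```

The insertelement/extractelement pairs add nothing to this computation, which is why the question of scalarizing at the IR level comes up.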

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103820/new/

https://reviews.llvm.org/D103820