[PATCH] D52318: [x86] avoid 256-bit andnp that requires insert/extract with AVX1 (PR37449)
Andrea Di Biagio via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 21 07:37:12 PDT 2018
andreadb added a comment.
Hi Sanjay,
You should add a test where the mask vector is not a constant.
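For example (just a sketch, with a made-up function name), something along these lines would exercise the fold when the mask is only known at runtime:

define <8 x i32> @andn_variable_mask(<8 x i32> %A, <8 x i32> %Mask) {
  ; ~A & Mask, with nothing known about Mask at compile time
  %not = xor <8 x i32> %A, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
  %and = and <8 x i32> %not, %Mask
  ret <8 x i32> %and
}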
I verified that on Jaguar, this change improves cases where:
- the mask is a constant
- users access the lo/hi part of the defined YMM.
In one case in particular, I saw quite a nice improvement in IPC.
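For reference, the improved cases look roughly like this (a sketch only; the name and the particular constants are made up): a constant mask feeding the andnot, with the two 128-bit halves of the result consumed separately:

define <4 x i32> @andn_const_mask_halves(<8 x i32> %A) {
  ; ~A & <constant mask>, then both halves of the YMM are used
  %not = xor <8 x i32> %A, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
  %and = and <8 x i32> %not, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
  %lo = shufflevector <8 x i32> %and, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  %hi = shufflevector <8 x i32> %and, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
  %sum = add <4 x i32> %lo, %hi
  ret <4 x i32> %sum
}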
Unfortunately, I also found this regression:
define <8 x i32> @bar(<8 x i32> %A, <8 x i32> %B, <8 x i32> %Mask) {
  %1 = and <8 x i32> %A, %Mask
  ; (A & Mask) ^ Mask == ~A & Mask, so %1/%2 together form an andnot
  %2 = xor <8 x i32> %1, %Mask
  %3 = add <8 x i32> %2, %B
  ret <8 x i32> %3
}
Before this patch (-mcpu=btver2):
vandnps %ymm2, %ymm0, %ymm0             # ymm0 = ~A & Mask in a single 256-bit op
vextractf128 $1, %ymm1, %xmm3
vextractf128 $1, %ymm0, %xmm2
vpaddd %xmm1, %xmm0, %xmm0
vpaddd %xmm3, %xmm2, %xmm2
vinsertf128 $1, %xmm2, %ymm0, %ymm0
retq
After your patch:
vxorps %xmm3, %xmm3, %xmm3              # zero, only to build all-ones below
vextractf128 $1, %ymm1, %xmm4
vcmptrueps %ymm3, %ymm3, %ymm3          # materialize an all-ones YMM for the NOT
vxorps %ymm3, %ymm0, %ymm0              # ymm0 = ~A
vandps %xmm2, %xmm0, %xmm3              # low half of ~A & Mask
vextractf128 $1, %ymm0, %xmm0
vextractf128 $1, %ymm2, %xmm2
vpand %xmm2, %xmm0, %xmm0               # high half of ~A & Mask
vpaddd %xmm1, %xmm3, %xmm1
vpaddd %xmm4, %xmm0, %xmm0
vinsertf128 $1, %xmm0, %ymm1, %ymm0
retq
In other words, we now materialize an all-ones vector with vcmptrueps just to feed a 256-bit NOT, and the andnot is split into vandps/vpand halves: eleven instructions where we previously had six. Could you please have a look at it?
Thanks,
Andrea
https://reviews.llvm.org/D52318