[PATCH] D98981: [SLP] allow matching integer min/max intrinsics as reduction ops

Sun Mar 21 06:32:52 PDT 2021

spatel added a comment.

> I suspect that we will need to adjust the cost models or tests because the PhaseOrdering test in D98152 <https://reviews.llvm.org/D98152> still doesn't vectorize with only x86 SSE2 (it does change if I add an attribute for SSE4.1).

Taking a closer look at that example, I think this will actually be an improvement (i.e., we should adjust the test, not the cost model).

Currently, we favor vectorization based on the x86 SSE2 costs, but that seems wrong.

Without vectorizing, we have a chain of cmov:

  movl	(%rdi), %eax
  movl	4(%rdi), %ecx
  cmpl	%eax, %ecx
  cmovll	%ecx, %eax
  movl	8(%rdi), %ecx
  cmpl	%eax, %ecx
  cmovll	%ecx, %eax
  movl	12(%rdi), %ecx
  cmpl	%eax, %ecx
  cmovll	%ecx, %eax
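
For reference, the scalar chain above corresponds to source along these lines. This is only a sketch: the name `smin4` is hypothetical, and the actual PhaseOrdering test in D98152 is written in LLVM IR and may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar source for a 4-element signed-min reduction.
// Each conditional update maps naturally to one cmp + cmov pair.
std::int32_t smin4(const std::int32_t *p) {
  assert(p != nullptr);
  std::int32_t m = p[0];
  for (int i = 1; i < 4; ++i) {
    // Same condition as "cmpl %eax, %ecx; cmovll %ecx, %eax":
    // take the new element only if it is strictly less than the running min.
    if (p[i] < m)
      m = p[i];
  }
  return m;
}
```

That is three loads with three compare/select pairs on top of the initial load, all in GPRs.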

With vectorization (but without the expected min/max instructions or even blendv, which need SSE4.1 for 32-bit elements), we have more code plus a transfer from xmm to GPR:

  movdqu	(%rdi), %xmm0
  pshufd	$238, %xmm0, %xmm1              # xmm1 = xmm0[2,3,2,3]
  movdqa	%xmm1, %xmm2
  pcmpgtd	%xmm0, %xmm2
  pand	%xmm2, %xmm0
  pandn	%xmm1, %xmm2
  por	%xmm0, %xmm2
  pshufd	$85, %xmm2, %xmm0               # xmm0 = xmm2[1,1,1,1]
  movdqa	%xmm0, %xmm1
  pcmpgtd	%xmm2, %xmm1
  pand	%xmm1, %xmm2
  pandn	%xmm0, %xmm1
  por	%xmm2, %xmm1
  movd	%xmm1, %eax
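
To make the SSE2 sequence easier to follow, here is a portable C++ model of what it computes. It is a sketch, not real intrinsics code: `Vec4`, `smin_select`, and `reduce_smin` are made-up names, and the helpers only mimic the per-lane behavior of the instructions they are named after.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<std::int32_t, 4>; // stand-in for one xmm register

// pshufd: build a vector from the selected lanes of src.
Vec4 pshufd(const Vec4 &src, int i0, int i1, int i2, int i3) {
  assert(i0 >= 0 && i0 < 4 && i1 >= 0 && i1 < 4);
  assert(i2 >= 0 && i2 < 4 && i3 >= 0 && i3 < 4);
  return {src[i0], src[i1], src[i2], src[i3]};
}

// pcmpgtd: all-ones lane where a > b (signed), zero otherwise.
Vec4 pcmpgtd(const Vec4 &a, const Vec4 &b) {
  Vec4 m{};
  for (int i = 0; i < 4; ++i)
    m[i] = a[i] > b[i] ? -1 : 0;
  return m;
}

// Without pminsd or blendv, min is a bitwise select:
// (a & mask) | (b & ~mask), with mask = (b > a).
// This models the pcmpgtd / pand / pandn / por group.
Vec4 smin_select(const Vec4 &a, const Vec4 &b) {
  Vec4 mask = pcmpgtd(b, a);
  Vec4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = (a[i] & mask[i]) | (b[i] & ~mask[i]);
  return r;
}

std::int32_t reduce_smin(const Vec4 &v) {
  Vec4 hi = pshufd(v, 2, 3, 2, 3);    // pshufd $238
  Vec4 m2 = smin_select(v, hi);       // lanes 0,1: min(v0,v2), min(v1,v3)
  Vec4 odd = pshufd(m2, 1, 1, 1, 1);  // pshufd $85
  Vec4 m1 = smin_select(m2, odd);     // lane 0: min of all four
  return m1[0];                       // movd %xmm1, %eax
}
```

Each of the two reduction steps costs a shuffle, a register copy, a compare, and three logic ops, which is where the extra code over the cmov chain comes from.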

https://reviews.llvm.org/D98981