[PATCH] D98981: [SLP] allow matching integer min/max intrinsics as reduction ops
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Mar 21 06:32:52 PDT 2021
spatel added a comment.
> I suspect that we will need to adjust the cost models or tests because the PhaseOrdering test in D98152 <https://reviews.llvm.org/D98152> still doesn't vectorize with only x86 SSE2 (it does change if I add an attribute for SSE4.1).
Taking a closer look at that example, I think this will actually be an improvement (i.e., we should adjust the test, not the cost model).
Currently, we favor vectorization based on x86 SSE2 costs, but that seems wrong.
Without vectorization, we get a chain of cmov instructions:
movl (%rdi), %eax
movl 4(%rdi), %ecx
cmpl %eax, %ecx
cmovll %ecx, %eax
movl 8(%rdi), %ecx
cmpl %eax, %ecx
cmovll %ecx, %eax
movl 12(%rdi), %ecx
cmpl %eax, %ecx
cmovll %ecx, %eax
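For reference, the scalar chain above is a signed-min reduction over four consecutive 32-bit ints. Each `cmpl %eax, %ecx` / `cmovll %ecx, %eax` pair keeps the new element only if it is signed-less-than the running result. This is the kind of integer min/max reduction the patch title refers to. Below is a minimal C sketch of what the chain computes. The function name `smin4_scalar` is hypothetical and is not taken from the patch or the test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the signed-min reduction that the cmov chain
   computes for the four consecutive 32-bit ints at p. */
static int32_t smin4_scalar(const int32_t *p) {
    assert(p != NULL);
    int32_t m = p[0];                 /* movl (%rdi), %eax                 */
    for (int i = 1; i < 4; ++i) {
        int32_t x = p[i];             /* movl 4*i(%rdi), %ecx              */
        /* cmpl %eax, %ecx ; cmovll %ecx, %eax: take x if x < m (signed). */
        m = (x < m) ? x : m;
    }
    return m;
}
```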
With vectorization (but without the expected min/max instructions such as pminsd, or even blendv, all of which require SSE4.1), we get more code plus a transfer from an xmm register to a GPR:
movdqu (%rdi), %xmm0
pshufd $238, %xmm0, %xmm1 # xmm1 = xmm0[2,3,2,3]
movdqa %xmm1, %xmm2
pcmpgtd %xmm0, %xmm2
pand %xmm2, %xmm0
pandn %xmm1, %xmm2
por %xmm0, %xmm2
pshufd $85, %xmm2, %xmm0 # xmm0 = xmm2[1,1,1,1]
movdqa %xmm0, %xmm1
pcmpgtd %xmm2, %xmm1
pand %xmm1, %xmm2
pandn %xmm0, %xmm1
por %xmm2, %xmm1
movd %xmm1, %eax
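To make the SSE2 sequence easier to follow, here is a portable C emulation of it. It is a sketch, not SSE2 intrinsics. Each xmm register is modelled as four `int32_t` lanes. Without a vector min or blend instruction, each reduction step does three things:
- It builds an all-ones/all-zeros mask with `pcmpgtd`.
- It selects between the two operands using `pand`/`pandn`/`por`.
- It halves the active width with `pshufd` (immediate 238 for `[2,3,2,3]`, then 85 for `[1,1,1,1]`).

All function and type names (`v4`, `pshufd`, `pcmpgtd`, `blend_by_mask`, `smin4_sse2_emulated`) are hypothetical illustrations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an xmm register: four signed 32-bit lanes, lane 0 first. */
typedef struct { int32_t lane[4]; } v4;

/* pshufd $imm: result lane i takes source lane ((imm >> 2*i) & 3). */
static v4 pshufd(v4 a, unsigned imm) {
    v4 r;
    for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[(imm >> (2 * i)) & 3];
    return r;
}

/* pcmpgtd: all ones (-1) where a > b as signed ints, else 0. */
static v4 pcmpgtd(v4 a, v4 b) {
    v4 r;
    for (int i = 0; i < 4; ++i) r.lane[i] = (a.lane[i] > b.lane[i]) ? -1 : 0;
    return r;
}

/* pand + pandn + por: per lane, mask ? a : b, done purely with bitwise ops. */
static v4 blend_by_mask(v4 mask, v4 a, v4 b) {
    v4 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = (mask.lane[i] & a.lane[i]) | (~mask.lane[i] & b.lane[i]);
    return r;
}

/* Hypothetical emulation of the SSE2 signed-min reduction listed above. */
static int32_t smin4_sse2_emulated(const int32_t *p) {
    assert(p != NULL);
    v4 x0;
    for (int i = 0; i < 4; ++i) x0.lane[i] = p[i];   /* movdqu (%rdi), %xmm0 */

    /* Step 1: fold lanes 2,3 onto lanes 0,1. */
    v4 hi = pshufd(x0, 238);                         /* xmm1 = xmm0[2,3,2,3] */
    v4 gt = pcmpgtd(hi, x0);                         /* mask: hi > x0        */
    v4 m = blend_by_mask(gt, x0, hi);                /* min(x0, hi) per lane */

    /* Step 2: fold lane 1 onto lane 0. */
    v4 l1 = pshufd(m, 85);                           /* xmm0 = xmm2[1,1,1,1] */
    gt = pcmpgtd(l1, m);                             /* mask: l1 > m         */
    m = blend_by_mask(gt, m, l1);                    /* min(m, l1) per lane  */

    return m.lane[0];                                /* movd %xmm1, %eax     */
}
```

Counting the listings, the vector version is 14 instructions versus 10 for the cmov chain, before accounting for the xmm-to-GPR transfer.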
https://reviews.llvm.org/D98981