[PATCH] D47019: [X86] Lowering rotation intrinsics to native IR
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 18 06:24:12 PDT 2018
spatel added a comment.
Have you seen the current llvm-dev thread about adding a generic rotate intrinsic?
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html
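This transform has problems when some of the instructions get hoisted out of loops: once part of the expanded shift/or sequence lives outside the loop body, the backend no longer matches it back to a rotate instruction (and loops are likely the most important consideration for perf).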
Here's a minimal example to demonstrate:
  #include <immintrin.h>
  void rotateInLoop(unsigned *x, unsigned N, __m128i *a, __m128i b) {
    for (unsigned i = 0; i < N; ++i)
      x[ _mm_extract_epi32(_mm_rolv_epi32(a[i], b), 0) ] = i;
  }
Before this patch:
  $ ./clang rotv.c -S -O1 -o - -mavx512vl
  ...
  LBB0_2:
    vmovdqa (%rdx), %xmm1
    vprolvd %xmm0, %xmm1, %xmm1
    vmovd   %xmm1, %esi
    movslq  %esi, %rsi
    movl    %ecx, (%rdi,%rsi,4)
    incq    %rcx
    addq    $16, %rdx
    cmpq    %rcx, %rax
    jne     LBB0_2
After this patch:
  LBB0_2:
    vmovdqa (%rdx), %xmm2
    vpsllvd %xmm0, %xmm2, %xmm3
    vpsrlvd %xmm1, %xmm2, %xmm2
    vpor    %xmm3, %xmm2, %xmm2
    vmovd   %xmm2, %esi
    movslq  %esi, %rsi
    movl    %ecx, (%rdi,%rsi,4)
    incq    %rcx
    addq    $16, %rdx
    cmpq    %rcx, %rax
    jne     LBB0_2
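To make the hoisting issue concrete, here is a rough scalar sketch of one 32-bit lane of a variable rotate-left written as a shift/shift/or sequence. The function name rotl32_expanded is hypothetical, and the masking is an assumption about what such an expansion looks like, not a copy of what the patch emits. The point is that the right-shift amount depends only on the rotate amount, so it is loop-invariant whenever the rotate amount is; that matches %xmm1 in the loop above, which is read by vpsrlvd but never written inside the loop.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of a variable rotate-left,
   expressed as two shifts and an or. Illustrative only. */
static uint32_t rotl32_expanded(uint32_t x, uint32_t n) {
    /* Left-shift amount, reduced modulo the lane width. */
    uint32_t left = n & 31;
    /* Right-shift amount. It depends only on n, so when n is loop-invariant
       this computation can be hoisted out of the loop, leaving only the two
       shifts and the or inside the loop body. */
    uint32_t right = (32 - left) & 31;
    return (x << left) | (x >> right);
}
```

With the subtraction hoisted, the loop body contains only the two shifts and the or, which is the vpsllvd/vpsrlvd/vpor sequence in the output above.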
I think you'll either need to implement this first:
https://bugs.llvm.org/show_bug.cgi?id=37417
...or limit this patch to the non-variable rotates, or just wait for the generic intrinsic?
Repository:
rL LLVM
https://reviews.llvm.org/D47019