[PATCH] D47019: [X86] Lowering rotation intrinsics to native IR

Fri May 18 06:24:12 PDT 2018

spatel added a comment.

Have you seen the current llvm-dev thread about adding a generic rotate intrinsic?
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html

This transform causes problems when some of the resulting instructions get hoisted out of loops (and loops are likely where rotate performance matters most).

Here's a minimal example to demonstrate:

  #include <immintrin.h>

  void rotateInLoop(unsigned *x, unsigned N, __m128i *a, __m128i b) {
    for (unsigned i = 0; i < N; ++i)
      x[ _mm_extract_epi32(_mm_rolv_epi32(a[i], b), 0) ] = i;
  }

Before this patch:

  $ ./clang rotv.c -S -O1 -o - -mavx512vl
  ...
  LBB0_2:       
  	vmovdqa	(%rdx), %xmm1
  	vprolvd	%xmm0, %xmm1, %xmm1
  	vmovd	%xmm1, %esi
  	movslq	%esi, %rsi
  	movl	%ecx, (%rdi,%rsi,4)
  	incq	%rcx
  	addq	$16, %rdx
  	cmpq	%rcx, %rax
  	jne	LBB0_2

After this patch:

  LBB0_2: 
  	vmovdqa	(%rdx), %xmm2
  	vpsllvd	%xmm0, %xmm2, %xmm3
  	vpsrlvd	%xmm1, %xmm2, %xmm2
  	vpor	%xmm3, %xmm2, %xmm2
  	vmovd	%xmm2, %esi
  	movslq	%esi, %rsi
  	movl	%ecx, (%rdi,%rsi,4)
  	incq	%rcx
  	addq	$16, %rdx
  	cmpq	%rcx, %rax
  	jne	LBB0_2

I think you'll need to either implement this first:
https://bugs.llvm.org/show_bug.cgi?id=37417
...or limit this patch to the non-variable (immediate-count) rotates, or just wait for the generic intrinsic?

Repository:
  rL LLVM

https://reviews.llvm.org/D47019