[PATCH] D48725: [SLP] Vectorize bit-parallel operations with SWAR.

Clement Courbet via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 29 08:32:02 PDT 2018


courbet added a comment.

In https://reviews.llvm.org/D48725#1147883, @RKSimon wrote:

> If we're only ever going to be using load/store + and/or/xor ops I wonder if we'd be better off doing this in the DAG alongside the LoadCombine handling? SLP is going to struggle with more general cases where the sizes of bundle elements differ.




There are other advantages to reusing the SLP vectorizer's infrastructure: besides loads/stores and logical operations, we also get shuffles for free. Consider this code:

  #include <cstdint>

  struct S {
    int32_t a;
    int32_t b;
    int64_t c;
    int32_t d;
  };
  
  S copy_2xi32(const S& s) {
    S result;
    result.a = s.b;
    result.b = s.a;
    return result;
  }

Without the change, this lowers to:

  copy_2xi32(S): # @copy_2xi32(S)
    mov eax, dword ptr [rsp + 12]
    mov dword ptr [rdi], eax
    mov eax, dword ptr [rsp + 8]
    mov dword ptr [rdi + 4], eax
    mov rax, rdi
    ret

With the change, this lowers to:

  0000000000000000 <_Z10copy_2xi32RK1S>:
     0:	f3 0f 7e 06          	movq   (%rsi),%xmm0
     4:	66 0f 70 c0 e1       	pshufd $0xe1,%xmm0,%xmm0
     9:	66 0f d6 07          	movq   %xmm0,(%rdi)
     d:	48 89 f8             	mov    %rdi,%rax
    10:	c3                   	retq   
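
For reference, here is a minimal C++ sketch of the bit-parallel (SWAR) idea in the patch title. The names (P, xor_scalar, xor_swar) are hypothetical and not from the patch; the point is only that two adjacent 32-bit XORs can be folded into a single 64-bit XOR, because XOR (like AND and OR) never moves bits across lane boundaries.

  #include <cstdint>
  #include <cstring>

  struct P {
    uint32_t a;
    uint32_t b;
  };

  // Scalar form: two independent 32-bit XORs on adjacent fields.
  P xor_scalar(const P& x, const P& y) {
    P r;
    r.a = x.a ^ y.a;
    r.b = x.b ^ y.b;
    return r;
  }

  // SWAR form: the same work done as one 64-bit XOR over the combined
  // storage of both fields; legal because XOR is bit-parallel (each result
  // bit depends only on the corresponding input bits).
  P xor_swar(const P& x, const P& y) {
    uint64_t xb, yb;
    std::memcpy(&xb, &x, sizeof xb);
    std::memcpy(&yb, &y, sizeof yb);
    uint64_t rb = xb ^ yb;
    P r;
    std::memcpy(&r, &rb, sizeof r);
    return r;
  }

This only works for operations where no bits cross lane boundaries, which is why the quote above mentions only load/store plus and/or/xor ops.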


Repository:
  rL LLVM

https://reviews.llvm.org/D48725




