[PATCH] D20443: [PowerPC] - Combine loads of v4i8 to loads of i32 followed by bitcast

Thu May 19 11:33:11 PDT 2016

nemanjai added a comment.

Here is an example of the kind of code we were getting.
With an IR pattern such as this:

  define <16 x i8> @test(i32* %s, i32* %t) {
  entry:
    %0 = bitcast i32* %s to <4 x i8>*
    %1 = load <4 x i8>, <4 x i8>* %0, align 4
    %2 = shufflevector <4 x i8> %1, <4 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
    ret <16 x i8> %2
  }

We now get the following:

  lwz 3, 0(3)
  mtvsrd 34, 3
  xxswapd  0, 34
  xxspltw 34, 0, 3

and before this change, we were getting:

  lbz 5, 0(3)
  lbz 6, 1(3)
  addis 4, 2, .LCPI0_0@toc@ha
  addi 4, 4, .LCPI0_0@toc@l
  mtvsrd 34, 5
  lbz 5, 2(3)
  lbz 3, 3(3)
  lxvd2x 0, 0, 4
  mtvsrd 35, 6
  xxswapd  34, 34
  mtvsrd 36, 5
  mtvsrd 37, 3
  xxswapd  35, 35
  xxswapd  36, 36
  xxswapd  37, 37
  vmrglw 2, 3, 2
  xxswapd  50, 0
  vmrglw 19, 5, 4
  vperm 2, 19, 2, 18
  xxsldwi 12, 34, 34, 2
  mfvsrwz 3, 34
  xxsldwi 1, 34, 34, 1
  xxsldwi 2, 34, 34, 3
  mtvsrd 34, 3
  mfvsrwz 3, 12
  mfvsrwz 4, 1
  mtvsrd 35, 3
  mfvsrwz 3, 2
  mtvsrd 36, 4
  mtvsrd 37, 3
  addis 3, 2, .LCPI0_1@toc@ha
  xxswapd  34, 34
  addi 3, 3, .LCPI0_1@toc@l
  xxswapd  37, 37
  lxvd2x 13, 0, 3
  xxswapd  35, 35
  xxswapd  36, 36
  vmrglb 2, 5, 2
  xxswapd  51, 13
  vmrglb 3, 4, 3
  vmrglh 2, 2, 3
  xxspltw 34, 34, 3
  vperm 2, 2, 2, 19

- The TOC loads above (the addis/addi/lxvd2x sequences) materialize the constant mask vectors for the vector shuffles (vperm) that the old lowering degrades into.

This code pattern simulates one that comes out of SROA, and this patch provides a big improvement in one of the benchmarks.
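
For reference, the combine named in the title amounts to treating the <4 x i8> load as if the input had been written with a single i32 load followed by a bitcast. The following is a hand-written sketch of that equivalent IR, not output from the patch; the names @test_combined, %w and %v are made up for illustration. Since a bitcast is defined as if the value were stored and reloaded as the new type, the lane order matches the original <4 x i8> load.

  define <16 x i8> @test_combined(i32* %s) {
  entry:
    ; One 4-byte load in place of the <4 x i8> load.
    %w = load i32, i32* %s, align 4
    ; Reinterpret the 32 bits as four i8 lanes; no bits change.
    %v = bitcast i32 %w to <4 x i8>
    ; Same splat of the 4-byte group across the 16-byte vector as in the original.
    %r = shufflevector <4 x i8> %v, <4 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
    ret <16 x i8> %r
  }

This lines up with the short sequence shown earlier: one lwz for the load, then a move into a vector register and a word splat.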

Repository:
  rL LLVM

http://reviews.llvm.org/D20443