[PATCH] D20443: [PowerPC] - Combine loads of v4i8 to loads of i32 followed by bitcast
Nemanja Ivanovic via llvm-commits
llvm-commits at lists.llvm.org
Thu May 19 11:33:11 PDT 2016
nemanjai added a comment.
An example of the type of code we were getting...
With a code pattern such as this:
define <16 x i8> @test(i32* %s, i32* %t) {
entry:
%0 = bitcast i32* %s to <4 x i8>*
%1 = load <4 x i8>, <4 x i8>* %0, align 4
%2 = shufflevector <4 x i8> %1, <4 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
ret <16 x i8> %2
}
We now get the following:
lwz 3, 0(3)
mtvsrd 34, 3
xxswapd 0, 34
xxspltw 34, 0, 3
Before this change, we were getting:
lbz 5, 0(3)
lbz 6, 1(3)
addis 4, 2, .LCPI0_0@toc@ha
addi 4, 4, .LCPI0_0@toc@l
mtvsrd 34, 5
lbz 5, 2(3)
lbz 3, 3(3)
lxvd2x 0, 0, 4
mtvsrd 35, 6
xxswapd 34, 34
mtvsrd 36, 5
mtvsrd 37, 3
xxswapd 35, 35
xxswapd 36, 36
xxswapd 37, 37
vmrglw 2, 3, 2
xxswapd 50, 0
vmrglw 19, 5, 4
vperm 2, 19, 2, 18
xxsldwi 12, 34, 34, 2
mfvsrwz 3, 34
xxsldwi 1, 34, 34, 1
xxsldwi 2, 34, 34, 3
mtvsrd 34, 3
mfvsrwz 3, 12
mfvsrwz 4, 1
mtvsrd 35, 3
mfvsrwz 3, 2
mtvsrd 36, 4
mtvsrd 37, 3
addis 3, 2, .LCPI0_1@toc@ha
xxswapd 34, 34
addi 3, 3, .LCPI0_1@toc@l
xxswapd 37, 37
lxvd2x 13, 0, 3
xxswapd 35, 35
xxswapd 36, 36
vmrglb 2, 5, 2
xxswapd 51, 13
vmrglb 3, 4, 3
vmrglh 2, 2, 3
xxspltw 34, 34, 3
vperm 2, 2, 2, 19
- The TOC loads above materialize the constant mask vectors for the vector shuffles that this pattern degrades into.
This code pattern simulates one that comes out of SROA, and this patch provides a big improvement in one of the benchmarks.
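For anyone who wants to convince themselves of the equivalence, here is a minimal C sketch (not part of the patch; all function names are hypothetical). It compares a byte-at-a-time splat, loosely mirroring the four lbz loads in the old output, with a single 4-byte load that is replicated as a word, loosely mirroring the lwz + xxspltw sequence in the new output. Both go through memory in the same byte order, so the results should match regardless of endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill 16 bytes by reading the 4 source bytes one at a
   time, analogous to the per-byte loads in the old code sequence. */
void splat4_bytewise(const uint8_t *src, uint8_t out[16]) {
    for (int i = 0; i < 16; ++i)
        out[i] = src[i % 4];
}

/* Hypothetical helper: read the 4 source bytes as one 32-bit word, then
   replicate that word into all four word lanes of the output. */
void splat4_wordwise(const uint8_t *src, uint8_t out[16]) {
    uint32_t word;
    memcpy(&word, src, sizeof word);             /* single 4-byte load */
    for (int lane = 0; lane < 4; ++lane)
        memcpy(out + 4 * lane, &word, sizeof word);
}

/* Checks that both strategies produce the same 16 bytes for one input. */
void check_splat_equivalence(void) {
    const uint8_t src[4] = {0x11, 0x22, 0x33, 0x44};
    uint8_t a[16], b[16];
    splat4_bytewise(src, a);
    splat4_wordwise(src, b);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

This only illustrates why treating the <4 x i8> load as an i32 load preserves the bytes; it says nothing about how the combine itself is implemented in the backend.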
Repository:
rL LLVM
http://reviews.llvm.org/D20443