[PATCH] D53037: [InstCombine] combine a shuffle and an extract subvector shuffle
Florian Hahn via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 5 08:37:28 PST 2019
fhahn added a comment.
Herald added a project: LLVM.
I found a case where this combine causes a codegen regression on AArch64. In the example below, `%s0` puts data into a 128-bit vector, and `%s1` and `%s2` extract its lower and upper halves. Without folding `%s0` and `%s1`, we can generate a single AArch64 tbl instruction for `%s0` and a mov instruction for `%s2`. With the fold in this patch, we generate three additional instructions: an additional tbl for `%s2` and two instructions to load the mask.
So on AArch64 the combine produces worse code when the top-level shuffle can be lowered to a single tbl instruction and its lower and upper halves are then extracted, which is cheap. Do you have an idea how best to address the issue?
define <8 x i16> @test(<16 x i8> %s) {
entry:
  %0 = sub <16 x i8> <i8 undef, i8 undef, i8 undef, i8 -1, i8 undef, i8 undef, i8 undef, i8 -1, i8 undef, i8 undef, i8 undef, i8 -1, i8 undef, i8 undef, i8 undef, i8 -1>, %s
  %s0 = shufflevector <16 x i8> %0, <16 x i8> undef, <16 x i32> <i32 3, i32 3, i32 3, i32 3, i32 7, i32 7, i32 7, i32 7, i32 11, i32 11, i32 11, i32 11, i32 15, i32 15, i32 15, i32 15>
  %s1 = shufflevector <16 x i8> %s0, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  %s2 = shufflevector <16 x i8> %s0, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
  %a = call <8 x i16> @fn(<8 x i8> %s1, <8 x i8> %s2) #6
  ret <8 x i16> %a
}

declare <8 x i16> @fn(<8 x i8>, <8 x i8>)
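
For reference, here is a small sketch of the mask arithmetic behind folding an extract-subvector shuffle into the shuffle that feeds it, using the masks from the example above. It is plain Python rather than the InstCombine code from the patch; the helper name `compose_masks` and the use of -1 for an undef lane are assumptions made for illustration, and it only covers the single-source case where the second shuffle operand is undef.

```python
UNDEF = -1  # illustrative stand-in for an undef mask element


def compose_masks(inner, outer):
    """Return the mask equivalent to applying `inner` and then `outer`.

    Assumes both shuffles are single-source (second operand undef), so every
    non-undef index in `outer` selects a lane of the inner shuffle's result.
    """
    result = []
    for idx in outer:
        if idx == UNDEF:
            result.append(UNDEF)       # an undef lane stays undef
        else:
            result.append(inner[idx])  # look through the inner shuffle
    return result


# Mask of %s0 and the two extract masks of %s1 / %s2 from the example.
s0_mask = [3, 3, 3, 3, 7, 7, 7, 7, 11, 11, 11, 11, 15, 15, 15, 15]
low_half = list(range(0, 8))
high_half = list(range(8, 16))

folded_s1 = compose_masks(s0_mask, low_half)
folded_s2 = compose_masks(s0_mask, high_half)
print(folded_s1)
print(folded_s2)
```

Each composed mask reads directly from `%0`, so a folded half no longer shares the 128-bit shuffle `%s0` with the other half. That seems consistent with the extra tbl and mask loads described above, although the exact lowering depends on the backend.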
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D53037/new/
https://reviews.llvm.org/D53037