[PATCH] D30137: [InstCombine] shrink truncated insertelement with constant operand
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 7 12:44:54 PST 2017
efriedma added a comment.
We can ignore deficiencies which are obviously bugs; I was thinking of something more like this:
define <16 x i8> @trunc_inselt1(<16 x i16> %a, i16 %x) {
  %vec = insertelement <16 x i16> %a, i16 %x, i32 1
  %trunc = trunc <16 x i16> %vec to <16 x i8>
  ret <16 x i8> %trunc
}

define <16 x i8> @trunc_inselt2(<16 x i16> %a, i8 %x) {
  %trunc = trunc <16 x i16> %a to <16 x i8>
  %vec = insertelement <16 x i8> %trunc, i8 %x, i32 1
  ret <16 x i8> %vec
}
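For reference, here is a rough Python model of the two IR functions, assuming (as the example seems to intend) that the i8 %x passed to trunc_inselt2 is the truncation of the i16 %x passed to trunc_inselt1. Vectors are modeled as plain lists of integers, and the helper name trunc8 is made up for this sketch; it is not an LLVM API.

```python
def trunc8(value):
    """Model `trunc ... to i8`: keep only the low 8 bits."""
    return value & 0xFF


def trunc_inselt1(a, x):
    """Insert the i16 scalar into lane 1 first, then truncate every lane to i8."""
    vec = list(a)
    vec[1] = x & 0xFFFF
    return [trunc8(lane) for lane in vec]


def trunc_inselt2(a, x):
    """Truncate every lane to i8 first, then insert the i8 scalar into lane 1."""
    narrowed = [trunc8(lane) for lane in a]
    narrowed[1] = x & 0xFF
    return narrowed


if __name__ == "__main__":
    a = [0x1234 + i for i in range(16)]
    x = 0xBEEF
    # Under the stated assumption, both orderings yield the same <16 x i8> value.
    print(trunc_inselt1(a, x) == trunc_inselt2(a, trunc8(x)))
```

Under that assumption the two forms agree on every lane, so any difference in the generated code comes purely from the order of the operations.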
For trunc_inselt2, the x86 backend produces:
trunc_inselt2:                          # @trunc_inselt2
        .cfi_startproc
# BB#0:
        movdqa  .LCPI1_0(%rip), %xmm2   # xmm2 = [255,255,255,255,255,255,255,255]
        pand    %xmm2, %xmm1
        pand    %xmm2, %xmm0
        packuswb %xmm1, %xmm0
        movdqa  .LCPI1_1(%rip), %xmm1   # xmm1 = [255,0,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
        pand    %xmm1, %xmm0
        movd    %edi, %xmm2
        psllw   $8, %xmm2
        pandn   %xmm2, %xmm1
        por     %xmm1, %xmm0
        retq
The pand+movd+psllw+pandn+por sequence is clearly a lot worse than a pinsrw.
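To make the cost of that tail concrete, here is a small Python sketch of what the five instructions after packuswb do to the register contents. It models each 128-bit xmm register as a Python integer with little-endian byte order. The function names mirror the instruction mnemonics, but the helpers and insert_byte1 itself are written for this illustration and assume the AT&T operand order shown in the listing.

```python
MASK128 = (1 << 128) - 1


def from_bytes(lanes):
    """Pack 16 byte lanes (lane 0 first) into a 128-bit integer."""
    return int.from_bytes(bytes(lanes), "little")


def to_bytes(reg):
    """Unpack a 128-bit integer into 16 byte lanes (lane 0 first)."""
    return list(reg.to_bytes(16, "little"))


def movd(r32):
    """movd %r32, %xmm: low 32 bits set from the GPR, upper bits zeroed."""
    return r32 & 0xFFFFFFFF


def psllw(reg, count):
    """psllw: shift each of the eight 16-bit words left independently."""
    out = 0
    for i in range(8):
        word = (reg >> (16 * i)) & 0xFFFF
        out |= ((word << count) & 0xFFFF) << (16 * i)
    return out


def pandn(dest, src):
    """pandn src, dest: dest = (NOT dest) AND src."""
    return (~dest & MASK128) & src


def insert_byte1(vec, edi):
    """Replay the pand/movd/psllw/pandn/por tail: put the low byte of edi in lane 1."""
    mask = from_bytes([255, 0] + [255] * 14)   # .LCPI1_1
    kept = vec & mask                          # pand  %xmm1, %xmm0 (clear lane 1)
    scalar = psllw(movd(edi), 8)               # movd + psllw (move byte 0 to byte 1)
    scalar = pandn(mask, scalar)               # pandn %xmm2, %xmm1 (keep only lane 1)
    return kept | scalar                       # por   %xmm1, %xmm0


if __name__ == "__main__":
    vec = from_bytes(range(16))
    print(to_bytes(insert_byte1(vec, 0xDEADBEAB)))
```

The upper bits of edi are treated as garbage here on purpose: the pandn step masks away everything except lane 1, so only the low byte of the scalar reaches the result.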
https://reviews.llvm.org/D30137