[PATCH] D14588: [X86][SSE] Transform truncation from v8i32/v16i32 to v8i8/v16i8 into bitand and X86ISD::PACKUS operations during DAG combine.
Cong Hou via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 11 14:11:59 PST 2015
congh created this revision.
congh added reviewers: hfinkel, dexonsmith, RKSimon, davidxl.
congh added a subscriber: llvm-commits.
This patch transforms truncation from v8i32/v16i32 to v8i8/v16i8 into bitand and X86ISD::PACKUS operations during DAG combine. We don't do it in the lowering phase because, after type legalization, the original truncation is turned into a BUILD_VECTOR whose elements are each extracted from a vector and then truncated, and from that form it is difficult to perform this optimization. This greatly improves the performance of those two truncations. For example, for the following IR:
define void @truncate_v16i32_to_v16i8(<16 x i32> %a) {
  %1 = trunc <16 x i32> %a to <16 x i8>
  store <16 x i8> %1, <16 x i8>* undef, align 4
  ret void
}
On SSE2 this was previously compiled into 33 instructions:
movdqa %xmm3, -24(%rsp)
movdqa %xmm1, -56(%rsp)
movdqa %xmm2, -40(%rsp)
movdqa %xmm0, -72(%rsp)
punpcklbw %xmm3, %xmm1
punpcklbw %xmm2, %xmm0
punpcklbw %xmm1, %xmm0
movd -20(%rsp), %xmm1
movd -52(%rsp), %xmm2
movd -16(%rsp), %xmm3
movd -48(%rsp), %xmm4
punpcklbw %xmm3, %xmm4
movd -36(%rsp), %xmm3
movd -68(%rsp), %xmm5
movd -32(%rsp), %xmm6
movd -64(%rsp), %xmm7
punpcklbw %xmm6, %xmm7
punpcklbw %xmm4, %xmm7
punpcklbw %xmm7, %xmm0
punpcklbw %xmm1, %xmm2
punpcklbw %xmm3, %xmm5
punpcklbw %xmm2, %xmm5
movd -12(%rsp), %xmm1
movd -44(%rsp), %xmm2
punpcklbw %xmm1, %xmm2
movd -28(%rsp), %xmm1
movd -60(%rsp), %xmm3
punpcklbw %xmm1, %xmm3
punpcklbw %xmm2, %xmm3
punpcklbw %xmm3, %xmm5
punpcklbw %xmm5, %xmm0
movdqu %xmm0, (%rax)
retq
and now it is compiled into 10 instructions:
movdqa LCPI0_0(%rip), %xmm4
pand %xmm4, %xmm3
pand %xmm4, %xmm2
packuswb %xmm3, %xmm2
pand %xmm4, %xmm1
pand %xmm4, %xmm0
packuswb %xmm1, %xmm0
packuswb %xmm2, %xmm0
movdqu %xmm0, (%rax)
retq
which saves 23 instructions (many of them memory operations).
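To make the new sequence concrete, here is a rough scalar model in C of the 10 instructions above. It is a sketch under stated assumptions, not code from the patch: `xmm`, `from_i32x4`, `pand`, `packuswb`, `trunc_v16i32_to_v16i8` and `check_against_scalar` are hypothetical helpers, and the constant-pool entry LCPI0_0 is assumed to hold 0x000000FF in every i32 lane (its contents are not shown in the output above). The point it illustrates is why the bitand is needed: PACKUSWB saturates signed 16-bit words to [0, 255] rather than truncating them, so each i32 lane is first masked down to its low byte. After that every word is already in range, and two rounds of packing simply gather the low bytes in order.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit XMM register modelled as 16 little-endian bytes. */
typedef struct { uint8_t b[16]; } xmm;

/* Load four i32 lanes into a register (little-endian byte order). */
static xmm from_i32x4(const uint32_t v[4]) {
    xmm r;
    for (int j = 0; j < 4; ++j)
        for (int k = 0; k < 4; ++k)
            r.b[4 * j + k] = (uint8_t)(v[j] >> (8 * k));
    return r;
}

/* PAND: bitwise AND of two registers. */
static xmm pand(xmm a, xmm m) {
    for (int i = 0; i < 16; ++i)
        a.b[i] &= m.b[i];
    return a;
}

/* PACKUSWB in Intel operand order (dst, src): read each register as eight
   signed 16-bit words and saturate each to [0, 255]. Words of dst become
   bytes 0..7 of the result, words of src become bytes 8..15. */
static xmm packuswb(xmm dst, xmm src) {
    xmm r;
    for (int i = 0; i < 16; ++i) {
        const xmm *s = (i < 8) ? &dst : &src;
        int k = i % 8;
        int w = s->b[2 * k] | (s->b[2 * k + 1] << 8);
        if (w >= 0x8000)
            w -= 0x10000; /* reinterpret as signed 16-bit */
        r.b[i] = (uint8_t)(w < 0 ? 0 : (w > 255 ? 255 : w));
    }
    return r;
}

/* Mirrors the pand/packuswb sequence for <16 x i32> -> <16 x i8>. */
static void trunc_v16i32_to_v16i8(const uint32_t a[16], uint8_t out[16]) {
    const uint32_t ff[4] = {0xFFu, 0xFFu, 0xFFu, 0xFFu};
    xmm mask = from_i32x4(ff);             /* assumed contents of LCPI0_0 */
    xmm x0 = pand(from_i32x4(a + 0), mask);
    xmm x1 = pand(from_i32x4(a + 4), mask);
    xmm x2 = pand(from_i32x4(a + 8), mask);
    xmm x3 = pand(from_i32x4(a + 12), mask);
    xmm x01 = packuswb(x0, x1);            /* words now hold a0..a7  */
    xmm x23 = packuswb(x2, x3);            /* words now hold a8..a15 */
    xmm r = packuswb(x01, x23);            /* bytes hold a0..a15     */
    for (int i = 0; i < 16; ++i)
        out[i] = r.b[i];
}

/* Self-check: the packed result must equal plain scalar truncation. */
static void check_against_scalar(const uint32_t a[16]) {
    uint8_t out[16];
    trunc_v16i32_to_v16i8(a, out);
    for (int i = 0; i < 16; ++i)
        assert(out[i] == (uint8_t)a[i]);
}
```

Dropping the pand step would make a lane such as 0x1234 saturate to 0xFF instead of truncating to 0x34, which is exactly the behavior the mask avoids.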
http://reviews.llvm.org/D14588
Files:
lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/vector-trunc.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D14588.39971.patch
Type: text/x-patch
Size: 8907 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151111/1bc0cbdc/attachment.bin>