[PATCH] D19310: X86 TRUNCATE (v16i32 to v16i8) cost change in SSE4.1 mode
Ashutosh Nema via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 19 23:27:51 PDT 2016
ashutosh.nema created this revision.
ashutosh.nema added reviewers: hfinkel, congh, dexonsmith, davidxl, RKSimon.
ashutosh.nema added a subscriber: llvm-commits.
ashutosh.nema set the repository for this revision to rL LLVM.
The patch below transforms truncations between integer vectors into X86ISD::PACKUS/PACKSS operations during DAG combine:
http://reviews.llvm.org/D14588
That change improves code generation for the following IR, saving 22 instructions:
define void @truncate_v16i32_to_v16i8(<16 x i32> %a) {
  %1 = trunc <16 x i32> %a to <16 x i8>
  store <16 x i8> %1, <16 x i8>* undef, align 4
  ret void
}
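For reference, `trunc <16 x i32> %a to <16 x i8>` keeps the low 8 bits of each 32-bit lane independently. A minimal portable C++ model of those semantics (the names `V16i32`, `V16i8` and `truncV16i32ToV16i8` are hypothetical, used only for illustration):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V16i32 = std::array<uint32_t, 16>; // models <16 x i32>
using V16i8 = std::array<uint8_t, 16>;   // models <16 x i8>

// Scalar model of `trunc <16 x i32> %a to <16 x i8>`:
// every lane keeps only its low 8 bits.
V16i8 truncV16i32ToV16i8(const V16i32 &a) {
  V16i8 r{};
  for (int i = 0; i < 16; ++i)
    r[i] = static_cast<uint8_t>(a[i] & 0xFFu);
  return r;
}
```

Both lowerings shown in this mail must produce exactly this result; they differ only in how many instructions it takes.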
With that patch we generate better code for truncates, so the truncate cost needs to be updated; however, it appears to have been changed only in the SSE2 table.
The improvement applies to SSE4.1 as well, so the SSE4.1 cost of this truncate should be reduced too.
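Why updating only the SSE2 entry is not enough: on an SSE4.1-capable subtarget the SSE4.1 table appears to be consulted before the SSE2 table, so its stale entry still wins. The test expectations in this patch (SSE2 cost 7, SSE41 cost 30 before the change) are consistent with that. Below is a simplified sketch of such a most-specific-first lookup, assuming string-keyed tables; `ConvEntry`, `lookup` and `truncCost` are hypothetical names, not LLVM's actual cost-table API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified conversion cost-table row:
// (opcode, destination type, source type, cost).
struct ConvEntry {
  std::string op, dst, src;
  int cost;
};

// Return the cost of the first matching row, or -1 if there is none.
int lookup(const std::vector<ConvEntry> &tbl, const std::string &op,
           const std::string &dst, const std::string &src) {
  for (const auto &e : tbl)
    if (e.op == op && e.dst == dst && e.src == src)
      return e.cost;
  return -1;
}

// Most-specific table first: an SSE4.1 subtarget checks the SSE4.1 table and
// falls back to the SSE2 table only when no SSE4.1 row matches.
int truncCost(bool hasSSE41, const std::vector<ConvEntry> &sse41,
              const std::vector<ConvEntry> &sse2) {
  if (hasSSE41) {
    int c = lookup(sse41, "TRUNCATE", "v16i8", "v16i32");
    if (c >= 0)
      return c;
  }
  return lookup(sse2, "TRUNCATE", "v16i8", "v16i32");
}
```

With an SSE4.1 table still holding 30, an SSE4.1 target keeps reporting 30 no matter what the SSE2 table says.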
Prior to that patch, we generated the following 32 instructions for SSE4.1:
  pextrb $4, %xmm0, %eax
  pextrb $8, %xmm0, %ecx
  pextrb $12, %xmm0, %edx
  pinsrb $1, %eax, %xmm0
  pinsrb $2, %ecx, %xmm0
  pinsrb $3, %edx, %xmm0
  pextrb $0, %xmm1, %eax
  pinsrb $4, %eax, %xmm0
  pextrb $4, %xmm1, %eax
  pinsrb $5, %eax, %xmm0
  pextrb $8, %xmm1, %eax
  pinsrb $6, %eax, %xmm0
  pextrb $12, %xmm1, %eax
  pinsrb $7, %eax, %xmm0
  pextrb $0, %xmm2, %eax
  pinsrb $8, %eax, %xmm0
  pextrb $4, %xmm2, %eax
  pinsrb $9, %eax, %xmm0
  pextrb $8, %xmm2, %eax
  pinsrb $10, %eax, %xmm0
  pextrb $12, %xmm2, %eax
  pinsrb $11, %eax, %xmm0
  pextrb $0, %xmm3, %eax
  pinsrb $12, %eax, %xmm0
  pextrb $4, %xmm3, %eax
  pinsrb $13, %eax, %xmm0
  pextrb $8, %xmm3, %eax
  pinsrb $14, %eax, %xmm0
  pextrb $12, %xmm3, %eax
  pinsrb $15, %eax, %xmm0
  movdqu %xmm0, (%rax)
  retq
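The old sequence moves one byte per `pextrb`/`pinsrb` pair. Byte 0 of `%xmm0` is already in place, so the remaining 15 lanes take 15 extracts plus 15 inserts, i.e. 30 instructions, the same number as the old SSE4.1 cost entry. A portable C++ model of that element-by-element pattern, assuming each register is represented as 16 little-endian bytes (`Xmm` and `truncByExtractInsert` are hypothetical names):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Xmm = std::array<uint8_t, 16>; // one 128-bit register as 16 bytes

// Model of the pre-patch lowering. Lane k of the <16 x i32> input lives in
// register k / 4 at byte offset (k % 4) * 4 (its low byte, little-endian).
// pextrb reads that byte and pinsrb writes it to byte k of the result.
// numInsns receives the number of pextrb/pinsrb instructions used.
Xmm truncByExtractInsert(const std::array<Xmm, 4> &src, int &numInsns) {
  Xmm dst = src[0]; // the result is built in place in %xmm0
  numInsns = 0;
  for (int lane = 1; lane < 16; ++lane) { // lane 0 is already in byte 0
    const Xmm &reg = src[lane / 4];
    uint8_t byte = reg[(lane % 4) * 4]; // pextrb $((lane % 4) * 4)
    dst[lane] = byte;                   // pinsrb $lane
    numInsns += 2;
  }
  return dst;
}
```

The model reads from the unmodified sources, which mirrors why the real code extracts bytes 4, 8 and 12 of `%xmm0` before overwriting them.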
After it, we generate much better code (10 instructions):
  movdqa .LCPI0_0(%rip), %xmm4 # xmm4 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
  pand %xmm4, %xmm3
  pand %xmm4, %xmm2
  packuswb %xmm3, %xmm2
  pand %xmm4, %xmm1
  pand %xmm4, %xmm0
  packuswb %xmm1, %xmm0
  packuswb %xmm2, %xmm0
  movdqu %xmm0, (%rax)
  retq
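The new sequence relies on `packuswb`, which narrows signed 16-bit words to unsigned bytes with saturation. The `pand` with `[255,0,0,0,...]` first clears all but the low byte of each 32-bit lane, so every 16-bit word is either 0 (an upper half) or in 0..255, and no saturation can occur; three packs arranged as a tree then yield the truncated bytes in order. A portable C++ model of that data flow, assuming registers are 16 little-endian bytes (all names are hypothetical; this models the emitted instructions, not the DAG combine itself):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Xmm = std::array<uint8_t, 16>; // 128-bit register, little-endian bytes

// Read the i-th signed 16-bit word of a register.
static int16_t word(const Xmm &x, int i) {
  return static_cast<int16_t>(x[2 * i] | (x[2 * i + 1] << 8));
}

// packuswb in Intel operand order (dst, src): signed words -> unsigned bytes
// with saturation; dst's 8 words fill the low half, src's 8 words the high.
Xmm packuswb(const Xmm &dst, const Xmm &src) {
  auto sat = [](int16_t w) -> uint8_t {
    return w < 0 ? 0 : (w > 255 ? 255 : static_cast<uint8_t>(w));
  };
  Xmm r{};
  for (int i = 0; i < 8; ++i) {
    r[i] = sat(word(dst, i));
    r[i + 8] = sat(word(src, i));
  }
  return r;
}

// pand with [255,0,0,0] repeated: keep only byte 0 of each 32-bit lane.
Xmm maskLowBytes(Xmm x) {
  for (int i = 0; i < 16; ++i)
    if (i % 4 != 0)
      x[i] = 0;
  return x;
}

// Post-patch lowering of trunc <16 x i32> to <16 x i8> (x0 holds lanes 0..3).
Xmm truncByPack(const Xmm &x0, const Xmm &x1, const Xmm &x2, const Xmm &x3) {
  Xmm hi = packuswb(maskLowBytes(x2), maskLowBytes(x3)); // words = lanes 8..15
  Xmm lo = packuswb(maskLowBytes(x0), maskLowBytes(x1)); // words = lanes 0..7
  return packuswb(lo, hi);                               // bytes = lanes 0..15
}
```

Not counting the constant load of the mask, this is 4 `pand` + 3 `packuswb` = 7 operations, which lines up with the proposed cost of 7.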
This patch proposes reducing the cost of “TRUNCATE v16i32 to v16i8” in the SSE4.1 table from 30 to 7, matching the SSE2 table.
Since “TRUNCATE v16i32 to v16i8” is no longer expensive, the lower cost should enable better vectorization decisions.
Repository:
rL LLVM
http://reviews.llvm.org/D19310
Files:
lib/Target/X86/X86TargetTransformInfo.cpp
test/Analysis/CostModel/X86/sse-itoi.ll
Index: test/Analysis/CostModel/X86/sse-itoi.ll
===================================================================
--- test/Analysis/CostModel/X86/sse-itoi.ll
+++ test/Analysis/CostModel/X86/sse-itoi.ll
@@ -279,7 +279,7 @@
; SSE2: cost of 7 {{.*}} trunc
;
; SSE41: truncate_v16i32_to_v16i8
-; SSE41: cost of 30 {{.*}} trunc
+; SSE41: cost of 7 {{.*}} trunc
;
%1 = load <16 x i32>, <16 x i32>* %a
%2 = trunc <16 x i32> %1 to <16 x i8>
Index: lib/Target/X86/X86TargetTransformInfo.cpp
===================================================================
--- lib/Target/X86/X86TargetTransformInfo.cpp
+++ lib/Target/X86/X86TargetTransformInfo.cpp
@@ -731,7 +731,7 @@
{ ISD::TRUNCATE, MVT::v16i16, MVT::v16i32, 6 },
{ ISD::TRUNCATE, MVT::v8i16, MVT::v8i32, 3 },
{ ISD::TRUNCATE, MVT::v4i16, MVT::v4i32, 1 },
- { ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, 30 },
+ { ISD::TRUNCATE, MVT::v16i8, MVT::v16i32, 7 },
{ ISD::TRUNCATE, MVT::v8i8, MVT::v8i32, 3 },
{ ISD::TRUNCATE, MVT::v4i8, MVT::v4i32, 1 },
{ ISD::TRUNCATE, MVT::v16i8, MVT::v16i16, 3 },