[PATCH] D19310: X86 TRUNCATE (v16i32 to v16i8) cost change in SSE4.1 mode

Ashutosh Nema via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 19 23:27:51 PDT 2016


ashutosh.nema created this revision.
ashutosh.nema added reviewers: hfinkel, congh, dexonsmith, davidxl, RKSimon.
ashutosh.nema added a subscriber: llvm-commits.
ashutosh.nema set the repository for this revision to rL LLVM.

The following patch transforms truncations between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine:
http://reviews.llvm.org/D14588

That patch improves code generation for the IR below, saving 22 instructions:
define void @truncate_v16i32_to_v16i8(<16 x i32> %a) {
  %1 = trunc <16 x i32> %a to <16 x i8>
  store <16 x i8> %1, <16 x i8>* undef, align 4
  ret void
}

With that patch we generate better code for this truncate, so its cost needs to be lowered, but it looks like the cost was changed only in the SSE2 table.
The same improvement applies to SSE4.1, so the truncate cost should be changed there as well.

Prior to that patch, in SSE4.1 mode we generated the code below:
        pextrb  $4, %xmm0, %eax
        pextrb  $8, %xmm0, %ecx
        pextrb  $12, %xmm0, %edx
        pinsrb  $1, %eax, %xmm0
        pinsrb  $2, %ecx, %xmm0
        pinsrb  $3, %edx, %xmm0
        pextrb  $0, %xmm1, %eax
        pinsrb  $4, %eax, %xmm0
        pextrb  $4, %xmm1, %eax
        pinsrb  $5, %eax, %xmm0
        pextrb  $8, %xmm1, %eax
        pinsrb  $6, %eax, %xmm0
        pextrb  $12, %xmm1, %eax
        pinsrb  $7, %eax, %xmm0
        pextrb  $0, %xmm2, %eax
        pinsrb  $8, %eax, %xmm0
        pextrb  $4, %xmm2, %eax
        pinsrb  $9, %eax, %xmm0
        pextrb  $8, %xmm2, %eax
        pinsrb  $10, %eax, %xmm0
        pextrb  $12, %xmm2, %eax
        pinsrb  $11, %eax, %xmm0
        pextrb  $0, %xmm3, %eax
        pinsrb  $12, %eax, %xmm0
        pextrb  $4, %xmm3, %eax
        pinsrb  $13, %eax, %xmm0
        pextrb  $8, %xmm3, %eax
        pinsrb  $14, %eax, %xmm0
        pextrb  $12, %xmm3, %eax
        pinsrb  $15, %eax, %xmm0
        movdqu  %xmm0, (%rax)
        retq

After that patch we generate better code:
        movdqa  .LCPI0_0(%rip), %xmm4   # xmm4 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
        pand    %xmm4, %xmm3
        pand    %xmm4, %xmm2
        packuswb        %xmm3, %xmm2
        pand    %xmm4, %xmm1
        pand    %xmm4, %xmm0
        packuswb        %xmm1, %xmm0
        packuswb        %xmm2, %xmm0
        movdqu  %xmm0, (%rax)
        retq
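
To make it easier to see why the pand/packuswb sequence is a correct truncate, here is a small portable C++ model of it. This is only an illustrative sketch, not code from LLVM: the names Xmm, fromI32Lanes, pand, packuswb and truncV16I32ToV16I8 are hypothetical, and the registers are modelled as plain 16-byte arrays rather than real SSE registers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: a 128-bit XMM register as 16 bytes, lowest byte first.
using Xmm = std::array<std::uint8_t, 16>;

// Lay out four 32-bit lanes in one register (little-endian byte order).
Xmm fromI32Lanes(const std::uint32_t *lanes) {
  Xmm r{};
  for (std::size_t i = 0; i < 4; ++i)
    for (std::size_t b = 0; b < 4; ++b)
      r[4 * i + b] = static_cast<std::uint8_t>(lanes[i] >> (8 * b));
  return r;
}

// PAND: bitwise AND of two registers.
Xmm pand(const Xmm &a, const Xmm &b) {
  Xmm r{};
  for (std::size_t i = 0; i < 16; ++i)
    r[i] = static_cast<std::uint8_t>(a[i] & b[i]);
  return r;
}

// Read 16-bit lane i of a register as a signed value.
std::int16_t wordAt(const Xmm &x, std::size_t i) {
  const std::uint16_t raw =
      static_cast<std::uint16_t>(x[2 * i] | (x[2 * i + 1] << 8));
  return static_cast<std::int16_t>(raw);
}

// Unsigned saturation of a signed 16-bit value to the range [0, 255].
std::uint8_t saturateToU8(std::int16_t v) {
  if (v < 0)
    return 0;
  if (v > 255)
    return 255;
  return static_cast<std::uint8_t>(v);
}

// PACKUSWB with dst as the destination operand and src as the source:
// the low 8 result bytes come from dst's words, the high 8 from src's.
Xmm packuswb(const Xmm &dst, const Xmm &src) {
  Xmm r{};
  for (std::size_t i = 0; i < 8; ++i) {
    r[i] = saturateToU8(wordAt(dst, i));
    r[8 + i] = saturateToU8(wordAt(src, i));
  }
  return r;
}

// Mirrors the sequence above: 4 x pand, then 3 x packuswb.
Xmm truncV16I32ToV16I8(const std::array<std::uint32_t, 16> &in) {
  Xmm mask{};
  for (std::size_t i = 0; i < 16; i += 4)
    mask[i] = 255; // [255,0,0,0, 255,0,0,0, ...]
  const Xmm x0 = pand(fromI32Lanes(&in[0]), mask);
  const Xmm x1 = pand(fromI32Lanes(&in[4]), mask);
  const Xmm x2 = pand(fromI32Lanes(&in[8]), mask);
  const Xmm x3 = pand(fromI32Lanes(&in[12]), mask);
  const Xmm lo = packuswb(x0, x1); // packuswb %xmm1, %xmm0
  const Xmm hi = packuswb(x2, x3); // packuswb %xmm3, %xmm2
  return packuswb(lo, hi);         // packuswb %xmm2, %xmm0
}

// Checks the model against a plain per-lane truncation.
void checkTruncation(const std::array<std::uint32_t, 16> &in) {
  const Xmm out = truncV16I32ToV16I8(in);
  for (std::size_t i = 0; i < 16; ++i)
    assert(out[i] == static_cast<std::uint8_t>(in[i]));
}
```

The mask matters because packuswb saturates rather than truncates: after the pand every 16-bit word is already in [0, 255], so saturation never changes a value and the pack behaves like a plain truncate.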

I propose reducing the cost of “TRUNCATE v16i32 to v16i8” from 30 to 7 in the SSE4.1 table.
This should enable better vectorization, since “TRUNCATE v16i32 to v16i8” is no longer very expensive.
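
For readers unfamiliar with how such a table entry is consumed, the sketch below shows the general idea of a conversion cost lookup. It is a simplified standalone illustration, not LLVM's actual data structures or API: the Op and VT enums, ConvCostEntry, SSE41ConvTable and lookupConversionCost are hypothetical names, and the rows are copied from the X86TargetTransformInfo.cpp hunk in this patch with the proposed value of 7.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the opcode and value-type enums.
enum class Op { Truncate };
enum class VT { v4i8, v8i8, v16i8, v4i16, v8i16, v16i16, v4i32, v8i32, v16i32 };

// One row: converting `src` to `dst` with operation `op` costs `cost`.
struct ConvCostEntry {
  Op op;
  VT dst;
  VT src;
  int cost;
};

// Rows taken from the diff context, with the proposed cost of 7.
const ConvCostEntry SSE41ConvTable[] = {
    {Op::Truncate, VT::v16i16, VT::v16i32, 6},
    {Op::Truncate, VT::v8i16, VT::v8i32, 3},
    {Op::Truncate, VT::v4i16, VT::v4i32, 1},
    {Op::Truncate, VT::v16i8, VT::v16i32, 7}, // was 30 before this patch
    {Op::Truncate, VT::v8i8, VT::v8i32, 3},
    {Op::Truncate, VT::v4i8, VT::v4i32, 1},
    {Op::Truncate, VT::v16i8, VT::v16i16, 3},
};

// Linear search over the table; returns -1 when no row matches, in which
// case a caller would fall back to some other table or a generic estimate.
int lookupConversionCost(Op op, VT dst, VT src) {
  for (const ConvCostEntry &e : SSE41ConvTable)
    if (e.op == op && e.dst == dst && e.src == src)
      return e.cost;
  return -1;
}
```

With the old value of 30, a client such as the vectorizer would see this truncate as roughly four times more expensive than with the new value. The new value of 7 matches the existing SSE2 entry and the 4 pand plus 3 packuswb instructions in the improved sequence.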


Repository:
  rL LLVM

http://reviews.llvm.org/D19310

Files:
  lib/Target/X86/X86TargetTransformInfo.cpp
  test/Analysis/CostModel/X86/sse-itoi.ll

Index: test/Analysis/CostModel/X86/sse-itoi.ll
===================================================================
--- test/Analysis/CostModel/X86/sse-itoi.ll
+++ test/Analysis/CostModel/X86/sse-itoi.ll
@@ -279,7 +279,7 @@
 ; SSE2: cost of 7 {{.*}} trunc
 ;
 ; SSE41: truncate_v16i32_to_v16i8
-; SSE41: cost of 30 {{.*}} trunc
+; SSE41: cost of 7 {{.*}} trunc
 ;
   %1 = load <16 x i32>, <16 x i32>* %a
   %2 = trunc <16 x i32> %1 to <16 x i8>
Index: lib/Target/X86/X86TargetTransformInfo.cpp
===================================================================
--- lib/Target/X86/X86TargetTransformInfo.cpp
+++ lib/Target/X86/X86TargetTransformInfo.cpp
@@ -731,7 +731,7 @@
     { ISD::TRUNCATE,    MVT::v16i16, MVT::v16i32, 6 },
     { ISD::TRUNCATE,    MVT::v8i16,  MVT::v8i32,  3 },
     { ISD::TRUNCATE,    MVT::v4i16,  MVT::v4i32,  1 },
-    { ISD::TRUNCATE,    MVT::v16i8,  MVT::v16i32, 30 },
+    { ISD::TRUNCATE,    MVT::v16i8,  MVT::v16i32, 7 },
     { ISD::TRUNCATE,    MVT::v8i8,   MVT::v8i32,  3 },
     { ISD::TRUNCATE,    MVT::v4i8,   MVT::v4i32,  1 },
     { ISD::TRUNCATE,    MVT::v16i8,  MVT::v16i16, 3 },

