[PATCH] D14730: [Aarch64] Add zero cost for extensions that may be eliminated.

Mon Nov 16 14:17:26 PST 2015

mssimpso created this revision.
mssimpso added reviewers: sbaranga, rengolin, jmolloy, mcrosier.
mssimpso added a subscriber: llvm-commits.
Herald added subscribers: rengolin, aemerson.

Many AArch64 vector instructions have lengthening and widening variants (e.g.,
usubl, usubl2, usubw, usubw2, etc.). For certain widths, these instructions
automatically extend their operands. The cost model should be able to take this
information into account and report a cost of zero for vector extensions that
can likely be folded away.

For a concrete example, consider the contrived IR shown below after
vectorization.

 define <8 x i32> @f(<8 x i16>* %a, <8 x i16>* %b) {
 entry:
   %opa = load <8 x i16>, <8 x i16>* %a
   %opb = load <8 x i16>, <8 x i16>* %b
   %zea = zext <8 x i16> %opa to <8 x i32>
   %zeb = zext <8 x i16> %opb to <8 x i32>
   %res  = sub nsw <8 x i32> %zea, %zeb
   ret <8 x i32> %res
 }

Note that the subtraction cannot be performed in i16 due to possible overflow.
However, the AArch64 backend eliminates the zero extensions and generates the
following code.

 ldr     q0, [x0]
 ldr     q2, [x1]
 usubl2  v1.4s, v0.8h, v2.8h
 usubl   v0.4s, v0.4h, v2.4h
 ret
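
To make the overflow remark concrete, here is a minimal scalar sketch of a
single lane. The helper names (subNarrow, subLong, checkOverflowExample) are
hypothetical and only model the arithmetic; they are not part of the patch or
of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Subtract in the narrow type: the result wraps modulo 2^16.
constexpr std::uint16_t subNarrow(std::uint16_t A, std::uint16_t B) {
  return static_cast<std::uint16_t>(A - B);
}

// Model of one usubl lane: zero-extend both operands to 32 bits, then
// subtract in the wide type.
constexpr std::uint32_t subLong(std::uint16_t A, std::uint16_t B) {
  return static_cast<std::uint32_t>(A) - static_cast<std::uint32_t>(B);
}

// 1 - 2 differs depending on where the extension happens, so the zext
// cannot simply be moved after an i16 subtraction.
void checkOverflowExample() {
  assert(subNarrow(1, 2) == 0xFFFFu);     // wrapped in 16 bits
  assert(subLong(1, 2) == 0xFFFFFFFFu);   // -1 as a 32-bit pattern
  assert(static_cast<std::uint32_t>(subNarrow(1, 2)) != subLong(1, 2));
}
```

When no wrap occurs (e.g., 5 - 3), both helpers agree, which is why the
extension only matters for correctness in the overflowing cases.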

This change is motivated by forthcoming improvements to SLP. Unfortunately,
vectorization of similar cases is currently deemed unprofitable due to the high
cost of the zero extensions, even though they will eventually be eliminated.
For example, SLP calculates a cost of 36 for zext <8 x i16>: 44 (vector cost) -
8 (scalar cost).
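
The arithmetic above can be sketched as follows. The helper names are
hypothetical, and the profitability rule (vectorize only when the difference
is negative, i.e., a threshold of zero) is an assumption made for
illustration. The last two assertions further assume that the vector cost
drops to zero while the scalar cost stays at 8.

```cpp
#include <cassert>

// Hypothetical helper: cost difference between the vector and scalar forms.
// A negative value means the vector form is estimated to be cheaper.
constexpr int costDelta(int VectorCost, int ScalarCost) {
  return VectorCost - ScalarCost;
}

// Assumption: vectorization is considered profitable only below zero.
constexpr bool isProfitable(int Delta) { return Delta < 0; }

void checkWorkedExample() {
  // Numbers quoted in the text: 44 (vector) - 8 (scalar) = 36.
  assert(costDelta(44, 8) == 36);
  assert(!isProfitable(costDelta(44, 8)));
  // Assumed outcome with a zero-cost vector extension.
  assert(costDelta(0, 8) == -8);
  assert(isProfitable(costDelta(0, 8)));
}
```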

This patch reports a cost of zero for vector extensions having types that could
potentially be folded into lengthening and widening instructions. For these
cases, the width of the destination type must be twice that of the source type.
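
As a rough model of which casts are affected, the twelve zero-cost table
entries in the diff can be summarized by a single predicate. This is only an
illustrative sketch: VecTy and isFreeExtend are hypothetical names, and the
patch itself adds explicit table entries rather than a predicate like this.

```cpp
#include <cassert>

// Hypothetical stand-in for a fixed-width vector type such as v8i16.
struct VecTy {
  unsigned Lanes;
  unsigned EltBits;
};

// True for the sign/zero extensions given a cost of zero in the table:
// same lane count, destination elements twice as wide as the source
// elements, and a destination vector of 128 or 256 bits in total.
constexpr bool isFreeExtend(VecTy Dst, VecTy Src) {
  return Dst.Lanes == Src.Lanes && Dst.EltBits == 2 * Src.EltBits &&
         (Src.EltBits == 8 || Src.EltBits == 16 || Src.EltBits == 32) &&
         (Dst.Lanes * Dst.EltBits == 128 || Dst.Lanes * Dst.EltBits == 256);
}

void checkTableModel() {
  assert(isFreeExtend(VecTy{8, 32}, VecTy{8, 16}));  // v8i16 -> v8i32
  assert(isFreeExtend(VecTy{2, 64}, VecTy{2, 32}));  // v2i32 -> v2i64
  assert(!isFreeExtend(VecTy{4, 64}, VecTy{4, 16})); // 4x wider: not free
}
```

The last case corresponds to the v4i16 to v4i64 entry that keeps a nonzero
cost in the table, since the destination is four times as wide as the source.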

Performance analysis for spec2000 and spec2006 on a Cortex-A57-like device shows
minimal impact. A binary diff reveals only two modified benchmarks; their
performance impact is shown below (a positive difference indicates improvement),
though the results are likely within expected noise.

 Benchmark             Diff (%)
 ----------------    ----------
 spec2006/gcc         +0.299015
 spec2006/h264ref     +0.301558

http://reviews.llvm.org/D14730

Files:
  lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Index: lib/Target/AArch64/AArch64TargetTransformInfo.cpp
===================================================================
--- lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -188,12 +188,25 @@
 
   static const TypeConversionCostTblEntry
   ConversionTbl[] = {
-    { ISD::SIGN_EXTEND, MVT::v4i32, MVT::v4i16, 0 },
-    { ISD::ZERO_EXTEND, MVT::v4i32, MVT::v4i16, 0 },
-    { ISD::SIGN_EXTEND, MVT::v2i64, MVT::v2i32, 1 },
-    { ISD::ZERO_EXTEND, MVT::v2i64, MVT::v2i32, 1 },
-    { ISD::TRUNCATE,    MVT::v4i32, MVT::v4i64, 0 },
-    { ISD::TRUNCATE,    MVT::v4i16, MVT::v4i32, 1 },
+    { ISD::TRUNCATE, MVT::v4i32, MVT::v4i64, 0 },
+    { ISD::TRUNCATE, MVT::v4i16, MVT::v4i32, 1 },
+
+    // Lengthening and widening instructions (L/W variants), including those
+    // with the second half specifier (2 suffix), perform extensions
+    // automatically. Since many operations have L/W variants, let's
+    // optimistically assume the following casts will always be folded away.
+    { ISD::ZERO_EXTEND, MVT::v8i16,  MVT::v8i8,  0 },
+    { ISD::ZERO_EXTEND, MVT::v4i32,  MVT::v4i16, 0 },
+    { ISD::ZERO_EXTEND, MVT::v2i64,  MVT::v2i32, 0 },
+    { ISD::ZERO_EXTEND, MVT::v16i16, MVT::v16i8, 0 },
+    { ISD::ZERO_EXTEND, MVT::v8i32,  MVT::v8i16, 0 },
+    { ISD::ZERO_EXTEND, MVT::v4i64,  MVT::v4i32, 0 },
+    { ISD::SIGN_EXTEND, MVT::v8i16,  MVT::v8i8,  0 },
+    { ISD::SIGN_EXTEND, MVT::v4i32,  MVT::v4i16, 0 },
+    { ISD::SIGN_EXTEND, MVT::v2i64,  MVT::v2i32, 0 },
+    { ISD::SIGN_EXTEND, MVT::v16i16, MVT::v16i8, 0 },
+    { ISD::SIGN_EXTEND, MVT::v8i32,  MVT::v8i16, 0 },
+    { ISD::SIGN_EXTEND, MVT::v4i64,  MVT::v4i32, 0 },
 
     // The number of shll instructions for the extension.
     { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i16, 3 },

