[llvm] f2a92db - [AArch64] Don't treat SVE scalable extends as free widening instructions

Wed Nov 30 05:09:53 PST 2022

Author: David Green
Date: 2022-11-30T13:09:48Z
New Revision: f2a92db29eb7519a5eef8792b9c8622aa17e5853

URL: https://github.com/llvm/llvm-project/commit/f2a92db29eb7519a5eef8792b9c8622aa17e5853
DIFF: https://github.com/llvm/llvm-project/commit/f2a92db29eb7519a5eef8792b9c8622aa17e5853.diff

LOG: [AArch64] Don't treat SVE scalable extends as free widening instructions

The logic in isWideningInstruction handles instructions like uaddw and
smull, where 'add(x, zext(y))' or 'mul(sext(x), sext(y))' can be
converted to single instructions, making the extends free. This doesn't
apply the same to SVE instructions though.
https://godbolt.org/z/695d3nhGd

(There are instructions like SMULLT/B, but they require top/bottom lane
interleaving. That is similar to MVE instructions, which required a
special pass to perform the lane interleaving).

This patch just bails out of the call to isWideningInstruction if the
vector is scalable, getting a more accurate cost.

Differential Revision: https://reviews.llvm.org/D138591

Added: 
    

Modified: 
    llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
    llvm/test/Analysis/CostModel/AArch64/sve-widening-instruction.ll

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index a93d7c495a363..14f2b5b9d7dc0 100644

--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -1563,8 +1563,11 @@ bool AArch64TTIImpl::isWideningInstruction(Type *DstTy, unsigned Opcode,
   };
 
   // Exit early if DstTy is not a vector type whose elements are at least
-  // 16-bits wide.
-  if (!DstTy->isVectorTy() || DstTy->getScalarSizeInBits() < 16)
+  // 16-bits wide. SVE doesn't generally have the same set of instructions to
+  // perform an extend with the add/sub/mul. There are SMULLB style
+  // instructions, but they operate on top/bottom, requiring some sort of lane
+  // interleaving to be used with zext/sext.
+  if (!useNeonVector(DstTy) || DstTy->getScalarSizeInBits() < 16)
     return false;
 
   // Determine if the operation has a widening variant. We consider both the

diff  --git a/llvm/test/Analysis/CostModel/AArch64/sve-widening-instruction.ll b/llvm/test/Analysis/CostModel/AArch64/sve-widening-instruction.ll
index ba841dc3277dd..45b2c772499eb 100644
--- a/llvm/test/Analysis/CostModel/AArch64/sve-widening-instruction.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/sve-widening-instruction.ll
@@ -20,8 +20,8 @@ define <vscale x 4 x i32> @widening_nxv4i16(<vscale x 4 x i16> %in1, <vscale x 4
 
 define <vscale x 8 x i32> @widening_nxv8i16(<vscale x 8 x i16> %in1, <vscale x 8 x i16> %in2) {
 ; CHECK-LABEL: 'widening_nxv8i16'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %in1.ext = zext <vscale x 8 x i16> %in2 to <vscale x 8 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %in2.ext = zext <vscale x 8 x i16> %in2 to <vscale x 8 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %in1.ext = zext <vscale x 8 x i16> %in2 to <vscale x 8 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %in2.ext = zext <vscale x 8 x i16> %in2 to <vscale x 8 x i32>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %in.add = add <vscale x 8 x i32> %in1.ext, %in2.ext
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <vscale x 8 x i32> %in.add
 ;
@@ -33,8 +33,8 @@ define <vscale x 8 x i32> @widening_nxv8i16(<vscale x 8 x i16> %in1, <vscale x 8
 
 define <8 x i32> @widening_v8i16_svevl2(<8 x i16> %in1, <8 x i16> %in2) vscale_range(2,16) {
 ; CHECK-LABEL: 'widening_v8i16_svevl2'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %in1.ext = zext <8 x i16> %in2 to <8 x i32>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %in2.ext = zext <8 x i16> %in2 to <8 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %in1.ext = zext <8 x i16> %in2 to <8 x i32>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %in2.ext = zext <8 x i16> %in2 to <8 x i32>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %in.add = add <8 x i32> %in1.ext, %in2.ext
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i32> %in.add
 ;