[PATCH] D157628: [AArch64][SVE2] Change the cost of extends with S/URHADD to 0

Tue Aug 22 05:16:57 PDT 2023

david-arm added a comment.

This looks really nice now @kmclaughlin! I just had a few more comments ...

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2060
+  // Check that the cast is doubling the source type.
+  if ((Src->getScalarSizeInBits() != Dst->getScalarSizeInBits() / 2) ||
+      ExtUser->getOpcode() != Instruction::Add || !ExtUser->hasOneUse())
----------------
Is it worth having a negative test for cases like i16->i64 that we don't support?

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2075
+
+  auto *Trunc = dyn_cast_or_null<Instruction>(Shr->getUniqueUndroppableUser());
+  if (!Trunc || Trunc->getOpcode() != Instruction::Trunc ||
----------------
Unless I've missed something, this seems to be saying if the final trunc has more than one user then don't treat the sext/zext as free, right? That seems a bit restrictive because this is the end of the pattern required to get urhadd/srhadd matched in the backend. In theory, I don't see why you can't have

  %ld1 = load <vscale x 16 x i8>, ptr %gep1
  %ld2 = load <vscale x 16 x i8>, ptr %gep2
  %ext1 = sext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
  %ext2 = sext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
  %add1 = add nuw nsw <vscale x 16 x i16> %ext1, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
  %add2 = add nuw nsw <vscale x 16 x i16> %add1, %ext2
  %shr = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
  %trunc = trunc <vscale x 16 x i16> %shr to <vscale x 16 x i8>
  store <vscale x 16 x i16> %trunc, ptr %a, align 16
  ret <vscale x 16 x i8> %trunc

and you should still get

  ld1b ...
  ld1b ...
  srhadd z0, ...
  st1b z0 ...
  ret

================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve2-ext-rhadd.ll:1
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s -check-prefix=SVE
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s --check-prefix=SVE2
----------------
Given this file now contains fixed-width vectors testing for NEON's urhadd and shade is it worth renaming the file to something like `ext-rhadd.ll`?

================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve2-ext-rhadd.ll:15
+;
+  %gep1 = getelementptr inbounds i8, ptr %a, i64 %n
+  %gep2 = getelementptr inbounds i8, ptr %b, i64 %n
----------------
I think you can simplify the tests here and remove the GEPs, then just pass the ptr directly to the load? That way you can also remove the `%n` argument too. For example,

  %ld1 = load <16 x i8>, ptr %a
  %ld2 = load <16 x i8>, ptr %b

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157628/new/

https://reviews.llvm.org/D157628