[PATCH] D157628: [AArch64][SVE2] Change the cost of extends with S/URHADD to 0
David Sherwood via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 22 05:16:57 PDT 2023
david-arm added a comment.
This looks really nice now @kmclaughlin! I just had a few more comments ...
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2060
+ // Check that the cast is doubling the source type.
+ if ((Src->getScalarSizeInBits() != Dst->getScalarSizeInBits() / 2) ||
+ ExtUser->getOpcode() != Instruction::Add || !ExtUser->hasOneUse())
----------------
Is it worth having a negative test for cases like i16->i64 that we don't support?
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2075
+
+ auto *Trunc = dyn_cast_or_null<Instruction>(Shr->getUniqueUndroppableUser());
+ if (!Trunc || Trunc->getOpcode() != Instruction::Trunc ||
----------------
Unless I've missed something, this seems to be saying if the final trunc has more than one user then don't treat the sext/zext as free, right? That seems a bit restrictive because this is the end of the pattern required to get urhadd/srhadd matched in the backend. In theory, I don't see why you can't have
%ld1 = load <vscale x 16 x i8>, ptr %gep1
%ld2 = load <vscale x 16 x i8>, ptr %gep2
%ext1 = sext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
%ext2 = sext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
%add1 = add nuw nsw <vscale x 16 x i16> %ext1, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
%add2 = add nuw nsw <vscale x 16 x i16> %add1, %ext2
%shr = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
%trunc = trunc <vscale x 16 x i16> %shr to <vscale x 16 x i8>
store <vscale x 16 x i16> %trunc, ptr %a, align 16
ret <vscale x 16 x i8> %trunc
and you should still get
ld1b ...
ld1b ...
srhadd z0, ...
st1b z0 ...
ret
================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve2-ext-rhadd.ll:1
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s -check-prefix=SVE
+; RUN: opt -passes="print<cost-model>" 2>&1 -disable-output -mtriple aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s --check-prefix=SVE2
----------------
Given this file now contains fixed-width vectors testing for NEON's urhadd and shade is it worth renaming the file to something like `ext-rhadd.ll`?
================
Comment at: llvm/test/Analysis/CostModel/AArch64/sve2-ext-rhadd.ll:15
+;
+ %gep1 = getelementptr inbounds i8, ptr %a, i64 %n
+ %gep2 = getelementptr inbounds i8, ptr %b, i64 %n
----------------
I think you can simplify the tests here and remove the GEPs, then just pass the ptr directly to the load? That way you can also remove the `%n` argument too. For example,
%ld1 = load <16 x i8>, ptr %a
%ld2 = load <16 x i8>, ptr %b
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D157628/new/
https://reviews.llvm.org/D157628
More information about the llvm-commits
mailing list