[PATCH] D157628: [AArch64][SVE2] Change the cost of extends with S/URHADD to 0
Kerry McLaughlin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 22 08:47:54 PDT 2023
kmclaughlin added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2060
+ // Check that the cast is doubling the source type.
+ if ((Src->getScalarSizeInBits() != Dst->getScalarSizeInBits() / 2) ||
+ ExtUser->getOpcode() != Instruction::Add || !ExtUser->hasOneUse())
----------------
david-arm wrote:
> Is it worth having a negative test for cases like i16->i64 that we don't support?
When trying to add tests for this I realised that we will still generate s/urhadd instructions when the extend is not doubling. For example, extending from i16->i64:
```
ptrue p0.h
ld1h { z0.h }, p0/z, [x0]
ld1h { z1.h }, p0/z, [x1]
srhadd z0.h, p0/m, z0.h, z1.h
```
Instead I've replaced this with a check that the source type is legal and added some tests for cases such as the above.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:2075
+
+ auto *Trunc = dyn_cast_or_null<Instruction>(Shr->getUniqueUndroppableUser());
+ if (!Trunc || Trunc->getOpcode() != Instruction::Trunc ||
----------------
david-arm wrote:
> Unless I've missed something, this seems to be saying if the final trunc has more than one user then don't treat the sext/zext as free, right? That seems a bit restrictive because this is the end of the pattern required to get urhadd/srhadd matched in the backend. In theory, I don't see why you can't have
>
> %ld1 = load <vscale x 16 x i8>, ptr %gep1
> %ld2 = load <vscale x 16 x i8>, ptr %gep2
> %ext1 = sext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
> %ext2 = sext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
> %add1 = add nuw nsw <vscale x 16 x i16> %ext1, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
> %add2 = add nuw nsw <vscale x 16 x i16> %add1, %ext2
> %shr = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
> %trunc = trunc <vscale x 16 x i16> %shr to <vscale x 16 x i8>
> store <vscale x 16 x i16> %trunc, ptr %a, align 16
> ret <vscale x 16 x i8> %trunc
>
> and you should still get
>
> ld1b ...
> ld1b ...
> srhadd z0, ...
> st1b z0 ...
> ret
>
>
This is checking that the `LShr` instruction only has one user, then the checks below make sure that this is a truncate. There is no check that the truncate has only one use, so the extends will still be treated as free if the rest of the pattern matches.
I've amended the tests to add a store of the final truncate so that this is tested.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D157628/new/
https://reviews.llvm.org/D157628
More information about the llvm-commits
mailing list