[PATCH] D138203: [AArch64][InstCombine] Simplify repeated complex patterns in dupqlane
Matt Devereau via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 9 07:30:55 PST 2023
MattDevereau added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1468-1469
+ IntrinsicInst &II) {
+ if (II.getType()->isIntOrIntVectorTy())
+ return std::nullopt;
+
----------------
sdesmalen wrote:
> Why is this optimisation not applied to integer types?
Oops, I ran into problems with integer types on the first pass of this patch, but the codegen is now quite nice, with a single fmov:
```
fmov s0, w0
mov v0.h[1], w1
mov v0.h[2], w2
mov v0.h[3], w3
mov z0.d, d0
ret
```
for example with `<a, b, c, d>` as i16s. I'll remove this restriction.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1471-1472
+
+ auto VecInsert = dyn_cast<IntrinsicInst>(II.getOperand(0));
+ if (!VecInsert || VecInsert->getIntrinsicID() != Intrinsic::vector_insert ||
+ !isa<FixedVectorType>(VecInsert->getOperand(1)->getType()))
----------------
sdesmalen wrote:
> It might be easier to write this as:
>
> Value *CurrentInsertElt = nullptr, *Default = nullptr;
> if (!match(II.getOperand(0),
>            m_Intrinsic<Intrinsic::vector_insert>(
>                m_Value(Default), m_Value(CurrentInsertElt), m_Value())) ||
>     !isa<FixedVectorType>(CurrentInsertElt->getType()))
>
> as that matches the operands in one go.
That looks a bit nicer, done.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1481
+ SmallVector<Value *> InsertEltVec(NumElements, nullptr);
+ for (uint64_t I = NumElements; I > 0; I--) {
+ InsertElementInst *InsertElt =
----------------
sdesmalen wrote:
> Does this need to be a decrementing loop? Or could you write:
>
> while (InsertElementInst *InsertElt =
>            dyn_cast<InsertElementInst>(CurrentInsertElt)) {
>   auto Idx = cast<ConstantInt>(InsertElt->getOperand(2));
>   InsertEltVec[Idx->getValue().getZExtValue()] = InsertElt->getOperand(1);
>   CurrentInsertElt = InsertElt->getOperand(0);
> }
It doesn't need to be a decrementing loop. This works well too, so I've put it in.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1491
+
+ bool AllowPoison = isa<PoisonValue>(VecInsert->getOperand(0));
+ // Bail out if there is no pattern found
----------------
sdesmalen wrote:
> You'll also need to check the lanes of the original vector being inserted into:
>
> %1 = insertelement <4 x float> poison, float %x, i64 0 ; <--- this poison value must be checked instead of the llvm.vector.insert one.
> %2 = insertelement <4 x float> %1, float %y, i64 1
> %3 = tail call <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v4f32(<vscale x 4 x float> poison, <4 x float> %2, i64 0)
> %4 = tail call <vscale x 4 x float> @llvm.aarch64.sve.dupq.lane.nxv4f32(<vscale x 4 x float> %3, i64 0)
>
> However, you'll also need to check the default value of the llvm.vector.insert for this case:
>
> %1 = insertelement <2 x float> poison, float %x, i64 0
> %2 = insertelement <2 x float> %1, float %y, i64 1
> %3 = tail call <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v2f32(<vscale x 4 x float> poison, <2 x float> %2, i64 0) ; the full subvector is defined, but not all lanes in the subvector that's being dupq'ed.
> %4 = tail call <vscale x 4 x float> @llvm.aarch64.sve.dupq.lane.nxv4f32(<vscale x 4 x float> %3, i64 0)
We discussed outside the review that this would ideally be done in a follow-up patch.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D138203/new/
https://reviews.llvm.org/D138203