[PATCH] D138203: [AArch64][InstCombine] Simplify repeated complex patterns in dupqlane
Matt Devereau via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 9 07:30:55 PST 2023
MattDevereau added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1468-1469
+ IntrinsicInst &II) {
+ if (II.getType()->isIntOrIntVectorTy())
+ return std::nullopt;
+
----------------
sdesmalen wrote:
> Why is this optimisation not applied to integer types?
Oops, I ran into problems with integer types on the first pass of this patch, but the codegen is now quite nice, with a single fmov:
```
fmov s0, w0
mov v0.h[1], w1
mov v0.h[2], w2
mov v0.h[3], w3
mov z0.d, d0
ret
```
for example with `<a, b, c, d>` as i16s. I'll remove this restriction.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1471-1472
+
+ auto VecInsert = dyn_cast<IntrinsicInst>(II.getOperand(0));
+ if (!VecInsert || VecInsert->getIntrinsicID() != Intrinsic::vector_insert ||
+ !isa<FixedVectorType>(VecInsert->getOperand(1)->getType()))
----------------
sdesmalen wrote:
> It might be easier to write this as:
>
> Value *CurrentInsertElt = nullptr, *Default = nullptr;
> if (!match(II.getOperand(0),
>            m_Intrinsic<Intrinsic::vector_insert>(
>                m_Value(Default), m_Value(CurrentInsertElt), m_Value())) ||
>     !isa<FixedVectorType>(CurrentInsertElt->getType()))
>
> as that matches the operands in one go.
That looks a bit nicer, done.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1481
+ SmallVector<Value *> InsertEltVec(NumElements, nullptr);
+ for (uint64_t I = NumElements; I > 0; I--) {
+ InsertElementInst *InsertElt =
----------------
sdesmalen wrote:
> Does this need to be a decrementing loop? Or could you write:
>
> while (InsertElementInst *InsertElt =
>            dyn_cast<InsertElementInst>(CurrentInsertElt)) {
>   auto Idx = cast<ConstantInt>(InsertElt->getOperand(2));
>   InsertEltVec[Idx->getValue().getZExtValue()] = InsertElt->getOperand(1);
>   CurrentInsertElt = InsertElt->getOperand(0);
> }
It doesn't need to be a decrementing loop. This works well too, so I've put it in.
================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1491
+
+ bool AllowPoison = isa<PoisonValue>(VecInsert->getOperand(0));
+ // Bail out if there is no pattern found
----------------
sdesmalen wrote:
> You'll also need to check the lanes of the original vector being inserted into:
>
> %1 = insertelement <4 x float> poison, float %x, i64 0 ; <--- this poison value must be checked instead of the llvm.vector.insert one.
> %2 = insertelement <4 x float> %1, float %y, i64 1
> %3 = tail call <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v4f32(<vscale x 4 x float> poison, <4 x float> %2, i64 0)
> %4 = tail call <vscale x 4 x float> @llvm.aarch64.sve.dupq.lane.nxv4f32(<vscale x 4 x float> %3, i64 0)
>
> However, you'll also need to check the default value of the llvm.vector.insert for this case:
>
> %1 = insertelement <2 x float> poison, float %x, i64 0
> %2 = insertelement <2 x float> %1, float %y, i64 1
> %3 = tail call <vscale x 4 x float> @llvm.vector.insert.nxv4f32.v2f32(<vscale x 4 x float> poison, <2 x float> %2, i64 0) ; the full subvector is defined, but not all lanes in the subvector that's being dupq'ed.
> %4 = tail call <vscale x 4 x float> @llvm.aarch64.sve.dupq.lane.nxv4f32(<vscale x 4 x float> %3, i64 0)
We discussed outside the review that this would ideally be done in a follow-up patch.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D138203/new/
https://reviews.llvm.org/D138203