[PATCH] D138203: [AArch64][InstCombine] Simplify repeated complex patterns in dupqlane

Thu Dec 15 08:33:43 PST 2022

sdesmalen added a comment.

Could you maybe update the commit message, it seems to have the title twice :)

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1390
+  auto VecTy = cast<ScalableVectorType>(II.getType());
+  Type *VecElTy = VecTy->getScalarType();
+
----------------
nit: VecTy->getScalarType() can be propagated into the few uses below.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1400-1402
+  auto VecInsert = dyn_cast<IntrinsicInst>(II.getOperand(0));
+  if (!VecInsert || VecInsert->getIntrinsicID() != Intrinsic::vector_insert)
+    return None;
----------------
nit: This can be moved to the start of the function, so that it bails out immediately if the conditions are not as expected.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1401
+  auto VecInsert = dyn_cast<IntrinsicInst>(II.getOperand(0));
+  if (!VecInsert || VecInsert->getIntrinsicID() != Intrinsic::vector_insert)
+    return None;
----------------
Should it also bail out if the vector being inserted is not a FixedLengthVector?

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1408
+  uint64_t NumElements = VecTy->getMinNumElements();
+  for (uint64_t i = NumElements; i > 0; i--) {
+    InsertElementInst *InsertElt = dyn_cast<InsertElementInst>(CurrentValue);
----------------
nit: LLVM's coding style uses capitalised variable names, so `I` in this case.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1408
+  uint64_t NumElements = VecTy->getMinNumElements();
+  for (uint64_t i = NumElements; i > 0; i--) {
+    InsertElementInst *InsertElt = dyn_cast<InsertElementInst>(CurrentValue);
----------------
sdesmalen wrote:
> nit: LLVM's coding style uses capitalised variable names, so `I` in this case.
Does this algorithm work if the vector being inserted is not a power-of-two vector? (I guess when it is inserting into an 'undef' vector, you could still consider the other values to be anything you like and the algorithm may still work).

Could you also add a test-case for this?

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1425-1428
+  // Check for the pattern (y, x, y, x) or (y, x, y, x, y, x, y, x)
+  for (uint64_t i = 0; i < NumElements - 2; i++)
+    if (RSequence[i] != RSequence[i + 2])
+      return None;
----------------
This is a little bit restrictive, because it could also work for e.g.

<16 x i8>  <a, b, c, d, a, b, c, d, a, b, c, d, a, b, c, d>

where only `<a, b, c, d>` would need to be splat.

It might help instead to find the 'minimum' set by recursively halving the vector and seeing if all elements match. e.g.

  <a, b, a, b, a, b, a, b>
  => <a, b, a, b> == <a, b, a, b>
  => <a, b> == <a, b>

so that the minimum set to splat is `<a, b>`

================
Comment at: llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-dupqlane.ll:86
+
+define dso_local <vscale x 8 x half> @dupq_f16_complex_missing_middle(half %a, half %b) {
+; CHECK-LABEL: @dupq_f16_complex_missing_middle(
----------------
This case could still work right? If the two elements that are missing are both undef, they could be anything including `a, b`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138203/new/

https://reviews.llvm.org/D138203