[PATCH] D147451: [CodeGen] Enable AArch64 SVE FCMLA/FCADD instruction generation in ComplexDeinterleaving

Tue Apr 4 08:51:47 PDT 2023

NickGuy added inline comments.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:260-261
+  NodePtr identifyRoot(Instruction *I);
+
+  NodePtr identifyDeinterleave(Instruction *I, Instruction *J);
+
----------------
Might be worth adding a comment here saying what a "Deinterleave" is; It's not immediately clear that a deinterleave is either a specific intrinsic, or a shufflevector instruction.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:24473-24476
-  if ((VTyWidth < 128 && VTyWidth != 64) || !llvm::isPowerOf2_32(VTyWidth))
+  if ((VTyWidth < 128 && (VTy->isScalableTy() || VTyWidth != 64)) ||
+      !llvm::isPowerOf2_32(VTyWidth))
     return false;

   return (ScalarTy->isHalfTy() && Subtarget->hasFullFP16()) ||
----------------
When working with scalable vectors, they don't have the same restriction of bit width. Treating them with a max width of 128 bits seems wasteful and inefficient, is there any way to get the vector width at compile time (is there a `target->getMaxVectorWidth()` or something)?

================
Comment at: llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add.ll:102-107
+; Expected to not transform
+define <4 x half> @complex_add_v4f16_with_intrinsic(<4 x half> %a, <4 x half> %b) {
+; CHECK-LABEL: complex_add_v4f16_with_intrinsic:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    fcadd v0.4h, v1.4h, v0.4h, #90
+; CHECK-NEXT:    ret
----------------
Is the top comment correct? Is this case actually expected to not transform?
Given the `fcadd` in the output, I'd assume that this is expected to transform.

================
Comment at: llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add.ll:101
 }
+
+; Expected to not transform
----------------
igor.kirillov wrote:
> I am not sure if I should duplicate all fixed-width vector tests to reflect the fact that `llvm.experimental.vector.deinterleave2` is now supported there.
It can't hurt, more coverage is always good. On the other hand, this is probably enough to test that the intrinsics work with fixed width cases.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147451/new/

https://reviews.llvm.org/D147451