[PATCH] D147451: [CodeGen] Enable AArch64 SVE FCMLA/FCADD instruction generation in ComplexDeinterleaving
Nicholas Guy via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 4 08:51:47 PDT 2023
NickGuy added inline comments.
================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:260-261
+ NodePtr identifyRoot(Instruction *I);
+
+ NodePtr identifyDeinterleave(Instruction *I, Instruction *J);
+
----------------
Might be worth adding a comment here saying what a "Deinterleave" is; It's not immediately clear that a deinterleave is either a specific intrinsic, or a shufflevector instruction.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:24473-24476
- if ((VTyWidth < 128 && VTyWidth != 64) || !llvm::isPowerOf2_32(VTyWidth))
+ if ((VTyWidth < 128 && (VTy->isScalableTy() || VTyWidth != 64)) ||
+ !llvm::isPowerOf2_32(VTyWidth))
return false;
return (ScalarTy->isHalfTy() && Subtarget->hasFullFP16()) ||
----------------
When working with scalable vectors, they don't have the same restriction of bit width. Treating them with a max width of 128 bits seems wasteful and inefficient, is there any way to get the vector width at compile time (is there a `target->getMaxVectorWidth()` or something)?
================
Comment at: llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add.ll:102-107
+; Expected to not transform
+define <4 x half> @complex_add_v4f16_with_intrinsic(<4 x half> %a, <4 x half> %b) {
+; CHECK-LABEL: complex_add_v4f16_with_intrinsic:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: fcadd v0.4h, v1.4h, v0.4h, #90
+; CHECK-NEXT: ret
----------------
Is the top comment correct? Is this case actually expected to not transform?
Given the `fcadd` in the output, I'd assume that this is expected to transform.
================
Comment at: llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-add.ll:101
}
+
+; Expected to not transform
----------------
igor.kirillov wrote:
> I am not sure if I should duplicate all fixed-width vector tests to reflect the fact that `llvm.experimental.vector.deinterleave2` is now supported there.
It can't hurt, more coverage is always good. On the other hand, this is probably enough to test that the intrinsics work with fixed width cases.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D147451/new/
https://reviews.llvm.org/D147451
More information about the llvm-commits
mailing list