[PATCH] D148068: [AArch64] Lower fused complex multiply-add intrinsic to AArch64::FCMA

Wed Apr 12 03:49:01 PDT 2023

NickGuy added a comment.

Not sure I agree with having a high-level complex intrinsic (though if done right, I'm not completely against the idea); It locks the IR into the concept of a complex multiply, rather than the individual instructions that make it up. This could result in lower net performance as other optimisation passes aren't able to see how the intrinsic functions internally, meaning they can't apply their optimisations. 
I don't think having a common intrinsic that maps to only one or two backends is worth supporting, especially as mentioned by Igor, llvm/lib/CodeGen/ComplexDeinterleaving.cpp handles this stuff on a per-target basis already.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:5778-5789
+  case ISD::FCMA: {
+    SDLoc dl(Op);
+    SDValue vcmla0 =
+        DAG.getTargetConstant(Intrinsic::aarch64_neon_vcmla_rot0, dl, MVT::i64);
+    SDValue vcmla90 = DAG.getTargetConstant(Intrinsic::aarch64_neon_vcmla_rot90,
+                                            dl, MVT::i64);
+    SDValue Part1 =
----------------
I may be missing it; But where do you handle targets that don't support complex instructions? I appreciate that nothing should be generating FCMAs yet, but if something matches a complex multiply-accumulate and emits one on a target that doesn't support it, the fcma should ideally be transformed to the equivalent fp operations (`shuffle(add(mul, mul), sub(mul, mul))`)

Also, this should probably be moved to its own function, if for nothing else than to match the style of the rest of the switch.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148068/new/

https://reviews.llvm.org/D148068