[PATCH] D148068: [AArch64] Lower fused complex multiply-add intrinsic to AArch64::FCMA

Wed Apr 12 23:43:37 PDT 2023

nujaa added a comment.

In D148068#4260921 <https://reviews.llvm.org/D148068#4260921>, @NickGuy wrote:

> Not sure I agree with having a high-level complex intrinsic (though if done right, I'm not completely against the idea); It locks the IR into the concept of a complex multiply, rather than the individual instructions that make it up. This could result in lower net performance as other optimisation passes aren't able to see how the intrinsic functions internally, meaning they can't apply their optimisations.

Regarding performance: in our use case of BLAS libraries, we manage to reach better performance than hand-optimised assembly on caxpy, cgemv and cgemm.
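
For readers less familiar with the BLAS routines named above, here is a minimal scalar sketch of what caxpy computes, `y = alpha * x + y` on single-precision complex vectors. The name `caxpy_ref` and the `std::vector` interface are hypothetical choices for illustration; this is not the code from the patch or from any BLAS library, and it says nothing about how the measured kernels are written. The point is only that each loop iteration is a single complex multiply-add.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical reference caxpy: y[i] = alpha * x[i] + y[i].
// Each iteration performs one complex multiply-add.
void caxpy_ref(std::complex<float> alpha,
               const std::vector<std::complex<float>>& x,
               std::vector<std::complex<float>>& y) {
    assert(x.size() == y.size());  // both vectors must have the same length
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += alpha * x[i];
    }
}
```

cgemv and cgemm have the same complex multiply-add at the core of their inner loops, with a matrix in place of one of the vectors.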

> But where do you handle targets that don't support complex instructions? I appreciate that nothing should be generating FCMAs yet.

I have indeed not added support for other architectures, or for AArch64 targets that lack complex multiply-accumulate instructions, for this exact reason. For now, no pattern matching generates this intrinsic. That support will be required before pushing the MLIR side that generates it.
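
As a rough illustration of what a fallback for targets without complex instructions would have to compute, the sketch below expands `acc + a * b` into real multiplies and adds. The names `Cplx`, `cmla_expanded` and `self_check` are hypothetical and not part of the patch. The two-step grouping is an assumption chosen to resemble the way the rotate-by-0 and rotate-by-90 forms of FCMLA are usually paired; it is not a description of any lowering implemented in this review.

```cpp
#include <cassert>

// Hypothetical plain struct holding one complex value as two floats.
struct Cplx {
    float re;
    float im;
};

// Computes acc + a * b using only real multiplies, adds and subtracts.
Cplx cmla_expanded(Cplx acc, Cplx a, Cplx b) {
    // Step 1: contributions of the real part of a.
    acc.re += a.re * b.re;
    acc.im += a.re * b.im;
    // Step 2: contributions of the imaginary part of a.
    acc.re -= a.im * b.im;
    acc.im += a.im * b.re;
    return acc;
}

// Small sanity check: (1 + i) + (2 + i) * (1 + 2i) = 1 + 6i.
void self_check() {
    Cplx r = cmla_expanded(Cplx{1.0f, 1.0f}, Cplx{2.0f, 1.0f}, Cplx{1.0f, 2.0f});
    assert(r.re == 1.0f && r.im == 6.0f);
}
```

Written this way, the individual multiplies and adds stay visible to other optimisation passes, which is the trade-off against a single opaque intrinsic raised in the first quoted comment.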

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148068/new/

https://reviews.llvm.org/D148068