[PATCH] D114174: [ARM][CodeGen] Add support for complex addition and multiplication

Wed Feb 9 02:16:26 PST 2022

dnsampaio added a comment.

Hi, thanks for working on this. I do like the direction this patch goes, by detecting complex operations at IR level. I do have some question though.

In D114174#3303926 <https://reviews.llvm.org/D114174#3303926>, @NickGuy wrote:

>> The supportsComplexArithmetic method doesn't add a lot on its own. It would probably be better to represent it in terms of asking the backend if it has certain patterns of certain types available. An f32 fcadd, for example, or a i16 fcmul_with_rotate (I'm not sure what to name them).
>> The ComplexArithmeticGraph seems like internal details to the pass. It would be best not to pass it over the TTI interface boundary if we can. The backend just needs to create the correct intrinsics of the correct type.
>> Equally I would keep all the matching logic in the pass, not through matchComplexArithmeticIR. That would be similar to, for example, the MachineCombiner pass which has its own enum of a set of patterns.
>
> Unfortunately, not all architectures/intrinsics follow the same patterns, so this was done to provide a way for architecture-specific stuff to be handled as needed. The MVE halving complex add (which I only now realise was included in this patch) is a good example of this; It only works on integers, and requires some extra instructions to be matched. This instruction is only in MVE, it is not part of NEON in any way.

I'm not exactly sure why to do target specific pattern matching here. We could simply add generic complex intrinsics and the different patterns could be matched at each archs ISEL, no? I do agree that it should check in the backend if it should generate a complex operation of a given vector type.

> The intrinsics that are common are also constructed differently; While the NEON intrinsics are separated out by their rotations (`aarch64_neon_vcmla_rot0`, `aarch64_neon_vcmla_rot90`), MVE squishes them into one and determines the rotation based on a parameter.
> ComplexArithmeticGraph is used to encapsulate all the data the TTI might need in order to correctly generate the right intrinsic. It could be replaced by something a bit lighter (and that doesn't expose the implementation detail of the node list), but something will be needed to provide this info.

These difference can't be detected at isel?

In a more global view, how do you plan to generate such input patterns detected here? Using C/C++ complex usually generates a structure with real/immediate values (https://godbolt.org/z/4snrjj4oG). And they can't be used as base value for the vectors extension.
Will there be a frontend patch linked to this, or you expect to use target specific builtins for it? If target specific builtins, those could generate the target intrinsics directly, no?

================
Comment at: llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:338-341
+  Value *createComplexArithmeticIR(ComplexArithmeticGraph &, Value *, Value *,
+                                   int &GeneratedIntrinsicCount) {
+    return nullptr;
+  }
----------------
Is there any strong reason to create target specific intrinsics instead of having generic intrinsics with polymorphism? Such as `llvm.complex.add.v2f32`?

================
Comment at: llvm/include/llvm/Transforms/Scalar/ComplexArithmetic.h:107
+  void addNode(Instruction *I, enum NodeType NodeType = NodeType::Unknown) {
+    if ((NodeType & Discover) == Discover) {
+      auto LeftType = getNodeType(cast<Instruction>(I->getOperand(0)));
----------------
nit. Are you expecting to have values in `NodeTypes` enum bigger than `Discover` ? If not, you don't need the and operation "& Discover" if you define that the underlying type of the enum is unsigned or unsigned short.

================
Comment at: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp:2365
+    return nullptr;
+  auto *VTy = cast<FixedVectorType>(Ty);
+
----------------
nit. Add new line?

================
Comment at: llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp:2457
+  if (Rotation == 90) {
+
+    auto FSubPattern = m_FSub(m_Value(ShuffleBR), m_Value(ShuffleAI));
----------------
nit. extra space.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114174/new/

https://reviews.llvm.org/D114174