[PATCH] D114174: [ARM][CodeGen] Add support for complex addition and multiplication

Nicholas Guy via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 8 02:23:27 PST 2022


NickGuy added a comment.

> The supportsComplexArithmetic method doesn't add a lot on its own. It would probably be better to represent it in terms of asking the backend if it has certain patterns of certain types available. An f32 fcadd, for example, or a i16 fcmul_with_rotate (I'm not sure what to name them).
> The ComplexArithmeticGraph seems like internal details to the pass. It would be best not to pass it over the TTI interface boundary if we can. The backend just needs to create the correct intrinsics of the correct type.
> Equally I would keep all the matching logic in the pass, not through matchComplexArithmeticIR. That would be similar to, for example, the MachineCombiner pass which has its own enum of a set of patterns.

Unfortunately, not all architectures/intrinsics follow the same patterns, so this was done to provide a way for architecture-specific stuff to be handled as needed. The MVE halving complex add (which I only now realise was included in this patch) is a good example of this; It only works on integers, and requires some extra instructions to be matched. This instruction is only in MVE, it is not part of NEON in any way.
The intrinsics that are common are also constructed differently; While the NEON intrinsics are separated out by their rotations (`aarch64_neon_vcmla_rot0`, `aarch64_neon_vcmla_rot90`), MVE squishes them into one and determines the rotation based on a parameter.
ComplexArithmeticGraph is used to encapsulate all the data the TTI might need in order to correctly generate the right intrinsic. It could be replaced by something a bit lighter (and that doesn't expose the implementation detail of the node list), but something will be needed to provide this info.

> Both Arm MVE and AArch64 NEON (and Arm NEON) have fcmla instructions with rotate, two of which combined can perform a complex multiply. If we represent this in terms of the individual instructions that we can use, there is more potential for optimization. You can imagine other cases like adding 10 to the real part of a vector can be transformed to adding fadd [10, 0, 10, 0]. It may be possible to do some of the conjugates with a vrev and a vneg. That kind of thing is (I imagine) likely to come up in larger realistic examples or real math equations. Some of those can hopefully be added later, so long as we think it will be possible to expand. Currently we just seem to be pattern matching individual patterns? With the intention to expand this into a graph?

Agreed, it should be possible to add them later. Ideally it reaches a point when we can remove the idea of a complex multiply from this, and replace it with the specific operations.

> The InterleavedAccessPass (which this is very similar to) uses TLI, not TTI. It may be better to do the same thing here.

I can switch it to use TLI instead, if that's better suited for this job.

> Similarly is there a reason this is in Transforms and not CodeGen?

That was down to some build/link issues I was facing when getting the pass registered with the new pass manager. I don't have the specific error to hand, but it was due to unresolved references.

> The tests look good, but they seem to have a funny infinite loop in them. I don't think they need the loop (or the loads/stores, potentially, although they likely don't hurt anything).

The tests were taken from a cpp snippet, and put through llvm-reduce (testing that complex instructions were being generated, the null-tests had their types changed by hand afterwards), so the IR might have some odd control flow.

> All the tests are fast, but we will need to make sure this conforms to fp arithmatic correctly without that too.

+1, will add more conformance tests.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114174/new/

https://reviews.llvm.org/D114174



More information about the llvm-commits mailing list