[PATCH] D114174: [ARM][CodeGen] Add support for complex addition and multiplication

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 8 01:32:21 PST 2022


dmgreen added a comment.

This sounds like it should lead to some nice speedup. I have some high level comments:

- The supportsComplexArithmetic method doesn't add a lot on its own. It would probably be better to represent it in terms of asking the backend if it has certain patterns of certain types available. An f32 fcadd, for example, or a i16 fcmul_with_rotate (I'm not sure what to name them).
- The ComplexArithmeticGraph seems like internal details to the pass. It would be best not to pass it over the TTI interface boundary if we can. The backend just needs to create the correct intrinsics of the correct type.
- Equally I would keep all the matching logic in the pass, not through matchComplexArithmeticIR. That would be similar to, for example, the MachineCombiner pass which has its own enum of a set of patterns.
- Both Arm MVE and AArch64 NEON (and Arm NEON) have fcmla instructions with rotate, two of which combined can perform a complex multiply. If we represent this in terms of the individual instructions that we can use, there is more potential for optimization. You can imagine other cases like adding 10 to the real part of a vector can be transformed to adding fadd [10, 0, 10, 0]. It may be possible to do some of the conjugates with a vrev and a vneg. That kind of thing is (I imagine) likely to come up in larger realistic examples or real math equations. Some of those can hopefully be added later, so long as we think it will be possible to expand. Currently we just seem to be pattern matching individual patterns? With the intention to expand this into a graph?
- The InterleavedAccessPass (which this is very similar to) uses TLI, not TTI. It may be better to do the same thing here. Similarly is there a reason this is in Transforms and not CodeGen?
- The tests look good, but they seem to have a funny infinite loop in them. I don't think they need the loop (or the loads/stores, potentially, although they likely don't hurt anything).
- All the tests are fast, but we will need to make sure this conforms to fp arithmatic correctly without that too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114174/new/

https://reviews.llvm.org/D114174



More information about the llvm-commits mailing list