[PATCH] D114174: [ARM][CodeGen] Add support for complex addition and multiplication

Fri Apr 8 07:24:36 PDT 2022

NickGuy added inline comments.

================
Comment at: llvm/include/llvm/CodeGen/ComplexArithmetic.h:20
+
+#ifndef ENUM_FLAG_DEF
+#define ENUM_FLAG_DEF(Type)                                                    \
----------------
dmgreen wrote:
> Is this needed? Its strange to a macro like this and I'm not sure it's very useful for the places it is used.
It was more useful (though not necessarily needed) in an earlier iteration. I've removed it now.

================
Comment at: llvm/include/llvm/CodeGen/ComplexArithmetic.h:59
+
+ENUM_FLAG_DEF(ComplexArithmeticOperation)
+
----------------
dmgreen wrote:
> It doesn't seem that it would be necessary to combine ComplexArithmeticOperation operations together. It doesn't make much sense to have something that is both an Add _and_ a Partial Mul.
As above, useful in an earlier iteration. Also removed

================
Comment at: llvm/lib/CodeGen/ComplexArithmetic.cpp:40
+static ArrayRef<int> createContiguousMask(int len, int offset = 0) {
+  int *Arr = new int[len];
+  for (int j = 0; j < len; j++)
----------------
dmgreen wrote:
> This leaks memory. Same for the ones below. They can probably return a SmallVector<int>.
Good catch, fixed.

================
Comment at: llvm/lib/CodeGen/ComplexArithmetic.cpp:119-120
+    /**
+     * Will cause the node to look at parents to try and identify the type.
+     * Parents must already be registered and identified.
+     */
----------------
dmgreen wrote:
> What does this mean and refer to?
This entire enum is no longer required, so I've removed it all.

================
Comment at: llvm/lib/CodeGen/ComplexArithmetic.cpp:703
+
+  const unsigned MaxVectorWidth = 128;
+
----------------
dmgreen wrote:
> This shouldn't be something that is hard-coded in the pass. It might make more sense to push the splitting up of larger types to the target. There are cases like fixed-length SVE where the width won't be limited to 128bit vectors, and pushing that out to the target will be more general.
I've pushed the `128` part to be provided by the target, but couldn't figure out a nice way to do the splitting entirely on the target. I'll be revisiting this when I have concrete examples of how scalable vectors will work.

================
Comment at: llvm/lib/CodeGen/ComplexArithmetic.cpp:804
+    }
+    if (Changed) {
+      LLVM_DEBUG(dbgs() << "Trying to substitute graph in block: \n";
----------------
dmgreen wrote:
> This logic is all a bit strange. Why have Changed?
> ```
>   for (auto &I : *B) {
>     if (auto *SVI = dyn_cast<ShuffleVectorInst>(&I)) {
>       if (isInterleaving(SVI))
>         if (traverseAndPopulateGraph(TLI, SVI, Graph))
>           Substituted |= substituteGraph(TLI, &I, Graph, DeadInsts);
> ```
Holdover from an early iteration, where I tried to build the whole graph before changing anything. Simplified

================
Comment at: llvm/lib/CodeGen/ComplexArithmetic.cpp:820
+
+  // TODO clean up the dead instructions better. (Ask in review?)
+  unsigned iter = 0;
----------------
dmgreen wrote:
> I think this is just RecursivelyDeleteTriviallyDeadInstructions
Looks to do the job, thanks

================
Comment at: llvm/test/CodeGen/ARM/ComplexArithmetic/complex-arithmetic-f16-add.ll:33
+vector.body:                                      ; preds = %vector.body, %entry
+  %wide.vec = load <2 x half>, <2 x half>* %a, align 4
+  %strided.vec = shufflevector <2 x half> %wide.vec, <2 x half> zeroinitializer, <1 x i32> <i32 0>
----------------
dmgreen wrote:
> A lot of these tests are strange - they seem to have infinite loops?
> 
> I think you should be able to remove most of it to make a much cleaner test. It doesn't even need the loads/stores:
> ```
> define <8 x half> @complex_add_v8f16(<8 x half> %a, <8 x half> %b) {
> entry:
>   %strided.vec = shufflevector <8 x half> %a, <8 x half> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
>   %strided.vec21 = shufflevector <8 x half> %a, <8 x half> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
>   %strided.vec23 = shufflevector <8 x half> %b, <8 x half> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
>   %strided.vec24 = shufflevector <8 x half> %b, <8 x half> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
>   %0 = fsub fast <4 x half> %strided.vec23, %strided.vec21
>   %1 = fadd fast <4 x half> %strided.vec24, %strided.vec
>   %interleaved.vec = shufflevector <4 x half> %0, <4 x half> %1, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
>   ret <8 x half> %interleaved.vec
> }
These were initially generated from a larger IR, and pushed through llvm-reduce. I've rewritten them manually to be much smaller (basing them off of your snippet).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114174/new/

https://reviews.llvm.org/D114174