[PATCH] D146988: [CodeGen] Enable processing of interconnected complex number operations

Tue Mar 28 07:19:20 PDT 2023

igor.kirillov added a comment.

I am planning to add several more patches after this one, including:

1. Support for scalable vectors
2. Reductions
3. Full -Ofast mode

With regard to -Ofast, the current code is suitable for simple cases where only one multiplication or addition is involved. However, when the -Ofast flag is set, the compiler may reorder instructions so that the computations of the real and imaginary parts no longer run in parallel. For example, a loop like the following may not be processed:

  for (int i = 0; i < N; ++i)
    u[i] = v[i] * w[i] + u[i] * y[i];
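
To make the reordering issue concrete, here is a rough scalar sketch of that loop body, assuming the arrays hold `std::complex<float>`. The helper names (`mulAddSymmetric`, `mulAddReassociated`, `updateLoop`) are hypothetical and only for illustration; this is not what the compiler or the pass actually emits. The first helper keeps the real and imaginary chains mirrored, the second shows one regrouping that fast-math reassociation would be allowed to produce.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<float>;

// Source-order expansion of v*w + u*y: the real and imaginary parts are
// built from mirrored multiply/add chains (one partial product pair each).
inline Complex mulAddSymmetric(Complex u, Complex v, Complex w, Complex y) {
  float re = (v.real() * w.real() - v.imag() * w.imag()) +
             (u.real() * y.real() - u.imag() * y.imag());
  float im = (v.real() * w.imag() + v.imag() * w.real()) +
             (u.real() * y.imag() + u.imag() * y.real());
  return {re, im};
}

// Same terms, regrouped the way reassociation under fast-math would permit
// (an illustrative choice, not actual compiler output). The two chains no
// longer have matching shapes.
inline Complex mulAddReassociated(Complex u, Complex v, Complex w, Complex y) {
  float re = (v.real() * w.real() + u.real() * y.real()) -
             (v.imag() * w.imag() + u.imag() * y.imag());
  float im = ((v.real() * w.imag() + u.imag() * y.real()) +
              v.imag() * w.real()) +
             u.real() * y.imag();
  return {re, im};
}

// The loop from the comment above, written against the symmetric helper.
inline void updateLoop(std::vector<Complex> &u, const std::vector<Complex> &v,
                       const std::vector<Complex> &w,
                       const std::vector<Complex> &y) {
  assert(u.size() == v.size() && u.size() == w.size() && u.size() == y.size());
  for (std::size_t i = 0; i < u.size(); ++i)
    u[i] = mulAddSymmetric(u[i], v[i], w[i], y[i]);
}
```

Both helpers compute the same value mathematically (rounding may differ in general), but only the first has the mirrored structure that a per-multiplication pattern match can rely on.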

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:143
-  // the multiply instructions of a partial multiply operation.
-  SmallVector<Instruction *> InternalInstructions;
   ComplexDeinterleavingRotation Rotation;
----------------
NickGuy wrote:
> What's the rationale behind removing this? I might be missing something, but it looks like you're removing it then looking for the internal instructions on-demand in `checkNodes` later
Ah, yes I should explain. I'll be adding support for -Ofast (see top comment), which might cause some issues. Basically, the problem is that the `ComplexDeinterleavingNode` can't be attached to `ComplexDeinterleavingNode::Imag` and `ComplexDeinterleavingNode::Real` instructions, which means that the `submitCompositeNode` function won't be able to fill `InternalInstructions`.

To solve this, I've decided to split the detection and checking process into two stages. I think it is more straightforward.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:905-910
+    if (!FinalInstructions.count(I)) {
+      for (Value *Op : I->operands()) {
+        if (auto *OpI = dyn_cast<Instruction>(Op))
+          ToDo.emplace_back(OpI);
+      }
     }
----------------
NickGuy wrote:
> How is this expected to behave with operands that leave the chain? (i.e. Phi nodes or instructions in a different basic block). Is there a risk of `AllInstructions` containing the most of the function in some cases?
PHINodes will also act as leaves and be in FinalInstructions, and instructions from a different block are not problematic as long as we ensure that there are no other uses, which we do.

If a loop has a large number of instructions, it will still be processed in the same manner, and I don't see any problems with that.
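
For reference, the traversal being discussed can be sketched roughly as below. This is a toy model, not the code in the patch: `Inst` and `collectChain` are hypothetical stand-ins for the LLVM types and the real function, the visited-set check is my assumption, and the "no other uses" check mentioned above is left out.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for an instruction: an id plus its operands.
struct Inst {
  int Id;
  std::vector<const Inst *> Operands;
};

// Collects Root and everything reachable through operand edges. Instructions
// in FinalInstructions are collected but act as leaves: their operands are
// never pushed, so the walk cannot continue past them.
inline std::set<const Inst *>
collectChain(const Inst *Root,
             const std::set<const Inst *> &FinalInstructions) {
  assert(Root && "expected a root instruction");
  std::set<const Inst *> AllInstructions;
  std::vector<const Inst *> ToDo{Root};
  while (!ToDo.empty()) {
    const Inst *I = ToDo.back();
    ToDo.pop_back();
    // Skip instructions that were already collected.
    if (!AllInstructions.insert(I).second)
      continue;
    if (!FinalInstructions.count(I)) {
      for (const Inst *Op : I->Operands)
        ToDo.push_back(Op);
    }
  }
  return AllInstructions;
}
```

With a leaf set that covers every input to the chain, the collected set stays bounded by the chain itself, regardless of how much code sits behind the leaves.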

---

As FCMLA/FCADD are vector instructions, our focus is mainly on vector loops. Therefore, I think it is more beneficial to perform this check at the BasicBlock level rather than the Function level. Alternatively, we could apply this pass per Loop, but this would result in losing support for some Neon cases that can be generated by the autovectorizer outside of a loop.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146988/new/

https://reviews.llvm.org/D146988