[PATCH] D148558: [CodeGen] Improve handling -Ofast generated code by ComplexDeinterleaving pass

Fri Apr 28 08:25:19 PDT 2023

igor.kirillov added inline comments.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:1069
+ComplexDeinterleavingGraph::NodePtr
+ComplexDeinterleavingGraph::identifyMultiplications(
+    std::list<Product> &RealMuls, std::list<Product> &ImagMuls,
----------------
NickGuy wrote:
> One of the original design goals was to not assume full multiplications, instead representing them as 2 partial multiplys. Is that something that is possible to preserve while still accounting for `reassoc` and `-Ofast`?
Consider the case of complex number multiplication, such as (x, y) * (u, v). From each partial multiplication, we can only obtain three out of the four values (x, y, u, v). Consequently, it appears that performing a single partial multiplication without considering the second part is imposible. Also, it seems that current code for handling contract flag cases is capable of extracting only full multiplications, despite the detection process being divided into two methods: **identifyPartialMul **and **identifyNodeWithImplicitAdd**.

Long story short, I don't know how to separate multiplications from each other.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:24710
+    if (Rotation == ComplexDeinterleavingRotation::Rotation_0) {
+      return B.CreateFAdd(InputA, InputB);
+    } else if (Rotation == ComplexDeinterleavingRotation::Rotation_180) {
----------------
NickGuy wrote:
> Do we want to be discarding the fastmath flags here? 
Yeah, you're right. Even though the code works fine without the fastmath flags and they might not be important at this point in the optimization pipeline, it's probably better to keep them. But it's not that simple. First of all, we don't know which instruction to get the flags from when dealing with -Ofast code. My idea is to use the **Collect **lambda function from identifyReassocNodes and //AND// all the flags together and store them in the **ComplexDeinterleavingNode**.

The second issue is that currently, replaceSymmetricNode only uses flags from the **Real **Instruction. But what if the flags for the **Imag ** Instruction are different?

The third thing to consider is whether we should add flags to the architecture-specific intrinsics generated by **createComplexDeinterleavingIR**.

I think the best way to handle all this is to add a Flags member to ComplexDeinterleavingNode and then apply those flags when using replaceNode.

What do you think?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148558/new/

https://reviews.llvm.org/D148558