[PATCH] D148558: [CodeGen] Improve handling -Ofast generated code by ComplexDeinterleaving pass

Nicholas Guy via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 3 03:39:09 PDT 2023


NickGuy added inline comments.


================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:1069
+ComplexDeinterleavingGraph::NodePtr
+ComplexDeinterleavingGraph::identifyMultiplications(
+    std::list<Product> &RealMuls, std::list<Product> &ImagMuls,
----------------
igor.kirillov wrote:
> NickGuy wrote:
> > One of the original design goals was to not assume full multiplications, instead representing them as 2 partial multiplys. Is that something that is possible to preserve while still accounting for `reassoc` and `-Ofast`?
> Consider the case of complex number multiplication, such as (x, y) * (u, v). From each partial multiplication, we can only obtain three out of the four values (x, y, u, v). Consequently, it appears that performing a single partial multiplication without considering the second part is imposible. Also, it seems that current code for handling contract flag cases is capable of extracting only full multiplications, despite the detection process being divided into two methods: **identifyPartialMul **and **identifyNodeWithImplicitAdd**.
> 
> Long story short, I don't know how to separate multiplications from each other.
Due to how complex multiplication works, the second partial multiply will always succeed the first, however the first does not always indicate the presence of a second. While not currently implemented, the design does allow for the partial multiplys to be decoupled, theoretically matching cases like `o = a * b.real()`. The reason why it wasn't implemented initially is down to the arrangement of the shuffles (see below). Given this, I'm happy to defer the implementation for now, but as this patch aims to resolve that, I'd like to revisit the idea once this has landed :)

```
  %strided.vec26 = shufflevector <8 x float> %wide.vec25, <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %14 = fmul fast <8 x float> %wide.vec25, %wide.vec
  %15 = shufflevector <8 x float> %14, <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %16 = fmul fast <4 x float> %strided.vec26, %strided.vec24
  %interleaved.vec = shufflevector <4 x float> %15, <4 x float> %16, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
```


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:24710
+    if (Rotation == ComplexDeinterleavingRotation::Rotation_0) {
+      return B.CreateFAdd(InputA, InputB);
+    } else if (Rotation == ComplexDeinterleavingRotation::Rotation_180) {
----------------
igor.kirillov wrote:
> NickGuy wrote:
> > Do we want to be discarding the fastmath flags here? 
> Yeah, you're right. Even though the code works fine without the fastmath flags and they might not be important at this point in the optimization pipeline, it's probably better to keep them. But it's not that simple. First of all, we don't know which instruction to get the flags from when dealing with -Ofast code. My idea is to use the **Collect **lambda function from identifyReassocNodes and //AND// all the flags together and store them in the **ComplexDeinterleavingNode**.
> 
> The second issue is that currently, replaceSymmetricNode only uses flags from the **Real **Instruction. But what if the flags for the **Imag ** Instruction are different?
> 
> The third thing to consider is whether we should add flags to the architecture-specific intrinsics generated by **createComplexDeinterleavingIR**.
> 
> I think the best way to handle all this is to add a Flags member to ComplexDeinterleavingNode and then apply those flags when using replaceNode.
> 
> What do you think?
> My idea is to use the Collect lambda function from identifyReassocNodes and AND all the flags together and store them in the ComplexDeinterleavingNode.
ANDing them feels like the most sensible to me, either that or ensuring they're all equal during identification.

> I think the best way to handle all this is to add a Flags member to ComplexDeinterleavingNode and then apply those flags when using replaceNode.
That makes sense to me, but I'd be wary of how to apply the flags themselves. We'd either need to change the target hook to provide the flags, or have some way for a target to specify which instructions get which flags.
If we do decide to apply the flags on the common side, there's also the problem of applying them to all relevant instructions, we only get the one returned from `createComplexDeinterleavingIR`, but it could create an entire subgraph of instructions if the target deems it necessary (try the existing implementation with an unrealistic <32 x double> for an example of this).

> The second issue is that currently, replaceSymmetricNode only uses flags from the Real Instruction. But what if the flags for the Imag Instruction are different?
That's a fair point. I'd err on the side of caution, and only perform the replacement if the flags are consistent on both the Real and Imag sides.

> The third thing to consider is whether we should add flags to the architecture-specific intrinsics generated by createComplexDeinterleavingIR.
That is something I don't feel qualified to answer :D 
My instinct says that it wouldn't be a problem, as nothing would be checking the flags on intrinsics, but can't say for sure.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148558/new/

https://reviews.llvm.org/D148558



More information about the llvm-commits mailing list