[PATCH] D148558: [CodeGen] Improve handling -Ofast generated code by ComplexDeinterleaving pass

Thu May 11 04:23:41 PDT 2023

igor.kirillov added inline comments.

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:1017
-  Instruction *I = Node->Real;
-  if (I->isUnaryOp())
-    assert(!InputB &&
----------------
I lost this assertion as we don't have Instruction anymore and there are no **isUnaryOp** / **isBinaryOp** alternative for opcode

================
Comment at: llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:1069
+ComplexDeinterleavingGraph::NodePtr
+ComplexDeinterleavingGraph::identifyMultiplications(
+    std::list<Product> &RealMuls, std::list<Product> &ImagMuls,
----------------
NickGuy wrote:
> igor.kirillov wrote:
> > NickGuy wrote:
> > > One of the original design goals was to not assume full multiplications, instead representing them as 2 partial multiplys. Is that something that is possible to preserve while still accounting for `reassoc` and `-Ofast`?
> > Consider the case of complex number multiplication, such as (x, y) * (u, v). From each partial multiplication, we can only obtain three out of the four values (x, y, u, v). Consequently, it appears that performing a single partial multiplication without considering the second part is imposible. Also, it seems that current code for handling contract flag cases is capable of extracting only full multiplications, despite the detection process being divided into two methods: **identifyPartialMul **and **identifyNodeWithImplicitAdd**.
> > 
> > Long story short, I don't know how to separate multiplications from each other.
> Due to how complex multiplication works, the second partial multiply will always succeed the first, however the first does not always indicate the presence of a second. While not currently implemented, the design does allow for the partial multiplys to be decoupled, theoretically matching cases like `o = a * b.real()`. The reason why it wasn't implemented initially is down to the arrangement of the shuffles (see below). Given this, I'm happy to defer the implementation for now, but as this patch aims to resolve that, I'd like to revisit the idea once this has landed :)
> 
> ```
>   %strided.vec26 = shufflevector <8 x float> %wide.vec25, <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
>   %14 = fmul fast <8 x float> %wide.vec25, %wide.vec
>   %15 = shufflevector <8 x float> %14, <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
>   %16 = fmul fast <4 x float> %strided.vec26, %strided.vec24
>   %interleaved.vec = shufflevector <4 x float> %15, <4 x float> %16, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
> ```
I made some changes to the algorithm. Now it collects all possible partial multiplications and identify complex numbers by using common instructions across different partial multiplications. I also added a place with TODO where we could handle independent (non-paired) partial multiplications in the future

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148558/new/

https://reviews.llvm.org/D148558