[PATCH] D146988: [CodeGen] Enable processing of interconnected complex number operations

Thu Apr 6 04:57:26 PDT 2023

dmgreen added inline comments.

================
Comment at: llvm/test/CodeGen/AArch64/complex-deinterleaving-multiuses.ll:93
+;   *p2 = (a * b) * c;
+;   return d * c
+define <4 x float> @monster(<4 x float> %a, <4 x float> %b, <4 x float> %c, <4 x float> %d, ptr %p1, ptr %p2) {
----------------
igor.kirillov wrote:
> NickGuy wrote:
> > dmgreen wrote:
> > > It is not obvious to me why d * c should prevent the rest of the tree transforming. Can you precommit the tests to show the differences in the review?
> > I'm not sure it's the `d * c` that's preventing it, so much as the store to %p1 (It doesn't reinterleave, so the store is deemed "outside the graph").
> > Though I agree, pre-commits would be useful to see the difference in the other cases
> The idea is that any external use of complex computation chain prevents deinterleaving. And if one chain stops to be deinterleaved it might effect another chain, which is shown in this test. *p1 prevent for a*b to be deinterleaved, which in its turn prevents (a * b * c)  to be deinterleaved and that in its turn spoils d*c expression. As result nothing is deinterleaved.
> 
> At least that's what I thought before pre-commiting this tests where pre-patch code decided to deinterleaved d * c operation - https://reviews.llvm.org/D147659. Now, I have a question. If shufflevector is used outside, should we still deinterleave code?
Sorry I apparently forgot to send this as a reply to @igor.kirillov 

Yeah that was my point, but it will depend on whether the shuffles get folded into a ld2 or need to be replicated. On it's own replacing fadd/fmul(shuffle, shuffle) with fcmla will be profitable even if the shuffles have other uses. But if the shuffles could be combined into a ld2 it isn't as obvious.

One of the earlier version of this pass has a costmodel that tried work out when it was profitable but I suggested removing it to simplify the patch, as it wasn't very useful at the time. I would guess that adding multiple disparate subgraphs into the same graph make that more difficult to do consistently, but in this case we can perhaps just check the shuffles operand to see if it is a load. It is probably worth having tests for both, and giving them better names.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146988/new/

https://reviews.llvm.org/D146988