[PATCH] D158079: [InstCombine] Contracting x^2 + 2*x*y + y^2 to (x + y)^2 (float)

Christoph Stiller via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 18 17:02:34 PDT 2023


rainerzufalldererste marked an inline comment as done.
rainerzufalldererste added inline comments.


================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp:1059
+                                       m_FMul(m_Deferred(B), m_Deferred(B))))));
+  }
+
----------------
rainerzufalldererste wrote:
> goldstein.w.n wrote:
> > rainerzufalldererste wrote:
> > > rainerzufalldererste wrote:
> > > > goldstein.w.n wrote:
> > > > > This match code is basically identical to `foldSquareSumInts`. The only difference, other than `FMul` vs. `Mul`, is that you match `FMul(A, 2)` for floats and `m_Shl(A, 1)` for ints.
> > > > > Can you make the match code a helper that takes either the fmul/2x matcher (or just a wrapping lambda) so it can be used for SumFloat / SumInt?
> > > > Does that imply that `m_c_FAdd` can simply be replaced with `m_c_Add` and will continue to match properly for floating-point values as well?
> > > > I presume that would entail partially matching another pattern and then deferring the actual check for the mul2 match, since `BinaryOp_match<LHS, RHS, Opcode>` would have different `Opcode`s for `FMul` and `Shl`; that sounds like a huge mess to me. Or is there a cleaner way to do that?
> > > > 
> > > > Something like this sadly doesn't compile (the two lambdas have unrelated closure types, so the ternary has no common type):
> > > > ```
> > > >   const auto FpMul2Matcher = [](auto &value) {
> > > >     return m_FMul(value, m_SpecificFP(2.0));
> > > >   };
> > > >   const auto IntMul2Matcher = [](auto &value) {
> > > >     return m_Shl(value, m_SpecificInt(1));
> > > >   };
> > > >   const auto Mul2Matcher = FP ? FpMul2Matcher : IntMul2Matcher;
> > > > ```
> > > Even something like this wouldn't work:
> > > 
> > > ```
> > > template <typename TMul2, typename TCAdd, typename TMul>
> > > static bool MatchesSquareSum(BinaryOperator &I, Value *&A, Value *&B,
> > >                              const TMul2 &Mul2, const TCAdd &CAdd,
> > >                              const TMul &Mul) {
> > > 
> > >   // (a * a) + (((a * 2) + b) * b)
> > >   bool Matches =
> > >       match(&I, CAdd(m_OneUse(Mul(m_Value(A), m_Deferred(A))),
> > >                      m_OneUse(Mul(CAdd(Mul2(m_Deferred(A)), m_Value(B)),
> > >                                   m_Deferred(B)))));
> > > 
> > >   // ((a * b) * 2)  or ((a * 2) * b)
> > >   // +
> > >   // (a * a + b * b) or (b * b + a * a)
> > >   if (!Matches) {
> > >     Matches =
> > >         match(&I, CAdd(m_CombineOr(m_OneUse(Mul2(Mul(m_Value(A), m_Value(B)))),
> > >                                    m_OneUse(Mul(Mul2(m_Value(A)), m_Value(B)))),
> > >                        m_OneUse(CAdd(Mul(m_Deferred(A), m_Deferred(A)),
> > >                                      Mul(m_Deferred(B), m_Deferred(B))))));
> > >   }
> > > 
> > >   return Matches;
> > > }
> > > ```
> > > 
> > > I agree that it's messy to have duplicate code, but with the way opcodes are used as template parameters I don't see a way to do this nicely without template specialization; and with template specialization it's even more of a beast.
> > > Am I missing some obvious way built into llvm/InstCombine to do this nicely?
> > Why doesn't that code work?
> Assuming `TMul2` etc. are lambdas, the return type couldn't be consistent: for both `m_FMul` and `m_Shl` it'd be a `BinaryOp_match<LHS, RHS, Opcode>` with the same `Opcode` for each invocation but different `LHS` and `RHS` types. One could make this work with macros, but I don't know the LLVM stance on macros, or with template specialization, where there'd be a specialized struct with three functions (`Mul2`, `Mul`, `CAdd`) that simply map to the correct functions for `FAdd`/`Add` etc.
> However, I honestly think that the current implementation is the cleanest way to do it. I'm also not a big fan of code duplication, but the discussed alternatives seem a lot messier to me.
Have you been able to come up with some better ideas? Maybe it's not _that_ terrible to go down the template-specialization route, as many of the integer optimizations may have similar counterparts in FP with `nsz` and `reassoc`. Not sure how many of them are already handled twice, but there's a chance one could simplify this process by providing template-specialized `m_XAdd<IsFP>(LHS, RHS)` etc., roughly along the lines of the sketch below. However, I'm not sure if I'm the right person to pass judgement on something that large, as I'm still very new to both LLVM and InstCombine.
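
For illustration, here is a rough sketch of what that template-specialized dispatch could look like. The names `SquareSumOps` and `matchesSquareSum` are made up for this sketch and are not part of the patch; it assumes the integer `x * 2` has already been canonicalized to `x << 1`, it only covers the first of the two pattern forms, and the FP caller would still have to check `reassoc`/`nsz` itself:

```
// Hypothetical sketch, not part of the patch: dispatch the three matchers
// through a template-specialized traits struct so the square-sum pattern only
// has to be written once. Assumes `using namespace llvm::PatternMatch;` as in
// InstCombineAddSub.cpp.
template <bool IsFP> struct SquareSumOps;

template <> struct SquareSumOps</*IsFP=*/false> {
  // Integer `x * 2` is assumed to be canonicalized to `x << 1`.
  template <typename T> static auto Mul2(const T &V) {
    return m_Shl(V, m_SpecificInt(1));
  }
  template <typename L, typename R>
  static auto Mul(const L &LHS, const R &RHS) {
    return m_Mul(LHS, RHS);
  }
  template <typename L, typename R>
  static auto Add(const L &LHS, const R &RHS) {
    return m_c_Add(LHS, RHS);
  }
};

template <> struct SquareSumOps</*IsFP=*/true> {
  template <typename T> static auto Mul2(const T &V) {
    return m_FMul(V, m_SpecificFP(2.0));
  }
  template <typename L, typename R>
  static auto Mul(const L &LHS, const R &RHS) {
    return m_FMul(LHS, RHS);
  }
  template <typename L, typename R>
  static auto Add(const L &LHS, const R &RHS) {
    return m_c_FAdd(LHS, RHS);
  }
};

// Shared matcher for (a * a) + (((a * 2) + b) * b), capturing A and B.
// (Only the first of the two forms handled by the patch, for brevity.)
template <bool IsFP>
static bool matchesSquareSum(BinaryOperator &I, Value *&A, Value *&B) {
  using Ops = SquareSumOps<IsFP>;
  return match(&I, Ops::Add(m_OneUse(Ops::Mul(m_Value(A), m_Deferred(A))),
                            m_OneUse(Ops::Mul(
                                Ops::Add(Ops::Mul2(m_Deferred(A)), m_Value(B)),
                                m_Deferred(B)))));
}
```

The static member function templates sidestep the lambda problem above: every call site deduces its own return type, so the differing `BinaryOp_match` instantiations never have to collapse into a single type. Whether that is actually cleaner than keeping two separate fold functions is exactly the trade-off being discussed.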


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158079/new/

https://reviews.llvm.org/D158079
