[PATCH] D67434: [InstCombine] Limit FMul constant folding for fma simplifications.

Fri Sep 13 07:38:22 PDT 2019

fhahn marked 2 inline comments as done.
fhahn added inline comments.

================
Comment at: llvm/lib/Analysis/InstructionSimplify.cpp:4545

+  // fmul 1.0, X ==> X
+  if (match(Op0, m_FPOne()))
----------------
reames wrote:
> These two appear to be new transforms.  Please separate the basic optimization which is mostly refactoring plus hooks (this review) and a following optimization change.
Done, split off as D67553

================
Comment at: llvm/test/Transforms/InstCombine/fma.ll:502-511
+define <2 x double> @fmuladd_const_fmul(<2 x double> %b) {
+; CHECK-LABEL: @fmuladd_const_fmul(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[RES:%.*]] = fadd nnan nsz <2 x double> [[B:%.*]], <double 0x3F8DB6C076AD949B, double 0x3FB75A405B6E6D69>
+; CHECK-NEXT:    ret <2 x double> [[RES]]
+;
+entry:
----------------
fhahn wrote:
> reames wrote:
> > lebedev.ri wrote:
> > > fhahn wrote:
> > > > reames wrote:
> > > > > lebedev.ri wrote:
> > > > > > reames wrote:
> > > > > > > reames wrote:
> > > > > > > > lebedev.ri wrote:
> > > > > > > > > reames wrote:
> > > > > > > > > > lebedev.ri wrote:
> > > > > > > > > > > I don't think this restriction should apply to `@llvm.fmuladd`, as per https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic
> > > > > > > > > > I don't see why not?  The rounding is specified exactly as described.
> > > > > > > > > I think my comment got garbled. I'm saying the current patch is correct in the sense that
> > > > > > > > > if we are simplifying `fmuladd`, we don't have any rounding concerns:
> > > > > > > > > As per LangRef, `fmuladd` can be arbitrarily expanded to `fmul`+`fadd`, unlike `fma`.
> > > > > > > > > 
> > > > > > > > From the langref: "is equivalent to the expression a * b + c, *except that rounding will not be performed* between the multiplication and addition steps if the code generator fuses the operations. "
> > > > > > > > 
> > > > > > > > Once fused, this implies the rounding is defined.  We can consider changing the LangRef - I think we maybe should - but per the actual wording, we must do the rounding as per an fma, not the component operations.
> > > > > > > p.s.  Looking at the implementation, particularly SelectionDAGBuilder, I think you're right about the intent.  We just need to clarify the semantics in LangRef.  
> > > > > > I fail to read langref any other way already:
> > > > > > 
> > > > > > > The ‘llvm.fmuladd.*’ intrinsic functions represent multiply-add expressions that can be fused if the code generator determines that (a) the target instruction set has support for a fused operation, and (b) that the fused operation is more efficient than the equivalent, separate pair of mul and add instructions.
> > > > > > 
> > > > > > "can be fused **if**"
> > > > > > 
> > > > > > > The expression:
> > > > > > > `%0 = call float @llvm.fmuladd.f32(%a, %b, %c)`
> > > > > > > is equivalent to the expression a * b + c, except that rounding will not be performed between the multiplication and addition steps if the code generator fuses the operations. Fusion is not guaranteed, even if the target platform supports it. If a fused multiply-add is required, the corresponding llvm.fma intrinsic function should be used instead.
> > > > > > 
> > > > > > "rounding will not be performed between the multiplication and addition steps **if** the code generator fuses the operations"
> > > > > > "Fusion is not guaranteed"
> > > > > The key bit of wording here is that the rounding is described *following the decision to fold*.  
> > > > > 
> > > > > I'm not disputing that converting " a * b + c" to a fmuladd is valid per this description.  I'm disputing that converting *back* to an "a * b + c" sequence is technically disallowed by the wording.  
> > > > > The key bit of wording here is that the rounding is described *following the decision to fold*.
> > > > 
> > > > My understanding is that as long as it is a fmuladd,  the code generator did not make a decision about fusing yet. IIUC there are 2 ways it can decide to fuse: 1) replace it with an fma intrinsic or lower it to a FMA in the backend.
> > > I would agree with that if not for the 
> > > > Fusion is not guaranteed, even if the target platform supports it. If a fused multiply-add is required, the corresponding llvm.fma intrinsic function should be used instead.
> > > to me that sounds: "use if fmulfadd if you want to request to do FMA; or fma if you want to require FMA (if unavailable compilation will fail)"
> > Ok, let me rephrase how I'm approaching this.
> > 
> > As a reviewer, I have shared a concern about a potential ambiguity in the LangRef which effects the legality of the discussed transform.  Whether there is an agreement that the ambiguity exists or not, I expect a change made to clarify the semantics to remove the potential ambiguity.  Please consider this a hard requirement before any change to exploit the discussed semantics as proposed in the comment that started this sub-thread.  
> > 
> > (This entire sub-thread is off topic for the actual review.)
> > 
> Sounds good, I'll think about how to make it more explicit.
Split off as D67552

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67434/new/

https://reviews.llvm.org/D67434