[PATCH] D43090: GlobalISel: IRTranslate llvm.fmuladd.* intrinsic

Fri Feb 9 17:52:17 PST 2018

qcolombet added inline comments.

================
Comment at: lib/CodeGen/GlobalISel/IRTranslator.cpp:765
+        TLI.isFMAFasterThanFMulAndFAdd(TLI.getValueType(*DL, CI.getType()))) {
+      MIRBuilder.buildInstr(TargetOpcode::G_FMA, Dst, Op0, Op1, Op2);
+    } else {
----------------
dsanders wrote:
> qcolombet wrote:
> > volkan wrote:
> > > aditya_nandakumar wrote:
> > > > qcolombet wrote:
> > > > > So far we stayed away of "optimizations" in the IRTranslator.
> > > > > I wonder if this part of the lowering should be part of a combiner instead.
> > > > > 
> > > > > In particular, the language reference says:
> > > > > Fusion is not guaranteed, even if the target platform supports it.
> > > > > 
> > > > > What do you think?
> > > > That's also what I said when I spoke with him in person. Unfortunately, currently there's no other place to put it (to avoid duplicating across all targets).
> > > Yes, fusion is not guaranteed but I think we still need to translate this to either one of the options. If IRTranslator doesn't do that, every target would have to handle this in somewhere else as the combiner is optional.
> > > Unfortunately, currently there's no other place to put it (to avoid duplicating across all targets).
> > 
> > Couldn't this be offered as a generic combine (I mean G_FMUL (G_FADD) => FMA)?
> > 
> > I know this is optional, but this would be shared and is also sort of an optimization.
> > 
> > >  fusion is not guaranteed but I think we still need to translate this to either one of the options.
> > 
> > Yes, what I am suggesting is that fmuladd could always be expanded into G_FMUL (G_FADD). Just remove the FMA part.
> > 
> > > If IRTranslator doesn't do that, every target would have to handle this in somewhere else as the combiner is optional.
> > 
> > Just to make just we are talking about the same thing, I agree we need to *always* lower that intrinsic in IRTranslator, but G_FMUL(G_FADD) is sufficient for that part.
> My initial thought was that this ought to be a combine. However, I think this intrinsic is intended as a way to say you're expecting either a fused-multiply-add or a multiply-add (some targets have both) but don't care whether the rounding step happens or not, you just want whatever is fastest. We could handle this as a combine but it would cost compile-time to find the optimization opportunity in the expanded code and if we go that route then the intrinsic doesn't really serve a purpose compared to separate mul/add LLVM-IR instructions anymore.
Arguably the compiler knows better (critical path, resources usages, and whatnot) and that wouldn't be the first time that we silently drop an intrinsic in favor of regular instructions (we did that for pretty much all the intel shuffle instructions for instance IIRC). Thus, keeping this as an optimization makes sense to me.

That said, I also get your point and I am fine with either solutions. Given it looks like we have 1 vs. 3 votes, I guess we keep the fancy expansion here :).

Thoughts?

https://reviews.llvm.org/D43090