[llvm] [AMDGPU][DAGCombiner][GlobalISel] Prevent FMA contraction when multiply cannot be eliminated (PR #169735)

Mon Dec 1 07:18:56 PST 2025

adelejjeh wrote:

> The idea seems reasonable to me but note it is somewhat at odds with the original motivation for the "aggressive" option: [62ac736](https://github.com/llvm/llvm-project/commit/62ac736faa3f9e1307529d365a5267b9cfbb8084)
> 
> > [...] The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
> > use, but this is overly-conservative on some systems. Specifically, if the FMA
> > and the FADD have the same latency (and the FMA does not compete for resources
> > with the FMUL any more than the FADD does), there is no need for the
> > restriction, and furthermore, **forming the FMA leaving the FMUL can still allow
> > for higher overall throughput and decreased critical-path length**.
> 
> @hfinkel

This sits somewhere between the current "aggressive" behavior and the concerns that @b-sumner raised about power usage. I kept the existing behavior whenever the multiply will be completely removed, and I even took the more conservative route for fma by assuming it will always be contracted. In my testing on AMDGCN, instruction count and register usage dropped in some of the lit tests, and some SPEChpc benchmarks improved noticeably, with a net positive overall.
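
To make the rule concrete, here is a toy sketch of "contract only when the multiply can be eliminated". It is not the code in this PR: `UseKind`, `multiplyCanBeEliminated`, and `shouldContract` are hypothetical names, and the real DAGCombiner/GlobalISel logic inspects actual node uses rather than a vector of enums.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model of the users of a single fmul result.
// FAdd/FSub stand for users that can absorb the multiply into an FMA;
// Other stands for any user that still needs the product itself.
enum class UseKind { FAdd, FSub, Other };

// True if every user could be rewritten as an FMA, so the fmul
// would have no remaining users after contraction.
bool multiplyCanBeEliminated(const std::vector<UseKind> &uses) {
  if (uses.empty())
    return false; // nothing uses the multiply, so there is nothing to contract
  for (UseKind kind : uses)
    if (kind == UseKind::Other)
      return false; // the fmul must stay alive for this user
  return true;
}

// Assumed policy for this sketch: contract only if the fmul disappears.
bool shouldContract(const std::vector<UseKind> &uses) {
  return multiplyCanBeEliminated(uses);
}

// Usage sketch covering the cases discussed in this thread.
inline void demo() {
  // Single fadd user: the classic one-use case, contraction removes the fmul.
  assert(shouldContract({UseKind::FAdd}));
  // Several contractible users: the fmul is still fully removed.
  assert(shouldContract({UseKind::FAdd, UseKind::FSub}));
  // One user needs the raw product: the fmul would remain, so skip.
  assert(!shouldContract({UseKind::FAdd, UseKind::Other}));
}
```

Under the current "aggressive" option the third case would still form an FMA and keep the multiply; in this sketch it is the only case whose outcome changes.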

Based on the comments, maybe a better approach would be to introduce a cost model?

https://github.com/llvm/llvm-project/pull/169735