[llvm] [NVPTX] Add IR pass for FMA transformation in the llc pipeline (PR #154735)

Fri Aug 22 04:14:15 PDT 2025

rajatbajpai wrote:

> Fusing the FMA doesn't really give you new information. You could perform equivalent analysis on the separate operations

We're aiming to vectorize pairs of `fma.f32` instructions into `fma.f32x2`. To enable this, we plan to fold FMAs during the IR phase, prior to ISel. See [CUDA FMA Instructions](https://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions-fma.) for the instruction details.
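
As a rough sketch of the shape of the transformation (not the actual output of the pass), the IR below shows a separate multiply and add, the folded scalar FMA, and a 2-lane vector FMA. The function and value names are hypothetical, and the use of `llvm.fma` with the `contract` flag is an assumption about how the fold might be expressed; whether the vector form is selected as `fma.f32x2` depends on the target and backend.

```llvm
; Illustrative only: function and value names are hypothetical.

; Before folding: separate multiply and add. The `contract` flag
; permits fusing them into a single operation.
define float @scalar_before(float %a, float %b, float %c) {
  %mul = fmul contract float %a, %b
  %add = fadd contract float %mul, %c
  ret float %add
}

; After folding at the IR level: one fused multiply-add.
define float @scalar_after(float %a, float %b, float %c) {
  %fma = call contract float @llvm.fma.f32(float %a, float %b, float %c)
  ret float %fma
}

; Two independent FMAs expressed as one 2-lane vector FMA. This is
; the form a backend could map to fma.f32x2 (assumption).
define <2 x float> @packed(<2 x float> %a, <2 x float> %b, <2 x float> %c) {
  %fma = call contract <2 x float> @llvm.fma.v2f32(<2 x float> %a, <2 x float> %b, <2 x float> %c)
  ret <2 x float> %fma
}

declare float @llvm.fma.f32(float, float, float)
declare <2 x float> @llvm.fma.v2f32(<2 x float>, <2 x float>, <2 x float>)
```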

While the bandwidth of two scalar FMAs is equivalent to that of a vectorized FMA, vectorization can benefit workloads that are bottlenecked by instruction issue rates. We plan to add this transformation as a separate, opt-in optimization pass in the llc pipeline.

https://github.com/llvm/llvm-project/pull/154735