[llvm] [NVPTX] Add IR pass for FMA transformation in the llc pipeline (PR #154735)
Rajat Bajpai via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 22 04:14:15 PDT 2025
rajatbajpai wrote:
> Fusing the FMA doesn't really give you new information. You could perform equivalent analysis on the separate operations
We're aiming to vectorize pairs of `fma.f32` instructions into a single `fma.f32x2`. To enable this, we plan to fold FMAs during the IR phase, prior to ISel. See [CUDA FMA Instructions](https://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions-fma.) for the instruction details.
While the bandwidth of two scalar FMAs is equivalent to that of a vectorized FMA, vectorization can benefit workloads that are bottlenecked by instruction issue rates. We plan to add this transformation as a separate, opt-in optimization pass in the llc pipeline.
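As a rough illustration of the idea (not the pass itself, which operates on LLVM IR), the following C++ sketch shows a multiply-add written as separate operations, the same computation as a single fused operation, and two independent fused operations expressed as one two-lane operation. The names `Float2`, `mul_add_separate`, `mul_add_fused`, and `fma2` are hypothetical and exist only for this example; `Float2` merely stands in for a packed two-float value.

```cpp
#include <cmath>

// Hypothetical two-lane value, standing in for a packed pair of floats.
struct Float2 {
    float x;
    float y;
};

// Multiply and add written as two separate operations.
// This is the shape a folding step would look for.
inline float mul_add_separate(float a, float b, float c) {
    float product = a * b;
    return product + c;
}

// The same computation expressed as one fused multiply-add.
inline float mul_add_fused(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Two independent scalar FMAs expressed as a single two-lane operation.
// Each lane is computed exactly as the scalar fused form would compute it.
inline Float2 fma2(Float2 a, Float2 b, Float2 c) {
    return Float2{std::fma(a.x, b.x, c.x), std::fma(a.y, b.y, c.y)};
}
```

Calling `fma2(Float2{1.5f, 2.0f}, Float2{2.0f, 3.0f}, Float2{0.25f, 1.0f})` yields the same lanes as two separate `mul_add_fused` calls. The point of the sketch is only that the two-lane form becomes visible once both scalar operations are already in fused form, which is why the folding is done before vectorization.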
https://github.com/llvm/llvm-project/pull/154735