[llvm] [NVPTX] Add IR pass for FMA transformation in the llc pipeline (PR #154735)
Rajat Bajpai via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 15 10:58:44 PDT 2025
rajatbajpai wrote:
> I'm still not sure I fully understand the need to do fma-folding to enable vectorization.
There isn't a strict requirement to perform FMA folding before FMA vectorization; however, folding first exposes additional vectorization opportunities. We have a real-world example where this helps.
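A minimal sketch of the idea (the value names `%a0`, `%b0`, etc. are hypothetical, not from the actual workload):

```llvm
; Before folding: two adjacent scalar mul+add chains (operands elided).
%m0 = fmul contract float %a0, %b0
%s0 = fadd contract float %m0, %c0
%m1 = fmul contract float %a1, %b1
%s1 = fadd contract float %m1, %c1

; After IR-level FMA folding, each chain collapses to one intrinsic call:
%f0 = call contract float @llvm.fma.f32(float %a0, float %b0, float %c0)
%f1 = call contract float @llvm.fma.f32(float %a1, float %b1, float %c1)

; ...which the vectorizer can now pair into a single packed operation:
%fv = call contract <2 x float> @llvm.fma.v2f32(<2 x float> %av,
                                                <2 x float> %bv,
                                                <2 x float> %cv)
```

The packed `llvm.fma.v2f32` then lowers to a single `fma.rn.f32x2` PTX instruction on targets that support it, whereas the unfolded mul/add pairs are harder for the vectorizer to recognize as one unit.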
> Since PTX supports add.f32x2 and mul.f32x2, can the SLP vectorizer just vectorize these operations, with DAGCombine doing FMA folding on the vectorized operations where possible?
That’s another possible approach, but there are a couple of important considerations:
1) Vectorizing general arithmetic operations (add.f32 and mul.f32) can cause performance fluctuations.
2) DAGCombine can only perform vectorization within a single basic block, whereas an IR pass can operate across the whole function.
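To illustrate the second point, here is a hypothetical fragment where the two mul+add chains sit in different basic blocks; SelectionDAG (and therefore DAGCombine) processes one block at a time, so it cannot pair them, while an IR-level pass sees the whole function:

```llvm
entry:
  %m0 = fmul contract float %a0, %b0
  %s0 = fadd contract float %m0, %c0
  br i1 %cond, label %then, label %exit

then:
  ; Same pattern, but in a different block: out of reach for a
  ; per-basic-block DAG combine, visible to an IR-level pass.
  %m1 = fmul contract float %a1, %b1
  %s1 = fadd contract float %m1, %c1
  br label %exit
```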
https://github.com/llvm/llvm-project/pull/154735