[Mlir-commits] [mlir] [MLIR][Arith] Add denormal attribute to binary/unary operations (PR #112700)
Diego Caballero
llvmlistbot at llvm.org
Mon Dec 9 14:31:40 PST 2024
dcaballe wrote:
Thanks everyone for the feedback. What started as a simple extra FMF flag evolved into a more comprehensive denormal model, which indeed would have warranted an RFC. Apologies for not signaling this earlier. However, the PR has been open for nearly two months and was discussed multiple times offline with Jakub and others, including face-to-face at the LLVM Developers' Meeting. The proposal incorporated the feedback provided, so we felt confident enough to proceed.
A few points I wanted to clarify, some already made by others:
**"Free for all":** I don't think describing this proposal as "free for all" is accurate. We are not proposing an Nvidia-specific modifier but a generic way to model various denormal behaviors. It is applicable to any hardware and even covers denormal modes that our own hardware does not support.
**"Exotic":** I don’t think "exotic" fits here either. Denormals (unfortunately!) are a fundamental part of the FP model, and the lack of a representation for them is a gap in our ecosystem that impacts everyone. Besides the emerging AI architectures that Renato mentioned, a quick search turned up:
- Nvidia's FTZ instructions/intrinsics (see [PTX ISA 8.5](https://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions-add)).
- AMD's FTZ instructions/intrinsics (see [lit test](https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fmad.ftz.ll#L18)).
- AVX-512's instructions/intrinsics to check for denormal values at instruction level (see [Intel® Intrinsics Guide](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=fpclass))
- Neon's exclusive FTZ mode and its evolution across VPU generations (see [Learn the architecture - Neon programmers' guide](https://developer.arm.com/documentation/den0018/a/NEON-Instruction-Set-Architecture/Flush-to-zero-mode)).
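To make the behavioral difference concrete, here is a small Python sketch of IEEE 754 gradual underflow, i.e. what the default (non-FTZ) mode preserves. Python on mainstream platforms runs with denormals enabled, so values below the smallest normal double stay nonzero; under a flush-to-zero mode the same computation would produce 0.

```python
import sys

smallest_normal = sys.float_info.min   # smallest positive *normal* double
subnormal = smallest_normal / 2        # below the normal range -> subnormal

# With gradual underflow (IEEE 754 default), the subnormal result is
# still a nonzero value and round-trips exactly when scaled back up.
print(subnormal > 0.0)                  # True under the default mode
print(subnormal * 2 == smallest_normal) # True: no precision was lost here

# Under a flush-to-zero (FTZ) mode, `subnormal` would instead be 0.0,
# which is exactly the behavior difference the proposed attribute models.
```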
**Denormals in LLVM:** LLVM's denormal model is best-effort (see “denormal-fp-math” in [LLVM Language Reference Manual — LLVM 20.0.0git documentation](https://llvm.org/docs/LangRef.html#function-attributes)) and carries legacy baggage, so I don’t think LLVM should be used as a strict role model for this particular case. It seems more reasonable to me to learn from LLVM and create something that better aligns with today's hardware requirements, which is what we were aiming for.
**Per-instruction Denormal Mode:** Besides the per-function attribute, LLVM appears to allow finer-grained control of the FP mode via the `llvm.set.fpmode` and `llvm.get.fpmode` intrinsics (see https://llvm.org/docs/LangRef.html#llvm-set-fpmode-intrinsic). This should allow us to lower Arith to a generic per-instruction denormal model in LLVM.
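As a rough, hypothetical sketch of what such a lowering could look like, an operation with a non-default denormal mode could be bracketed by save/set/restore calls. Note this is illustrative only: the interpretation of the integer mode word passed to `llvm.set.fpmode` is target-dependent, and the constant `1` below is a placeholder, not a real encoding.

```llvm
; Hypothetical lowering sketch: bracket a single operation with a
; target-specific FP mode change. The i32 mode encoding is target-dependent.
define float @add_ftz(float %a, float %b) {
entry:
  %old = call i32 @llvm.get.fpmode.i32()    ; save the current FP mode
  call void @llvm.set.fpmode.i32(i32 1)     ; placeholder: enable FTZ
  %sum = fadd float %a, %b
  call void @llvm.set.fpmode.i32(i32 %old)  ; restore the saved mode
  ret float %sum
}

declare i32 @llvm.get.fpmode.i32()
declare void @llvm.set.fpmode.i32(i32)
```

In practice a backend would be expected to fold redundant mode switches between adjacent operations that share the same denormal mode.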
**Denormals and Compiler Infrastructure:** As Jakub mentioned, the compiler's role with denormals is to propagate and preserve the behavior set by the front-end/framework. The compiler shouldn't be randomly changing the denormal mode. Consequently, it should be low risk to classify the denormal mode as something experimental while investigating/fixing any potential issues.
We'll follow up with an RFC on Discourse. Thanks again for your input.
https://github.com/llvm/llvm-project/pull/112700