[llvm-branch-commits] [libclc] libclc: Force assuming fast float fma for AMDGPU (PR #188245)

Tue Mar 24 06:31:53 PDT 2026

https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/188245

From 404e1bf6aae5df6a0796e057c2867e5cf165233d Mon Sep 17 00:00:00 2001
From: Matt Arsenault <Matthew.Arsenault at amd.com>
Date: Tue, 24 Mar 2026 14:25:55 +0100
Subject: [PATCH] libclc: Force assuming fast float fma for AMDGPU

Currently the build uses the default dummy target, which assumes
float FMA is slow. Force the AMDGPU build to assume fast FMA, which
holds on any remotely new hardware. If we want better support for
older targets in the future, there should be a separate build of
the math functions for the slow-FMA case.
---
 libclc/clc/include/clc/math/math.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
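
For readers unfamiliar with the pattern, here is a minimal host-side sketch of
how a fast-FMA flag like the one this patch changes can select between a fused
and an unfused multiply-add. It is not taken from libclc: SKETCH_FAST_FMA_F32
and sketch_mad are hypothetical names used only for illustration, and the real
math functions may use the flag differently.

```c
#include <math.h>

/* Same condition as the patched #if in math.h: treat float FMA as fast when
 * the compiler advertises it (FP_FAST_FMAF) or when targeting AMDGPU.
 * SKETCH_FAST_FMA_F32 is a hypothetical stand-in for __CLC_FAST_FMA_F32. */
#if defined(FP_FAST_FMAF) || defined(__AMDGPU__)
#define SKETCH_FAST_FMA_F32 1
#else
#define SKETCH_FAST_FMA_F32 0
#endif

/* Hypothetical helper (not a libclc function): returns a * b + c. */
static float sketch_mad(float a, float b, float c) {
#if SKETCH_FAST_FMA_F32
  /* Fused path: one rounding step. */
  return fmaf(a, b, c);
#else
  /* Unfused path: written as a separate multiply and add. The compiler may
   * still contract this depending on its floating-point settings. */
  return a * b + c;
#endif
}
```

On a host compiler that does not define FP_FAST_FMAF, the sketch takes the
unfused path. When __AMDGPU__ is defined, the fused path is selected
regardless, which is the behavior the patch forces.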

diff --git a/libclc/clc/include/clc/math/math.h b/libclc/clc/include/clc/math/math.h
index 22ed3f9defcbe..950e8055c98c9 100644
--- a/libclc/clc/include/clc/math/math.h
+++ b/libclc/clc/include/clc/math/math.h
@@ -30,7 +30,9 @@
 #define __CLC_FAST_FMA_F16 0
 #endif
 
-#ifdef FP_FAST_FMAF
+// TODO: Stop forcing this for AMDGPU, and use a separate build for slow-fma
+// case.
+#if defined(FP_FAST_FMAF) || defined(__AMDGPU__)
 #define __CLC_FAST_FMA_F32 1
 #else
 #define __CLC_FAST_FMA_F32 0