[libc-commits] [libc] [libc][math][c23] implement C23 math function asinpif16 (PR #146226)

Wed Jul 9 18:11:26 PDT 2025

================
@@ -0,0 +1,163 @@
+//===-- Half-precision asinpif16(x) function ------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception.
+//
+//===----------------------------------------------------------------------===//
+
+#include "src/math/asinpif16.h"
+#include "hdr/errno_macros.h"
+#include "hdr/fenv_macros.h"
+#include "src/__support/FPUtil/FEnvImpl.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/PolyEval.h"
+#include "src/__support/FPUtil/cast.h"
+#include "src/__support/FPUtil/except_value_utils.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/FPUtil/sqrt.h"
+#include "src/__support/macros/optimization.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+static constexpr float16 ONE_OVER_TWO = 0.5f16;
+
+#ifndef LIBC_MATH_HAS_SKIP_ACCURATE_PASS
+static constexpr size_t N_ASINFPI_EXCEPTS = 3;
+
+static constexpr fputil::ExceptValues<float16, N_ASINFPI_EXCEPTS>
+    ASINFPI_EXCEPTS{{
+        // (input_hex, RZ_output_hex, RU_offset, RD_offset, RN_offset)
+        // x = 0.0, asinfpi(0.0) = 0.0
+        {0x0000, 0x0000, 0, 0, 0},
+
+        // x = 1.0, asinfpi(1) = 1/2
+        {(fputil::FPBits<float16>(1.0f16)).uintval(),
+         (fputil::FPBits<float16>(ONE_OVER_TWO)).uintval(), 0, 0, 0},
+
+        // x = -1.0, asinfpi(-1.0) = -1/2
+        {(fputil::FPBits<float16>(-1.0f16)).uintval(),
+         (fputil::FPBits<float16>(-ONE_OVER_TWO)).uintval(), 0, 0, 0},
+    }};
----------------
overmighty wrote:

With your last commit, there actually aren't any exceptional values. I tried switching from `double` to `float`, which gave 2 exceptional values. With `float`, the degree-19 polynomial could be truncated down to degree 13 without changing that count. Truncating further to degree 11, however, increases the number of exceptional values to 5.

I benchmarked these different versions on a laptop and phone:

| Test case                                           | Denormal range | Normal range |
|-----------------------------------------------------|----------------|--------------|
| **Intel Core i7-13700H** (F16C, `-march=native`)    |                |              |
| Current PR                                          | 25.83 ns       | 14.82 ns     |
| `float`, degree-13 polynomial, 2 except. values     | 20.17 ns       | 12.14 ns     |
| `float`, degree-11 polynomial, 5 except. values     | 19.30 ns       | 11.69 ns     |
| **Google Tensor G3** (FEAT_FP16, `-mcpu=cortex-x3`) |                |              |
| Current PR                                          | 6.76–7.40 ns   | 4.74–4.77 ns |
| `float`, degree-13 polynomial, 2 except. values     | 5.28–7.05 ns   | 4.49 ns      |
| `float`, degree-11 polynomial, 5 except. values     | 5.37–6.70 ns   | 4.46–4.58 ns |

On the i7-13700H, the results vary by less than 0.3% between runs, so it's clear that the degree-11 polynomial is slightly faster despite the few extra exceptional values. On the Tensor G3, the results are more volatile, so it's not clear whether degree 11 or degree 13 is faster there.
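
To illustrate why lowering the degree tends to produce more exceptional values, here is a rough sketch of how the approximation error grows as the polynomial is truncated. It is not the PR's polynomial: it uses the plain Taylor series of `asin(x)/pi` evaluated in `double`, assumes a reduced range of `[0, 0.5]`, and the helper names `asinpi_taylor` and `max_abs_error` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: odd-degree Taylor polynomial of asin(x)/pi.
// Assumption: a stand-in for the PR's polynomial, which likely uses
// different (e.g. minimax) coefficients.
double asinpi_taylor(double x, int degree) {
  const double pi = std::acos(-1.0);
  double term = x; // current series term, starting at x^1
  double sum = x;
  for (int k = 1; 2 * k + 1 <= degree; ++k) {
    // Ratio of consecutive asin series terms:
    //   x^2 * (2k-1)^2 / ((2k) * (2k+1))
    const double odd = 2.0 * k - 1.0;
    term *= x * x * odd * odd / ((2.0 * k) * (2.0 * k + 1.0));
    sum += term;
  }
  return sum / pi;
}

// Largest absolute error against std::asin(x)/pi, sampled on [0, 0.5].
// Assumption: 0.5 is the upper end of the reduced range.
double max_abs_error(int degree) {
  const double pi = std::acos(-1.0);
  double worst = 0.0;
  for (int i = 0; i <= 512; ++i) {
    const double x = i / 1024.0;
    const double ref = std::asin(x) / pi;
    worst = std::max(worst, std::fabs(asinpi_taylor(x, degree) - ref));
  }
  return worst;
}
```

Comparing `max_abs_error(19)`, `max_abs_error(13)`, and `max_abs_error(11)` shows the error rising as the degree drops. A larger error leaves less margin for inputs whose exact result lies close to a half-precision rounding boundary, which is presumably why more of them end up needing an exceptional-value entry.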

https://github.com/llvm/llvm-project/pull/146226