[libc] [llvm] [libc][math][c23] Add rsqrtf16() function (PR #137545)
Anton Shepelev via llvm-commits
llvm-commits at lists.llvm.org
Sat Sep 13 09:10:18 PDT 2025
amemov wrote:
> > Trying to figure out what would be the best option to compute the result. I found that the current polynomial produces the least errors ( bigger ones yield negligible results ) `P = fpminimax(1/sqrt(x), [|0,1,2,3,4,5|], [|SG...|], [0.5, 1]); ` And has ULP Error = 1.0
> > Also found this already existing implementation:
> > https://github.com/llvm/llvm-project/blob/ae6b4b23ea4291e937192a3c08d0f3c9835864c2/libc/src/__support/fixed_point/sqrt.h#L39
> >
> > It has some other interesting points that I found when I was doing my research: specifically, Newton's method.
> > Upd: Tried adding 2 iterations of Newton's method. Each significantly reduced number of errors, but there are still some
>
> Can you compare the performance of this with
>
> ```
> fputil::cast<float16>(1.0f / fputil::sqrt(fputil::cast<float>(x)));
> ```
I wrote this test to check the performance of the implementation and ran the tests for rsqrtf16 a few times:
```
TEST_F(LlvmLibcRsqrtf16Test, PositiveRange_OneOverSqrtFputil) {
for (uint16_t v = POS_START; v <= POS_STOP; ++v) {
float16 x = FPBits(v).get_val();
float16 y = LIBC_NAMESPACE::fputil::cast<float16, float>(
1.0f / LIBC_NAMESPACE::fputil::sqrt<float, float>(
LIBC_NAMESPACE::fputil::cast<float, float16>(x)));
EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Rsqrt, x, y, 1.0);
}
}
```
Turns out that my implementation is ~3x slower than just directly calling `1.0f / fputil::sqrt` :/
Not sure why is that - because of too many branches or I wrote over-complicated approximation. The one you see is the most minimal I was able to derive so far - I started with 7-degree polynomial and 2 iterations of Newton's method and was able to reduce it to 5-degree and 1 iteration. What do you think?
https://github.com/llvm/llvm-project/pull/137545
More information about the llvm-commits
mailing list