[libc] [llvm] [libc][math][c23] Add rsqrtf16() function (PR #137545)

Sat Sep 13 09:10:18 PDT 2025

amemov wrote:

> > I'm trying to figure out the best way to compute the result. I found that the current polynomial produces the fewest errors (higher-degree ones yield negligible improvement): `P = fpminimax(1/sqrt(x), [|0,1,2,3,4,5|], [|SG...|], [0.5, 1]);` It has a ULP error of 1.0.
> > Also found this already existing implementation:
> > https://github.com/llvm/llvm-project/blob/ae6b4b23ea4291e937192a3c08d0f3c9835864c2/libc/src/__support/fixed_point/sqrt.h#L39
> > 
> > It uses some other interesting techniques that I came across during my research, specifically Newton's method.
> > Update: I tried adding 2 iterations of Newton's method. Each one significantly reduced the number of errors, but some still remain.
> 
> Can you compare the performance of this with
> 
> ```
>   fputil::cast<float16>(1.0f / fputil::sqrt(fputil::cast<float>(x)));
> ```

I wrote this test to check the performance of the implementation and ran the tests for rsqrtf16 a few times:
```
TEST_F(LlvmLibcRsqrtf16Test, PositiveRange_OneOverSqrtFputil) {
  for (uint16_t v = POS_START; v <= POS_STOP; ++v) {
    float16 x = FPBits(v).get_val();

    float16 y = LIBC_NAMESPACE::fputil::cast<float16, float>(
        1.0f / LIBC_NAMESPACE::fputil::sqrt<float, float>(
                   LIBC_NAMESPACE::fputil::cast<float, float16>(x)));

    EXPECT_MPFR_MATCH_ALL_ROUNDING(mpfr::Operation::Rsqrt, x, y, 1.0);
  }
}
```
It turns out that my implementation is ~3x slower than directly calling `1.0f / fputil::sqrt` :/
I'm not sure why: either there are too many branches, or the approximation I wrote is over-complicated. The version you see is the most minimal one I have been able to derive so far. I started with a degree-7 polynomial and 2 iterations of Newton's method and was able to reduce it to degree 5 and 1 iteration. What do you think?
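
For reference, here is a minimal sketch of the Newton refinement step mentioned above, written in plain standard C++ rather than with the libc `fputil` helpers. It is not the code from the PR: the names `rsqrt_newton_step` and `rsqrt_refined` are hypothetical, the seed is passed in by the caller instead of coming from the degree-5 polynomial, and everything is done in `float` only to show how each iteration shrinks the error.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for f(y) = 1/y^2 - x, whose positive root is
// 1/sqrt(x). The update simplifies to y * (1.5 - 0.5 * x * y^2), so it
// needs only multiplications and one subtraction (no division or sqrt).
inline float rsqrt_newton_step(float x, float y) {
  return y * (1.5f - 0.5f * x * y * y);
}

// Hypothetical helper: refine a caller-supplied seed with a fixed number
// of Newton steps. In a real implementation the seed would come from a
// polynomial approximation on the reduced range [0.5, 1].
inline float rsqrt_refined(float x, float seed, int iterations) {
  assert(x > 0.0f && "sketch only handles positive finite inputs");
  float y = seed;
  for (int i = 0; i < iterations; ++i)
    y = rsqrt_newton_step(x, y);
  return y;
}
```

Because convergence is quadratic, the relative error is roughly squared on every step, so a better polynomial seed trades directly against the number of iterations. Each step also adds a chain of dependent multiplications, which may be part of the cost compared with a single hardware `sqrt` and divide.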

https://github.com/llvm/llvm-project/pull/137545