[libc-commits] [PATCH] D115408: [libc] Implement correctly rounded logf based on RLIBM library.

Tue Ly via Phabricator via libc-commits libc-commits at lists.llvm.org
Sat Dec 11 23:12:53 PST 2021

lntue added a comment.

I've updated the patch using the polynomial provided by @santoshn. I've benchmarked various implementations, and the most efficient way I found to compute the quintic polynomial with zero constant coefficient that @santoshn provided is as follows:

- First, compute the quartic part r of the polynomial (so that the quintic equals r * x): r = a5 x^4 + a4 x^3 + a3 x^2 + a2 x + a1 = polyeval(x, a1, a2, a3, a4, a5) = fma(polyeval(x, a2, a3, a4, a5), x, a1)  // the cubic polynomial evaluation is optimized in src/math/__support/FPUtil/x86_64/PolyEval.h
- Then use another fma to multiply r by x and add the extra part (m * log(2) + log(f)): result = fma(r, x, fma(m, log(2), log(f)))
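
The two steps above can be sketched as a toy single-precision log. This is a minimal sketch under stated assumptions, not the code in the patch: the coefficients A1..A5 are a stand-in (the degree-5 Taylor series of log(1 + x)) rather than @santoshn's polynomial; the reduction of the input to 2^m * f * (1 + x) with f = 1 + k/128 is assumed; std::log(f) stands in for a table of log(f) values; plain Horner's scheme stands in for the optimized cubic in PolyEval.h; and the names toy_logf and cubic_part are hypothetical. Only positive normal inputs are handled.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Stand-in coefficients (assumption): Taylor series of log(1 + x) up to x^5,
// NOT the RLIBM-generated coefficients used in the patch.
constexpr double A1 = 1.0, A2 = -0.5, A3 = 1.0 / 3.0, A4 = -0.25, A5 = 0.2;
constexpr double LN2 = 0.6931471805599453;  // log(2) rounded to double

// Cubic a2 + a3*x + a4*x^2 + a5*x^3 via Horner's scheme with fma
// (hypothetical helper; the patch uses the optimized evaluation in PolyEval.h).
double cubic_part(double x) {
  return std::fma(std::fma(std::fma(A5, x, A4), x, A3), x, A2);
}

// Toy logf for positive normal inputs only (hypothetical, no special cases).
float toy_logf(float input) {
  std::uint32_t bits;
  std::memcpy(&bits, &input, sizeof bits);
  int m = static_cast<int>(bits >> 23) - 127;  // unbiased exponent
  std::uint32_t frac = bits & 0x7FFFFFu;       // 23 fraction bits
  double mant = 1.0 + frac / 8388608.0;        // mantissa in [1, 2)
  double f = 1.0 + (frac >> 16) / 128.0;       // f = 1 + k/128, k = top 7 bits
  double x = (mant - f) / f;                   // reduced argument, 0 <= x < 1/128
  double log_f = std::log(f);                  // stand-in for a lookup table

  // Step 1: quartic r = a1 + a2*x + a3*x^2 + a4*x^3 + a5*x^4 = fma(cubic, x, a1).
  double r = std::fma(cubic_part(x), x, A1);
  // Step 2: result = r*x + (m*log(2) + log(f)), using two more fma calls.
  double result = std::fma(r, x, std::fma(static_cast<double>(m), LN2, log_f));
  return static_cast<float>(result);
}
```

Since x*r approximates log(1 + x), the sum is log(2^m) + log(f) + log(1 + x), which is log of the input.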

With this computation, the exhaustive test shows that the number of exceptional cases is reduced to 5:

  // Input decimal: 1.00639712810516357421875000000000000000000000000000
  //   MPFR result: 0.00637675332837318907527892795527933097341434970404
  // Input decimal: 9.47263622283935546875000000000000000000000000000000
  //   MPFR result: 2.24840724468231192940413656173141803167922920169308
  // Input decimal: 58037908.00000000000000000000000000000000000000000000000000
  //   MPFR result: 17.87660694122314468798277511771537045730722298001695
  // Input decimal: 127837836949849943048192.00000000000000000000000000000000000000000000000000
  //   MPFR result: 53.20504951477050802794144423622214960099434498468168
  // Input decimal: 54983060754563292101907316736.00000000000000000000000000000000000000000000000000
  //   MPFR result: 66.17682266235352211194320433287158374936494916276426

The last two exceptional values (the large inputs) might come from evaluating m * log(2) + log(f) with fma.
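
One way to see why these two inputs are so sensitive to small evaluation errors is to check how close their MPFR results are to a float rounding boundary. The helper below (is_float_midpoint is a hypothetical name, not part of the patch) tests whether a double lies exactly halfway between two adjacent floats. Worked by hand, the double nearest to each of the last two MPFR results is exactly such a midpoint: for example, 66.176822662353515625 = 17347857 * 2^-18 is halfway between two floats, and the MPFR value exceeds it by only about 6.5e-15, less than half the double spacing (about 7.1e-15) at that magnitude. So even a correctly rounded double result sits exactly on the midpoint, and casting it to float with ties-to-even rounds down, whereas the correctly rounded float result rounds up.

```cpp
#include <cmath>

// Hypothetical helper: true if v lies exactly halfway between two adjacent floats.
bool is_float_midpoint(double v) {
  float nearest = static_cast<float>(v);  // round to nearest, ties to even (default mode)
  if (static_cast<double>(nearest) == v) return false;  // v is itself a float
  // The other float neighbor of v, on the opposite side from `nearest`.
  float other = std::nextafter(nearest, v > nearest ? INFINITY : -INFINITY);
  // Adjacent floats sum exactly in double, and halving is exact too.
  return (static_cast<double>(nearest) + static_cast<double>(other)) / 2.0 == v;
}

// Last two MPFR results from the list above; the compiler rounds each literal
// to the nearest double.
constexpr double kResult4 = 53.20504951477050802794144423622214960099434498468168;
constexpr double kResult5 = 66.17682266235352211194320433287158374936494916276426;
```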

@zimmermann6: About latency: the early switch on exceptional values slowed down the whole function by 20-25%. Playing around with it, I found that moving the switch for exceptional cases close to the final fma call reduces the extra latency from the switch to only ~5% of the whole function, which I think is acceptable.
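
Structurally, the change amounts to something like the sketch below: the exceptional inputs are checked only once the polynomial pieces are available, immediately before the final fma, instead of at function entry. This is a hypothetical illustration, not the patch's code: finish_logf and kExceptions are made-up names, a linear scan over a small table stands in for the switch on the input bits, and the table's results are the MPFR values listed above, rounded to float by the compiler.

```cpp
#include <cmath>

struct ExceptionalCase {
  float input;
  float result;  // correctly rounded (to nearest) result
};

// Inputs and MPFR results from the exhaustive test above; each decimal literal
// is rounded to the nearest float by the compiler.
constexpr ExceptionalCase kExceptions[] = {
    {1.00639712810516357421875f, 0.00637675332837318907527892795527933097341434970404f},
    {9.47263622283935546875f, 2.24840724468231192940413656173141803167922920169308f},
    {58037908.0f, 17.87660694122314468798277511771537045730722298001695f},
    {127837836949849943048192.0f, 53.20504951477050802794144423622214960099434498468168f},
    {54983060754563292101907316736.0f, 66.17682266235352211194320433287158374936494916276426f},
};

constexpr double LN2 = 0.6931471805599453;  // log(2) rounded to double

// Hypothetical tail of a logf: r, x, m and log_f are assumed to have been
// computed already from `input` as in the two steps described earlier.
float finish_logf(float input, double r, double x, double m, double log_f) {
  // Exceptional-case check placed right before the final fma.
  for (const ExceptionalCase &e : kExceptions) {
    if (input == e.input) return e.result;
  }
  return static_cast<float>(std::fma(r, x, std::fma(m, LN2, log_f)));
}
```

Because the lookup does not depend on r or x, it can presumably execute alongside the polynomial evaluation rather than delaying it.
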
About correct rounding for all rounding modes: even though it would only take a few extra switches, I'll need to update the testing infrastructure, especially the MPFR support for all rounding modes, and the exhaustive tests for that. There is also another design question to consider, namely how to support all rounding modes at both compile time and run time. Since that is quite significant work by itself, I would prefer to add support for the other rounding modes in a separate patch.
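
For the run-time option, one possible shape (purely a sketch; pick_for_rounding_mode is a hypothetical name and nothing here is from the patch) is to store, for each exceptional input, the two floats bracketing the exact result and choose between them with std::fegetround(). It assumes the exact result is not itself a float and that the platform defines all four FE_* rounding-mode macros.

```cpp
#include <cfenv>

// Hypothetical: for an exceptional input whose exact result lies strictly
// between the adjacent floats lo < hi, return the correctly rounded result for
// the current run-time rounding mode. `nearest` is whichever of lo/hi is
// closer to the exact (MPFR) value.
float pick_for_rounding_mode(float lo, float hi, float nearest) {
  switch (std::fegetround()) {
    case FE_UPWARD:
      return hi;
    case FE_DOWNWARD:
      return lo;
    case FE_TOWARDZERO:
      return lo >= 0.0f ? lo : hi;  // the one with the smaller magnitude
    default:  // FE_TONEAREST
      return nearest;
  }
}
```

For the input 54983060754563292101907316736 above, lo and hi would be 8673928 * 2^-17 and 8673929 * 2^-17, with nearest = hi.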

  rG LLVM Github Monorepo


