[libcxx-commits] [PATCH] D128929: [libc++] Implements 128-bit support in to_chars.

Wed Jul 6 23:10:45 PDT 2022

Mordante marked 18 inline comments as done.
Mordante added a comment.

Thanks for the reviews. I'll upload a new version and land it when the CI passes.

================
Comment at: libcxx/include/__bits:1
 // -*- C++ -*-
 //===----------------------------------------------------------------------===//
----------------
ldionne wrote:
> Can you please document our support for `__int128_t` & its unsigned counterpart under `Libc++ Extensions` in `libcxx/docs/UsingLibcxx.rst`?
I had a look at this page, but nothing in this file has documentation. So instead of adding it in this review I'll crate a separate review which documents this file and its new extension.

================
Comment at: libcxx/include/__bits:56-58
+  return ((__x >> 64) == 0)
+           ? (64 + __builtin_clzll(static_cast<unsigned long long>(__x)))
+           : __builtin_clzll(static_cast<unsigned long long>(__x >> 64));
----------------
ldionne wrote:
> Mordante wrote:
> > nilayvaish wrote:
> > > Would this not be identified by the compiler as constexpr if _LIBCPP_STD_VER > 11?  Essentially I am trying to understand why we need #if on _LIBCPP_STD_VER.
> > The above isn't valid in C++11 due to language limitations. But this isn't very readable so I added a more readable version for newer C++ versions.
> IMO we should have only one version since both are equivalent. The comment can be left in place to explain why the implementation looks that way.
Done, based on the time we spend discussing the C++11 version I added comment to explain what this does.

================
Comment at: libcxx/include/__charconv/tables.h:119-158
+    UINT64_C(0),
+    UINT64_C(10),
+    UINT64_C(100),
+    UINT64_C(1000),
+    UINT64_C(10000),
+    UINT64_C(100000),
+    UINT64_C(1000000),
----------------
Mordante wrote:
> Mordante wrote:
> > nilayvaish wrote:
> > > nilayvaish wrote:
> > > > Is it possible to use quotes (they are only allowed since C++14) after every three digits, starting from the least-significant digit?  Else, maybe we should generate these individual items using pre-processing.  I think in the current form it is hard to know if the number of digits is correct.
> > > Should this be 10^0 = 1?  I did notice that this matches the tables for 32 bits and 64 bits.
> > No it shouldn't. The length determination algorithm uses `__builtin_clzll` which requires the value to be not `0` so there the value is adjusted (orred by 1). That way a value of zero has a width of `1` and needs no further adjustment.
> > 
> > I agree the naming used is a bit misleading, but I decided to keep the naming already used before.
> It's not possible since this needs to compile in C++11. (In fact in another to_chars patch I used quotes until C++11 complained.) In the current form it's indeed looking at the pattern for a shift of one, not great. Do you have a suggestion how to do this cleanly using pre-processing; usually I feel pre-processing doesn't help to make things readable.
I've adjusted as suggested by @ldionne above I think that version is easier to inspect.

================
Comment at: libcxx/include/__charconv/to_chars_base_10.h:151
+  // Maximum unsigned values
+  // 64  bit                              8'446'744'073'709'551'615 (19 digits)
+  // 128 bit    340'282'366'920'938'463'463'374'607'431'768'211'455 39 digits)
----------------
nilayvaish wrote:
> Mordante wrote:
> > nilayvaish wrote:
> > > I think the MSB is missing.
> > What do you exactly mean with "the MSB is missing"?
> 2^64 = 18446744073709551616.  The most significant digit would be 1.
Thanks, good catch! I thought you means the sign bit with MSB, which makes no sense for an unsigned value. Now I see you mean the leading digit, which indeed is the MSB.

I've updated the comment and made some minor changes to the algorithm since step 2 can actually process 19 digits.
(step 3 already used 19 digits which would be invalid if the current value were correct.)

================
Comment at: libcxx/include/charconv:167
+    // There's always a bit set in the upper 64-bits.
+    auto __t = (128 - std::__libcpp_clz(static_cast<uint64_t>(__v >> 64))) * 1233 >> 12;
+    _LIBCPP_ASSERT(__t >= __table<>::__pow10_128_offset, "Index out of bounds");
----------------
nilayvaish wrote:
> ldionne wrote:
> > Can you add a reference to the documentation for this trick?
> +1.  I don't quite understand what we are trying to achieve here.
+1 ;-) I had intended to do that after I investigated this trick in the 32 and 64-bit code. Thanks for the reminder.

In the comment I also explain why the pow table starts with `0` instead of `1`.

================
Comment at: libcxx/include/charconv:464
 __to_chars_integral(char* __first, char* __last, _Tp __value) {
-  return __itoa::__integral<_Base>::__to_chars(__first, __last, __value);
+  return __itoa::__integral<_Base>::__to_chars(__first, __last, __value); // XXX cast to uint64_t?
 }
----------------
ldionne wrote:
> This is a leftover comment, I think we should be casting to `uint64_t`.
It's indeed a left-over comment. However we can't cast to `uint64_t` here. That would break the 128-bit code. (I tested that locally.)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128929/new/

https://reviews.llvm.org/D128929