[libc-commits] [PATCH] D156981: [libc] Better IntegerToString API

Tue Aug 8 04:55:26 PDT 2023

gchatelet marked 7 inline comments as done.
gchatelet added inline comments.

================
Comment at: libc/src/__support/StringUtil/signal_to_string.cpp:32
   // the buffer should be able to hold "Real-time signal" + ' ' + num_str
-  return (base_str_len + 1 + max_num_len) * sizeof(char);
+  return (base_str_len + 1 + int_buffer_size()) * sizeof(char);
 }
----------------
jhuber6 wrote:
> Do we need a helper for what is already a one-liner? I think it should fit without needing to be split to the next line.
I agree it's not useful anymore.

================
Comment at: libc/src/__support/integer_to_string.h:90
+  // Invariants
+  static_assert(BASE == 2 || BASE == 8 || BASE == 10 || BASE == 16,
+                "Invalid radix");
----------------
michaelrj wrote:
> why limit this to only these bases? Without the asserts this class would support any base from 2-36. Additionally, that would mirror how the string to integer functions work.
Although the API would absolutely work with other bases I couldn't come up with legitimate uses so I decided to restrict them.
I will revert to the previous state and provide a way to access other bases.

================
Comment at: libc/src/__support/integer_to_string.h:197
+    constexpr size_t digit_size = max(max_digits(), Fmt::MIN_DIGITS);
+    constexpr size_t sign_size = Fmt::BASE == 10 ? 1 : 0;
+    constexpr size_t prefix_size = Fmt::PREFIX ? 2 : 0;
----------------
michaelrj wrote:
> why are signs only supported in base 10?
This is consistent with :
 - `printf/scanf` specification
 - hex/binary calculator

It is technically doable but I don't see a use case for it. We can always lift this restriction afterwards if it proves useful.

================
Comment at: libc/src/__support/integer_to_string.h:229
+        is_negative = value < 0;
+        write_number(is_negative ? -value : value, sink);
+      } else {
----------------
michaelrj wrote:
> is there handling for `value` being the lowest negative value for a signed type (e.g. `LONG_MIN`)? In that case `-value` is equivalent to `value`.
> is there handling for `value` being the lowest negative value for a signed type (e.g. `LONG_MIN`)? In that case `-value` is equivalent to `value`.

Yes, this is covered by the tests.
```
  EXPECT(type, INT8_MIN, "-128");
  EXPECT(type, INT16_MIN, "-32768");
  EXPECT(type, INT32_MIN, "-2147483648");
  EXPECT(type, INT64_MIN, "-9223372036854775808");
```

C++17 does not enforce a single representation for signed integers but C++20 will enforce two's complement as the only valid representation of signed integer : "[However, all C++ compilers use two's complement representation, and as of C++20, it is the only representation allowed by the standard](https://en.cppreference.com/w/cpp/language/types#:~:text=However%2C%20all%20C,8%2Dbit%20type)".

Also "[The builtin unary minus operator calculates the negative of its promoted operand](https://en.cppreference.com/w/cpp/language/operator_arithmetic#:~:text=The%20builtin%20unary,bits%20after%20promotion.)" 

In the case of `int8_t`, `-INT8_MIN` will actually perform `-static_cast<int>(INT8_MIN)` which will end up being `int(128)` then this value is converted back to `uint8_t`.
Then the standard says "if the target type can represent the value, the value is unchanged" this is the case here, `128` is legal in `uint8_t` so this will be the value passed down to `write_number`.

The same reasoning works for `int16_t`, `int32_t` and `int64_t` except that type promotion does not always kick in. In that case I believe that [the second rule](https://en.cppreference.com/w/c/language/conversion#:~:text=otherwise%2C%20if%20the,implement%20modulo%20arithmetic.) applies. It basically states that modulo two arithmetic applies for the conversion from signed to unsigned. Practically speaking for `int64_t` this is equivalent to `std::bit_cast<uint64_t>(INT64_MIN)` leaving the bit pattern unchanged.

`INT64_MIN` interpreted as `int64_t`
```
hex : 8000 0000 0000 0000
dec : -9 223 372 036 854 775 808
```

interpreted as `uint64_t`
```
hex : 8000 0000 0000 0000
dec : 9 223 372 036 854 775 808
```

================
Comment at: libc/src/__support/integer_to_string.h:193-195
+    constexpr auto max = [](size_t a, size_t b) -> size_t {
+      return a > b ? a : b;
+    };
----------------
jhuber6 wrote:
> Unrelated, but this really should probably go in some common `libc` header since there's another use in `rpc_client.h`.
> Unrelated, but this really should probably go in some common `libc` header since there's another use in `rpc_client.h`.

We definitely need it. I plan to provide it in `libc/src/__support/CPP/algorithms.h`.

================
Comment at: libc/src/__support/integer_to_string.h:94
+
+using BinString = details::Config<2>;
+using OctString = details::Config<8>;
----------------
jhuber6 wrote:
> gchatelet wrote:
> > jhuber6 wrote:
> > > I'm not a fan of the name, since this isn't a string it's a config for the radix. Could we do something like `radix::Decimal`, etc?
> > The intent was that it could be read //naturally// : `IntegerTo<DecString::WithPrefix>` -> `Integer to dec string with prefix`.
> > Now I'm fine with another approach.
> > 
> > I'd like to gather a few other opinions / suggestions before changing though.
> I just feel like it's clearer if you have a function called `IntegerToString` which takes a radix config by template, e.g. `IntegerToString<radix::Decimal>`. It's more words but I just find it a little weird to 
> I just feel like it's clearer if you have a function called `IntegerToString` which takes a radix config by template, e.g. `IntegerToString<radix::Decimal>`. It's more words but I just find it a little weird to 

`IntegerToString<radix::XXX>` it is then.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156981/new/

https://reviews.llvm.org/D156981