[libc-commits] [PATCH] D156981: [libc] Better IntegerToString API

Tue Aug 8 06:23:10 PDT 2023

gchatelet added inline comments.

================
Comment at: libc/src/__support/integer_to_string.h:229
+        is_negative = value < 0;
+        write_number(is_negative ? -value : value, sink);
+      } else {
----------------
gchatelet wrote:
> michaelrj wrote:
> > is there handling for `value` being the lowest negative value for a signed type (e.g. `LONG_MIN`)? In that case `-value` is equivalent to `value`.
> > is there handling for `value` being the lowest negative value for a signed type (e.g. `LONG_MIN`)? In that case `-value` is equivalent to `value`.
> 
> Yes, this is covered by the tests.
> ```
>   EXPECT(type, INT8_MIN, "-128");
>   EXPECT(type, INT16_MIN, "-32768");
>   EXPECT(type, INT32_MIN, "-2147483648");
>   EXPECT(type, INT64_MIN, "-9223372036854775808");
> ```
> 
> C++17 does not enforce a single representation for signed integers but C++20 will enforce two's complement as the only valid representation of signed integer : "[However, all C++ compilers use two's complement representation, and as of C++20, it is the only representation allowed by the standard](https://en.cppreference.com/w/cpp/language/types#:~:text=However%2C%20all%20C,8%2Dbit%20type)".
> 
> Also "[The builtin unary minus operator calculates the negative of its promoted operand](https://en.cppreference.com/w/cpp/language/operator_arithmetic#:~:text=The%20builtin%20unary,bits%20after%20promotion.)" 
> 
> In the case of `int8_t`, `-INT8_MIN` will actually perform `-static_cast<int>(INT8_MIN)` which will end up being `int(128)` then this value is converted back to `uint8_t`.
> Then the standard says "if the target type can represent the value, the value is unchanged" this is the case here, `128` is legal in `uint8_t` so this will be the value passed down to `write_number`.
> 
> The same reasoning works for `int16_t`, `int32_t` and `int64_t` except that type promotion does not always kick in. In that case I believe that [the second rule](https://en.cppreference.com/w/c/language/conversion#:~:text=otherwise%2C%20if%20the,implement%20modulo%20arithmetic.) applies. It basically states that modulo two arithmetic applies for the conversion from signed to unsigned. Practically speaking for `int64_t` this is equivalent to `std::bit_cast<uint64_t>(INT64_MIN)` leaving the bit pattern unchanged.
> 
> `INT64_MIN` interpreted as `int64_t`
> ```
> hex : 8000 0000 0000 0000
> dec : -9 223 372 036 854 775 808
> ```
> 
> interpreted as `uint64_t`
> ```
> hex : 8000 0000 0000 0000
> dec : 9 223 372 036 854 775 808
> ```
Tests do fail in certain situations so there's probably something fishy. I'm investigating.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156981/new/

https://reviews.llvm.org/D156981