[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

Wed Jun 12 18:20:57 PDT 2019

FWIW, the talks linked by Mehdi really do talk about these things and why I
don't think the really are the correct trade-off.

Even if you imagine an unsigned type that doesn't allow wrapping, I think
this is a really bad type. The problem is that you have made the most
common value of the type (zero in every study I'm aware of) be a boundary
condition. Today, it wraps to a huge value if you cross it. Afterward, it
would trap. Both are super surprising.

Another way of looking at the same lens: do you subtract these values?
Should `a + (b - c)` be the same as `(a + b) - c`? You either need a signed
type or wrapping to have reasonable answers here. And if you solve this
with wrapping, then it makes any attempt to write assertions or other
checks in the same type system very difficult. The fact that you write an
assert to check for "did I accidentally go past zero?" by conjuring some
"it's probably too large" value and then comparing if it is *greater* than
that is ... extraordinarily confusing.

Meanwhile, with signed types, it is quite easy to write asserts that check
for non-negative values in the correct places. They are easy to read and
produce easily understood errors. The boundary conditions are uncommon.

Even on the C++ standards committee, there is remarkably strong consensus
that in the *absence* of unsigned types coming back from `.size()` methods
and such, we should be using signed types for the reasons above.

The fact that we have unsigned `size_t` in a bunch of places is, IMO, a
concern and it is important to have good ways of avoiding warnings. But I
think we have so very many ways that don't require us to just use unsigned
types everywhere and deal with the above issues:

- Change the return types of our containers `size()` methods.
- Add a `ssize()` method. (This is the direction the committee is moving
AFAICT, but they are constrained by a powerful desire to break zero code,
where as LLVM's containers have much more API freedom.)
- Use idioms like the one I suggested with `llvm::seq`.

Any or all of these seem significantly preferable to the readability
concerns I outline above, at least to me. This is why I am still *strongly*
in favor of signed types and assertions around value at known points where
the value should obey that assertion.

-Chandler

On Wed, Jun 12, 2019 at 1:01 AM Renato Golin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> +1 to both points here.
>
> On Wed, 12 Jun 2019, 07:55 Aaron Ballman via llvm-dev, <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Tue, Jun 11, 2019, 9:59 PM Zachary Turner <zturner at roblox.com> wrote:
>>
>>> On Tue, Jun 11, 2019 at 12:24 PM Mehdi AMINI <joker.eph at gmail.com>
>>> wrote:
>>>
>>>> I agree that readability, maintainability, and ability to debug/find
>>>> issues are key.
>>>> I haven't found myself in a situation where unsigned was helping my
>>>> readability: on the opposite actually I am always wondering where is the
>>>> expecting wrap-around behavior and that is one more thing I have to keep in
>>>> mind when I read code that manipulate unsigned. So YMMV but using unsigned
>>>> *increases* my mental load when reading code.
>>>>
>>> I'm on the other end.  I'm always reading the code wondering "is this
>>> going to warn?"  "Why could a container ever have a negative number of
>>> elements?"  "The maximum value representable by the return type (unsigned)
>>> is larger than that of the value i'm storing it in (signed), so an overflow
>>> could happen even if there were no error.  What then?"
>>>
>>
>> Strong +1 to this.
>>
>> ~Aaron
>>
>>
>>>
>>> On Tue, Jun 11, 2019 at 12:26 PM Michael Kruse <llvmdev at meinersbur.de>
>>> wrote:
>>>
>>>> Am Di., 11. Juni 2019 um 11:45 Uhr schrieb Zachary Turner via llvm-dev
>>>> <llvm-dev at lists.llvm.org>:
>>>> >
>>>> > I'm personally against changing everything to signed integers.  To
>>>> me, this is an example of making code strictly less readable and more
>>>> confusing in order to fight deficiencies in the language standard.  I get
>>>> the problem that it's solving, but I view this as mostly a theoretical
>>>> problem, whereas being able to read the code and have it make sense is a
>>>> practical problem that we must face on a daily basis.  If you change
>>>> everything to signed integers, you may catch a real problem with it a
>>>> couple of times a year.  And by "real problem" here, I'm talking about a
>>>> miscompile or an actual bug that surfaces in production somewhere, rather
>>>> than a "yes, it seems theoretically possible for this to overflow".
>>>>
>>>> Doesn't it make it already worth it?
>>>>
>>> vector.size() returns a size_t, which on 64-bit platforms can represent
>>> types values larger than those that can fit into an int64_t.  So to turn
>>> your argument around, since it's theoretically possible to have a vector
>>> with more items than an int64_t can represent, isn't it already worth it to
>>> use size_t, which is an unsigned type?
>>>
>>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190612/40d574e3/attachment.html>