[llvm-dev] [RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

Wed Jun 12 10:29:27 PDT 2019

Hi Mehdi,

Drive by comment, please feel free to disregard.

> On Jun 12, 2019, at 11:50, Mehdi AMINI via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> (any bug caused "negative indexing of a container" would be as much of a bug with unsigned,

Signed comparison with a maximum size or number of things is a classic software vulnerability. E.g.,

#define MAX_WIDGETS 200
struct widget widgets[MAX_WIDGETS];
...
if (count >= MAX_WIDGETS)
  return -1;
/* now do something with widgets[count] */

With unsigned integers, there would be no bug. There are simple bugs using either signed or unsigned integers so I don't find that dispositive.

> please watch Chandler's talk).

It was a great talk! I'm glad you pointed it out, but I'm not convinced any of the examples he discusses impact loop counters, which seems to be one of the main motivations for this. Let's go through them.

1. Shifting by more than the bit width <https://www.youtube.com/watch?v=yG1OZ69H_-o&t=24m40s> doesn't involve signed values.

2. Signed left shift vs. multiplication <https://www.youtube.com/watch?v=yG1OZ69H_-o&t=27m48s> isn't relevant for loops.

3. The example from Dietz et al. <https://www.youtube.com/watch?v=yG1OZ69H_-o&t=33m15s> boils down to computing 16u + (n-1) * 8u. As far as I can tell, the signedness of the quantities involved does not impact the computation. See <https://godbolt.org/z/HVA6xD>.

4. Using a 32-bit, unsigned integer to index an array <https://www.youtube.com/watch?v=yG1OZ69H_-o&t=39m22s>. Here, using a signed 32-bit integer lets the compiler promote it to a signed 64-bit integer and the compiler can produce slightly better code. Using an appropriately sized integer for the index, namely size_t, has the same effect except with unusual ABIs like x32*. ssize_t would also be fine (and better than size_t for x32).

5. C++ doesn't have a supported arithmetic right shift <https://www.youtube.com/watch?v=yG1OZ69H_-o&t=51m49s>. Great point, but not relevant here.

6. UB twitter's favorite example: memcpy(0, 0, 0) <https://www.youtube.com/watch?v=yG1OZ69H_-o&t=55m0s>. Again, not relevant here.

Up thread, Chandler suggests
> for (int Index = 0, Size = MyVector.size(); Index < Size; ++Index)
or
> for (int Index : llvm::seq<int>(0, MyVector.size()))

I believe int can produce better code than unsigned int here, but neither seem correct. Surely both should be size_t (again, ssize_t would be fine). I'm not sure if LLVM containers ever contain more than INT_MAX elements. If not, then int is fine but since it'll produce the same code as using a correctly sized integer, I don't see the point. (As an aside, the talk by Jon Kalb you linked to <https://www.youtube.com/watch?v=wvtFGa6XJDU&t=4m45s> says that if you need one extra bit, you'll probably need two extra bits. "A sneeze puts you over." But that's another 2 billion more entries in your data structure so I'm not sure I buy that argument.)

One of the guides you linked to <https://google.github.io/styleguide/cppguide.html#Integer_Types>, says "When appropriate, you are welcome to use standard types like size_t and ptrdiff_t." Looping over structures from 0 to some size expressed as a size_t seems like an appropriate time to use size_t to me.

I haven't yet heard or read a compelling argument for using an integer type smaller than the register width for arithmetic, which int usually is these days. I can't think of a good reason to use unsigned ever though.

* Example provided by Rich Felker <https://twitter.com/RichFelker/status/1138166935575826433>.

Cheers,

Steve

-- 
Stephen Checkoway