[cfe-dev] libc++: max_size() of a std::vector

François Fayard fayard.francois at icloud.com
Tue Feb 17 16:38:57 PST 2015


Hi,

I’ll try to give some arguments on why I think using an unsigned integer for indices is the worst idea one could have:

1) for(std::size_t k = v.size() - 1; k >=0; —k) is a nice infinite loop
2) for(std::size_t k = 0; k < v.size() - 1; ++k) might take some time if v.size() == 0
3) It pollutes your types and when you compare signed and unsigned integers, you are very close to a bug
4) Who needs modulo 2^n behaviour but people working with “bits”?
5) Compiler can optimise a + 1 < b + 1 into a < b if a and b are signed. They can't if they are unsigned.

No, let’s get a little less technical. Watch this video http://www.youtube.com/watch?v=Puio5dly9N8 <http://www.youtube.com/watch?v=Puio5dly9N8> at time 42:38 and 1:02:50. You’ll find out that:

1) Bjarne Stroustrup thinks that it is “one of the sad things” of the standard library
2) Herb Sutter says “Sorry, we were young"
3) Chandler Carruth say that using unsigned integers makes the compiler think that you really need modulo 2^n behaviour and that it prevents warnings and optimisation

Now, for the people defending the use of std::size_t which is unsigned, their arguments are often

1) Indices are nonnegative, therefore they should be unsigned. To those I would say :
    - sqrt(x) does not mean anything for x < 0. Nobody invented an unsigned floating point number for that!
    - v[k] still does not mean anything for k >= v.size()
    - An unsigned integer is not just an integer which is signed. It is a member of the mathematical ring Z/(2^n)Z which is another way of saying that it does have modulo 2^n behaviour. It is not what people want when they use unsigned int. And having taught arithmetic and Z/pZ for many years to student, I can tell you, nobody wants to mess with that ring.
2) You get another bit and therefore you can address more memory.
    - Maybe, for the people who needs to work with arrays of chars of more than 2 GB on 32 bit systems...
    - But as you can see with this bug, it does fall apart.
3) It is faster too check if you are in bounds with unsigned integers
    - No it is not. If k and n are std::size_t, the check is k < n. If k and n are std::ptrdiff_t, the check is static_cast<std::size_t>(k) < static_cast<std::size_t>(n) which gives the same assembly code.
4) Indices are useless anyway since we have iterators.
    - I still use indices because:
        - they don’t get invalidated when we copy arrays
        - you need to use them when you want to use struct of arrays instead of array of structs.
        - indices are much more rich than iterators: multiplication, division have no meaning for iterators.

I care about getting indices right because arrays are used so much in my field. It is too late for C++ but many people have been raised to think that it is a wise thing to use unsigned int for array indices. Unfortunately Rust seems to make the same error :-( So it would be useful to say this things loudly *it was a huge mistake* and there is not a single benefit to it.

Cheers,
François

> I take a somewhat opposite position -- the problem isn't that size is unsigned. The size being unsigned properly matches the set of possible sizes (a size cannot be negative). The problem, IMO, is that our only way of calculating the difference between pointer values assumes that we may be doing something like A - B in a situation where A < B, and so that operation yields an instance of a signed type. In generic algorithms, I do not know of situations where you are doing subtraction of pointer values where you don't already know that one iterator is greater than or equal to another. Calculating the size of a vector is a good example of this, as when doing end() - begin(), you know that end() >= begin(). Rather, if we simply had a way in the language (or more likely, via a standard library function with an implementation that's not fully specified in the standard) to calculate the difference between pointers in a way that yielded a size_t rather than a ptrdiff_t, with the precondition of the relative order of the operands, then we would arrive at a solution that doesn't require having a size() function that returns a type that can represent negative values.
> 
> -- 
> -Matt Calabrese

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20150218/470b5ff0/attachment.html>


More information about the cfe-dev mailing list