<div dir="ltr"> <div class="gmail_extra"><div class="gmail_quote">On Tue, Feb 17, 2015 at 4:38 PM, François Fayard <span dir="ltr"><<a href="mailto:fayard.francois@icloud.com" target="_blank">fayard.francois@icloud.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div>Hi,</div><div><br></div><div>I’ll try to give some arguments on why I think using an unsigned integer for indices is the worst idea one could have:</div><div><br></div><div>1) for(std::size_t k = v.size() - 1; k >=0; —k) is a nice infinite loop</div></div></blockquote><div><br></div><div>All you've stated is that you can accidentally create an infinite loop if you incorrectly write a loop. I'd rather have a data type that properly fits my value's constraints and doesn't artificially limit the size of a range.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div>2) for(std::size_t k = 0; k < v.size() - 1; ++k) might take some time if v.size() == 0</div></div></blockquote><div><br></div><div>Again, all you're doing is writing an invalid loop. It doesn't make sense to use a signed type just to support a poorly written loop.</div><div><br></div><div>In reference to both of your above examples, I could produce similar mistakes with respect to signed types, only when size() is very close to the largest ptrdiff_t value. They'd be more rare, but so what -- in either case, choosing your datatype based on such an erroneous loop is choosing the wrong solution to the wrong problem. In both cases, the <i>user</i> made a mistake in the way that they wrote their loop, and it would be horrible to use a datatype that doesn't actually match your range of values just because of this. Rather than try to "do the write thing" when the user does the wrong thing, the effort should instead be focused on making it so that the user isn't doing the wrong thing to begin with. In other words, instead of using a datatype that doesn't match the range, why not just encourage proper usage of high-order functions so that users aren't frequently writing silly loops like this. That's not to say that you won't still encounter more manual loops, but I'd say that changing the datatype is a poor solution to making incorrectly written manual loops "just work."</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div>3) It pollutes your types and when you compare signed and unsigned integers, you are very close to a bug</div></div></blockquote><div><br></div><div>Yes, this is important, but I argue that consistently using signed types is not the ideal solution. Consistently using unsigned types in this case is. I am very concerned about this, much more so than most people, and I cringe whenever I even see casting between signed an unsigned types. In that respect, I feel that we are very much in agreement. In my personal experience, though, there is nothing at all difficult about consistently using unsigned types and I genuinely feel that we would be better off in the long run to embrace them more-so than currently is the case, and not less. The only time I encounter problems is when <i>other</i> APIs use signed types for everything.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div>4) Who needs modulo 2^n behaviour but people working with “bits”?</div></div></blockquote><div><br></div><div>I agree that modulo 2^n behavior being required is an unfortunate limitation regarding optimization, though it is also notably just a language-imposed requirement -- we could just as easily have additional unsigned types with UB on overflow in the language and have no potential difference regarding efficiency when compared to signed types. If we had that, there would be no difference in terms of optimizations, no artificial limitation on sizes, and we'd be using a data type whose set of values more accurately represents the set of possible sizes of our containers. IMO, using a signed type here would just be a hack.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div>No, let’s get a little less technical. Watch this video <a href="http://www.youtube.com/watch?v=Puio5dly9N8" target="_blank">http://www.youtube.com/watch?v=Puio5dly9N8</a> at time 42:38 and 1:02:50. You’ll find out that:</div></div></blockquote><div><br></div><div>Yes, I'm aware, and I feel that they are misguided.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div></div><div>Now, for the people defending the use of std::size_t which is unsigned, their arguments are often</div><div><br></div><div>1) Indices are nonnegative, therefore they should be unsigned. To those I would say :</div><div> - sqrt(x) does not mean anything for x < 0. Nobody invented an unsigned floating point number for that!</div></div></blockquote><div> </div><div>A couple of differences, here. For one, operations with operands generally do not map from one datatype to another, apart from promotion (or operations on things like quantities with units attached) because they are usually a part of larger expressions where you need to be consistent. With respect to something like "size()" returning an unsigned type, we have no operand to be consistent with. It is effectively a terminal. Implying that it simply <i>may</i> be used in signed integer expressions is not a compelling argument, and in practice, I find it much more common to be used in unsigned expressions. If one were to make size() yield a signed value, then all of a sudden any time you use it in an expression with unsigned data you'd be back to having similar mismatching concerns. I don't see what advocates really believe they're gaining.</div><div><br></div><div>As well, a simple unsigned type that has UB on overflow would take no effort to standardize, and it would get rid of the modular arithmetic and optimization concerns -- this is not quite the case for a floating point type. All you'd have to do for such an unsigned integral type is specify that overflow is UB or implementation specified in the standard for such a type. The only difficult part is actually convincing people that it's worthwhile.</div><div><br></div><div>Finally, when writing code that deals with ranges over data structures, which is what our examples have been alluding to here, the need for negative values in full expressions and sets of statements that deal with these range sizes is entirely unconvincing to me -- the examples you've shown certainly do not imply any such need, they just show that some people accidentally write incorrect loops. As said earlier, I've written a lot of generic code and the only time I ever find myself having to deal with signed/unsigned mismatches are when other APIs use signed datatypes when they could/should have used an unsigned type.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div> - v[k] still does not mean anything for k >= v.size()</div></div></blockquote><div><br></div><div>I'm not sure I see your point here. You can still represent a very large size when compared to your word size, which is not possible with a signed type. With a signed type, on the other hand, half of all of your values are always completely worthless. I realize that people frequently say "a large size is uncommon," but to me that is an extremely weak argument and does not warrant using a type that doesn't properly match the valid set of values that you need to represent.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div> - An unsigned integer is not just an integer which is signed. It is a member of the mathematical ring Z/(2^n)Z which is another way of saying that it does have modulo 2^n behaviour. It is not what people want when they use unsigned int. And having taught arithmetic and Z/pZ for many years to student, I can tell you, nobody wants to mess with that ring.</div></div></blockquote><div><br></div><div>Agreed, but IMO <i>this</i> is what needs to be fixed. Making size() return a signed type for this reason would, imo, not solve the <i>actual</i> problem.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div>2) You get another bit and therefore you can address more memory.</div><div> - Maybe, for the people who needs to work with arrays of chars of more than 2 GB on 32 bit systems...</div></div></blockquote><div><br></div><div>This genuinely can be a problem and I think it's reasonable on its own as a rationale for unsigned, though really it's the artificial change in the range of values if you were to adopt a signed type that is much more upsetting to me, personally, even though it's much more ideological.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div> - But as you can see with this bug, it does fall apart.</div></div></blockquote><div><br></div><div>Again, if users are incorrectly using the datatype, then <i>that</i> is what needs to be addressed. By artificially changing the datatype to make a common mistake "just work," you're creating other subtle problems, minimizing the possible sizes that are able to be represented, and (in practice) making function preconditions frequently more complicated because certain constraints are no longer implied by the type's invariants.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div>3) It is faster too check if you are in bounds with unsigned integers</div><div> - No it is not. If k and n are std::size_t, the check is k < n. If k and n are std::ptrdiff_t, the check is static_cast<std::size_t>(k) < static_cast<std::size_t>(n) which gives the same assembly code.</div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div>4) Indices are useless anyway since we have iterators.</div><div> - I still use indices because:</div><div> - they don’t get invalidated when we copy arrays</div><div> - you need to use them when you want to use struct of arrays instead of array of structs.</div><div> - indices are much more rich than iterators: multiplication, division have no meaning for iterators.</div></div></blockquote><div><br></div><div>Not much to say here other than that I personally wouldn't argue these.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word"><div>I care about getting indices right because arrays are used so much in my field. It is too late for C++ but many people have been raised to thik that it is a wise thing to use unsigned int for array indices. Unfortunately Rust seems to make the same error :-( So it would be useful to say this things loudly *it was a huge mistake* and there is not a single benefit to it.<br></div></div></blockquote><div><br></div><div>I realize that you feel it's an "error" but I strongly urge you to reconsider. I agree with you 100% regarding modular arithmetic, and that to me is the real problem. It is the only legitimately compelling reason to avoid unsigned types that I ever hear. However, I'd much rather simply see additional unsigned types that do not require this behavior on overflow as a solution as opposed to, imo, using signed types instead of unsigned types because you don't need modular arithmetic.</div><div><br></div></div>-- <br><div class="gmail_signature">-Matt Calabrese</div>
</div></div>