[PATCH] D33104: [BitVector] Implement find_[first/last]_[set/unset]_in

Wed May 17 08:36:52 PDT 2017

chandlerc accepted this revision.
chandlerc added a comment.
This revision is now accepted and ready to land.

LGTM with the assert changes.

================
Comment at: llvm/include/llvm/ADT/BitVector.h:151
+  /// [Begin, End).  Returns -1 if all bits in the range are unset.
+  int find_first_in(unsigned Begin, unsigned End) const {
+    if (Begin >= End)
----------------
zturner wrote:
> chandlerc wrote:
> > Given that we return an int, I would prefer that the arguments also be int (and the intermediate variables within).
> The only reason we return an int is because we have to use a sentinel value to indicate "no bits found".  The methods that these originated from `find_next`, `find_prev`, etc have always taken `unsigned` arguments even before I started tinkering on `BitVector`.  Do you still think it's a good idea to change all of them?  
I mean, my view is that we're doing a lot of arithmetic on these things including potentially subtraction. So generally, unless there is a problem with using a signed integer, I would prefer that.

If we return `int`, then we've given up on having more than `2^31 - 1` bits in one of these containers.

That said, a bitvector is one of the few datastructures where it seems entirely conceivable that we'll actually end up with 500mb vector of bits and an `int` index won't work. But an `unsigned` won't either, so I suspect that isn't the deciding factor and if we care we need a 64-bit type.

So I guess my feeling is that we should do one of two things here:
a) Switch the return types to an unsigned type and use something like `npos`, or
b) Switch the argument types to a signed type and continue to use `-1`.

(Whether we use 32-bits or 64-bits should be orthogonal as I see no reason why for a *bit* count 2gb vs 4gb would be "enough".)

However, looking at the code, I agree that the patch as-is remains consistent, and you shouldn't do either (a) or (b) in *this* patch. But I'd somewhat like to consider doing one of them in a subsequent patch.

My personal, strong preference is to use (b) because I find asserts easier to write for signed types in the face of arithmetic and we have strong usage of UBSan to ensure we don't develop bugs here. I also have a somewhat powerful dislike of `npos`. But I think either (a) or (b) would be an improvement. Anyways, sorry for the digression.

================
Comment at: llvm/include/llvm/ADT/BitVector.h:152-153
+  int find_first_in(unsigned Begin, unsigned End) const {
+    if (Begin >= End)
       return -1;

----------------
zturner wrote:
> chandlerc wrote:
> > Why isn't this an assert?
> I suppose I should assert if `Begin > End`, but `Begin == End` should probably still just return -1 I think.
Ah, yes, that makes sense.

https://reviews.llvm.org/D33104