[libcxx-commits] [PATCH] D117240: [libc++] Fix bug in ranges::advance and refactor the tests

Tue Jan 25 09:56:59 PST 2022

ldionne added a subscriber: jwakely.
ldionne added inline comments.

================
Comment at: libcxx/include/__iterator/advance.h:153
+      // __magnitude_geq(a, b) returns |a| >= |b|, assuming they have the same sign.
+      auto __magnitude_geq = [](auto a, auto b) {
+        if (a == 0) return b == 0;
----------------
Mordante wrote:
> 
Ugh of course, thanks.

================
Comment at: libcxx/include/__iterator/advance.h:171
       // If |n| >= |bound - i|, equivalent to `ranges::advance(i, bound)`.
       if (const auto __M = __bound - __i; __magnitude_geq(__n, __M)) {
         (*this)(__i, __bound);
----------------
CaseyCarter wrote:
> Quuxplusone wrote:
> > I would prefer to inline it at its one call site, here, rather than generalize it unnecessarily.
> > 
> > Unfortunately https://eel.is/c++draft/range.iter.op.advance#6.1.1 seems to mandate that even when `__n == 0`, we need to evaluate `__bound - __i` and then, if that's also zero, we must `__i = __bound` instead of simply returning immediately with `0`. Is there any blanket-wording loophole that gets us out of that obligation? This code could get much simpler if we were allowed to put `if (__n == 0) return 0;` on line 168.
> > 
> > Also, notice that if `!(bidirectional_iterator<_Ip> && same_as<_Ip, _Sp>)`, then comparing `__n <= __M` is going to be superfluous — we know `__n` can't be negative unless `(bidirectional_iterator<_Ip> && same_as<_Ip, _Sp>)`, so, if `!(bidirectional_iterator<_Ip> && same_as<_Ip, _Sp>)`, it would be really nice not to generate any test for that possibility.
> > 
> > https://godbolt.org/z/jE66PEP4E — I haven't really looked at this code but it sure //looks// like libc++ is doing something suboptimal right now, doesn't it?
> > Unfortunately https://eel.is/c++draft/range.iter.op.advance#6.1.1 seems to mandate that even when `__n == 0`, we need to evaluate `__bound - __i` and then, if that's also zero, we must `__i = __bound` instead of simply returning immediately with `0`. Is there any blanket-wording loophole that gets us out of that obligation? This code could get much simpler if we were allowed to put `if (__n == 0) return 0;` on line 168.
> 
> It is unfortunate that the wording requires the difference to be performed when `n == 0`, and the assignment to be performed even when `n == 0` and `last - first == 0`, which implies `first == last` already. MSVC does compute the distance (I think `n == 0` is a weird enough corner not to be worth the cost of checking) but not the assignment when `n == 0`.  I suggest you implement what you think is most performant, and file an LWG issue to allow your implementation and ours.
> 
> I would prefer to inline it at its one call site, here, rather than generalize it unnecessarily.

After looking at the code generation and seeing that we are indeed paying for the generality, I agree that we should special case it: https://godbolt.org/z/Ez6W9Yfej. I'll still use a lambda just to avoid having a super complicated inlined `if` condition. I tried various implementations and yours was basically the most efficient and correct one. Even using `if-else` instead of a ternary somehow produces slightly worse code.

> [about evaluating `__bound - __i` when `__n == 0`]

I still think it's better not to generate a special check for `__n == 0`, since that should be fairly rare (as Casey mentioned).

> https://godbolt.org/z/jE66PEP4E — I haven't really looked at this code but it sure looks like libc++ is doing something suboptimal right now, doesn't it?

I looked, and I think it boils down to libstdc++ not protecting against signed integer overflow like what we fixed back in D106735. @jwakely , you might be interested in taking a look, here's a distilled version of libc++ vs libstdc++: https://godbolt.org/z/aYbq1nxhf

================
Comment at: libcxx/test/std/iterators/iterator.primitives/range.iter.ops/range.iter.ops.advance/iterator_count_sentinel.pass.cpp:154
     }
   }

----------------
Quuxplusone wrote:
> This is obviously better than it was. But I'd love to see some simple regression tests like we were doing on Discord:
> ```
> int a[10];
> int *p;
> const int *pc;
> using ConstPtr = const int*;
> 
> // forward with same type
> p = a+5; assert(std::ranges::advance(p, 0, a+7) == 0); assert(p == a+5);
> p = a+5; assert(std::ranges::advance(p, 1, a+7) == 0); assert(p == a+6);
> p = a+5; assert(std::ranges::advance(p, 2, a+7) == 0); assert(p == a+7);
> p = a+5; assert(std::ranges::advance(p, 3, a+7) == 1); assert(p == a+7);
> 
> // forward with different, non-assignable sentinel
> p = a+5; assert(std::ranges::advance(p, 0, ConstPtr(a+7)) == 0); assert(p == a+5);
> p = a+5; assert(std::ranges::advance(p, 1, ConstPtr(a+7)) == 0); assert(p == a+6);
> p = a+5; assert(std::ranges::advance(p, 2, ConstPtr(a+7)) == 0); assert(p == a+7);
> p = a+5; assert(std::ranges::advance(p, 3, ConstPtr(a+7)) == 1); assert(p == a+7);
> 
> // forward with different, assignable sentinel
> pc = a+5; assert(std::ranges::advance(pc, 0, a+7) == 0); assert(pc == a+5);
> pc = a+5; assert(std::ranges::advance(pc, 1, a+7) == 0); assert(pc == a+6);
> pc = a+5; assert(std::ranges::advance(pc, 2, a+7) == 0); assert(pc == a+7);
> pc = a+5; assert(std::ranges::advance(pc, 3, a+7) == 1); assert(pc == a+7);
> 
> // backward with same type
> p = a+5; assert(std::ranges::advance(p, 0, a+3) == 0); assert(p == a+5);
> p = a+5; assert(std::ranges::advance(p, -1, a+3) == 0); assert(p == a+4);
> p = a+5; assert(std::ranges::advance(p, -2, a+3) == 0); assert(p == a+3);
> p = a+5; assert(std::ranges::advance(p, -3, a+3) == -1); assert(p == a+3);
> 
> // backward with different type is UB so don't test it
> ```
> This covers all the cases, I think. The only other thing that needs testing is whether we assign from Sent to It in all the right places (and not in the wrong places); that'll require a sentinel type with a user-defined `operator=` so we can see that it's being called.
> The only other thing that needs testing is whether we assign from Sent to It in all the right places (and not in the wrong places); that'll require a sentinel type with a user-defined `operator=` so we can see that it's being called.

Actually, in order to test that, I would need to define a whole random access iterator with an `operator=` that takes a sentinel, and count the assignments from the iterator's `operator=`. Otherwise, to only define a sentinel, I have to fall back to some trick like the conversion operator I've used elsewhere, but it wouldn't be clean to count the number of times the sentinel gets converted to the iterator type as a proxy for counting the number of assignments. In other words, defining only the sentinel as a shortcut would be brittle IMO.

Given that the intent is for the Standard not to force us to evaluate those anyway, and given that we are already testing this "from the side" by counting operations with `stride_counting_iterator` to ensure that we don't increment when we should be assigning, I think there is diminishing returns in adding this test. Let me know if you agree -- if not, I'll add it.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117240/new/

https://reviews.llvm.org/D117240