[libcxx-commits] [libcxx] [libc++][hardening] Check bounds on arithmetic in __bounded_iter (PR #78876)

Wed Feb 14 00:06:37 PST 2024

https://github.com/var-const requested changes to this pull request.

In general, I like this direction. I'm sorry it's taken me a long time to get to reviewing this patch, but I wanted to give it the attention it deserves. I haven't examined the Godbolt yet, but I wanted to share some initial thoughts in-between.

1. Consider the following (bad) code:
```cpp
void foo(std::span<int> s) {
  {
    auto i = s.begin() + get_offset();
    if (i >= s.end()) return;
    // ...
  }
  {
    auto j = s.begin() + get_offset();
    if (j > s.end()) j = s.end();
    // ...
  }
}
```
I think this is technically UB, but it's likely to be "benign" UB (I call these hobbit iterators because they go there and back again). The current implementation wouldn't reject this case, but with this patch, it will start trapping if `get_offset` returns a value beyond bounds. I think part of the reason `__bounded_iter` is currently implemented this way is a concern that projects out in the wild might inadvertently rely on "hobbit" iterators, and rejecting them would cause unnecessary friction on adoption. I don't have any actual data on this -- this might be a largely hypothetical concern. But I want to make sure that if we want to break code like this, we do so consciously.

2. In general, while it's very important for this type to be performant, I'm a bit concerned if we start to tie the implementation "too much" to the optimizer. I want to be pragmatic here, so obviously some level of attention to the optimizer is necessary. What I'd like to avoid is relying on code that has subtle implicit interactions with the optimizer. For one, I'm concerned that we might end up being tied to the current implementation of the optimizer (and its quirks) and do things that further down the line become unnecessary or even harmful as the optimizer evolves (we also need to take other compilers into account, not only Clang, even though Clang is the more important one). It's also potentially brittle, and we don't currently have any sort of automated benchmarks to catch performance regressions. Specifically, I'm concerned about two things mentioned in this patch:
- an internal assertion that happened to improve performance by providing additional information to the optimizer. IMO, from a higher-level (logical?) perspective, it never makes sense that having an additional check could improve performance (at best, it could have a negligible negative impact). If we ever want to pass some information to the optimizer, we need to write it in a way that makes it explicit (something similar in spirit to `_LIBCPP_ASSUME`). It might expand to anything, including an assertion, under the hood, but the API must make it clear that the intention is to help the optimizer, not e.g. abort the program if the condition doesn't hold.
- the `!=` comparison optimizing better than `<=` comparison. I don't know if this is a fundamental property that the former optimizes better or it just happens to be so with the current implementation of the optimizer. In either case, the two seem to be largely interchangeable from the logical perspective. We need to be pragmatic here, so we likely need to rely on this, but I just want to see if we can find some way to express this more explicitly (comments definitely help, but perhaps we could find a way to go beyond that).

I want to spend a little more time on the patch just to make sure I don't miss anything, but so far I don't see any significant issues with this approach.

https://github.com/llvm/llvm-project/pull/78876