<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/78829>78829</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Ranged-for loop with _LIBCPP_ABI_BOUNDED_ITERATORS does not optimize runtime checks
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          davidben
      </td>
    </tr>
</table>

<pre>
    When enabling bounded iterators, it seems Clang cannot optimize them out of a ranged-for loop. Here's a simple function that just sums over a span with a ranged-for loop, `sum1`, as well as a corresponding hand-written iterator loop, `sum2`.
https://godbolt.org/z/WqsPKdEoa

The left output, without `_LIBCPP_ABI_BOUNDED_ITERATORS` has no runtime checks, and Clang is even able to vectorize the loop. The right output, with `_LIBCPP_ABI_BOUNDED_ITERATORS` contains runtime checks with, in turn, impede the optimization.

**Cause**

I teased apart what `__bounded_iter` was doing. This is the assertion in question:
```
  // Return whether the given iterator is in the bounds of this __bounded_iter.
 _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR bool __in_bounds(_Iterator const& __iter) const {
    return __iter >= __begin_ && __iter < __end_;
 }
```
I've extracted that into a minimal `sum3` function, and that seems to reveal the problem. While Clang can prove that `iter >= begin`, it can't prove that `iter < end`. I suspect this is because iterator loops look like `iter != end`, not `iter < end`. In order to prove that `iter < end`, it must also know that `begin <= end`.

Indeed, if I add that assertion to the top, in `sum4`, Clang pays for a *different* undesirable runtime check outside the loop, but then figures it out! So fixing https://github.com/llvm/llvm-project/issues/78784 may have further benefits, if the analysis is extended further.

**Alternative fix**

Another way Clang could have figured it out is if we used `iter != end` instead of `iter < end` for the safety assertion. That's actually what Chromium's checked iterator does. However, libc++ cannot do this because it doesn't check `operator+`, etc., only `operator*`. It is possible for `__current_` to overflow well past `__end_` before we do the check.

I think libc++ should bounds-check `operator+` too. First, once `__current_` has advanced past `__end_`, we may have already hit undefined behavior because pointer arithmetic is locked to the allocation. So the check in `operator*` may be too late anyway. Second, #78771 describes a place where libc++'s `memmove` specialization breaks its own hardening checks! Since `memmove` specialization is pretty valuable, I think it's best to fix #78771 by bounds-checking `operator+` and making sure the specialization takes triggers it early enough.

Now, once `operator+` is bounds-checked, `__bounded_iter` can rely on `__begin_ <= __current_ <= __end_`. That means the safety check need only be `__current_ != __end_`, which should be easily optimized out.

(CC @danakj)
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJykV99v4joW_mvcl6NBwbRAH3hoadGg3Z0ZtbOafUMn8Qnx1LFzbQfK_etXxw4F2t6rlVZCEGLn_Py-7zgYgt5aooW4uRc3D1fYx8b5hcKdViXZq9Kpw-JXQxbIYmm03ULpeqtIgY7kMTofhFyCjhCI2gBLg3YLFVrrIrgu6lb_SRAbasH1EVwNCB7tltSX2nkwznUj-EqehJwFQAi67QxB3dsqamchNhjhdx8ihL4N4HbkeVeHFvY6Nh-tcTRiWoS-HYtpwf8wwJ6M4V-EynlPoXNWcS4NWvVl73WMZN8SemdGimkxEsWDKO6aGLsgJndCroRcbZ0qnYkj57dCrv4UcvXrj_DjH-rRYd6ev382BIbqyPl3fWTDHDlXQ0yLzT_X98sfPzZ39-vN_fd_f3t4fNisfz4-3f38_vQspgU0GMA68L2NuiWoGqpeUsnRqqHaOgDtyAKWhiA62FEVnR_qPpSYo_B627wP43-KoXI2orbhXRTJQOq-hdh7my7bjlR2PHQfuY-j84oIyZ8l9oHy5fniGiJhIAXYoY-w5_5zjJsBdxtuEwe1xwDKabvl5HTgKrBXDIF8go628EdPga-5Z9nHtBg-6S9AbiU8EScA-4ZiQz4Z2urdOSjYg00LKZDAUI7s9zKyIVE41vTr-uFxs3r6_i-u7tvd5fdvzz8f__PjCUrnDGw22mYrQcj5Zn30WTkbopBT3sBZy9t8C8Ts_pgAgM-x5z0gJo9i8sBR0VbbDQg5PVkAMVnCZkNWbcTkaELMHj4tzlrI2Y6AXqPHKpLKXNQ2OkBotdUtmoEjE27IkbNHcKbtWRWiA087QpMK2HlXGmpH8KvRhk6SwQs7ys-JaXGeTkpm4LOOvFnIWfz8gSWQVUxaWEPoQ0dVzJ3SAUqqGHaXXA_8_QJGv9DJjByz32yJvbKeferDgvOKMeP-Ppwh9Ja1DE1w8GLd_m1zSpB3n7xecGZtFZFKNmpYA6qhvie4R5eKG7N2sbHUmuvBda5yh4cALJUIQt4pXdfkyUYh74AxHLRPInLBc1aMoNVJTdhc2XNRyUKtt72nwKk5lpUxPDuo9WuS10u91LHpy1HlWiFXxuyOP186735TFYVc6RB6CkKuZvPZ_BpaPECDOx4HPvGyJEu1jmGoQ-K7RXMIubv0GinNpmH_J6pzZyJ5i1GzVf36UX_urEuu9ng4ItP1Rg1xpGTVkGzShBr2BD0L1qfQAW1DJFQsFx8xkVrBWQSsKR5O3WRNw5hnYhV7NOaQpXDZeNfqvk1LqT1noxiUozCCr25PO5aLJRhdVkLeC3l_nMnKZTKcmJCeynzK_RbTwnXZIj-b8UOxGvGvs-ZwueMu8yCVo3MhaEYQJ5Zku-o9I2zD2UaXBnht3D6P5A7DoO5JkqYFlFQ7T1xTlfGcQhq9mxCNti_nuYUm9ShL6Je_yAKicyNYaR9iTqSijyHyuEW1Q1uR-hhempp0wiUaT6gO0OiY-FNrSwpKanCnnX-rcee05caj17FpKeqKa2Vcat7AWzTGVXlWMoPeUh-YfFnuFEHJZHdgMDILDns8jOCZKmeTTAg5mc1nszEoCpXXJfHppzNYEU85T2flS1gS06KltnU7YvusmhrNML2h9IQvzPEAbm-hQa_IMsOPx5ExPOuhnn9thfHhKcYD7ND0LDQc6LGdOsO9pBC5JrV-PeVQHi56y54_tpdnTotpMfQ-y9W7CCK-UIDo9XZLPmkWoTcHIOv6bXOBsm9uf46Sd76YQGcBZWX-9JjCc82TOYCzw4ZhLmepP6HvdGcAW9YAaAltOBeJDAtLpDIZy3cwPgrQJWobXTVvPCEgDJqDGk7oivXsnVzOl0sQ14VCiy-_hby9UouJup3c4hUtxrPiZiaLm3lx1SyuZ9eVmtZUFZPxHGuU07mcqOu6pupWFvX8Si9kIa-LsSyKyXgqixEV0_lM3oyn05vJuCIprgtqUZsRzwM-UF-lSbCYzefy9spgSSakVxQpLe0hLQop-Y3FL9IMKfttENeF0SGGk5Woo6HF0-UbQj73_u2hN0kiXLzCXB59r3pvFv_PeJvL2_8GAAD__80ubCw">