<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/80298">80298</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Inefficient codegen for loop with known small trip count
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          dzaima
      </td>
    </tr>
</table>

<pre>
    The code

```c
#include <stdint.h>
void f(char* a, uint8_t l) {
  for (int i = 0; i < l; i++) a[i]++;
}
```

when compiled with `-O3 -march=haswell` (or `-O3 -mavx2` or similar; [compiler explorer](https://godbolt.org/z/bnfe5j3oE)) generates unrolled 128-bit `xmm` operations, even though 256-bit `ymm` operations could trivially replace them.

The same thing happens when the length is bounded with `__builtin_assume` instead:
```c
void f(char* a, int l) {
  __builtin_assume(l < 256);
  for (int i = 0; i < l; i++) a[i]++;
}
```
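
For comparison, here is a minimal sketch of the loop shape that 32-byte-wide code would correspond to: full blocks of 32 bytes followed by a scalar tail. The name `f_chunked` and the block size of 32 (one `ymm` register of bytes) are assumptions made for illustration and are not part of the original report. Whether any given compiler actually emits `ymm` operations for this form has not been verified here.

```c
#include <stdint.h>

/* Reference version from the report. */
void f(char* a, uint8_t l) {
  for (int i = 0; i < l; i++) a[i]++;
}

/* Hypothetical variant: same result as f, written as 32-byte blocks
   plus a scalar tail. The inner loop has a fixed trip count of 32. */
void f_chunked(char* a, uint8_t l) {
  int i = 0;
  /* Full 32-byte blocks; at most 7 of them, since l <= 255. */
  for (; i + 32 <= l; i += 32) {
    for (int j = 0; j < 32; j++) a[i + j]++;
  }
  /* Remaining 0..31 bytes. */
  for (; i < l; i++) a[i]++;
}
```

Both functions increment exactly the first `l` bytes, so the comparison is only about which instructions the compiler selects.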
</pre>