<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/60857">60857</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [LoopVectorize] vectorizer generates unusual/unprofitable max iteration count check for masked load index
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            performance,
            vectorization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          erikdesjardins
      </td>
    </tr>
</table>

<pre>
    For the following IR:
```llvm
define void @xor_loop(ptr noalias %data, i64 %len, ptr noalias %key) {
start:
  br label %body1

body1:
  %i = phi i64 [ 0, %start ], [ %next, %body2 ]
  %exit = icmp eq i64 %i, %len
  br i1 %exit, label %ret, label %body2

body2:
  %next = add i64 %i, 1
  %masked = and i64 %i, 4095
  %key_ptr = getelementptr inbounds [4096 x i8], ptr %key, i64 0, i64 %masked
  %key_elem = load i8, ptr %key_ptr
  %data_ptr = getelementptr inbounds [0 x i8], ptr %data, i64 0, i64 %i
  %data_elem = load i8, ptr %data_ptr
  %xor = xor i8 %key_elem, %data_elem
  store i8 %xor, ptr %data_ptr
  br label %body1

ret:
  ret void
}

```
which is roughly equivalent to:
```c
for (size_t i = 0; i < len; ++i)
  data[i] ^= key[i % 4096];
```

The vectorizer (https://godbolt.org/z/4933MGM8f) generates the following check, which skips the vector loop whenever the iteration count exceeds 4096:
```llvm
vector.scevcheck:                                 ; preds = %body2.preheader
  %0 = add i64 %len, -1
  %1 = icmp ugt i64 %0, 4095
  br i1 %1, label %scalar.ph, label %vector.ph
```
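
As a reading aid, here is the same predicate transcribed into C. The function name `scevcheck_takes_vector_path` is made up for illustration; only the two arithmetic steps come from the IR above. Note that `len - 1` wraps for `len == 0`, so the check would also send that case to the scalar path; the `preds = %body2.preheader` annotation suggests the empty loop has presumably already exited before this block is reached.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring vector.scevcheck; not part of LLVM. */
bool scevcheck_takes_vector_path(uint64_t len) {
    uint64_t t = len - 1;       /* %0 = add i64 %len, -1 (wraps when len == 0) */
    bool to_scalar = t > 4095;  /* %1 = icmp ugt i64 %0, 4095 */
    return !to_scalar;          /* br i1 %1, label %scalar.ph, label %vector.ph */
}
```

So the vector loop is only entered for `len` in the range 1 to 4096 inclusive.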

While this technically saves one instruction in the vector loop, it means that every trip count above 4096 is handled entirely by the scalar loop, which seems suboptimal.
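
For comparison, a hand-written C sketch of one way to keep the key index contiguous without capping the trip count. This is an assumption-laden illustration, not what the vectorizer emits and not a proposed fix: the function names are hypothetical, and it relies on the observation that `i % 4096 == i` only while `i < 4096`, which is presumably the property the check above is guarding. Processing `data` in blocks of at most 4096 bytes makes the inner loop index `key` with a plain induction variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: the loop from the report. */
void xor_loop_ref(uint8_t *restrict data, size_t len,
                  const uint8_t *restrict key) {
    for (size_t i = 0; i < len; ++i)
        data[i] ^= key[i % 4096];
}

/* Hypothetical blocked form: each inner loop covers at most 4096 bytes,
 * starting at a multiple of 4096, so key[j] equals key[(base + j) % 4096]. */
void xor_loop_blocked(uint8_t *restrict data, size_t len,
                      const uint8_t *restrict key) {
    while (len > 0) {
        size_t n = len < 4096 ? len : 4096;
        for (size_t j = 0; j < n; ++j)
            data[j] ^= key[j];
        data += n;  /* advance to the next 4096-byte block */
        len -= n;
    }
}
```

Both functions should produce identical output for any `len`; whether the inner loop of the blocked form actually vectorizes better depends on the compiler and target.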
</pre>