<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/60762>60762</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Non-optimal result for SSE2 range compares
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          sesse
      </td>
    </tr>
</table>

<pre>
Given this function:

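```
#include <string.h>
#include <stdint.h>
#include <immintrin.h>

auto func_a(const char *ptr) {
    // Load 16 signed bytes into a GCC/Clang vector-extension type.
    int8_t b __attribute__((vector_size(16)));
    memcpy(&b, ptr, sizeof(b));

    // Each lane is all-ones (-1) when b <= 0 or b >= 16, i.e. when the
    // byte lies outside [1, 15]; otherwise the lane is zero.
    auto non_name_mask = (b <= 0 || b >= 16);
    return _mm_movemask_epi8(non_name_mask);
}
```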
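
The following portable scalar sketch pins down what `func_a` returns. The name `func_a_scalar` is hypothetical and does not come from the report. The sketch assumes the intended result is a 16-bit mask in which bit i is set when byte i, read as signed, lies outside [1, 15]. This mirrors `pmovmskb`, which collects the top bit of each byte, and each compare lane is either 0x00 or 0xFF.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar reference for func_a: bit i of the result is set when
// byte i (as a signed 8-bit value) satisfies b <= 0 || b >= 16.
int func_a_scalar(const char *ptr) {
    std::int8_t b[16];
    std::memcpy(b, ptr, sizeof(b));
    int mask = 0;
    for (int i = 0; i < 16; ++i) {
        if (b[i] <= 0 || b[i] >= 16) {
            mask |= 1 << i;  // same bit position pmovmskb would produce
        }
    }
    return mask;
}
```

For example, an all-zero input yields 0xFFFF, and an input whose bytes are all in 1..15 yields 0.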

Clang 15 x86-64 compiles this as:

```
.LCPI0_0:
        .zero   16,240
func_a(char const*):                           # @func_a(char const*)
        movdqu  xmm0, xmmword ptr [rdi]
        movdqa  xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240]
        paddb   xmm0, xmm1
        pminub  xmm1, xmm0
        pcmpeqb xmm1, xmm0
        pmovmskb eax, xmm1
        ret
```
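
One way to confirm that this sequence computes the same predicate is to model it one byte lane at a time. The sketch below assumes wrapping 8-bit arithmetic and checks all 256 byte values. The helper names `reference`, `clang15_lane` and `clang15_matches_reference` are hypothetical.

```cpp
#include <cstdint>

// Predicate from func_a: the lane is set when b <= 0 || b >= 16.
bool reference(std::int8_t b) { return b <= 0 || b >= 16; }

// Clang 15's sequence for one byte lane (all arithmetic wraps modulo 256):
//   paddb   xmm0, xmm1 : t = b + 240, i.e. b - 16
//   pminub  xmm1, xmm0 : m = unsigned min(240, t)
//   pcmpeqb xmm1, xmm0 : set if m == t, i.e. t <= 240 as an unsigned byte
bool clang15_lane(std::int8_t b) {
    std::uint8_t t = static_cast<std::uint8_t>(static_cast<std::uint8_t>(b) + 240u);
    std::uint8_t m = t < 240 ? t : 240;
    return m == t;
}

// Compares the lane model with the reference for every possible byte.
bool clang15_matches_reference() {
    for (int v = -128; v <= 127; ++v) {
        std::int8_t b = static_cast<std::int8_t>(v);
        if (clang15_lane(b) != reference(b)) return false;
    }
    return true;
}
```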

Essentially, it seems to rewrite the test as `b -= 16; min(b, -16) == b` (assuming wrapping adds, and with `pminub` taking an unsigned minimum, so -16 acts as 240). That is not the instruction sequence chosen for scalar compares. It should be possible to use a signed compare on a value biased by 128, such as `(b - 128) >= -112`, instead, which also saves a register. (It does require one more constant, but that sounds reasonable?)
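
SSE2 has only a signed byte compare (`pcmpgtb`), so an unsigned range test has to become a signed one by flipping the top bit, i.e. by adding or subtracting 128. The following sketch shows that idea. The constants 127 and -114 are an assumption chosen here so that one wrapping add plus one signed greater-than matches `b <= 0 || b >= 16` for all 256 byte values. They may not be the constants the reporter had in mind. Read literally on the unmodified `b`, `(b - 128) >= -112` is false for `b == 0` (0 - 128 = -128), although the original predicate sets that lane. The helper names are hypothetical.

```cpp
#include <cstdint>

// Predicate from func_a: the lane is set when b <= 0 || b >= 16.
bool reference(std::int8_t b) { return b <= 0 || b >= 16; }

// Bias-then-signed-compare for one byte lane, with assumed constants:
// adding 127 (wrapping) maps [1, 15] onto [-128, -114], so a single signed
// "greater than -114" is true exactly outside [1, 15]. With SSE2 intrinsics
// this would be _mm_cmpgt_epi8(_mm_add_epi8(b, _mm_set1_epi8(127)),
// _mm_set1_epi8(-114)), i.e. paddb + pcmpgtb with two constant vectors.
bool biased_lane(std::int8_t b) {
    std::uint8_t u = static_cast<std::uint8_t>(static_cast<std::uint8_t>(b) + 127u);
    std::int8_t t = static_cast<std::int8_t>(u);  // reinterpret as two's complement
    return t > -114;
}

// Compares the biased form with the reference for every possible byte.
bool biased_matches_reference() {
    for (int v = -128; v <= 127; ++v) {
        std::int8_t b = static_cast<std::int8_t>(v);
        if (biased_lane(b) != reference(b)) return false;
    }
    return true;
}
```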

uICA links for the two sequences: https://bit.ly/3I1NafF https://bit.ly/3lt5hn4

In a sense, this is a regression: Clang 5 emits the more obvious instruction sequence below, which is also faster (in a loop) according to uICA:

```
.LCPI0_0:
        .zero   16,1
.LCPI0_1:
        .zero   16,15
func_a(char const*):                           # @func_a(char const*)
        movdqu  xmm0, xmmword ptr [rdi]
        movdqa  xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
        pcmpgtb xmm1, xmm0
        pcmpgtb xmm0, xmmword ptr [rip + .LCPI0_1]
        por     xmm0, xmm1
        pmovmskb eax, xmm0
        ret
```
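
The Clang 5 sequence maps directly onto the two source comparisons, each lowered to a signed `pcmpgtb` and combined with `por`. The per-lane sketch below illustrates this; `clang5_lane` and `clang5_matches_reference` are hypothetical names.

```cpp
#include <cstdint>

// Clang 5's sequence for one byte lane (both compares are signed pcmpgtb):
//   pcmpgtb xmm1, xmm0 with xmm1 = 1 : lo = (1 > b), i.e. b <= 0
//   pcmpgtb xmm0, [15]               : hi = (b > 15), i.e. b >= 16
//   por     xmm0, xmm1               : lo | hi
bool clang5_lane(std::int8_t b) {
    bool lo = 1 > b;
    bool hi = b > 15;
    return lo || hi;
}

// Compares the two-compare form with b <= 0 || b >= 16 for every possible byte.
bool clang5_matches_reference() {
    for (int v = -128; v <= 127; ++v) {
        std::int8_t b = static_cast<std::int8_t>(v);
        if (clang5_lane(b) != (b <= 0 || b >= 16)) return false;
    }
    return true;
}
```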

When compiling for armv8-a (with the `_mm_movemask_epi8` intrinsic, i.e. `pmovmskb`, removed, since it is x86-only), Clang seems to choose the sequence I want:

```
func_a(char const*): // @func_a(char const*)
        movi    v0.16b, #240
        ldr     q1, [x0]
        movi    v2.16b, #241
        add     v0.16b, v1.16b, v0.16b
        cmhi    v0.16b, v2.16b, v0.16b
        ret
```
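
On AArch64 the same test (`t = b - 16`, then `t <= 240` unsigned) needs only one compare, because NEON has an unsigned compare: `cmhi Vd, Vn, Vm` sets a lane when Vn > Vm as unsigned values. SSE2 has no unsigned byte compare, so x86 needs either the `pminub`/`pcmpeqb` pair or a bias by 128. The per-lane sketch below uses hypothetical helper names.

```cpp
#include <cstdint>

// The AArch64 sequence for one byte lane:
//   add  v0.16b, v1.16b, v0.16b : t = b + 240 (wrapping), i.e. b - 16
//   cmhi v0.16b, v2.16b, v0.16b : set if 241 > t, compared as unsigned bytes
bool aarch64_lane(std::int8_t b) {
    std::uint8_t t = static_cast<std::uint8_t>(static_cast<std::uint8_t>(b) + 240u);
    return 241u > t;
}

// Compares the add + cmhi form with b <= 0 || b >= 16 for every possible byte.
bool aarch64_matches_reference() {
    for (int v = -128; v <= 127; ++v) {
        std::int8_t b = static_cast<std::int8_t>(v);
        if (aarch64_lane(b) != (b <= 0 || b >= 16)) return false;
    }
    return true;
}
```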
</pre>