<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/60762>60762</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Non-optimal result for SSE2 range compares
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
sesse
</td>
</tr>
</table>
<pre>
Given this function:
```
#include <string.h>
#include <stdint.h>
#include <immintrin.h>
auto func_a(const char *ptr) {
int8_t b __attribute__((vector_size(16)));
memcpy(&b, ptr, sizeof(b));
auto non_name_mask = (b <= 0 || b >= 16);
return _mm_movemask_epi8(non_name_mask);
}
```
Clang 15 x86-64 compiles this as:
```
.LCPI0_0:
.zero 16,240
func_a(char const*): # @func_a(char const*)
movdqu xmm0, xmmword ptr [rdi]
movdqa xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [240,240,240,240,240,240,240,240,240,240,240,240,240,240,240,240]
paddb xmm0, xmm1
pminub xmm1, xmm0
pcmpeqb xmm1, xmm0
pmovmskb eax, xmm1
ret
```
In effect, Clang seems to rewrite the test as `b -= 16; min(b, -16) == b` (with a wrapping add and an unsigned min), which is not the instruction sequence chosen for scalar compares. It should be possible to use a single signed compare such as `(b - 129) >= -113` instead (again with a wrapping subtract), which also saves a register. (It does require one more constant, but that sounds reasonable?)
uICA links for the two sequences: https://bit.ly/3I1NafF https://bit.ly/3lt5hn4
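As a sanity check on the rewrites discussed here, the following is a minimal scalar sketch (not compiler output) that exhaustively compares per-lane models of each sequence against the original predicate for all 256 byte values. The helper names are hypothetical, and the constants in `biased_form` (bias 129, threshold -114) are my own derivation, chosen so that `b == 0` is still flagged; treat them as an assumption rather than what LLVM would emit.
```
#include <algorithm>
#include <cassert>
#include <cstdint>

// Reference predicate from func_a, applied to a single lane.
static bool reference(int8_t b) { return b <= 0 || b >= 16; }

// Per-lane model of the Clang 15 sequence: paddb 240, pminub 240, pcmpeqb.
static bool clang15_form(int8_t b) {
    // Wrapping b - 16, computed as an unsigned byte.
    uint8_t t = static_cast<uint8_t>(static_cast<uint8_t>(b) + 240u);
    // min(t, 240) == t is the same as the unsigned test t <= 240.
    return std::min<uint8_t>(t, 240) == t;
}

// Per-lane model of the Clang 5 sequence: two signed pcmpgtb and a por.
static bool clang5_form(int8_t b) { return (1 > b) || (b > 15); }

// Hypothetical single signed compare: one wrapping add, one pcmpgtb.
static bool biased_form(int8_t b) {
    // Wrapping b - 129, reinterpreted as a signed byte
    // (the narrowing conversion is modular on the usual targets).
    uint8_t u = static_cast<uint8_t>(static_cast<uint8_t>(b) - 129u);
    int8_t t = static_cast<int8_t>(u);
    return t > -114;  // same as t >= -113
}

// Counts lane values for which any model disagrees with the reference.
static int count_mismatches() {
    int mismatches = 0;
    for (int v = -128; v <= 127; ++v) {
        int8_t b = static_cast<int8_t>(v);
        bool want = reference(b);
        if (clang15_form(b) != want) ++mismatches;
        if (clang5_form(b) != want) ++mismatches;
        if (biased_form(b) != want) ++mismatches;
    }
    return mismatches;
}

// Aborts if any of the three models diverges from the reference.
static void check_all() { assert(count_mismatches() == 0); }
```
Calling `check_all()` from a small `main` is enough to exercise it. The point is only that the three forms agree on every input, so the choice between them is purely a matter of instruction count and register pressure.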
In a sense, this is a regression, because Clang 5 emits the more obvious instruction sequence below, which is also faster (in a loop) according to uICA:
```
.LCPI0_0:
.zero 16,1
.LCPI0_1:
.zero 16,15
func_a(char const*): # @func_a(char const*)
movdqu xmm0, xmmword ptr [rdi]
movdqa xmm1, xmmword ptr [rip + .LCPI0_0] # xmm1 = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
pcmpgtb xmm1, xmm0
pcmpgtb xmm0, xmmword ptr [rip + .LCPI0_1]
por xmm0, xmm1
pmovmskb eax, xmm0
ret
```
When asked to compile for armv8-a (with the `_mm_movemask_epi8` call removed), Clang seems to choose the sequence that I want:
```
func_a(char const*): // @func_a(char const*)
movi v0.16b, #240
ldr q1, [x0]
movi v2.16b, #241
add v0.16b, v1.16b, v0.16b
cmhi v0.16b, v2.16b, v0.16b
ret
```
</pre>