<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/88958>88958</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
x86 codegen, AVX2: missing optimization -- the compiler could swap args of commutative intrinsic `_mm256_testz_si256`
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
AlexGuteniev
</td>
</tr>
</table>
<pre>
The function below compiles to better code when `MASK_FIRST` is defined, that is, when the constant mask is passed as the first argument:
```C++
#include <immintrin.h>
bool f2(const char* p)
{
    for (const char* e = p + 256; p != e; p += 32)
    {
        const __m256i m = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(p));
        if (_mm256_testz_si256(
#ifdef MASK_FIRST
                _mm256_set1_epi8(~0x40), m
#else
                m, _mm256_set1_epi8(~0x40)
#endif
            ))
        {
            return true;
        }
    }
    return false;
}
```
See https://godbolt.org/z/qjqdo5x69
I would like the compiler to swap the arguments automatically for commutative two-argument operations such as this one.
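
To illustrate why the swap should be safe, here is a minimal scalar sketch of what I understand the intrinsic to compute: `_mm256_testz_si256(a, b)` returns 1 when `a AND b` is all zeros. The names `Vec256`, `model_testz` and `model_testc` are hypothetical helpers for illustration only, not part of any header, and the model assumes the documented ZF/CF semantics of `VPTEST`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a 256-bit vector: four 64-bit lanes.
using Vec256 = std::array<std::uint64_t, 4>;

// Models _mm256_testz_si256: 1 when (a AND b) is all zeros (the ZF result).
constexpr int model_testz(const Vec256& a, const Vec256& b)
{
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc |= a[i] & b[i];
    return acc == 0;
}

// Models _mm256_testc_si256: 1 when ((NOT a) AND b) is all zeros (the CF result).
constexpr int model_testc(const Vec256& a, const Vec256& b)
{
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc |= ~a[i] & b[i];
    return acc == 0;
}

constexpr Vec256 zero{0, 0, 0, 0};
constexpr Vec256 one{1, 0, 0, 0};

// testz gives the same answer in either argument order, since AND is commutative.
static_assert(model_testz(zero, one) == model_testz(one, zero), "testz is commutative");
static_assert(model_testz(one, one) == 0, "overlapping bits give 0");

// testc does not: the first operand is inverted, so the order matters.
static_assert(model_testc(zero, one) != model_testc(one, zero), "testc is not commutative");
```

Under this model only the `testz` form can have its operands exchanged freely; the same swap would not be valid for `_mm256_testc_si256`.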
</pre>