<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/131588>131588</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
x86 avx2 vpor is first done on calculation-heavy operands
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
ImpleLee
</td>
</tr>
</table>
<pre>
See the code and the compilation result at https://godbolt.org/z/Kchh341vW . The loop ORs several operands together (compiled to `vpor`). Some of these operands are cheap to compute and some are not. Compilation flags: `-O3 -std=c++2b -march=skylake`.
```c++
#include <experimental/simd>
#include <cstdint>
namespace stdx = std::experimental;
template <class T, std::size_t N>
using simd_of = stdx::simd<T, stdx::simd_abi::deduce_t<T, N>>;
using data_t = simd_of<std::uint64_t, 4>;
data_t f(data_t a, data_t b) {
  while (true) {
    data_t result = a;
    result |= (a << 1) & std::uint64_t(0x802008020080200);
    result |= a >> 1;
    result |= a >> 10;
    data_t temp = a << 50;
    result |= data_t([=](auto i) {
      if constexpr (i + 1 >= 4) return 0;
      else return temp[i + 1];
    });
    result &= b;
    if (all_of((result & ~a) == 0)) return a;
    a = result;
  }
}
```
The assembly of the loop is as follows.
```asm
.LBB0_1:
vmovdqa %ymm4, %ymm3
vpaddq %ymm4, %ymm4, %ymm4
vpand %ymm1, %ymm4, %ymm4
vpsrlq $1, %ymm3, %ymm5
vpsrlq $10, %ymm3, %ymm6
vpor %ymm6, %ymm5, %ymm5
vpsllq $50, %ymm3, %ymm6
vpermq $249, %ymm6, %ymm6 # latency 3 on skylake
vpblendd $192, %ymm2, %ymm6, %ymm6
vpor %ymm6, %ymm5, %ymm5 # ymm6 is heavy to calculate, but or'ed first
vpor %ymm3, %ymm5, %ymm5 # ymm3 and ymm4 are cheap to calculate, but or'ed later
vpor %ymm4, %ymm5, %ymm4
vpand %ymm0, %ymm4, %ymm4
vptest %ymm4, %ymm3
jae .LBB0_1
```
The critical path of this loop is `vmovdqa -> vpsllq $50 -> vpermq -> vpblendd -> vpor -> vpor -> vpor -> vpand`. If ymm6 were OR'ed last instead, the other two `vpor`s would not need to be on the critical path.
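
The two association orders can be compared with a scalar sketch. This is only an illustration, not the original `simd` code and not a proposed compiler patch: `lanes_t`, `kMask`, `shifted_from_next_lane`, `step_heavy_first`, `step_cheap_first` and `check_same` are hypothetical names, and plain 64-bit integers stand in for the `ymm` registers. `step_heavy_first` mirrors the order in the assembly above, while `step_cheap_first` ORs the cheap operands together first so that only one OR and the final AND follow the expensive operand.

```c++
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for data_t: four 64-bit lanes.
using lanes_t = std::array<std::uint64_t, 4>;

// Same constant as in the reported source.
constexpr std::uint64_t kMask = 0x802008020080200;

// The expensive operand: lane i receives (a << 50) from lane i + 1,
// and the top lane is 0 (the vpsllq + vpermq + vpblendd sequence).
inline lanes_t shifted_from_next_lane(const lanes_t& a) {
    lanes_t r{};
    for (std::size_t i = 0; i + 1 < r.size(); ++i) r[i] = a[i + 1] << 50;
    return r;
}

// Order seen in the assembly: the expensive operand enters the OR chain
// early, so three ORs and the AND all depend on it.
inline lanes_t step_heavy_first(const lanes_t& a, const lanes_t& b) {
    const lanes_t heavy = shifted_from_next_lane(a);
    lanes_t r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        std::uint64_t t = (a[i] >> 1) | (a[i] >> 10);
        t |= heavy[i];            // expensive operand OR'ed here
        t |= a[i];                // cheap operand, but after heavy
        t |= (a[i] << 1) & kMask; // cheap operand, but after heavy
        r[i] = t & b[i];
    }
    return r;
}

// Suggested order: combine all cheap operands first, then a single OR
// with the expensive operand, then the AND.
inline lanes_t step_cheap_first(const lanes_t& a, const lanes_t& b) {
    const lanes_t heavy = shifted_from_next_lane(a);
    lanes_t r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        const std::uint64_t cheap =
            ((a[i] >> 1) | (a[i] >> 10)) | (a[i] | ((a[i] << 1) & kMask));
        r[i] = (cheap | heavy[i]) & b[i];
    }
    return r;
}

// Both orders compute the same value because OR is associative and commutative.
inline void check_same(const lanes_t& a, const lanes_t& b) {
    assert(step_heavy_first(a, b) == step_cheap_first(a, b));
}
```

In the second form, the dependency chain after the expensive operand is one OR followed by the AND, instead of three ORs followed by the AND.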
</pre>