<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/54164>54164</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Narrowing a shift result peephole: (unsigned)(1ULL<<n) could save instructions on RISC-V and ARM [performance]
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
pcordes
</td>
</tr>
</table>
<pre>
```
unsigned pow2(unsigned n){
return 1ULL << n;
}
```
compiles to (https://godbolt.org/z/qej1dbW3T)
```
pow2(unsigned int):
addi a1, a0, -32
bltz a1, .LBB0_2
mv a0, zero
ret
.LBB0_2:
addi a1, zero, 1
sll a0, a1, a0
ret
```
But instead, I think it would be more efficient to materialize a 0 or 1 according to `n<32`, and then shift that.
* `n<32` case: the 32-bit unsigned result has a set bit
* `n>=32 && n<=63` case: the `1ULL << n` result has a bit set, but it's lost when truncating to unsigned, so the result is 0. `0` shifted by anything is 0.
* `n>63` case: the `1ULL << n` has UB so it doesn't matter what we return. I haven't checked on whether RISC-V shifts saturate or mask their count or what, but I assume that `0 << n` is always 0 even for large n. If it saturates the count (shifting out all the bits), we could maybe just shift a constant 1 without an extra data dependency on the count.
```
# sample implementation
sltiu a0, a0, 32 # retval = (n<32)
sll a0, a0, a1 # retval <<= n
ret
```
This seems clearly better, except that the critical path latency is now 2 cycles from input to output (assuming both have 1c latency). The original version shifted a constant after branching, so only 1c latency.
On ARM, shifts with count > reg_width do just shift out all the bits, so GCC compiles this to:
```
# GCC11.2 for ARM32. clang is unfortunately more complicated
pow2(unsigned int):
mov r3, #1
lsl r0, r3, r0
bx lr
```
For x86-64, we use `mov eax,1` / `shlx`, which masks the count, producing a non-zero output when `n&63 < 32`. (On Intel, xor-zero / `bts eax, edi` would be even more efficient, but we miss that peephole.)
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJyVVk1z4zYM_TXyhROPPmzZOfiQOJtOZtJ2ZrvbHncoCba4oUSXpGI7v74PlOTY3m2mVRTJEkHgAXgAVJjquIryeDjjhyi-61qnti1VYmf2aZQuT89tlN5Gi_teSuCw5DvbiuTr87OIsjVOyGSDQLR4GH5cqu-vpWl2SpMT3gjYqL3fuSi7i9JHnFtTFUb7qbFbPL3h_2_6nlTFX9kXhnCm5kr3NWLVet6Q3b1j5kNWlQr3JErXQsZ8vcnSS6FC-7d3oenz_X387UqkeR30BQ1vZM3lOuLTvxh3fwwkKMA9uRRyWp9ZOWH-uaWfBvu-8wiF8yQr3vwkfK3aF6G82JtOV6Ig0RhLgjYbVSpqPWelkZ6sklq9kZAiFsaKRMiyNLZS7TbkLY-R7jUCl_fQ2gqaqRWuVhvoqKWfDjDSuwtpUUpHiAaLiyy9KQDllDRLrtNe1NLBriMvsHqt5lOUPWQpmJPjZNKBfA95dq0az5fkxPqFejYME4y-4CDh18IJbRxCw55427Wl9IPDI0QWdyZYGLQpJ-KpYHsc-z4AcKU4IihHjva2F_nBj_-GmcF-vWebwFsZcijFhecUIUdAKgGXhnIEjCdseKVepqypfAEU07JHMGDF56c_1jd_9iCdcBK7kGtOcCPdC4NQVpSmAw9Mr32Mz5OQznUNhdwGby9gwkWp9_IITwUBgNhgv5Z2SyLA2jD80Z4L3vZmULMBDcfJwI4E5XkV6XFcwDC_D7IgayOP4Ov3DinqeSaxAHJL6EnEXvk6aGgFHbyVopIeF9pRW1FbHjkOJ7vTD1pJlGaA2uw0CcXXBmUBIpj2vPCc9qp7L81wBS3Hg3UgKa9SI0wP7OVQAeltWL7QpM97yVjrP9W07ukuLqB83AG-1MiNI2qcKDVJq48oeiYPm6FDSbu-XvvYWOVVCVM76Wvkz4fAQUFr9iIV5bHkxr2xpkFX2XWhWyDm_AsuBoZwIguD3UxEkZSjFngOInypmW1qq1oYeSXrENZTzZylU26Y3oWVbcklNJSdaYH-XeVFEn9vxd3nX4Ngz27mw0iy7BOitP22VxXeVeacQz-SLpj6Zb0W75OKY-jNqYn_K22wK0mmaWA_0GQpXC617HtA1-K171qAhxuh7bIFjYDD-_81xhrTzx-bMVxYvhoc2vWcsoFMvZS9GhzFYZC1H7DnEY4clvlNPhsqsXOhTzEAkge8TLj8Maf5rav1YZgI-1qVdegqZ9XOCztrqq5klkiwqr3h0TdyKDTe0B7TPM-Y7yIMDe6v6RIZfmoRO9ZyMLbfOVgukPAej6BKMaTTfAvt6HLIjU0N_jTKuZ7_O6JdbTRNEfBJtcqq2-xWTrzymla_SWvNvgfd02bo_uMmbuFneQuda8nNvK9Z_n4ampjjsuCRbLuSm4rjvjT0ZR6joI2I5vc7smBLA_5TNH-YdFavrr6UQO-umIJBeND6dbzdIMDfqYSPj3CtIxD6cT5L8tmkXmW3lMcyT2OaJYv55jal5WaWLtJ4vpktNkk60bIg7Vawz0bVKo3TNM7wN08Ws3yKzXK2TJZzmdGCqjyaxdRIpadsmL_ZJnYVMBTd1mFRK-fd-yL6A0eHRv2yQ8e2qx1_WJCbBLirgPUf6YkmZQ">