<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/54164>54164</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Narrowing a shift result peephole: (unsigned)(1ULL<<n) could save instructions on RISC-V and ARM [performance]
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          pcordes
      </td>
    </tr>
</table>

<pre>
    ```
unsigned pow2(unsigned n){
    return 1ULL << n;
}
```

compiles to (https://godbolt.org/z/qej1dbW3T)

```
pow2(unsigned int):
        addi    a1, a0, -32
        bltz    a1, .LBB0_2
        mv      a0, zero
        ret
.LBB0_2:
        addi    a1, zero, 1
        sll     a0, a1, a0
        ret
```

But instead, I think it would be more efficient to materialize a 0 or 1 according to `n<32`, and then shift that.
* `n<32` case: the 32-bit unsigned result has a set bit
* `n>=32 && n<=63` case: the `1ULL << n` result has a bit set, but it's lost when truncating to unsigned, so the result is 0.  `0` shifted by anything is 0.
* `n>63` case: the `1ULL << n` has UB so it doesn't matter what we return.  I haven't checked on whether RISC-V shifts saturate or mask their count or what, but I assume that `0 << n` is always 0 even for large n.  If it saturates the count (shifting out all the bits), we could maybe just shift a constant 1 without an extra data dependency on the count.

```
# sample implementation
      sltiu   a0, a0, 32         # retval = (n<32)    
      sll      a0, a0, a1        # retval <<= n
      ret
```

This seems clearly better, except that the critical path latency is now 2 cycles from input to output (assuming both have 1c latency).  The original version shifted a constant after branching, so only 1c latency.

On ARM, shifts with count > reg_width do just shift out all the bits, so GCC compiles this to:

```
# GCC11.2 for ARM32.  clang is unfortunately more complicated
pow2(unsigned int):
        mov     r3, #1
        lsl     r0, r3, r0
        bx      lr
```

For x86-64, we use `mov eax,1` / `shlx`, which masks the count, producing a non-zero output when `n&63 < 32`.  (On Intel, xor-zero / `bts eax, edi` would be even more efficient, but we miss that peephole.)
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJyVVk1z4zYM_TXyhROPPmzZOfiQOJtOZtJ2ZrvbHncoCba4oUSXpGI7v74PlOTY3m2mVRTJEkHgAXgAVJjquIryeDjjhyi-61qnti1VYmf2aZQuT89tlN5Gi_teSuCw5DvbiuTr87OIsjVOyGSDQLR4GH5cqu-vpWl2SpMT3gjYqL3fuSi7i9JHnFtTFUb7qbFbPL3h_2_6nlTFX9kXhnCm5kr3NWLVet6Q3b1j5kNWlQr3JErXQsZ8vcnSS6FC-7d3oenz_X387UqkeR30BQ1vZM3lOuLTvxh3fwwkKMA9uRRyWp9ZOWH-uaWfBvu-8wiF8yQr3vwkfK3aF6G82JtOV6Ig0RhLgjYbVSpqPWelkZ6sklq9kZAiFsaKRMiyNLZS7TbkLY-R7jUCl_fQ2gqaqRWuVhvoqKWfDjDSuwtpUUpHiAaLiyy9KQDllDRLrtNe1NLBriMvsHqt5lOUPWQpmJPjZNKBfA95dq0az5fkxPqFejYME4y-4CDh18IJbRxCw55427Wl9IPDI0QWdyZYGLQpJ-KpYHsc-z4AcKU4IihHjva2F_nBj_-GmcF-vWebwFsZcijFhecUIUdAKgGXhnIEjCdseKVepqypfAEU07JHMGDF56c_1jd_9iCdcBK7kGtOcCPdC4NQVpSmAw9Mr32Mz5OQznUNhdwGby9gwkWp9_IITwUBgNhgv5Z2SyLA2jD80Z4L3vZmULMBDcfJwI4E5XkV6XFcwDC_D7IgayOP4Ov3DinqeSaxAHJL6EnEXvk6aGgFHbyVopIeF9pRW1FbHjkOJ7vTD1pJlGaA2uw0CcXXBmUBIpj2vPCc9qp7L81wBS3Hg3UgKa9SI0wP7OVQAeltWL7QpM97yVjrP9W07ukuLqB83AG-1MiNI2qcKDVJq48oeiYPm6FDSbu-XvvYWOVVCVM76Wvkz4fAQUFr9iIV5bHkxr2xpkFX2XWhWyDm_AsuBoZwIguD3UxEkZSjFngOInypmW1qq1oYeSXrENZTzZylU26Y3oWVbcklNJSdaYH-XeVFEn9vxd3nX4Ngz27mw0iy7BOitP22VxXeVeacQz-SLpj6Zb0W75OKY-jNqYn_K22wK0mmaWA_0GQpXC617HtA1-K171qAhxuh7bIFjYDD-_81xhrTzx-bMVxYvhoc2vWcsoFMvZS9GhzFYZC1H7DnEY4clvlNPhsqsXOhTzEAkge8TLj8Maf5rav1YZgI-1qVdegqZ9XOCztrqq5klkiwqr3h0TdyKDTe0B7TPM-Y7yIMDe6v6RIZfmoRO9ZyMLbfOVgukPAej6BKMaTTfAvt6HLIjU0N_jTKuZ7_O6JdbTRNEfBJtcqq2-xWTrzymla_SWvNvgfd02bo_uMmbuFneQuda8nNvK9Z_n4ampjjsuCRbLuSm4rjvjT0ZR6joI2I5vc7smBLA_5TNH-YdFavrr6UQO-umIJBeND6dbzdIMDfqYSPj3CtIxD6cT5L8tmkXmW3lMcyT2OaJYv55jal5WaWLtJ4vpktNkk60bIg7Vawz0bVKo3TNM7wN08Ws3yKzXK2TJZzmdGCqjyaxdRIpadsmL_ZJnYVMBTd1mFRK-fd-yL6A0eHRv2yQ8e2qx1_WJCbBLirgPUf6YkmZQ">