<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/142046">142046</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
64 bit shifts on 32 bit targets fail to utilize shift range information
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
tom-rein
</td>
</tr>
</table>
<pre>
When we know that the shift amount is at least 32, we can translate a 64-bit shift on a 32-bit target into a single 32-bit shift and set the lower register to 0 (a sketch of that lowering follows the examples below).
Clang fails to do this optimization in most cases. This happens at least on ARM, RISC-V and x86, and probably on other targets as well.
Here are some examples ([godbolt](https://godbolt.org/z/9Tdf1xKxd)):
```C++
#include <cstdint>
uint64_t shift_assume_ge32(uint64_t x, int s) {
    __builtin_assume(s >= 32);
    return x << s;
}
uint64_t shift_assume_bit5(uint64_t x, int s) {
    __builtin_assume((s | 32) == s);
    return x << s;
}
uint64_t shift_set_bit5(uint64_t x, int s) {
    s |= 32;
    return x << s;
}
uint64_t shift_set_bit5_assume_set(uint64_t x, int s) {
    __builtin_assume((s | 32) == s);
    s |= 32;
    return x << s;
}
uint64_t shift_32_plus(uint64_t x, int s) {
    __builtin_assume(s >= 0);
    s += 32;
    return x << s;
}
uint64_t shift_lt32(uint64_t x, int s) {
    __builtin_assume(s < 32);
    return x << s;
}
uint64_t shift_mask31(uint64_t x, int s) {
    s &= 31;
    return x << s;
}
uint64_t shift_assume_masked(uint64_t x, int s) {
    __builtin_assume((s & 31) == s);
    return x << s;
}
uint64_t shift_mask31_assume_masked(uint64_t x, int s) {
    __builtin_assume((s & 31) == s);
    s &= 31;
    return x << s;
}
```
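To make the expected transformation concrete, here is a rough sketch of what the `s >= 32` case (e.g. `shift_assume_ge32()`) could be lowered to, assuming a little-endian split of the `uint64_t` into 32-bit halves (the helper name is made up for illustration):
```C++
#include <cstdint>

// Sketch: a 64-bit left shift by s >= 32 needs only one 32-bit shift,
// and the low word becomes 0.
static uint64_t shl64_ge32(uint32_t lo, uint32_t hi, unsigned s) {
    (void)hi;                          // the old high word is shifted out entirely
    uint32_t out_hi = lo << (s - 32);  // a single 32-bit shift
    uint32_t out_lo = 0;               // the lower register is simply zeroed
    return ((uint64_t)out_hi << 32) | out_lo;
}
```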
Assuming that the shift amount is at least 32 or that the 5th bit is set has no effect (`shift_assume_ge32()` and `shift_assume_bit5()`).
Similarly, assuming it is less than 32 or that the bit is unset also has no effect (`shift_lt32()` and `shift_assume_masked()`).
Setting the 5th bit or masking with 31 (`shift_set_bit5()` and `shift_mask31()`) results in correctly simplified output, unless we additionally assume the value is already in range (`shift_set_bit5_assume_set()` and `shift_mask31_assume_masked()`): then the bit masking is correctly removed, but Clang forgets the range info and emits a general shift again.
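For reference, the correctly simplified output for the `s < 32` cases should amount to roughly the following (a sketch only, assuming a little-endian split of the `uint64_t` into 32-bit halves; the helper name is made up, and the `(lo >> 1) >> (31 - s)` form avoids an undefined shift by 32 when `s == 0`):
```C++
#include <cstdint>

// Sketch: what shift_mask31()/shift_assume_masked() should boil down to
// on a 32-bit target once s < 32 is known.
static uint64_t shl64_lt32(uint32_t lo, uint32_t hi, unsigned s) {
    uint32_t out_lo = lo << s;                              // low word: one 32-bit shift
    uint32_t out_hi = (hi << s) | ((lo >> 1) >> (31 - s));  // funnel the spilled bits into the high word
    return ((uint64_t)out_hi << 32) | out_lo;
}
```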
Sometimes we have a conditional shift by 32, which on a 32-bit target is just a conditional move, but Clang replaces it with a general 64-bit shift, which sometimes gets only partially simplified. In particular, on RISC-V it fails to realize that the left shifts are no-ops.
```C++
uint64_t conditional_shift32(uint64_t x, unsigned b) {
    __builtin_assume(b <= 1);
    return b ? x << 32 : x;
}
uint64_t shift_shifted_condition(uint64_t x, unsigned b) {
    __builtin_assume(b <= 1);
    return x << (32 * b);
}
uint64_t conditional_shift_bit5(uint64_t x, unsigned b) {
    return (b & 32) ? x << 32 : x;
}
uint64_t conditional_shift_bit5_assume(uint64_t x, unsigned b) {
    __builtin_assume((b & 32) == b);
    return (b & 32) ? x << 32 : x;
}
uint64_t conditional_shift_assume_bit5(uint64_t x, unsigned b) {
    __builtin_assume((b & 32) == b);
    return b ? x << 32 : x;
}
```
For `conditional_shift_bit5()` and `conditional_shift_bit5_assume()`, Clang sees that it is just a shift by `b`, but for `conditional_shift_assume_bit5()` it fails, even though the only difference is the missing `& 32`, which gets removed anyway.
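To spell out the conditional-move claim from above, `conditional_shift32()` on a 32-bit target amounts to no more than this (a sketch, again assuming a little-endian 32-bit split and a made-up helper name):
```C++
#include <cstdint>

// Sketch: a conditional shift by 32 is just a pair of conditional moves.
static uint64_t cond_shift32(uint32_t lo, uint32_t hi, unsigned b) {
    uint32_t out_hi = b ? lo : hi;  // conditional move
    uint32_t out_lo = b ? 0 : lo;   // conditional move (or mask)
    return ((uint64_t)out_hi << 32) | out_lo;
}
```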
The above also made me realize that the generic 64-bit shift on 32-bit RISC-V is super weird:
```
shift(unsigned long long, int):
addi a4, a2, -32
sll a3, a0, a2
bltz a4, .LBB0_2
mv a1, a3
srai a0, a4, 31
and a0, a0, a3
ret
.LBB0_2:
sll a1, a1, a2
not a2, a2
srli a0, a0, 1
srl a0, a0, a2
or a1, a1, a0
srai a0, a4, 31
and a0, a0, a3
ret
```
The `srai + and` before each `ret` is totally useless: on the first path the `srai` always produces 0 (so the final a0 is always 0), and on the other it always produces all-ones (so the final a0 is just a3).
With the Zicond extension it is slightly better:
```
shift(unsigned long long, int):
sll a1, a1, a2
not a3, a2
srli a4, a0, 1
sll a0, a0, a2
addi a2, a2, -32
srl a3, a4, a3
slti a2, a2, 0
or a1, a1, a3
czero.nez a3, a0, a2
czero.eqz a1, a1, a2
or a1, a1, a3
czero.eqz a0, a0, a2
ret
```
The `addi` is still there, even though it could have been combined with the `slti`. Also, a minor nitpick: if the `srli` were placed after the `srl`, we could avoid using a4 and use a compressible `srli`.
This is how I would write it:
```
shift(unsigned long long, int):
sll a1, a1, a2
not a3, a2
srl a3, a0, a3
sll a0, a0, a2
srli a3, a3, 1
slti a2, a2, 32
or a1, a1, a3
czero.nez a3, a0, a2
czero.eqz a1, a1, a2
or a1, a1, a3
czero.eqz a0, a0, a2
```
(The `slti` could also be replaced with `srli a2, a2, 5`, which has a compressed version, whereas if I'm not mistaken `slti` does not)
This can be rewritten to use only base instructions at the cost of just two more instructions:
```
shift(unsigned long long, int):
sll a1, a1, a2
not a3, a2
srl a3, a0, a3
sll a0, a0, a2
srli a3, a3, 1
addi a2, a2, -32
srai a2, a2, 31
or a1, a1, a3
and a1, a1, a2
not a3, a2
and a3, a3, a0
or a1, a1, a3
and a0, a0, a2
```
(The `addi` could be replaced with `slli a2, a2, 26`, but I'm not sure which is better)
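For reference, the branchless sequence above computes roughly the following (a C sketch of the same idea, assuming a little-endian 32-bit split; the corresponding assembly is noted in the comments):
```C++
#include <cstdint>

// Sketch: branchless 64-bit shift-left built only from 32-bit base operations.
static uint64_t shl64_branchless(uint32_t lo, uint32_t hi, unsigned s) {
    uint32_t lo_shifted = lo << (s & 31);            // sll a0, a0, a2
    uint32_t hi_funnel  = (hi << (s & 31)) |         // sll a1, a1, a2
                          ((lo >> (~s & 31)) >> 1);  // srl + srli
    uint32_t mask = (s < 32) ? 0xffffffffu : 0;      // addi + srai (all-ones iff s < 32)
    uint32_t out_hi = (hi_funnel & mask) | (lo_shifted & ~mask);
    uint32_t out_lo = lo_shifted & mask;
    return ((uint64_t)out_hi << 32) | out_lo;
}
```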
To me this is preferable to a branchy version from a side-channel perspective alone, since it is highly unintuitive that a shift introduces a branch.
</pre>
<img width="1" height="1" alt="" src="http://email.email.llvm.org/o/eJzcWU1v4zgS_TXKpdCGTNmOffDBSTbYxu5eprM7wF4CSipbnKZIL4uKk_z6RZGSLMcfnU5nLhMYCSKSVY9Vj6-KsiRSG4O4TKY3yfTuSja-sm7pbf3FoTJXuS1flr9XaGCH8N3YHfhKepBAlVp7UAQbh9Kj4-cGMpGIW55aSAPeSUNaegQJswnkynerjLcgIRODZ9KUQOjBVwja7tCBw42iYNlCOkrS1a2WZgNrqTTxs9KCrxSB3XpVq1fplTWgDNSWPBSSkEbwwBMqud2iIZAeNEryYA2sfvsXI_3t67fbL_8Jzp_nM35ifYWOYOtsLnP9AlKTZed_R4cgHQLZGgGfZb3VSJCIeTK92dgyt9on07tEzCvvt5Rkq0TcJ-K-HRpZt0nE_Wsi7hcP5Xr8_I_nMhEL_mSrJF0lszR-bhNxw590lYhMmUI3JUKS3RbkS2V8kv2Nh9JVo4yfTR7b8D1KoqbGxw1yBub94DNvSRkPlIgFJNdsFx4f80Zpr0y7KhFzAjac3YUELpIszAMAh75xBp4ZQZLdAsWh5PruEopc-ekPUQCcAhKxXN9GIJBkd4yKfgUUoX8fouC4DcKvOutCQej_xEh8BuJMPG51Qx9E2RMnfQtM3PwiMO0_zuXbXyRyLel7Nn5PTHins7DT8a8eG3aK5Ue2HHctZgzik05NjMBPQrvE5MvoPhTGXjTjvyt2qMwmliguJG1pqW1jTtUqsC5Mm_oqVCJFoQRVksBYwPUaCx8UfpaeUlnewCwNxePtDJYAfpiIBRePb6pWWjouJx3G6E4j0Vs0LZLGMBYuPxcAtWfkApI-c3FOBwi9j5Ha79464Mn8OH-BbHzgaCCip5z156X3Ag6p0VzrobDOYeH1C5Cqt1qtFZZgG79tPGP5twlR2GEMDgZQT1I3yGFQBpw0GzyJ5lBlzwO7HIwHbm-6yHchUDTA7bC2T7zwFvLGQxEbEes26CmsjBCVWdvgHmvlCSRs0KCTuuPhRiozilz9Zmv0qsaw70o-cY9UWFMq7mL6FZyH2FFVqqi4b-m7Ji_ZO8P8oyH_ZjWjPQTrcKtlgcS82ylfDcANW7O9L-oBhk1ao19gK51XUh8kcgRfTRwoGi0dY2ybKuX3vZpDqdUr7k-mxnXrkkJXZeyWRqf7oF5yBjt8DEtP1IfGhH62hPyyYuZRU-5gfEokefR-rzuZgCRbwfMPSj__xvKxh_k-cKc18zK-Hlki5gxOrILJc_iOAne6HzoHr3UaUbGMtw3JT0ToNIL9dj-WxCNEobDkpyL2ueAv9bl_Avj30HFQCu-tYwE8m_VDnfxRatrZUUYIkeIhVkPt6dUqmaV5ENaoPcqc9nAYv9ZDpxe8GJ-CJNtmU7WeggDViqgrW8FNiF70F2UriFUr1yDNy06-0AhiQh8qBJlb1lquqrUsEWrstIlt9PI0m7R7WlvXKW4nawTUbNHBDpUr39zeknTV6ui854G2ZhN-tR1Tf-eD9keWpQp_JzxDBsH_konBDNI6zszCjLSdt5-Qa_-6NzH6581N-jgcr59AjsOqbGjWyeg4Ggxrs_EQmSlhMCHdW3Dok3TVOTrYTg82OhwfgTXWxwliP0ROq7eexgdQ9TGSoVHr4NhrGk1_eJftT9zsMMkPkYDBdCJu2ARTOMe1ddjRk5fN0ljfvPWhcDaE3O2MgC2slaNAbqmZpxDPJK8ObyEGIzK8APmdCzcP_1fxmQJ89mgovPaIzatWm4r7lRy9R_cZ3DyfzD6L2VEu-mROziRTn0tmfxTE2aPgDo7C5JjU2h-ZSH9ElKGB4hWdHRl87ehxdOjiDPxfP-Mc09_piy2dJfZZ8nGsgm4SkFdaMzMcnlDPwja6jE1mjsj9eJ0rzv6u4xPTVHu2NoIVa6OEWhnrwCi_VcX3wI51P9Xp4HgnCUJXWYJcx2tVN95pMrbO5ZNVJTRBvOUk0Lyh2PPWW4dEKtdD27Et5_afoLI7-Aq7YGfnlEdQ_rOofVGjTjP7kA8hmcHSufyFw9CuyY5PglcHTI1ct-499DxbDQasOtrfj02fZOPBVVvMHw5I0yWZmZNjd9do-dUm9WCX04OazdfbPROwhCd0FPpnnoEOJTH7vibiug6JqRV5-R0NDDGUFvmWHPLckqeQJuJh2ng-EzbQLvQSuSS-sZF3TcHNCUW4YczvLNSs5MPxvwrngsaeENhQzA64OD6UsrO84eP8s7sKawYIZfozzt5Hz04hIz1PMlO_YaaYDdrXPeOocdiyVVFXXvdE85YbSUWwdbhGJ1nM7BM6kJA7aYrqpeM0rJ2tuWNWJUJRSWNQwxYdbbHwKrSm1gQRJ2UKbAt7pTYVNw9GGd-oMO_waxjjnS0bvt93HllBr8plVi6yhbzC5fh6Mp9Or8VkclUt83VepOv1BNflWMwXxSQvU3k9zcR6mqLIF1dqKVIxTadiIYRIJ5PRdFyWi_lkUkjMcZHOk0mKtVR6pPVTPbJuc6WIGlyOJyKdzK60zFFT-FJJCIM7CKOJEMn07sotedGXvNlQMkm1Ik97M155jcvhOwnu_Q_felC4KITT7FV4rxDDsH8L4-rwddBV4_TyzfcxyldNPipsnYh79tr--bJ19g8sfCLuA1ZKxH27mael-H8AAAD__y2B-08">