<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/54696">54696</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Suboptimal bitfield insertions arm64
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          uncleasm
      </td>
    </tr>
</table>

<pre>
    In a case where the intent is to copy the nibble 0x00F00000 to 0x000000F0, clang trunk
seems to perform almost optimally, except for using two LSR shifts in succession (see https://godbolt.org/z/oMP5Ya78b):

```
#include <cstdint>

uint32_t copy_hex_nibble_from_4_to_1(uint32_t a) {
    auto b = (a >> 20) & 15; // extract bits 20..23
    a &= ~0x00f0;            // clear bits 4..7
    a |= b << 4;             // insert the extracted nibble at bits 4..7
    return a;
}
```

clang trunk emits:

```
        lsr     w8, w0, #16
        lsr     w8, w8, #4
        bfi     w0, w8, #4, #4
```
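
A quick sanity check of the intended bit movement (illustrative only; assumes the function above is in scope):

```
#include <cassert>
#include <cstdint>

int main() {
    // 0xA sits in bits 20..23; the function copies it into bits 4..7.
    assert(copy_hex_nibble_from_4_to_1(0x00A00000u) == 0x00A000A0u);
    // Any previous contents of bits 4..7 are overwritten.
    assert(copy_hex_nibble_from_4_to_1(0x00A000F0u) == 0x00A000A0u);
    return 0;
}
```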

Obviously, the shifting could happen with a single instruction, `lsr w8, w0, #20`.
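
The expected sequence (a sketch, folding the two right shifts into one while keeping the same BFI insert) would be:

```
        lsr     w8, w0, #20      // extract bits 20..23 in a single shift
        bfi     w0, w8, #4, #4   // insert those 4 bits at bit position 4
```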

</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJx9U01v3CAQ_TX4grrC4M-DD0k2kSq1atWcerLAxmtabCw-stn--g7sJunuoQgDwzweb5ixMOOp-7xijgfuJKIP-DhLK7GfJVarl6vHymFv8GC2E6BWJYSWmLwSaE_wRd_ZinYkGDRfD9jbsP5GZI_InZNySRybtJOxC-Z6Mc5js3m1cK1P8ZR8HeQGexMOTsXzR4O_PP_AblaTd6AFuzAM0jllVoxoA6R49n5ziN0h-gT9YEZhtN8ZewDrD3zm6_fyJ68bgWh7lnIZK3LpyQwQKKO9TzH2s3ztz1H2kzVLX_Te9Dnc-A7jQIdRfX8-jaHxANEJjNg-SuOweISOKUlIWuG8ROxffNxM6PoxPt5Ebt31Q3RHSlg84OLKb6UPFnL2vonq_Yc3Nu1smo9NSmnKC6Isr_4Lay6w4holJnVGkRvUFfrmUc_jN_GiTHD6lAoq5TImdzBBj3jm2yZXfFR-hpBj1nUsOgelM_iU5opEhTdB0I9LsrFjY8tannnlteyeg7gUFRbKT0rCLUAobaRzmNulKrJgdXdTOaAgiN1gFjC0fnmbPm3W_JKDB1M5F6SDRVlUbZXNHaPFNJG8qSbKG1GKUXAuqpKKtphayWimuZDadai8R-U-Ux0llJKC5DnNYdq1zcTGmg0FY6xuJo4KIheu9C5eHGs4s13SIMLBgVMr592Hk8OPcFilfOOHCpyN7cI6aMndkiW9XRL7F8c0IkA">