<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/54696">54696</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Suboptimal bitfield insertions arm64
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
uncleasm
</td>
</tr>
</table>
<pre>
In a case where the intent is to copy a nibble from 0x00F00000 down to 0x000000F0, clang trunk
seems to perform almost optimally, except that it uses two LSR shifts in succession (see https://godbolt.org/z/oMP5Ya78b):
```
#include <cstdint>

uint32_t copy_hex_nibble_from_4_to_1(uint32_t a) {
    auto b = (a >> 20) & 15; // extract the source nibble
    a &= ~0x00f0;            // clear the destination nibble (bits [4..7])
    a |= b << 4;             // insert the extracted nibble
    return a;
}
```
compiles to
```
lsr w8, w0, #16
lsr w8, w8, #4
bfi w0, w8, #4, #4
```
Obviously, the two shifts could be folded into a single instruction: `lsr w8, w0, #20`.
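For reference, the expected optimal sequence would presumably be the single-shift form, with the `bfi` operands unchanged:
```
lsr w8, w0, #20
bfi w0, w8, #4, #4
```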
</pre>