<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/57810>57810</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[X86] Use BTS to set upper single bit on fast x64 targets
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:X86
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
RKSimon
</td>
</tr>
</table>
<pre>
Raised here: https://reviews.llvm.org/D132520#inline-1278500 (we already do this for the upper 32 bits at -Os/-Oz, but on some targets (atom/slm, pre-haswell, etc.) it might be worth doing in more cases).
```
orl $65536, %edi # imm = 0x10000
```
BTW, with -Oz at least, we should be using 4-byte bts $16, %edi instead of 6-byte or $65536, %edi (or 5-byte for EAX).
On Intel CPUs, bts $i8, %reg is still only 1 uop, although it can run on fewer execution ports than or (p06, the shift ports, in SKL/ICL; only p1 in Alder Lake P-cores). Appropriate at least for -Os -mtune=intel (or any specific Sandybridge-family target) if we want to be that fine-grained about instruction selection.
Perhaps even -O2 -mtune=intel. Although maybe not, since Alder Lake P-cores dropped the throughput to 1 per cycle, competing with imul and tzcnt/lzcnt/popcnt for that port. (BTS still has 1 cycle latency, unlike most integer uops that can only run on port 1. Alder Lake E-cores run it as 1 uop with 1 cycle latency to the integer output, 2 cycle latency to the CF output.)
So maybe only for -O2 with a -march before Alder Lake? But people normally expect -march=haswell to be good on later Intel, and it's a pretty small savings, just 2 bytes. OTOH it might be a pretty small gain, unless used in a loop with a port 1 bottleneck.
On AMD CPUs, bts $imm, %reg is 2 uops, so only appropriate for -Oz and maybe -Os.
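For reference, a minimal C sketch of the kind of source that produces the single-bit OR discussed above. The function names are hypothetical and not taken from the review; the instructions named in the comments are the candidates under discussion, not guaranteed compiler output, which depends on the compiler version, flags, and -mtune setting.
```c
/* Hypothetical reproducer: set one bit in a 32-bit value.
   Bit 16 is the 0x10000 (65536) constant from the orl above, where a
   6-byte or with imm32 competes with a 4-byte bts $16, %edi. */
unsigned set_bit16(unsigned x) {
    return x | 0x10000u;
}

/* Hypothetical reproducer: set a bit in the upper 32 bits.
   Bit 40 does not fit a sign-extended imm32, so a plain or needs the
   constant materialized in a register first; this is the upper-32-bit
   case the issue says is already handled at -Os/-Oz. */
unsigned long long set_bit40(unsigned long long x) {
    return x | 0x10000000000ULL;
}
```
Comparing the assembly for these two functions at -O2, -Os, and -Oz with different -mtune values should show where the size-for-throughput trade-off is currently being made.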
</pre>