<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/114282">114282</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
AMDGPU does not avoid clamp of bit shift in BFE pattern
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:AMDGPU,
missed-optimization
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
changpeng
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
arsenm
</td>
</tr>
</table>
<pre>
https://godbolt.org/z/nT9dMvrxv
```
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s
define i32 @bfe(i32 %scr0, i32 %src1) {
%src2.clamp.i = and i32 %src1, 31
%shl.src2.i = shl i32 1, %src2.clamp.i
%rhs.i = sub i32 %shl.src2.i, 1
%and.i = and i32 %scr0, %rhs.i
ret i32 %and.i
}
```
This fails to eliminate the `and` that clamps the shift amount. `v_bfe_u32` only reads the low 5 bits of that operand, so the `and` is redundant and can be removed.
```
bfe: ; @bfe
; %bb.0:
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
v_and_b32_e32 v1, 31, v1
v_bfe_u32 v0, v0, 0, v1
s_setpc_b64 s[30:31]
.Lfunc_end0:
```
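
To illustrate why the clamp is redundant, here is a small sketch that compares the IR above with a model of `v_bfe_u32`. The model is an assumption based on the ISA description of the instruction, `(src >> offset[4:0]) & ((1 << width[4:0]) - 1)`; the helper names `ir_bfe`, `v_bfe_u32` and `clamp_is_redundant` are made up for this example and are not part of LLVM.
```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value in 32 bits


def ir_bfe(src0, src1):
    """Mirror of the IR function @bfe, evaluated in 32-bit arithmetic."""
    clamp = src1 & 31                # %src2.clamp.i
    shl = (1 << clamp) & MASK32      # %shl.src2.i
    rhs = (shl - 1) & MASK32         # %rhs.i
    return src0 & rhs                # %and.i


def v_bfe_u32(src, offset, width):
    """Assumed model of v_bfe_u32: only bits [4:0] of offset and width are read."""
    field = (src & MASK32) >> (offset & 31)
    return field & ((1 << (width & 31)) - 1)


def clamp_is_redundant(samples, widths):
    """True if passing the unclamped width to the model matches the IR result."""
    return all(
        ir_bfe(s, w) == v_bfe_u32(s, 0, w)
        for s in samples
        for w in widths
    )
```
Under that model, `clamp_is_redundant` holds for any width value, including ones with bits set above bit 4, which is the property the missing fold would rely on.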
</pre>