<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/83840">83840</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [X86][AVX] Recognise out of bounds AVX2 shift amounts
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            good first issue,
            backend:X86,
            missed-optimization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          RKSimon
      </td>
    </tr>
</table>

<pre>
    Pulled out of #39822 which was a bit too general.

Unlike the general ISD SRA/SRL/SHL nodes, the AVX2 vector shift nodes X86ISD VSRAV/VSRLV/VSHLV have well-defined results for out-of-bounds shift amounts:

- VSRAV clamps the unsigned shift amount to (BITWIDTH-1)
- VSRLV/VSHLV return a zero value for unsigned shift amounts greater than (BITWIDTH-1).
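
A minimal per-lane sketch of the two behaviours listed above, written as a scalar Python model rather than LLVM code. The helper names (`vsrav_lane`, `vsrlv_lane`, `vshlv_lane`) and the fixed `BITWIDTH = 32` (i.e. i32 lanes) are assumptions made for illustration only:

```python
BITWIDTH = 32                      # assumption: i32 lanes
MASK = (1 << BITWIDTH) - 1


def to_signed(x):
    """Reinterpret the low BITWIDTH bits of x as a two's-complement value."""
    x &= MASK
    return x - (1 << BITWIDTH) if x >> (BITWIDTH - 1) else x


def vsrav_lane(value, amt):
    """Arithmetic shift right: the unsigned amount is clamped to BITWIDTH-1."""
    amt = min(amt & MASK, BITWIDTH - 1)
    return (to_signed(value) >> amt) & MASK


def vsrlv_lane(value, amt):
    """Logical shift right: amounts greater than BITWIDTH-1 give zero."""
    amt &= MASK
    return 0 if amt > BITWIDTH - 1 else (value & MASK) >> amt


def vshlv_lane(value, amt):
    """Shift left: amounts greater than BITWIDTH-1 give zero."""
    amt &= MASK
    return 0 if amt > BITWIDTH - 1 else ((value & MASK) << amt) & MASK
```

In this model `vsrav_lane(0x80000000, 40)` yields `0xFFFFFFFF` (the lane is filled with the sign bit), while `vsrlv_lane(0x80000000, 40)` and `vshlv_lane(1, 40)` both yield `0`.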

So when lowering vector shifts, we should be able to fold any shift-amount clamp patterns into the shift and use the X86ISD node types directly.

e.g.

```ll
define <4 x i32> @ashr(<4 x i32> %sh, <4 x i32> %amt) {
  %elt.min.i = tail call <4 x i32> @llvm.umin.v4i32(<4 x i32> %amt, <4 x i32> <i32 31, i32 31, i32 31, i32 31>)
  %shr = ashr <4 x i32> %sh, %elt.min.i
  ret <4 x i32> %shr
}
```
-> 
```asm
ashr(int vector[4], unsigned int vector[4]):
        vpbroadcastd    xmm2, dword ptr [rip + .LCPI0_0] # xmm2 = [31,31,31,31]
        vpminud xmm1, xmm1, xmm2
        vpsravd xmm0, xmm0, xmm1
        ret
```
vs
```asm
ashr(int vector[4], unsigned int vector[4]):
        vpsravd xmm0, xmm0, xmm1
        ret
```
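
The ashr fold can be sanity-checked with a scalar model: an in-range arithmetic shift whose amount has first been clamped with umin should match a shift that fills the lane with the sign bit for any out-of-bounds amount. This is only a sketch assuming i32 lanes; `ir_ashr`, `clamped_ashr`, `vsrav_lane` and `fold_is_sound` are hypothetical helper names, not LLVM APIs:

```python
BITWIDTH = 32                      # assumption: i32 lanes
MASK = (1 << BITWIDTH) - 1


def to_signed(x):
    """Reinterpret the low BITWIDTH bits of x as a two's-complement value."""
    x &= MASK
    return x - (1 << BITWIDTH) if x >> (BITWIDTH - 1) else x


def ir_ashr(value, amt):
    # IR ashr only has a defined result for amt < BITWIDTH (otherwise poison).
    assert 0 <= amt < BITWIDTH, "out-of-bounds amount is poison in IR"
    return (to_signed(value) >> amt) & MASK


def clamped_ashr(value, amt):
    # The source pattern: ashr(value, umin(amt, BITWIDTH - 1)).
    return ir_ashr(value, min(amt & MASK, BITWIDTH - 1))


def vsrav_lane(value, amt):
    # Model of the node: out-of-bounds amounts fill the lane with the sign bit.
    amt &= MASK
    if amt >= BITWIDTH:
        return MASK if to_signed(value) < 0 else 0
    return (to_signed(value) >> amt) & MASK


def fold_is_sound(values, amounts):
    """True if the clamp pattern and the bare node agree on every input pair."""
    return all(clamped_ashr(v, a) == vsrav_lane(v, a)
               for v in values for a in amounts)
```

Calling `fold_is_sound` over a few boundary values (0, 1, `0x7FFFFFFF`, `0x80000000`, `0xFFFFFFFF`) and amounts on both sides of 31 is enough to exercise the clamp.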

Logical shifts are trickier but also foldable:
```ll
define <4 x i32> @lshr(<4 x i32> %sh, <4 x i32> %amt) {
  %cmp.i = icmp ult <4 x i32> %amt, <i32 32, i32 32, i32 32, i32 32>
  %shr = lshr <4 x i32> %sh, %amt
  %0 = select <4 x i1> %cmp.i, <4 x i32> %shr, <4 x i32> zeroinitializer
  ret <4 x i32> %0
}

define <4 x i32> @lshr2(<4 x i32> %sh, <4 x i32> %amt) {
  %cmp.i = icmp ult <4 x i32> %amt, <i32 32, i32 32, i32 32, i32 32>
  %0 = select <4 x i1> %cmp.i, <4 x i32> %sh, <4 x i32> zeroinitializer
  %shr = lshr <4 x i32> %0, %amt
  ret <4 x i32> %shr
}
```
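
The `@lshr` pattern can be checked the same way. The sketch below assumes i32 lanes and models the node as a mathematically unbounded right shift, which is zero once the amount reaches the lane width; `select_lshr`, `vsrlv_lane` and `fold_is_sound` are hypothetical helper names:

```python
BITWIDTH = 32                      # assumption: i32 lanes
MASK = (1 << BITWIDTH) - 1


def select_lshr(value, amt):
    # Source pattern from @lshr: select(amt ult 32, lshr(value, amt), 0).
    # The out-of-range lshr is poison in IR, but the select discards it,
    # so the model only evaluates the shift when the amount is in range.
    amt &= MASK
    return (value & MASK) >> amt if amt < BITWIDTH else 0


def vsrlv_lane(value, amt):
    # Model of the node: shifting right by BITWIDTH or more moves every bit
    # out of the lane, so an unbounded shift reproduces the zero result.
    return (value & MASK) >> (amt & MASK)


def fold_is_sound(values, amounts):
    """True if the select pattern and the bare node agree on every input pair."""
    return all(select_lshr(v, a) == vsrlv_lane(v, a)
               for v in values for a in amounts)
```

Only the logical right shift is modelled here; the compare-and-select guard is what makes this fold "trickier" than the umin clamp, since the matcher has to pair the comparison constant with the lane width.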
</pre>