<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/83840>83840</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            [X86][AVX] Recognise out of bounds AVX2 shift amounts

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            good first issue,

            backend:X86,

            missed-optimization

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          RKSimon

      </td>

    </tr>

</table>

<pre>

Pulled out of #39822, which was a bit too general.

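Unlike the generic ISD::SRA/SRL/SHL nodes, whose result is undefined for shift amounts greater than or equal to the bit width, the AVX2 per-element variable shift nodes X86ISD::VSRAV/VSRLV/VSHLV have well-defined behaviour for out of bounds shift amounts: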

- VSRAV clamps the unsigned shift amount to (BITWIDTH-1), so an out of bounds amount fills every bit with the sign bit.

- VSRLV/VSHLV return zero for unsigned shift amounts greater than (BITWIDTH-1).
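
A scalar C model of one 32-bit lane of each node may make these rules concrete. This is a sketch based on the documented behaviour of VPSRAVD/VPSRLVD/VPSLLVD, not LLVM code; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable arithmetic shift right for 0 <= n <= 31: right-shifting a
   negative signed value is implementation-defined in C, so shift the
   (non-negative) complement instead and complement back. */
static int32_t sra_lane(int32_t x, uint32_t n) {
    return x < 0 ? ~(~x >> n) : (x >> n);
}

/* One VSRAV (vpsravd) lane: unsigned amounts above 31 act as 31,
   so the whole lane becomes a copy of the sign bit. */
static int32_t vsrav_lane(int32_t x, uint32_t amt) {
    return sra_lane(x, amt > 31 ? 31 : amt);
}

/* One VSRLV (vpsrlvd) lane: unsigned amounts above 31 give zero. */
static uint32_t vsrlv_lane(uint32_t x, uint32_t amt) {
    return amt > 31 ? 0 : x >> amt;
}

/* One VSHLV (vpsllvd) lane: unsigned amounts above 31 give zero. */
static uint32_t vshlv_lane(uint32_t x, uint32_t amt) {
    return amt > 31 ? 0 : (uint32_t)(x << amt);
}
```

For example, `vsrav_lane(-8, 100)` is `-1`, while `vsrlv_lane(8, 100)` is `0`.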

So, when lowering vector shifts, we should be able to fold away any shift amount clamping patterns and use the X86ISD node types directly.
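
Roughly, the fold is only valid when the clamp constant matches the node's out of bounds behaviour exactly. The helper below is a hypothetical sketch (the enums and function are made up for illustration, not an LLVM API) that encodes this for two clamp shapes: a `umin` of the amount with BITWIDTH-1 for arithmetic shifts, and a `select` of the shift result on `amt u< BITWIDTH` for logical shifts. It is deliberately conservative: it rejects larger constants and ignores which element widths the subtarget supports (AVX2 only has the 32-bit VPSRAVD; VPSRAVQ needs AVX-512).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for this sketch only. */
typedef enum { SHIFT_ASHR, SHIFT_LSHR, SHIFT_SHL } ShiftKind;
typedef enum { CLAMP_UMIN_AMT, CLAMP_SELECT_RESULT } ClampKind;

/* Returns true if a shift of `kind` on `bits`-wide elements, guarded by
   `clamp` with constant `c`, can use the AVX2 node with the raw amount. */
static bool can_fold_clamp(ShiftKind kind, ClampKind clamp,
                           unsigned bits, unsigned long long c) {
    switch (kind) {
    case SHIFT_ASHR:
        /* ashr x, umin(amt, bits-1)  ->  VSRAV x, amt */
        return clamp == CLAMP_UMIN_AMT && c == bits - 1u;
    case SHIFT_LSHR:
    case SHIFT_SHL:
        /* select(amt u< bits, shift(x, amt), 0)  ->  VSRLV/VSHLV x, amt */
        return clamp == CLAMP_SELECT_RESULT && c == bits;
    }
    return false;
}
```

Note that the shapes are not interchangeable: `lshr x, umin(amt, 31)` yields `x >> 31` for large amounts rather than zero, so it must not become VSRLV.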

e.g.

```ll
define <4 x i32> @ashr(<4 x i32> %sh, <4 x i32> %amt) {
  %elt.min.i = tail call <4 x i32> @llvm.umin.v4i32(<4 x i32> %amt, <4 x i32> <i32 31, i32 31, i32 31, i32 31>)
  %shr = ashr <4 x i32> %sh, %elt.min.i
  ret <4 x i32> %shr
}

declare <4 x i32> @llvm.umin.v4i32(<4 x i32>, <4 x i32>)
```

currently lowers to:

```asm
ashr(int vector[4], unsigned int vector[4]):
        vpbroadcastd    xmm2, dword ptr [rip + .LCPI0_0] # xmm2 = [31,31,31,31]
        vpminud xmm1, xmm1, xmm2
        vpsravd xmm0, xmm0, xmm1
        ret
```

when it could be just:

```asm
ashr(int vector[4], unsigned int vector[4]):
        vpsravd xmm0, xmm0, xmm1
        ret
```
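
As a sanity check of why the `vpminud` is redundant, the sketch below models one lane of each instruction in C (hypothetical helper names, based on the documented instruction behaviour rather than LLVM code) and compares the two sequences on a sample of values, including out of range amounts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Portable arithmetic shift right for 0 <= n <= 31. */
static int32_t sra_lane(int32_t x, uint32_t n) {
    return x < 0 ? ~(~x >> n) : (x >> n);
}

/* One vpsravd lane: unsigned amounts above 31 behave like 31. */
static int32_t vpsravd_lane(int32_t x, uint32_t amt) {
    return sra_lane(x, amt > 31 ? 31 : amt);
}

/* One vpminud lane: unsigned minimum. */
static uint32_t vpminud_lane(uint32_t a, uint32_t b) {
    return a < b ? a : b;
}

/* Returns true if vpminud(amt, 31) followed by vpsravd gives the same lane
   result as vpsravd alone for every sampled value/amount pair. */
static bool ashr_fold_holds(void) {
    static const int32_t xs[] = {0, 1, -1, 42, -42, INT32_MAX, INT32_MIN};
    static const uint32_t amts[] = {0, 1, 30, 31, 32, 33, 255,
                                    0x80000000u, UINT32_MAX};
    for (size_t i = 0; i < sizeof xs / sizeof xs[0]; i++) {
        for (size_t j = 0; j < sizeof amts / sizeof amts[0]; j++) {
            int32_t clamped = vpsravd_lane(xs[i], vpminud_lane(amts[j], 31));
            int32_t direct = vpsravd_lane(xs[i], amts[j]);
            if (clamped != direct)
                return false;
        }
    }
    return true;
}
```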

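Logical shifts are trickier, since the clamp appears as a select against zero (on either the shift result or the shifted value) rather than as a umin of the amount, but they are also foldable: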

```ll
define <4 x i32> @lshr(<4 x i32> %sh, <4 x i32> %amt) {
entry:
  %cmp.i = icmp ult <4 x i32> %amt, <i32 32, i32 32, i32 32, i32 32>
  %shr = lshr <4 x i32> %sh, %amt
  %0 = select <4 x i1> %cmp.i, <4 x i32> %shr, <4 x i32> zeroinitializer
  ret <4 x i32> %0
}

define <4 x i32> @lshr2(<4 x i32> %sh, <4 x i32> %amt) {
entry:
  %cmp.i = icmp ult <4 x i32> %amt, <i32 32, i32 32, i32 32, i32 32>
  %0 = select <4 x i1> %cmp.i, <4 x i32> %sh, <4 x i32> zeroinitializer
  %shr = lshr <4 x i32> %0, %amt
  ret <4 x i32> %shr
}
```
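
A similar lane-level sketch for the logical cases (again hypothetical helpers modelling VPSRLVD, not LLVM code) compares both IR patterns with `vpsrlvd`. For `@lshr2`, only in-range amounts are compared: when `amt u>= 32` the IR `lshr` is itself poison, so returning `vpsrlvd`'s zero is a valid refinement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One vpsrlvd lane: unsigned amounts above 31 produce zero. */
static uint32_t vpsrlvd_lane(uint32_t x, uint32_t amt) {
    return amt > 31 ? 0 : x >> amt;
}

/* One lane of @lshr: select(amt u< 32, lshr(sh, amt), 0). The IR lshr is
   poison for amt >= 32 but the select discards it, so the shift is only
   evaluated in range (a C shift by >= 32 would be undefined behaviour). */
static uint32_t ir_lshr_lane(uint32_t sh, uint32_t amt) {
    return amt < 32 ? sh >> amt : 0;
}

/* One lane of @lshr2 for in-range amounts only (caller guarantees amt < 32):
   select(amt u< 32, sh, 0) followed by lshr. */
static uint32_t ir_lshr2_lane_in_range(uint32_t sh, uint32_t amt) {
    uint32_t masked = amt < 32 ? sh : 0;
    return masked >> amt;
}

/* Returns true if both IR patterns agree with vpsrlvd on the samples. */
static bool lshr_folds_hold(void) {
    static const uint32_t xs[] = {0, 1, 0x80000000u, 0xDEADBEEFu, UINT32_MAX};
    static const uint32_t amts[] = {0, 1, 16, 31, 32, 33, 255, UINT32_MAX};
    for (size_t i = 0; i < sizeof xs / sizeof xs[0]; i++) {
        for (size_t j = 0; j < sizeof amts / sizeof amts[0]; j++) {
            uint32_t x = xs[i], a = amts[j];
            if (ir_lshr_lane(x, a) != vpsrlvd_lane(x, a))
                return false;
            if (a < 32 && ir_lshr2_lane_in_range(x, a) != vpsrlvd_lane(x, a))
                return false;
        }
    }
    return true;
}
```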

</pre>
