<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/94606">94606</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            AMDGPU lowering of abs is bad for i16 vectors with more than 2 elements
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            good first issue,
            backend:AMDGPU,
            missed-optimization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          arsenm
      </td>
    </tr>
</table>

<pre>
    We currently get a simple 2-instruction expansion in the v2i16 case, but larger vectors are scalarized instead of being split into v2i16 pieces, which produces bad code.

```
; RUN: llc -march=amdgcn -mcpu=gfx900 < %s

; v_pk_sub_i16 v1, 0, v0
; v_pk_max_i16 v0, v0, v1
define <2 x i16> @v_abs_v2i16(<2 x i16> %arg) {
  %res = call <2 x i16> @llvm.abs.v2i16(<2 x i16> %arg, i1 false)
  ret <2 x i16> %res
}

; This should decompose into 2 x v2i16 operations (4 instructions total), but we get this long sequence instead:

;       v_lshrrev_b32_e32 v2, 16, v1
;       v_sub_u16_e32 v3, 0, v2
;       v_max_i16_e32 v2, v2, v3
;       v_lshrrev_b32_e32 v3, 16, v0
;       v_sub_u16_e32 v4, 0, v3
;       v_max_i16_e32 v3, v3, v4
;       v_sub_u16_e32 v4, 0, v1
;       v_max_i16_e32 v1, v1, v4
;       v_sub_u16_e32 v4, 0, v0
;       v_max_i16_e32 v0, v0, v4
;       s_mov_b32 s4, 0x5040100
;       v_perm_b32 v0, v3, v0, s4
;       v_perm_b32 v1, v2, v1, s4
define <4 x i16> @v_abs_v4i16(<4 x i16> %arg) {
  %res = call <4 x i16> @llvm.abs.v4i16(<4 x i16> %arg, i1 false)
  ret <4 x i16> %res
}
```
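
For illustration, the decomposition described above can be written out by hand in IR. This is only a sketch of the intended split, not the backend's actual lowering: the function name `@v_abs_v4i16_manual_split` is made up for this example, and whether llc selects the packed 4-instruction sequence for it has not been checked here.

```llvm
; Hypothetical hand-written equivalent of @v_abs_v4i16 that splits the
; operation into two v2i16 halves. The function name is an assumption
; made for this example only.
define <4 x i16> @v_abs_v4i16_manual_split(<4 x i16> %arg) {
  ; Extract the low and high <2 x i16> halves of the input.
  %lo = shufflevector <4 x i16> %arg, <4 x i16> poison, <2 x i32> <i32 0, i32 1>
  %hi = shufflevector <4 x i16> %arg, <4 x i16> poison, <2 x i32> <i32 2, i32 3>

  ; Take abs of each half with the v2i16 intrinsic, which already gets the
  ; v_pk_sub_i16 + v_pk_max_i16 expansion shown above.
  %abs.lo = call <2 x i16> @llvm.abs.v2i16(<2 x i16> %lo, i1 false)
  %abs.hi = call <2 x i16> @llvm.abs.v2i16(<2 x i16> %hi, i1 false)

  ; Concatenate the two halves back into a <4 x i16> result.
  %res = shufflevector <2 x i16> %abs.lo, <2 x i16> %abs.hi, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x i16> %res
}

declare <2 x i16> @llvm.abs.v2i16(<2 x i16>, i1)
```

If the v4i16 case were split this way during legalization, each half would presumably reuse the existing v2i16 expansion, giving the 4 instructions mentioned above.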

</pre>