<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/118413>118413</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64] Suboptimal abs-diff codegen
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:AArch64,
            missed-optimization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Kmeakin
      </td>
    </tr>
</table>

<pre>
https://godbolt.org/z/Wh9sE4754
https://alive2.llvm.org/ce/z/N6uVzT

In the 32-bit and 64-bit cases, `tgt` is obviously better than `src`, since it is one instruction shorter:
```asm
src_u32:
        sub     w8, w1, w0
        subs    w9, w0, w1
        csel    w0, w9, w8, hi
        ret

tgt_u32:
        subs    w8, w0, w1
        cneg    w0, w8, lo
        ret
```
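
For readers without the Godbolt link open, the two 32-bit forms can be sketched in C roughly as follows. This is a hypothetical reconstruction inferred from the assembly above, not the exact source from the link: the function bodies and the `check_u32` helper are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of `src`: both differences are computed and one is selected
   (matches the sub + subs + csel sequence above). */
uint32_t src_u32(uint32_t a, uint32_t b) {
    return a > b ? a - b : b - a;
}

/* Assumed shape of `tgt`: one difference, negated when a < b
   (matches the subs + cneg sequence above). */
uint32_t tgt_u32(uint32_t a, uint32_t b) {
    uint32_t d = a - b;            /* wraps modulo 2^32 when a < b */
    return a < b ? 0u - d : d;     /* negating the wrapped value yields b - a */
}

/* Hypothetical helper: asserts that both forms agree for one input pair. */
void check_u32(uint32_t a, uint32_t b) {
    assert(src_u32(a, b) == tgt_u32(a, b));
}
```

Both forms rely on unsigned wraparound, so `0u - (a - b)` equals `b - a` modulo 2^32; the Alive2 link above is the authoritative equivalence check.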

In the 8-bit and 16-bit cases, `src` and `tgt` have the same number of instructions, but `tgt` has more ILP than `src`, because its `sub` does not depend on the result of the `and`:
```asm
src_u8:
        and     w8, w0, #0xff
        sub     w8, w8, w1, uxtb
        cmp     w8, #0
        cneg    w0, w8, mi
        ret

tgt_u8:
        and     w8, w0, #0xff
        sub     w9, w0, w1
        cmp     w8, w1, uxtb
        cneg    w0, w9, ls
        ret
```

I suspect the code generated for `tgt` in the 128-bit cases is not optimal either.
</pre>