<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/118413">118413</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] Suboptimal abs-diff codegen
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:AArch64,
missed-optimization
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Kmeakin
</td>
</tr>
</table>
<pre>
https://godbolt.org/z/Wh9sE4754
https://alive2.llvm.org/ce/z/N6uVzT
In the 32-bit and 64-bit cases, `tgt` is obviously better than `src`, since it is one instruction shorter:
```asm
src_u32:
sub w8, w1, w0
subs w9, w0, w1
csel w0, w9, w8, hi
ret
tgt_u32:
subs w8, w0, w1
cneg w0, w8, lo
ret
```
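For reference, here is a minimal C sketch of what the two versions compute, inferred from the asm above (the actual reproducer is behind the godbolt/Alive2 links, so these bodies are a reconstruction, not the original source):
```c
#include <stdint.h>

// src form: compute both differences and select the non-negative one
// (maps to the sub + subs + csel sequence above).
uint32_t src_u32(uint32_t a, uint32_t b) {
    return a > b ? a - b : b - a;
}

// tgt form: compute one difference and conditionally negate it
// (maps to subs + cneg), one instruction shorter.
uint32_t tgt_u32(uint32_t a, uint32_t b) {
    uint32_t d = a - b; // unsigned wraparound: -(a - b) == b - a (mod 2^32)
    return a < b ? -d : d;
}
```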
In the 8-bit and 16-bit cases, `src` and `tgt` have the same number of instructions, but `tgt` exposes more ILP: in `src_u8` all four instructions form a single dependency chain, while in `tgt_u8` the `sub` is independent of the `and`/`cmp` chain, shortening the critical path from 4 instructions to 3:
```asm
src_u8:
and w8, w0, #0xff
sub w8, w8, w1, uxtb
cmp w8, #0
cneg w0, w8, mi
ret
tgt_u8:
and w8, w0, #0xff
sub w9, w0, w1
cmp w8, w1, uxtb
cneg w0, w9, ls
ret
```
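The same sketch at 8-bit width (again a reconstruction from the asm); the extra `and`/`uxtb` come from zero-extending the narrow arguments, whose upper register bits are unspecified under AAPCS64:
```c
#include <stdint.h>

// 8-bit variant of the same pattern; the and/uxtb instructions in the asm
// re-narrow the i8 arguments before the compare/subtract.
uint8_t src_u8(uint8_t a, uint8_t b) {
    return a > b ? a - b : b - a;
}

uint8_t tgt_u8(uint8_t a, uint8_t b) {
    uint8_t d = (uint8_t)(a - b); // wraps mod 2^8
    return a < b ? (uint8_t)-d : d;
}
```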
I suspect the code generated for `tgt` in the 128-bit cases is not optimal either.
</pre>