<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/119606>119606</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64] Eliminate `cmp` by reassociating `add` and `sub`
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:AArch64,
            missed-optimization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Kmeakin
      </td>
    </tr>
</table>

<pre>
This Rust function:

https://godbolt.org/z/5EKvWeEYb
```rust
#[no_mangle]
pub fn last(xs:&[u8])->Option<&u8>{
    xs.last()
}
```

produces this assembly:
```asm
last:
        add     x8, x1, x0
        cmp     x1, #0
        sub     x8, x8, #1
        csel    x0, xzr, x8, eq
        ret
```

The equivalent C produces the same assembly:
https://godbolt.org/z/c5dMbv63K
```c
#include <stddef.h>
#include <stdint.h>

uint8_t* last(uint8_t* x0, size_t x1) {
    if (x1 == 0) {
        return NULL;
    } else {
        return x0 + x1 - 1;
    }
}
```

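By reassociating `sub` and `add`, we can save an instruction: the subtraction becomes a `subs`, and the `csel` reuses the flags it sets (the `lo` condition holds when `x1 - 1` borrows, i.e. when `x1` is 0) instead of needing a separate `cmp`: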
```asm
tgt: // @tgt
        subs    x8, x1, #1
        add     x8, x0, x8
        csel    x0, xzr, x8, lo
        ret
```
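
The reassociation relies on `x0 + x1 - 1` and `x0 + (x1 - 1)` agreeing under wrapping arithmetic, and on "`x1 - 1` borrows" being the same condition as `x1 == 0`. A minimal sketch of that equivalence, using plain integers instead of pointers to avoid pointer-arithmetic UB (the names `last_src`, `last_tgt`, and `forms_agree` are hypothetical and exist only for this illustration):
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Original form: compare the length against zero, then add and subtract. */
static uintptr_t last_src(uintptr_t base, size_t len) {
    return len == 0 ? 0 : base + len - 1;
}

/* Reassociated form: compute len - 1 first and reuse its borrow. */
static uintptr_t last_tgt(uintptr_t base, size_t len) {
    size_t diff = len - 1;   /* wraps to SIZE_MAX when len == 0 */
    int borrow = len < 1;    /* the condition that `subs` + `lo` tests */
    return borrow ? 0 : base + diff;
}

/* Returns nonzero when both forms produce the same result. */
static int forms_agree(uintptr_t base, size_t len) {
    return last_src(base, len) == last_tgt(base, len);
}
```

Calling `forms_agree` for `len` values of 0, 1, and `SIZE_MAX` covers the borrow case and both ends of the non-borrow range.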

I could not convince clang to produce this assembly, even using `__builtin_sub_overflow`:
https://godbolt.org/z/6vPWjx1zv
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

uint8_t* tgt(uint8_t* x0, size_t x1) {
    size_t sum;
    bool overflow = __builtin_sub_overflow(x1, 1, &sum);

    if (overflow) {
        return NULL;
    } else {
        return x0 + sum;
    }
}
```

I could only produce this assembly by writing the LLVM IR manually:
https://godbolt.org/z/hvjEEsG9r
```LLVM
define noundef i64 @tgt(i64 noundef %0, i64 noundef %1) {
  %4 = tail call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %1, i64 1)
  %overflow = extractvalue { i64, i1 } %4, 1
  %diff = extractvalue { i64, i1 } %4, 0
  %sum = add i64 %0, %diff
  %ret = select i1 %overflow, i64 0, i64 %sum
  ret i64 %ret
}
```
</pre>