[llvm] [AArch64][SVE] Fold ADD+CNTB to INCB and DECB (PR #118280)

Mon Dec 9 01:53:03 PST 2024

sjoerdmeijer wrote:

If always emitting incb gives the same or better performance, then we're fine.

As you mentioned, there appear to be a couple of regressions, as shown in one of the test cases. Instead of:

    addvl x9, x8, #1

we now get this, which at first sight doesn't look like an improvement:

    mov x9, x8
    incb x9

However, doesn't this sequence have a latency of 1, because the MOV is a zero-latency move (on the V2)? If so, I think this is actually an improvement too. Is that right?
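A rough Python sketch of the two points above: why `mov x9, x8; incb x9` computes the same value as `addvl x9, x8, #1`, and where "a latency of 1" comes from. It models the architectural semantics. ADDVL adds imm times the SVE vector length in bytes. INCB with its defaults (pattern ALL, MUL #1) adds the number of byte elements in a vector, which is the same amount. The latency table is an assumption taken from the question in the comment (MOV eliminated at rename, INCB at 1 cycle). It is not measured Neoverse V2 data, and the ADDVL latency is deliberately left unmodelled. All function names are hypothetical.

```python
# Hypothetical model of the two AArch64 sequences discussed above.
MASK64 = (1 << 64) - 1  # X registers are 64 bits wide, so arithmetic wraps


def addvl(xn: int, imm: int, vl_bytes: int) -> int:
    """ADDVL Xd, Xn, #imm: Xd = Xn + imm * (vector length in bytes)."""
    return (xn + imm * vl_bytes) & MASK64


def mov_incb(xn: int, vl_bytes: int) -> int:
    """MOV Xd, Xn ; INCB Xd (pattern ALL, MUL #1): Xd = Xn + vector length in bytes."""
    xd = xn                        # mov x9, x8
    xd = (xd + vl_bytes) & MASK64  # incb x9: add the number of byte elements
    return xd


# Sweep every multiple of 128 bits from 128 to 2048 bits (16..256 bytes).
SVE_VL_BYTES = range(16, 257, 16)


def equivalent(xn: int) -> bool:
    """True if both sequences agree for every vector length in the sweep."""
    return all(addvl(xn, 1, vl) == mov_incb(xn, vl) for vl in SVE_VL_BYTES)


# ASSUMED latencies in cycles, following the question in the comment:
# MOV handled at rename (zero latency) and INCB taking 1 cycle.
ASSUMED_LATENCY = {"mov": 0, "incb": 1}


def chain_latency(instrs: list[str]) -> int:
    """Latency of a chain in which each instruction depends on the previous one."""
    return sum(ASSUMED_LATENCY[name] for name in instrs)


print(equivalent(0x1000))
print(chain_latency(["mov", "incb"]))
```

Under those assumptions, the dependent chain costs 0 + 1 = 1 cycle. Whether that beats the single `addvl` depends on the real ADDVL latency on the target, which this sketch does not claim.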

https://github.com/llvm/llvm-project/pull/118280