[llvm] [AArch64][SVE] Fold ADD+CNTB to INCB and DECB (PR #118280)

Mon Dec 9 06:15:34 PST 2024

sjoerdmeijer wrote:

> Sorry, that was poor phrasing on my part. What I meant is that even though the MOVs aren't always "zero-latency", the sequences with MOV+INCB {1,2,4} (the "fast" INCBs) are still at least not worse than ADDVL from the viewpoint of latency. For other forms of INCB, when the MOV isn't zero latency, the MOV+INCB will be worse.

OK, got it, agreed. So if we restrict this to INCB {1,2,4}, we always get the same or better performance.
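
To make the restriction concrete, here is a minimal sketch of the check being discussed. It is not code from the patch: `shouldFoldToIncb` is a hypothetical helper name, and it only models the multiplier test for the "fast" INCB forms, not the DECB side or the actual ISel patterns.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the PR): decide whether ADD+CNTB with the
// given multiplier should be folded to INCB, following the restriction
// agreed above that only the "fast" forms {1,2,4} are never worse than ADDVL.
inline bool shouldFoldToIncb(int64_t Mul) {
  // INCB encodes its multiplier as "mul #imm" with imm in the range 1-16.
  assert(Mul >= 1 && Mul <= 16 && "multiplier out of range for INCB");
  return Mul == 1 || Mul == 2 || Mul == 4;
}
```

With this shape, a multiplier such as 3 or 8 would keep the existing ADDVL lowering, since for those forms MOV+INCB can be worse when the MOV isn't zero latency.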

https://github.com/llvm/llvm-project/pull/118280