[llvm] [AArch64][SVE] Fold ADD+CNTB to INCB and DECB (PR #118280)

Mon Dec 9 05:59:47 PST 2024

sjoerdmeijer wrote:

> Sure, here's the latency of using the corresponding sequences (normalised to the latency of a simple ADD):
> 
> ```
> ADD: 1
> INCB #1: 1
> INCB #16: 2
> MOV+INCB #1: 1.6
> MOV+INCB #16: 2.6
> ADDVL #1: 2
> ADDVL #16: 2
> ```
> 
> The Neoverse V2 SWOG is a bit vague about the conditions under which MOV Xd, Xn "may not be executed with zero latency", which the micro-benchmark seems to hit (hence the 60% increased latency for these patterns). Nevertheless, even for these cases the MOV patterns with fast INCB still seem at least not worse than ADDVL from the viewpoint of latency.

I don't follow the conclusion in the last sentence: going by these numbers, `MOV+INCB #16` (2.6) is worse than `ADDVL #16` (2), isn't it?
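
To make the comparison concrete, here is a quick sketch that just does arithmetic on the quoted table. The pairing of each INCB-based sequence with the ADDVL of the same multiplier is my assumption about what the last sentence compares, and `worse_than_addvl` is a made-up helper name, not anything from the patch.

```python
# Latencies copied from the quoted table, normalised to a simple ADD.
latency = {
    "ADD": 1.0,
    "INCB #1": 1.0,
    "INCB #16": 2.0,
    "MOV+INCB #1": 1.6,
    "MOV+INCB #16": 2.6,
    "ADDVL #1": 2.0,
    "ADDVL #16": 2.0,
}


def worse_than_addvl(table):
    """List INCB-based sequences slower than the ADDVL with the same multiplier.

    Assumption: "#1" sequences are compared against "ADDVL #1" and
    "#16" sequences against "ADDVL #16".
    """
    worse = []
    for name, value in table.items():
        if "INCB" not in name:
            continue
        multiplier = name.split("#")[1]
        if value > table[f"ADDVL #{multiplier}"]:
            worse.append(name)
    return worse


print(worse_than_addvl(latency))  # ['MOV+INCB #16']
```

So only the `MOV+INCB #16` pattern exceeds its ADDVL counterpart; the `#1` patterns and plain `INCB #16` are equal or better.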

https://github.com/llvm/llvm-project/pull/118280