[llvm] [AArch64][SVE] Fold ADD+CNTB to INCB and DECB (PR #118280)
Sjoerd Meijer via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 9 05:59:47 PST 2024
sjoerdmeijer wrote:
> Sure, here's the latency of using the corresponding sequences (normalised to the latency of a simple ADD):
>
> ```
> ADD: 1
> INCB #1: 1
> INCB #16: 2
> MOV+INCB #1: 1.6
> MOV+INCB #16: 2.6
> ADDVL #1: 2
> ADDVL #16: 2
> ```
>
> The Neoverse V2 SWOG is a bit vague about the conditions under which MOV Xd, Xn "may not be executed with zero latency", which the micro-benchmark seems to hit (hence the 60% increased latency for these patterns). Nevertheless, even for these cases the MOV patterns with fast INCB still seem at least not worse than ADDVL from the viewpoint of latency.
I don't follow the conclusion in the last sentence: going by the numbers above, `MOV+INCB #16` (2.6) is worse than `ADDVL #16` (2), isn't it?
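
To make the comparison concrete, here is a small sketch that only re-tabulates the normalised latencies quoted above and compares each `MOV+INCB` variant against the matching `ADDVL`. The dictionary and the `compare` helper are purely illustrative names, not anything from the patch:

```python
# Normalised latencies copied from the quoted table (relative to a simple ADD).
latency = {
    "ADD": 1.0,
    "INCB #1": 1.0,
    "INCB #16": 2.0,
    "MOV+INCB #1": 1.6,
    "MOV+INCB #16": 2.6,
    "ADDVL #1": 2.0,
    "ADDVL #16": 2.0,
}


def compare(multiplier):
    """Compare MOV+INCB against ADDVL for one multiplier (hypothetical helper)."""
    mov_incb = latency[f"MOV+INCB #{multiplier}"]
    addvl = latency[f"ADDVL #{multiplier}"]
    if mov_incb < addvl:
        verdict = "better"
    elif mov_incb > addvl:
        verdict = "worse"
    else:
        verdict = "same"
    return mov_incb, addvl, verdict


for m in (1, 16):
    mov_incb, addvl, verdict = compare(m)
    print(f"#{m}: MOV+INCB={mov_incb} ADDVL={addvl} -> MOV+INCB is {verdict}")
```

On these figures the `#1` case favours `MOV+INCB` (1.6 vs 2), while the `#16` case does not (2.6 vs 2), which is the case I am asking about.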
https://github.com/llvm/llvm-project/pull/118280