[llvm] [AArch64][SVE] Fold ADD+CNTB to INCB and DECB (PR #118280)
Sjoerd Meijer via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 9 06:15:34 PST 2024
sjoerdmeijer wrote:
> Sorry, that was poor phrasing on my part. What I meant is that even though the MOVs aren't always "zero-latency", the sequences with MOV+INCB {1,2,4} (the "fast" INCBs) are still at least not worse than ADDVL from the viewpoint of latency. For other forms of INCB, when the MOV isn't zero latency, the MOV+INCB will be worse.
OK, got it, and agreed: if we restrict this to INCB {1,2,4}, we always get the same or better performance.
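
To illustrate the restriction, here is a minimal sketch of a multiplier check. The helper name isFastIncbMultiplier is hypothetical and not taken from the PR; only the set {1,2,4} comes from the discussion above, and the actual patch may encode the condition differently.

```cpp
#include <cstdint>

// Hypothetical helper (an assumption for illustration, not code from the PR):
// returns true when the multiplier selects one of the "fast" INCB forms
// named in this thread, i.e. INCB with a multiplier of 1, 2, or 4.
constexpr bool isFastIncbMultiplier(int64_t Mul) {
  return Mul == 1 || Mul == 2 || Mul == 4;
}
```

A fold guarded by a check like this would keep ADDVL for every other multiplier, which is the case where MOV+INCB can be worse when the MOV is not zero latency.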
https://github.com/llvm/llvm-project/pull/118280