[llvm] [AArch64][SVE] Fold ADD+CNTB to INCB and DECB (PR #118280)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 9 02:00:50 PST 2024
david-arm wrote:
> If we always emit incb, then it should give the same or better performance. As long as that's the case, then we're fine.
Hi @sjoerdmeijer, it's really not obvious to me that incb gives better performance for immediate values other than 1, 2 or 4. In fact, I wouldn't be surprised if it gave worse performance in some circumstances, which is why I wonder whether we should be more cautious here. incb requires the source and destination registers to be the same, and that constraint could affect both the register allocator and the scheduler. I don't think we can always consider instructions in isolation by looking only at their latency and throughput.
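To make the register constraint concrete, here is a toy sketch of the two lowerings for `dst = src + mul * VL_in_bytes`. It is not LLVM's code: the function name `lower_add_cntb`, the scratch register `x8`, and the assumption that an untied source costs exactly one `mov` are all hypothetical, and the sketch says nothing about actual latency or throughput on any core.

```python
def lower_add_cntb(dst, src, mul, use_incb, scratch="x8"):
    """Return assembly lines computing dst = src + mul * (SVE vector length in bytes).

    Toy model only: register names are plain strings and no real
    register allocation or scheduling is performed.
    """
    if not 1 <= mul <= 16:
        raise ValueError("MUL immediate must be in the range 1..16")

    if not use_incb:
        # CNTB + ADD: the destination is independent of the source,
        # but a scratch register is needed to hold the count.
        return [
            f"cntb {scratch}, all, mul #{mul}",
            f"add {dst}, {src}, {scratch}",
        ]

    # INCB is destructive: it reads and writes the same register.
    lines = []
    if dst != src:
        # Assumption: if the allocator cannot tie dst to src
        # (for example because src is still live), it inserts a copy.
        lines.append(f"mov {dst}, {src}")
    lines.append(f"incb {dst}, all, mul #{mul}")
    return lines


if __name__ == "__main__":
    for use_incb in (False, True):
        for dst, src in (("x0", "x0"), ("x0", "x1")):
            print(f"use_incb={use_incb}, dst={dst}, src={src}")
            for line in lower_add_cntb(dst, src, 3, use_incb):
                print("   ", line)
```

When `dst` and `src` can share a register, the INCB form is a single instruction. When they cannot, the model emits a `mov` followed by `incb`, which is the same instruction count as CNTB + ADD but with a dependency chain through `dst`. Whether that is better or worse in practice depends on the surrounding code, which is the point about not judging the instruction in isolation.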
https://github.com/llvm/llvm-project/pull/118280