[llvm] [AArch64][SVE] Fold ADD+CNTB to INCB and DECB (PR #118280)

Sjoerd Meijer via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 9 02:16:21 PST 2024


sjoerdmeijer wrote:

> Hi @sjoerdmeijer, but it's really not obvious to me that incb gives better or same performance for immediate values that aren't 1, 2 or 4. In fact, I wouldn't be surprised if it gave worse performance in some circumstances which is why I wonder if we should be more cautious here?

@rj-jesus : can you micro-benchmark this?


> That's because incb requires the src and dest registers to be the same, so that could impact the register allocator and scheduler. I don't think we can always just consider instructions in isolation by looking at the latency and throughput.

Isn't an ADDVL equivalent to MOV+INCB, while using the same number of registers (and being faster)?
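
To make the register argument concrete, here is a toy Python model of the architectural semantics only. It is a sketch, not LLVM code: the helper names (`addvl`, `mov`, `incb`, `check_equivalence`) and the dictionary register file are made up for illustration, 64-bit wraparound is ignored, and it says nothing about latency or throughput, which is the part that would still need measuring.

```python
# Toy model of the scalar SVE instructions discussed above (illustrative only).
# vl_bytes is the vector length in bytes; SVE permits 16 to 256 bytes.

def addvl(regs, dst, src, imm, vl_bytes):
    """ADDVL Xd, Xn, #imm: Xd = Xn + imm * VL in bytes, imm in [-32, 31]."""
    assert -32 <= imm <= 31
    regs[dst] = regs[src] + imm * vl_bytes

def mov(regs, dst, src):
    """MOV Xd, Xn: plain register copy."""
    regs[dst] = regs[src]

def incb(regs, dn, mul, vl_bytes):
    """INCB Xdn, ALL, MUL #mul: Xdn += mul * VL in bytes, mul in [1, 16].

    Source and destination are the same register, which is the
    constraint raised in the quoted comment.
    """
    assert 1 <= mul <= 16
    regs[dn] += mul * vl_bytes

def check_equivalence(vl_bytes, mul, base=0x1000):
    """Compare ADDVL x0, x1, #mul against MOV x0, x1 + INCB x0, ALL, MUL #mul."""
    a = {"x0": 0, "x1": base}
    addvl(a, "x0", "x1", mul, vl_bytes)

    b = {"x0": 0, "x1": base}
    mov(b, "x0", "x1")
    incb(b, "x0", mul, vl_bytes)

    # Same final register state: x1 is preserved and x0 holds the sum.
    return a == b
```

In this model both sequences touch the same two registers and leave `x1` intact; when the source can be overwritten, a lone INCB needs no MOV at all.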


https://github.com/llvm/llvm-project/pull/118280

