[llvm] [LV][AArch64] Prefer Fixed over Scalable if cost-model is equal (Neoverse V2) (PR #95819)
Sjoerd Meijer via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 18 03:06:09 PDT 2024
sjoerdmeijer wrote:
This is actually quite a complicated story. It is a combination of a few factors: micro-architectural reasons and (SVE) codegen reasons. To introduce the problem, we have a number of examples similar to this:
for (int i = 0; i < 32000/2; i++) {
  a[i+k] = a[i] + b[i];
}
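For anyone who wants to compile the loop above and compare the two code generators, a self-contained version might look like the sketch below. Several details are assumptions and not taken from the original test case: the function name `add_shifted`, the array length `LEN`, and the `float` element type (inferred only from the `fadd v31.4s` in the NEON listing).

```c
#include <assert.h>

/* Assumed array length: only the trip count 32000/2 appears in the loop. */
#define LEN 32000

/* Hypothetical wrapper around the loop. The float element type is inferred
   from the .4s arrangement in the NEON listing; k is a runtime offset. */
void add_shifted(float *a, const float *b, int k)
{
    /* Keep a[i + k] inside a LEN-element array. */
    assert(k >= 0 && k <= LEN / 2);

    for (int i = 0; i < LEN / 2; i++) {
        a[i + k] = a[i] + b[i];
    }
}
```

Compiling this with something like `-O3 -mcpu=neoverse-v2` should show which vectorization strategy each compiler picks, although the exact listings will depend on the compiler version and the surrounding code.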
This is GCC's output, and LLVM's output with this patch:
.L3:
ldr q31, [x20, x0]
ldr q30, [x19, x0]
fadd v31.4s, v31.4s, v30.4s
str q31, [x21, x0]
add x0, x0, 16
cmp x0, x28
bne .L3
LLVM's output without this patch is something like this:
.LBB0_3:
add x9, x19, x8, lsl #2
add x10, x20, x8, lsl #2
ld1w { z0.s }, p0/z, [x19, x8, lsl #2]
ld1w { z2.s }, p0/z, [x20, x8, lsl #2]
add x8, x8, x21
ld1w { z1.s }, p0/z, [x9, x28, lsl #2]
ld1w { z3.s }, p0/z, [x10, x28, lsl #2]
add x10, x9, x26
cmp x8, x22
fadd z0.s, z2.s, z0.s
fadd z1.s, z3.s, z1.s
st1w { z0.s }, p0, [x9, x23, lsl #2]
st1w { z1.s }, p0, [x10, x28, lsl #2]
b.ne .LBB0_3
There is nothing fundamentally wrong with LLVM's codegen, but it performs a lot worse.
One of the micro-architectural reasons is documented in section "4.1 Dispatch constraints" of the Neoverse V2 Software Optimization Guide (SWOG):
> The dispatch stage can process up to 8 MOPs per cycle and dispatch up to 16 μOPs per cycle,
The smaller kernels fit in these dispatch constraints, the bigger ones don't, resulting in significant performance differences.
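As a rough illustration of "fits" versus "does not fit", the sketch below counts the instructions in the two loop bodies listed earlier (7 for the NEON loop, 14 for the SVE loop) and divides by the quoted dispatch width of 8 MOPs per cycle. It assumes exactly one MOP per instruction, with no fusion and no cracking. That is a simplification and not something the SWOG states, so the result is a lower bound on dispatch cycles and not a performance prediction. The function and constant names are made up for this example.

```c
#include <assert.h>

/* Dispatch width quoted from the SWOG: up to 8 MOPs per cycle. */
enum { DISPATCH_MOPS_PER_CYCLE = 8 };

/* Instruction counts read off the two listings in this message. */
enum {
    NEON_LOOP_INSTRUCTIONS = 7,  /* .L3: 2 ldr, fadd, str, add, cmp, bne */
    SVE_LOOP_INSTRUCTIONS  = 14  /* .LBB0_3: 4 add, 4 ld1w, 2 fadd, 2 st1w, cmp, b.ne */
};

/* Lower bound on cycles needed to dispatch one loop iteration.
   ASSUMPTION: one MOP per instruction (no fusion, no cracking). */
int min_dispatch_cycles(int instructions_in_loop)
{
    assert(instructions_in_loop > 0);
    /* Ceiling division by the dispatch width. */
    return (instructions_in_loop + DISPATCH_MOPS_PER_CYCLE - 1)
           / DISPATCH_MOPS_PER_CYCLE;
}
```

Under that assumption the NEON loop body can be dispatched in a single cycle, while the SVE loop body needs at least two. If some SVE instructions expand to more than one MOP, the real gap would be larger than this count suggests.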
Most of the performance can be clawed back by interleaving more. But then there's clearly a code quality issue: the amount of code necessary to get on par with the NEON kernel would be disproportionate.
Two more subjective arguments:
- there is no need to go predicated for this kind of example,
- GCC prefers this codegen strategy for the same reasons.
Another factor is the slightly more complicated SVE addressing modes, which also result in more MOPs.
We are investigating other micro-architectural issues, but I cannot comment on this yet.
https://github.com/llvm/llvm-project/pull/95819