[all-commits] [llvm/llvm-project] d2bcc1: [AArch64][SVE] Use FeatureUseFixedOverScalableIfE...
Nashe Mncube via All-commits
all-commits at lists.llvm.org
Fri Apr 4 06:13:07 PDT 2025
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: d2bcc11067e682a0753c1068e378d66d59edff73
https://github.com/llvm/llvm-project/commit/d2bcc11067e682a0753c1068e378d66d59edff73
Author: Nashe Mncube <nashe.mncube at arm.com>
Date: 2025-04-04 (Fri, 04 Apr 2025)
Changed paths:
M llvm/lib/Target/AArch64/AArch64Processors.td
A llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll
Log Message:
-----------
[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A510 and A520 (#132246)
Inefficient SVE codegen occurs on at least two in-order cores,
those being Cortex-A510 and Cortex-A520. For example a simple vector
add
```
void foo(float a, float b, float dst, unsigned n) {
for (unsigned i = 0; i < n; ++i)
dst[i] = a[i] + b[i];
}
```
Vectorizes the inner loop into the following interleaved sequence
of instructions.
```
add x12, x1, x10
ld1b { z0.b }, p0/z, [x1, x10]
add x13, x2, x10
ld1b { z1.b }, p0/z, [x2, x10]
ldr z2, [x12, #1, mul vl]
ldr z3, [x13, #1, mul vl]
dech x11
add x12, x0, x10
fadd z0.s, z1.s, z0.s
fadd z1.s, z3.s, z2.s
st1b { z0.b }, p0, [x0, x10]
addvl x10, x10, #2
str z1, [x12, #1, mul vl]
```
By adjusting the target features to prefer fixed over scalable if the
cost is equal we get the following vectorized loop.
```
ldp q0, q3, [x11, #-16]
subs x13, x13, #8
ldp q1, q2, [x10, #-16]
add x10, x10, #32
add x11, x11, #32
fadd v0.4s, v1.4s, v0.4s
fadd v1.4s, v2.4s, v3.4s
stp q0, q1, [x12, #-16]
add x12, x12, #32
```
Which is more efficient.
To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications
More information about the All-commits
mailing list