[llvm] [LAA] Add initial support for non-power-of-2 store-load forwarding distance (PR #137873)
Florian Hahn via llvm-commits
llvm-commits at lists.llvm.org
Tue May 13 12:48:33 PDT 2025
================
@@ -1769,24 +1770,61 @@ bool MemoryDepChecker::couldPreventStoreLoadForward(uint64_t Distance,
break;
}
}
+ // RISCV VLA supports non-power-2 vector factor. So, we iterate in a
+ // backward order to find largest VF, which allows aligned stores-loads or
+ // the number of iterations between conflicting memory addresses is not less
+ // than 8 (NumItersForStoreLoadThroughMemory).
+ if (AllowNonPow2Deps) {
+ MaxVFWithoutSLForwardIssuesNonPowerOf2 =
+ std::min(8 * VectorizerParams::MaxVectorWidth / TypeByteSize,
+ MaxNonPowerOf2StoreLoadForwardSafeDistanceInBits);
+
+ for (uint64_t VF = MaxVFWithoutSLForwardIssuesNonPowerOf2;
+ VF > MaxVFWithoutSLForwardIssuesPowerOf2; VF -= TypeByteSize) {
+ if (Distance % VF == 0 ||
+ Distance / VF >= NumItersForStoreLoadThroughMemory) {
+ uint64_t GCD =
+ isSafeForAnyStoreLoadForwardDistances()
+ ? VF
+ : std::gcd(MaxNonPowerOf2StoreLoadForwardSafeDistanceInBits,
+ VF);
+ MaxVFWithoutSLForwardIssuesNonPowerOf2 = GCD;
+ break;
+ }
+ }
+ }
- if (MaxVFWithoutSLForwardIssuesPowerOf2 < 2 * TypeByteSize) {
+ if (MaxVFWithoutSLForwardIssuesPowerOf2 < 2 * TypeByteSize &&
+ MaxVFWithoutSLForwardIssuesNonPowerOf2 < 2 * TypeByteSize) {
LLVM_DEBUG(
dbgs() << "LAA: Distance " << Distance
<< " that could cause a store-load forwarding conflict\n");
return true;
}
+ // Handle non-power-2 store-load forwarding distance, power-of-2 distance can
+ // be calculated.
+ if (AllowNonPow2Deps && CommonStride &&
+ MaxVFWithoutSLForwardIssuesNonPowerOf2 <
+ MaxNonPowerOf2StoreLoadForwardSafeDistanceInBits &&
+ MaxVFWithoutSLForwardIssuesNonPowerOf2 !=
+ 8 * VectorizerParams::MaxVectorWidth / TypeByteSize) {
+ uint64_t MaxVF = MaxVFWithoutSLForwardIssuesNonPowerOf2 / CommonStride;
+ uint64_t MaxVFInBits = MaxVF * TypeByteSize * 8;
+ MaxNonPowerOf2StoreLoadForwardSafeDistanceInBits =
----------------
fhahn wrote:
Do we need to compute this separately? Would it instead be possible to always compute the non-power-of-2 version and then have users convert it to the closest power-of-2 if that's what they need?
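
To illustrate the suggestion, here is a minimal sketch of what such a conversion on the user side might look like, assuming "closest power-of-2" means rounding down so the result never exceeds the safe distance. The helper name `floorToPowerOf2` is hypothetical and not part of the PR or of LAA; inside LLVM, an existing utility could likely be used instead.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the PR): round a maximum safe
// store-load forwarding distance down to the largest power of 2 that
// does not exceed it. Returns 0 for an input of 0.
uint64_t floorToPowerOf2(uint64_t MaxSafeDistance) {
  if (MaxSafeDistance == 0)
    return 0;
  uint64_t Result = 1;
  // Comparing against MaxSafeDistance / 2 avoids overflow when doubling.
  while (Result <= MaxSafeDistance / 2)
    Result *= 2;
  return Result;
}
```

With something like this, LAA would only need to track the single non-power-of-2 value, and a user that requires a power of 2 would call the helper on it (for example, 96 would become 64).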
https://github.com/llvm/llvm-project/pull/137873