[llvm] [CostModel][AArch64] Make extractelement, with fmul user, free whenev… (PR #111479)
Sushant Gokhale via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 4 04:52:46 PST 2024
================
@@ -3194,6 +3206,149 @@ InstructionCost AArch64TTIImpl::getVectorInstrCostHelper(const Instruction *I,
// compile-time considerations.
}
+ // In case of Neon, if there exists extractelement from lane != 0 such that
+ // 1. extractelement does not necessitate a move from vector_reg -> GPR.
+ // 2. extractelement result feeds into fmul.
+ // 3. Other operand of fmul is a scalar or extractelement from lane 0 or lane
+ // equivalent to 0.
+ // then the extractelement can be merged with fmul in the backend and it
+ // incurs no cost.
+ // e.g.
+ // define double @foo(<2 x double> %a) {
+ // %1 = extractelement <2 x double> %a, i32 0
+ // %2 = extractelement <2 x double> %a, i32 1
+ // %res = fmul double %1, %2
+ // ret double %res
+ // }
+ // %2 and %res can be merged in the backend to generate fmul v0, v0, v1.d[1]
+ auto ExtractCanFuseWithFmul = [&]() {
+ // We bail out if the extract is from lane 0.
+ if (Index == 0)
+ return false;
+
+ // Check if the scalar element type of the vector operand of ExtractElement
+ // instruction is one of the allowed types.
+ auto IsAllowedScalarTy = [&](const Type *T) {
+ return T->isFloatTy() || T->isDoubleTy() ||
+ (T->isHalfTy() && ST->hasFullFP16());
+ };
+
+ // Check if the extractelement user is scalar fmul.
+ auto IsUserFMulScalarTy = [](const Value *EEUser) {
+ // Check if the user is scalar fmul.
+ const auto *BO = dyn_cast_if_present<BinaryOperator>(EEUser);
+ return BO && BO->getOpcode() == BinaryOperator::FMul &&
+ !BO->getType()->isVectorTy();
+ };
+
+ // InstCombine combines fmul with fadd/fsub. Hence, extractelement fusion
+ // with fmul does not happen.
+ auto IsFMulUserFAddFSub = [](const Value *FMul) {
----------------
sushgokh wrote:
Let's say the situation is as follows:
```
define double @foo(<2 x double> %a, double %b)
{
%1 = extractelement <2 x double> %a, i32 0
%2 = extractelement <2 x double> %a, i32 1
%3 = fmul double %1, %2
%4 = fadd double %3, %b
ret double %4
}
```
Codegen with `./llc -mtriple=aarch64 -fp-contract=fast test.ll` is:
```
foo: // @foo
.cfi_startproc
// %bb.0:
mov d2, v0.d[1]
fmadd d0, d0, d2, d1
ret
```
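So, even if the requirements for fmul and extractelement fusion are met, we need to bail out when the fmul has an fadd/fsub user: the fmul is then contracted into `fmadd`, and the extract from lane 1 still costs a separate `mov`.
Hence the patch checks whether the fmul has fadd/fsub as users.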
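
To make the bail-out rule concrete, here is a minimal, self-contained sketch of the decision in plain C++. It is a toy model, not the code in the patch: `Op`, `Node`, `fmulFeedsFAddOrFSub` and `extractIsFree` are hypothetical names that do not use the LLVM API, and the model assumes that every user of the extract must be an fmul. It also leaves out the element-type check and the condition on the other fmul operand.

```cpp
#include <vector>

// Toy stand-ins for IR opcodes; hypothetical, not the llvm::Instruction API.
enum class Op { ExtractElement, FMul, FAdd, FSub, Other };

// A value together with the values that use it.
struct Node {
  Op op;
  unsigned lane;                   // only meaningful for ExtractElement
  std::vector<const Node *> users; // values that consume this one
  explicit Node(Op o, unsigned l = 0) : op(o), lane(l) {}
};

// True if any user of the fmul is an fadd/fsub. In that case the fmul may be
// contracted into fmadd/fmsub, so the lane-indexed fmul is never formed.
bool fmulFeedsFAddOrFSub(const Node &fmul) {
  for (const Node *user : fmul.users)
    if (user->op == Op::FAdd || user->op == Op::FSub)
      return true;
  return false;
}

// Simplified model (assumption): the extract is treated as free only if it
// reads a lane other than 0, all of its users are fmul, and none of those
// fmuls feeds an fadd/fsub.
bool extractIsFree(const Node &extract) {
  if (extract.op != Op::ExtractElement || extract.lane == 0 ||
      extract.users.empty())
    return false;
  for (const Node *user : extract.users)
    if (user->op != Op::FMul || fmulFeedsFAddOrFSub(*user))
      return false;
  return true;
}
```

In this model, the `@foo` example above (extract from lane 1, then fmul, then fadd) makes `extractIsFree` return false, while the same extract feeding an fmul with no fadd/fsub user returns true.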
https://github.com/llvm/llvm-project/pull/111479