[PATCH] D89894: [AArch64] Backedge indexing
Sam Parker via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 21 09:19:16 PDT 2020
samparker created this revision.
samparker added reviewers: SjoerdMeijer, dmgreen, fhahn, efriedma, sanwou01.
Herald added subscribers: danielkiss, arphaman, hiraditya, kristof.beyls.
Herald added a project: LLVM.
samparker requested review of this revision.
Bit of a brain dump: I was seeing the same problems with addressing modes in unrolled loops, and this is closely related to what @SjoerdMeijer is currently working on in D89693 <https://reviews.llvm.org/D89693>, but I doubt I will have time to look into this further...
For the benchmark that I am looking at, the total size shrinks, but there seems to be a problem: we no longer generate the LDPs (which I presume is just a current limitation of the AArch64LoadStoreOptimizer?). A hypothetical source kernel is sketched after the diff:
< ldp q0, q2, [x2, #-16]
< ldp q1, q3, [x4, #-16]
< subs x5, x5, #8 // =8
< add x4, x4, #32 // =32
< add x2, x2, #32 // =32
< fmul v0.4s, v0.4s, v1.4s
< fmul v2.4s, v2.4s, v3.4s
< ldp q1, q3, [x3, #-16]
< fadd v0.4s, v1.4s, v0.4s
< fadd v1.4s, v3.4s, v2.4s
< stp q0, q1, [x3, #-16]
< add x3, x3, #32 // =32
---
> ldr q0, [x5, #32]!
> subs x27, x27, #8 // =8
> ldur q1, [x5, #-16]
> ldr q2, [x7, #32]!
> ldur q3, [x7, #-16]
> ldr q4, [x6, #32]!
> fmul v0.4s, v0.4s, v2.4s
> fmul v1.4s, v1.4s, v3.4s
> ldr q2, [x6, #16]
> fadd v1.4s, v4.4s, v1.4s
> fadd v0.4s, v2.4s, v0.4s
> stp q1, q0, [x6]
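For reference, a kernel of roughly this shape reproduces the codegen above. It is a reconstruction from the disassembly rather than the actual benchmark source, so the function and parameter names are illustrative:

  // Vectorised to <4 x float> and unrolled 2x, giving eight floats
  // (two q-register operations per pointer) per iteration.
  void madd(float *__restrict Dst, const float *__restrict A,
            const float *__restrict B, long N) {
    for (long I = 0; I < N; ++I)
      Dst[I] += A[I] * B[I];
  }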
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D89894
Files:
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
Index: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
===================================================================
--- llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -231,6 +231,8 @@
}
}
+ bool shouldFavorBackedgeIndex(const Loop *L) const;
+
unsigned getGISelRematGlobalCost() const {
return 2;
}
Index: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
===================================================================
--- llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -16,6 +16,7 @@
#include "llvm/CodeGen/TargetLowering.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/IntrinsicsAArch64.h"
+#include "llvm/IR/Use.h"
#include "llvm/Support/Debug.h"
#include <algorithm>
using namespace llvm;
@@ -972,6 +973,36 @@
return Considerable;
}
+bool AArch64TTIImpl::shouldFavorBackedgeIndex(const Loop *L) const {
+  // This optimisation will generally introduce base address modifying
+  // instruction(s) into the preheader and is only really useful for
+  // unrolled loops, which we don't generate when optimising for size.
+ if (L->getHeader()->getParent()->hasOptSize() ||
+ L->getNumBlocks() != 1)
+ return false;
+
+ // Find pointers with multiple uses within the loop.
+ DenseMap<Value *, unsigned> NumPointerUses;
+ for (auto &I : *L->getHeader()) {
+ if (I.getType()->isPointerTy())
+ NumPointerUses[&I] = 0;
+
+ for (auto &Use : I.operands()) {
+ if (!Use->getType()->isPointerTy())
+ continue;
+      if (NumPointerUses.count(Use))
+        NumPointerUses[Use]++;
+      else
+        NumPointerUses[Use] = 1;
+ }
+ }
+
+  return llvm::any_of(NumPointerUses,
+                      [](const detail::DenseMapPair<Value *, unsigned> &Pair) {
+                        return Pair.second > 1;
+                      });
+}
+
bool AArch64TTIImpl::useReductionIntrinsic(unsigned Opcode, Type *Ty,
TTI::ReductionFlags Flags) const {
auto *VTy = cast<VectorType>(Ty);
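For context, shouldFavorBackedgeIndex is the existing TTI hook that LoopStrengthReduce queries; when it returns true, LSR favours solutions that update the base pointer on the loop backedge, which the backend can then fold into the pre/post-indexed forms seen above (e.g. ldr q0, [x5, #32]!). The heuristic fires on unrolled loops because each base pointer then feeds more than one access per iteration; a hypothetical source-level picture of what the use counting detects (names illustrative):

  // After 2x unrolling, Dst, A and B each appear in two memory
  // accesses per iteration, so the per-pointer use count exceeds
  // one and shouldFavorBackedgeIndex returns true for this loop.
  void madd_unrolled(float *__restrict Dst, const float *__restrict A,
                     const float *__restrict B, long N) {
    for (long I = 0; I < N; I += 2) {
      Dst[I]     += A[I]     * B[I];
      Dst[I + 1] += A[I + 1] * B[I + 1];
    }
  }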