[PATCH] [AArch64] Inline memcpy() as a sequence of ldp-stp with 64-bit registers
Tim Northover
t.p.northover at gmail.com
Mon Nov 3 13:44:35 PST 2014
I'm really not sure about this one. I agree with James that hacking around with the instructions after the scheduler seems really iffy. It sounds much more like we're hitting a scheduler defect that we want to fix properly instead, unless it's a constraint that's just impossible to represent.
Perhaps some kind of forwarding from a load to a dependent store has been omitted?
I've also got some other issues with the actual implementation.
Cheers.
Tim.
================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:6615-6620
@@ -6606,12 +6614,8 @@
MachineFunction &MF) const {
- // Don't use AdvSIMD to implement 16-byte memset. It would have taken one
- // instruction to materialize the v2i64 zero and one store (with restrictive
- // addressing mode). Just do two i64 store of zero-registers.
- bool Fast;
- const Function *F = MF.getFunction();
- if (Subtarget->hasFPARMv8() && !IsMemset && Size >= 16 &&
- !F->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
- Attribute::NoImplicitFloat) &&
- (memOpAlign(SrcAlign, DstAlign, 16) ||
- (allowsMisalignedMemoryAccesses(MVT::f128, 0, 1, &Fast) && Fast)))
- return MVT::f128;
+ // In general it's optimal to use 64-bit registers with load/store pair
+ // instructions for memcpy() inlining, rather than doing the same with regular
+ // load/store instructions operating on 128-bit registers. Do not use 128-bit
+ // types.
+
+ if (Subtarget->isCyclone()) {
+ // Don't use AdvSIMD to implement 16-byte memset. It would have taken one
----------------
How general is this? We should be writing for future cores as well as existing ones, and always preferring 64-bit operations seems like it'll be more and more of an oddity in future.
It also seems like it belongs in a completely separate patch to the interleaving one.
================
Comment at: lib/Target/AArch64/AArch64LoadStoreInterleave.cpp:242-245
@@ +241,6 @@
+// Checks if a set of load and store instructions can be safely reordered.
+static bool isSafeToReorder(MachineBasicBlock &MBB,
+ const SmallVectorImpl<MachineInstr*> &Lds,
+ const SmallVectorImpl<MachineInstr*> &Sts,
+ const TargetRegisterInfo *TRI) {
+ if (Sts.empty() || Sts.size() != Lds.size())
----------------
This seems like a really fragile way to do this. It's only ever going to work on a basic block with a single memcpy operation and no other loads/stores.
http://reviews.llvm.org/D6054
More information about the llvm-commits
mailing list