[PATCH] [AArch64] Inline memcpy() as a sequence of ldp-stp with 64-bit registers

Mon Nov 3 13:44:35 PST 2014

I'm really not sure about this one. I agree with James that hacking around with the instructions after the scheduler seems really iffy. It sounds much more like we're hitting a scheduler defect that we want to fix properly instead, unless it's a constraint that's just impossible to represent.

Perhaps some kind of forwarding from a load to a dependent store has been omitted?

I've also got some other issues with the actual implementation.

Cheers.

Tim.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:6615-6620
@@ -6606,12 +6614,8 @@
                                                MachineFunction &MF) const {
-  // Don't use AdvSIMD to implement 16-byte memset. It would have taken one
-  // instruction to materialize the v2i64 zero and one store (with restrictive
-  // addressing mode). Just do two i64 store of zero-registers.
-  bool Fast;
-  const Function *F = MF.getFunction();
-  if (Subtarget->hasFPARMv8() && !IsMemset && Size >= 16 &&
-      !F->getAttributes().hasAttribute(AttributeSet::FunctionIndex,
-                                       Attribute::NoImplicitFloat) &&
-      (memOpAlign(SrcAlign, DstAlign, 16) ||
-       (allowsMisalignedMemoryAccesses(MVT::f128, 0, 1, &Fast) && Fast)))
-    return MVT::f128;
+  // In general it's optimal to use 64-bit registers with load/store pair
+  // instructions for memcpy() inlining, rather than doing the same with regular
+  // load/store instructions operating on 128-bit registers. Do not use 128-bit
+  // types.
+
+  if (Subtarget->isCyclone()) {
+    // Don't use AdvSIMD to implement 16-byte memset. It would have taken one
----------------
How general is this? We should be writing for future cores as well as existing ones, and always preferring 64-bit operations seems like it'll be more and more of an oddity in future.

It also seems like it belongs in a completely separate patch to the interleaving one.

================
Comment at: lib/Target/AArch64/AArch64LoadStoreInterleave.cpp:242-245
@@ +241,6 @@
+// Checks if a set of load and store instructions can be safely reordered.
+static bool isSafeToReorder(MachineBasicBlock &MBB,
+                            const SmallVectorImpl<MachineInstr*> &Lds,
+                            const SmallVectorImpl<MachineInstr*> &Sts,
+                            const TargetRegisterInfo *TRI) {
+  if (Sts.empty() || Sts.size() != Lds.size())
----------------
This seems like a really fragile way to do this. It's only ever going to work on a basic block with a single memcpy operation and no other loads/stores.

http://reviews.llvm.org/D6054