[PATCH] [AArch64] Inline memcpy() as a sequence of ldp-stp with 64-bit registers

Sergey Dmitrouk sdmitrouk at accesssoftek.com
Mon Nov 3 06:15:11 PST 2014


Hi James,

> It'd have been nicer if we could convince the scheduler to do this instead, rather than going behind its back though.

That's what I've tried initially, but wasn't able to do.

> Have you talked to Andy Trick or Dave Estes to work out if this is possible?

I don't think so, there were a couple of related threads on llvm-dev, and doing
this similar to load/store optimizer was the only proposed solution. Scheduler
doesn't seem to have extension points where one could provide hints about
instruction, at least I didn't find a way other than to subclass it.

> Comments inline.

Those not answered inline here are addressed in newer revision.

Sergey

================
Comment at: lib/Target/AArch64/AArch64LoadStoreInterleave.cpp:25
@@ +24,3 @@
+//              2. <something1>              2. <something2>
+//              3. <store1>                  3. <load1>
+//              4. <something2>              4. <store1>
----------------
jmolloy wrote:
> The important thing is that we have ldp/stp in that order, ideally with increasing addresses. We don't need to cluster them all together - it's the ordering of memory operations that counts I think.
> 
> So we can have:
> 
> ldp
> stp
> add # unrelated operation
> ldp
> stp
> 
> This should be fine, and may be a good thing, depending on the microarchitecture. 
I'll try that, it shouldn't require a lot of changes.

> ideally with increasing addresses

Actually, input of the pass is already in reverse order:
```
        ldp     x10, x11, [x8, #48]
        stp     x10, x11, [x9, #48]
        ldp     x10, x11, [x8, #32]
        stp     x10, x11, [x9, #32]
        ldp     x10, x11, [x8, #16]
        stp     x10, x11, [x9, #16]
        ldp      x10, x8, [x8]
        stp      x10, x8, [x9]
```
which might come from `getMemcpyLoadsAndStores()` in `SelectionDAG.cpp`, which
doesn't specify order.


================
Comment at: lib/Target/AArch64/AArch64LoadStoreInterleave.cpp:144
@@ +143,3 @@
+static bool isSafeInstruction(unsigned LdBase, unsigned StBase, MachineInstr *I,
+                              const TargetRegisterInfo *TRI, int SeenStore) {
+  if (I->isDebugValue()) {
----------------
jmolloy wrote:
> Wouldn't isSafeToSpeculate() conservatively do the same job here?
I don't see any function with exactly this name, functions with similar name don't seem to fit and some are also static.

================
Comment at: lib/Target/AArch64/AArch64LoadStoreInterleave.cpp:205
@@ +204,3 @@
+    case AArch64::STURXi:
+      // Loads should go first.
+      if (!Lds.empty()) {
----------------
jmolloy wrote:
> This was not mentioned in the comment; why should all loads come before all stores?
Wrong comment, I meant that first load should go before first store.

http://reviews.llvm.org/D6054






More information about the llvm-commits mailing list