[PATCH] [ARM] Teach the ARM Load Store Optimizer to collapse ldr/str's to ldrd/strd's

Mon Apr 27 10:26:08 PDT 2015

I don't know much about the ARM Load/Store Optimizer, but from looking at it there's already some machinery for generating LDRD/STRD. Why is this a separate LoadStoreToDoubleOpti function instead of being integrated into LoadStoreMultipleOpti?

Always favouring LDRD/STRD is probably a little simplistic. LDRD/STRD is better than LDM/STM in that:

- it has more flexible addressing, so can be used in cases where LDM/STM can't
- it may be faster on the CPU you're compiling for (from a brief perusal of the TRMs: on pre-7A and Cortex-M3/4, LDM may be faster; otherwise LDRD is at least as fast and may be faster)

Also: when optimizing for size we would want to use LDM if it means fewer bytes of instructions.
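
To make the points above concrete, here is a minimal sketch of the kind of decision they imply. Everything in it is hypothetical: PairQuery, PairChoice, choosePairInstr and all of the fields are illustrative names, not existing LLVM APIs, and the encoding sizes and the per-core speed flag are assumed to be supplied by the caller (in the real pass they would have to come from the subtarget and the candidate encodings).

```cpp
// Hypothetical sketch, not LLVM code. Needs C++14 (branches in constexpr).
enum class PairChoice { None, LDRD, LDM };

// All fields are assumptions made for illustration.
struct PairQuery {
  bool LdrdEncodable;    // can the access be expressed as an LDRD/STRD?
  bool LdmEncodable;     // can the access be expressed as an LDM/STM?
  bool OptForSize;       // is the function being optimized for size?
  unsigned LdrdBytes;    // encoding size of the LDRD/STRD candidate
  unsigned LdmBytes;     // encoding size of the LDM/STM candidate
  bool LdmFasterOnCore;  // per-core tuning hint, e.g. set for Cortex-M3/4
};

constexpr PairChoice choosePairInstr(const PairQuery &Q) {
  // Addressing flexibility first: use whichever form can encode the access.
  if (!Q.LdrdEncodable && !Q.LdmEncodable)
    return PairChoice::None;
  if (!Q.LdmEncodable)
    return PairChoice::LDRD;
  if (!Q.LdrdEncodable)
    return PairChoice::LDM;
  // Both are possible: when optimizing for size, take the smaller encoding.
  if (Q.OptForSize && Q.LdmBytes != Q.LdrdBytes)
    return Q.LdmBytes < Q.LdrdBytes ? PairChoice::LDM : PairChoice::LDRD;
  // Otherwise fall back to the per-core speed preference.
  return Q.LdmFasterOnCore ? PairChoice::LDM : PairChoice::LDRD;
}
```

The ordering (encodability, then size, then speed) is one possible choice; a real implementation might weigh size and speed differently.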

================
Comment at: lib/Target/ARM/ARMLoadStoreOptimizer.cpp:67-70
@@ +66,6 @@
+
+static cl::opt<bool> AlwaysCollapseToLoadStoreDouble("arm-load-store-use-ldrd-strd",
+    cl::Hidden, 
+    cl::desc("Always try and collapse load/store pairs into ldrd/strd's if" \
+      "available on target architecture"), cl::init(true));
+
----------------
Using ldrd/strd by default, without checking whether it's faster on the target CPU, sounds like a bad idea. For example, according to the Cortex-M3 TRM, LDRD takes 3 cycles but LDM takes 2 + (number of registers - 1).
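
As a worked example of that arithmetic, the sketch below takes the two figures quoted above at face value and ignores pipelining and memory wait states. The function names are hypothetical, and modelling a longer run as back-to-back LDRDs is an assumption made only for the comparison.

```cpp
// Cycle counts as quoted in the comment above (Cortex-M3 TRM).
constexpr unsigned ldrdCycles() { return 3; }
constexpr unsigned ldmCycles(unsigned NumRegs) { return 2 + (NumRegs - 1); }

// Assumed cost of loading an even number of registers as consecutive LDRDs.
constexpr unsigned ldrdSequenceCycles(unsigned NumRegs) {
  return (NumRegs / 2) * ldrdCycles();
}

// For a single pair the two forms cost the same under these figures.
static_assert(ldmCycles(2) == 3 && ldrdSequenceCycles(2) == 3, "pair");
// For four registers, one LDM (5 cycles) beats two LDRDs (6 cycles).
static_assert(ldmCycles(4) == 5 && ldrdSequenceCycles(4) == 6, "four regs");
```

Under these numbers the cost shows up when collapsing pairs into LDRDs prevents a longer LDM from being formed, rather than on an isolated pair.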

================
Comment at: lib/Target/ARM/ARMLoadStoreOptimizer.cpp:1821-1831
@@ +1820,13 @@
+
+// FIXME: Currently, only supports collapsing ldr/str's to ldrd/strd's for
+// V7M based cores. V7A and V7R architectures also support ldrd/strd instruction
+// with a few restrictions, for example for the ldrd instruction
+// the first destination register must be an even numbered register and
+// second register must be (first register number + 1). We should update
+// the code at some point to make it possible to generate ldrd/strd for
+// these architectuers as well.
+bool ARMLoadStoreOpt::LoadStoreToDoubleOpti(MachineBasicBlock &MBB) {
+  if (!isThumb2 || !STI->hasV7Ops() || !STI->isMClass()) {
+      return false;
+  }
+  bool Modified = false;
----------------
Actually the even/odd restriction is an A32 restriction, not a non-M-class restriction, i.e. in T32 on 7-A/R there should be no problem.
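
A minimal sketch of the distinction, assuming a hypothetical InstrSet enum and helper name (neither exists in LLVM). It models only the pairing rule discussed here, namely that A32 needs an even first register with the second being first + 1; the other encoding constraints on LDRD/STRD are deliberately left out.

```cpp
// Hypothetical names for illustration only.
enum class InstrSet { A32, T32 };

// True if (Rt, Rt2) satisfies the even/odd pairing rule for the given
// instruction set. T32 encodes the two registers independently, so the
// rule does not apply there, whatever the architecture profile.
constexpr bool satisfiesPairingRule(InstrSet IS, unsigned Rt, unsigned Rt2) {
  return IS == InstrSet::T32 || (Rt % 2 == 0 && Rt2 == Rt + 1);
}

static_assert(satisfiesPairingRule(InstrSet::A32, 2, 3), "r2, r3 is fine in A32");
static_assert(!satisfiesPairingRule(InstrSet::A32, 1, 2), "odd first register");
static_assert(satisfiesPairingRule(InstrSet::T32, 1, 4), "no pairing rule in T32");
```

Read this way, the guard in LoadStoreToDoubleOpti would only need to depend on Thumb2, not on M-class.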

http://reviews.llvm.org/D9298
