[PATCH] [ARM] Teach the ARM Load Store Optimizer to collapse ldr/str's to ldrd/strd's

Fri May 1 02:51:28 PDT 2015

In http://reviews.llvm.org/D9298#163286, @rs wrote:

> If you prefer it to be part of LoadStoreMultipleOpti then I can rework the patch to make it so.

Yes, that would be better. It looks like MergeOps is the function that takes a set of registers and tries to generate an LDM from them. That would be the place to decide whether to break the set up into a sequence of LDRDs instead. It looks like you may also need to adjust MergeLDR_STR: it only collects ascending sequences, but Thumb2 LDRD doesn't require that.
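
To make the MergeLDR_STR point concrete, here is a minimal standalone sketch of pairing word loads off one base register for Thumb2 LDRD. It is not the optimizer's code: MemOp, isLegalT32Pair and pairForLDRD are hypothetical names introduced only for illustration. It assumes the T32 immediate form, where the two words sit at offsets N and N+4 (N a multiple of 4 in [-1020, 1020]) and the two destination registers only need to be distinct and not SP or PC.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a queued load: destination register number and
// byte offset from a shared base register. Not an LLVM type.
struct MemOp {
  unsigned Reg;
  int Offset;
};

// Assumed T32 LDRD (immediate) constraints: the low offset is a multiple of 4
// within [-1020, 1020], the second word follows directly, and the destinations
// are distinct and neither SP (13) nor PC (15).
static bool isLegalT32Pair(const MemOp &Lo, const MemOp &Hi) {
  bool RegsOK = Lo.Reg != Hi.Reg && Lo.Reg != 13 && Lo.Reg != 15 &&
                Hi.Reg != 13 && Hi.Reg != 15;
  bool OffsetOK =
      Lo.Offset % 4 == 0 && Lo.Offset >= -1020 && Lo.Offset <= 1020;
  return RegsOK && OffsetOK && Hi.Offset == Lo.Offset + 4;
}

// Greedily pair ops whose offsets are adjacent words. Register numbers are
// never compared for ordering, unlike an LDM-style ascending check.
std::vector<std::pair<MemOp, MemOp>> pairForLDRD(std::vector<MemOp> Ops) {
  std::sort(Ops.begin(), Ops.end(),
            [](const MemOp &A, const MemOp &B) { return A.Offset < B.Offset; });
  std::vector<std::pair<MemOp, MemOp>> Pairs;
  for (std::size_t I = 0; I + 1 < Ops.size();) {
    if (isLegalT32Pair(Ops[I], Ops[I + 1])) {
      Pairs.push_back({Ops[I], Ops[I + 1]});
      // Invariant: the second word sits directly after the first.
      assert(Pairs.back().first.Offset + 4 == Pairs.back().second.Offset);
      I += 2; // both ops are consumed by this LDRD
    } else {
      ++I; // leave Ops[I] as a plain LDR and try the next one
    }
  }
  return Pairs;
}
```

With loads of r5, r2, r1, r0 at offsets 0, 4, 8, 12, this yields the pairs (r5, r2) and (r1, r0) even though the register numbers descend, which is the case an ascending-only collector would miss.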

================
Comment at: lib/Target/ARM/ARMLoadStoreOptimizer.cpp:1821-1831
@@ +1820,13 @@
+
+// FIXME: Currently, only supports collapsing ldr/str's to ldrd/strd's for
+// V7M based cores. V7A and V7R architectures also support ldrd/strd instruction
+// with a few restrictions, for example for the ldrd instruction
+// the first destination register must be an even numbered register and
+// second register must be (first register number + 1). We should update
+// the code at some point to make it possible to generate ldrd/strd for
+// these architectures as well.
+bool ARMLoadStoreOpt::LoadStoreToDoubleOpti(MachineBasicBlock &MBB) {
+  if (!isThumb2 || !STI->hasV7Ops() || !STI->isMClass()) {
+      return false;
+  }
+  bool Modified = false;
----------------
rs wrote:
> rengolin wrote:
> > john.brawn wrote:
> > > Actually the even/odd restriction is an A32 restriction, not a non-M-class restriction, i.e. in 7-A/R T32 there should be no problem.
> > Certainly the wrong way. A better way would be to have a flag in table gen (like fast-double-store or whatever). The best way would be to have a cost-model, like we have for the vectorizer, but that would be a big change for this small patch.
> > A better way would be to have a flag in table gen (like fast-double-store or whatever).
> OK this approach sounds better, will do it this way for my next patch.
There's some code in ARMBaseInstrInfo, e.g. getOperandLatency and getNumMicroOps, that appears to understand the timing of LDM/STM instructions. Maybe it's possible to use that plus LDRD/STRD timing information? I.e. rather than putting something in tablegen and then assuming "well, LDRD is fast so I'm going to guess that here it'll be faster than LDM", calculate what the timing of LDRD and LDM would be for loading a set of registers and use whichever is quicker.
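
Roughly what I have in mind, as a standalone sketch rather than anything that plugs into the pass: LoadTiming, ldrdSequenceCycles, ldmCycles and pickStrategy are all hypothetical names, and the linear LDM cost (fixed cost plus a per-register cost) is an assumption for illustration. In the pass itself the numbers would have to come from the scheduling information behind getOperandLatency/getNumMicroOps, whose signatures I'm not reproducing here.

```cpp
#include <cassert>

// Hypothetical per-core timing numbers. In the real pass these would be
// derived from the scheduling model, not hard-coded.
struct LoadTiming {
  unsigned LdrCycles;       // one single-word LDR
  unsigned LdrdCycles;      // one LDRD (two words)
  unsigned LdmBaseCycles;   // fixed cost of issuing an LDM (assumed model)
  unsigned LdmCyclesPerReg; // extra cost per register in the LDM list
};

enum class LoadStrategy { UseLDM, UseLDRDs };

// Cost of covering NumRegs words with LDRDs, plus one LDR if NumRegs is odd.
unsigned ldrdSequenceCycles(const LoadTiming &T, unsigned NumRegs) {
  return (NumRegs / 2) * T.LdrdCycles + (NumRegs % 2) * T.LdrCycles;
}

// Cost of a single LDM under the assumed linear model.
unsigned ldmCycles(const LoadTiming &T, unsigned NumRegs) {
  return T.LdmBaseCycles + NumRegs * T.LdmCyclesPerReg;
}

// Pick whichever is cheaper for this register count. Ties go to LDM, since a
// single instruction is the smaller encoding.
LoadStrategy pickStrategy(const LoadTiming &T, unsigned NumRegs) {
  assert(NumRegs >= 2 && "need at least two loads to merge");
  return ldrdSequenceCycles(T, NumRegs) < ldmCycles(T, NumRegs)
             ? LoadStrategy::UseLDRDs
             : LoadStrategy::UseLDM;
}
```

With made-up numbers such as LoadTiming{2, 3, 2, 1}, two registers favour LDRD (3 cycles against 4) while six favour LDM (9 against 8), so the decision follows the register count instead of a single per-subtarget flag.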

http://reviews.llvm.org/D9298
