[PATCH] D99272: [AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR.
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 26 04:27:19 PDT 2021
dmgreen added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:2257
// Make sure this is a reg/fi+imm (as opposed to an address reloc).
assert((MI.getOperand(1).isReg() || MI.getOperand(1).isFI()) &&
"Expected a reg or frame index operand.");
----------------
stelios-arm wrote:
> dmgreen wrote:
> > Is this assert valid/sensible for PreInc?
> The assert is valid for pre-inc ld/st because `MI.getOperand(1)` is the destination register.
Valid - sure. But sensible? It's trying to check that the address operand is a Reg/FrameIndex. With pre-inc, that operand is shifted along by one (the writeback register comes first). Can we make sure it checks the correct operand for those too?
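For illustration, a sketch of what that might look like (assuming the pre-inc operand layout `<wback, reg, base, imm>`, so the address operand sits at index 2; `isPreLdSt` is the helper this patch adds):

```
// Sketch only: select the address operand index based on the addressing form.
// Pre-inc instructions carry the writeback register as operand 0, so the
// base register/frame index is operand 2 rather than operand 1.
unsigned AddrOpIdx = AArch64InstrInfo::isPreLdSt(MI) ? 2 : 1;
assert((MI.getOperand(AddrOpIdx).isReg() || MI.getOperand(AddrOpIdx).isFI()) &&
       "Expected a reg or frame index operand.");
```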
================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:2820
/// Only called for LdSt for which getMemOperandWithOffset returns true.
bool AArch64InstrInfo::shouldClusterMemOps(
ArrayRef<const MachineOperand *> BaseOps1,
----------------
stelios-arm wrote:
> dmgreen wrote:
> > Should this get some updates to make it more precise for pre-inc pairs?
> Could be. Currently, this method is called for load/stores when `getMemOperandWithOffset` returns `true`. However, `getMemOperandWithOffset` has a call to `getMemOpInfo`:
>
> ```
> // If this returns false, then it's an instruction we don't want to handle.
> if (!getMemOpInfo(LdSt.getOpcode(), Scale, Width, Dummy1, Dummy2))
>   return false;
> ```
> Currently, `getMemOpInfo` does not include the pre-inc instructions. Additionally, of the post-inc instructions, only the `STRWpost`/`LDRWpost` variants are included.
>
> I am not sure why the other variants are not included, so I am a bit skeptical whether the pre-inc instructions should be added.
>
> What do you think?
We may want to make it more precise, but for the moment it's probably fine, so long as it's not going to get anything drastically wrong. It clusters memory operations during scheduling to potentially increase the number of combined loads/stores later on.
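For illustration, if we did want `getMemOpInfo` to know about the pre-inc forms, the entries could mirror the existing `STRWpost`/`LDRWpost` case. A sketch only - the opcode names are real, but the `Scale` type and the values here just follow the A64 pre-index encoding (an unscaled signed 9-bit immediate) and would need checking against the file's conventions:

```
case AArch64::LDRQpre:
case AArch64::STRQpre:
  Scale = TypeSize::Fixed(1); // pre-index immediate is unscaled
  Width = 16;                 // Q registers access 16 bytes
  MinOffset = -256;           // simm9 range
  MaxOffset = 255;
  break;
```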
================
Comment at: llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1311
if (OpcA == OpcB)
- return true;
+ // Same Pre Ld/St Opcodes are not candidates to merge/pair.
+ return !AArch64InstrInfo::isPreLdSt(FirstMI);
----------------
Maybe phrase this as "Opcodes match: If the opcodes are pre ld/st there is nothing more to check."
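i.e. the hunk would then read:

```
if (OpcA == OpcB)
  // Opcodes match: If the opcodes are pre ld/st there is nothing more to check.
  return !AArch64InstrInfo::isPreLdSt(FirstMI);
```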
================
Comment at: llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir:446
+ STRQui killed renamable $q0, renamable $x1, 0 :: (store 16)
+ STRDui killed renamable $d1, renamable $x1, 1 :: (store 4)
+ RET undef $lr
----------------
This should be `store 8` - `STRDui` stores 8 bytes.
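That is:

```
STRDui killed renamable $d1, renamable $x1, 1 :: (store 8)
```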
================
Comment at: llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir:615
+ ; CHECK: liveins: $q0, $q1, $x1
+ ; CHECK: early-clobber renamable $x1, renamable $q0 = LDRQpre renamable $x1, 260, implicit $w1 :: (load 16)
+ ; CHECK: renamable $q1 = LDRQui renamable $x1, 1 :: (load 16)
----------------
According to this, the immediate limit of an LDP is 1008: https://godbolt.org/z/613xsozqP.
So 1024 is the first multiple of 16 that should be invalid.
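For reference, a sketch of the Q-form LDP/STP offset legality (the helper name is invented here for illustration; the encoding is a signed 7-bit immediate scaled by 16):

```
// Q-form LDP/STP: imm7 * 16, so legal offsets are -1024..1008 in steps
// of 16. 1024 = 64 * 16 is the first multiple of 16 out of range.
bool isLegalQLdpOffset(int64_t Offset) {
  return Offset % 16 == 0 && Offset >= -1024 && Offset <= 1008;
}
```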
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D99272/new/
https://reviews.llvm.org/D99272