[PATCH] D99272: [AArch64] Adds a pre-indexed paired Load/Store optimization for LDR-STR.

Fri Apr 30 04:11:49 PDT 2021

stelios-arm marked 4 inline comments as done.
stelios-arm added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:2820
 /// Only called for LdSt for which getMemOperandWithOffset returns true.
 bool AArch64InstrInfo::shouldClusterMemOps(
     ArrayRef<const MachineOperand *> BaseOps1,
----------------
dmgreen wrote:
> stelios-arm wrote:
> > dmgreen wrote:
> > > Should this get some updates to make it more precise for pre-inc pairs?
> > Could be. Currently, this method is called for load/stores when `getMemOperandWithOffset` returns `true`. However, `getMemOperandWithOffset` has a call to `getMemOpInfo`:
> > 
> > ```
> >   // If this returns false, then it's an instruction we don't want to handle.
> >   if (!getMemOpInfo(LdSt.getOpcode(), Scale, Width, Dummy1, Dummy2))
> >     return false;
> > ```
> > Currently, the `getMemOpInfo` does not include the pre-inc instructions. Additionally, for the post-inc instructions, there is only one variant available for `STRWpost`/`LDRWpost`.  
> > 
> > I am not sure why the other variants are not included, so I am bit skeptical If the pre-inc instructions should be added.
> > 
> > What do you think?
> > 
> >  
> We may want to make it more precise, but for the moment it's probably fine so long as it's not going to get anything drastically wrong. It clusters memory during scheduling to potentially increase the number of combined loads later on.
I agree, maybe that's the next step. 

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:2257
   // Make sure this is a reg/fi+imm (as opposed to an address reloc).
   assert((MI.getOperand(1).isReg() || MI.getOperand(1).isFI()) &&
          "Expected a reg or frame index operand.");
----------------
dmgreen wrote:
> stelios-arm wrote:
> > dmgreen wrote:
> > > Is this assert valid/sensible for PreInc?
> > The assert is valid for pre-inc ld/st because `MI.getOperand(1)` is the destination register.
> Valid - sure. But sensible? It's trying to check that the address operand is a Reg/FrameIndex. With Preinc, that will be shifted one operand. Can we make sure it checks the correct operand for those too?
Oh yes, sure! 

================
Comment at: llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir:17
+    ; CHECK: liveins: $w0, $w1, $x1
+    ; CHECK: early-clobber $x1, renamable $w0, renamable $w1 = LDPWpre renamable $x1, 5 :: (load 4)
+    ; CHECK: STPWi renamable $w0, renamable $w1, renamable $x1, 0 :: (store 4)
----------------
dmgreen wrote:
> Although I don't think it's an issue with this patch exactly, should this not have two MMO's? Either be :: (load 8) or :: (load 4) (load 4) ?
Yes, it should have "(load 4), (load 4)" instead. 

================
Comment at: llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir:615
+    ; CHECK: liveins: $q0, $q1, $x1
+    ; CHECK: early-clobber renamable $x1, renamable $q0 = LDRQpre renamable $x1, 260, implicit $w1 :: (load 16)
+    ; CHECK: renamable $q1 = LDRQui renamable $x1, 1 :: (load 16)
----------------
dmgreen wrote:
> According to this the limit of a LDP is 1008: https://godbolt.org/z/613xsozqP.
> So 1024 is the first multiple of 16 that should be invalid.
According to [[ https://godbolt.org/z/a65v3vzcT | this ]], the offset of LDR<>pre/STR<>pre is in range `[-256,255]`. Assuming that `x∈[-256,255]` and `c = size of <S,D,Q,W,X>` , for this type of optimization the resulted LDP<>pre/STP<>pre will be in range of `[min(x mod c == 0), max(x mod c == 0)]`.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99272/new/

https://reviews.llvm.org/D99272