[PATCH] D18890: [AArch64] add SSA Load Store optimization pass

Mon Apr 11 05:18:08 PDT 2016

rengolin added a comment.

Some good changes, some comments, but I still wonder why:

1. You haven't done this in the already existing LdStOpt pass.
2. You enable this pass by default now, instead of let people play with it for a while.

cheers,
--renato


================
Comment at: lib/Target/AArch64/AArch64SSALoadStoreOptimizer.cpp:71
@@ +70,3 @@
+// Scaling factor for unscaled load or store.
+static int getMemScale(MachineInstr *MI) {
+  switch (MI->getOpcode()) {
----------------
JongwonLee wrote:
> rengolin wrote:
> > This functionality seems to match with TII's isUnscaledLdSt(), maybe it could be merged there?
> TII's isUnscaledLdSt() just returns boolean type accroding to opcode. Actually, getMemScale() is called dependently on the result of isUnscaledLdSt(). This usage pattern is also found in AArch64LoadStoreOptimizer.cpp. 
To me, it seems like isUnscaledLdSt should be just:

    return (getMemScale() == 1);

================
Comment at: lib/Target/AArch64/AArch64SSALoadStoreOptimizer.cpp:168
@@ +167,3 @@
+  int FirstMIOffset = getLdStOffsetOp(FirstMI).getImm();
+  int FirstMIOffsetStride = FirstMIIsUnscaled ? getMemScale(FirstMI) : 1;
+
----------------
JongwonLee wrote:
> rengolin wrote:
> > You have added the ops with scale 1, and an assert to make sure you're passing the right opcode. Shouldn't you just call getMemScale here and use it as an opcode check, and set the flag with (Stride == 1)?
> I cannot catch your point. Could you show me a code sample you want?
    int FirstMIOffsetStride = getMemScale(FirstMI);
    bool FirstMIIsUnscaled = (FirstMIOffsetStride == 1);

or, if you do the transformation I mentioned above:

    int FirstMIOffsetStride = TII->getMemScale(FirstMI);
    bool FirstMIIsUnscaled = TII->isUnscaledLdSt(FirstMI)

Inliners could common up both calls to getMemScale.

================
Comment at: lib/Target/AArch64/AArch64SSALoadStoreOptimizer.cpp:407
@@ +406,3 @@
+  bool Modified = false;
+  // 1) Find consecutive two 32-bit loads and consecutive two 32-bit stores that
+  //    write the values of the consecutive 32-bit loads. Transform the
----------------
JongwonLee wrote:
> rengolin wrote:
> > This comment is in the wrong place. It should be at a higher level, where people can actually read it before going through the code.
> I referenced this format from AArch64LoadStoreOptimizer.cpp. I think there will be more chances to optimize at this level. So, I think that the explanation is necessary for each case even though this patch starts with just one optimization case. 
> Maybe we will have ..
> 
> ```
> 2) description for second optimization
> ...
> 3) description for third optimization
> ...
> ```
Good point.

================
Comment at: test/CodeGen/AArch64/ldst-opt.ll:1125
@@ -1124,3 +1124,3 @@
   %i = phi i64 [ %dec.i, %for.body], [ %count, %0 ]
-  %gep1 = getelementptr i32, i32* %phi1, i64 -1
+  %gep1 = getelementptr i32, i32* %phi1, i64 -3
   %load1 = load i32, i32* %gep1
----------------
JongwonLee wrote:
> rengolin wrote:
> > Can you explain this change? Looks suspicious...
> This change is to prevent the optimization of this patch from being applied. This test is for another optimization.
Ah! Makes sense.


http://reviews.llvm.org/D18890