[PATCH] D34583: [LSR] Narrow search space by filtering non-optimal formulae with the same ScaledReg and Scale.

Thu Jun 29 17:53:26 PDT 2017

sanjoy added inline comments.

================
Comment at: lib/Transforms/Scalar/LoopStrengthReduce.cpp:4360
+      Regs.clear();
+      CostFB.RateFormula(TTI, FB, Regs, VisitedRegs, L, SE, DT, LU);
+      return CostFA.isLess(CostFB, TTI);
----------------
wmi wrote:
> sanjoy wrote:
> > As far as I can tell, in this situation we're making an arbitrary choice when the register use count is the same and both the formulas have the same cost (we'll just pick the second one).  Can we instead keep both the formulas when `Cost` does not give us an unambiguous signal?
> We use a similar strategy in other places, such as in FilterOutUndesirableDedicatedRegisters at `CostF.isLess(CostBest, TTI)`. I think the key point here is to keep the formula set small while still giving the LSR solver more induction variable choices, so making an arbitrary choice to reduce the formula set may not be too bad.
Sounds good.
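
For reference, here is a minimal standalone sketch of the two tie-handling policies discussed above. It is not the code in the patch: `ToyFormula`, `filterSameScaledReg`, and the single integer `Cost` are hypothetical stand-ins for `Formula`, the filtering routine, and the `Cost` object, and which tied formula survives in the real patch depends on how the comparator is called.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an LSR formula; not the LLVM Formula class.
struct ToyFormula {
  std::string ScaledReg; // name standing in for the SCEV of the scaled register
  int Scale;
  int Cost; // one integer standing in for the multi-field Cost object
};

// For each (ScaledReg, Scale) pair, keep only the cheapest formula.
// KeepTies == false: on equal cost, the formula seen first survives
//                    (an arbitrary choice that keeps the set small).
// KeepTies == true:  every formula tied for cheapest survives.
std::vector<ToyFormula> filterSameScaledReg(const std::vector<ToyFormula> &In,
                                            bool KeepTies) {
  std::map<std::pair<std::string, int>, std::vector<std::size_t>> Best;
  for (std::size_t I = 0; I < In.size(); ++I) {
    std::vector<std::size_t> &Slot = Best[{In[I].ScaledReg, In[I].Scale}];
    if (Slot.empty() || In[I].Cost < In[Slot.front()].Cost)
      Slot = {I}; // strictly better: replace everything kept so far
    else if (KeepTies && In[I].Cost == In[Slot.front()].Cost)
      Slot.push_back(I); // equal cost: keep it as well
  }

  // Emit the survivors in their original order.
  std::vector<bool> Keep(In.size(), false);
  for (const auto &Entry : Best)
    for (std::size_t I : Entry.second)
      Keep[I] = true;
  std::vector<ToyFormula> Out;
  for (std::size_t I = 0; I < In.size(); ++I)
    if (Keep[I])
      Out.push_back(In[I]);
  return Out;
}
```

With `KeepTies == false` the result has exactly one formula per (ScaledReg, Scale) pair, which matches the stated goal of keeping the formula set small; `KeepTies == true` trades a larger set for not discarding equally rated candidates.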

================
Comment at: lib/Transforms/Scalar/LoopStrengthReduce.cpp:4352
+      // shared among LSRUses, the less we increase the register number
+      // counter of the formula.
+      size_t FARegNum = 0;
----------------
Can we instead take the total number of uses of the registers in `FA.BaseRegs` and `FB.BaseRegs` and compare them?  That is:

```
int TotalUsesOfA = 0, TotalUsesOfB = 0;
for (const SCEV *Reg : FA.BaseRegs)
  TotalUsesOfA += RegUses.getUsedByIndices(Reg).count();
for (const SCEV *Reg : FB.BaseRegs)
  TotalUsesOfB += RegUses.getUsedByIndices(Reg).count();

if (TotalUsesOfA != TotalUsesOfB)
  return TotalUsesOfA > TotalUsesOfB;
```

Or does it have to exactly be the expression you're using?
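
To make the suggestion concrete outside of LLVM, here is a self-contained sketch of the same comparison. `ToyRegUses`, `totalUses`, and `compareBySharing` are hypothetical names; the map of per-use flags only approximates what `RegUses.getUsedByIndices(Reg).count()` reports (the number of LSRUses that reference a register).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the register use tracker:
// register name -> one flag per LSRUse, set when that use references the register.
using ToyRegUses = std::map<std::string, std::vector<bool>>;

// Sum, over all base registers of a formula, the number of LSRUses that
// reference each register. Registers unknown to the tracker count as zero.
std::size_t totalUses(const std::vector<std::string> &BaseRegs,
                      const ToyRegUses &RegUses) {
  std::size_t Total = 0;
  for (const std::string &Reg : BaseRegs) {
    auto It = RegUses.find(Reg);
    if (It == RegUses.end())
      continue;
    for (bool Used : It->second)
      Total += Used ? 1 : 0;
  }
  return Total;
}

// Returns 1 if formula A's registers are shared more widely, -1 if B's are,
// and 0 on a tie, in which case the caller would fall back to the cost
// comparison.
int compareBySharing(const std::vector<std::string> &BaseRegsA,
                     const std::vector<std::string> &BaseRegsB,
                     const ToyRegUses &RegUses) {
  std::size_t TotalUsesOfA = totalUses(BaseRegsA, RegUses);
  std::size_t TotalUsesOfB = totalUses(BaseRegsB, RegUses);
  if (TotalUsesOfA != TotalUsesOfB)
    return TotalUsesOfA > TotalUsesOfB ? 1 : -1;
  return 0;
}
```

The three-way result is only a convenience for the sketch; in the snippet above the same decision is expressed as an early `return TotalUsesOfA > TotalUsesOfB;`.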

================
Comment at: test/Transforms/LoopStrengthReduce/X86/lsr-filtering-scaledreg.ll:7
+%class.ZippyScatteredWriter = type { i8, i8*, i8* }
+@e = local_unnamed_addr global %class.A { i8 0, i8 0, [5 x i32] zeroinitializer, i64 1, i64 0, i64 1 }, align 8
+@f = local_unnamed_addr global %class.ZippyScatteredWriter* null, align 8
----------------
Can you please clean up the names a bit here?  Perhaps using metarenamer?

Also, are both the loops necessary to show the difference in behavior?

Repository:
  rL LLVM

https://reviews.llvm.org/D34583