[PATCH] Improve the cost evaluation of LSR

Fri May 1 17:08:30 PDT 2015

Andy, thanks for sharing the historic background -- many of the
historic constraints will gradually go away :)

David


On Fri, May 1, 2015 at 4:38 PM, Andrew Trick <atrick at apple.com> wrote:
>
> On May 1, 2015, at 3:58 PM, Xinliang David Li <davidxl at google.com> wrote:
>
>
>
> I have been extreme in my position to emphasize that the register pressure
> estimation is broken right now. I am afraid we will chase
> improvements/regressions based on luck if we pursue that direction without
> further evidence. Eventually we may reach something reasonable, but in the
> meantime the situation will be uncomfortable. I would need to do some
> gymnastics to keep the
> existing improvements while moving toward the right solution. That is why I
> was proposing to see how bad we are if we keep it simple first, then we can
> move toward a better model.
>
> The register pressure estimation may really make sense, but if that is the
> case I believe we should make it an analysis pass to at least unify what we
> are doing in the loop vectorizer.
> Like I said the current pressure estimate is broken:
> 1. The estimation is based only on the number of registers required to
> materialize the formulae, again IIRC. Therefore if it exceeds the register
> pressure, then yes, we will exceed the register pressure. But when we do not
> exceed the register pressure, then we cannot say anything since we do not
> consider the code in the loop body.
> 2. The number of registers given by the TTI is really rough. For instance,
> it does not account for aliasing between scalar and vector, it sees float
> and int as part of the same register class.
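
Point 1 above can be illustrated with a minimal sketch. Everything here is
hypothetical and is not LSR code: LoopEstimate, formulaeExceedBudget, and
loopExceedsBudget are invented names, and the model makes the simplifying
assumption that all values are live at the same time.

```cpp
// Hypothetical model, not LSR code: all names here are assumptions.
struct LoopEstimate {
  unsigned FormulaRegs;  // registers needed to materialize the formulae
  unsigned BodyLiveRegs; // live ranges in the loop body (not seen by the count)
};

// Formula-only check: true means the formulae alone already exceed the budget.
constexpr bool formulaeExceedBudget(const LoopEstimate &L,
                                    unsigned AvailableRegs) {
  return L.FormulaRegs > AvailableRegs;
}

// Whole-loop check, assuming every value is live at the same time.
constexpr bool loopExceedsBudget(const LoopEstimate &L,
                                 unsigned AvailableRegs) {
  return L.FormulaRegs + L.BodyLiveRegs > AvailableRegs;
}
```

With 16 available registers, a loop with FormulaRegs = 4 and BodyLiveRegs = 14
passes the formula-only check but still exceeds the budget once the body is
taken into account, so a "false" from the first check says nothing about
spills.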
>
>
> Sounds like something worth fixing eventually. The improved register
> pressure analysis can be handy to guide other optimizations such as
> inliner etc too.
>
>
> I think Wei’s current direction of introducing modes and gathering data is
> fine. I’m responding to some of the earlier comments on LSR…
>
> So, the goal of the LSR pass in LLVM historically has been to reduce the
> register pressure that is induced by the form of the induction variables.
> Reducing the number of instructions was secondary. It has no knowledge of
> pressure at any particular point within the loop so its primary job is to
> minimize the cyclic liveness. It has to view all IV uses as occurring before
> any local registers are needed. That's why AddRecCost is the primary factor.
>
> NumRegs roughly indicates the temp registers that will be needed. That's a
> weak indication of register pressure, but helps drive the solution toward
> common subexpressions which is generally a good thing.
>
> More recently, people have improved target specific logic in LSR to better
> recognize addressing modes. So LSR is becoming more useful as a way to reduce
> the cost of address generation too. Note that Quentin has fixed and tuned
> things in that area and carefully investigated regressions and analyzed
> performance impact.
>
> That said, the fundamental cost model and heuristics in LSR are generally
> pretty flaky and have been known to pessimize code. I think the overall
> design has an i386 bias. For high-performance coders, hand-unrolling and
> vectorizing their own loops, -disable-lsr is a good strategy.
>
> I think it could be useful to get a coarse-grained view of pressure in the
> loop before running LSR. I'm wary of thresholds, but it would be interesting
> to characterize loops before blindly reducing the number of IVs.
>
> Note that the RegisterPressureTracker available in MachineIR is very
> precise. Any serious optimization to reduce pressure should be done at
> MachineIR level (global code motion, rematerialization, etc.), but no one
> wants to rewrite induction variables at that level.
>
> Andy
>
>
>
>
> Your current approach illustrates this point. Indeed, IIRC NumRegs only
> gives you the number of registers you need to materialize the formulae; we
> do not consider how many registers we already need in the loop or through
> the loop. Therefore, I believe by tweaking the body of the loop in your
> motivating example (i.e., adding just enough live ranges), we can bloat the
> register pressure with the new rating and have spills within the loop,
> whereas we wouldn’t with the previous rating.
>
>
>
> If we don't want spills, but spills occur, then it is a matter of tuning
> register pressure estimation -- it can even be made more conservative. In
> fact, if we consider spills in the overall cost, we can even deliberately
> allow some spills to reduce overall cost (but that requires a very precise
> pressure estimate).
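
A minimal sketch of what folding potential spills into the overall cost could
look like. This is not LSR's cost model: Candidate, estimatedSpills,
totalCost, and the SpillPenalty weight are all hypothetical, and the spill
estimate is simply the excess of estimated pressure over available registers.

```cpp
// Hypothetical cost sketch, not LSR code: all names here are assumptions.
struct Candidate {
  unsigned NumInsns;   // instructions in the loop body for this solution
  unsigned RegsNeeded; // estimated peak register pressure
};

// Registers that do not fit are assumed to be spilled, one spill each.
constexpr unsigned estimatedSpills(const Candidate &C, unsigned AvailableRegs) {
  return C.RegsNeeded > AvailableRegs ? C.RegsNeeded - AvailableRegs : 0;
}

// Overall cost: instruction count plus a weighted penalty per potential spill.
constexpr unsigned totalCost(const Candidate &C, unsigned AvailableRegs,
                             unsigned SpillPenalty) {
  return C.NumInsns + SpillPenalty * estimatedSpills(C, AvailableRegs);
}
```

With 16 available registers and a penalty of 2, a candidate with 6
instructions and a pressure of 17 costs 8 and beats a spill-free candidate
with 10 instructions. With a penalty of 6 the same candidate costs 12 and
loses, which is why the precision of the pressure estimate and of the penalty
matters.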
>
>
> Sure.
>
>
>
>
> I also mentioned that in the related PR, but I believe the way to go is:
> not to care about register pressure and just rate the cost of the loop body.
> However, this implies the backends are able to recover from the register
> pressure bloat and I believe we are not quite there.
>
>
> The cost should include 'potential' spills due to register pressure.
>
>
>
> *How do we move?
>
> I would suggest we add an internal option to make LSR more aggressive
> w.r.t. register pressure, and fix all the problems that arise in the
> backends. Then, we can turn that option on by default.
>
>
>
> We want to generate an optimal code sequence with minimal cost --- that is
> not equivalent to 'the most aggressive LSR'. Do we already know possible ways
> to fix the problem once the damage is already done (high reg pressure …)?
>
>
> That is the point: we do not know, and to me that would be the first
> information we should seek to determine the best direction. My gut says we
> may not be able to recover and indeed a register pressure estimation would
> come in handy, but I like facts :).
>
>
> Wei, is it possible to introduce a mode (under the option) that also
> does what Quentin suggested? I like facts and data too.
>
>
>
>
>
> - What if other people still believe this is the right way to move?
>
> Like I said, I do not think this is the right way to go. Now, if other
> people believe it is, I would at least expect that you supply more detailed
> numbers. In particular, what are the actual numbers (not just the geometric
> means), what are the regressions, and why do we not care about them or how
> do we plan to fix them?
>
>
>
> I suggest Wei refactor the change in a way so that it can be turned
> on/off with an option. When that is ready, the community can help with the
> performance testing on their favorite platforms with their favorite
> benchmarks. It may turn out to be better for everyone, who knows :)
>
>
> If we move toward this heuristic, I really suggest we make it a pass or
> utility to provide register pressure estimation at the IR level. The loop
> vectorizer and also probably GVN, LICM, etc. could use such information.
>
>
> I think this is a good idea.
>
> thanks,
>
> David
>
>
> Cheers,
> -Quentin
>
>
> David
>
>
>
>
>
> Cheers,
> -Quentin
>
>
> REPOSITORY
>  rL LLVM
>
> http://reviews.llvm.org/D9429
>