[PATCH] Improve the cost evaluation of LSR

Fri May 1 16:38:51 PDT 2015

> On May 1, 2015, at 3:58 PM, Xinliang David Li <davidxl at google.com> wrote:
> 
>> 
>> 
>> I have been extreme on my position to emphasize that the register pressure
>> is broken right now. I am afraid we will chase improvement/regressions based
>> on luck if we pursue that direction without further evidence.
>> Eventually we may reach something reasonable, but in the meantime the
>> situation will be uncomfortable. I would need to do some gymnastics to keep the
>> existing improvements while moving toward the right solution. That is why I
>> was proposing to see how bad we are if we keep it simple first, then we can
>> move toward a better model.
>> 
>> The register pressure estimation may really make sense, but if that is the
>> case I believe we should make it an analysis pass to at least unify what we
>> are doing in the loop vectorizer.
>> Like I said the current pressure estimate is broken:
>> 1. The estimation is based only on the number of registers required to
>> materialize the formulae, again IIRC. Therefore if it exceeds the register
>> pressure, then yes, we will exceed the register pressure. But when we do not
>> exceed the register pressure, then we cannot say anything since we do not
>> consider the code in the loop body.
>> 2. The number of registers given by the TTI is really rough. For instance,
>> it does not account for aliasing between scalar and vector, it sees float
>> and int as part of the same register class.
>> 
> 
> Sounds like something worth fixing eventually. The improved register
> pressure analysis can be handy to guide other optimizations such as
> inliner etc too.

I think Wei’s current direction of introducing modes and gathering data is fine. I’m responding to some of the earlier comments on LSR…

So, the goal of the LSR pass in LLVM historically has been to reduce the register pressure that is induced by the form of the induction variables. Reducing the number of instructions was secondary. It has no knowledge of pressure at any particular point within the loop, so its primary job is to minimize the cyclic liveness. It has to view all IV uses as occurring before any local registers are needed. That's why AddRecCost is the primary factor.

NumRegs roughly indicates the temp registers that will be needed. That's a weak indication of register pressure, but it helps drive the solution toward common subexpressions, which is generally a good thing.
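
As a rough illustration of how a lexicographic cost comparison behaves, here is a minimal sketch. ToyCost and cheaper are hypothetical names, and the field order simply follows the description above; it is not taken from LLVM's source and this is not LLVM's Cost class. The point is only that a secondary factor such as NumRegs acts as a tie-breaker once the primary factor is equal.

```cpp
#include <cassert>
#include <tuple>

// Toy model only (assumption); not LLVM's Cost class.
struct ToyCost {
  unsigned AddRecCost; // cost of the loop-carried recurrences (IVs)
  unsigned NumRegs;    // temporaries needed to materialize the formulae

  // Lexicographic order: NumRegs is consulted only when AddRecCost ties.
  bool operator<(const ToyCost &Other) const {
    return std::tie(AddRecCost, NumRegs) <
           std::tie(Other.AddRecCost, Other.NumRegs);
  }
};

// Returns the cheaper of two candidate solutions under the toy ordering.
inline const ToyCost &cheaper(const ToyCost &A, const ToyCost &B) {
  return B < A ? B : A;
}

// Usage: a candidate with fewer recurrences wins even if it needs more
// temporaries, because the first field dominates the comparison.
inline void example() {
  ToyCost FewIVs{1, 5}, ManyIVs{2, 1};
  assert(FewIVs < ManyIVs);
}
```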

More recently, people have improved target-specific logic in LSR to better recognize addressing modes. So LSR is becoming more useful as a way to reduce the cost of address generation too. Note that Quentin has fixed and tuned things in that area and carefully investigated regressions and analyzed performance impact.

That said, the fundamental cost model and heuristics in LSR are generally pretty flaky and have been known to pessimize code. I think the overall design has an i386 bias. For high-performance coders, hand-unrolling and vectorizing their own loops, -disable-lsr is a good strategy.

I think it could be useful to get a coarse-grained view of pressure in the loop before running LSR. I'm wary of thresholds, but it would be interesting to characterize loops before blindly reducing the number of IVs.
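
To make "coarse-grained view of pressure" concrete, here is a minimal sketch of a max-live count over a straight-line loop body, assuming a single block and SSA-like values named by integer ids. ToyInst and maxLive are hypothetical names, not LLVM data structures or APIs, and the model ignores register classes and control flow entirely.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified instruction: one optional def and a list of uses.
struct ToyInst {
  int Def;               // id of the value defined, or -1 if none
  std::vector<int> Uses; // ids of the values read
};

// Walks the body bottom-up and returns the maximum number of values that
// are live at the same time. LiveOut holds the values that stay live across
// the back edge or after the loop (for example the induction variables).
inline std::size_t maxLive(const std::vector<ToyInst> &Body,
                           const std::set<int> &LiveOut) {
  std::set<int> Live = LiveOut;
  std::size_t Max = Live.size();
  for (auto It = Body.rbegin(); It != Body.rend(); ++It) {
    if (It->Def >= 0)
      Live.erase(It->Def); // a value is not live above its definition
    Live.insert(It->Uses.begin(), It->Uses.end()); // operands are live above
    Max = std::max(Max, Live.size());
  }
  return Max;
}
```

A characterization step could compare such a count against the number of available registers before deciding how aggressively to reduce the number of IVs; where that comparison would live and what it would be compared against are left open here.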

Note that the RegisterPressureTracker available in MachineIR is very precise. Any serious optimization to reduce pressure should be done at the MachineIR level (global code motion, rematerialization, etc.), but no one wants to rewrite induction variables at that level.

Andy

> 
> 
>> 
>>> Your current approach illustrates this point. Indeed, IIRC NumRegs only
>>> gives you the number of registers you need to materialize the formulae; we
>>> do not consider how many registers we already need in the loop or through the
>>> loop. Therefore, I believe by tweaking the body of the loop in your
>>> motivating example (i.e., adding just enough live ranges), we can bloat the
>>> register pressure with the new rating and have spills within the loop,
>>> whereas we wouldn’t with the previous rating.
>> 
>> 
>> If we don't want spills, but spills occur, then it is a matter of tuning
>> register pressure estimation -- can even be made more conservative. In fact,
>> if we consider spills in the overall cost, we can even deliberately allow
>> some spills to reduce overall cost (but that requires very precise pressure
>> estimate).
>> 
>> 
>> Sure.
>> 
>> 
>>> 
>>> 
>>> I also mentioned that in the related PR, but I believe the way to go is:
>>> not to care about register pressure and just rate the cost of the loop body.
>>> However, this implies the backends are able to recover from the register
>>> pressure bloat and I believe we are not quite here.
>>> 
>> 
>> The cost should include 'potential' spills due to register pressure.
>> 
>> 
>>> 
>>> *How do we move?
>>> 
>>> I would suggest we add an internal option to make LSR more aggressive
>>> w.r.t. register pressure, and fix all the problems that arise in the
>>> backends. Then, we can turn that option on by default.
>> 
>> 
>> We want to generate optimal code sequence with minimal cost --- that is not
>> equivalent to 'the most aggressive LSR'.   Do we already know possible ways
>> to fix the problem once the damage is already made (high reg pressure …)?
>> 
>> 
>> That is the point, we do not know and to me that would be the first
>> information we should seek to determine the best direction. My gut says we
>> may not be able to recover and indeed a register pressure estimation would
>> come in handy, but I like facts :).
>> 
> 
> Wei, is it possible to introduce a mode (under the option) that also
> does what Quentin suggested? I like facts and data too.
> 
> 
>> 
>>> 
>>> 
>>> - What if other people still believe this is the right way to move?
>>> 
>>> Like I said, I do not think this is the right way to go. Now, if other
>>> people believe it is, I would at least expect that you supply more detailed
>>> numbers. In particular, what are the actual numbers (not just the geometric
>>> means) and what are the regressions, why we do not care or how do we plan to
>>> fix them.
>> 
>> 
>> I suggest Wei refactor the change in a way so that it can be turned
>> on/off with an option. When that is ready, the community can help with the
>> performance testing on their favorite platforms with their favorite
>> benchmarks. It may turn out to be better for everyone, who knows :)
>> 
>> 
>> If we move toward this heuristic, I really suggest we make it a pass or
>> utility to provide register pressure estimation at the IR level. The loop
>> vectorizer and also probably GVN, LICM, etc. could use such information.
>> 
> 
> I think this is a good idea.
> 
> thanks,
> 
> David
> 
> 
>> Cheers,
>> -Quentin
>> 
>> 
>> David
>> 
>> 
>> 
>>> 
>>> 
>>> Cheers,
>>> -Quentin
>>> 
>>> 
>>> REPOSITORY
>>>  rL LLVM
>>> 
>>> http://reviews.llvm.org/D9429
>>> 