[PATCH] Improve the cost evaluation of LSR

Wei Mi wmi at google.com
Fri May 1 11:32:19 PDT 2015


Hi Quentin,

Thanks for explaining your idea to the LSR problem in detail.

> Now, the long story.
>
> I have mixed feelings on the direction of the approach. On one hand, I also think we should optimize for performance as long as register pressure is not a problem. On the other hand, the register pressure estimate at this level too rough to make any useful decisions.
>

I agree with you, except for the claim that the register pressure
estimate cannot make any useful decisions.
I understand that the register pressure estimation cannot be very
precise at an early stage like LSR, but it can still be useful,
especially when the real register pressure in a loop is very low or
very high. When the register pressure in a loop is close to the number
of available registers, I admit my patch may use more registers, but
the intention is still to reduce the instruction count elsewhere, for
example by reducing recurrence adds at the end of the loop or reducing
NumAddParts for an LSRUse. The perf impact in such cases is hard to
predict, because we may get more spills but fewer add instructions in
the loop at the same time.

> Your current approach illustrate this point. Indeed, IIRC NumRegs only gives you the number of registers you need to materialize the formulae, we do not consider how many register we already need in the loop or through the loop. Therefore, I believe by tweaking the body of the loop in your motivating example (i.e., adding just enough live ranges), we can bloat the register pressure with the new rating and have spill within the loop, whereas we wouldn’t with the previous rating.

For the tweaked test, I hope that although the new rating may cause
more spills, it will also produce fewer add instructions. If that is
not the case, it is a bug in the new rating that I should look at. The
new rating has no reason to increase NumRegs when doing so cannot
reduce InstNumCost.

>
> I also mentioned that in the related PR, but I believe the way to go is: not to care on register pressure and just rate the cost of the loop body. However, this implies the backends are able to recover from the register pressure bloat and I believe we are not quite here.

I agree it is a difficult way to go.

>
> *How do we move?
>
> I would suggest we add an internal option to make LSR more aggressive w.r.t. to register pressure, and fix all the problems that rise in the backends. Then, we can turn that option on by default.

That is a good suggestion! By comparing the approaches that way, we
can also collect more testcases.

>
> - What if other people still believe this is the right way to move?
>
> Like I said, I do not think this is the right way to go. Now, if other people believe it is, I would at least expect that you supply more details numbers. In particular, what are the actual numbers (not just the geometric means) and what are the regressions, why we do not care or how do we plan to fix them.

The improvements are: 1% for Ad Delivery, 1.5% for license plate
detection, 1.5% for object recognition, and 3% for a matrix
computation library. I also tested spec2000 and found almost no perf
difference.
I don't see significant regressions caused by LSR now. I did see some
at first, then fixed those regressions and collected some testcases
from them. Some regressions remained, but analysis showed they were
caused by other side effects. However, all the tests were x86-only; I
believe it will have many regressions on other platforms.

Thanks,
Wei.

More information about the llvm-commits mailing list