[PATCH] D37899: [SystemZ] Implement shouldCoalesce() to help regalloc to avoid running out of registers with GR128 regs

Jonas Paulsson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 25 06:25:50 PDT 2017


jonpa added a comment.

>> I thought one could get away with adding calling setOperationAction(..., i128, Expand) on all operations that we do not support...
> 
> It turns out this doesn't work.  E.g. in the case of a 128-bit add, this is split into a sequence of 64-bit add-with-carry operations by the *type legalization* code, not the normal expander.  If you declare i128 a legal type, the type legalizer doesn't do this any more.  But then if you mark the ADD as Expand, the normal expander doesn't know what to do with it either, and -I think- just falls back to a libcall.
> 
>>> i128 lives in a even/odd register pair, but this is really only needed for the few special instructions that operate on those pairs. If you had just a normal load followed by a normal store of a i128 value, you *want* to break it into two independent i64 copies -- there is no reason at all to constrain the register allocator to use a even/odd pair here.
>> 
>> I would have expected that for a tiny live range, it is always better to avoid extra COPYs.
> 
> My point was for a 128-bit load feeding into a 128-bit store, both the load and the store gets expanded to two 64-bit loads / stores anyway.   So we end up with two completely independent 64-bit data flow paths, which require 64-bit registers --- but there is absolutely no benefit to require that these registers form an even/odd pair, they might as well be arbitrary registers (it might even be the same one if the sequence ends up scheduled as LG / STG / LG / STG).  Actually the same applies to most of the other cases where the 128-bit operation ends up split into 64-bit operations by the type legalizer.

I see.

>> The updated patch uses LiveIntervals to quickly check if the two connected vregs are local to MBB and what the first and last instructions are (it may be possible to do this without LIS if we are willing to scan the whole MBB). I merely tried some values of 10 instructions and 8 phys-reg operands in order to detect a non-safe live-range. It may be that we want to check for vreg operands as well.
>> 
>> I don't have adequate tests to tune this further, but this might hopefully be a good starting point. At least all the regressions we saw before are now gone, while still handling your test case.
> 
> This looks promising.   Not sure if anybody would object to changing the interface to pass in the LiveIntervals, but it seems reasonable to me.
> 
> Not sure about the magic numbers -- does it matter whether we stop after 10 MIs?   What kind of compile-time performance impact would it have to scan the whole range?

I don't know, I just thought that we basically wanted to at least handle the type of case where there is a GR128 op and then some subreg COPY(s) very close. My reason for having the instruction limit is that we can't know the register pressure, but it seems safe to think that tiny live ranges checked in this way could be coalesced.

> It looks like your check for > 8 occurrences of physical registers would trigger even if it is always the same register.  Should we instead keep a bitmap of the used registers, so we know how many *different* ones are used?  Also, it should probably only check for GPRs, not any random physical register.

Checking for GPRs is something I seem to have missed, but however the different reg check is something I actually took away after first using a SmallSet. I thought that if we would to this properly, we should in addition also check aliases after finding the set of different regs. I then changed my mind since we are merely trying to do an estimate, and we don't really know for sure what the right limit is, etc and may rather just keep it simple (BTW, I also see that it should do +=2 for a GR128 reg).

I guess it depends on the code and if we are trying to handle "any" type of mixed code with an as-good-as-possible estimate, or are we merely trying to keep coalescing the simple and small sequences?

Taking this a step further, if regalloc can only run out of registers when enough physreg defs overlap the GR128 live range so that there is no single 128-bit register pair to be allocated, then we should basically allow coalescing if there is at least one non-clobbered i128 register pair in the region, right?

Even if there were many such GR128 virtregs overlapping, one would hope that regalloc would be able to split them so that one such free pair would be enough. Would we want to stop coalescing at that some point in order to reduce spilling? We could perhaps try to by looking for GR128 virtual register operands in the region.


https://reviews.llvm.org/D37899





More information about the llvm-commits mailing list