[llvm-commits] [llvm] r82734 - in /llvm/trunk/lib: CodeGen/PrologEpilogInserter.cpp CodeGen/PrologEpilogInserter.h Target/ARM/ARMBaseRegisterInfo.cpp Target/ARM/Thumb1RegisterInfo.cpp

Mon Sep 28 11:23:47 PDT 2009

On 28/09/2009, at 18.25, Jim Grosbach wrote:
> On Sep 25, 2009, at 12:47 AM, Jakob Stoklund Olesen wrote:
>> It seems that for now, we must require that live ranges of virtual  
>> registers are disjoint. Otherwise we would need more than one  
>> emergency spill slot in the worst case.
>>
>> With only one virtual register live at a time, the map is indeed  
>> overkill.
>>
>> It would also be a good idea to enforce the one-virtual-register-at- 
>> a-time rule with asserts here.
>
> Agreed to both. As indicated by the comment, I was thinking towards  
> doing something fancier when we start to re-use these. Allowing more  
> than one to be live so long as the emergency spill slot is unused,  
> or something like that. Thus, I didn't want to encode too many  
> strong assumption in place that would make that harder.

It is definitely possible to do something fancier here. The thing to  
remember, though, is that this pass must work /every/ time. A clever  
trick that relies on a register or the emergency spill slot being  
available will maybe work 99% of the time, but we must have a fallback  
solution for the last 1%. Therefore, if virtual registers overlap, we  
must have two (or more) emergency spill slots.

In a situation like this, I think we need to make clear rules for the  
scavenger clients: If your virtual registers don't overlap, the  
scavenger will work every time. Clients must follow the rules,  
otherwise there will be weird functions that don't compile.

If a client needs more flexibility, we can change the rules. For  
instance: "You must give the scavenger as many emergency spill slots  
as the maximum possible number of simultaneous virtual registers". I  
don't think that is needed for now.

Fancy code can give us better performance, for instance by reducing  
the number of emergency spills, but we  must have a worst-case  
fallback always.

> Upon reflection, however, I don't think that's a direction we should  
> go. If we want the compiler to be that much smarter about reusing  
> the constants, we should work towards letting the real register  
> allocator do the work. This pass should be a quick and simple one to  
> handle obvious cases and shouldn't try to get too smart.

That is probably true. We can improve performance by using available  
callee-saved registers, and optimizing emergency spill locations. If  
things get more complicated, it is probably a good idea to involve the  
register allocator somehow.

[...]

>> Ideally, I would like to get rid of FindUnusedReg(), so you could  
>> just say scavengeRegister() here. The ARMLoadStoreOptimizer is the  
>> only one to use FindUnusedReg() without immediately calling  
>> scavengerRegister() when it fails. I am no sure if that is  
>> intentional.
>
> I've been thinking about that, too. I can see the benefit in asking  
> it for a register, but only if a spill isn't necessary. For example,  
> when loading a constant, materializing the constant directly may be  
> more expensive than a load, but still cheaper if spills are  
> necessary. Nothing is being that smart currently, though. In  
> practice, it looks like one could call scavengeRegister() directly  
> without first calling FindUnusedReg() and get the single-call  
> interface you're looking for.

I think that scavengerRegister should implicitly call FindUnusedReg so  
that clients don't have to worry about it. I can see a case for still  
needing a raw FindUnusedReg, yes.

But the needed function is probably really  
FindUnusedRegThatIsAlsoNotUsedByThisInstructionRange().

Such clever clients must always be able to handle FindUnusedReg  
returning 0. It looks like the ARMLoadStoreOptimizer handles that  
correctly.

>> There is another issue here. FindUnusedReg() will give you a  
>> register that is unused /at the current position/. You really need  
>> a register that is unused in the whole live range of the virtual  
>> register you are replacing. scavengeRegister() won't return a  
>> register that is used by the instruction at I, but it looks like we  
>> need a variant that can exclude from whole range of instructions.
>>
>> The rule should be: "NewReg must not be defined by any instruction  
>> in the range, except for the last. NewReg must not be an  
>> EarlyClobber operand on the last instruction."
>
> In the general case, yes, that's true. In the cases we're dealing  
> with, however, we don't need to be that careful. The register is  
> defined, then used in the next instruction or two. We're splitting  
> out a single instruction referencing the frame when the constant  
> offset is too large. Either we're adjusting the stack point by a  
> constant value and need a register to hold the constant, or we're  
> accessing a stack slot and need a register to hold the offset to the  
> slot. Either way, if the register live range is such that it's  
> insufficient to check if the register is available at the current  
> location, then it's a flaw in the way the back end is using this pass.

An example from the Blackfin backend. Storing a 16-bit subregister to  
a stack slot:

STORE16fi P0.L, <fi#-1>, 0

With the new vreg code, eliminateFrameIndex will produce:

%reg2000 = LOADuimm16 100
%reg2000 = ADDpp %reg2000, %FP
STORE16pi %reg2000, %P0.L

Now we need to replace the vreg with a scavenged register. Assume all  
P-class registers are used, so we must scavenge one. If  
scavengeRegister() chooses to scavenge P0, we are dead - the code  
would become

STORE32pi %FP, emergengy-slot, P0
%P0 = LOADuimm16 100
%P0 = ADDpp %P0, %FP
STORE16pi %P0, %P0.L
LOAD32pi %P0, %FP, emergency-slot

This disaster won't actually happen because the heuristics in  
RegScavenger::findSurvivorReg() choose the candidate that is untouched  
for the longest time. But that is just a heuristic, we shouldn't rely  
on that.

I think scavengeRegister needs to examine all three instructions (the  
live range of %reg2000) to avoid such disasters.

> That said, that's an assumption worth enforcing. Perhaps in the  
> loop, if we have an active scavenged register, we should add an  
> assert() that the current instruction does not reference it. The  
> reference we're looking for will still be a virtual register, so it  
> won't fire on that. Hmmm.

That is a very good idea. It is simple to do, and you are scanning  
operands anyway.

[...]

>> Yeah, I also want to get rid of forward() as a public method. Since  
>> scavengeRegister takes an iterator argument that /must/ be next(MI)  
>> anyway, there is really no point to having it.
>
> I'm a bit leery of merging it with scavengeRegister(). It feels more  
> appropriate to keep those bits of functionality separate since  
> they're logically distinct. I can envision wanting to do things like  
> tracking to a point in the block and only conditionally scavenging a  
> register before proceeding.
>>
>> Batch forwarding over a range of instructions is going to be faster  
>> as well.
>
> forward(MachineBasicBlock::iterator I) just loops over calls to  
> forward() currently. Any performance improvements available from  
> merging the forwarding into scavengeRegister() should be equally  
> applicable as improvements to the current implementation, I would  
> think.

Yes. The performance improvement I am after mostly involves the four  
BitVectors in RegScavenger::forward(). They currently cost four  
mallocs and frees per instruction. By swapping forward() and forward 
(MI), we could make a tighter loop around the core logic, and only  
allocate BitVectors once per bulk skip.

> Can you elaborate a bit on why you want to remove forward()? I like  
> the idea of removing methods that aren't really necessary, but I'm  
> not sure that's the case here. Are there other motivations?

Forget about performance. The impact of those four BitVectors probably  
can't be measured.

I want to eliminate redundancy from the API. Right now, you have to  
call forward(MI) to advance the scavenger to your position. Then you  
call scavengeRegister(RC, I, 0) where I /must/ be equal to next(MI).  
It is an error to pass any other iterator, so what is the point of  
having the argument?

So you can choose to remove the iterator argument from scavengeRegister 
(), or you can get rid of forward() (as a public method, anyway). I am  
shooting for forward() because the semantics are already a bit weird.  
Calling forward(MI) places the scavenger /after/ MI while BuildMI()  
inserts instructions /before/ the iterator. That is why you are  
calling forward() at the bottom of you loop, rather than at the top.  
There are good reasons for this, but it is still weird.

By removing forward(), we push the weirdness into the RegScavenger.

If the choose to retain FindUnusedReg, it would need to take an  
iterator argument as well.

But given the discussion above, I think that both methods should take  
two iterator arguments specifying a range of instructions that must  
not be using the returned register.