[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

Thu Feb 27 14:49:30 PST 2014

On Feb 25, 2014, at 5:32 PM, Philip Reames <listmail at philipreames.com> wrote:

> I've got a pseudo instruction with some tricky semantics I need help figuring out how to encode properly.  For those interested, this is to support fully relocating garbage collection.  I'm going to try to express the requirements clearly so that you don't need to understand the use case in detail.
> 
> My end goal is to capture a list of registers and/or stack offsets for a list of virtual registers (known explicitly inside SelectionDAGBuilder) at the PC following a call instruction.  In particular, I need to be able to update all physical copies of these virtual registers.  I've decided to approach this by introducing a new pseudo instruction with these semantics, but if anyone has an alternate approach they'd recommend, I'm open to that too.
> 
> Here's the semantics of my 'instruction':
> - It must immediately follow a call instruction, before any return value copies, or frame manipulation.  Not every call has a following safepoint.
> - It has a variable number of arguments (virtual registers).  All can be both read and written.
> - It can handle any combination of stack locations and registers. Ideally, it should not affect register allocation.
> 
> The approach I've taken to date is based in part on the work done for PATCHPOINT.  Here's what I've done:
> - Introduced a SAFEPOINT pseudo instruction
> - Reverse engineered the CALLSEQ_* series of nodes to insert my node after the CALL node in both glue and chain sequences.  (BTW, is there any documentation on the call sequence?  I think I've reverse engineered it correctly, but I'm not completely sure.)
> - Introduced folding logic in foldMemoryOperand (analogous to PATCHPOINT, but which marks both load and store) -- this is where my problem currently lies
> - Inserted code during MCInstLower to record the statepoint
> 
> The problem with this is that a reload from a stack slot will sometimes be inserted between the CALL and the SAFEPOINT.  This is problematic since we are no longer recording the list of locations at the site of the call itself.  If the recorded information is used during the lifetime of the subroutine call, the wrong locations would be updated.  That would be "bad".
> 
> The reason for this is that the folding logic only applies if there's a single use of the physical register.  If there's more than one use, it's assumed to be cheaper to reload than to perform two folded operations against memory.  (I don't know if this is true always, but more importantly for me, it breaks my intended semantics.)
> 
> Does anyone know of a way to avoid the fold step to begin with?  I'd really like the register allocation to not give preference to register uses for this instruction.  If a virtual register is already on the stack, it shouldn't attempt to reload before this instruction. I haven't been able to find the appropriate hook for this.
> 
> I can go ahead and hack the folding code to unconditionally fold into SAFEPOINTs and move the load after the SAFEPOINT, but that feels like an utter hack.  Before going down that road, does anyone have a better suggestion?

Anything we do to avoid the reload is purely an optimization. There is no way to guarantee it. I think you really need your pseudo instruction to include the call, which is exactly what patchpoint was designed for.
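
To make the hazard concrete, here is a minimal toy model in C++. It is not LLVM code: `ToyInst`, `startOffset`, and `recordsAtReturnAddress` are hypothetical names invented for this illustration, and the byte sizes are made up. It only shows that once anything that emits bytes lands between the CALL and a separate SAFEPOINT, the PC at which locations are recorded no longer equals the call's return address.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical, not an LLVM type).
struct ToyInst {
    std::string name;   // e.g. "CALL", "RELOAD", "SAFEPOINT"
    std::size_t bytes;  // encoded size; 0 for a pseudo that emits no code
};

// Byte offset at which the instruction at `index` starts.
// Passing index == code.size() gives the offset just past the last instruction.
std::size_t startOffset(const std::vector<ToyInst>& code, std::size_t index) {
    assert(index <= code.size());
    std::size_t offset = 0;
    for (std::size_t i = 0; i < index; ++i) {
        offset += code[i].bytes;
    }
    return offset;
}

// True if the first SAFEPOINT after the first CALL starts exactly at that
// CALL's return address, i.e. no emitted bytes separate the two.
bool recordsAtReturnAddress(const std::vector<ToyInst>& code) {
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (code[i].name != "CALL") continue;
        const std::size_t returnAddress = startOffset(code, i + 1);
        for (std::size_t j = i + 1; j < code.size(); ++j) {
            if (code[j].name == "SAFEPOINT") {
                return startOffset(code, j) == returnAddress;
            }
        }
        return false;  // this CALL has no following SAFEPOINT
    }
    return false;  // no CALL in the stream
}
```

In this model the stream CALL, SAFEPOINT satisfies the check, while CALL, RELOAD, SAFEPOINT does not. A single pseudo instruction that contains the call leaves no gap for a reload to land in, which is the property patchpoint relies on.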

That said, it would be nice to be more aggressive in folding reloads into stackmaps/patchpoints. It sounds like you are already making use of the foldPatchPoint logic that I added to avoid the reloads. We just need some higher level logic to handle more cases. I’m not sure how hard it will be. See InlineSpiller::foldMemoryOperand.
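
The fold-versus-reload choice described in the quoted message can be sketched roughly as below. This is a toy model under the assumption that the policy is "fold only when the reloaded value has a single use"; it is not the code in InlineSpiller, and `SpillAction`, `chooseAction`, and `rewriteForSpill` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The two outcomes for a spilled value that an instruction wants to read.
enum class SpillAction { FoldIntoUser, ReloadBeforeUser };

// Assumed policy (taken from the description in the thread, not from LLVM
// source): fold the stack slot into the user only if it is the sole use.
SpillAction chooseAction(unsigned usesOfReloadedValue) {
    assert(usesOfReloadedValue > 0);
    return usesOfReloadedValue == 1 ? SpillAction::FoldIntoUser
                                    : SpillAction::ReloadBeforeUser;
}

// Apply the policy to a toy instruction stream of opcode names.
// Folding annotates the user in place; reloading inserts a new
// instruction immediately before it.
std::vector<std::string> rewriteForSpill(std::vector<std::string> code,
                                         std::size_t userIndex,
                                         unsigned usesOfReloadedValue) {
    assert(userIndex < code.size());
    if (chooseAction(usesOfReloadedValue) == SpillAction::FoldIntoUser) {
        code[userIndex] += "(stack slot)";
    } else {
        code.insert(code.begin() + static_cast<std::ptrdiff_t>(userIndex),
                    "RELOAD");
    }
    return code;
}
```

With one use, the stream CALL, SAFEPOINT becomes CALL, SAFEPOINT(stack slot) and the pair stays adjacent. With two uses, it becomes CALL, RELOAD, SAFEPOINT, which is the case the higher level logic would need to handle.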

-Andy

> 
> I'm very open to suggestions here.  If I'm taking the wrong approach or something sounds like it doesn't work the way I've described, please point it out.  I will freely admit this is my first serious endeavour into the x86 backend and that I'm learning as I go.
> 
> Philip
> 
> Note: For the moment, this is all x86 specific.  Most of it could be made architecture independent without too much effort.