[LLVMdev] Representing a safepoint as an instruction in the x86 backend?

Tue Feb 25 17:32:21 PST 2014

I've got a pseudo instruction with some tricky semantics I need help 
figuring out how to encode properly.  For those interested, this is to 
support fully relocating garbage collection.  I'm going to try to 
express the requirements clearly so that you don't need to understand 
the use case in detail.

My end goal is to capture a list of registers and/or stack offsets for a 
list of virtual registers (known explicitly inside SelectionDAGBuilder) 
at the PC following a call instruction.  In particular, I need to be 
able to update all physical copies of these virtual registers.  I've 
decided to approach this by introducing a new pseudo instruction with 
these semantics, but if anyone has an alternate approach they'd 
recommend, I'm open to that too.

Here's the semantics of my 'instruction':
- It must immediately follow a call instruction, before any return value 
copies or frame manipulation.  Not every call has a following safepoint.
- It has a variable number of arguments (virtual registers).  All can be 
both read and written.
- It can handle any combination of stack locations and registers. 
Ideally, it should not affect register allocation.
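
To make the requirements above concrete, here is a minimal standalone sketch of the information one safepoint would have to record.  It is a toy model, not LLVM code: Location, SafepointRecord, and locationsToUpdate are hypothetical names invented for illustration, and the register numbers and frame offsets are arbitrary.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model; none of these names are LLVM APIs.
struct Location {
  enum Kind { Register, StackSlot } kind;
  int64_t value; // physical register number, or frame offset in bytes
};

// One record per safepoint: the PC following the call, plus every
// physical copy of each tracked virtual register.
struct SafepointRecord {
  uint64_t pcAfterCall;
  std::map<unsigned, std::vector<Location>> copies; // vreg -> locations
};

// Number of physical locations a relocating collector would have to
// rewrite at this safepoint (every copy of every tracked vreg).
inline std::size_t locationsToUpdate(const SafepointRecord &record) {
  std::size_t count = 0;
  for (const auto &entry : record.copies)
    count += entry.second.size();
  return count;
}
```

The point of keeping a vector per virtual register is the "all physical copies" requirement: a value that lives in both a register and a spill slot at the call needs both entries updated.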

The approach I've taken to date is based in part on the work done for 
PATCHPOINT.  Here's what I've done:
- Introduced a SAFEPOINT pseudo instruction
- Reverse engineered the CALLSEQ_* series of nodes to insert my node 
after the CALL node in both glue and chain sequences.  (BTW, is there 
any documentation on the call sequence?  I think I've reverse engineered 
it correctly, but I'm not completely sure.)
- Introduced folding logic in foldMemoryOperand (analogous to 
PATCHPOINT, but which marks both load and store) -- this is where my 
problem currently lies
- Inserted code during MCInstLower to record the statepoint

The problem with this is that a reload from a stack slot will sometimes 
be inserted between the CALL and the SAFEPOINT.  This is problematic 
since we are no longer recording the list of locations at the site of 
the call itself.  If the recorded information is used during the 
lifetime of the subroutine call, the wrong locations would be updated.  
That would be "bad".
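
The invariant being violated can be stated as a small check over a toy instruction stream.  This is only an illustrative sketch: the Op enum and safepointsFollowCalls are hypothetical names, and a real check would walk machine instructions in a basic block rather than a vector of tags.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical opcode tags for a toy instruction stream (not LLVM opcodes).
enum class Op { Call, Safepoint, Reload, Copy, Other };

// Returns true if every Safepoint is directly preceded by a Call,
// i.e. nothing (such as a reload) was scheduled in between.
inline bool safepointsFollowCalls(const std::vector<Op> &block) {
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (block[i] != Op::Safepoint)
      continue;
    if (i == 0 || block[i - 1] != Op::Call)
      return false;
  }
  return true;
}
```

Under this model, the sequence Call, Safepoint, Copy passes, while Call, Reload, Safepoint fails, which is the situation described above.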

The reason for this is that the folding logic only applies if there's a 
single use of the physical register.  If there's more than one use, it's 
assumed to be cheaper to reload than to perform two folded operations 
against memory.  (I don't know if this is always true, but more 
importantly for me, it breaks my intended semantics.)

Does anyone know of a way to avoid the fold step to begin with?  I'd 
really like the register allocation to not give preference to register 
uses for this instruction.  If a virtual register is already on the 
stack, the allocator shouldn't reload it before this instruction.  I haven't 
been able to find the appropriate hook for this.

I can go ahead and hack the folding code to unconditionally fold into 
SAFEPOINTs and move the load after the SAFEPOINT, but that feels like an 
utter hack.  Before going down that road, does anyone have a better 
suggestion?
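
For reference, here is the decision as I understand it, written as a toy model together with the override I'm considering.  This is a sketch under my own assumptions, not the actual register allocator logic: UserKind, SpillAction, defaultAction, and actionWithSafepointOverride are hypothetical names, and the sketch covers only the fold-or-reload choice, not moving the load after the SAFEPOINT.

```cpp
#include <cassert>

// Hypothetical tags; not LLVM types.
enum class UserKind { Safepoint, Other };
enum class SpillAction { FoldIntoUser, ReloadBeforeUser };

// The rule as described in this post: fold the stack slot into the
// using instruction only when the value has a single use; otherwise
// reload it into a register first.
inline SpillAction defaultAction(unsigned numUses) {
  assert(numUses > 0 && "a spilled value being considered has a use");
  return numUses == 1 ? SpillAction::FoldIntoUser
                      : SpillAction::ReloadBeforeUser;
}

// The proposed hack: a safepoint user always folds, regardless of how
// many other uses exist, so no reload lands between CALL and SAFEPOINT.
inline SpillAction actionWithSafepointOverride(UserKind user,
                                               unsigned numUses) {
  if (user == UserKind::Safepoint)
    return SpillAction::FoldIntoUser;
  return defaultAction(numUses);
}
```

The override special-cases one instruction kind inside otherwise generic logic, which is why it feels like a hack.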

I'm very open to suggestions here.  If I'm taking the wrong approach or 
something sounds like it doesn't work the way I've described, please 
point it out.  I will freely admit this is my first serious endeavour 
into the x86 backend and that I'm learning as I go.

Philip

Note: For the moment, this is all x86 specific.  Most of it could be 
made architecture independent without too much effort.