[LLVMdev] Stackmaps: caller-save-registers passed as deopt args

Tue Nov 18 11:09:29 PST 2014

Hi,

Sorry for the late reply, I missed this thread.

On Nov 6, 2014, at 4:47 PM, Andrew Trick <atrick at apple.com> wrote:

>> 
>> On Nov 6, 2014, at 1:53 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>> 
>>> An easier option would be to generate a second
>>> pseudo instruction immediately after the patchpoint that simply uses
>>> all the live-on-return values.
>> 
>> Do you mean some variant of
>> 
>> PATCHPOINT ... (live_values = %vreg0, %vreg1)
>> FAKE-USE %vreg1 ;; %vreg1 needs to be live on return
>> 
>> For this kind of a solution, we'll have to prevent the register
>> allocator (and other lowering components, I'm not sure) playing tricks
>> with inserting fills and remats.  IOW, the above code should not
>> compile to
>> 
>> mov 16(%rsp), %r11
>> mov 24(%rsp), %r10
>> inc %r10
>> 
>> PATCHPOINT ... (live_values = %r11, %r10)
>> mov 24(%rsp), %r10
>> inc %r10
>> FAKE-USE %r10
>> 
>> Naively, both the register assignments to %vreg1, in PATCHPOINT and
>> FAKE-USE, are caller-saved and not live on return.  It may be possible
>> to reverse-engineer the computation that will compute the value of
>> %r10, but that's non-trivial.
> 
> I was thinking FAKE_USE would need special handling. It would be bundled before and after register allocation, but unbundled during register allocation so the live interval of the live values would span the call instruction. The problem is that the register allocator itself is free to do something like you’ve shown above, so it’s not a very robust solution and I don’t particularly like it. I thought it would be easier than adding a “LateUse” MachineOperand flag, but now I’m not so sure. Any ideas, Quentin?

We talked with Andy about that.
The late-use flag would work (though I do not know how involved that would be).

Another approach is to extend the current register class constraint mechanism to match the dynamic constraints of your ABI convention.
Right now, we can specify the register class of each operand of an instruction. The idea would be to do the same with the operands of the patchpoint/stackmap intrinsic.
The tricky part is that, unlike for regular instructions, those register classes are dynamically determined.

This approach is more general (we would be able to restrict the allocatable registers of any machine operand), but it is likely more involved.

Here is how it would work on an example:

Right now, we have something like this (where <> denotes the register class):

vreg1<GPR> =
stackmap vreg1<GPR>

The proposed approach would change that into:

vreg1<GPR> =
vreg2<GPR-deopts> = vreg1<GPR>
stackmap vreg2<GPR-deopts>

Then the register coalescer/register allocator would do all the work of propagating the constraints if at all possible, splitting, etc.
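
To make the rewrite above a bit more concrete, here is a toy model of it in Python. It is only a sketch under stated assumptions, not LLVM code: Instr, deopt_class and constrain_stackmaps are made-up names, the "GPR-deopts" class is assumed to be GPR minus the registers clobbered by the call, and the clobber set shown is the x86-64 SysV one purely as an example (in practice it would depend on the calling convention at each site).

```python
from dataclasses import dataclass

# Toy register sets (assumption: x86-64 SysV, rsp left out of the allocatable set).
GPR = frozenset({"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp",
                 "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"})
SYSV_CALL_CLOBBERED = frozenset({"rax", "rcx", "rdx", "rsi", "rdi",
                                 "r8", "r9", "r10", "r11"})


@dataclass
class Instr:
    """Hypothetical stand-in for a machine instruction."""
    opcode: str
    defs: list
    uses: list


def deopt_class(base, clobbered):
    """Registers of `base` that survive a call clobbering `clobbered`."""
    return frozenset(base) - frozenset(clobbered)


def constrain_stackmaps(instrs, reg_class, clobbered):
    """Route every STACKMAP operand through a COPY into a fresh vreg.

    `reg_class` maps a vreg name to its set of allowed physical registers.
    The fresh vreg gets the original class narrowed to call-preserved
    registers; the original vreg keeps its unconstrained class.
    """
    out = []
    counter = 0
    for ins in instrs:
        if ins.opcode != "STACKMAP":
            out.append(ins)
            continue
        new_uses = []
        for v in ins.uses:
            counter += 1
            fresh = f"{v}.deopt{counter}"
            reg_class[fresh] = deopt_class(reg_class[v], clobbered)
            out.append(Instr("COPY", [fresh], [v]))
            new_uses.append(fresh)
        out.append(Instr("STACKMAP", [], new_uses))
    return out


# Usage: vreg1<GPR> = ... ; stackmap vreg1<GPR>
classes = {"vreg1": GPR}
program = [Instr("DEF", ["vreg1"], []), Instr("STACKMAP", [], ["vreg1"])]
rewritten = constrain_stackmaps(program, classes, SYSV_CALL_CLOBBERED)
for ins in rewritten:
    print(ins.opcode, ins.defs, ins.uses)
```

The model only inserts the copy and records the narrowed class. Deciding whether the copy can be coalesced away, or whether a split or spill is needed, is the part that would be left to the coalescer and the register allocator.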

Cheers,
-Quentin

> 
>> One simple solution is to force spilling of live-on-return values at
>> the SelectionDAG layer (we do this for statepoints currently); but
>> that prevents us from using callee-saved registers and may be
>> suboptimal.
> 
> If you need to do that then the design is broken. But it is an option for Kevin to get things working.
> 
>> A completely orthogonal idea: it should also be possible to do
>> deopt-on-return only using live-on-call values if you're willing to
>> use invokes instead of calls.  Instead of
>> 
>> call void @patchpoint(@callee_that_may_deopt, args, live_values)
>> 
>> emit
>> 
>> invoke @callee_that_may_deopt(args)
>> 
>> deopt_pad:
>>   ... landingpad ...
>>   call void @patchpoint(@deoptimize_me, ... live_values ...)
>> 
>> 
>> ... live_values ... in deopt_pad now only needs to be live-on-call.  It
>> can actually even be function arguments to @deoptimize_me -- the
>> register / stack slot shuffling due to a fixed calling convention will
>> no longer happen on the (presumably hot) call, but only when
>> deoptimizing.  You'd have to deoptimize by "throwing" an exception in
>> this case.
> 
> That’s an elegant approach to avoiding the backend problems. The IR representation makes perfect sense. I think your deopt mechanism would need to patch the return address as if an exception were thrown (I guess that’s what you mean by “throwing an exception”). Also, you will end up with exception tables that need to be parsed in addition to stackmaps, which seems fairly horrible. Agree?
> 
> I assume that statepoints have the same issue with both deopt values and GC roots once you stop spilling them. Do you have a preferred or tentative solution for it?
> 
> -Andy
> 
>> 
>> -- Sanjoy
