Patch: StackMap: Use IndirectMemRefOp more.

Tue Nov 25 18:35:59 PST 2014

On 11/04/2014 03:21 PM, Gustaw Smolarczyk wrote:
> Hi Philip,
>
> 2014-10-30 22:33 GMT+01:00 Philip Reames <listmail at philipreames.com>:
>> Gustaw,
>>
>> I looked over part of your first patch and like the general direction you're
>> taking.  Before getting in to the details, I want to ask a couple of
>> question about your intent.
>>
>> It seems like your arbitrary base register matching could easily match
>> non-stack locations.  I'm assuming this is by design?  What's the original
>> source construct you're hoping to represent better?
>>
>>  From the code, I'm guessing this would match things like:
>> struct S { int a; }
>> S* s = ...
>> patchpoint(..., s->a, ...)
>>
>> Is this what you're aiming for?
> In general, yes. This is the kind of code I was hoping to optimize the
> references to. I wanted not to waste registers/stack spill locations
> for the values that are already in some structure, especially when
> many entries of the same structure need to be referenced.
>
> My original problem involved a processor emulator and a state
> structure with a register file inside. Each LLVM function would
> represent a sequence of instructions that would be JIT compiled and
> executed more efficiently than going through interpreter. However, I
> still wanted to have precise interrupts without any large penalty for
> a common case. A memory access instruction could fail and fire a
> SIGSEGV signal, and the signal handler would need to reconstruct the
> emulated CPU state from host CPU state. I wanted to use the patchpoint
> intrinsic in order to do that (I would need to emit the memory access
> instruction in assembly by myself in the patchpoint area).
> Unfortunately, referencing the large register file using the current
> implementation would be prohibitively expensive - I don't want to
> touch any cold registers from the register file at all.
>
> Later on (after I submitted this patch) I came up with a very simple
> solution. I don't really have to reference the cold registers. I know
> their location and know which ones are cold so I only need to
> reference already used registers (that would currently be "in flight"
> in LLVM values). So for my particular use case, there really doesn't
> seem to be a need for this patch anymore. For others, just referencing
> the pointer to the structure could be a simple workaround. However, it
> won't always be possible or desirable for the runtime to track which
> values would benefit from such treatment and more dynamic approach
> like the one this patch introduces might be beneficial.
As Andy said, I think you found the right solution for the problem you 
were trying to solve.  I do some value in having an arbitrary register 
offset mode, but not enough to purse this further at this time.
>
>> More generally, any information about motivation you can provide would be
>> appreciated.  Once we have that, we can think about the best approach for
>> the problem.
>>
>> If I'm understanding this properly, I think we'd need to make a few
>> adjustments to make this viable.  First, we probably need a new operand
>> type.  Indirect is documented as being a stack spill, which this is not.
>> Second, we need some type of flag to control the introduction of such
>> operands.  I know that my runtime (for one) would need to be extended to
>> handle such cases.  Having a way to disable this is a must.
> Of course, I understand the need for the new kind of reference. The
> name "IndirectMemRefOp" seems to be a bit misleading (I originally
> assumed, the LLVM could do what this patch implements), it would be
> better to call it "SpilledMemRefOp", or something like that. If
> renaming the original type is not possible, then we would need to
> invent a new name for this reference kind, like "FreeIndirectMemRefOp"
> or simply "MemRefOp".
I agree that the naming scheme is mildly confusing, but that's an 
entirely separate discussion.  The updated docs helped a lot on this.
>
>> Once we've gotten those issues settled, splitting the SelectionDAGBuilder
>> and FastIsel changes for review would probably be a good idea.  That's
>> really up to the preferences of Juergen since he's most familiar with these
>> pieces.
>>
>> Philip
> I understand that this idea is not that simple to implement properly.
> The biggest problem is the register allocator. There needs to be a way
> to spill the IndirectMemRefOps when it runs out of registers for the
> base register reference (it would need to change to a stack spill).
> The current implementation that counts unique values for
> IndirectMemRefOp references is not particularly clean. Also, the
> FrameIndex nodes need to be handled properly, since they are simple
> offsets in the resulting code. If I understand correctly, they need an
> already allocated stack in order to resolve them to offsets, after
> which point any spill is impossible... The logic would be similar to
> the DirectMemRefOps handling.
>
> I am not familiar enough with the LLVM code in order to solve these
> problems. If you are still interested in merging this code, I would
> appreciate if you provided me with some pointers.
I'm not expecting any further work on this at this time, but let me save 
some thoughts for anyone interested in this in the future.

I suspect getting the eager lowering approach working through regalloc 
would be quite hard.  I can see one of two approaches to solving the 
problem:
- Introduce a layer of indirection.  We define our generic memory 
reference type such that it's base doesn't have to be in a register.  If 
the register allocator decides to spill it, we "simply" update the 
reference to refer to the stack slot.  This would involve teaching the 
register allocator how to 'fold' such uses.
- Treat an arbitrary memory reference as a best effort optimization.  
Let the address be spilled to the stack by default. A later pass (after 
regalloc) could try to fold away stores to the stack where the stack 
slots only use is as an operand to the stackmap.

I suspect that the first would be more effective, but that the second 
would be far easier to implement.  I'm not sure which is actually the 
better implementation strategy.
>
> Regards,
> Gustaw
>
>>
>> On 09/21/2014 12:33 PM, Gustaw Smolarczyk wrote:
>>
>> Hello once again,
>>
>> I managed to create a second version of my patch. This time, I don't
>> disallow FrameIndex from the IndirectMemRefOp optimization. Instead, I track
>> the number of unique SDValues with the reasoning that each of them will
>> require a register. In case the number of them grows too much, I fall back
>> to the old logic that handles registers spills gracefully (instead of the
>> register allocator error).
>>
>> Regards,
>> Gustaw
>>
>> 2014-09-21 19:11 GMT+02:00 Gustaw Smolarczyk <wielkiegie at gmail.com>:
>>> Hello all.
>>>
>>> This is my first patch to the LLVM project. I hope you will find it
>>> useful.
>>>
>>> This patch improves the experimental stackmap and patchpoint intrinsics.
>>> It tries to use the IndirectMemRefOp in more cases. Currently, they are
>>> only emitted for values spilled to a local stack. However, they could be
>>> applied for any [Register + Offset] memory load.
>>>
>>> I had problems with indirect FrameIndex references. Because my code can't
>>> retrieve the offset of such a memory reference, it allocates a register for
>>> every such access, which caused one of the tests to fail. This is why I
>>> disabled it for this case.
>>>
>>> I am not very familiar with the LLVM code base. I might have done
>>> something in a roundabout or simply incorrect way. I would be glad if I had
>>> been given some pointers.
>>>
>>> Regards,
>>> Gustaw
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>