Patch: StackMap: Use IndirectMemRefOp more.

Fri Nov 7 03:22:32 PST 2014

2014-11-07 4:45 GMT+01:00 Andrew Trick <atrick at apple.com>:
>
>> On Nov 4, 2014, at 3:21 PM, Gustaw Smolarczyk <wielkiegie at gmail.com> wrote:
>>
>> Hi Philip,
>>
>> 2014-10-30 22:33 GMT+01:00 Philip Reames <listmail at philipreames.com>:
>>> Gustaw,
>>>
>>> I looked over part of your first patch and like the general direction you're
>>> taking.  Before getting in to the details, I want to ask a couple of
>>> question about your intent.
>>>
>>> It seems like your arbitrary base register matching could easily match
>>> non-stack locations.  I'm assuming this is by design?  What's the original
>>> source construct you're hoping to represent better?
>>>
>>> From the code, I'm guessing this would match things like:
>>> struct S { int a; }
>>> S* s = ...
>>> patchpoint(..., s->a, ...)
>>>
>>> Is this what you're aiming for?
>>
>> In general, yes. This is the kind of code I was hoping to optimize the
>> references to. I wanted not to waste registers/stack spill locations
>> for the values that are already in some structure, especially when
>> many entries of the same structure need to be referenced.
>
> Thanks for sharing your use case and patch. It's really interesting.
>
>> My original problem involved a processor emulator and a state
>> structure with a register file inside. Each LLVM function would
>> represent a sequence of instructions that would be JIT compiled and
>> executed more efficiently than going through interpreter. However, I
>> still wanted to have precise interrupts without any large penalty for
>> a common case. A memory access instruction could fail and fire a
>> SIGSEGV signal, and the signal handler would need to reconstruct the
>> emulated CPU state from host CPU state. I wanted to use the patchpoint
>> intrinsic in order to do that (I would need to emit the memory access
>> instruction in assembly by myself in the patchpoint area).
>> Unfortunately, referencing the large register file using the current
>> implementation would be prohibitively expensive - I don't want to
>> touch any cold registers from the register file at all.
>
> I'm not sure what "cold" registers are. Maybe they're registers that you know by convention are not live into the region and have never been written to.

They are the registers in the file that had not been written to. Other
registers are "live" in LLVM values and would be written back to the
register file at the end of the LLVM function representing a sequence.
This way I don't have to write them to the register file before any
faulting instruction.

>
>> Later on (after I submitted this patch) I came up with a very simple
>> solution. I don't really have to reference the cold registers. I know
>> their location and know which ones are cold so I only need to
>> reference already used registers (that would currently be "in flight"
>> in LLVM values). So for my particular use case, there really doesn't
>> seem to be a need for this patch anymore. For others, just referencing
>> the pointer to the structure could be a simple workaround.
>
> Yes. It is the runtime's job to just store a reference to S. That's opaque to LLVM.
>
>> However, it
>> won't always be possible or desirable for the runtime to track which
>> values would benefit from such treatment and more dynamic approach
>> like the one this patch introduces might be beneficial.
>
> I'm glad you have a workaround. I'm not clear on the solution. Are you still using a stack map to find the values that need to be in each register?

Only live registers are now referenced by the patchpoint intrinsic. I
couldn't use stackmaps, since I need to know their location during the
whole memory (or other faulting) instruction.
An emulated processor memory read instruction (at address pointer by
register r10) could be implemented as an LLVM read:

    %ptr.8 = getelementptr i8* %memory_base, i32 %r10
    %ptr = bitcast i8* %ptr.8 to i32*
    %val = load i32* %ptr

If I put the stackmap intrinsic just before or just after the load
instruction, it would still be possible for the stackmap references to
be wrong during the load instruction (when the page fault happens).
The (not so clean) solution I came up with is the following:

    %val.64 = call anyregcc @llvm.experimental.patchpoint.i64(i64 123,
i32 8, i8* null, i32 2, i8* %memory_base, i32 %r10, <other live
regs>...)
    %val = trunc i64 %val.64 to i32

And immediately after compilation I would put an x86 instruction into
the patched area that would load i32 from %memory_base+%r10 into
%val.64 (they would all be in registers). This way, I am 100% sure
where the live registers are if this load instruction faulted.

>
>>
>>>
>>> More generally, any information about motivation you can provide would be
>>> appreciated.  Once we have that, we can think about the best approach for
>>> the problem.
>>>
>>> If I'm understanding this properly, I think we'd need to make a few
>>> adjustments to make this viable.  First, we probably need a new operand
>>> type.  Indirect is documented as being a stack spill, which this is not.
>>> Second, we need some type of flag to control the introduction of such
>>> operands.  I know that my runtime (for one) would need to be extended to
>>> handle such cases.  Having a way to disable this is a must.
>>
>> Of course, I understand the need for the new kind of reference. The
>> name "IndirectMemRefOp" seems to be a bit misleading (I originally
>> assumed, the LLVM could do what this patch implements), it would be
>> better to call it "SpilledMemRefOp", or something like that. If
>> renaming the original type is not possible, then we would need to
>> invent a new name for this reference kind, like "FreeIndirectMemRefOp"
>> or simply "MemRefOp”.
>
> From the point of view of stack maps, they are not "MemRefs" at all. They are just stack slots. But they are naturally represented as MemRefs in the machine instruction.
>
> The LLVM language reference should make it clear what a Direct vs. Indirect location is. They are both stack slots, but are interpreted differently by the runtime.
> http://llvm.org/docs/StackMaps.html

The naming confused my a little, but in the end they all reference
"stack locations" (like automatic C/C++ variables, which could live on
stack or in registers at any given time). My proposal would extend
their scope beyond just these stack locations and make them reference
any memory. But maybe complicating an already complicated interface
would not be beneficial enough for that to make sense, as I see it
now.

>
>>>
>>> Once we've gotten those issues settled, splitting the SelectionDAGBuilder
>>> and FastIsel changes for review would probably be a good idea.  That's
>>> really up to the preferences of Juergen since he's most familiar with these
>>> pieces.
>>>
>>> Philip
>>
>> I understand that this idea is not that simple to implement properly.
>> The biggest problem is the register allocator. There needs to be a way
>> to spill the IndirectMemRefOps when it runs out of registers for the
>> base register reference (it would need to change to a stack spill).
>> The current implementation that counts unique values for
>> IndirectMemRefOp references is not particularly clean. Also, the
>> FrameIndex nodes need to be handled properly, since they are simple
>> offsets in the resulting code. If I understand correctly, they need an
>> already allocated stack in order to resolve them to offsets, after
>> which point any spill is impossible... The logic would be similar to
>> the DirectMemRefOps handling.
>>
>> I am not familiar enough with the LLVM code in order to solve these
>> problems. If you are still interested in merging this code, I would
>> appreciate if you provided me with some pointers.
>
> Direct stackmap entries are currently used for a special case where both an alloca and stackmap are positioned in the entry block and we don't allow spilling. The semantics are different than normal stackmap values and I want to add a new intrinsic just for this special case to avoid confusion.
>
> More generally, direct stackmap entries could be used for encoding the address of a stack slot without requiring the address to be stored in a stack slot or register. This is a minor feature that provides a little flexibility in encoding stack map locations. There's no need to keep a base register alive, as we always assume a frame pointer is available.
>
> I don't think it's LLVM's job to do what you were trying to do. You did the right thing by working around it in the runtime.

You might be right, since I cannot come up with a good enough
justification for complicating the interface now.
Thank you for the explanation.

Regards,
Gustaw

>
> -Andy
>
>>
>> Regards,
>> Gustaw
>>
>>>
>>>
>>> On 09/21/2014 12:33 PM, Gustaw Smolarczyk wrote:
>>>
>>> Hello once again,
>>>
>>> I managed to create a second version of my patch. This time, I don't
>>> disallow FrameIndex from the IndirectMemRefOp optimization. Instead, I track
>>> the number of unique SDValues with the reasoning that each of them will
>>> require a register. In case the number of them grows too much, I fall back
>>> to the old logic that handles registers spills gracefully (instead of the
>>> register allocator error).
>>>
>>> Regards,
>>> Gustaw
>>>
>>> 2014-09-21 19:11 GMT+02:00 Gustaw Smolarczyk <wielkiegie at gmail.com>:
>>>>
>>>> Hello all.
>>>>
>>>> This is my first patch to the LLVM project. I hope you will find it
>>>> useful.
>>>>
>>>> This patch improves the experimental stackmap and patchpoint intrinsics.
>>>> It tries to use the IndirectMemRefOp in more cases. Currently, they are
>>>> only emitted for values spilled to a local stack. However, they could be
>>>> applied for any [Register + Offset] memory load.
>>>>
>>>> I had problems with indirect FrameIndex references. Because my code can't
>>>> retrieve the offset of such a memory reference, it allocates a register for
>>>> every such access, which caused one of the tests to fail. This is why I
>>>> disabled it for this case.
>>>>
>>>> I am not very familiar with the LLVM code base. I might have done
>>>> something in a roundabout or simply incorrect way. I would be glad if I had
>>>> been given some pointers.
>>>>
>>>> Regards,
>>>> Gustaw
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>