[LLVMdev] Patchpoints used for inline caches and pointless reloads
Philip Reames
listmail at philipreames.com
Mon Feb 23 17:02:27 PST 2015
Just to make sure we're on the same page, let me try to rephrase your
problem:
"After the initial call is patched out of existence, the register
spilling which was reasonable for the initial call becomes a performance
liability for the patched in assembly."
Is that a correct restatement?
If so, I'd suggest you look into not using the anyregcc convention. My
understanding is that that convention is intended for calls where values
are desired in registers over the call. This is not what you want.
If you used the standard C convention, the calling convention would
place the first X arguments in registers (X is 6, I think, for integer
arguments on x86-64) and the rest on the stack. If you pad your actual
arguments with undef such that the arguments that are used by your patch
mechanism are after that threshold, you might see what you're looking
for. However, I suspect you'd still see a move from the stack to a
register, and then back to a different slot due to the calling
convention adjustments.
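
A rough, untested sketch of the padding idea, in the same IR syntax as
your example. It assumes i64 for your "some_type", the x86-64 SysV
convention (six integer argument registers), and a hypothetical
eight-argument signature for @lookup_and_patch; none of that comes from
your code:

    ; Hypothetical sketch: default C calling convention, not anyregcc.
    call void (i64, i32, i8*, i32, ...)*
      @llvm.experimental.patchpoint.void(i64 4711, i32 16,
          i8* bitcast (void (i64, i64, i64, i64, i64, i64, i64, i64)*
                       @lookup_and_patch to i8*),
          i32 8,                            ; all eight operands are call arguments
          i64 undef, i64 undef, i64 undef,  ; padding for rdi, rsi, rdx
          i64 undef, i64 undef, i64 undef,  ; padding for rcx, r8, r9
          i64 %name0, i64 %db)              ; 7th and 8th arguments: on the stack

The six undef operands only exist to use up the argument registers, so
that %name0 and %db become stack arguments.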
What I think you really want is a parameter attribute that we don't
have. You want to be able to tag a particular parameter as being
"anylocation" (analogous to the anyreg of anyregcc). In concept, an
"anystack" notion is similar to what we do for gc.statepoints (and, I
think, excess arguments on patchpoints?). That's still not quite as
strong as what you want though...
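
For reference, the "excess arguments" form might look like the untested
sketch below. As I read the stack map documentation, operands after the
first numArgs are live values whose locations are recorded in the stack
map, and a recorded location can be a register, a stack slot, or a
constant. The i64 types and the one-argument @lookup_and_patch
signature are assumptions for illustration:

    ; Hypothetical sketch: %db is a live value, not a call argument.
    call void (i64, i32, i8*, i32, ...)*
      @llvm.experimental.patchpoint.void(i64 4711, i32 16,
          i8* bitcast (void (i64)* @lookup_and_patch to i8*),
          i32 1,       ; only the first variadic operand is a call argument
          i64 %name0,  ; call argument, placed by the calling convention
          i64 %db)     ; live value, location recorded in the stack map

Whether the register allocator would actually leave %db in its spill
slot here is not something this sketch establishes.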
If you do decide to pursue the "anylocation" parameter attribute,
definitely keep me informed. This is analogous to something we want for
gc.statepoints as well.
p.s. Without knowing what DB actually is, it really looks like you might
be able to accomplish the same thing using the "ID" field of the
patchpoint. Have you looked into that?
Philip
On 02/19/2015 04:46 AM, Frej Drejhammar wrote:
> Hi All,
>
> I am observing something I suspect is a misbehaviour of the register
> allocator which impacts the performance of patchpoints. This occurs in
> the context of an abstract machine which in some places uses inline
> caches. The problematic code looks like this:
>
> entry: ; Initialize the abstract machine
>   %db = call create_big_seldom_used_database()
>   ; do a lot of things which increases register pressure and spills %db
>   br label %main_execution_loop;
>
> main_execution_loop:
>   ; We do instruction dispatch here
>   ...
>
> opcode_a:
>   %name0 = ...
>   ; Use the database to look up %name0 and then overwrite the patchpoint
>   ; with a direct call
>   tail call anyregcc void (i64, i32, i8*, i32, ...)*
>     @llvm.experimental.patchpoint.void(i64 4711, i32 16, @lookup_and_patch, i32 0,
>                                        some_type %name0, some_type %db)
>   ...
>
>   %name1 = ...
>   ; Use the database to look up %name1 and then overwrite the patchpoint
>   ; with a direct call
>   tail call anyregcc void (i64, i32, i8*, i32, ...)*
>     @llvm.experimental.patchpoint.void(i64 4711, i32 16, @lookup_and_patch, i32 0,
>                                        some_type %name1, some_type %db)
>   ...
>
>   br label %main_execution_loop;
>
> If I run this through llc (for x86_64) I will frequently see, especially
> if I have two cache lookups in the same basic block or low register
> pressure, that %db is loaded from the stack and into a register. The
> generated code looks like this:
>
> reload %db into reg0
> ; %name0 is in reg1
> call lookup_and_patch(reg1, reg0)
> ; shadow
> ; %name1 is in reg2
> call lookup_and_patch(reg2, reg0)
> ; shadow
>
> This is a performance problem as, although the calls to
> lookup_and_patch() are overwritten, we will always pay for the, now
> useless, load of %db into reg0. If I wanted the arguments to
> lookup_and_patch() in registers I would not have used the anyregcc
> calling convention. In this toy example lookup_and_patch() only refers to a
> single variable, but in my real application it uses multiple values
> (most of them spilled) and the slowdown is quite noticeable (overheads
> of up to 600% for some opcodes).
>
> To me this looks like the register allocator is too eager to load values
> which are only used by anyregcc patchpoints into registers, or is this
> the intended behavior of anyregcc patchpoints?
>
> I would be grateful for suggestions of how I could modify the register
> allocator (RAGreedy) to avoid reloading values when they are only used
> by instructions which are anyregcc patchpoints. During the last two
> weeks I have made a couple of unsuccessful attempts at that and could
> really use some pointers from someone who understands it.
>
> Attached is the smallest example I have managed to find which shows
> the problem.
>
> Regards,
>
> --Frej
>
>
>