[LLVMdev] Patchpoints used for inline caches and pointless reloads

Frej Drejhammar frej at sics.se
Thu Feb 19 04:46:43 PST 2015


Hi All,

I am observing something I suspect is a misbehaviour of the register
allocator which impacts the performance of patchpoints. This occurs in
the context of an abstract machine which in some places uses inline
caches. The problematic code looks like this:

entry: ; Initialize the abstract machine
  %db = call create_big_seldom_used_database()
  ; do a lot of things that increase register pressure and spill %db
  br label %main_execution_loop;

main_execution_loop:
  ; We do instruction dispatch here
  ...

opcode_a:
  %name0 = ...
  ; Use the database to look up %name0 and then overwrite the patchpoint
  ; with a direct call
  tail call anyregcc void (i64, i32, i8*, i32, ...)*
    @llvm.experimental.patchpoint.void(i64 4711, i32 16, @lookup_and_patch, i32 0,
                                       some_type %name0, some_type %db)
  ...

  %name1 = ...
  ; Use the database to look up %name1 and then overwrite the patchpoint
  ; with a direct call
  tail call anyregcc void (i64, i32, i8*, i32, ...)*
    @llvm.experimental.patchpoint.void(i64 4711, i32 16, @lookup_and_patch, i32 0,
                                       some_type %name1, some_type %db)
  ...

  br label %main_execution_loop;

If I run this through llc (for x86_64) I will frequently see, especially
if I have two cache lookups in the same basic block or low register
pressure, that %db is reloaded from the stack into a register. The
generated code looks like this:

reload %db into reg0
; %name0 is in reg1
call lookup_and_patch(reg1, reg0)
; shadow
; %name1 is in reg2
call lookup_and_patch(reg2, reg0)
; shadow

This is a performance problem as, although the calls to
lookup_and_patch() are overwritten, we will always pay for the, now
useless, load of %db into reg0. If I wanted the arguments to
lookup_and_patch() in registers I would not have used the anyregcc
calling convention. In this toy example lookup_and_patch() only refers to a
single variable, but in my real application it uses multiple values
(most of them spilled) and the slowdown is quite noticeable (overheads
of up to 600% for some opcodes).
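
To make the cost concrete, here is a toy Python model of a single
inline-cache site. It is only a sketch and not LLVM code: the class, the
frame dictionary, the "db_slot" key and the reload counter are all made
up for illustration. It assumes the slow path could read %db directly
from its spill slot, and it counts how often %db is reloaded into a
register when a reload is placed ahead of the site and when it is not.

```python
# Toy cost model of one inline-cache site. Every name here is hypothetical.

class InlineCacheSite:
    def __init__(self):
        self.target = None   # None until the lookup-and-patch step has run once
        self.reloads = 0     # reloads of db from its spill slot into a register

    def execute(self, name, frame, reload_first):
        if reload_first:
            # Models a reload placed ahead of the patchpoint on every visit.
            _reg0 = frame["db_slot"]
            self.reloads += 1
        if self.target is None:
            # Slow path, taken once: look the name up and patch the site.
            # Assumption: the value can be read straight from the spill slot.
            self.target = frame["db_slot"][name]
        # Fast path after patching: a direct call that never touches db.
        return self.target()


def run(reload_first, iterations=1000):
    frame = {"db_slot": {"opcode_a_name": lambda: 42}}
    site = InlineCacheSite()
    results = [site.execute("opcode_a_name", frame, reload_first)
               for _ in range(iterations)]
    return site.reloads, results[-1]


print(run(reload_first=True))    # (1000, 42)
print(run(reload_first=False))   # (0, 42)
```

In this model both variants compute the same result, but the first one
pays for one reload per execution even though only the very first
execution needs the database.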

To me this looks like the register allocator is too eager to load values
which are only used by anyregcc patchpoints into registers, or is this
the intended behavior of anyregcc patchpoints?

I would be grateful for suggestions of how I could modify the register
allocator (RAGreedy) to avoid reloading values when they are only used
by instructions which are anyregcc patchpoints. During the last two
weeks I have made a couple of unsuccessful attempts at that and could
really use some pointers from someone who understands it.

Attached is the smallest example I have managed to find which shows
the problem.

Regards,

--Frej

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bug.ll
Type: application/octet-stream
Size: 808 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150219/bce5d0c5/attachment.obj>

