[llvm-dev] Runtime inlining proof-of-concept (and questions)

Mon Oct 5 09:10:56 PDT 2020

I'm working on some code to re-compile the output from ahead-of-time 
LLVM compilers at runtime, which allows inlining of function calls whose 
targets are only known at runtime. This works by decorating selected 
functions ahead of time, adding code for determining caller-callee 
relationships and invoking the JIT compiler at runtime. The decoration 
works on the IR from the front-end compiler (e.g. clang) before 
generating object code with llc.

If anyone is interested in knowing more about the runtime inlining 
project, it's available on GitHub at https://github.com/drti/drti

Making this work got a bit tricky in places and I have some questions 
about improvements:

1. To figure out when one "decorated" function has called another I pass 
some information in the r14 register as well as in the instruction 
stream accessible via the return address. The code is only supposed to 
work on Linux x86_64 for now. What I wanted to do was extend the 
existing X86TargetMachine to add in these features but I couldn't find 
any way to do this cleanly - I couldn't see any target machine extension 
points like RegisterPass and RegisterStandardPasses for IR passes. What 
I did in the end was implement a new target type "x86_64_drti" which 
delegates as much as possible to the real X86 target obtained via 
TargetRegistry::lookupTarget. This is messy because many of the virtual 
functions from TargetPassConfig that I want to delegate to X86PassConfig 
are protected (e.g. addPreRegAlloc). So I'm wondering if I missed 
something and if not, whether there's a reason the existing target 
machines don't provide any extension points?
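
The access problem in question 1 can be reproduced without LLVM. The 
sketch below is a minimal illustration, not code from the project: 
PassConfigBase, RealPassConfig, HookAccess and DelegatingPassConfig are 
hypothetical stand-ins for TargetPassConfig, X86PassConfig and the 
delegating wrapper. It shows why a direct call to the protected hook on 
another object is rejected, and one standard C++ idiom (forming a 
pointer to member through a helper derived class) that gets around it. 
Whether that idiom is acceptable in this situation is a judgment call.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a pass-config base class with a protected hook.
class PassConfigBase {
public:
    virtual ~PassConfigBase() = default;
    std::vector<std::string> log;  // records which hooks ran on this object
protected:
    virtual void addPreRegAlloc() { log.push_back("base"); }
};

// Hypothetical stand-in for the real target's config, which overrides the hook.
class RealPassConfig : public PassConfigBase {
protected:
    void addPreRegAlloc() override { log.push_back("real"); }
};

// Never instantiated. It exists only so the protected member can be named
// through a derived class, which the access rules permit.
struct HookAccess : PassConfigBase {
    static void callAddPreRegAlloc(PassConfigBase &target) {
        // HookAccess does not override the hook, so this pointer has type
        // void (PassConfigBase::*)(). The call still dispatches virtually.
        void (PassConfigBase::*hook)() = &HookAccess::addPreRegAlloc;
        (target.*hook)();
    }
};

// Hypothetical wrapper that adds its own step and then delegates.
class DelegatingPassConfig : public PassConfigBase {
public:
    explicit DelegatingPassConfig(PassConfigBase &real) : real_(real) {}
    void runPreRegAlloc() { addPreRegAlloc(); }  // public driver for the demo
protected:
    void addPreRegAlloc() override {
        // real_.addPreRegAlloc();  // does not compile: protected member
        //                          // accessed through a PassConfigBase object
        log.push_back("drti-before");
        HookAccess::callAddPreRegAlloc(real_);
    }
private:
    PassConfigBase &real_;
};
```

Constructing a RealPassConfig, wrapping it in a DelegatingPassConfig and 
calling runPreRegAlloc() records "drti-before" on the wrapper and "real" 
on the wrapped object, so the override in the real config is the one 
that runs.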

2. To make it more robust I'd like to convert CALL instructions into a 
PUSH and JMP, so I can fake the return address to point at a block 
containing raw data and a JMP back to the instruction after the original 
CALL. I think we could call this a "return thunk". So instead of CALL 
target [...] I would have something like the below:

     MOV my_thunk, R11
     PUSH R11
     JMP target
my_post_call:
     [...]

Where my_thunk would have this:

     .8byte [...]
my_thunk:
     JMP my_post_call

I don't know if this is even feasible since it splits the basic block 
containing the CALL and quite likely breaks any pre-call or post-call 
handling. To be honest I'm also not sure how this relates to instruction 
"bundles" or whether the CALL is already more complicated than a 
single instruction. Does anyone know what would be involved in this kind 
of transformation from CALL to PUSH and JMP?
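
To make the thunk layout in question 2 concrete, here is a small model 
of its bytes. It is a sketch under stated assumptions, not code from the 
project: it assumes the 8-byte datum sits immediately before the 
my_thunk label (as the listing suggests) and that the JMP uses the 
5-byte rel32 encoding (opcode 0xE9). The function names are 
hypothetical. Nothing is executed as machine code; the bytes are only 
built and inspected.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kPayloadSize = 8;  // matches the .8byte directive
constexpr std::size_t kJmpRel32Size = 5; // 0xE9 opcode + 32-bit displacement

// Builds [8-byte payload][JMP rel32]. The thunk label (the faked return
// address) corresponds to offset kPayloadSize in the returned buffer.
std::vector<std::uint8_t> buildThunk(std::uint64_t payload, std::int32_t rel32) {
    std::vector<std::uint8_t> bytes(kPayloadSize + kJmpRel32Size);
    std::memcpy(bytes.data(), &payload, kPayloadSize);
    bytes[kPayloadSize] = 0xE9;  // JMP rel32
    std::memcpy(bytes.data() + kPayloadSize + 1, &rel32, sizeof rel32);
    return bytes;
}

// What a callee could do with the return address: read the datum that
// sits just before it.
std::uint64_t payloadBefore(const std::uint8_t *returnAddress) {
    std::uint64_t payload;
    std::memcpy(&payload, returnAddress - kPayloadSize, kPayloadSize);
    return payload;
}

// Offset of the JMP destination relative to the thunk label. A rel32
// displacement is measured from the end of the JMP instruction.
std::int64_t jumpTargetFromThunk(const std::uint8_t *thunk) {
    std::int32_t rel32;
    std::memcpy(&rel32, thunk + 1, sizeof rel32);
    return static_cast<std::int64_t>(kJmpRel32Size) + rel32;
}
```

With this layout the callee finds the datum at a fixed negative offset 
from its return address, and an ordinary RET lands on the JMP that 
resumes at my_post_call.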

Regards,

Raoul Gough.