[LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Tue Jul 28 01:33:38 PDT 2015

Hi Lang,

Thank you very much for the detailed reply! I will take a closer
look at it and hopefully start implementing my task
based on the Orc API.

Btw, by "code cache" I meant the ability to run the
executed code from a place where I can later
patch it: redirect calls to new versions of functions
and store those new versions there as well.

Thanks again,
Revital

From:   Lang Hames <lhames at gmail.com>
To:     Revital1 Eres/Haifa/IBM at IBMIL
Cc:     LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
Date:   28/07/2015 05:58 AM
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time

Hi Revital,

What do you mean by "code cache"? Orc (and MCJIT) does have the concept of 
an ObjectCache, which is a long-lived, potentially persistent, compiled 
version of some IR. It's not a key component of the JIT though: Most 
clients run without a cache attached and just JIT their code from scratch 
in each session.

Recompilation is orthogonal to caching. There is no in-tree support for 
recompilation yet. There are several ways that it could be supported, 
depending on what security / performance trade-offs you're willing to 
make, and how deep into the LLVM code you want to get. As things stand at 
the moment all function calls in the lazy JIT are indirected via function 
pointers. We want to add support for patchable call-sites, but this hasn't 
been implemented yet. The indirect calls make recompilation reasonably 
easy: You could add a transform layer on top of the CompileCallbackLayer 
which would modify each function like this:

void foo$impl() {          void foo$impl() {
  // foo body        ->      if (trigger_condition) {
}                              auto fooOpt = jit_recompile_hot(&foo);
                               fooOpt();
                               return; // don't also run the old body
                             }
                             // foo body
                           }

You would implement the jit_recompile_hot function yourself in your JIT 
and make it available to JIT'd code via the SymbolResolver. When the 
trigger condition is met you'll get a call to recompile foo, at which 
point you: (1) Add the IR for foo to a 2nd IRCompileLayer that has been 
configured with a higher optimization level, (2) look up the address of 
the optimized version of foo, and (3) update the function pointer for foo 
to point at the optimized version. The process for patchable callsites 
should be fairly similar once they're available, except that you'll 
trigger a call-site update rather than rewriting a function pointer.
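
The function-pointer variant of these steps can be modelled without any LLVM code at all. The following is only a minimal sketch under stated assumptions: every name in it (fooPtr, fooBaseline, fooOptimized, HotThreshold, callFoo) is hypothetical, the "recompile" step simply returns a second hand-written function where a real JIT would add the IR to the higher-optimization layer and look up the new address, and the trigger condition is assumed to be a plain call counter.

```cpp
#include <atomic>

// Hypothetical model only; none of these names are Orc API.
using FooFn = int (*)(int);

int fooBaseline(int X);   // stands in for the unoptimized JIT'd body
int fooOptimized(int X);  // stands in for the re-JIT'd, optimized body

// All callers go through this pointer, like the lazy JIT's indirect calls.
std::atomic<FooFn> fooPtr{fooBaseline};
std::atomic<int> fooCalls{0};
constexpr int HotThreshold = 3;  // assumed trigger: third call

// Stand-in for steps (1)-(3). A real JIT would compile foo's IR at a
// higher optimization level here; this model just swaps in fooOptimized.
FooFn jit_recompile_hot(std::atomic<FooFn> &Slot) {
  Slot.store(fooOptimized);  // step (3): update the function pointer
  return fooOptimized;
}

int fooBaseline(int X) {
  if (++fooCalls >= HotThreshold)         // trigger condition
    return jit_recompile_hot(fooPtr)(X);  // run the new version instead
  return X * 2;                           // foo body
}

int fooOptimized(int X) { return X + X; }  // same result as the baseline

// What a call site looks like: load the pointer, then call through it.
int callFoo(int X) { return fooPtr.load()(X); }
```

With this model, the first two calls to callFoo run the baseline body, the third triggers the swap and runs the optimized version, and every later call goes straight to fooOptimized. The atomic pointer only makes the swap itself safe; it does nothing about reclaiming the old implementation.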

This neglects all sorts of fun details (threading, garbage collection of 
old function implementations), but hopefully it gives you a place to 
start. 

Regarding laziness, as Hal mentioned you'll have to provide some target 
support for PowerPC to support lazy compilation. For a rough guide you can 
check out the X86_64 support code in 
llvm/include/llvm/ExecutionEngine/Orc/OrcTargetSupport.h and 
llvm/lib/ExecutionEngine/Orc/OrcTargetSupport.cpp.

There are two methods that you'll need to implement: 
insertCompileCallbackTrampoline and insertResolverBlock. These work 
together to enable lazy compilation. Both of these methods inject blobs of 
target-specific code into the JIT process. To do this (at least for now) 
I make use of a handy feature of LLVM IR: You can write raw assembly code 
directly into a bitcode module ("module-level asm"). If you look at the 
X86 implementation of each of these methods you'll see they're written in 
terms of string-streams building up a string of assembly which will be 
handed off to the JIT to compile like any other code.

The first blob that you need to be able to output is the resolver block. 
The purpose of the resolver block is to save program state and call back 
into the JIT to trigger lazy compilation of a function. When the JIT is 
done compiling the function it returns the address of the compiled 
function to the resolver block, and the resolver block returns to the 
compiled function (rather than its original return address).

Because all functions share the same resolver block, the JIT needs some 
way to distinguish them, which is where the trampolines come in. The JIT 
emits one trampoline per function and each trampoline just calls the 
resolver block. The return address of the call in each trampoline provides 
the unique address that the JIT associates with the to-be-compiled 
functions. The CompileCallbackManager manages this association between 
trampolines and functions for you; you just need to provide the 
resolver/trampoline primitives.
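
To make the association concrete, here is a rough model of the bookkeeping. It is a sketch, not the CompileCallbackManager interface: the class TrampolineTable, its methods, and the Address alias are all made up for illustration, and it assumes each trampoline is resolved at most once.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>

// Hypothetical model; not the CompileCallbackManager API.
using Address = std::uint64_t;
// Compiles one function and returns the address of its body.
using CompileAction = std::function<Address()>;

class TrampolineTable {
public:
  // Associate the return address of a trampoline's call with the work
  // needed to compile the function that trampoline stands for.
  void registerTrampoline(Address ReturnAddr, CompileAction Compile) {
    Pending[ReturnAddr] = std::move(Compile);
  }

  // What the resolver block asks for: given the return address pushed by
  // the trampoline's call, compile the function and return its address.
  Address resolve(Address ReturnAddr) {
    auto It = Pending.find(ReturnAddr);
    if (It == Pending.end())
      throw std::out_of_range("unknown trampoline");
    Address Target = It->second();
    Pending.erase(It);  // assumption: a trampoline fires at most once
    return Target;
  }

  std::size_t pendingCount() const { return Pending.size(); }

private:
  std::map<Address, CompileAction> Pending;
};
```

The key point is that the shared resolver needs nothing but the return address to find out which function was being called; everything else lives on the JIT side.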

In case it helps, here's what the output of all this looks like on X86. 
Trampolines are trivial - they're emitted in blocks and preceded by a 
pointer to the resolver block:

module asm "Lorc_resolve_block_addr:"
module asm "  .quad 140439143575560"
module asm "orc_jcc_0:"
module asm "  callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_1:"
module asm "  callq *Lorc_resolve_block_addr(%rip)"
module asm "orc_jcc_2:"
module asm "  callq *Lorc_resolve_block_addr(%rip)"
...
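
A string-stream builder that produces assembly text of this shape might look roughly like the following. This is a sketch and not the in-tree insertCompileCallbackTrampoline: the helper name buildTrampolineAsm and its parameters are hypothetical, and it emits the raw assembly lines only, without the `module asm "..."` wrapper that appears once the string is attached to a module.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper, not the in-tree implementation. Emits the slot that
// holds the resolver block's address, followed by NumTrampolines stubs that
// each make an indirect call through that slot.
std::string buildTrampolineAsm(std::uint64_t ResolverAddr,
                               unsigned NumTrampolines) {
  std::ostringstream OS;
  OS << "Lorc_resolve_block_addr:\n"
     << "  .quad " << ResolverAddr << "\n";
  for (unsigned I = 0; I != NumTrampolines; ++I)
    OS << "orc_jcc_" << I << ":\n"
       << "  callq *Lorc_resolve_block_addr(%rip)\n";
  return OS.str();
}
```

The resulting string could then be handed to LLVM as module-level inline assembly (for example via Module::appendModuleInlineAsm) and compiled by the JIT like any other module. A PowerPC version would presumably keep the same structure and change only the emitted instructions.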

The resolver block is more complicated and I won't provide the full code 
for it here. You can find it by running:

lli -jit-kind=orc-lazy -orc-lazy-debug=mods-to-stderr <hello_world.ll>

and looking at the initial output. In pseudo-asm though, it looks like 
this:

module asm "jit_callback_manager_addr:"
module asm "  .quad 0x46fc190" // <- address of callback manager object
module asm "orc_resolver_block:"
module asm "  // save register state"
module asm "  // load jit_callback_manager_addr into %rdi"
module asm "  // load the return address (from the trampoline call) into %rsi"
module asm "  // %rax = call jit(%rdi, %rsi)"
module asm "  // save %rax over the return address"
module asm "  // restore register state"
module asm "  // retq"

So, that's a whirlwind intro to implementing lazy JITing support for a new 
architecture in Orc. I'll try to answer any questions you have on the 
topic, though I'm not familiar with PowerPC at all. If you're comfortable 
with PowerPC assembly I think it should be possible to implement once you 
grok the concepts.

Hope this helps!

Cheers,
Lang.

On Jul 26, 2015, at 11:17 PM, Revital1 Eres <ERES at il.ibm.com> wrote:

Hi Again, 

I'm a little confused about exactly which Orc functions I should use
to save the functions' code in a code cache so that it can later be
replaced with different versions, and I would appreciate your help.

Just a reminder: I want to dynamically recompile the program based on
a profile collected at run-time. I would like to start executing the
program from the code cache and at some point be able to replace a
function body with its newly compiled version. This can be done by
replacing the entry of the function's code with a trampoline to its
new version, so that future calls to it will call the new version.

Does the CompileOnDemandLayer execute the program from a code cache
and hold pointers to the code of the functions it executes? I am
compiling for a Power machine.
Are there target-specific pieces that I should implement to make Orc
work on Power?

Thanks again, 
Revital 

From:        Lang Hames <lhames at gmail.com> 
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu> 
Date:        20/07/2015 08:41 PM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time 

Hi Revital, 

The CompileOnDemand layer is used by the lazy bitcode JIT in the lli tool. 
You can find the code in llvm/tools/lli/OrcLazyJIT.* . 

Cheers, 
Lang. 

On Mon, Jul 20, 2015 at 2:32 AM, Revital1 Eres <ERES at il.ibm.com> wrote: 
Hello Lang, 

Thanks for your answer. 

I am now looking for an example of the usage of CompileOnDemandLayer. Is 
there an example available for that (could not find one in llvm/examples)? 

Thanks, 
Revital 

From:        Lang Hames <lhames at gmail.com> 
To:        Revital1 Eres/Haifa/IBM at IBMIL 
Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu> 
Date:        10/07/2015 12:10 AM 
Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot 
functions at run-time 

Hi Revital, 

LLVM does have an IR interpreter, but I don't think it's maintained well 
(or possibly at all). The interpreter is also not designed to interact 
with the LLVM JITs. 

We generally encourage people to just JIT LLVM IR, rather than 
interpreting it. For the use-case you have described, you could JIT IR 
with no optimizations to begin with, then re-JIT hot functions at a higher 
level. 

The Orc JIT APIs (LLVM's newer JIT APIs) were written with this kind of 
use-case in mind, and are probably a better fit for this than MCJIT. There 
is no built-in hot-function detection or recompilation yet, but I think 
this would be *fairly* easy to write in terms of Orc's callback API. 

Cheers, 
Lang. 

On Thu, Jul 9, 2015 at 4:19 AM, Revital1 Eres <ERES at il.ibm.com> wrote: 
Hello, 

I am new to LLVM and I would appreciate your help with the following:

I want to run the LLVM IR through a virtual machine (the LLVM interpreter?)
and JIT-compile the hot functions (using MCJIT).

This task will require, among other things, identifying the hot functions
and having a code cache that is patched with the native code of the
functions after they are jitted.

I've read about MCJIT and lli so far; however, I have not seen that the LLVM
interpreter can be used as a VM in the way I was looking for, meaning:
execute the code one instruction at a time, have a profiling mode to
identify hot functions, and call the JIT to compile the hot functions.

I would appreciate any advice or starting points for this project.

Thanks, 
Revital 

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
