[llvm-dev] [LLVMdev] Help with using LLVM to re-compile hot functions at run-time

Lang Hames via llvm-dev llvm-dev at lists.llvm.org
Sun Nov 15 03:33:50 PST 2015


Hi Revital,

This program does not contain any external references, and so I would not
expect it to call the resolver at all.

What symbol were you expecting to see a resolver call for?
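A minimal sketch of the rule above, in plain C++ rather than Orc code. All names here (`ModuleSymbols`, `externalRefs`, `HostTable`, `resolve`, `hostAtoi`) are hypothetical. A module is modelled as the names it defines and the names it references. Only referenced-but-undefined names would ever be handed to the symbol resolver. For the toy program quoted below that set is empty. A variant that calls `atoi` yields `{atoi}`, which appears consistent with the "could not be resolved" error reported below unless the resolver can supply a host address for that name.

```cpp
#include <cstdint>
#include <cstdlib>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified model of a JIT'd module (not an LLVM type):
// the names it defines and the names its code refers to.
struct ModuleSymbols {
  std::set<std::string> Defined;
  std::set<std::string> Referenced;
};

// Referenced-but-not-defined names. Only these would reach the symbol
// resolver; calls between functions in the same module never do.
std::set<std::string> externalRefs(const ModuleSymbols &M) {
  std::set<std::string> Ext;
  for (const auto &Name : M.Referenced)
    if (M.Defined.count(Name) == 0)
      Ext.insert(Name);
  return Ext;
}

// Hypothetical host-side table standing in for "look the name up in the
// JIT process". A name missing here is reported as unresolved.
using HostTable = std::map<std::string, std::uint64_t>;

// Returns 0 for "not found", like a resolver that has no answer.
std::uint64_t resolve(const HostTable &Host, const std::string &Name) {
  auto It = Host.find(Name);
  return It == Host.end() ? 0 : It->second;
}

// Host function exported to JIT'd code under the name "atoi".
int hostAtoi(const char *S) { return std::atoi(S); }

HostTable makeHostTable() {
  HostTable Host;
  Host["atoi"] = reinterpret_cast<std::uint64_t>(&hostAtoi);
  return Host;
}
```

For a module defining {arr, foo, main} and referencing only {arr, foo}, `externalRefs` returns an empty set, so no resolver call would be expected at all.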

Cheers,
Lang.

On Wed, Nov 11, 2015 at 11:44 AM, Revital1 Eres <ERES at il.ibm.com> wrote:

> Hi Lang,
>
> Thanks for your reply!
>
> The program I'm compiling is the following toy program, which is compiled
> with -fno-inline to avoid inlining foo into main.
>
> In the fully_lazy_with_recompile code I've added the following statements.
> When running the code under gdb I do not see it break in the lambda
> resolver, as described in my previous mail.
>
>  auto ExprSymbol = J.findUnmangledSymbolIn(H, "main");
>  // main returns int, so call it through a matching function-pointer type.
>  int (*FP)() = (int (*)())(intptr_t)ExprSymbol.getAddress();
>  std::cerr << "Evaluated to " << FP() << "\n";
>
> Btw, another issue I need to resolve: some of the parameters were
> originally read from the command line using argv, but due to the following
> error I avoided that for now (I also got a similar error regarding
> _ZNSt8ios_base4InitC1Ev when using prints):
> LLVM ERROR: Program used external function 'atoi' which could not be
> resolved!
>
> Thanks again,
> Revital
>
> #define ITERS 1000000
> int arr[ITERS];
>
> int
> foo (int x, int y)
> {
>   int res = 950;
>   if (x > 3 && y < 77)
>     res = 97;
>   else
>     res = res * x;
>   return res;
> }
>
> int
> main ()
> {
>   int x = 880;
>   int num = 990;
>   int i, j;
>   int b = 0;
>
>   for (i = 0; i < ITERS; i++)
>     arr[i] = i;
>
>   for (j = 0; j < num; j++)
>     for (i = 0; i < ITERS; i++)
>       {
>         b += foo (x, arr[i]) /2;
>       }
>   return 0;
> }
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
> Date:        10/11/2015 06:31 PM
>
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> Apologies for the delayed reply - I'm traveling at the moment and not able
> to check my email often.
>
> You will only see a callback on the resolver for symbols that are external
> to the module. What did the IR that you added look like?
>
> Cheers,
> Lang.
>
> On Wed, Nov 4, 2015 at 8:37 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hello Lang,
>
> I want to use the lazy recompilation program you posted to compile an
> input IR program (rather than processing the input with the interpreter,
> as is done in the example).
> To do that I called the addModule function on the module returned from
> parseInputIR, as was done with the other functions in the Kaleidoscope
> examples.
> Now, to start the codegen I am using getAddress, and at this point I was
> expecting to see a call to the lambda resolver defined in createResolver,
> but I did not see it happen. I would appreciate your help understanding
> why.
>
> Here is a snippet from my additions to the new version of the fully_lazy
> Orc Kaleidoscope.
>
> Thanks again,
> Revital
>
>   SessionContext S(getGlobalContext());
>   KaleidoscopeJIT J(S);
>
>   cl::ParseCommandLineOptions(argc, argv,
>                               "Kaleidoscope example program\n");
>
>   std::unique_ptr<Module> M;
>   if (!InputIR.empty()) {
>     M = parseInputIR(InputIR);
>     auto H = J.addModule(std::move(M));
>     char ModID[256];
>     snprintf(ModID, sizeof(ModID), "IR:%s", InputIR.c_str());
>     // findUnmangledSymbolIn looks up a symbol, so this name presumably has
>     // to match a function defined in the module (the later message uses
>     // "main"), not the module ID.
>     auto ExprSymbol = J.findUnmangledSymbolIn(H, ModID);
>     double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
>     std::cerr << "Evaluated to " << FP() << "\n";
>     J.removeModule(H);
>   }
>
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL, LLVM Developers Mailing List
>            <llvm-dev at lists.llvm.org>
> Date:        18/09/2015 09:47 AM
>
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> Attached is a new version of the fully_lazy Orc Kaleidoscope demo that has
> been extended to enable re-compilation at higher optimisation levels,
> roughly following the scheme I outlined before.
>
> In the compile action for the callback, the initial IR for each function
> is transformed like this:
>
>
>                            unsigned foo_counter = 0;
> void foo$impl() {          void foo$impl() {
>   // foo body        ->      if (++foo_counter > 1000) {
> }                              auto fooOpt = $recompile(&foo);
>                                fooOpt();
>                              }
>                              // foo body
>                            }
>
> The key changes to make this work (which you can see by diff'ing toy.cpp
> against the original fully_lazy version):
>
> 1) New layers HotCompileLayer and HotIROptsLayer added. These perform IR
> optimisation and code generation at higher optimisation levels than the
> default layers.
> 2) The symbol resolver function (not to be confused with the resolver
> block) has been pulled out into its own function, createResolver, so that
> it can be shared between optimised & non-optimized code. It also resolves
> the "$recompile" function to a static method on the KaleidoscopeJIT class
> itself.
> 3) The lazy compile action now calls 'instrumentFunctions' before adding
> the IR for cold functions to the JIT.
> 4) The instrumentFunctions method injects the counter code and call to
> recompile.
> 5) The recompileHot method re-IRGens functions, then adds them to the
> HotIROpts layer to generate more optimized versions. It then updates the
> function-body pointer so that subsequent calls go to the optimised version.
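The scheme above can be sketched without any LLVM types. The names `fooCold`, `fooHot`, `fooBody`, `recompileFoo` and `HotThreshold` are hypothetical stand-ins for the default-layer code, the HotIROptsLayer output, the function-body pointer, "$recompile" and the counter limit from the diagram. The body is the toy `foo` from the November message. One deliberate difference from the diagram: the cold body returns the optimized version's result instead of falling through to its own body after calling it.

```cpp
// Hypothetical stand-ins (not Orc APIs): fooCold plays the instrumented,
// default-layer version of foo; fooHot plays the HotIROptsLayer output.
int fooCold(int x, int y);
int fooHot(int x, int y);

using FooFn = int (*)(int, int);

// Function-body pointer that every call to foo goes through; the lazy JIT
// indirects calls via pointers like this one.
FooFn fooBody = &fooCold;

unsigned fooCounter = 0;      // counter injected by the instrumentation
unsigned recompileCount = 0;  // how many times "$recompile" ran
const unsigned HotThreshold = 1000;

// Stand-in for "$recompile": obtain the optimized body and repoint fooBody
// so that subsequent calls go straight to it.
FooFn recompileFoo() {
  ++recompileCount;
  fooBody = &fooHot;
  return fooBody;
}

// The original body ("// foo body" in the diagram): foo from the toy program.
int fooImpl(int x, int y) {
  int res = 950;
  if (x > 3 && y < 77)
    res = 97;
  else
    res = res * x;
  return res;
}

// Cold version: counter prologue injected in front of the original body.
int fooCold(int x, int y) {
  if (++fooCounter > HotThreshold) {
    FooFn fooOpt = recompileFoo();
    return fooOpt(x, y);  // sketch: return the hot result, no fall-through
  }
  return fooImpl(x, y);
}

// Hot version: same semantics, no counter.
int fooHot(int x, int y) { return fooImpl(x, y); }

// What a call site does: load the pointer and call through it.
int callFoo(int x, int y) { return fooBody(x, y); }
```

After 1000 calls through `callFoo` the counter has not fired. The 1001st call runs `recompileFoo` once, and every later call goes straight to `fooHot` without touching the counter.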
>
> This is a bit quick-and-dirty, but does work. In the future I'll try to
> tidy this up and turn it into a new tutorial chapter.
>
> Hope this helps!
>
> Cheers,
> Lang.
>
>
>
>
> On Wed, Sep 16, 2015 at 10:09 PM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hi Lang,
>
> Many thanks!!! I just wanted to make sure you did not miss it...
>
> Thanks again!
> Revital
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Date:        17/09/2015 01:56 AM
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> Apologies for the delayed reply.
>
> I'm working on some example code for how to do this. I'll try to post it
> tomorrow.
>
> Cheers,
> Lang.
>
> On Tue, Sep 8, 2015 at 12:23 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hi Lang,
>
> After spending some time debugging the Kaleidoscope Orc fully_lazy toy
> example on x86, I want to start implementing a run-time optimizer as you
> suggested, and again I highly appreciate your help.
> For now I'll defer the target-specific implementation until I have the
> target-independent parts in place, since I can run on x86 to start with.
> Given a simple example of a main function calling functions foo and bar,
> IIUC I should start from the IR level of this module, which means that
> ParseIRFile will be called first on the IR of the program. Is that right?
>
> I would like to make sure I understand your suggestion, which is to insert
> a new layer on top of the CompileCallbackLayer in order to be able to call
> trigger_condition at the beginning of a function.
> IIUC, until a function (bar or foo) is optimized, calls to foo and bar will
> go through the resolver (foo and bar will not be compiled from scratch
> every time we go through the resolver; instead, the cached non-optimized
> version is executed after it is first compiled). The resolver will check
> trigger_condition to see whether the cached non-optimized version should
> be executed or a new optimized version should be compiled and executed.
> Once trigger_condition is true, foo and bar will be compiled to generate
> their optimized versions, and these will be executed directly from then on
> (no longer going through the resolver). Is that right?
> Should this layer on top of the CompileCallbackLayer be similar to class
> KaleidoscopeJIT?
> I saw that in the Kaleidoscope Orc example the lambda functions that are
> added in createLambdaResolver are executed by the resolver before
> compiling a call, so I assume that trigger_condition should also be added
> by createLambdaResolver. That way, before foo or bar is compiled, the
> lambda functions added by createLambdaResolver that contain
> trigger_condition will be executed. Is that right?
>
> IIUC, in the Kaleidoscope Orc example the interpreter calls addModule upon
> parsing a call expression in HandleTopLevelExpression.
> In my case I assume addModule will be called for the module returned from
> ParseIRFile, right?
> In this case, will calling getAddress on the whole module (the IR of all
> functions) trigger calls to the lambda functions defined in
> createLambdaResolver for foo and bar? Also, in the Kaleidoscope Orc example
> the function is executed explicitly in HandleTopLevelExpression after
> calling getAddress, and it's not clear to me where I should do this in my
> case.
>
> Thanks again,
> Revital
>
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Date:        28/07/2015 05:58 AM
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> What do you mean by "code cache"? Orc (and MCJIT) does have the concept of
> an ObjectCache, which is a long-lived, potentially persistent, compiled
> version of some IR. It's not a key component of the JIT though: Most
> clients run without a cache attached and just JIT their code from scratch
> in each session.
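To illustrate the distinction drawn above, here is a standalone sketch of what an object cache does. `InMemoryObjectCache` and `getOrCompile` are hypothetical names. The two method names mirror the hooks of LLVM's ObjectCache class (`notifyObjectCompiled`, `getObject`), but the real interface works with `Module` pointers and memory buffers, so this is an analogy rather than an implementation. Nothing here decides when to recompile a hot function; the cache only avoids recompiling the same module from scratch.

```cpp
#include <map>
#include <string>

// Hypothetical, standalone imitation of the idea behind an object cache:
// strings stand in for modules and compiled objects.
class InMemoryObjectCache {
public:
  // Called after a module is compiled: remember the object.
  void notifyObjectCompiled(const std::string &ModuleID,
                            const std::string &Obj) {
    Objects[ModuleID] = Obj;
  }

  // Called before compiling a module: nullptr means "not cached".
  const std::string *getObject(const std::string &ModuleID) const {
    auto It = Objects.find(ModuleID);
    return It == Objects.end() ? nullptr : &It->second;
  }

private:
  std::map<std::string, std::string> Objects;
};

// Hypothetical driver: reuse the cached object if present, otherwise
// compile from scratch and hand the result to the cache.
template <typename CompileFn>
std::string getOrCompile(InMemoryObjectCache &Cache,
                         const std::string &ModuleID, CompileFn Compile) {
  if (const std::string *Cached = Cache.getObject(ModuleID))
    return *Cached;
  std::string Obj = Compile();
  Cache.notifyObjectCompiled(ModuleID, Obj);
  return Obj;
}
```

Without a cache attached, every session takes the "compile from scratch" path, which is what most clients do.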
>
> Recompilation is orthogonal to caching. There is no in-tree support for
> recompilation yet. There are several ways that it could be supported,
> depending on what security / performance trade-offs you're willing to make,
> and how deep into the LLVM code you want to get. As things stand at the
> moment, all function calls in the lazy JIT are indirected via function
> pointers. We want to add support for patchable call-sites, but this hasn't
> been implemented yet. The indirect calls make recompilation reasonably
> easy: you could add a transform layer on top of the CompileCallbackLayer
> which would modify each function like this:
>
> void foo$impl() {          void foo$impl() {
>   // foo body        ->      if (trigger_condition) {
> }                              auto fooOpt = jit_recompile_hot(&foo);
>                                fooOpt();
>                              }
>                              // foo body
>                            }
>
> You would implement the jit_recompile_hot function yourself in your JIT
> and make it available to JIT'd code via the SymbolResolver. When the
> trigger condition is met you'll get a call to recompile foo, at which point
> you: (1) Add the IR for foo to a 2nd IRCompileLayer that has been
> configured with a higher optimization level, (2) look up the address of the
> optimized version of foo, and (3) update the function pointer for foo to
> point at the optimized version. The process for patchable callsites should
> be fairly similar once they're available, except that you'll trigger a
> call-site update rather than rewriting a function pointer.
>
> This neglects all sorts of fun details (threading, garbage collection of
> old function implementations), but hopefully it gives you a place to
> start.
>
>
> Regarding laziness, as Hal mentioned you'll have to provide some target
> support for PowerPC to support lazy compilation. For a rough guide you can
> check out the X86_64 support code in
> llvm/include/llvm/ExecutionEngine/Orc/OrcTargetSupport.h and
> llvm/lib/ExecutionEngine/Orc/OrcTargetSupport.cpp.
>
> There are two methods that you'll need to implement:
> insertCompileCallbackTrampoline and insertResolverBlock. These work
> together to enable lazy compilation. Both of these methods inject blobs of
> target-specific code into the JIT process. To do this (at least for now) I
> make use of a handy feature of LLVM IR: You can write raw assembly code
> directly into a bitcode module ("module-level asm"). If you look at the X86
> implementation of each of these methods you'll see they're written in terms
> of string-streams building up a string of assembly which will be handed off
> to the JIT to compile like any other code.
>
> The first blob that you need to be able to output is the resolver block.
> The purpose of the resolver block is to save program state and call back
> into the JIT to trigger lazy compilation of a function. When the JIT is done
> compiling the function it returns the address of the compiled function to
> the resolver block, and the resolver block returns to the compiled function
> (rather than its original return address).
>
> Because all functions share the same resolver block, the JIT needs some
> way to distinguish them, which is where the trampolines come in. The JIT
> emits one trampoline per function and each trampoline just calls the
> resolver block. The return address of the call in each trampoline provides
> the unique address that the JIT associates with the to-be-compiled
> functions. The CompileCallbackManager manages this association between
> trampolines and functions for you; you just need to provide the
> resolver/trampoline primitives.
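To make the trampoline/resolver association concrete, here is a small model in plain C++. It is not the Orc CompileCallbackManager API: `CallbackRegistry`, `addTrampoline` and `resolve` are hypothetical names, and an integer ID stands in for the return address pushed by each trampoline's call. In the real JIT the compile action also updates the function's pointer so the trampoline is not hit again. The `Compiled` map below only approximates that by never compiling the same function twice.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical model of the trampoline <-> function association.
class CallbackRegistry {
public:
  using CompileAction = std::function<std::uint64_t()>;

  // Reserve a new trampoline and remember what to do when it is first hit.
  std::uint64_t addTrampoline(CompileAction Compile) {
    std::uint64_t Id = NextId++;
    Pending[Id] = std::move(Compile);
    return Id;
  }

  // What the resolver block asks the JIT: "which function does this return
  // address belong to, and where is its compiled code?"
  std::uint64_t resolve(std::uint64_t TrampolineId) {
    auto Done = Compiled.find(TrampolineId);
    if (Done != Compiled.end())
      return Done->second;
    auto It = Pending.find(TrampolineId);
    assert(It != Pending.end() && "unknown trampoline");
    std::uint64_t Addr = It->second();  // run the lazy compile action once
    Pending.erase(It);
    Compiled[TrampolineId] = Addr;
    return Addr;
  }

private:
  std::uint64_t NextId = 0;
  std::map<std::uint64_t, CompileAction> Pending;
  std::map<std::uint64_t, std::uint64_t> Compiled;
};
```

One shared `resolve` entry point serves every function; only the trampoline ID differs, which mirrors why all trampolines can share a single resolver block.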
>
> In case it helps, here's what the output of all this looks like on X86.
> Trampolines are trivial - they're emitted in blocks and preceded by a
> pointer to the resolver block:
>
> module asm "Lorc_resolve_block_addr:"
> module asm "  .quad 140439143575560"
> module asm "orc_jcc_0:"
> module asm "  callq *Lorc_resolve_block_addr(%rip)"
> module asm "orc_jcc_1:"
> module asm "  callq *Lorc_resolve_block_addr(%rip)"
> module asm "orc_jcc_2:"
> module asm "  callq *Lorc_resolve_block_addr(%rip)"
> ...
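The trampoline listing above can be produced with a string stream, in the style the email describes for the X86 support code. This is a hypothetical re-creation, not the code in OrcTargetSupport.cpp: `emitTrampolineAsm` is an invented name, and only the labels are copied from the listing. The returned text would then presumably be appended to a module's inline asm (for example via `Module::appendModuleInlineAsm`) and compiled like any other code.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical generator for the x86-64 trampoline block shown above.
std::string emitTrampolineAsm(std::uint64_t ResolverBlockAddr,
                              unsigned NumTrampolines) {
  std::ostringstream AsmStream;
  // Pointer slot holding the resolver block's address.
  AsmStream << "Lorc_resolve_block_addr:\n"
            << "  .quad " << ResolverBlockAddr << "\n";
  // Each trampoline is an indirect call through that slot, so the return
  // address it pushes identifies which trampoline was taken.
  for (unsigned I = 0; I < NumTrampolines; ++I)
    AsmStream << "orc_jcc_" << I << ":\n"
              << "  callq *Lorc_resolve_block_addr(%rip)\n";
  return AsmStream.str();
}
```

On x86-64, `callq *disp32(%rip)` encodes to 6 bytes (opcode, ModRM, 32-bit displacement), so consecutive trampolines sit 6 bytes apart and each pushes a distinct return address.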
>
>
> The resolver block is more complicated and I won't provide the full code
> for it here. You can find it by running:
>
> lli -jit-kind=orc-lazy -orc-lazy-debug=mods-to-stderr <hello_world.ll>
>
> and looking at the initial output. In pseudo-asm, though, it looks like
> this:
>
> module asm "jit_callback_manager_addr:"
> module asm "  .quad 0x46fc190"  // <- address of callback manager object
> module asm "orc_resolver_block:"
> module asm "  // save register state"
> module asm "  // load jit_callback_manager_addr into %rdi"
> module asm "  // load the return address (from the trampoline call) into %rsi"
> module asm "  // %rax = call jit(%rdi, %rsi)"
> module asm "  // save %rax over the return address"
> module asm "  // restore register state"
> module asm "  // retq"
>
> So, that's a whirlwind intro to implementing lazy JITing support for a new
> architecture in Orc. I'll try to answer any questions you have on the
> topic, though I'm not familiar with PowerPC at all. If you're comfortable
> with PowerPC assembly I think it should be possible to implement once you
> grok the concepts.
>
> Hope this helps!
>
> Cheers,
> Lang.
>
>
> On Jul 26, 2015, at 11:17 PM, Revital1 Eres <ERES at il.ibm.com> wrote:
>
> Hi Again,
>
> I'm a little confused about exactly which Orc functions I should use in
> order to save the functions' code in a code cache, so that it can later be
> replaced with different versions, and I would appreciate your help.
>
> Just a reminder: I want to dynamically recompile the program based on a
> profile collected at run time. I would like to start executing the program
> from the code cache and at some point be able to replace a function body
> with its newly compiled version; this can be done by replacing the entry of
> the function's code with a trampoline to its new version, so that future
> calls to it will call the new version.
>
> Does the CompileOnDemandLayer execute the program from a code cache and
> hold pointers to the code of the functions it executes? I am compiling for
> a Power machine.
> Are there target-specific pieces that I should implement to make Orc work
> on Power?
>
> Thanks again,
> Revital
>
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Date:        20/07/2015 08:41 PM
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> The CompileOnDemand layer is used by the lazy bitcode JIT in the lli tool.
> You can find the code in llvm/tools/lli/OrcLazyJIT.* .
>
> Cheers,
> Lang.
>
>
> On Mon, Jul 20, 2015 at 2:32 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hello Lang,
>
> Thanks for your answer.
>
> I am now looking for an example of the usage of CompileOnDemandLayer. Is
> there an example available for that (could not find one in llvm/examples)?
>
> Thanks,
> Revital
>
>
>
> From:        Lang Hames <lhames at gmail.com>
> To:        Revital1 Eres/Haifa/IBM at IBMIL
> Cc:        LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Date:        10/07/2015 12:10 AM
> Subject:        Re: [LLVMdev] Help with using LLVM to re-compile hot
> functions at run-time
> ------------------------------
>
>
>
> Hi Revital,
>
> LLVM does have an IR interpreter, but I don't think it's maintained well
> (or possibly at all). The interpreter is also not designed to interact with
> the LLVM JITs.
>
> We generally encourage people to just JIT LLVM IR, rather than
> interpreting it. For the use-case you have described, you could JIT IR with
> no optimizations to begin with, then re-JIT hot functions at a higher
> optimization level.
>
> The Orc JIT APIs (LLVM's newer JIT APIs) were written with this kind of
> use-case in mind, and are probably a better fit for this than MCJIT. There
> is no built-in hot-function detection or recompilation yet, but I think
> this would be *fairly* easy to write in terms of Orc's callback API.
>
> Cheers,
> Lang.
>
>
> On Thu, Jul 9, 2015 at 4:19 AM, Revital1 Eres <ERES at il.ibm.com> wrote:
> Hello,
>
> I am new to LLVM and I would appreciate your help with the following:
>
> I want to run the LLVM IR through a virtual machine (the LLVM
> interpreter?) and JIT-compile the hot functions (using MCJIT).
>
> Among other things, this task will require identifying the hot functions
> and having a code cache that is patched with the native code of the
> functions after they are JITted.
>
> So far I've read about MCJIT and lli, but I have not seen that the LLVM
> interpreter can be used as a VM the way I was looking for, meaning: execute
> the code one instruction at a time, have a profiling mode to identify hot
> functions, and call the JIT to compile the hot functions.
>
> I would appreciate any advice/starting points for this project.
>
> Thanks,
> Revital
>
> [attachment "fully_lazy_with_recompile.tgz" deleted by Revital1
> Eres/Haifa/IBM]