[llvm-dev] dynamic namespacing of JIT modules?

Lang Hames via llvm-dev llvm-dev at lists.llvm.org
Tue Oct 16 14:50:42 PDT 2018


Belatedly jumping in here, as there is a potential alternative answer for
this in the newer iteration of the ORC APIs.

The new APIs replace symbol resolvers with first-class symbol tables,
"JITDylib" instances, which provide a way to namespace code so that
duplicate names do not clash. (JITDylibs are also faster to search, and
internally provide synchronization support for concurrent compilation.)

Modules (and any other program representations) are always added to
JITDylibs in the new API, and you control symbol resolution by describing
"links-against" style relationships between JITDylibs the same way you
would when building a program/library on the command line. You can also
attach symbol definition generators to JITDylibs to generate new
definitions programmatically if desired. I have included an example below
that shows how to build a simple IR JIT that uses both techniques.

> The JIT contains any number of LLVM modules, each of which defines a
> function, plus a "main" module that calls those functions. Several
> functions may have the same signature, so I need to find a way to
> resolve them.
>
> Originally, I just put each module's code in its own namespace when it
> was compiled. But now we want to be able to compile them separately to
> bitcode files and read them later. So at compilation time there is no
> longer any way to assign a unique namespace to each.


In this case, I believe you could place each Module (or each group of
modules whose names are guaranteed not to clash) in its own JITDylib.
Whatever disambiguation process you are using now to find the "correct"
version of the function would instead be used to find the "correct"
JITDylib, and this would allow you to resolve symbols correctly without
modifying the stored IR (see the sketch after the example code below).

Cheers,
Lang.

Example code:

// Create a JITTargetMachineBuilder and DataLayout.
// We use a target machine builder rather than a single target machine as
// the new APIs are capable of compiling on multiple threads, though we do
// not do that in this example.
auto JTMB = ExitOnErr(JITTargetMachineBuilder::detectHost());
auto DL = ExitOnErr(JTMB.getDefaultDataLayoutForTarget());

// Now we create an ExecutionSession (string pool, error reporting,
// session mutex), and object and IR compile layers.
ExecutionSession ES;
RTDyldObjectLinkingLayer ObjLayer(
    ES, []() { return llvm::make_unique<SectionMemoryManager>(); });
IRCompileLayer CompileLayer(ES, ObjLayer, ConcurrentIRCompiler(JTMB));

// Now we get to the interesting part: We declare two JITDylibs. One,
// ProcessSymbolsLib, will auto-generate definitions by calling dlsym
// on the current process, making this process's symbols available to
// JIT'd code.
// The second, Main, will contain our JIT'd code. We add a "links-against"
// relationship from Main to ProcessSymbolsLib by calling addToSearchOrder.
auto &ProcessSymbolsLib = ES.createJITDylib("<process symbols>");
ProcessSymbolsLib.setGenerator(
    ExitOnErr(DynamicLibrarySearchGenerator::GetForCurrentProcess(DL)));
auto &Main = ES.createJITDylib("main");
Main.addToSearchOrder(ProcessSymbolsLib);

// Now we can add code to the Main library and perform a lookup on it.
// ExecutionSession::lookup takes as its first argument the list of
// JITDylibs to search for the requested definition.
// (Mod and Ctx are assumed to be a std::unique_ptr<Module> and
// std::unique_ptr<LLVMContext> holding the code to be JIT'd, and FooTy
// the function pointer type of "foo", e.g. void (*)().)
ExitOnErr(CompileLayer.add(Main,
                           ThreadSafeModule(std::move(Mod), std::move(Ctx))));
auto FooSym = ExitOnErr(ES.lookup({&Main}, "_foo"));
auto Foo = (FooTy)FooSym.getAddress();
Foo();
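
To sketch the per-module-JITDylib arrangement suggested above (a rough
sketch only: Mod1Lib/Mod2Lib, the Mod1/Ctx1 and Mod2/Ctx2 module/context
pairs, and the disambiguation step are placeholders for whatever your
application already has; ES, CompileLayer and ProcessSymbolsLib are the
objects created in the example above):

// One JITDylib per module. Each dylib still sees the process symbols
// through its search order.
auto &Mod1Lib = ES.createJITDylib("module1");
auto &Mod2Lib = ES.createJITDylib("module2");
Mod1Lib.addToSearchOrder(ProcessSymbolsLib);
Mod2Lib.addToSearchOrder(ProcessSymbolsLib);

// Add each module to its own dylib. Duplicate function names in Mod1 and
// Mod2 no longer clash because they live in separate symbol tables.
ExitOnErr(CompileLayer.add(Mod1Lib,
                           ThreadSafeModule(std::move(Mod1), std::move(Ctx1))));
ExitOnErr(CompileLayer.add(Mod2Lib,
                           ThreadSafeModule(std::move(Mod2), std::move(Ctx2))));

// At call time, pick the dylib with your existing disambiguation logic and
// restrict the lookup to it, so only that module's definition is found.
auto &TargetLib = Mod1Lib; // result of your disambiguation step
auto Sym = ExitOnErr(ES.lookup({&TargetLib}, "_foo"));
auto Fn = (FooTy)Sym.getAddress();
Fn();

Because the lookup is restricted to the chosen JITDylib (plus whatever it
links against), identically named functions in the other modules are never
considered.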



On Thu, Sep 13, 2018 at 8:23 AM Geoff Levner via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Well, I've gotten this to work by playing with the symbol resolver as
> you suggest. Almost...
>
> In the main module, I declare the functions in (fictitious)
> namespaces. In the JIT, the symbol resolver recognizes those
> namespaces, which tell it in which modules to look for the
> corresponding unnamespaced functions. In a simple test case, that
> works. But in a more complex case, execution fails when I try to run
> the constructors for the main module. The error message says that the
> namespaced functions from the main module were not found, so
> apparently somebody somewhere is looking for those symbols and
> bypassing the JIT's symbol resolver... Perhaps the linking layer?
>
> I think I will go back to the UUID-based namespace idea, which would
> be less of a headache because it doesn't involve LLVM...
> On Thu, 13 Sep 2018 at 11:12, Geoff Levner <glevner at gmail.com> wrote:
> >
> > On Wed, 12 Sep 2018 at 21:48, Andres Freund <andres at anarazel.de> wrote:
> > >
> > > Hi,
> > >
> > > On 2018-09-12 12:09:24 +0200, Geoff Levner via llvm-dev wrote:
> > > > Greetings, LLVM wizards!
> > >
> > > Not one of them...
> > >
> > >
> > > > We have an application that uses Clang and Orc JIT to compile and
> > > > execute C++ code on the fly.
> > > >
> > > > The JIT contains any number of LLVM modules, each of which defines a
> > > > function, plus a "main" module that calls those functions. Several
> > > > functions may have the same signature, so I need to find a way to
> > > > resolve them.
> > > >
> > > > Originally, I just put each module's code in its own namespace when it
> > > > was compiled. But now we want to be able to compile them separately to
> > > > bitcode files and read them later. So at compilation time there is no
> > > > longer any way to assign a unique namespace to each.
> > >
> > > Why not?  If you assign a random uuid, or a sequential number or
> > > whatnot, that should work.
> >
> > Yes, that is the solution I am looking into at the moment, actually:
> > using a UUID to generate a namespace when the module is compiled.
> > However, that means saving the UUID somewhere; the bitcode is no
> > longer self-sufficient. I suppose I could create a special global
> > variable in the module containing the UUID...
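
(For what it's worth, a rough sketch of that last idea, assuming Mod is the
llvm::Module being compiled and UUIDStr holds the generated UUID; the
global's name "__module_uuid" is made up here, and the snippet uses
llvm/IR/Constants.h and llvm/IR/GlobalVariable.h:)

// Embed the UUID in the module so the bitcode stays self-describing.
auto *UUIDInit = ConstantDataArray::getString(Mod->getContext(), UUIDStr);
new GlobalVariable(*Mod, UUIDInit->getType(), /*isConstant=*/true,
                   GlobalValue::ExternalLinkage, UUIDInit, "__module_uuid");

// After reading the bitcode back in, the UUID can be recovered again:
if (auto *GV = Mod->getNamedGlobal("__module_uuid"))
  UUIDStr = cast<ConstantDataArray>(GV->getInitializer())->getAsCString().str();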
> >
> > > > 2. Assign each module a unique namespace, but don't change the modules
> > > > themselves: just add the namespace when a function is called from the
> > > > main module, and modify the JIT's symbol resolver to strip the
> > > > namespace and look for the function only in the relevant module.
> > >
> > > That's kind of what I do for a similar-ish problem in the JIT engine in
> > > postgres (which uses orcjit).  There, multiple dynamically loaded
> > > extensions can register functions whose source code is available, and
> > > each of them can have conflicting symbols.  The equivalent of your main
> > > module generates function names that encode information about which
> > > module to look in for the actual definition of the function, and then
> > > does the symbol resolution outside of LLVM's code.  I do that both when
> > > inlining these functions, and when generating function calls to the
> > > external function.
> >
> > I did try something like that. The problem I ran into is that the
> > symbol resolver receives mangled function names. It is easy enough to
> > demangle them there, but hard to mangle names before compiling. Once
> > you have decoded your function name in the symbol resolver, how do you
> > generate a mangled name for the actual function you want to resolve
> > to?
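
(One hedged sketch of an answer to that last question: the DataLayout
already encodes the target's global-name mangling, so
llvm::Mangler::getNameWithPrefix can re-apply it to the decoded name. Here
"foo" and DL stand in for the decoded name and the module's DataLayout.)

// Re-apply the target's global-name mangling (e.g. the leading '_' on
// Darwin) to the decoded, unprefixed name. Needs llvm/IR/Mangler.h and
// llvm/Support/raw_ostream.h.
std::string MangledName;
{
  raw_string_ostream OS(MangledName);
  Mangler::getNameWithPrefix(OS, "foo", DL);
}
// MangledName is the name the symbol resolver should provide a definition
// for.

(The newer ORC APIs also provide orc::MangleAndInterner, which wraps the
same operation and interns the result in the ExecutionSession's string
pool.)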
> >
> > > Not sure if that helps.
> > >
> > > Greetings,
> > >
> > > Andres Freund
> >
> > Thanks, Andres.