[lldb-dev] Debugging JIT-compiled code with LLVM

Thu Nov 18 10:07:05 PST 2010

On Nov 17, 2010, at 11:35 PM, Simon Ask Ulsnes wrote:

> 2010/11/18 Greg Clayton <gclayton at apple.com>:
>> LLDB currently doesn't yet have any support for JIT'ed code, though I would be happy to work with you if you wanted to get that working in LLDB.
> 
> This was my fear. But I'll need some kind of debugger for my own
> language anyway, so I'd be happy to implement this in LLDB.

Great!

> Could you briefly outline the general steps necessary for adding
> support for JIT'ed code? As you mention, I would expect the procedure
> to look similar to how dlopen()ed dylibs are registered, but I might
> be wrong.

A few questions on how this would be debugged (not worrying about the JIT yet):
1 - When you are debugging this, are you going to want to step through your new source code files or generated C/C++ sources? 
2 - If you want to debug sources that you produce, will this be like debugging lex/yacc code where a bunch of #line and #file directives are used to map C/C++ code to your proprietary source code?

If you are going to be debugging standard i386/x86_64 code, then you won't need to subclass lldb_private::Process. 

In order to support JIT'ed code, we just need a way to communicate between a running program and the debugger. Setting a breakpoint, as the JIT support in GDB does, is fine for this, since that is how the dynamic loader plug-in for macosx currently works. We can probably get away with being able to register additional dynamic loader plug-ins with the current Process.

To elaborate a bit, let's look at how the dynamic loaders work for shared libraries. Currently each process has a pluggable dynamic loader plug-in that gets loaded prior to launch by the Process subclasses in "Process::WillLaunch()", or prior to attaching in "Process::WillAttachToProcessWithID (lldb::pid_t pid)" and "Process::WillAttachToProcessWithName (const char *process_name, bool wait_for_launch)". So any process can re-use an abstract dynamic loader plug-in. The pseudo code looks like:

class Process
{
...
	std::auto_ptr<DynamicLoader> m_dynamic_loader_ap;
};

A Process subclass that overrides the WillLaunch or WillAttach functions (see ProcessGDBRemote for an example) finds a dynamic loader by its plug-in name:

    m_dynamic_loader_ap.reset(DynamicLoader::FindPlugin(this, "dynamic-loader.macosx-dyld"));

Since the ProcessGDBRemote plug-in is currently for MacOSX debugging, we know to look up the dynamic loader using a specific name.

After a dynamic loader plug-in is installed, it will get a callback after attaching or launching:

void
ProcessGDBRemote::DidLaunch ()
{
    DidLaunchOrAttach ();
    if (m_dynamic_loader_ap.get())
        m_dynamic_loader_ap->DidLaunch();
}

This gives the dynamic loader plug-in a chance to install its breakpoint and assign a callback to that breakpoint. When a breakpoint has a callback associated with it, the callback gets called synchronously when the breakpoint is hit, and this allows you to load/unload shared libraries (see DynamicLoaderMacOSXDYLD for example code).

We could allow the Process class to have more than one dynamic loader plug-in since loading JIT code is very similar to loading shared libraries:

class Process
{
...
	std::vector<DynamicLoaderSP> m_dynamic_loaders;
};

where DynamicLoaderSP is a shared pointer typedef...

This would allow us to have a standard system dynamic loader, and one or more JIT dynamic loader plug-ins.
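As a rough illustration of the multi-loader idea, here is a small self-contained sketch. The DynamicLoader, RecordingLoader and Process classes below are simplified stand-ins written for this example, not the real LLDB classes, and AddDynamicLoader and LaunchWithSystemAndJITLoaders are hypothetical names. It only shows the Process forwarding DidLaunch to every registered loader in registration order.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for a dynamic loader plug-in (not the real LLDB class).
class DynamicLoader
{
public:
    virtual ~DynamicLoader() {}
    virtual void DidLaunch() = 0; // called once the process has launched
};

typedef std::shared_ptr<DynamicLoader> DynamicLoaderSP;

// Hypothetical loader that only records the notifications it receives.
class RecordingLoader : public DynamicLoader
{
public:
    RecordingLoader(const std::string &name, std::vector<std::string> &events)
        : m_name(name), m_events(events) {}
    virtual void DidLaunch() { m_events.push_back(m_name + ":launch"); }
private:
    std::string m_name;
    std::vector<std::string> &m_events;
};

// Stand-in Process that owns several loaders instead of exactly one.
class Process
{
public:
    void AddDynamicLoader(const DynamicLoaderSP &loader_sp)
    {
        assert(loader_sp); // refuse empty shared pointers
        m_dynamic_loaders.push_back(loader_sp);
    }

    // Forward the notification to every loader, in registration order.
    void DidLaunch()
    {
        for (std::size_t i = 0; i < m_dynamic_loaders.size(); ++i)
            m_dynamic_loaders[i]->DidLaunch();
    }
private:
    std::vector<DynamicLoaderSP> m_dynamic_loaders;
};

// Registers a system loader and a JIT loader, launches, and returns the
// notifications that were delivered.
std::vector<std::string> LaunchWithSystemAndJITLoaders()
{
    std::vector<std::string> events;
    Process process;
    process.AddDynamicLoader(DynamicLoaderSP(new RecordingLoader("macosx-dyld", events)));
    process.AddDynamicLoader(DynamicLoaderSP(new RecordingLoader("clang-jit", events)));
    process.DidLaunch();
    return events;
}
```

Calling LaunchWithSystemAndJITLoaders() yields one launch notification per loader, with the system loader first because it was registered first.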

The JIT'ed dynamic loader plug-in would do the same kind of thing the macosx one does: it will set a breakpoint, install a callback and react to that breakpoint callback as needed.
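The breakpoint-plus-callback flow can be sketched as follows. Everything here is a toy model made up for illustration: ToyProcess, ToyJITLoader, the addresses, and the pending_jit_functions list are assumptions, and none of it is the LLDB breakpoint API. The point is only that the loader installs a breakpoint with a callback at launch, and the callback runs synchronously when that address is hit.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy process: breakpoints are just callbacks keyed by address.
class ToyProcess
{
public:
    // A callback returns true to stop for the user, false to auto-continue.
    typedef std::function<bool()> BreakpointCallback;

    void SetBreakpointWithCallback(std::uint64_t addr, const BreakpointCallback &callback)
    {
        m_callbacks[addr] = callback;
    }

    // Simulates the inferior reaching addr; runs the callback synchronously.
    bool HitAddress(std::uint64_t addr)
    {
        std::map<std::uint64_t, BreakpointCallback>::iterator pos = m_callbacks.find(addr);
        if (pos == m_callbacks.end())
            return false; // no breakpoint at this address
        return pos->second();
    }

    // Stand-in for whatever the JIT publishes before calling its notify function.
    std::vector<std::string> pending_jit_functions;
private:
    std::map<std::uint64_t, BreakpointCallback> m_callbacks;
};

// Toy JIT loader: installs its breakpoint at launch and reacts in the callback.
class ToyJITLoader
{
public:
    ToyJITLoader(ToyProcess &process, std::uint64_t notify_addr)
        : m_process(process), m_notify_addr(notify_addr) {}

    void DidLaunch()
    {
        m_process.SetBreakpointWithCallback(
            m_notify_addr, [this]() { return NotifyBreakpointHit(); });
    }

    const std::vector<std::string> &GetLoadedFunctions() const { return m_loaded; }
private:
    bool NotifyBreakpointHit()
    {
        // Take everything the JIT published, then let the process continue.
        std::vector<std::string> &pending = m_process.pending_jit_functions;
        m_loaded.insert(m_loaded.end(), pending.begin(), pending.end());
        pending.clear();
        return false;
    }

    ToyProcess &m_process;
    std::uint64_t m_notify_addr;
    std::vector<std::string> m_loaded;
};

std::vector<std::string> RunJITLoaderDemo()
{
    ToyProcess process;
    ToyJITLoader loader(process, 0x1000);
    loader.DidLaunch();

    process.pending_jit_functions.push_back("jit_func_a");
    process.HitAddress(0x1000); // notify address: callback picks up jit_func_a

    process.pending_jit_functions.push_back("jit_func_b");
    process.HitAddress(0x2000); // unrelated address: nothing is loaded

    return loader.GetLoadedFunctions();
}
```

In this toy run only jit_func_a ends up loaded, because the second function was published without the notify address ever being hit again.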

Inside LLDB we will need to think about how we want to represent JIT'ed code. There are a few options, but first let's look at how shared libraries are represented. Any executable or shared library is represented by a Module. Module objects have ObjectFile objects (abstracted object file readers for ELF and mach-o), and a SymbolFile for reading debug symbols.

We will want to represent JIT'ed code by making a new Module, possibly a special one that owns all of the JIT'ed code in a process from a specific JIT. So the clang JIT'ed code might require us to make a DynamicLoaderClangJIT DynamicLoader subclass, which would create a new module with a fake name "<ClangJIT>" that we could add any information to. As new JIT'ed code gets added, new functions and data would get added to the "<ClangJIT>" object file (symbol table symbols and new sections) and symbol file (if we have debug info for the JIT'ed code).

Another way would be to let the JIT define logical modules, in case you want to organize your JIT'ed code a bit more, so that you can create many different Clang JIT modules. Either way, all of this work will be done by the DynamicLoaderClangJIT class.
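To show what "adding symbols to a fake module as code is JIT'ed" could look like, here is a toy sketch. ToySymbol, ToyModule and ToyJITModuleList are invented for this example and are much simpler than LLDB's Module, ObjectFile and SymbolFile; the addresses and function names are made up. GetOrCreateModule covers both options: call it with one fixed name for a single "<ClangJIT>" module, or with several names for logical modules.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy symbol: a name plus the address range [address, address + size).
struct ToySymbol
{
    std::string name;
    std::uint64_t address;
    std::uint64_t size;
};

// Toy module that grows as the JIT emits more code.
class ToyModule
{
public:
    explicit ToyModule(const std::string &name) : m_name(name) {}

    void AddSymbol(const std::string &name, std::uint64_t address, std::uint64_t size)
    {
        ToySymbol symbol = { name, address, size };
        m_symbols.push_back(symbol);
    }

    // Returns the symbol whose range contains addr, or NULL if there is none.
    const ToySymbol *FindSymbolContainingAddress(std::uint64_t addr) const
    {
        for (std::size_t i = 0; i < m_symbols.size(); ++i)
        {
            const ToySymbol &s = m_symbols[i];
            if (addr >= s.address && addr < s.address + s.size)
                return &s;
        }
        return NULL;
    }
private:
    std::string m_name;
    std::vector<ToySymbol> m_symbols;
};

// Owns the JIT modules, keyed by their (fake) names.
class ToyJITModuleList
{
public:
    // Returns the module with this name, creating it on first use.
    ToyModule &GetOrCreateModule(const std::string &name)
    {
        std::map<std::string, ToyModule>::iterator pos = m_modules.find(name);
        if (pos == m_modules.end())
            pos = m_modules.insert(std::make_pair(name, ToyModule(name))).first;
        return pos->second;
    }
private:
    std::map<std::string, ToyModule> m_modules;
};

// Registers two JIT'ed functions in "<ClangJIT>" and resolves addr to a name.
std::string SymbolNameForAddress(std::uint64_t addr)
{
    ToyJITModuleList modules;
    ToyModule &jit = modules.GetOrCreateModule("<ClangJIT>");
    jit.AddSymbol("jit_func_a", 0x1000, 0x40);
    jit.AddSymbol("jit_func_b", 0x1040, 0x20);
    const ToySymbol *symbol = jit.FindSymbolContainingAddress(addr);
    return symbol ? symbol->name : std::string();
}
```

An address inside a registered range resolves to that function's name, and an address past the last range resolves to an empty string.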

> 
> GDB has a hook for JIT'ed code, but you are right that it only works
> for ELF binaries. The approach there is that GDB sets a breakpoint in
> an extern function, which LLVM calls when emitting code, giving GDB a
> chance to load the symbols. Would a different approach in LLDB be
> desirable, or does that seem OK to you?

That should work, see above comments.
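For reference, the JIT side of the GDB hook Simon describes looks roughly like the sketch below. The structure layouts and the two symbol names (__jit_debug_register_code and __jit_debug_descriptor) follow the JIT interface described in the GDB manual; the RegisterJITCode helper is a hypothetical name added for this example, and the noinline attribute is GCC/Clang syntax. The debugger sets a breakpoint in __jit_debug_register_code and reads the descriptor when it is hit.

```cpp
#include <cassert>
#include <cstdint>

extern "C" {

typedef enum
{
    JIT_NOACTION = 0,
    JIT_REGISTER_FN,
    JIT_UNREGISTER_FN
} jit_actions_t;

// One entry per in-memory symbol file (an object file image) the JIT emitted.
struct jit_code_entry
{
    struct jit_code_entry *next_entry;
    struct jit_code_entry *prev_entry;
    const char *symfile_addr;
    std::uint64_t symfile_size;
};

struct jit_descriptor
{
    std::uint32_t version;       // must be 1
    std::uint32_t action_flag;   // one of jit_actions_t
    struct jit_code_entry *relevant_entry; // entry the action applies to
    struct jit_code_entry *first_entry;    // head of the linked list
};

// The debugger puts its breakpoint here; the body is intentionally empty.
void __attribute__((noinline)) __jit_debug_register_code() {}

// The debugger finds this by name and reads it when the breakpoint is hit.
struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };

} // extern "C"

// Hypothetical helper: link a new entry at the head of the list, describe
// the action in the descriptor, then call the hook so the debugger notices.
void RegisterJITCode(jit_code_entry *entry)
{
    assert(entry != nullptr);
    entry->prev_entry = nullptr;
    entry->next_entry = __jit_debug_descriptor.first_entry;
    if (entry->next_entry)
        entry->next_entry->prev_entry = entry;
    __jit_debug_descriptor.first_entry = entry;
    __jit_debug_descriptor.relevant_entry = entry;
    __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
    __jit_debug_register_code();
}
```

A debugger-side plug-in like the one discussed above would set its breakpoint on __jit_debug_register_code and, in the callback, read action_flag and relevant_entry out of the inferior's memory to decide what to load or unload.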
> 
> According to the LLVMdev list, LLVM did not emit DWARF data for JIT'ed
> code as of March 2009 — I'm not sure if this has changed, though I
> suspect it hasn't, so to get this to work I guess there is also a bit
> of work to be done on the LLVM side of things.

Agreed, it would be great to be able to get DWARF for JIT'ed code.

> - Simon

Let me know if you need any explanation on anything mentioned above.

Greg
