[lldb-dev] [LLVMdev] MCJIT debugger registration interface.

Mon Aug 11 10:06:02 PDT 2014

On Sun, Aug 10, 2014 at 3:37 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>
>
>> On Aug 10, 2014, at 3:07 PM, Eric Christopher <echristo at gmail.com> wrote:
>>
>>> On Sun, Aug 10, 2014 at 1:43 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>>> I think this ignores the real problem with the MCJIT debugging interface: it doesn't give MCJIT clients any way of directly accessing and parsing the debug metadata.
>>
>> Parsing the existing debug metadata isn't necessarily a good idea
>> anyhow. It's not a stable format and is quite large.
>
> I agree. I suspect that a better solution is to have the smarts for grokking the debug data inside LLVM, possibly borrowing logic from lldb. For starters, clients like WebKit will want a machine-offset-to-debug-info map, which ain't rocket science - but currently parsing DWARF inside the LLVM client is the only way to do this afaict.

There's some support (originally forked from lldb) already in LLVM to
do this. Look at lib/DebugInfo; it's what llvm-dwarfdump, etc. are
based upon.
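
As a rough illustration of the machine-offset-to-debug-info map mentioned above, here is a minimal, self-contained sketch. It does not use lib/DebugInfo at all: `LineEntry` and `OffsetMap` are hypothetical names, and the rows are assumed to have been extracted already by whatever line-table parser the client has.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Hypothetical record: the source position for code starting at some offset.
struct LineEntry {
  std::string file;
  unsigned line;
};

// Hypothetical map keyed by the starting machine-code offset of each row.
// Each row is assumed to cover [its start, the next row's start).
class OffsetMap {
public:
  void add(uint64_t startOffset, LineEntry entry) {
    assert(!entry.file.empty() && "every row needs a file name");
    rows_[startOffset] = std::move(entry);
  }

  // Returns the row covering `offset`, or nullptr if `offset` lies before
  // the first recorded row.
  const LineEntry *lookup(uint64_t offset) const {
    auto it = rows_.upper_bound(offset); // first row starting after `offset`
    if (it == rows_.begin())
      return nullptr;
    return &std::prev(it)->second;
  }

private:
  std::map<uint64_t, LineEntry> rows_;
};
```

A client would fill the map once per compiled function and then call `lookup` with the offset of a return address or faulting instruction.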

>>
>>> WebKit, and likely other non-C/C++ clients of MCJIT, will not want the MCJIT to register anything with the system debugger. Non-C languages usually have a different set of debugging interfaces and it's up to the client of LLVM to arrange to glue the debugging information that the MCJIT knows about to the debugging interface that the LLVM client knows about. MCJIT's current architecture makes this extremely awkward.
>>>
>>> This is part of a bigger problem in the MCJIT API: it is designed to work like an execution engine for C programs despite the fact that the most compelling use of MCJIT is a higher-tier JIT that is part of a mixed-mode or tiered runtime for non-C languages. Is there some client of the MCJIT that actually benefits from the MCJIT pretending to be an execution engine for C programs?  Is there a reason why this client should get more attention than the seemingly more compelling non-C use cases?
>>
>> The debug metadata is largely based around dwarf debug information,
>> but it isn't a C language based format. I think this is a misleading
>> assertion you make.
>
> That would be a misleading assertion indeed, but it's not the one I'm making. Let me restate.
>
> Clients of optimizing JIT compilers are usually going to want to have some finer-grained control over how that JIT presents debug data to the debugger. Probably all that we want is: the JIT offers its debug data to its client, and the client decides if, and how, this data is presented to any debugger (lldb, gdb, or whatever). A reasonable default can of course be provided, if it leads to a good API.
>
> The MCJIT is currently ill suited to this kind of thing because it pretends to be a black box execution engine for LLVM IR. This black box then makes further assumptions that make sense for programs that target the C runtime. I believe that life would be easier if the task of generating code and the task of linking and executing it were better separated in the API.
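
One possible shape for "the JIT offers its debug data to its client, and the client decides" is a callback installed by the client. The sketch below is only an illustration under that assumption: `DebugData`, `CodeGen`, and `setDebugSink` are hypothetical names and are not part of the MCJIT API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical view of the debug data produced for one compiled object.
struct DebugData {
  std::string name;
  const uint8_t *bytes;
  std::size_t size;
};

// Hypothetical code generator that never talks to a debugger itself.
class CodeGen {
public:
  using DebugSink = std::function<void(const DebugData &)>;

  // The client installs a sink only if it wants the debug data at all.
  void setDebugSink(DebugSink sink) { sink_ = std::move(sink); }

  // Called after compiling an object; forwards the debug data, if wanted.
  void emitted(const std::string &name,
               const std::vector<uint8_t> &debugBytes) {
    assert(!name.empty() && "objects are assumed to be named");
    if (sink_)
      sink_(DebugData{name, debugBytes.data(), debugBytes.size()});
  }

private:
  DebugSink sink_;
};
```

A default sink that registers with gdb or lldb could then be one implementation among several, rather than behavior built into the execution engine.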

I think there are two things here. First, there's DWARF-level support
for things like line numbers, variable locations, and even some basic
type information. Second, there's the language support you'd want when
debugging a high-level language that can't be fully described that way
or that has run-time effects. A debugging interface that can be called
into for that could be useful, but I don't see it as something MCJIT
would necessarily vend; it seems more like something layered on top of
it, i.e. how a debugger would handle (bad example here, but...)
something like Obj-C or Swift.

>
>>
>> Also, it's your most compelling use case, not the most compelling.
>
> If it isn't the most compelling, then can you provide an example of an MCJIT client that benefits from the current design?
>
> I suspect that most other MCJIT clients will do some similar things to what WebKit does:
>
> - custom runtime that doesn't behave like a C linker.
>
> - custom debugging infrastructure; even if lldb integration is provided, the client's runtime will want lots of control.
>
> - multiple compiler tiers or mixed-mode execution.
>
> - source language that is not like C.
>
> These four things apply to many systems and it would be cool if LLVM became easier to use for those. If you believe that these things are not compelling, then can you describe what kind of system you envision MCJIT being used for?
>

Oh, I agree they'd be cool to have as well, but there are also languages
like Swift and Julia that use the JIT. There are all of the
OpenGL/OpenCL/OpenACC accelerator-type compilation uses, etc. I'm just
saying that the WebKit JavaScript compilation strategy isn't the only
compelling use case.

Mostly I think we're in agreement that this sort of functionality
would be useful; the open questions are just where it goes and whether
or not the existing information that we can vend is also useful.

-eric

>>
>> -eric
>>
>>>
>>> -Filip
>>>
>>>> On Aug 1, 2014, at 6:10 PM, Lang Hames <lhames at gmail.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I'd like to revisit the MCJIT debugger-registration system, as the existing system has a few flaws, some of which are seriously problematic.
>>>>
>>>> The 20,000 foot overview of the existing scheme (implemented in llvm/lib/ExecutionEngine/RuntimeDyld/GDBRegistrar.cpp and friends), as I understand it, is as follows:
>>>>
>>>> We have two symbols in MCJIT that act as fixed points for the debugger to latch on to:
>>>>
>>>> __jit_debug_register_code is a no-op function that the debugger can set a breakpoint on.  MCJIT will call this function to notify the debugger when an object file is loaded.
>>>>
>>>> __jit_debug_descriptor is the head of a C linked list data structure that contains pointers to in-memory object files. The ELF/MachO headers of the in memory object files will have had their vaddrs fixed up by the JIT to point to where each of the linked sections reside in memory.
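
For readers unfamiliar with the two symbols, the sketch below follows the JIT compilation interface as documented in the GDB manual, to the best of my knowledge. The `registerObject` helper is hypothetical and is not the code in GDBRegistrar.cpp; the `noinline` attribute and the empty `asm` assume a GCC or Clang toolchain.

```cpp
#include <cassert>
#include <cstdint>

extern "C" {
// Values the JIT stores in jit_descriptor::action_flag.
typedef enum { JIT_NOACTION = 0, JIT_REGISTER_FN, JIT_UNREGISTER_FN } jit_actions_t;

// One node per in-memory object file.
struct jit_code_entry {
  struct jit_code_entry *next_entry;
  struct jit_code_entry *prev_entry;
  const char *symfile_addr; // start of the in-memory object file
  uint64_t symfile_size;    // its size in bytes
};

// Head of the linked list that the debugger reads.
struct jit_descriptor {
  uint32_t version;
  uint32_t action_flag; // holds a jit_actions_t value
  struct jit_code_entry *relevant_entry;
  struct jit_code_entry *first_entry;
};

// The debugger sets a breakpoint here. The empty asm keeps calls to this
// otherwise empty function from being optimized away.
void __attribute__((noinline)) __jit_debug_register_code() {
  __asm__ volatile("" ::: "memory");
}

// The version is initialized statically so it is valid before any JIT code runs.
struct jit_descriptor __jit_debug_descriptor = {1, 0, nullptr, nullptr};
}

// Hypothetical helper: link `entry` at the head of the list, then notify.
void registerObject(jit_code_entry *entry, const char *buf, uint64_t size) {
  assert(entry && buf && "entry and object buffer must be valid");
  entry->symfile_addr = buf;
  entry->symfile_size = size;
  entry->prev_entry = nullptr;
  entry->next_entry = __jit_debug_descriptor.first_entry;
  if (entry->next_entry)
    entry->next_entry->prev_entry = entry;
  __jit_debug_descriptor.first_entry = entry;
  __jit_debug_descriptor.relevant_entry = entry;
  __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
  __jit_debug_register_code(); // the debugger's breakpoint fires here
}
```

Note that nothing in this handshake tells the JIT whether a debugger is actually listening, which is the second problem described below.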
>>>>
>>>> There are a couple of problems with this system: (1) Modifying object-file headers in-place violates some internal LLVM contracts. In particular, the object files may be backed by read-only memory. This has caused crashes in the JIT that have forced me to revert support for debugger registration on the MachO side (we really want to replace this on the ELF side soon too). (2) The JIT has no way of knowing whether a debugger is attached, which means keeping object files in memory even if they're not being used, just in case there is an attached debugger that needs them.
>>>>
>>>> We'd really like to come up with a system that doesn't have these drawbacks. That is, a system where the object files remain unmodified, and the JIT knows if/when a debugger attaches so that it can generate the relevant information on the fly.
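
To make the "object files remain unmodified" idea concrete, here is a minimal sketch of a side table that records where each section was loaded while only ever holding a pointer-to-const to the object bytes. `SectionLoadRecord` and `ObjectLoadTable` are hypothetical names chosen for illustration; nothing here is an existing LLVM interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical side-table row: where one section ended up in memory.
struct SectionLoadRecord {
  std::string sectionName;
  uint64_t loadAddress;
  uint64_t size;
};

// Hypothetical per-object record: the object bytes stay const, and the load
// addresses live alongside them instead of being patched into the headers.
class ObjectLoadTable {
public:
  ObjectLoadTable(const uint8_t *objectBytes, std::size_t objectSize)
      : objectBytes_(objectBytes), objectSize_(objectSize) {}

  void addSection(std::string name, uint64_t loadAddress, uint64_t size) {
    assert(size > 0 && "empty sections are not recorded in this sketch");
    sections_.push_back(SectionLoadRecord{std::move(name), loadAddress, size});
  }

  // Returns the record for `name`, or nullptr if the section was not loaded.
  const SectionLoadRecord *find(const std::string &name) const {
    for (const SectionLoadRecord &s : sections_)
      if (s.sectionName == name)
        return &s;
    return nullptr;
  }

  const uint8_t *objectBytes() const { return objectBytes_; }
  std::size_t objectSize() const { return objectSize_; }

private:
  const uint8_t *objectBytes_; // never written through
  std::size_t objectSize_;
  std::vector<SectionLoadRecord> sections_;
};
```

Whether a debugger could consume such a table instead of fixed-up headers is exactly the open question raised in (1) below.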
>>>>
>>>> It would be great if the debugger experts (and particularly anyone who has experience on both the debugger and the JIT side of things) could weigh in on these issues. In particular:
>>>>
>>>> (1) Is there a reason we bake the vmaddrs into the object file headers, or could they just as easily be passed in a side-table so as to keep the object untouched?
>>>>
>>>> (2) Is there a canonical way for the debugger to communicate to a JIT that it's interested in inspecting the JIT's output? If we're going to use breakpoints (or something like that) to signal to the debugger when objects have been linked, is it reasonable to have an API that the debugger can call into to request the information it's looking for? If the JIT actually receives a call then it would give us a chance to lazily populate the necessary data structures.
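
The lazy population asked about in (2) could look roughly like the following sketch. It assumes the debugger is able to call a request function inside the JIT process, which is the open question itself; `LazyDebugRegistry`, `noteObjectLinked`, and `requestDebugInfo` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry: linking an object only records how to build its
// debug record; the record is built when a debugger asks for it.
class LazyDebugRegistry {
public:
  using Builder = std::function<std::string()>;

  // Cheap path taken on every link, whether or not a debugger is attached.
  void noteObjectLinked(Builder build) {
    assert(build && "builder must be callable");
    pending_.push_back(std::move(build));
  }

  // Hypothetical entry point the debugger would call. Builds any records
  // that are still pending, then returns everything built so far.
  const std::vector<std::string> &requestDebugInfo() {
    for (Builder &build : pending_)
      built_.push_back(build());
    pending_.clear();
    return built_;
  }

  std::size_t pendingCount() const { return pending_.size(); }

private:
  std::vector<Builder> pending_;
  std::vector<std::string> built_;
};
```

With this shape, a process that never has a debugger attached pays only for storing the builders, not for the debug records themselves.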
>>>>
>>>> Regards,
>>>> Lang.
>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev