[Lldb-commits] [lldb] Add a python JIT loader class. (PR #142514)

Tue Jun 17 13:57:10 PDT 2025

clayborg wrote:

> It also seems architecturally wrong to try to guess and influence what BreakpointResolvers do behind their backs. After all, the resolver might be just some Python Code you know nothing about. How would you instrument that? If I set a regular expression name breakpoint, will you know to compare that regex against what the JIT produces? What about source regular expression breakpoints? Do you figure out what the containing source file is and observe that?

We currently only handle source file + line breakpoints and breakpoints by name. For source file + line breakpoints, the JIT keeps metadata that says "this function contains these source file + line ranges". When we get notified that a breakpoint was set, we only need its source file + line, and the JIT can then update its metadata if the function hasn't been JIT'ed yet. If the function has already been JIT'ed, the JIT loads the debug info for that function immediately (if it isn't loaded already), which lets the breakpoint resolve itself naturally as soon as the debug info is loaded. If the function hasn't been JIT'ed yet, the metadata change marks it as needed by the debugger, and if and only when it does get JIT'ed, we load the module for it and the breakpoint resolves itself naturally. Breakpoints by function name work the same way.
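To make the bookkeeping above concrete, here is a minimal Python sketch. All names (`JITFunctionInfo`, `LazyJITMetadata`, `on_function_jitted`, etc.) are hypothetical and are not part of LLDB or of our JIT. It only models the decision logic: "already JIT'ed, load now" versus "not JIT'ed, mark as needed and load later". It does not generate real debug info.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class JITFunctionInfo:
    """Hypothetical per-function metadata kept by a lazy JIT."""
    name: str
    source_file: str
    line_ranges: List[Tuple[int, int]]  # inclusive (first, last) line ranges
    jitted: bool = False
    debug_info_loaded: bool = False
    needed_by_debugger: bool = False

    def contains(self, file: str, line: int) -> bool:
        return file == self.source_file and any(
            first <= line <= last for first, last in self.line_ranges)


class LazyJITMetadata:
    """Reacts to breakpoint notifications and JIT events (hypothetical API)."""

    def __init__(self, functions: List[JITFunctionInfo]) -> None:
        self.functions: Dict[str, JITFunctionInfo] = {f.name: f for f in functions}
        self.loaded: List[str] = []  # order in which debug info was produced

    def _load_debug_info(self, func: JITFunctionInfo) -> None:
        if not func.debug_info_loaded:
            func.debug_info_loaded = True
            self.loaded.append(func.name)

    def _on_breakpoint_added(self, func: JITFunctionInfo) -> None:
        if func.jitted:
            # Already compiled: load its debug info now so the breakpoint
            # can resolve as soon as the module shows up.
            self._load_debug_info(func)
        else:
            # Not compiled yet: only mark it; debug info is produced if and
            # when the function actually gets JIT'ed.
            func.needed_by_debugger = True

    def breakpoint_by_file_line(self, file: str, line: int) -> None:
        for func in self.functions.values():
            if func.contains(file, line):
                self._on_breakpoint_added(func)

    def breakpoint_by_name(self, name: str) -> None:
        func = self.functions.get(name)
        if func is not None:
            self._on_breakpoint_added(func)

    def on_function_jitted(self, name: str) -> None:
        func = self.functions[name]
        func.jitted = True
        if func.needed_by_debugger:
            self._load_debug_info(func)


meta = LazyJITMetadata([
    JITFunctionInfo("foo", "a.py", [(1, 10)]),
    JITFunctionInfo("bar", "a.py", [(11, 20)], jitted=True),
    JITFunctionInfo("baz", "b.py", [(1, 5)]),
])
meta.breakpoint_by_file_line("a.py", 5)  # foo not JIT'ed yet: only marked
meta.breakpoint_by_name("bar")           # bar already JIT'ed: loaded now
meta.on_function_jitted("foo")           # foo JIT'ed and needed: loaded now
meta.on_function_jitted("baz")           # no breakpoint in baz: nothing loaded
print(meta.loaded)  # ['bar', 'foo']
```

Functions that never get a breakpoint (like `baz` here) never have debug info produced, even after they are JIT'ed.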

> Having a system where "if you set these kinds of breakpoints we'll be able to intervene, but other breakpoint types just won't work" seems awkward. If you are going to only support certain breakpoint types for JIT debugging, it seems much better to make that an explicit JIT breakpoint type writing a custom resolver that cooperates with your JIT engine to register interest and get called back when JIT events occur that are relevant to it.

We already have a working system that just gets notified about breakpoints in the JIT loader. Yes, it doesn't handle all breakpoint types right now, but we are getting this to work as a proof of concept with a JIT loader that does everything lazily.

So right now, debug info is generated only for functions that have breakpoints set in them, and only if they ever get JIT'ed. It works quite well, albeit with only two kinds of breakpoints supported so far. Then, if a backtrace traverses a JIT'ed frame that we don't have debug info for yet, we can lazily load the debug info for that function on the fly.
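A rough Python sketch of the backtrace side, assuming the JIT can report which address ranges belong to which function. `JITCodeIndex`, `symbolicate`, and the loader callback are hypothetical illustrations, not LLDB's unwinder or any real API. Debug info is requested only the first time a frame's PC lands in a JIT'ed function we haven't loaded yet.

```python
import bisect
from typing import Callable, Dict, List, Optional, Tuple


class JITCodeIndex:
    """Hypothetical map from JIT'ed address ranges [start, end) to functions."""

    def __init__(self) -> None:
        self._starts: List[int] = []
        self._ranges: List[Tuple[int, int, str]] = []  # sorted by start

    def add(self, start: int, end: int, name: str) -> None:
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._ranges.insert(i, (start, end, name))

    def lookup(self, pc: int) -> Optional[str]:
        # Find the last range starting at or before pc, then check its end.
        i = bisect.bisect_right(self._starts, pc) - 1
        if i >= 0:
            start, end, name = self._ranges[i]
            if start <= pc < end:
                return name
        return None


def symbolicate(pcs: List[int], index: JITCodeIndex,
                debug_info: Dict[str, str],
                load_debug_info: Callable[[str], str]
                ) -> List[Tuple[int, Optional[str]]]:
    frames: List[Tuple[int, Optional[str]]] = []
    for pc in pcs:
        name = index.lookup(pc)
        if name is not None and name not in debug_info:
            # First time a backtrace crosses this JIT'ed function:
            # only now ask the JIT to produce its debug info.
            debug_info[name] = load_debug_info(name)
        frames.append((pc, name))
    return frames


index = JITCodeIndex()
index.add(0x1000, 0x1100, "foo")
index.add(0x2000, 0x2080, "bar")
requested: List[str] = []


def fake_load(name: str) -> str:
    requested.append(name)
    return f"<debug info for {name}>"


cache = {"bar": "<already loaded>"}
frames = symbolicate([0x1010, 0x2004, 0x5000, 0x1020], index, cache, fake_load)
print(requested)  # ['foo']
```

Here `bar` was already loaded (say, because it had a breakpoint), `0x5000` is not JIT'ed code, and `foo` is loaded once even though two frames are in it.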

> Either that or we need to introduce the notion of a "dynamic symbol resolver" that you can register information about file names or symbol names, and then have the standard breakpoint resolvers check if one of these exists and registers interest for the names and files it is looking for. But trying to suss out what a resolver is going to do from the outside isn't the right way to go.

Happy to meet and discuss anytime. But this PR isn't doing any of those things yet. It just enables Python JIT loaders, which we need for other purposes as well.

> I think I'd come at it from the opposite direction. We don't currently know what the full set of messages that we want to send are, so making one class that receives all the messages we know about at present seems limiting.
> 
> What I was proposing instead is that when we add a way to register a callback to some event in lldb, we extend the registry to indicate not just the class that will be instantiated to watch the event, but which method is the responder.

How do we store an instance of a class and then call a method on it? We have no notion of a baton in Python callbacks right now. If we allowed Python callbacks to be registered with any Python object as a baton, this could be made to work, but we probably shouldn't try to call a method on some object, as there is no way to specify that on the command line. We should also be able to do this through the public API, via callbacks with batons.
> 
> That way, for instance, you can register a stop-hook with your class, and then you will have launch and attach callbacks already. But you need a way to say "Use a common instance of this class per-whatever entity owns that callback" and "use this method(s) on my object".

That is what I don't know how to implement. How would we do this on the command line? Would we need a global variable to contain the class instance? 

> 
> That way we don't have to hook everything up for you, but rather it will be easy for designers to make a class where they can hook up the particular callbacks they need.

I am fine with this as long as the solution doesn't require command-line commands and we have APIs for it. How about public APIs where we register callbacks, and, when registering through Python, we can specify a Python object as a baton that is handed back when the callback is called? Right now Python callbacks have no batons at all, because the native baton is used by the Python implementation itself.
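A minimal pure-Python sketch of the "Python object as baton" idea, under the assumption that a registration call accepts an arbitrary object and passes it back on every invocation. `CallbackRegistry`, `JITLoaderState`, and the event names are hypothetical and are not LLDB's SB API. The point is that one shared instance can serve several callbacks, and each callback picks which method to call, without a global variable holding the instance.

```python
from typing import Any, Callable, Dict, List, Tuple

# A callback receives the baton it was registered with plus the event data.
Callback = Callable[[Any, Dict[str, Any]], None]


class CallbackRegistry:
    """Hypothetical registry: each callback is stored with an arbitrary Python
    object as its baton, which is handed back unchanged on every call."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, List[Tuple[Callback, Any]]] = {}

    def register(self, event: str, callback: Callback, baton: Any = None) -> None:
        self._callbacks.setdefault(event, []).append((callback, baton))

    def notify(self, event: str, **data: Any) -> None:
        for callback, baton in self._callbacks.get(event, []):
            callback(baton, data)


class JITLoaderState:
    """Example baton: one shared instance whose methods respond to events."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, Any]] = []

    def on_launch(self, data: Dict[str, Any]) -> None:
        self.log.append(("launch", data["pid"]))

    def on_breakpoint_added(self, data: Dict[str, Any]) -> None:
        self.log.append(("breakpoint", data["location"]))


registry = CallbackRegistry()
state = JITLoaderState()
# Both registrations share the same baton, so no global variable is needed
# to hold the instance; each callback chooses which method to dispatch to.
registry.register("process-launched", lambda baton, d: baton.on_launch(d), state)
registry.register("breakpoint-added",
                  lambda baton, d: baton.on_breakpoint_added(d), state)

registry.notify("process-launched", pid=1234)
registry.notify("breakpoint-added", location="a.py:5")
registry.notify("module-loaded", name="jit.o")  # nothing registered: no-op
print(state.log)  # [('launch', 1234), ('breakpoint', 'a.py:5')]
```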

https://github.com/llvm/llvm-project/pull/142514