[lldb-dev] [RFC] Fast Conditional Breakpoints (FCB)

Jim Ingham via lldb-dev lldb-dev at lists.llvm.org
Thu Aug 15 12:29:12 PDT 2019



> On Aug 15, 2019, at 11:55 AM, Pavel Labath <pavel at labath.sk> wrote:
> 
> On 15/08/2019 20:15, Jim Ingham wrote:
>> Thanks for your great comments.  A few replies...
>>> On Aug 15, 2019, at 10:10 AM, Pavel Labath via lldb-dev <lldb-dev at lists.llvm.org> wrote:
>>> I am wondering whether we really need to involve the memory allocation functions here. What's the size of this address structure? I would expect it to be relatively small compared to the size of the entire register context that we have just saved to the stack. If that's the case, then maybe we could have the trampoline allocate some space on the stack and pass that as an argument to the $__lldb_arg building code.
>> You have no guarantee that only one thread is running this code at any given time, so you would have to put a mutex in the condition to guard the use of this stack allocation.  That's not impossible, but it means you're changing threading behavior.  Calling the system allocator might take a lock, but a lot of allocation systems can hand out small allocations without locking, so it might be simpler to just take advantage of that.
> 
> I am sorry, but I am confused. I am suggesting we take a slice of the stack from the thread that happened to hit that breakpoint, and use that memory for the __lldb_arg structure for the purpose of evaluating the condition on that very thread. If two threads hit the breakpoint simultaneously, then we just allocate two chunks of memory on their respective stacks. Or am I misunderstanding something about how this structure is supposed to be used?
> 

Right, I missed that you were suggesting this on the stack - somehow I thought you meant in the allocation we made when setting up the condition.

> 
>>> 
>>> Another possible fallback behavior would be to still do the whole trampoline stuff and everything, but avoid needing to overwrite opcodes in the target by having the gdb stub do this work for us. So, we could teach the stub that some addresses are special and when a breakpoint at this location gets hit, it should automatically change the program counter to some other location (the address of our trampoline) and let the program continue. This way, you would only need to insert a single trap instruction, which is what we know how to do already. And I believe this would still bring a major speedup compared to the current implementation (particularly if the target is remote on a high-latency link, but even in the case of local debugging, I would expect maybe an order of magnitude faster processing of conditional breakpoints).
>> This is a clever idea.  It would also mean that you wouldn't have to figure out how to do register saves and restores in code, since debugserver already knows how to do that, and once you are stopped it is probably not much slower to have debugserver do that job than have the trampoline do it.  It also has the advantage that you don't need to deal with the problem where the space that you are able to allocate for the trampoline code is too far away from the code you are patching for a simple jump.  It would certainly be worth seeing how much faster this makes conditions.
> 
> I actually thought we would use the exact same trampoline that would be used for the full solution (so it would do the register saves, restores, etc), and the stub would only help us to avoid trampling over a long sequence of instructions. But other solutions are certainly possible too...

Sure, I was thinking that if you were allowing debugserver to intervene, you could just inject something like:

void handler(void) {
   struct lldb_arg *args = arg_builder();  /* build the $__lldb_arg structure */
   condition_function(args);               /* traps if the condition is true */
   __builtin_trap();                       /* handler-done trap */
}

and let debugserver do:

save registers
set pc to handler
continue
hit a trap
decide whether this was the condition function trap or the handler trap
restore registers
set pc back to trap instruction
return control to lldb or single step over the instruction and continue

That way you wouldn't have to mess with injecting the trampoline, unwinding, etc.  This should be pretty easy to hack up, and see how much faster this was.

> 
>> Unless I'm missing something, you would still need two traps.  One in the main instruction stream and one to stop when the condition is true.  But maybe you meant "a single kind of insertion - a trap", not "a single trap instruction".
> 
> I meant "a single trap in the application's instruction stream". The counts of traps in the code that we generate aren't that important, as we can do what we want there. But if we insert just a single trap opcode, then we are guaranteed to overwrite only one instruction, which means the whole "are we overwriting a jump target" discussion becomes moot. OTOH, if we write a full jump sequence then we can overwrite a *lot* of instructions -- the shortest sequence I can think of that can jump anywhere in the address space is something like pushq %rax; movabsq $WHATEVER, %rax; jmpq *%rax. Something as big as that is fairly likely to overwrite a jump target.
> 
> 
> ...
>>> 
>>> This would be kind of similar to the "cond_list" in the gdb-remote "Z0;addr,kind;cond_list" packet <https://sourceware.org/gdb/onlinedocs/gdb/Packets.html>.
>>> 
>>> In fact, given that this "instruction shifting" is the most unpredictable part of this whole architecture (because we don't control the contents of the inferior instructions), it might make sense to do this approach first, and then do the instruction shifting as a follow-up.
>> One side-benefit we are trying to get out of the instruction shifting approach is not having to stop all threads when inserting breakpoints as often as possible.  Since we can inject thread ID tests into the condition as well, doing the instruction shifting would mean you could specify thread-specific breakpoints, and then ONLY the threads that match the thread specification would ever have to be stopped.  You could also have negative tests so that you could specify "no stop" threads.  So I still think it is worthwhile pursuing the full implementation Ismail outlined in the long run.
> 
> No argument there.  I'm just proposing this as a stepping stone towards the final goal.
> 
> Interestingly, this is one of the places where the otherwise annoying Linux ptrace behavior may come in really handy. Since a thread hitting a breakpoint does not automatically stop all other threads in the process (we have to manually stop all of them ourselves), the lldb-server could do the trampoline stuff without any of the other threads in the process noticing anything.

You can do the same thing with Mach.  When one thread stops, the others are still running and we also have to stop them.  Mach is just a little more convenient in that there's a single "task_suspend" call that keeps any threads from running, so we don't have to go stop them one by one.

Jim

> 
> pl


