[lldb-dev] [RFC] Fast Conditional Breakpoints (FCB)

Pavel Labath via lldb-dev lldb-dev at lists.llvm.org
Thu Aug 15 11:55:17 PDT 2019


On 15/08/2019 20:15, Jim Ingham wrote:
> Thanks for your great comments.  A few replies...
> 
>> On Aug 15, 2019, at 10:10 AM, Pavel Labath via lldb-dev <lldb-dev at lists.llvm.org> wrote:
>> I am wondering whether we really need to involve the memory allocation functions here. What's the size of this address structure? I would expect it to be relatively small compared to the size of the entire register context that we have just saved to the stack. If that's the case, then maybe we could have the trampoline allocate some space on the stack and pass that as an argument to the $__lldb_arg building code.
> 
> You have no guarantee that only one thread is running this code at any given time.  So you would have to put a mutex in the condition to guard the use of this stack allocation.  That's not impossible but it means you're changing threading behavior.  Calling the system allocator might take a lock but a lot of allocation systems can hand out small allocations without locking, so it might be simpler to just take advantage of that.

I am sorry, but I am confused. I am suggesting we take a slice of the 
stack from the thread that happened to hit that breakpoint, and use that 
memory for the __lldb_arg structure for the purpose of evaluating the 
condition on that very thread. If two threads hit the breakpoint 
simultaneously, then we just allocate two chunks of memory on their 
respective stacks. Or am I misunderstanding something about how this 
structure is supposed to be used?


>>
>> Another possible fallback behavior would be to still do the whole trampoline stuff and everything, but avoid needing to overwrite opcodes in the target by having the gdb stub do this work for us. So, we could teach the stub that some addresses are special and when a breakpoint at this location gets hit, it should automatically change the program counter to some other location (the address of our trampoline) and let the program continue. This way, you would only need to insert a single trap instruction, which is what we know how to do already. And I believe this would still bring a major speedup compared to the current implementation (particularly if the target is remote on a high-latency link, but even in the case of local debugging, I would expect maybe an order of magnitude faster processing of conditional breakpoints).
> 
> This is a clever idea.  It would also mean that you wouldn't have to figure out how to do register saves and restores in code, since debugserver already knows how to do that, and once you are stopped it is probably not much slower to have debugserver do that job than have the trampoline do it.  It also has the advantage that you don't need to deal with the problem where the space that you are able to allocate for the trampoline code is too far away from the code you are patching for a simple jump.  It would certainly be worth seeing how much faster this makes conditions.

I actually thought we would use the exact same trampoline that would be 
used for the full solution (so it would do the register saves, restores, 
etc), and the stub would only help us to avoid trampling over a long 
sequence of instructions. But other solutions are certainly possible too...

> 
> Unless I'm missing something you would still need two traps.  One in the main instruction stream and one to stop when the condition is true.  But maybe you meant "a single kind of insertion - a trap" not  "a single trap instruction" 

I meant "a single trap in the application's instruction stream". The 
number of traps in the code that we generate isn't that important, as we can 
do what we want there. But if we insert just a single trap opcode, then 
we are guaranteed to overwrite only one instruction, which means the 
whole "are we overwriting a jump target" discussion becomes moot. OTOH, 
if we write a full jump code then we can overwrite a *lot* of 
instructions -- the shortest sequence that can jump anywhere in the 
address space I can think of is something like pushq %rax; movabsq 
$WHATEVER, %rax; jmpq *%rax. Something as big as that is fairly likely 
to overwrite a jump target.


...
> 
>>
>> This would be kind of similar to the "cond_list" in the gdb-remote "Z0;addr,kind;cond_list" packet <https://sourceware.org/gdb/onlinedocs/gdb/Packets.html>.
>>
>> In fact, given that this "instruction shifting" is the most unpredictable part of this whole architecture (because we don't control the contents of the inferior instructions), it might make sense to do this approach first, and then do the instruction shifting as a follow-up.
> 
> One side-benefit we are trying to get out of the instruction shifting approach is not having to stop all threads when inserting breakpoints as often as possible.  Since we can inject thread ID tests into the condition as well, doing the instruction shifting would mean you could specify thread-specific breakpoints, and then ONLY the threads that match the thread specification would ever have to be stopped.  You could also have negative tests so that you could specify "no stop" threads.  So I still think it is worthwhile pursuing the full implementation Ismail outlined in the long run.

No argument there. I am just proposing this as a stepping stone 
towards the final goal.

Interestingly, this is one of the places where the otherwise annoying 
linux ptrace behavior may come in really handy. Since a thread hitting a 
breakpoint does not automatically stop all other threads in the process 
(we have to manually stop all of them ourselves), the lldb-server could 
do the trampoline stuff without any of the other threads in the process 
noticing anything.

pl

