[Lldb-commits] [lldb] Initial step in targets DAP support (PR #86623)

Tue Mar 26 17:33:44 PDT 2024

jimingham wrote:

> This patch provides the initial implementation for the "Step Into Specific/Step In Targets" feature in VSCode DAP.
> 
> The implementation disassembles all the call instructions in step range and try to resolve operand name (assuming one operand) using debug info. Later, the call target function name is chosen by end user and specified in the StepInto() API call.
> 
> It is v1 because of using the existing step in target function name API. This implementation has several limitations:
> 
> * Won't for indirect/virtual function call -- in most cases, our disassembler won't be able to solve the indirect call target address/name.

Targeted step-in's don't predict anything.  The algorithm works by stepping in and then matching the target name against the function you've stopped in, and if it doesn't match step out and keep going.  So you don't ever need to resolve any names up front.  Are you saying for indirect/virtual function calls you can't figure out the name once you've arrived at the target?

> * Won't work for target function without debug info -- if the target function has symbol but not debug info, the existing ThreadPlanStepInRange won't stop.

That's just a matter of adding the --step-in-avoids-nodebug option to the SB API's.  That works from the command line
already.

> * Relying on function names can be fragile -- if there is some middle glue/thunk code, our disassembler can only resolve the glue/thunk code's name not the real target function name. It can be fragile to depend compiler/linker emits the same names for both.

I think the better way to solve this is to make the matcher more flexible.  Also, lldb should always step through glue or thunks automatically.  That's going to get handled by the trampoline plan so the targeted StepIn plan won't have to know about these.  If you have a system where there aren't trampoline handlers for your indirect thunks or other glue code, you would be better served adding those than trying to get the step in plans to handle them directly.

> * Does not support step into raw address call sites -- it is a valid scenario that in Visual Studio debugger, user can explicitly choose a raw address to step into which land in the function without debug info/symbol, then choose UI to load the debug info on-demand for that module/frame to continue exploring.

That would be trivial to add.

> 
> A more reliable design could be extending the ThreadPlanStepInRange to support step in based on call-site instruction offset/PC which I will propose in next iteration.

For setting step-in targets based on call-site instruction, you definitely will need to add some support for a pc rather than a symbol match.  That should be quite easy.

But I don't think that's a viable method for the "step in target" that people really want to use.  They see:

`some_func(other_func(method_call()));`

and they want to step into method_call.  I don't think there's any way you are going to know what address "method_call" will resolve to in this context.  You'd have to know what "this" actually is, and deal with overload resolution, etc.  That does not sound very doable to me.

Note that this feature as implemented is robust because it doesn't try to predict anything.  It just steps through the range you told it to, stepping one level deep (not counting trampolines) and see if the place it stopped matches the target.  I don't think switching to a "predict the target up front and go there" method will be nearly as reliable.  The expression you are stepping through could be way more complicated than the one I showed, and trying to figure out what some text snippet is actually going to resolve to could be quite complex.  

Adding the address match in this algorithm is easy, since instead of asking "does the name match my target" you would ask "an I at the target address".

The problem of thunks and trampolines should be handled by having a trampoline handler that handles them.  That will benefit all of your step-in behavior since otherwise you'll be stopping in the thunk, which no-one really wants.  And it will mean you can ignore them for the purposes of targeted step in.

For the other problems you mentioned, I think what is needed is a richer way to provide the match between what the user sees in their code, and the symbol they are actually going to stop at.  We could allow regex matches, or be smarter about stripping out irrelevant bits to get to a base name as we do in other instances.

https://github.com/llvm/llvm-project/pull/86623