[Lldb-commits] [PATCH] D129307: Add a new breakpoint partial match settings

Thu Jul 7 15:05:08 PDT 2022

jingham added a comment.

In D129307#3636695 <https://reviews.llvm.org/D129307#3636695>, @clayborg wrote:

> In D129307#3636573 <https://reviews.llvm.org/D129307#3636573>, @jingham wrote:
>
>> I'm not entirely clear what problem this is solving.  Any actor setting breakpoints can already choose the match depth by simply providing that many directory components.  I.e. if I know I have subdir1/foo.cpp and subdir2/foo.cpp I can set a breakpoint on only one of them by doing:
>>
>>   (lldb) break set -f subdir1/foo.cpp -l 10
>>
>> So an IDE could implement this by simply passing either the base name or the base name plus one sub directory, etc.  As you say, that's what command line users do anyway, they seldom type out full paths.  So this is really about IDE's running lldb, and this seems more like a feature the IDE should offer, not lldb.
>
> Xcode has this same issue if you download a dSYM file from the Apple build infrastructure if that dSYM doesn't contain path re-mappings in the UUID plists. Xcode, the IDE, will send down full paths to the source files that it knows about if you set breakpoints. Granted Xcode could work around this, but we added the ability to remap things at the module level in LLDB to work around this issue. So yes, it would be great if all IDEs would do this, but currently none do.
>
>> It also seems awkward to do all this work as part of the breakpoint filtering - which as you have seen is somewhat delicate code.  After all, you can implement "only match the last N components" by only submitting the last N components when looking up the files.  So it would be much more straightforward to strip the leading path components from the file name when you create the breakpoint and you don't have to muck with the filtering at all.  That would also help remove some confusion when you see that the path submitted was /foo/bar/baz.cpp but I matched /some/other/bar/baz.cpp (but not /some/other/different/baz.cpp) because I had set the match suffix count to 1.  It would be nice to have break list show me exactly what it was going to match on.
>
> We might need to always store the path that was given in source file and line breakpoints and then have the breakpoint update its locations when/if either of these two settings are modified. This would also help with your example below where you comment on breakpoint settings depend on the history of the session (they shouldn't).

I took that bit out because in fact the implementation stores the depth in the resolver when made.  So any given breakpoint isn't history dependent, but rather to understand why some breakpoints take and some don't you have to know the history of your settings in the session - which is not recorded anywhere you can get your hands on.  That's a source of confusion for sure but TTTT probably not a huge one.

>> I am also not sure doing this as a general setting for values other than 0 is really helpful.  So far as I can tell, you would use a value of 1 because you know that though you might have files with duplicate names they are always disambiguated by their parent directory name.
>
> Exactly the case I was worried about. Usually zero would be the default IMHO, but if you have two different binaries that both have a "main.cpp" it could help. We can remove the depth and have this setting only set breakpoints by basename if needed if no one wants the extra complexity.

Certainly just doing base name will find all the matches regardless of what the debug info says about the path, and the only problem is extra matches from files with the same name in different directories.  That's only slightly annoying and not a fatal problem because you can always disable locations you don't want, though it would be good to have some way to handle this.  But it seems to me to disambiguate effectively, you really have to know something about the structure  of your project.  You could probably get away with 1 directory as a global setting safely since most of time the sources move rigidly and the source root will be above everything.  But anything above that will require you know that all the source files are that many levels below the source root or your setting will throw out some matches when the source root is moved.   So it seems to me any higher setting is going to be too fiddly to be worth the effort.

>> Using a value of 2 says the names are disambiguated by the two parent directory names.  But there's no guarantee that one value is going to work for all the files in your project and having to change a global setting from breakpoint to breakpoint is awkward.
>>
>> I also don't like that this makes breakpoint settings depend on the history of the session.  For instance:
>>
>> a) Set the partial match to on and the depth to 2
>> b) Set a file & line breakpoint
>> c) Change the partial match depth to 0
>> d) a library gets loaded with a file that wouldn't have matched with depth of 2 but does with 0
>
> We would need to store the full path in the source file and line breakpoints all the time and then set all of the source breakpoints again if this setting changes (enabled/disabled or dir depth changes).

I actually cut this part out of the comment (but after you responded) because the current implementation stores the setting when the breakpoint is made.  So it wouldn't change with changes in the setting - which I think is the right behavior.  There is no other instance where the resolver changes it's mind about how to find matches after the breakpoint is made.  That useful because people can set intentions on individual locations (commands, conditions, etc.) and anything but changes in the underlying binaries that removes locations seems likely to undo user's work and reasonably upset them.

Since I don't think letting the locations change over time is a good idea, it follows that if you are going to do this it's better to have lldb trim the incoming file path based on what it plans to pay attention to rather than to keep a path with implicitly ignored components around till the very end of the breakpoint setting.  It is weird for a breakpoint to show "/foo/bar/bar/blah.c" for the file path even though the breakpoint only plans to match on "bar/blah.c".  These settings really just implement: "I can't be bothered to strip all but the last <N> components from a file path before submitting it for matching, please do that for me."  Given that's what you are doing, then doing it in the most straightforward way seems best, i.e. just pre-process the file path based on what the user told you and record that in the breakpoint.

>> Also, you are supposed to be able to save breakpoints & restore them and provided your binaries haven't changed you will get the same breakpoints.  Now, for that to work, you also have to restore some target setting that wasn't saved with the breakpoints.
>
> We wouldn't actually need to if we always store the path we are given when the breakpoint is originally set, and then update the breakpoint locations when/if the settings are changed (which might cause more locations to show up, or less).

I agree with the first part of this, I don't think changing the resolver as the setting changes is a good idea (see above) but that's an orthogonal concern.

>> I wouldn't mind so much if this were passed in to the breakpoint setting directly, though again, I don't really see the point since this is mostly for IDE's and they can strip however much they want off a path before submitting it w/o involving lldb.
>
> So neither Xcode nor Visual Studio Code make any attempt to remap sources for anyone. The IDEs always pass down the full path currently.

Xcode only does this because for the longest time lldb only supported "base filename" and "full path" matches, we didn't do partial path matches.  They have a little bit of a problem figuring out the "root directory" of a project so they know what to strip from the path, but this setting wouldn't help with that anyway.

> Just to let everyone know where we are going with this: we want to implement an auto source map feature after this patch where if the user sets a breakpoint with "/some/build/path/<src-root>/bar/baz/foo.cpp", and the debug info contains "<any-path>/<src-root>/bar/baz.foo.cpp", we can have LLDB automatically create a source mapping for "/some/build/path" -> "<any-path>" where "<any-path>" can be a relative path like "." or an absolute path like "/local/file/path". The BUCK build system uses relative paths in the DWARF for all source files so that cached compiled versions of .o and .a files can be downloaded from the build infrastructure if no one has changed sources in a particular directory. This lets build servers build and cache intermediate files on another machine and send them over to user machines to speed up local builds. Apple's B&I builds on some remote build server using different paths and then uploads results to other machines in different directories and also archives the sources in other locations. So having this feature will help users to be able to set breakpoints more reliably without having to set the source mapping settings manually.

I wonder about this a bit.  Seems to me there are two separate problems that need solving.  There's the main problem that source-maps are supposed to solve, namely finding actual source files on disk so that we can show source lines as you step.  But the breakpoint system (except for `break set -p`) only needs to match paths, it doesn't care if any of the elements of the path actually exists.  And really, the only reason why the breakpoint system needs paths at all is to disambiguate files with the same basename in different directories.  That's all it cares about.  Tying "file name disambiguation" too closely with actually ensuring that you can find real files on disk will mean you only solve the disambiguation problem if you have sources locally, which has never been a requirement of breakpoints.

Resolving breakpoint disambiguation also seems to me to be pretty different from the job of building these auto path maps.  In building a path map, you are asking a question about equivalent prefixes: "how much should I chop off the beginning of these two paths to get an equivalence".  For disambiguation you are asking "how many directories above the base name do I need to look at".  It is awkward to try to use a specification for "N directories above the base name for any path I might find in the debug information" as a way to determine "where should I start looking for the equivalent prefix to strip to make a source map".  The source paths in a complex project are at all different levels so that seems the wrong way to specify it.

And of course, you will have to do this in some way that's independent of breakpoints, since it also has to work with source list, and the frame printing which are the places where we actually care about the source map directly..  So whatever you do it can't be strictly linked to setting breakpoints.

> We can implement this feature in lldb-vscode, but I would much rather have a solution in LLDB itself if this is useful to anyone other than just lldb-vscode. I am not aware of any IDEs that do source remapping for debugger in any way, so I think this feature it would be valuable to just about any IDE, but we can just put the feature in lldb-vscode and let everyone else fend for themselves if no one wants this feature in LLDB.
>
> We would welcome feedback from anyone on the review or subscriber list.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129307/new/

https://reviews.llvm.org/D129307