[Lldb-commits] [PATCH] D129307: Add a new breakpoint partial match settings

Thu Jul 7 16:39:46 PDT 2022

clayborg added a comment.

In D129307#3637310 <https://reviews.llvm.org/D129307#3637310>, @jingham wrote:

> In D129307#3636695 <https://reviews.llvm.org/D129307#3636695>, @clayborg wrote:
>
>> In D129307#3636573 <https://reviews.llvm.org/D129307#3636573>, @jingham wrote:
>>
>>> I'm not entirely clear what problem this is solving.  Any actor setting breakpoints can already choose the match depth by simply providing that many directory components.  I.e. if I know I have subdir1/foo.cpp and subdir2/foo.cpp I can set a breakpoint on only one of them by doing:
>>>
>>>   (lldb) break set -f subdir1/foo.cpp -l 10
>>>
>>> So an IDE could implement this by simply passing either the base name or the base name plus one sub directory, etc.  As you say, that's what command line users do anyway, they seldom type out full paths.  So this is really about IDE's running lldb, and this seems more like a feature the IDE should offer, not lldb.
>>
>> Xcode has this same issue if you download a dSYM file from the Apple build infrastructure if that dSYM doesn't contain path re-mappings in the UUID plists. Xcode, the IDE, will send down full paths to the source files that it knows about if you set breakpoints. Granted Xcode could work around this, but we added the ability to remap things at the module level in LLDB to work around this issue. So yes, it would be great if all IDEs would do this, but currently none do.
>>
>>> It also seems awkward to do all this work as part of the breakpoint filtering - which as you have seen is somewhat delicate code.  After all, you can implement "only match the last N components" by only submitting the last N components when looking up the files.  So it would be much more straightforward to strip the leading path components from the file name when you create the breakpoint and you don't have to muck with the filtering at all.  That would also help remove some confusion when you see that the path submitted was /foo/bar/baz.cpp but I matched /some/other/bar/baz.cpp (but not /some/other/different/baz.cpp) because I had set the match suffix count to 1.  It would be nice to have break list show me exactly what it was going to match on.
>>
>> We might need to always store the path that was given in source file and line breakpoints and then have the breakpoint update its locations when/if either of these two settings are modified. This would also help with your example below where you comment on breakpoint settings depend on the history of the session (they shouldn't).
>
> I took that bit out because in fact the implementation stores the depth in the resolver when made.  So any given breakpoint isn't history dependent, but rather to understand why some breakpoints take and some don't you have to know the history of your settings in the session - which is not recorded anywhere you can get your hands on.  That's a source of confusion for sure but TTTT probably not a huge one.

I would love for people to be able to enable/disable this setting to see if it fixes their debug session. If we go the route you are suggesting, we will need to set the setting and then remove the breakpoint and try to set it again. But I would love it if people could just enable this setting and see if it fixes their breakpoint issues. If the breakpoints always stored the full path that was supplied, and if the breakpoint always listened to the target settings, then nothing would need to change on the breakpoint storage end. We might need to add a "resolved path" to the source file and line breakpoints which would be the path that was used to set the breakpoint which we could show if it differs from the specified path. The breakpoints would always re-resolve if the settings were changed.

>>> I am also not sure doing this as a general setting for values other than 0 is really helpful.  So far as I can tell, you would use a value of 1 because you know that though you might have files with duplicate names they are always disambiguated by their parent directory name.
>>
>> Exactly the case I was worried about. Usually zero would be the default IMHO, but if you have two different binaries that both have a "main.cpp" it could help. We can remove the depth and have this setting only set breakpoints by basename if needed if no one wants the extra complexity.
>
> Certainly just doing base name will find all the matches regardless of what the debug info says about the path, and the only problem is extra matches from files with the same name in different directories.  That's only slightly annoying and not a fatal problem because you can always disable locations you don't want, though it would be good to have some way to handle this.  But it seems to me to disambiguate effectively, you really have to know something about the structure  of your project.  You could probably get away with 1 directory as a global setting safely since most of time the sources move rigidly and the source root will be above everything.  But anything above that will require you know that all the source files are that many levels below the source root or your setting will throw out some matches when the source root is moved.   So it seems to me any higher setting is going to be too fiddly to be worth the effort.

Yeah, I would be fine getting rid of the directory depth setting as it does seem to fiddly and could cause problems and I am not sure anyone will know what to set it to.

>>> Using a value of 2 says the names are disambiguated by the two parent directory names.  But there's no guarantee that one value is going to work for all the files in your project and having to change a global setting from breakpoint to breakpoint is awkward.
>>>
>>> I also don't like that this makes breakpoint settings depend on the history of the session.  For instance:
>>>
>>> a) Set the partial match to on and the depth to 2
>>> b) Set a file & line breakpoint
>>> c) Change the partial match depth to 0
>>> d) a library gets loaded with a file that wouldn't have matched with depth of 2 but does with 0
>>
>> We would need to store the full path in the source file and line breakpoints all the time and then set all of the source breakpoints again if this setting changes (enabled/disabled or dir depth changes).
>
> I actually cut this part out of the comment (but after you responded) because the current implementation stores the setting when the breakpoint is made.

Yeah, I would propose we always store the full path, and maybe add a "resolved path" to the source file and line breakpoint which can be shorter. It could show up in the "breakpoint list" output only if it differs from the specified path.

> So it wouldn't change with changes in the setting - which I think is the right behavior.  There is no other instance where the resolver changes it's mind about how to find matches after the breakpoint is made.  That useful because people can set intentions on individual locations (commands, conditions, etc.) and anything but changes in the underlying binaries that removes locations seems likely to undo user's work and reasonably upset them.

I still go back to the situation where breakpoints are not working and would love to be able to enabled this setting and see if breakpoints end up resolving. Right now we always have to tell people to run "image dump line-table Foo.cpp" and tell us if you see anything. I think having a global setting that helps breakpoint be set would be fine as long as it would re-resolve any needed breakpoints if it were changed. Most people would enable this setting by default in their init files and not change it, so it wouldn't affect users on a day to day basis, but it would help with getting breakpoints working for people.

> Since I don't think letting the locations change over time is a good idea, it follows that if you are going to do this it's better to have lldb trim the incoming file path based on what it plans to pay attention to rather than to keep a path with implicitly ignored components around till the very end of the breakpoint setting.  It is weird for a breakpoint to show "/foo/bar/bar/blah.c" for the file path even though the breakpoint only plans to match on "bar/blah.c".  These settings really just implement: "I can't be bothered to strip all but the last <N> components from a file path before submitting it for matching, please do that for me."  Given that's what you are doing, then doing it in the most straightforward way seems best, i.e. just pre-process the file path based on what the user told you and record that in the breakpoint.

It would be less intrusive of a change yes I agree. Not sure if users will then wonder why the breakpoint doesn't match their input. Like if they do:

  (lldb) settings set target.breakpoint-partial-match true
  (lldb) breakpoint set --file /path/to/foo.cpp --line 12
  (lldb) settings set target.breakpoint-partial-match false
  (lldb) breakpoint set --file /path/to/foo.cpp --line 12
  (lldb) breakpoint list

I wonder if the user will know why one has "foo.cpp" and one has "/path/to/foo.cpp". I can see your point that is it easier to implement this way, but would love to be able to have people enable this feature and just see if it fixes everything if possible without too many intrusive code changes.

>>> Also, you are supposed to be able to save breakpoints & restore them and provided your binaries haven't changed you will get the same breakpoints.  Now, for that to work, you also have to restore some target setting that wasn't saved with the breakpoints.
>>
>> We wouldn't actually need to if we always store the path we are given when the breakpoint is originally set, and then update the breakpoint locations when/if the settings are changed (which might cause more locations to show up, or less).
>
> I agree with the first part of this, I don't think changing the resolver as the setting changes is a good idea (see above) but that's an orthogonal concern.

I see your point, not sure what the user will expect from the above command sequence I mentioned. The auto source map will become harder to implement if we go the route you suggest.

>>> I wouldn't mind so much if this were passed in to the breakpoint setting directly, though again, I don't really see the point since this is mostly for IDE's and they can strip however much they want off a path before submitting it w/o involving lldb.
>>
>> So neither Xcode nor Visual Studio Code make any attempt to remap sources for anyone. The IDEs always pass down the full path currently.
>
> Xcode only does this because for the longest time lldb only supported "base filename" and "full path" matches, we didn't do partial path matches.  They have a little bit of a problem figuring out the "root directory" of a project so they know what to strip from the path, but this setting wouldn't help with that anyway.
>
>> Just to let everyone know where we are going with this: we want to implement an auto source map feature after this patch where if the user sets a breakpoint with "/some/build/path/<src-root>/bar/baz/foo.cpp", and the debug info contains "<any-path>/<src-root>/bar/baz.foo.cpp", we can have LLDB automatically create a source mapping for "/some/build/path" -> "<any-path>" where "<any-path>" can be a relative path like "." or an absolute path like "/local/file/path". The BUCK build system uses relative paths in the DWARF for all source files so that cached compiled versions of .o and .a files can be downloaded from the build infrastructure if no one has changed sources in a particular directory. This lets build servers build and cache intermediate files on another machine and send them over to user machines to speed up local builds. Apple's B&I builds on some remote build server using different paths and then uploads results to other machines in different directories and also archives the sources in other locations. So having this feature will help users to be able to set breakpoints more reliably without having to set the source mapping settings manually.
>
> I wonder about this a bit.  Seems to me there are two separate problems that need solving.  There's the main problem that source-maps are supposed to solve, namely finding actual source files on disk so that we can show source lines as you step.  But the breakpoint system (except for `break set -p`) only needs to match paths, it doesn't care if any of the elements of the path actually exists.  And really, the only reason why the breakpoint system needs paths at all is to disambiguate files with the same basename in different directories.  That's all it cares about.  Tying "file name disambiguation" too closely with actually ensuring that you can find real files on disk will mean you only solve the disambiguation problem if you have sources locally, which has never been a requirement of breakpoints.

Sources maps are the only way to get breakpoints working right now if the full path specified by the IDE doesn't match the debug info. If the user doesn't set them correctly, then no breakpoints happen at all. By setting breakpoints by basename, we can deduce the needed source mappings required by matching as many source directories as possible and then adding a source map to the settings so that everything works when stopped at the breakpoint. So we can easily locate the root directories because usually there are a few extra directories that match in the path.

> Resolving breakpoint disambiguation also seems to me to be pretty different from the job of building these auto path maps.  In building a path map, you are asking a question about equivalent prefixes: "how much should I chop off the beginning of these two paths to get an equivalence".  For disambiguation you are asking "how many directories above the base name do I need to look at".  It is awkward to try to use a specification for "N directories above the base name for any path I might find in the debug information" as a way to determine "where should I start looking for the equivalent prefix to strip to make a source map".  The source paths in a complex project are at all different levels so that seems the wrong way to specify it.

We don't need the directory depth at all to make this work, using basenames is all we need. Then it is very easy to set a breakpoint a "/foo/bar/<root-dir>/src/foo.cpp" and see that a location was matched at "./<root-dir>/src/foo.cpp" and make the auto source map entry of "/foo/bar" -> ".". The "." can also be any other directory. Since most projects have a source root and the debug info usually contains the debug info that is compiled all for those directories, it makes it easy to deduce the auto source map regardless of how this breakpoint setting strategy is done.

> And of course, you will have to do this in some way that's independent of breakpoints, since it also has to work with source list, and the frame printing which are the places where we actually care about the source map directly..  So whatever you do it can't be strictly linked to setting breakpoints.

We are again requiring people to be able to correctly set source maps before they can effectively debug which causes many debug sessions to fail for people. Most people debug projects that are relative to one root directory, so if people are already setting their source maps correctly., good for them, nothing will happen and things will just work. If they don't, then now they have a chance to debug instead of just running their code. Any system where the build system is not integrated into the IDE, like VS code, now requires to the user to know the details of how the build system works and how to set the debugger up in order for them to get things working and just be able to hit a breakpoint. Xcode doesn't have this problem because it is well integrated with the build system. VS Code doesn't have a build system plug-in, so everyone rolls their own (cmake + ninja, make, buck, etc) and each has its own ways of doing things. It isn't easy to dig ask the build system what it is doing.

So while it is true that setting breakpoints won't always fix everything, the truth is that users in IDEs set file and line breakpoints 99% of the time and this can help users debug more effectively more of the time without having to know the in depth detail of how LLDB requires source maps.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129307/new/

https://reviews.llvm.org/D129307