[Lldb-commits] [PATCH] D35607: Extend 'target symbols add' to load symbols from a given file by UUID.

Thu Jul 20 09:44:39 PDT 2017

clayborg added a comment.

In https://reviews.llvm.org/D35607#815769, @labath wrote:

> In https://reviews.llvm.org/D35607#814982, @clayborg wrote:
>
> > So we should just make sure this works:
> >
> >   (lldb) target symbols add --shlib a.out debug_binaries/a.out
> >
> >
> > The --uuid option is there to help you match up without having to specify the file path. If I am understanding correctly, "a.out" has no UUID or build ID, but "debug_binaries/a.out" does?
>
>
> In this case, neither of the files has a build id/UUID. However, the tricky part here is that in this case we "invent" a UUID by doing a checksum of the whole file. This is done to support the .gnu_debuglink https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html method of finding symbols. It works like this:
>
> - if a file has a gnu_debuglink section, we set the UUID to the crc inside that section
> - if a file does not have a gnu_debuglink section, we *assume* it will be used as a gnu_debuglink target, and also set the UUID to the crc we compute from the whole file contents. This works fine for the gnu_debuglink case, as then the rest of the machinery will match the files based on their UUIDs. However, it does not work for the general case without the debug link, as then the files will end up with two different "UUID"s (because they are based on the file content), and everything will consider them as different.
>
>   The solution that would seem best to me here would be to leave the UUID of files which don't have a build-id as empty, as the gnu_debuglink is not really a UUID -- it's main function is to contain a file path, and the crc just acts as a sanity check. This should make the logic for matching two uuid-less files saner, but it would require us to implement the gnu_debuglink method in a different way (which I think is fine, as the current way seems pretty hacky, and causes us to often compute crcs of large files needlessly). I have no idea how much of our existing code would blow up if we started having files without a uuid though.
>
>   What do you think about that?

The UUID is assumed to be a unique identifier that isn't based on the entire file contents, but on the content the the important bits (.text, .data, etc). On Mac, the mach-o files have a LC_UUID which is generated by carefully doing and MD5 checksum on only the bits that don't change due to debug info or strings from debug info. This means a binary that is built  in /tmp/a or in /tmp/b with debug info will result in the same UUID. When we create a stand alone debug info file, this UUID gets copied into it.

One other thing we could try is to add a type to the lldb_private::UUID class that specifies the type of the UUID:

  namespace lldb_private {
  class UUID {
  public:
    enum Type {
      eTypeBuildID,
      eTypeFileCRC, // CRC of the entire file contents
      eTypeDebugInfoCRC, // CRC of the debug info
    };
  };

If I understood your comments above, one file would have a UUID::eTypeBuildID type and the other would have either a UUID::eTypeFileCRC or UUID::eTypeDebugInfoCRC. Then we teach the ObjectFile subclasses to properly set the UUID type. When comparing UUID values, we then could note that the types are not the same and do something more intelligent. Like when adding a symbol file, we could ignore the UUID mismatch if the UUID::Type doesn't match and just go on filename. But we would honor the mismatch if the UUID::Type was the same.

https://reviews.llvm.org/D35607