[lldb-dev] Handling of the ELF files missing build-ids?
Greg Clayton via lldb-dev
lldb-dev at lists.llvm.org
Sat Aug 4 10:28:48 PDT 2018
> On Aug 3, 2018, at 6:17 PM, Leonard Mosescu <mosescu at google.com> wrote:
>
> Greg, Mark,
>
> Looking at the code, LLDB falls back to a full file crc32 to create the module UUID if the ELF build-id is missing. This works, in the sense that the generated UUID does indeed identify the module.
>
> But there are a few problems with this approach:
>
> 1. First, runtime performance: a full file crc32 is a terribly inefficient way to generate a temporary UUID that is basically just used to match a local file to itself.
> - especially since some unstripped binaries can be very large; for example, a local Chromium build produces a 5.3 GB chrome binary
> - the crc32 implementation is decent, but single-threaded
> - to add insult to injury, it seems a small bug defeats the intended caching of the hash value, so it ends up being recalculated multiple times
>
> 2. The fake UUID is not going to match any external UUID that may be floating around (and yet not properly embedded into the binary)
> - an example is Breakpad, which unfortunately also attempts to make up UUIDs when the build-id is missing (something we'll hopefully fix soon)
>
> Is there a fundamental reason to calculate the full file crc32? If not I propose to improve this based on the following observations:
>
> A. Model the reality more accurately: an ELF w/o a build-id doesn't really have a UUID. So use a zero-length UUID in LLDB.
> B. The full file name should be enough to prove the identity of a local module.
> C. If we try to match an external UUID (e.g. from a minidump) with a local file which does not have a UUID, it may help to have an option to allow it to match (off by default, and only if there's no better match)
>
> What do you think?
I am fine with all of the above, with some reservations about case C. There is no need to calculate something if it isn't useful. For case C it should be fine to never match: if a file has a UUID to begin with, stripping the binary typically doesn't remove it, so a file either has one or it doesn't. If Breakpad does calculate a CRC32, then we need to know to ignore that UUID. The problem is that we probably won't be able to tell what kind of UUID we have: a real one from the build ID, one from the GNU debug info CRC, or a CRC of the entire file. So the minidump code will need to do something here. If a minidump contains the Linux auxv and memory map, we might need to dig through the section information and deduce whether a file matches based on the sizes of the mapped program headers, to further help with the matching.
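To make the rules for A, B and C concrete, here is a minimal standalone sketch. ModuleIdentity, ModulesMatch and allow_missing_uuid are hypothetical names invented for illustration, not existing LLDB types or settings, and treating "exactly one side has a UUID" symmetrically is an assumption:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a module's identity; not LLDB's real ModuleSpec.
struct ModuleIdentity {
  std::string path;           // full path of the file
  std::vector<uint8_t> uuid;  // zero-length when the ELF has no build-id (A)
};

// Returns true if `local` can be treated as the module described by `wanted`.
// allow_missing_uuid models the opt-in from case C and is off by default,
// which gives the "never match" behavior.
bool ModulesMatch(const ModuleIdentity &wanted, const ModuleIdentity &local,
                  bool allow_missing_uuid = false) {
  const bool wanted_has_uuid = !wanted.uuid.empty();
  const bool local_has_uuid = !local.uuid.empty();

  // Both sides carry a UUID: the bytes must agree.
  if (wanted_has_uuid && local_has_uuid)
    return wanted.uuid == local.uuid;

  // Neither side has a UUID: the full file name proves identity (B).
  if (!wanted_has_uuid && !local_has_uuid)
    return wanted.path == local.path;

  // Exactly one side has a UUID (C): only match when explicitly allowed.
  return allow_missing_uuid;
}
```

With the default argument, a minidump module that carries a UUID never matches a local file that lacks one; passing true as the third argument opts in to the looser behavior from case C.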
One other idea is to make a set of enumerations for the UUID type:
class UUID {
public:
  enum class Type {
    BuildID,         // A build ID from the compiler or linker
    GNUDebugInfoCRC, // GNU debug info CRC
    MD5,             // MD5 of entire file
    MD5NonDebug,     // MD5 of the non debug info related bits
    CRC32,           // CRC32 of entire file
    Other,           // Anything else
  };
};
UUID::Type::MD5NonDebug is what Apple does: it MD5 checksums only the parts of the file that don't change with debug info or with any paths found in debug info or symbol tables. So if you build a binary in /tmp/a or in /private/var/local/foo, the UUID is the same if the binary is essentially the same (code, data, etc).
Then we can make intelligent comparisons between UUID types. It might even be possible for a module to have more than one UUID, for example if a binary contains both a UUID::Type::BuildID and a UUID::Type::GNUDebugInfoCRC. If a tool stores its UUIDs as a CRC32 or MD5, then those can be calculated on the fly. The GetUUID on lldb_private::Module might become:
const lldb_private::UUID &Module::GetUUID(UUID::Type uuid_type);
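A rough standalone sketch of how that could fit together, assuming a module keeps at most one UUID per type. The UUID and Module classes below are simplified stand-ins, not the real lldb_private classes, and AddUUID, IsValid and GetType are hypothetical helpers:

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

class UUID {
public:
  enum class Type { BuildID, GNUDebugInfoCRC, MD5, MD5NonDebug, CRC32, Other };

  UUID() = default;
  UUID(Type type, std::vector<uint8_t> bytes)
      : m_type(type), m_bytes(std::move(bytes)) {}

  bool IsValid() const { return !m_bytes.empty(); }
  Type GetType() const { return m_type; }

  // Only valid UUIDs of the same type compare equal, so a CRC32 is never
  // mistaken for a build ID that happens to have the same bytes.
  bool operator==(const UUID &rhs) const {
    return IsValid() && rhs.IsValid() && m_type == rhs.m_type &&
           m_bytes == rhs.m_bytes;
  }

private:
  Type m_type = Type::Other;
  std::vector<uint8_t> m_bytes;
};

class Module {
public:
  // Stores the UUID under its own type, replacing any previous one.
  void AddUUID(UUID uuid) {
    const UUID::Type type = uuid.GetType();
    m_uuids[type] = std::move(uuid);
  }

  // Returns the UUID of the requested type, or an invalid (zero-length)
  // UUID if the module has none of that type.
  const UUID &GetUUID(UUID::Type uuid_type) const {
    static const UUID g_invalid;
    auto it = m_uuids.find(uuid_type);
    return it == m_uuids.end() ? g_invalid : it->second;
  }

private:
  std::map<UUID::Type, UUID> m_uuids;
};
```

A caller such as the minidump code could then ask for the specific type it knows how to compare, and treat an invalid result as "this module has no UUID of that kind".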
Thoughts?
Greg
>
> Thanks,
> Lemo.
>