<div dir="ltr"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">I am fine with all the above except some reservations about case C. No need to calculate something if it isn't useful. For case C it should be fine to never match as if a file has a UUID to begin with it typically isn't something that gets stripped in a stripped binary. So we should either have it or not. If breakpad does calculate a CRC32, then we need to know to ignore the UUID. The problem is we probably won't be able to tell what the UUID is: real from build ID, or from GNU debug info CRC, or CRC of entire file. So the minidump code will need to do something here. If a minidump has the linux auxv and memory map in them, then we might need to dig through the section information and deduce if a file matches or not based off the size of mapped program headers to further help with the matching.<span> </span></span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">One other idea is to make a set of enumerations for the UUID type:</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">class UUID {</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> enum class Type {</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> BuildID, // A build ID from the compiler or linker</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> GNUDebugInfoCRC, // GNU debug info CRC</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> MD5, // MD5 of entire file</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> MD5NonDebug, // MD5 of the non debug info related bits</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> CRC32, // CRC32 of entire file</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> Other, // Anything else</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"> };</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">};</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">The eTypeMD5NonDebug is what apple does: it MD5 checksums only the parts of the file that don't change with debug info or any paths found in debug info or symbols tables. So if you build a binary in /tmp/a or in /private/var/local/foo, the UUID is the same if the binary is essentially the same (code, data, etc).</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Then we can make intelligent comparisons between UUID types. Might even be possible for a module to have more than 1 UUID then if a binary contains a eTypeBuildID and a eTypeGNUDebugInfoCRC. If a tool stores its UUIDs as a CRC32 or MD5, then those can be calculated on the fly. The GetUUID on lldb_private::Module might become:</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">const lldb_private::UUID &Module::GetUUID(UUID::Type uuid_type);</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Thoughts?</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">Greg</span></blockquote><div><br></div><div>I like the idea of UUID sub-types! This solves the problem is a more generic fashion and it's also extensible. Interestingly enough, for Crashpad we're considering something similar (the fabricated UUIDs would have a different CvRecord signature)</div><div><br></div><div>Case C. is a bit ugly so let me elaborate: this is specific to Breakpad minidump + ELF binaries w/o build-id. So we'd still need to have a way to force the match of the modules in the minidump with the local files. This ought to be an off-by-default, sharp tool which you'd only need to deal with Breakpad minidumps, and even then it would only be a fall-back that must be explicitly requested.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">1. .gnu_debuglink separate file pointer</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><</span><a href="https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html" rel="noreferrer" target="_blank" style="color:rgb(17,85,204);font-size:12.8px;background-color:rgb(255,255,255)">https://sourceware.org/gdb/<wbr>onlinedocs/gdb/Separate-Debug-<wbr>Files.html</a><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">>.</span><br style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">This is where the choice of the crc algorithm comes from.</span></blockquote><div><br></div><div>Thanks Pavel. As you noted yourself, this doesn't mean that the UUID has to be tied to the checksum (they are exclusive options). In particular, for performance reasons I think it's desirable to avoid calculating checksums for every ELF module just in case.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">I think we might have something already which could serve this<br></span><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">purpose. Eugene added a couple of months ago a mechanism to force-load<br></span><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">symbols for a given file regardless of the UUIDs<br></span><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><</span><a href="https://reviews.llvm.org/D35607" rel="noreferrer" target="_blank" style="color:rgb(17,85,204);font-size:12.8px;background-color:rgb(255,255,255)">https://reviews.llvm.org/<wbr>D35607</a><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">>. It requires an explicit "target<br></span><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">symbols add" command (which seems reasonable, as lldb has no way to<br></span><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">tell if it's doing things right). Would something like that work for<br></span><span style="font-size:12.8px;background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">you?</span></blockquote><div><br></div><div>Nice. We may have to update the C++ API, but something like this would do the trick for case C.</div><div><br></div><div>To summarize the conversation so far:</div><div><br></div><div>1. We can fix cases A, B independent of C: if real UUIDs are missing, don't automatically use full file CRC32 as UUID.</div><div>2. Pay attention to make sure that we don't break .gnu_debuglink or remote debugging (thanks Pavel)</div><div>3. Multiple types/namespaces for UUIDs would be a great feature!</div><div>4. Eugene's symbol forcing trick could be extended to handle case C</div><div><br></div><div>Did I miss anything?</div><div><br></div><div>My current plan is to start with #1, then look into #4 (force symbols match).</div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Aug 5, 2018 at 12:11 PM, Pavel Labath <span dir="ltr"><<a href="mailto:labath@google.com" target="_blank">labath@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello Leonard,<br>
<br>
while I'm in principle supportive of this idea, I think it's not going<br>
to be as easy as you might imagine. There are currently at least two<br>
mechanisms which rely on this crc UUID.<br>
<br>
1. .gnu_debuglink separate file pointer<br>
<<a href="https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html" rel="noreferrer" target="_blank">https://sourceware.org/gdb/<wbr>onlinedocs/gdb/Separate-Debug-<wbr>Files.html</a>>.<br>
This is where the choice of the crc algorithm comes from.<br>
<br>
In short, this mechanism for debug info location works like this: The<br>
stripped file contains a .gnu_debuglink section. The section contains<br>
a file path and a crc checksum. After reading this section the<br>
debugger is expected to look for the file at the given path, and then<br>
compute it's checksum to verify it is indeed the correct file (hasn't<br>
been modified).<br>
<br>
In LLDB, this is implemented somewhat differently. First we have a<br>
mechanism for assigning UUIDs to modules. This mechanism does one of<br>
two things (I'm assuming here none of the files have proper build-ids<br>
in them). If the file has a .gnu_debuglink section (the stripped<br>
file), we fetch the CRC from there, and assign that as the build-id of<br>
the file. If the file doesn't have this section, we **assume** it is<br>
going to be a target of gnu_debuglink pointer in another file, compute<br>
its CRC via the specified algorithm, and assign that as the build-id.<br>
<br>
Second, we have a mechanism for locating the unstripped file. This<br>
just fetches the path from gnu_debuglink, and compares the UUIDs of<br>
the two modules. This works because of the UUID algorithm in the first<br>
part.<br>
<br>
Now, this is not a particularly smart way of doing things and it is<br>
the cause of the most of your problems. However, it is completely<br>
avoidable. Instead of piggy-backing on the UUID mechanism, we should<br>
just change the search mechanism (the second part) to compute the crc<br>
itself, and only after it successfully finds the file referenced by<br>
the gnu_debuglink section. This way, we will only compute the crc only<br>
when absolutely necessary (i.e. for your use case, never).<br>
<br>
<br>
2. Currently, for remote debugging, we assume that each module has<br>
some sort of a unique identifier which we can check to see whether we<br>
have downloaded that file already (see qModuleInfo packet). I am not<br>
sure what would happen if we suddenly just stopped computing a UUID<br>
for these files, but we at the very least lose the ability to detect<br>
changes to the remote files. For this item, I am not sure what would<br>
be the best course of action. Maybe we should just start relying on<br>
modification timestamp for these files?<br>
<span class=""><br>
On Sat, 4 Aug 2018 at 02:17, Leonard Mosescu <<a href="mailto:mosescu@google.com">mosescu@google.com</a>> wrote:<br>
><br>
> Greg, Mark,<br>
><br>
> Looking at the code, LLDB falls back to a full file crc32 to create the module UUID if the ELF build-id is missing. This works, in the sense that the generated UUID does indeed identify the module.<br>
><br>
> But there are a few problems with this approach:<br>
><br>
> 1. First, runtime performance: a full file crc32 is a terribly inefficient way to generate a temporary UUID that is basically just used to match a local file to itself.<br>
> - especially when some unstripped binaries can be very large. for example a local chromium build produces a 5.3Gb chrome binary<br>
> - the crc32 implementation is decent, but single-threaded<br>
> - to add insult to the injury, it seems a small bug defeats the intention to cache the hash value so it ends up being recalculated multiple times<br>
><br>
> 2. The fake UUID is not going to match any external UUID that may be floating around (and yet not properly embedded into the binary)<br>
> - an example is Breakpad, which unfortunately also attempts to make up UUIDs when the build-id is missing (something we'll hopefully fix soon)<br>
><br>
> Is there a fundamental reason to calculate the full file crc32? If not I propose to improve this based on the following observations:<br>
><br>
> A. Model the reality more accurately: an ELF w/o a build-id doesn't really have an UUID. So use a zero-length UUID in LLDB.<br>
> B. The full file name should be enough to prove the identity of a local module.<br>
> C. If we try to match an external UUID (ex. from a minidump) with a local file which does not have an UUID it may help to have an option to allow it to match (off by default, and only if there's no better match)<br>
<br>
</span>I think we might have something already which could serve this<br>
purpose. Eugene added a couple of months ago a mechanism to force-load<br>
symbols for a given file regardless of the UUIDs<br>
<<a href="https://reviews.llvm.org/D35607" rel="noreferrer" target="_blank">https://reviews.llvm.org/<wbr>D35607</a>>. It requires an explicit "target<br>
symbols add" command (which seems reasonable, as lldb has no way to<br>
tell if it's doing things right). Would something like that work for<br>
you?<br>
<br>
cheers,<br>
pl<br>
</blockquote></div><br></div>