[PATCH] D118174: [Symbolize][DebugInfoD] Try looking up failed paths as Build IDs.

Daniel Thornburgh via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 27 11:11:51 PST 2022


mysterymath abandoned this revision.
mysterymath added inline comments.


================
Comment at: llvm/lib/DebugInfo/Symbolize/Symbolize.cpp:522
+          reinterpret_cast<const uint8_t *>(Bytes.data()), Bytes.size());
+      if (BuildID.size() != 20)
+        return BinOrErr.takeError();
----------------
phosek wrote:
> mcgrathr wrote:
> > mysterymath wrote:
> > > mysterymath wrote:
> > > > phosek wrote:
> > > > > Build ID doesn't have any prescribed length, and linkers typically support a number of hashing algorithms (see https://github.com/llvm/llvm-project/blob/3704abaa166bec865f56f48337a1732eab77dc68/lld/ELF/Driver.cpp#L764) so this isn't going to work.
> > > > From the link, it looks like the smallest presently supported would be MD5 and UUID, both at 128 bits.
> > > > Given that Build IDs are intended to be globally unique (right?), that seems like a reasonable minimum for this check.
> > > > 
> > > > It should be fairly unlikely to have a filename of >= 16 characters that is a valid hex identifier unless it's some kind of ID; we'd be into
> > > > aBadBeefCafeBabe territory.
> > > Ah, but the --build_id flag can be used to stamp binaries with an arbitrary hex string as part of the build process.
> > > 
> > > So, the possibilities that immediately occur to me:
> > > 1. Keep this as a heuristic check that captures "the usual" build IDs, but then you simply couldn't use this feature in llvm-symbolizer with your own custom build ID schema.
> > > 2. Remove this check entirely, and do lookups on anything that could be a build ID. This could issue spurious lookups on paths like "bad", but only in the case where "bad" isn't a valid binary.
> > > 3. Provide an in-band mechanism in the syntax of llvm-symbolizer for specifying whether the "path" component of a symoblization request is intended to be a path or a build ID. This would need some further design.
> > Build IDs can be of any size, and you've already overlooked one of the common hash algorithms (the one lld uses by default) in attempting to enumerate "expected" sizes, which should be a pretty strong sign that an enumeration of expected sizes might not be the best idea.  It's for reasons like this that I don't think it's the best CLI here to conflate file names with build ID hex strings.  It doesn't seem at all taxing to require a command-line switch to indicate a build ID just as the --obj / --exe / -e switch indicates a file name.  For the stdin syntax, I think a new explicit syntax makes sense as well.  It already accepts the optional "CODE", "DATA", or "FRAME" prefix before "filename address".  So I think it should be fine enough to accept an optional "BUILDID" prefix between the optional "command" prefix and the filename (which given the prefix is interpreted as a build ID).  Or something similar.
> > 
> > AFAICT the only environment variable case is using LLVM_SYMBOLIZER_OPTS to pass command-line switches.  If you can use -e there today, then using the new build ID switch should be just as easy and require no additional work to support.
> I agree with @mcgrathr, having an explicit command-line option and syntax is less error-prone and should be fairly straightforward to implement.
SGTM; I'll abandon this PR then, as the proposed mechanisms wouldn't look anything like this.

Adding another prefix to the syntax may take a bit of refactoring (I'll just have to try it.), but I already have a patch more-or-less ready for a command line switch. I'll send PRs that reference this one.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118174/new/

https://reviews.llvm.org/D118174



More information about the llvm-commits mailing list