[PATCH] D96613: [lld] Add options to trace all symbols and to trace all symbols originated from a file

Fri Feb 26 13:11:27 PST 2021

smeenai added a comment.

In D96613#2590948 <https://reviews.llvm.org/D96613#2590948>, @MaskRay wrote:

>> - If you want to integrate the functionality in your build system (e.g. you're trying to generate an aggregated report for all the libraries you build), a pipeline is much harder to integrate into your build system, vs. just adding an argument.
>> - You have to worry about platform differences; e.g., lots of Linux utilities have different arguments or behaviors than their BSD (and therefore macOS) equivalents, and Windows doesn't have these utilities at all.
>
> I think a small set of composable options for build dependency analysis will be useful, but we need to consolidate the requests and think of their composability/maintenance/etc. Apologies but I feel that "Linux utilities have different arguments or behaviors than their BSD" in this particular context is probably a weak argument: here we use LLVM binary utilities and a common 'awk' (which can be reimplemented in build system language straightforwardly). Many downstream groups have bundled LLVM binary utilities as platform-neutral utilities which are already used heavily in build systems. We don't necessarily re-invent features provided by them in LLD.

In D96613#2590976 <https://reviews.llvm.org/D96613#2590976>, @MaskRay wrote:

> For discoverability, if people know that `--trace-symbol`, re-inventing this mechanism is simple. `nm` (llvm-nm) is well-known utility to dump the symbol table. Composing nm and ld.lld together is straightforward.

I agree that we should be thinking about the composability and maintenance burden of new options, and thinking about this in a holistic way (and not just always adding one-off options for each request). At the same time, I disagree with your assessment of how straightforward it is to compose different tools together. Fair point about the `awk` functionality in your command being the same across Unixes, but `awk` still isn't a thing on Windows, and whether it's straightforward or not to reimplement the equivalent functionality in your build system language depends on your build system. Furthermore, I think text processing is inherently fragile in general; it's fair to assume that a tool like `llvm-nm` isn't going to be changing its output format, but you still have to put a lot more thought into setting up an appropriate pipeline than you would into using a built-in option.

Case in point: you're using the following invocation:

  llvm-nm -Du /usr/lib64/libc.so.6 | awk '{print "-y"substr($0,20)}'

I don't know why this is, but both GNU nm and llvm-nm print a different number of leading spaces for 32-bit vs. 64-bit binaries. If I run `llvm-nm -Du /usr/lib/libc.so.6` (instead of `/usr/lib64/libc.so.6`), I need to change the `substr` offset from `20` to `12`. I could use `$2` (the second field) instead, but that'll break if my symbol name has spaces in it (which definitely occurs for Objective-C, at least). (Incidentally, you're not putting quotes around your `-y` argument, so it'll also break if the symbol name has spaces in it, which is one of the generally fragile things about shell pipelines.) Fortunately, `llvm-nm` has a `--just-symbol-name` option (which at least my version of GNU nm doesn't), which works regardless of the output format for the particular architecture and handles symbol names with spaces without any issues. IMO there's clear value in having the `--just-symbol-name` flag, even though you could theoretically emulate the same functionality with a shell pipeline, since it saves you from having to worry about all these caveats.
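
To make the width problem concrete, here's a rough Python sketch of what a build system would have to do to get this right without `--just-symbol-name`. Python is only standing in for "your build system language", and the function names (`undefined_symbol_name`, `trace_args`, `trace_args_for_library`) are made up for illustration; none of this is from the patch. It assumes the `nm -u` style layout discussed above: a blank address column, the type letter, then the name.

```python
import subprocess


def undefined_symbol_name(nm_line):
    """Return the symbol name from one line of `nm -u` style output.

    The blank address column is 16 characters wide for 64-bit binaries
    and 8 for 32-bit ones, so a fixed offset like substr($0,20) only
    works for one of them. Splitting off the type letter works for both.
    """
    # After stripping, the line is "<type> <name>". Split only once so
    # spaces inside the name (e.g. Objective-C methods) are preserved.
    _symbol_type, name = nm_line.strip().split(None, 1)
    return name


def trace_args(nm_lines):
    """Build one `-y<symbol>` argument per non-empty output line."""
    return ["-y" + undefined_symbol_name(line)
            for line in nm_lines if line.strip()]


def trace_args_for_library(path, nm="llvm-nm"):
    """Run nm on `path` and return the -y arguments (hypothetical helper)."""
    out = subprocess.run([nm, "-Du", path], check=True,
                         capture_output=True, text=True).stdout
    return trace_args(out.splitlines())
```

Because each `-y<symbol>` is returned as its own list element, it can be passed straight into the linker's argument vector with no shell quoting involved. With `--just-symbol-name` the parsing step would go away entirely, since each output line would already be the symbol name.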

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D96613