[PATCH] D105985: Support GSYM in llvm-symbolizer.

Fri Jul 16 01:32:08 PDT 2021

jhenderson added a comment.

In D105985#2882467 <https://reviews.llvm.org/D105985#2882467>, @simon.giesecke wrote:

> In D105985#2879253 <https://reviews.llvm.org/D105985#2879253>, @jhenderson wrote:
>
>> 1. Make sure to review the llvm-symbolizer documentation to see if there's anything there that needs adding/updating given this change (there may not be).
>
> I checked llvm/docs/CommandGuide/llvm-symbolizer.rst, and it doesn't say anything right now about how the data source is selected. Maybe it should, and I can add something, but then this should probably be more than just saying that GSYM is preferred when it's present. I think there's similar logic for .dwp files?

I don't think we need to go into that in this change. I guess the point of the llvm-symbolizer documentation is more about how to use the tool, not what does the tool do under-the-hood, but I thought it was worth checking.

In D105985#2879285 <https://reviews.llvm.org/D105985#2879285>, @simon.giesecke wrote:

>> 2. Can you avoid using pre-canned binaries for the test input, by somehow generating the inputs on the fly? The existing pre-canned binaries in the llvm-symbolizer tests are already sub-optimal, in my opinion, as they are opaque and make testing less flexible.
>
> The sym-gsymonly test is mostly a copy of the sym test, and the binary is just the stripped binary. These additional files don't need to be part of the repository, we could run `llvm-gsymutil --convert` and strip DWARF from it as part of the test.
>
> If this should be migrated to remove the use of the canned binary entirely, I fear I am not knowledgable enough on the test infrastructure to do that.
>
> This should probably have been stated in the test file.

I don't know how gsym data is actually generated, so some of these suggestions may not make much sense.
Option 1) Add the test to the new cross-project-tests test directory. This was created for tests that rely on clang/lld or whatever to generate valid test inputs from source, without the need for a canned binary. You'd include the source and build it directly at run-time. This would only work though if your test addresses are unlikely to change as changes occur in the compiler/linker. llvm-symbolizer was one of the primary motivations for this actually.
Option 2) Check in the assembly required to build the input, and use llvm-mc and other tools (llvm-strip/llvm-gsymutil etc) to turn it into the desired output at runtime.
Option 3) Generate the gsym data directly at run-time somehow.

================
Comment at: llvm/test/tools/llvm-symbolizer/sym-gsymonly.test:1-3
+# This test is a variant of sym.test. It uses a binary without DWARF debug
+# info, but a corresponding .gsym file. The expectations are the same, except
+# for the fact that GSYM doesn't provide us with column numbers.
----------------
I'd avoid referencing other tests as part of this comment: sym.test has been on my wishlist as something I'd love to rewrite if I had the time, due to it's usage of precanned binaries, conflation of testing of multiple unrelated options and general vagueness of the test intent.

Ideally, we'd avoid mirroring it entirely, in favour of testing the things that need to be tested specifically for this format.

================
Comment at: llvm/test/tools/llvm-symbolizer/sym-gsymonly.test:20
+
+RUN: llvm-symbolizer -print-address -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck %s
+RUN: llvm-symbolizer -addresses -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck %s
----------------
simon.giesecke wrote:
> jhenderson wrote:
> > Nit: I'd normalise the comment markers throughout this test, using the following two rules:
> > 
> > 1) True comments start with `##` (followed by a space). Applies for the source above. This helps distinguish the test details from actual comments.
> > 2) Add `#` as a comment marker to all RUN lines, much as we do in the vast majority of newer tests. Follow it with a space again.
> > 3) CHECK and equivalent lines should be `# CHECK` (with space).
> > 
> > Finally, as noted out-of-line, I've not looked at the implementation, but do you need all these individual test-cases for gsym testing? Most of them look more like they're testing generic symbolizer behaviour, which is therefore already covered elsewhere.
> As I mentioned above, this is indeed a copy of sym.test, using a different input binary. The results should be the same, except for the fact that we don't get any column information from GSYM.
> 
> I checked the list of test cases again, and arguably some tests seem to only test the formatting of the output, and this is somehow orthogonal to the data source used for symbolication. But at least some are passed into the `DIContext` implementation (inlining, basename, ...), and I am not sure if we should make assumptions on the implementation detail of which command line options interact with the `DIContext` implementation at this level.
> 
> If you think specific test cases should be removed here, please suggest them.
I'm a bit constrained on time at the moment, so can't really dig into the source to look at what makes sense to test. Perhaps @clayborg has some suggestions there. I think it's reasonable to make assumptions based on the current implementation, about what is orthoganol and what isn't. If people refactor the area, to change where information is retrieved, it's probably reasonable to expect them to ensure test coverage is still sufficient, but I don't really know.

You certainly should be able to drop the llvm-addr2line cases: llvm-addr2line is basically llvm-symbolizer with some different defaults on the options, and one or two minor formatting differences. The underlying source of information is irrelevant. Similarly, you can drop testing that different aliases mean the same thing, as these are tested elsewhere.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105985/new/

https://reviews.llvm.org/D105985