[PATCH] D105985: Support GSYM in llvm-symbolizer.

Thu Jul 15 01:38:40 PDT 2021

simon.giesecke added inline comments.

================
Comment at: llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp:25
+                                     DILineInfo &LineInfo) {
+  LineInfo.FunctionName = static_cast<std::string>(Location.Name);
+
----------------
clayborg wrote:
> If Specifier.FNKind is set to DINameKind::ShortName, we want to demangle the name into LineInfo.FunctionName. If it is set to DINameKind::LinkageName or DINameKind::None, set to to the current name. Demangling can be tricky, as there are many name manglings. Not sure if there is a one stop shop to demangle all sorts of names in LLVM.
Hm, what do you suggest to do here then? I think I am not knowledgeable enough to implement the demangling without further guidance.

Can we leave that to future work? In that case, should we fail here or fall back to not demangling, for now?

FWIW, I found that `PDBContext` returns the empty string in case of `DINameKind::None`.

================
Comment at: llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp:75-76
+
+  // TODO Or should we set LineInfo.FunctionName = Result.FuncName? Does that
+  // make a difference?
+  if (!Result.Locations.empty()) {
----------------
clayborg wrote:
> So the "Result.FuncName" is the name of the concrete function or symbol if there was no debug info for a symbol in the symbol table. So you only want to use this if "Result.Locations" is empty. If "Result.locations" is not empty, you don't want to use that because the "Result.Locations.front()" might point to a inlined function. The name in the SourceLocation will be the inlined function name. 
> 
> The main question is what other symbolizers do in this case. Lets say we are in the concrete function "main" and the "Result.Locations.front()" points to something line "std::vector<int>::empty()" at "/.../vector:123", do other symbolizers create a name the includes the inlined function + the concrete function? I would find an address in a DWARF file that points to an inlined function and symbolize it using the DWARF and see what gets output for the DWARF. 
> 
> The other issue here is that GSYM can return multiple locations for a single address since GSYM will unwind the inline call stack. The LookupResult contains an array of locations. Do we not want to convey this back? Or does the LLVM symbolizer always want the deepest inline function for a given address?
> 
> But seeing as below we get the inlining information in GsymDIContext::getInliningInfoForAddress(...), maybe we should always be returning the Result.FuncName? It really depends on what the other symbolizers do. I would make a small DWARF file, convert it to GSYM, then do lookups using the GSYM and the DWARF and making sure the DI classes match.
> 
> 
> The other issue here is that GSYM can return multiple locations for a single address since GSYM will unwind the inline call stack. The LookupResult contains an array of locations. Do we not want to convey this back? Or does the LLVM symbolizer always want the deepest inline function for a given address?

Well, there's `getInliningInfoForAddress` which returns all inline locations, which is used when the `-inlines` option is set.

When it is not set, I implemented the same behaviour here as the DWARF implementation as checked by `sym.test`. I don't know what's the rationale for that.

> I would make a small DWARF file, convert it to GSYM, then do lookups using the GSYM and the DWARF and making sure the DI classes match.

That's basically what I did. I took the existing `addr.exe` from the `sym.test` test case, converted that to GSYM and stripped DWARF from the object file, and ensured that the test cases create the same output, which they do, except for the missing column information. This covers the inline case. Not sure if there are any tests that check consistency of the DWARF vs. PDB behaviour here.

================
Comment at: llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp:97
+
+  // FIXME Implement this as well. It's not used by Symbolize though.
+
----------------
clayborg wrote:
> If this isn't used, is there a reason we are converting it?
Well, this is an implementation of the `DIContext` interface, and ideally we would have a full implementation of the interface (or we should at least have a way to indicate that we only have a partial implementation?). The function is called in the context of JIT, by several event listeners and from `llvm-rtdlyd` right now. I wonder if GSYM support is required there, but if only DWARF is supported there anyway, why does that use the format-neutral DIContext interface?

I didn't intend to implement this as part of this patch in any case.

================
Comment at: llvm/test/tools/llvm-symbolizer/sym-gsymonly.test:1
+#Source:
+##include <stdio.h>
----------------
jhenderson wrote:
> Probably worth a top-level comment explaining what this test is supposed to be testing.
> 
> Also probably you can just rename this test "gsym.test". I'm not sure what the "only" bit is about, and the "sym-" prefix similarly doesn't look to add any meaning.
> Probably worth a top-level comment explaining what this test is supposed to be testing.

Definitely, I'll add that.

> Also probably you can just rename this test "gsym.test". I'm not sure what the "only" bit is about, and the "sym-" prefix similarly doesn't look to add any meaning.

"gsymonly" refers to the fact that the binary doesn't have DWARF debug info but only a corresponding GSYM file.

Another case would be that we have both a GSYM file and DWARF debug info. This should probably be tested as well.

================
Comment at: llvm/test/tools/llvm-symbolizer/sym-gsymonly.test:20
+
+RUN: llvm-symbolizer -print-address -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck %s
+RUN: llvm-symbolizer -addresses -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck %s
----------------
jhenderson wrote:
> Nit: I'd normalise the comment markers throughout this test, using the following two rules:
> 
> 1) True comments start with `##` (followed by a space). Applies for the source above. This helps distinguish the test details from actual comments.
> 2) Add `#` as a comment marker to all RUN lines, much as we do in the vast majority of newer tests. Follow it with a space again.
> 3) CHECK and equivalent lines should be `# CHECK` (with space).
> 
> Finally, as noted out-of-line, I've not looked at the implementation, but do you need all these individual test-cases for gsym testing? Most of them look more like they're testing generic symbolizer behaviour, which is therefore already covered elsewhere.
As I mentioned above, this is indeed a copy of sym.test, using a different input binary. The results should be the same, except for the fact that we don't get any column information from GSYM.

I checked the list of test cases again, and arguably some tests seem to only test the formatting of the output, and this is somehow orthogonal to the data source used for symbolication. But at least some are passed into the `DIContext` implementation (inlining, basename, ...), and I am not sure if we should make assumptions on the implementation detail of which command line options interact with the `DIContext` implementation at this level.

If you think specific test cases should be removed here, please suggest them.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105985/new/

https://reviews.llvm.org/D105985