[PATCH] D149759: [symbolizer] Support symbol lookup

Wed Jul 5 01:45:34 PDT 2023

jhenderson added inline comments.

================
Comment at: llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h:69
   int SourceContextLines;
+  bool IsAddr2Line;
 };
----------------
I'm thinking this might be better named `IsGNUStyle`, since llvm-symbolizer has an `--output-style` option, and this variable controls how the output is formatted.

================
Comment at: llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp:357
+  std::vector<object::SectionedAddress> Result;
+  for (const SymbolDesc &Sym : Symbols)
+    if (Sym.Name.equals(Symbol)) {
----------------
Nit: This should probably have braces, as it is a non-trivial statement (even if it is only a single statement technically). See https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements for more details.

================
Comment at: llvm/test/tools/llvm-symbolizer/Inputs/addr2.inp:1
+0some text
+0x40054d
----------------
Please don't add a separate input file that is only used by one test.

The llvm-symbolizer tests could do with a bit of cleaning-up. Using a separate input file for text input is one such example.

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-empty-line.test:12
 RUN: llvm-symbolizer --output-style=GNU -e %p/Inputs/addr.exe < %p/Inputs/addr.inp \
-RUN:   | FileCheck %s --check-prefix=GNU
+RUN:   | FileCheck %s --check-prefix=GNUOUT

----------------
I don't think these check-suffixes are particularly understandable any more, given the changed behaviour. I'd suggest instead renaming them all and moving the check patterns to be immediately after the group that uses them. Something like:

```
RUN: llvm-symbolizer -e %p/Inputs/addr.exe < %p/Inputs/addr.inp \
RUN:   | FileCheck %s --check-prefix=SYMB-LLVM

RUN: llvm-symbolizer --output-style=LLVM -e %p/Inputs/addr.exe < %p/Inputs/addr.inp \
RUN:   | FileCheck %s --check-prefix=SYMB-LLVM

SYMB-LLVM: x.c:14:0
SYMB-LLVM-EMPTY:
SYMB-LLVM-NEXT: some text2

RUN: llvm-symbolizer --output-style=GNU -e %p/Inputs/addr.exe < %p/Inputs/addr.inp \
RUN:   | FileCheck %s --check-prefix=SYMB-GNU

SYMB-GNU: x.c:14
SYMB-GNU-NEXT: some text2

RUN: llvm-addr2line -i -e %p/Inputs/addr.exe < %p/Inputs/addr.inp \
RUN:   | FileCheck %s --check-prefix=ADDR-GNU

RUN: llvm-addr2line --output-style=GNU -i -e %p/Inputs/addr.exe < %p/Inputs/addr.inp \
RUN:   | FileCheck %s --check-prefix=ADDR-GNU

ADDR-GNU: x.c:14
ADDR-GNU-NEXT: ??:0

RUN: llvm-addr2line --output-style=LLVM -i -e %p/Inputs/addr.exe < %p/Inputs/addr.inp \
RUN:   | FileCheck %s --check-prefix=ADDR-LLVM

ADDR-LLVM: x.c:14:0
ADDR-LLVM-EMPTY:
ADDR-LLVM-NEXT: ??:0
```

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:49
 ## Invalid first argument before any valid one.
-# INLINE-A2L:{"Error":{"Message":"unable to parse arguments: some text"},"ModuleName":"{{.*}}/Inputs/addr.exe"}
+# INLINE-A2L:{"Error":{"Message":"unable to parse arguments: 0some text"},"ModuleName":"{{.*}}/Inputs/addr.exe"}
 ## Resolve valid address.
----------------
I think you could make this a little more self descriptive, by changing your invalid input to something like "0 not a symbol name or number". That being said, the intent of this test, I think, was to use the exact same set of inputs as one that doesn't use JSON style.

================
Comment at: llvm/test/tools/llvm-symbolizer/symbol-search.test:1
+# This test uses ELF shared object `Inputs/symbols.so` built for x86_64 using
+# the instructions from `Inputs/symbols.h`.
----------------
You're using a weird mix of comment markers in this file. The standard rules in newer tools tests are
  - `##` for actual comments;
  - `#` for RUN and CHECK lines;
  - all comment markers should have a space between them and the rest of the line (e.g. `# CHECK:` or `## This is a comment`).
  - optionally, if the test consists solely of things prefixed with a comment marker, you can drop one `#` from each of the above, but it's more common to have them than not.

In addition, this test should have a comment explaining what the test is testing, probably before this comment.

================
Comment at: llvm/test/tools/llvm-symbolizer/symbol-search.test:4
+
+RUN: llvm-addr2line --obj=%p/Inputs/symbols.so "CODE func_01" | FileCheck --check-prefix=PRFUNC %s
+RUN: llvm-symbolizer --obj=%p/Inputs/symbols.so "CODE func_01" | FileCheck --check-prefix=PRFUNC %s
----------------
I can't tell what "PRFUNC" is supposed to stand for. It might be useful to add a comment before each test case explaining what that specific case is testing.

================
Comment at: llvm/test/tools/llvm-symbolizer/symbol-search.test:9
+RUN: llvm-addr2line -e %p/Inputs/symbols.so func_01 | FileCheck --check-prefix=A2L0 %s
+RUN: llvm-addr2line -e %p/Inputs/symbols.so func_01 | FileCheck --check-prefix=A2L0 %s
+#A2L0: /tmp/dbginfo{{[/\]+}}symbols.part1.cpp:12
----------------
Why is this test case epeated twice?

================
Comment at: llvm/test/tools/llvm-symbolizer/symbol-search.test:17
+RUN: llvm-addr2line --obj=%p/Inputs/symbols.so func_666 | FileCheck --check-prefix=NONEXISTENT %s
+#NONEXISTENT: ??
+
----------------
It would be better for both "nonexistent" cases to have a suffix, rather than just the llvm-symbolizer one.

================
Comment at: llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp:204
+
+  // Parse address, which can be specified as an offset in module or as a
+  // symbol.
----------------

================
Comment at: llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp:211
   // "0x" or "0X" prefix; do the same for compatibility.
+  bool StartsWithHexPrefix = false;
   if (IsAddr2Line)
----------------
This whole area of code could do with some blank lines to aid readability. I would suggest that you have them between each group of related lines, where "related" means a comment, one if statement, and any lines strongly related to that if statement (e.g. variable declarations). For example, I'd add a line before the comment about 0x prefixes, and another one after the if's closing brace.

================
Comment at: llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp:471
   Config.Verbose = Args.hasArg(OPT_verbose);
+  Config.IsAddr2Line = IsAddr2Line;

----------------
This should be controlled by the output-style command-line option (see my comment elsewhere).

================
Comment at: llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp:227-229
+  // Address specification is a symbol if addr2line compatibility mode in on.
+  // Otherwise treat it as an error for compatibility with previous versions of
+  // llvm-symbolizer.
----------------
ikudrin wrote:
> I don't think we need a new point of divergence here. It's unlikely that anyone would rely on the tool to generate an error for such an input.
+1: I don't think we need to handle invalid input differently between llvm-symbolizer and llvm-addr2line specifically, as long as the updated behaviour makes sense.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149759/new/

https://reviews.llvm.org/D149759