[PATCH] D96883: Add support for JSON output style to llvm-symbolizer

Thu Apr 22 02:14:27 PDT 2021

jhenderson added a comment.

I've reviewed the test cases more thoroughly today. I think you need test cases where the addresses are specified on the command-line rather than via stdin, because there's a behaviour difference (objects are in list or not as the case may be).

================
Comment at: llvm/docs/CommandGuide/llvm-symbolizer.rst:256

+  ``JSON`` style provides a machine readable output in JSON.
+
----------------
Perhaps worth briefly mentioning the behaviour changes for the following cases:
  # Stdin versus addresses on command-line (i.e. separate individual objects versus list of objects)
  # `--pretty-print` versus no `--pretty-print`. Optionally, this could be under the `--pretty-print` option discussion.

I don't think you need concrete examples of the differences. Just a one/two sentence description.

================
Comment at: llvm/docs/CommandGuide/llvm-symbolizer.rst:279

+    $ llvm-symbolizer --output-style=JSON --obj=inlined.elf 0x4004be 0x400486
+    [{"Address":"0x4004be","ModuleName":"inlined.elf","Symbol":[
----------------
Maybe use the `--pretty-print` option to make this example a bit more readable?

================
Comment at: llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h:126
+  bool Pretty;
+  std::unique_ptr<json::Array> List;
+
----------------
I guess it might be helpful to know what this list is of (e.g. `ObjectList` or whatever).

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:3-9
+## Handle symbolize library error - file does not exist, no-inlines.
+# RUN: llvm-symbolizer --output-style=JSON --no-inlines -e %p/no-file.exe 0 | \
+# RUN:   FileCheck %s -DMSG=%errc_ENOENT --check-prefix=NO-FILE --strict-whitespace --match-full-lines --implicit-check-not={{.}}
+
+## Handle symbolize library error - file does not exist, inline.
+# RUN: llvm-symbolizer --output-style=JSON -e %p/no-file.exe 0 | \
+# RUN:   FileCheck %s -DMSG=%errc_ENOENT --check-prefix=NO-FILE --strict-whitespace --match-full-lines --implicit-check-not={{.}}
----------------
My understanding is that the `--no-inlines` option has no impact on the output here, so I think you can drop one of these cases?

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:11
+
+# NO-FILE:[{"Address":"0x0","Error":{"Message":"[[MSG]]"},"ModuleName":"{{.*}}/no-file.exe"}]
+
----------------
Rather than use `{{.*}}/no-file.exe` here and similar situations for paths below, I'd recommend using FileCheck's `-D` option to ensure the correct path is printed, i.e:
```
# RUN:   FileCheck %s ... -DFILE=%p/no-file.exe

# NO-FILE: ... "ModuleName":"[[FILE]]" ...
```

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:15
+# RUN: llvm-symbolizer --output-style=JSON --no-inlines -e %p/Inputs/addr.exe 1000000000 | \
+# RUN:   FileCheck %s --check-prefix=NOT-FOUND-1 --strict-whitespace --match-full-lines --implicit-check-not={{.}}
+
----------------
Maybe rather than `NOT-FOUND-1` and `NOT-FOUND-2`, it would be more self-descriptive using `NOT-FOUND-NOINLINES` and `NOT-FOUND-INLINES` or something like that?

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:17
+
+# NOT-FOUND-1:[{"Address":"0x3b9aca00","ModuleName":"{{.*}}/Inputs/addr.exe","Symbol":[{"Column":0,"Discriminator":0,"FileName":"","FunctionName":"","Line":0,"Source":"","StartFileName":"","StartLine":0}]}]
+
----------------
Just thinking about usability - what is the canonical way for users to know that their address hasn't been found in JSON output? It might be worth documenting this somewhere.

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:25
+
+# RUN: llvm-symbolizer --output-style=JSON --no-inlines -e %p/Inputs/addr.exe < %p/Inputs/addr.inp | \
+# RUN:   FileCheck %s --check-prefix=NO-INLINES --strict-whitespace --match-full-lines --implicit-check-not={{.}}
----------------
Consider a comment for this case, introducing the following set of test cases. In particular, it's important to note that this is using a stdin argument. You could also move all the references to "no-inlines" to one place. I.e. something like "this test case is testing stdin input, with the --no-inlines option" would do the trick.

Same sort of thing goes below.

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:28
+
+## Invalid first argument before any valid one, no-inlines
+# NO-INLINES:{"Address":"0x0","Error":{"Message":"unable to parse arguments: some text"},"ModuleName":"{{.*}}/Inputs/addr.exe"}
----------------
Nit: missing full stop.

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:29
+## Invalid first argument before any valid one, no-inlines
+# NO-INLINES:{"Address":"0x0","Error":{"Message":"unable to parse arguments: some text"},"ModuleName":"{{.*}}/Inputs/addr.exe"}
+
----------------
I find the `"Address":"0x0"` bit here somewhat confusing, given this is an error case. I'd consider omitting it entirely. Alternatively, simply print what the input address was specified as (in this case `"Address":"some text"`).

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:32
+## Resolve valid address, no-inlines.
+# NO-INLINES-NEXT:{"Address":"0x40054d","ModuleName":"{{.*}}/Inputs/addr.exe","Symbol":[{"Column":3,"Discriminator":0,"FileName":"/tmp{{/|\\\\}}x.c","FunctionName":"main","Line":3,"Source":"","StartFileName":"/tmp{{/|\\\\}}x.c","StartLine":2}]}
+
----------------
It would be nice to have one or more test cases with a non-zero discriminator value. Also, where the "Source" parameter is non-empty.

You can simplify `{{/|\\\\}}` to this: `{{[\\/]}}`. I think it's slightly more readable due to avoiding the quadruple backslash. Same goes elsewhere below.

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:49
+
+## Also check the last 3 test cases with llvm-adr2line. The expected result is the same but missing the FunctionName.
+# RUN: llvm-addr2line --output-style=JSON -i -e %p/Inputs/addr.exe < %p/Inputs/addr.inp | \
----------------
I guess the obvious question is: why do we have a difference in behaviour surrounding the `FunctionName` attribute? The motivation for different symbolizer/addr2line output is because llvm-addr2line needs to be compatible with GNU addr2line, but that principle doesn't apply for JSON output (which AFAIK is not a GNU addr2line supported feature).

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:53
+
+## Invalid first argument before any valid one, inlines
+# INLINE-A2L:{"Address":"0x0","Error":{"Message":"unable to parse arguments: some text"},"ModuleName":"{{.*}}/Inputs/addr.exe"}
----------------
Nit: missing full stop.

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-data.test:32
+## Test multiple addresses in the command line.
+# RUN: llvm-symbolizer -e=%t.o "DATA 0" "DATA 0" --output-style=JSON | \
+# RUN:   FileCheck %s --check-prefix=MULTI --strict-whitespace --match-full-lines --implicit-check-not={{.}}
----------------
I'd make one of these addresses non-zero, as that will show that the "Address" and "Start" parameters are not just always 0.

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-frame.test:29
+
+# CHECK:[{"Address":"0x0","Frame":[{"DeclFile":"/tmp/test{{/|\\\\}}frame.cpp","DeclLine":2,"FrameOffset":-1,"FunctionName":"f","Name":"a","Size":"0x1","TagOffset":""},{"DeclFile":"/tmp/test{{/|\\\\}}frame.cpp","DeclLine":3,"FrameOffset":-8,"FunctionName":"f","Name":"b","Size":"0x4","TagOffset":""}],"ModuleName":"{{.*}}.o"}]
+
----------------
I think you need a test case with a non-empty "TagOffset".

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-frame.test:41
+##
+## clang++ --target=i386-linux-gnu frame.cpp -g -std=c++11 -S -o frame.s 
+ 
----------------
grimar wrote:
> aorlov wrote:
> > grimar wrote:
> > > Adding an exact clang version used might be helpfull.
> > You can see the exact clang version 13.0.0 at the end of the generated code.
> You mean the `info_string0`?
> 
> ```
> .Linfo_string0:
> 	.asciz	"clang version 13.0.0" # string offset=0
> ```
> 
> It is actually a string that is unimprortant for the test. I.e. it is a piece of input asm that
> can be just removed like you did for other unused sections already and nothing should change.
> 
> It is a bit strange to force user to read an assembly to find out how it was generated instead
> of reading the comment that has an intention to descibe it.
It looks like me like this hasn't been addressed?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96883/new/

https://reviews.llvm.org/D96883