[PATCH] D96883: Add support for JSON output style to llvm-symbolizer

Thu Apr 22 15:13:24 PDT 2021

aorlov marked 11 inline comments as done.
aorlov added inline comments.

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:11
+
+# NO-FILE:[{"Address":"0x0","Error":{"Message":"[[MSG]]"},"ModuleName":"{{.*}}/no-file.exe"}]
+
----------------
jhenderson wrote:
> Rather than use `{{.*}}/no-file.exe` here and similar situations for paths below, I'd recommend using FileCheck's `-D` option to ensure the correct path is printed, i.e:
> ```
> # RUN:   FileCheck %s ... -DFILE=%p/no-file.exe
> 
> # NO-FILE: ... "ModuleName":"[[FILE]]" ...
> ```
Unfortunately -D would not work in this case, as JSON has own rules for escaping special symbols in paths.

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:15
+# RUN: llvm-symbolizer --output-style=JSON --no-inlines -e %p/Inputs/addr.exe 1000000000 | \
+# RUN:   FileCheck %s --check-prefix=NOT-FOUND-1 --strict-whitespace --match-full-lines --implicit-check-not={{.}}
+
----------------
jhenderson wrote:
> Maybe rather than `NOT-FOUND-1` and `NOT-FOUND-2`, it would be more self-descriptive using `NOT-FOUND-NOINLINES` and `NOT-FOUND-INLINES` or something like that?
It is not applicable anymore.

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:17
+
+# NOT-FOUND-1:[{"Address":"0x3b9aca00","ModuleName":"{{.*}}/Inputs/addr.exe","Symbol":[{"Column":0,"Discriminator":0,"FileName":"","FunctionName":"","Line":0,"Source":"","StartFileName":"","StartLine":0}]}]
+
----------------
jhenderson wrote:
> Just thinking about usability - what is the canonical way for users to know that their address hasn't been found in JSON output? It might be worth documenting this somewhere.
Yes. This is the Symbolize library design issue. As far as I can tell there is no reliable way to get a distinct result for the symbol not found case. 
For now I just keep it transparent by serializing whatever the library returns, as addressing the library issue is out of the scope of this patch.

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:32
+## Resolve valid address, no-inlines.
+# NO-INLINES-NEXT:{"Address":"0x40054d","ModuleName":"{{.*}}/Inputs/addr.exe","Symbol":[{"Column":3,"Discriminator":0,"FileName":"/tmp{{/|\\\\}}x.c","FunctionName":"main","Line":3,"Source":"","StartFileName":"/tmp{{/|\\\\}}x.c","StartLine":2}]}
+
----------------
jhenderson wrote:
> It would be nice to have one or more test cases with a non-zero discriminator value. Also, where the "Source" parameter is non-empty.
> 
> You can simplify `{{/|\\\\}}` to this: `{{[\\/]}}`. I think it's slightly more readable due to avoiding the quadruple backslash. Same goes elsewhere below.
I have added the test for a non-zero discriminator. Note the address is hardcoded. It is more correct to use /Inputs/discrim.inp via stdin, but I just copied the address from discriminator.test.
It seems currently we have no binaries in llvm/test/tools/llvm-symbolizer/Inputs with a valid Source info.
> You can simplify
No, it does not work because JSON has own rules for escaping special symbols in paths.

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-code.test:49
+
+## Also check the last 3 test cases with llvm-adr2line. The expected result is the same but missing the FunctionName.
+# RUN: llvm-addr2line --output-style=JSON -i -e %p/Inputs/addr.exe < %p/Inputs/addr.inp | \
----------------
jhenderson wrote:
> I guess the obvious question is: why do we have a difference in behaviour surrounding the `FunctionName` attribute? The motivation for different symbolizer/addr2line output is because llvm-addr2line needs to be compatible with GNU addr2line, but that principle doesn't apply for JSON output (which AFAIK is not a GNU addr2line supported feature).
This behavior is a part of llvm-symbolizer and does not depend on the output styles and does not belong to the printer (look at decideHowToPrintFunctions()). 
Not sure if I understand what you are saying. Changing the behavior is out of the scope of this patch. 
I agree that it does not make much sense in testing that, but one of the reviewers requested these tests.

================
Comment at: llvm/test/tools/llvm-symbolizer/output-style-json-frame.test:41
+##
+## clang++ --target=i386-linux-gnu frame.cpp -g -std=c++11 -S -o frame.s 
+ 
----------------
jhenderson wrote:
> grimar wrote:
> > aorlov wrote:
> > > grimar wrote:
> > > > Adding an exact clang version used might be helpfull.
> > > You can see the exact clang version 13.0.0 at the end of the generated code.
> > You mean the `info_string0`?
> > 
> > ```
> > .Linfo_string0:
> > 	.asciz	"clang version 13.0.0" # string offset=0
> > ```
> > 
> > It is actually a string that is unimprortant for the test. I.e. it is a piece of input asm that
> > can be just removed like you did for other unused sections already and nothing should change.
> > 
> > It is a bit strange to force user to read an assembly to find out how it was generated instead
> > of reading the comment that has an intention to descibe it.
> It looks like me like this hasn't been addressed?
Intention was to break a dependency on DWARF generation in clang. But I have changed it build from the C source to keep it simple for reading.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96883/new/

https://reviews.llvm.org/D96883