[PATCH] D96883: Add support for JSON output style to llvm-symbolizer

Mon Mar 15 01:58:54 PDT 2021

jhenderson requested changes to this revision.
jhenderson added a comment.
This revision now requires changes to proceed.

I'd like to have @MaskRay and @dblaikie comment on this before this gets committed, since they both were involved with the YAML version of this.

Sorry for the delay in getting to this. Having given it a bit of thought, I don't think this approach is a particularly good design, personally. Is the `DIPrinter` only used for the symbolization code? If it is, I think we need a wider refactor of it before adding this new output format, to avoid the need for repeated `if(OutputStyle == JSON)` kind of things. My opinion is to have a separate class per output format (JSON, GNU, LLVM), which share a common interface. These could either all be derivations of DIPrinter itself or a new object that DIPrinter owns (possibly it is templated on it, or possibly there is a separate output style base class they derive from), I'm not sure it matters too much. The ELF dumper for llvm-readobj is a good example of what I mean (it has separate GNU and LLVM versions). It's a bigger piece of work, but I think it makes sense to do, as it'll make the code easier to change in the future, should we either a) add another output format to it (imagine if you wanted to add YAML output style as well as JSON), or b) extend with more printing options.

Some general thoughts about the output style:

- How do you handle multiple different input files? I think it would make a lot of sense to include the input file and address in the JSON. I think we could ignore the corresponding llvm-symbolizer options that enable/disable printing these for other formats.
- How is a parser supposed to distinguish between an error message output for an address and a valid output? It might make sense to include a "success" field or similar, I think.
- Strictly speaking, if multiple input addresses are used, I don't think the following is valid JSON:

  {
    // output for address1
  }
  {
    // output for address2
  }
  ...

I think it should be in a list:

  [
    {
      // output for address1
    },
    {
      // output for address2
    },
    ...
  ]

You could omit the list for a single input address, but I think for consistency purposes, I'd not do that.

- That being said, for interactive mode where addresses are fed on stdin via a user (and not via a file redirected in), we probably want to omit the list aspect, since you won't know when to close the list.

================
Comment at: llvm/docs/CommandGuide/llvm-symbolizer.rst:275

+    $ llvm-symbolizer --output-style=JSON --obj=inlined.elf 0x4004be 0x400486
+    {"Frames":[
----------------
grimar wrote:
> Note, my previous comment says:
> "I am also not sure it is useful to have a version **without** --no-inlines here: GNU and LLVM samples doesn't have it."
> 
> I.e. the JSON sample is slightly inconsistent now with GNU/LLVM. At the same time, as I've mentioned,
> it is not a documentation for `--no-inlines`, so it is perhaps fine with me. May be other people have a more strong opinion
> (I am not sure, why `--no-inlines` was used for GNU/LLVM first of all).
@grimar, see the output for the first of these examples (line 260), which is the same, but without `--no-inlines`. The aim of these examples is to highlight the differences in the GNU and LLVM output, one of which is to do with the `--no-inlines` option. Hence there is also an example highlighting the `discriminator` difference. I don't think we need to highlight this distinction specifically for JSON output, because the output format is completely different anyway.

@aorlov, please move this example to below the second GNU style example immediately below. I'd also drop the `--no-inlines` option too (and update the text to match). See my above comment for why.

================
Comment at: llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h:56
   DIPrinter &operator<<(const DIGlobal &Global);
-  DIPrinter &operator<<(const DILocal &Local);
+  DIPrinter &operator<<(const std::vector<DILocal> &Locals);
+
----------------
Why not `ArrayRef` for `Locals`?

================
Comment at: llvm/include/llvm/DebugInfo/Symbolize/DIPrinter.h:58
+
+  void printErrorJSON(const std::string &Msg, const std::error_code &EC);
 };
----------------
This doesn't feel like the right design choice here. I think a better thing to do would be to have a function like `DIPrinter &operator<<(Error E);` which can be used for reporting errors for all output formats.

================
Comment at: llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp:36
+
+void toJSON(json::OStream &J, const DILineInfo &Info) {
+  J.objectBegin();
----------------
Should this function be `static`?

================
Comment at: llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp:48
+    J.attribute("FileName", Info.FileName);
+  // Print only valid line and column to reduce a noise in the output
+  if (Info.Line)
----------------
grimar wrote:
> Missing a full stop after `output`.
I don't think we should be omitting `Line` and `Column` if they are 0. Probably the same goes for `Discriminator`, and maybe even the other fields. The problem is that by omitting them, the parser will have to handle the chance of the elements being missing.

================
Comment at: llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp:250-251

+void DIPrinter::printErrorJSON(const std::string &Msg,
+                               const std::error_code &EC) {
+  json::OStream J(OS);
----------------
As noted above, I think this would be better if it took an `Error`. The implementation would then "handle" the error by extracting the message and storing it in the `Message` field of the JSON output. The error code is probably not useful for anything, so I'd omit it entirely.

================
Comment at: llvm/lib/DebugInfo/Symbolize/DIPrinter.cpp:165
+    J.attribute("Start", int64_t(Global.Start));
+    J.attribute("Size", int64_t(Global.Size));
+    J.objectEnd();
----------------
grimar wrote:
> aorlov wrote:
> > grimar wrote:
> > > Why do you convert `uint64_t` to `int64_t` here and in many places below?
> > There is no json::OStream::attribute() version for uint32_t and uint64_t. 
> > I can use an implicit conversion for uint32_t, but I need to convert uint64_t to int64_t explicitly.
> > 
> But this is simply wrong and might print a wrong result isn't?
> 
> E.g. imagine you do:
> 
> ```
>     uint64_t XXX = 0xffffffffeeeeeeee;
>     J.attribute("Start", int64_t(XXX));
> ```
> 
> The output is:
> 
> ```
> {"Name":"foo","Start":-286331154
> ```
> 
> I think `Start` shouldn't have a negative value.
> Seems you need to update the `json::OStream` implementation to fix it.
> You also need to add a test for such situation(s).
> 
> Also. should `Start`/`Size` be printed as hex? I think it is very common to use hex for addresses/sizes.
> There is no json::OStream::attribute() version for uint32_t and uint64_t. 

You could always add them!

================
Comment at: llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp:165
+      Printer << StringError(std::make_error_code(std::errc::invalid_argument),
+                             "(address)");
+    } else {
----------------
grimar wrote:
> aorlov wrote:
> > grimar wrote:
> > > Can you use `createStringError` from `Error.h`?
> > No, because it is hard to use Error to get the error code and message just for printout. 
> > I'm using ErrorInfoBase as a holder instead.
> If we introduce the `printErrorJSON` that I suggested in a different comment, then here it will be possible to write:
> 
> ```
> Printer.printErrorJSON("unable to parse arguments: " + InputString,
>   std::make_error_code(std::errc::invalid_argument));
> ```
> 
> What is better, because allows to provide more context and more customizable error messages for JSON output.
> What do you think?
> No, because it is hard to use Error to get the error code and message just for printout. 

It really isn't that hard to do this. See either `toString` in Error.h (converts an `Error` to a `std::string`) or `handleAllErrors` (which allows you to handle `Error` in various ways, such as by getting its message.

But I see you already know how to do this in `printResOrErr`. Why not just pass the Error directly to the function??

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96883/new/

https://reviews.llvm.org/D96883