[PATCH] D146854: [Symbolizer] Add flag to dump process context JSON from markup

Fri Apr 28 22:24:22 PDT 2023

mcgrathr added inline comments.

================
Comment at: llvm/docs/CommandGuide/llvm-symbolizer.rst:248
+  Only valid with :option:`--filter-markup`. Emits a JSON representation of the
+  encountered process contexts instead of symbolizing the markup. A process
+  context is a map of the process's runtime memory layout obtained from the
----------------
I think these should be independent options: that is, you can produce filter output, or JSON output, or both.
To me, it seems reasonable to have the JSON switch take a separate explicit output file and have a separate flag that says whether or not to filtered markup output, e.g. --parse-markup that is implied by --filter-markup but can be specified separately to consume that input (and which is superfluous with --filter-markup and useless without either --filter-markup or --dump-context).

As I mentioned pedantically in another change, I think "process" is actually overly and unnecessarily specific here. Perhaps in the context (pun intention unspecified) of llvm-symbolizer, "context" alone is sufficient, but "address context" also seems pretty clear, specific, and precisely applicable to what matters about it to symbolization.

In the long run, I think it makes sense for this not to even imply --parse-markup because we can and should eventually have other sources for the same kind of information, such as switches like elfutils tools such as eu-unstrip have, for /proc/PID/maps (Linux format) text files, live process IDs (that on systems like Linux means read /proc/PID/maps), ELF core files that have NT_GNU_BUILD_ID notes themselves and/or PT_LOAD memory images with enough ELF data to reconstruct runtime layouts and build IDs post mortem, Linux kernel module layouts, and more not covered in elfutils, like minidump format or whatever else. Of course, there could be separate tools that produce this same JSON schema for each of those things, but for at least some of them building direct command-line switch support that can be used in llvm-symbolizer and other tools will make sense.

I think --parse-markup also has a nice parity with the coming switch to parse this JSON format as the input source.

Furthermore, --parse-markup alone is useful for traditional llvm-symbolizer uses without --filter-markup: I can use --parse-markup=logfile when feeding the symbolizer addresses manually on the command line (or via the stdin protocol if markup input and symbolization-request input formats can come from different explicit inputs since only one or the other can be stdin).

================
Comment at: llvm/docs/CommandGuide/llvm-symbolizer.rst:252
+
+  Markup may have multiple contexts separated by `reset` elements. The contexts
+  are returned as a JSON array of objects, even if only one context is present.
----------------
When we implement support for the trigger elements (dumpfile), that will be a separate reason to distinguish different context snapshot states. (In between reset elements, trigger elements need to be associated with the context snapshot formed from only the module+mmap elements that were emitted before the trigger element, not later additions/changes.)

So I think it makes sense to preemptively describe it here, and specify the JSON schema, as being an array of context snapshots. In future each one might contain descriptors for additional non-context elements that are to be interpreted in the context of the snapshot they appear in. In fact, it could be an option in the JSON dumper feature even now to include the bt et al that were mentioned as if they were trigger elements. That's really tantamount to implementing trigger elements and then adding another switch that says "make all the presentation elements act as trigger elements" too, so it might as well come after trigger elements. But in how we prepare the schema for future optional expansion, we should think about those cases now even if we take a while to implement them all.

================
Comment at: llvm/include/llvm/DebugInfo/Symbolize/MarkupFilter.h:35
 public:
+  /// @param DumpProcessContext If true, instead of symbolizing the log, emit a
+  ///   JSON representation of the encountered process contexts.
----------------
It seems like it might be a nicer structure to keep this just responsible for the parsing and not the dumping per se.
That is, instead of just a built-in flag here, this could be an optional callback to make with ProcessContext whenever a new one is ready, paired with a flag about whether to produce the transformed version of the input text (that could just be a nullable OS pointer here).

Then a separate ProcessContext->JSON layer can be trivially plumbed to build a callback to pass here.

When you add the JSON input option, then plumbing that to do JSON->ProcessContext->JSON and markup->ProcessContext->JSON and various such combinations as unit tests becomes very attractive.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146854/new/

https://reviews.llvm.org/D146854