[PATCH] D124686: [Symbolize] Parser for log symbolizer markup.

Thu Jun 16 13:34:25 PDT 2022

mysterymath added inline comments.

================
Comment at: llvm/include/llvm/DebugInfo/Symbolize/Markup.h:33
+/// any surrounding text will be reported as preceding and following elements.
+struct MarkupElement {
+  /// The full text of this element in the input.
----------------
peter.smith wrote:
> I got a bit confused with the name `MarkupElement`. Looking at the spec in https://fuchsia.dev/fuchsia-src/reference/kernel/symbolizer_markup and the `An element of symbolizer markup` I was expecting `MarkupElement` to be just a MarkupElement of the form `{{{tag:fields}}}` but it looks like this is a general case of (IIUC):
> * TextElement: a string containing no Markup Elements or SGR codes.
> * SGRElement: a special case of Text Element that contains a single SGR control code.
> * MarkupElement: Also has Tag and Fields.
> 
> Maybe worth finding a different word to element to distinguish? Something like `MarkupPiece` or `MarkupComponent`? If we do change it will be worth updating the comment. I see that you've used just element in other comments. Perhaps just Element.
> 
I was also split about this naming convention; markup languages like to describe themselves as composed of markup elements, but they're not, they're also composed of all the stuff between the elements.
I'd considered just "Element", but that would give this the name "llvm::symbolize::Element", which is a bit too broadly named for what this does. 

I did a very quick survey of what other SAX-ish HTML parsers do, it looks like they've settled on calling the abstract type a Node, while an actual element is an Element or Tag, which subclasses Node.
The usage of Node here would be akin to linked list nodes: an ordered collection of units forming a stream. 

================
Comment at: llvm/include/llvm/DebugInfo/Symbolize/Markup.h:60
+  /// After parseLine() is called, it must not be called again until
+  /// nextElement() returns None. The markup elements returned by nextElement()
+  /// may reference the input string, so it must be retained by the caller until
----------------
peter.smith wrote:
> An alternative behaviour could be that calling `parseLine()` without iterating through `nextLine()` would be to reset NextIdx and clear the Buffer. You'd then not leave the Buffer and NextIdx in an inconsistent state.
> 
> Not a strong opinion as I expect the number of users of the parser to be low.
Ah, that's cleaner; then you can stop handling nodes as soon as you decide you don't care about the rest. I can actually make use of that in the filter later.

================
Comment at: llvm/lib/DebugInfo/Symbolize/Markup.cpp:55
+// None if the line contains no valid elements.
+Optional<MarkupElement> MarkupParser::parseElement(StringRef Line) {
+  while (true) {
----------------
peter.smith wrote:
> IIUC this is looking specifically for a MarkupElement of the form `{{{tag}}}` or `{{{tag:fields}}}` perhaps worth renaming to `parseMarkupElement()`. Could be worth putting in the comment about the expected syntax of the Markup.
Renaming MarkupElement to MarkupNode should make this clearer; with that, Element exclusively means a markup element.

================
Comment at: llvm/lib/DebugInfo/Symbolize/Markup.cpp:75
+      continue;
+    if (!llvm::all_of(Element.Tag, [](char C) { return 'a' <= C && C <= 'z'; }))
+      continue;
----------------
peter.smith wrote:
> I expect a common user error could be an upper case tag. Is it worth a warning. This may not be the right place for it though.
I thought a bit more about this, and it seems like this is a distinction that doesn't belong in the determination of whether or not the given text is an element, but whether the element is well-formed.
This is more akin to the handling of things like integer parsing errors, which occur later, where we can give good error messages.
Accordingly, I've pulled this out of this change, to be added into the third of the series.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124686/new/

https://reviews.llvm.org/D124686