[PATCH] D124686: [Symbolize] Parser for log symbolizer markup.

Wed Jun 15 07:45:45 PDT 2022

peter.smith added a comment.

My apologies for the delay in responding. I've only taken a look at this patch alone. Not looked forward to D124798 <https://reviews.llvm.org/D124798> and D126980 <https://reviews.llvm.org/D126980>. In general looks good to me. I've a few small suggestions, mostly in naming and comments.

================
Comment at: llvm/include/llvm/DebugInfo/Symbolize/Markup.h:11
+/// This file declares the log symbolizer markup data model and parser.
+///
+//===----------------------------------------------------------------------===//
----------------
Although not for this patch, I note that in D126980 the docs for the SymbolizerFormat are added, it would be good to put a link as that specification is needed to understand the code. Perhaps worth adding a TODO for this patch.

================
Comment at: llvm/include/llvm/DebugInfo/Symbolize/Markup.h:33
+/// any surrounding text will be reported as preceding and following elements.
+struct MarkupElement {
+  /// The full text of this element in the input.
----------------
I got a bit confused with the name `MarkupElement`. Looking at the spec in https://fuchsia.dev/fuchsia-src/reference/kernel/symbolizer_markup and the `An element of symbolizer markup` I was expecting `MarkupElement` to be just a MarkupElement of the form `{{{tag:fields}}}` but it looks like this is a general case of (IIUC):
* TextElement: a string containing no Markup Elements or SGR codes.
* SGRElement: a special case of Text Element that contains a single SGR control code.
* MarkupElement: Also has Tag and Fields.

Maybe worth finding a different word to element to distinguish? Something like `MarkupPiece` or `MarkupComponent`? If we do change it will be worth updating the comment. I see that you've used just element in other comments. Perhaps just Element.

================
Comment at: llvm/include/llvm/DebugInfo/Symbolize/Markup.h:60
+  /// After parseLine() is called, it must not be called again until
+  /// nextElement() returns None. The markup elements returned by nextElement()
+  /// may reference the input string, so it must be retained by the caller until
----------------
An alternative behaviour could be that calling `parseLine()` without iterating through `nextLine()` would be to reset NextIdx and clear the Buffer. You'd then not leave the Buffer and NextIdx in an inconsistent state.

Not a strong opinion as I expect the number of users of the parser to be low.

================
Comment at: llvm/lib/DebugInfo/Symbolize/Markup.cpp:37
+void MarkupParser::parseLine(StringRef Line) {
+  assert(Buffer.empty() &&
+         "Cannot call parseLine before all elements have been extracted.");
----------------
Similar to the comment I made in the header. If the Buffer isn't empty I think you could do the equivalent of:
```
NextIdx.reset();
Buffer.clear();
```

================
Comment at: llvm/lib/DebugInfo/Symbolize/Markup.cpp:55
+// None if the line contains no valid elements.
+Optional<MarkupElement> MarkupParser::parseElement(StringRef Line) {
+  while (true) {
----------------
IIUC this is looking specifically for a MarkupElement of the form `{{{tag}}}` or `{{{tag:fields}}}` perhaps worth renaming to `parseMarkupElement()`. Could be worth putting in the comment about the expected syntax of the Markup.

================
Comment at: llvm/lib/DebugInfo/Symbolize/Markup.cpp:75
+      continue;
+    if (!llvm::all_of(Element.Tag, [](char C) { return 'a' <= C && C <= 'z'; }))
+      continue;
----------------
I expect a common user error could be an upper case tag. Is it worth a warning. This may not be the right place for it though.

================
Comment at: llvm/lib/DebugInfo/Symbolize/Markup.cpp:95
+// Parses a region of text known to be outside any markup elements. Such text
+// may still contain SGR control codes, so these are
+void MarkupParser::parseTextOutsideMarkup(StringRef Text) {
----------------
Sentence in comment looks unfinished. Perhaps `so these are added as individual elements.`

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124686/new/

https://reviews.llvm.org/D124686