[PATCH] D144170: [clang-format] Add simple macro replacements in formatting.

Thu Feb 23 02:01:18 PST 2023

klimek added inline comments.

================
Comment at: clang/lib/Format/ContinuationIndenter.cpp:744
+  // Align following lines within parenthesis / brackets if configured.
+  // For a line of macro parents, the commas that follow the opening parenthesis
+  // in the line come after the opening parenthesis' children - we want to align
----------------
sammccall wrote:
> This is a bit hard to follow without context, I had to look up MacroCallReconstructor::takeResult() to recall the structure it's referring to, and there's no obvious breadcrumb to point you there. And if you don't follow the details, it's hard to understand what the relevance is.
> 
> Maybe something like: this doesn't apply to macro expansion lines, which are `MACRO( , , )` with args as children of `(` and `,`. It does not make sense to align the commas with the opening paren.
> 
> (I think describing here how the alignment ends up working in that case might just add confusion: it's not handled by this code, right?)
Excellent wording, applied. And yes, alignment in the macro case is not handled here.

================
Comment at: clang/lib/Format/TokenAnnotator.h:68
     FormatToken *Current = First;
+    addChildren(Line.Tokens.front(), Current);
     for (const UnwrappedLineNode &Node : llvm::drop_begin(Line.Tokens)) {
----------------
sammccall wrote:
> this seems like a separate bugfix: previously if the first node in Line.Tokens had children, we wouldn't be adding them.
> 
> Is this true? Was there some reason the first node could never have children before?
I think the main problem is that it's impossible to write C++ code that has a child in the first node based on how this worked. That is, previously we only added children for blocks within one line, that is, within a function or lambda. Those need intro tokens. So I really couldn't make this trigger, or I would have fixed it in a separate patch.

================
Comment at: clang/lib/Format/TokenAnnotator.h:76
+      addChildren(Node, Current);
+      // FIXME: if we add children, previous will point to the token before
+      // the children; changing this requires significant changes across
----------------
sammccall wrote:
> does "add children" here refer to the procedure in addChildren(), or to the token insertion idea introduced in the previous patch?
> 
> (i.e. is this just describing that previous/next skip over trees of children rather than including them in the chain, or is there something new going on?)
It's about adding children in general.
When fixing this I realized this is in principle wrong, but then noticed that we have code that depends on this behavior.

================
Comment at: clang/lib/Format/UnwrappedLineFormatter.cpp:926
+    } else {
+      Tok->Finalized = true;
+    }
----------------
sammccall wrote:
> why do we no longer finalize the child tree?
We do - markFinalized() is called for the child tree again. Previously we marked multiple times, which doesn't work with the new algorithm, as it would move ExpandedArg -> UnexpandedArg -> Finalized in one go.

================
Comment at: clang/lib/Format/UnwrappedLineParser.cpp:225
+      }
+      Callback.finishRun();
+    }
----------------
sammccall wrote:
> I'm sure you got this right, but it's hard for me to evaluate this code because I don't know what `UnwrappedLineConsumer::finishRun()` is for - it's not documented and the implementation is mysterious.
> 
> You might consider adding a comment to that interface/method, it's not really related to this change though.
Done. Not sure I've put too much info about how it's going to be used into the interface, where it doesn't really belong.

================
Comment at: clang/lib/Format/UnwrappedLineParser.cpp:4564
+      // We parse the macro call into a new line.
+      auto Args = parseMacroCall();
+      InExpansion = OldInExpansion;
----------------
sammccall wrote:
> it looks like you go into parsing the macro call including parens and arguments regardless of whether the macro is object-like or function-like.
> 
> So given
> ```
> #define DEBUG puts
> DEBUG("foo");
> ```
> you're going to expand to `puts;` instead of `puts("foo");`
Excellent find. Completely invalidated the way I was doing formatting of expanded lines, but fixed the problem with the line indices needing resetting \o/

Now we're formatting all lines in the first round with the macro calls replaced with expanded lines, and then again format everything including the macro calls. That is more in line with clang-format's assumptions anyway.

================
Comment at: clang/lib/Format/UnwrappedLineParser.cpp:4660
+    }
+  } while (!eof());
+  return {};
----------------
sammccall wrote:
> hitting eof is an error condition here, so it seems more natural to handle it as a case in the switch as a kind of early-return, and maybe it's worth adding an LLVM_DEBUG() for that case. If anything it'd be nice to have r_paren break the loop (but too annoying with the switch inside)
Usually we don't see hitting eof as an error, as that can always happen while we're in the process of writing a new file, or formatting "up to cursor". Thus, I think this fits the pattern of "stop where we are when we hit eof and just go with what we have" that's everywhere in the UnwrappedLineParser.

================
Comment at: clang/lib/Format/UnwrappedLineParser.h:96
 class FormatTokenSource;
+class MacroCallReconstructor;

----------------
sammccall wrote:
> you forward declare this here, move the destructor below out-of-line etc
> but also include Macros.h and use MacroExpander by value below
> Do we want to avoid the dependency or not?
That was a leftover from a previous phase, thanks for catching. Removed forward decl and removed out of line destructor.

================
Comment at: clang/lib/Format/UnwrappedLineParser.h:285
+  // until it is finished.
+  std::unique_ptr<MacroCallReconstructor> Reconstruct;
+
----------------
sammccall wrote:
> if this doesn't need to be incomplete, optional would be clearer
Excellent idea, done.

================
Comment at: clang/unittests/Format/FormatTest.cpp:22584
+  Style.Macros.push_back("CALL(x)=f([] { x })");
+  Style.Macros.push_back("ASSIGN_OR_RETURN(a, b, c)=a = (b) || (c)");
+
----------------
sammccall wrote:
> If this is assign_or_return, the treatment of "c" is fairly strange. Also you are mostly calling this with only two args. Maybe drop C and use a different macro for the 3-arg case?
ASSIGN_OR_RETURN allows 3 args, though; this is basically the thing I'd propose to configure for ASSIGN_OR_RETURN in general; is there anything specific you don't like about this?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144170/new/

https://reviews.llvm.org/D144170