[PATCH] D83296: [clang-format] Add a MacroExpander.

Wed Jul 29 08:37:29 PDT 2020

klimek added inline comments.

================
Comment at: clang/lib/Format/MacroExpander.cpp:35
+  SmallVector<FormatToken *, 8> Params;
+  SmallVector<FormatToken *, 8> Tokens;
+};
----------------
sammccall wrote:
> Tokens -> Expansion? (semantics rather than type)
Changed to "Body".

================
Comment at: clang/lib/Format/MacroExpander.cpp:81
+    do {
+      Def.Tokens.push_back(Current);
+      nextToken();
----------------
sammccall wrote:
> this assumes the expansion is nonempty, which the grammar doesn't. while{} instead?
I have no clue how this ever worked tbh O.O
Has been reworked as part of the move to use = to separate the macro signature from the body.

================
Comment at: clang/lib/Format/MacroExpander.cpp:113
+void MacroExpander::parseDefinitions(
+    const std::vector<std::string> &MacroExpander) {
+  for (const std::string &Macro : MacroExpander) {
----------------
sammccall wrote:
> weird param name!
Copy-paste gone wrong I assume.

================
Comment at: clang/lib/Format/MacroExpander.cpp:116
+    Buffers.push_back(
+        llvm::MemoryBuffer::getMemBufferCopy(Macro, "<scratch space>"));
+    clang::FileID FID =
----------------
sammccall wrote:
> This is a slightly spooky buffer name - it's the magic name the PP uses for pasted tokens.
> A closer fit for config is maybe "<command line>" (like macro definitions passed with `-D`).
> Is it necessary to use one of clang's magic buffer names at all? If so, comment! Else maybe "<clang-format style>" or something?
We need source locations, and apparently only:
<built-in>, <inline asm> and <scratch space>
are allowed to have source locations.

================
Comment at: clang/lib/Format/MacroExpander.cpp:133
+                                                          ArgsList Args) {
+  assert(defined(ID->TokenText));
+  SmallVector<FormatToken *, 8> Result;
----------------
sammccall wrote:
> is the caller responsible for checking the #args matches #params?
> If so, document and assert here?
> 
> Looking at the implementation, it seems you don't expand if there are too few args, and expand if there are too many args (ignoring the last ones). Maybe it doesn't matter, but it'd be nice to be more consistent here.
> 
> (Probably worth calling out somewhere explicitly that variadic macros are not supported)
Added docs in the class comment for MacroExpander.
(so far I always expand, too few -> empty, too many -> ignore)

================
Comment at: clang/lib/Format/MacroExpander.cpp:194
+    assert(New->MacroCtx.Role == MR_None);
+    // Tokens that are not part of the user code do not need to be formatted.
+    New->MacroCtx.Role = MR_Hidden;
----------------
sammccall wrote:
> (I don't know exactly how this is used, but consider whether you mean "do not need to", "should not" or "cannot" here)
Replaced with "are not".

================
Comment at: clang/lib/Format/MacroExpander.h:10
+///
+/// \file
+/// This file contains the declaration of MacroExpander, which handles macro
----------------
sammccall wrote:
> I think this comment is too short (and doesn't really say much that you can't get from the filename). IME many people's mental model of macros is based on how they're used rather than how they formally work, so I think it's worth spelling out even the obvious implementation choices.
> 
> I'd like to see:
>  - rough description of the scope/goal of the feature (clang-format doesn't see macro definitions, so macro uses tend to be pseudo-parsed as function calls. When this isn't appropriate [example], a macro definition can be provided as part of the style. When such a macro is used in the code, clang-format will expand it as the preprocessor would before determining the role of tokens. [example])
>  - explicitly call out here that only a single level of expansion is supported, which is a divergence from the real preprocessor. (This influences both the implementation and the mental model)
>  - what the MacroExpander does and how it relates to MacroContext. I think this should give the input and output token streams names, as it's often important to clearly distinguish the two. (TokenBuffer uses "Spelled tokens" and "Expanded tokens" for this, which is not the same as how these terms are used in SourceManager, but related and hasn't been confusing in practice).
>  - a rough sketch of how the mismatch between what was annotated and what must be formatted is resolved e.g. -- just guessing here -- the spelled stream is formatted but for macro args, the annotations from the expanded stream are used.
> 
> (I'm assuming this is the best file to talk about the implementation of this feature in general - i'm really hoping that there is such a file. If there are a bunch of related utilities you might even consider renaming the header as an umbrella to make them easier to document. This is a question of taste...)
Moved to Macros.h and added file comment that tries to address all of these.

================
Comment at: clang/lib/Format/MacroExpander.h:66
+  /// each element in \p Args is a positional argument to the macro call.
+  llvm::SmallVector<FormatToken *, 8> expand(FormatToken *ID, ArgsList Args);
+
----------------
sammccall wrote:
> (I can't see the real callsite but...)
> If we care about performance here, is 8 a little small? should we have a `vector &Out` instead?
I don't think we particularly care about performance here, but the llvm docs say I should use SmallVector. Happy to bump down to 0 if you feel that the magic 8 is a problem here as a gut-feeling premature optimization.

https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83296/new/

https://reviews.llvm.org/D83296