[PATCH] D83296: [clang-format] Add a MacroExpander.

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jul 22 01:55:37 PDT 2020


sammccall added a comment.

Sorry for the long delay, I've made up for it with extra comments :-\

This looks really well-thought-out and I'm rationalizing my pickiness as:

- this is conceptually complicated
- I expect this code to live a long time and be read and "modified around" by lots of people

Some of the comments/requests for doc might strictly be more in scope for in later patches (documenting functionality that doesn't exist yet). Those docs would help *me* now but happy if you'd rather briefly explain and add them later.



================
Comment at: clang/lib/Format/FormatToken.h:166
+  ///               \- P0  \- P1
+  /// ExpandedFrom stacks for each generated token will be:
+  /// ( -> P0
----------------
this is a great example, it might be a little more clear with more distinct chars and some vertical alignment:
```
Given X(A)=[A], Y(A)=<A>,
                  X({ Y(0)  } ) expands as
                  [ { < 0 > } ]
StartOfExpansion  1   1
ExpandedFrom[0]   X X X X X X X
ExpandedFrom[1]       Y Y Y
```

You could extend this to cover all the fields and hoist it to be a comment on MacroContext if you like, I think the concreteness helps.


================
Comment at: clang/lib/Format/FormatToken.h:176
+
+  /// Whether this token is the first token in a macro expansion.
+  bool StartOfExpansion = false;
----------------
why the asymmetry between start/end?

given `ID(x)=X`, `ID(ID(0))` yields `0` which starts and ends two expansions, right?
Consider making them both integer, even if you don't need it at this point.

(also 64 bits, really?)


================
Comment at: clang/lib/Format/FormatToken.h:183
+
+  /// When macro expansion introduces parents, those are marked as
+  /// \c MacroParent, so formatting knows their children need to be formatted.
----------------
this isn't used in this patch - can we leave it out until used?


================
Comment at: clang/lib/Format/FormatToken.h:389
+  // in a configured macro expansion.
+  MacroContext MacroCtx;
+
----------------
if you're not extremely concerned about memory layout, I'd consider making this an `Optional<MacroContext>` with nullopt replacing the current MR_None. 

This reduces the number of implicit invariants (AIUI MR_None can't be combined with any other fields being set) and means the name MacroContext more closely fits the thing it's modeling.


================
Comment at: clang/lib/Format/FormatToken.h:632
 
+  void copyInto(FormatToken &Tok) { Tok = *this; }
+
----------------
const.

I guess it doesn't matter, but copyFrom would seem a little less weird to me in an OOP/encapsulation sense.
I do like this explicit form rather than clone() + move constructor though, as pointer identity is pretty important for tokens.


================
Comment at: clang/lib/Format/MacroExpander.cpp:35
+  SmallVector<FormatToken *, 8> Params;
+  SmallVector<FormatToken *, 8> Tokens;
+};
----------------
Tokens -> Expansion? (semantics rather than type)


================
Comment at: clang/lib/Format/MacroExpander.cpp:38
+
+// A simple macro parser.
+class MacroExpander::DefinitionParser {
----------------
Dmitri gave a tech talk on dropping comments like these :-)


================
Comment at: clang/lib/Format/MacroExpander.cpp:42
+  DefinitionParser(ArrayRef<FormatToken *> Tokens) : Tokens(Tokens) {
+    assert(!Tokens.empty());
+    Current = Tokens[0];
----------------
who's responsible for establishing this?

AIUI this will fail if e.g. `Macros` contains a string that contains only whitespace, which is a slightly weird precondition.


================
Comment at: clang/lib/Format/MacroExpander.cpp:63
+  bool parseParams() {
+    if (Current->isNot(tok::l_paren))
+      return false;
----------------
assert instead? Caller checks this


================
Comment at: clang/lib/Format/MacroExpander.cpp:81
+    do {
+      Def.Tokens.push_back(Current);
+      nextToken();
----------------
this assumes the expansion is nonempty, which the grammar doesn't. while{} instead?


================
Comment at: clang/lib/Format/MacroExpander.cpp:113
+void MacroExpander::parseDefinitions(
+    const std::vector<std::string> &MacroExpander) {
+  for (const std::string &Macro : MacroExpander) {
----------------
weird param name!


================
Comment at: clang/lib/Format/MacroExpander.cpp:116
+    Buffers.push_back(
+        llvm::MemoryBuffer::getMemBufferCopy(Macro, "<scratch space>"));
+    clang::FileID FID =
----------------
This is a slightly spooky buffer name - it's the magic name the PP uses for pasted tokens.
A closer fit for config is maybe "<command line>" (like macro definitions passed with `-D`).
Is it necessary to use one of clang's magic buffer names at all? If so, comment! Else maybe "<clang-format style>" or something?


================
Comment at: clang/lib/Format/MacroExpander.cpp:133
+                                                          ArgsList Args) {
+  assert(defined(ID->TokenText));
+  SmallVector<FormatToken *, 8> Result;
----------------
is the caller responsible for checking the #args matches #params?
If so, document and assert here?

Looking at the implementation, it seems you don't expand if there are too few args, and expand if there are too many args (ignoring the last ones). Maybe it doesn't matter, but it'd be nice to be more consistent here.

(Probably worth calling out somewhere explicitly that variadic macros are not supported)


================
Comment at: clang/lib/Format/MacroExpander.cpp:141
+  //   y -> 1
+  llvm::StringMap<size_t> ArgMap;
+  for (size_t I = 0, E = Def.Params.size(); I != E; ++I) {
----------------
This doesn't depend on args, so we could compute this mapping when the Definition is constructed and encapsulate it there.

(Maybe performance doesn't matter, I'd also find this a little clearer. But if the allocation doesn't matter, we shouldn't be using SmallVector...)


================
Comment at: clang/lib/Format/MacroExpander.cpp:167
+      return false;
+    // If there are fewer arguments than referenced parameters, skip the
+    // parameter.
----------------
skip the parameter -> treat the parameter as empty?
(My first guess was this meant given `ID(X)=X`, `ID()` would expand to `X`.)


================
Comment at: clang/lib/Format/MacroExpander.cpp:185
+
+  // Expand the definition into Restlt.
+  for (FormatToken *Tok : Definitions[ID->TokenText].Tokens) {
----------------
nit: Result


================
Comment at: clang/lib/Format/MacroExpander.cpp:189
+      continue;
+    // Create a copy of the tokens that were not part of the macro argument,
+    // i.e. were not provided by user code.
----------------
"tokens that were not part of the macro argument" --> "tokens from the macro body"?


================
Comment at: clang/lib/Format/MacroExpander.cpp:194
+    assert(New->MacroCtx.Role == MR_None);
+    // Tokens that are not part of the user code do not need to be formatted.
+    New->MacroCtx.Role = MR_Hidden;
----------------
(I don't know exactly how this is used, but consider whether you mean "do not need to", "should not" or "cannot" here)


================
Comment at: clang/lib/Format/MacroExpander.cpp:198
+  }
+  assert(Result.size() >= 1);
+  if (Result.size() > 1)
----------------
this threw me for a loop... it's EOF right?

It's not explicitly mentioned, so maybe either add a comment or `&& Result.back()->is(tok::eof)`.

This makes the `size-2` less cryptic too.


================
Comment at: clang/lib/Format/MacroExpander.cpp:200
+  if (Result.size() > 1)
+    ++Result[Result.size() - 2]->MacroCtx.EndOfExpansion;
+  return Result;
----------------
Why not set StartOfExpansion in the same way, to avoid tracking the `First` state?


================
Comment at: clang/lib/Format/MacroExpander.h:10
+///
+/// \file
+/// This file contains the declaration of MacroExpander, which handles macro
----------------
I think this comment is too short (and doesn't really say much that you can't get from the filename). IME many people's mental model of macros is based on how they're used rather than how they formally work, so I think it's worth spelling out even the obvious implementation choices.

I'd like to see:
 - rough description of the scope/goal of the feature (clang-format doesn't see macro definitions, so macro uses tend to be pseudo-parsed as function calls. When this isn't appropriate [example], a macro definition can be provided as part of the style. When such a macro is used in the code, clang-format will expand it as the preprocessor would before determining the role of tokens. [example])
 - explicitly call out here that only a single level of expansion is supported, which is a divergence from the real preprocessor. (This influences both the implementation and the mental model)
 - what the MacroExpander does and how it relates to MacroContext. I think this should give the input and output token streams names, as it's often important to clearly distinguish the two. (TokenBuffer uses "Spelled tokens" and "Expanded tokens" for this, which is not the same as how these terms are used in SourceManager, but related and hasn't been confusing in practice).
 - a rough sketch of how the mismatch between what was annotated and what must be formatted is resolved e.g. -- just guessing here -- the spelled stream is formatted but for macro args, the annotations from the expanded stream are used.

(I'm assuming this is the best file to talk about the implementation of this feature in general - i'm really hoping that there is such a file. If there are a bunch of related utilities you might even consider renaming the header as an umbrella to make them easier to document. This is a question of taste...)


================
Comment at: clang/lib/Format/MacroExpander.h:40
+
+/// Takes a set of simple macro definitions as strings and allows expanding
+/// calls to those macros.
----------------
You define "simple" in the patch description as non-recursive - can you just say "non-recursive" here? Or better spell out what that means (no macro can refer to another macro)


================
Comment at: clang/lib/Format/MacroExpander.h:44
+public:
+  typedef llvm::ArrayRef<llvm::SmallVector<FormatToken *, 8>> ArgsList;
+
----------------
nit: I think `using` is usually considered more readable


================
Comment at: clang/lib/Format/MacroExpander.h:49
+  /// Each entry in \p Macros must conform to the following simple
+  /// macro-definition language:
+  /// <def>    ::= <id> <exp> | <id> "(" <params> ") <exp>
----------------
This seems to precisely define the grammar but leave me guessing as to the semantics.
I'd at least suggest 'exp' -> 'expansion'.

Personally I'm partial to examples instead :-)


================
Comment at: clang/lib/Format/MacroExpander.h:50
+  /// macro-definition language:
+  /// <def>    ::= <id> <exp> | <id> "(" <params> ") <exp>
+  /// <params> ::= <id> | <id> "," <params>
----------------
"PI 3.14" or "NOT(X) !X" seems much less familiar than "PI=3.14" or "NOT(X)=!X" as accepted by `-D`.

It also resolves an ambiguity: is "DISCARD_ERROR ( void ) ( err )" an object or function macro?

The extra redundancy in the grammar should make it easier to detect errors too.



================
Comment at: clang/lib/Format/MacroExpander.h:51
+  /// <def>    ::= <id> <exp> | <id> "(" <params> ") <exp>
+  /// <params> ::= <id> | <id> "," <params>
+  /// <exp>    ::= <eof> | <tok> <exp>
----------------
this grammar allows object macros, but disallows function macros with no args. intended?

(FWIW this grammar also allows "ID(X X": an object macro "ID" which expands to "(X X". But the implementation, probably correctly, doesn't)


================
Comment at: clang/lib/Format/MacroExpander.h:54
+  ///
+  MacroExpander(const std::vector<std::string> &Macros,
+                clang::SourceManager &SourceMgr, const FormatStyle &Style,
----------------
The signature doesn't allow errors to be reported, which is a little unfortunate but seems hard to fix properly (so that errors are reported when the config is parsed) - the "style is a simple struct" is hard to reconcile with this data structure.

Silent discard on error should probably be mentioned in the constructor.


================
Comment at: clang/lib/Format/MacroExpander.h:56
+                clang::SourceManager &SourceMgr, const FormatStyle &Style,
+                encoding::Encoding encoding,
+                llvm::SpecificBumpPtrAllocator<FormatToken> &Allocator,
----------------
Why are the macro definitions in an arbitrary specified encoding? I'd hope by the time we've parsed the config, our strings are UTF-8. (On disk, YAML can be UTF-16 per spec, but...)


================
Comment at: clang/lib/Format/MacroExpander.h:62
+  /// Returns whether a macro \p Name is defined.
+  bool defined(llvm::StringRef Name);
+
----------------
const (and an expand)


================
Comment at: clang/lib/Format/MacroExpander.h:66
+  /// each element in \p Args is a positional argument to the macro call.
+  llvm::SmallVector<FormatToken *, 8> expand(FormatToken *ID, ArgsList Args);
+
----------------
(I can't see the real callsite but...)
If we care about performance here, is 8 a little small? should we have a `vector &Out` instead?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83296/new/

https://reviews.llvm.org/D83296





More information about the cfe-commits mailing list