[PATCH] D46000: [AST] Added a helper to extract a user-friendly text of a comment.

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Apr 25 11:41:26 PDT 2018


ioeric added inline comments.


================
Comment at: include/clang/AST/RawCommentList.h:118
+  ///     //   Parts of it  might be indented.
+  ///     /* The comments styles might be mixed. */
+  ///  into
----------------
I'm trying to understand how these cases and RawComment work.

For this case, are the `// ... ` block and `/* ... */` merged in one `RawComment` by default?


================
Comment at: include/clang/AST/RawCommentList.h:126
+  ///      *   This is a second line. It is indented.
+  ///      * This is a third line. */
+  /// and
----------------
Are the `*`s in each lines automatically consumed by the lexer?


================
Comment at: lib/AST/CommentLexer.cpp:301
+
+bool Lexer::tryLexCommands(Token &T) {
   assert(CommentState == LCS_InsideBCPLComment ||
----------------
I think we could avoid the "somewhat non-trivial" control flow by merging command and command-less cases in one function:

Something like:
```
  const char *TokenPtr = ...;
  auto HandleNonCommandToken = ...;
  if (!ParseComand) {
     HandleNonCommandToken(...);
     return;
  }
  ... 
  switch (...) {
   // after all command cases 
  default:
     HandleNonCommandToken(...);
   }
```


================
Comment at: lib/AST/CommentLexer.cpp:331
   assert(TokenPtr < CommentEnd);
-  while (TokenPtr != CommentEnd) {
-    switch(*TokenPtr) {
-      case '\\':
-      case '@': {
-        // Commands that start with a backslash and commands that start with
-        // 'at' have equivalent semantics.  But we keep information about the
-        // exact syntax in AST for comments.
-        tok::TokenKind CommandKind =
-            (*TokenPtr == '@') ? tok::at_command : tok::backslash_command;
+  switch(*TokenPtr) {
+    case '\\':
----------------
We should be extra careful about removing the loop... (It does seem to be redundant though)


================
Comment at: lib/AST/RawCommentList.cpp:380
+        SourceMgr.getSpellingColumnNumber(Tok.getLocation(), &LocInvalid);
+    if (LocInvalid)
+      TokColumn = 0;
----------------
ilya-biryukov wrote:
> ilya-biryukov wrote:
> > ioeric wrote:
> > > Explain when this would be invalid and why `TokColumn = 0` is used?
> > I don't know whether this can be even be invalid, but I'm not confident enough to add an assert there.
> > `TokColumn = 0` seems like a reasonable way to recover if we can't compute the column number, i.e. assume the line starts at the first column if SourceLocation of the line was invalid for any reason.
> > 
> > This whole column thing looks weird to me, maybe I should just remove it altogether and just remove the same amount of whitespace in all the lines. WDYT?
> On a second thought, now I remember why I added this in the first place.
> To support the following example we want to take column numbers into account:
> 
> ```
> class Foo {
>     /* A block comment
>        spanning multiple lines
>        has too many spaces on the all lines except the first one.
>     */
>     int func();
> };
> ```
So I think you would want to force `MaxSkip` to 0 if token loc is invalid to make sure no comment is accidentally eaten?


================
Comment at: lib/AST/RawCommentList.cpp:383
+    // Compute the length of whitespace we're allowed to skip.
+    unsigned MaxSkip;
+    if (IsFirstLine) {
----------------
ilya-biryukov wrote:
> ioeric wrote:
> > nit: `unsigned MaxSkip = IsFirstLine ? ... : ...;`
> That would force to get rid of the comments in the if branches, but they seem to be useful.
> Am I missing an obvious style that would preserve the comments?
You could simply merge the comments, which doesn't seem to compromise readability.


================
Comment at: lib/AST/RawCommentList.cpp:343
+  if (CommentText.empty())
+    return ""; // we couldn't retreive the comment.
+
----------------
nit: We don't really know if there was a failure or the comment is simply empty. So I'd probably leave out the comment here to avoid confusion.


================
Comment at: lib/AST/RawCommentList.cpp:349
+                    CommentText.begin(), CommentText.end(),
+                    /*ParseCommentText=*/false);
+
----------------
`s/ParseCommentText/ParseCommands/`?


================
Comment at: lib/AST/RawCommentList.cpp:352
+  std::string Result;
+  unsigned IndentColumn = 0;
+
----------------
This variable could use a comment.


================
Comment at: lib/AST/RawCommentList.cpp:393
+
+    llvm::StringRef Trimmed = TokText.drop_front(std::min(MaxSkip, WhitespaceLen));
+    Result += Trimmed;
----------------
I think `MaxSkip` could be removed with:

```
llvm::StringRef Trimmed = TokText.drop_front(IsFirstLine ? WhitespaceLen : 
                                                 std::max((int)IndentColumn - (int)TokColumn, 0)));
```


Repository:
  rC Clang

https://reviews.llvm.org/D46000





More information about the cfe-commits mailing list