[clang] [Clang][Comments] Support for parsing headers in Doxygen \par commands (PR #91100)

Sun Jun 9 16:18:33 PDT 2024

================
@@ -149,6 +149,63 @@ class TextTokenRetokenizer {
     addToken();
   }
 
+  /// Check if this line starts with @par or \par
+  bool startsWithParCommand() {
+    unsigned Offset = 1;
+
+    /// Skip all whitespace characters at the beginning.
+    /// This needs to backtrack because Pos has already advanced past the
+    /// actual \par or @par command by the time this function is called.
+    while (isWhitespace(*(Pos.BufferPtr - Offset)))
+      Offset++;
----------------
hdoc wrote:

The issue is that `Pos.BufferPtr` already points past the `@par` command when `lexParHeading` is called. I believe this is because the comment parser [consumes](https://github.com/llvm/llvm-project/blob/84dd803993fd2b6b31f8168a3f4dc729406bd3ca/clang/lib/AST/CommentParser.cpp#L407) the `@par` command token itself during parsing and then, based on that token, [decides to call](https://github.com/llvm/llvm-project/blob/84dd803993fd2b6b31f8168a3f4dc729406bd3ca/clang/lib/AST/CommentParser.cpp#L434-L436) `lexParHeading`. I've tested this locally, and what I observe matches the behavior described above.

While I agree that a readahead would be preferable to backtracking, I don't see an easy way to rework the PR as it currently stands to do that. It looks like we'd have to refactor more of the comment parser, which is delicate, and that would pull in changes outside the scope of this PR.
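
To make the backtracking idea concrete, here is a minimal standalone sketch. It is not the code in this PR: `endsWithParCommand`, the `std::string_view` buffer, and the index-based `Pos` are hypothetical stand-ins for the retokenizer's internal state, and only spaces and tabs are treated as whitespace. Given a position that has already advanced past the command, it walks back over whitespace and checks whether the preceding four characters are `\par` or `@par`.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not the PR's implementation: returns true if the text
// ending at Pos (ignoring whitespace directly before Pos) is a \par or @par
// command. Pos plays the role of a cursor that is already past the command.
bool endsWithParCommand(std::string_view Buffer, std::size_t Pos) {
  if (Pos > Buffer.size())
    return false;

  // Backtrack over whitespace between the command and the cursor, stopping
  // at the start of the buffer.
  while (Pos > 0 && (Buffer[Pos - 1] == ' ' || Buffer[Pos - 1] == '\t'))
    --Pos;

  // The command marker plus "par" needs four characters.
  if (Pos < 4)
    return false;

  std::string_view Tail = Buffer.substr(Pos - 4, 4);
  return Tail == "\\par" || Tail == "@par";
}
```

With this sketch, `endsWithParCommand("@par   Heading", 7)` is true, while `endsWithParCommand("\\param x", 6)` is false because the four characters before the cursor are `aram`. The explicit `Pos > 0` guard is a choice made for this sketch so that the backward walk cannot run past the start of the buffer.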

https://github.com/llvm/llvm-project/pull/91100