<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/91311>91311</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            clang::SourceRange of clang::RawComment
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            clang
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          T-Gruber
      </td>
    </tr>
</table>

<pre>
    While working with the LibTooling library (llvm-project release 17.x), I noticed strange behaviour of the SourceRange of RawComments. I have written a small clang-based tool that detects all comments in a given C file and removes them. To do this, I get the SourceRange of every RawComment and remove it using the Rewriter.

```cpp
inline void removeAllComments(clang::ASTContext &Context, clang::Rewriter &R) {
  const clang::SourceManager &SrcMgr = Context.getSourceManager();

  if (const std::map<unsigned int, clang::RawComment *> *CommentMap =
          Context.Comments.getCommentsInFile(SrcMgr.getMainFileID()))
    for (auto [LineInfo, Comment] : *CommentMap) {
      R.RemoveText(Comment->getSourceRange());
      std::cout << "SourceRange via Lexer: "
                << clang::Lexer::getSourceText(
                       clang::CharSourceRange::getTokenRange(Comment->getSourceRange()),
                       SrcMgr, Context.getLangOpts(), 0).str()
                << "\nRawTest: " << Comment->getRawText(SrcMgr).str() << "\n\n";
    }
}
```
To run the standalone tool I use the following command:
```
~/llvm-project/build$ bin/comment_tool test.c --extra-arg=-fparse-all-comments  --
```

I tested a short code snippet:
```C
int a = 1 /*comment*/;
extern int /*comment*/ b;
```

And got the following rewritten result:
```C
int a = 1 
extern int  b;
```

As you can see, all comments are removed, but the semicolon of the first VarDecl is deleted as well. The terminal output shows that the SourceRange of the first comment, when read as a token range, includes the semicolon. The RawTest output, however, matches my expectation:
```
SourceRange via Lexer: /*comment*/;
RawTest: /*comment*/

SourceRange via Lexer: /*comment*/
RawTest: /*comment*/
```
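
One possible reading of this output, which I have not confirmed against the Clang sources: the end location of the comment's SourceRange may point one character past the comment, so interpreting it as a token range pulls in whatever token starts there. The following is a minimal, Clang-free sketch of that difference on plain strings. The helpers `measureTokenLength`, `removeAsCharRange` and `removeAsTokenRange` are hypothetical names invented for this illustration, and the toy token rule (whitespace measures 0, identifiers and numbers run to their end, anything else is one character) is an assumption, not the real lexer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a raw lexer: length of the token starting at pos.
// Assumed rule: whitespace or end of input -> 0, identifier/number -> its full
// length, any other character -> 1.
std::size_t measureTokenLength(const std::string &Src, std::size_t Pos) {
  if (Pos >= Src.size() || std::isspace(static_cast<unsigned char>(Src[Pos])))
    return 0;
  auto IsWordChar = [](char C) {
    return std::isalnum(static_cast<unsigned char>(C)) || C == '_';
  };
  if (!IsWordChar(Src[Pos]))
    return 1;
  std::size_t End = Pos;
  while (End < Src.size() && IsWordChar(Src[End]))
    ++End;
  return End - Pos;
}

// Character range: [Begin, End) is removed exactly as given.
std::string removeAsCharRange(std::string Src, std::size_t Begin,
                              std::size_t End) {
  assert(Begin <= End && End <= Src.size());
  Src.erase(Begin, End - Begin);
  return Src;
}

// Token range: End is taken to be the START of the last token, so the range
// is extended by the length of the token found there before removal.
std::string removeAsTokenRange(std::string Src, std::size_t Begin,
                               std::size_t End) {
  assert(Begin <= End && End <= Src.size());
  End += measureTokenLength(Src, End);
  Src.erase(Begin, End - Begin);
  return Src;
}
```

With `Begin` at the `/*` and `End` one past the `*/`, `removeAsTokenRange` drops the `;` from `int a = 1 /*comment*/;` but leaves `extern int /*comment*/ b;` intact apart from the comment, because the character after that comment is whitespace. That matches the rewritten result above. If the assumption holds, passing `clang::CharSourceRange::getCharRange(Comment->getSourceRange())` to `RemoveText` might be worth trying in the actual tool.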

Is this intentional behaviour? I would be grateful for any advice!


</pre>