<html><head><base href="x-msg://1078/"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div>On Dec 21, 2011, at 3:57 PM, Larry Olson wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><span class="Apple-style-span" style="border-collapse: separate; font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><div class="hmmessage" style="font-size: 10pt; font-family: Tahoma; "><div dir="ltr"><div>Hi all,</div><div><br></div><div>I love that clang and llvm make it possible for folks like me to create tools that have a deeper understanding of C, C++, and ObjC. For instance, I used clang to write a tool that helped me normalize headers when porting code across platforms. Clang and llvm were easy to get into and well documented. Thanks for the work you all do.</div><div><br></div><div>I have a question for you though.</div><div><br></div><div>Brief Summary of the rest of the email:</div><div>2 Questions</div><div><span style="font-size: 10pt; ">1. Is there an intended way of getting the contents of a token from just the SourceRange?</span></div><div><span style="font-size: 10pt; ">2. Should I be asking the SourceManager for a Buffer for a given FileID and then adjusting pointers into that buffer based on the Offset of that FileID?</span></div><div><br></div><div><br></div><div>More details</div>I was considering what it would take to build a doxygen-like tool using Clang and found the CommentHandler object and its virtual method, HandleComment(Preprocessor, SourceRange). 
However, I was a bit stuck when I tried to go from the SourceRange to the actual contents of the comment. <div><br></div><div>I looked at what PPCallback and ASTConsumers offer but didn't see anything that would lead me to believe I should've expected more data from the CommentHandler. I looked at old code I had written  that used the Rewriter but that didn't feel right because I don't want to Rewrite the comment, I want to parse its contents. So then I started looking around frontend actions that spit out data or modify the guts of a buffer. Eventually I stumbled across the HTMLPrintAction and its corresponding HTMLPrinter. </div><div><br></div><div>Inside of HTMLPrinter I noticed the AddLineNumbers method which was performing manipulations based on a raw MemoryBuffer which looked about right. It eventually led me to this prototype code:</div><div><br></div><div>/// Write out the entire comment based on the source range.</div><div><div>bool IndentingCommentHandler::HandleComment(Preprocessor &pp, SourceRange rng)</div><div>{</div><div>    FileID FID = pp.getSourceManager().getMainFileID();</div><div>    const llvm::MemoryBuffer *MB = pp.getSourceManager().getBuffer(FID);</div><div><br></div><div>    int size = rng.getEnd().getRawEncoding() - rng.getBegin().getRawEncoding();</div><div>    char *Buff = (char *)calloc(size+1, sizeof(char));</div><div><br></div><div>    const char *itBeg = MB->getBufferStart();</div><div>    const char *itEnd = MB->getBufferStart();</div><div><br></div><div>    unsigned int offset = pp.getSourceManager().getLocForStartOfFile(FID).getRawEncoding();</div><div><br></div><div>    // Adjust pointers to account for the FileID's offset in the Source manager.</div><div>    itBeg -= offset;</div><div>    itEnd -= offset;</div><div><br></div><div>    // Adjust pointers relative to where the comment actually begins and ends</div><div>    for( int i = 0; i < rng.getBegin().getRawEncoding(); i++)</div><div>    {</div><div>        
++itBeg;</div><div>        ++itEnd;</div><div>    }</div><div><br></div><div>    for( int i = 0; i < size; ++i)</div><div>    {</div><div>        ++itEnd;</div><div>    }</div><div><br></div><div>    std::copy(itBeg, itEnd, Buff);</div><div>    std::cout << "=============================" << std::endl;</div><div>    std::cout << Buff << std::endl;</div><div>    free(Buff);</div><div>    return false;</div><div>}</div></div><div><br></div><div><br></div><div>This seems to work for a few test cases I've tried it against but also feels a bit verbose. I'm wondering, did I do something stupid? Did I overlook a better or more proper way of doing this?</div></div></div></span></blockquote><div><br></div><div>You should generally avoid using SourceLocation::getRawEncoding(). It is only useful as opaque data, so do not use it for offset information.</div><div>Check out SourceManager::getDecomposedLoc(SourceLocation). It returns a std::pair of FileID and offset within that file, so you can get from a SourceLocation to a buffer plus an offset into it.</div><div><br></div><blockquote type="cite"><span class="Apple-style-span" style="border-collapse: separate; font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><div class="hmmessage" style="font-size: 10pt; font-family: Tahoma; "><div dir="ltr"><div><br></div><div>Many thanks for any help and guidance,</div><div>Larry Olson</div><div>(<a href="https://github.com/loarabia">https://github.com/loarabia</a>)</div><div><br></div></div>_______________________________________________<br>cfe-dev mailing list<br><a href="mailto:cfe-dev@cs.uiuc.edu">cfe-dev@cs.uiuc.edu</a><br><a 
href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br></div></span></blockquote></div><br></body></html>