<div dir="ltr"><div dir="auto"><div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, 10 Oct 2022, 19:57 David Blaikie, <<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Could the underlying API be made more robust to handle StringRefs<br>
rather than null terminated char* in the first place? (like<br>
SourceManager::getCharacterData could return StringRef? Presumably at<br>
some layer it already knows how long the file is - the underlying<br>
MemoryBuffer, in this case)<br></blockquote></div></div><div dir="auto">Maybe...</div><div dir="auto"><br></div><div dir="auto">You could return a StringRef(Pos, EOF) efficiently.</div><div>However it's not a great fit for many of the existing (>150) callers:</div><div> - that are looking for the *endpoint* of some text</div><div> - which want to run the lexer (the returned char* is implicitly null-terminated, StringRef would obscure that guarantee)</div><div> - the function is marked as "this is very hot", though I doubt it matters that much</div><div>It also leads to some unnatural code at callsites such as this, because "from the start of this entity until the end of the file" is an unnatural range to have, it would probably be easy to mistake for "from the start to the end of this entity".</div><div><br></div><div>I think what this function does is too low-level to make safe and obvious, and unfortunately it's hard to build high-level things on top of it.</div><div>(e.g. StringRef getTextRange(SourceRange) would be great to have, but actually you it needs to run the lexer and assert nontrivial invariants due to design decisions elsewhere in clang)</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
On Thu, Oct 6, 2022 at 2:39 AM Sam McCall via cfe-commits<br>
<<a href="mailto:cfe-commits@lists.llvm.org" rel="noreferrer" target="_blank">cfe-commits@lists.llvm.org</a>> wrote:<br>
><br>
><br>
> Author: Sam McCall<br>
> Date: 2022-10-06T11:38:55+02:00<br>
> New Revision: 5d2d527c32da2081b814ef8b446bc3e037f74b0a<br>
><br>
> URL: <a href="https://github.com/llvm/llvm-project/commit/5d2d527c32da2081b814ef8b446bc3e037f74b0a" rel="noreferrer noreferrer" target="_blank">https://github.com/llvm/llvm-project/commit/5d2d527c32da2081b814ef8b446bc3e037f74b0a</a><br>
> DIFF: <a href="https://github.com/llvm/llvm-project/commit/5d2d527c32da2081b814ef8b446bc3e037f74b0a.diff" rel="noreferrer noreferrer" target="_blank">https://github.com/llvm/llvm-project/commit/5d2d527c32da2081b814ef8b446bc3e037f74b0a.diff</a><br>
><br>
> LOG: [clangd] Avoid scanning up to end of file on each comment!<br>
><br>
> Assigning char* (pointing at comment start) to StringRef was causing us<br>
> to scan the rest of the source file looking for the null terminator.<br>
><br>
> This seems to be eating about 8% of our *total* CPU!<br>
><br>
> While fixing this, factor out the common bits from the two places we're<br>
> parsing IWYU pragmas.<br>
><br>
> Differential Revision: <a href="https://reviews.llvm.org/D135314" rel="noreferrer noreferrer" target="_blank">https://reviews.llvm.org/D135314</a><br>
><br>
> Added:<br>
><br>
><br>
> Modified:<br>
>     clang-tools-extra/clangd/Headers.cpp<br>
>     clang-tools-extra/clangd/Headers.h<br>
>     clang-tools-extra/clangd/index/CanonicalIncludes.cpp<br>
>     clang-tools-extra/clangd/unittests/HeadersTests.cpp<br>
><br>
> Removed:<br>
><br>
><br>
><br>
> ################################################################################<br>
> diff  --git a/clang-tools-extra/clangd/Headers.cpp b/clang-tools-extra/clangd/Headers.cpp<br>
> index 5231a47487bc7..52b954e921620 100644<br>
> --- a/clang-tools-extra/clangd/Headers.cpp<br>
> +++ b/clang-tools-extra/clangd/Headers.cpp<br>
> @@ -22,9 +22,17 @@<br>
>  namespace clang {<br>
>  namespace clangd {<br>
><br>
> -const char IWYUPragmaKeep[] = "// IWYU pragma: keep";<br>
> -const char IWYUPragmaExport[] = "// IWYU pragma: export";<br>
> -const char IWYUPragmaBeginExports[] = "// IWYU pragma: begin_exports";<br>
> +llvm::Optional<StringRef> parseIWYUPragma(const char *Text) {<br>
> +  // This gets called for every comment seen in the preamble, so it's quite hot.<br>
> +  constexpr llvm::StringLiteral IWYUPragma = "// IWYU pragma: ";<br>
> +  if (strncmp(Text, IWYUPragma.data(), IWYUPragma.size()))<br>
> +    return llvm::None;<br>
> +  Text += IWYUPragma.size();<br>
> +  const char *End = Text;<br>
> +  while (*End != 0 && *End != '\n')<br>
> +    ++End;<br>
> +  return StringRef(Text, End - Text);<br>
> +}<br>
><br>
>  class IncludeStructure::RecordHeaders : public PPCallbacks,<br>
>                                          public CommentHandler {<br>
> @@ -129,10 +137,10 @@ class IncludeStructure::RecordHeaders : public PPCallbacks,<br>
>    }<br>
><br>
>    bool HandleComment(Preprocessor &PP, SourceRange Range) override {<br>
> -    bool Err = false;<br>
> -    llvm::StringRef Text = SM.getCharacterData(Range.getBegin(), &Err);<br>
> -    if (Err)<br>
> +    auto Pragma = parseIWYUPragma(SM.getCharacterData(Range.getBegin()));<br>
> +    if (!Pragma)<br>
>        return false;<br>
> +<br>
>      if (inMainFile()) {<br>
>        // Given:<br>
>        //<br>
> @@ -150,8 +158,7 @@ class IncludeStructure::RecordHeaders : public PPCallbacks,<br>
>        // will know that the next inclusion is behind the IWYU pragma.<br>
>        // FIXME: Support "IWYU pragma: begin_exports" and "IWYU pragma:<br>
>        // end_exports".<br>
> -      if (!Text.startswith(IWYUPragmaExport) &&<br>
> -          !Text.startswith(IWYUPragmaKeep))<br>
> +      if (!Pragma->startswith("export") && !Pragma->startswith("keep"))<br>
>          return false;<br>
>        unsigned Offset = SM.getFileOffset(Range.getBegin());<br>
>        LastPragmaKeepInMainFileLine =<br>
> @@ -161,8 +168,7 @@ class IncludeStructure::RecordHeaders : public PPCallbacks,<br>
>        // does not support them properly yet, so they will be not marked as<br>
>        // unused.<br>
>        // FIXME: Once IncludeCleaner supports export pragmas, remove this.<br>
> -      if (!Text.startswith(IWYUPragmaExport) &&<br>
> -          !Text.startswith(IWYUPragmaBeginExports))<br>
> +      if (!Pragma->startswith("export") && !Pragma->startswith("begin_exports"))<br>
>          return false;<br>
>        Out->HasIWYUExport.insert(<br>
>            *Out->getID(SM.getFileEntryForID(SM.getFileID(Range.getBegin()))));<br>
><br>
> diff  --git a/clang-tools-extra/clangd/Headers.h b/clang-tools-extra/clangd/Headers.h<br>
> index ff3f063168325..ba72ad397bf8f 100644<br>
> --- a/clang-tools-extra/clangd/Headers.h<br>
> +++ b/clang-tools-extra/clangd/Headers.h<br>
> @@ -35,6 +35,12 @@ namespace clangd {<br>
>  /// Returns true if \p Include is literal include like "path" or <path>.<br>
>  bool isLiteralInclude(llvm::StringRef Include);<br>
><br>
> +/// If Text begins an Include-What-You-Use directive, returns it.<br>
> +/// Given "// IWYU pragma: keep", returns "keep".<br>
> +/// Input is a null-terminated char* as provided by SM.getCharacterData().<br>
> +/// (This should not be StringRef as we do *not* want to scan for its length).<br>
> +llvm::Optional<StringRef> parseIWYUPragma(const char *Text);<br>
> +<br>
>  /// Represents a header file to be #include'd.<br>
>  struct HeaderFile {<br>
>    std::string File;<br>
><br>
> diff  --git a/clang-tools-extra/clangd/index/CanonicalIncludes.cpp b/clang-tools-extra/clangd/index/CanonicalIncludes.cpp<br>
> index b0ae42a96ecf4..8568079ca1adb 100644<br>
> --- a/clang-tools-extra/clangd/index/CanonicalIncludes.cpp<br>
> +++ b/clang-tools-extra/clangd/index/CanonicalIncludes.cpp<br>
> @@ -17,8 +17,6 @@<br>
>  namespace clang {<br>
>  namespace clangd {<br>
>  namespace {<br>
> -const char IWYUPragma[] = "// IWYU pragma: private, include ";<br>
> -<br>
>  const std::pair<llvm::StringRef, llvm::StringRef> IncludeMappings[] = {<br>
>      {"include/__stddef_max_align_t.h", "<cstddef>"},<br>
>      {"include/__wmmintrin_aes.h", "<wmmintrin.h>"},<br>
> @@ -712,17 +710,17 @@ collectIWYUHeaderMaps(CanonicalIncludes *Includes) {<br>
>      PragmaCommentHandler(CanonicalIncludes *Includes) : Includes(Includes) {}<br>
><br>
>      bool HandleComment(Preprocessor &PP, SourceRange Range) override {<br>
> -      llvm::StringRef Text =<br>
> -          Lexer::getSourceText(CharSourceRange::getCharRange(Range),<br>
> -                               PP.getSourceManager(), PP.getLangOpts());<br>
> -      if (!Text.consume_front(IWYUPragma))<br>
> +      auto Pragma = parseIWYUPragma(<br>
> +          PP.getSourceManager().getCharacterData(Range.getBegin()));<br>
> +      if (!Pragma || !Pragma->consume_front("private, include "))<br>
>          return false;<br>
>        auto &SM = PP.getSourceManager();<br>
>        // We always insert using the spelling from the pragma.<br>
>        if (auto *FE = SM.getFileEntryForID(SM.getFileID(Range.getBegin())))<br>
> -        Includes->addMapping(<br>
> -            FE->getLastRef(),<br>
> -            isLiteralInclude(Text) ? Text.str() : ("\"" + Text + "\"").str());<br>
> +        Includes->addMapping(FE->getLastRef(),<br>
> +                             isLiteralInclude(*Pragma)<br>
> +                                 ? Pragma->str()<br>
> +                                 : ("\"" + *Pragma + "\"").str());<br>
>        return false;<br>
>      }<br>
><br>
><br>
> diff  --git a/clang-tools-extra/clangd/unittests/HeadersTests.cpp b/clang-tools-extra/clangd/unittests/HeadersTests.cpp<br>
> index 32e4aea15490b..324d4b58a1ef1 100644<br>
> --- a/clang-tools-extra/clangd/unittests/HeadersTests.cpp<br>
> +++ b/clang-tools-extra/clangd/unittests/HeadersTests.cpp<br>
> @@ -9,6 +9,7 @@<br>
>  #include "Headers.h"<br>
><br>
>  #include "Compiler.h"<br>
> +#include "Matchers.h"<br>
>  #include "TestFS.h"<br>
>  #include "TestTU.h"<br>
>  #include "clang/Basic/TokenKinds.h"<br>
> @@ -30,6 +31,7 @@ namespace {<br>
>  using ::testing::AllOf;<br>
>  using ::testing::Contains;<br>
>  using ::testing::ElementsAre;<br>
> +using ::testing::Eq;<br>
>  using ::testing::IsEmpty;<br>
>  using ::testing::Not;<br>
>  using ::testing::UnorderedElementsAre;<br>
> @@ -445,6 +447,18 @@ TEST_F(HeadersTest, HasIWYUPragmas) {<br>
>    EXPECT_FALSE(Includes.hasIWYUExport(getID("none.h", Includes)));<br>
>  }<br>
><br>
> +TEST(Headers, ParseIWYUPragma) {<br>
> +  EXPECT_THAT(parseIWYUPragma("// IWYU pragma: keep"), HasValue(Eq("keep")));<br>
> +  EXPECT_THAT(parseIWYUPragma("// IWYU pragma: keep\netc"),<br>
> +              HasValue(Eq("keep")));<br>
> +  EXPECT_EQ(parseIWYUPragma("/* IWYU pragma: keep"), llvm::None)<br>
> +      << "Only // comments supported!";<br>
> +  EXPECT_EQ(parseIWYUPragma("//  IWYU pragma: keep"), llvm::None)<br>
> +      << "Sensitive to whitespace";<br>
> +  EXPECT_EQ(parseIWYUPragma("// IWYU pragma:keep"), llvm::None)<br>
> +      << "Sensitive to whitespace";<br>
> +}<br>
> +<br>
>  } // namespace<br>
>  } // namespace clangd<br>
>  } // namespace clang<br>
><br>
><br>
><br>
> _______________________________________________<br>
> cfe-commits mailing list<br>
> <a href="mailto:cfe-commits@lists.llvm.org" rel="noreferrer" target="_blank">cfe-commits@lists.llvm.org</a><br>
> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits" rel="noreferrer noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits</a><br>
</blockquote></div></div></div>
</div>