[clang] 333d66b - [analyzer][ctu] Fix wrong 'multiple definitions' errors caused by space characters in lookup names when parsing the ctu index file

Balazs Benics via cfe-commits cfe-commits at lists.llvm.org
Thu Dec 16 08:48:16 PST 2021


Author: Ella Ma
Date: 2021-12-16T17:47:59+01:00
New Revision: 333d66b09494b7ebc1a89f2befa79128a56f77e3

URL: https://github.com/llvm/llvm-project/commit/333d66b09494b7ebc1a89f2befa79128a56f77e3
DIFF: https://github.com/llvm/llvm-project/commit/333d66b09494b7ebc1a89f2befa79128a56f77e3.diff

LOG: [analyzer][ctu] Fix wrong 'multiple definitions' errors caused by space characters in lookup names when parsing the ctu index file

This error was found when analyzing MySQL with CTU enabled.

When there are space characters in the lookup name, the current
delimiter searching strategy will make the file path wrongly parsed.
And when two lookup names have the same prefix before their first space
characters, a 'multiple definitions' error will be wrongly reported.

e.g. The lookup names for the two lambda exprs in the test case are
`c:@S at G@F at G#@Sa at F@operator int (*)(char)#1` and
`c:@S at G@F at G#@Sa at F@operator bool (*)(char)#1` respectively. And their
prefixes are both `c:@S at G@F at G#@Sa at F@operator` when using the first space
character as the delimiter.

Solving the problem by adding a length for the lookup name, making the
index items in the format of `USR-Length:USR File-Path`.

Reviewed By: steakhal

Differential Revision: https://reviews.llvm.org/D102669

Added: 
    clang/test/Analysis/Inputs/ctu-lookup-name-with-space.cpp
    clang/test/Analysis/ctu-lookup-name-with-space.cpp

Modified: 
    clang/docs/analyzer/user-docs/CrossTranslationUnit.rst
    clang/include/clang/Basic/DiagnosticCrossTUKinds.td
    clang/lib/CrossTU/CrossTranslationUnit.cpp
    clang/test/Analysis/Inputs/ctu-import.c.externalDefMap.ast-dump.txt
    clang/test/Analysis/Inputs/ctu-other.c.externalDefMap.ast-dump.txt
    clang/test/Analysis/Inputs/ctu-other.cpp.externalDefMap.ast-dump.txt
    clang/test/Analysis/Inputs/plist-macros-with-expansion-ctu.c.externalDefMap.txt
    clang/test/Analysis/ctu-inherited-default-ctor.cpp
    clang/test/Analysis/func-mapping-test.cpp
    clang/unittests/CrossTU/CrossTranslationUnitTest.cpp

Removed: 
    


################################################################################
diff  --git a/clang/docs/analyzer/user-docs/CrossTranslationUnit.rst b/clang/docs/analyzer/user-docs/CrossTranslationUnit.rst
index 0e0691b95d5d..0976c9a5b67a 100644
--- a/clang/docs/analyzer/user-docs/CrossTranslationUnit.rst
+++ b/clang/docs/analyzer/user-docs/CrossTranslationUnit.rst
@@ -81,12 +81,12 @@ of `foo.cpp`:
   $
 
 The next step is to create a CTU index file which holds the `USR` name and location of external definitions in the
-source files:
+source files in format `<USR-Length>:<USR> <File-Path>`:
 
 .. code-block:: bash
 
   $ clang-extdef-mapping -p . foo.cpp
-  c:@F at foo# /path/to/your/project/foo.cpp
+  9:c:@F at foo# /path/to/your/project/foo.cpp
   $ clang-extdef-mapping -p . foo.cpp > externalDefMap.txt
 
 We have to modify `externalDefMap.txt` to contain the name of the `.ast` files instead of the source files:
@@ -278,12 +278,12 @@ The `invocation list`:
 
 We'd like to analyze `main.cpp` and discover the division by zero bug.
 As we are using On-demand mode, we only need to create a CTU index file which holds the `USR` name and location of
-external definitions in the source files:
+external definitions in the source files in format `<USR-Length>:<USR> <File-Path>`:
 
 .. code-block:: bash
 
   $ clang-extdef-mapping -p . foo.cpp
-  c:@F at foo# /path/to/your/project/foo.cpp
+  9:c:@F at foo# /path/to/your/project/foo.cpp
   $ clang-extdef-mapping -p . foo.cpp > externalDefMap.txt
 
 Now everything is available for the CTU analysis.

diff  --git a/clang/include/clang/Basic/DiagnosticCrossTUKinds.td b/clang/include/clang/Basic/DiagnosticCrossTUKinds.td
index 4277a3173203..e6ea1956f98a 100644
--- a/clang/include/clang/Basic/DiagnosticCrossTUKinds.td
+++ b/clang/include/clang/Basic/DiagnosticCrossTUKinds.td
@@ -12,8 +12,8 @@ def err_ctu_error_opening : Error<
   "error opening '%0': required by the CrossTU functionality">;
 
 def err_extdefmap_parsing : Error<
-  "error parsing index file: '%0' line: %1 'UniqueID filename' format "
-  "expected">;
+  "error parsing index file: '%0' line: %1 '<USR-Length>:<USR> <File-Path>' "
+  "format expected">;
 
 def err_multiple_def_index : Error<
   "multiple definitions are found for the same key in index ">;

diff  --git a/clang/lib/CrossTU/CrossTranslationUnit.cpp b/clang/lib/CrossTU/CrossTranslationUnit.cpp
index 0aecad491ecc..cbe07acb76fb 100644
--- a/clang/lib/CrossTU/CrossTranslationUnit.cpp
+++ b/clang/lib/CrossTU/CrossTranslationUnit.cpp
@@ -149,6 +149,35 @@ std::error_code IndexError::convertToErrorCode() const {
   return std::error_code(static_cast<int>(Code), *Category);
 }
 
+/// Parse one line of the input CTU index file.
+///
+/// @param[in]  LineRef     The input CTU index item in format
+///                         "<USR-Length>:<USR> <File-Path>".
+/// @param[out] LookupName  The lookup name in format "<USR-Length>:<USR>".
+/// @param[out] FilePath    The file path "<File-Path>".
+static bool parseCrossTUIndexItem(StringRef LineRef, StringRef &LookupName,
+                                  StringRef &FilePath) {
+  // `LineRef` is "<USR-Length>:<USR> <File-Path>" now.
+
+  size_t USRLength = 0;
+  if (LineRef.consumeInteger(10, USRLength))
+    return false;
+  assert(USRLength && "USRLength should be greater than zero.");
+
+  if (!LineRef.consume_front(":"))
+    return false;
+
+  // `LineRef` is now just "<USR> <File-Path>".
+
+  // Check LookupName length out of bound and incorrect delimiter.
+  if (USRLength >= LineRef.size() || ' ' != LineRef[USRLength])
+    return false;
+
+  LookupName = LineRef.substr(0, USRLength);
+  FilePath = LineRef.substr(USRLength + 1);
+  return true;
+}
+
 llvm::Expected<llvm::StringMap<std::string>>
 parseCrossTUIndex(StringRef IndexPath) {
   std::ifstream ExternalMapFile{std::string(IndexPath)};
@@ -160,24 +189,23 @@ parseCrossTUIndex(StringRef IndexPath) {
   std::string Line;
   unsigned LineNo = 1;
   while (std::getline(ExternalMapFile, Line)) {
-    StringRef LineRef{Line};
-    const size_t Delimiter = LineRef.find(' ');
-    if (Delimiter > 0 && Delimiter != std::string::npos) {
-      StringRef LookupName = LineRef.substr(0, Delimiter);
-
-      // Store paths with posix-style directory separator.
-      SmallString<32> FilePath(LineRef.substr(Delimiter + 1));
-      llvm::sys::path::native(FilePath, llvm::sys::path::Style::posix);
-
-      bool InsertionOccured;
-      std::tie(std::ignore, InsertionOccured) =
-          Result.try_emplace(LookupName, FilePath.begin(), FilePath.end());
-      if (!InsertionOccured)
-        return llvm::make_error<IndexError>(
-            index_error_code::multiple_definitions, IndexPath.str(), LineNo);
-    } else
+    // Split lookup name and file path
+    StringRef LookupName, FilePathInIndex;
+    if (!parseCrossTUIndexItem(Line, LookupName, FilePathInIndex))
       return llvm::make_error<IndexError>(
           index_error_code::invalid_index_format, IndexPath.str(), LineNo);
+
+    // Store paths with posix-style directory separator.
+    SmallString<32> FilePath(FilePathInIndex);
+    llvm::sys::path::native(FilePath, llvm::sys::path::Style::posix);
+
+    bool InsertionOccured;
+    std::tie(std::ignore, InsertionOccured) =
+        Result.try_emplace(LookupName, FilePath.begin(), FilePath.end());
+    if (!InsertionOccured)
+      return llvm::make_error<IndexError>(
+          index_error_code::multiple_definitions, IndexPath.str(), LineNo);
+
     ++LineNo;
   }
   return Result;
@@ -187,7 +215,8 @@ std::string
 createCrossTUIndexString(const llvm::StringMap<std::string> &Index) {
   std::ostringstream Result;
   for (const auto &E : Index)
-    Result << E.getKey().str() << " " << E.getValue() << '\n';
+    Result << E.getKey().size() << ':' << E.getKey().str() << ' '
+           << E.getValue() << '\n';
   return Result.str();
 }
 
@@ -428,7 +457,7 @@ CrossTranslationUnitContext::ASTUnitStorage::getASTUnitForFunction(
             ensureCTUIndexLoaded(CrossTUDir, IndexName))
       return std::move(IndexLoadError);
 
-    // Check if there is and entry in the index for the function.
+    // Check if there is an entry in the index for the function.
     if (!NameFileMap.count(FunctionName)) {
       ++NumNotInOtherTU;
       return llvm::make_error<IndexError>(index_error_code::missing_definition);

diff  --git a/clang/test/Analysis/Inputs/ctu-import.c.externalDefMap.ast-dump.txt b/clang/test/Analysis/Inputs/ctu-import.c.externalDefMap.ast-dump.txt
index 83d3b4ca451e..ee5317d36f71 100644
--- a/clang/test/Analysis/Inputs/ctu-import.c.externalDefMap.ast-dump.txt
+++ b/clang/test/Analysis/Inputs/ctu-import.c.externalDefMap.ast-dump.txt
@@ -1 +1 @@
-c:@F at testStaticImplicit ctu-import.c.ast
+23:c:@F at testStaticImplicit ctu-import.c.ast

diff  --git a/clang/test/Analysis/Inputs/ctu-lookup-name-with-space.cpp b/clang/test/Analysis/Inputs/ctu-lookup-name-with-space.cpp
new file mode 100644
index 000000000000..0d4fd974d3d4
--- /dev/null
+++ b/clang/test/Analysis/Inputs/ctu-lookup-name-with-space.cpp
@@ -0,0 +1,17 @@
+void f(void (*)());
+void f(void (*)(int));
+
+struct G {
+  G() {
+    // multiple definitions are found for the same key in index
+    f([]() -> void {});    // USR: c:@S at G@F at G#@Sa at F@operator void (*)()#1
+    f([](int) -> void {}); // USR: c:@S at G@F at G#@Sa at F@operator void (*)(int)#1
+
+    // As both lambda exprs have the same prefix, if the CTU index parser uses
+    // the first space character as the delimiter between USR and file path, a
+    // "multiple definitions are found for the same key in index" error will
+    // be reported.
+  }
+};
+
+void importee() {}

diff  --git a/clang/test/Analysis/Inputs/ctu-other.c.externalDefMap.ast-dump.txt b/clang/test/Analysis/Inputs/ctu-other.c.externalDefMap.ast-dump.txt
index a6de9304551b..ab675c3242c7 100644
--- a/clang/test/Analysis/Inputs/ctu-other.c.externalDefMap.ast-dump.txt
+++ b/clang/test/Analysis/Inputs/ctu-other.c.externalDefMap.ast-dump.txt
@@ -1,7 +1,7 @@
-c:@F at inlineAsm ctu-other.c.ast
-c:@F at g ctu-other.c.ast
-c:@F at f ctu-other.c.ast
-c:@F at enumCheck ctu-other.c.ast
-c:@F at identImplicit ctu-other.c.ast
-c:@F at structInProto ctu-other.c.ast
-c:@F at switchWithoutCases ctu-other.c.ast
+14:c:@F at inlineAsm ctu-other.c.ast
+6:c:@F at g ctu-other.c.ast
+6:c:@F at f ctu-other.c.ast
+14:c:@F at enumCheck ctu-other.c.ast
+18:c:@F at identImplicit ctu-other.c.ast
+18:c:@F at structInProto ctu-other.c.ast
+23:c:@F at switchWithoutCases ctu-other.c.ast

diff  --git a/clang/test/Analysis/Inputs/ctu-other.cpp.externalDefMap.ast-dump.txt b/clang/test/Analysis/Inputs/ctu-other.cpp.externalDefMap.ast-dump.txt
index e5fca5d6042a..476b8f186704 100644
--- a/clang/test/Analysis/Inputs/ctu-other.cpp.externalDefMap.ast-dump.txt
+++ b/clang/test/Analysis/Inputs/ctu-other.cpp.externalDefMap.ast-dump.txt
@@ -1,30 +1,30 @@
-c:@N at chns@F at chf1#I# ctu-other.cpp.ast
-c:@N at myns@N at embed_ns@F at fens#I# ctu-other.cpp.ast
-c:@F at g#I# ctu-other.cpp.ast
-c:@S at mycls@F at fscl#I#S ctu-other.cpp.ast
-c:@S at mycls@F at fcl#I# ctu-other.cpp.ast
-c:@S at mycls@F at fvcl#I# ctu-other.cpp.ast
-c:@N at myns@S at embed_cls@F at fecl#I# ctu-other.cpp.ast
-c:@S at mycls@S at embed_cls2@F at fecl2#I# ctu-other.cpp.ast
-c:@S at derived@F at fvcl#I# ctu-other.cpp.ast
-c:@F at f#I# ctu-other.cpp.ast
-c:@N at myns@F at fns#I# ctu-other.cpp.ast
-c:@F at h#I# ctu-other.cpp.ast
-c:@F at h_chain#I# ctu-chain.cpp.ast
-c:@N at chns@S at chcls@F at chf4#I# ctu-chain.cpp.ast
-c:@N at chns@F at chf2#I# ctu-chain.cpp.ast
-c:@F at fun_using_anon_struct#I# ctu-other.cpp.ast
-c:@F at other_macro_diag#I# ctu-other.cpp.ast
-c:@extInt ctu-other.cpp.ast
-c:@N at intns@extInt ctu-other.cpp.ast
-c:@extS ctu-other.cpp.ast
-c:@S at A@a ctu-other.cpp.ast
-c:@extSC ctu-other.cpp.ast
-c:@S at ST@sc ctu-other.cpp.ast
-c:@extSCN ctu-other.cpp.ast
-c:@extSubSCN ctu-other.cpp.ast
-c:@extSCC ctu-other.cpp.ast
-c:@extU ctu-other.cpp.ast
-c:@S at TestAnonUnionUSR@Test ctu-other.cpp.ast
-c:@F at testImportOfIncompleteDefaultParmDuringImport#I# ctu-other.cpp.ast
-c:@F at testImportOfDelegateConstructor#I# ctu-other.cpp.ast
\ No newline at end of file
+19:c:@N at chns@F at chf1#I# ctu-other.cpp.ast
+30:c:@N at myns@N at embed_ns@F at fens#I# ctu-other.cpp.ast
+9:c:@F at g#I# ctu-other.cpp.ast
+21:c:@S at mycls@F at fscl#I#S ctu-other.cpp.ast
+19:c:@S at mycls@F at fcl#I# ctu-other.cpp.ast
+20:c:@S at mycls@F at fvcl#I# ctu-other.cpp.ast
+31:c:@N at myns@S at embed_cls@F at fecl#I# ctu-other.cpp.ast
+34:c:@S at mycls@S at embed_cls2@F at fecl2#I# ctu-other.cpp.ast
+22:c:@S at derived@F at fvcl#I# ctu-other.cpp.ast
+9:c:@F at f#I# ctu-other.cpp.ast
+18:c:@N at myns@F at fns#I# ctu-other.cpp.ast
+9:c:@F at h#I# ctu-other.cpp.ast
+15:c:@F at h_chain#I# ctu-chain.cpp.ast
+27:c:@N at chns@S at chcls@F at chf4#I# ctu-chain.cpp.ast
+19:c:@N at chns@F at chf2#I# ctu-chain.cpp.ast
+29:c:@F at fun_using_anon_struct#I# ctu-other.cpp.ast
+24:c:@F at other_macro_diag#I# ctu-other.cpp.ast
+9:c:@extInt ctu-other.cpp.ast
+17:c:@N at intns@extInt ctu-other.cpp.ast
+7:c:@extS ctu-other.cpp.ast
+8:c:@S at A@a ctu-other.cpp.ast
+8:c:@extSC ctu-other.cpp.ast
+10:c:@S at ST@sc ctu-other.cpp.ast
+9:c:@extSCN ctu-other.cpp.ast
+12:c:@extSubSCN ctu-other.cpp.ast
+9:c:@extSCC ctu-other.cpp.ast
+7:c:@extU ctu-other.cpp.ast
+26:c:@S at TestAnonUnionUSR@Test ctu-other.cpp.ast
+53:c:@F at testImportOfIncompleteDefaultParmDuringImport#I# ctu-other.cpp.ast
+39:c:@F at testImportOfDelegateConstructor#I# ctu-other.cpp.ast

diff  --git a/clang/test/Analysis/Inputs/plist-macros-with-expansion-ctu.c.externalDefMap.txt b/clang/test/Analysis/Inputs/plist-macros-with-expansion-ctu.c.externalDefMap.txt
index 52214a41891e..dcf3e9324174 100644
--- a/clang/test/Analysis/Inputs/plist-macros-with-expansion-ctu.c.externalDefMap.txt
+++ b/clang/test/Analysis/Inputs/plist-macros-with-expansion-ctu.c.externalDefMap.txt
@@ -1,4 +1,4 @@
-c:@F at F1 plist-macros-ctu.c.ast
-c:@F at F2 plist-macros-ctu.c.ast
-c:@F at F3 plist-macros-ctu.c.ast
-c:@F at F_H plist-macros-ctu.c.ast
+7:c:@F at F1 plist-macros-ctu.c.ast
+7:c:@F at F2 plist-macros-ctu.c.ast
+7:c:@F at F3 plist-macros-ctu.c.ast
+8:c:@F at F_H plist-macros-ctu.c.ast

diff  --git a/clang/test/Analysis/ctu-inherited-default-ctor.cpp b/clang/test/Analysis/ctu-inherited-default-ctor.cpp
index 7c208a86ba93..e89fb2de707b 100644
--- a/clang/test/Analysis/ctu-inherited-default-ctor.cpp
+++ b/clang/test/Analysis/ctu-inherited-default-ctor.cpp
@@ -4,7 +4,7 @@
 // RUN: %clang_cc1 -std=c++14 -triple x86_64-pc-linux-gnu \
 // RUN:   -emit-pch -o %t/ctudir/ctu-inherited-default-ctor-other.cpp.ast \
 // RUN:    %S/Inputs/ctu-inherited-default-ctor-other.cpp
-// RUN: echo "c:@N at clang@S at DeclContextLookupResult@SingleElementDummyList ctu-inherited-default-ctor-other.cpp.ast" \
+// RUN: echo "59:c:@N at clang@S at DeclContextLookupResult@SingleElementDummyList ctu-inherited-default-ctor-other.cpp.ast" \
 // RUN:   > %t/ctudir/externalDefMap.txt
 //
 // RUN: %clang_analyze_cc1 -std=c++14 -triple x86_64-pc-linux-gnu \

diff  --git a/clang/test/Analysis/ctu-lookup-name-with-space.cpp b/clang/test/Analysis/ctu-lookup-name-with-space.cpp
new file mode 100644
index 000000000000..84ac6cf38db8
--- /dev/null
+++ b/clang/test/Analysis/ctu-lookup-name-with-space.cpp
@@ -0,0 +1,21 @@
+// RUN: rm -rf %t
+// RUN: mkdir %t
+
+// RUN: echo '"%/S/Inputs/ctu-lookup-name-with-space.cpp" : ["g++", "-c", "%/S/Inputs/ctu-lookup-name-with-space.cpp"]' > %t/invocations.yaml
+// RUN: %clang_extdef_map %S/Inputs/ctu-lookup-name-with-space.cpp -- > %t/externalDefMap.txt
+
+// RUN: cd %t && %clang_cc1 -fsyntax-only -analyze \
+// RUN:   -analyzer-checker=core,debug.ExprInspection \
+// RUN:   -analyzer-config experimental-enable-naive-ctu-analysis=true \
+// RUN:   -analyzer-config ctu-dir=. \
+// RUN:   -analyzer-config ctu-invocation-list=invocations.yaml \
+// RUN:   -verify %s
+
+void importee();
+
+void trigger() {
+  // Call an external function to trigger the parsing process of CTU index.
+  // Refer to file Inputs/ctu-lookup-name-with-space.cpp for more details.
+
+  importee(); // expected-no-diagnostics
+}

diff  --git a/clang/test/Analysis/func-mapping-test.cpp b/clang/test/Analysis/func-mapping-test.cpp
index 5c04d9411fee..44bbf8509720 100644
--- a/clang/test/Analysis/func-mapping-test.cpp
+++ b/clang/test/Analysis/func-mapping-test.cpp
@@ -3,10 +3,10 @@
 int f(int) {
   return 0;
 }
-// CHECK-DAG: c:@F at f#I#
+// CHECK-DAG: 9:c:@F at f#I#
 
 extern const int x = 5;
-// CHECK-DAG: c:@x
+// CHECK-DAG: 4:c:@x
 
 // Non-const variables should not be collected.
 int y = 5;
@@ -18,29 +18,29 @@ struct S {
   int a;
 };
 extern S const s = {.a = 2};
-// CHECK-DAG: c:@s
+// CHECK-DAG: 4:c:@s
 
 struct SF {
   const int a;
 };
 SF sf = {.a = 2};
-// CHECK-DAG: c:@sf
+// CHECK-DAG: 5:c:@sf
 
 struct SStatic {
   static const int a = 4;
 };
 const int SStatic::a;
-// CHECK-DAG: c:@S at SStatic@a
+// CHECK-DAG: 14:c:@S at SStatic@a
 
 extern int const arr[5] = { 0, 1 };
-// CHECK-DAG: c:@arr
+// CHECK-DAG: 6:c:@arr
 
 union U {
   const int a;
   const unsigned int b;
 };
 U u = {.a = 6};
-// CHECK-DAG: c:@u
+// CHECK-DAG: 4:c:@u
 
 // No USR can be generated for this.
 // Check for no crash in this case.
@@ -48,3 +48,15 @@ static union {
   float uf;
   const int ui;
 };
+
+void f(int (*)(char));
+void f(bool (*)(char));
+
+struct G {
+  G() {
+    f([](char) -> int { return 42; });
+    // CHECK-DAG: 41:c:@S at G@F at G#@Sa at F@operator int (*)(char)#1
+    f([](char) -> bool { return true; });
+    // CHECK-DAG: 42:c:@S at G@F at G#@Sa at F@operator bool (*)(char)#1
+  }
+};

diff  --git a/clang/unittests/CrossTU/CrossTranslationUnitTest.cpp b/clang/unittests/CrossTU/CrossTranslationUnitTest.cpp
index 03401f933b87..b2f660c361a0 100644
--- a/clang/unittests/CrossTU/CrossTranslationUnitTest.cpp
+++ b/clang/unittests/CrossTU/CrossTranslationUnitTest.cpp
@@ -61,7 +61,7 @@ class CTUASTConsumer : public clang::ASTConsumer {
     ASSERT_FALSE(llvm::sys::fs::createTemporaryFile("index", "txt", IndexFD,
                                                     IndexFileName));
     llvm::ToolOutputFile IndexFile(IndexFileName, IndexFD);
-    IndexFile.os() << "c:@F at f#I# " << ASTFileName << "\n";
+    IndexFile.os() << "9:c:@F at f#I# " << ASTFileName << "\n";
     IndexFile.os().flush();
     EXPECT_TRUE(llvm::sys::fs::exists(IndexFileName));
 


        


More information about the cfe-commits mailing list