[llvm-bugs] [Bug 32332] New: PCM nondeterminism w/builtins caused by global module index

via llvm-bugs llvm-bugs at lists.llvm.org
Fri Mar 17 14:32:49 PDT 2017


http://bugs.llvm.org/show_bug.cgi?id=32332

            Bug ID: 32332
           Summary: PCM nondeterminism w/builtins caused by global module
                    index
           Product: clang
           Version: unspecified
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Modules
          Assignee: unassignedclangbugs at nondot.org
          Reporter: aprantl at apple.com
                CC: dgregor at apple.com, llvm-bugs at lists.llvm.org

I found a really tricky case of nondeterministic PCM files caused by builtin
macros.

In the testcase below the identifier table (and thus the AST signature) in
b.pcm is different when
1. building a.pcm and b.pcm together, versus
2. first building a.pcm, then building b.pcm

The problem is with builtin identifiers like __FLT_EPSILON__ that are defined
like this
lib/Frontend/InitPreprocessor.cpp:  Builder.defineMacro(DefPrefix +
"EPSILON__", Twine(Epsilon)+Ext);
These identifiers make it into B.pcm's identifier table only in case (1).


More instrumenting & debugging turned up that in the full build of A & B, we
have
> no global module index!
wheres in the incremental build of B
> GlobalIndex->lookupIdentifier('__FLT_EPSILON__')==1

This makes sense: this is the global index written into the module cache dir.
The presence of the hit in the GlobalIndex causes us to:

  // If the caller has provided us with a hit-set that came from the global
  // module index, mark every module file in common with the global module
  // index that is *not* in that set as 'visited'.
  if (ModuleFilesHit && !ModulesInCommonWithGlobalIndex.empty()) {
    for (unsigned I = 0, N = ModulesInCommonWithGlobalIndex.size(); I != N;
++I)
    {
      ModuleFile *M = ModulesInCommonWithGlobalIndex[I];
      if (!ModuleFilesHit->count(M))
        State->VisitNumber[M->Index] = VisitNumber;
    }
  }

which means we never mark the identifier as isFromAST since we are never
visiting it again.

In the incremental case we are marking A as visited.

  // If there is a global index, look there first to determine which modules
  // provably do not have any results for this identifier.
  GlobalModuleIndex::HitSet Hits;
  GlobalModuleIndex::HitSet *HitsPtr = nullptr;
  if (!loadGlobalIndex()) {
    /// \param Hits Will be populated with the set of module files that have
    /// information about this name.
    /// \returns true if the identifier is known to the index, false otherwise.
    if (GlobalIndex->lookupIdentifier(II.getName(), Hits))
      HitsPtr = &Hits;
  }

1. We found a hit for __FLT_EPSILON__ in the global module index.
2. Module A is not in Hits, and Hits has a length of 0. But: __FLT_EPSILON__ is
known.

How can this happen?

bool GlobalModuleIndex::lookupIdentifier(StringRef Name, HitSet &Hits) {
  Hits.clear();

  // If there's no identifier index, there is nothing we can do.
  if (!IdentifierIndex)
    return false;

  // Look into the identifier index.
  ++NumIdentifierLookups;
  IdentifierIndexTable &Table
    = *static_cast<IdentifierIndexTable *>(IdentifierIndex);
  IdentifierIndexTable::iterator Known = Table.find(Name);
  if (Known == Table.end()) {
    return true;
  }

  SmallVector<unsigned, 2> ModuleIDs = *Known;
  for (unsigned I = 0, N = ModuleIDs.size(); I != N; ++I) {
    if (ModuleFile *MF = Modules[ModuleIDs[I]].File)
      Hits.insert(MF);
  }

  ++NumIdentifierLookupHits;
  return true;
}

I suppose the comment is wrong. GlobalModuleIndex::lookupIdentifier returns
true even if there was no hit.
That means our problem is that the identifier is in the module, but not in the
index. The GlobalModuleIndex is written at the very end and works by iterating
over every file in the module cache directory.

To summarize:

1. A full build of A and B.
   -> B contains __FLT_EPSILON__ in the identifier table.
      => because the idx is empty we visit it and
         ASTIdentifierLookupTrait::ReadData() marks it as FromAST
      =? i *think* that causes it to end up in the identifier table.
   -> modules.idx contains __FLT_EPSILON__ pointing nowhere.
2. Clean cache.
3. Build of A.
   -> modules.idx contains __FLT_EPSILON__ pointing nowhere.
4. Build of B.
   -> __FLT_EPSILON__ is in the index
   -> we skip visiting __FLT_EPSILON__ because we know it is in no module
mentioned in the index.
     => uncommenting this shortcut makes my testcase succeed.

Testcase:

diff --git a/test/Modules/Inputs/builtins/a.h
b/test/Modules/Inputs/builtins/a.h
new file mode 100644
index 0000000000..84cabad6d3
--- /dev/null
+++ b/test/Modules/Inputs/builtins/a.h
@@ -0,0 +1,3 @@
+#include <float.h>
+ at interface A
+ at end
diff --git a/test/Modules/Inputs/builtins/b.h
b/test/Modules/Inputs/builtins/b.h
new file mode 100644
index 0000000000..f6e019eb64
--- /dev/null
+++ b/test/Modules/Inputs/builtins/b.h
@@ -0,0 +1,3 @@
+#include "a.h"
+ at interface B : A
+ at end
diff --git a/test/Modules/Inputs/builtins/module.modulemap
b/test/Modules/Inputs/builtins/module.modulemap
new file mode 100644
index 0000000000..74e18ca6b8
--- /dev/null
+++ b/test/Modules/Inputs/builtins/module.modulemap
@@ -0,0 +1,7 @@
+module A {
+  header "a.h"
+}
+
+module B {
+  header "b.h"
+}
diff --git a/test/Modules/builtins-incremental.m
b/test/Modules/builtins-incremental.m
new file mode 100644
index 0000000000..85326ec323
--- /dev/null
+++ b/test/Modules/builtins-incremental.m
@@ -0,0 +1,23 @@
+// RUN: rm -rf %T/cache %T/cache1 %T/cache2 
+//
+// Build B which imports A.
+// RUN: %clang_cc1 -cc1 -fsyntax-only -fmodules -fimplicit-module-maps \
+// RUN:   -fmodules-cache-path=%T/cache -fdisable-module-hash \
+// RUN:   -I %S/Inputs/builtins %s
+// RUN: mv %T/cache %T/cache1
+//
+// Build only A.
+// RUN: echo '@import A;' >%t.m
+// RUN: %clang_cc1 -cc1 -fsyntax-only -fmodules -fimplicit-module-maps \
+// RUN:   -fmodules-cache-path=%T/cache -fdisable-module-hash \
+// RUN:   -I %S/Inputs/builtins %t.m
+//
+// Incrementally build B which imports A.
+// RUN: %clang_cc1 -cc1 -fsyntax-only -fmodules -fimplicit-module-maps \
+// RUN:   -fmodules-cache-path=%T/cache -fdisable-module-hash \
+// RUN:   -I %S/Inputs/builtins %s
+// RUN: mv %T/cache %T/cache2
+//
+// RUN: diff %T/cache1/B.pcm %T/cache2/B.pcm
+
+ at import B;

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20170317/06a9870b/attachment-0001.html>


More information about the llvm-bugs mailing list