[PATCH] D41474: Fix a crash in lazy loading of Metadata in ThinLTO

Teresa Johnson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 23 09:48:44 PDT 2018


tejohnson added a comment.

Ok here's what is going on.

The scope of the DILocation in _ZN4EchoD2Ev has a scope that is a DILexicalBlock. In the test case, because that same DILexicalBlock is used as the scope for DILocation from two different routines (_ZN4EchoD2Ev and _ZN5DeltaD2Ev), it lives in the MODULE's metadata block. Therefore, during importing, we try to load it lazily. We parse the debug location when from BitcodeReader::parseFunction, in the FUNC_CODE_DEBUG_LOC case. This case invokes MDLoader->getMDNodeFwdRefOrNull() for the scope and inlinedAt fields, which creates a forward ref if it hasn't been loaded already, which it hasn't since it lives in the lazily loaded Module level metadata.

Most places in that invoke getMDNodeFwdRefOrNull have a corresponding call to resolveForwardRefsAndPlaceholders which will take care of resolving them. E.g. places that call getMetadataFwdRefOrLoad, or at the end of parsing a function-level metadata block, or  at the end of the initial lazy load of module level metadata in order to handle invocations of getMDNodeFwdRefOrNull for named metadata and global object attachments. However, the calls for the scope/inlinedAt of debug locations are not backed by any such call to resolveForwardRefsAndPlaceholders.

The reason the problem goes away as noted by the patch author when the getelementptr in _ZN5DeltaD2Ev is removed (actually, all it takes is removing the !dbg attachment there), is that the DILexicalBlock then lives in the function metadata block for _ZN4EchoD2Ev since it is only referenced by one function. The function metadata block as mentioned above is loaded eagerly.

I'm not sure why we haven't hit this too often historically (I put in an assert where we get the scope/inlinedAt that it isn't a temporary and it isn't hit at all for the current llvm regression suite), but I am guessing that the reason we suddenly started hitting with ThinLTO+hotcold function splitting is that more scopes are being referenced by multiple functions (part in the hot split func and part in the cold split func) and therefore moving to the lazily loaded module level metadata. This is likely also happening because splitting is/was happening in the pre-link compile step before importing (this was fixed for the old PM in https://reviews.llvm.org/D53437 but as I noted there is not yet fixed for the new PM).

The fix I tried that works both for this test case and for the large app where I hit it after enabling hot cold splitting with ThinLTO is to change the scope and inlinedAt handling to instead invoke getMetadataFwdRefOrLoad. If this seems like a reasonable fix (I am not sure why it wouldn't be), I can send a new patch with that fix. Here's the fix that worked for me:

- a/lib/Bitcode/Reader/BitcodeReader.cpp

+++ b/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -3520,12 +3520,12 @@ Error BitcodeReader::parseFunctionBody(Function *F) {

  MDNode *Scope = nullptr, *IA = nullptr;
  if (ScopeID) {

- Scope = MDLoader->getMDNodeFwdRefOrNull(ScopeID - 1);

+        Scope = dyn_cast_or_null<MDNode>(MDLoader->getMetadataFwdRefOrLoad(ScopeID - 1));

    if (!Scope)
      return error("Invalid record");
  }
  if (IAID) {

- IA = MDLoader->getMDNodeFwdRefOrNull(IAID - 1);

+        IA = dyn_cast_or_null<MDNode>(MDLoader->getMetadataFwdRefOrLoad(IAID - 1));

    if (!IA)
      return error("Invalid record");
  }


https://reviews.llvm.org/D41474





More information about the llvm-commits mailing list