[cfe-dev] Modular Codegen: Cross-module deserialization problem

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Fri Feb 24 17:13:06 PST 2017


On 24 February 2017 at 14:25, David Blaikie via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> OK, sounds fair - so GetExternalDeclStmt needs at least a Deserializing
> object to defer deserialization further. Does it also need a
> SavedStreamPosition, do you think? I can't immediately think of how/why it
> might, but they seemed to be paired in other places.
>

I think this is an accurate summary:
 * You need a Deserializing object if you're going to deserialize (or more
accurately, temporarily violate invariants, use global cursors, ...) and
your caller might not be deserializing
 * You need a SavedStreamPosition object if you're going to use a global
cursor and your caller might be deserializing
... so you'd see a pair of them whenever someone is using a global cursor
and doesn't know whether their caller is deserializing.

I think GetExternalDeclStmt doesn't need a SavedStreamPosition, but should
assert that NumCurrentElementsDeserializing == 0 before it creates its
Deserializing object.


> On Fri, Feb 24, 2017 at 2:22 PM Richard Smith <richard at metafoo.co.uk>
> wrote:
>
>> On 24 February 2017 at 09:41, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> I've spent a few days debugging/trying to understand the following, so
>> I'm writing it all down in the hopes of getting it straight/clear and
>> potentially getting some external perspective on what's going on, whether
>> I've understood it correctly, and what might be a good way to solve it.
>>
>> Starting with this test case:
>> foo.h:
>>   struct foo {};
>>   inline void e() { foo(); }
>> bar.h:
>>   #include "foo.h"
>>   template <typename T>
>>   foo bar(foo &f) {
>>     return f;
>>   }
>>   void z() { (void)&bar<int>; }
>>
>> Build each into a separate module, run modular codegen on bar.pcm:
>>   clang-tot -cc1 -fmodules-codegen -xc++ -emit-module -fmodules \
>>     -fmodule-name=foo foo.cppmap -o foo.pcm
>>   clang-tot -cc1 -fmodules-codegen -xc++ -emit-module -fmodules \
>>     -fmodule-name=bar bar.cppmap -o bar.pcm -fmodule-file=foo.pcm
>>   clang-tot -c bar.pcm -o bar.o
>>
>> (I haven't fully understood why the use of 'foo' in foo.h is necessary,
>> nor why 'bar' needs to be a template - those might provide some further
>> insight about how this should/could work)
>>
>> This patch makes the failure a bit more visible/immediate:
>>   diff --git lib/Serialization/ASTReaderStmt.cpp lib/Serialization/
>> ASTReaderStmt.cpp
>>   index b4718367d4..1fb48e9560 100644
>>   @@ -3903,7 +3904,9 @@ Stmt *ASTReader::ReadStmtFromStream(ModuleFile
>> &F) {
>>        ++NumStatementsRead;
>>
>>        if (S && !IsStmtReference) {
>>   +      auto X = Cursor.GetCurrentBitNo();
>>          Reader.Visit(S);
>>   +      assert(X == Cursor.GetCurrentBitNo() && "Narf");
>>          StmtEntries[Cursor.GetCurrentBitNo()] = S;
>>        }
>>
>> This is layered on top of the modular codegen rewrite/refactor (to use a
>> bit on function definitions, instead of to imply modular codegen from the
>> Module object) and some other fixes. I'll include the full patch I'm
>> working with.
>>
>> The sequence of relevant steps (as best as I can figure)
>>
>>    1. bar() body deserialization begins
>>    2. bar module's DeclCursor is used, jumping to the start of the
>>    bar()'s definition
>>    3. clang::ASTReader::ReadStmtFromStream iterates through Stmts in
>>    bar()
>>    no SavedStreamPosition nor Deserializing object here
>>    4. deserializing the EXPR_CXX_CONSTRUCT gets interesting:
>>    5. eventually involves deserializing foo(const foo&)
>>    6. foo module's DeclCursor is used
>>    7. foo module's DeclCursor is saved/preserved with a
>>    SavedStreamPosition
>>    8. a ExternalASTSource::Deserializing context is started
>>    9. the definition of foo(const foo&) (in the bar module) is
>>    deserialized
>>    10. this definition is 'interesting' and added to the ASTReader's
>>    InterestingDecls
>>    11. The end of the Deserializing context (8) is reached
>>       1. In the non-modular-codegen case, DeclMustBeEmitted is not true
>>       for foo(const foo&) and it is shelved for lazy emission, end of story
>>    12. In the modular-codegen case, foo(const foo&) in the bar module
>>    must be emitted (all inline functions (implicitly or explicitly) defined in
>>    a module are emitted weak, etc)
>>    13. So foo(const foo&) is non-lazily deserialized and emitted
>>    14. The bar module's DeclCursor is used for this, unprotected
>>    15. execution eventually gets back to the deserialization of bar() -
>>    and the DeclCursor it's using is out of position -> badness.
>>
>> Maybe there's a better way to provide this timeline, I'm not sure -
>> hopefully it makes sense.
>>
>> Essentially the way ASTReader::ReadDeclRecord (started in (5) that
>> contains the SavedStreamPosition and Deserializing context) seems to assume
>> that the only things that will be non-lazily deserialized will come from
>> the same module. Modular codegen breaks that invariant at the moment (well,
>> with the patch provided).
>>
>> I did try SavedStreamPosition-protecting the call from 3->4 (specifically
>> the "Visit(Expr)" call in ReadStmtFromStream) though still hit some
>> crashes. Maybe that's the right path to go down still, but need to do more?
>>
>> When I tried to copy the DeclCursor in ReadStmtFromStream that actually
>> broke things pretty significantly (compile errors on valid code even
>> without modular codegen enabled). But I don't know much about the cursors -
>> evidently more than only an efficiency concern, I guess the non-offset
>> state in the copy of the cursor changes and so the original DeclCursor
>> isn't updated, etc. (I wonder about splitting these cursors into a shared
>> state + offset, share the state with shared_ptr and make it cheap to copy
>> the actual offset state around so there's less reason to risk these sharing
>> situations & all the SavedStreamPosition protection that requires).
>>
>> Long & short of it: What should I do here? What're the likely goals I
>> should be trying to move the code towards, if any?
>>
>>
>> We should have at least one instance of Deserializing extant during all
>> deserialization. It looks like one is missing from GetExternalDeclStmt.
>>
>> I think it's correct that ReadStmtFromStream does not try to maintain the
>> stream position: it's intending to be called with the DeclsCursor pointing
>> at the Stmt to read, and intends to leave the stream pointing to the record
>> after that point. The external caller (GetExternalDeclStmt) jumps to the
>> correct bit location (as do all other users of the DeclsCursor) before
>> calling it, and does not expect to ever be called reentrantly. But a
>> reentrant call to GetExternalDeclStmt is exactly what's happening in your
>> case, because it also doesn't /defend/ against reentrancy from
>> end-of-deserialization actions through a Deserializing object.
>>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170224/ffb5174f/attachment.html>


More information about the cfe-dev mailing list