[PATCH] D34444: Teach codegen to work in incremental processing mode.

Mon Jul 17 22:28:05 PDT 2017

rjmccall added a comment.

In https://reviews.llvm.org/D34444#795175, @v.g.vassilev wrote:

> @rjmccall, thanks for the prompt and thorough reply.
>
> In https://reviews.llvm.org/D34444#793311, @rjmccall wrote:
>
> > Okay.  In that case, I see two problems, one major and one potentially major.
>
>
>
>
>   This is a very accurate diagnosis which took us 5 years to discover on an empirical basis ;)

You could've asked at any time. :)

>> The major problem is that, as Richard alludes to, you need to make sure you emit any on-demand definitions that Sema registered with CodeGen during the initial CGM's lifetime but used in later CGMs.
> 
> 
> 
>   We bring the CGM state to the subsequent CGMs. See https://github.com/vgvassilev/clang/blob/cling-patches/lib/CodeGen/ModuleBuilder.cpp#L138-L160

That's quite brittle, because that code is only executed in a code path that only you are using, and you're not adding any tests.  I would greatly prefer a change to IRGen's core assumptions, as suggested.

>> The potentially major problem is that it is not possible in general to automatically break up a single translation unit into multiple translation units, for two reasons.  The big reason is that there is no way to correctly handle entities with non-external linkage that are referenced from two parts of the translation unit without implicitly finding some way to promote them to external linkage, which might open a huge can of worms if you have multiple "outer" translation units.
> 
> 
> 
>   We do not have an multiple 'outer' translation units. We have just one ever growing TU (which probably invalidates my previous statement that we have a distinct TUs) which we send to the RuntimeDyLD allowing only JIT to resolve symbols from it.  We aid the JIT when resolving symbols with internal linkage by changing all internal linkage to external (We haven't seen issues with that approach).

Ah, okay.  Yes, that is a completely different translation model from having distinct TUs.

IRGen will generally happily emit references to undefined internal objects; instead of hacking the linkage, you could just clean that up as a post-pass.  Although hacking the linkage as post-pass is reasonable, too.  In either case, you can recognize uses of internal-linkage objects that haven't been defined yet and report that back to the user.

>>   The lesser reason is that the prefix of a valid translation unit is not necessarily a valid translation unit: for example, a static or inline function can be defined at an arbitrary within the translation unit, i.e. not necessarily before its first use.  But if your use case somehow defines away these problems, this might be fine.
> 
> 
> 
>   If we end up with a module containing no definition of a symbol and such is required, then we complain. So indeed we are defining away this issue.

Ok.

>> As a minor request, if you are going to make HandleTranslationUnit no longer the final call on CodeGenerator, please either add a new method that *is* a guaranteed final call or add a new method that does the whole "end a previous part of the translation unit and start a new one" step.
> 
> 
> 
>   We can have this but it would be a copy of `HandleTranslationUnit`.  The `StartModule` interface is the antagonist routine to `ReleaseModule`. If you prefer we could merge `StartModule` into `ReleaseModule` adding a flag (or reading the value of `isIncrementalProcessingEnabled`).

I feel it is important that there be a way to inform an ASTConsumer that no further requests will be made of it, something other than calling its destructor.  I would like you to make sure that the ASTConsumer interface supports that and that that call is not made too soon in your alternate processing mode.

Repository:
  rL LLVM

https://reviews.llvm.org/D34444