[PATCH] D96033: [clang-repl] Land initial infrastructure for incremental parsing

Tue Feb 23 04:27:33 PST 2021

v.g.vassilev marked 3 inline comments as done.
v.g.vassilev added inline comments.

================
Comment at: clang/lib/CodeGen/CodeGenAction.cpp:908

+CodeGenerator *CodeGenAction::getCodeGenerator() const {
+  return BEConsumer->getCodeGenerator();
----------------
lhames wrote:
> v.g.vassilev wrote:
> > lhames wrote:
> > > v.g.vassilev wrote:
> > > > sgraenitz wrote:
> > > > > v.g.vassilev wrote:
> > > > > > @rjmccall, we were wondering if there is a better way to ask CodeGen to start a new module. The current approach seems to be drilling hole in a number of abstraction layers.
> > > > > > 
> > > > > > In the past we have touched that area a little in https://reviews.llvm.org/D34444 and the answer may be already there but I fail to connect the dots.
> > > > > > 
> > > > > > Recently, we thought about having a new FrontendAction callback for beginning a new phase when compiling incremental input. We need to keep track of the created objects (needed for error recovery) in our Transaction. We can have a map of `Transaction*` to `llvm::Module*` in CodeGen. The issue is that new JITs take ownership of the `llvm::Module*` which seems to make it impossible to support jitted code removal with that model (cc: @lhames, @rsmith).
> > > > > When compiling incrementally, doeas a "new phase" mean that all subsequent code will go into a new module from then on? How will dependencies to previous symbols be handled? Are all symbols external?
> > > > > 
> > > > > > The issue is that new JITs take ownership of the llvm::Module*
> > > > > 
> > > > > That's true, but you can still keep a raw pointer to it, which will be valid at least as long as the module wasn't linked. Afterwards it depends on the linker:
> > > > > * RuntimeDyld can return ownership of the object's memory range via `NotifyEmittedFunction`
> > > > > * JITLink provides the `ReturnObjectBufferFunction` for the same purpose
> > > > > 
> > > > > > seems to make it impossible to support jitted code removal with that model
> > > > > 
> > > > > Can you figure out what symbols are affected and remove these? A la: https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/include/llvm/ExecutionEngine/Orc/Core.h#L1020
> > > > > 
> > > > > I think @anarazel has ported a client with code removal to OrcV2 successfully in the past. Maybe there's something we can learn from it.
> > > > > When compiling incrementally, doeas a "new phase" mean that all subsequent code will go into a new module from then on? How will dependencies to previous symbols be handled? Are all symbols external?
> > > > 
> > > > There is some discussion on this here https://reviews.llvm.org/D34444#812418
> > > > 
> > > > I think the relevant bit is that 'we have just one ever growing TU [...] which we send to the RuntimeDyLD allowing only JIT to resolve symbols from it.  We aid the JIT when resolving symbols with internal linkage by changing all internal linkage to external (We haven't seen issues with that approach)'.
> > > > 
> > > > > 
> > > > > > The issue is that new JITs take ownership of the llvm::Module*
> > > > > 
> > > > > That's true, but you can still keep a raw pointer to it, which will be valid at least as long as the module wasn't linked. 
> > > > 
> > > > That was my first implementation when I upgraded cling to llvm9 where the `shared_ptr`s went to `unique_ptr`s. This was quite problematic for many of the use cases we support as the JIT is somewhat unpredictable to the high-level API user. 
> > > > 
> > > > 
> > > > >Afterwards it depends on the linker:
> > > > > * RuntimeDyld can return ownership of the object's memory range via `NotifyEmittedFunction`
> > > > > * JITLink provides the `ReturnObjectBufferFunction` for the same purpose
> > > > > 
> > > > 
> > > > That's exactly what we ended up doing (I would like to thank Lang here who gave a similar advice).
> > > > 
> > > > > > seems to make it impossible to support jitted code removal with that model
> > > > > 
> > > > > Can you figure out what symbols are affected and remove these? A la: https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/include/llvm/ExecutionEngine/Orc/Core.h#L1020
> > > > > 
> > > > > I think @anarazel has ported a client with code removal to OrcV2 successfully in the past. Maybe there's something we can learn from it.
> > > > 
> > > > Indeed. That's not yet on my radar as seemed somewhat distant in time.
> > > > 
> > > > Recently, we thought about having a new FrontendAction callback for beginning a new phase when compiling incremental input. We need to keep track of the created objects (needed for error recovery) in our Transaction. We can have a map of Transaction* to llvm::Module* in CodeGen. The issue is that new JITs take ownership of the llvm::Module* which seems to make it impossible to support jitted code removal with that model (cc: @lhames, @rsmith).
> > > 
> > > In the new APIs, in order to enable removable code, you can associate Modules with ResourceTrackers when they're added to the JIT. The ResourceTrackers then allow for removal. Idiomatic usage looks like:
> > > 
> > >   auto Mod = /* create module */;
> > >   auto RT = JD.createResourceTracker();
> > >   J.addModule(RT, std::move(Mod));
> > >   //...
> > >   if (auto Err = RT.remove())
> > >     /* handle Err */;
> > > 
> > > > we have just one ever growing TU [...] which we send to RuntimeDyld...
> > > 
> > > So is a TU the same as an llvm::Module in this context? If so, how do you reconcile that with the JIT taking ownership of modules? Are you just copying the Module each time before adding it?
> > > 
> > > > We need to keep track of the created objects (needed for error recovery) in our Transaction.
> > > 
> > > Do you need the Module* for error recovery? Or just the Decls?
> > > > Recently, we thought about having a new FrontendAction callback for beginning a new phase when compiling incremental input. We need to keep track of the created objects (needed for error recovery) in our Transaction. We can have a map of Transaction* to llvm::Module* in CodeGen. The issue is that new JITs take ownership of the llvm::Module* which seems to make it impossible to support jitted code removal with that model (cc: @lhames, @rsmith).
> > > 
> > > In the new APIs, in order to enable removable code, you can associate Modules with ResourceTrackers when they're added to the JIT. The ResourceTrackers then allow for removal. Idiomatic usage looks like:
> > > 
> > >   auto Mod = /* create module */;
> > >   auto RT = JD.createResourceTracker();
> > >   J.addModule(RT, std::move(Mod));
> > >   //...
> > >   if (auto Err = RT.remove())
> > >     /* handle Err */;
> > 
> > Nice, thanks!
> > 
> > > 
> > > > we have just one ever growing TU [...] which we send to RuntimeDyld...
> > > 
> > > So is a TU the same as an llvm::Module in this context? If so, how do you reconcile that with the JIT taking ownership of modules? Are you just copying the Module each time before adding it?
> > 
> > Each incremental chunk with which the TU grows has a corresponding `llvm::Module`. Once clang's CodeGen is done for the particular module it transfers the ownership to the `Transaction` which, in turn, hands it to the JIT and once the JIT is done it retains the ownership again.
> > 
> > > 
> > > > We need to keep track of the created objects (needed for error recovery) in our Transaction.
> > > 
> > > Do you need the Module* for error recovery? Or just the Decls?
> > 
> > Yes, we need a `llvm::Module` that corresponds to the Decls as sometimes CodeGen will decide not to emit a Decl.
> > Each incremental chunk with which the TU grows has a corresponding llvm::Module. Once clang's CodeGen is done for the particular module it transfers the ownership to the Transaction which, in turn, hands it to the JIT and once the JIT is done it retains the ownership again.
> 
> > Yes, we need a llvm::Module that corresponds to the Decls as sometimes CodeGen will decide not to emit a Decl.
> 
> Can you elaborate on this? (Or point me to the relevant discussion / code?)
> 
> Does CodeGen aggregate code into the Module as you CodeGen each incremental chunk? Or do you Link the previously CodeGen'd module into a new one?
> > Each incremental chunk with which the TU grows has a corresponding llvm::Module. Once clang's CodeGen is done for the particular module it transfers the ownership to the Transaction which, in turn, hands it to the JIT and once the JIT is done it retains the ownership again.
> 
> > Yes, we need a llvm::Module that corresponds to the Decls as sometimes CodeGen will decide not to emit a Decl.
> 
> Can you elaborate on this? (Or point me to the relevant discussion / code?)
> 
> Does CodeGen aggregate code into the Module as you CodeGen each incremental chunk? Or do you Link the previously CodeGen'd module into a new one?

Cling's "code unloading" rolls back the states of the various objects without any checkpointing. Consider the two subsequent incremental inputs: `int f() { return 12; } ` and `int i = f();`; `undo 1`. 

When we ask CodeGen to generate code for the first input it will not as `f` is not being used. Transaction1 will contain the `FunctionDecl*` for `f` but the corresponding llvm::Module will be empty. Then when we get the second input line, the Transaction2 will contain the `VarDecl*` but the corresponding llvm::Module will contain both IR definitions -- of `f` and `i`.

Having the clang::Decl is useful because we can restore the previous state of the various internal frontend structures such as lookup tables. However, we cannot just drop the llvm::Module as it might contain deferred declarations which were emitted due to a use.

That's pretty much the rationale behind this and the design dates back to pre-MCJIT times. I am all for making this more robust but that's what we currently have. The "code  unloading" is mostly done in cling's [DeclUnloader](https://github.com/vgvassilev/cling/blob/856f8e92f82f9037b3dbb27ae7f94add2ed6121f/lib/Interpreter/DeclUnloader.cpp#L815).

There was some useful discussion about the model [here](https://reviews.llvm.org/D34444#812418) quite some time ago

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96033/new/

https://reviews.llvm.org/D96033