[cfe-dev] RFC: A builder for CompilerInstance

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Fri May 24 16:50:55 PDT 2019


On Tue, 26 Mar 2019 at 17:14, Duncan Exon Smith via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> I have the beginnings of a patch for a CompilerInstance builder.  I'd like
> some early feedback before going further.
>
> ## Questions ##
>
> - Is a builder API a good idea?
> - Should it look something like this?  If not, what?
> - What should be in it?
> - How should it be staged?
>
> ## Motivation ##
>
> My motivation for looking at this is a little tangential: while working on
> the InMemoryModuleCache recently (e.g., r355778) I realized that this cache
> should be leveraging the FileManager to dedup paths by inode.  However when
> I tried a naïve fix I was getting crashes, and I had trouble understanding
> FileManager/VFS lifetime and ownership.  Ultimately, I'm hoping to clarify
> what the invariants are around an instance of CompilerInstance and its
> FileManager.
>
> ## Is a builder API a good idea? ##
>
> A builder API seems a little cleaner than how we currently manage the
> boilerplate in constructing a CompilerInstance.
>
> The status quo is that each client default-constructs and then calls
> various mutators like `setInstance` to set it up.  However, this API allows
> clients to swap in different data structures after construction.  If
> clients don't need or use that flexibility, the code for maintaining that
> flexibility will be brittle, some combination of broken and confusing.
>
> For example, I cleaned up some confusing logic in r357037 in
> CompilerInstance::setFileManager and CompilerInstance::createFileManager
> that were trying to adapt to a CompilerInstance's VFS and FileManager
> changing at arbitrary times and keep them in sync (in that case, it was
> easy: just use FileManager's VFS directly).
>
> With a builder API, we can lock down CompilerInstance mutators so that
> it's clear what is actually allowed to change mid-flight and what isn't.
> It's probably also easier to handle "obvious defaults" in a sane way for
> clients that only want to specify one or two things (e.g., unit tests).
>
> ## Should it look something like this?  If not, what? ##
>
> Here's the WIP patch I have:
> https://reviews.llvm.org/differential/diff/192381/
> (also attached below, but without the -U9999999 context)
>
> The core idea is to remove `CompilerInstance::set*` and
> `CompilerInstance::create*` in cases where a CompilerInstance never needs
> more than one version of a data structure over its lifetime.  So far the
> patch removes CompilerInstance::setInvocation,
> CompilerInstance::setDiagnostics, and CompilerInstance::createDiagnostics.
>
> The meat is in two files:
> - CompilerInstanceBuilder.h (
> https://reviews.llvm.org/differential/diff/192381/#change-AjNDpfG5S3AZ):
> a move-only data structure for collecting the parameters for building a
> CompilerInstance.
> - CompilerInstance.cpp (
> https://reviews.llvm.org/differential/diff/192381/#change-wS2d87BZOVCm):
> logic to process those parameters at CompilerInstance construction time.
>
> Note on one semantic change I made: CompilerInstance *always* has a
> DiagnosticsEngine now.  I did that because there were tons of sites calling
> CompilerInstance::createDiagnostics with no arguments.  Originally I had a
> zero-parameter version of CompilerInstanceBuilder::diags that controlled
> whether to create one, but just doing it all the time seemed to improve the
> code.  My suspicion is that the DiagnosticsEngine is not intended to be
> optional, it's just hard to sort out how to construct one (without a
> builder API).
>
> The rest of it all looks like removal of boilerplate to me.
>
> Thoughts on this direction?
>

This seems like a nice idea to me. Building and configuring a
CompilerInstance is certainly a mess right now, and this seems to make it
cleaner and harder to get wrong.

## What should be in it? ##
>
> My personal goal here is to pull in the prerequisites for removing
> CompilerInstance::setFileManager.  It's possible this won't be possible,
> but if not I'll figure out why not in the process.
>
> What else should be in there?  Here are some options:
>
> - FileManager (I hope so!)
> - SourceManager (I think so!)
> - Preprocessor?
> - TargetInfo?
> - Sema (probably not, it seems truly optional)
> - ModuleManager (probably not?)
>

I think it'd make sense to see how far you can get with the builder
pattern. Some of the above need to be created by the FrontendAction,
possibly once for each input, and might not fit well into a builder-style
approach -- but maybe we're missing a layer there, and there should be some
separate object representing the specific action invocation (whose lifetime
is the duration of an iteration of the
BeginSourceFile/Execute/EndSourceFile loop in
CompilerInstance::ExecuteAction) rather than lumping per-action state into
the CompilerInvocation and resetting it between each action invocation? At
least right now, the file manager and source manager are torn down and
recreated for each action execution.

I think we'll need to figure out:
 1) Do we allow multiple actions to be run on a single compiler instance,
and if so, what does that mean? (Are they run in the same context or
different contexts?)
 2) Do we allow multiple inputs to be passed to a single frontend action,
and if so, what does that mean? (Are they run in the same context or
different contexts?)

Right now, the answer to both 1 and 2 is yes, and they're run in different
contexts (we tear down and recreate most of the compiler instance each
time). I think that's probably a mistake, particularly for (2): the one
frontend action that has a reason to support multiple input files is
GenerateHeaderModuleAction, and it has to work around the default behavior
of the CompilerInstance so that it can consume all the inputs in the same
action invocation. I'd also like to eventually add a "final build" mode to
clang, to parse multiple source files into the same AST and generate a
single IR module from them (but with them properly isolated from each other
by module visibility, to avoid the usual problems with that mode), and
again that suggests that multiple input files to an action should be
processed as part of the same AST / Sema / CodeGen context.

Maybe the answer to 1 should be no. In which case we might find that we end
up wanting just

  runAction(CompilerInstanceBuilder, FrontendAction)

and no exposed CompilerInstance object at all.


> ## How should it be staged? ##
>
> Assuming this directly looks good, I'm not sure how to stage this.  Should
> I create a monolithic patch and get it reviewed as a whole, or post a patch
> for a skeletal builder and then patches for adding arguments to it
> incrementally?
>
> The latter is the usual approach, and is what I've been following
> locally.  I have two patches:
>
> - Add a builder that handles the current constructor arguments:
> PCHContainerOps and InMemoryModuleCache.
> - Add a CompilerInvocation and DiagnosticsEngine to the builder and remove
> createDiagnostics, setDiagnostics, and setInvocation.  The construction of
> these two is quite intertwined so there didn't seem to be a clean way to do
> this separately.
>
> I'm hesitating slightly since I'll be touching roughly the same lines of
> code in every incremental patch, but I'm still inclined to go that way so
> that incremental progress doesn't bitrot.
>

The incremental path sounds preferable to me. I think you're far enough
along that we can imagine what a further-progressed result would look like
without having a complete patch series. Touching the same lines repeatedly
seems fine :)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190524/ffb60c6e/attachment.html>


More information about the cfe-dev mailing list