[cfe-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Tue Jun 2 17:37:36 PDT 2020


On Tue, 2 Jun 2020 at 05:08, Andrzej Warzynski via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> *TL;DR*
>
> We propose some non-trivial refactoring in Clang and LLVM to enable
> further work on Flang driver.
>
> *SUMMARY*
> We would like to start extracting the driver/frontend code from Clang
> (alongside the code that the driver/frontend depends on, e.g.
> Diagnostics) and move the components that could be re-used by
> non-C-based languages to LLVM. From our initial investigation we see
> that these changes will impact many projects (upstream and downstream)
> and will require big mechanical patches (our first attempt is
> implemented in [8]). This is not ideal, but seems unavoidable in the
> long-term. We would like to do this refactoring _before_ we start
> implementing the Flang driver upstream (OPTION 1 below). This way we avoid:
>
> * contaminating Clang with Fortran specific code (and vice versa)
> * introducing dependency on Clang in Flang
>
> The downside is that the refactoring is likely to be disruptive for many
> projects that use Clang. We will try our best to minimise this.
>
> Does this approach make sense? Are there any preferred alternatives? At
> this stage we'd like to discuss the overall direction. If folks are in
> favour, we'll send a separate RFC with a finer breakdown and more
> technical details for the refactoring.
>
> Below you will find more context for our use-case (the Flang driver) and
> possible alternatives. We hope that this will help the discussion. We
> would really appreciate your feedback!
>

Generally, I think this is a good idea, and a healthy direction for LLVM
overall. We need to be careful to do this in a way that doesn't introduce
complexity or overheads in Clang, though, so we should proceed very
cautiously.

I also think you're skewing somewhat too far in favor of code reuse. Some
of the Clang code you're identifying below is very carefully tuned and
tailored to Clang's use, and the amount of it that could reasonably be
shared with another project (carefully tuned and tailored to that project's
use) is probably small, unless we heavily generalize it. Such
generalization is likely not going to be worth its development and
maintenance costs.


> *BACKGROUND*
> Flang (formerly known as F18) has recently been merged into LLVM [1].
> Our ambition, as a community, is to make it as flexible, robust and nice
> to work with as Clang. One of the major items to address is the
> implementation of a driver that would provide the flexibility and user
> experience similar to that available in Clang. The F18/Flang driver was
> already discussed on cfe-dev last year [2], but back then F18 (now llvm
> project/flang) was a separate project. In the original proposal it was
> assumed that initially Flang would depend (and extend where necessary)
> Clang's driver/frontend code. Since F18/Flang was an independent
> project, the refactoring of Clang/LLVM wasn't really considered. That
> design has been challenged since ([3], [10]), and also not much progress
> has been made. We would like to revisit that RFC from a slightly
> different angle. Since Flang is now part of LLVM's monorepo, we feel
> that refactoring Clang/LLVM _before_ we upstream the driver makes a lot
> of sense and is the natural first step.
>
> *ASSUMPTIONS & DESIGN GOALS*
> 1. We will re-use as much of the Clang's driver/frontend code as
> possible (this was previously proposed in [2]).
>
> 2. We want to avoid dependencies from Flang to Clang, both long-term
> (strong requirement) and short-term (might be difficult to achieve).
> This has recently come up in a discussion on one of our early patches
> [3] (tl;dr Steve Scalpone, the code owner of Flang, would prefer us to
> avoid this dependency), and was also suggested before by Eric
> Christopher [10].
>
> 3. We will move the code that can be shared between Flang and Clang (and
> other projects) to LLVM. This idea has already come up on llvm-dev
> before [7] (in a slightly different context, and to a slightly different
> extent). The methods that are not language specific would be shared in
> an LLVM library.
>
> 4. The classes/types/methods that need specific changes for Fortran will
> be "copied" to Flang and adapted as needed. We should minimize (or even
> eliminate) any Fortran specific code from Clang and make sure that that
> lives in llvm-project/flang.
>
> *FLANG'S DEPENDENCIES ON CLANG*
> These are the dependencies on Clang that we have identified so far while
> prototyping the Flang driver.
>
> 1. All the machinery related to Diagnostics & SourceLocation.
>
> This is currently part of libclangBasic [4] and is used in _many_ places
> in Clang. The official documentation [5] suggests that this could be
> re-used for non-C-based languages. In particular, we feel that It would
> make a lot of sense for Flang to use it. Also, separating Clang's
> driver/frontend code and the diagnostics would require a lot of
> refactoring for no real benefit (and we feel that Flang should re-use
> Clang's driver/frontend code, see below). This dependency is used in
> many places, so moving it to LLVM will require a lot of (mostly)
> mechanical changes. We can't see an obvious way to split it into smaller
> chunks (see also below where we discuss the impact).
>

I do not think it is necessarily going to be reasonable to move all
machinery related to SourceLocation (in particular, all of clang's
SourceManager) into LLVM. The ideas and data structure underpinning
SourceLocation and SourceManager are quite general (a concatenated
hierarchical slab of contiguous blocks, with linear indexing within those
blocks), but the details are much more specific to Clang and the C-family
languages it represents. Things like the support for object-like and
function-like macros, macro arguments, #include, splitting >> tokens for
C++, and so on, all make sense for Clang, but probably make less sense for
Fortran, where a different set of kinds of block would probably be desired
instead. This isn't something that can be trivially generalized and
extended, either; we carefully bit-pack various things into our block
representations, and as a result, we're quite tightly fitted to the needs
of Clang, and would probably not want to move away from that position.

However, I do think there is common infrastructure that can be extracted,
with some significant work done to generalize the SourceManager
infrastructure and make it tailorable to the needs of Clang and Flang (and
any other consumers of it that might come along). I could imagine moving
all of the complexity to do with what kinds of SLocEntry are supported into
a traits type, and having a reusable template that can generate a data
structure that the Clang and Flang SourceManagers can be implemented in
terms of.

Clang's SourceLocation is probably almost directly useable as-is -- it has
hardcoded assumptions about a particular bit being reserved to indicate a
location within a C preprocessor macro, but we can move that to a static
method on Clang SourceManager, and then I think SourceLocation can be
directly shared between the two projects.

(One big asterisk on the above: will Flang want an integrated C
preprocessor? If so, then we're now talking about a much larger chunk of
Clang, including the lexer, preprocessor, identifier tables, the Token
type, and it may be best to simply acknowledge that Flang has a dependency
on Clang to supply all that, rather than moving it into LLVM.)

The layers below SourceManager -- FileManager, the VFS, and so on -- all
seem like they should be reasonable to share between projects.

Some of the diagnostics engine seems reasonable to share: specifically, the
tablegen-driven diagnostic table generation, most of the diagnostics engine
(including support for diagnostics pragmas that change the set of warnings
enabled at different source locations), and the formatting code for
non-clang-specific types are all relatively reusable. If you want to reuse
the TextDiagnosticPrinter, I think that will need some refactoring; it's
currently tied into the specific needs of Clang's SourceManager (for
handling textual inclusion and macro expansion in the way that C-family
languages deal with those things). I expect it would be possible to factor
out an interface that Clang could implement to provide the necessary
customizations.

Before we factor out the diagnostics engine, we should fix the longstanding
issue that it requires a global monolithic table covering all diagnostics,
and is consequently unable to properly respect layering. I think this is
very much fixable, but it requires someone to do the work to fix it :)

Looking at your branch, I immediately see a few things there that are
unacceptable changes: moving clang's TokenKinds.def, Specifiers.h, and
OpenCLImageTypes.def into LLVM is not OK. But I assume you're aware of that
already. =)


> 2. libclangFrontend & libclangDriver
>
> The Flang driver will use many methods from libClangDriver,
> libClangFrontend and libClangFrontendTool. Driver.h and Compilation.h
> from libClangDriver are responsible to call, pass the correct arguments
> and execute the driver. TextDiagnosticPrinter.h takes care of printing
> the driver diagnostics in case of errors.
>
> The Flang frontend will use CompilerInstance, CompilerInvocation,
> FrontendOptions, FrontendActions and Utils from libClangFrontend and
> libClangFrontendTool. These methods are responsible for translating the
> command line arguments to frontend Options and later to Actions to be
> executed by ExecuteCompilerInvocation. The translation from arguments to
> Actions happens with FrontendOption and FrontendActions. But it is the
> CompilerInvocation that has the pointers for the sequence of Actions
> that are required in a Compiler Instance. These methods are needed to
> implement Flang driver/frontend and contain actions/method/functions
> that seem to be language agnostic.
>

I think this is going too far in attempting to reuse Clang code.
CompilerInvocation, for example, is almost exclusively dealing in parsing
Clang's -cc1 flags, which I would expect to have very little overlap with
Flang's flags, and CompilerInstance exists (in part) to manage and own all
the Clang-specific global objects (the parser, sema, the module loader, the
AST consumer). Flang should not be going anywhere near this stuff, and
should be implementing its own frontend.

There may be some clang-independent parts that can be factored out, but I
would expect them to be small enough that we can address them on a
case-by-case basis. The interesting thing to factor out is the parsing of
command-line options, but that's already been done. I think your approach
here should be to assume as a baseline that you reuse none of clang's
Frontend library, but if you find general pieces that can meaningfully be
extracted, we can talk about those pieces in isolation.


For the driver, I think the picture is very different. It seems to me that
we should only have one LLVM driver, that can build C-family languages,
Fortran code, or both at the same time (or invoke lld etc). To that end, I
think it would be reasonable to move clang's driver out to a separate LLVM
project (maybe that's llvm/, maybe it's somewhere new such as driver/), and
extend it to be able to invoke flang actions in addition to clang actions.
Then the only difference between the clang and flang drivers would be which
frontend is directly linked into the driver binary and which one is invoked
by exec'ing a different binary. That would imply that all the parts of
Clang that are depended on by the driver are also moved out (I think the
main parts here are flags and diagnostics, and via the diagnostics layer,
source locations).

This will require some decoupling between the Clang driver and frontend
(currently Clang's Options.td contains various driver options that are
marked as also being options for Clang's -cc1 mode; duplicating those in
CC1Options.td is probably acceptable, if we're going to split the driver
and frontend into two different projects), and some shared support code
(eg, clang's sanitizers list) will presumably end up in the driver, because
we don't want a driver -> *lang dependency.


> *ALTERNATIVES*
> This is a summary of the alternative ways of implementing the Flang
> driver. We propose OPTION 1. If there are no major objections, we will
> draft a separate RFC with more technical details (we will also break it
> down into smaller pieces). Otherwise, what would be your preferred
> alternative and why?
>
> OPTION 1
> We avoid dependency on Clang from Day 1.
>
> This is the ideal scenario that would guarantee that Clang and Flang are
> completely separate and that the common bits stay in LLVM instead. It
> would mean slower progress for us initially, but then other projects
> could benefit from the refactoring sooner rather than later.
>
> OPTION 2
> We avoid dependency on clangBasic from day 1, but initially allow
> dependency on libClangFrontend & libClangDriver (or other libs specific
> to the driver/frontend).
>
> The dependency on libclang{Driver|Frontend} would gradually be
> removed/refactored out as the driver for Flang gains momentum. As
> mentioned earlier, there is plenty of code in libClangFrontend and
> libClangDriver that we'd like to re-use, but the separation between code
> that's specific to C-based languages and generic driver/frontend code is
> not always obvious. We think that refactoring the common bits in
> libClangFrontend and libClangDriver might simply be easier once:
>
>   * we have a Flang driver that leverages these libraries, and, as a
> result,
>   * we understand better what we could re-use and what's not that
> relevant to non-C-based languages.
>
> OPTION 3
> We initially keep the dependency on Clang and re-visit this RFC later.
>
> This would be the least disruptive approach (at least for the time
> being) and would allow us to make us the most rapid progress (i.e. we
> would be focusing on implementing the features rather than refactoring).
> It would also inform the future refactoring better. But it was already
> pointed out that we should avoid dependencies on clang [3] and this
> would be a step in the opposite direction. Also, the build requirements
> for Flang would increase, and we feel that we should strive to reduce
> them instead [6].
>
> If we missed any alternatives, please bring them up.
>

I don't think I can express an opinion without knowing whether you intend
for Flang to ever support an integrated C preprocessor. If not, then option
1 seems appropriate. But if so, then I think we have a choice between
factoring out all of clang below the parser or just acknowledging that
Flang depends on Clang for its lexical layer and deciding to keep a flang
-> clang dependency forever.


> *IMPACT ON OTHER PROJECTS*
> The refactoring will have non-trivial impact on other projects:
>
> * OPTION 1 and OPTION 2 - huge impact initially.
> * OPTION 3 - no impact initially, but most likely similar impact as
> OPTION 1 and OPTION 2 in the long term.
>
>  From our initial investigation, extracting Diagnostics/SourceLocation
> from clangBasic and moving it to LLVM will be the most impactful change.
> Within llvm-project it is used in clang, clang-tools-extra, lldb and
> polly. Most of the changes will be mechanical, but will require touching
> many files. In order to get to a state where we could build libclang
> using the newly defined LLVM library, we had to touch ~850 files and
> make ~30k insertions/deletions. The result of this exercise is available
> in our development fork of llvm-project [8].
>
> Please note: our patches on GitHub [8] are just experiments to
> illustrate the idea. It's work-in-progress that requires a lot of
> polishing. When/if up-streaming this, we would need to do some
> low-impact refactoring first. For example, currently ASTReader &
> ASTWriter are `friends` with DiagnosticsEngine [9]. That won't be
> possible when DiagnosticsEngine is moved to LLVM.
>
>
> On behalf of the Arm Fortran Team,
> Andrzej Warzynski
>
> REFERENCES
>
> [1]
>
> https://github.com/llvm/llvm-project/commit/b98ad941a40c96c841bceb171725c925500fce6c
> [2] http://lists.llvm.org/pipermail/cfe-dev/2019-June/062669.html
> [3] https://reviews.llvm.org/D79092
> [4]
>
> https://github.com/llvm/llvm-project/blob/ad5d319ee85d31ee2b1ca5c29b3a10b340513fec/clang/lib/Basic/CMakeLists.txt#L45-L47
> [5]
> https://clang.llvm.org/docs/InternalsManual.html#the-clang-basic-library
> [6] http://lists.llvm.org/pipermail/flang-dev/2019-November/000061.html
> [7] http://lists.llvm.org/pipermail/llvm-dev/2019-November/136743.html
> [8]
>
> https://github.com/banach-space/llvm-project/commits/andrzej/refactor_clangBasic
> [9]
>
> https://github.com/llvm/llvm-project/blob/b11ecd196540d87cb7db190d405056984740d2ce/clang/include/clang/Basic/Diagnostic.h#L985-L986
> [10] https://reviews.llvm.org/D63607
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200602/64d43935/attachment-0001.html>


More information about the cfe-dev mailing list