[llvm-dev] [cfe-dev] [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM

James Y Knight via llvm-dev llvm-dev at lists.llvm.org
Wed Jun 3 15:24:34 PDT 2020


Extracting common code, especially the Driver code, from clang, so that it
can also be used for flang seems entirely reasonable as a high-level goal.

But I'd just like to enter another vote for it to live somewhere other than
under the "llvm" top-level-directory, e.g. "clang-common",
"frontend-support", or any other name people would like to paint that
bikeshed. :)

On Wed, Jun 3, 2020 at 5:50 PM Michael Spencer via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
> On Tue, Jun 2, 2020 at 6:38 PM Richard Smith via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Tue, 2 Jun 2020 at 05:08, Andrzej Warzynski via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> *TL;DR*
>>>
>>> We propose some non-trivial refactoring in Clang and LLVM to enable
>>> further work on Flang driver.
>>>
>>> *SUMMARY*
>>> We would like to start extracting the driver/frontend code from Clang
>>> (alongside the code that the driver/frontend depends on, e.g.
>>> Diagnostics) and move the components that could be re-used by
>>> non-C-based languages to LLVM. From our initial investigation we see
>>> that these changes will impact many projects (upstream and downstream)
>>> and will require big mechanical patches (our first attempt is
>>> implemented in [8]). This is not ideal, but seems unavoidable in the
>>> long-term. We would like to do this refactoring _before_ we start
>>> implementing the Flang driver upstream (OPTION 1 below). This way we
>>> avoid:
>>>
>>> * contaminating Clang with Fortran specific code (and vice versa)
>>> * introducing dependency on Clang in Flang
>>>
>>> The downside is that the refactoring is likely to be disruptive for many
>>> projects that use Clang. We will try our best to minimise this.
>>>
>>> Does this approach make sense? Are there any preferred alternatives? At
>>> this stage we'd like to discuss the overall direction. If folks are in
>>> favour, we'll send a separate RFC with a finer breakdown and more
>>> technical details for the refactoring.
>>>
>>> Below you will find more context for our use-case (the Flang driver) and
>>> possible alternatives. We hope that this will help the discussion. We
>>> would really appreciate your feedback!
>>>
>>
>> Generally, I think this is a good idea, and a healthy direction for LLVM
>> overall. We need to be careful to do this in a way that doesn't introduce
>> complexity or overheads in Clang, though, so we should proceed very
>> cautiously.
>>
>> I also think you're skewing somewhat too far in favor of code reuse. Some
>> of the Clang code you're identifying below is very carefully tuned and
>> tailored to Clang's use, and the amount of it that could reasonably be
>> shared with another project (carefully tuned and tailored to that project's
>> use) is probably small, unless we heavily generalize it. Such
>> generalization is likely not going to be worth its development and
>> maintenance costs.
>>
>>
>>> *BACKGROUND*
>>> Flang (formerly known as F18) has recently been merged into LLVM [1].
>>> Our ambition, as a community, is to make it as flexible, robust and nice
>>> to work with as Clang. One of the major items to address is the
>>> implementation of a driver that would provide the flexibility and user
>>> experience similar to that available in Clang. The F18/Flang driver was
>>> already discussed on cfe-dev last year [2], but back then F18 (now llvm
>>> project/flang) was a separate project. In the original proposal it was
>>> assumed that initially Flang would depend (and extend where necessary)
>>> Clang's driver/frontend code. Since F18/Flang was an independent
>>> project, the refactoring of Clang/LLVM wasn't really considered. That
>>> design has been challenged since ([3], [10]), and also not much progress
>>> has been made. We would like to revisit that RFC from a slightly
>>> different angle. Since Flang is now part of LLVM's monorepo, we feel
>>> that refactoring Clang/LLVM _before_ we upstream the driver makes a lot
>>> of sense and is the natural first step.
>>>
>>> *ASSUMPTIONS & DESIGN GOALS*
>>> 1. We will re-use as much of the Clang's driver/frontend code as
>>> possible (this was previously proposed in [2]).
>>>
>>> 2. We want to avoid dependencies from Flang to Clang, both long-term
>>> (strong requirement) and short-term (might be difficult to achieve).
>>> This has recently come up in a discussion on one of our early patches
>>> [3] (tl;dr Steve Scalpone, the code owner of Flang, would prefer us to
>>> avoid this dependency), and was also suggested before by Eric
>>> Christopher [10].
>>>
>>> 3. We will move the code that can be shared between Flang and Clang (and
>>> other projects) to LLVM. This idea has already come up on llvm-dev
>>> before [7] (in a slightly different context, and to a slightly different
>>> extent). The methods that are not language specific would be shared in
>>> an LLVM library.
>>>
>>> 4. The classes/types/methods that need specific changes for Fortran will
>>> be "copied" to Flang and adapted as needed. We should minimize (or even
>>> eliminate) any Fortran specific code from Clang and make sure that that
>>> lives in llvm-project/flang.
>>>
>>> *FLANG'S DEPENDENCIES ON CLANG*
>>> These are the dependencies on Clang that we have identified so far while
>>> prototyping the Flang driver.
>>>
>>> 1. All the machinery related to Diagnostics & SourceLocation.
>>>
>>> This is currently part of libclangBasic [4] and is used in _many_ places
>>> in Clang. The official documentation [5] suggests that this could be
>>> re-used for non-C-based languages. In particular, we feel that It would
>>> make a lot of sense for Flang to use it. Also, separating Clang's
>>> driver/frontend code and the diagnostics would require a lot of
>>> refactoring for no real benefit (and we feel that Flang should re-use
>>> Clang's driver/frontend code, see below). This dependency is used in
>>> many places, so moving it to LLVM will require a lot of (mostly)
>>> mechanical changes. We can't see an obvious way to split it into smaller
>>> chunks (see also below where we discuss the impact).
>>>
>>
>> I do not think it is necessarily going to be reasonable to move all
>> machinery related to SourceLocation (in particular, all of clang's
>> SourceManager) into LLVM. The ideas and data structure underpinning
>> SourceLocation and SourceManager are quite general (a concatenated
>> hierarchical slab of contiguous blocks, with linear indexing within those
>> blocks), but the details are much more specific to Clang and the C-family
>> languages it represents. Things like the support for object-like and
>> function-like macros, macro arguments, #include, splitting >> tokens for
>> C++, and so on, all make sense for Clang, but probably make less sense for
>> Fortran, where a different set of kinds of block would probably be desired
>> instead. This isn't something that can be trivially generalized and
>> extended, either; we carefully bit-pack various things into our block
>> representations, and as a result, we're quite tightly fitted to the needs
>> of Clang, and would probably not want to move away from that position.
>>
>> However, I do think there is common infrastructure that can be extracted,
>> with some significant work done to generalize the SourceManager
>> infrastructure and make it tailorable to the needs of Clang and Flang (and
>> any other consumers of it that might come along). I could imagine moving
>> all of the complexity to do with what kinds of SLocEntry are supported into
>> a traits type, and having a reusable template that can generate a data
>> structure that the Clang and Flang SourceManagers can be implemented in
>> terms of.
>>
>> Clang's SourceLocation is probably almost directly useable as-is -- it
>> has hardcoded assumptions about a particular bit being reserved to indicate
>> a location within a C preprocessor macro, but we can move that to a static
>> method on Clang SourceManager, and then I think SourceLocation can be
>> directly shared between the two projects.
>>
>> (One big asterisk on the above: will Flang want an integrated C
>> preprocessor? If so, then we're now talking about a much larger chunk of
>> Clang, including the lexer, preprocessor, identifier tables, the Token
>> type, and it may be best to simply acknowledge that Flang has a dependency
>> on Clang to supply all that, rather than moving it into LLVM.)
>>
>> The layers below SourceManager -- FileManager, the VFS, and so on -- all
>> seem like they should be reasonable to share between projects.
>>
>> Some of the diagnostics engine seems reasonable to share: specifically,
>> the tablegen-driven diagnostic table generation, most of the diagnostics
>> engine (including support for diagnostics pragmas that change the set of
>> warnings enabled at different source locations), and the formatting code
>> for non-clang-specific types are all relatively reusable. If you want to
>> reuse the TextDiagnosticPrinter, I think that will need some refactoring;
>> it's currently tied into the specific needs of Clang's SourceManager (for
>> handling textual inclusion and macro expansion in the way that C-family
>> languages deal with those things). I expect it would be possible to factor
>> out an interface that Clang could implement to provide the necessary
>> customizations.
>>
>> Before we factor out the diagnostics engine, we should fix the
>> longstanding issue that it requires a global monolithic table covering all
>> diagnostics, and is consequently unable to properly respect layering. I
>> think this is very much fixable, but it requires someone to do the work to
>> fix it :)
>>
>> Looking at your branch, I immediately see a few things there that are
>> unacceptable changes: moving clang's TokenKinds.def, Specifiers.h, and
>> OpenCLImageTypes.def into LLVM is not OK. But I assume you're aware of that
>> already. =)
>>
>>
>>> 2. libclangFrontend & libclangDriver
>>>
>>> The Flang driver will use many methods from libClangDriver,
>>> libClangFrontend and libClangFrontendTool. Driver.h and Compilation.h
>>> from libClangDriver are responsible to call, pass the correct arguments
>>> and execute the driver. TextDiagnosticPrinter.h takes care of printing
>>> the driver diagnostics in case of errors.
>>>
>>> The Flang frontend will use CompilerInstance, CompilerInvocation,
>>> FrontendOptions, FrontendActions and Utils from libClangFrontend and
>>> libClangFrontendTool. These methods are responsible for translating the
>>> command line arguments to frontend Options and later to Actions to be
>>> executed by ExecuteCompilerInvocation. The translation from arguments to
>>> Actions happens with FrontendOption and FrontendActions. But it is the
>>> CompilerInvocation that has the pointers for the sequence of Actions
>>> that are required in a Compiler Instance. These methods are needed to
>>> implement Flang driver/frontend and contain actions/method/functions
>>> that seem to be language agnostic.
>>>
>>
>> I think this is going too far in attempting to reuse Clang code.
>> CompilerInvocation, for example, is almost exclusively dealing in parsing
>> Clang's -cc1 flags, which I would expect to have very little overlap with
>> Flang's flags, and CompilerInstance exists (in part) to manage and own all
>> the Clang-specific global objects (the parser, sema, the module loader, the
>> AST consumer). Flang should not be going anywhere near this stuff, and
>> should be implementing its own frontend.
>>
>> There may be some clang-independent parts that can be factored out, but I
>> would expect them to be small enough that we can address them on a
>> case-by-case basis. The interesting thing to factor out is the parsing of
>> command-line options, but that's already been done. I think your approach
>> here should be to assume as a baseline that you reuse none of clang's
>> Frontend library, but if you find general pieces that can meaningfully be
>> extracted, we can talk about those pieces in isolation.
>>
>
> I strongly agree here. It doesn't make sense for `CompilerInvocation` or
> `CompilerInstance` to know anything about Fortran as they are entirely
> about driving the Clang frontend (-cc1).
>
>
>> For the driver, I think the picture is very different. It seems to me
>> that we should only have one LLVM driver, that can build C-family
>> languages, Fortran code, or both at the same time (or invoke lld etc). To
>> that end, I think it would be reasonable to move clang's driver out to a
>> separate LLVM project (maybe that's llvm/, maybe it's somewhere new such as
>> driver/), and extend it to be able to invoke flang actions in addition to
>> clang actions. Then the only difference between the clang and flang drivers
>> would be which frontend is directly linked into the driver binary and which
>> one is invoked by exec'ing a different binary. That would imply that all
>> the parts of Clang that are depended on by the driver are also moved out (I
>> think the main parts here are flags and diagnostics, and via the
>> diagnostics layer, source locations).
>>
>> This will require some decoupling between the Clang driver and frontend
>> (currently Clang's Options.td contains various driver options that are
>> marked as also being options for Clang's -cc1 mode; duplicating those in
>> CC1Options.td is probably acceptable, if we're going to split the driver
>> and frontend into two different projects), and some shared support code
>> (eg, clang's sanitizers list) will presumably end up in the driver, because
>> we don't want a driver -> *lang dependency.
>>
>
> Also strongly agree here. Large chunks of the driver will end up being the
> same between Clang and Flang, but they should still be separate actions
> having separate `FrontendOptions` and `FrontendActions`.
>
> I feel this is a much larger refactoring than the current changeset and
> description implies. I'm OK with the direction of Option 1, but am
> concerned with the specific implementation details that have been described
> so far. I'll feel much better when the concerns Richard expressed have been
> addressed.
>
> - Michael Spencer
>
>
>>
>>
>>> *ALTERNATIVES*
>>> This is a summary of the alternative ways of implementing the Flang
>>> driver. We propose OPTION 1. If there are no major objections, we will
>>> draft a separate RFC with more technical details (we will also break it
>>> down into smaller pieces). Otherwise, what would be your preferred
>>> alternative and why?
>>>
>>> OPTION 1
>>> We avoid dependency on Clang from Day 1.
>>>
>>> This is the ideal scenario that would guarantee that Clang and Flang are
>>> completely separate and that the common bits stay in LLVM instead. It
>>> would mean slower progress for us initially, but then other projects
>>> could benefit from the refactoring sooner rather than later.
>>>
>>> OPTION 2
>>> We avoid dependency on clangBasic from day 1, but initially allow
>>> dependency on libClangFrontend & libClangDriver (or other libs specific
>>> to the driver/frontend).
>>>
>>> The dependency on libclang{Driver|Frontend} would gradually be
>>> removed/refactored out as the driver for Flang gains momentum. As
>>> mentioned earlier, there is plenty of code in libClangFrontend and
>>> libClangDriver that we'd like to re-use, but the separation between code
>>> that's specific to C-based languages and generic driver/frontend code is
>>> not always obvious. We think that refactoring the common bits in
>>> libClangFrontend and libClangDriver might simply be easier once:
>>>
>>>   * we have a Flang driver that leverages these libraries, and, as a
>>> result,
>>>   * we understand better what we could re-use and what's not that
>>> relevant to non-C-based languages.
>>>
>>> OPTION 3
>>> We initially keep the dependency on Clang and re-visit this RFC later.
>>>
>>> This would be the least disruptive approach (at least for the time
>>> being) and would allow us to make us the most rapid progress (i.e. we
>>> would be focusing on implementing the features rather than refactoring).
>>> It would also inform the future refactoring better. But it was already
>>> pointed out that we should avoid dependencies on clang [3] and this
>>> would be a step in the opposite direction. Also, the build requirements
>>> for Flang would increase, and we feel that we should strive to reduce
>>> them instead [6].
>>>
>>> If we missed any alternatives, please bring them up.
>>>
>>
>> I don't think I can express an opinion without knowing whether you intend
>> for Flang to ever support an integrated C preprocessor. If not, then option
>> 1 seems appropriate. But if so, then I think we have a choice between
>> factoring out all of clang below the parser or just acknowledging that
>> Flang depends on Clang for its lexical layer and deciding to keep a flang
>> -> clang dependency forever.
>>
>>
>>> *IMPACT ON OTHER PROJECTS*
>>> The refactoring will have non-trivial impact on other projects:
>>>
>>> * OPTION 1 and OPTION 2 - huge impact initially.
>>> * OPTION 3 - no impact initially, but most likely similar impact as
>>> OPTION 1 and OPTION 2 in the long term.
>>>
>>>  From our initial investigation, extracting Diagnostics/SourceLocation
>>> from clangBasic and moving it to LLVM will be the most impactful change.
>>> Within llvm-project it is used in clang, clang-tools-extra, lldb and
>>> polly. Most of the changes will be mechanical, but will require touching
>>> many files. In order to get to a state where we could build libclang
>>> using the newly defined LLVM library, we had to touch ~850 files and
>>> make ~30k insertions/deletions. The result of this exercise is available
>>> in our development fork of llvm-project [8].
>>>
>>> Please note: our patches on GitHub [8] are just experiments to
>>> illustrate the idea. It's work-in-progress that requires a lot of
>>> polishing. When/if up-streaming this, we would need to do some
>>> low-impact refactoring first. For example, currently ASTReader &
>>> ASTWriter are `friends` with DiagnosticsEngine [9]. That won't be
>>> possible when DiagnosticsEngine is moved to LLVM.
>>>
>>>
>>> On behalf of the Arm Fortran Team,
>>> Andrzej Warzynski
>>>
>>> REFERENCES
>>>
>>> [1]
>>>
>>> https://github.com/llvm/llvm-project/commit/b98ad941a40c96c841bceb171725c925500fce6c
>>> [2] http://lists.llvm.org/pipermail/cfe-dev/2019-June/062669.html
>>> [3] https://reviews.llvm.org/D79092
>>> [4]
>>>
>>> https://github.com/llvm/llvm-project/blob/ad5d319ee85d31ee2b1ca5c29b3a10b340513fec/clang/lib/Basic/CMakeLists.txt#L45-L47
>>> [5]
>>> https://clang.llvm.org/docs/InternalsManual.html#the-clang-basic-library
>>> [6] http://lists.llvm.org/pipermail/flang-dev/2019-November/000061.html
>>> [7] http://lists.llvm.org/pipermail/llvm-dev/2019-November/136743.html
>>> [8]
>>>
>>> https://github.com/banach-space/llvm-project/commits/andrzej/refactor_clangBasic
>>> [9]
>>>
>>> https://github.com/llvm/llvm-project/blob/b11ecd196540d87cb7db190d405056984740d2ce/clang/include/clang/Basic/Diagnostic.h#L985-L986
>>> [10] https://reviews.llvm.org/D63607
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200603/1cc5615f/attachment-0001.html>


More information about the llvm-dev mailing list