[llvm-dev] RFC: LLVM Build System Future Direction

Shoaib Meenai via llvm-dev llvm-dev at lists.llvm.org
Tue Oct 29 11:45:53 PDT 2019

LLVM_EXTERNAL_*_SOURCE_DIR can be used for specifying paths to external clang, etc., and I agree that with the monorepo, there’s one canonical location for those sub-projects to live, and we don’t need to support it for those subprojects. However, LLVM_EXTERNAL_*_SOURCE_DIR can also be used in conjunction with LLVM_EXTERNAL_PROJECTS to specify the paths to other projects you want to include in your build but don’t necessarily want to place beside the llvm directory, e.g. Swift. We’ll continue supporting it for the latter use case, correct?
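
For reference, a minimal sketch of that second use case, written as a CMake initial-cache file. The project name `myproject` and both paths are placeholders rather than anything from this thread, and the sketch assumes the project name is upper-cased when forming `LLVM_EXTERNAL_<NAME>_SOURCE_DIR`.

```cmake
# external-project.cmake: hypothetical initial-cache file, loaded with
#   cmake -G Ninja -C external-project.cmake /path/to/llvm-project/llvm

# Name of the out-of-tree project to pull into the LLVM build (placeholder).
set(LLVM_EXTERNAL_PROJECTS "myproject" CACHE STRING "")

# Where that project's sources live; it does not need to sit beside llvm/.
set(LLVM_EXTERNAL_MYPROJECT_SOURCE_DIR "/path/to/myproject" CACHE PATH "")
```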

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org>
Reply-To: Chris Bieneman <beanz at apple.com>
Date: Tuesday, October 29, 2019 at 10:10 AM
To: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] RFC: LLVM Build System Future Direction

Sorry for the delay in writing this up and sending it out, but I wanted to recap the discussion from the roundtable on October 23rd. The roundtable ran for almost two hours and we discussed most of the main points in my RFC. Thank you to everyone who participated and contributed to the discussion!

TL;DR: We should move to CMake 3.15 (RFC incoming). We should make `all` really `all`. We should strive to reduce complexity and remove options, specifically options that aren't relevant to the monorepo. We should work to standardize workflows. We need to keep thinking about how to build runtime projects.

In my recap I'd like to fully disclose that I was standing up through most of the roundtable directing the conversation so I don't have much in the way of notes to go off. I may get details or things wrong, so please chime in to correct me.

We had a brief discussion around raising the minimum required CMake version. The general consensus was that since CMake provides binary packages for most common OSs, and building CMake from source has lower system requirements than LLVM and is very simple, nobody saw any barrier to adopting new versions. Initially I suggested moving to CMake 3.11 or 3.12. I believe it was James Knight who said the actual cost of updating the bots is the hard part in raising the version, so maybe we should just take the newest. That reasoning made sense to everyone at the roundtable and there were no objections to moving to CMake 3.15. Look for an RFC from me shortly that will propose that and lay out a timeline.

We spent a lot of time talking about a handful of other topics that all spring out from my RFC. There was a general agreement that the number of options in the build system is unwieldy and the combinatoric effect of the options is making the build system difficult to maintain. There was a general agreement, at least in principle, that we should work to reduce the complexity of the build system.

We did discuss specific build system functionality that could be simplified or removed, and on some points there was agreement, and on others there is need for more discussion. A few examples:

There was general agreement on the concept from my RFC that the `all` target should always really be `all`. That would mean removing the `LLVM_TOOL_*_BUILD` options.

There was also agreement that in the monorepo it no longer makes sense to have an option to specify the source locations of sub-projects. That means we can remove the `LLVM_EXTERNAL_*_SOURCE_DIR` variables.

There was disagreement over whether or not standalone builds of non-runtime sub-projects should remain. This specifically would relate to clang and lldb, and whether or not they can be built against installed or separately built copies of LLVM. There were points on both sides of this discussion, and it will require more discussion to resolve.

The roundtable also focused a lot of time on the runtime libraries, and how they should be built. It is clear from the discussion that the runtime libraries need to support being built standalone (separately from LLVM) in order to fit into the various distribution strategies. Specifically many of the runtime libraries are sometimes shipped as OS components rather than as part of the toolchain, so you need to be able to build them separately from the toolchain. Those same libraries can also be shipped as part of a toolchain, so we need to support that workflow too. The need to support these disparate workflows has led to much of the complexity. The proposal from my RFC of standardizing runtimes builds as "standalone", and using the LLVM runtimes directory (which uses CMake's ExternalProject) to configure and build runtimes had pretty wide support in the discussion. I suspect this is the direction we should move in, but with some slight changes.

One of the points that came up was that building runtimes for multiple platforms at once can be a configuration nightmare requiring hundreds of build settings. It was suggested that it might be easier if you could build all of the runtime libraries with a single CMake invocation separately from LLVM. There are some dependency complications around the ordering of the builtin libraries build, but otherwise this is largely doable.

There are also problems related to cross-runtime dependencies which have come up recently. Some of these problems can be addressed more cleanly with modern CMake generator expressions, and other issues may require a larger restructuring of code. For years there have been discussions thrown around about breaking the builtins libraries out of compiler-rt. Maybe now is the time to consider doing that.

The last topic that was brought up at the end of the roundtable was about supporting IDE generators. Saleem Abdulrasool pointed out that Visual Studio ships a CMake integration that generates Ninja builds and supports the IDE UI interactions as if it were a Visual Studio project without being a project generator. Not supporting IDE build systems would clean up a lot of complexity in our build system. As was pointed out during the discussion, Xcode has no such support and is used by members of the community. There was discussion of whether or not an Xcode+Ninja generator could be created which would use Xcode's external build system mechanism to create an Xcode project for browsing, editing, and debugging, while using Ninja to build. The conversation had no real resolution other than "that would be cool" and "it would be nice to remove all the `if (XCODE)` blocks".

Thank you everyone who participated in the roundtable! If there is anything I missed please help fill in the blanks.


On Oct 24, 2019, at 10:28 AM, Alex Denisov via llvm-dev <llvm-dev at lists.llvm.org> wrote:

That works perfectly.

But I’m referring to the case when I want to add LLVM as a sub project, which would allow me to debug/modify LLVM as if it is my code.
As I said, this also works, but some parts are not straightforward.

On Thu 24. Oct 2019 at 18:15, Louis Dionne <ldionne at apple.com> wrote:

On Oct 24, 2019, at 02:17, Alex Denisov via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Hi Chris,

This is a great initiative and it feels like the right direction.

I'd like to add another point to the list: using LLVM as a library, i.e. being able to add it as a CMake subproject.
Currently it works pretty well, but some parts can be improved (somehow). E.g.:

I believe the CMake-correct way of doing that is to produce LLVM export files when installing LLVM, which would allow find_package to work out of the box. You’d get LLVM targets you can link against and all dependencies would be propagated by CMake properly.
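
As a rough illustration of the `find_package` route, here is a minimal consumer project against an installed LLVM. It follows the pattern in LLVM's own CMake documentation; the project name, target name, source file `tool.cpp`, and the chosen components are placeholders.

```cmake
cmake_minimum_required(VERSION 3.13)
project(SimpleTool CXX)

# Locate an installed LLVM through its exported package config file.
# Point CMake at it with -DLLVM_DIR=<prefix>/lib/cmake/llvm if needed.
find_package(LLVM REQUIRED CONFIG)
message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION}")
message(STATUS "Using LLVMConfig.cmake in: ${LLVM_DIR}")

# Include directories and compile definitions come from the package config.
include_directories(${LLVM_INCLUDE_DIRS})
separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
add_definitions(${LLVM_DEFINITIONS_LIST})

add_executable(simple-tool tool.cpp)

# Map component names to the library targets to link against.
llvm_map_components_to_libnames(llvm_libs support core irreader)
target_link_libraries(simple-tool ${llvm_libs})
```

Here the version and include directories come from the package rather than from parsing LLVM's CMakeLists.txt.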


 - getting LLVM's version: the only way I've found is to parse the CMakeLists.txt with regexes and hope it doesn't break in the future
 - getting include dirs info: IIRC currently it works transitively when one links against some library, but it feels like a weak point to me
 - getting other configuration options (for example whether LLVM is built with or without exceptions)
 - controlling how LLVM is built: the only way I found is to force set a variable (i.e. set (LLVM_BUILD_32_BITS ON CACHE BOOL "" FORCE)) before adding LLVM as a subproject
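
The subproject setup described in the last bullet might look roughly like the sketch below. The checkout path `third_party/llvm-project/llvm`, the project name, and `main.cpp` are assumptions made for illustration; only the forced `LLVM_BUILD_32_BITS` setting comes from the list above.

```cmake
cmake_minimum_required(VERSION 3.13)
project(MyTool C CXX)

# LLVM's options are cache variables, so they have to be forced
# before LLVM's own CMakeLists.txt is processed.
set(LLVM_BUILD_32_BITS ON CACHE BOOL "" FORCE)

# Hypothetical location of an LLVM checkout inside this project.
# EXCLUDE_FROM_ALL keeps unrelated LLVM targets out of the default build.
add_subdirectory(third_party/llvm-project/llvm
                 ${CMAKE_CURRENT_BINARY_DIR}/llvm
                 EXCLUDE_FROM_ALL)

add_executable(my-tool main.cpp)

# Link against an LLVM library target defined by the subproject.
target_link_libraries(my-tool PRIVATE LLVMSupport)
```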

I think my list is not exhaustive since it covers only my use cases, so maybe there are other opinions/requests from others.


On Mon, Oct 21, 2019 at 8:26 PM Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
Over the past few years the LLVM CMake build system has gained a lot of new features, which have powered new workflows and capabilities. This development has largely been individual efforts without focus or long-term direction. The build system is of incredible importance to LLVM as it is a primary interface for contributors to build and test their changes.

This year, LLVM is making a huge infrastructure shift to GitHub. Along with that shift many of the previously supported build options may not make sense. This is a perfect opportunity to revisit the state of our build infrastructure with an eye toward the future.

I would like this RFC, and any discussion it sparks, to serve as a starting point for a conversation we can continue at the LLVM Developer Meeting to set goals and conventions for the development of the build system.

Tom Stellard has scheduled a roundtable on CMake from 10:45-11:55 on Wednesday, Oct 23.

## The Problem
Lacking clear direction and oversight, the build system is evolving in rapidly divergent ways. Further, since we don't have a formalized process for unifying workflows and deprecating old behaviors, LLVM's build system has a convoluted feature set.

This manifests itself in many unfortunate ways. Some examples are:

(1) There are three different ways to build compiler-rt as part of LLVM
(2) There are lots of incompatible build configurations that need to be accounted for, some that aren't (like -DLLVM_BUILD_LLVM_DYLIB=On -DBUILD_SHARED_LIBS=On, which will explode at runtime)
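
One way to account for a combination like the one in (2) is a configure-time guard. The following is only a sketch of that idea, not a quote from the LLVM tree; the wording of the message is made up.

```cmake
# Sketch of a configure-time guard for two options that cannot be combined.
# Failing here is cheaper than discovering the problem at runtime.
if(LLVM_BUILD_LLVM_DYLIB AND BUILD_SHARED_LIBS)
  message(FATAL_ERROR
    "LLVM_BUILD_LLVM_DYLIB and BUILD_SHARED_LIBS cannot be enabled together; "
    "pick one of the two shared-library configurations.")
endif()
```

Every guard of this kind is one more pair in the configuration matrix that has to be known about and maintained.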

As the build system gains complexity, maintaining it is getting more expensive for the community.

## Future Directions
The following are proposals to enable the build system to better facilitate LLVM development and provide a usable, extensible, and stable build system for LLVM and all sub-projects.

### Updating CMake More Regularly
In the past we have clung to old versions of CMake for extended periods of time. This has resulted in significant checking of `CMAKE_VERSION` to enable some features, or in being completely unable to use others. In particular, recent CMake development extending generator expressions could provide substantial benefit to the project. If we stick to current upgrade policies, we may not be able to use the current CMake release for another few years.

As an alternative proposal, we should consider CMake upgrades independent of OS package manager versioning. That is not to say we should take every new version of CMake, however we should upgrade for compelling reasons.

In tree right now we have code gated on CMake 3.7 which enables CMake to generate Ninja rules for tablegen that process tablegen's dep files. This allows accurately rebuilding tablegen when included files change. I propose that this change's benefit should be sufficient to justify moving to CMake 3.7 for the project.
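
To illustrate the kind of version gating this refers to, here is a simplified sketch and not the in-tree code. The generator tool `mygen`, its flags, and the file names are hypothetical; the `DEPFILE` argument of `add_custom_command` is the CMake 3.7 feature, and it is only honored by the Ninja generator in that release.

```cmake
cmake_minimum_required(VERSION 3.4.3)
project(DepfileSketch NONE)

set(input  ${CMAKE_CURRENT_SOURCE_DIR}/Main.td)
set(output ${CMAKE_CURRENT_BINARY_DIR}/Main.inc)

# Only pass DEPFILE when the running CMake and generator support it.
set(depfile_args)
if(CMAKE_GENERATOR STREQUAL "Ninja" AND NOT CMAKE_VERSION VERSION_LESS 3.7)
  set(depfile_args DEPFILE ${output}.d)
endif()

# `mygen` stands in for a generator that can write a dependency file
# listing every file it included while processing the input.
add_custom_command(
  OUTPUT ${output}
  COMMAND mygen -o ${output} -d ${output}.d ${input}
  DEPENDS ${input}
  ${depfile_args}
  COMMENT "Generating Main.inc")

add_custom_target(gen ALL DEPENDS ${output})
```

With a minimum version of 3.7 or later, the `CMAKE_VERSION` half of the condition and the fallback path would go away.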

Additionally, building LLDB.framework on Darwin requires CMake 3.8 due to bugs in earlier versions of CMake. This could also be a justification for updating.

Lastly, getting updated versions of CMake is very easy. Kitware provides Windows, Mac and Linux builds on cmake.org as well as an Ubuntu apt source. If that is insufficient, building CMake from source is simple and has minimal system requirements. Visual Studio bundles a reasonably up-to-date CMake. As such we should not allow OS release or support cycles to dictate when we upgrade CMake.

### Reducing the Test Matrix
The most important guiding principle for development of the LLVM build system must be to reduce the matrix of configurations. The more possible configurations the build system supports, the wider the test matrix. This is not to say the build system should support doing less, but rather that it should support fewer unique configurations.

Many configuration options in the build system just turn on or off different parts of the project. For example, the `LLVM_BUILD_LLVM_DYLIB` option just enables or disables configuring libLLVM. An alternative approach would be to always configure libLLVM, and leave it up to users of the build system to determine whether or not to build it.
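
A generic sketch of the difference between the two styles follows. The names `biglib`, `biglib.c`, and `MYPROJ_BUILD_BIGLIB` are invented, and using `EXCLUDE_FROM_ALL` is only one possible reading of "always configure, let the user decide whether to build", not something the RFC prescribes.

```cmake
cmake_minimum_required(VERSION 3.13)
project(AlwaysConfigure C)

# Option-gated style: the target exists only in some configurations,
# so every consumer has to handle both cases.
#
#   option(MYPROJ_BUILD_BIGLIB "Build the combined library" OFF)
#   if(MYPROJ_BUILD_BIGLIB)
#     add_library(biglib SHARED biglib.c)
#   endif()

# Always-configured style: the target always exists, and building it is
# an explicit request, e.g. `cmake --build . --target biglib`.
add_library(biglib SHARED EXCLUDE_FROM_ALL biglib.c)
```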

We also have options to enable and disable configuring individual tools in the LLVM and Clang projects. I believe we should eliminate those options, which will result in the `all` target always being everything. We have explicit clean dependencies for the `check-*` targets so most developer workflows should be un-impacted. Distribution workflows can use the `LLVM_DISTRIBUTION_COMPONENTS` option to hand tailor which parts of LLVM to build and install with better control without as much complication.
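
As a small example of the distribution workflow, an initial-cache file might look like the sketch below. The file name and the particular component list are illustrative choices, not a recommendation.

```cmake
# distribution.cmake: hypothetical initial-cache file, loaded with
#   cmake -G Ninja -C distribution.cmake /path/to/llvm-project/llvm

set(LLVM_ENABLE_PROJECTS "clang;lld" CACHE STRING "")

# Only these components are built and installed by the
# `distribution` and `install-distribution` targets.
set(LLVM_DISTRIBUTION_COMPONENTS
  clang
  lld
  llvm-ar
  CACHE STRING "")
```

Running `ninja distribution` followed by `ninja install-distribution` then builds and installs just that list, independent of what `all` contains.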

Many other options exist to support a wide variety of divergent workflows. For example, the `LLVM_EXTERNAL_${PROJECT}_SOURCE_DIR` options exist to allow users to specify custom paths for projects so that, historically, they didn't need to nest clang inside LLVM. With the move to the mono-repo we should define consistent workflows and eliminate options that support divergent workflows.

### Adopting Conventions
Much of LLVM's build system is not idiomatic CMake. Some of those differences make sense because LLVM is not a typical software project. I'm unaware of any build configuration system that was designed specifically to build compilers and handle the complex dependency chains that come with that territory.

Some of our divergences come from history. We have a great many features implemented in our CMake because CMake didn't support them at the time. We also have patterns that were appropriate before CMake added new features, and we have never cleaned them up.

One big thing our build system needs is a set of guiding conventions to direct future development. Some key conventions that I believe are crucial:

#### Avoid Order Dependent Behavior
CMake generator expressions provide the ability to defer logic until after script processing. This allows the build system to avoid direct dependence on the order in which targets are processed. We should not use the `if(TARGET ...)` or `get_target_property` interfaces unless it is completely impossible to avoid.
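
A small self-contained sketch of the difference, with invented target and file names (`helper`, `helper.c`, `print-helper`):

```cmake
cmake_minimum_required(VERSION 3.13)
project(OrderDemo C)

# Order-dependent: this check runs during script processing. `helper` is
# defined further down, so the branch is silently skipped.
if(TARGET helper)
  message(STATUS "helper exists")
endif()

# Order-independent: the generator expression is evaluated at generation
# time, after every target has been defined, so it does not matter that
# `helper` appears later in the file.
add_custom_target(print-helper
  COMMAND ${CMAKE_COMMAND} -E echo "helper is at $<TARGET_FILE:helper>"
  VERBATIM)

add_library(helper STATIC helper.c)
```

Moving the `add_library` call above the `if(TARGET ...)` block changes what the first check does, while the custom target behaves the same either way.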

#### Avoid Options to Enable/Disable Configuration
If we reduce the test matrix, having a convention to keep it reduced is of vital importance so that we don't find ourselves needing to clean up again in a few years.

#### Avoid Caching, Use `mark_as_advanced` and `INTERNAL` Liberally
CMake has no strategy for cache invalidation. As such, cached variables add maintenance burden because they can break builds, sometimes in hard-to-diagnose ways. That said, they are useful. In particular, for slow operations like configuration checks, caching the result makes incremental re-configuration much faster. We should use cached values sparingly and only where they provide benefit.

Additionally, every cached CMake variable is a configuration point. Variables not marked `INTERNAL` show up in `ccmake` and `cmake-gui`, and variables not `mark_as_advanced` show up to all users. We should use the `INTERNAL` and `mark_as_advanced` options wherever appropriate to limit our supported configuration interface.
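
The three kinds of cached value can be sketched as follows. All `MYPROJ_*` names are invented for the example.

```cmake
cmake_minimum_required(VERSION 3.13)
project(CacheSketch C)
include(CheckIncludeFile)

# A configuration check: the module caches its result as an INTERNAL
# entry, so re-running cmake does not repeat the compile probe.
check_include_file(unistd.h MYPROJ_HAVE_UNISTD_H)

# A derived value kept in the cache but hidden from ccmake and cmake-gui.
set(MYPROJ_DETECTED_FEATURES "unistd" CACHE INTERNAL
    "Features found at configure time")

# A user-facing knob that most users never need: it stays configurable
# but is only shown when the advanced view is toggled on.
set(MYPROJ_JOB_POOL_SIZE "4" CACHE STRING "Size of a hypothetical job pool")
mark_as_advanced(MYPROJ_JOB_POOL_SIZE)
```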

#### Making Sense of Runtime Builds
Right now, there are three different ways to build compiler-rt as part of LLVM and two different ways to build most of the other runtime libraries (libcxxabi, libcxx, libunwind, etc). This situation is confusing even for long-time contributors.

We need a clearer story for building runtime libraries to reduce the number of different ways they are built and provide simplified workflows for users.

It is my opinion that if you are building a runtime library as part of an LLVM/Clang build, it should be configured and built with the in-tree clang as it would be for distribution. If you don't want to build with the in-tree clang, we should encourage people to build the runtime libraries independently of the compiler.

My reasoning for this is that distributions of clang are generally built from the default settings in the build and configuration process, and distributions (or installs by new users) which include the runtime libraries should have runtimes built with the just-built compiler. To align these two situations we need the default build configuration of LLVM+Clang+Runtimes to be using the just-built compiler.

Adopting this change would mean runtime library projects would only contain build system support for building "standalone", meaning not in the same configuration as LLVM. We would then support runtime libraries built as individual projects or using the LLVM runtimes directory, which separately configures and builds runtime libraries using the just-built clang.
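
The mechanism can be sketched with CMake's ExternalProject module. This is a simplified illustration and not the actual runtimes directory code; the target name `runtimes-sketch`, the variable `MY_TOOLCHAIN_BIN_DIR`, and the directory layout are assumptions.

```cmake
include(ExternalProject)

# Hypothetical variable pointing at the directory that holds the
# just-built clang and clang++ executables.
set(MY_TOOLCHAIN_BIN_DIR ${CMAKE_BINARY_DIR}/bin)

# Configure and build the runtimes as a separate CMake project, using the
# compiler produced by this build instead of the host compiler.
ExternalProject_Add(runtimes-sketch
  SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/runtimes
  BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR}/runtimes-build
  DEPENDS clang
  CMAKE_ARGS
    -DCMAKE_C_COMPILER=${MY_TOOLCHAIN_BIN_DIR}/clang
    -DCMAKE_CXX_COMPILER=${MY_TOOLCHAIN_BIN_DIR}/clang++
  INSTALL_COMMAND ""
  USES_TERMINAL_CONFIGURE 1
  USES_TERMINAL_BUILD 1)
```

Because the runtimes get their own configure step, their checks run against the just-built compiler, which is the same situation a standalone build against an installed toolchain would see.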

LLVM Developers mailing list
llvm-dev at lists.llvm.org