[llvm-dev] RFC: LLVM Build System Future Direction

Chris Bieneman via llvm-dev llvm-dev at lists.llvm.org
Mon Oct 21 11:25:38 PDT 2019

Over the past few years the LLVM CMake build system has gained a lot of new features, which have powered new workflows and capabilities. This development has largely been individual efforts without focus or long-term direction. The build system is of incredible importance to LLVM as it is a primary interface for contributors to build and test their changes.

This year, LLVM is making a huge infrastructure shift to GitHub. Along with that shift, many of the previously supported build options may no longer make sense. This is a perfect opportunity to revisit the state of our build infrastructure with an eye toward the future.

I would like this RFC, and any discussion it sparks, to serve as a starting point for a conversation we can continue at the LLVM Developer Meeting to set goals and conventions for the development of the build system.

Tom Stellard has scheduled a roundtable on CMake from 10:45-11:55 on Wednesday, Oct 23.

## The Problem
Lacking clear direction and oversight, the build system is evolving in rapidly divergent ways. Further, since we don't have a formalized process for unifying workflows and deprecating old behaviors, LLVM's build system has a convoluted feature set.

This manifests itself in many unfortunate ways. Some examples are:

(1) There are three different ways to build compiler-rt as part of LLVM
(2) There are lots of incompatible build configurations that need to be accounted for, and some that currently aren't (like -DLLVM_BUILD_LLVM_DYLIB=On -DBUILD_SHARED_LIBS=On, which will explode at runtime)
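
As a rough illustration of what "accounting for" an incompatible configuration could look like, the following sketch rejects the combination from item (2) at configure time. The error message text is made up for this example, and this is not taken from LLVM's actual CMake files.

```cmake
# Sketch only: fail at configure time instead of at runtime when both
# options named in item (2) are enabled together.
if(LLVM_BUILD_LLVM_DYLIB AND BUILD_SHARED_LIBS)
  message(FATAL_ERROR
    "LLVM_BUILD_LLVM_DYLIB and BUILD_SHARED_LIBS cannot both be enabled.")
endif()
```

Every such guard is one more cell of the configuration matrix that someone has to know about and maintain, which is part of the cost described here.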

As the build system gains complexity, maintaining it is becoming more expensive for the community.

## Future Directions
The following are proposals to enable the build system to better facilitate LLVM development and provide a usable, extensible, and stable build system for LLVM and all sub-projects.

### Updating CMake More Regularly
In the past we have clung to old versions of CMake for extended periods of time. This has resulted in a significant amount of `CMAKE_VERSION` checking to enable some features, and in being completely unable to use others. In particular, recent CMake development extending generator expressions could provide substantial benefit to the project. If we stick to current upgrade policies, we may not be able to use the current CMake release for another few years.

As an alternative proposal, we should consider CMake upgrades independent of OS package manager versioning. That is not to say we should take every new version of CMake; however, we should upgrade for compelling reasons.

In tree right now we have code gated on CMake 3.7 which enables CMake to generate Ninja rules for tablegen that process tablegen's dep files. This allows tablegen outputs to be rebuilt accurately when included files change. I propose that this change's benefit should be sufficient to justify moving to CMake 3.7 for the project.
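
To make the cost of this kind of gating concrete, here is a minimal sketch of a version-gated custom command using the `DEPFILE` option of `add_custom_command`, which CMake 3.7 added for the Ninja generator. The tool name `my-gen`, its `-o` and `-d` flags, and the file names are hypothetical stand-ins; LLVM's real `tablegen()` macro is not reproduced here.

```cmake
cmake_minimum_required(VERSION 3.5)
project(DepfileSketch NONE)

# Hypothetical generator tool, input, and output.
set(out ${CMAKE_CURRENT_BINARY_DIR}/output.inc)
set(gen_cmd my-gen ${CMAKE_CURRENT_SOURCE_DIR}/input.td -o ${out})

if(CMAKE_GENERATOR STREQUAL "Ninja" AND NOT CMAKE_VERSION VERSION_LESS 3.7)
  # The tool writes a dep file listing every file it included, and
  # Ninja uses it to decide when the output must be regenerated.
  add_custom_command(
    OUTPUT ${out}
    COMMAND ${gen_cmd} -d ${out}.d
    DEPFILE ${out}.d
    DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/input.td)
else()
  # Fallback: only the top-level input is tracked, so changes to
  # included files can be missed.
  add_custom_command(
    OUTPUT ${out}
    COMMAND ${gen_cmd}
    DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/input.td)
endif()

add_custom_target(generate-output ALL DEPENDS ${out})
```

Raising the minimum version to 3.7 would remove the version half of the condition, though the check on the generator would remain.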

Additionally, building LLDB.framework on Darwin requires CMake 3.8 due to bugs in earlier versions of CMake. This could also be a justification for updating.

Lastly, getting updated versions of CMake is very easy. Kitware provides Windows, Mac and Linux builds on cmake.org as well as an Ubuntu apt source. If that is insufficient, building CMake from source is simple and has minimal system requirements. Visual Studio also bundles a reasonably up-to-date CMake. As such, we should not allow OS release or support cycles to dictate when we upgrade CMake.

### Reducing the Test Matrix
The most important guiding principle for development of the LLVM build system must be to reduce the matrix of configurations. The more possible configurations the build system supports, the wider the test matrix. This is not to say the build system should support doing less, but rather that it should support fewer unique configurations.

Many configuration options in the build system just turn on or off different parts of the project. For example, the `LLVM_BUILD_LLVM_DYLIB` option just controls whether libLLVM is configured at all. An alternative approach would be to always configure libLLVM, and leave it up to users of the build system to determine whether or not to build it.
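
One possible way to "always configure, optionally build" is CMake's `EXCLUDE_FROM_ALL` target property. The sketch below uses a hypothetical library name and source file, and the RFC does not say which mechanism should be used, so treat this as one option among several.

```cmake
cmake_minimum_required(VERSION 3.5)
project(AlwaysConfiguredSketch C)

# Hypothetical library: it is always configured, so there is no
# option() toggling its existence, but it is only built when
# requested explicitly or pulled in as a dependency.
add_library(bigshared SHARED EXCLUDE_FROM_ALL bigshared.c)
```

With this shape, `cmake --build . --target bigshared` builds the library on demand, and there is only one configuration to test.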

We also have options to enable and disable configuring individual tools in the LLVM and Clang projects. I believe we should eliminate those options, which will result in the `all` target always being everything. We have explicit clean dependencies for the `check-*` targets, so most developer workflows should be unaffected. Distribution workflows can use the `LLVM_DISTRIBUTION_COMPONENTS` option to hand-tailor which parts of LLVM to build and install, with better control and less complication.
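
For readers unfamiliar with the distribution workflow, a minimal sketch of a CMake cache file follows. The file name `dist.cmake` and the particular component list are illustrative choices, not a recommended distribution.

```cmake
# dist.cmake (hypothetical file name), used as:
#   cmake -G Ninja -C dist.cmake <path-to-llvm-source>
set(LLVM_ENABLE_PROJECTS "clang;lld" CACHE STRING "")

# Only these components are built and installed by the
# distribution targets; everything else stays configured but idle.
set(LLVM_DISTRIBUTION_COMPONENTS
    clang
    lld
    llvm-ar
    llvm-objdump
    CACHE STRING "")
```

After configuring with such a file, `ninja distribution` builds just the listed components and `ninja install-distribution` installs them.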

Many other options exist to support a wide variety of divergent workflows. For example, the `LLVM_EXTERNAL_${PROJECT}_SOURCE_DIR` options exist to allow users to specify custom paths for projects so that, historically, they didn't need to nest clang inside LLVM. With the move to the mono-repo we should define consistent workflows and eliminate options that support divergent workflows.

### Adopting Conventions
Much of LLVM's build system is not idiomatic CMake. Some of those differences make sense because LLVM is not a typical software project. I'm unaware of any build configuration system that was designed specifically to build compilers and handle the complex dependency chains that come with that territory.

Some of our divergences come from history. We have a great many features implemented in our CMake because CMake didn't support them at the time. We also have patterns that were appropriate before CMake added new features, and we have never cleaned them up.

One big thing our build system needs is a set of guiding conventions to direct future development. Some key conventions that I believe are crucial:

#### Avoid Order Dependent Behavior
CMake generator expressions provide the ability to defer logic until after script processing. This allows the build system to avoid direct dependence on the order in which targets are processed. We should not use the `if(TARGET ...)` or `get_target_property` interfaces unless it is completely impossible to avoid.
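
A minimal sketch of the difference, assuming CMake 3.12 or newer for `$<TARGET_EXISTS:...>`. The target names, source files, and the `HAVE_PLUGIN` definition are hypothetical.

```cmake
cmake_minimum_required(VERSION 3.12)
project(OrderIndependentSketch C)

add_executable(app app.c)

# Order dependent: at this point `plugin` has not been defined yet,
# so this branch is silently skipped.
if(TARGET plugin)
  target_compile_definitions(app PRIVATE HAVE_PLUGIN_ORDER_DEPENDENT=1)
endif()

# Order independent: the generator expression is evaluated after all
# CMakeLists.txt files have been processed, so it sees `plugin` even
# though the target is defined further down.
target_compile_definitions(app PRIVATE
  $<$<TARGET_EXISTS:plugin>:HAVE_PLUGIN=1>)

add_library(plugin STATIC plugin.c)
```

In this sketch `app` is compiled with `HAVE_PLUGIN=1` but not with `HAVE_PLUGIN_ORDER_DEPENDENT=1`, and moving the `add_library` call changes only the latter.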

#### Avoid Options to Enable/Disable Configuration
If we reduce the test matrix, having a convention to keep it reduced is of vital importance so that we don't find ourselves needing to clean up again in a few years.

#### Avoid Caching, Use `mark_as_advanced` and `INTERNAL` Liberally
CMake has no strategy for cache invalidation. As such, cached variables add maintenance burden because they can break builds, sometimes in hard-to-diagnose ways. That said, they are useful. In particular, for slow operations like configuration checks, caching the result makes incremental re-configuration much faster. We should use cached values sparingly and only where they provide benefit.

Additionally, every cached CMake variable is a configuration point. Variables not marked `INTERNAL` show up in `ccmake` and `cmake-gui`, and variables not passed to `mark_as_advanced` show up to all users by default. We should use the `INTERNAL` cache type and `mark_as_advanced` wherever appropriate to limit our supported configuration interface.
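
The three kinds of cache entry discussed here can be sketched as follows. All `SKETCH_*` names are hypothetical and exist only for this example.

```cmake
cmake_minimum_required(VERSION 3.5)
project(CacheSketch C)
include(CheckIncludeFile)

# A slow configuration check: the result is stored in the cache, so
# later re-configuration runs skip the compile test.
check_include_file(unistd.h HAVE_UNISTD_H)

# A user-settable value that few people need: it stays in the cache
# but is hidden from the default ccmake / cmake-gui view.
set(SKETCH_EXTRA_FLAGS "" CACHE STRING "Extra flags (hypothetical)")
mark_as_advanced(SKETCH_EXTRA_FLAGS)

# A computed value persisted between runs that is never shown in the
# GUIs, so it is not part of the supported configuration interface.
set(SKETCH_DETECTED_HOST "${CMAKE_HOST_SYSTEM_NAME}"
    CACHE INTERNAL "Detected host system (hypothetical)")
```

Note that the cached check result is exactly the kind of value that can go stale: if the toolchain changes, `HAVE_UNISTD_H` keeps its old value until the cache entry is removed.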

### Making Sense of Runtime Builds
Right now, there are three different ways to build compiler-rt as part of LLVM and two different ways to build most of the other runtime libraries (libcxxabi, libcxx, libunwind, etc.). This situation is confusing even for longtime contributors.

We need a clearer story for building runtime libraries to reduce the number of different ways they are built and provide simplified workflows for users.

It is my opinion that if you are building a runtime library as part of an LLVM/Clang build, it should be configured and built with the in-tree clang as it would be for distribution. If you don't want to build with the in-tree clang, we should encourage people to build the runtime libraries independently of the compiler.

My reasoning for this is that distributions of clang are generally built from the default settings in the build and configuration process, and distributions (or installs by new users) which include the runtime libraries should have runtimes built with the just-built compiler. To align these two situations we need the default build configuration of LLVM+Clang+Runtimes to be using the just-built compiler.

Adopting this change would mean runtime library projects would only contain build system support for building "standalone", meaning not in the same configuration as LLVM. We would then support runtime libraries built as individual projects or using the LLVM runtimes directory, which separately configures and builds runtime libraries using the just-built clang.
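
As a sketch of what the runtimes-directory workflow looks like from the user's side, the cache file below assumes the existing `LLVM_ENABLE_PROJECTS` and `LLVM_ENABLE_RUNTIMES` options are the intended entry points. That is an assumption of this example, since the text above does not name specific options, and the file name `runtimes.cmake` is made up.

```cmake
# runtimes.cmake (hypothetical file name), used as:
#   cmake -G Ninja -C runtimes.cmake <path-to-llvm-source>

# Projects configured together with LLVM itself.
set(LLVM_ENABLE_PROJECTS "clang" CACHE STRING "")

# Runtime libraries configured separately, through the runtimes
# directory, and compiled with the just-built clang.
set(LLVM_ENABLE_RUNTIMES
    "compiler-rt;libcxx;libcxxabi;libunwind"
    CACHE STRING "")
```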
