[PATCH] D114325: Add a best practice section on how to configure a fast builder

Renato Golin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Nov 21 05:57:20 PST 2021


rengolin added a comment.

Overall looks good, thanks for writing this!

I have a few comments inline, mostly from my own experience and from a while ago, so take them with a grain of salt.



================
Comment at: llvm/docs/HowToAddABuilder.rst:194
+  may be better off with two separate bots.  Splitting increases resource
+  consumption, but makes it easy for each bot to keep up with commit flow.  
+
----------------
It also makes it easier to identify what's broken before you delve into the logs. If the number of commits is low, the build is restricted to a part of the compiler, and other bots on the same target pass, then it stands to reason that the failure lies in the intersection of { commit, target, feature }.

When a build covers more than one commit (in cases where it's too hard to make bots faster), this greatly reduces the work of bisecting and identifying the offending commit.

I have done this with the Arm bots in their early stages and it has helped me a lot.


================
Comment at: llvm/docs/HowToAddABuilder.rst:198
+  generally provides a good balance between build times and bug detection for
+  most buildbots.
+
----------------
`RelWithDebInfo` is perhaps even more helpful: you test the optimisation pipeline, get smaller objects to link, and, in the case of stack traces, you can still see directly from the logs where to begin looking.
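For reference, a configure line for that build type might look like the sketch below (these are the standard LLVM CMake options; adjust paths and targets to your bot):

```shell
# Configure an LLVM build with optimisations plus debug info.
# RelWithDebInfo keeps binaries small enough to link quickly while
# still producing usable stack traces in the bot logs.
cmake -G Ninja \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  ../llvm
```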


================
Comment at: llvm/docs/HowToAddABuilder.rst:202
+  Ninja really does help build times over Make, particularly for highly
+  parallel builds.  LLD helps to reduce link times significantly.  With
+  a build machine with sufficient parallelism, link times tend to dominate
----------------
Most importantly, LLD reduces memory consumption, sometimes by orders of magnitude.

On targets with small memory this is paramount to faster builds, sometimes to finishing a build at all.

But even on targets with more memory, this also speeds up the linking process considerably.
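Enabling LLD for the bot's own build is a one-flag change at configure time (this uses the regular LLVM CMake option; it assumes an lld binary is already installed on the worker):

```shell
# Link the LLVM build itself with lld instead of the system linker,
# cutting both link time and peak memory during linking.
cmake -G Ninja \
  -DLLVM_ENABLE_LLD=ON \
  ../llvm
```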


================
Comment at: llvm/docs/HowToAddABuilder.rst:211
+  use incremental builds and instead use ccache as the latter captures the
+  majority of the benefit with less risk of false positives.
+
----------------
This is true; however, on more limited targets (like our initial Cortex-A9 targets back in the day), even ccache wasn't enough to speed up our builds, mostly because autoconf+make wasn't smart enough, IIRC.

It's possible that CMake+Ninja plays well enough with ccache to make incremental builds obsolete, but I have not done the testing to confirm that.
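For completeness, hooking ccache into a CMake+Ninja build is straightforward; the first flag below is LLVM's own convenience option, the two launcher variables are the generic CMake mechanism (either approach should work):

```shell
# Option 1: LLVM's built-in ccache support.
cmake -G Ninja -DLLVM_CCACHE_BUILD=ON ../llvm

# Option 2: generic CMake compiler launchers, usable with any project.
cmake -G Ninja \
  -DCMAKE_C_COMPILER_LAUNCHER=ccache \
  -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
  ../llvm
```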


================
Comment at: llvm/docs/HowToAddABuilder.rst:217
+  (e.g. doc changes, python utility changes, etc..), the build will entirely
+  hit in cache and the build request will complete in just the testing time.
+
----------------
This is true for fast disks (HDD, SSD, or better). On targets that build on slower storage interfaces (MMC, SD, NFS), multiple parallel accesses to the cache storage can thrash performance to the point that it dominates the critical path.

In those cases, incremental builds are *much* faster, because the unchanged compilations never get triggered in the first place.


================
Comment at: llvm/docs/HowToAddABuilder.rst:222
+  well, and that having local per-worker caches gets most of the benefit
+  anyways.  We don't currently recommend shared caches.
+
----------------
Absolutely! Unless you can control everything in all users' environments (toolchain, libs, OS, CMake config), you can't have a generic cache shared amongst builders.

It is possible to have a shared build cache, but managing such a cache is non-trivial and not the goal of our buildbot infrastructure.

Of course, bot owners can, if they want, set up such a cache, but it's an exercise to the reader, not the general recommendation.
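A per-worker local cache, by contrast, needs almost no management. A hypothetical setup might look like this (the directory path is an assumption; the environment variable and size flag are standard ccache):

```shell
# Hypothetical per-worker setup: each buildbot worker keeps its own
# local ccache directory, so nothing is shared across machines or configs.
export CCACHE_DIR="$HOME/buildbot-worker/.ccache"
# Cap the cache size so it doesn't exhaust the worker's disk.
ccache -M 20G
```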


================
Comment at: llvm/docs/HowToAddABuilder.rst:235
+  impacting the broader community.  The sponsoring organization simply
+  has to take on the responsibility of all bisection and triage.
+
----------------
mehdi_amini wrote:
> For this to be effective, we're missing (I think) the ability to get notified on specific emails for builders that are on staging.
> 
> 
Not really. The staging builder's sole purpose is to make sure we *don't* get notified, so that bot and target owners can work in the background before going public with a noisy infrastructure.

The only thing we should do on the staging master is notify the bot owner, but even that is probably redundant: bots only remain on the staging master while they can't be on the production one, so bot owners will be actively watching them, trying to make them green as soon as possible.

In that case, they will probably see the failure before the email hits their inbox.

What I mean is that turning on notifications here would be an anti-pattern, because bot owners may be encouraged to let their bots stay on the staging master longer than they should, since they already get some level of notification.

This is bad because we'd end up with a tiered-quality infrastructure, where some bots warn some people, while others warn more people, or no one at all. That would make it harder for a random developer in the community to know whether they broke something, and the idea that "all bots should be green" collapses.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114325/new/

https://reviews.llvm.org/D114325
