[PATCH] D114325: Add a best practice section on how to configure a fast builder

Mon Nov 22 10:56:07 PST 2021

rengolin added inline comments.

================
Comment at: llvm/docs/HowToAddABuilder.rst:154-155
+
+As mentioned above, we generally have a strong preference for
+builders which can build every commit as they come in.  This section
+includes best practices and some recommendations as to how to achieve
----------------
dblaikie wrote:
> do we have any builders that achieve this consistently (I wouldn't think so, given the resources required)? Maybe worth rephrasing if it's  not actually achievable/achieved generally to something more in line with the practical reality?
> 
> If this document is more aspirational/trying to set a fairly new (albeit good, but perhaps not feasible?) direction - maybe it'd be more suitable in a different form/forum?
I don't think we have many, if any, but I read it as a "preference" and "best practices", not as a statement that we don't accept others. I agree we shouldn't discourage people from setting up buildbots if they can't follow these guidelines.

================
Comment at: llvm/docs/HowToAddABuilder.rst:212-216
+  Using ccache materially improves average build times.  Incremental builds
+  can be slightly faster, but introduce the risk of build corruption due to
+  e.g. state changes, etc...  At this point, the recommendation is not to
+  use incremental builds and instead use ccache as the latter captures the
+  majority of the benefit with less risk of false positives.
----------------
dblaikie wrote:
> Seems like we should figure out how to make incremental builds more reliable - to benefit developers (& then have buildbots using incremental builds to ensure they do keep working so developers can benefit from them being reliable). But, yeah, if it's just not practical today, so be it.
For a number of years I used incremental builds on Arm with very little trouble. I had to clean the build directory perhaps a couple of times a year when something (that I don't remember) happened, but otherwise it was way better than full builds and ccache (due to using SSD or USB2 disks on dev boards).

================
Comment at: llvm/docs/HowToAddABuilder.rst:224-227
+  With multiple workers, it is tempting to try to configure a shared cache
+  between the workers.  Experience to date indicates this is difficult to
+  well, and that having local per-worker caches gets most of the benefit
+  anyways.  We don't currently recommend shared caches.
----------------
dblaikie wrote:
> Is this about multiple workers on the same machine, or some kind of network shared cache? Presumably if we're suggesting people have multiple workers per builder (to get fast enough cycle time/short enough blame list) - that's multiple machines (since generally we could get enough parallelism to saturate a machine in the build - I guess not all the time, so maybe there's some parallelism benefit to multiple workers on the same machine?)
I interpreted it as a network cache. I suppose it could be the same machine, too, though it would use the cache in a similar way if you use containers, for example.

Low-memory machines have to restrict link parallelism, so running two builds at the same time could still OOM-kill builds. High-memory machines (using LLD in release mode) have a linking phase fast enough that multiple builds tend not to help much. GCC builds used to be much less parallel than LLVM, so it worked well for them.
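
To make the low-memory constraint concrete, here is a rough sketch of how one might pick a link-job cap (for example, a value to pass to LLVM_PARALLEL_LINK_JOBS). The function name and the 4 GB-per-link default are assumptions for illustration only, not measured figures; real link memory use depends on the build type and the linker.

```python
import os


def max_parallel_links(total_ram_gb, ram_per_link_gb=4.0, cpu_count=None):
    """Estimate how many link jobs can run at once without exhausting RAM.

    ram_per_link_gb is an assumed budget per link job, not a measured value.
    """
    if ram_per_link_gb <= 0:
        raise ValueError("ram_per_link_gb must be positive")
    # Fall back to the host CPU count (or 1) when no count is given.
    cpus = cpu_count or os.cpu_count() or 1
    # Number of link jobs that fit in memory under the assumed budget.
    by_memory = int(total_ram_gb // ram_per_link_gb)
    # Never exceed the CPU count, and always allow at least one link job.
    return max(1, min(cpus, by_memory))
```

With these assumptions, an 8 GB board with 16 cores would be capped at two link jobs, so a second concurrent build on the same machine would have to share that budget.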

For a while, for Arm64, we didn't have a lot of machines, so we put multiple (different) builders on the same machine, but that couldn't use the same cache anyway.

================
Comment at: llvm/docs/HowToAddABuilder.rst:236-238
+  As a last resort, you can configure your builder to batch build requests.
+  This makes the build failure notifications markedly less actionable, and
+  should only be done once all other reasonable measures have been taken.
----------------
dblaikie wrote:
> That's the default/what most (all?) of the buildbots are doing today, though, yeah?
Yeah, I hadn't seen it that way, but I guess you're right.

Perhaps we should make clear that this is an aspiration, not a strong recommendation.

================
Comment at: llvm/docs/HowToAddABuilder.rst:211
+  use incremental builds and instead use ccache as the latter captures the
+  majority of the benefit with less risk of false positives.
+
----------------
dblaikie wrote:
> rengolin wrote:
> > This is true, however in the more limited targets (like our initial Cortex A9 targets back in the day), even ccache wasn't enough to speed up our builds, mostly because autoconf+make wasn't smart enough, IIRC.
> > 
> > It's possible that CMake+ninja plays well enough with ccache that makes incremental builds obsolete, but I have not done the proper testing to confirm that.
> what smartness was lacking from autoconf+make that meant ccache was ineffective in this situation? (I thought the point of ccache was that it didn't matter how the build system worked, basically)
I really can't remember; this was 8 years ago. But I vaguely remember that when I moved to CMake+ninja, the issues I had were gone.

Every time I set up ccache I hit some kind of problem that needs a full rebuild. I'm pretty sure that's just because I don't know how to set up ccache correctly, so mostly this should be filed under "maybe ccache isn't as trivial as hoped" or "Renato isn't smart (or patient) enough to set up ccache correctly". :)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114325/new/

https://reviews.llvm.org/D114325