[PATCH] D114325: Add a best practice section on how to configure a fast builder

Sun Nov 21 12:45:29 PST 2021

reames marked 7 inline comments as done.
reames added inline comments.

================
Comment at: llvm/docs/HowToAddABuilder.rst:198
+  generally provides a good balance between build times and bug detection for
+  most buildbots.
+
----------------
mehdi_amini wrote:
> rengolin wrote:
> > `RelWithDebInfo` is perhaps even more helpful, because you test the optimisation pipeline, get smaller objects to link, and still, in case of stack traces, you can see from the logs directly where to begin looking.
> `RelWithDebInfo` seems a bit heavy to me: the objects get ~10x larger, IIRC.
> If what you're after is better stack traces in case of crashes, then `-gmlt` (line tables only) gets you that without blowing up the disk size / link time.
I think I managed to address this with the revised wording. Let me know if further tweaking is warranted.
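
For reference, here is a rough sketch of the three configurations being compared in this thread. The helper name `cmake_args` and its `debug_info` parameter are made up for illustration and are not part of any LLVM or buildbot API. `CMAKE_BUILD_TYPE`, `CMAKE_C_FLAGS`, and `CMAKE_CXX_FLAGS` are standard CMake variables, and `-gmlt` assumes a Clang host compiler.

```python
def cmake_args(debug_info="none"):
    """Return CMake arguments for an optimized build (hypothetical helper).

    debug_info:
      "none"        - plain Release build
      "line-tables" - Release plus -gmlt (line tables only)
      "full"        - RelWithDebInfo (full debug info)
    """
    if debug_info == "full":
        # Full debug info: best stack traces, but much larger objects.
        return ["-DCMAKE_BUILD_TYPE=RelWithDebInfo"]

    args = ["-DCMAKE_BUILD_TYPE=Release"]
    if debug_info == "line-tables":
        # Line tables only: file/line in stack traces at a lower cost.
        # Assumes the host compiler is Clang, which accepts -gmlt.
        args += ["-DCMAKE_C_FLAGS=-gmlt", "-DCMAKE_CXX_FLAGS=-gmlt"]
    elif debug_info != "none":
        raise ValueError(f"unknown debug_info: {debug_info!r}")
    return args


# Example: the middle-ground option suggested in the review.
print(" ".join(cmake_args("line-tables")))
```

Which of the three is right for a given bot depends on its disk and link-time budget; the sketch only spells out the options, it does not recommend one.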

================
Comment at: llvm/docs/HowToAddABuilder.rst:222
+  well, and that having local per-worker caches gets most of the benefit
+  anyways.  We don't currently recommend shared caches.
+
----------------
rengolin wrote:
> Absolutely! Unless you can control everything on all users' environments (toolchain, libs, OS, CMake config), you can't have a generic cache shared amongst builders.
> 
> It is possible to have shared builds cache, but the management of such a cache is non-trivial and not the goal of our buildbot infrastructure.
> 
> Of course, bot owners can, if they want, set up such a cache, but it's an exercise to the reader, not the general recommendation.
I don't understand your comment.  The text you replied to was specific to a cache shared between the workers of a single builder.  Did you perhaps read it as having a broader scope?

================
Comment at: llvm/docs/HowToAddABuilder.rst:235
+  impacting the broader community.  The sponsoring organization simply
+  has to take on the responsibility of all bisection and triage.
+
----------------
mehdi_amini wrote:
> rengolin wrote:
> > mehdi_amini wrote:
> > > For this to be effective, we're missing (I think) the ability to get notified on specific emails for builders that are on staging.
> > > 
> > > 
> > Not really. The staging builder's sole purpose is to make sure we *don't* get notified, so the bot and target owners can work in the background, before going public with a noisy infrastructure.
> > 
> > The only thing we should do in the staging master is to notify the bot owner, but even that is probably redundant, as the bots only remain in the staging master while they can't be on the production one, so bot owners will be actively looking at them, trying to make them green as soon as possible.
> > 
> > In that case, they will probably see the failure before the email hits their inbox.
> > 
> > What I mean is that it would be an anti-pattern to turn on notifications here, because bot owners may be encouraged to let their bots stay on the staging master for longer than they should, because they have some level of notification.
> > 
> > This is bad because we'll end up with a tiered quality infrastructure, where some bots warn some people, while others warn more people, or no people. This will make it harder for a random developer in the community to know what they have broken, and the idea that "all bots should be green" collapses.
> > Not really. 
> 
> It seems to me that you missed the point of this section: it explicitly talks about the possibility that "builders can run indefinitely on the staging buildmaster", as well as "The sponsoring organization simply has to take on the responsibility of all bisection and triage".
> 
> > This is bad because we'll end up with a tiered quality infrastructure, where some bots warn some people, while others warn more people, or no people. 
> 
> This already exists. There are many bots that are just not on buildbots, for example https://buildkite.com/llvm-project/llvm-main/builds?branch=main or https://buildkite.com/mlir/mlir-core/builds?branch=main 
> We could also mention Chromium bots, or other people that build downstream for their own reasons.
> 
> The only question is whether this "second tier" should be hosted by our infrastructure (could be a staging bot, could be a silent mode on the production buildbot) or whether you would prefer that these folks stay on their own infra (which does not change much in practice: I'm reverting patches that break my buildkite bots...).
> 
> 
> Finally, I have had bots on staging for a couple of months, and haven't been confident enough to migrate them to prod (busy with other stuff), and I don't spend much time monitoring them. In the absence of email on failures I can't build the confidence to migrate them, so they sit there for now... I would very much appreciate being notified on these bots instead of having to poll every day to figure out if they are stable enough!
> 
The capability to have a bot which notifies only the maintainer has come up multiple times.  I think we definitely need to document how to do that.  I believe we already can, but checking my notes, I didn't write down how.  I will pay attention to this in the next round of conversation and see whether I can figure out how to do it with the current infrastructure.

I vaguely remember it being a mode on the main buildmaster, not staging, but I will have to confirm.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114325/new/

https://reviews.llvm.org/D114325