[cfe-dev] Can we remove llvmbb from IRC?

Tue Sep 1 18:12:58 PDT 2020

I assume you're getting emails in addition to the chat spam? Or are these
bots sending chat spam but not email? If that's the case, yeah, I'd rather
have a consistent notification experience - and disable all notifications
from a bot if some of its notifications are disabled (eg: if it's not good
enough to be sending email, then it shouldn't be spamming the IRC channel
either).

On Tue, Sep 1, 2020 at 1:20 PM Nico Weber <thakis at chromium.org> wrote:

> On Tue, Sep 1, 2020 at 3:57 PM David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Tue, Sep 1, 2020 at 12:42 PM Nico Weber <thakis at chromium.org> wrote:
>>
>>> On Tue, Sep 1, 2020 at 3:32 PM David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>> On Tue, Sep 1, 2020 at 12:07 PM Nico Weber via cfe-dev <
>>>> cfe-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> llvmbb's job is to inform people of build breaks. However, it seems to
>>>>> trigger for a big list of bots, and at least one of them seems to always be
>>>>> broken,
>>>>>
>>>>
>>>> If a bot is always broken it shouldn't be sending email/notifications -
>>>> generally they are configured only to send email on green>red and red>green
>>>> transitions, so if it's already broken you shouldn't be blamed for it. If
>>>> you are seeing bot spam or emails from a bot that's already red, please
>>>> email llvm-dev and the bot maintainer and ask the bot to be reconfigured or
>>>> disabled.
>>>>
>>>> If a bot is regularly flakey (& thus sending email/notifications that
>>>> are false-positives/that no one can act on) please also send email asking
>>>> for the bot to be reconfigured or disabled. (or, if you want to be a bit
>>>> more punchy - send a patch to the zorg repository to have the bot disabled
>>>> & explain why you're proposing that)
>>>>
>>>
>>> I agree with this in the abstract, but I get pinged completely reliably
>>> at least twice after every single one of my commits. This isn't something
>>> that sometimes happens, it's something that always happens.
>>>
>>
>> Could you point to specific buildbots/email when that comes up to help
>> improve things both on IRC and email/mailing lists, etc?
>>
>
> Just land a change :) Or look at IRC scrollback. Given how easy it is to
> find these problems, it doesn't seem like there's a lot of appetite for
> improving this.
>

I think there's appetite for changing it in some way - no one enjoys the
current state of things. But often people assume it's not changeable,
whereas I think it is - and I think it's important that it be changed
because if we silence all the bots, then quality is likely to go down.
Silencing the IRC bot may still be good - folks should be getting buildbot
fail email which is more targeted and not spamming the channel for people
who aren't to blame (heck, the bots could send private messages instead, I
guess?).

But improving signal/noise should benefit the email, and the bot spam
(whichever channel it's in).

> Hence me asking about removing llvmbb (...and so far everyone seems to be
> in favor).
>

> In this case, from my IRC scrollback (there's more people on the
> blamelist, spread over several follow-on IRC messages):
>
> build #13975 of clang-ppc64le-linux-multistage is complete: Failure
> [failed ninja check 1]  Build details are at
> http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13975
>  blamelist: LLVM GN Syncbot <llvmgnsyncbot at gmail.com>, Nico Weber <
> thakis at chromium.org>
>

That doesn't look like the "always be broken" case. It was green on the
build prior to this one (
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/13974
 )
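To make the notification rule described earlier in the thread concrete
(notify only on green>red and red>green transitions), here's a minimal
Python sketch. It assumes build results arrive as a chronological list of
(build number, passed) pairs; the function name and data shape are
hypothetical, not buildbot's actual API. Fed the
clang-ppc64le-linux-multistage builds discussed here (13974 green, 13975
and 13976 red, 13977 green), only 13975 and 13977 produce a notification.

```python
from typing import Iterable, List, Optional, Tuple


def notifications(results: Iterable[Tuple[int, bool]]) -> List[Tuple[int, str]]:
    """Return (build_number, event) pairs only where the bot's status flips.

    Hypothetical helper: `results` is a chronological sequence of
    (build_number, passed) pairs, not a real buildbot data structure.
    """
    events: List[Tuple[int, str]] = []
    previous: Optional[bool] = None  # status unknown before the first build
    for number, passed in results:
        # Notify on green>red ("broken") and red>green ("fixed") only;
        # a red build following a red build stays silent.
        if previous is not None and passed != previous:
            events.append((number, "fixed" if passed else "broken"))
        previous = passed
    return events


print(notifications([(13974, True), (13975, False), (13976, False), (13977, True)]))
# [(13975, 'broken'), (13977, 'fixed')]
```

The second red build (13976) stays silent, which is why a bot that's
already red shouldn't be blaming the next person to commit.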

Looks like the buildbot triggered correctly and only picked up the 2
revisions you committed. The test did pass at the prior revision and did
fail at that revision - perhaps either the buildbot or the test is flaky?
(Interestingly, the test failed in stage 1 at 13975, then failed in stage 2
at 13976, then passed again in 13977. Both failures were for the same reason:
"/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage2/tools/clang/test/Driver/Output/target-override.c.script:
line 5:
/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage2/tools/clang/test/Driver/Output/testbin/i386-clang:
No such file or directory" - perhaps some problem with creating the symlink?)

Started an llvm-dev thread to discuss that separately in more detail.

> build #24132 of clang-with-thin-lto-ubuntu is complete: Failure [failed
> test-stage1-compiler]  Build details are at
> http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24132
>  blamelist: Nico Weber <thakis at chromium.org>, Matt Arsenault <
> Matthew.Arsenault at amd.com>, Eric Astor <epastor at google.com>, Craig Topper
> <craig.topper at intel.com>, Alina
>

Also green on the prior build (
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24131 ).
Went green again after a revert, here:
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24140 -
the reverted commit matches the one that made the bot go red, so this looks
to be a bot doing what it's meant to do. (Bots vary in quality, and a 2 hour
cycle time isn't ideal by any means, though this one found the failure
within 5 minutes of starting the build - which could still be 2 hours after
the commit.)

What do you think we should do with bots like this? Should long cycle
time/long blame list bots (not always the same thing) produce no
notifications, and instead be triaged by the bot owner, who then manually
sends email/follow-up once they've made a rough guess at blame and checked
that the failure hasn't already been diagnosed, discussed and fixed thanks
to a faster bot or other means?

> build #2255 of lld-x86_64-win is complete: Failure [failed test-check-all]
>  Build details are at
> http://lab.llvm.org:8011/builders/lld-x86_64-win/builds/2255  blamelist:
> LLVM GN Syncbot <llvmgnsyncbot at gmail.com>, Eric Astor <epastor at google.com>,
> Craig Topper <craig.topper at intel.com>, Alina Sbirlea <asbirlea at google.com>,
> Nico Weber <thakis at chromium.org>, Amara
>

Also green on the prior build (
http://lab.llvm.org:8011/builders/lld-x86_64-win/builds/2254 ), and went
back to green on the following build.
Possibly this was related to the same commit/revert as in the previous bot
in this list. It's a fairly fast bot, went red on a build including the
revision that committed the xor issue, and green on the next build that
included a revert of that patch. I couldn't say for sure, though.

> I also got email with pointers to:
>
> http://green.lab.llvm.org/green//job/clang-stage1-RA/14180/consoleFull#-1417328700a1ca8a51-895e-46c6-af87-ce24fa4cd561
>

Was red for a few builds then green again here:
http://green.lab.llvm.org/green/job/clang-stage1-RA/14183/

Looks like the build that went red and the build that went green (& the
fact that the failure was related to libfuzzer) correlate well with this
commit:
https://github.com/llvm/llvm-project/commit/2665425908e00618074e42155ec922a37f7c9002
and this revert:
https://github.com/llvm/llvm-project/commit/7139736261e047e9cca030e2ee5912bf2a16f816

> Chances are that there's something genuinely broken somewhere (maybe
> compiler-rt?), but asking for concrete bots distracts from the point that
> there's something broken on every single commit, which makes the bot just
> let you know that you committed something in the last few hours.
>

They also contain information about failures - yeah, they might not be
yours, but they are often/usually someone's, not just flakey bot failures.
If you're suggesting all the bots are unactionable - then perhaps we should
turn off all notifications on all of them? I have certainly considered that
- and then only enabling bots that are fast/high signal-to-noise/small
blame list. Though I imagine that's a bigger discussion.
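A back-of-the-envelope Python sketch of the staggering idea from the quoted
message below (a 5-hour bot served by 5 machines that each pick up work
every hour). The model and both function names are hypothetical: builds
start at evenly spaced times, each build blames every commit landed since
the previous build started, and commits arrive at a steady rate. It shows
blame lists shrinking as machines are added, while the worst-case time from
commit to result still exceeds the build time itself.

```python
from typing import List, Sequence


def blamelist_sizes(commit_hours: Sequence[float], build_hours: float,
                    machines: int, horizon: float) -> List[int]:
    """Commits blamed by each build started within `horizon` hours (hypothetical model)."""
    interval = build_hours / machines  # gap between consecutive build starts
    sizes: List[int] = []
    n = 1
    while n * interval <= horizon:
        start = n * interval
        # A build blames every commit landed since the previous build started.
        sizes.append(sum(1 for t in commit_hours if start - interval < t <= start))
        n += 1
    return sizes


def worst_case_latency(build_hours: float, machines: int) -> float:
    """A commit that just misses a build waits for the next start, then the full build."""
    return build_hours / machines + build_hours


commits = [0.5 * i for i in range(1, 21)]  # one commit every 30 minutes for 10 hours
print(blamelist_sizes(commits, 5, 1, 10))  # one machine: [10, 10]
print(blamelist_sizes(commits, 5, 5, 10))  # five machines: ten builds blaming 2 commits each
print(worst_case_latency(5, 1), worst_case_latency(5, 5))  # 10.0 6.0
```

Staggering cuts each blame list from 10 commits to 2 in this toy model, but
a breakage is still reported at least 5 hours after it lands.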

> and the broken bots tend to have cycle times of several hours.
>>>>>
>>>>
>>>> Long cycle times are a real problem - that might be best left to
>>>> another discussion about buildbot maintenance - I would be for a policy
>>>> that says bot windows shouldn't be longer than, say, an hour or maybe less
>>>> (so, eg: if you have a bot that's just going to take 5 hours to run, then
>>>> you need 5 machines that each pick up work every hour, so the blame lists
>>>> are smaller). This doesn't solve the problem of being notified 5 hours later
>>>> about a breakage that was caused by someone else who committed a few
>>>> minutes before or after you. Solving that problem will require a much
>>>> greater investment in infrastructure to chain buildbots, possibly use built
>>>> artefacts from one buildbot to another, etc.
>>>>
>>>>
>>>>> So if you're on IRC and you commit something, you get pinged by llvmbb
>>>>> for hours afterwards.
>>>>>
>>>>> Does anyone think llvmbb is useful?
>>>>>
>>>>
>>>> I sometimes find it useful, but happy to move to llvm-build to get
>>>> those notifications. Other folks might not know to do that, though.
>>>>
>>>>
>>>>> The best thing about llvmbb I've heard is that it's easy to just "/ignore
>>>>> llvmbb", but if that's what everybody does then why have it in the
>>>>> first place?
>>>>>
>>>>> Nico
>>>>> _______________________________________________
>>>>> cfe-dev mailing list
>>>>> cfe-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>
>>>>