[llvm-dev] buildbot failure in LLVM on clang-native-arm-cortex-a9

Thu Aug 27 15:25:19 PDT 2015

Hello everyone,

Thanks for the discussion.

There are 2 threads here:

1. Why the builders on the Panda boards are failing in a way that makes the
bots look flaky.

First of all, the failures are consistent and are not related to tests.
The compilation of certain files stalls for longer than 20 minutes.
The problem looks valid to me. I'm researching the cause. So far it looks
like it takes more than 1GB of memory to compile some units. In particular,
I see this with ASTContext.cpp and ASTMatchersInternal.cpp. There may be
more such files; I'll keep researching.

Anyway, this issue has nothing to do with how exactly the build gets
orchestrated, i.e. cmake + ninja would show the same stall as the
currently used autoconf + make.

I'm still researching and will report the exact findings as soon as I
finalize them.

A big part of the false sense of flakiness comes from incremental builds.
Some problems remain hidden for a long time and are then often exposed by
some random event or commit, triggering "I'm so annoyed" discussions.

I will re-evaluate the need for incremental builds and will try clean builds
to see how many commits would get batched together. If that is still
reasonable, I'll switch the Cortex-A9 bots to clean builds.

For now, I'm taking these bots down.

2. What to do with bots which are "noisy".

First of all, I'm still in the middle of reading the thread. :)

In general, I'm with Renato on this. It should not be easy to shut down an
annoying bot just because it is not obvious at the moment why it is not
happy. Bugging the owner is fine. I spend quite some time watching the
quality of the bots and communicating with the owners. If you are an owner,
you know this.

Thanks

Galina

On Thu, Aug 27, 2015 at 7:10 AM, Renato Golin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On 27 August 2015 at 13:58, Daniel Sanders <Daniel.Sanders at imgtec.com>
> wrote:
> > I agree with the principle but 2 days feels a bit short to me since,
> accounting for time zone differences, it's closer to 1 working day. For
> example, an email sent at 9am PDT arrives at 5pm BST and (assuming normal
> working hours) might be read at 9am BST (1am PDT). Daylight savings can
> also make a difference since timezones that use it don't agree on when it's
> in effect. The owner taking a single day off is easily sufficient to go
> past the 2 day limit.
>
> Yeah, that's my feeling, too. But as Philip said, the specifics should
> be discussed on a proper RFC thread; at least we all agree on some
> threshold being defined.
>
>
> > However, the main comment I wanted to make is that it would be useful to
> be able to tell whether the buildmaster has picked up changes or not. I
> understand that many changes are automatically applied without a
> buildmaster restart but at the moment it can be difficult to tell when this
> happens.
>
> If the bot owner changes the master and restarts the slave, the old
> master will show the slave as offline and the new one as online. As
> long as it stops sending emails, that's most of the problem dealt
> with. But that leaves a trail of unfinished builds and bloats the
> master by collecting commits for a build that will never happen. We
> may have to change the logic to stop collecting commits for offline
> bots.
>
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>