[LLVMdev] buildbot failure in LLVM on clang-native-arm-cortex-a9
alp at nuanti.com
Sat Jan 4 16:03:05 PST 2014
On 04/01/2014 19:21, Renato Golin wrote:
> On 4 January 2014 18:57, Alp Toker <alp at nuanti.com> wrote:
> > Would it be possible to skip sending mail on
> > hardware/OS/out-of-disk messages?
> > I imagine this is just a matter of checking the process exit code
> > from the build system: 0 for success, 1 for build failure that
> > sends notifications, everything else is an admin problem.
> No, exit codes don't tell the whole story. One would have to grep for
> specific messages like "disk full" or "not reproducible".
> > If the script in use has no code owner, I'll appreciate a pointer
> > to what's sending the mails and I'll see if someone can look into
> > it and submit a patch.
> I have no idea where this code is, or who is responsible.
So, I did some digging:
zorg/buildbot/commands/StandardizedTest.py has logic that converts logs
and status reports into actionable test results.
> > We should be more proactive and disable noisy build servers until
> > a technical solution is available rather than the other way round,
> > given how they drown out real problems.
> It's not that simple. The ARM boards we have been using are all
> development boards, built with the quality you'd expect from
> evaluation hardware. The only production hardware you can find with an
> ARM chip inside are mobile phones, tablets and the Samsung Chromebook
> (which we use at Linaro), but they are not fit for being servers by a
> long shot. The only server-grade ARM hardware, Calxeda, went bankrupt
> last month. :(
> Unfortunately, those bots are our only solution for now, and we'll
> have to keep them running the best we can. We must fix the problem
> (grep on errors, and all the other things we discussed last week), not
> turn off the only buildbots we have.
I didn't realise these bots were the last line of defence for ARM
support! In that case let's keep them in commission and focus on the
grep fix you suggest. Agree that stderr is a more practical informant
than exit codes.
The most spammy patterns are predictable and relate to SVN outage,
network failures, out-of-disk-space and non-deterministic results
presumably related to the hardware flakiness you described. Those should
only be sent to the device admins and maybe the module owner, never
individual committers to whom they're unactionable.
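
To make the grep idea concrete, here is a rough Python sketch of how the
routing could look. This is not the existing zorg code: the patterns and the
names classify_failure and recipients are made up for illustration, and the
real message strings would have to be collected from actual bot logs.

```python
import re

# Illustrative patterns only (assumptions, not taken from real bot logs).
# Each entry maps a hypothetical category name to a regex over the build log.
INFRA_PATTERNS = [
    ("out-of-disk", re.compile(r"disk full|no space left on device", re.IGNORECASE)),
    ("network", re.compile(r"connection (timed out|refused|reset)", re.IGNORECASE)),
    ("svn-outage", re.compile(r"^svn: .*unable to connect", re.IGNORECASE | re.MULTILINE)),
    ("flaky", re.compile(r"not reproducible", re.IGNORECASE)),
]


def classify_failure(log_text):
    """Return the first matching infrastructure category, or None."""
    for name, pattern in INFRA_PATTERNS:
        if pattern.search(log_text):
            return name
    return None


def recipients(log_text, committers, admins):
    """Send infrastructure failures to admins only, everything else to committers."""
    if classify_failure(log_text) is not None:
        return list(admins)
    return list(committers)
```

With something like this, a log containing "No space left on device" would
go to the admins list only, while an ordinary compile error would still reach
the committers on the blamelist.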
Think we have a handle on this now but a "pong, XXX owns this module"
would be appreciated from anyone in the know.