<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Wed, Oct 7, 2015 at 3:09 PM Renato Golin <<a href="mailto:renato.golin@linaro.org">renato.golin@linaro.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 7 October 2015 at 22:44, Eric Christopher <<a href="mailto:echristo@gmail.com" target="_blank">echristo@gmail.com</a>> wrote:<br>

> I think this is a poor analogy. You're also ignoring the solution I gave you<br>

> in my previous mail for slow bots.<br>

<br>

I'm not ignoring it, I'm acting upon it. But it takes time. I don't<br>

have infinite resources.<br>

<br></blockquote><div><br></div><div>Of course, it just seemed like you were ignoring it as a (partial/full) solution.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

> If you can't give some basic stability guarantees then the bot<br>

> is only harming the entire testing infrastructure.<br>

<br>

Define stability. Daniel was talking about "things I can act upon".<br>

That's so vague it means nothing. "Basic stability guarantees" is on a<br>

similar gist.<br>

<br></blockquote><div><br></div><div>Basic stability guarantee:</div><div>"Only returns failure for failures due to the compiler or the occasional exception"</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Any universal rule you try to make will either be too lax for fast and<br>

reliable bots, or too hard on slow and less used bots.<br>

<br></blockquote><div><br></div><div>I don't know how fast/slow comes into this. See Chris's mail for more comments on this. I think you're concentrating too hard on this particular axis to the detriment of the discussion. I think a better way is to look at it as a "signal to noise" ratio.</div><div><br></div><div>If the bot is correctly identifying problems, yet mostly staying green, then it has good signal and is useful.</div><div><br></div><div>If it's mostly red due to:</div><div>a) instability (exceptions, timeouts, what have you), or </div><div>b) no one looking at the failures, or</div><div>c) being unable to complete fast enough to deal with the transient red in top of tree</div><div><br></div><div>then it isn't providing a lot of signal.</div><div><br></div><div>This is my general guideline on how bots should go. A description of what's going on with your bots and how they fit into these buckets would be good to have. Other sets of bots may fall into different buckets.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

That's what I'm finding hard to understand. All you guys are saying is<br>

that things are bad and need to get better. I agree completely. But<br>

your solution is to turn off everything you don't understand or assume<br>

it's flaky, and that's just wrong.<br>

<br>

We had two flaky bots: Pandas and a Juno. Pandas were disabled, the<br>

Juno was fixed. Some of our bots, however, are still slow, and we have<br>

been asked to disable them because they were red for too long.<br>

<br></blockquote><div><br></div><div>Are they red because the tree is red over their run lifetime or red because there are problems that aren't being fixed?</div><div><br></div><div>If it's the former then they might truly be too slow to be enabled right now as public bots. When (I hope it's a when) we move to a staged bot infrastructure they can be re-enabled as things that send email and bug people when they fail. If it's the latter then we need to figure out how to get problems identified and fixed in a more rapid fashion.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Most of the problems we find are bad tests from people that didn't<br>

(obviously) test on ARM. The second most common is code that doesn't<br>

take into account 32-bit platforms. The third most common breakage<br>

is the sanitizer tests, which pop in and out on many platforms. The<br>

most common long breakage is due to self-hosted Clang breaking and<br>

making it hard to find what commit to revert or even warn the<br>

developer.<br>

<br>

None of those are due to instability of my buildbots. But I got<br>

shouted at many times to disable the bot because it was "red for too<br>

long". I find this behaviour disrespectful.<br>

<br></blockquote><div><br></div><div>Seems reasonable. If you're getting actual failures then that's reasonable. But if you're not trying to get them fixed, by providing testcases or helping people reproduce the problem in a form they can see, then it may send the message that since the owner doesn't care, no one does :)</div><div><br></div><div>Again, I'm not saying this is what's going on with your bots in particular, just describing a general case.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I'm now trying to get 8 more ARM boards and 3 AArch64, and I plan to<br>

put them as redundant builders. But it takes time. Weeks to make them<br>

work reliably, more weeks to make sure they won't fall under pressure,<br>

more weeks to put in production and stabilise. Meanwhile, I'd<br>

appreciate if people stopped trying to kill the others.<br><br></blockquote><div><br></div><div>Honestly, I'm not sure redundant builders are the solution here; I think the phased system is. Basically, more noise (e.g. they're all going to fail) isn't going to help. That said, if they help you reduce the time to find problems then that's great.</div><div><br></div><div>Hope this explains my position on how the bots should work. I definitely think we need a phased scheme, and I was hoping to hear some sort of scheduling or transition idea from Chris. I have no idea what kind of time he's got for this sort of thing. If it's documentation on how to move a set of bots over to the phased builder, that would be an amazing help to the community in general :)</div><div><br></div><div>Thanks!</div><div><br></div><div>-eric</div></div></div>