<div dir="ltr">Thanks, Tamas.<br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 19, 2015 at 4:30 AM, Tamas Berghammer <span dir="ltr"><<a href="mailto:tberghammer@google.com" target="_blank">tberghammer@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">The expected flakey works a bit differently than you described:<div>* Run the test</div><div>* If it passes, it goes as a successful test and we are done</div><div>* Run the test again</div><div>* If it passes the 2nd time then record it as expected failure (IMO expected flakey would be a better result, but we don't have that category)</div></div></blockquote><div><br></div><div>I agree. I plan to add that category (I think I even have a bugzilla bug I created for myself on that). The intent would be to have a "pass flakey" and "fail flakey" end state for a run. How many times to run, and the entry/exit conditions for a run, are TBD. If we mark it right, and we know how many times we should be able to run it to have a single pass, we could really do this right.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>* If it fails 2 times in a row then record it as a failure because a flakey test should pass at least once in every 2 runs (it means we need ~95% success rate to keep the build bot green most of the time). If it isn't passing often enough for that then it should be marked as expected failure. This is done this way to detect the case when a flakey test gets broken completely by a new change.</div><div><br></div></div></blockquote><div><br></div><div>I see. Thanks. That totally explains what I was seeing.</div><div><br></div><div>Internally I have been using "unexpected success" as an actionable item, failing our testbots. 
The idea is that if something is supposed to fail and it is now passing, that indicates either (1) somebody fixed it with a change and didn't update the test, as an oversight, or (2) somebody made a change that shouldn't have fixed it, meaning the test logic is not testing something properly and the test should be updated.</div><div><br></div><div>That is kind of stymied by this type of test result, as unexpected success becomes a "sometimes meaningless" signal. And anything that is sometimes meaningless can make the meaningful ones get overlooked.</div><div><br></div><div>So I would actively like to move away from unexpected success containing a sometimes useful / sometimes not useful semantic. We should tackle that soon.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div><div>I checked some stats for TestRaise on the build bot, and under the current definition of expected flakey we shouldn't mark it as flakey because it will often fail 2 times in a row (its passing rate is ~50%), which will be reported as a failure, making the build bot red.</div><div> <br></div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div><div>I will send you the full stats from the last 100 builds in a separate off-list mail as it is too big for the mailing list. 
If somebody else is interested in it then let me know.</div><span class="HOEnZb"><font color="#888888"><div><br></div></font></span></div></blockquote><div><br></div><div>Thanks, Tamas!</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><span class="HOEnZb"><font color="#888888"><div></div><div>Tamas</div></font></span><div><div class="h5"><br><div class="gmail_quote"><div dir="ltr">On Sun, Oct 18, 2015 at 2:18 AM Todd Fiala <<a href="mailto:todd.fiala@gmail.com" target="_blank">todd.fiala@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Nope, no good either when I limit the flakey to DWO.<div><br></div><div>So perhaps I don't understand how the flakey marking works. I thought it meant:</div><div>* run the test. </div><div>* If it passes, it goes as a successful test. Then we're done.</div><div>* run the test again.</div><div>* If it passes, then we're done and mark it a successful test. If it fails, then mark it an expected failure.</div><div><br></div><div>But that's definitely not the behavior I'm seeing, as a flakey marking in the above scheme should never produce a failing test.</div><div><br></div><div>I'll have to revisit the flakey test marking to see what it's really doing since my understanding is clearly flawed!</div></div><div class="gmail_extra"></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Oct 17, 2015 at 5:57 PM, Todd Fiala <span dir="ltr"><<a href="mailto:todd.fiala@gmail.com" target="_blank">todd.fiala@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hmm, the flakey behavior may be specific to dwo. Testing it locally as unconditionally flaky on Linux is failing on dwarf. All the ones I see succeed are dwo. 
I wouldn't expect a diff there but that seems to be the case.<div><br></div><div>So, the request still stands but I won't be surprised if we find that dwo sometimes passes while dwarf doesn't (or at least not enough to get through the flakey setting).</div></div><div class="gmail_extra"><div><div><br><div class="gmail_quote">On Sat, Oct 17, 2015 at 4:57 PM, Todd Fiala <span dir="ltr"><<a href="mailto:todd.fiala@gmail.com" target="_blank">todd.fiala@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Tamas,<div><br></div><div>I think you grabbed me stats on failing tests in the past. Can you dig up the failure rate for TestRaise.py's test_restart_bug() variants on Ubuntu 14.04 x86_64? I'd like to mark it as flaky on Linux, since it is passing most of the time over here. But I want to see if that's valid across all Ubuntu 14.04 x86_64. (If it is passing some of the time, I'd prefer marking it flakey so that we don't see unexpected successes).</div><div><br></div><div>Thanks!</div><span><font color="#888888"><div><div><br></div>-- <br><div><div dir="ltr">-Todd</div></div>
</div></font></span></div>
</blockquote></div><br><br clear="all"><div><br></div></div></div><span><font color="#888888">-- <br><div><div dir="ltr">-Todd</div></div>
</font></span></div>
</blockquote></div><br><br clear="all"><div><br></div></div><div class="gmail_extra">-- <br><div><div dir="ltr">-Todd</div></div>
</div></blockquote></div></div></div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr">-Todd</div></div>
</div></div>