<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Oct 19, 2015, at 4:40 PM, Zachary Turner via lldb-dev <<a href="mailto:lldb-dev@lists.llvm.org" class="">lldb-dev@lists.llvm.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Yea, I definitely agree with you there.  <div class=""><br class=""></div><div class="">Is this going to end up with an @expectedFlakeyWindows, @expectedFlakeyLinux, @expectedFlakeyDarwin, @expectedFlakeyAndroid, @expectedFlakeyFreeBSD?</div><div class=""><br class=""></div><div class="">It's starting to get a little crazy. At some point I think we just need something that we can use like this:</div><div class=""><br class=""></div><div class="">@test_status(status=flaky, host=[win, linux, android, darwin, bsd], target=[win, linux, android, darwin, bsd]<span style="line-height:1.5" class="">, compiler=[gcc, clang], debug_info=[dsym, dwarf, dwo])</span></div></div><br class=""></div></blockquote><div><br class=""></div><div>I think this was part of the initial intent in making the categories feature: you would be able to mark tests with any number of “tags” in the form of categories, and then skip or execute only the tests that carry certain tag(s).</div><div><br class=""></div><div>With that said, the feature as it stands:</div><div><span class="Apple-tab-span" style="white-space:pre">    </span>- does not support different categories for methods in a class</div><div><span class="Apple-tab-span" style="white-space:pre">  </span>- does not allow any more complex logic than “is this category present on this test?”</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- requires manual definition of all categories (e.g. 
“xfail” cross-product “platform” should be auto-generable)</div><div><br class=""></div><div>We could extend the categories system to fix all of these issues, and then you could just mark tests with categories instead of attributes. Then you would only have one attribute that would be like</div><div><br class=""></div><div>@lldbtest.categories(“win-flakey”, “linux-xfail”, “dsym”)</div><div>def test_stuff(self):</div><div>  …</div><br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_quote"><div dir="ltr" class="">On Mon, Oct 19, 2015 at 4:35 PM Todd Fiala <<a href="mailto:todd.fiala@gmail.com" class="">todd.fiala@gmail.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="">My initial proposal was an attempt to not entirely skip running them on our end and still get them to generate actionable signals without conflating them with unexpected successes (which they absolutely are not in a semantic way).</div><div class="gmail_extra"></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Mon, Oct 19, 2015 at 4:33 PM, Todd Fiala <span dir="ltr" class=""><<a href="mailto:todd.fiala@gmail.com" target="_blank" class="">todd.fiala@gmail.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="">Nope, I have no issue with what you said.  We don't want to run them over here at all because we don't see enough useful info come out of them.  
You need time series data for that to be somewhat useful, and even then it is only useful if you see a sharp change in it after a specific change.<div class=""><br class=""></div><div class="">So I really don't want to be running flaky tests at all as their signals are not useful on a per-run basis.</div></div><div class="gmail_extra"><div class=""><div class=""><br class=""><div class="gmail_quote">On Mon, Oct 19, 2015 at 4:16 PM, Zachary Turner <span dir="ltr" class=""><<a href="mailto:zturner@google.com" target="_blank" class="">zturner@google.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="">Don't get me wrong, I like the idea of running flakey tests a couple of times and seeing if one passes (Chromium does this too, so it's not without precedent).  If I sounded harsh, it's because I *want* to be harsh on flaky tests.  Flaky tests indicate literally the *worst* kind of bugs because you don't even know what kind of problems they're causing in the wild, so by increasing the amount of pain they cause people (test suite running longer, etc.) the hope is that it will motivate someone to fix them.  </div><div class=""><div class=""><br class=""><div class="gmail_quote"><div dir="ltr" class="">On Mon, Oct 19, 2015 at 4:04 PM Todd Fiala <<a href="mailto:todd.fiala@gmail.com" target="_blank" class="">todd.fiala@gmail.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="">Okay, so I'm not a fan of the flaky tests myself, nor of test suites taking longer to run than needed.<div class=""><br class=""></div><div class="">Enrico is going to add a new 'flakey' category to the test categorization.</div><div class=""><br class=""></div><div class="">Scratch all the other complexity I offered up.  
What we're going to ask is: if a test is flakey, please add it to the 'flakey' category.  We won't do anything different with the category by default, so everyone will still get flakey tests running in the same manner they do now.  However, on our test runners, we will be disabling the category entirely using the skipCategories mechanism since those tests are generating too much noise.</div><div class=""><br class=""></div><div class="">We may need to add a per-test-method category mechanism, since right now our only mechanisms to add categories are to (1) add a dot-file to the directory to have everything in it get tagged with a category, or (2) override the categorization via the TestCase getCategories() mechanism.</div><div class=""><br class=""></div><div class="">-Todd</div></div><div class="gmail_extra"></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Mon, Oct 19, 2015 at 1:03 PM, Zachary Turner <span dir="ltr" class=""><<a href="mailto:zturner@google.com" target="_blank" class="">zturner@google.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><br class=""><br class=""><div class="gmail_quote"><span class=""><div dir="ltr" class="">On Mon, Oct 19, 2015 at 12:50 PM Todd Fiala via lldb-dev <<a href="mailto:lldb-dev@lists.llvm.org" target="_blank" class="">lldb-dev@lists.llvm.org</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="">Hi all,<div class=""><br class=""></div><div class="">I'd like unexpected successes (i.e. tests marked as expected failure that in fact pass) to retain the actionable meaning that something is wrong.  
The wrong part is that either (1) the test now passes consistently and the author of the fix just missed updating the test definition (or perhaps was unaware of the test), or (2) the test is not covering the condition it is testing completely, and some change to the code just happened to make the test pass (due to the test not being comprehensive enough).  Either of those requires some sort of adjustment by the developers.</div></div></blockquote></span><div class="">I'd add #3.  The test is actually flaky but is tagged incorrectly.</div><span class=""><div class=""> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div class=""><br class=""></div><div class="">We have a category of test known as "flaky" or "flakey" (both are valid spellings, for those who care: <a href="http://www.merriam-webster.com/dictionary/flaky" target="_blank" class="">http://www.merriam-webster.com/dictionary/flaky</a>, although flaky is considered the primary spelling).  Flaky tests are tests that we can't get to pass 100% of the time.  This might be because it is extremely difficult to write the test so that it passes reliably and doing so is deemed not worth the effort, or because it is a condition that is just not going to present itself successfully 100% of the time.  </div></div></blockquote></span><div class="">IMO if it's not worth the effort to write the test correctly, we should delete the test.  Flaky is useful as a temporary status, but if nobody ends up fixing the flakiness, I think the test should be deleted (more reasons follow).</div><span class=""><div class=""><br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div class="">These are tests we still want to exercise, but we don't want to have them start generating test failures if they don't pass 100% of the time.  
Currently the flaky test mechanism requires a test to pass once in two runs.  That is okay for a test that exhibits a slim degree of flakiness.  For others, that is not a large enough sample of runs to elicit a successful result.  Those tests get marked as XFAIL, and generate a non-actionable "unexpected success" result when they do happen to pass.</div><div class=""><br class=""></div><div class="">GOAL</div><div class=""><br class=""></div><div class="">* Enhance expectedFlakey* test decorators.  Allow specification of the number of times a flaky test should be run in order to be expected to pass at least once.  Call that MAX_RUNS.</div></div></blockquote></span><div class="">I think it's worth considering whether it's a good idea to include the date at which they were declared flakey.  After a certain amount of time has passed, if they're still flakey they can be relegated to hard failures.  I don't think flakey should be a permanent state.</div><span class=""><div class=""> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div class=""><br class=""></div><div class="">* When running a flaky test, run it up to MAX_RUNS times.  The first time it passes, mark it as a successful test completion.  The test event system will be given the number of times it was run before passing.  Whether we consume this info or not is TBD (and falls into the purview of the test results formatter).</div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div class=""><br class=""></div><div class="">* If the test does not pass within MAX_RUNS runs, mark it as a flaky fail.  For purposes of the standard output, this can look like FAIL: (flaky) or something similar so fail scanners still see it.  
(Note it's highly likely I'll do the normal output counts with the TestResults formatter-based output at the same time, so we get accurate test method counts and the like.)</div></div></blockquote></span><div class="">The concern I have here (and the reason I would like to delete flakey tests if the flakiness isn't removed after a certain amount of time) is that some of our tests are slow.  Repeating them many times is going to have an impact on how long the test suite takes to run.  The run time has already tripled over the past 3 weeks, and I think we need to be careful to keep out things that have the potential to lead to significant slowness of the test suite runner.<br class=""></div><div class=""> </div></div></div>

</blockquote></div><br class=""><br clear="all" class=""><div class=""><br class=""></div></div><div class="gmail_extra">-- <br class=""><div class=""><div dir="ltr" class="">-Todd</div></div>

</div></blockquote></div>

</div></div></blockquote></div><br class=""><br clear="all" class=""><div class=""><br class=""></div></div></div><span class=""><font color="#888888" class="">-- <br class=""><div class=""><div dir="ltr" class="">-Todd</div></div>

</font></span></div>

</blockquote></div><br class=""><br clear="all" class=""><div class=""><br class=""></div></div><div class="gmail_extra">-- <br class=""><div class=""><div dir="ltr" class="">-Todd</div></div>

</div></blockquote></div>

_______________________________________________<br class="">lldb-dev mailing list<br class=""><a href="mailto:lldb-dev@lists.llvm.org" class="">lldb-dev@lists.llvm.org</a><br class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev<br class=""></div></blockquote></div><br class=""><div class="">
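<div class="">For illustration, the per-method category idea above could look roughly like the following. This is only a sketch under assumptions: the <code class="">categories</code> decorator and the <code class="">should_skip</code> helper are hypothetical names and not existing lldbtest API, and the sketch only covers per-method tags, not auto-generated category cross-products or more complex logic.</div>
<pre class="">
```python
# Sketch only: `categories` and `should_skip` are hypothetical names,
# not part of the existing lldbtest module.

def categories(*names):
    """Tag a single test method with any number of category names."""
    def decorator(func):
        # Merge with tags from any earlier @categories on the same function.
        existing = getattr(func, "categories", frozenset())
        func.categories = existing | frozenset(names)
        return func
    return decorator


def should_skip(func, skip_categories):
    """Return True if the method carries at least one skipped category."""
    tags = getattr(func, "categories", frozenset())
    return not tags.isdisjoint(skip_categories)


class ExampleTest(object):
    @categories("win-flakey", "linux-xfail", "dsym")
    def test_stuff(self):
        pass

    def test_untagged(self):
        pass
```
</pre>
<div class="">A runner could then call something like <code class="">should_skip(ExampleTest.test_stuff, ["win-flakey"])</code> for each method before running it, so that an untagged method in the same class is unaffected.</div>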

<div class="" style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br class="Apple-interchange-newline">Thanks,</div><div class="" style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><i class="">- Enrico</i><br class="">📩 egranata@<font color="#ff2600" class=""></font>.com ☎️ 27683</div>

</div>

<br class=""></body></html>