[lldb-dev] proposal for reworked flaky test category

Mon Oct 19 13:03:52 PDT 2015

On Mon, Oct 19, 2015 at 12:50 PM Todd Fiala via lldb-dev <
lldb-dev at lists.llvm.org> wrote:

> Hi all,
>
> I'd like unexpected successes (i.e. tests marked as expected failures
> that in fact pass) to retain the actionable meaning that something is
> wrong.  The wrong part is that either (1) the test now passes consistently
> and the author of the fix just missed updating the test definition (or
> perhaps was unaware of the test), or (2) the test does not fully cover the
> condition it is testing, and some change to the code just happened to
> make the test pass (because the test is not comprehensive enough).
> Either of those requires some sort of adjustment by the
> developers.
>
I'd add #3: the test is actually flaky but is tagged incorrectly.

>
> We have a category of test known as "flaky" or "flakey" (both are valid
> spellings, for those who care:
> http://www.merriam-webster.com/dictionary/flaky, although flaky is
> considered the primary).  Flaky tests are tests that we can't get to pass
> 100% of the time.  This might be because it is extremely difficult to write
> the test as such and deemed not worth the effort, or it is a condition that
> is just not going to present itself successfully 100% of the time.
>
IMO if it's not worth the effort to write the test correctly, we should
delete the test.  Flaky is useful as a temporary status, but if nobody ends
up fixing the flakiness, I think the test should be deleted (more reasons
follow).

> These are tests we still want to exercise, but we don't want to have them
> start generating test failures if they don't pass 100% of the time.
> Currently the flaky test mechanism requires a test to pass at least once
> in two runs.  That is okay for a test that exhibits a slight degree of flakiness.
> For others, that is not a large enough sample of runs to elicit a
> successful result.  Those tests get marked as XFAIL, and generate a
> non-actionable "unexpected success" result when they do happen to pass.
>
> GOAL
>
> * Enhance the expectedFlakey* test decorators.  Allow specifying the
> maximum number of times a flaky test may be run, with the expectation
> that it passes at least once.  Call that MAX_RUNS.
>
I think it's worth considering whether it's a good idea to include the date
on which they were declared flakey.  After a certain amount of time has
passed, if a test is still flakey, it can be relegated to a hard failure.  I
don't think flakey should be a permanent state.
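A minimal sketch of how a decorator could carry both the proposed MAX_RUNS value and the date a test was declared flakey, so that a runner can decide when the flakey status has expired. The decorator name `expected_flaky`, its `max_runs`/`declared` parameters, the attribute names, and the 90-day grace period are all hypothetical; this is not lldb's existing expectedFlakey* API.

```python
import datetime

# Hypothetical grace period after which a still-flaky test should be
# treated as a hard failure (or deleted); the value is illustrative only.
FLAKY_GRACE_PERIOD = datetime.timedelta(days=90)


def expected_flaky(max_runs=2, declared=None):
    """Hypothetical decorator: tag a test with flaky-test metadata.

    max_runs: how many runs the test gets to pass at least once
              (2 mirrors the current "pass once in two runs" behavior).
    declared: datetime.date on which the test was marked flaky.
    """
    if max_runs < 1:
        raise ValueError("max_runs must be at least 1")

    def decorator(func):
        # Store metadata on the function; the test body is unchanged.
        func.flaky_max_runs = max_runs
        func.flaky_declared = declared
        return func

    return decorator


def flaky_status_expired(func, today=None):
    """Return True if func has been marked flaky longer than the grace period."""
    declared = getattr(func, "flaky_declared", None)
    if declared is None:
        return False  # no date recorded, so nothing can expire
    if today is None:
        today = datetime.date.today()
    return today - declared > FLAKY_GRACE_PERIOD


# Usage: a hypothetical test declared flaky on the date of this thread.
@expected_flaky(max_runs=5, declared=datetime.date(2015, 10, 19))
def test_example():
    pass
```

Recording the date on the test itself keeps the expiry policy in one place (the runner) instead of relying on someone remembering to revisit the test later.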

>
> * When running a flaky test, run it up to MAX_RUNS times.  The
> first time it passes, mark it as a successful test completion.  The test
> event system will be given the number of times it was run before passing.
> Whether we consume this info or not is TBD (and falls into the purview of
> the test results formatter).
>
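The run-until-first-pass behavior described in that bullet could look roughly like the sketch below. The function names are hypothetical, and it assumes a test body signals failure by raising AssertionError; any other exception is left to propagate, so genuine errors are not quietly retried away.

```python
def run_flaky(test_fn, max_runs):
    """Run test_fn up to max_runs times, stopping at the first pass.

    Returns (passed, attempts), where attempts is the number of runs made.
    Only AssertionError counts as a (retryable) failure; other exceptions
    propagate to the caller unchanged.
    """
    if max_runs < 1:
        raise ValueError("max_runs must be at least 1")
    for attempt in range(1, max_runs + 1):
        try:
            test_fn()
        except AssertionError:
            continue  # failed this time; try again if runs remain
        return True, attempt  # the first pass ends the loop
    return False, max_runs


def make_test_that_passes_on(n):
    """Build a fake test that fails until its n-th invocation."""
    calls = {"count": 0}

    def test():
        calls["count"] += 1
        if calls["count"] < n:
            raise AssertionError("simulated flaky failure")

    return test


if __name__ == "__main__":
    print(run_flaky(make_test_that_passes_on(3), max_runs=5))  # (True, 3)
    print(run_flaky(make_test_that_passes_on(9), max_runs=5))  # (False, 5)
```

The attempt count in the return value is what would be handed to the test event system; whether a results formatter uses it stays open, as in the proposal.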

> * If the test does not pass within MAX_RUNS runs, mark it as a flaky
> failure.  For purposes of the standard output, this can look like FAIL:
> (flaky) or something similar so fail scanners still see it.  (Note it's
> highly likely I'll do the normal output counts with the TestResults
> formatter-based output at the same time, so we get accurate test method
> counts and the like).
>
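One way to keep existing fail scanners working is to make sure every flaky failure line still starts with `FAIL:`. The sketch below assumes a hypothetical line format and a scanner that matches on that prefix; neither is taken from lldb's actual output code.

```python
import re

# Hypothetical scanner pattern: any line starting with "FAIL:" counts.
FAIL_LINE = re.compile(r"^FAIL:")


def format_flaky_result(test_name, passed, attempts, max_runs):
    """Format a flaky test's final outcome as one status line (hypothetical format)."""
    if passed:
        # A pass within max_runs is a normal success; the attempt count
        # is informational only.
        return "PASS: {} (passed on attempt {} of {})".format(
            test_name, attempts, max_runs)
    # Keep the "FAIL:" prefix first so existing scanners still match it.
    return "FAIL: (flaky) {} (no pass in {} runs)".format(test_name, max_runs)


if __name__ == "__main__":
    for line in (
        format_flaky_result("TestFoo.test_bar", True, 3, 5),
        format_flaky_result("TestFoo.test_baz", False, 5, 5),
    ):
        print(line, "-> counted as failure:", bool(FAIL_LINE.match(line)))
```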
The concern I have here (and the reason I would like to delete flakey tests
if the flakiness isn't removed after a certain amount of time) is that
some of our tests are slow.  Repeating them many times is going to have an
impact on how long the test suite takes to run.  The suite's run time has
already tripled over the past 3 weeks, and I think we need to be careful to
keep out changes that could significantly slow down the test suite runner.
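To put rough numbers on the cost of retries: if each run of a test passes independently with probability p (an idealization; real flakiness is rarely that tidy), retrying up to MAX_RUNS = M times costs on average 1 + (1-p) + (1-p)^2 + ... + (1-p)^(M-1) runs, and a test that never passes always costs M full runs. The helper names below are hypothetical, and the numbers in the usage lines are illustrative, not measurements of the lldb suite.

```python
def expected_runs(p_pass, max_runs):
    """Expected executions when retrying until the first pass or max_runs.

    Assumes every run passes independently with probability p_pass.
    Run k (0-based) happens only if the k earlier runs all failed,
    which has probability (1 - p_pass) ** k.
    """
    if not 0.0 <= p_pass <= 1.0:
        raise ValueError("p_pass must be between 0 and 1")
    if max_runs < 1:
        raise ValueError("max_runs must be at least 1")
    q = 1.0 - p_pass
    return sum(q ** k for k in range(max_runs))


def worst_case_seconds(runtime_seconds, max_runs):
    """A test that never passes is run max_runs times in full."""
    return runtime_seconds * max_runs


if __name__ == "__main__":
    # Illustrative numbers: a 60-second test with MAX_RUNS = 5.
    print(expected_runs(0.5, 5))      # 1.9375 runs on average at p = 0.5
    print(worst_case_seconds(60, 5))  # 300 seconds if it never passes
```

The worst case grows linearly with MAX_RUNS, and it is hit precisely by the tests that are most broken.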