[lldb-dev] proposal for reworked flaky test category

Mon Oct 19 12:53:03 PDT 2015

> I'd like unexpected successes (i.e. tests marked as unexpected failure
> that in fact pass)

argh, that should have been "(i.e. tests marked as *expected* failure that
in fact pass)"

On Mon, Oct 19, 2015 at 12:50 PM, Todd Fiala <todd.fiala at gmail.com> wrote:

> Hi all,
>
> I'd like unexpected successes (i.e. tests marked as unexpected failure
> that in fact pass) to retain the actionable meaning that something is
> wrong.  The wrong part is that either (1) the test now passes consistently
> and the author of the fix just missed updating the test definition (or
> perhaps was unaware of the test), or (2) the test is not covering the
> condition it is testing completely, and some change to the code just
> happened to make the test pass (due to the test being not comprehensive
> enough).  Either of those requires some sort of adjustment by the
> developers.
>
> We have a category of test known as "flaky" or "flakey" (both are valid
> spellings, for those who care:
> http://www.merriam-webster.com/dictionary/flaky, although flaky is
> considered the primary).  Flaky tests are tests that we can't get to pass
> 100% of the time.  This might be because writing a fully reliable version
> of the test is extremely difficult and deemed not worth the effort, or
> because the condition under test simply does not present itself
> successfully 100% of the time.  These are tests we still want to exercise,
> but we don't want them to generate test failures when they don't pass
> every time.  Currently the flaky test mechanism requires a test to pass at
> least once in two runs.  That is okay for a test that exhibits a slim
> degree of flakiness.  For others, two runs is not a large enough sample to
> elicit a successful result.  Those
> tests get marked as XFAIL, and generate a non-actionable "unexpected
> success" result when they do happen to pass.
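To make the "not a large enough sample" point concrete, here is a rough sketch of the arithmetic. It assumes each run of a flaky test passes independently with a fixed pass rate, which real tests may well not do. The function names and the example pass rates are made up for illustration and are not taken from the lldb test suite.

```python
def chance_of_pass(pass_rate, runs):
    """Chance of at least one pass in `runs` independent runs."""
    return 1.0 - (1.0 - pass_rate) ** runs


def runs_for_confidence(pass_rate, confidence):
    """Smallest run count whose chance of at least one pass reaches
    `confidence`, assuming independent runs (an assumption, see above)."""
    if not 0.0 < pass_rate <= 1.0:
        raise ValueError("pass_rate must be in (0, 1]")
    if not 0.0 <= confidence < 1.0:
        raise ValueError("confidence must be in [0, 1)")
    runs = 1
    all_fail = 1.0 - pass_rate  # chance that every run so far failed
    while 1.0 - all_fail < confidence:
        all_fail *= 1.0 - pass_rate
        runs += 1
    return runs


# A test that passes half the time needs 7 runs for 99% confidence.
print(runs_for_confidence(0.5, 0.99))  # 7
# A test that passes one time in five needs 21 runs for the same confidence.
print(runs_for_confidence(0.2, 0.99))  # 21
```

Under that independence assumption, a test with a 20% pass rate gets through two runs only about 36% of the time, which is why such tests end up marked XFAIL today.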
>
> GOAL
>
> * Enhance expectedFlakey* test decorators.  Allow each decorator to specify
> the maximum number of times a flaky test should be run in order to be
> expected to pass at least once.  Call that MAX_RUNS.
>
> * When running a flaky test, run it up to MAX_RUNS times.  The
> first time it passes, mark it as a successful test completion.  The test
> event system will be given the number of times it was run before passing.
> Whether we consume this info or not is TBD (and falls into the purview of
> the test results formatter).
>
> * If the test does not pass within MAX_RUNS runs, mark it as a flaky
> fail.  For purposes of the standard output, this can look like FAIL:
> (flaky) or something similar so fail scanners still see it.  (Note it's
> highly likely I'll do the normal output counts with the TestResults
> formatter-based output at the same time, so we get accurate test method
> counts and the like).
>
> * Flaky tests never generate a non-actionable "unexpected success".  This
> follows because we no longer need to mark tests as XFAIL when they require
> more than two runs to get a high degree of confidence in a passing test.
>
> * Flaky tests get marked with a flaky category, so that test runners can
> choose to skip flaky tests by skipping the category.  This may not be
> necessary if tests don't take an excessively long time to get a passing
> grade with a high degree of confidence.
>
> Let me know what you all think.  Once we come up with something, I'll
> implement it.
>
> --
> -Todd
>

-- 
-Todd