[llvm-dev] [cfe-dev] RFC: End-to-end testing

Thu Oct 10 08:46:48 PDT 2019

David Greene, will you be at the LLVM Dev Meeting? If so, could you sign
up for a Round Table session on this topic?  There is obviously a lot to
discuss, and there are concerns to be addressed.

In particular, I think there are two broad categories of tests that would
have to be kept separate simply because of the nature of their requirements:

(1) Executable tests.  These obviously require an execution platform; for
feasibility reasons, this means host==target and the guarantee of having
a linker (possibly but not necessarily LLD) and a runtime (possibly but
not necessarily including libcxx).  Note that the LLDB tests and the
debuginfo-tests project already have this kind of dependency, and in the
case of debuginfo-tests, this is exactly why it's a separate project.

(2) Non-executable tests.  These are nearly identical in character to the
existing clang/llvm test suites, and I'd expect lit to drive them.  The
only material difference from the majority(*) of existing clang tests is
that they are free to depend on LLVM features/passes.  The only difference
from the majority of existing LLVM tests is that they have [Obj]{C,C++} as
their input source language.
(*) I've encountered clang tests that I feel depend on too much within LLVM,
and it's common for new contributors to provide a C/C++ test that needs to
be converted to a .ll test.  Some of them go in anyway.

More comments/notes below.

> -----Original Message-----
> From: lldb-dev <lldb-dev-bounces at lists.llvm.org> On Behalf Of David Greene
> via lldb-dev
> Sent: Wednesday, October 09, 2019 9:25 PM
> To: Philip Reames <listmail at philipreames.com>; llvm-dev at lists.llvm.org;
> cfe-dev at lists.llvm.org; openmp-dev at lists.llvm.org; lldb-dev at lists.llvm.org
> Subject: Re: [lldb-dev] [cfe-dev] [llvm-dev] RFC: End-to-end testing
> 
> Philip Reames via cfe-dev <cfe-dev at lists.llvm.org> writes:
> 
> > A challenge we already have - as in, I've broken these tests and had to
> > fix them - is that an end-to-end test which checks either IR or assembly
> > ends up being extraordinarily fragile.  Completely unrelated profitable
> > transforms create small differences which cause spurious test failures.
> > This is a very real issue today with the few end-to-end clang tests we
> > have, and I am extremely hesitant to expand those tests without giving
> > this workflow problem serious thought.  If we don't, this could bring
> > development on middle end transforms to a complete stop.  (Not kidding.)
> 
> Do you have a pointer to these tests?  We literally have tens of
> thousands of end-to-end tests downstream and while some are fragile, the
> vast majority are not.  A test that, for example, checks the entire
> generated asm for a match is indeed very fragile.  A test that checks
> whether a specific instruction/mnemonic was emitted is generally not, at
> least in my experience.  End-to-end tests require some care in
> construction.  I don't think update_llc_test_checks.py-type operation is
> desirable.

Sony likewise has a rather large corpus of end-to-end tests, as I expect
any vendor would.  When they break, we either fix the test or report/fix
the compiler bug.  This has not been an intolerable burden on us, and I
daresay that if it were at all feasible to put these tests upstream, they
would not be an intolerable burden on the community either.  (It's not
feasible because host!=target, and we'd need to provide test kits and our
remote-execution tools to the community.  We'd rather just run them
internally.)

Philip, what I'm actually hearing from your statement is something along
the lines of: "Our end-to-end tests are really fragile, therefore any
end-to-end test will be fragile, and that will be an intolerable burden."

That's an understandable reaction, but I think the community literally
would not tolerate too-fragile tests.  Tests that are too fragile will
be made more robust or removed; this has been community practice for a
long time.  There's even an entire category of "noisy bots" that certain
people look after so that they don't bother the rest of the community.
The LLVM Project as a whole would not tolerate a test suite that "could
bring development ... to a complete stop," and I hope we can ease your
concerns.

More comments/notes/opinions below.

> 
> Still, you raise a valid point and I think present some good options
> below.
> 
> > A couple of approaches we could consider:
> >
> >  1. Simply restrict end-to-end tests to crash/assert cases.  (i.e. no
> >     property of the generated code is checked, other than that it is
> >     generated)  This isn't as restrictive as it sounds when combined
> >     with coverage-guided fuzzer corpuses.
> 
> I would be pretty hesitant to do this but I'd like to hear more about
> how you see this working with coverage/fuzzing.

I think this is way too restrictive.

> 
> >  2. Auto-update all diffs, but report them to a human user for
> >     inspection.  This ends up meaning that tests never "fail" per se,
> >     but that individuals who have expressed interest in particular tests
> >     get an automated notification and a chance to respond on list with a
> >     reduced example.
> 
> That's certainly workable.

This is not different in principle from the "noisy bot" category, and if
it's a significant concern, the e2e tests can start out in that category.
Experience will tell us whether they are inherently fragile.  I would not
want to auto-update tests.

> 
> >  3. As a variant on the former, don't auto-update tests, but only inform
> >     the *contributor* of an end-to-end test of a failure. Responsibility
> >     for determining failure vs false positive lies solely with them, and
> >     normal channels are used to report a failure after it has been
> >     confirmed/analyzed/explained.
> 
> I think I like this best of the three but it raises the question of what
> happens when the contributor is no longer contributing.  Who's
> responsible for the test?  Maybe it just sits there until someone else
> claims it.

This is *exactly* the "noisy bot" tactic, and bots are supposed to have
active owners.

> 
> > I really think this is a problem we need to have thought through and
> > found a workable solution before end-to-end testing as proposed becomes
> > a practically workable option.
> 
> Noted.  I'm very happy to have this discussion and work the problem.
> 
>                      -David