[llvm-dev] [cfe-dev] RFC: End-to-end testing
Robinson, Paul via llvm-dev
llvm-dev at lists.llvm.org
Thu Oct 10 08:46:48 PDT 2019
David Greene, will you be at the LLVM Dev Meeting? If so, could you sign
up for a Round Table session on this topic? Obviously lots to discuss
and concerns to be addressed.
In particular I think there are two broad categories of tests that would
have to be segregated just by the nature of their requirements:
(1) Executable tests. These obviously require an execution platform; for
feasibility reasons this means host==target and the guarantee of having
a linker (possibly but not necessarily LLD) and a runtime (possibly but
not necessarily including libcxx). Note that the LLDB tests and the
debuginfo-tests project already have this kind of dependency, and in the
case of debuginfo-tests, this is exactly why it's a separate project.
(2) Non-executable tests. These are near-identical in character to the
existing clang/llvm test suites and I'd expect lit to drive them. The
only material difference from the majority(*) of existing clang tests is
that they are free to depend on LLVM features/passes. The only difference
from the majority of existing LLVM tests is that they have [Obj]{C,C++} as
their input source language.
(*) I've encountered clang tests that I feel depend on too much within LLVM,
and it's common for new contributors to provide a C/C++ test that needs to
be converted to a .ll test. Some of them go in anyway.
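For concreteness, here is roughly the shape of test I have in mind for each
category. Treat these as hypothetical sketches: the REQUIRES line, the lit
substitutions, the triples, and the particular properties being checked are
illustrative, not a proposal for the actual harness.

A category (1) test builds with the in-tree clang plus whatever host linker
and runtime are available, runs the binary, and checks its output:

    // REQUIRES: native
    // RUN: %clang -O2 %s -o %t
    // RUN: %t | FileCheck %s
    //
    // Execute on the host and check program output, not IR or asm.
    // CHECK: answer = 42
    #include <stdio.h>
    int main(void) {
      printf("answer = %d\n", 6 * 7);
      return 0;
    }

A category (2) test drives the full optimization pipeline from C source but
never runs anything; it checks a single property of the resulting IR (here,
that the loop is turned into a memset) rather than matching the IR wholesale:

    // RUN: %clang_cc1 -O2 -triple x86_64-unknown-linux-gnu -emit-llvm %s -o - \
    // RUN:   | FileCheck %s
    //
    // Check one property of the optimized IR, not the IR in its entirety.
    // CHECK-LABEL: define {{.*}}zero_fill
    // CHECK: call void @llvm.memset
    void zero_fill(char *p, unsigned n) {
      for (unsigned i = 0; i < n; ++i)
        p[i] = 0;
    }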
More comments/notes below.
> -----Original Message-----
> From: lldb-dev <lldb-dev-bounces at lists.llvm.org> On Behalf Of David Greene
> via lldb-dev
> Sent: Wednesday, October 09, 2019 9:25 PM
> To: Philip Reames <listmail at philipreames.com>; llvm-dev at lists.llvm.org;
> cfe-dev at lists.llvm.org; openmp-dev at lists.llvm.org; lldb-dev at lists.llvm.org
> Subject: Re: [lldb-dev] [cfe-dev] [llvm-dev] RFC: End-to-end testing
>
> Philip Reames via cfe-dev <cfe-dev at lists.llvm.org> writes:
>
> > A challenge we already have - as in, I've broken these tests and had to
> > fix them - is that an end to end test which checks either IR or assembly
> > ends up being extraordinarily fragile. Completely unrelated profitable
> > transforms create small differences which cause spurious test failures.
> > This is a very real issue today with the few end-to-end clang tests we
> > have, and I am extremely hesitant to expand those tests without giving
> > this workflow problem serious thought. If we don't, this could bring
> > development on middle end transforms to a complete stop. (Not kidding.)
>
> Do you have a pointer to these tests? We literally have tens of
> thousands of end-to-end tests downstream and while some are fragile, the
> vast majority are not. A test that, for example, checks the entire
> generated asm for a match is indeed very fragile. A test that checks
> whether a specific instruction/mnemonic was emitted is generally not, at
> least in my experience. End-to-end tests require some care in
> construction. I don't think update_llc_test_checks.py-type operation is
> desirable.
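To make that distinction concrete, here is a hypothetical sketch (the target,
the function, and the exact mnemonic are only illustrative): a test that
asserts one property of the generated asm, say that an unsigned divide by a
constant has been strength-reduced away, will survive unrelated scheduling and
register-allocation changes that would break an exhaustive,
update_llc_test_checks.py-style match of the whole function.

    // RUN: %clang -O2 -S -target x86_64-unknown-linux-gnu %s -o - | FileCheck %s
    //
    // Assert only the property of interest; the rest of the codegen is
    // free to change without breaking the test.
    // CHECK-LABEL: quot7:
    // CHECK-NOT: divl
    // CHECK: retq
    unsigned quot7(unsigned x) { return x / 7; }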
Sony likewise has a rather large corpus of end-to-end tests. I expect any
vendor would. When they break, we fix them or report/fix the compiler bug.
It has not been an intolerable burden on us, and I daresay if it were at
all feasible to put these upstream, it would not be an intolerable burden
on the community. (It's not feasible because host != target, and we'd need
to provide the community with test kits and our remote-execution tools. We'd
rather just run them internally.)
Philip, what I'm actually hearing from your statement is something along the
lines of: "Our end-to-end tests are really fragile, therefore any end-to-end
test will be fragile, and that will be an intolerable burden."
That's an understandable reaction, but I think the community literally
would not tolerate too-fragile tests. Tests that are too fragile will
be made more robust or removed. This has been community practice for a
long time. There's even an entire category of "noisy bots" that certain
people take care of and which don't bother the rest of the community. The
LLVM Project as a whole would not tolerate a test suite that "could
bring development ... to a complete stop" and I hope we can ease your
concerns.
More comments/notes/opinions below.
>
> Still, you raise a valid point and I think present some good options
> below.
>
> > A couple of approaches we could consider:
> >
> > 1. Simply restrict end to end tests to crash/assert cases. (i.e. no
> > property of the generated code is checked, other than that it is
> > generated) This isn't as restrictive as it sounds when combined
> > w/coverage guided fuzzer corpuses.
>
> I would be pretty hesitant to do this but I'd like to hear more about
> how you see this working with coverage/fuzzing.
I think this is way too restrictive.
>
> > 2. Auto-update all diffs, but report them to a human user for
> > inspection. This ends up meaning that tests never "fail" per se,
> > but that individuals who have expressed interest in particular tests
> > get an automated notification and a chance to respond on list with a
> > reduced example.
>
> That's certainly workable.
This is not different in principle from the "noisy bot" category, and if
it's a significant concern, the e2e tests can start out in that category.
Experience will tell us whether they are inherently fragile. I would not
want to auto-update tests.
>
> > 3. As a variant on the former, don't auto-update tests, but only inform
> > the *contributor* of an end-to-end test of a failure. Responsibility
> > for determining failure vs false positive lies solely with them, and
> > normal channels are used to report a failure after it has been
> > confirmed/analyzed/explained.
>
> I think I like this best of the three but it raises the question of what
> happens when the contributor is no longer contributing. Who's
> responsible for the test? Maybe it just sits there until someone else
> claims it.
This is *exactly* the "noisy bot" tactic, and bots are supposed to have
owners who are active.
>
> > I really think this is a problem we need to have thought through and
> > found a workable solution before end-to-end testing as proposed becomes
> > a practically workable option.
>
> Noted. I'm very happy to have this discussion and work the problem.
>
> -David
> _______________________________________________
> lldb-dev mailing list
> lldb-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev