[lldb-dev] [llvm-dev] [cfe-dev] RFC: End-to-end testing

Renato Golin via lldb-dev lldb-dev at lists.llvm.org
Mon Oct 14 05:44:16 PDT 2019


On Fri, 11 Oct 2019 at 18:02, David Greene via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> >> We have to support many different systems and those systems are always
> >> changing (new processors, new BIOS, new OS, etc.).  Performance can vary
> >> widely day to day from factors completely outside the compiler's
> >> control.  As the performance changes you have to keep updating the tests
> >> to expect the new performance numbers.  Relying on performance
> >> measurements to ensure something like vectorization is happening just
> >> isn't reliable in our experience.
> >
> > Could you compare performance with vectorization turned on and off?
>
> That might catch more things but now you're running tests twice and it
> still won't catch some cases.

Precisely.

In my experience, benchmark numbers need to be reset on most (if not
all) system changes, which is why we keep our benchmark machines *very*
stable (i.e. outdated). Testing multiple configurations needs multiple
baselines, which leads to combinatorial explosion and all that.

For clarity, I didn't mean "make e2e tests *only* run tests and check
for performance", that would be a *very* poor substitute for the tests
you proposed.

The idea of having extra checks in the test-suite circulated many
years ago when a similar proposal was put forward, but IIRC, the
piece-wise LIT tests we already have were deemed good enough for the
cases we wanted to cover.

But in the test-suite, we have more than just the compiler. We have
libraries (run-time, language), tools (linkers, assemblers) and the
environment. Those can affect the quality of the code (as you mentioned
earlier).

We need to test that, but we can't do a good job on the LIT side (how
do you control libraries and other tools? it can get crazy ugly). The
sanitizer tests are a good example of how weird it gets: executing the
code, grepping for output and relying on runtime system libraries to
"get it right".

So, in a way, we could just stop the test-suite discussion and do like
the sanitizers. If people are ok with this, don't let me stop you. :)

But if we have some consensus on doing a clean job, then I would
actually like to have that kind of intermediary check (diagnostics,
warnings, etc) on most test-suite tests, which would cover at least
the main vectorisation issues. Later, we could add more analysis
tools, if we want.

It would be as simple as adding CHECK lines on the execution of the
compilation process (in CMake? Make? wrapper?) and keeping the check
files with the tests / per file.
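
To make that concrete, here is a minimal sketch of such a wrapper-side
check, assuming the compile step's diagnostics have already been captured
as text (for example, the remarks clang prints with -Rpass=loop-vectorize).
It is a toy stand-in for FileCheck, not FileCheck itself: it only does
ordered substring matching, and the names load_checks and
first_unmatched_check are hypothetical.

```python
CHECK_PREFIX = "CHECK:"


def load_checks(check_text):
    """Collect the text following 'CHECK:' on each line of a check file."""
    checks = []
    for line in check_text.splitlines():
        idx = line.find(CHECK_PREFIX)
        if idx == -1:
            continue
        pattern = line[idx + len(CHECK_PREFIX):].strip()
        if pattern:
            checks.append(pattern)
    return checks


def first_unmatched_check(compiler_output, checks):
    """Match checks in order against the output lines.

    Each check must appear as a substring of some line after the line
    that satisfied the previous check. Returns the first check that
    cannot be matched, or None if all of them match.
    """
    lines = compiler_output.splitlines()
    pos = 0
    for pattern in checks:
        for i in range(pos, len(lines)):
            if pattern in lines[i]:
                pos = i + 1
                break
        else:
            # The inner loop ran out of lines without finding the pattern.
            return pattern
    return None
```

A wrapper would run the compile command, capture its stderr, load the
per-test check file and fail the test if first_unmatched_check returns
anything other than None. In practice FileCheck already does this job
properly; the sketch only shows how little glue the wrapper needs.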

I think we're on the same page regarding almost everything, but
perhaps I haven't been clear enough on the main point, which I think
is pretty simple. :)

--renato

