[cfe-dev] [llvm-dev] RFC: End-to-end testing

Florian Hahn via cfe-dev cfe-dev at lists.llvm.org
Thu Oct 10 02:55:15 PDT 2019



> On Oct 9, 2019, at 16:12, David Greene via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Mehdi AMINI via cfe-dev <cfe-dev at lists.llvm.org> writes:
> 
>>> I have a bit of concern about this sort of thing - worrying it'll lead to
>>> people being less cautious about writing the more isolated tests.
>>> 
>> 
>> I have the same concern. I really believe we need to be careful about
>> testing at the right granularity to keep things both modular and the
>> testing maintainable (for instance checking vectorized ASM from a C++
>> source through clang has always been considered a bad FileCheck practice).
>> (Not saying that there is no space for better integration testing in some
>> areas).
> 
> I absolutely disagree about vectorization tests.  We have seen
> vectorization loss in clang even though related LLVM lit tests pass,
> because something else in the clang pipeline changed that caused the
> vectorizer to not do its job.  We need both kinds of tests.  There are
> many asm tests of value beyond vectorization and they should include
> component as well as end-to-end tests.


Have you considered alternatives to checking the assembly for ensuring vectorization or other transformations? For example, instead of checking the assembly, we could check LLVM’s statistics or optimization remarks. If you want to ensure a loop got vectorized, you could check the loop-vectorize remarks, which should give you the position of the loop in the source and the vectorization/interleave factors used. There are a few other things that could still go wrong later on and prevent vector instruction selection, but I think this should be sufficient to guard against most cases where we lose vectorization, and it should be much more robust to unrelated changes. If there are additional properties you want to ensure, they could potentially be added to the remark as well.
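
To make this concrete, here is a rough sketch of what such a remark-based test could look like (the kernel, the exact flags and the CHECK line are just for illustration and not taken from an existing test; the "vectorized loop" text is what -Rpass=loop-vectorize prints today, but a real test might want to match it more loosely):

  // RUN: %clang -O2 -gline-tables-only -Rpass=loop-vectorize -S %s -o /dev/null 2>&1 | FileCheck %s

  // Check the remark the loop vectorizer emits, rather than the exact
  // instruction sequence the backend happens to produce.
  // CHECK: remark: vectorized loop
  void saxpy(float *__restrict y, const float *__restrict x, float a, int n) {
    for (int i = 0; i < n; ++i)
      y[i] = a * x[i] + y[i];
  }

A test like this still pins down the property we actually care about (the loop got vectorized, and optionally with which factors), but it is not tied to the particular instructions the backend selects.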

This idea of leveraging statistics and optimization remarks to track the impact of changes on overall optimization results is nothing new, and I think several people have already discussed it in various forms. For regular benchmark runs, in addition to tracking the existing benchmarks, we could also track selected optimization remarks (e.g. loop-vectorize, but not necessarily noisy ones like gvn) and statistics. Comparing those run-to-run could highlight new end-to-end issues on a much larger scale, across all benchmarks integrated in the test-suite. We might be able to detect loss of vectorization proactively, instead of waiting for someone to file a bug report and then adding an isolated test after the fact.
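
As a rough illustration of the serialized side (again just a sketch; -fsave-optimization-record and -foptimization-record-file are the existing clang options, while the kernel and CHECK lines are made up): each translation unit would produce a YAML record of its remarks, which a run-to-run comparison over the test-suite could consume, and which can also be checked directly:

  // RUN: %clang -O2 -c -fsave-optimization-record -foptimization-record-file=%t.opt.yaml %s -o %t.o
  // RUN: FileCheck --input-file=%t.opt.yaml %s

  // The record contains one YAML document per remark; look for a passed
  // loop-vectorize entry for this kernel.
  // CHECK: --- !Passed
  // CHECK: Pass: loop-vectorize
  void scale(float *__restrict out, const float *__restrict in, float f, int n) {
    for (int i = 0; i < n; ++i)
      out[i] = f * in[i];
  }

Comparing the !Passed/!Missed loop-vectorize entries between two such records from consecutive runs could flag newly missed vectorization without anyone having to write a dedicated test first.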

But building something like this would be much more work, of course…

Cheers,
Florian

