[lldb-dev] [cfe-dev] [llvm-dev] RFC: End-to-end testing

Fri Oct 11 08:48:35 PDT 2019

On Thu, Oct 10, 2019 at 2:21 PM David Greene via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Florian Hahn via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
> >> - Performance varies from implementation to implementation.  It is
> >>  difficult to keep tests up-to-date for all possible targets and
> >>  subtargets.
> >
> > Could you expand a bit more on what you mean here? Are you concerned
> > about having to run the performance tests on different kinds of
> > hardware? In what way do the existing benchmarks require keeping
> > up-to-date?
>
> We have to support many different systems and those systems are always
> changing (new processors, new BIOS, new OS, etc.).  Performance can vary
> widely day to day from factors completely outside the compiler's
> control.  As the performance changes you have to keep updating the tests
> to expect the new performance numbers.  Relying on performance
> measurements to ensure something like vectorization is happening just
> isn't reliable in our experience.

Could you compare performance with vectorization turned on and off?
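
A rough sketch of what I mean, in Python. It assumes you have already built
the same test twice (for clang, with and without -fno-vectorize and
-fno-slp-vectorize) and timed each binary several times on the same machine,
back to back. The function names, the sample timings and the 1.2x threshold
are all made up for illustration; they are not part of LNT or test-suite.

```python
import statistics


def speedup(scalar_times, vector_times):
    """Ratio of median scalar run time to median vectorized run time."""
    return statistics.median(scalar_times) / statistics.median(vector_times)


def vectorization_pays_off(scalar_times, vector_times, min_speedup=1.2):
    """True if the vectorized build beats the scalar build by min_speedup.

    min_speedup is an arbitrary illustrative threshold, not a recommendation.
    """
    return speedup(scalar_times, vector_times) >= min_speedup


# Hypothetical timings in seconds, both builds measured in the same session.
scalar = [1.00, 1.04, 0.98, 1.10, 1.01]
vector = [0.52, 0.50, 0.55, 0.51, 0.60]

print(f"speedup: {speedup(scalar, vector):.2f}x")
print("vectorization pays off:", vectorization_pays_off(scalar, vector))
```

Because both builds run on the same system at the same time, the test checks
a ratio rather than an absolute number, so a new processor, BIOS or OS should
shift both measurements together. Using the median is only one way to damp
outliers.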

>
> > With tests checking ASM, wouldn’t we end up with lots of checks for
> > various targets/subtargets that we need to keep up to date?
>
> Yes, that's true.  But the only thing that changes the asm generated is
> the compiler.
>
> > Just considering AArch64 as an example, people might want to check the
> > ASM for different architecture versions and different vector
> > extensions and different vendors might want to make sure that the ASM
> > on their specific cores does not regress.
>
> Absolutely.  We do a lot of that sort of thing downstream.
>
> >> - Partially as a result, but also for other reasons, performance tests
> >>  tend to be complicated, either in code size or in the numerous code
> >>  paths tested.  This makes such tests hard to debug when there is a
> >>  regression.
> >
> > I am not sure they have to. Have you considered adding the small test
> > functions/loops as micro-benchmarks using the existing google
> > benchmark infrastructure in test-suite?
>
> We have tried nightly performance runs using LNT/test-suite and have
> found it to be very unreliable, especially the microbenchmarks.
>
> > I think that might be able to address the points here relatively
> > adequately. The separate micro benchmarks would be relatively small
> > and we should be able to track down regressions in a similar fashion
> > as if it were a stand-alone file we compile and then analyze the
> > ASM. Plus, we can easily run it and verify the performance on actual
> > hardware.
>
> A few of my colleagues really struggled to get consistent results out of
> LNT.  They asked for help and discussed with a few upstream folks, but
> in the end were not able to get something reliable working.  I've talked
> to a couple of other people off-list and they've had similar
> experiences.  It would be great if we had a reliable performance suite.
> Please tell us how to get it working!  :)
>
> But even then, I still maintain there is a place for the kind of
> end-to-end testing I describe.  Performance testing would complement it.
> Neither is a replacement for the other.
>
> >> - Performance tests don't focus on the why/how of vectorization.  They
> >>  just check, "did it run fast enough?"  Maybe the test ran fast enough
> >>  for some other reason but we still lost desired vectorization and
> >>  could have run even faster.
> >>
> >
> > If you added a new micro-benchmark, you could check that it
> > produces the desired result when adding it. The runtime-tracking
> > should cover cases where we lost optimizations. I guess if the
> > benchmarks are too big, additional optimizations in one part could
> > hide lost optimizations somewhere else. But I would assume this to be
> > relatively unlikely, as long as the benchmarks are isolated.
>
> Even then I have seen small performance tests vary widely in performance
> due to system issues (see above).  Again, there is a place for them but
> they are not sufficient.
>
> > Also, checking the assembly for vector code does not guarantee
> > that the vector code will actually be executed. So for example by just
> > checking the assembly for certain vector instructions, we might miss
> > that we regressed performance, because we messed up the runtime checks
> > guarding the vector loop.
>
> Oh absolutely.  Presumably such checks would be included in the test or
> would be checked by a different test.  As always, tests have to be
> constructed intelligently.  :)
>
>                       -David