<div><div dir="ltr" class="gmail_attr">On Thu, Oct 10, 2019 at 2:21 PM David Greene via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Florian Hahn via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> writes:<br>

<br>

>> - Performance varies from implementation to implementation.  It is<br>

>>  difficult to keep tests up-to-date for all possible targets and<br>

>>  subtargets.<br>

><br>

> Could you expand a bit more on what you mean here? Are you concerned<br>

> about having to run the performance tests on different kinds of<br>

> hardware? In what way do the existing benchmarks require keeping<br>

> up-to-date?<br>

<br>

We have to support many different systems and those systems are always<br>

changing (new processors, new BIOS, new OS, etc.).  Performance can vary<br>

widely day to day from factors completely outside the compiler's<br>

control.  As the performance changes you have to keep updating the tests<br>

to expect the new performance numbers.  Relying on performance<br>

measurements to ensure something like vectorization is happening just<br>

isn't reliable in our experience.</blockquote><div dir="auto"><br></div></div><div><div dir="auto">Could you compare performance with vectorization turned on and off?</div></div><div><div><div class="gmail_quote"><div dir="auto"><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

<br>

> With tests checking ASM, wouldn’t we end up with lots of checks for<br>

> various targets/subtargets that we need to keep up to date?<br>

<br>

Yes, that's true.  But the only thing that changes the asm generated is<br>

the compiler.<br>

<br>

> Just considering AArch64 as an example, people might want to check the<br>

> ASM for different architecture versions and different vector<br>

> extensions and different vendors might want to make sure that the ASM<br>

> on their specific cores does not regress.<br>

<br>

Absolutely.  We do a lot of that sort of thing downstream.<br>

<br>

>> - Partially as a result, but also for other reasons, performance tests<br>

>>  tend to be complicated, either in code size or in the numerous code<br>

>>  paths tested.  This makes such tests hard to debug when there is a<br>

>>  regression.<br>

><br>

> I am not sure they have to be. Have you considered adding the small test<br>

> functions/loops as micro-benchmarks using the existing google<br>

> benchmark infrastructure in test-suite?<br>

<br>

We have tried nightly performance runs using LNT/test-suite and have<br>

found it to be very unreliable, especially the microbenchmarks.<br>

<br>

> I think that might be able to address the points here relatively<br>

> adequately. The separate micro benchmarks would be relatively small<br>

> and we should be able to track down regressions in a similar fashion<br>

> as if it were a stand-alone file we compile and then analyze the<br>

> ASM. Plus, we can easily run it and verify the performance on actual<br>

> hardware.<br>

<br>

A few of my colleagues really struggled to get consistent results out of<br>

LNT.  They asked for help and discussed with a few upstream folks, but<br>

in the end were not able to get something reliable working.  I've talked<br>

to a couple of other people off-list and they've had similar<br>

experiences.  It would be great if we had a reliable performance suite.<br>

Please tell us how to get it working!  :)<br>

<br>

But even then, I still maintain there is a place for the kind of<br>

end-to-end testing I describe.  Performance testing would complement it.<br>

Neither is a replacement for the other.<br>

<br>

>> - Performance tests don't focus on the why/how of vectorization.  They<br>

>>  just check, "did it run fast enough?"  Maybe the test ran fast enough<br>

>>  for some other reason but we still lost desired vectorization and<br>

>>  could have run even faster.<br>

>> <br>

><br>

> If you added a new micro-benchmark, you could check that it<br>

> produces the desired result when adding it. The runtime-tracking<br>

> should cover cases where we lost optimizations. I guess if the<br>

> benchmarks are too big, additional optimizations in one part could<br>

> hide lost optimizations somewhere else. But I would assume this to be<br>

> relatively unlikely, as long as the benchmarks are isolated.<br>

<br>

Even then I have seen small performance tests vary widely in performance<br>

due to system issues (see above).  Again, there is a place for them but<br>

they are not sufficient.<br>

<br>

> Also, checking the assembly for vector code does not guarantee<br>

> that the vector code will actually be executed. So, for example, by just<br>

> checking the assembly for certain vector instructions, we might miss<br>

> that we regressed performance, because we messed up the runtime checks<br>

> guarding the vector loop.<br>

<br>

Oh absolutely.  Presumably such checks would be included in the test or<br>

would be checked by a different test.  As always, tests have to be<br>

constructed intelligently.  :)<br>

<br>

                      -David<br>

</blockquote></div></div>

</div>
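<div dir="auto">To make the on/off comparison suggested above concrete, here is a minimal sketch. It assumes the same kernel has been built twice (for example once with clang's -fno-vectorize and once without) and timed several times on the same machine in the same run. The function names, the 1.5x threshold and the sample timings are hypothetical and only illustrate the idea: checking a ratio rather than an absolute number should make the test less sensitive to processor, BIOS or OS changes, although it will not remove measurement noise.</div>

```python
from statistics import median


def vectorization_speedup(scalar_times, vector_times):
    """Return median scalar runtime divided by median vectorized runtime."""
    return median(scalar_times) / median(vector_times)


def check_speedup(scalar_times, vector_times, min_speedup=1.5):
    """Return True when the vectorized build is at least min_speedup faster.

    min_speedup is a hypothetical threshold; it would need tuning per kernel.
    """
    return vectorization_speedup(scalar_times, vector_times) >= min_speedup


if __name__ == "__main__":
    # Hypothetical timings in seconds, not real measurements.
    scalar = [4.1, 4.0, 4.3]
    vector = [1.1, 1.0, 1.2]
    print(check_speedup(scalar, vector))
```

<div dir="auto">Because both builds are timed back to back on the same system, a slower or faster machine shifts both medians together, so the expected value in the test would not need updating each time the hardware changes.</div>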