[LLVMdev] [RFC] Benchmarking subset of the test suite
Hal Finkel
hfinkel at anl.gov
Sun May 4 14:01:33 PDT 2014
----- Original Message -----
> From: "Tobias Grosser" <tobias at grosser.es>
> To: "Hal Finkel" <hfinkel at anl.gov>, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Sunday, May 4, 2014 1:40:52 PM
> Subject: Re: [LLVMdev] [RFC] Benchmarking subset of the test suite
>
> On 04/05/2014 14:39, Hal Finkel wrote:
> > At the LLVM Developers' Meeting in November, I promised to work on
> > isolating a subset of the current test suite that is useful for
> > benchmarking. Having looked at this in more detail, most of the
> > applications and benchmarks in the test suite are useful for
> > benchmarking, and so I think that a better way of phrasing it is
> > that we should construct a list of programs in the test suite that
> > are not useful for benchmarking.
> >
> > My proposed exclusion list is provided below. I constructed this
> > exclusion list primarily based on the following experiment: I ran
> > the test suite 10 times in three configurations: 1) On an IBM
> > POWER7 (P7) with -O3 -mvsx, 2) On a P7 at -O0 and 3) On an Intel
> > Xeon E5430 with -O3, all using make -j6. I then used the ministat
> > utility (which performs a T test) to compare the timings of the
> > two P7 configurations against each other and the Xeon
> > configuration, requiring a detectable difference at 99.5%
> > confidence. I looked for tests that showed no significant
> > difference in all three comparisons. The running configuration
> > here is purposefully noisy; the idea is to eliminate those tests
> > that are significantly sensitive to startup time, file I/O time,
> > memory bandwidth, etc., or just too short, and by running many
> > tests in parallel (non-deterministically), my hope is to eliminate
> > those tests can cannot usefully serve as benchmarks in a "normal"
> > environment.
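(As an aside, for anyone wanting to reproduce the comparison: ministat reads one sample per line per input file, so something like "ministat -c 99.5 p7_o3.txt p7_o0.txt" performs the test; those file names are just placeholders. The sketch below shows the core of such a comparison, a two-sample Welch's t test over made-up timing samples, not actual test-suite data:

#include <cmath>
#include <cstdio>
#include <vector>

// Mean of a sample.
static double mean(const std::vector<double> &x) {
  double s = 0;
  for (double v : x)
    s += v;
  return s / x.size();
}

// Unbiased sample variance.
static double var(const std::vector<double> &x, double m) {
  double s = 0;
  for (double v : x)
    s += (v - m) * (v - m);
  return s / (x.size() - 1);
}

int main() {
  // Ten wall-clock samples per configuration for one benchmark
  // (made-up numbers standing in for, e.g., P7 -O3 vs. P7 -O0).
  std::vector<double> o3 = {1.41, 1.44, 1.39, 1.42, 1.40,
                            1.43, 1.41, 1.40, 1.42, 1.44};
  std::vector<double> o0 = {1.92, 1.88, 1.95, 1.90, 1.93,
                            1.89, 1.91, 1.94, 1.90, 1.92};

  double m3 = mean(o3), m0 = mean(o0);
  // Welch's t statistic; if |t| exceeds the critical value for the
  // requested confidence level (99.5% in my runs), the two
  // configurations differ significantly, and the test can serve as a
  // benchmark for this purpose.
  double t = (m3 - m0) /
             std::sqrt(var(o3, m3) / o3.size() + var(o0, m0) / o0.size());
  std::printf("t = %f\n", t);
  return 0;
}

A test that shows no such difference in any comparison is the kind of test I propose excluding.)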
> >
> > I'll admit being somewhat surprised by so many of the Prolangs and
> > Shootout "benchmarks" seemingly not serving as useful benchmarks;
> > perhaps someone can look into improving the problem size, etc. of
> > these.
> >
> > Without further ado, I propose that a test-suite configuration
> > designed for benchmarking exclude the following:
>
> Hi Hal,
>
> thanks for putting in the effort! I think the systematic approach
> you have taken is very sensible.
>
> I went through your list and looked at a couple of interesting cases.
Thanks! -- I figured you'd have something to add to this endeavor ;)
> For the shootout benchmarks I looked at the results and the history
> my LNT -O3 builder shows (long history, always 10 samples per run,
> http://llvm.org/perf/db_default/v4/nts/25326)
>
> Some observations from my side:
>
> ## Many benchmarks from your list have a runtime of zero seconds
> reported in my tester
This is true for my data as well.
>
> ## For some of the benchmarks you propose, manually looking at the
> historic samples allows a human to spot certain trends:
>
> > MultiSource/Benchmarks/Prolangs-C/football/football
>
> http://llvm.org/perf/db_default/v4/nts/graph?show_all_points=yes&moving_window_size=10&plot.237=34.237.3&submit=Update
>
> > MultiSource/Benchmarks/Prolangs-C/simulator/simulator
>
> http://llvm.org/perf/db_default/v4/nts/graph?show_all_points=yes&moving_window_size=10&plot.314=34.314.3&submit=Update
>
Are these plots of compile time or execution time? Both of these say, "Type: compile_time". I did not consider compile time in my analysis, and I think that is a separate issue.
> ## Some other benchmarks with zero seconds execution time are not
> contained in your list. E.g.:
>
> SingleSource/Benchmarks/Shootout/objinst
> SingleSource/Benchmarks/Shootout-C++/objinst
Interestingly, on my x86 machines this also executes for zero time, but at -O0 it takes a significant amount of time (and on PPC, even at -O3, it runs for about 0.0008s). So I think it is still useful to keep these.
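To illustrate why (this is a sketch in the spirit of objinst, not its actual source): the benchmark flips a boolean through a virtual call in a tight loop, and at -O3 the compiler can devirtualize and fold the whole loop away, while at -O0 every iteration actually executes:

#include <cstdio>

// Toggle-style micro-benchmark sketch (not the actual objinst source).
struct Toggle {
  bool state;
  explicit Toggle(bool s) : state(s) {}
  virtual ~Toggle() {}
  // Flip the state through a virtual call.
  virtual bool activate() {
    state = !state;
    return state;
  }
};

int main() {
  bool value = true;
  // At -O0 this performs 10^8 virtual calls; at -O3 the compiler can
  // devirtualize and constant-fold the loop, so the measured runtime
  // collapses to nearly zero.
  for (int i = 0; i < 100000000; ++i) {
    Toggle t(value);
    value = t.activate();
  }
  std::printf("%s\n", value ? "true" : "false");
  return 0;
}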
>
> ## Some benchmarks on your list are not really benchmarks at all:
>
> Shootout hello:
>
> #include <stdio.h>
>
> int main() {
>   puts("hello world\n");
>   return(0);
> }
>
> Shootout sumcol:
>
> /* Includes and the MAXLINELEN value are not in the original post;
>    they are filled in here (value assumed) to make it self-contained. */
> #include <cstdlib>
> #include <iostream>
> using namespace std;
>
> #define MAXLINELEN 128
>
> int main(int argc, char **argv) {
>   char line[MAXLINELEN];
>   int sum = 0;
>   char buff[4096];
>   cin.rdbuf()->pubsetbuf(buff, 4096); // enable buffering
>
>   while (cin.getline(line, MAXLINELEN)) {
>     sum += atoi(line);
>   }
>   cout << sum << '\n';
>   return 0;
> }
Indeed; these are dominated by startup and I/O time rather than computation.
>
> To sum up, I believe this list might benefit from some improvements,
> but it seems to be a really good start. If someone wants to do a more
> extensive analysis, we can always analyze the historic data available
> in my -O3 performance buildbot. It should give us a very good idea of
> how noisy certain benchmarks are.
Sounds good to me.
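As a starting point for that, a simple per-benchmark noise metric over the historic samples is the coefficient of variation (standard deviation over mean). A sketch, assuming the per-run wall-clock samples can be exported from LNT as plain numbers (the values below are made up):

#include <cmath>
#include <cstdio>
#include <vector>

// Coefficient of variation (stddev / mean) of one benchmark's historic
// samples; larger values flag noisier benchmarks.
static double cov(const std::vector<double> &samples) {
  double m = 0;
  for (double v : samples)
    m += v;
  m /= samples.size();
  double s = 0;
  for (double v : samples)
    s += (v - m) * (v - m);
  return std::sqrt(s / (samples.size() - 1)) / m;
}

int main() {
  // Ten hypothetical execution-time samples for one benchmark.
  std::vector<double> samples = {2.31, 2.29, 2.35, 2.30, 2.52,
                                 2.28, 2.33, 2.31, 2.49, 2.30};
  std::printf("CoV = %.3f\n", cov(samples));
  return 0;
}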
-Hal
>
> Cheers,
> Tobias
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory