[PATCH][TEST-SUITE] Marking more programs as not to be run in benchmark-only mode

Tue May 19 10:32:30 PDT 2015

I think this is great! I totally agree.

> On May 19, 2015, at 10:28 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
> 
> Hi,
> 
> One substantial source of performance noise in the LNT test-suite is programs
> that run for only a very short time.
> The attached patch disables more of the programs in benchmark-only mode, based
> on an analysis I've done of the programs that run for less than 10 ms on
> http://llvm.org/perf/db_default/v4/nts/machine/39.
> 
> I think it is debatable whether some of these should continue to be run in
> benchmark-only mode, as we also have code in LNT to just ignore all benchmarks
> that run for less than 10 ms.
> 
> Here is the list of programs that the attached patch removes from
> benchmark-only mode.
> 
> The programs that clearly don't have value as a benchmark:
> * SingleSource/UnitTests/Vector: constpool simple:
>  neither has any loops in the code.
> * SingleSource/UnitTests/Vector/AArch64: aarch64_neon_intrinsics:
>  doesn't have any loop in the code.
> * SingleSource/UnitTests/Vector/NEON: simple:
>  doesn't have any loop in the code.
> * SingleSource/UnitTests: 2005-07-15-Bitfield-ABI 2006-01-23-UnionInit
> 2007-04-10-BitfieldTest:
>  none of them has any loop in the code.
> * MultiSource/Benchmarks/Prolangs-C: loader:
>  This program exits immediately because no arguments are given on the command
>  line. Unless someone creates inputs for this program, it should not be
>  considered a benchmark.
> 
> 
> The programs for which it's debatable whether they have value as a benchmark;
> I think none of these does enough work to be considered one:
> * MultiSource/Benchmarks/McCat/:
>  - 15-trie produces trie data structures, but the program has no loops, so
>    it's probably IO bound and therefore shouldn't be considered a benchmark.
> * MultiSource/Benchmarks/Prolangs-C:
>  - cdecl: the benchmark parses about 70 C declarations.
> * MultiSource/Benchmarks/MiBench:
>  - office-stringsearch searches for a few hundred substrings in a set of a few
>    hundred strings.
>  - telecom-adpcm seems to spend most of its time in IO; to be run as a
>    benchmark, it should have a loop that performs the main data transformation
>    multiple times on the same buffer.
> * SingleSource/Benchmarks/Stanford:
>  - IntMM: does 10 matrix multiplications of size 40x40.
> * SingleSource/Regression/C/Makefile: matrixTranspose, sumarray2d,
>   test_indvars:
>  - matrixTranspose: transposes a 32x32 matrix 10 times.
>  - sumarray2d: creates a 100x100 matrix and sums all the elements in it.
>  - test_indvars: traverses a 20000-element matrix twice.
> * SingleSource/UnitTests/SignlessTypes:
>  - rem: does 100 gcd computations.
> 
> The following programs also run for only a very short time, but I do think they
> have value as a benchmark, so the attached patch doesn't disable them in
> benchmark-only mode:
> * SingleSource/Benchmarks/Misc/lowercase
> * SingleSource/Benchmarks/Shootout/objinst
> * SingleSource/Benchmarks/Shootout-C++/objinst
> For all three, it seems that the main computation in the benchmark is
> completely optimized away. Keeping these running as benchmarks should allow us
> to catch it if LLVM ever regresses in its ability to optimize away the main
> computation in these programs.
> 
> 
> What do you think?
> 
> Thanks,
> 
> Kristof
> <exclude_short_running_programs_from_benchmark_mode.diff>
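
For reference, the 10 ms cutoff discussed above could be sketched roughly as follows. This is only an illustration of the idea, not LNT's actual code: the function name, the data layout (a dict of program name to runtimes in seconds), the use of the median, and the sample values are all assumptions made up for this example.

```python
from statistics import median

# Assumption: the cutoff from the thread, expressed in seconds.
MIN_RUNTIME_SECONDS = 0.010


def split_by_runtime(samples, threshold=MIN_RUNTIME_SECONDS):
    """Partition {program: [runtimes in seconds]} into (kept, ignored).

    A program is ignored when it has no samples or when its median
    runtime falls below the threshold.
    """
    kept, ignored = {}, {}
    for name, runtimes in samples.items():
        if runtimes and median(runtimes) >= threshold:
            kept[name] = runtimes
        else:
            ignored[name] = runtimes
    return kept, ignored


# Made-up runtimes for hypothetical programs, for illustration only.
samples = {
    "example/short_running": [0.004, 0.005, 0.004],
    "example/long_running": [1.20, 1.25, 1.22],
    "example/no_samples": [],
}
kept, ignored = split_by_runtime(samples)
print("kept:", sorted(kept))
print("ignored:", sorted(ignored))
```

A filter like this hides short-running programs from the reports, whereas the patch stops them from being run as benchmarks in the first place.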