[PATCH] Add benchmarking-only mode to the test suite

Fri May 23 08:36:21 PDT 2014

On 20 May 2014 17:08, Tobias Grosser <tobias at grosser.es> wrote:
> This means we are obviously breaking those limits. However, this should be
> rather predictable, no?
Apparently not.

# started on Fri May 23 15:04:44 2014
 Performance counter stats for './3mm.simple':

      14099.995114 task-clock (msec)         #    1.000 CPUs utilized
     4,477,210,010 L1-dcache-load-misses     #  317.533 M/sec                    [66.66%]
       171,091,671 L1-dcache-store-misses    #   12.134 M/sec                    [66.68%]
         7,280,044 L1-icache-load-misses     #    0.516 M/sec                    [66.68%]
     3,215,844,875 LLC-loads                 #  228.074 M/sec                    [66.71%]
        17,047,853 LLC-stores                #    1.209 M/sec                    [66.68%]
       357,245,416 cache-misses              #   25.337 M/sec                    [66.63%]

# started on Fri May 23 15:05:02 2014
 Performance counter stats for './3mm.simple':

      10874.834701 task-clock (msec)         #    1.000 CPUs utilized
     3,907,819,424 L1-dcache-load-misses     #  359.345 M/sec                    [66.66%]
        84,628,748 L1-dcache-store-misses    #    7.782 M/sec                    [66.70%]
         6,989,338 L1-icache-load-misses     #    0.643 M/sec                    [66.70%]
     3,226,399,896 LLC-loads                 #  296.685 M/sec                    [66.67%]
        17,862,459 LLC-stores                #    1.643 M/sec                    [66.66%]
       183,435,107 cache-misses              #   16.868 M/sec                    [66.65%]
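
To make the spread between the two runs explicit, here is a small sketch
that computes the relative change of each counter from the first run to
the second. The counter values are copied from the perf output above; the
dictionary names and the relative_change helper are only illustrative.

```python
# Counter values copied from the two `perf stat` runs of ./3mm.simple above.
run1 = {
    "task-clock (msec)": 14099.995114,
    "L1-dcache-load-misses": 4_477_210_010,
    "L1-dcache-store-misses": 171_091_671,
    "L1-icache-load-misses": 7_280_044,
    "LLC-loads": 3_215_844_875,
    "LLC-stores": 17_047_853,
    "cache-misses": 357_245_416,
}
run2 = {
    "task-clock (msec)": 10874.834701,
    "L1-dcache-load-misses": 3_907_819_424,
    "L1-dcache-store-misses": 84_628_748,
    "L1-icache-load-misses": 6_989_338,
    "LLC-loads": 3_226_399_896,
    "LLC-stores": 17_862_459,
    "cache-misses": 183_435_107,
}


def relative_change(first, second):
    """Return the change from `first` to `second` as a percentage of `first`."""
    return 100.0 * (second - first) / first


# Relative change per counter, second run versus first run.
changes = {name: relative_change(run1[name], run2[name]) for name in run1}

for name, pct in changes.items():
    print(f"{name:<24} {pct:+7.1f}%")
```

With these numbers the second run takes roughly 23% less time and reports
about half as many cache-misses, while LLC-loads stay within 1% of the
first run.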

Hal, can you add -DSMALL_DATASET to the Makefile, or change the array
size for STANDARD_DATASET, to avoid filling up the L1d cache?

Cheers,
Yi Kong

On 20 May 2014 17:08, Tobias Grosser <tobias at grosser.es> wrote:
> On 20/05/2014 14:47, Hal Finkel wrote:
>>
>> ----- Original Message -----
>>>
>>> From: "Yi Kong" <kongy.dev at gmail.com>
>>>
>>> To: "Tobias Grosser" <tobias at grosser.es>
>>> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Eric Christopher"
>>> <echristo at gmail.com>, "llvm-commits"
>>> <llvm-commits at cs.uiuc.edu>
>>> Sent: Tuesday, May 20, 2014 7:11:27 AM
>>> Subject: Re: [PATCH] Add benchmarking-only mode to the test suite
>>>
>>> Tobias, I can't reproduce your findings on my machine. Even when I
>>> disable output (removing -DPOLYBENCH_DUMP_ARRAYS) and pipe to
>>> /dev/null, I still get lots of spikes. I think we need to exclude
>>> those tests until we find out how to stabilize those results.
>>
>>
>> Okay, I'll also exclude them for now.
>
>
> Yes, for now that seems to be the right choice.
>
>
>> How large is the working set? Could you be seeing TLB misses?
>
> I just did the following calculations:
>
> For gemm the working set is 3 * 1024^2 * sizeof(double)
>
> -> 1024 * 1024 * 3 * 8 / 1024 / 1024 = 24 MB working set
>
> We are running the tests on a machine with an Intel(R) Xeon(R) CPU E5430 @
> 2.66GHz:
>
> http://www.cpu-world.com/cgi-bin/CPUID.pl?CPUID=30806
>
> 64-byte Prefetching
> Data TLB: 4-KB Pages, 4-way set associative, 256 entries
> Data TLB: 4-MB Pages, 4-way set associative, 32 entries
> Instruction TLB: 2-MB pages, 4-way, 8 entries or 4M pages, 4-way, 4 entries
> Instruction TLB: 4-KB Pages, 4-way set associative, 128 entries
> L1 Data TLB: 4-KB pages, 4-way set associative, 16 entries
> L1 Data TLB: 4-MB pages, 4-way set associative, 16 entries
>
> Assuming 4MB pages, the TLB limit is 128 MB, so this should be fine.
>
> L1 Cache: 4 x 32 KB
> L2 Cache: 2 x 6 MB
>
> This means we are obviously breaking those limits. However, this should be
> rather predictable, no?
>
> Cheers,
> Tobias
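
The working-set arithmetic in Tobias's quoted message can be reproduced
with the short sketch below. The matrix size, TLB entry counts and cache
sizes are the ones quoted above; the variable names are only illustrative,
and the 4-KB-page TLB reach is an extra figure derived from the quoted
entry count, not something stated in the original message.

```python
KIB = 1024
MIB = 1024 * KIB

N = 1024             # matrix dimension used in the quoted calculation
SIZEOF_DOUBLE = 8    # bytes per element
NUM_MATRICES = 3     # gemm touches three N x N matrices

# Working set: 3 * 1024^2 * sizeof(double)
working_set = NUM_MATRICES * N * N * SIZEOF_DOUBLE

# Figures quoted for the Xeon E5430.
tlb_reach_4m = 32 * 4 * MIB    # 32 data-TLB entries with 4-MB pages
tlb_reach_4k = 256 * 4 * KIB   # 256 data-TLB entries with 4-KB pages (derived here)
l1d_per_core = 32 * KIB        # "L1 Cache: 4 x 32 KB"
l2_per_pair = 6 * MIB          # "L2 Cache: 2 x 6 MB"

print(f"working set:       {working_set / MIB:.0f} MiB")
print(f"TLB reach (4 MB):  {tlb_reach_4m / MIB:.0f} MiB")
print(f"TLB reach (4 KB):  {tlb_reach_4k / MIB:.0f} MiB")
print(f"fits in one L2:    {working_set <= l2_per_pair}")
print(f"fits in 4-MB TLB:  {working_set <= tlb_reach_4m}")
```

This matches the quoted conclusion: the 24 MB working set is well inside
the 128 MB TLB reach assumed for 4-MB pages, but it is larger than either
cache level.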