[PATCH] D47675: [test-suite][RFC] Using Google Benchmark Library on Harris Kernel

Pankaj via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 21 13:16:55 PDT 2018


proton added a comment.

In https://reviews.llvm.org/D47675#1138707, @dberris wrote:

> Thanks for making some of the changes. I'm still not clear on a couple of things.
>
> Do you mind sharing some of the results with the new benchmark runs, with the different image sizes? Do we actually get the throughput numbers in there as well?


I replied to your previous comment, but for some reason the reply shows up only after your inline comment, not here.

I cannot use a pointer to pointer (float **) here: the compiler has to assume that the pointers may overlap, and that prevents Polly from detecting SCoPs.

I have to allocate fixed-size arrays here because "float (&outputImg)[2+height][2+width] = *reinterpret_cast<float (*)[2+height][2+width]>((float *) malloc(...)); " is not allowed by clang++.
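
To illustrate the fixed-size approach, here is a minimal sketch, assuming the dimensions are compile-time constants. The names kHeight, kWidth, PaddedImage, allocImage and boxSum3x3 are hypothetical and not taken from the patch, and the 3x3 box sum is only a stand-in for the real kernel:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical compile-time dimensions (the patch uses HEIGHT and WIDTH).
constexpr std::size_t kHeight = 4;
constexpr std::size_t kWidth = 6;

// Image with a one-pixel border; both dimensions are part of the type,
// so the compiler sees one contiguous block instead of a table of row pointers.
using PaddedImage = float[2 + kHeight][2 + kWidth];

// Allocates one zero-filled padded image on the heap.
// The caller releases it with std::free.
PaddedImage *allocImage() {
  void *raw = std::calloc(1, sizeof(PaddedImage));
  assert(raw != nullptr && "allocation failed");
  return static_cast<PaddedImage *>(raw);
}

// Stand-in stencil: each interior output pixel is the sum of the
// 3x3 neighbourhood around the same position in the input.
void boxSum3x3(const PaddedImage &in, PaddedImage &out) {
  for (std::size_t i = 1; i <= kHeight; ++i) {
    for (std::size_t j = 1; j <= kWidth; ++j) {
      float sum = 0.0f;
      for (std::size_t di = 0; di < 3; ++di)
        for (std::size_t dj = 0; dj < 3; ++dj)
          sum += in[i - 1 + di][j - 1 + dj];
      out[i][j] = sum;
    }
  }
}
```

A caller would write something like "PaddedImage *in = allocImage(); PaddedImage *out = allocImage(); boxSum3x3(*in, *out);" and free both buffers afterwards. Since the loop bounds and array extents are constants, every access is an affine index into a known-size array, which is the form Polly is expected to handle.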

Also, is the number of bytes processed calculated with respect to the size of the output, or the total number of bytes accessed in the kernel?



================
Comment at: MicroBenchmarks/harris/main.cpp:179
+}
+BENCHMARK(BENCHMARK_HARRIS)->Unit(benchmark::kMicrosecond);
+#endif
----------------
dberris wrote:
> It seems that HEIGHT and WIDTH are input values anyway, consider testing multiple input sizes to see how the kernel performs as the image size goes up.
> 
> You might also not need the `__restrict__` attributes for the malloc-provided heap memory either. This means you could do:
> 
> ```
> float **image = reinterpret_cast<float**>(malloc(sizeof(float) * (2 + state.range(0)) * (2 + state.range(1))));
> ```
> 
> When you register the benchmark, you can then provide the image sizes to test with:
> 
> ```
> BENCHMARK(HarrisBenchmark)
>     ->Unit(benchmark::kMicrosecond)
>     ->Args({256, 256})
>     ->Args({512, 512})
>     ->Args({1024, 1024})
>     ->Args({2048, 2048});
> ```
> 
> You can see more options at https://github.com/google/benchmark#passing-arguments.
> 
> Another thing you may consider measuring as I suggested in the past is throughput. To do that, you can call `state.SetBytesProcessed(...)` in the benchmark body, typically at the end just before exiting -- you want to essentially report something like:
> 
> ```
> state.SetBytesProcessed(sizeof(float) * (state.range(0) + 2) * (state.range(1) + 2) * state.iterations());
> ```
> 
> This will add a "MB/sec" output alongside the time it took for each iteration of the benchmark.
I cannot use float **image because the pointers may overlap, and this prevents Polly from detecting SCoPs.

I have to allocate fixed-size arrays here because "float (&outputImg)[2+height][2+width] = *reinterpret_cast<float (*)[2+height][2+width]>((float *) malloc(...)); " is not allowed by clang++.

I did consider adding SetBytesProcessed, but I was not sure how many bytes should be passed as the argument (the output image size or the total bytes accessed in the kernel), so I commented out the line "SetBytesProcessed(static_cast<int64_t>(state.iterations())*WIDTH*HEIGHT*50);" but forgot to ask about it.
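
To make the two options concrete, here is a rough sketch of both byte counts. The helper names and the accessesPerPixel parameter are hypothetical; the actual number of float accesses per pixel in the Harris kernel would have to be counted from the code, and I am not claiming a value for it here:

```cpp
#include <cassert>
#include <cstdint>

// Option A: count only the padded output image, as in the reviewer's
// suggested expression sizeof(float) * (h + 2) * (w + 2) * iterations.
std::int64_t outputBytes(std::int64_t iterations, std::int64_t height,
                         std::int64_t width) {
  return iterations * static_cast<std::int64_t>(sizeof(float)) *
         (height + 2) * (width + 2);
}

// Option B: count every float read or written by the kernel.
// accessesPerPixel is an assumed, caller-supplied count of float
// accesses made per output pixel.
std::int64_t accessedBytes(std::int64_t iterations, std::int64_t height,
                           std::int64_t width,
                           std::int64_t accessesPerPixel) {
  assert(accessesPerPixel > 0);
  return iterations * static_cast<std::int64_t>(sizeof(float)) * height *
         width * accessesPerPixel;
}
```

Either result could then be passed at the end of the benchmark body, e.g. "state.SetBytesProcessed(outputBytes(state.iterations(), HEIGHT, WIDTH));". Option A reports throughput in terms of image data produced, while Option B is closer to memory traffic; which one is preferred here is exactly the open question.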


https://reviews.llvm.org/D47675




