[PATCH] D47675: [test-suite][RFC] Using Google Benchmark Library on Harris Kernel

Thu Jun 21 16:37:41 PDT 2018

dberris accepted this revision.
dberris added a comment.
This revision is now accepted and ready to land.

In https://reviews.llvm.org/D47675#1139805, @proton wrote:

> In https://reviews.llvm.org/D47675#1138707, @dberris wrote:
>
> > Thanks for making some of the changes. I'm still not clear on a couple of things.
> >
> > Do you mind sharing some of the results with the new benchmark runs, with the different image sizes? Do we actually get the throughput numbers in there as well?
>
>
> I replied to your previous comment but for some reason, it is showing only after your inlined comment not here.
>
> I cannot use a pointer to pointer of an array (float **) here as the compiler may think that some pointers may overlap and prevents Polly from detecting SCoPs here.

I'm not sure that's entirely true -- the pointer is coming from malloc, so they're meant to not overlap. At least LLVM should be able to detect that.

> I have to allocate the fixed size arrays here as "float (&outputImg)[2+height][2+width] = *reinterpret_cast<float (*)[2+height][2+width]>((float *) malloc(...)); " is not allowed by clang++

It's not allowed because height and width are not constant expressions. Note that even in function arguments, the "array" form will be treated as pointers anyway so it shouldn't make any difference if you change the API to:

  void harrisKernel(
      int height, int width, float **inputImg,
      float **outputImg, float **Ix,
      float **Iy, float **Ixx,
      float **Ixy, float **Iyy,
      float **Sxx, float **Sxy,
      float **Syy, float **det,
      float **trace)

If anything, you want to mark those pointers that the compiler is supposed to treat as non-aliasing as `restrict` or `__restrict__`.

> Also, Are the Number of bytes processed is calculated w.r.t to the size of output or the total number of bytes accessed in the kernel?

The bytes processed is what you say it is -- as I suggested, it is based on the input size (size of the input image). If you want to measure something else, you're going to have to provide what that throughput is.

The beauty of having a benchmark suite is that you can test out these various approaches in the same benchmark. You can explore alternative strategies like:

- Turn the harris kernel implementation into a template, so you can use C++ features and take `const std::array<std::array<float, W>, H>&`, and let the compiler deduce the sizes.
- Have a version of the kernel with pointer arguments with `__restrict__` and one without.
- Instead of taking output pointers, consider returning a struct/tuple with `std::unique_ptr<float[][]>` members.
- Use C++ array new and array delete (say `new float*[(height *2) * (width * 2)]`).

Anyway, I'm fine with the state of it currently, but would like to see more work done if not now but in the future. This will be really helpful for people working on the compiler trying to see whether performance/throughput can be improved (or regress) with changes to the compiler. I'm sure the folks working on Polly would like to be able to diagnose why the Polly version is slower than the normal build.

Please wait for someone else to LGTM/Accept before landing. I'm sure we can spend a lot more time trying to make these benchmarks better, but in the meantime having *something* in there is better than not having one in there.

================
Comment at: MicroBenchmarks/harris/main.cpp:179
+}
+BENCHMARK(BENCHMARK_HARRIS)->Unit(benchmark::kMicrosecond);
+#endif
----------------
proton wrote:
> dberris wrote:
> > It seems that HEIGHT and WIDTH are input values anyway, consider making multiple input sizes to see how the kernel performs as you scale the image size goes up.
> > 
> > You might also not need the `__restrict__` attributes for the malloc-provided heap memory either. This means you could do:
> > 
> > ```
> > float **image = reinterpret_cast<float**>(malloc(sizeof(float) * (2 + state.range(0)) * (2 + state.range(1))));
> > ```
> > 
> > When you register the benchmark, you can then provide the image sizes to test with:
> > 
> > ```
> > BENCHMARK(HarrisBenchmark)
> >     ->Unit(benchmark::kMicrosecond)
> >     ->Args({256, 256})
> >     ->Args({512, 512})
> >     ->Args({1024, 1024})
> >     ->Args({2048, 2048});
> > ```
> > 
> > You can see more options at https://github.com/google/benchmark#passing-arguments.
> > 
> > Another thing you may consider measuring as I suggested in the past is throughput. To do that, you can call `state.SetBytesProcessed(...)` in the benchmark body, typically at the end just before exiting -- you want to essentially report something like:
> > 
> > ```
> > state.SetBytesProcessed(sizeof(float) * (state.range(0) + 2) * (state.range(1) + 2) * state.iterations());
> > ```
> > 
> > This will add a "MB/sec" output alongside the time it took for each iteration of the benchmark.
> Cannot use float **image as pointers may overlap and this prevents Polly from detecting scops.
> 
> I have to allocate the fixed size arrays here as "float (&outputImg)[2+height][2+width] = *reinterpret_cast<float (*)[2+height][2+width]>((float *) malloc(...)); " is not allowed by clang++
> 
> I did considered adding SetBytesProcessed but I was not sure how many bytes should be written as argument (output image size or the total bytes accessed in kernel) so I commented the line "SetBytesProcessed(static_cast<int64_t>(state.iterations())*WIDTH*HEIGHT*50);" but forgot to ask about it.
I don't know whether you want to optimise for Polly or make Polly just recognise these pointers shouldn't overlap. If Polly can't detect that these pointers are coming from different 'malloc' calls, then I suspect that's a bug in Polly rather than something you need to work around in the benchmark.

Note that maybe the better thing to do is to change the kernel's API to put `restrict` or `__restrict__` on the pointers, so that the optimiser in those cases might be able to assume that the pointers don't alias and don't do anything special in this benchmark.

See my top-level comment for alternatives to explore, if you're open to it.

https://reviews.llvm.org/D47675