[PATCH] D34619: [ARM] Enable partial and runtime unrolling

Thu Jul 6 03:07:03 PDT 2017

samparker added a comment.

Hi Eli,

Your comments make sense to me, so I ran an example to figure out if this heuristic was indeed nonsense. Here's the example kernel:

  for (unsigned i = 0; i < max; ++i) {
    acc = 0;
    innerMax = dataSize - i;
    for (unsigned j = 0; j < innerMax; ++j) {
      acc += (input[j] * input[i+j]) >> scaleValue;
    }
  }

The results in the graph show that the unrolled version is often faster, but the net effect across the data set is that unrolling is detrimental to performance. My other benchmark results also show that this restriction doesn't hurt performance, so I think the heuristic to prevent unrolling is valid.
F3628212: unrolling.png <https://reviews.llvm.org/F3628212>
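For readers less familiar with runtime unrolling, here is a hand-written sketch of roughly what unrolling the inner loop by 4 produces: a main body that handles four iterations per trip, plus a remainder loop for the 0-3 leftover iterations. The element type (int16_t), the 64-bit accumulator and the function names are assumptions for illustration only; the compiler's real output differs in details such as where the remainder loop is placed. Note that because innerMax = dataSize - i shrinks on every outer iteration, once innerMax drops below 4 the unrolled body is skipped entirely and only the remainder loop runs.

```c
#include <stdint.h>

/* Rolled inner loop from the kernel above. Types are assumed:
   int16_t samples, 64-bit accumulator. */
static int64_t inner_rolled(const int16_t *input, unsigned i,
                            unsigned innerMax, unsigned scaleValue) {
  int64_t acc = 0;
  for (unsigned j = 0; j < innerMax; ++j)
    acc += (input[j] * input[i + j]) >> scaleValue;
  return acc;
}

/* Hypothetical hand-unrolled version (factor 4), illustrating the
   shape of runtime unrolling; not actual compiler output. */
static int64_t inner_unrolled4(const int16_t *input, unsigned i,
                               unsigned innerMax, unsigned scaleValue) {
  int64_t acc = 0;
  unsigned j = 0;
  /* Unrolled body: runs only while at least four iterations remain. */
  for (; j + 4 <= innerMax; j += 4) {
    acc += (input[j]     * input[i + j])     >> scaleValue;
    acc += (input[j + 1] * input[i + j + 1]) >> scaleValue;
    acc += (input[j + 2] * input[i + j + 2]) >> scaleValue;
    acc += (input[j + 3] * input[i + j + 3]) >> scaleValue;
  }
  /* Remainder loop: the 0-3 iterations left over when innerMax
     is not a multiple of 4 (all of them when innerMax < 4). */
  for (; j < innerMax; ++j)
    acc += (input[j] * input[i + j]) >> scaleValue;
  return acc;
}
```

Both functions compute the same sum; the unrolled one trades code size and an extra loop for fewer branch and induction-variable updates per element, which only pays off when the trip count is large enough to spend most of its time in the unrolled body.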

cheers,
sam

https://reviews.llvm.org/D34619