[PATCH] D34619: [ARM] Enable partial and runtime unrolling

Wed Jun 28 01:40:17 PDT 2017

samparker added inline comments.

================
Comment at: lib/Target/ARM/ARMTargetTransformInfo.cpp:599
+  // Enable runtime unrolling for non-nested loops.
+  if (L->getLoopDepth() == 1) {
+    UP.Runtime = true;
----------------
efriedma wrote:
> These heuristics seem really weird... why does the loop depth affect the performance of a loop?  Why does the trip count of the parent loop affect the performance of a loop?
> 
> I mean, I can imagine these heuristics are profitable for your particular benchmarks, but it seems like a performance trap: for example, someone turns on LTO, so a function gets inlined into a loop, so we decide it's no longer profitable to unroll the inner loop, and we lose a bunch of performance.
Hi Eli,

The depth check here is just for an early exit. For nested loops, I'm checking whether the trip count of the inner loop will be the same for each iteration of the outer loops; it's not the actual trip count that affects performance.

I have found that it's not profitable to unroll loops with calls, because unrolling can prevent the inlining from happening. So unrolling will not happen in those cases, and I hope that the inlining does occur. Some inlining could push the cost above the unrolling threshold, but that doesn't matter because we're still getting the inlining performance gains. The thresholds have been chosen to unroll small loops where the overhead of the backedge, on M-class cores, is significant.
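
As a rough illustration of the checks described above, here is a toy sketch. It is not the code in the patch: ToyLoop, ToyThreshold and wantRuntimeUnroll are hypothetical names, the threshold value is made up, and the order of the checks is an assumption.

```cpp
// Toy model only: ToyLoop is a hypothetical stand-in, not llvm::Loop, and the
// result stands in for a flag such as UP.Runtime.
struct ToyLoop {
  unsigned Depth;          // 1 means a non-nested (outermost) loop
  bool TripCountInvariant; // inner trip count is the same on every outer iteration
  bool HasCall;            // body contains a call that might still be inlined
  unsigned Cost;           // estimated size of the loop body
};

// Assumed value for illustration; the real thresholds are tuned per target.
constexpr unsigned ToyThreshold = 60;

inline bool wantRuntimeUnroll(const ToyLoop &L) {
  // Loops with calls are left alone so that inlining can still happen.
  if (L.HasCall)
    return false;
  // Only small loops qualify: that is where backedge overhead is significant.
  if (L.Cost > ToyThreshold)
    return false;
  // Early exit: a non-nested loop needs no further checks.
  if (L.Depth == 1)
    return true;
  // Nested loop: what matters is whether the inner trip count stays the same
  // across outer iterations, not the value of the trip count itself.
  return L.TripCountInvariant;
}
```

In this sketch, a loop that picks up extra cost after inlining can cross ToyThreshold and stop being unrolled, which matches the trade-off described above.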

cheers,
sam

https://reviews.llvm.org/D34619