[PATCH] Cleanup / consolidation of small loop unroll logic

Arnold Schwaighofer aschwaighofer at apple.com
Tue Jan 28 19:57:19 PST 2014


On Jan 28, 2014, at 6:29 PM, Chandler Carruth <chandlerc at gmail.com> wrote:

> This look OK? I just wanted to double check that there wasn't a specific reason to keep these two things separate. Hoping to get some good benchmarks on the new stuff you added soon.
> 

I am a little concerned about this part:

+  if (!Legal->getRuntimePointerCheck()->Need &&
       LoopCost < SmallLoopCost) {

This now guards all unrolling. We no longer unroll vectorized code that needs a runtime check, although we did before. For example, code like the following snippet does not get unrolled.

void foo_func(float *A, float *B, int N) {
  for (int i = 0; i < N; ++i)
    A[i] *= B[i];
}

For optimal throughput we might want to unroll and vectorize this, and we don’t mind the runtime check.
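
As a rough illustration of what the runtime check means for this loop, here is a hand-written sketch of the versioned code. The names `rangesDisjoint` and `foo_versioned` are hypothetical, and the inner 4-wide loop only stands in for a real vector loop; the code the vectorizer actually emits will differ.

```cpp
#include <cstdint>

// Hypothetical helper: true if [A, A+N) and [B, B+N) do not overlap.
// Assumes N > 0; addresses are compared as integers.
bool rangesDisjoint(const float *A, const float *B, int N) {
  auto a = reinterpret_cast<std::uintptr_t>(A);
  auto b = reinterpret_cast<std::uintptr_t>(B);
  std::uintptr_t bytes = static_cast<std::uintptr_t>(N) * sizeof(float);
  return a + bytes <= b || b + bytes <= a;
}

// Hand-versioned form of the loop: fast path behind the runtime check,
// scalar loop as fallback and as remainder.
void foo_versioned(float *A, float *B, int N) {
  int i = 0;
  if (N > 0 && rangesDisjoint(A, B, N)) {
    // Stand-in for the vectorized (and possibly unrolled) body:
    // processes 4 elements per outer iteration.
    for (; i + 4 <= N; i += 4)
      for (int j = 0; j < 4; ++j)
        A[i + j] *= B[i + j];
  }
  // Scalar loop: handles the remainder, or everything if the check failed.
  for (; i < N; ++i)
    A[i] *= B[i];
}
```

The check is paid once per call, so for the vectorized path it is already part of the cost; the question in this thread is only whether unrolling should be allowed behind it.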

I wanted unrolling for load/store ports to not create an extra runtime check if there wasn’t already one, so I added the "!Legal->getRuntimePointerCheck()->Need" guard. Initially this code was in the if (VF == 1) section, so the guard only applied to scalar loops. When I ported the patch, that restriction got lost. This should have been something like:

  if (EnableLoadStoreRuntimeUnroll &&
      (VF > 1 || !Legal->getRuntimePointerCheck()->Need) &&
      LoopCost < SmallLoopCost) {
    // Unroll until store/load ports (estimated by max unroll factor) are
    // saturated.
    unsigned UnrollStores = UF / (Legal->NumStores ? Legal->NumStores : 1);
    unsigned UnrollLoads = UF / (Legal->NumLoads ? Legal->NumLoads : 1);
    UF = std::max(std::min(UnrollStores, UnrollLoads), 1u);
    return UF;
  } 
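
To make the arithmetic concrete, here is a minimal standalone sketch of the computation in the block above. The function name `loadStoreUnrollFactor` and its plain-integer parameters are hypothetical stand-ins for `Legal->NumStores` and `Legal->NumLoads`; this is not the vectorizer's actual interface.

```cpp
#include <algorithm>

// Hypothetical standalone version of the load/store unroll computation.
// UF is the maximum unroll factor; NumStores/NumLoads are the number of
// stores and loads in the loop body.
unsigned loadStoreUnrollFactor(unsigned UF, unsigned NumStores,
                               unsigned NumLoads) {
  // Treat "no stores" / "no loads" as 1 to avoid dividing by zero.
  unsigned UnrollStores = UF / (NumStores ? NumStores : 1);
  unsigned UnrollLoads = UF / (NumLoads ? NumLoads : 1);
  // Take the tighter of the two limits, but never go below 1.
  return std::max(std::min(UnrollStores, UnrollLoads), 1u);
}
```

For the A[i] *= B[i] loop (one store, two loads) and a maximum unroll factor of 4, this yields min(4 / 1, 4 / 2) = 2.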

Or for your patch:

-  if (EnableLoadStoreRuntimeUnroll &&
-      !Legal->getRuntimePointerCheck()->Need &&
+  // We want to unroll small loops in order to reduce the loop overhead and
+  // potentially expose ILP opportunities.
+  DEBUG(dbgs() << "LV: Loop cost is " << LoopCost << '\n');
+  if ((VF > 1 || !Legal->getRuntimePointerCheck()->Need) &&
       LoopCost < SmallLoopCost) {
+    // We assume that the cost overhead is 1 and we use the cost model
+    // to estimate the cost of the loop and unroll until the cost of the
+    // loop overhead is about 5% of the cost of the loop.
+    unsigned SmallUF = std::min(UF, (unsigned)PowerOf2Floor(SmallLoopCost / LoopCost));
+
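
For reference, a minimal standalone sketch of the SmallUF computation in the hunk above. `powerOf2Floor` is a hand-written stand-in for the PowerOf2Floor helper used in the patch, `smallLoopUnrollFactor` is a hypothetical name, and the threshold of 20 in the note below is an assumed value chosen so that an overhead of 1 is about 5% of the loop cost.

```cpp
#include <algorithm>

// Stand-in for PowerOf2Floor: largest power of two <= X, or 0 if X == 0.
unsigned powerOf2Floor(unsigned X) {
  if (X == 0)
    return 0;
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Hypothetical standalone version of the small-loop unroll factor.
// Expects 0 < LoopCost < SmallLoopCost, as ensured by the guard in the patch.
unsigned smallLoopUnrollFactor(unsigned UF, unsigned LoopCost,
                               unsigned SmallLoopCost) {
  // Unroll until the unrolled body costs roughly SmallLoopCost, rounded
  // down to a power of two and capped at the maximum unroll factor UF.
  return std::min(UF, powerOf2Floor(SmallLoopCost / LoopCost));
}
```

With an assumed threshold of 20, a loop of cost 3 and a maximum unroll factor of 8 gives min(8, PowerOf2Floor(6)) = 4.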



> Also, unrelated, but can you commit the register pressure tweak yet? Still need more benchmarking?

Yes, this still needs benchmarking. If you want I can commit it behind a flag.




