[PATCH] Break dependencies in large loops containing reductions (LoopVectorize)

Tue Feb 10 10:52:19 PST 2015

> I don't think that just ignoring SmallLoopCost for all loops with reductions will fly ;) -- but, I think that adjusting the UF threshold in a more-intelligent way certainly makes sense.

Right. Typically we don't want to unroll a loop like the one below. With the proposed cost function, and assuming that CriticalPathLength is 1, UF is divided by LoopSize, which is large, so the resulting UF stays small.

  for (...) {
    // large instruction sequence unrelated to r
    r += ...
  }

> On some cores it is important not to make loops too large, because there is a threshold penalty effect. For example, on Intel cores, there is a small buffer associated with the LSD (the loop stream detector), and we want loops to fit into that buffer. I believe this is why SmallLoopCost is currently so small (20). That having been said, there is a trade-off, and we'll need to do some tuning.

We could compute a threshold on UnrolledLoopSize, which would be related to the LSD buffer on Intel and to the I-cache on other targets. Then we would reduce UF until UnrolledLoopSize fits within that threshold.
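To make the threshold idea concrete, here is a minimal sketch. It is not code from the patch: the function name, the SizeBudget parameter, and the choice to halve UF (so a power-of-two UF stays a power of two) are all assumptions made for illustration, and UnrolledLoopSize is approximated as UF * LoopSize.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of LoopVectorize): shrink the unroll factor
// until the estimated unrolled loop size fits a per-target budget, e.g. one
// derived from the LSD buffer or the I-cache.
unsigned clampUFToSizeBudget(unsigned UF, unsigned LoopSize,
                             unsigned SizeBudget) {
  assert(UF >= 1 && LoopSize >= 1 && "expects UF >= 1 and a non-empty loop");
  // Assumption: UnrolledLoopSize is simply UF * LoopSize. The product is
  // computed in 64 bits to avoid overflow.
  while (UF > 1 && static_cast<uint64_t>(UF) * LoopSize > SizeBudget)
    UF /= 2; // Assumption: halve rather than decrement.
  return UF;
}
```

With a budget of 28 and a LoopSize of 10, an initial UF of 8 would be reduced to 2 (estimated size 20). A loop that is already larger than the budget ends up with UF = 1, i.e. no unrolling.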

> UF = UF * CriticalPathLength / LoopLength

> I agree, this seems like essentially what you'd like to do (assuming that the original UF is set based on the number of functional units available). We don't, however, want this to override the register pressure constraint. Once you start spilling in the loop, most of these considerations are more-or-less irrelevant.

The UF on the right-hand side of the assignment is already computed according to register pressure (using calculateRegisterUsage()), so I assume it already takes into account the fact that the loop is large. The resulting UF would usually be smaller, so I believe this would be conservative enough?
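As a rough illustration of why the result is conservative, the formula can be sketched as follows. The function and parameter names are hypothetical, and the clamp to a minimum of 1 is an assumption, not something taken from the patch. RegPressureUF stands for the UF already derived from register pressure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch of UF = UF * CriticalPathLength / LoopLength.
// RegPressureUF is assumed to be the unroll factor already limited by
// register pressure; the scaled value can only be equal or smaller.
unsigned scaleUFByCriticalPath(unsigned RegPressureUF,
                               unsigned CriticalPathLength,
                               unsigned LoopLength) {
  assert(RegPressureUF >= 1 && LoopLength >= 1 && "expects non-zero inputs");
  assert(CriticalPathLength <= LoopLength &&
         "the critical path cannot be longer than the loop");
  uint64_t Scaled =
      static_cast<uint64_t>(RegPressureUF) * CriticalPathLength / LoopLength;
  // Assumption: never go below 1, which means "do not unroll".
  return static_cast<unsigned>(std::max<uint64_t>(Scaled, 1));
}
```

Because CriticalPathLength is at most LoopLength, the scaled value never exceeds RegPressureUF, so the register pressure constraint is not overridden. For the large loop above (CriticalPathLength of 1, LoopLength of 40, say), a register-pressure UF of 8 drops to 1.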

http://reviews.llvm.org/D7514
