[PATCH] Break dependencies in large loops containing reductions (LoopVectorize)

Tue Feb 10 10:22:02 PST 2015

I don't think that just ignoring SmallLoopCost for all loops with reductions will fly ;) -- but I think that adjusting the UF threshold in a more intelligent way certainly makes sense. On some cores it is important not to make loops too large, because there is a threshold penalty effect. For example, on Intel cores, there is a small buffer associated with the LSD (the loop stream detector), and we want loops to fit into that buffer. I believe this is why SmallLoopCost is currently so small (20). That having been said, there is a trade-off, and we'll need to do some tuning.

> To handle all targets appropriately, I propose the following cost function to compute the unroll factor (not in this patch yet):

>   UF = UF * CriticalPathLength / LoopLength

I agree, this seems like essentially what you'd like to do (assuming that the original UF is set based on the number of functional units available). We don't, however, want this to override the register pressure constraint. Once you start spilling in the loop, most of these considerations are more-or-less irrelevant.
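
To make the interaction concrete, here is a rough sketch of how the proposed scaling could be combined with a register-pressure cap. This is not code from the patch or from LoopVectorize: the function name scaleUnrollFactor and the parameter MaxRegPressureUF are hypothetical, and I'm assuming the lengths are measured in the same cost units and that the register-pressure limit has already been computed elsewhere.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper for illustration only; not part of LoopVectorize.
// BaseUF:             unroll factor chosen from the available functional units.
// CriticalPathLength: cost of the loop-carried dependency chain (assumed unit).
// LoopLength:         cost of the whole loop body (same assumed unit).
// MaxRegPressureUF:   largest UF that does not cause spilling (assumed given).
unsigned scaleUnrollFactor(unsigned BaseUF, unsigned CriticalPathLength,
                           unsigned LoopLength, unsigned MaxRegPressureUF) {
  // Degenerate loop: nothing sensible to scale by, so do not unroll.
  if (LoopLength == 0)
    return 1;

  // UF = UF * CriticalPathLength / LoopLength, computed in 64 bits so the
  // intermediate product cannot overflow.
  uint64_t Scaled =
      static_cast<uint64_t>(BaseUF) * CriticalPathLength / LoopLength;

  // The register-pressure limit always wins; never return less than 1.
  uint64_t Cap = std::max<uint64_t>(1, MaxRegPressureUF);
  return static_cast<unsigned>(
      std::min<uint64_t>(std::max<uint64_t>(Scaled, 1), Cap));
}
```

With this shape, a loop that is all dependency chain (CriticalPathLength == LoopLength) keeps the base UF, a loop with lots of independent work gets a smaller UF, and in either case the result never exceeds what register pressure allows.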

Adam, Nadav, Adam, any thoughts on this?

Procedural note: Please upload full-context patches, see: http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface

http://reviews.llvm.org/D7514
