[PATCH] Break dependencies in large loops containing reductions (LoopVectorize)

Wed Feb 11 02:37:27 PST 2015

In http://reviews.llvm.org/D7514#121703, @ohsallen wrote:

> Let me try to explain the rationale behind the proposed cost function: UF = UF * CriticalPathLength / LoopLength
>
> I assume CriticalPathLength is the number of reductions in the loop (or more precisely, the length of the longest reduction chain found in the loop, which would be 3 for the examples above, as the innermost loop is inlined).
>
> (CriticalPathLength / LoopLength) reflects the distance between the reductions: the smaller the ratio, the farther apart they are. Typically, the distance is very short for the examples above, so CriticalPathLength / LoopLength = 3 / 3 = 1, and UF remains unchanged. If the distance is long, there are potentially many FP instructions between each reduction, so an OoO pipeline will be able to exploit ILP, and we don't need to add much interleaving; UF will be reduced.

Okay, this sounds reasonable, please provide a patch and we'll benchmark it.
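
To make sure we are reading the formula the same way, here is a minimal sketch of the scaling as I understand it. The function name, the clamp to a minimum of 1, and the zero-length guard are my own assumptions for illustration; none of this is existing LoopVectorize code.

```cpp
#include <algorithm>

// Hypothetical helper (not part of LoopVectorize): scales the interleave
// factor UF by CriticalPathLength / LoopLength, as in the quoted proposal.
unsigned scaleInterleaveFactor(unsigned UF, unsigned CriticalPathLength,
                               unsigned LoopLength) {
  // Assumption: an empty loop leaves UF untouched rather than dividing by 0.
  if (LoopLength == 0)
    return UF;
  // Multiply before dividing so integer division does not truncate the
  // ratio to zero before it is applied.
  unsigned Scaled = UF * CriticalPathLength / LoopLength;
  // Assumption: never go below 1, i.e. "no interleaving".
  return std::max(1u, Scaled);
}
```

With the numbers from the examples above (CriticalPathLength = 3, LoopLength = 3) UF is returned unchanged, while a loop of 12 instructions with the same chain would cut a UF of 4 down to 1.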

> And if there is no OoO, UF will be small anyway.

No, if there is no OoO, then the UF should be set to hide pipeline latency, and so its size will depend on the depth of the pipeline (which is not necessarily small).

For OoO cores with deep pipelines, we should also be unrolling larger loops, but that's another matter.

http://reviews.llvm.org/D7514
