[PATCH] Break dependencies in large loops containing reductions (LoopVectorize)

Wed Mar 4 11:08:03 PST 2015

In http://reviews.llvm.org/D7514#133020, @ohsallen wrote:

> Hi Hal,
>
> I ran more samples in a quieter environment, but there were no significant speedups or slowdowns. I also tried a more aggressive scheme, which was the first patch proposed here: just interleave large loops with reductions by UF, and forget about the cost function to reduce UF. Still got the same performance on POWER8 (about 20 tests get a larger interleave factor this way).
>
> The good news is that when we interleave such large loops, there is no spilling observed on POWER8. That could be explained by the fact that there are many registers on this target, and that the register pressure heuristics are decent.
>
> There are still specific cases (like the one originally discussed) where a very significant speedup has been observed (up to 3x). We should definitely optimize for these cases. The heuristic with the cost function is too much for what it's worth. What about allowing large loops with reductions to be interleaved on certain targets? We could have a TTI function for that, TTI.enableAggressiveInterleaving() for instance, that would return false except for POWER7 and POWER8, where the interleave factor can be large and have an impact on performance.

I think this is reasonable at the present time. At some point, we might have a reasonable way to model instruction throughput vs. latency, the effect of out-of-order cross-iteration dispatch, etc., but we don't currently. Tackling that is likely a long-term project.
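
For illustration only, the proposal quoted above could be sketched roughly as follows. This is a standalone mock, not code from the patch: MockTargetInfo, the "pwr7"/"pwr8" CPU strings, selectInterleaveCount, and the LoopHasReductions parameter are all assumptions made for the example, and the real hook would live in TTI with whatever signature the patch settles on.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a target query; this is NOT LLVM's actual TTI class.
struct MockTargetInfo {
  std::string CPU; // assumed CPU names: "pwr7", "pwr8", or anything else

  // Returns false except on POWER7 and POWER8, as suggested in the thread.
  // The LoopHasReductions parameter is an assumption of this sketch.
  bool enableAggressiveInterleaving(bool LoopHasReductions) const {
    return LoopHasReductions && (CPU == "pwr7" || CPU == "pwr8");
  }
};

// Hypothetical helper: UF is the interleave factor before any cost-based
// reduction, and CostReducedUF is the factor after the cost function has
// lowered it. On targets that opt in, the cost-based reduction is skipped.
unsigned selectInterleaveCount(const MockTargetInfo &TTI, bool LoopHasReductions,
                               unsigned UF, unsigned CostReducedUF) {
  assert(CostReducedUF <= UF && "cost function should only reduce UF");
  if (TTI.enableAggressiveInterleaving(LoopHasReductions))
    return UF;
  return CostReducedUF;
}
```

With this shape, a POWER8 loop containing a reduction keeps the full UF, while every other target (and any loop without a reduction) keeps the cost-reduced factor, so the behavior elsewhere is unchanged.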

> Thanks,

> Olivier

http://reviews.llvm.org/D7514
