[PATCH] Break dependencies in large loops containing reductions (LoopVectorize)

Wed Feb 11 11:12:16 PST 2015

In http://reviews.llvm.org/D7514#122114, @ohsallen wrote:

> Hal,
>
> > Okay, this sounds reasonable, please provide a patch and we'll benchmark it.
>
>
> I am worried that the impact will be small because, as I said, UF is bounded by TTI.getMaxInterleaveFactor(). Only for PPC, AMDGPU, AArch64 (for CortexA57) and X86 (with AVX) is that bound greater than 2 (but no greater than 4, except for AMDGPU). Typically, to get the optimal UF for the example above on http://reviews.llvm.org/P8 using the proposed cost function, TTI.getMaxInterleaveFactor() should return 12 instead of 2 (see http://reviews.llvm.org/D7503). And as you said, this is a theoretical max, but it's too big and might do harm because of register pressure.

Right, that's good. We need to make sure there aren't significant regressions in performance (or increases in code size that don't correspond to performance increases).

> Maybe we would need another TTI function that provides the theoretical max, like TTI::getMaxTheoriticalInterleaveFactor. Or we would have to multiply UF by some factor, which could be provided through a TTI function with a default value of 1?

I think we might want to separate the current single number into two numbers: one for ILP and one for latency. But I'm not exactly sure what you're suggesting.
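
To make the two-number idea concrete, here is a rough sketch, not a proposal for the actual interface. Every name in it is hypothetical: TargetHooksSketch, its two members and clampUF do not exist in TargetTransformInfo; they only stand in for splitting what getMaxInterleaveFactor() returns today into an ILP bound and a latency bound. The value 12 is the one quoted above for the reduction example.

```cpp
#include <algorithm>

// Hypothetical stand-in for the target hooks under discussion. None of these
// members exist in LLVM's TargetTransformInfo; the defaults mirror a target
// whose single bound is currently 2.
struct TargetHooksSketch {
  unsigned MaxInterleaveILP = 2;     // bound when interleaving to expose ILP
  unsigned MaxInterleaveLatency = 2; // bound when interleaving to hide reduction latency
};

// Clamp the UF suggested by the cost model against the bound that matches the
// reason for interleaving. The result is never smaller than 1.
unsigned clampUF(const TargetHooksSketch &T, unsigned CostModelUF,
                 bool HasReduction) {
  unsigned Bound = HasReduction ? T.MaxInterleaveLatency : T.MaxInterleaveILP;
  return std::max(1u, std::min(CostModelUF, Bound));
}
```

With MaxInterleaveLatency set to 12, a loop dominated by a reduction could be interleaved up to 12 times, while loops without a reduction would stay at the existing bound. Register pressure would still need to limit the result separately.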

http://reviews.llvm.org/D7514
