[PATCH] Break dependencies in large loops containing reductions (LoopVectorize)

Wed Feb 11 10:55:09 PST 2015

Hal,

> Okay, this sounds reasonable, please provide a patch and we'll benchmark it.

I am worried that the impact will be small because, as I said, UF is bounded by TTI.getMaxInterleaveFactor(). That bound is greater than 2 only for PPC, AMDGPU, AArch64 (for CortexA57) and X86 (with AVX), and even there it is no greater than 4, except for AMDGPU. Typically, to get the optimal UF for the example above on http://reviews.llvm.org/P8 using the proposed cost function, TTI.getMaxInterleaveFactor() should return 12 instead of 2 (see http://reviews.llvm.org/D7503). And as you said, this is a theoretical max, but it is too big and might do harm because of register pressure.

Maybe we need another TTI function that provides the theoretical max, like TTI::getMaxTheoriticalInterleaveFactor. Or we could multiply the UF bound by some factor, which could be provided through a TTI function with a default value of 1?
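
To make the second option concrete, here is a rough standalone sketch of how such a factor could widen the bound on UF. It is not code from the patch and does not use the real TargetTransformInfo interface: TargetInfoSketch, ReductionInterleaveScale and clampUnrollFactor are names I made up for illustration, and the numbers are just the ones from the example above.

```cpp
#include <algorithm>

// Hypothetical stand-in for the target hooks discussed above.
// This is NOT the real TargetTransformInfo interface.
struct TargetInfoSketch {
  // Plays the role of the value returned by TTI.getMaxInterleaveFactor().
  unsigned MaxInterleaveFactor = 2;
  // Hypothetical multiplier; the default of 1 leaves the bound unchanged.
  unsigned ReductionInterleaveScale = 1;
};

// Clamp the UF requested by the cost function to the scaled target bound.
// The result is never smaller than 1.
unsigned clampUnrollFactor(unsigned DesiredUF, const TargetInfoSketch &TI) {
  unsigned Bound = TI.MaxInterleaveFactor * TI.ReductionInterleaveScale;
  return std::max(1u, std::min(DesiredUF, Bound));
}
```

With the default factor of 1, a requested UF of 12 is still cut down to 2. A target that sets the factor to 6 would let the same request through unchanged, so the register pressure concern would have to be handled by whoever picks that factor.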

http://reviews.llvm.org/D7514
