[PATCH] Break dependencies in large loops containing reductions (LoopVectorize)
Olivier Sallenave
ohsallen at us.ibm.com
Wed Feb 11 10:55:09 PST 2015
Hal,
> Okay, this sounds reasonable, please provide a patch and we'll benchmark it.
I am worried that the impact will be small because, as I said, UF is bounded by TTI.getMaxInterleaveFactor(). That bound is greater than 2 only on PPC, AMDGPU, AArch64 (for CortexA57) and X86 (with AVX), and even there it is no greater than 4, except on AMDGPU. For instance, to get the optimal UF for the example above on http://reviews.llvm.org/P8 using the proposed cost function, TTI.getMaxInterleaveFactor() would have to return 12 instead of 2 (see http://reviews.llvm.org/D7503). And as you said, this is a theoretical max, but it is too big and might do harm because of register pressure.
Maybe we need another TTI function that provides the theoretical max, like TTI::getMaxTheoriticalInterleaveFactor. Or we could multiply UF by some factor, which could be provided through a TTI function with a default value of 1?
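To make the second option concrete, here is a rough, standalone sketch of one way to read it: the factor relaxes the bound that UF is clamped to. This is not the LoopVectorize code and does not use the real TargetTransformInfo interface. The struct, the ReductionUFMultiplier field and selectUF are hypothetical names invented for illustration, and the default of 2 is only an example value for a target with a low bound.

```cpp
#include <algorithm>

// Hypothetical stand-in for the target hooks discussed above. This is NOT
// the real llvm::TargetTransformInfo interface.
struct TargetHooksSketch {
  // Plays the role of the value returned by TTI.getMaxInterleaveFactor().
  unsigned MaxInterleaveFactor = 2;
  // Hypothetical extra hook: with the default of 1 nothing changes.
  unsigned ReductionUFMultiplier = 1;
};

// Clamp the UF requested by the cost function to the target bound, where the
// bound is relaxed by the (assumed) multiplier. The result is at least 1.
unsigned selectUF(unsigned RequestedUF, const TargetHooksSketch &T) {
  unsigned Bound = T.MaxInterleaveFactor * T.ReductionUFMultiplier;
  return std::max(1u, std::min(RequestedUF, Bound));
}
```

With the defaults, a requested UF of 12 is clamped to 2. A target that set the multiplier to 6 would let the same request through unchanged, which is where the register pressure concern would have to be weighed.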
http://reviews.llvm.org/D7514