[PATCH] Disable loop unrolling in loop vectorization pass when VF is 1 on x86

Wed May 6 09:01:56 PDT 2015

In http://reviews.llvm.org/D9515#166600, @rengolin wrote:

> Hi Wei,
>
> The example you have shown would produce bad vectorized code on any architecture, I don't think anything you said (multiple unrolling and prologue loops) would make much difference on other archs. Maybe you're trying to fix a global problem locally, and creating some unnecessary constraints for the cases that do work.
>
> However, your performance improvements are really impressive, so I think we ought to check other archs, and maybe try to detect the problematic case on a generic level?

The problem is fairly generic, but does need per-target tuning. Interleaving can be quite beneficial at VF == 1 on in-order chips with fairly long pipelines (especially for floating point), but on those architectures you end up unrolling a lot to get good performance. On X86, the constraints are different. There we can't unroll a lot (in some sense), because you're unrolling to fill the loop-stream detector's associated dispatch buffer, and there is a large performance cliff if the loop no longer fits into the buffer. Thus, any unrolling is minor, and the extra prologues really hurt a lot.
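To make the trade-off concrete, here is a hand-written sketch of what interleaving at VF == 1 amounts to for a simple reduction. It is only an illustration, not actual vectorizer output: the function names and the interleave count of 4 are assumptions chosen for the example. The interleaved version has a larger loop body plus an extra scalar loop for the leftover iterations, which is the kind of added code that costs us when the trip count is small or the body no longer fits the buffer.

```cpp
#include <cstddef>
#include <vector>

// Plain scalar reduction: every add depends on the previous one,
// so a long floating-point pipeline sits mostly idle.
double sum_scalar(const std::vector<double>& a) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += a[i];
    return s;
}

// Hypothetical hand-interleaved form (VF == 1, interleave count 4).
// Four independent accumulators break the dependency chain, at the
// cost of a 4x larger loop body and a separate remainder loop.
// Note: this reassociates the adds, so for floating point a compiler
// would only do it under relaxed FP rules.
double sum_interleaved4(const std::vector<double>& a) {
    const std::size_t n = a.size();
    const std::size_t main_end = n - (n % 4);  // iterations handled by the wide body
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i < main_end; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    double s = (s0 + s1) + (s2 + s3);
    // Remainder loop: pure overhead when the trip count is small.
    for (; i < n; ++i)
        s += a[i];
    return s;
}
```

On an in-order core with a long FP pipeline the four accumulators can pay for themselves. On a core where the win depends on the loop body staying within a small buffer, the larger body and the extra loop can easily cost more than they gain.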

This LGTM.

> cheers,

> --renato

REPOSITORY
  rL LLVM

http://reviews.llvm.org/D9515
