[llvm-dev] enabling interleaved access loop vectorization

Fri Aug 5 16:18:47 PDT 2016

As Ashutosh wrote, the BasicTTI cost model evaluates this as the cost of
using extracts and inserts.
So even if we end up generating inserts and extracts (and I believe we
actually manage to get the right shuffles, more or less, courtesy of
InstCombine and the shuffle lowering code), we should be seeing
improvements with the current cost model.
I agree that we can get *more* improvement with better cost modeling, but
I'd expect to be able to get *some* improvement the way things are right
now.
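
To make the "cost of using extracts and inserts" concrete, here is a rough
sketch of that style of estimate for an interleaved load. It is a toy model
under stated assumptions, not the BasicTTI code itself: the struct UnitCosts,
the function interleavedLoadCost, and the unit cost values are all
hypothetical names and numbers chosen for illustration.

```cpp
#include <cassert>

// Hypothetical per-operation costs. A real target would supply its own
// numbers; these defaults are placeholders for illustration only.
struct UnitCosts {
  unsigned WideMemOp = 1; // one wide load covering the whole interleave group
  unsigned Extract = 1;   // one extractelement from the wide vector
  unsigned Insert = 1;    // one insertelement into a per-member sub-vector
};

// Toy estimate for an interleaved load.
//   NumElts        - element count of the wide vector
//   Factor         - interleave factor (e.g. 2 for an even/odd access pattern)
//   NumUsedMembers - how many members of the group the loop actually reads
unsigned interleavedLoadCost(unsigned NumElts, unsigned Factor,
                             unsigned NumUsedMembers,
                             const UnitCosts &C = UnitCosts()) {
  assert(Factor != 0 && NumElts % Factor == 0 &&
         "wide vector must split evenly into members");
  assert(NumUsedMembers <= Factor && "cannot use more members than exist");

  // Each member's sub-vector holds NumElts / Factor elements.
  unsigned NumSubElts = NumElts / Factor;

  // Start with the single wide load.
  unsigned Cost = C.WideMemOp;

  // Every used member is rebuilt element by element: one extract from the
  // wide vector plus one insert into the sub-vector, per element.
  Cost += NumUsedMembers * NumSubElts * (C.Extract + C.Insert);
  return Cost;
}
```

With these placeholder unit costs, an 8-element wide load with factor 2 and
both members used comes out to 1 + 2 * 4 * (1 + 1) = 17. The point is only
that the estimate grows with the number of extract/insert pairs, so it stays
pessimistic even when the backend ends up emitting shuffles instead.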

That's why I'm curious about where we saw regressions - I'm wondering
whether there's really a significant cost modeling issue I'm missing, or
it's something that's easy to fix so that we can make forward progress,
while Ashutosh is working on the longer-term solution.

On Fri, Aug 5, 2016 at 2:03 PM, Renato Golin <renato.golin at linaro.org>
wrote:

> On 5 August 2016 at 21:00, Demikhovsky, Elena
> <elena.demikhovsky at intel.com> wrote:
> > As far as I remember (maybe I’m wrong), the vectorizer does not generate
> > shuffles for interleaved access. It generates a bunch of extracts and
> > inserts that ought to be coupled into shuffles afterwards.
>
> That's my understanding as well.
>
> Whatever strategy we take, it will be a mix of telling the cost model
> to avoid some pathological cases as well as improving the detection of
> the patterns in the x86 back-end.
>
> The work to benchmark this properly looks harder than enabling the
> right flags and patterns. :)
>
> cheers,
> --renato
>