[llvm-dev] enabling interleaved access loop vectorization
Michael Kuperstein via llvm-dev
llvm-dev at lists.llvm.org
Fri Aug 5 16:18:47 PDT 2016
As Ashutosh wrote, the BasicTTI cost model evaluates this as the cost of
using extracts and inserts.
So even if we end up generating inserts and extracts (and I believe we
actually manage to get the right shuffles, more or less, courtesy of
InstCombine and the shuffle lowering code), we should be seeing
improvements with the current cost model.
I agree that we can get *more* improvement with better cost modeling, but
I'd expect to be able to get *some* improvement the way things are right now.
That's why I'm curious about where we saw regressions - I'm wondering
whether there's really a significant cost modeling issue I'm missing, or
whether it's something that's easy to fix so that we can make forward progress,
while Ashutosh is working on the longer-term solution.
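For readers following along, a typical interleaved-access loop of the kind under discussion looks like the following. This is a hypothetical C sketch, not code from the thread; the function and parameter names are illustrative. A stride-2 load pattern like this is what the vectorizer must lower either to wide loads plus shuffles or to scalar extracts and inserts:

```c
/* Hypothetical stride-2 (interleaved) access pattern: the loads of
 * a[2*i] and a[2*i+1] interleave in memory, so vectorizing the loop
 * requires either wide loads followed by shuffles, or per-lane
 * extracts/inserts, which is what the BasicTTI cost model prices. */
void deinterleave(const int *a, int *even, int *odd, int n) {
  for (int i = 0; i < n; i++) {
    even[i] = a[2 * i];
    odd[i]  = a[2 * i + 1];
  }
}
```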
On Fri, Aug 5, 2016 at 2:03 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 5 August 2016 at 21:00, Demikhovsky, Elena
> <elena.demikhovsky at intel.com> wrote:
> > As far as I remember (I may be wrong), the vectorizer does not generate
> > shuffles for interleaved access. It generates a bunch of extracts and
> > inserts that ought to be combined into shuffles afterwards.
> That's my understanding as well.
> Whatever strategy we take, it will be a mix of telling the cost model
> to avoid some pathological cases as well as improving the detection of
> the patterns in the x86 back-end.
> The work to benchmark this properly looks harder than enabling the
> right flags and patterns. :)