<div dir="ltr">As Ashutosh wrote, the BasicTTI cost model evaluates this as the cost of using extracts and inserts.<div>So even if we end up generating inserts and extracts (and I believe we actually manage to get the right shuffles, more or less, courtesy of InstCombine and the shuffle lowering code), we should be seeing improvements with the current cost model.</div><div>I agree that we can get *more* improvement with better cost modeling, but I'd expect to be able to get *some* improvement the way things are right now.<div><br></div><div>That's why I'm curious about where we saw regressions - I'm wondering whether there's really a significant cost modeling issue I'm missing, or whether it's something that's easy to fix so that we can make forward progress while Ashutosh is working on the longer-term solution.<br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 5, 2016 at 2:03 PM, Renato Golin <span dir="ltr">&lt;<a href="mailto:renato.golin@linaro.org" target="_blank">renato.golin@linaro.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 5 August 2016 at 21:00, Demikhovsky, Elena<br>

<span>&lt;<a href="mailto:elena.demikhovsky@intel.com" target="_blank">elena.demikhovsky@intel.com</a>&gt; wrote:<br>

> As far as I remember (maybe I’m wrong), the vectorizer does not generate<br>

> shuffles for interleaved accesses. It generates a bunch of extracts and inserts<br>

> that ought to be coupled into shuffles afterwards.<br>

<br>

</span>That's my understanding as well.<br>

<br>

Whatever strategy we take, it will be a mix of telling the cost model<br>

to avoid some pathological cases as well as improving the detection of<br>

the patterns in the x86 back-end.<br>

<br>

The work to benchmark this properly looks harder than enabling the<br>

right flags and patterns. :)<br>

<br>

cheers,<br>

--renato<br>

</blockquote></div><br></div></div></div></div>