<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 11, 2015 at 8:54 AM, Simon Pilgrim <span dir="ltr">&lt;<a href="mailto:llvm-dev@redking.me.uk" target="_blank">llvm-dev@redking.me.uk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":742" class="a3s" style="overflow:hidden">Hi chandlerc, qcolombet, andreadb, spatel,<br>

<br>

The existing unpck instruction lowering was based on matching explicit shuffle patterns, and missed many alternative shuffle masks (notably commuted masks and duplicate inputs).</div></blockquote></div><br>So, the commuted masks really shouldn't happen because we should be canonicalizing the operand order so that we only need to check one ordering. We may be missing some tie-break cases in the canonicalization, but I'd much rather attack that than test ever more permutations for unpck patterns.</div><div class="gmail_extra"><br></div><div class="gmail_extra">For duplicate inputs, we should canonicalize those to single-input shuffles because we usually have better shuffles than unpck for those. Where we don't, I'm fine adding the patterns for the unpck variants directly because it should be very obvious that there is a cost tradeoff here. Using an unpck to lower a single-input shuffle precludes folding a load with the shuffle.</div></div>
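<div class="gmail_extra"><br></div><div class="gmail_extra">To make the canonicalization argument concrete, here is a rough, standalone sketch. It is not the code in the X86 backend: the Shuffle struct and the commute, canonicalize, and isUnpackLo helpers are hypothetical names invented for illustration, and the tie-break rule (prefer the ordering whose first defined element reads from the first input) is only an assumption about what a tie-break could look like. The point is that once operands are put in a canonical order and duplicate inputs are folded to a single input, the unpck check only has to test one mask shape.</div>

```cpp
#include <utility>
#include <vector>

// Hypothetical model of a two-input shuffle with N = mask.size() elements:
// indices in [0, N) read from input A, indices in [N, 2N) read from input B,
// and -1 marks an undefined element.
struct Shuffle {
  int a;                 // identifier of the first input value
  int b;                 // identifier of the second input value
  std::vector<int> mask;
};

// Swap the two inputs and rewrite the mask so the shuffle result is unchanged.
Shuffle commute(Shuffle s) {
  const int n = static_cast<int>(s.mask.size());
  std::swap(s.a, s.b);
  for (int &m : s.mask)
    if (m >= 0)
      m = (m < n) ? m + n : m - n;
  return s;
}

// Simplified canonicalization (an illustration, not LLVM's actual rules).
Shuffle canonicalize(Shuffle s) {
  const int n = static_cast<int>(s.mask.size());

  // Duplicate inputs: fold every reference to B back onto A, which turns
  // the shuffle into a single-input shuffle.
  if (s.a == s.b) {
    for (int &m : s.mask)
      if (m >= n)
        m -= n;
    return s;
  }

  // Prefer the operand order in which most elements come from A.
  int fromA = 0, fromB = 0;
  for (int m : s.mask) {
    if (m < 0)
      continue;
    if (m < n)
      ++fromA;
    else
      ++fromB;
  }
  if (fromB > fromA)
    return commute(s);

  // Assumed tie-break: the first defined element should read from A.
  if (fromB == fromA) {
    for (int m : s.mask) {
      if (m < 0)
        continue;
      if (m >= n)
        return commute(s);
      break;
    }
  }
  return s;
}

// Match only the canonical unpack-low interleave: 0, N, 1, N+1, ...
// Undefined elements match anything.
bool isUnpackLo(const Shuffle &s) {
  const int n = static_cast<int>(s.mask.size());
  for (int i = 0; i < n; ++i) {
    const int m = s.mask[i];
    if (m < 0)
      continue;
    const int expected = i / 2 + ((i % 2) ? n : 0);
    if (m != expected)
      return false;
  }
  return true;
}
```

<div class="gmail_extra">With this sketch, the commuted mask {4, 0, 5, 1} on two distinct inputs is rewritten to {0, 4, 1, 5} with the inputs swapped, so the single isUnpackLo check matches it. The same mask on duplicate inputs becomes the single-input mask {0, 0, 1, 1}, which is deliberately not matched here and would go down the single-input path instead.</div>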