<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Mon, May 23, 2016 at 12:43 PM Hal Finkel via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Chandler,<br>

<br>

Regardless of the canonical form we choose, we need code to match non-canonical associated shuffle sequences and convert them into the canonical form. We also need code to match the pattern where we extractelement on all elements and sum them into this canonical form. This code needs to exist somewhere, so we need to decide whether it exists in the frontend or the backend.<br></blockquote><div><br></div><div>Agreed. However, we also need to choose where it lives within the "backend" or LLVM more generally.</div><div><br></div><div>I think putting it late will give us less powerful matching than having it fairly early and running it often in instcombine.</div><div><br></div><div>Consider -- how should the inliner or the loop unroller evaluate code sequences containing 16 vector shuffles that all amount to expanding a horizontal operation?</div><div><br></div><div>That's why I think we should model these patterns as first-class citizens in the IR. Then backends can lower them however necessary.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Having an intrinsic is obviously smaller, in terms of IR memory overhead, than these instructions. However, I'm not sure how many passes we'll need to teach about the new intrinsic.</blockquote><div><br></div><div>An intrinsic already moves this into the IR instead of the code generator, which I like.</div><div><br></div><div>I think the important distinction is between a *target specific* intrinsic and a generic one that we expect every target to support. The latter, I think, will be much more useful.</div><div><br></div><div>Then we can debate the relative merits of using a generic intrinsic versus an instruction, which I think is a fairly mundane question. I suspect this is a place where we should use instructions, but that's a much more mechanical discussion IMO.</div><div><br></div><div><br></div><div>(And in case it isn't clear, I'm not arguing we should avoid doing the vector reduction matching and other patterns at all. I'm just trying to start the discussion about the larger set of issues here.)</div><div><br></div><div>-Chandler</div></div></div>