<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: arial,helvetica,sans-serif; font-size: 10pt; color: #000000'><br><hr id="zwchr"><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Chandler Carruth" &lt;chandlerc@gmail.com&gt;<br><b>To: </b>"Hal Finkel" &lt;hfinkel@anl.gov&gt;, "Chandler Carruth" &lt;chandlerc@gmail.com&gt;<br><b>Cc: </b>"llvm-dev" &lt;llvm-dev@lists.llvm.org&gt;<br><b>Sent: </b>Monday, May 23, 2016 2:48:13 PM<br><b>Subject: </b>Re: [llvm-dev] sum elements in the vector<br><br><div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Mon, May 23, 2016 at 12:43 PM Hal Finkel via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Hi Chandler,<br>
<br>
Regardless of the canonical form we choose, we need code to match non-canonical associated shuffle sequences and convert them into the canonical form. We also need code to match the pattern where we extractelement on all elements and sum them into this canonical form. This code needs to exist somewhere, so we need to decide whether it exists in the frontend or the backend.<br></blockquote><div><br></div><div>Agreed. However, we also need to choose where it lives within the "backend" or LLVM more generally.</div><div><br></div><div>I think putting it late will end up with less powerful matching than having it fairly early and running often in instcombine.</div><div><br></div><div id="DWT8483">Consider -- how should the inliner or the loop unroller evaluate code sequences containing 16 vector shuffles that all amount to expanding a horizontal operation?</div></div></div></blockquote>I agree this is an issue. This is why I said that if we choose a composite canonical form, we'll end up matching the pattern multiple times. Regarding this kind of cost modeling, however, we already have this problem, it's sometimes quite bad, and it has nothing to do with reductions. Targets, in general, need to do a better job of providing "user costs" for composite sequences that will end up being cheap. When I last looked at this in the context of loop unrolling, addressing modes were a common issue here as well.<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div class="gmail_quote"><div></div><div><br></div><div>That's why I think we should model these patterns as first-class citizens in the IR. 
Then backends can lower them however is necessary.</div><div> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<br>
Having an intrinsic is obviously smaller, in terms of IR memory overhead, than these instructions. However, I'm not sure how many passes we'll need to teach about the new intrinsic.</blockquote><div><br></div><div>An intrinsic already moves this into the IR instead of the code generator, which I like.</div><div><br></div><div>I think the important distinction is between a *target specific* intrinsic and a generic one that we expect every target to support. The latter I think will be much more useful.</div><div><br></div><div>Then we can debate the relative merits of using a generic intrinsic versus an instruction which I think are fairly mundane. I suspect this is a place where we should use instructions, but that's a much more mechanical discussion IMO.</div><div id="DWT8484"><br></div></div></div></blockquote>Sure. If we're going to go with a dedicated representation, I'm fine with either, and I agree an instruction here might be mechanically easier.<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div class="gmail_quote"><div></div><div><br></div><div id="DWT8485">(And in case it isn't clear, I'm not arguing we should avoid doing the vector reduction matching and other patterns at all. I'm just trying to start the discussion about the larger set of issues here.)</div></div></div></blockquote>Understood.<br><br>In some mechanical sense, however, we're discussing:<br><br>if (isVectorReduction(I) == Instruction::Add) vs. 
if (auto *VRI = dyn_cast&lt;VectorReductionInst&gt;(I)) if (VRI-&gt;getReductionOp() == Instruction::Add) -- bikeshedding aside -- and so it might be better to start with the utility-function implementation, which is essentially what we should have now given the current implementation, and then see if we want to do more based on compile-time impacts or other issues.<br><br>Thanks again,<br>Hal<br><blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0); font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><div dir="ltr"><div class="gmail_quote"><div></div><div><br></div><div>-Chandler</div></div></div>
</blockquote><br><br><br>-- <br><div><span name="x"></span>Hal Finkel<br>Assistant Computational Scientist<br>Leadership Computing Facility<br>Argonne National Laboratory<span name="x"></span><br></div></div></body></html>