[llvm-dev] sum elements in the vector
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Mon May 23 13:06:48 PDT 2016
----- Original Message -----
> From: "Chandler Carruth" <chandlerc at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>, "Chandler Carruth"
> <chandlerc at gmail.com>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Monday, May 23, 2016 2:48:13 PM
> Subject: Re: [llvm-dev] sum elements in the vector
> On Mon, May 23, 2016 at 12:43 PM Hal Finkel via llvm-dev <
> llvm-dev at lists.llvm.org > wrote:
> > Hi Chandler,
> > Regardless of the canonical form we choose, we need code to match
> > non-canonical associated shuffle sequences and convert them into
> > the
> > canonical form. We also need code to match the pattern where we
> > extractelement on all elements and sum them into this canonical
> > form. This code needs to exist somewhere, so we need to decide
> > whether it exists in the frontend or the backend.
> Agreed. However, we also need to choose where it lives within the
> "backend" or LLVM more generally.
> I think putting it late will end up with less powerful matching than
> having it fairly early and run often in instcombine.
> Consider -- how should the inliner or the loop unroller evaluate code
> sequences containing 16 vector shuffles that all amount to expanding
> a horizontal operation?
I agree this is an issue. This is why I said that, if we choose a composite canonical form, we'll end up matching the pattern multiple times. Regarding this kind of cost modeling, however: we already have this problem, it's sometimes quite bad, and it has nothing to do with reductions. Targets, in general, need to do a better job of providing "user costs" for composite sequences that will end up being cheap. The last time I looked at this in the context of loop unrolling, addressing modes were a common issue here as well.
> That's why I think we should model these patterns as first-class
> citizens in the IR. Then backends can lower them however is
> > Having an intrinsic is obviously smaller, in terms of IR memory
> > overhead, than these instructions. However, I'm not sure how many
> > passes we'll need to teach about the new intrinsic.
> An intrinsic already moves this into the IR instead of the code
> generator, which I like.
> I think the important distinction is between a *target specific*
> intrinsic and a generic one that we expect every target to support.
> The latter I think will be much more useful.
> Then we can debate the relative merits of using a generic intrinsic
> versus an instruction which I think are fairly mundane. I suspect
> this is a place where we should use instructions, but that's a much
> more mechanical discussion IMO.
Sure. If we're going to go with a dedicated representation, I'm fine with either, and I agree an instruction here might be mechanically easier.
> (And in case it isn't clear, I'm not arguing we should avoid doing
> the vector reduction matching and other patterns at all. I'm just
> trying to start the discussion about the larger set of issues here.)
In some mechanical sense, however, we're discussing:

  if (isVectorReduction(I) == Instruction::Add)

vs.

  if (auto *VRI = dyn_cast<VectorReductionInst>(I))
    if (VRI->getReductionOp() == Instruction::Add)

-- bikeshedding aside -- and so it might be better to start with the utility-function implementation, which is essentially what we should have now given the current implementation, and then see if we want to do more based on compile-time impacts or other issues.
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory