[PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and corresponding SDNodes for horizontal sum operation.

Wed Nov 25 02:29:49 PST 2015

Cong,

Could you please comment on what you think we should do here? It seems like we've now had several threads on this topic, and I'm not sure what the consensus is right now.

 -Hal

----- Original Message -----
> From: "Hal Finkel via llvm-commits" <llvm-commits at lists.llvm.org>
> To: "Asghar-ahmad Shahid" <Asghar-ahmad.Shahid at amd.com>
> Cc: llvm-commits at lists.llvm.org, reviews+D10964+public+8430e6553a631d12 at reviews.llvm.org
> Sent: Tuesday, October 27, 2015 11:59:53 PM
> Subject: Re: [PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and corresponding SDNodes for horizontal sum operation.
> 
> ----- Original Message -----
> > From: "Asghar-ahmad via llvm-commits Shahid"
> > <llvm-commits at lists.llvm.org>
> > To: reviews+D10964+public+8430e6553a631d12 at reviews.llvm.org,
> > "renato golin" <renato.golin at linaro.org>, "james molloy"
> > <james.molloy at arm.com>
> > Cc: llvm-commits at lists.llvm.org
> > Sent: Sunday, September 27, 2015 11:42:46 PM
> > Subject: RE: [PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and
> > corresponding SDNodes for horizontal sum operation.
> > 
> > Hi Hal,
> > 
> > Response inlined.
> > 
> > Regards,
> > Shahid
> > 
> > > -----Original Message-----
> > > From: hfinkel at anl.gov [mailto:hfinkel at anl.gov]
> > > Sent: Saturday, September 26, 2015 2:58 AM
> > > To: Shahid, Asghar-ahmad; renato.golin at linaro.org;
> > > hfinkel at anl.gov;
> > > james.molloy at arm.com
> > > Cc: llvm-dev at redking.me.uk; bruno.cardoso at gmail.com; llvm-
> > > commits at lists.llvm.org
> > > Subject: Re: [PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and
> > > corresponding SDNodes for horizontal sum operation.
> > > 
> > > hfinkel added a comment.
> > > 
> > > I need to take a step back here; why are we doing this again?
> > > 
> > > I read again the RFC thread
> > > (http://lists.llvm.org/pipermail/llvm-dev/2015-
> > > May/085078.html), and it ended with the following (from Renato):
> > > 
> > > > > BTW, now my plan is to just add the two intrinsics for
> > > > > 'absolute
> > > difference'
> > > 
> > > >
> > > 
> > > > >  and 'horizontal add'.
> > > 
> > > >
> > > 
> > > >
> > > 
> > > > That's ok, as long as they're impossible to represent in plain
> > > > IR.
> > > 
> > > 
> > > and I think that we were all in agreement on this point. But now
> > > I'm not sure
> > > you've demonstrated the prerequisite.
> > Earlier to this RFC there was a discussion around a patch
> > (http://reviews.llvm.org/D9029)
> > regarding llvm.sad intrinsic which highlights the need to have
> > llvm.hsum and friend
> > intrinsics. This RFC was a result of that discussion.
> > 
> > > The underlying operations here (and
> > > in http://reviews.llvm.org/D10867) seem like they are
> > > representable
> > > using IR
> > > (as demonstrated by the fact that you provide
> > > potentially-equivalent IR
> > > sequences in the documentation), except for the ordering freedom
> > > here.
> > 
> > Of course the logical expansion of the underlying operation is
> > trivially representable
> > but not the pattern matching in codegen. Actually pattern matching
> > is
> > complex and
> > may well be fragile in codegen due to the long chain of generic IR
> > to
> > be matched.
> > Moreover movements of truncate/extends ops in DAG increase the
> > complexity of
> > matching further.
> 
> It might be fragile, but I suggest trying so that the issues are
> clear. I don't see how you'd end up with a lot of truncate/extends
> in the middle of a horizontal add that we'd otherwise be able to
> match. It also makes the compiler more robust because we'll be able
> to generate good code from hand-written patterns, not just from
> auto-vectorized code.
> 
> > 
> > > 
> > > And this, I fear, is where we run into trouble. The thing that is
> > > not
> > > representable in the IR is that the order of operations in the
> > > horizontal sum is
> > > undefined, and at the same time, signed overflow is undefined.
> > > This
> > > cannot
> > > be represented in the IR because nsw adds don't reassociate, and
> > > thus,
> > > there's no way to represent the sequence of nsw adds such that
> > > they
> > > can be
> > > reassociated while retaining their nsw property. But the problem
> > > is
> > > that,
> > > because this freedom cannot be represented in the IR, we also
> > > can't
> > > generate it from IR in a semantics-preserving way; and, thus, it
> > > would not be
> > > legal to generate it in the vectorizers.
> > 
> > As proposed, we are not going to generate it always in vectorizer
> > but
> > only based
> > on target checks, can't we also check for ordering relaxation
> > there?
> > Or fix the ordering
> > constraint by some other means.
> 
> No; it is a semantics problem. The vectorizer would need to prove
> something it seems unlikely to be able to prove.
> 
>  -Hal
> 
> > 
> > > 
> > > Thus, this change does not seem right, and approving
> > > http://reviews.llvm.org/D10867 seems like a mistake as well. We
> > > could
> > > certainly fix the definition here to make it exactly
> > > representable
> > > in the IR, but
> > > then what's the point of doing so?
> > > 
> > > In the RFC, you mentioned cost modeling as a major motivation for
> > > adding
> > > intrinsics, but that seems like an unrelated problem (at least in
> > > part).
> > 
> > In RFC, I had also proposed llvm.sad intrinsic which I thought
> > would
> > make cost
> > modelling easier for those targets which support sad operation
> > natively. Hence
> > The motivation. But Renato's explanation clarified the things
> > related
> > to cost
> > modeling. However the complexity of pattern matching was always a
> > bigger
> > issue.
> > 
> > > During
> > > vectorization, we can use special interfaces to estimate the cost
> > > of complex
> > > patterns. In fact, we already have an interface for reductions:
> > > TTI.getReductionCost. There is a second relevant code model: That
> > > used by
> > > the unroller and inliner. Vectorization happens after inlining,
> > > so
> > > that
> > > interaction is not really relevant, but partial unrolling happens
> > > after
> > > vectorization, and so the cost model there might want to
> > > understand
> > > that a
> > > complex sequence of shuffles, extracts and adds has a
> > > disproportionately-
> > > low cost. The same is true of the inliner if the input IR uses
> > > vector types and
> > > initially contains these operations, but even in that case, you'd
> > > not want to
> > > canonicalize on the intrinsics too early in case other
> > > optimizations remove the
> > > need for most of the operations. Thus, in the end, you need
> > > pattern-
> > > matching code near the end of the pipeline anyway to account for
> > > input IR
> > > directly containing relevant operations on vector types.
> > > 
> > > In short, I don't understand why we're going in this direction.
> > > You
> > > can match
> > > these operations in the backend, and you can match them in the
> > > cost-model
> > > code. If we can't do the latter, then we should fix that. And we
> > > already have
> > > special interfaces for vectorization costs for complex operations
> > > such as this.
> > > 
> > > 
> > > http://reviews.llvm.org/D10964
> > > 
> > > 
> > 
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory