[PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and corresponding SDNodes for horizontal sum operation.

Tue Oct 27 21:59:53 PDT 2015

----- Original Message -----
> From: "Asghar-ahmad via llvm-commits Shahid" <llvm-commits at lists.llvm.org>
> To: reviews+D10964+public+8430e6553a631d12 at reviews.llvm.org, "renato golin" <renato.golin at linaro.org>, "james molloy"
> <james.molloy at arm.com>
> Cc: llvm-commits at lists.llvm.org
> Sent: Sunday, September 27, 2015 11:42:46 PM
> Subject: RE: [PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and corresponding SDNodes for horizontal sum operation.
> 
> Hi Hal,
> 
> Response inlined.
> 
> Regards,
> Shahid
> 
> > -----Original Message-----
> > From: hfinkel at anl.gov [mailto:hfinkel at anl.gov]
> > Sent: Saturday, September 26, 2015 2:58 AM
> > To: Shahid, Asghar-ahmad; renato.golin at linaro.org; hfinkel at anl.gov;
> > james.molloy at arm.com
> > Cc: llvm-dev at redking.me.uk; bruno.cardoso at gmail.com; llvm-
> > commits at lists.llvm.org
> > Subject: Re: [PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and
> > corresponding SDNodes for horizontal sum operation.
> > 
> > hfinkel added a comment.
> > 
> > I need to take a step back here; why are we doing this again?
> > 
> > I read again the RFC thread
> > (http://lists.llvm.org/pipermail/llvm-dev/2015-
> > May/085078.html), and it ended with the following (from Renato):
> > 
> > > > BTW, now my plan is to just add the two intrinsics for
> > > > 'absolute
> > difference'
> > 
> > >
> > 
> > > >  and 'horizontal add'.
> > 
> > >
> > 
> > >
> > 
> > > That's ok, as long as they're impossible to represent in plain
> > > IR.
> > 
> > 
> > and I think that we were all in agreement on this point. But now
> > I'm not sure
> > you've demonstrated the prerequisite.
> Earlier to this RFC there was a discussion around a patch
> (http://reviews.llvm.org/D9029)
> regarding llvm.sad intrinsic which highlights the need to have
> llvm.hsum and friend
> intrinsics. This RFC was a result of that discussion.
> 
> > The underlying operations here (and
> > in http://reviews.llvm.org/D10867) seem like they are representable
> > using IR
> > (as demonstrated by the fact that you provide
> > potentially-equivalent IR
> > sequences in the documentation), except for the ordering freedom
> > here.
> 
> Of course the logical expansion of the underlying operation is
> trivially representable
> but not the pattern matching in codegen. Actually pattern matching is
> complex and
> may well be fragile in codegen due to the long chain of generic IR to
> be matched.
> Moreover movements of truncate/extends ops in DAG increase the
> complexity of
> matching further.

It might be fragile, but I suggest trying so that the issues are clear. I don't see how you'd end up with a lot of truncate/extends in the middle of a horizontal add that we'd otherwise be able to match. It also makes the compiler more robust because we'll be able to generate good code from hand-written patterns, not just from auto-vectorized code.

> 
> > 
> > And this, I fear, is where we run into trouble. The thing that is
> > not
> > representable in the IR is that the order of operations in the
> > horizontal sum is
> > undefined, and at the same time, signed overflow is undefined. This
> > cannot
> > be represented in the IR because nsw adds don't reassociate, and
> > thus,
> > there's no way to represent the sequence of nsw adds such that they
> > can be
> > reassociated while retaining their nsw property. But the problem is
> > that,
> > because this freedom cannot be represented in the IR, we also can't
> > generate it from IR in a semantics-preserving way; and, thus, it
> > would not be
> > legal to generate it in the vectorizers.
> 
> As proposed, we are not going to generate it always in vectorizer but
> only based
> on target checks, can't we also check for ordering relaxation there?
> Or fix the ordering
> constraint by some other means.

No; it is a semantics problem. The vectorizer would need to prove something it seems unlikely to be able to prove.

 -Hal

> 
> > 
> > Thus, this change does not seem right, and approving
> > http://reviews.llvm.org/D10867 seems like a mistake as well. We
> > could
> > certainly fix the definition here to make it exactly representable
> > in the IR, but
> > then what's the point of doing so?
> > 
> > In the RFC, you mentioned cost modeling as a major motivation for
> > adding
> > intrinsics, but that seems like an unrelated problem (at least in
> > part).
> 
> In RFC, I had also proposed llvm.sad intrinsic which I thought would
> make cost
> modelling easier for those targets which support sad operation
> natively. Hence
> The motivation. But Renato's explanation clarified the things related
> to cost
> modeling. However the complexity of pattern matching was always a
> bigger
> issue.
> 
> > During
> > vectorization, we can use special interfaces to estimate the cost
> > of complex
> > patterns. In fact, we already have an interface for reductions:
> > TTI.getReductionCost. There is a second relevant code model: That
> > used by
> > the unroller and inliner. Vectorization happens after inlining, so
> > that
> > interaction is not really relevant, but partial unrolling happens
> > after
> > vectorization, and so the cost model there might want to understand
> > that a
> > complex sequence of shuffles, extracts and adds has a
> > disproportionately-
> > low cost. The same is true of the inliner if the input IR uses
> > vector types and
> > initially contains these operations, but even in that case, you'd
> > not want to
> > canonicalize on the intrinsics too early in case other
> > optimizations remove the
> > need for most of the operations. Thus, in the end, you need
> > pattern-
> > matching code near the end of the pipeline anyway to account for
> > input IR
> > directly containing relevant operations on vector types.
> > 
> > In short, I don't understand why we're going in this direction. You
> > can match
> > these operations in the backend, and you can match them in the
> > cost-model
> > code. If we can't do the latter, then we should fix that. And we
> > already have
> > special interfaces for vectorization costs for complex operations
> > such as this.
> > 
> > 
> > http://reviews.llvm.org/D10964
> > 
> > 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory