[PATCH] D10964: [Codegen] Add intrinsics 'hsum*' and corresponding SDNodes for horizontal sum operation.

Fri Sep 25 14:27:57 PDT 2015

hfinkel added a comment.

I need to take a step back here; why are we doing this again?

I read again the RFC thread (http://lists.llvm.org/pipermail/llvm-dev/2015-May/085078.html), and it ended with the following (from Renato):

> > BTW, now my plan is to just add the two intrinsics for 'absolute difference'

> 

> >  and 'horizontal add'.

> 

> 

> That's ok, as long as they're impossible to represent in plain IR.

and I think that we were all in agreement on this point. But now I'm not sure you've demonstrated the prerequisite. The underlying operations here (and in http://reviews.llvm.org/D10867) seem like they are representable using IR (as demonstrated by the fact that you provide potentially-equivalent IR sequences in the documentation), except for the ordering freedom here.

And this, I fear, is where we run into trouble. The thing that is not representable in the IR is that the order of operations in the horizontal sum is undefined, and at the same time, signed overflow is undefined. This cannot be represented in the IR because nsw adds don't reassociate, and thus, there's no way to represent the sequence of nsw adds such that they can be reassociated while retaining their nsw property. But the problem is that, because this freedom cannot be represented in the IR, we also can't generate it from IR in a semantics-preserving way; and, thus, it would not be legal to generate it in the vectorizers.

Thus, this change does not seem right, and approving http://reviews.llvm.org/D10867 seems like a mistake as well. We could certainly fix the definition here to make it exactly representable in the IR, but then what's the point of doing so?

In the RFC, you mentioned cost modeling as a major motivation for adding intrinsics, but that seems like an unrelated problem (at least in part). During vectorization, we can use special interfaces to estimate the cost of complex patterns. In fact, we already have an interface for reductions: TTI.getReductionCost. There is a second relevant code model: That used by the unroller and inliner. Vectorization happens after inlining, so that interaction is not really relevant, but partial unrolling happens after vectorization, and so the cost model there might want to understand that a complex sequence of shuffles, extracts and adds has a disproportionately-low cost. The same is true of the inliner if the input IR uses vector types and initially contains these operations, but even in that case, you'd not want to canonicalize on the intrinsics too early in case other optimizations remove the need for most of the operations. Thus, in the end, you need pattern-matching code near the end of the pipeline anyway to account for input IR directly containing relevant operations on vector types.

In short, I don't understand why we're going in this direction. You can match these operations in the backend, and you can match them in the cost-model code. If we can't do the latter, then we should fix that. And we already have special interfaces for vectorization costs for complex operations such as this.

http://reviews.llvm.org/D10964