[llvm-dev] RFC: Generic IR reductions

Wed Feb 1 02:54:05 PST 2017

On 1 February 2017 at 10:30, Demikhovsky, Elena
<elena.demikhovsky at intel.com> wrote:
>> If you mean "patterns may not be matched, and reduction instructions will not be generated, making the code worse", then this is just a matter of making the patterns obvious and the back-ends robust enough to cope with it, no?
> The Back-end should be as robust as possible, I agree. The problem that I see is in adding another kind of complexity to the optimizer that works between the Vectorizer and the Back-end. It should be able to recognize all "obvious" patterns in order to preserve them.

Right!

Also, I may have been my own worst enemy again and muddled the
question. Let me try again... :)

I'm not against a reduction intrinsic. I'm against one reduction
intrinsic for {every kind} x {ordered, unordered}. At least until
further evidence comes to light.

My proposal was to have a single reduction intrinsic that infers the
kind of reduction from its predecessors, i.e. from the instructions
that define its operand.

For example:

  @llvm.reduce(ext <N x double> ( add <N x float> %a, %b))

would generate a widening unordered reduction (fast-math).
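
As a rough illustration of that inference (a toy model in Python, not
LLVM code; Vec, Add, Ext and classify_reduce are hypothetical names
invented for this sketch), the reduction kind and the widening can both
be read off the expression that feeds the intrinsic. It assumes the
operation is taken from the instruction defining the operand, and that
an ext wrapped around it marks the reduction as widening:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Vec:
    """A vector value such as %a, tagged with its element type."""
    name: str
    elem: str


@dataclass(frozen=True)
class Add:
    """Element-wise add of two vectors."""
    lhs: "Expr"
    rhs: "Expr"


@dataclass(frozen=True)
class Ext:
    """Element-wise extension of the inner value to a wider element type."""
    inner: "Expr"
    to_elem: str


Expr = Union[Vec, Add, Ext]


def elem_type(e):
    """Element type produced by an expression node."""
    if isinstance(e, Vec):
        return e.elem
    if isinstance(e, Ext):
        return e.to_elem
    return elem_type(e.lhs)  # Add: same element type as its operands


def classify_reduce(operand):
    """Infer the reduction kind from the operand's defining expression.

    Hypothetical rule: an outer Ext means 'widening'; the node under it
    names the reduction operation (only Add is modelled here).
    """
    widening = isinstance(operand, Ext)
    core = operand.inner if widening else operand
    if not isinstance(core, Add):
        raise ValueError("no recognisable reduction pattern")
    return {
        "op": "add",
        "from": elem_type(core),
        "to": elem_type(operand),
        "widening": widening,
    }


a = Vec("a", "float")
b = Vec("b", "float")
kind = classify_reduce(Ext(Add(a, b), "double"))
```

Here kind describes an add reduction from float to double with
widening set, which is the case in the example above. Anything the
classifier does not recognise raises an error, which mirrors the
concern about patterns not being matched.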

> Now we look at a Reduction Phi and if the FP mode requires the "Ordered" reduction which is not supported, the whole loop remains scalar.

Right, but this is orthogonal to having separate intrinsics or not.

  %fast = @llvm.reduce(ext <N x double> ( add <N x float> %a, %b))
  %order = @llvm.reduce(ext <N x double> ( add <N x float> %a, %b), double %acc)

If the IR contains %order, then this will *have* to be scalar if the
target doesn't support native ordered or a fast path to it. And this
is up to the cost model.
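
To see why the ordered form cannot simply be replaced by the unordered
one, here is a small Python sketch (ordered_reduce and unordered_reduce
are hypothetical helpers, and Python floats are doubles, so the
float-to-double widening is not modelled). The ordered version folds
the elements into the accumulator strictly left to right; the unordered
version adds adjacent pairs in a tree, which is one of the
reassociations fast-math would allow:

```python
def ordered_reduce(values, acc):
    """Strict left-to-right reduction starting from an accumulator."""
    for v in values:
        acc = acc + v
    return acc


def unordered_reduce(values):
    """Pairwise (tree) reduction; assumes a non-empty input."""
    vals = list(values)
    while len(vals) > 1:
        # Add adjacent pairs; carry an odd trailing element unchanged.
        half = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            half.append(vals[-1])
        vals = half
    return vals[0]


data = [1e16, 1.0, -1e16, 1.0]
in_order = ordered_reduce(data, 0.0)   # 1.0
pairwise = unordered_reduce(data)      # 0.0
```

Both results are valid floating-point sums of the same four values,
but only the first matches what the scalar loop would compute. That is
why an ordered reduction has to stay in order unless fast-math says
otherwise.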

> If we would leave this decision to the Cost Model, it will provide a cost of scalarization.  And at the end we may decide to scalarize reduction operation inside vector loop. Now, once the taken decision is vectorization, inserting an intrinsic or generating a plain IR should be a Target decision.

Hmm, I'm beginning to see your point, I think. I agree this is again a
target decision, but it's also a larger, compiler-wide decision.

The target's decision is: IR doesn't have the required semantics, I
*must* use intrinsics. It can't be: I'd rather have intrinsics because
it's easier to match in the back-end.

The first is a requirement; the second is a personal choice, and one
that can affect how instruction selection is split between generic IR
and target-specific selection.

cheers,
--renato