[PATCH] D30086: Add generic IR vector reductions

Mon Feb 20 03:09:28 PST 2017

aemerson marked 3 inline comments as done.
aemerson added a comment.

In https://reviews.llvm.org/D30086#680635, @mkuper wrote:

> I'm not sure this is the right strategy for deciding when and how to produce an intrinsic vs. shuffle sequence.
>
> There are two related issues:
>
> 1. The normal behavior for non-experimental target-independent intrinsics is that targets *must* support them. It's ok if the "fallback" way to support them is to lower them back to to the "brute-force" shuffle pyramid, and that fallback gets used by all targets that don't have a more direct way to support it. But it still has to be supported somehow. Basically, the way I understand the general philosophy is - it's up to the optimizer to ask the target which constructs are cheap and which are expensive. But if the target is handed an expensive construct, it still has to lower it, it's just that the lowering is allowed to be really bad.
> 2. Will this now be the canonical form for reductions? E.g. should instcombine match the shuffle pyramid sequence into this? If the answer is yes, then the generic lowering for the intrinsic should be as good as lowering for the shuffle pyramid. That doesn't seem too hard, since that's the natural lowering, but that would put a bound on how bad the generic lowering is allowed to be. That would also means we won't have createTargetReduction() - all of that logic would move into lowering.
>
>   I can see some advantages to forming the intrinsic conditionally, but I'm not sure they're worth the cost of not having a canonical representation.

This patch was intended to be a transitional stage, my expectation was that over time targets would decide to move to this representation by opting in and handling them. The reason for the transitional step was that I didn't want to force this form on all targets (even if it lowers to the same code by a lowering step) as it changes the IR during the mid-end stage. At the moment, the only target which requires this intrinsic form without any alternative is SVE. If you think it's ok to change the reduction form for all targets until a late lowering then I'm happy to implement that.

================
Comment at: docs/LangRef.rst:11620
+if the vector width is known and a power-of-two then then a shuffle sequence may
+be used instead.
+
----------------
mehdi_amini wrote:
> Why this specification about implementation detail here?
I was trying to give some explanation why there may be two different representations of reductions depending on the target. I can remove this.

================
Comment at: docs/LangRef.rst:11655
+reduction of a vector, returning the result as a scalar. The return type matches
+the element-type of the vector input.
+
----------------
mehdi_amini wrote:
> mkuper wrote:
> > mehdi_amini wrote:
> > > delena wrote:
> > > > mehdi_amini wrote:
> > > > > I think it'd be important to clarify the ordering: do we require a "fast" FMF to change it? If not, why?
> > > > I think that in the case of FP operations we need two intrinsics - "ordered" and "fast".
> > > Why don't reusing the existing FMF? The intrinsic without FMF would be equivalent to "ordered" and the "fast" FMF flag would allow any ordering.
> > I don't think we can currently attach FMF to CallInsts.
> We do actually: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151214/319516.html
Thanks, I hadn't realised CallInsts supported FMFs, it makes some sense to use for the ordered/unordered distinction but what are your thoughts on the extra scalar accumulator operand for ordered reductions? An undef scalar value could be given for fast reductions but it my view it clutters the intrinsic for the vast majority of reductions, though not a deal breaker.

================
Comment at: docs/LangRef.rst:11687
+
+The '``llvm.vector.reduce.and.*``' intrinsics do a bitwise ``AND`` reduction
+of a vector, returning the result as a scalar. The return type matches the
----------------
mkuper wrote:
> Integer types only, presumably?
Yes.

================
Comment at: docs/LangRef.rst:11767
+
+      declare <scalar-type> @llvm.vector.reduce.umax.<scalar-type>.<vector-type>(<vector-type> %a)
+
----------------
mehdi_amini wrote:
> Why the explicit `<scalar-type>` in the signature?
This is the syntax of the declaration in the IR.

================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:715
+  /// the intrinsics form instead of the shuffle form.
+  bool useReductionIntrinsic(unsigned Opcode, Type *Ty, bool IsMaxOp = false,
+                             bool IsSigned = false, bool NoNaN = false) const;
----------------
delena wrote:
> What does IsMaxOp mean? I think that it is better to split this interface into several functions and do not mix fp and integers together.
It signals to the function whether the reduction is a min or max operation, if the opcode is a CMP. I'll look at splitting this up.

Repository:
  rL LLVM

https://reviews.llvm.org/D30086