[PATCH] D30086: Add generic IR vector reductions

Wed Feb 22 02:48:19 PST 2017

aemerson marked an inline comment as done.
aemerson added a comment.

In https://reviews.llvm.org/D30086#681823, @mkuper wrote:

> In https://reviews.llvm.org/D30086#681447, @aemerson wrote:
>
> > In https://reviews.llvm.org/D30086#680635, @mkuper wrote:
> >
> > > I can see some advantages to forming the intrinsic conditionally, but I'm not sure they're worth the cost of not having a canonical representation.
> >
> >
> > This patch was intended to be a transitional stage, my expectation was that over time targets would decide to move to this representation by opting in and handling them. The reason for the transitional step was that I didn't want to force this form on all targets (even if it lowers to the same code by a lowering step) as it changes the IR during the mid-end stage. At the moment, the only target which requires this intrinsic form without any alternative is SVE. If you think it's ok to change the reduction form for all targets until a late lowering then I'm happy to implement that.
>
>
> What I'd really like to avoid is having a nominally "target-independent" intrinsic, which can, in practice, only be handled by one or two targets.
>
> There are three ways I can see this going:
>  a) If we know this is not the right generic representation, this should not go in as a target-independent intrinsic. I haven't seen anyone raise any "deal-breaker" issues so far, though.
>  b) If we think this is the right generic representation, but we're not sure, we can introduce these intrinsics as llvm.experimental. For an experimental intrinsic, I think it makes more sense to have targets opt-in like this patch does (but we probably still want default lowering for targets that don't opt-in).
>  c) If we are sure about this, then we should do just it for all targets, and have late lowering by default. I'm not really the right person to determine that, though. I guess that's a question for the backend owners.
>
> > ! In https://reviews.llvm.org/D30086#682110, @rengolin wrote:
> > 
> >> ! In https://reviews.llvm.org/D30086#681823, @mkuper wrote:
> > 
> > What I'd really like to avoid is having a nominally "target-independent" intrinsic, which can, in practice, only be handled by one or two targets.
>
> Agreed.
>
> > a) If we know this is not the right generic representation, this should not go in as a target-independent intrinsic. I haven't seen anyone raise any "deal-breaker" issues so far, though.
>
> This seems to tick all the boxes, and have been reviewed for both ARM back-ends (including the future SVE) and x86 (including the future AVX1024).
>
> We should really have people from other targets to make sure this makes sense, but at this point, I think the changes will be minimal, since any architecture specific change can be done at the lowering stage on their own back-ends.
>
> > b) If we think this is the right generic representation, but we're not sure, we can introduce these intrinsics as llvm.experimental. For an experimental intrinsic, I think it makes more sense to have targets opt-in like this patch does (but we probably still want default lowering for targets that don't opt-in).
>
> I think, for now, this is the best thing to do.
>
> The log2 shuffle pattern is canonical and it works well with the "lower how you can" premise already. It may be cumbersome for AVX512 (long pyramid), but it's not unbearable. It will become unbearable for AVX-1024 and impossible for SVE to use the same pattern, so those targets can begin using it on the side, and then we can extend to the remaining targets on their pace.
>
> The alternative is to have an **SVE/AVX specific intrinsic**, which means the same thing as the pyramid, and doesn't need to be handled by other targets. Clang (via intrinsics) would generate them as well as the vectorisers, and that would be completely opaque to any other pass (as external function calls). But this would make it harder to extend support for the other smaller vector sizes (renames, tests, etc) if we want to move it to the canonical form.
>
> I personally think that this (concept) is a good representation of a horizontal reduction, so could easily be a canonical form one day. The particular shape may need to change, but that's orthogonal.
>
> > c) If we are sure about this, then we should do just it for all targets, and have late lowering by default. I'm not really the right person to determine that, though. I guess that's a question for the backend owners.
>
> If we make it experimental, we don't need to push any target to support it ASAP. Furthermore, both SVE and AVX 1024 are, themselves, "experimental", and this change is mostly for their benefit, so I don't see why this **shouldn't** be experimental at this stage. It'll also give us time to change and other targets to adapt it if needed.

Ok, so for now I'll make the changes to address the review comments but I won't be moving to a late lowering solution (which I think is more preferable but if these are experimental then we can't do that).

On the ordered intrinsics issue: what I propose is that we retain separation of the intrinsics for the purposes of cleaner IR for the usual fast reductions, so the scalar argument isn't needed. As these are now experimental intrinsics we can change the implementation later if necessary.

Repository:
  rL LLVM

https://reviews.llvm.org/D30086