[llvm-dev] RFC: Generic IR reductions

Wed Feb 1 06:44:56 PST 2017

Her point is that the %sum value will no longer be an add node, but
simply a constant vector. There is no way to resolve the semantics and
have meaningful IR after a simple constant prop. This means that other
simple transformations will all have to have special case logic just
to handle reductions, for example LCSSA.

> On thing that does happen is that code optimisations expose patterns
> that would otherwise not be apparent. This includes potential
> reduction or fusion patterns and can lead to massively smaller code or
> even eliding the whole block. If you convert a block to an intrinsic
> too early you may lose the ability to merge it back again later, as
> we're doing today.

> These are all hypothetical wrt SVE, but they did happen in NEON in the
> past and were the reason why we only have a handful of NEON
> intrinsics. Everything else are encoded with sequences of
> instructions.

Can you give a specific example? Reductions are actually very robust
to optimizations breaking the idiom, which is why I was able to
replace the reductions with intrinsics in my patches and with simple
matching generate identical code to before. No other changes were
required in the mid-end.

On 1 February 2017 at 14:20, Renato Golin <renato.golin at linaro.org> wrote:
> On 1 February 2017 at 13:06, Demikhovsky, Elena
> <elena.demikhovsky at intel.com> wrote:
>> Constant propagation:
>>
>> %sum = add <N x float> %a, %b
>> @llvm.reduce(ext <N x double>  %sum)
>>
>> if %a and %b are vector of constants, the %sum also becomes a vector of constants.
>> At this point you have @llvm.reduce(ext <N x double>  %sum) and don't know what kind of reduction do you need.
>
> Well, sum is an add node. If %a and %b have special semantics for the
> target, than @reduce is meaningful and this means some type of
> summation.
>
> But the more I think of it, the less I think this could actually solve
> the semantic issues around reductions... The zeroinit idiom would be
> more of a hack than re-using similar concepts in IR.
>
> So, let me take a step back, and assume that, for scalable vectors we
> *have* to use all intrinsics. IR simply has no compatible idiom, and
> unless we introduce some (like the stepvector), it won't work.
>
> As I said before, adding intrinsics is better than new IR constructs,
> so let's also assume this is the less costly way forward for now.
>
> But having IR instructions is better than adding intrinsics, and I'm
> not sure we want to completely replace what's there already, for
> intrinsics.
>
> Does AVX512 suffer from any cost in using the current extract/op
> idiom? NEON has small vectors, so IR sequences end up being 1~4 ops,
> which I don't consider a problem.
>
> Also, the current idiom can cope with ordered/unordered reduction by
> interleaving the operations or not:
>
>   %sum0 = %vec[0] + %vec[1]
>   %sum1 = %vec[2] + %vec[3]
>   %sum = %sum0 + %sum1
>
> or
>
>   %sum0 = %vec[0] + %vec[1]
>   %sum1 = %sum0 + %vec[2]
>   %sum = %sum1 + %vec[3]
>
> It may not cope with special semantics leading to use of
> target-specific instructions, in which case we obviously need
> intrinsics. It certainly can't cope with unknown vector sizes either.
>
> cheers,
> --renato