[llvm-dev] RFC: Generic IR reductions
Renato Golin via llvm-dev
llvm-dev at lists.llvm.org
Wed Feb 1 07:10:36 PST 2017
On 1 February 2017 at 14:44, Amara Emerson <amara.emerson at gmail.com> wrote:
> Her point is that the %sum value will no longer be an add node, but
> simply a constant vector. There is no way to resolve the semantics and
> have meaningful IR after a simple constant prop. This means that other
> simple transformations will all have to have special case logic just
> to handle reductions, for example LCSSA.
Right.
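To make that concrete, here is a minimal sketch of the two forms (the constant values and the intrinsic name are purely illustrative). The shuffle-based idiom the vectoriser emits today folds away completely under constant propagation; the intrinsic form just ends up as a call on a constant vector, and every pass that wants to reason about it needs special-case knowledge:

  ; Shuffle-based reduction idiom, as the loop vectoriser emits it today.
  ; If the input becomes constant, ordinary constant folding collapses the
  ; whole tree down to "ret i32 10" -- the semantics stay visible in the IR.
  define i32 @reduce_ir() {
    %vec = add <4 x i32> <i32 1, i32 2, i32 3, i32 4>, zeroinitializer
    %shuf = shufflevector <4 x i32> %vec, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
    %rdx = add <4 x i32> %vec, %shuf
    %shuf1 = shufflevector <4 x i32> %rdx, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %rdx1 = add <4 x i32> %rdx, %shuf1
    %sum = extractelement <4 x i32> %rdx1, i32 0
    ret i32 %sum
  }

  ; Intrinsic form (name illustrative). After constant propagation, %sum is
  ; just a call on a constant vector; constant folding, LCSSA and friends
  ; can do nothing with it unless they are taught about the intrinsic.
  declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)
  define i32 @reduce_intrinsic() {
    %sum = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> <i32 1, i32 2, i32 3, i32 4>)
    ret i32 %sum
  }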
> Can you give a specific example? Reductions are actually very robust
> to optimizations breaking the idiom, which is why I was able to
> replace the reductions with intrinsics in my patches and with simple
> matching generate identical code to before. No other changes were
> required in the mid-end.
First, this is a demonstration that keeping them as IR is a good
thing, not a bad one. We keep the semantics and still allow further
introspection into the surrounding code.
Examples of such introspection are vector widening and instruction
fusion after inlining. Say you have a loop with a function call and a
reduction, and the callee has a reduction of its own. If the function
gets inlined after you have already collapsed your loop's reduction
into a builtin, the optimiser has no way of seeing whether the two
reduction patterns can be merged, either by widening the vectors
(saving on loads/stores) or by fusing operations (e.g. into an MLA),
as sketched below.
We have seen both cases with NEON after moving to IR instructions for
everything but the patterns that are impossible to express in IR.
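As a hedged sketch of the kind of merge that stays available when both reductions remain visible as plain IR (illustrative IR, not actual optimiser output): once the callee is inlined, sum(%a) + sum(%b) can be rewritten as sum(%a + %b), so the two shuffle trees collapse into one. If each sum were an opaque builtin, that rewrite would need intrinsic-specific logic in whichever pass performs it.

  ; Caller reduced %a, the inlined callee reduced %b, and the results were
  ; added. Because both trees were plain IR, they can be merged into a
  ; single reduction over %a + %b, halving the number of shuffles.
  define i32 @merged_reduction(<4 x i32> %a, <4 x i32> %b) {
    %ab = add <4 x i32> %a, %b
    %shuf = shufflevector <4 x i32> %ab, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
    %rdx = add <4 x i32> %ab, %shuf
    %shuf1 = shufflevector <4 x i32> %rdx, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %rdx1 = add <4 x i32> %rdx, %shuf1
    %sum = extractelement <4 x i32> %rdx1, i32 0
    ret i32 %sum
  }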
In 2010 I went through a similar discussion with Bob Wilson, who was
defending the position I'm defending now. I defend it today because
the results I describe above categorically proved me wrong back then.
Chandler's arguments are perfectly to the point. Intrinsics are not
only necessary when we can't represent things in IR, they're a *good*
way of representing odd things.
But if things are not odd (i.e. many targets have them), or if we can
already represent them in IR, then it stands to reason that adding
duplicated stuff only adds complexity. It increases the maintenance
cost (more node types to consider), it increases the chance of
missing some of them (and either not optimising or generating bad
code), and it prevents optimisers that know nothing about the new
node (because it's too new) from doing any inference, which can
actually generate worse code than before (shuffle explosion).
As they say: if it's not broken, don't fix it.
Let's talk about the reductions that AVX512 and SVE can't handle with
IR semantics, but let's not change the current IR semantics for no
reason.
cheers,
--renato