[all-commits] [llvm/llvm-project] deb72c: [ARM] Better reductions

Mon Jun 29 08:04:43 PDT 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: deb72ce29860f61fe91ddcf97e89abfc9544cf42
      https://github.com/llvm/llvm-project/commit/deb72ce29860f61fe91ddcf97e89abfc9544cf42
  Author: David Green <david.green at arm.com>
  Date:   2020-06-29 (Mon, 29 Jun 2020)

  Changed paths:
    M llvm/lib/Target/ARM/ARMISelLowering.cpp
    M llvm/test/CodeGen/Thumb2/mve-vecreduce-bit.ll
    M llvm/test/CodeGen/Thumb2/mve-vecreduce-fadd.ll
    M llvm/test/CodeGen/Thumb2/mve-vecreduce-fminmax.ll
    M llvm/test/CodeGen/Thumb2/mve-vecreduce-fmul.ll
    M llvm/test/CodeGen/Thumb2/mve-vecreduce-loops.ll
    M llvm/test/CodeGen/Thumb2/mve-vecreduce-mul.ll

  Log Message:
  -----------
  [ARM] Better reductions

MVE has native reductions for integer add and min/max. The others need
to be expanded to a series of extract's and scalar operators to reduce
the vector into a single scalar. The default codegen for that expands
the reduction into a series of in-order operations.

This modifies that to something more suitable for MVE. The basic idea is
to use vector operations until there are 4 remaining items then switch
to pairwise operations. For example a v8f16 fadd reduction would become:
Y = VREV X
Z = ADD(X, Y)
z0 = Z[0] + Z[1]
z1 = Z[2] + Z[3]
return z0 + z1

The awkwardness (there is always some) comes in from something like a
v4f16, which is first legalized by adding identity values to the extra
lanes of the reduction, and which can then not be optimized away through
the vrev; fadd combo, the inserts remain. I've made sure they custom
lower so that we can produce the pairwise additions before the extra
values are added.

Differential Revision: https://reviews.llvm.org/D81397