[all-commits] [llvm/llvm-project] 745bf6: [LoopVectorizer] Inloop vector reductions

Thu Aug 6 02:11:50 PDT 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 745bf6cf447110ae025e1c77338020f0c5d5ca80
      https://github.com/llvm/llvm-project/commit/745bf6cf447110ae025e1c77338020f0c5d5ca80
  Author: David Green <david.green at arm.com>
  Date:   2020-08-06 (Thu, 06 Aug 2020)

  Changed paths:
    M llvm/include/llvm/Analysis/IVDescriptors.h
    M llvm/lib/Analysis/IVDescriptors.cpp
    M llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
    M llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
    M llvm/lib/Transforms/Vectorize/VPlan.cpp
    M llvm/lib/Transforms/Vectorize/VPlan.h
    M llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll
    M llvm/test/Transforms/LoopVectorize/reduction-inloop.ll

  Log Message:
  -----------
  [LoopVectorizer] Inloop vector reductions

Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069