[all-commits] [llvm/llvm-project] e97616: [LoopVectorizer] Inloop vector reductions

David Green via All-commits all-commits at lists.llvm.org
Wed Aug 5 10:14:31 PDT 2020


  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: e9761688e41cb979a1fa6a79eb18145a75104933
      https://github.com/llvm/llvm-project/commit/e9761688e41cb979a1fa6a79eb18145a75104933
  Author: David Green <david.green at arm.com>
  Date:   2020-08-05 (Wed, 05 Aug 2020)

  Changed paths:
    M llvm/include/llvm/Analysis/IVDescriptors.h
    M llvm/lib/Analysis/IVDescriptors.cpp
    M llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
    M llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
    M llvm/lib/Transforms/Vectorize/VPlan.cpp
    M llvm/lib/Transforms/Vectorize/VPlan.h
    M llvm/test/Transforms/LoopVectorize/reduction-inloop-uf4.ll
    M llvm/test/Transforms/LoopVectorize/reduction-inloop.ll

  Log Message:
  -----------
  [LoopVectorizer] Inloop vector reductions

Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069




More information about the All-commits mailing list