[PATCH] D75069: [LoopVectorizer] Inloop vector reductions

Mon Feb 24 11:17:43 PST 2020

dmgreen created this revision.
dmgreen added reviewers: Ayal, hsaito, fhahn, gilr, dcaballe, rengolin.
Herald added subscribers: rogfer01, bollu, hiraditya, kristof.beyls.
Herald added a project: LLVM.

Arm MVE has multiple instructions such as VMLAVA.s8, which (in this case) can take two 128-bit vectors, sign-extend the inputs to i32, multiply them together and sum the result into a 32-bit general purpose register. So, taking 16 i8s as inputs, it can multiply and accumulate the result into a single i32 without any rounding/truncating along the way. There are also reduction instructions for plain integer add and min/max, and operations that sum into a pair of 32-bit registers together treated as a 64-bit integer (even though MVE does not have a plain 64-bit addition instruction). Giving the vectorizer the ability to use these instructions both enables us to vectorize at higher bitwidths, and to vectorize things we previously could not.
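
As a rough illustration of the arithmetic described above, here is a scalar C++ model of a widening multiply-accumulate over 16 i8 lanes. The helper name mlaReduceS8 is made up for this sketch; it models only the value computed, not the instruction encoding or its exact overflow behaviour.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model: sign-extend each i8 lane to i32, multiply the
// lanes pairwise, and add every product into a single i32 accumulator.
int32_t mlaReduceS8(int32_t Acc, const std::array<int8_t, 16> &A,
                    const std::array<int8_t, 16> &B) {
  for (std::size_t I = 0; I < A.size(); ++I)
    Acc += static_cast<int32_t>(A[I]) * static_cast<int32_t>(B[I]);
  return Acc;
}
```

Because the products are formed at 32 bits, no intermediate value is truncated to i8 or i16.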

In order to do that we need a way to represent that the "reduction" operation, specified with an llvm.experimental.vector.reduce intrinsic when vectorizing for Arm, occurs inside the loop, not after it as in most reductions. This patch attempts to do that, teaching the vectorizer about in-loop reductions.
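
To make the distinction concrete, the following scalar C++ sketch models both shapes for an integer add reduction. The names sumOutOfLoop and sumInLoop and the fixed VF of 4 are assumptions for illustration only; neither function is code from the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t VF = 4; // assumed vector width for this sketch

// Out-of-loop form: keep a VF-wide accumulator across iterations and reduce
// it to a scalar once, after the vector loop.
int32_t sumOutOfLoop(const std::vector<int32_t> &Data) {
  std::array<int32_t, VF> VecAcc{};
  std::size_t I = 0;
  for (; I + VF <= Data.size(); I += VF)
    for (std::size_t L = 0; L < VF; ++L)
      VecAcc[L] += Data[I + L];
  int32_t Sum = 0;
  for (int32_t Lane : VecAcc)
    Sum += Lane;
  for (; I < Data.size(); ++I) // scalar remainder
    Sum += Data[I];
  return Sum;
}

// In-loop form: reduce each VF-wide chunk to a scalar inside the loop and
// carry only a scalar accumulator between iterations.
int32_t sumInLoop(const std::vector<int32_t> &Data) {
  int32_t Acc = 0;
  std::size_t I = 0;
  for (; I + VF <= Data.size(); I += VF) {
    int32_t Reduced = 0;
    for (std::size_t L = 0; L < VF; ++L)
      Reduced += Data[I + L];
    Acc += Reduced;
  }
  for (; I < Data.size(); ++I) // scalar remainder
    Acc += Data[I];
  return Acc;
}
```

For integer addition both forms produce the same result; the difference is where the reduction happens and what is carried around the loop.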

I had an initial version of this patch that just altered fixReductions, modifying the reduction operations after the fact. That doesn't properly fit into the cost modelling though. It seems like this should be the perfect kind of job for VPlan, if I am understanding the goals there correctly. I would expect it to eventually work something along the lines of:

- Produce some simple VPlans (scalar, different widths)
- Produce some more interesting vector plans from them (e.g. in-loop reductions)
- Cost each of them and pick the best
- Execute it.
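
The steps above could be sketched roughly as follows. Everything here is hypothetical: CandidatePlan, CostFn, buildCandidates and pickCheapest are invented for illustration and are not VPlan classes or LoopVectorize functions, and the cost function is supplied by the caller.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a plan; not the real VPlan class.
struct CandidatePlan {
  std::string Name;
  unsigned VF;
  bool InLoopReduction;
  unsigned Cost;
};

// Caller-supplied cost model (an assumption of this sketch).
using CostFn = std::function<unsigned(unsigned VF, bool InLoop)>;

// Steps 1 and 2: one simple plan per width, plus an in-loop-reduction
// variant derived from each vector plan.
std::vector<CandidatePlan> buildCandidates(const std::vector<unsigned> &VFs,
                                           const CostFn &Cost) {
  std::vector<CandidatePlan> Plans;
  for (unsigned VF : VFs) {
    Plans.push_back({"vf" + std::to_string(VF), VF, false, Cost(VF, false)});
    if (VF > 1)
      Plans.push_back({"vf" + std::to_string(VF) + "-inloop", VF, true,
                       Cost(VF, true)});
  }
  return Plans;
}

// Step 3: pick the cheapest plan; returns nullptr if there are none.
const CandidatePlan *pickCheapest(const std::vector<CandidatePlan> &Plans) {
  auto It = std::min_element(Plans.begin(), Plans.end(),
                             [](const CandidatePlan &A, const CandidatePlan &B) {
                               return A.Cost < B.Cost;
                             });
  return It == Plans.end() ? nullptr : &*It;
}
```

Step 4 (executing the chosen plan) is left out, since it depends entirely on the real plan representation.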

This patch does not work like that yet though, as I am unsure how that should really look, and it did not seem simple to create one VPlan from another yet. The def-use relations are usually not in place, for example. Neither is the costing of VPlans against one another, which makes sense.

It does use a VPReductionRecipe (which seems to work well), but just uses a TTI->preferInloopReduction call to choose between in-loop and out-of-loop for each of the reductions. It also does not attempt to choose higher vector widths yet, which will need further changes to the Arm backend.
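
A minimal sketch of that kind of per-reduction target query is below. This message does not show the hook's real signature, so the TargetHooks interface, the ReductionKind enum and the MveLikeHooks answers are all assumptions made for illustration; they are not the TargetTransformInfo API.

```cpp
#include <vector>

// Hypothetical reduction kinds; not the RecurrenceDescriptor enum.
enum class ReductionKind { Add, Mul, Min, Max };

// Hypothetical target interface standing in for a TTI-style hook.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  // Default: keep the reduction out of the loop.
  virtual bool preferInLoopReduction(ReductionKind) const { return false; }
};

// Assumed example target that opts in for add and min/max reductions.
struct MveLikeHooks : TargetHooks {
  bool preferInLoopReduction(ReductionKind K) const override {
    return K == ReductionKind::Add || K == ReductionKind::Min ||
           K == ReductionKind::Max;
  }
};

// Ask the target once per reduction; each decision is independent.
std::vector<bool> chooseInLoop(const TargetHooks &Target,
                               const std::vector<ReductionKind> &Reductions) {
  std::vector<bool> InLoop;
  for (ReductionKind K : Reductions)
    InLoop.push_back(Target.preferInLoopReduction(K));
  return InLoop;
}
```

A yes/no query like this is simple, but it bypasses any cost comparison between the two forms, which is the limitation noted above.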

I have already written this twice, so I don't really mind writing it again if this should be done differently, so long as I know what "good" looks like.

https://reviews.llvm.org/D75069

Files:
  llvm/include/llvm/Analysis/IVDescriptors.h
  llvm/include/llvm/Analysis/TargetTransformInfo.h
  llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
  llvm/lib/Analysis/IVDescriptors.cpp
  llvm/lib/Analysis/TargetTransformInfo.cpp
  llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
  llvm/lib/Target/ARM/ARMTargetTransformInfo.h
  llvm/lib/Transforms/Utils/LoopUtils.cpp
  llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/lib/Transforms/Vectorize/VPlan.cpp
  llvm/lib/Transforms/Vectorize/VPlan.h
  llvm/lib/Transforms/Vectorize/VPlanValue.h
  llvm/test/Transforms/LoopVectorize/ARM/mve-reductions.ll
  llvm/test/Transforms/LoopVectorize/reduction-inloop.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D75069.246257.patch
Type: text/x-patch
Size: 107816 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200224/06308463/attachment.bin>