[PATCH] D111574: [SLP]Improve reductions vectorization.

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 21 07:11:53 PDT 2021


ABataev added a comment.

In D111574#3075574 <https://reviews.llvm.org/D111574#3075574>, @RKSimon wrote:

> Some minor comments, but its a heavy patch to review tbh...

I understand. I'll try to split it.
The main idea behind this patch is to improve the reduction vectorization process. Currently, SLPVectorizer gathers 3 kinds of instructions:
reduction operations (those with the root `RdxKind` kind), reduction values (the very first non-`RdxKind` instructions with the same opcode) and extra args (instructions with different parents, non-`RdxKind` or non-reduced value opcodes, etc.). First, this complicates the reduction analysis (some potential reduction operands may turn into extra args because their operands are also extra args). Second, it throws away some potentially beneficial reductions, such as constant reductions, reductions with repeated values, and reductions with same/alternate opcodes.
The patch simplifies the reduction analysis (we just do a simple BFS in operand order) and gathers all potential reduced values (without checking the reduced value opcode) and extra args (without any extra transformations, since we can detect such args immediately). It then sorts the potential reduction values by their value/instruction opcodes (same and/or alternate ones too) and tries to generate a reduction for all of these potentially reduced values/instructions.
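To make the collection step concrete, here is a rough sketch of a BFS over a toy expression tree. It is not the code from the patch: `Node`, `Collected` and `collectReduction` are hypothetical names, the "extra arg" test is reduced to a different-parent check, and grouping by opcode in a `std::map` stands in for the sorting step.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR node; this is not an LLVM type.
struct Node {
  std::string Opcode;            // e.g. "add", "mul", "load"
  std::vector<const Node *> Ops; // operands, in operand order
  int ParentBlock = 0;           // id of the containing basic block
};

struct Collected {
  // Operations that share the root's opcode (the reduction tree itself).
  std::vector<const Node *> ReductionOps;
  // Candidate reduced values, grouped by their own opcode.
  std::map<std::string, std::vector<const Node *>> ReducedVals;
  // Operands that cannot take part (here: defined in another block).
  std::vector<const Node *> ExtraArgs;
};

// Walk the tree breadth-first in operand order, starting at the root.
Collected collectReduction(const Node &Root) {
  Collected C;
  std::deque<const Node *> Worklist{&Root};
  while (!Worklist.empty()) {
    const Node *N = Worklist.front();
    Worklist.pop_front();
    C.ReductionOps.push_back(N);
    for (const Node *Op : N->Ops) {
      assert(Op && "operands must be non-null");
      if (Op->ParentBlock != Root.ParentBlock)
        C.ExtraArgs.push_back(Op);               // detected immediately
      else if (Op->Opcode == Root.Opcode)
        Worklist.push_back(Op);                  // still part of the reduction
      else
        C.ReducedVals[Op->Opcode].push_back(Op); // any opcode is accepted
    }
  }
  return C;
}
```

With a root `add` whose operands are two more `add`s over two loads, a `mul` and a load from another block, this would yield three reduction ops, a "load" group of two, a "mul" group of one, and one extra arg. The real analysis has more checks (uses, flags, etc.) that are left out here.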
It also changes the order of the reduction/args vectorization attempts: first we look for reductions, and only if there are none do we try to vectorize the args of the binops.
It also tries to generate the final scalar code for the non-reduced/extra args in the most efficient way, avoiding extra dependencies between the last scalar instructions so that the CPU can schedule more instructions to execute independently.
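One way to shorten the dependency chain between the trailing scalar operations is to combine the leftover values pairwise instead of in a single linear chain. The sketch below only illustrates that idea under this assumption; `combinePairwise` is a hypothetical helper, not a function from the patch, and it is only valid for operations that may be reassociated.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Combine values pairwise, level by level. A linear chain over N values has
// N-1 dependent steps; the pairwise form has about log2(N) dependent levels.
// If Depth is non-null, it receives the number of levels that were needed.
template <typename T, typename BinOp>
T combinePairwise(std::vector<T> Vals, BinOp Op, std::size_t *Depth = nullptr) {
  assert(!Vals.empty() && "need at least one value");
  std::size_t Levels = 0;
  while (Vals.size() > 1) {
    std::vector<T> Next;
    // Operations within one level do not depend on each other.
    for (std::size_t I = 0; I + 1 < Vals.size(); I += 2)
      Next.push_back(Op(Vals[I], Vals[I + 1]));
    if (Vals.size() % 2 != 0)
      Next.push_back(Vals.back()); // odd element is carried to the next level
    Vals = std::move(Next);
    ++Levels;
  }
  if (Depth)
    *Depth = Levels;
  return Vals.front();
}
```

For five integer values and addition this needs three dependent levels, where a linear chain would need four dependent adds.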
This is the first patch in the series. I have another one, which should add support for reduction operations with many uses; it may help to vectorize something like this:

  bool Res = false;
  for (int i = 0; i < 15; ++i) {
    bool Cmp = a[i] < a[i+1];
    int min = Cmp ? a[i] : a[i+1];
    Res |= Cmp;
  }

and similar patterns I saw in real user code.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111574/new/

https://reviews.llvm.org/D111574


