[all-commits] [llvm/llvm-project] a17f03: [VectorCombine] new IR transform pass for partial ...

Sun Feb 9 07:40:50 PST 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: a17f03bd93939cf30bfbb829321437bd0aaa4ef0
      https://github.com/llvm/llvm-project/commit/a17f03bd93939cf30bfbb829321437bd0aaa4ef0
  Author: Sanjay Patel <spatel at rotateright.com>
  Date:   2020-02-09 (Sun, 09 Feb 2020)

  Changed paths:
    M llvm/include/llvm/InitializePasses.h
    M llvm/include/llvm/LinkAllPasses.h
    M llvm/include/llvm/Transforms/Vectorize.h
    A llvm/include/llvm/Transforms/Vectorize/VectorCombine.h
    M llvm/lib/Passes/PassBuilder.cpp
    M llvm/lib/Passes/PassRegistry.def
    M llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
    M llvm/lib/Transforms/Vectorize/CMakeLists.txt
    A llvm/lib/Transforms/Vectorize/VectorCombine.cpp
    M llvm/lib/Transforms/Vectorize/Vectorize.cpp
    M llvm/test/Other/new-pm-defaults.ll
    M llvm/test/Other/new-pm-thinlto-defaults.ll
    M llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
    M llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
    M llvm/test/Other/opt-O2-pipeline.ll
    M llvm/test/Other/opt-O3-pipeline.ll
    M llvm/test/Other/opt-Os-pipeline.ll
    A llvm/test/Transforms/VectorCombine/X86/extract-cmp.ll
    A llvm/test/Transforms/VectorCombine/X86/lit.local.cfg

  Log Message:
  -----------
  [VectorCombine] new IR transform pass for partial vector ops

We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.

There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:

    InstCombine - can't happen without TTI, but we don't want target-specific
                  folds there.
    SDAG - too late to assist other vectorization passes; TLI is not equipped
           for these kind of cost queries; limited to a single basic block.
    CGP - too late to assist other vectorization passes; would need to re-implement
          basic cleanups like CSE/instcombine.
    SLP - doesn't fit with existing transforms; limited to a single basic block.

This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.

We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.

Differential Revision: https://reviews.llvm.org/D73480