[all-commits] [llvm/llvm-project] 03fee6: [LoopVectorize] Add option to use active lane mask...
david-arm via All-commits
all-commits at lists.llvm.org
Mon Jul 11 05:47:14 PDT 2022
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 03fee6712a39fb4132097a61fbc7e19aeececb59
https://github.com/llvm/llvm-project/commit/03fee6712a39fb4132097a61fbc7e19aeececb59
Author: David Sherwood <david.sherwood at arm.com>
Date: 2022-07-11 (Mon, 11 Jul 2022)
Changed paths:
M llvm/include/llvm/Analysis/TargetTransformInfo.h
M llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
M llvm/include/llvm/CodeGen/BasicTTIImpl.h
M llvm/lib/Analysis/TargetTransformInfo.cpp
M llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
M llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
M llvm/lib/Target/ARM/ARMTargetTransformInfo.h
M llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
M llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
M llvm/lib/Transforms/Vectorize/VPlan.cpp
M llvm/lib/Transforms/Vectorize/VPlan.h
M llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
M llvm/lib/Transforms/Vectorize/VPlanValue.h
M llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
M llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions-tf.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-unroll.ll
M llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
M llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll
Log Message:
-----------
[LoopVectorize] Add option to use active lane mask for loop control flow
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic, we only use the mask for predicated vector operations,
such as masked loads and stores. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.
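For illustration, a minimal sketch of the loop shape described above.
This is not the pass's literal output; the fixed VF of 4 and the names
%src, %n and %n.vec (the rounded-up vector trip count) are placeholder
assumptions:

vector.ph:
  br label %vector.body

vector.body:
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  ; the mask only predicates the memory operations
  %active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %index, i64 %n)
  %gep = getelementptr inbounds i32, ptr %src, i64 %index
  %wide.load = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr %gep, i32 4, <4 x i1> %active.lane.mask, <4 x i32> poison)
  ; ... predicated computation and masked store ...
  %index.next = add i64 %index, 4
  ; the back-edge is still controlled by comparing the canonical IV
  ; with the (placeholder) vector trip count
  %done = icmp eq i64 %index.next, %n.vec
  br i1 %done, label %middle.block, label %vector.body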
This patch adds support for using the active lane mask for control
flow by:
1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations, then at least the first bit of the mask will
be set.
2. Extracting the first bit of this mask and using it for the
conditional branch, as sketched below.
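A minimal sketch of the resulting loop shape, under the same placeholder
assumptions as above (fixed VF of 4, illustrative value names); the
pass's actual output differs in detail, e.g. it may negate the extracted
bit before branching:

vector.ph:
  ; mask for the first vector iteration, starting at element 0
  %entry.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 0, i64 %n)
  br label %vector.body

vector.body:
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %active.lane.mask = phi <4 x i1> [ %entry.mask, %vector.ph ], [ %active.lane.mask.next, %vector.body ]
  ; ... loads/stores predicated on %active.lane.mask, as before ...
  %index.next = add i64 %index, 4
  ; step 1: mask for the *next* iteration; its first bit is set iff
  ; any iterations remain
  %active.lane.mask.next = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %index.next, i64 %n)
  ; step 2: lane 0 of that mask decides whether to take the back-edge
  %first.lane = extractelement <4 x i1> %active.lane.mask.next, i64 0
  br i1 %first.lane, label %vector.body, label %middle.block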
I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.
Differential Revision: https://reviews.llvm.org/D125301