[all-commits] [llvm/llvm-project] d15225: [ARM] Reimplement MVE Tail-Predication pass using ...

sjoerdmeijer via All-commits all-commits at lists.llvm.org
Wed Jun 17 07:25:26 PDT 2020


  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: d1522513d4c4a7bccda664c226a055057ab264d2
      https://github.com/llvm/llvm-project/commit/d1522513d4c4a7bccda664c226a055057ab264d2
  Author: Sjoerd Meijer <sjoerd.meijer at arm.com>
  Date:   2020-06-17 (Wed, 17 Jun 2020)

  Changed paths:
    M llvm/lib/Target/ARM/MVETailPredication.cpp
    M llvm/test/CodeGen/Thumb2/LowOverheadLoops/basic-tail-pred.ll
    M llvm/test/CodeGen/Thumb2/LowOverheadLoops/clear-maskedinsts.ll
    M llvm/test/CodeGen/Thumb2/LowOverheadLoops/cond-vector-reduce-mve-codegen.ll
    M llvm/test/CodeGen/Thumb2/LowOverheadLoops/extending-loads.ll
    M llvm/test/CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll
    M llvm/test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll
    M llvm/test/CodeGen/Thumb2/LowOverheadLoops/nested.ll
    M llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-const.ll
    M llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-widen.ll
    M llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-reduce.ll
    M llvm/test/CodeGen/Thumb2/LowOverheadLoops/vector-arith-codegen.ll
    M llvm/test/CodeGen/Thumb2/LowOverheadLoops/vector-reduce-mve-tail.ll
    M llvm/test/CodeGen/Thumb2/mve-fma-loops.ll

  Log Message:
  -----------
  [ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask

To set up a tail-predicated loop, we need to calculate the number of
elements processed by the loop. We can now use the intrinsic
@llvm.get.active.lane.mask() to do this, which is emitted by the vectoriser in
D79100. This intrinsic generates a predicate for the masked loads/stores, and
consumes the Backedge Taken Count (BTC) as its second argument. We can now use
that to reconstruct the loop tripcount, instead of the IR pattern match
approach we were using before.
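
As an illustration only, the relationship between the lane mask, the BTC and
the tripcount can be modelled on concrete values. This is a minimal sketch, not
code from the pass: the names activeLaneMask, tripCountFromBTC and the vector
width VF are hypothetical, and it assumes lane i is active when (base + i) is
unsigned less-than-or-equal to the BTC, using 64-bit arithmetic to sidestep
wrap-around.

```cpp
#include <array>
#include <cstdint>

// Hypothetical vector width (number of lanes), chosen for illustration.
constexpr unsigned VF = 4;

// Scalar model of the predicate: lane i is active when (base + i) <= btc.
// Assumption: unsigned comparison, computed in 64 bits so base + i cannot wrap.
std::array<bool, VF> activeLaneMask(uint32_t base, uint32_t btc) {
  std::array<bool, VF> mask{};
  for (unsigned i = 0; i < VF; ++i) {
    uint64_t idx = static_cast<uint64_t>(base) + i;
    mask[i] = idx <= btc;
  }
  return mask;
}

// The loop body runs BTC + 1 times, so that is the number of elements.
// Widening to 64 bits keeps the result exact even for BTC == UINT32_MAX.
uint64_t tripCountFromBTC(uint32_t btc) {
  return static_cast<uint64_t>(btc) + 1;
}
```

For example, with a BTC of 9 (10 elements) and VF = 4, the third vector
iteration starts at base 8, so only the first two lanes are active.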

Many thanks to Eli Friedman and Sam Parker for all their help with this work.

This also adds overflow checks for the new expressions that we create: the
loop tripcount, and the sub expression that calculates the remaining elements
to be processed. For the latter, SCEV is not able to calculate precise enough
bounds, so we work around that at the moment; the workaround is not entirely
correct yet, but it is conservative. The overflow checks can be overridden
with a force flag, which is thus potentially unsafe (but not really, because
the vectoriser is the only place where this intrinsic is emitted at the
moment). It's also good to mention that the tail-predication pass is not yet
enabled by default. We will follow up to see if we can implement these
overflow checks better, either by a change in SCEV or by revising the
definition of llvm.get.active.lane.mask.
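
To make the two overflow conditions concrete, here is a rough sketch on plain
32-bit values. It is not how the pass implements them (the pass reasons about
symbolic SCEV ranges at compile time); checkedTripCount and checkedRemaining
are hypothetical names, and 32-bit counters are an assumption.

```cpp
#include <cstdint>
#include <optional>

// Tripcount = BTC + 1. In 32 bits this wraps to 0 when BTC is UINT32_MAX,
// so report "no safe value" in that case.
std::optional<uint32_t> checkedTripCount(uint32_t btc) {
  if (btc == UINT32_MAX)
    return std::nullopt;
  return btc + 1;
}

// Remaining elements = tripCount - processed. The subtraction would wrap
// below zero if more elements were processed than the tripcount allows.
std::optional<uint32_t> checkedRemaining(uint32_t tripCount,
                                         uint32_t processed) {
  if (processed > tripCount)
    return std::nullopt;
  return tripCount - processed;
}
```

A conservative check of this kind rejects any case it cannot prove safe, which
is why a force flag that bypasses it is potentially unsafe in general.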

Differential Revision: https://reviews.llvm.org/D79175



