[PATCH] D107638: [ARM] Add a tail-predication loop predicate register

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 6 04:52:52 PDT 2021


dmgreen created this revision.
dmgreen added reviewers: SjoerdMeijer, samtebbs, NickGuy, ostannard, simon_tatham.
Herald added subscribers: danielkiss, zzheng, hiraditya, kristof.beyls, qcolombet.
dmgreen requested review of this revision.
Herald added a project: LLVM.

The semantics of tail-predication loops mean that the value of LR at the point an instruction is executed determines the predicate. In other words:

  mov r3, #3
  DLSTP lr, r3        // Start tail predication, lr==3
  VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0.
  mov lr, #1
  VADD.s32 q0, q1, q2 // Only first lane is updated.

This means that the value of lr cannot be spilled and reused inside tail-predication regions without potentially altering the behaviour of the program. More lanes than required could be stored, for example, and in the case of a gather those lanes might not have been set up, leading to alignment exceptions.
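
As a rough illustration of the semantics described above, the following Python sketch models a tail-predicated VADD.s32 on four 32-bit lanes. It is only a toy model, not how the hardware or LLVM represents this: the function name `predicated_vadd` and the list-of-ints register representation are assumptions made for the example, and it assumes lane i is active exactly when i < lr, as in the assembly snippet.

```python
LANES = 4  # assumed: four 32-bit lanes in one 128-bit q register


def predicated_vadd(dest, a, b, lr):
    """Toy model: lanes with index < lr get a[i] + b[i]; others keep dest[i]."""
    active = max(0, min(lr, LANES))
    return [a[i] + b[i] if i < active else dest[i] for i in range(LANES)]


q1 = [1, 2, 3, 4]
q2 = [10, 20, 30, 40]
q0 = [0, 0, 0, 0]

# lr == 3: lanes 0, 1 and 2 are updated.
q0_lr3 = predicated_vadd(q0, q1, q2, lr=3)
# lr == 1: only the first lane is updated.
q0_lr1 = predicated_vadd(q0, q1, q2, lr=1)
# If lr were reused and held a larger value (4 here) when the instruction
# executes, an extra lane would be written.
q0_clobbered = predicated_vadd(q0, q1, q2, lr=4)

print(q0_lr3)        # [11, 22, 33, 0]
print(q0_lr1)        # [11, 0, 0, 0]
print(q0_clobbered)  # [11, 22, 33, 44]
```

The last call shows the hazard: the same instruction with the same inputs produces a different result purely because lr holds a different value when it runs.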

This patch adds a new "lr" predicate operand to MVE instructions so that each instruction keeps a reference to the lr value it uses as a tail predicate. The operand usually holds zeroreg, meaning not tail-predicated, and is set to the LR phi value in the MVETPAndVPTOptimisationsPass. This will prevent lr from being spilled anywhere that it needs to be set.
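
To show why an explicit operand helps, here is a toy liveness check in Python. It is a sketch under simplifying assumptions and is not LLVM's liveness analysis or register allocator: instructions are hypothetical (name, defs, uses) tuples, `live_after` is a made-up helper, and the loop-end instruction is left out. A register is treated as reusable at a point if nothing reads it before its next definition.

```python
def live_after(instrs, index, reg):
    """Return True if reg is read after instrs[index] before being redefined."""
    for _name, defs, uses in instrs[index + 1:]:
        if reg in uses:
            return True
        if reg in defs:
            return False
    return False


# Without the extra operand, VADD does not mention lr at all.
without_operand = [
    ("DLSTP", {"lr"}, {"r3"}),
    ("VADD", {"q0"}, {"q1", "q2"}),
    ("MOV", {"lr"}, set()),
]

# With the extra operand, VADD explicitly uses lr as its tail predicate.
with_operand = [
    ("DLSTP", {"lr"}, {"r3"}),
    ("VADD", {"q0"}, {"q1", "q2", "lr"}),
    ("MOV", {"lr"}, set()),
]

# Would this toy analysis consider lr free for reuse right after DLSTP?
lr_free_without = not live_after(without_operand, 0, "lr")
lr_free_with = not live_after(with_operand, 0, "lr")

print(lr_free_without)  # True
print(lr_free_with)     # False
```

In this model, only the version with the explicit use keeps lr live across the VADD, which is the effect the new operand is meant to have.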

A lot of tests needed updating.


https://reviews.llvm.org/D107638

Files:
  llvm/lib/CodeGen/MachineVerifier.cpp
  llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
  llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
  llvm/lib/Target/ARM/ARMISelLowering.cpp
  llvm/lib/Target/ARM/ARMInstrCDE.td
  llvm/lib/Target/ARM/ARMInstrFormats.td
  llvm/lib/Target/ARM/ARMInstrMVE.td
  llvm/lib/Target/ARM/ARMLoadStoreOptimizer.cpp
  llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
  llvm/lib/Target/ARM/AsmParser/ARMAsmParser.cpp
  llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp
  llvm/lib/Target/ARM/MVETPAndVPTOptimisationsPass.cpp
  llvm/test/CodeGen/ARM/machine-outliner-unoutlinable.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/add_reduce.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/begin-vpt-without-inst.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/cmplx_cong.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/count_dominates_start.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/ctlz-non-zeros.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/disjoint-vcmp.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/dont-ignore-vctp.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/dont-remove-loop-update.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/emptyblock.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/extract-element.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/incorrect-sub-16.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/incorrect-sub-32.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/incorrect-sub-8.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/inloop-vpnot-1.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/inloop-vpnot-2.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/inloop-vpnot-3.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/inloop-vpsel-1.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/inloop-vpsel-2.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/invariant-qreg.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/it-block-chain-store.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/it-block-chain.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/it-block-itercount.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/it-block-mov.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/it-block-random.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/iv-two-vcmp-reordered.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/iv-two-vcmp.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/iv-vcmp.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/livereg-no-loop-def.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/lstp-insertion-position.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/matrix.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/mov-after-dlstp.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/mov-lr-terminator.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/move-def-before-start.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/move-start-after-def.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/multi-block-cond-iter-count.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/multi-cond-iter-count.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/multiple-do-loops.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/no-vpsel-liveout.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/non-masked-load.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/non-masked-store.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/predicated-invariant.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/predicated-liveout.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/reductions-vpt-liveout.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/remove-elem-moves.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/safe-retaining.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/skip-debug.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/skip-vpt-debug.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/subreg-liveness.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/unpredicated-max.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/unrolled-and-vector.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/unsafe-retaining.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/vaddv.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/vcmp-vpst-combination-across-blocks.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/vctp-add-operand-liveout.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/vctp-in-vpt-2.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/vctp-in-vpt.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/vctp-subi3.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/vctp-subri.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/vctp-subri12.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/vctp16-reduce.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/vector_spill_in_loop.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/vmaxmin_vpred_r.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/vmldava_in_vpt.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/vpt-blocks.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/wls-search-pred.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/wlstp.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/wrong-liveout-lsr-shift.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/wrong-vctp-opcode-liveout.mir
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/wrong-vctp-operand-liveout.mir
  llvm/test/CodeGen/Thumb2/mve-gatherscatter-mmo.ll
  llvm/test/CodeGen/Thumb2/mve-postinc-distribute.mir
  llvm/test/CodeGen/Thumb2/mve-stacksplot.mir
  llvm/test/CodeGen/Thumb2/mve-tp-loop.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-2-blocks-1-pred.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-2-blocks-2-preds.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-2-blocks-ctrl-flow.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-2-blocks-non-consecutive-ins.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-2-blocks.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-3-blocks-kill-vpr.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-block-1-ins.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-block-2-ins.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-block-4-ins.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-block-debug.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-block-elses.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-block-fold-vcmp.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-block-kill.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-block-optnone.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-nots.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-optimisations.mir
  llvm/test/CodeGen/Thumb2/mve-vpt-preuse.mir
  llvm/test/CodeGen/Thumb2/mve-wls-block-placement.mir
  llvm/test/CodeGen/Thumb2/phi_prevent_copy.mir


