[PATCH] D79001: [ARM][MVE] Tail-predication: support nested loops with dependent iterators.

Sjoerd Meijer via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 28 06:56:18 PDT 2020


SjoerdMeijer created this revision.
SjoerdMeijer added reviewers: samparker, dmgreen.
Herald added subscribers: danielkiss, javed.absar, hiraditya, kristof.beyls.
Herald added a project: LLVM.

We were not able to determine the number of elements processed by the loop (the scalar loop iteration count) for loops with dependent iterators, and so tail-predication was not triggering. The scalar loop iteration count is found by pattern matching the masked load/store instructions in the vector body that use this value; the result is then checked against SCEV information to make sure that it is right. Not only does the SCEV expression for this type of loop look different, but finding the actual trip count also requires more work, and most changes here are related to this.

Supported now are cases where only the inner loop iterator receives values from its outer loop, like this nested loop example:

  for (i = 0; i < N; i++) {
    M = Size - i;
    for (j = 0; j < M; j++)
      /* loop body */;
  }

A 3-deep loop nest like the following is also supported, because the SCEV expression is the same:

  for (k = 0; k < N; k++)
    for (i = 0; i < N; i++) {
      M = Size - i;
      for (j = 0; j < M; j++)
        /* loop body */;
    }

This covers most of the reduction kernels that we currently have.
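
To make the supported shape concrete, here is a minimal scalar model of a tail-predicated reduction over such a nest. It is only a sketch and not code from the patch: the function names, the reduction itself, and the lane count of 4 (32-bit elements in a 128-bit vector) are assumptions chosen for illustration. The point is that the inner element count M = Size - i changes on every outer iteration, and that the final vector iteration masks off the lanes beyond M instead of falling back to a scalar epilogue.

```cpp
// Assumption: 4 lanes, i.e. 32-bit elements in a 128-bit vector.
constexpr int Lanes = 4;

// Sum A[0..M) in vector-sized steps. The lane predicate disables the lanes
// past the element count M, which models what tail-predication does in the
// final vector iteration.
int predicatedSum(const int *A, int M) {
  int Sum = 0;
  for (int Base = 0; Base < M; Base += Lanes)
    for (int Lane = 0; Lane < Lanes; ++Lane)
      if (Base + Lane < M) // lane is active only while elements remain
        Sum += A[Base + Lane];
  return Sum;
}

// The nest from the description: the inner element count depends on the
// outer iterator i.
int nestedSum(const int *A, int N, int Size) {
  int Total = 0;
  for (int i = 0; i < N; ++i) {
    int M = Size - i; // dependent element count
    Total += predicatedSum(A, M);
  }
  return Total;
}

// Plain scalar reference for comparison.
int scalarNestedSum(const int *A, int N, int Size) {
  int Total = 0;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < Size - i; ++j)
      Total += A[j];
  return Total;
}
```

For any input, nestedSum and scalarNestedSum should agree; the predicated version just reaches the result in ceil(M / Lanes) steps per outer iteration.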

TODO:

The general case where any inner loop iterator can depend on its outer loop is not yet supported. For example, here i is initialised with k, and j is initialised with the value from its parent loop i:

  for (k = 0; k < N; k++)
    for (i = k; i < N; i++)
      for (j = i; j < M; j++)

The reason that this is not yet supported is that pattern matching this SCEV is unwieldy: it almost requires a general SCEV visitor, as it involves the scAddExpr, scAddRecExpr, scUMaxExpr, and scSMaxExpr SCEV types, and the result would still not be very general. Instead, as a follow-up, we would like to emit the scalar iteration count with an intrinsic, similar to how this is done for the hardware-loop instruction, which we can then simply pick up here, so that we don't need all this pattern matching anymore.
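
As a rough illustration of why max expressions show up in the general case, consider the element count of an inner loop "for (j = i; j < M; j++)". The sketch below is not taken from the patch and does not reproduce the exact SCEV that LLVM builds; the function names are made up for this example. It only shows that a closed form for the count needs a max to cover the case where the loop does not execute at all.

```cpp
#include <algorithm>

// Element count obtained by actually running the loop.
int countByLoop(int i, int M) {
  int Count = 0;
  for (int j = i; j < M; ++j)
    ++Count;
  return Count;
}

// Closed form for the same count. The max keeps the result at zero when
// the start value i is already at or beyond the bound M.
int countClosedForm(int i, int M) {
  return std::max(i, M) - i;
}
```

With a start value that is itself an induction variable of the parent loop, expressions of this shape end up nested inside add and add-recurrence expressions, which is presumably what makes matching them by hand so unwieldy.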


https://reviews.llvm.org/D79001

Files:
  llvm/lib/Target/ARM/MVETailPredication.cpp
  llvm/test/CodeGen/Thumb2/LowOverheadLoops/nested-loop.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D79001.260612.patch
Type: text/x-patch
Size: 39239 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200428/0b056f51/attachment.bin>

