[PATCH] D86147: [LangRef] WIP: Revise semantics of get.active.lane.mask

Tue Aug 18 09:26:50 PDT 2020

SjoerdMeijer created this revision.
SjoerdMeijer added reviewers: efriedma, simoll, vkmr, rogfer01, samparker, fhahn, rkruppe.
Herald added subscribers: javed.absar, kristof.beyls.
Herald added a reviewer: jdoerfert.
Herald added a project: LLVM.
SjoerdMeijer requested review of this revision.

A first version of get.active.lane.mask was committed in rG7fb8a40e5220 <https://reviews.llvm.org/rG7fb8a40e5220d6d4efa14c15f92b6f28ba1b18f7>.  One of the main purposes of this intrinsic is to communicate information to the back-end, but its current definition and semantics make this very difficult in practice. The intrinsic is defined as:

  @llvm.get.active.lane.mask(%IV, %BTC)

where %BTC is the Backedge-Taken Count (variable names are different in the LangRef spec). This implicitly communicates the loop tripcount, which can be reconstructed by calculating BTC + 1. But it has been very difficult to prove that calculating BTC + 1 is safe and doesn't overflow. We need complicated range and SCEV analysis, so the intrinsic isn't really solving the problem it was meant to solve. Examples of the overflow checks that are required in the (ARM) back-end are D79175 <https://reviews.llvm.org/D79175> and D86074 <https://reviews.llvm.org/D86074>, which aren't even complete/correct yet.
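
To make the overflow concern concrete, here is a minimal sketch that models BTC + 1 as a wrapping machine-integer addition. The 32-bit width and the helper names are assumptions chosen for illustration; they do not come from the patch or from the ARM back-end.

```python
# Illustrative model only: BITS and the function names are assumptions,
# not anything defined by LLVM or by this patch.
BITS = 32
MAX_UNSIGNED = (1 << BITS) - 1


def trip_count_from_btc(btc: int) -> int:
    """Reconstruct the trip count as BTC + 1 in BITS-wide wrapping arithmetic."""
    return (btc + 1) & MAX_UNSIGNED


def reconstruction_wraps(btc: int) -> bool:
    """True if BTC + 1 is not representable in BITS bits, i.e. it wraps to 0."""
    return btc == MAX_UNSIGNED


if __name__ == "__main__":
    for btc in (7, MAX_UNSIGNED):
        print(btc, trip_count_from_btc(btc), reconstruction_wraps(btc))
```

In this model only the single value BTC = 2^32 - 1 wraps, but a back-end that sees BTC as an opaque runtime value has to prove that this case cannot happen before it can use BTC + 1.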

To solve this problem, I am looking at alternative definitions/semantics for get.active.lane.mask to avoid all the complicated overflow analysis.

One obvious alternative is not to communicate the BTC but the loop tripcount instead. Now using LangRef's variable names, this means changing the current semantics from:

  icmp ule (%base + i), %n

to:

  icmp ule (%base + i), %n - 1

where %n > 0 and %n corresponds to the loop tripcount. The intrinsic signature remains the same.
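
As a rough sketch of the two definitions side by side, the following models each lane comparison with Python's unbounded integers, mirroring the LangRef wording that the comparison is performed in integer numbers and not in machine numbers. The function names and the explicit vector-length parameter `vf` are assumptions made for illustration.

```python
# Illustrative model only: function names and the `vf` parameter are
# assumptions, not part of the intrinsic's signature.
from typing import List


def mask_current(base: int, btc: int, vf: int) -> List[bool]:
    """Current semantics: m[i] = icmp ule (base + i), btc."""
    return [base + i <= btc for i in range(vf)]


def mask_proposed(base: int, n: int, vf: int) -> List[bool]:
    """Proposed semantics: m[i] = icmp ule (base + i), n - 1, with n > 0."""
    if n <= 0:
        raise ValueError("the proposed semantics require n > 0")
    return [base + i <= n - 1 for i in range(vf)]


if __name__ == "__main__":
    # A loop with tripcount 10 (BTC 9), vector width 4, last vector iteration.
    print(mask_current(8, 9, 4))
    print(mask_proposed(8, 10, 4))
```

With the tripcount passed directly, both forms produce the same mask whenever n = BTC + 1, but the consumer no longer has to compute BTC + 1 itself.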

I have marked this as Work-In-Progress (WIP) because I am looking for early feedback while I prototype these new semantics, plumb them through the middle-end and back-end, and make sure I haven't missed anything.


https://reviews.llvm.org/D86147

Files:
  llvm/docs/LangRef.rst


Index: llvm/docs/LangRef.rst
===================================================================

--- llvm/docs/LangRef.rst
+++ llvm/docs/LangRef.rst
@@ -16924,14 +16924,15 @@
 
 ::
 
-      %m[i] = icmp ule (%base + i), %n
+      %m[i] = icmp ule (%base + i), %n - 1
 
 where ``%m`` is a vector (mask) of active/inactive lanes with its elements
 indexed by ``i``,  and ``%base``, ``%n`` are the two arguments to
 ``llvm.get.active.lane.mask.*``, ``%imcp`` is an integer compare and ``ule``
 the unsigned less-than-equal comparison operator.  Overflow cannot occur in
-``(%base + i)`` and its comparison against ``%n`` as it is performed in integer
-numbers and not in machine numbers.  The above is equivalent to:
+``(%base + i)`` and its comparison against ``%n`` with ``%n > 0``, as it is
+performed in integer numbers and not in machine numbers. The above is
+equivalent to:
 
 ::
 
@@ -16939,13 +16940,13 @@
 
 This can, for example, be emitted by the loop vectorizer. Then, ``%base`` is
 the first element of the vector induction variable (VIV), and ``%n`` is the
-Back-edge Taken Count (BTC). Thus, these intrinsics perform an element-wise
-less than or equal comparison of VIV with BTC, producing a mask of true/false
-values representing active/inactive vector lanes, except if the VIV overflows
-in which case they return false in the lanes where the VIV overflows.  The
-arguments are scalar types to accommodate scalable vector types, for which it is
-unknown what the type of the step vector needs to be that enumerate its
-lanes without overflow.
+loop tripcount. Since ``%n - 1`` corresponds to the Back-edge Taken Count
+(BTC), these intrinsics perform an element-wise less than or equal comparison
+of VIV with BTC, producing a mask of true/false values representing
+active/inactive vector lanes, except if the VIV overflows in which case they
+return false in the lanes where the VIV overflows.  The arguments are scalar
+types to accommodate scalable vector types, for which it is unknown what the
+type of the step vector needs to be that enumerate its lanes without overflow.
 
 This mask ``%m`` can e.g. be used in masked load/store instructions. These
 intrinsics provide a hint to the backend. I.e., for a vector loop, the

