[llvm-dev] [RFC] Adding Intrinsics for Masked Vector Integer Division and Remainder
Cohen, Elad2 via llvm-dev
llvm-dev at lists.llvm.org
Mon Oct 16 22:22:13 PDT 2017
We would like to add support for masked vector signed/unsigned integer division and remainder in the LLVM IR by introducing new target-independent intrinsics.
This follows similar work already done for masked vector loads and stores: http://lists.llvm.org/pipermail/llvm-dev/2014-October/078059.html.
Another relevant reference is the masked scatter/gather intrinsics discussion - http://lists.llvm.org/pipermail/llvm-dev/2014-December/079843.html.
Currently, when the loop-vectorizer decides to vectorize a loop that contains a predicated integer division, it vectorizes the loop body and scalarizes the predicated division instruction into a sequence of branches that guard scalar division operations. In some cases the generated code is not very efficient. Speculating the divides with a non-masked vector sdiv instruction is usually not an option, because a lane that the original loop would have skipped may hold a zero divisor.
With the proposed intrinsics, the loop-vectorizer could generate the masked intrinsics and concentrate on the vector semantics rather than on how to lower them.
Initially the intrinsics will be scalarized for all targets. This could be done by extending scalarize-masked-mem-intrin to also handle the masked division intrinsics. Later the intrinsics could be optimized by:
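As an illustration, a loop of the following shape is the kind of input this concerns. This is only a sketch; the function name cond_div and its parameters are made up for the example and do not come from any existing test case.

```c
#include <stddef.h>

/* Hypothetical example loop: the division executes only when the guard holds. */
void cond_div(int *out, const int *a, const int *b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        /* After vectorization this predicate would become the lane mask. */
        if (b[i] != 0)
            out[i] = a[i] / b[i];
    }
}
```

An unmasked vector divide over a[i..i+15] and b[i..i+15] would also divide in the lanes where b[i] is zero, which is why the division cannot simply be speculated.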
1. Lowering of the intrinsics in the backend using different expansions (for example converting to floating point and using masked vector floating-point division instructions).
2. Linking the intrinsics to different vector math library implementations.
3. Scalarizing the intrinsics at the backend possibly using target-specific considerations.
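To make option 1 more concrete, here is a rough scalar model in C of a floating-point expansion for 32-bit lanes. It is a sketch under assumptions, not a proposed lowering: the name masked_sdiv_via_fp is hypothetical, it assumes i32 elements (every i32 converts exactly to double, and the truncated double quotient then matches the integer quotient), and it does not handle the INT32_MIN / -1 overflow case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a floating-point based expansion (i32 lanes only). */
void masked_sdiv_via_fp(int32_t *res, const int32_t *a, const int32_t *b,
                        const bool *mask, const int32_t *passthru,
                        size_t lanes) {
    for (size_t i = 0; i < lanes; ++i) {
        if (mask[i]) {
            /* Divide in double precision; converting back to an integer
               truncates toward zero, as sdiv does. */
            double q = (double)a[i] / (double)b[i];
            res[i] = (int32_t)q;
        } else {
            /* Masked-off lane: no division is performed at all. */
            res[i] = passthru[i];
        }
    }
}
```

A real backend would use masked vector conversions and a masked vector floating-point divide instead of a per-lane branch; the loop above only models the per-lane result.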
Proposed Definition (The following example is for masked signed division. The rest are similar)
An overloaded intrinsic. You can use llvm.masked.sdiv on any vector with integer elements.
declare <16 x i32> @llvm.masked.sdiv.v16i32(<16 x i32> <a>, <16 x i32> <b>, <16 x i1> <mask>, <16 x i32> <passthru>)
Returns the quotient of its two operands per vector lane according to the provided mask. The mask holds a bit for each vector lane, and is used to prevent division in the masked-off lanes. The masked-off lanes in the result vector are taken from the corresponding lanes of the passthru operand.
The first two arguments must be vectors of integer values. Both arguments must have identical types. The third operand, mask, is a vector of boolean values with the same number of elements as the first two. The fourth is a pass-through value that is used to fill the masked-off lanes of the result. The type of the passthru operand is the same as the first two.
The 'llvm.masked.sdiv' intrinsic is designed for conditional integer division of selected vector elements in a single IR operation. The result of this operation is equivalent to a regular vector 'sdiv' instruction followed by a 'select' between the computed quotient and the passthru values, predicated on the same mask. However, using this intrinsic prevents divide-by-zero exceptions on division of masked-off lanes. If any element in a turned-on lane of the divisor is zero, the operation has undefined behavior.
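The semantics described above can be modeled per lane in scalar C. This is only an illustrative reference model of the <16 x i32> variant; the name masked_sdiv_v16i32_ref and the use of bool for the i1 mask elements are assumptions made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 16

/* Hypothetical reference model of llvm.masked.sdiv.v16i32. */
void masked_sdiv_v16i32_ref(int32_t res[LANES], const int32_t a[LANES],
                            const int32_t b[LANES], const bool mask[LANES],
                            const int32_t passthru[LANES]) {
    for (int i = 0; i < LANES; ++i) {
        /* The division is evaluated only in turned-on lanes, so a zero
           divisor in a masked-off lane is harmless. A zero divisor in a
           turned-on lane is undefined behavior, here as in the proposal. */
        res[i] = mask[i] ? a[i] / b[i] : passthru[i];
    }
}
```

Calling this with a zero divisor in every masked-off lane yields the passthru values in those lanes and ordinary quotients elsewhere, whereas an unmasked divide followed by a select would already have divided by zero before the select.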
Feedback and comments are welcome!
Intel Israel (74) Limited