[llvm-dev] [RFC] Adding Intrinsics for Masked Vector Integer Division and Remainder

Fri Oct 20 13:44:34 PDT 2017

Adding Sander and Florian, as this will certainly apply to SVE.

cheers,
--renato

On 17 October 2017 at 18:58, Friedman, Eli via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> On 10/16/2017 10:22 PM, Cohen, Elad2 via llvm-dev wrote:
>
> Introduction
>
> ==========
>
>
>
> We would like to add support for masked vector signed/unsigned integer
> division and remainder in the LLVM IR by introducing new target-independent
> intrinsics.
>
>
>
> This follows similar work which was done already for masked vector loads and
> stores - http://lists.llvm.org/pipermail/llvm-dev/2014-October/078059.html.
>
> Another relevant reference is the masked scatter/gather intrinsics
> discussion -
> http://lists.llvm.org/pipermail/llvm-dev/2014-December/079843.html.
>
>
>
>
>
> Motivation
>
> =========
>
>
>
> Currently, if the loop vectorizer decides to vectorize a loop that contains
> a predicated integer division, it vectorizes the loop body and scalarizes
> the predicated division instruction into a sequence of branches that guard
> scalar division operations. In some cases the generated code is
> inefficient. Speculating the divides with a non-masked vector sdiv
> instruction is usually not an option because of the danger of integer
> divide-by-zero.
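
To make the cost concrete, here is a rough C++ sketch of the situation described above: a loop with a guarded division, and the approximate shape of the code after vectorizing and then scalarizing the predicated division into per-lane branches. The function names and the vectorization factor VF = 4 are assumptions chosen for illustration; this is not actual loop-vectorizer output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Source loop: the division only executes when the guard holds.
void guarded_div(const std::vector<int32_t>& a, const std::vector<int32_t>& b,
                 std::vector<int32_t>& out) {
    assert(a.size() == b.size() && a.size() == out.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        if (b[i] != 0)
            out[i] = a[i] / b[i];
}

// Hypothetical vectorization factor, picked only for this sketch.
constexpr std::size_t VF = 4;

// Approximate shape after vectorization: the compare is done for a whole
// group of VF lanes, but the division is scalarized into one guarded
// scalar divide per lane.
void guarded_div_scalarized(const std::vector<int32_t>& a,
                            const std::vector<int32_t>& b,
                            std::vector<int32_t>& out) {
    assert(a.size() == b.size() && a.size() == out.size());
    const std::size_t n = a.size();
    std::size_t i = 0;
    for (; i + VF <= n; i += VF) {
        bool mask[VF];
        for (std::size_t lane = 0; lane < VF; ++lane)
            mask[lane] = (b[i + lane] != 0);  // stands in for a vector compare
        for (std::size_t lane = 0; lane < VF; ++lane)
            if (mask[lane])                   // one branch per lane
                out[i + lane] = a[i + lane] / b[i + lane];
    }
    // Scalar loop for the leftover iterations.
    for (; i < n; ++i)
        if (b[i] != 0)
            out[i] = a[i] / b[i];
}
```

Both functions produce the same result; the second one shows why the branch-per-lane form can be expensive compared with a single masked operation.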
>
>
>
> With the proposed intrinsics, the loop vectorizer could concentrate on the
> vector semantics rather than on how to lower them, by generating the masked
> intrinsics.
>
> Initially the intrinsics will be scalarized for all targets. This could be
> done by extending scalarize-masked-mem-intrin to also handle the masked
> division intrinsics. Later the intrinsics could be optimized by:
>
> 1. Lowering of the intrinsics in the backend using different expansions
> (for example converting to floating point and using masked vector
> floating-point division instructions).
>
> 2. Linking the intrinsics to different vector math library implementations.
>
> 3. Scalarizing the intrinsics at the backend possibly using
> target-specific considerations.
>
>
>
>
>
> Proposed Definition (The following example is for masked signed division.
> The rest are similar)
>
> ========================================================================
>
>
>
>      ‘llvm.masked.sdiv’
>
>
>
>      Syntax:
>
>
>
>            An overloaded intrinsic. You can use llvm.masked.sdiv on any
> vector with integer elements.
>
>
>
>            declare <16 x i32> @llvm.masked.sdiv.v16i32(<16 x i32> <a>, <16 x i32> <b>, <16 x i1> <mask>, <16 x i32> <passthru>)
>
>
>
>      Overview:
>
>
>
>            Returns the quotient of its two operands per vector lane
> according to the provided mask. The mask holds a bit for each vector lane,
> and is used to prevent division in the masked-off lanes. The masked-off
> lanes in the result vector are taken from the corresponding lanes of the
> passthru operand.
>
>
>
>      Arguments:
>
>
>
>            The first two arguments must be vectors of integer values. Both
> arguments must have identical types. The third operand, mask, is a vector of
> boolean values with the same number of elements as the first two. The fourth
> is a pass-through value that is used to fill the masked-off lanes of the
> result. The type of the passthru operand is the same as the first two.
>
>
>
>      Semantics:
>
>
>
>            The ‘llvm.masked.sdiv’ intrinsic is designed for conditional
> integer division of selected vector elements in a single IR operation. The
> result of this operation is equivalent to a regular vector 'sdiv'
> instruction followed by a ‘select’ between the computed quotient and the
> passthru values, predicated on the same mask. However, using this intrinsic
> prevents divide-by-zero exceptions on division of masked-off lanes. If the
> divisor element in any enabled lane is zero, the operation has undefined
> behavior.
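
As a reading aid, the semantics above can be sketched as a scalar reference model in C++. This is only an illustration of the proposed definition, not an implementation from the RFC: the name masked_sdiv_ref is hypothetical, lanes are modeled with std::array, and the undefined zero-divisor case is modeled with an assert.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-lane reference model of the proposed llvm.masked.sdiv.
template <std::size_t N>
std::array<int32_t, N> masked_sdiv_ref(const std::array<int32_t, N>& a,
                                       const std::array<int32_t, N>& b,
                                       const std::array<bool, N>& mask,
                                       const std::array<int32_t, N>& passthru) {
    std::array<int32_t, N> result{};
    for (std::size_t lane = 0; lane < N; ++lane) {
        if (mask[lane]) {
            // The proposal leaves a zero divisor in an enabled lane
            // undefined; the assert is this model's choice, not the RFC's.
            // INT_MIN / -1 is not handled here and would also be undefined
            // in C++.
            assert(b[lane] != 0 && "zero divisor in an enabled lane");
            result[lane] = a[lane] / b[lane];
        } else {
            // Masked-off lane: no division happens, take passthru.
            result[lane] = passthru[lane];
        }
    }
    return result;
}
```

Note that the divisor in a masked-off lane is never inspected, so a zero there is harmless. That is the difference from an unmasked 'sdiv' followed by a 'select'.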
>
>
> You probably want to mention INT_MIN/-1 overflow here?
>
> ----
>
> The alternative here is to refine the definition of "sdiv" in LangRef; other
> arithmetic operations in LLVM IR don't have undefined behavior, and the
> primary reason "sdiv" has undefined behavior is the unfortunate behavior of
> the x86 "IDIV" instruction.  For example, we could add a "nooverflow" bit to
> "sdiv", and say that divide-by-zero has undefined behavior if the
> "nooverflow" bit is present, and produces poison otherwise.
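
The two problematic operand pairs mentioned here (a zero divisor, and INT_MIN divided by -1) can be illustrated with a small C++ sketch. It models a division that yields a "poison" marker instead of trapping. The names SDivResult and sdiv_or_poison are hypothetical, and treating the overflow case the same way as divide-by-zero is an assumption of this sketch, not something the suggestion above spells out.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical result type: `poison` stands in for an IR poison value.
struct SDivResult {
    bool poison;
    int32_t value;  // meaningful only when poison is false
};

// Signed division that reports the two problematic cases instead of
// executing them (both would be undefined behavior in C++ as well).
SDivResult sdiv_or_poison(int32_t a, int32_t b) {
    const bool div_by_zero = (b == 0);
    // INT_MIN / -1 would be 2^31, which does not fit in a 32-bit signed int.
    const bool overflow =
        (a == std::numeric_limits<int32_t>::min() && b == -1);
    if (div_by_zero || overflow)
        return {true, 0};
    return {false, a / b};
}
```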
>
> -Eli
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
> Foundation Collaborative Project