[llvm-dev] RFC: New intrinsics masked.expandload and masked.compressstore

Mon Sep 26 00:31:41 PDT 2016

In theory, we could offload several things to such a target plug-in; I'm
just not entirely sure we want to.

Two examples I can think of:

1) This could be a better interface for masked loads/stores and gathers.

2) Horizontal reductions. I tried writing
yet-another-horizontals-as-first-class-citizens proposal a couple of months
ago, and the main problem from the previous discussions about this was that
there's no good common representation. E.g. should a horizontal add return
a vector or a scalar, should it return the base type of the vector (assumes
saturation) or a wider integer type, etc. With a plugin, we could have the
vectorizer emit the right target intrinsic, instead of the crazy backend
pattern-matching we have now.
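
To make the representation question concrete, here is a small C++ sketch of
three plausible meanings of a horizontal add over a 4 x i8 vector. It is only
an illustration: the names hadd_saturating, hadd_widening and hadd_vector are
hypothetical, clamping after every addition is just one possible definition of
saturation, and putting the result in lane 0 is just one possible vector
convention.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

using V4i8 = std::array<std::int8_t, 4>;

// Variant 1: scalar result of the element type. The sum can overflow i8, so
// this sketch clamps to [-128, 127] after every addition (an assumption).
std::int8_t hadd_saturating(const V4i8 &v) {
  int acc = 0;
  for (std::int8_t x : v)
    acc = std::max(-128, std::min(127, acc + x));
  return static_cast<std::int8_t>(acc);
}

// Variant 2: scalar result of a wider type, so no overflow policy is needed.
std::int32_t hadd_widening(const V4i8 &v) {
  std::int32_t acc = 0;
  for (std::int8_t x : v)
    acc += x;
  return acc;
}

// Variant 3: vector result. Here the sum lands in lane 0 and the remaining
// lanes are zero (an assumed convention for illustration only).
std::array<std::int32_t, 4> hadd_vector(const V4i8 &v) {
  return {hadd_widening(v), 0, 0, 0};
}
```

The three functions disagree on the same input as soon as the sum leaves the
i8 range, which is why a single target-independent definition is hard to pick.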

On Sun, Sep 25, 2016 at 9:28 PM, Demikhovsky, Elena <
elena.demikhovsky at intel.com> wrote:

>
>   |
>   |Hi Elena,
>   |
>   |Technically speaking, this seems straightforward.
>   |
>   |I wonder, however, how target-independent this is in a practical
>   |sense; will there be an efficient lowering when targeting any other
>   |ISA? I don't want to get into the territory where, because the
>   |vectorizer is supposed to be architecture independent, we need to
>   |add target-independent intrinsics for all potentially-side-effect-
>   |carrying idioms (or just complicated idioms) we want the vectorizer to
>   |support on any target. Is there a way we can design the vectorizer so
>   |that the targets can plug in their own idiom recognition for these
>   |kinds of things, and then, via that interface, let the vectorizer
>   |produce the relevant target-dependent intrinsics?
>
> Introducing a target-specific plug-in into the vectorizer may be a good
> idea. We need target-specific pattern recognition and a target-specific
> implementation of “vectorizeMemoryInstruction”. (More functionality may be
> needed in the future.)
> TTI->checkAdditionalVectorizationOpportunities() - detects target-specific
> patterns; X86 will find compress/expand and maybe others
> TTI->vectorizeMemoryInstruction() - handles only exotic target-specific
> cases
>
> Pros:
> It will allow us to implement all X86-specific solutions.
> The expandload and compressstore intrinsics may be x86-specific and
> polymorphic:
> llvm.x86.masked.expandload()
> llvm.x86.masked.compressstore()
>
> Cons:
>
> TTI will need to deal with Loop Info, SCEVs and other loop analysis info
> that it does not have today. (I do not like this approach.)
> Or we'll need to introduce a TLV - Target Loop Vectorizer - a new class that
> handles all target-specific cases. This solution seems more reasonable, but
> it is too heavy just for compress/expand.
> Do you see any other target plug-in solution?
>
> -Elena
>
>   |
>   |Thanks again,
>   |Hal
>   |
>   |----- Original Message -----
>   |> From: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
>   |> To: "llvm-dev" <llvm-dev at lists.llvm.org>
>   |> Cc: "Ayal Zaks" <ayal.zaks at intel.com>, "Michael Kuperstein"
>   |<mkuper at google.com>, "Adam Nemet (anemet at apple.com)"
>   |> <anemet at apple.com>, "Hal Finkel (hfinkel at anl.gov)"
>   |<hfinkel at anl.gov>, "Sanjay Patel (spatel at rotateright.com)"
>   |> <spatel at rotateright.com>, "Nadav Rotem"
>   |<nadav.rotem at me.com>
>   |> Sent: Monday, September 19, 2016 1:37:02 AM
>   |> Subject: RFC: New intrinsics masked.expandload and
>   |> masked.compressstore
>   |>
>   |>
>   |> Hi all,
>   |>
>   |> The AVX-512 ISA introduces new vector instructions VCOMPRESS and VEXPAND
>   |> in order to allow vectorization of the following loops with two
>   |> specific types of cross-iteration dependencies:
>   |>
>   |> Compress:
>   |> for (int i = 0; i < N; ++i)
>   |>   if (t[i])
>   |>     *A++ = expr;
>   |>
>   |> Expand:
>   |> for (int i = 0; i < N; ++i)
>   |>   if (t[i])
>   |>     X[i] = *A++;
>   |>   else
>   |>     X[i] = PassThruV[i];
>   |>
>   |> On this poster (
>   |> http://llvm.org/devmtg/2013-11/slides/Demikhovsky-Poster.pdf )
>   |> you’ll find the “compress” and “expand” patterns depicted.
>   |>
>   |> The RFC proposes to support this functionality by introducing two
>   |> intrinsics to LLVM IR:
>   |> llvm.masked.expandload.*
>   |> llvm.masked.compressstore.*
>   |>
>   |> The syntax of these two intrinsics is similar to the syntax of
>   |> llvm.masked.load.* and masked.store.*, respectively, but the
>   |> semantics are different, matching the above patterns.
>   |>
>   |> %res = call <16 x float> @llvm.masked.expandload.v16f32.p0f32(
>   |>     float* %ptr, <16 x i1> %mask, <16 x float> %passthru)
>   |> call void @llvm.masked.compressstore.v16f32.p0f32(
>   |>     <16 x float> %value, float* %ptr, <16 x i1> %mask)
>   |>
>   |> The arguments %mask, %value and %passthru all have the same vector
>   |> length.
>   |> The underlying type of %ptr corresponds to the scalar type of the
>   |> vector value.
>   |> (In brief; the full syntax description will be provided in subsequent
>   |> full documentation.)
>   |>
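
A scalar reference model may help pin down the semantics described above. The
C++ below is only a sketch inferred from the compress/expand loop patterns in
the RFC, not the eventual LangRef wording. The function names mirror the
intrinsics, but the std::vector-based signatures are an assumption, and
returning the number of stored elements from compressstore is a convenience of
this sketch (the intrinsic itself returns void).

```cpp
#include <cstddef>
#include <vector>

// Model of llvm.masked.expandload: consecutive elements are read from ptr and
// placed into the enabled lanes; disabled lanes keep the passthru value.
// Assumes mask and passthru have the same length.
std::vector<float> expandload(const float *ptr, const std::vector<bool> &mask,
                              const std::vector<float> &passthru) {
  std::vector<float> res(passthru);
  for (std::size_t lane = 0; lane < mask.size(); ++lane)
    if (mask[lane])
      res[lane] = *ptr++; // memory is consumed only by enabled lanes
  return res;
}

// Model of llvm.masked.compressstore: the enabled lanes of value are written
// to consecutive memory locations starting at ptr.
// Returns how many elements were written (sketch-only convenience).
std::size_t compressstore(const std::vector<float> &value, float *ptr,
                          const std::vector<bool> &mask) {
  std::size_t stored = 0;
  for (std::size_t lane = 0; lane < mask.size(); ++lane)
    if (mask[lane])
      ptr[stored++] = value[lane];
  return stored;
}
```

Note that, unlike masked.load and masked.store, the memory address touched by
a lane depends on how many earlier lanes are enabled, not on the lane index.
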
>   |> The intrinsics are planned to be target independent, similar to
>   |> masked.load/store/gather/scatter. They will be lowered efficiently on
>   |> AVX-512 and scalarized on other targets, also akin to masked.*
>   |> intrinsics.
>   |> The loop vectorizer will query TTI about the existence of efficient
>   |> support for these intrinsics and, if it is provided, will be able to
>   |> handle loops with such cross-iteration dependences.
>   |>
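
As a rough illustration of how such a cross-iteration dependence can be
handled, the C++ sketch below rewrites the "compress" loop in blocks of VF
iterations: each block stores its enabled lanes contiguously and then advances
the output cursor by the number of set mask bits. This is a scalar simulation
under my own assumptions (VF = 4, src[i] standing in for expr, the names
compress_scalar and compress_blocked), not the vectorizer's actual output.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4; // assumed vectorization factor

// Scalar form of the loop: if (t[i]) *A++ = expr; with expr == src[i].
// Assumes t has at least as many elements as src.
std::vector<int> compress_scalar(const std::vector<int> &src,
                                 const std::vector<bool> &t) {
  std::vector<int> out;
  for (std::size_t i = 0; i < src.size(); ++i)
    if (t[i])
      out.push_back(src[i]);
  return out;
}

// Blocked form: the inner loop plays the role of one compressing store, and
// the cursor update carries the dependence from one block to the next.
std::vector<int> compress_blocked(const std::vector<int> &src,
                                  const std::vector<bool> &t) {
  std::vector<int> out(src.size());
  std::size_t cursor = 0;
  for (std::size_t base = 0; base < src.size(); base += VF) {
    std::size_t stored = 0;
    for (std::size_t lane = 0; lane < VF && base + lane < src.size(); ++lane)
      if (t[base + lane])
        out[cursor + stored++] = src[base + lane];
    cursor += stored; // advance by the population count of the block's mask
  }
  out.resize(cursor);
  return out;
}
```

Both functions produce the same output; the only value that flows between
blocks is the cursor, which is what makes the pattern vectorizable.
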
>   |> The first step will include the full documentation and the
>   |> implementation of the CodeGen part.
>   |>
>   |> Additional information about expand load (
>   |> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=expandload&techs=AVX_512
>   |> ) and compress store (
>   |> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=compressstore&techs=AVX_512
>   |> ) can also be found in the Intel Intrinsics Guide.
>   |>
>   |>
>   |> - Elena
>   |>
>   |> ---------------------------------------------------------------------
>   |> Intel Israel (74) Limited
>   |>
>   |> This e-mail and any attachments may contain confidential material
>   |for
>   |> the sole use of the intended recipient(s). Any review or distribution
>   |> by others is strictly prohibited. If you are not the intended
>   |> recipient, please contact the sender and delete all copies.
>   |
>   |--
>   |Hal Finkel
>   |Lead, Compiler Technology and Programming Languages Leadership
>   |Computing Facility Argonne National Laboratory