[llvm-dev] RFC: New intrinsics masked.expandload and masked.compressstore
Michael Kuperstein via llvm-dev
llvm-dev at lists.llvm.org
Mon Sep 26 00:31:41 PDT 2016
In theory, we could offload several things to such a target plug-in; I'm
just not entirely sure we want to.
Two examples I can think of:
1) This could be a better interface for masked load/stores and gathers.
2) Horizontal reductions. I tried writing a
yet-another-horizontals-as-first-class-citizens proposal a couple of months
ago, and the main problem raised in the previous discussions was that
there's no good common representation. E.g., should a horizontal add return
a vector or a scalar? Should it return the base type of the vector (which
assumes saturation) or a wider integer type? And so on. With a plug-in, we
could have the vectorizer emit the right target intrinsic, instead of the
crazy backend pattern-matching we have now.
On Sun, Sep 25, 2016 at 9:28 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
>
> |
> |Hi Elena,
> |
> |Technically speaking, this seems straightforward.
> |
> |I wonder, however, how target-independent this is in a practical
> |sense; will there be an efficient lowering when targeting any other
> |ISA? I don't want to get into the territory where, because the
> |vectorizer is supposed to be architecture independent, we need to
> |add target-independent intrinsics for all potentially-side-effect-
> |carrying idioms (or just complicated idioms) we want the vectorizer to
> |support on any target. Is there a way we can design the vectorizer so
> |that the targets can plug in their own idiom recognition for these
> |kinds of things, and then, via that interface, let the vectorizer
> |produce the relevant target-dependent intrinsics?
>
> Introducing a target-specific plug-in into the vectorizer may be a good
> idea. We need target-specific pattern recognition and a target-specific
> implementation of “vectorizeMemoryInstruction” (more functionality may
> follow in the future):
> TTI->checkAdditionalVectorizationOpportunities() - detects target-specific
> patterns; X86 will find compress/expand and possibly others
> TTI->vectorizeMemoryInstruction() - handles only exotic target-specific
> cases
>
> Pros:
> It will allow us to implement all X86-specific solutions.
> The expandload and compressstore intrinsics may be X86-specific and
> polymorphic:
> llvm.x86.masked.expandload()
> llvm.x86.masked.compressstore()
>
> Cons:
>
> TTI will need to deal with Loop Info, SCEVs and other loop analysis
> information that it does not have today. (I do not like this option.)
> Alternatively, we'll need to introduce a TLV (Target Loop Vectorizer), a
> new class that handles all target-specific cases. This solution seems more
> reasonable, but too heavy just for compress/expand.
> Do you see any other target plug-in solution?
>
> -Elena
>
> |
> |Thanks again,
> |Hal
> |
> |----- Original Message -----
> |> From: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> |> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> |> Cc: "Ayal Zaks" <ayal.zaks at intel.com>, "Michael Kuperstein" <mkuper at google.com>,
> |>     "Adam Nemet (anemet at apple.com)" <anemet at apple.com>,
> |>     "Hal Finkel (hfinkel at anl.gov)" <hfinkel at anl.gov>,
> |>     "Sanjay Patel (spatel at rotateright.com)" <spatel at rotateright.com>,
> |>     "Nadav Rotem" <nadav.rotem at me.com>
> |> Sent: Monday, September 19, 2016 1:37:02 AM
> |> Subject: RFC: New intrinsics masked.expandload and masked.compressstore
> |>
> |>
> |> Hi all,
> |>
> |> The AVX-512 ISA introduces new vector instructions VCOMPRESS and
> |> VEXPAND in order to allow vectorization of the following loops with
> |> two specific types of cross-iteration dependencies:
> |>
> |> Compress:
> |> for (int i = 0; i < N; ++i)
> |>   if (t[i])
> |>     *A++ = expr;
> |>
> |> Expand:
> |> for (int i = 0; i < N; ++i)
> |>   if (t[i])
> |>     X[i] = *A++;
> |>   else
> |>     X[i] = PassThruV[i];
> |>
> |> The “compress” and “expand” patterns are depicted on this poster:
> |> http://llvm.org/devmtg/2013-11/slides/Demikhovsky-Poster.pdf
> |>
> |> The RFC proposes to support this functionality by introducing two
> |> intrinsics to LLVM IR:
> |> llvm.masked.expandload.*
> |> llvm.masked.compressstore.*
> |>
> |> The syntax of these two intrinsics is similar to the syntax of
> |> llvm.masked.load.* and llvm.masked.store.*, respectively, but the
> |> semantics are different, matching the above patterns.
> |>
> |> %res = call <16 x float> @llvm.masked.expandload.v16f32.p0f32(float* %ptr, <16 x i1> %mask, <16 x float> %passthru)
> |> call void @llvm.masked.compressstore.v16f32.p0f32(<16 x float> <value>, float* <ptr>, <16 x i1> <mask>)
> |>
> |> The arguments %mask, %value and %passthru all have the same vector
> |> length.
> |> The underlying type of %ptr corresponds to the scalar type of the
> |> vector value.
> |> (This is only a brief summary; the full syntax description will be
> |> provided in the documentation.)
> |>
> |> The intrinsics are planned to be target independent, similar to
> |> masked.load/store/gather/scatter. They will be lowered efficiently on
> |> AVX-512 and scalarized on other targets, also akin to the masked.*
> |> intrinsics.
> |> The loop vectorizer will query TTI about whether efficient support
> |> exists for these intrinsics and, if it does, will be able to handle
> |> loops with such cross-iteration dependences.
> |>
> |> The first step will include the full documentation and the
> |> implementation of the CodeGen part.
> |>
> |> Additional information about expand load
> |> (https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=expandload&techs=AVX_512)
> |> and compress store
> |> (https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=compressstore&techs=AVX_512)
> |> can be found in the Intel Intrinsics Guide.
> |>
> |>
> |> * Elena
> |>
> |> ---------------------------------------------------------------------
> |> Intel Israel (74) Limited
> |>
> |> This e-mail and any attachments may contain confidential material
> |for
> |> the sole use of the intended recipient(s). Any review or distribution
> |> by others is strictly prohibited. If you are not the intended
> |> recipient, please contact the sender and delete all copies.
> |
> |--
> |Hal Finkel
> |Lead, Compiler Technology and Programming Languages
> |Leadership Computing Facility
> |Argonne National Laboratory
>