[llvm-dev] RFC: New intrinsics masked.expandload and masked.compressstore
Michael Kuperstein via llvm-dev
llvm-dev at lists.llvm.org
Mon Sep 26 00:31:41 PDT 2016
In theory, we could offload several things to such a target plug-in, I'm
just not entirely sure we want to.
Two examples I can think of:
1) This could be a better interface for masked load/stores and gathers.
2) Horizontal reductions. I tried writing
yet-another-horizontals-as-first-class-citizens proposal a couple of months
ago, and the main problem from the previous discussions about this was that
there's no good common representation. E.g. should a horizontal add return
a vector or a scalar, should it return the base type of the vector (assumes
saturation) or a wider integer type, etc. With a plugin, we could have the
vectorizer emit the right target intrinsic, instead of the crazy backend
pattern-matching we have now.
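
To make the representation question concrete, here is a minimal C sketch of two of the candidate semantics for a horizontal add over a vector of 8-bit lanes. The function names hadd_saturating and hadd_widening are hypothetical, and reading "assumes saturation" as clamping at the element type's maximum is an assumption made for the example, not something settled in the earlier discussions.

```c
#include <stddef.h>
#include <stdint.h>

/* Variant A (hypothetical): the result keeps the element type of the
   vector, so the sum is clamped at UINT8_MAX instead of overflowing. */
uint8_t hadd_saturating(const uint8_t *v, size_t lanes)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < lanes; ++i) {
        unsigned room = (unsigned)UINT8_MAX - sum; /* headroom before clamping */
        sum = (v[i] > room) ? UINT8_MAX : (uint8_t)(sum + v[i]);
    }
    return sum;
}

/* Variant B (hypothetical): the result is a wider integer type, so the
   exact sum is representable for any realistic vector length. */
uint32_t hadd_widening(const uint8_t *v, size_t lanes)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < lanes; ++i)
        sum += v[i];
    return sum;
}
```

The two variants agree while the sum fits in the element type and diverge once it does not, which is why a single target-independent definition is hard to pick.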
On Sun, Sep 25, 2016 at 9:28 PM, Demikhovsky, Elena <
elena.demikhovsky at intel.com> wrote:
> |Hi Elena,
> |Technically speaking, this seems straightforward.
> |I wonder, however, how target-independent this is in a practical
> |sense; will there be an efficient lowering when targeting any other
> |ISA? I don't want to get into the territory where, because the
> |vectorizer is supposed to be architecture independent, we need to
> |add target-independent intrinsics for all potentially-side-effect-
> |carrying idioms (or just complicated idioms) we want the vectorizer to
> |support on any target. Is there a way we can design the vectorizer so
> |that the targets can plug in their own idiom recognition for these
> |kinds of things, and then, via that interface, let the vectorizer
> |emit the relevant target-dependent intrinsics?
> Adding a target-specific plug-in to the vectorizer may be a good idea. We need
> target-specific pattern recognition and a target-specific implementation of
> “vectorizeMemoryInstruction”. (More functionality may be needed in the future.)
> TTI->checkAdditionalVectorizationOpportunities() - detects target-specific
> patterns; X86 will find compress/expand and maybe others
> TTI->vectorizeMemoryInstruction() - handle only exotic target-specific
> It will allow us to implement all X86 specific solutions.
> The expandload and compressstore intrinsics may be x86 specific,
> TTI will need to deal with Loop Info, SCEVs and other loop analysis info
> that it does not have today. (I do not like this approach.)
> Or we'll need to introduce TLV - Target Loop Vectorizer - a new class that
> handles all target specific cases. This solution seems more reasonable, but
> too heavy just for compress/expand.
> Do you see any other target plug-in solution?
> |Thanks again,
> |----- Original Message -----
> |> From: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> |> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> |> Cc: "Ayal Zaks" <ayal.zaks at intel.com>, "Michael Kuperstein"
> |<mkuper at google.com>, "Adam Nemet (anemet at apple.com)"
> |> <anemet at apple.com>, "Hal Finkel (hfinkel at anl.gov)"
> |<hfinkel at anl.gov>, "Sanjay Patel (spatel at rotateright.com)"
> |> <spatel at rotateright.com>, "Nadav Rotem"
> |<nadav.rotem at me.com>
> |> Sent: Monday, September 19, 2016 1:37:02 AM
> |> Subject: RFC: New intrinsics masked.expandload and
> |> masked.compressstore
> |> Hi all,
> |> AVX-512 ISA introduces new vector instructions VCOMPRESS and VEXPAND
> |> in order to allow vectorization of the following loops with two
> |> specific types of cross-iteration dependencies:
> |> Compress:
> |> for (int i=0; i<N; ++i)
> |>   if (t[i])
> |>     *A++ = expr;
> |> Expand:
> |> for (int i=0; i<N; ++i)
> |>   if (t[i])
> |>     X[i] = *A++;
> |>   else
> |>     X[i] = PassThruV[i];
> |> On this poster (
> |> http://llvm.org/devmtg/2013-11/slides/Demikhovsky-Poster.pdf )
> |> you can find the “compress” and “expand” patterns depicted.
> |> The RFC proposes to support this functionality by introducing two
> |> intrinsics to LLVM IR:
> |> llvm.masked.expandload.*
> |> llvm.masked.compressstore.*
> |> The syntax of these two intrinsics is similar to the syntax of
> |> llvm.masked.load.* and masked.store.*, respectively, but the semantics
> |> are different, matching the above patterns.
> |> %res = call <16 x float> @llvm.masked.expandload.v16f32.p0f32 (float*
> |> %ptr, <16 x i1> %mask, <16 x float> %passthru)
> |> void @llvm.masked.compressstore.v16f32.p0f32 (<16 x float> <value>,
> |> float* <ptr>, <16 x i1> <mask>)
> |> The arguments - %mask, %value and %passthru all have the same
> |> length.
> |> The underlying type of %ptr corresponds to the scalar type of the
> |> vector value.
> |> (In brief; the full syntax description will be provided in subsequent
> |> full documentation.)
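
As a rough scalar model of the semantics described above, the following C sketch may help. It is illustrative only and not the proposed implementation: the function names expandload_ref and compressstore_ref, the one-byte-per-lane mask encoding, and the returned element count are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical scalar model of masked.expandload: consecutive elements are
   read from ptr and placed into the enabled lanes of res; disabled lanes
   take the corresponding passthru element.
   Returns the number of elements consumed from ptr. */
size_t expandload_ref(float *res, const float *ptr, const unsigned char *mask,
                      const float *passthru, size_t lanes)
{
    size_t next = 0; /* index of the next unread element in memory */
    for (size_t i = 0; i < lanes; ++i)
        res[i] = mask[i] ? ptr[next++] : passthru[i];
    return next;
}

/* Hypothetical scalar model of masked.compressstore: the enabled lanes of
   value are written to consecutive locations starting at ptr; disabled
   lanes are skipped and leave memory untouched.
   Returns the number of elements written to ptr. */
size_t compressstore_ref(float *ptr, const float *value,
                         const unsigned char *mask, size_t lanes)
{
    size_t next = 0; /* index of the next free slot in memory */
    for (size_t i = 0; i < lanes; ++i)
        if (mask[i])
            ptr[next++] = value[i];
    return next;
}
```

In a vectorized version of the loops above, the returned count is the amount by which the pointer A would advance after each vector iteration.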
> |> The intrinsics are planned to be target independent, similar to
> |> masked.load/store/gather/scatter. They will be lowered efficiently on
> |> AVX-512 and scalarized on other targets, also akin to masked.*
> |> intrinsics.
> |> The loop vectorizer will query TTI about the existence of efficient
> |> support for these intrinsics and, if it is provided, will be able to
> |> handle loops with such cross-iteration dependences.
> |> The first step will include the full documentation and implementation
> |> of the CodeGen part.
> |> Additional information about expand load (
> |> andload&techs=AVX_512
> |> ) and compress store (
> |> pressstore&techs=AVX_512
> |> ) can also be found in the Intel Intrinsics Guide.
> |> - Elena
> |> ---------------------------------------------------------------------
> |> Intel Israel (74) Limited
> |> This e-mail and any attachments may contain confidential material for
> |> the sole use of the intended recipient(s). Any review or distribution
> |> by others is strictly prohibited. If you are not the intended
> |> recipient, please contact the sender and delete all copies.
> |Hal Finkel
> |Lead, Compiler Technology and Programming Languages Leadership
> |Computing Facility Argonne National Laboratory