[llvm-dev] RFC: New intrinsics masked.expandload and masked.compressstore
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Mon Sep 26 14:08:14 PDT 2016
----- Original Message -----
> From: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Ayal Zaks" <ayal.zaks at intel.com>, "Michael Kuperstein" <mkuper at google.com>, "Adam Nemet (anemet at apple.com)"
> <anemet at apple.com>, "Sanjay Patel (spatel at rotateright.com)" <spatel at rotateright.com>, "Nadav Rotem"
> <nadav.rotem at me.com>, "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Monday, September 26, 2016 3:55:27 PM
> Subject: RE: RFC: New intrinsics masked.expandload and masked.compressstore
>
>
> |
> |How would this work in this case? The result would need to affect
> |the legality and cost of the memory instruction. From your poster,
> |it looks like we're talking about loops with constructs like this:
> |
> |for (i = 0; i < N; i++) {
> |  if (topVal > b[i]) {
> |    *dst = a[i];
> |    dst++;
> |  }
> |}
> |
> |Is this loop vectorizable at all without these constructs?
>
> Good question. Today it isn't. Theoretically, yes, if we know that
> only a small part of the loop has a cross-iteration dependency or
> some other issue: a loop may be vectorized and still contain scalar
> pieces inside. But that requires a full reconstruction of the cost
> model.
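Hal's loop can be rewritten so that its only cross-iteration dependence is visible: the slot that iteration i stores to is the number of earlier iterations whose condition held, i.e. an exclusive prefix sum of the predicate. The C sketch below is illustrative only (`select_scalar`, `select_two_phase` and the scratch array `pos` are hypothetical names, not from the thread). Once `pos[]` is known the stores no longer depend on each other; a compressing store effectively performs both phases for one vector's worth of lanes.

```c
#include <assert.h>
#include <stddef.h>

/* Hal's example loop as a function; returns how many elements it wrote. */
size_t select_scalar(float *dst, const float *a, const float *b,
                     float topVal, size_t n) {
    float *start = dst;
    for (size_t i = 0; i < n; i++) {
        if (topVal > b[i]) {
            *dst = a[i];
            dst++;
        }
    }
    return (size_t)(dst - start);
}

/* Same computation split into two phases, using a caller-provided
   scratch array pos[n] (hypothetical, for illustration):
   1. pos[i] = number of j < i with topVal > b[j]
      (an exclusive prefix sum of the predicate);
   2. each store goes to dst[pos[i]], so once pos[] is known the
      stores are independent of one another. */
size_t select_two_phase(float *dst, const float *a, const float *b,
                        float topVal, size_t n, size_t *pos) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {      /* phase 1: scan */
        pos[i] = count;
        count += (topVal > b[i]) ? 1u : 0u;
    }
    for (size_t i = 0; i < n; i++)        /* phase 2: independent stores */
        if (topVal > b[i])
            dst[pos[i]] = a[i];
    return count;
}
```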
>
> |It looks like the target would need to analyze the PHI representing
> |the store's address, assign the store some reasonable cost, and also
> |provide some alternative SCEVs (perhaps lower and upper bounds) for
> |use with the dependence checks?
>
> First of all, this loop should pass the legality check. Legality
> will need additional effort in order to detect the compress/expand
> pattern in a loop with a cross-iteration dependency.
> Once the pattern is detected, we mark the store as a "compressing
> store" and TTI will give a cost for the compressing store.
> |
> |> X86 will find compress/expand, and maybe others
> |
> |What others might fit in here?
> Compress/expand are special patterns that will require a separate
> analysis. I thought about other X86-specific patterns that may be
> detected: strided memory access with masks, or arithmetic with
> saturation. But again, I'm not sure that constructing a plug-in
> would not be overkill in this case.
I'm fairly certain that creating a plugin interface just for this would be overkill. Nevertheless, I found this discussion quite helpful. If we can't think of any other examples, I'm fine with these intrinsics as proposed.
Thanks again,
Hal
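The other candidate patterns Elena mentions, masked strided access and saturating arithmetic, look roughly like this in scalar C. This is an illustrative sketch with hypothetical function names; the thread does not say which instructions a backend would select (SSE2's PADDUSB is one instruction that implements the unsigned-byte saturating add).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating arithmetic: unsigned byte add that clamps at 255 instead
   of wrapping, written as the common widen-then-clamp idiom. */
void add_sat_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
    }
}

/* Masked strided access: touch every stride-th element, and only
   when its condition is set. stride must be nonzero. */
void scale_strided_masked(float *x, const int *cond, size_t n,
                          size_t stride, float s) {
    assert(stride != 0);
    for (size_t i = 0; i < n; i += stride)
        if (cond[i])
            x[i] *= s;
}
```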
> |
> |> TTI->vectorizeMemoryInstruction() - handle only exotic
> |> target-specific cases
> |>
> |> Pros:
> |> It will allow us to implement all X86-specific solutions.
> |> The expandload and compressstore intrinsics may be x86-specific and
> |> polymorphic:
> |> llvm.x86.masked.expandload()
> |> llvm.x86.masked.compressstore()
> |>
> |> Cons:
> |>
> |> TTI will need to deal with Loop Info, SCEVs and other loop analysis
> |> info that it does not have today. (I do not like this way.)
> |
> |Giving TTI the loop and other analyses, in itself, does not bother
> |me. getUnrollingPreferences takes a Loop*. I'm more concerned about
> |how cleanly we could integrate everything.
> |
> |> Or we'll need to introduce TLV (Target Loop Vectorizer), a new class
> |> that handles all target-specific cases. This solution seems more
> |> reasonable, but too heavy just for compress/expand.
> |
> |I don't see how this would work without duplicating a lot of the
> |logic in the vectorizer (unless it is really just doing loop-idiom
> |recognition, in which case none of this is really relevant). You'd
> |want the cost model used by the vectorizer, in general, to be
> |integrated with whatever the target was providing.
> |
> |Thanks again,
> |Hal
> |
> |> Do you see any other target plug-in solution?
> |>
> |> -Elena
> |>
> |> |
> |> |Thanks again,
> |> |Hal
> |> |
> |> |----- Original Message -----
> |> |> From: "Elena Demikhovsky" <elena.demikhovsky at intel.com>
> |> |> To: "llvm-dev" <llvm-dev at lists.llvm.org>
> |> |> Cc: "Ayal Zaks" <ayal.zaks at intel.com>, "Michael Kuperstein"
> |> |> <mkuper at google.com>, "Adam Nemet (anemet at apple.com)"
> |> |> <anemet at apple.com>, "Hal Finkel (hfinkel at anl.gov)"
> |> |> <hfinkel at anl.gov>, "Sanjay Patel (spatel at rotateright.com)"
> |> |> <spatel at rotateright.com>, "Nadav Rotem" <nadav.rotem at me.com>
> |> |> Sent: Monday, September 19, 2016 1:37:02 AM
> |> |> Subject: RFC: New intrinsics masked.expandload and masked.compressstore
> |> |>
> |> |>
> |> |> Hi all,
> |> |>
> |> |> The AVX-512 ISA introduces the new vector instructions VCOMPRESS and
> |> |> VEXPAND in order to allow vectorization of the following loops
> |> |> with two specific types of cross-iteration dependencies:
> |> |>
> |> |> Compress:
> |> |> for (int i = 0; i < N; ++i)
> |> |>   if (t[i])
> |> |>     *A++ = expr;
> |> |>
> |> |> Expand:
> |> |> for (i = 0; i < N; ++i)
> |> |>   if (t[i])
> |> |>     X[i] = *A++;
> |> |>   else
> |> |>     X[i] = PassThruV[i];
> |> |>
> |> |> On this poster
> |> |> ( http://llvm.org/devmtg/2013-11/slides/Demikhovsky-Poster.pdf )
> |> |> you’ll find the “compress” and “expand” patterns depicted.
> |> |>
> |> |> The RFC proposes to support this functionality by introducing two
> |> |> intrinsics to LLVM IR:
> |> |> llvm.masked.expandload.*
> |> |> llvm.masked.compressstore.*
> |> |>
> |> |> The syntax of these two intrinsics is similar to the syntax of
> |> |> llvm.masked.load.* and llvm.masked.store.*, respectively, but the
> |> |> semantics are different, matching the above patterns.
> |> |>
> |> |> %res = call <16 x float> @llvm.masked.expandload.v16f32.p0f32(float* %ptr, <16 x i1> %mask, <16 x float> %passthru)
> |> |> call void @llvm.masked.compressstore.v16f32.p0f32(<16 x float> %value, float* %ptr, <16 x i1> %mask)
> |> |>
> |> |> The arguments %mask, %value and %passthru all have the same vector
> |> |> length. The underlying type of %ptr corresponds to the scalar type
> |> |> of the vector value. (This is a brief summary; the full syntax
> |> |> description will be provided in subsequent documentation.)
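To make the difference from llvm.masked.load/llvm.masked.store concrete: enabled lanes read or write consecutive memory elements starting at %ptr, rather than the element at their own lane index. Below is a scalar C sketch of those semantics for the v16f32 variants, assuming the mask is modeled as a bool array; the function names are hypothetical and this is not LLVM's own lowering code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VL 16  /* vector length of the v16f32 variants */

/* Models @llvm.masked.expandload.v16f32.p0f32(ptr, mask, passthru):
   enabled lanes take consecutive elements starting at ptr;
   disabled lanes take the passthru value.
   Returns how many elements were read from memory. */
size_t expandload_v16f32(float res[VL], const float *ptr,
                         const bool mask[VL], const float passthru[VL]) {
    size_t k = 0;  /* next memory element to consume */
    for (size_t lane = 0; lane < VL; lane++)
        res[lane] = mask[lane] ? ptr[k++] : passthru[lane];
    return k;
}

/* Models @llvm.masked.compressstore.v16f32.p0f32(value, ptr, mask):
   enabled lanes are written to consecutive elements starting at ptr;
   disabled lanes are skipped, and memory past the last write is left
   untouched. Returns how many elements were written. */
size_t compressstore_v16f32(const float value[VL], float *ptr,
                            const bool mask[VL]) {
    size_t k = 0;
    for (size_t lane = 0; lane < VL; lane++)
        if (mask[lane])
            ptr[k++] = value[lane];
    return k;
}
```

The returned counts equal the number of set mask bits, which is how far a loop would advance its memory pointer after each call.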
> |> |>
> |> |> The intrinsics are planned to be target independent, similar to
> |> |> masked.load/store/gather/scatter. They will be lowered efficiently
> |> |> on AVX-512 and scalarized on other targets, also akin to the
> |> |> masked.* intrinsics.
> |> |> The loop vectorizer will query TTI about the existence of efficient
> |> |> support for these intrinsics and, if it is provided, will be able
> |> |> to handle loops with such cross-iteration dependences.
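A sketch of how a vectorizer could use a compressing store on the Compress loop: each step handles VF iterations with one compressing store and then advances A by the number of enabled lanes, so the pointer update is the only cross-iteration dependence left. Assumptions, none of them from the RFC: VF = 16, `expr` is taken to be `x[i]`, `compress_chunk` is a hypothetical scalar stand-in for llvm.masked.compressstore, and the tail is handled by disabling lanes past n rather than by a scalar epilogue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VF 16  /* assumed vectorization factor (16 x float on AVX-512) */

/* Hypothetical stand-in for one compressing store of a VF-wide vector:
   writes the enabled lanes contiguously and returns their count
   (the popcount of the mask). */
static size_t compress_chunk(float *dst, const float v[VF], const bool m[VF]) {
    size_t k = 0;
    for (size_t lane = 0; lane < VF; lane++)
        if (m[lane])
            dst[k++] = v[lane];
    return k;
}

/* Original loop: for (i = 0; i < n; ++i) if (t[i]) *A++ = x[i];
   Returns the number of elements written. */
size_t compress_scalar(float *A, const int *t, const float *x, size_t n) {
    float *start = A;
    for (size_t i = 0; i < n; ++i)
        if (t[i])
            *A++ = x[i];
    return (size_t)(A - start);
}

/* Strip-mined form: build a mask and a value vector for VF iterations,
   do one compressing store, and advance A by the popcount of the mask.
   Lanes past n are disabled, so the tail needs no separate loop. */
size_t compress_vectorized(float *A, const int *t, const float *x, size_t n) {
    float *start = A;
    for (size_t i = 0; i < n; i += VF) {
        bool mask[VF];
        float val[VF];
        for (size_t lane = 0; lane < VF; lane++) {
            bool live = i + lane < n;
            mask[lane] = live && t[i + lane] != 0;
            val[lane] = live ? x[i + lane] : 0.0f;
        }
        A += compress_chunk(A, val, mask);
    }
    return (size_t)(A - start);
}
```

The inner lane loops stand in for single vector operations; on a target without efficient support they are exactly what scalarization would produce.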
> |> |>
> |> |> The first step will include the full documentation and the
> |> |> implementation of the CodeGen part.
> |> |>
> |> |> Additional information about expand load
> |> |> ( https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=expandload&techs=AVX_512 )
> |> |> and compress store
> |> |> ( https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=compressstore&techs=AVX_512 )
> |> |> can also be found in the Intel Intrinsics Guide.
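For reference, on AVX-512 hardware the compress side corresponds to the `_mm512_mask_compressstoreu_ps` intrinsic listed in that guide (the expand side is `_mm512_mask_expandloadu_ps`). The sketch below assumes GCC or Clang on x86 (for `__attribute__((target("avx512f")))` and `__builtin_cpu_supports`) and falls back to a portable loop elsewhere; the wrapper names `compress16`, `compress16_ref` and `compress16_avx512` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable reference: store the lanes of v selected by the 16-bit mask
   to consecutive slots of dst; returns how many were stored. */
static size_t compress16_ref(float *dst, const float v[16], uint16_t mask) {
    size_t k = 0;
    for (int lane = 0; lane < 16; lane++)
        if (mask & (1u << lane))
            dst[k++] = v[lane];
    return k;
}

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
/* AVX-512F path using the compress-store intrinsic (VCOMPRESSPS).
   The target attribute enables AVX-512F for this function only, so
   callers must check CPU support before calling it. */
__attribute__((target("avx512f")))
static size_t compress16_avx512(float *dst, const float v[16], uint16_t mask) {
    __m512 x = _mm512_loadu_ps(v);
    _mm512_mask_compressstoreu_ps(dst, (__mmask16)mask, x);
    return (size_t)__builtin_popcount(mask);
}
#endif

/* Uses AVX-512F when the CPU reports it, otherwise the reference loop. */
size_t compress16(float *dst, const float v[16], uint16_t mask) {
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("avx512f"))
        return compress16_avx512(dst, v, mask);
#endif
    return compress16_ref(dst, v, mask);
}
```

On machines without AVX-512F, `compress16` uses the reference loop, so both paths should produce identical memory contents and counts.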
> |> |>
> |> |>
> |> |> -Elena
> |> |>
> |> |> ---------------------------------------------------------------------
> |> |> Intel Israel (74) Limited
> |> |>
> |> |> This e-mail and any attachments may contain confidential material
> |> |> for the sole use of the intended recipient(s). Any review or
> |> |> distribution by others is strictly prohibited. If you are not the
> |> |> intended recipient, please contact the sender and delete all copies.
> |> |
> |> |--
> |> |Hal Finkel
> |> |Lead, Compiler Technology and Programming Languages
> |> |Leadership Computing Facility
> |> |Argonne National Laboratory
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory