[llvm-dev] Status of llvm.experimental.vector.reduce.* intrinsics

Fri Aug 4 07:28:52 PDT 2017

Thanks, I already found it out the hard way ;) Now it works and looks 
nice and shiny.

Michael
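
A minimal sketch of the two mask checks discussed in the quoted thread below, written as plain C++ that only models the IR semantics (it does not use the LLVM API). The helper names packMask, anyLaneSet and orReduceByHalving are hypothetical, and the lane-to-bit order in packMask is an assumption that does not affect the comparison against zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models `bitcast <N x i1> %mask to iN`: lane i becomes bit i.
// (Hypothetical helper; the lane-to-bit order is an assumption of this sketch.)
std::uint64_t packMask(const std::vector<bool> &mask) {
  assert(mask.size() <= 64 && "sketch only handles up to 64 lanes");
  std::uint64_t bits = 0;
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i])
      bits |= std::uint64_t{1} << i;
  return bits;
}

// Models `icmp ne iN %castval, 0`: true when at least one lane is set.
bool anyLaneSet(const std::vector<bool> &mask) {
  return packMask(mask) != 0;
}

// Models the shuffle-and-OR expansion of an OR reduction: fold the upper
// half onto the lower half until one lane remains.
// Requires a power-of-two lane count.
bool orReduceByHalving(std::vector<bool> lanes) {
  const std::size_t n = lanes.size();
  assert(n != 0 && (n & (n - 1)) == 0 && "lane count must be a power of two");
  for (std::size_t width = n / 2; width >= 1; width /= 2)
    for (std::size_t i = 0; i < width; ++i)
      lanes[i] = lanes[i] || lanes[i + width];
  return lanes[0];
}
```

Both functions answer the same question (is any mask lane set?); the bitcast form needs a single compare, while the halving form takes log2(N) fold steps.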

On 04.08.2017 at 16:20, Amara Emerson wrote:
> Bitcasting is only valid between types of the same size, so you 
> can bitcast to i4 and then directly do a cmp i4 %castval, 0 etc.
> 
> Amara
> 
> On 4 August 2017 at 15:03, Haidl, Michael
> <michael.haidl at uni-muenster.de> wrote:
> 
>     I assume smaller types like <4 x i1> get zero-extended to,
>     e.g., i8?
> 
>     On 04.08.2017 at 15:58, Amara Emerson wrote:
>     > Actually, for mask vectors of i1 values you don't need to use reductions
>     > at all (although for SVE this is what we'll do). You can instead bitcast
>     > the vector value to an i8/i16/whatever and then compare against zero.
>     >
>     > Amara
>     >
>     > On 4 August 2017 at 14:55, Haidl, Michael via llvm-dev
>     > <llvm-dev at lists.llvm.org> wrote:
>      >
>      >
>      >     I am currently working on a transformation pass that transforms
>      >     masked.load and masked.store intrinsics to (hopefully) increase
>      >     performance on targets where masked.load and masked.store are
>      >     not legal. To check whether the loads and stores are necessary at
>      >     all, I take the mask of each masked operation and want to reduce
>      >     it to a single value. vector.reduce.or seemed very handy for the
>      >     job.
>      >
>      >     I will take a look at the function you suggested. Maybe I can come
>      >     up with something that drives the development of these intrinsics
>      >     ahead.
>      >
>      >     Cheers,
>      >     Michael
>      >
>      >     On 04.08.2017 at 15:25, Amara Emerson wrote:
>      >      > Can you tell us what you're looking to do with the intrinsics?
>      >      >
>      >      > On all non-AArch64 targets the ExpandReductions pass will convert
>      >      > them to the shuffle pattern, as you're seeing. That pass was
>      >      > written to allow experimentation with the effects of using
>      >      > reduction intrinsics at the IR level only, hence we convert into
>      >      > the shuffles very late in the pass pipeline.
>      >      >
>      >      > Since we haven't seen any adverse effects of representing the
>      >      > reductions as intrinsics at the IR level, I think in that respect
>      >      > the intrinsics have probably proven themselves to be stable.
>      >      > However, the error you're seeing is because the AArch64 backend
>      >      > still expects to deal only with intrinsics it can *natively*
>      >      > support, and i1 is not a natively supported type for reductions.
>      >      > See the code in
>      >      > AArch64TargetTransformInfo.cpp:useReductionIntrinsic() for where
>      >      > we decide which reduction types we can support.
>      >      >
>      >      > For these cases, we need to implement more generic legalization
>      >      > support in order to either promote to a legal type or, in cases
>      >      > where the target cannot support it as a native operation at all,
>      >      > expand it to a shuffle pattern as a fallback. Once we have all
>      >      > that in place, I think we're in a strong position to move to the
>      >      > intrinsic form as the canonical representation.
>      >      >
>      >      > FYI, one of the motivating reasons for introducing these was to
>      >      > allow non-power-of-2 vector architectures like SVE to express
>      >      > reduction operations.
>      >      >
>      >      > Amara
>      >      >
>      >      > On 4 August 2017 at 13:36, Haidl, Michael
>      >      > <michael.haidl at uni-muenster.de> wrote:
>      >      >
>      >      >     Hi Renato,
>      >      >
>      >      >     just to make it clear, I didn't implement reductions on
>      >      >     x86_64; they just worked when I tried to lower an
>      >      >     llvm.experimental.vector.reduce.or.i1.v8i1 intrinsic. A
>      >      >     shuffle pattern is generated for the intrinsic.
>      >      >
>      >      >              vpshufd $78, %xmm0, %xmm1       # xmm1 = xmm0[2,3,0,1]
>      >      >              vpor    %xmm1, %xmm0, %xmm0
>      >      >              vpshufd $229, %xmm0, %xmm1      # xmm1 = xmm0[1,1,2,3]
>      >      >              vpor    %xmm1, %xmm0, %xmm0
>      >      >              vpsrld  $16, %xmm0, %xmm1
>      >      >              vpor    %xmm1, %xmm0, %xmm0
>      >      >              vpextrb $0, %xmm0, %eax
>      >      >
>      >      >
>      >      >     However, on AArch64 I encountered an unreachable where
>      >      >     codegen does not know how to promote the i1 type. Since I
>      >      >     am more familiar with the mid-level, I have to start digging
>      >      >     into codegen. Any hints on where to start would be awesome.
>      >      >
>      >      >     Cheers,
>      >      >     Michael
>      >      >
>      >      >     On 04.08.2017 at 08:18, Renato Golin wrote:
>      >      >      > On 3 August 2017 at 19:48, Haidl, Michael via llvm-dev
>      >      >      > <llvm-dev at lists.llvm.org> wrote:
>      >      >      >> thank you for the clarification. I tested the intrinsics
>      >      >      >> on x86_64 and they seemed to work pretty well. Looking
>      >      >      >> forward to trying these intrinsics with the AArch64
>      >      >      >> backend. Maybe I will find the time to look into codegen
>      >      >      >> to get these intrinsics out of the experimental stage.
>      >      >      >> They seem pretty useful.
>      >      >      >
>      >      >      > In addition to Amara's point, it'd be good to have it
>      >      >      > working and default for other architectures before we can
>      >      >      > move out of experimental if we indeed intend to make it
>      >      >      > non-arch-specific (which we do).
>      >      >      >
>      >      >      > So, if you could share your code for the x86 port, that'd
>      >      >      > be great. But if you could help with the final touches on
>      >      >      > the code-gen part, that'd be awesome.
>      >      >      >
>      >      >      > cheers,
>      >      >      > --renato
>      >      >      >
>      >      >
>      >      >
>      >
>      >     _______________________________________________
>      >     LLVM Developers mailing list
>      >     llvm-dev at lists.llvm.org
>      >     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>      >
>      >
> 
>