[llvm-dev] Status of llvm.experimental.vector.reduce.* intrinsics

Fri Aug 4 07:03:31 PDT 2017

I assume smaller types like <4 x i1> get zero-extended to, e.g., i8?
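
To make the bitcast idea from Amara's reply concrete, here is a small C model. It assumes that lane i of an <N x i1> mask becomes bit i of the integer and that the unused high bits are zero, i.e. the iN value is zero-extended to i8. That zero extension is exactly what the question above asks about, so treat it as an assumption and not as a description of what the AArch64 or x86 backends do. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: model `bitcast <N x i1> to iN` followed by a zero
 * extension to i8. Assumes lane i maps to bit i and N <= 8; the bits above
 * N are left at zero, which is what a zext would guarantee. */
static uint8_t pack_mask(const bool *lanes, unsigned n) {
    assert(n <= 8);
    uint8_t bits = 0;
    for (unsigned i = 0; i < n; ++i)
        if (lanes[i])
            bits |= (uint8_t)(1u << i);
    return bits;
}

/* OR-reduction expressed as "bitcast, then compare against zero". */
static bool any_lane_set(const bool *lanes, unsigned n) {
    return pack_mask(lanes, n) != 0;
}

/* Reference OR-reduction, one lane at a time. */
static bool reduce_or(const bool *lanes, unsigned n) {
    bool acc = false;
    for (unsigned i = 0; i < n; ++i)
        acc = acc || lanes[i];
    return acc;
}
```

With only lane 5 of an <8 x i1> mask set, pack_mask returns 0x20 and any_lane_set agrees with the lane-by-lane reduce_or. If the high bits of a <4 x i1> value were left undefined instead of zeroed, the compare against zero could report a set lane that does not exist, which is why the kind of extension matters for this trick.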

Am 04.08.2017 um 15:58 schrieb Amara Emerson:
> Actually, for mask vectors of i1 values, you don't need to use reductions
> at all (although for SVE this is what we'll do). You can instead bitcast
> the vector value to an i8/i16/whatever and then compare against zero.
> 
> Amara
> 
> On 4 August 2017 at 14:55, Haidl, Michael via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> 
> 
>     I am currently working on a transformation pass that transforms
>     masked.load and masked.store intrinsics to (hopefully) increase
>     performance on targets where masked.load and masked.store are not legal.
>     To check whether the loads and stores are necessary at all, I take the
>     mask of each masked operation and want to reduce it to a single value;
>     vector.reduce.or seemed very handy for the job.
> 
>     I will take a look at the function you suggested. Maybe I can come up
>     with something that moves the development of these intrinsics forward.
> 
>     Cheers,
>     Michael
> 
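
A scalar C model of the check described above, assuming an 8-lane masked store. masked_store_v8 is a hypothetical stand-in for what the transformed code might do, not code from Michael's pass: it packs the mask as in the bitcast trick, skips the store entirely when no lane is active, and (as one possible extra case that the thread does not mention) falls back to a plain store when every lane is active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of an 8-lane masked store: lanes whose mask
 * entry is true are copied from val to dst; all other lanes of dst are left
 * untouched. Returns the number of lanes written. */
static unsigned masked_store_v8(int32_t *dst, const int32_t val[8],
                                const bool mask[8]) {
    assert(dst != NULL && val != NULL && mask != NULL);

    /* Pack the mask into one byte, lane i -> bit i, like a bitcast of
     * <8 x i1> to i8. */
    uint8_t bits = 0;
    for (unsigned i = 0; i < 8; ++i)
        if (mask[i])
            bits |= (uint8_t)(1u << i);

    /* All lanes inactive: the store has no effect, so skip it. */
    if (bits == 0)
        return 0;

    /* All lanes active: an ordinary unmasked store is enough. */
    if (bits == 0xFF) {
        memcpy(dst, val, 8 * sizeof *dst);
        return 8;
    }

    /* Mixed mask: write only the active lanes. */
    unsigned written = 0;
    for (unsigned i = 0; i < 8; ++i) {
        if (bits & (1u << i)) {
            dst[i] = val[i];
            ++written;
        }
    }
    return written;
}
```

The early exit is the cheap case the pass is after: when the mask reduces to zero, no memory is touched at all.
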
>     Am 04.08.2017 um 15:25 schrieb Amara Emerson:
>      > Can you tell us what you're looking to do with the intrinsics?
>      >
>      > On all non-AArch64 targets the ExpandReductions pass will convert them
>      > to the shuffle pattern you're seeing. That pass was written in order
>      > to allow experimenting with the effects of using reduction intrinsics
>      > at the IR level only, hence we convert into the shuffles very late in
>      > the pass pipeline.
>      >
>      > Since we haven't seen any adverse effects of representing the
>      > reductions as intrinsics at the IR level, I think in that respect the
>      > intrinsics have probably proven themselves to be stable. However, the
>      > error you're seeing is because the AArch64 backend still expects to
>      > deal only with intrinsics it can *natively* support, and i1 is not a
>      > natively supported type for reductions. See the code in
>      > AArch64TargetTransformInfo.cpp:useReductionIntrinsic() for where we
>      > decide which reduction types we can support.
>      >
>      > For these cases, we need to implement more generic legalization
>      > support in order to either promote to a legal type, or, in cases where
>      > the target cannot support it as a native operation at all, to expand
>      > it to a shuffle pattern as a fallback. Once we have all that in place,
>      > I think we're in a strong position to move to the intrinsic form as
>      > the canonical representation.
>      >
>      > FYI, one of the motivating reasons for introducing these was to allow
>      > non-power-of-2 vector architectures like SVE to express reduction
>      > operations.
>      >
>      > Amara
>      >
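
The "promote to a legal type" route can be modelled in C as follows. Using uint8_t as the wider lane type is an assumption for illustration only; the actual legal type is target-dependent, and reduce_or_promoted is a hypothetical name. Each i1 lane is zero-extended, the OR runs on the wide lanes, and the final truncation keeps only bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of legalizing an i1 OR-reduction by promotion: widen every lane
 * to uint8_t (zero extension), reduce in the wider type, then truncate the
 * result back to a single bit. uint8_t is an assumed "legal" lane type. */
static bool reduce_or_promoted(const bool *lanes, size_t n) {
    assert(lanes != NULL || n == 0);
    uint8_t acc = 0; /* identity value for OR */
    for (size_t i = 0; i < n; ++i) {
        uint8_t wide = lanes[i] ? 1u : 0u; /* zext i1 -> i8 */
        acc |= wide;
    }
    return (acc & 1u) != 0; /* trunc i8 -> i1 keeps only bit 0 */
}
```

Because the truncation inspects only bit 0, and bit 0 of the OR depends only on bit 0 of each lane, the upper bits of the widened lanes do not affect an OR reduction. Unlike the bitcast-and-compare trick, this route would therefore not strictly need a zero extension.
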
>      > On 4 August 2017 at 13:36, Haidl, Michael
>      > <michael.haidl at uni-muenster.de> wrote:
>      >
>      >     Hi Renato,
>      >
>      >     just to make it clear, I didn't implement reductions on x86_64;
>      >     they simply worked when I tried to lower an
>      >     llvm.experimental.vector.reduce.or.i1.v8i1 intrinsic. A shuffle
>      >     pattern is generated for the intrinsic.
>      >
>      >              vpshufd $78, %xmm0, %xmm1       # xmm1 = xmm0[2,3,0,1]
>      >              vpor    %xmm1, %xmm0, %xmm0
>      >              vpshufd $229, %xmm0, %xmm1      # xmm1 = xmm0[1,1,2,3]
>      >              vpor    %xmm1, %xmm0, %xmm0
>      >              vpsrld  $16, %xmm0, %xmm1
>      >              vpor    %xmm1, %xmm0, %xmm0
>      >              vpextrb $0, %xmm0, %eax
>      >
>      >
>      >     However, on AArch64 I encountered an unreachable where codegen
>      >     does not know how to promote the i1 type. Since I am more familiar
>      >     with the mid-level, I'll have to start by digging into codegen.
>      >     Any hints on where to start would be awesome.
>      >
>      >     Cheers,
>      >     Michael
>      >
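
The x86 listing above is a three-step log2 reduction. The scalar C model below mirrors it, assuming the <8 x i1> operand was promoted to eight 16-bit lanes filling one 128-bit xmm register. The vpsrld $16 step suggests 16-bit lanes, but that reading is an inference from the listing, not from LLVM's source; reduce_or_v8i16 is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the shuffle-based OR reduction in the x86 listing,
 * assuming eight 16-bit lanes (8 x 16 bits = one xmm register). Each step
 * ORs the upper half of the remaining lanes into the lower half, so 8 lanes
 * need log2(8) = 3 steps. */
static bool reduce_or_v8i16(const uint16_t in[8]) {
    assert(in != NULL);
    uint16_t v[8];
    memcpy(v, in, sizeof v);
    for (unsigned half = 4; half >= 1; half /= 2) {
        /* half == 4: vpshufd $78 + vpor, words 4..7 into words 0..3
         * half == 2: vpshufd $229 + vpor, words 2..3 into words 0..1
         * half == 1: vpsrld $16 + vpor, word 1 into word 0 */
        for (unsigned i = 0; i < half; ++i)
            v[i] |= v[i + half];
    }
    /* vpextrb $0: the result is read from the low byte of lane 0. */
    return (v[0] & 1u) != 0;
}
```

Checking all 256 possible masks against a lane-by-lane OR is a cheap way to confirm that the three steps cover every lane.
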
>      >     Am 04.08.2017 um 08:18 schrieb Renato Golin:
>      >      > On 3 August 2017 at 19:48, Haidl, Michael via llvm-dev
>      >      > <llvm-dev at lists.llvm.org> wrote:
>      >      >> thank you for the clarification. I tested the intrinsics on
>      >      >> x86_64 and they seemed to work pretty well. Looking forward to
>      >      >> trying these intrinsics with the AArch64 backend. Maybe I'll
>      >      >> find the time to look into codegen to get these intrinsics out
>      >      >> of the experimental stage. They seem pretty useful.
>      >      >
>      >      > In addition to Amara's point, it'd be good to have it working
>      >      > and enabled by default for other architectures before we move it
>      >      > out of experimental, if we indeed intend to make it
>      >      > non-arch-specific (which we do).
>      >      >
>      >      > So, if you could share your code for the x86 port, that'd be
>      >      > great. But if you could help with the final touches on the
>      >      > code-gen part, that'd be awesome.
>      >      >
>      >      > cheers,
>      >      > --renato
>      >      >
>      >
>      >
> 
>     _______________________________________________
>     LLVM Developers mailing list
>     llvm-dev at lists.llvm.org
>     http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
>