<div dir="ltr"><div class="gmail_default"><font face="arial, helvetica, sans-serif">Bitcasting is only valid between types of the same size, so you can bitcast to i4 and then directly do an icmp, e.g. icmp ne i4 %castval, 0.</font></div><div class="gmail_default"><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div class="gmail_default"><span style="font-family:arial,helvetica,sans-serif">Amara </span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 4 August 2017 at 15:03, Haidl, Michael <span dir="ltr"><<a href="mailto:michael.haidl@uni-muenster.de" target="_blank">michael.haidl@uni-muenster.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I assume smaller types like <4 x i1> are getting zero-extended to, e.g., i8?<br>
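A minimal IR sketch (untested) of the bitcast-and-compare approach Amara describes; no zero-extension of the mask is needed, since `<4 x i1>` and `i4` have the same bit width:

```llvm
; Check whether any lane of a <4 x i1> mask is set, without a reduction.
define i1 @mask_any_set(<4 x i1> %mask) {
  ; Legal because <4 x i1> and i4 are both 4 bits wide.
  %bits = bitcast <4 x i1> %mask to i4
  ; Any set lane makes the integer non-zero.
  %any  = icmp ne i4 %bits, 0
  ret i1 %any
}
```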
<span class=""><br>
On 04.08.2017 at 15:58, Amara Emerson wrote:<br>
> Actually for mask vectors of i1 values, you don't need to use reductions<br>
> at all (although for SVE this is what we'll do). You can instead bitcast<br>
> the vector value to an i8/i16/whatever and then compare against zero.<br>
><br>
> Amara<br>
><br>
> On 4 August 2017 at 14:55, Haidl, Michael via llvm-dev<br>
</span><div><div class="h5">> <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a> <mailto:<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.<wbr>org</a>>> wrote:<br>
><br>
><br>
> I am currently working on a transformation pass that transforms<br>
> masked.load and masked.store intrinsics to (hopefully) increase<br>
> performance on targets where masked.load and masked.store are not legal.<br>
> To check if the loads and stores are necessary at all, I take the mask<br>
> for the masked operations and want to reduce it to a single value.<br>
> vector.reduce.or seemed very handy to do the job.<br>
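A hypothetical sketch of the transformation described above (function names and types are illustrative, not taken from the actual pass): the mask is reduced to a single i1 with the reduce.or intrinsic, which then branches around the masked load entirely when no lane is active.

```llvm
; Guard a masked.load with a single scalar test of its mask, so targets
; without native masked loads can skip the whole access when the mask
; is all-false.
define <8 x i32> @guarded_load(<8 x i32>* %p, <8 x i1> %mask, <8 x i32> %passthru) {
entry:
  ; Reduce the mask to one bit: "is any lane enabled?"
  %any = call i1 @llvm.experimental.vector.reduce.or.i1.v8i1(<8 x i1> %mask)
  br i1 %any, label %do.load, label %done

do.load:
  %v = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* %p, i32 4, <8 x i1> %mask, <8 x i32> %passthru)
  br label %done

done:
  ; With an all-false mask, masked.load yields the passthru anyway.
  %res = phi <8 x i32> [ %v, %do.load ], [ %passthru, %entry ]
  ret <8 x i32> %res
}

declare i1 @llvm.experimental.vector.reduce.or.i1.v8i1(<8 x i1>)
declare <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>*, i32, <8 x i1>, <8 x i32>)
```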
><br>
> I will take a look into the function you suggested. Maybe I can come up<br>
> with something that drives the development of these intrinsics forward.<br>
><br>
> Cheers,<br>
> Michael<br>
><br>
> On 04.08.2017 at 15:25, Amara Emerson wrote:<br>
> > Can you tell us what you're looking to do with the intrinsics?<br>
> ><br>
> > On all non-AArch64 targets the ExpandReductions pass will convert them<br>
> > to the shuffle pattern as you're seeing. That pass was written in order<br>
> > to allow experimentation with the effects of using reduction intrinsics<br>
> > at the IR level only, hence we convert into the shuffles very late in<br>
> > the pass pipeline.<br>
> ><br>
> > Since we haven't seen any adverse effects of representing the reductions<br>
> > as intrinsics at the IR level, I think in that respect the intrinsics<br>
> > have probably proven themselves to be stable. However, the error you're<br>
> > seeing is because the AArch64 backend still expects to deal only with<br>
> > intrinsics it can *natively* support, and i1 is not a natively supported<br>
> > type for reductions. See the code in<br>
> > AArch64TargetTransformInfo.cpp:useReductionIntrinsic() for where we<br>
> > decide which reduction types we can support.<br>
> ><br>
> > For these cases, we need to implement more generic legalization support<br>
> > in order to either promote to a legal type or, in cases where the target<br>
> > cannot support it as a native operation at all, to expand it to a<br>
> > shuffle pattern as a fallback. Once we have all that in place, I think<br>
> > we're in a strong position to move to the intrinsic form as the<br>
> > canonical representation.<br>
> ><br>
> > FYI, one of the motivating reasons for introducing these was to allow<br>
> > non-power-of-2 vector architectures like SVE to express reduction<br>
> > operations.<br>
> ><br>
> > Amara<br>
> ><br>
> > On 4 August 2017 at 13:36, Haidl, Michael<br>
> <<a href="mailto:michael.haidl@uni-muenster.de">michael.haidl@uni-muenster.de</a> <mailto:<a href="mailto:michael.haidl@uni-muenster.de">michael.haidl@uni-<wbr>muenster.de</a>><br>
</div></div>> > <mailto:<a href="mailto:michael.haidl@uni-muenster.de">michael.haidl@uni-<wbr>muenster.de</a><br>
<div><div class="h5">> <mailto:<a href="mailto:michael.haidl@uni-muenster.de">michael.haidl@uni-<wbr>muenster.de</a>>>> wrote:<br>
> ><br>
> > Hi Renato,<br>
> ><br>
> > just to make it clear, I didn't implement reductions on x86_64; they<br>
> > just worked when I tried to lower an<br>
> > llvm.experimental.vector.reduce.or.i1.v8i1 intrinsic. A shuffle pattern<br>
> > is generated for the intrinsic.<br>
> ><br>
> > vpshufd $78, %xmm0, %xmm1 # xmm1 = xmm0[2,3,0,1]<br>
> > vpor %xmm1, %xmm0, %xmm0<br>
> > vpshufd $229, %xmm0, %xmm1 # xmm1 = xmm0[1,1,2,3]<br>
> > vpor %xmm1, %xmm0, %xmm0<br>
> > vpsrld $16, %xmm0, %xmm1<br>
> > vpor %xmm1, %xmm0, %xmm0<br>
> > vpextrb $0, %xmm0, %eax<br>
> ><br>
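For reference, the IR-level shuffle expansion that produces a sequence like the x86 code above is roughly the following log2-halving pattern (a sketch only, with lane counts chosen to match the v8i1 case in this thread):

```llvm
; Or-reduction of <8 x i1> via repeated halving: or the high half into
; the low half, three times, then extract lane 0.
define i1 @reduce_or_v8i1(<8 x i1> %m) {
  %s1 = shufflevector <8 x i1> %m, <8 x i1> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
  %o1 = or <8 x i1> %m, %s1
  %s2 = shufflevector <8 x i1> %o1, <8 x i1> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %o2 = or <8 x i1> %o1, %s2
  %s3 = shufflevector <8 x i1> %o2, <8 x i1> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %o3 = or <8 x i1> %o2, %s3
  ; Lane 0 now holds the or of all eight input lanes.
  %r  = extractelement <8 x i1> %o3, i32 0
  ret i1 %r
}
```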
> ><br>
> > However, on AArch64 I encountered an unreachable where codegen does not<br>
> > know how to promote the i1 type. Since I am more familiar with the<br>
> > mid-level, I have to start digging into codegen. Any hints on where to<br>
> > start would be awesome.<br>
> ><br>
> > Cheers,<br>
> > Michael<br>
> ><br>
> > On 04.08.2017 at 08:18, Renato Golin wrote:<br>
> > > On 3 August 2017 at 19:48, Haidl, Michael via llvm-dev<br>
> > > <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a> <mailto:<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.<wbr>org</a>><br>
</div></div>> <mailto:<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.<wbr>org</a> <mailto:<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.<wbr>org</a>>>><br>
<div class="HOEnZb"><div class="h5">> wrote:<br>
> > >> thank you for the clarification. I tested the intrinsics on x86_64<br>
> > >> and they seemed to work pretty well. Looking forward to trying these<br>
> > >> intrinsics with the AArch64 backend. Maybe I'll find the time to look<br>
> > >> into codegen to get these intrinsics out of the experimental stage.<br>
> > >> They seem pretty useful.<br>
> > ><br>
> > > In addition to Amara's point, it'd be good to have it working and<br>
> > > default for other architectures before we can move out of experimental<br>
> > > if we indeed intend to make it non-arch-specific (which we do).<br>
> > ><br>
> > > So, if you could share your code for the x86 port, that'd be great.<br>
> > > But if you could help with the final touches on the code-gen part,<br>
> > > that'd be awesome.<br>
> > ><br>
> > > cheers,<br>
> > > --renato<br>
> > ><br>
> ><br>
> ><br>
><br>
> ______________________________<wbr>_________________<br>
> LLVM Developers mailing list<br>
</div></div><div class="HOEnZb"><div class="h5">> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a> <mailto:<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.<wbr>org</a>><br>
> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
> <<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-<wbr>bin/mailman/listinfo/llvm-dev</a>><br>
><br>
><br>
</div></div></blockquote></div><br></div>