[LLVMdev] Can LLVM vectorize <2 x i32> type

zhi chen zchenhn at gmail.com
Fri Jun 26 12:08:54 PDT 2015


Thanks for pointing out. In this case, it seems that vectorization is not
actually profitable. Doesn't that mean we always need to sext i32 to i64 so
that we can use the whole lanes in a XMM register? Thanks.

On Fri, Jun 26, 2015 at 11:56 AM, suyog sarda <sardask01 at gmail.com> wrote:

>
> > For example, I have the following IR code,
> >
> > for.cond.preheader:                               ; preds = %if.end18
> >   %mul = mul i32 %12, %3
> >   %cmp21128 = icmp sgt i32 %mul, 0
> >   br i1 %cmp21128, label %for.body.preheader, label %return
> >
> > for.body.preheader:                               ; preds =
> %for.cond.preheader
> >   %19 = mul i32 %12, %3
> >   %20 = add i32 %19, -1
> >   %21 = zext i32 %20 to i64
> >   %22 = add i64 %21, 1
> >   %end.idx = add i64 %21, 1
> >   %n.vec = and i64 %22, 8589934584
> >   %cmp.zero = icmp eq i64 %n.vec, 0
> >   br i1 %cmp.zero, label %middle.block, label %vector.ph
> >
> > The corresponding assembly code is:
> > # BB#3:                                 # %for.cond.preheader
>
> >     imull   %r9d, %ebx
>
> >     testl   %ebx, %ebx
>
> >     jle .LBB10_63
>
> > # BB#4:                                 # %for.body.preheader
>
> >     leal    -1(%rbx), %eax
>
> >     incq    %rax
>
> >     xorl    %edx, %edx
>
> >     movabsq $8589934584, %rcx       # imm = 0x1FFFFFFF8
>
> >     andq    %rax, %rcx
>
> >     je  .LBB10_8
> >
> > I changed all the scalar operands to <2 x ValueType> ones. The IR
> becomes the following
> > for.cond.preheader:                               ; preds = %if.end18
> >   %mulS44_D = mul <2 x i32> %splatLDS24_D.splat, %splatLDS7_D.splat
> >   %cmp21128S45_D = icmp sgt <2 x i32> %mulS44_D, zeroinitializer
> >   %sextS46_D = sext <2 x i1> %cmp21128S45_D to <2 x i64>
> >   %BCS46_D = bitcast <2 x i64> %sextS46_D to i128
> >   %mskS46_D = icmp ne i128 %BCS46_D, 0
> >   br i1 %mskS46_D, label %for.body.preheader, label %return
> >
> > for.body.preheader:                               ; preds =
> %for.cond.preheader
> >   %S47_D = mul <2 x i32> %splatLDS24_D.splat, %splatLDS7_D.splat
> >   %S48_D = add <2 x i32> %S47_D, <i32 -1, i32 -1>
> >   %S49_D = zext <2 x i32> %S48_D to <2 x i64>
> >   %S50_D = add <2 x i64> %S49_D, <i64 1, i64 1>
> >   %end.idxS51_D = add <2 x i64> %S49_D, <i64 1, i64 1>
> >   %n.vecS52_D = and <2 x i64> %S50_D, <i64 8589934584, i64 8589934584>
> >   %cmp.zeroS53_D = icmp eq <2 x i64> %n.vecS52_D, zeroinitializer
> >   %sextS54_D = sext <2 x i1> %cmp.zeroS53_D to <2 x i64>
> >   %BCS54_D = bitcast <2 x i64> %sextS54_D to i128
> >   %mskS54_D = icmp ne i128 %BCS54_D, 0
> >   br i1 %mskS54_D, label %middle.block, label %vector.ph
> >
> > Now the assembly for the above IR code is:
> > # BB#4:                                 # %for.cond.preheader
> >     vmovdqa 144(%rsp), %xmm0        # 16-byte Reload
> >     vpmuludq    %xmm7, %xmm0, %xmm2
> >     vpsrlq  $32, %xmm7, %xmm4
> >     vpmuludq    %xmm4, %xmm0, %xmm4
> >     vpsllq  $32, %xmm4, %xmm4
> >     vpaddq  %xmm4, %xmm2, %xmm2
> >     vpsrlq  $32, %xmm0, %xmm4
> >     vpmuludq    %xmm7, %xmm4, %xmm4
> >     vpsllq  $32, %xmm4, %xmm4
> >     vpaddq  %xmm4, %xmm2, %xmm2
> >     vpextrq $1, %xmm2, %rax
> >     cltq
> >     vmovq   %rax, %xmm4
> >     vmovq   %xmm2, %rax
> >     cltq
> >     vmovq   %rax, %xmm5
> >     vpunpcklqdq %xmm4, %xmm5, %xmm4 # xmm4 = xmm5[0],xmm4[0]
> >     vpcmpgtq    %xmm3, %xmm4, %xmm3
> >     vptest  %xmm3, %xmm3
> >     je  .LBB10_66
> > # BB#5:                                 # %for.body.preheader
> >     vpaddq  %xmm15, %xmm2, %xmm3
> >     vpand   %xmm15, %xmm3, %xmm3
> >     vpaddq  .LCPI10_1(%rip), %xmm3, %xmm8
> >     vpand   .LCPI10_5(%rip), %xmm8, %xmm5
> >     vpxor   %xmm4, %xmm4, %xmm4
> >     vpcmpeqq    %xmm4, %xmm5, %xmm6
> >     vptest  %xmm6, %xmm6
> >     jne .LBB10_9
> >
>
> As Mats pointed out, this may be the same problem as:
> https <https://llvm.org/bugs/show_bug.cgi?id=22703>://
> <https://llvm.org/bugs/show_bug.cgi?id=22703>llvm.org
> <https://llvm.org/bugs/show_bug.cgi?id=22703>/bugs/show_
> <https://llvm.org/bugs/show_bug.cgi?id=22703>bug.cgi
> <https://llvm.org/bugs/show_bug.cgi?id=22703>?id=22703
> <https://llvm.org/bugs/show_bug.cgi?id=22703>
>
> Basically, the code is generated for AVX2 where register XMM are 128 bits.
> Some of the above ops are <2 x i32> and involve sext to <2 x i64>, bitcast,
> etc. Hence the code has extra vector instructions.
>
> Regards,
> Suyog Sarda
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150626/ab3da0bf/attachment.html>


More information about the llvm-dev mailing list