[LLVMdev] Can LLVM vectorize <2 x i32> type

Fri Jun 26 10:18:23 PDT 2015

For example, I have the following IR code:

for.cond.preheader:                               ; preds = %if.end18
  %mul = mul i32 %12, %3
  %cmp21128 = icmp sgt i32 %mul, 0
  br i1 %cmp21128, label %for.body.preheader, label %return

for.body.preheader:                               ; preds = %for.cond.preheader
  %19 = mul i32 %12, %3
  %20 = add i32 %19, -1
  %21 = zext i32 %20 to i64
  %22 = add i64 %21, 1
  %end.idx = add i64 %21, 1
  %n.vec = and i64 %22, 8589934584
  %cmp.zero = icmp eq i64 %n.vec, 0
  br i1 %cmp.zero, label %middle.block, label %vector.ph

The corresponding assembly code is:
# BB#3:                                 # %for.cond.preheader
    imull   %r9d, %ebx
    testl   %ebx, %ebx
    jle .LBB10_63
# BB#4:                                 # %for.body.preheader
    leal    -1(%rbx), %eax
    incq    %rax
    xorl    %edx, %edx
    movabsq $8589934584, %rcx       # imm = 0x1FFFFFFF8
    andq    %rax, %rcx
    je  .LBB10_8
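
For reference, here is a small Python sketch of what the scalar preheader above computes. The constant 8589934584 is 0x1FFFFFFF8, so the `and` clears the low three bits of the trip count, i.e. it rounds the count down to a multiple of 8. My reading is that 8 is the vector width times the unroll factor chosen by the loop vectorizer, but that is an assumption. The function name `n_vec` is only for illustration.

```python
MASK32 = 0xFFFFFFFF

def n_vec(mul):
    """Mirror %20..%n.vec from for.body.preheader for a given i32 %mul."""
    t20 = (mul - 1) & MASK32      # %20 = add i32 %19, -1 (wraps at 32 bits)
    t21 = t20                     # %21 = zext i32 %20 to i64
    t22 = t21 + 1                 # %22 = add i64 %21, 1
    return t22 & 8589934584       # %n.vec = and i64 %22, 0x1FFFFFFF8
```

With a trip count of 20, `n_vec(20)` is 16, which leaves 4 iterations for the scalar remainder loop. A trip count below 8 gives 0, which is the case where `%cmp.zero` branches to `middle.block`.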

I changed all the scalar operands to <2 x ValueType> vectors. The IR becomes
the following:
for.cond.preheader:                               ; preds = %if.end18
  %mulS44_D = mul <2 x i32> %splatLDS24_D.splat, %splatLDS7_D.splat
  %cmp21128S45_D = icmp sgt <2 x i32> %mulS44_D, zeroinitializer
  %sextS46_D = sext <2 x i1> %cmp21128S45_D to <2 x i64>
  %BCS46_D = bitcast <2 x i64> %sextS46_D to i128
  %mskS46_D = icmp ne i128 %BCS46_D, 0
  br i1 %mskS46_D, label %for.body.preheader, label %return

for.body.preheader:                               ; preds = %for.cond.preheader
  %S47_D = mul <2 x i32> %splatLDS24_D.splat, %splatLDS7_D.splat
  %S48_D = add <2 x i32> %S47_D, <i32 -1, i32 -1>
  %S49_D = zext <2 x i32> %S48_D to <2 x i64>
  %S50_D = add <2 x i64> %S49_D, <i64 1, i64 1>
  %end.idxS51_D = add <2 x i64> %S49_D, <i64 1, i64 1>
  %n.vecS52_D = and <2 x i64> %S50_D, <i64 8589934584, i64 8589934584>
  %cmp.zeroS53_D = icmp eq <2 x i64> %n.vecS52_D, zeroinitializer
  %sextS54_D = sext <2 x i1> %cmp.zeroS53_D to <2 x i64>
  %BCS54_D = bitcast <2 x i64> %sextS54_D to i128
  %mskS54_D = icmp ne i128 %BCS54_D, 0
  br i1 %mskS54_D, label %middle.block, label %vector.ph

Now the assembly for the above IR code is:
# BB#4:                                 # %for.cond.preheader
    vmovdqa 144(%rsp), %xmm0        # 16-byte Reload
    vpmuludq    %xmm7, %xmm0, %xmm2
    vpsrlq  $32, %xmm7, %xmm4
    vpmuludq    %xmm4, %xmm0, %xmm4
    vpsllq  $32, %xmm4, %xmm4
    vpaddq  %xmm4, %xmm2, %xmm2
    vpsrlq  $32, %xmm0, %xmm4
    vpmuludq    %xmm7, %xmm4, %xmm4
    vpsllq  $32, %xmm4, %xmm4
    vpaddq  %xmm4, %xmm2, %xmm2
    vpextrq $1, %xmm2, %rax
    cltq
    vmovq   %rax, %xmm4
    vmovq   %xmm2, %rax
    cltq
    vmovq   %rax, %xmm5
    vpunpcklqdq %xmm4, %xmm5, %xmm4 # xmm4 = xmm5[0],xmm4[0]
    vpcmpgtq    %xmm3, %xmm4, %xmm3
    vptest  %xmm3, %xmm3
    je  .LBB10_66
# BB#5:                                 # %for.body.preheader
    vpaddq  %xmm15, %xmm2, %xmm3
    vpand   %xmm15, %xmm3, %xmm3
    vpaddq  .LCPI10_1(%rip), %xmm3, %xmm8
    vpand   .LCPI10_5(%rip), %xmm8, %xmm5
    vpxor   %xmm4, %xmm4, %xmm4
    vpcmpeqq    %xmm4, %xmm5, %xmm6
    vptest  %xmm6, %xmm6
    jne .LBB10_9

It turns out that the vector version is far more complicated than the scalar
one. I was expecting the generated code to be much simpler.
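
To make the comparison concrete: the vpmuludq/vpsrlq/vpsllq/vpaddq sequence in the vector assembly looks like a full 64-bit lane multiply assembled from 32-bit partial products, followed by a per-lane sign extension (the two cltq instructions) before the signed compare. That is what one would expect if <2 x i32> is being promoted to <2 x i64> during type legalization, but I have not confirmed this in the backend, so treat it as an assumption. The Python sketch below models one lane; the helper names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def pmuludq(a, b):
    """One 64-bit lane of vpmuludq: multiply the low 32 bits of each operand."""
    return (a & MASK32) * (b & MASK32)

def mul64_lane(a, b):
    """64-bit multiply of one lane, following the instruction order in BB#4."""
    acc = pmuludq(a, b)                                    # lo(a) * lo(b)
    acc = (acc + (pmuludq(a, b >> 32) << 32)) & MASK64     # + (lo(a) * hi(b)) << 32
    acc = (acc + (pmuludq(a >> 32, b) << 32)) & MASK64     # + (hi(a) * lo(b)) << 32
    return acc

def sext32(x):
    """Model of cltq: sign-extend the low 32 bits to a signed 64-bit value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x
```

For any two 64-bit lane values, `mul64_lane(a, b)` equals `(a * b) & MASK64`. That is ten vector instructions plus the extract/insert round trip, where the scalar version needs a single imull.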

On Fri, Jun 26, 2015 at 3:49 AM, suyog sarda <sardask01 at gmail.com> wrote:

> > Is LLVM able to generate code for the following?
> >
> > %mul = mul <2 x i32> %1, %2, where %1 and %2 are of type <2 x i32>.
> >
> > I am running it on a Haswell processor with LLVM-3.4.2. It seems that it
> > generates really complicated code with vpaddq, vpmuludq, vpsllq, and
> > vpsrlq.
>
> Can you please elaborate on what your test case is and what final output
> you want to see? It would be good if you could share the test case you
> are running LLVM on.
>
> Regards,
> Suyog Sarda
>