<p dir="ltr"><br>
> For example, I have the following IR code,<br>
><br>
> for.cond.preheader: ; preds = %if.end18<br>
> %mul = mul i32 %12, %3<br>
> %cmp21128 = icmp sgt i32 %mul, 0<br>
> br i1 %cmp21128, label %for.body.preheader, label %return<br>
><br>
> for.body.preheader: ; preds = %for.cond.preheader<br>
> %19 = mul i32 %12, %3<br>
> %20 = add i32 %19, -1<br>
> %21 = zext i32 %20 to i64<br>
> %22 = add i64 %21, 1<br>
> %end.idx = add i64 %21, 1<br>
> %n.vec = and i64 %22, 8589934584<br>
> %cmp.zero = icmp eq i64 %n.vec, 0<br>
> br i1 %cmp.zero, label %middle.block, label %vector.ph<br>
><br>
> The corresponding assembly code is:<br>
> # BB#3: # %for.cond.preheader <br>
> imull %r9d, %ebx <br>
> testl %ebx, %ebx <br>
> jle .LBB10_63 <br>
> # BB#4: # %for.body.preheader <br>
> leal -1(%rbx), %eax <br>
> incq %rax <br>
> xorl %edx, %edx <br>
> movabsq $8589934584, %rcx # imm = 0x1FFFFFFF8 <br>
> andq %rax, %rcx <br>
> je .LBB10_8 <br>
><br>
> I changed all the scalar operands to <2 x ValueType> ones. The IR becomes the following<br>
> for.cond.preheader: ; preds = %if.end18<br>
> %mulS44_D = mul <2 x i32> %splatLDS24_D.splat, %splatLDS7_D.splat<br>
> %cmp21128S45_D = icmp sgt <2 x i32> %mulS44_D, zeroinitializer<br>
> %sextS46_D = sext <2 x i1> %cmp21128S45_D to <2 x i64><br>
> %BCS46_D = bitcast <2 x i64> %sextS46_D to i128<br>
> %mskS46_D = icmp ne i128 %BCS46_D, 0<br>
> br i1 %mskS46_D, label %for.body.preheader, label %return<br>
><br>
> for.body.preheader: ; preds = %for.cond.preheader<br>
> %S47_D = mul <2 x i32> %splatLDS24_D.splat, %splatLDS7_D.splat<br>
> %S48_D = add <2 x i32> %S47_D, &lt;i32 -1, i32 -1&gt;<br>
> %S49_D = zext <2 x i32> %S48_D to <2 x i64><br>
> %S50_D = add <2 x i64> %S49_D, &lt;i64 1, i64 1&gt;<br>
> %end.idxS51_D = add <2 x i64> %S49_D, &lt;i64 1, i64 1&gt;<br>
> %n.vecS52_D = and <2 x i64> %S50_D, &lt;i64 8589934584, i64 8589934584&gt;<br>
> %cmp.zeroS53_D = icmp eq <2 x i64> %n.vecS52_D, zeroinitializer<br>
> %sextS54_D = sext <2 x i1> %cmp.zeroS53_D to <2 x i64><br>
> %BCS54_D = bitcast <2 x i64> %sextS54_D to i128<br>
> %mskS54_D = icmp ne i128 %BCS54_D, 0<br>
> br i1 %mskS54_D, label %middle.block, label %vector.ph<br>
> <br>
> Now the assembly for the above IR code is:<br>
> # BB#4: # %for.cond.preheader<br>
> vmovdqa 144(%rsp), %xmm0 # 16-byte Reload<br>
> vpmuludq %xmm7, %xmm0, %xmm2<br>
> vpsrlq $32, %xmm7, %xmm4<br>
> vpmuludq %xmm4, %xmm0, %xmm4<br>
> vpsllq $32, %xmm4, %xmm4<br>
> vpaddq %xmm4, %xmm2, %xmm2<br>
> vpsrlq $32, %xmm0, %xmm4<br>
> vpmuludq %xmm7, %xmm4, %xmm4<br>
> vpsllq $32, %xmm4, %xmm4<br>
> vpaddq %xmm4, %xmm2, %xmm2<br>
> vpextrq $1, %xmm2, %rax<br>
> cltq<br>
> vmovq %rax, %xmm4<br>
> vmovq %xmm2, %rax<br>
> cltq<br>
> vmovq %rax, %xmm5<br>
> vpunpcklqdq %xmm4, %xmm5, %xmm4 # xmm4 = xmm5[0],xmm4[0]<br>
> vpcmpgtq %xmm3, %xmm4, %xmm3<br>
> vptest %xmm3, %xmm3<br>
> je .LBB10_66<br>
> # BB#5: # %for.body.preheader<br>
> vpaddq %xmm15, %xmm2, %xmm3<br>
> vpand %xmm15, %xmm3, %xmm3<br>
> vpaddq .LCPI10_1(%rip), %xmm3, %xmm8<br>
> vpand .LCPI10_5(%rip), %xmm8, %xmm5<br>
> vpxor %xmm4, %xmm4, %xmm4<br>
> vpcmpeqq %xmm4, %xmm5, %xmm6<br>
> vptest %xmm6, %xmm6<br>
> jne .LBB10_9<br>
></p>
<p dir="ltr">As Mats pointed out, this may be the same problem as:<br>
<a href="https://llvm.org/bugs/show_bug.cgi?id=22703">https://llvm.org/bugs/show_bug.cgi?id=22703</a></p>
<p dir="ltr">Basically, the code is generated for AVX2, where the XMM registers are 128 bits wide. Some of the operations above are on <2 x i32>, which occupies only 64 bits of such a register, and they involve sext to <2 x i64>, bitcast, etc. Hence the generated code has extra vector instructions.</p>
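<p dir="ltr">To make the extra instructions concrete, here is a rough Python model of what the three vpmuludq / vpsllq / vpaddq groups and the cltq pair in %for.cond.preheader appear to compute. This is only my reading of the listing, not something taken from the backend: it assumes each 64-bit lane holds one promoted i32 value, and the helper names (pmuludq, mul64_lane, sext32_in_lane, any_lane_positive) are made up for illustration.</p>

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def pmuludq(x, y):
    """Model one 64-bit lane of vpmuludq: unsigned product of the low 32 bits."""
    return (x & MASK32) * (y & MASK32)


def mul64_lane(a, b):
    """64-bit lane multiply assembled from three 32x32 partial products.

    Mirrors the listing: lo*lo, plus (lo_a * hi_b) << 32, plus (hi_a * lo_b) << 32.
    The hi*hi product is not needed because it only affects bits above 63.
    """
    lo_lo = pmuludq(a, b)
    lo_hi = (pmuludq(a, b >> 32) << 32) & MASK64
    hi_lo = (pmuludq(a >> 32, b) << 32) & MASK64
    return (lo_lo + lo_hi + hi_lo) & MASK64


def sext32_in_lane(x):
    """Model cltq: sign-extend the low 32 bits of a lane (returned as a signed int)."""
    x &= MASK32
    return x - (1 << 32) if x & (1 << 31) else x


def any_lane_positive(a_lanes, b_lanes):
    """Model the branch: true if any lane's 32-bit product is greater than zero."""
    return any(sext32_in_lane(mul64_lane(a, b)) > 0
               for a, b in zip(a_lanes, b_lanes))
```

<p dir="ltr">If this reading is right, the scalar imull / testl pair turns into a full 64-bit lane multiply plus a round trip through general-purpose registers for the sign extension, which would account for most of the additional instructions in that block.</p>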
<p dir="ltr">Regards,<br>
Suyog Sarda</p>