<div dir="ltr"><div>It seems that it's hard to vectorize int64 operations in LLVM. For example, LLVM 3.4 generates very complicated, scalarized code for the following IR. I am running on a Haswell processor. Is it because there are no equivalent AVX/AVX2 instructions for int64? The same thing also happens with zext &lt;2 x i32&gt; -&gt; &lt;2 x i64&gt; and trunc &lt;2 x i64&gt; -&gt; &lt;2 x i32&gt;. Any ideas on how to optimize these instructions? Thanks.</div><div><br></div>%sub.ptr.sub.i6.i.i.i.i = sub &lt;2 x i64&gt; %sub.ptr.lhs.cast.i4.i.i.i.i, %sub.ptr.rhs.cast.i5.i.i.i.i<div>%sub.ptr.div.i7.i.i.i.i = sdiv &lt;2 x i64&gt; %sub.ptr.sub.i6.i.i.i.i, &lt;i64 24, i64 24&gt;<br></div><div><br></div><div>Generated assembly:</div><div><div>    vpsubq  %xmm6, %xmm5, %xmm5</div><div>    vmovq   %xmm5, %rax</div><div>    movabsq $3074457345618258603, %rbx # imm = 0x2AAAAAAAAAAAAAAB</div><div>    imulq   %rbx</div><div>    movq    %rdx, %rcx</div><div>    movq    %rcx, %rax</div><div>    shrq    $63, %rax</div><div>    shrq    $2, %rcx</div><div>    addl    %eax, %ecx</div><div>    vpextrq $1, %xmm5, %rax</div><div>    imulq   %rbx</div><div>    movq    %rdx, %rax</div><div>    shrq    $63, %rax</div><div>    shrq    $2, %rdx</div><div>    addl    %eax, %edx</div><div>    movslq  %edx, %rax</div><div>    vmovq   %rax, %xmm5</div><div>    movslq  %ecx, %rax</div><div>    vmovq   %rax, %xmm6</div><div>    vpunpcklqdq %xmm5, %xmm6, %xmm5 # xmm5 = xmm6[0],xmm5[0]</div></div></div>
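The scalar sequence above looks like the standard "magic number" lowering of signed division by a constant, applied to each lane separately. Each lane is extracted with vmovq/vpextrq and multiplied by 0x2AAAAAAAAAAAAAAB (which equals ceil(2^66 / 24)). The high 64 bits of the signed 128-bit product (which imulq leaves in %rdx) are then shifted right by 2, and the sign bit is added so the result rounds toward zero. One plausible reason the lanes are not kept in vector registers is that AVX2 has no packed 64-bit multiply at all: vpmuldq only multiplies the low 32 bits of each 64-bit lane, and vpmullq first appears in AVX-512DQ. The following is a minimal Python sketch of the per-lane arithmetic, assuming two's-complement int64 inputs. The function names are hypothetical, and the sketch uses the generic 64-bit form; the emitted code finishes the add in 32-bit registers (addl, then movslq), so it is not an instruction-for-instruction transcription.

```python
MAGIC = 0x2AAAAAAAAAAAAAAB  # constant from the movabsq above, ceil(2**66 / 24)
SHIFT = 2                   # post-shift paired with MAGIC for divisor 24


def mulhs64(a, b):
    """High 64 bits of the signed 128-bit product a*b (what imulq leaves in %rdx).

    Python's >> on a negative int is an arithmetic (floor) shift, which matches
    the signed high half of the two's-complement product.
    """
    return (a * b) >> 64


def sdiv24(x):
    """Signed int64 division by 24, truncating toward zero, via multiply-high."""
    hi = mulhs64(x, MAGIC)
    q = hi >> SHIFT           # arithmetic shift right by 2
    q += (hi >> 63) & 1       # +1 when hi is negative (the shrq $63 / add step)
    return q


def sdiv24_v2(lanes):
    """Hypothetical stand-in for the <2 x i64> sdiv: each lane is handled in
    scalar code, mirroring the vmovq/vpextrq extraction and vpunpcklqdq repack."""
    return [sdiv24(x) for x in lanes]
```

For example, `sdiv24_v2([-25, 48])` divides each lane independently and truncates toward zero, giving `[-1, 2]`.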