[PATCH] D28537: [X86][AVX512] Add support for ASHR v2i64/v4i64 support without VLX

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 11 18:05:11 PST 2017


craig.topper added inline comments.


================
Comment at: test/CodeGen/X86/avx512-cvt.ll:943
 ; KNL-NEXT:    vcmpltps %xmm0, %xmm1, %xmm0
-; KNL-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0,1],zero,xmm0[1]
+; KNL-NEXT:    vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
+; KNL-NEXT:    vpsllq $32, %xmm0, %xmm0
----------------
craig.topper wrote:
> delena wrote:
> > I think that this issue should be resolved, or at least understood prior to commit.
> FYI, this test case does something really terrible when avx512vl is enabled without avx512dq. It gets completely scalarized due to the lack of vpmovm2d.
What's happening here is that the setcc from the cmp gets its result type converted from v2i1 to v2i32 and then to v2i64. Then the operands get legalized from v2i32 to v4i32, and we end up with (v2i64 sign_extend (v2i32 extract_subvector (v4i32 setcc))). The sign_extend becomes a sign_extend_inreg, which is now implemented with a v2i64 vshli(32) and vsrai(32).

Previously, because v2i64 vsrai wasn't legal, we lowered it to a v4i32 vsrai(31) and a shuffle. A later shuffle only wanted the elements of that shuffle that didn't come from the VSRAI, so the VSRAI got removed. And I think the vshli(32) was able to get combined with other shuffles to produce the INSERTPS.
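
To make the two lowerings concrete, here is a rough Python model of the arithmetic only. It is not LLVM code: the helper names are made up for illustration, lanes are modelled as unsigned Python integers, and the v2i64-to-v4i32 bitcast assumes little-endian element order (low half first).

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def shl64(x, n):
    """Logical left shift of one i64 lane (models vshli on v2i64)."""
    return (x << n) & MASK64

def sra64(x, n):
    """Arithmetic right shift of one i64 lane (models vsrai on v2i64)."""
    x &= MASK64
    signed = x - (1 << 64) if x & (1 << 63) else x
    return (signed >> n) & MASK64

def sra32(x, n):
    """Arithmetic right shift of one i32 lane (models vsrai on v4i32)."""
    x &= MASK32
    signed = x - (1 << 32) if x & (1 << 31) else x
    return (signed >> n) & MASK32

def sext_inreg_32(x):
    """sign_extend_inreg from i32 inside an i64 lane: shl 32, then sra 32."""
    return sra64(shl64(x, 32), 32)

def sra64_by_32_via_v4i32(lanes64):
    """v2i64 sra 32 emulated with a v4i32 sra 31 plus a shuffle."""
    # Bitcast v2i64 -> v4i32: [lo0, hi0, lo1, hi1].
    v4 = []
    for x in lanes64:
        v4 += [x & MASK32, (x >> 32) & MASK32]
    # v4i32 sra 31 turns every element into all-zeros or all-ones.
    signs = [sra32(e, 31) for e in v4]
    # Shuffle: each result lane takes its low half from the ORIGINAL high
    # half and its high half from the sign fill of that high half.
    out4 = [v4[1], signs[1], v4[3], signs[3]]
    return [out4[0] | (out4[1] << 32), out4[2] | (out4[3] << 32)]
```

Note that in the emulated form only the high half of each result lane comes from the arithmetic shift; the low half is a plain copy of an input element. That is the property that lets a later shuffle which reads only the low halves drop the shift entirely.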

So the main issue is that setcc legalization for this case produces a v2i64 type that we don't need and can't recover from very well.


Repository:
  rL LLVM

https://reviews.llvm.org/D28537




