[PATCH] D43441: [X86][AVX512DQ] Use packed instructions for scalar FP<->i64 conversions on 32-bit targets (PR31630)

Mon May 14 14:49:58 PDT 2018

craig.topper added inline comments.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:25279
+      SDValue Res = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, VecInVT,
+                                DAG.getConstantFP(0.0, dl, VecInVT),
+                                Src, ZeroIdx);
----------------
delena wrote:
> Why do you need to insert into zero vector? Can you insert to undef?
I think so. I asked the same question before I commandeered this patch. It's probably no worse than the widening with undef that we already do for v2f32 legalization.

================
Comment at: test/CodeGen/X86/scalar-fp-to-i64.ll:541
+; AVX512DQVL_32_LIN:       # %bb.0:
+; AVX512DQVL_32_LIN-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
+; AVX512DQVL_32_LIN-NEXT:    vcvttpd2uqq %ymm0, %ymm0
----------------
delena wrote:
> Can the memory operand be folded here?
> VCVTTPD2UQQ ymm1 {k1}{z},ymm2/m256/**m64bcst**
We'd have to detect the load, and whether it can be folded, in this lowering code. Alternatively, we could use undef for the upper elts and add a DAG combine that turns an insert into undef into a broadcast when the load is foldable.
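
To make the lane argument concrete, here is a rough standalone C++ model, not LLVM code. All names in it (V4F64, packedTruncToI64, insertIntoZero, broadcast) are made up for illustration, and the packed convert is modelled as a plain lane-wise cast. It only shows that the scalar result read from element 0 is the same whether the upper elements are zero or a broadcast copy of the source, which is what would let a combine pick the foldable form.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a 4 x f64 and a 4 x i64 vector value.
using V4F64 = std::array<double, 4>;
using V4I64 = std::array<std::int64_t, 4>;

// Lane-wise truncating conversion; a simplified model of a packed
// fp-to-int convert. Inputs are assumed to be in range for int64_t.
V4I64 packedTruncToI64(const V4F64 &v) {
  V4I64 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = static_cast<std::int64_t>(v[i]);
  return r;
}

// Models the current lowering: scalar in element 0, zeros above it.
V4F64 insertIntoZero(double x) { return {x, 0.0, 0.0, 0.0}; }

// Models the broadcast form: every element is a copy of the scalar.
V4F64 broadcast(double x) { return {x, x, x, x}; }

// Only element 0 of the packed result is used as the scalar answer.
std::int64_t scalarViaZero(double x) {
  return packedTruncToI64(insertIntoZero(x))[0];
}

std::int64_t scalarViaBroadcast(double x) {
  return packedTruncToI64(broadcast(x))[0];
}
```

In this model scalarViaZero(x) and scalarViaBroadcast(x) agree for any in-range x, because the upper elements never feed element 0. It says nothing about what the real instruction does with the upper elements.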

https://reviews.llvm.org/D43441