[PATCH] D43441: [X86][AVX512DQ] Use packed instructions for scalar FP<->i64 conversions on 32-bit targets (PR31630)
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 14 14:49:58 PDT 2018
craig.topper added inline comments.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:25279
+  SDValue Res = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, VecInVT,
+                            DAG.getConstantFP(0.0, dl, VecInVT),
+                            Src, ZeroIdx);
----------------
delena wrote:
> Why do you need to insert into zero vector? Can you insert to undef?
I think so. I asked the same question before I commandeered it. It's probably no worse than the widening with undef we do for v2f32 legalization.
================
Comment at: test/CodeGen/X86/scalar-fp-to-i64.ll:541
+; AVX512DQVL_32_LIN: # %bb.0:
+; AVX512DQVL_32_LIN-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
+; AVX512DQVL_32_LIN-NEXT: vcvttpd2uqq %ymm0, %ymm0
----------------
delena wrote:
> Can the memory operand be folded here?
> VCVTTPD2UQQ ymm1 {k1}{z},ymm2/m256/**m64bcst**
We'd have to detect the load and the possibility of folding it during this lowering code. Or we'd have to use undef for the upper elements and add a DAG combine to turn an insert into undef into a broadcast if it's foldable.
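To illustrate why a broadcast would be an acceptable substitute when the upper elements are don't-care, here is a rough standalone model in plain C++. It is not LLVM code and not the patch's implementation: the types and the functions loadScalarZeroUpper, loadBroadcast, convertTruncUnsigned, viaZeroInsert and viaBroadcast are hypothetical names made up for this sketch, and the conversion is only modelled for non-negative, in-range inputs. It shows that element 0 of the result is the same whether the upper lanes are zeroed (the vmovsd form) or filled by an m64bcst-style broadcast.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane stand-ins for a ymm register of doubles / u64s.
using V4F64 = std::array<double, 4>;
using V4U64 = std::array<std::uint64_t, 4>;

// Models "vmovsd xmm0 = mem[0],zero": lane 0 from memory, upper lanes zero.
V4F64 loadScalarZeroUpper(const double *Mem) {
  return {*Mem, 0.0, 0.0, 0.0};
}

// Models an m64bcst operand: one 64-bit element replicated to every lane.
V4F64 loadBroadcast(const double *Mem) {
  return {*Mem, *Mem, *Mem, *Mem};
}

// Per-lane truncating double -> u64 conversion, a simplified stand-in for
// vcvttpd2uqq. Assumption: inputs are non-negative and below 2^64, so the
// out-of-range behaviour of the real instruction is not modelled.
V4U64 convertTruncUnsigned(const V4F64 &V) {
  V4U64 R{};
  for (std::size_t I = 0; I < V.size(); ++I) {
    assert(V[I] >= 0.0 && V[I] < 18446744073709551616.0);
    R[I] = static_cast<std::uint64_t>(V[I]);
  }
  return R;
}

// Scalar result = element 0 of the packed conversion (zeroed upper lanes).
std::uint64_t viaZeroInsert(double X) {
  return convertTruncUnsigned(loadScalarZeroUpper(&X))[0];
}

// Same, but with the load expressed as a broadcast of the scalar.
std::uint64_t viaBroadcast(double X) {
  return convertTruncUnsigned(loadBroadcast(&X))[0];
}
```

Since only element 0 is extracted, both paths agree for any input this model accepts, e.g. viaZeroInsert(3.75) and viaBroadcast(3.75) both give 3. Whether the real combine is profitable or safe to fold is a separate question this sketch doesn't address.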
https://reviews.llvm.org/D43441