[PATCH] D58197: [x86] vectorize more cast ops in lowering to avoid register file transfers
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 22 07:55:23 PST 2019
spatel marked an inline comment as done.
spatel added inline comments.
================
Comment at: llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll:5874
+; AVX512VLDQ-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,1,2,3]
+; AVX512VLDQ-NEXT: vcvtudq2pd %xmm0, %ymm0
+; AVX512VLDQ-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
----------------
spatel wrote:
> efriedma wrote:
> > Why not "vcvtudq2pd %xmm0, %xmm0"?
> We're only matching the generic UINT_TO_FP node, so we go from <4 x i32> to <4 x double>. That's also why the SSE targets don't get the similar SINT_TO_FP test just above this one. I can look into how the SINT_TO_FP example gets narrowed and try to make that happen here too.
>
> There's no documentation saying that the generic nodes can change the number of elements in the vector, so I'm assuming they don't have that ability. Currently, we use X86ISD::CVTSI2P for those patterns, so I think we need to extend the matching logic to handle that case in order to solve this more completely.
rL354675
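For anyone following along, here is a minimal IR sketch of the kind of pattern being discussed (illustrative only; the function name and shuffle mask are made up and this is not the exact test in vec_int_to_fp.ll). The uitofp source elements come out of a wider vector, and because the generic UINT_TO_FP lowering keeps the element count, the checked asm above ends up with a <4 x i32> -> <4 x double> (xmm -> ymm) conversion whose result is only ever used as an xmm:

define <2 x double> @uitofp_shuffled_elts(<4 x i32> %x) {
  ; hypothetical example: pull two lanes out of the wider source vector
  %elts = shufflevector <4 x i32> %x, <4 x i32> undef, <2 x i32> <i32 3, i32 1>
  ; the generic UINT_TO_FP keeps the element count, so on AVX512VLDQ this
  ; lowers to vcvtudq2pd producing a ymm, of which only the low xmm is used
  %cvt = uitofp <2 x i32> %elts to <2 x double>
  ret <2 x double> %cvt
}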
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D58197/new/
https://reviews.llvm.org/D58197