[PATCH] D58197: [x86] vectorize more cast ops in lowering to avoid register file transfers

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 22 06:38:29 PST 2019


spatel marked an inline comment as done.
spatel added inline comments.


================
Comment at: llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll:5874
+; AVX512VLDQ-NEXT:    vpermilps {{.*#+}} xmm0 = xmm0[3,1,2,3]
+; AVX512VLDQ-NEXT:    vcvtudq2pd %xmm0, %ymm0
+; AVX512VLDQ-NEXT:    # kill: def $xmm0 killed $xmm0 killed $ymm0
----------------
efriedma wrote:
> Why not "vcvtudq2pd %xmm0, %xmm0"?
We're only matching the generic UINT_TO_FP node, so we go from <4 x i32> to <4 x double>. That's also why the SSE targets don't get this transform on the similar SINT_TO_FP test just above this one. I can look into how the SINT_TO_FP example gets narrowed and try to make that happen here too.
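
For context, here's a minimal sketch (not the actual test case; the function name is made up) of the kind of element-count-preserving conversion that the generic node can represent directly, which is why the ymm-wide vcvtudq2pd gets formed:

  ; Sketch only: a v4i32 -> v4f64 conversion; input and output have the
  ; same number of elements, so the generic UINT_TO_FP node applies and
  ; the result occupies a full ymm register.
  define <4 x double> @sketch_uitofp_v4i32_v4f64(<4 x i32> %x) {
    %c = uitofp <4 x i32> %x to <4 x double>
    ret <4 x double> %c
  }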

There's no documentation that the generic nodes can change the number of elements in the vector, so I'm assuming they don't have that ability. Currently, we use X86ISD::CVTSI2P for those patterns, so I think we need to extend the matching logic to handle that case in order to solve this more completely.
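
For illustration, a sketch (again a made-up function name, not taken from the test file) of the element-count-changing case that I believe currently goes through X86ISD::CVTSI2P: the v2i32 source gets widened to v4i32 during legalization, but only two elements are converted, so the input and output element counts differ:

  ; Sketch only: a v2i32 -> v2f64 conversion; after the source is widened
  ; to v4i32, the element counts of input and output no longer match,
  ; which is the case the X86-specific node handles.
  define <2 x double> @sketch_sitofp_v2i32_v2f64(<2 x i32> %x) {
    %c = sitofp <2 x i32> %x to <2 x double>
    ret <2 x double> %c
  }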


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58197/new/

https://reviews.llvm.org/D58197
