[PATCH] D22064: [X86] Make some cast costs more precise

Thu Jul 7 10:44:51 PDT 2016

mkuper added inline comments.

================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:540
@@ -539,3 +539,3 @@
     { ISD::UINT_TO_FP,  MVT::v8f32,  MVT::v8i64,  1 },    
     { ISD::UINT_TO_FP,  MVT::v8f64,  MVT::v8i64,  1 },
 
----------------
delena wrote:
> mkuper wrote:
> > RKSimon wrote:
> > > Depending on how thorough we need to be, shouldn't there be AVX512DQ+AVX512VL UINT_TO_FP cases for 128/256-bit vectors?
> > Probably.
> > I'd rather leave that to the Intel folks; they can probably get more precise numbers for SKX.
> In this case, even if you have only DQ without VL, the conversion is in ZMM instead of YMM, but the cost is the same.
We don't do this right now; see below.
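For context on what "having the cases" means: entries like the two above are {ISD opcode, destination type, source type, cost} rows that must match the queried conversion exactly. The following is a standalone C++17 model, not LLVM code. `Op`, `VT`, `ConvEntry`, `DQTable` and `lookupCost` are hypothetical stand-ins (LLVM's own helper for this is `ConvertCostTableLookup`). It shows why a 256-bit query misses a table that only has 512-bit rows.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical stand-ins for LLVM's ISD opcodes and MVT value types.
enum class Op { SINT_TO_FP, UINT_TO_FP };
enum class VT { v4i64, v8i64, v4f64, v8f32, v8f64 };

// Same {ISD, Dst, Src, Cost} shape as the entries in the diff context.
struct ConvEntry {
  Op ISD;
  VT Dst;
  VT Src;
  unsigned Cost;
};

// Only the two 512-bit rows quoted in the review; no 128/256-bit rows.
constexpr ConvEntry DQTable[] = {
    {Op::UINT_TO_FP, VT::v8f32, VT::v8i64, 1},
    {Op::UINT_TO_FP, VT::v8f64, VT::v8i64, 1},
};

// Exact match on all three keys; the first hit wins. An empty result means
// the caller must fall back to some other estimate (other tables, or a
// generic cost such as scalarization).
template <std::size_t N>
constexpr std::optional<unsigned> lookupCost(const ConvEntry (&Table)[N],
                                             Op ISD, VT Dst, VT Src) {
  for (const ConvEntry &E : Table)
    if (E.ISD == ISD && E.Dst == Dst && E.Src == Src)
      return E.Cost;
  return std::nullopt;
}
```

In this model, covering the 128/256-bit DQ+VL cases would mean adding rows for those types to a table that is only consulted when VL is available.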

================
Comment at: test/Analysis/CostModel/X86/sitofp.ll:273
@@ -272,3 +272,3 @@
   ; AVX512F-LABEL: sitofpv4i64v4double
-  ; AVX512F: cost of 10 {{.*}} sitofp
+  ; AVX512F: cost of 13 {{.*}} sitofp
   %1 = sitofp <4 x i64> %a to <4 x double>
----------------
delena wrote:
> We should have a nicer cost for DQ here, because it handles all 64-bit integers, right?
Right now, we scalarize this unless we have VL.

That is, both AVX512F and AVX512F+DQ produce:

```
	vextracti128	$1, %ymm0, %xmm1
	vpextrq	$1, %xmm1, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm2
	vmovq	%xmm1, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm1
	vunpcklpd	%xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0]
	vpextrq	$1, %xmm0, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm2
	vmovq	%xmm0, %rax
	vcvtsi2sdq	%rax, %xmm0, %xmm0
	vunpcklpd	%xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
	vinsertf128	$1, %xmm1, %ymm0, %ymm0
	retq

```
And with VL:

```
	vcvtqq2pd	%ymm0, %ymm0
	retq
```
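To make the scalarized AVX512F listing easier to follow, here is a lane-level C++17 model of what it computes: 12 instructions (excluding `retq`) that together do four independent scalar conversions. This is an illustrative sketch, not compiler output; the type aliases and function names are made up, and each comment maps a step to the instruction it stands for.

```cpp
#include <array>
#include <cstdint>

using I64x2 = std::array<std::int64_t, 2>;
using I64x4 = std::array<std::int64_t, 4>;
using F64x2 = std::array<double, 2>;
using F64x4 = std::array<double, 4>;

// One 128-bit half: move each lane to a GPR, convert it with a scalar
// vcvtsi2sdq, then recombine the two doubles.
constexpr F64x2 convertHalf(const I64x2 &x) {
  double lo = static_cast<double>(x[0]);  // vmovq + vcvtsi2sdq
  double hi = static_cast<double>(x[1]);  // vpextrq $1 + vcvtsi2sdq
  return {lo, hi};                        // vunpcklpd: [lo, hi]
}

// Scalarized sitofp <4 x i64> -> <4 x double>.
constexpr F64x4 sitofpScalarized(const I64x4 &v) {
  const I64x2 low = {v[0], v[1]};         // %xmm0 (low half of %ymm0)
  const I64x2 high = {v[2], v[3]};        // vextracti128 $1
  const F64x2 rl = convertHalf(low);
  const F64x2 rh = convertHalf(high);
  return {rl[0], rl[1], rh[0], rh[1]};    // vinsertf128 $1
}
```

With VL, all of this collapses into the single `vcvtqq2pd %ymm0, %ymm0`, which computes the same four lanes.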

I guess we could, potentially, have a nicer sequence with DQ without VL (insert the ymm source into the low lanes of a zmm, run the 512-bit vcvtqq2pd, extract the low lanes), but we currently don't.
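A lane-level C++17 sketch of that DQ-without-VL widening idea, assuming only that the 512-bit `vcvtqq2pd zmm` form is usable with AVX512DQ alone. All names are hypothetical, and the upper lanes, which would be undefined in a real register, are zeroed here for determinism.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using I64x4 = std::array<std::int64_t, 4>;
using I64x8 = std::array<std::int64_t, 8>;
using F64x4 = std::array<double, 4>;
using F64x8 = std::array<double, 8>;

// Models the 512-bit vcvtqq2pd: convert all eight 64-bit lanes at once.
constexpr F64x8 cvtqq2pdZmm(const I64x8 &v) {
  F64x8 r{};
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = static_cast<double>(v[i]);
  return r;
}

// Widen, convert, narrow: only the low four results are kept.
constexpr F64x4 sitofpViaWidening(const I64x4 &v) {
  I64x8 wide{};                     // upper lanes are don't-care
  for (std::size_t i = 0; i < 4; ++i)
    wide[i] = v[i];                 // insert into the low lanes
  const F64x8 full = cvtqq2pdZmm(wide);
  F64x4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = full[i];                 // extract the low lanes
  return r;
}
```

This is the case where delena's note applies: the conversion runs in ZMM instead of YMM, but at the same cost, so a table entry for it would only be accurate once the lowering is actually implemented.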


http://reviews.llvm.org/D22064