[PATCH] D22064: [X86] Make some cast costs more precise
Michael Kuperstein via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 7 10:44:51 PDT 2016
mkuper added inline comments.
================
Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:540
@@ -539,3 +539,3 @@
{ ISD::UINT_TO_FP, MVT::v8f32, MVT::v8i64, 1 },
{ ISD::UINT_TO_FP, MVT::v8f64, MVT::v8i64, 1 },
----------------
delena wrote:
> mkuper wrote:
> > RKSimon wrote:
> > > Depending on how thorough we need to be, shouldn't there be AVX512DQ+AVX512VL UINT_TO_FP cases for 128/256-bit vectors?
> > Probably.
> > I'd rather leave that to the Intel folks, they can probably get more precise numbers for SKX.
> In this case, even if you have only DQ without VL, the conversion is in ZMM instead of YMM, but the cost is the same.
We don't do this right now; see below.
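Purely for illustration, DQ+VL entries could look like the rows below, in the same format as the existing table; the costs of 1 are assumptions rather than measured SKX numbers, and where a DQ+VL-gated table would actually live is a separate question:
```
// Hypothetical AVX512DQ+VL entries (costs are assumptions, not SKX measurements):
{ ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 1 }, // vcvtuqq2pd xmm
{ ISD::UINT_TO_FP, MVT::v4f64, MVT::v4i64, 1 }, // vcvtuqq2pd ymm
{ ISD::UINT_TO_FP, MVT::v4f32, MVT::v4i64, 1 }, // vcvtuqq2ps ymm -> xmm
```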
================
Comment at: test/Analysis/CostModel/X86/sitofp.ll:273
@@ -272,3 +272,3 @@
; AVX512F-LABEL: sitofpv4i64v4double
- ; AVX512F: cost of 10 {{.*}} sitofp
+ ; AVX512F: cost of 13 {{.*}} sitofp
%1 = sitofp <4 x i64> %a to <4 x double>
----------------
delena wrote:
> We should have a nicer cost for DQ here, because it handles all 64-bit integers, right?
Right now, we scalarize this unless we have VL.
That is, both AVX512F and AVX512F+DQ produce:
```
vextracti128 $1, %ymm0, %xmm1
vpextrq $1, %xmm1, %rax
vcvtsi2sdq %rax, %xmm0, %xmm2
vmovq %xmm1, %rax
vcvtsi2sdq %rax, %xmm0, %xmm1
vunpcklpd %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0]
vpextrq $1, %xmm0, %rax
vcvtsi2sdq %rax, %xmm0, %xmm2
vmovq %xmm0, %rax
vcvtsi2sdq %rax, %xmm0, %xmm0
vunpcklpd %xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
vinsertf128 $1, %xmm1, %ymm0, %ymm0
retq
```
And with VL:
```
vcvtqq2pd %ymm0, %ymm0
retq
```
I guess we could potentially have a nicer sequence with DQ but without VL (insert the low lanes, vcvtqq2pd at 512 bits, extract the low lanes), but we currently don't generate it.
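For what it's worth, a rough sketch of that hypothetical DQ-without-VL sequence (an assumption about what we could emit, not current codegen): since the v4i64 input in ymm0 already occupies the low half of zmm0, the insert step is implicit, and only the low ymm of the result needs to be kept:
```
# Hypothetical AVX512DQ (no VL) lowering; not what we emit today.
vcvtqq2pd %zmm0, %zmm0   # 512-bit signed convert, upper lanes are don't-care
retq                     # low ymm0 holds the v4f64 result
```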
http://reviews.llvm.org/D22064