[PATCH] D23797: [X86][SSE] Improve awareness of (v)cvtpd2ps implicit zeroing of upper 64-bits of xmm result
Michael Kuperstein via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 29 13:28:34 PDT 2016
mkuper added a comment.
Thanks Simon.
================
Comment at: lib/Target/X86/X86InstrSSE.td:2285
@@ -2284,1 +2284,3 @@
// Match fpround and fpextend for 128/256-bit conversions
+ def : Pat<(v4f32 (bitconvert (X86vzmovl (v2f64 (bitconvert
+ (v4f32 (X86vfpround (v2f64 VR128:$src)))))))),
----------------
I've only now realized we represent zeroing the two high lanes of a v4f32 with (v4f32 (bitconvert (X86vzmovl (v2f64 (bitconvert (v4f32 ...)))))) :-\
But there's nothing we can really do about this, right?
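
For readers less familiar with the instruction, here is a rough scalar sketch of what the 128-bit (v)cvtpd2ps form computes, and why the X86vzmovl wrapper in the pattern above is already implied by the instruction itself. This is not LLVM code: the function name cvtpd2psModel is hypothetical, and the model ignores rounding-mode and exception details.

```cpp
#include <array>

// Scalar model of the 128-bit (v)cvtpd2ps result, for illustration only.
// cvtpd2psModel is a hypothetical name, not an LLVM or intrinsic API.
// Lanes 0 and 1 hold the two converted doubles; lanes 2 and 3 (the upper
// 64 bits of the xmm result) are implicitly zero.
std::array<float, 4> cvtpd2psModel(const std::array<double, 2> &src) {
  std::array<float, 4> result{}; // value-initialized: every lane starts at 0.0f
  result[0] = static_cast<float>(src[0]); // low lane: first double narrowed to float
  result[1] = static_cast<float>(src[1]); // second lane: second double narrowed to float
  return result;                          // lanes 2 and 3 remain 0.0f
}
```

Under this model, zeroing the two high lanes after the conversion changes nothing, which is what lets the pattern fold the X86vzmovl into the conversion.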
================
Comment at: lib/Target/X86/X86IntrinsicsInfo.h:1887
@@ -1886,2 +1886,3 @@
X86_INTRINSIC_DATA(sse2_comineq_sd, COMI, X86ISD::COMI, ISD::SETNE),
+ X86_INTRINSIC_DATA(sse2_cvtpd2ps, INTR_TYPE_1OP, X86ISD::VFPROUND, 0),
X86_INTRINSIC_DATA(sse2_max_pd, INTR_TYPE_2OP, X86ISD::FMAX, 0),
----------------
This (and the change to the intrinsic test) can be a separate commit, right?
Also, any reason not to add avx_cvt_pd2_ps_256 as well?
Repository:
rL LLVM
https://reviews.llvm.org/D23797