[PATCH] [X86][SSE] Vector double -> float conversion memory folding (cvtpd2ps)
Simon Pilgrim
llvm-dev at redking.me.uk
Mon Dec 15 09:35:08 PST 2014
Hi qcolombet, spatel, andreadb,
Added a missing memory folding relationship for the (V)CVTPD2PS instruction (and its AVX variants) - we can safely fold these for stack reloads.
Follow up to http://reviews.llvm.org/D5981
I'd like to add the (V)CVTPS2PD and (V)CVTDQ2PD instructions as well but I'm hitting issues with irrelevant register/memory size differences in the ymm implementations - it reloads the whole ymm and then references the lower xmm as the src for the conversion. Any suggestions on how I should deal with this? The xmm versions seem to fold fine but I'd prefer to add them all at the same time in my next patch.
REPOSITORY
rL LLVM
http://reviews.llvm.org/D6663
Files:
lib/Target/X86/X86InstrInfo.cpp
test/CodeGen/X86/avx1-stack-reload-folding.ll
Index: lib/Target/X86/X86InstrInfo.cpp
===================================================================
--- lib/Target/X86/X86InstrInfo.cpp
+++ lib/Target/X86/X86InstrInfo.cpp
@@ -450,6 +450,7 @@
{ X86::CVTSS2SIrr, X86::CVTSS2SIrm, 0 },
{ X86::CVTDQ2PSrr, X86::CVTDQ2PSrm, TB_ALIGN_16 },
{ X86::CVTPD2DQrr, X86::CVTPD2DQrm, TB_ALIGN_16 },
+ { X86::CVTPD2PSrr, X86::CVTPD2PSrm, TB_ALIGN_16 },
{ X86::CVTPS2DQrr, X86::CVTPS2DQrm, TB_ALIGN_16 },
{ X86::CVTTPD2DQrr, X86::CVTTPD2DQrm, TB_ALIGN_16 },
{ X86::CVTTPS2DQrr, X86::CVTTPS2DQrm, TB_ALIGN_16 },
@@ -531,6 +532,7 @@
{ X86::VCVTSS2SIrr, X86::VCVTSS2SIrm, 0 },
{ X86::VCVTDQ2PSrr, X86::VCVTDQ2PSrm, 0 },
{ X86::VCVTPD2DQrr, X86::VCVTPD2DQXrm, 0 },
+ { X86::VCVTPD2PSrr, X86::VCVTPD2PSXrm, 0 },
{ X86::VCVTPS2DQrr, X86::VCVTPS2DQrm, 0 },
{ X86::VCVTTPD2DQrr, X86::VCVTTPD2DQXrm, 0 },
{ X86::VCVTTPS2DQrr, X86::VCVTTPS2DQrm, 0 },
@@ -569,6 +571,7 @@
// AVX 256-bit foldable instructions
{ X86::VCVTDQ2PSYrr, X86::VCVTDQ2PSYrm, 0 },
{ X86::VCVTPD2DQYrr, X86::VCVTPD2DQYrm, 0 },
+ { X86::VCVTPD2PSYrr, X86::VCVTPD2PSYrm, 0 },
{ X86::VCVTPS2DQYrr, X86::VCVTPS2DQYrm, 0 },
{ X86::VCVTTPD2DQYrr, X86::VCVTTPD2DQYrm, 0 },
{ X86::VCVTTPS2DQYrr, X86::VCVTTPS2DQYrm, 0 },
Index: test/CodeGen/X86/avx1-stack-reload-folding.ll
===================================================================
--- test/CodeGen/X86/avx1-stack-reload-folding.ll
+++ test/CodeGen/X86/avx1-stack-reload-folding.ll
@@ -37,6 +37,21 @@
ret void
}
+define void @stack_fold_cvtpd2ps(<128 x double>* %a, <128 x double>* %b, <128 x float>* %c) {
+ ;CHECK-LABEL: stack_fold_cvtpd2ps
+ ;CHECK: vcvtpd2psy {{[0-9]*}}(%rsp), {{%xmm[0-9][0-9]*}} {{.*#+}} 32-byte Folded Reload
+
+ %1 = load <128 x double>* %a
+ %2 = load <128 x double>* %b
+ %3 = fadd <128 x double> %1, %2
+ %4 = fsub <128 x double> %1, %2
+ %5 = fptrunc <128 x double> %3 to <128 x float>
+ %6 = fptrunc <128 x double> %4 to <128 x float>
+ %7 = fadd <128 x float> %5, %6
+ store <128 x float> %7, <128 x float>* %c
+ ret void
+}
+
define void @stack_fold_cvttpd2dq(<64 x double>* %a, <64 x double>* %b, <64 x i32>* %c) #0 {
;CHECK-LABEL: stack_fold_cvttpd2dq
;CHECK: vcvttpd2dqy {{[0-9]*}}(%rsp), {{%xmm[0-9][0-9]*}} {{.*#+}} 32-byte Folded Reload
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D6663.17292.patch
Type: text/x-patch
Size: 2624 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141215/2c0ac001/attachment.bin>
More information about the llvm-commits
mailing list