[llvm-commits] CVS: llvm/lib/Target/X86/README.txt

Chris Lattner lattner at cs.uiuc.edu
Tue Apr 18 22:53:39 PDT 2006



Changes in directory llvm/lib/Target/X86:

README.txt updated: 1.92 -> 1.93
---
Log message:

Add a note.


---
Diffs of the changes:  (+58 -0)

 README.txt |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 58 insertions(+)


Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.92 llvm/lib/Target/X86/README.txt:1.93
--- llvm/lib/Target/X86/README.txt:1.92	Mon Apr 17 22:45:01 2006
+++ llvm/lib/Target/X86/README.txt	Wed Apr 19 00:53:27 2006
@@ -996,3 +996,61 @@
 	movaps %xmm3, %xmm2
 	movaps %xmm4, %xmm3
 	jne LBB_main_4	# cond_true44
+
+//===---------------------------------------------------------------------===//
+
+Use the zeros that a movss load from memory puts in the upper elements (and
+that other instructions generate) to build vectors more efficiently.  Consider:
+
+vector float test(float a) {
+ return (vector float){ 0.0, a, 0.0, 0.0}; 
+}
+
+We currently generate this as:
+
+_test:
+        sub %ESP, 28
+        movss %XMM0, DWORD PTR [%ESP + 32]
+        movss DWORD PTR [%ESP + 4], %XMM0
+        mov DWORD PTR [%ESP + 12], 0
+        mov DWORD PTR [%ESP + 8], 0
+        mov DWORD PTR [%ESP], 0
+        movaps %XMM0, XMMWORD PTR [%ESP]
+        add %ESP, 28
+        ret
+
+Something like this should be sufficient:
+
+_test:
+	movss %XMM0, DWORD PTR [%ESP + 4]
+	shufps %XMM0, %XMM0, 81
+	ret
+
+... which takes advantage of the zero elements provided by movss.
+Even xor'ing a register and shufps'ing it would be better than the
+code we currently generate.
+
+Likewise, for this:
+
+vector float test(float a, float b) {
+ return (vector float){ b, a, 0.0, 0.0}; 
+}
+
+_test:
+        pxor %XMM0, %XMM0
+        movss %XMM1, %XMM0
+        movss %XMM2, DWORD PTR [%ESP + 4]
+        unpcklps %XMM2, %XMM1
+        movss %XMM0, DWORD PTR [%ESP + 8]
+        unpcklps %XMM0, %XMM1
+        unpcklps %XMM0, %XMM2
+        ret
+
+... where we do use pxor.  It would be better to use the zeroed
+elements that movss provides, turning this into 2 shufps's instead
+of 3 unpcklps's.
+
+Another example: {0.0, 0.0, a, b }
+
+//===---------------------------------------------------------------------===//
+
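For readers checking the shufps immediates in the note, a small portable C
model of the lane selection may help. It is only a sketch written for
illustration, not code from LLVM: the names vec4, shufps, movss_load,
build_0a00 and build_ba00 are hypothetical, and lane 0 is taken to be the
lowest element. The immediate 81 comes from the note; the pair 68 and 88 is
an assumption, just one possible way to spell the "2 shufps's" variant.

```c
/* Four float lanes; lane[0] models the lowest element of an XMM register. */
typedef struct { float lane[4]; } vec4;

/* Model of "shufps dst, src, imm": the two low result lanes are picked
   from dst and the two high result lanes from src, two immediate bits
   per lane. */
static vec4 shufps(vec4 dst, vec4 src, unsigned imm) {
    vec4 r;
    r.lane[0] = dst.lane[imm & 3];
    r.lane[1] = dst.lane[(imm >> 2) & 3];
    r.lane[2] = src.lane[(imm >> 4) & 3];
    r.lane[3] = src.lane[(imm >> 6) & 3];
    return r;
}

/* Model of a movss load from memory: the value lands in lane 0 and the
   upper three lanes are zeroed. */
static vec4 movss_load(float x) {
    vec4 r = {{ x, 0.0f, 0.0f, 0.0f }};
    return r;
}

/* { 0.0, a, 0.0, 0.0 }: 81 is 01 01 00 01 in binary, so the result lanes
   take source lanes 1, 0, 1, 1, which are 0, a, 0, 0. */
static vec4 build_0a00(float a) {
    vec4 v = movss_load(a);
    return shufps(v, v, 81);
}

/* { b, a, 0.0, 0.0 } with two shuffles (assumed immediates):
   68 gives { b, 0, a, 0 }, then 88 gives { b, a, 0, 0 }. */
static vec4 build_ba00(float a, float b) {
    vec4 vb = movss_load(b);
    vec4 va = movss_load(a);
    vec4 t = shufps(vb, va, 68);
    return shufps(t, t, 88);
}
```

Calling build_0a00(3.0f) in this model yields lanes 0, 3, 0, 0, and
build_ba00(3.0f, 5.0f) yields lanes 5, 3, 0, 0. No pxor is needed in either
case, because every zero lane comes from the movss loads.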





