[llvm-commits] CVS: llvm/lib/Target/PowerPC/README_ALTIVEC.txt

Chris Lattner lattner at cs.uiuc.edu
Sun Mar 26 23:41:13 PST 2006



Changes in directory llvm/lib/Target/PowerPC:

README_ALTIVEC.txt updated: 1.1 -> 1.2
---
Log message:

Add a bunch of notes from my journey thus far.


---
Diffs of the changes:  (+103 -9)

 README_ALTIVEC.txt |  112 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 103 insertions(+), 9 deletions(-)


Index: llvm/lib/Target/PowerPC/README_ALTIVEC.txt
diff -u llvm/lib/Target/PowerPC/README_ALTIVEC.txt:1.1 llvm/lib/Target/PowerPC/README_ALTIVEC.txt:1.2
--- llvm/lib/Target/PowerPC/README_ALTIVEC.txt:1.1	Mon Mar 27 01:04:16 2006
+++ llvm/lib/Target/PowerPC/README_ALTIVEC.txt	Mon Mar 27 01:41:00 2006
@@ -1,11 +1,5 @@
 //===- README_ALTIVEC.txt - Notes for improving Altivec code gen ----------===//
 
-Implement TargetConstantVec, and set up PPC to custom lower ConstantVec into
-TargetConstantVec's if it's one of the many forms that are algorithmically
-computable using the spiffy altivec instructions.
-
-//===----------------------------------------------------------------------===//
-
 Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector
 registers, to generate better spill code.
 
@@ -31,8 +25,6 @@
 Altivec: Codegen'ing MUL with vector FMADD should add -0.0, not 0.0:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8763
 
-We need to codegen -0.0 vector efficiently (no constant pool load).
-
 When -ffast-math is on, we can use 0.0.
 
 //===----------------------------------------------------------------------===//
@@ -48,7 +40,109 @@
 //===----------------------------------------------------------------------===//
 
 There are a wide range of vector constants we can generate with combinations of
-altivec instructions.  For example, GCC does: t=vsplti*, r = t+t.
+altivec instructions.  Examples:
+ GCC does: "t=vsplti*, r = t+t"  for constants it can't generate with one vsplti
+
+ -0.0 (sign bit):  vspltisw v0,-1 / vslw v0,v0,v0
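+
+ A scalar sketch of the -0.0 sequence (assumes 32-bit words; the function
+ name is illustrative): vspltisw splats -1 (all ones) into each word, and
+ vslw shifts each word left by the low 5 bits of the shift word, i.e. by 31:
+
+   #include <stdint.h>
+   uint32_t sign_bit(void) {
+     uint32_t w = 0xFFFFFFFFu;  /* vspltisw v0,-1: each word all ones    */
+     return w << (w & 31);      /* vslw v0,v0,v0: 0x80000000, i.e. -0.0  */
+   }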
+
+//===----------------------------------------------------------------------===//
+
+Missing intrinsics:
+
+ds*
+lve*
+lvs*
+lvx*
+mf*
+st*
+vavg*
+vexptefp
+vlogefp
+vmax*
+vmhaddshs/vmhraddshs
+vmin*
+vmladduhm
+vmr*
+vmsum*
+vmul*
+vperm
+vpk*
+vr*
+vsel (some aliases only accessible using builtins)
+vsl* (except vsldoi)
+vsr*
+vsum*
+vup*
+
+//===----------------------------------------------------------------------===//
+
+FABS/FNEG can be codegen'd with the appropriate and/xor of -0.0.
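+
+A minimal sketch of those bit tricks using the C intrinsics (function names
+are illustrative; assumes GCC-style vector literals):
+
+  #include <altivec.h>
+  vector float vfneg(vector float x) {    /* xor with -0.0 flips sign bits */
+    return vec_xor(x, (vector float){ -0.0f, -0.0f, -0.0f, -0.0f });
+  }
+  vector float vfabs(vector float x) {    /* andc with -0.0 clears them    */
+    return vec_andc(x, (vector float){ -0.0f, -0.0f, -0.0f, -0.0f });
+  }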
+
+//===----------------------------------------------------------------------===//
+
+For functions that use altivec AND have calls, we are VRSAVE'ing all
+call-clobbered regs.
+
+//===----------------------------------------------------------------------===//
+
+VSPLTW and friends are expanded by the FE into insert/extract element ops.  Make
+sure that the dag combiner puts them back together into the appropriate
+vector_shuffle node and that the node then gets pattern matched to the
+corresponding instruction.
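+
+The splat that should survive the round trip, written with the C intrinsic
+(the function name is illustrative):
+
+  #include <altivec.h>
+  vector float splat2(vector float v) {
+    return vec_splat(v, 2);   /* should select a single vspltw */
+  }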
+
+//===----------------------------------------------------------------------===//
+
+Implement passing/returning vectors by value.
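+
+The kind of signature this covers (a sketch; names are illustrative): both
+arguments and the result should travel in vector registers.
+
+  #include <altivec.h>
+  vector float vfadd(vector float a, vector float b) {
+    return vec_add(a, b);
+  }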
+
+//===----------------------------------------------------------------------===//
+
+GCC apparently tries to codegen { C1, C2, Variable, C3 } as a constant pool load
+of C1/C2/C3, then a load and vperm of Variable.
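+
+The construct in question, assuming GCC-style vector literals (x stands in
+for the one non-constant element):
+
+  #include <altivec.h>
+  vector float mixed(float x) {
+    return (vector float){ 1.0f, 2.0f, x, 3.0f };  /* C1, C2, Variable, C3 */
+  }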
+
+//===----------------------------------------------------------------------===//
+
+We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte
+aligned stack slot, followed by a lve*x/vperm.  We should probably just store it
+to a scalar stack slot, then use lvsl/vperm to load it.  If the value is already
+in memory, this is a huge win.
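+
+A sketch of the proposed lowering for the value-already-in-memory case (the
+function name is illustrative; a 4-byte aligned float never straddles a
+16-byte boundary, so one aligned load plus a rotate suffices):
+
+  #include <altivec.h>
+  vector float scalar_to_vector(const float *p) {
+    vector unsigned char m = vec_lvsl(0, p);  /* rotate amount from the addr */
+    vector float blk = vec_ld(0, p);          /* aligned block containing *p */
+    return vec_perm(blk, blk, m);             /* rotate *p into element 0    */
+  }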
+
+//===----------------------------------------------------------------------===//
+
+Do not generate the MFCR/RLWINM sequence for predicate compares when the
+predicate compare is used immediately by a branch.  Just branch on the right
+cond code on CR6.
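+
+The pattern in question (a sketch; names are illustrative).  The predicate
+result should feed the branch straight off CR6:
+
+  #include <altivec.h>
+  int count;
+  void tally(vector signed int a, vector signed int b) {
+    if (vec_any_eq(a, b))   /* branch on the CR6 bit, no mfcr/rlwinm */
+      ++count;
+  }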
+
+//===----------------------------------------------------------------------===//
+
+SROA should turn "vector unions" into the appropriate insert/extract element
+instructions.
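+
+The idiom this refers to (a sketch; names are illustrative).  The u.f[1] read
+should become a single extract-element rather than a stack round trip:
+
+  #include <altivec.h>
+  union vu { vector float v; float f[4]; };
+  float second(vector float x) {
+    union vu u;
+    u.v = x;
+    return u.f[1];
+  }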
+ 
+//===----------------------------------------------------------------------===//
+
+We need an LLVM 'shuffle' instruction, that corresponds to the VECTOR_SHUFFLE
+node.
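+
+For illustration, the operation such an instruction would express is the
+two-input, constant-mask shuffle that vperm already performs (assumes
+GCC-style vector literals; mask bytes 0-15 select from a, 16-31 from b):
+
+  #include <altivec.h>
+  vector float interleave(vector float a, vector float b) {
+    vector unsigned char m = (vector unsigned char)
+      { 0,1,2,3, 16,17,18,19, 4,5,6,7, 20,21,22,23 };
+    return vec_perm(a, b, m);   /* result: a0, b0, a1, b1 */
+  }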
+
+//===----------------------------------------------------------------------===//
+
+We need a way to teach tblgen that some operands of an intrinsic are required to
+be constants.  The verifier should enforce this constraint.
 
 //===----------------------------------------------------------------------===//
 
+We should instcombine the lvx/stvx intrinsics into loads/stores if we know that
+the loaded address is 16-byte aligned.
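+
+The case in question (a sketch; assumes p is known to be 16-byte aligned, so
+the lvx is just an ordinary vector load):
+
+  #include <altivec.h>
+  vector float load16(const float *p) {
+    return vec_ld(0, p);   /* lvx; foldable into a plain load */
+  }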
+
+//===----------------------------------------------------------------------===//
+
+Instead of writing a pattern for type-agnostic operations (e.g. gen-zero, load,
+store, and, ...) in every supported type, make legalize do the work.  We should
+have a canonical type that we want operations changed to (e.g. v4i32 for
+build_vector) and legalize should change non-identical types to these.  This is
+similar to what it does for operations that are only supported in some types,
+e.g. x86 cmov (not supported on bytes).
+
+This would fix two problems:
+1. Writing patterns multiple times.
+2. Identical operations in different types are not getting CSE'd (e.g.
+   { 0U, 0U, 0U, 0U } and { 0.0, 0.0, 0.0, 0.0 }; see the sketch below).
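+
+The two build_vectors from problem 2, assuming GCC-style vector literals.
+Both are the same 128 zero bits and should canonicalize to one v4i32 node:
+
+  #include <altivec.h>
+  vector unsigned int zi = (vector unsigned int){ 0, 0, 0, 0 };
+  vector float        zf = (vector float){ 0.0f, 0.0f, 0.0f, 0.0f };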
+
+

More information about the llvm-commits mailing list