[llvm-commits] CVS: llvm/lib/Target/X86/README-FPStack.txt README.txt
Chris Lattner
lattner at cs.uiuc.edu
Fri May 19 13:46:05 PDT 2006
Changes in directory llvm/lib/Target/X86:
README-FPStack.txt added (r1.1)
README.txt updated: 1.108 -> 1.109
---
Log message:
Split FP-stack notes out of the main readme. Next up: splitting out SSE.
---
Diffs of the changes: (+99 -100)
README-FPStack.txt | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++
README.txt | 100 -----------------------------------------------------
2 files changed, 99 insertions(+), 100 deletions(-)
Index: llvm/lib/Target/X86/README-FPStack.txt
diff -c /dev/null llvm/lib/Target/X86/README-FPStack.txt:1.1
*** /dev/null Fri May 19 15:46:02 2006
--- llvm/lib/Target/X86/README-FPStack.txt Fri May 19 15:45:52 2006
***************
*** 0 ****
--- 1,99 ----
+ //===---------------------------------------------------------------------===//
+ // Random ideas for the X86 backend: FP stack related stuff
+ //===---------------------------------------------------------------------===//
+
+ //===---------------------------------------------------------------------===//
+
+ Some targets (e.g. Athlons) prefer ffreep to fstp ST(0):
+ http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00659.html
+
+ //===---------------------------------------------------------------------===//
+
+ On darwin/x86, we should codegen:
+
+ ret double 0.000000e+00
+
+ as fldz/ret, not as:
+
+ movl $0, 4(%esp)
+ movl $0, (%esp)
+ fldl (%esp)
+ ...
+ ret
+
+ //===---------------------------------------------------------------------===//
+
+ This should use fiadd on chips where it is profitable:
+ double foo(double P, int *I) { return P+*I; }
+
+ We have fiadd patterns now, but the following two patterns have the same cost
+ and complexity. We need a way to specify that the latter is more profitable.
+
+ def FpADD32m : FpI<(ops RFP:$dst, RFP:$src1, f32mem:$src2), OneArgFPRW,
+ [(set RFP:$dst, (fadd RFP:$src1,
+ (extloadf64f32 addr:$src2)))]>;
+ // ST(0) = ST(0) + [mem32]
+
+ def FpIADD32m : FpI<(ops RFP:$dst, RFP:$src1, i32mem:$src2), OneArgFPRW,
+ [(set RFP:$dst, (fadd RFP:$src1,
+ (X86fild addr:$src2, i32)))]>;
+ // ST(0) = ST(0) + [mem32int]
+
+ //===---------------------------------------------------------------------===//
+
+ The FP stackifier needs to be global. Also, it should handle simple permutations
+ to reduce the number of shuffle instructions, e.g. turning:
+
+ fld P    ->    fld Q
+ fld Q          fld P
+ fxch
+
+ or:
+
+ fxch     ->    fucomi
+ fucomi         jl X
+ jg X
+
+ Ideas:
+ http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02410.html
+
+
+ //===---------------------------------------------------------------------===//
+
+ Add a target specific hook to DAG combiner to handle SINT_TO_FP and
+ FP_TO_SINT when the source operand is already in memory.
+
+ //===---------------------------------------------------------------------===//
+
+ Open code rint,floor,ceil,trunc:
+ http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02006.html
+ http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02011.html
+
+ Open code the sincos[f] libcall.
+
+ //===---------------------------------------------------------------------===//
+
+ None of the FPStack instructions are handled in
+ X86RegisterInfo::foldMemoryOperand, which prevents the spiller from
+ folding spill code into the instructions.
+
+ //===---------------------------------------------------------------------===//
+
+ Currently the x86 codegen isn't very good at mixing SSE and FPStack
+ code:
+
+ unsigned int foo(double x) { return x; }
+
+ foo:
+ subl $20, %esp
+ movsd 24(%esp), %xmm0
+ movsd %xmm0, 8(%esp)
+ fldl 8(%esp)
+ fisttpll (%esp)
+ movl (%esp), %eax
+ addl $20, %esp
+ ret
+
+ This will be solved when we go to a dynamic programming based isel.
+
+ //===---------------------------------------------------------------------===//
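Editorial note on the last item above: the fldl/fisttpll/movl sequence relies on the fact that every value in unsigned int range fits in a signed 64-bit integer, so the conversion can truncate to 64 bits and keep only the low 32. The C sketch below illustrates that reasoning only; the helper name double_to_u32_via_i64 is hypothetical and is not part of LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not LLVM code): mirrors the arithmetic performed by
 * "fisttpll (%esp)" followed by "movl (%esp), %eax". */
static uint32_t double_to_u32_via_i64(double x) {
    /* Assumed precondition: x is within unsigned 32-bit range. */
    assert(x >= 0.0 && x < 4294967296.0);
    int64_t wide = (int64_t)x; /* truncating conversion to a signed 64-bit value */
    return (uint32_t)wide;     /* keep only the low 32 bits */
}
```

For example, double_to_u32_via_i64(3.99) yields 3, and 4294967295.0 maps to the largest unsigned 32-bit value. The round trip through memory between the SSE register and the FP stack is the cost the note complains about; the arithmetic itself is unchanged.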
Index: llvm/lib/Target/X86/README.txt
diff -u llvm/lib/Target/X86/README.txt:1.108 llvm/lib/Target/X86/README.txt:1.109
--- llvm/lib/Target/X86/README.txt:1.108 Fri May 19 14:41:33 2006
+++ llvm/lib/Target/X86/README.txt Fri May 19 15:45:52 2006
@@ -31,62 +31,6 @@
//===---------------------------------------------------------------------===//
-Some targets (e.g. Athlons) prefer ffreep to fstp ST(0):
-http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00659.html
-
-//===---------------------------------------------------------------------===//
-
-On darwin/x86, we should codegen:
-
- ret double 0.000000e+00
-
-as fldz/ret, not as:
-
- movl $0, 4(%esp)
- movl $0, (%esp)
- fldl (%esp)
- ...
- ret
-
-//===---------------------------------------------------------------------===//
-
-This should use fiadd on chips where it is profitable:
-double foo(double P, int *I) { return P+*I; }
-
-We have fiadd patterns now, but the following two patterns have the same cost
-and complexity. We need a way to specify that the latter is more profitable.
-
-def FpADD32m : FpI<(ops RFP:$dst, RFP:$src1, f32mem:$src2), OneArgFPRW,
- [(set RFP:$dst, (fadd RFP:$src1,
- (extloadf64f32 addr:$src2)))]>;
- // ST(0) = ST(0) + [mem32]
-
-def FpIADD32m : FpI<(ops RFP:$dst, RFP:$src1, i32mem:$src2), OneArgFPRW,
- [(set RFP:$dst, (fadd RFP:$src1,
- (X86fild addr:$src2, i32)))]>;
- // ST(0) = ST(0) + [mem32int]
-
-//===---------------------------------------------------------------------===//
-
-The FP stackifier needs to be global. Also, it should handle simple permutations
-to reduce the number of shuffle instructions, e.g. turning:
-
-fld P -> fld Q
-fld Q fld P
-fxch
-
-or:
-
-fxch -> fucomi
-fucomi jl X
-jg X
-
-Ideas:
-http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02410.html
-
-
-//===---------------------------------------------------------------------===//
-
Improvements to the multiply -> shift/add algorithm:
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg01590.html
@@ -136,11 +80,6 @@
//===---------------------------------------------------------------------===//
-Add a target specific hook to DAG combiner to handle SINT_TO_FP and
-FP_TO_SINT when the source operand is already in memory.
-
-//===---------------------------------------------------------------------===//
-
Model X86 EFLAGS as a real register to avoid redundant cmp / test. e.g.
cmpl $1, %eax
@@ -181,24 +120,6 @@
//===---------------------------------------------------------------------===//
-Open code rint,floor,ceil,trunc:
-http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02006.html
-http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02011.html
-
-//===---------------------------------------------------------------------===//
-
-Combine: a = sin(x), b = cos(x) into a,b = sincos(x).
-
-Expand these to calls of sin/cos and stores:
- double sincos(double x, double *sin, double *cos);
- float sincosf(float x, float *sin, float *cos);
- long double sincosl(long double x, long double *sin, long double *cos);
-
-Doing so could allow SROA of the destination pointers. See also:
-http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17687
-
-//===---------------------------------------------------------------------===//
-
The instruction selector sometimes misses folding a load into a compare. The
pattern is written as (cmp reg, (load p)). Because the compare isn't
commutative, it is not matched with the load on both sides. The dag combiner
@@ -219,11 +140,6 @@
//===---------------------------------------------------------------------===//
-LSR should be turned on for the X86 backend and tuned to take advantage of its
-addressing modes.
-
-//===---------------------------------------------------------------------===//
-
When compiled with unsafemath enabled, "main" should enable SSE DAZ mode and
other fast SSE modes.
@@ -293,11 +209,6 @@
//===---------------------------------------------------------------------===//
-We need to lower switch statements to tablejumps when appropriate instead of
-always into binary branch trees.
-
-//===---------------------------------------------------------------------===//
-
SSE doesn't have [mem] op= reg instructions. If we have an SSE instruction
like this:
@@ -351,12 +262,6 @@
//===---------------------------------------------------------------------===//
-None of the FPStack instructions are handled in
-X86RegisterInfo::foldMemoryOperand, which prevents the spiller from
-folding spill code into the instructions.
-
-//===---------------------------------------------------------------------===//
-
In many cases, LLVM generates code like this:
_test:
@@ -827,11 +732,6 @@
//===---------------------------------------------------------------------===//
-A Mac OS X IA-32 specific ABI bug wrt returning value > 8 bytes:
-http://llvm.org/bugs/show_bug.cgi?id=729
-
-//===---------------------------------------------------------------------===//
-
X86RegisterInfo::copyRegToReg() returns X86::MOVAPSrr for VR128. Is it possible
to choose between movaps, movapd, and movdqa based on types of source and
destination?
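
Editorial note on the stackifier item that moved to README-FPStack.txt: the first rewrite there (fld P; fld Q; fxch becoming fld Q; fld P) is valid because both sequences leave the register stack in the same state. The toy C model below checks that equivalence. The X87Stack type and the x87_* helpers are hypothetical names for illustration and have nothing to do with LLVM's actual stackifier.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the x87 register stack; st[0] plays the role of ST(0).
 * Hypothetical illustration only, not LLVM code. */
typedef struct {
    double st[8];
    size_t depth;
} X87Stack;

/* fld: push a value, so the old ST(0) becomes ST(1), and so on. */
static void x87_fld(X87Stack *s, double v) {
    assert(s->depth < 8);
    for (size_t i = s->depth; i > 0; --i)
        s->st[i] = s->st[i - 1];
    s->st[0] = v;
    s->depth++;
}

/* fxch: swap ST(0) and ST(1). */
static void x87_fxch(X87Stack *s) {
    assert(s->depth >= 2);
    double t = s->st[0];
    s->st[0] = s->st[1];
    s->st[1] = t;
}

/* Returns 1 when both stacks hold the same values in the same slots. */
static int x87_same(const X87Stack *a, const X87Stack *b) {
    if (a->depth != b->depth)
        return 0;
    for (size_t i = 0; i < a->depth; ++i)
        if (a->st[i] != b->st[i])
            return 0;
    return 1;
}
```

Pushing P then Q and calling x87_fxch on one stack, and pushing Q then P on another, makes x87_same return 1, so the fxch can be dropped by reordering the loads. The second rewrite in the note uses the same idea, with the swap absorbed by flipping the branch condition instead.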