[llvm-commits] CVS: reopt/docs/ReoptUsersGuide.rtf

Brian Gaeke gaeke at cs.uiuc.edu
Tue Oct 12 11:38:43 PDT 2004



Changes in directory reopt/docs:

ReoptUsersGuide.rtf updated: 1.1 -> 1.2
---
Log message:

Added 'what is the reoptimizer' section to the beginning


---
Diffs of the changes:  (+280 -213)

Index: reopt/docs/ReoptUsersGuide.rtf
diff -u reopt/docs/ReoptUsersGuide.rtf:1.1 reopt/docs/ReoptUsersGuide.rtf:1.2
--- reopt/docs/ReoptUsersGuide.rtf:1.1	Tue Oct 12 11:41:40 2004
+++ reopt/docs/ReoptUsersGuide.rtf	Tue Oct 12 13:38:32 2004
@@ -1,29 +1,33 @@
 {\rtf1\mac\ansicpg10000\cocoartf102
-{\fonttbl\f0\fswiss\fcharset77 Helvetica;\f1\fswiss\fcharset77 Helvetica-Bold;\f2\fmodern\fcharset77 Courier;
-\f3\fmodern\fcharset77 Courier-Oblique;\f4\fswiss\fcharset77 Helvetica-Oblique;\f5\fmodern\fcharset77 Courier-Bold;
-}
+{\fonttbl\f0\fswiss\fcharset77 Helvetica;\f1\fswiss\fcharset77 Helvetica-Bold;\f2\froman\fcharset77 TimesNewRomanMS;
+\f3\fmodern\fcharset77 Courier;\f4\fmodern\fcharset77 Courier-Oblique;\f5\fswiss\fcharset77 Helvetica-Oblique;
+\f6\fmodern\fcharset77 Courier-Bold;}
 {\colortbl;\red255\green255\blue255;}
-\margl1440\margr1440\vieww12000\viewh17160\viewkind0
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural
+\margl1440\margr1440\vieww12000\viewh15840\viewkind0
+\deftab720
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
 \f0\fs24 \cf0 Reoptimizer User's Guide\
 October 2004\
 Brian Gaeke\
 \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f1\b \ul Table of Contents\
+\f1\b \cf0 \ul \ulc0 Table of Contents\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f0\b0 \ulnone \
+\f0\b0 \cf0 \ulnone \
+	Introduction to the reoptimizer\
 	Building the reoptimizer from source\
 	Running the reoptimizer on a program in the llvm-test module
 \f2 \
 
 \f0 	Options recognized by the 
-\f2 "run-tests"
+\f3 "run-tests"
 \f0  script\
 	Running the reoptimizer on an arbitrary program\
 	The 
-\f2 LLVM_REOPT
+\f3 LLVM_REOPT
 \f0  environment variable\
 		Instrumentation/Tracing Framework options\
 		Phase detection options\
@@ -36,304 +40,357 @@
 	How the trace optimizer works\
 		1. The trace optimizer\
 		2. TraceToFunction (
-\f2 reopt/lib/TraceToFunction
+\f3 reopt/lib/TraceToFunction
 \f0 )\
 		3. Trace Optimizations and JIT (
-\f2 reopt/lib/TraceJIT
+\f3 reopt/lib/TraceJIT
 \f0 )\
 		4. UnpackTraceFunction (
-\f2 reopt/lib/LightWtProfiling/UnpackTraceFunction.cpp
+\f3 reopt/lib/LightWtProfiling/UnpackTraceFunction.cpp
 \f0 )\
 		5. Machine-code emission (
-\f2 reopt/lib/TraceJIT/TraceJITEmitter.cpp
+\f3 reopt/lib/TraceJIT/TraceJITEmitter.cpp
 \f0 )\
 	Tips for debugging the reoptimizer\
 	Reoptimizer testing tools\
 		The Trace I/O library (
-\f2 reopt/lib/TraceIO
+\f3 reopt/lib/TraceIO
 \f0 )\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f2 		ttftest
-\f0  - a standalone TraceToFunction testing tool (
-\f2 reopt/tools/ttftest
+\f3 \cf0 		ttftest
+\f0 - a standalone TraceToFunction testing tool (
+\f3 reopt/tools/ttftest
 \f0 )\
 
-\f2 		dumptrace
-\f0  - list LLVM assembly for a trace's basic blocks (
-\f2 reopt/tools/dumptrace
+\f3 		dumptrace
+\f0 - list LLVM assembly for a trace's basic blocks (
+\f3 reopt/tools/dumptrace
 \f0 )\
 	Known problems with the current implementation\
 		UnpackTraceFunction generates inefficient code\
 		TraceJITEmitter is poorly integrated with the TraceCache\
 \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+
+\f1\b \cf0 \ul \ulc0 Introduction to the reoptimizer\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+
+\f0\b0 \cf0 \ulnone \
+The reoptimizer is LLVM's dynamic optimization framework. It consists of three basic parts: a collection of compile-time profiling instrumentation passes, a runtime profiler and trace generator, and a runtime trace optimizer.\
+\
+The profiling instrumentation passes run at compile time are as follows: [FIXME]\
+\
+For more information about the reoptimizer's instrumentation passes and runtime profiler, consult Anand Shukla's M.S. thesis, "Lightweight Cross-Procedure Tracing for Runtime Optimization", July 2003.\
+\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f1\b \ul Building the reoptimizer from source
-\f0\b0 \ul \
-\ulnone \
+\f1\b \cf0 \ul \ulc0 Building the reoptimizer from source\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+
+\f0\b0 \cf0 \ulnone \
 1. Check out a fresh llvm tree, along with the repository of test programs.\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f2 	% cvs -d /home/vadve/shared/PublicCVS co llvm\
+\f3 \cf0 	% cvs -d /home/vadve/shared/PublicCVS co llvm\
 	% cd llvm/projects\
 	% cvs -d /home/vadve/shared/PublicCVS co llvm-test\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f0 2. Check out the reopt module from the internal CVS repository into the llvm/projects subdirectory, creating llvm/projects/reopt.\
+\f0 \cf0 2. Check out the reopt module from the internal CVS repository into the llvm/projects subdirectory, creating llvm/projects/reopt.\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f2 	% cvs -d /home/vadve/shared/InternalCVS co reopt\
+\f3 \cf0 	% cvs -d /home/vadve/shared/InternalCVS co reopt\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f0 3. Configure and build the system in release mode (or debug mode if you want to try to fix bugs).\
+\f0 \cf0 3. Configure and build the system in release mode (or debug mode if you want to try to fix bugs).\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f2 	% cd ../..\
+\f3 \cf0 	% cd ../..\
 	% ./configure
-\f3\i  [configure args]\
+\f4\i  [configure args]
+\f0\i0 \
 
-\f2\i0 	% cd llvm/projects/reopt && ./configure\
+\f3 	% cd llvm/projects/reopt && ./configure\
 	% cd ../..\
 	% gmake ENABLE_OPTIMIZED=1\
 		
-\f4\i or\
+\f5\i or
+\f0\i0 \
 
-\f2\i0 	% gmake\
+\f3 	% gmake\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f0 \
+\f0 \cf0 \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+
+\f1\b \cf0 \ul \ulc0 Running the reoptimizer on a program in the llvm-test module\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f1\b \ul Running the reoptimizer on a program in the llvm-test module
-\f0\b0 \ul \
-\ulnone \
+\f0\b0 \cf0 \ulnone \
 There is a script called 
-\f2 "run-tests"
+\f3 "run-tests"
 \f0  in the reopt/test subdirectory, which is essentially a wrapper around 
-\f2 "gmake TEST=reopt"
+\f3 "gmake TEST=reopt"
 \f0 . It knows a canned set of tests by some symbolic names I have invented. For example, two equivalent ways to run the Shootout tests are as follows:\
 	
-\f2 % gmake TEST=reopt SUBDIR=SingleSource/Benchmarks/Shootout\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural
+\f3 % gmake TEST=reopt SUBDIR=SingleSource/Benchmarks/Shootout
+\f0 \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+
+\f5\i \cf0 		or\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f4\i \cf0 		or
-\f2\i0 \
-	% ./run-tests shootout\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural
+\f3\i0 \cf0 	% ./run-tests shootout\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
 \f0 \cf0 ...within the reopt/test directory. Both of these commands perform three tasks: first, run the ordinary llc-compiled, non-instrumented version of each Shootout program; second, run the reoptimized version; third, compare the results.\
 \
 You can get a list of the symbolic names known by the 
-\f2 "run-tests"
+\f3 "run-tests"
 \f0  script by typing:\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f2 	% ./run-tests -list\
+\f3 \cf0 	% ./run-tests -list\
 \
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
 \f1\b \cf0 \ul \ulc0 Options recognized by the 
-\f5 \ul "run-tests"
+\f6 \ul "run-tests"
 \f1 \ul  script
-\f0\b0 \ul \
-\ulnone \
+\f0\b0 \ulnone \
+\
 -list
-\f2 			
+\f3 			
 \f0 		Show the list of known benchmarks.\
 \
 -clean 
-\f4\i <benchmark>		
+\f5\i <benchmark>		
 \f0\i0 Erase reoptimizer-instrumented program and its output.\
 \
 -clean -output 
-\f4\i <benchmark>	
+\f5\i <benchmark>	
 \f0\i0 Erase reoptimizer-instrumented program's output only, so that\
 					you can re-run the reoptimizer on a program without rebuilding it.\
 \
 -debug 
-\f4\i <benchmark>		
+\f5\i <benchmark>		
 \f0\i0 Try to automatically start gdb on the reoptimized version\
 					of <benchmark>. Sometimes this doesn't work, because the\
 					shell script can't parse the llvm-test makefiles very well.\
 \
 -test 
-\f4\i <benchmark>			
+\f5\i <benchmark>			
 \f0\i0 This is the default option. It runs the <benchmark> using\
 					the ordinary llc-compiled, non-instrumented version and\
 					compares its output with the output from the reoptimized\
 					version, by using GNU make (as described above).\
 \
 -skiptrace=
-\f4\i N
-\f3 				
+\f5\i N
+\f4 				
 \f0\i0 When combined with 
-\f2 -test
+\f3 -test
 \f0  or 
-\f2 -debug
+\f3 -debug
 \f0 , this modifies your \
 					
-\f2 LLVM_REOPT
+\f3 LLVM_REOPT
 \f0  environment variable to tell the reoptimizer \
 					that it should skip the trace numbered 
-\f3\i N
+\f4\i N
 \f0\i0 . You can see the \
 					trace numberings by running in debug mode with 
-\f2 LLVM_REOPT
+\f3 LLVM_REOPT
 \f0  \
 					set to 
-\f2 '--debug --enable-trace-opt'
+\f3 '--debug --enable-trace-opt'
 \f0 .\
 \
 -release
-\f2 				
+\f3 				
 \f0 When combined with 
-\f2 -test
+\f3 -test
 \f0 , this runs the reoptimizer in \
 					release mode. You must have already built your LLVM tree \
 					with 
-\f2 'gmake ENABLE_OPTIMIZED=1'
+\f3 'gmake ENABLE_OPTIMIZED=1'
 \f0 . This option works by \
 					causing GNU make to be run with parameters\
 					
-\f2 'TEST=reopt ENABLE_OPTIMIZED=1'
+\f3 'TEST=reopt ENABLE_OPTIMIZED=1'
 \f0 .\
 \
 -lps
-\f2 					
+\f3 					
 \f0 When combined with 
-\f2 -test
+\f3 -test
 \f0 , runs Olden benchmarks with \
 					
-\f2 LARGE_PROBLEM_SIZE
+\f3 LARGE_PROBLEM_SIZE
 \f0  enabled.\
 \
 
-\f1\b \ul Running the reoptimizer on an arbitrary program
-\f0\b0 \ul \
-\ulnone \
+\f1\b \ul Running the reoptimizer on an arbitrary program\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+
+\f0\b0 \cf0 \ulnone \
 1. Assume you have a whole program which has been compiled and linked in bytecode form using the standard LLVM toolchain (llvm-gcc, gccas, gccld) in the file named 
-\f2 'prog.llvm.bc'
+\f3 'prog.llvm.bc'
 \f0 . \
 \
 2. The first thing you have to do is run the reoptimizer's first-level instrumentation passes and generate SparcV9 assembly code for the program. You use the 
-\f2 'reopt-llc'
+\f3 'reopt-llc'
 \f0  tool, located in the reopt/tools subdirectory, to do this. The following command generates instrumented SparcV9 assembly for 
-\f2 'prog.llvm.bc'
+\f3 'prog.llvm.bc'
 \f0  in the file 
-\f2 'prog.reopt.s'
+\f3 'prog.reopt.s'
 \f0 :\
 \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f2 	% reopt-llc -enable-correct-eh-support -o=prog.reopt.s prog.llvm.bc\
+\f3 \cf0 	% reopt-llc -enable-correct-eh-support -o=prog.reopt.s prog.llvm.bc\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f0 \
+\f0 \cf0 \
 If you are sure that the program does not depend on the proper behavior of setjmp/longjmp calls or invoke/unwind instructions, you can omit the 
-\f2 '-enable-correct-eh-support'
+\f3 '-enable-correct-eh-support'
 \f0  flag.\
 \
 3. The next step is to link the program into an executable along with the LLVM libraries that constitute the reoptimizer, as well as any other system libraries that the program needs. The reoptimizer's build process builds a single file (
-\f2 libwholereoptimizer.a
+\f3 libwholereoptimizer.a
 \f0 ) containing all the LLVM libraries you need, but you also need to link in the following libraries on Solaris: 
-\f2 '-lcpc -lm -lrt -lmalloc -ldl'
+\f3 '-lcpc -lm -lrt -lmalloc -ldl'
 \f0 . You also need to link the program using the C++ compiler. So the link line will look like this:\
 \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f2 	% g++ -o prog.reopt prog.reopt.s 
-\f3\i REOPTLIB
-\f2\i0 /libwholereoptimizer.a 
-\f3\i OTHERLIBS
-\f2\i0  -lcpc -lm -lrt -lmalloc -ldl\
-
+\f3 \cf0 	% g++ -o prog.reopt prog.reopt.s 
+\f4\i REOPTLIB
+\f3\i0 /libwholereoptimizer.a 
+\f4\i OTHERLIBS
+\f3\i0  -lcpc -lm -lrt -lmalloc -ldl
 \f0 \
+\
 where 
-\f3\i REOPTLIB
+\f4\i REOPTLIB
 \f0\i0  is the full path to the 
-\f2 'llvm/projects/reopt/lib/Release'
+\f3 'llvm/projects/reopt/lib/Release'
 \f0  (or 
-\f2 'Debug'
+\f3 'Debug'
 \f0 ) directory containing your freshly-built reoptimizer libraries, and 
-\f3\i OTHERLIBS
+\f4\i OTHERLIBS
 \f0\i0  names any other system libraries needed (e.g., 
-\f2 '-ljpeg'
+\f3 '-ljpeg'
 \f0 ) by the program under consideration.\
 \
 4. Now, you should set your 
-\f2 LLVM_REOPT
+\f3 LLVM_REOPT
 \f0  environment variable to contain any command-line options you want to pass to the reoptimizer. The most important ones are 
-\f2 '--debug'
+\f3 '--debug'
 \f0 , to turn on debugging printouts, and 
-\f2 '--enable-trace-opt'
+\f3 '--enable-trace-opt'
 \f0 , to enable the trace optimizer. (Without the latter option, the original trace layout engine is used.)\
 \
 A full list of 
-\f2 LLVM_REOPT
+\f3 LLVM_REOPT
 \f0  settings follows in the next section.\
 \
 5. Finally, you can run the program with any arguments and/or input files that the normal, non-instrumented version of the program would have accepted.\
 \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f1\b \ul The 
-\f5 \ul LLVM_REOPT
+\f1\b \cf0 \ul \ulc0 The 
+\f6 \ul LLVM_REOPT
 \f1 \ul  environment variable
-\f0\b0 \ul \
-\ulnone \
+\f0\b0 \ulnone \
+\
 The reoptimizer has several tunable parameters and settings which are controlled by the 
-\f2 LLVM_REOPT
+\f3 LLVM_REOPT
 \f0  environment variable. For a few of the tunable parameters relating to the trace cache and instrumentation, it may be helpful to look at Anand's thesis for guidance. The list of options that pertain to the reoptimizer is as follows:\
 \
-\ul Instrumentation/Tracing Framework options\ulnone \
-	-fli-threshold=
-\f4\i count
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ul \ulc0 Instrumentation/Tracing Framework options\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ulnone 	-fli-threshold=
+\f5\i count
 \f0\i0 			Number of times FLI must trigger before attempting SLI\
 	-sli-threshold=
-\f4\i count
+\f5\i count
 \f0\i0 		Number of iterations of SLI before path counters are sampled\
 \
-\ul Phase detection options\ulnone \
-	-enable-phase-detect		Use a timer interrupt to remove traces periodically from the trace cache\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ul \ulc0 Phase detection options\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ulnone 	-enable-phase-detect		Use a timer interrupt to remove traces periodically from the trace cache\
 	-timer-int-s=
-\f4\i seconds
+\f5\i seconds
 \f0\i0 		Interval (in seconds) between phase detection sweeps\
 \
-\ul Trace layout engine options\ulnone \
-	-count-trace-cycles			Count cycles in optimized trace, when using trace layout engine\
-\
-\ul Trace optimizer options\ulnone \
-	-enable-trace-opt			Use the new trace optimizer instead of the old trace layout engine\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ul \ulc0 Trace layout engine options\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ulnone 	-count-trace-cycles			Count cycles in optimized trace, when using trace layout engine\
+\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ul \ulc0 Trace optimizer options\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ulnone 	-enable-trace-opt			Use the new trace optimizer instead of the old trace layout engine\
 	-run-opt-passes			Run optimization passes before unpacking TraceFunction\
 \
-\ul Trace cache options\ulnone \
-	-inst-trace-cache-size=
-\f4\i size
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ul \ulc0 Trace cache options\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ulnone 	-inst-trace-cache-size=
+\f5\i size
 \f0\i0 	Trace cache size for SLI-instrumented code\
 	-opt-trace-cache-size=
-\f4\i size
+\f5\i size
 \f0\i0 	Trace cache size for optimized code\
 \
-\ul Debugging options\ulnone \
-	-print-machineinstrs		Print generated machine code\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ul \ulc0 Debugging options\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ulnone 	-print-machineinstrs		Print generated machine code\
 	-skip-trace=
-\f4\i n
+\f5\i n
 \f0\i0 				Don't optimize the 
-\f4\i n
+\f5\i n
 \f0\i0 th trace, when using trace optimizer\
 	-debug				Turn on debugging printouts\
 	-dregalloc=y				Turn on SparcV9 register allocator debugging printouts\
 \
-\ul Statistics gathering options\ulnone \
-	-stats					Enable statistics output from program\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ul \ulc0 Statistics gathering options\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ulnone 	-stats					Enable statistics output from program\
 	-time-passes				Time each pass, printing elapsed time for each on exit\
 \
-\ul Help options\ulnone \
-	-version				Display the version of LLVM used\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ul \ulc0 Help options\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ulnone 	-version				Display the version of LLVM used\
 	-help					Display available options (-help-hidden for more)\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural
-\cf0 \
-
-\f1\b \ul How the trace optimizer works
-\f0\b0 \ulnone \
-\
-\ul 1. The trace optimizer\ulnone \
 \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+
+\f1\b \cf0 \ul \ulc0 How the trace optimizer works\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+
+\f0\b0 \cf0 \ulnone \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ul \ulc0 1. The trace optimizer\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ulnone \
 The trace optimizer is the component of the reoptimizer (LLVM's runtime optimization framework) that consumes traces generated by the runtime profiler, converts them into functions, runs global optimizations on them, and then reattaches the optimized trace code back to the original function it came from.\
 \
 The trace optimizer's input is an LLVM Trace object, which is essentially a wrapper around a collection of LLVM BasicBlocks from a given Function. It is agnostic as to the particular method of trace generation -- that is, it should be possible to take any given runtime profiling implementation and use it to identify hot regions to optimize using the trace optimizer.\
 \
 Traces given to the trace optimizer should consist of a loop body with one or more side exits. In particular, there should be phi nodes at the beginning of the first trace BasicBlock for variables live across loop iterations; these are detected and handled specially by TraceToFunction.\
 \
-\ul 2. TraceToFunction (
-\f2 \ul reopt/lib/TraceToFunction
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ul \ulc0 2. TraceToFunction (
+\f3 \ul reopt/lib/TraceToFunction
 \f0 \ul )\ulnone \
 \
 The first pass that runs over newly-identified Traces is the TraceToFunction pass, also known as the TraceFunctionBuilder. TraceToFunction has three major goals: first, to find the live-in and live-out sets from the trace; second, to remove the reoptimizer's first-level loop instrumentation from the trace, if it is found; and third, to construct a function containing the same code as the trace. This function is called the TraceFunction, and it is the principal output of the TraceToFunction pass.\
@@ -341,164 +398,172 @@
 A TraceFunction has several components besides the newly-generated LLVM Function containing the trace code. It also contains a pointer to the original LLVM Function which the trace came from; this is known as the trace's "matrix function". It also contains the trace's live-in and live-out sets (represented as vectors of LLVM Values), and mapping information relating LLVM Values in the matrix function to those in the TraceFunction. Also, since live-in and live-out values are expressed as arguments to the TraceFunction, there are separate maps that relate the TraceFunction's arguments to the live values that they represent.\
 \
 \ul 3. Trace Optimizations and JIT (
-\f2 \ul reopt/lib/TraceJIT
+\f3 \ul reopt/lib/TraceJIT
 \f0 \ul )\ulnone \
 \
 After the TraceFunction is built, the Trace JIT gains control. Its job is to run a specified list of global optimizations on the trace and then generate LLVM machine code (i.e., a MachineFunction) for the trace using the standard LLVM just-in-time compilation passes.\
 \
 The optimizations that are run by the Trace JIT are listed in the TraceJITOpts.cpp source file. Currently, no optimization passes are run unless you specify the 
-\f2 '--run-opt-passes'
+\f3 '--run-opt-passes'
 \f0  flag in your 
-\f2 LLVM_REOPT
+\f3 LLVM_REOPT
 \f0  environment variable (see "The 
-\f2 LLVM_REOPT
+\f3 LLVM_REOPT
 \f0  environment variable" section, above). \
 \
 After the Trace JIT generates machine code for the newly-optimized trace, it uses UnpackTraceFunction to reattach the optimized trace's code to its matrix function, and then it uses the Trace JIT Emitter to emit the machine code into the trace cache.\
 \
 \ul 4. UnpackTraceFunction (
-\f2 \ul reopt/lib/LightWtProfiling/UnpackTraceFunction.cpp
+\f3 \ul reopt/lib/LightWtProfiling/UnpackTraceFunction.cpp
 \f0 \ul )\ulnone \
 \
 UnpackTraceFunction sets up a stack frame for the optimized trace and generates move instructions to copy each live-in/out value from its register allocated in the matrix function to/from its register allocated in the trace function. It also generates branch instructions to return control from each trace exit to the original (untraced) basic block which would have been its (off-trace) successor, replacing the LLVM "ret" instructions used by TraceToFunction to represent trace exits. Instances where an LLVM Value has been replaced with a constant must sometimes be treated specially, because constants are not typically allocated to registers. UnpackTraceFunction is also responsible for performing phi-elimination along trace exit edges, if it detects that a trace exit branch would immediately be followed by an LLVM phi instruction.\
 \
 \ul 5. Machine-code emission (
-\f2 \ul reopt/lib/TraceJIT/TraceJITEmitter.cpp
+\f3 \ul reopt/lib/TraceJIT/TraceJITEmitter.cpp
 \f0 \ul )\ulnone \
 \
 The unpacked trace code is emitted into the memory used by the trace cache using the VirtualMem abstraction, which implements reads and writes into the process's executable address space using the /proc filesystem. Once the trace has been emitted, the beginning of the SLI trace is overwritten with a branch to the beginning of the optimized trace, effectively disabling the SLI trace from further execution and enabling the optimized trace instead; control then returns from the second-level trigger function, which is invariably followed by a branch into the optimized trace.\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural
-\cf0 \
+\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+
+\f1\b \cf0 \ul \ulc0 Tips for debugging the reoptimizer\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f1\b \ul Tips for debugging the reoptimizer
-\f0\b0 \ul \
-\ulnone \
+\f0\b0 \cf0 \ulnone \
 1. It is useful to provide the 
-\f2 '-dregalloc=y -print-machineinstrs'
+\f3 '-dregalloc=y -print-machineinstrs'
 \f0  flags to 
-\f2 reopt-llc
+\f3 reopt-llc
 \f0 , and save the output, so that you can look up the registers to which LLVM Values were allocated when you are single-stepping through traces.\
 \
 2. Set your 
-\f2 LLVM_REOPT
+\f3 LLVM_REOPT
 \f0  environment variable to include the 
-\f2 '-debug -dregalloc=y -print-machineinstrs'
+\f3 '-debug -dregalloc=y -print-machineinstrs'
 \f0  flags. This will allow you to see the machine code before and after it has registers allocated and before and after the trace is unpacked into its parent function.\
 \
 3. Run the reoptimized binary under gdb, and set a breakpoint at the symbol 
-\f2 'TraceOptimizerDone'
+\f3 'TraceOptimizerDone'
 \f0 . This function, in RuntimeOptimizations.cpp, is called just after a trace is emitted into the trace cache, and just before the trace optimizer returns control to the program. It prints out gdb commands to disassemble the optimized trace and to set a breakpoint at the beginning of the optimized trace. It also prints out the sequence number of the trace, so that you can use the 
-\f2 LLVM_REOPT
+\f3 LLVM_REOPT
 \f0  
-\f2 '-skip-trace'
+\f3 '-skip-trace'
 \f0  option to skip the trace in a future execution.\
 \
 4. When you crash in a given trace, try using the 
-\f2 LLVM_REOPT '-skip-trace'
+\f3 LLVM_REOPT '-skip-trace'
 \f0  option to skip that trace. You could also try skipping the last trace that was generated. If you crash outside a trace, try skipping the last trace that was executed (if you can figure out which one it was).\
 \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f1\b \ul Reoptimizer testing tools
-\f0\b0 \ulnone \
-\
-\ul The Trace I/O library (
-\f2 \ul reopt/lib/TraceIO
+\f1\b \cf0 \ul \ulc0 Reoptimizer testing tools\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+
+\f0\b0 \cf0 \ulnone \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ul \ulc0 The Trace I/O library (
+\f3 \ul reopt/lib/TraceIO
 \f0 \ul )\ulnone \
 \
 Normally, traces live only in memory and are discarded when a reoptimized program exits. The Trace I/O library is provided to allow the reoptimizer to save (serialize) traces into files so that they can be reloaded for offline analysis later. The Trace I/O library contains two important functions: 
-\f2 ReadTraceFromFile
+\f3 ReadTraceFromFile
 \f0  and 
-\f2 WriteTraceToFile
+\f3 WriteTraceToFile
 \f0 . Writing a trace out to disk actually produces two files in the current implementation: (1) an LLVM bytecode file representing the module containing the trace's matrix function, and (2) a text file containing the name of the trace's matrix function and the integer indices of the basic blocks that make up the trace.\
 \
 When the trace optimizer is compiled in Debug mode and enabled using 
-\f2 LLVM_REOPT='--enable-trace-opt --debug'
+\f3 LLVM_REOPT='--enable-trace-opt --debug'
 \f0 , it will write out every trace using the Trace I/O library. The filenames used match the pattern 
-\f2 '
-\f3\i matrixfn
-\f2\i0 .trace
-\f3\i N
-\f2\i0 .
-\f3\i ext
-\f2\i0 '
+\f3 '
+\f4\i matrixfn
+\f3\i0 .trace
+\f4\i N
+\f3\i0 .
+\f4\i ext
+\f3\i0 '
 \f0  where 
-\f3\i matrixfn
+\f4\i matrixfn
 \f0\i0  is the name of the trace's matrix function, 
-\f3\i N
+\f4\i N
 \f0\i0  is the number of the trace as used by the 
-\f2 LLVM_REOPT='--skip-trace'
+\f3 LLVM_REOPT='--skip-trace'
 \f0  option (q.v.), and 
-\f3\i ext
+\f4\i ext
 \f0\i0  is 
-\f2 'bc'
+\f3 'bc'
 \f0  for the bytecode file and 
-\f2 'txt'
+\f3 'txt'
 \f0  for the text file.\
 \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f2 \ul ttftest
-\f0 \ul  - a standalone TraceToFunction testing tool \ul \ulc0 (
-\f2 \ul reopt/tools/ttftest
+\f3 \cf0 \ul \ulc0 ttftest
+\f0 \ul - a standalone TraceToFunction testing tool (
+\f3 \ul reopt/tools/ttftest
 \f0 \ul )\ulnone \
 \
 There are two meaningful things you can do with serialized traces in the current reoptimizer framework. First, you can run TraceToFunction on a trace independently of the rest of the reoptimizer, using a tool called 
-\f2 ttftest
+\f3 ttftest
 \f0 . Because TraceToFunction is machine-independent, 
-\f2 ttftest
+\f3 ttftest
 \f0  should run correctly on any system capable of running LLVM. ttftest has a few important command-line options, the first two of which are required:\
 \
   -bc=
-\f4\i bytecodeFileName
+\f5\i bytecodeFileName
 \f0\i0 		Name of file containing module\
   -trace=
-\f4\i traceFileName
+\f5\i traceFileName
 \f0\i0 		Name of file containing trace\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural
-\cf0   -print-live				Print live-in/out sets computed by TraceToFunction\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural
-\cf0 \
-
-\f2 ttftest
-\f0  will also run the LLVM Verifier on the TraceFunction, and exit with an exit code of 0 if the verifier passes. This is the basic strategy used for the automated TraceToFunction regression tests (
-\f2 reopt/test/TTFTestCases
+  -print-live				Print live-in/out sets computed by TraceToFunction\
+\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+
+\f3 \cf0 ttftest
+\f0  will also run the LLVM Verifier on the TraceFunction, and exit with an exit code of 0 if the verifier passes. This is the basic strategy used for the automated TraceToFunction regression tests (
+\f3 reopt/test/TTFTestCases
 \f0  and 
-\f2 reopt/test/TTFTestHarness.pl
+\f3 reopt/test/TTFTestHarness.pl
 \f0 ). The test harness assembles each module, runs ttftest on each of the module's traces in turn, and prints a summary of the number of test cases on which TraceToFunction produced bad LLVM code--naturally, this should always be zero!\
 \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f2 \ul dumptrace
-\f0 \ul  - list LLVM assembly for a trace's basic blocks\ul \ulc0  (
-\f2 \ul reopt/tools/dumptrace
+\f3 \cf0 \ul \ulc0 dumptrace
+\f0 \ul  - list LLVM assembly for a trace's basic blocks (
+\f3 \ul reopt/tools/dumptrace
 \f0 \ul )\ulnone \
 \
 If you just want to look at the assembly code for the basic blocks on a trace, the easiest way is to use the 
-\f2 dumptrace
+\f3 dumptrace
 \f0  tool. Instead of showing you the integer indices of the basic blocks on the trace, it will print out the LLVM assembly code for the basic blocks, in the order they are specified in the trace file.\
 \
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f2 \cf0 dumptrace
-\f0  also highlights two sets of values for your inspection: the values used on the trace which are defined outside the trace, and the values defined inside the trace which are used outside the trace. (Note that these are often subsets of the live-in/out sets computed by TraceToFunction; if you want to see those, use 
-\f2 ttftest -print-live
+\f3 \cf0 dumptrace
+\f0  also highlights two sets of values for your inspection: the values used on the trace which are defined outside the trace, and the values defined inside the trace which are used outside the trace. (Note that these are often subsets of the live-in/out sets computed by TraceToFunction; if you want to see those, use 
+\f3 ttftest -print-live
 \f0 .)\
 \
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural
 
-\f2 \cf0 dumptrace
-\f0  requires the same 
-\f2 -bc
+\f3 dumptrace
+\f0  requires the same 
+\f3 -bc
 \f0  and 
-\f2 -trace
+\f3 -trace
 \f0  command line options required by 
-\f2 ttftest
+\f3 ttftest
 \f0 ; see the previous section for details.\
 \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f1\b \ul Known problems with the current implementation\
+\f1\b \cf0 \ul \ulc0 Known problems with the current implementation\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f0\b0 \ulnone \
-\ul UnpackTraceFunction generates inefficient code\ulnone \
-\
+\f0\b0 \cf0 \ulnone \
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ul \ulc0 UnpackTraceFunction generates inefficient code\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ulnone \
 The machine code generated by the JIT for a TraceFunction assumes that live-in values and pointers to live-out values will be passed in as arguments according to the SparcV9/Solaris ABI's standard calling conventions. Also, the registers allocated to values in the TraceFunction have no relation to the registers allocated to the same values in the matrix function. These two facts pose problems for the efficient integration of optimized trace machine code into the function from which the trace was extracted.\
 \
 We intended UnpackTraceFunction to implement an efficient calling convention for traces that did not require moving values around into specific registers for the purposes of argument passing: essentially, we hoped that the TraceFunction's code could be generated to use the same register for each live-in/out value that its matrix function used for that value. Theoretically, the SparcV9 target's graph-coloring register allocator could be instructed to use a mechanism known as "suggested colorings" to provide this functionality on a best-effort basis -- that is, allocating a live-in or live-out value the matrix function's register for that value when possible, and allocating some other register (and emitting a move) in the case of a conflict.\
@@ -507,11 +572,13 @@
 \
 Also, UnpackTraceFunction must be careful to avoid read-after-write errors when copying trace live-in values from their matrix-function registers to their TraceFunction registers, if the two sets of registers overlap. We currently work around this problem by emitting stores of each trace live-in value onto the TraceFunction's stack, then (after all the stores) emitting loads of each live-in value into its TraceFunction register. This works correctly but is slow. A similar situation exists at trace exits (epilogs) with respect to live-out values. It should be possible to use a temporary register to accomplish the same thing, if you are careful with the ordering of reads and writes.\
 \
-\ul TraceJITEmitter is poorly integrated with the TraceCache\ulnone \
-\
-In the current implementation of the TraceJITEmitter, the TraceCache does not keep track of individual traces when the trace JIT emitter inserts them, and so it cannot evict them. This deficiency exists primarily for two reasons: (1) the TraceCache wants to accept a trace in the form of a vector of binary words of machine code, and then perform fixups (e.g., to the PC-relative immediate fields of branch instructions) on its own, and therefore (2) the TraceCache wants to know the addresses of all branch and call instructions, information which the TraceJITEmitter does not have handy.\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ul \ulc0 TraceJITEmitter is poorly integrated with the TraceCache\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 \ulnone \
+In the current implementation of the TraceJITEmitter, the TraceCache does not keep track of individual traces when the trace JIT emitter inserts them, and so it cannot evict them. This deficiency exists primarily for two reasons: (1) the TraceCache wants to accept a trace in the form of a vector of binary words of machine code, and then perform fixups (e.g., to the PC-relative immediate fields of branch instructions) on its own, and therefore (2) the TraceCache wants to know the addresses of all branch and call instructions, information which the TraceJITEmitter does not have handy. Strategy (1) does not interact well with the target-independent JIT, which uses callbacks into the TraceJITEmitter to fix up backward branches.\
 \
-Strategy (1) does not interact well with the target-independent JIT, which uses callbacks into the TraceJITEmitter to fixup backward branches. It might be possible to pass empty lists of branches and calls to the TraceCache to make it skip its fixup steps. It might also be possible to change the TraceJITEmitter to emit into a temporary buffer, and then build up a vector of machine words from that and pass it off to the TraceCache. Care must be taken to avoid introducing unnecessary overhead due to copying. Currently, the TraceJITEmitter directly emits code into the memory used by the TraceCache, which is correct and reasonably efficient, but does not give the TraceCache any meaningful control over its own contents.\
+It might be possible, as a workaround, to pass empty lists of branches and calls to the TraceCache to make it skip its fixup steps. It might also be possible to change the TraceJITEmitter to emit into a temporary buffer, and then build up a vector of machine words from that and pass it off to the TraceCache. Care must be taken to avoid introducing unnecessary overhead due to copying. Currently, the TraceJITEmitter directly emits code into the memory used by the TraceCache, which is correct and reasonably efficient, but does not give the TraceCache any meaningful control over its own contents.\
 \
 In the long term, we want to change the target-independent JIT code to output pieces of machine code and lists of relocations (fixups) on the side, which also happens to be more like what the TraceCache wants -- this would make it possible to cache JIT translations at a per-function granularity, for example.\
 }
\ No newline at end of file
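
The "Known problems" text in the diff above suggests that the copies of live-in values between overlapping register sets could be done with a temporary register instead of a round trip through the stack, provided reads and writes are ordered carefully. The following is a minimal sketch of one way to order such copies. It is not taken from UnpackTraceFunction; the function names and the register names (`r1`, `tmp`, and so on) are hypothetical and purely illustrative, and registers are modeled as dictionary keys.

```python
def sequentialize_moves(moves, temp):
    """Turn a set of simultaneous register copies into a safe sequence.

    moves maps each destination register to its source register; all
    copies are meant to happen "at once". temp names a scratch register
    (an assumption of this sketch) that takes no part in the copies.
    Returns a list of (dst, src) copies to perform in order.
    """
    pending = {dst: src for dst, src in moves.items() if dst != src}
    if temp in pending or temp in pending.values():
        raise ValueError("temporary register must not take part in the moves")
    ordered = []
    while pending:
        sources = set(pending.values())
        # A destination that nobody still needs to read can be overwritten.
        ready = [dst for dst in pending if dst not in sources]
        if ready:
            dst = ready[0]
            ordered.append((dst, pending.pop(dst)))
        else:
            # Every remaining destination is still needed as a source, so
            # the remaining copies form cycles. Save one register in temp
            # and make its readers read temp instead.
            dst = next(iter(pending))
            ordered.append((temp, dst))
            for d, s in list(pending.items()):
                if s == dst:
                    pending[d] = temp
    return ordered


def run_moves(ordered, regs):
    """Apply the copies one at a time to a copy of the register file."""
    regs = dict(regs)
    for dst, src in ordered:
        regs[dst] = regs[src]
    return regs
```

For a swap of `r1` and `r2` plus a copy of `r1` into `r3`, the sketch emits the plain copy first, then breaks the swap with one extra copy through `tmp`, so no value is overwritten before it has been read.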

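The ttftest section in the diff above describes a test harness that runs ttftest on each trace and counts the cases where TraceToFunction produced bad LLVM code. A rough sketch of that loop follows. It uses only the `-bc`, `-trace` and `-print-live` options documented above; the helper names and file names are hypothetical, and treating any nonzero exit code as a failure is an assumption. This is not the actual `TTFTestHarness.pl` logic.

```python
import subprocess


def ttftest_command(bytecode_file, trace_file, print_live=False, tool="ttftest"):
    """Build an argument list for ttftest from the documented options."""
    cmd = [tool, f"-bc={bytecode_file}", f"-trace={trace_file}"]
    if print_live:
        cmd.append("-print-live")
    return cmd


def count_failures(cases, run=subprocess.call):
    """Run ttftest on each (bytecode_file, trace_file) pair.

    run takes an argument list and returns the exit code; it defaults to
    subprocess.call and can be replaced with a stub for dry runs.
    Returns the number of cases whose exit code was not 0.
    """
    failures = 0
    for bytecode_file, trace_file in cases:
        if run(ttftest_command(bytecode_file, trace_file)) != 0:
            failures += 1
    return failures
```

Passing a stub as `run` allows the command lines to be checked without a built copy of ttftest; as the guide says, the failure count on the real test cases should always be zero.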




