[llvm-commits] CVS: reopt/docs/ReoptUsersGuide.rtf

Brian Gaeke gaeke at cs.uiuc.edu
Tue Dec 14 12:34:46 PST 2004



Changes in directory reopt/docs:

ReoptUsersGuide.rtf updated: 1.5 -> 1.6
---
Log message:

Add section on machine-dependent parts of the reoptimizer.
Other minor updates.


---
Diffs of the changes:  (+74 -25)

Index: reopt/docs/ReoptUsersGuide.rtf
diff -u reopt/docs/ReoptUsersGuide.rtf:1.5 reopt/docs/ReoptUsersGuide.rtf:1.6
--- reopt/docs/ReoptUsersGuide.rtf:1.5	Fri Oct 29 16:50:08 2004
+++ reopt/docs/ReoptUsersGuide.rtf	Tue Dec 14 14:34:35 2004
@@ -70,6 +70,7 @@
 \f0  - list LLVM assembly for a trace's basic blocks (
 \f3 reopt/tools/dumptrace
 \f0 )\
+	Machine-dependent parts of the reoptimizer\
 	Known problems with the current implementation\
 		UnpackTraceFunction generates inefficient code\
 		TraceJITEmitter is poorly integrated with the TraceCache\
@@ -80,7 +81,7 @@
 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
 \f0\b0 \cf0 \ulnone \
-The reoptimizer is LLVM's dynamic optimization framework. It consists of three basic parts: a collection of profiling instrumentation passes, a runtime profiler and trace generator, and a runtime trace optimizer.\
+The reoptimizer is LLVM's dynamic optimization framework. It consists of three basic parts: a collection of profiling instrumentation passes, a runtime profiler and trace generator, and a runtime trace optimizer. For more general background on the reoptimizer, see the presentation slides titled "The LLVM Reoptimizer", presented by Brian Gaeke on Dec. 6, 2004, which should be available along with this manual.\
 \
 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 \cf0 \ul Link-time profiling instrumentation\ulnone \
@@ -90,8 +91,7 @@
  * Function inlining (
 \f3 lib/Transforms/IPO/InlineSimple.cpp
 \f0 )\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 Performing more inlining is supposed to increase the effectiveness of interprocedural tracing.\
+Performing more inlining is supposed to increase the effectiveness of interprocedural tracing.\
 \
  * Lower LLVM 'switch' instructions to branches (
 \f3 lib/Transforms/Scalar/LowerSwitch.cpp
@@ -118,16 +118,11 @@
 \f3 reopt/lib/LightWtProfiling/FirstTrigger.cpp
 \f0 .\
 \
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 \ul Runtime profiler and trace generator\ulnone \
+\ul Runtime profiler and trace generator\ulnone \
 \
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-
-\f1\b \cf0 [FIXME - not done.]
-\f0\b0  For more information about the reoptimizer's instrumentation passes and runtime profiler, consult Anand Shukla's M.S. thesis, "Lightweight Cross-Procedure Tracing for Runtime Optimization", July 2003.\
+For more information about the reoptimizer's instrumentation passes and runtime profiler, consult Anand Shukla's M.S. thesis, "Lightweight Cross-Procedure Tracing for Runtime Optimization", July 2003.\
 \
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 \ul Trace optimizer\ulnone \
+\ul Trace optimizer\ulnone \
 \
 See the section "How the trace optimizer works", below.\
 \
@@ -382,10 +377,8 @@
 	-sli-threshold=
 \f5\i count
 \f0\i0 		Number of iterations of SLI before path counters are sampled.\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 						The default is 50.\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 \
+						The default is 50.\
+\
 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 \cf0 \ul Phase detection options\
 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
@@ -399,11 +392,9 @@
 	-timer-int-s=
 \f5\i seconds
 \f0\i0 		Interval (in seconds) between phase detection sweeps.\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 						The default is 3.0 seconds. You can provide a decimal fraction\
+						The default is 3.0 seconds. You can provide a decimal fraction\
 						to this option.\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 \
+\
 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 \cf0 \ul Trace layout engine options\
 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
@@ -434,18 +425,19 @@
 	-opt-trace-cache-size=
 \f5\i size
 \f0\i0 	Trace cache size for optimized code\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f1\b [FIXME: what are the defaults for these? what are the units in which they are specified?]\
+\f1\b \cf0 [FIXME: what are the defaults for these? what are the units in which they are specified?]\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
-\f0\b0 \
+\f0\b0 \cf0 \
 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 \cf0 \ul Debugging options\
 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 \cf0 \ulnone 	-debug				Turn on debugging printouts from TraceToFunction,\
 						UnpackTraceFunction, and the various other reoptimizer libraries\
 						(in addition to those available from the other options listed below).\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 	-print-machineinstrs		Print generated machine code for traces.\
+	-print-machineinstrs		Print generated machine code for traces.\
 	-skip-trace=
 \f5\i n
 \f0\i0 				Don't optimize the 
@@ -493,7 +485,18 @@
 \f3 \ul reopt/lib/TraceToFunction
 \f0 \ul )\ulnone \
 \
-The first pass that runs over newly-identified Traces is the TraceToFunction pass, also known as the TraceFunctionBuilder. TraceToFunction has three major goals: first, to find the live-in and live-out sets from the trace; second, to remove the reoptimizer's first-level loop instrumentation from the trace, if it is found; and third, to construct a function containing the same code as the trace. This function is called the TraceFunction, and it is the principal output of the TraceToFunction pass.\
+The first pass that runs over newly-identified Traces is the TraceToFunction pass, also known as the TraceFunctionBuilder. It is called from optimizeTrace() when a new trace is available, and implemented in 
+\f3 reopt/lib/TraceToFunction/TraceToFunction.cpp
+\f0 . TraceToFunction has three major goals: first, to find the live-in and live-out sets from the trace; second, to remove the reoptimizer's first-level loop instrumentation from the trace, if it is found; and third, to construct a function containing the same code as the trace. This function is called the TraceFunction, and it is the principal output of the TraceToFunction pass.\
+\
+\ul Principal functions that make up TraceToFunction\ulnone  (in order of execution)\
+BuildTraceFunction() -- Runs the TraceFunctionBuilder Pass on a given Trace (calls each of the following)\
+buildFLIMap() -- identifies FLI blocks (using identifyFLIEdge()) on the trace so they can be removed\
+findAlternateEntryPoints() -- identifies entrances to the trace other than the trace entry BB\
+buildTraceLiveInSet(), buildTraceLiveOutSet() -- find LLVM Values which are live-in at entry to the trace and live-out at exits from the trace\
+fillInFunctionBody() -- inlines (clones) the trace's BBs into the new Function (using cloneTraceBBsIntoFunction()), removes FLI (using threadFLIEdges()), then calls fixupFunctionBodyBB()\
+fixupFunctionBodyBB() -- adds compensation code at trace edges, makes cloned code internally consistent\
+fixupPhis() -- removes off-trace sources from Phi nodes\
 \
 A TraceFunction has several components besides the newly-generated LLVM Function containing the trace code. It also contains a pointer to the original LLVM Function which the trace came from; this is known as the trace's "matrix function". It also contains the trace's live-in and live-out sets (represented as vectors of LLVM Values), and mapping information relating LLVM Values in the matrix function to those in the TraceFunction. Also, since live-in and live-out values are expressed as arguments to the TraceFunction, there are separate maps that relate the TraceFunction's arguments to the live values that they represent.\
 \
@@ -535,7 +538,13 @@
 \f3 '-dregalloc=y -print-machineinstrs'
 \f0  flags to 
 \f3 reopt-llc
-\f0 , and save the output, so that you can look up the registers to which LLVM Values were allocated when you are single-stepping through traces.\
+\f0 , and save the output, so that you can look up the registers to which LLVM Values were allocated when you are single-stepping through traces. If you run 
+\f3 'gmake TEST=reopt'
+\f0  with a debug build, the TEST.reopt.Makefile rules will save this information for each test in the file 
+\f3 'Output/
+\f4\i testname
+\f3\i0 .reopt-llc.s.log'
+\f0 .\
 \
 2. Set your 
 \f3 LLVM_REOPT
@@ -656,6 +665,46 @@
 \
 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
+\f1\b \cf0 \ul \ulc0 Machine-dependent parts of the reoptimizer\
+
+\f0\b0 \ulnone \
+In addition to the aspects listed individually below, almost all of these files use "uint64_t" to hold pointer values at some point. These uses should be changed to either "void *" or "uintptr_t". (Other parts of the code may also use "uint64_t" to hold pointers.)\
+\
+MappingInfo/ValueAllocState.cpp\
+ - Assumes use of SparcV9 register allocator\
+LightWtProfiling/SecondTrigger.cpp\
+ - Depends on libcpc.h (cputrack interface) from Solaris; could be ported to use PAPI\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0  - Uses SparcV9 inline assembly for saving and restoring registers\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0  - Assumes use of SparcV9 register allocator (uses %g regs)\
+ - Contains hardcoded SparcV9 binary machine code fragments used for instrumentation\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 LightWtProfiling/FirstTrigger.cpp\
+ - Assumes 32-bit instruction words\
+ - Uses SparcV9 inline assembly for saving and restoring registers\
+ - Uses "doFlush" to flush instruction cache after modifying instructions in memory\
+LightWtProfiling/SLI.cpp\
+ - Assumes 32-bit instruction words\
+ - Assumes use of SparcV9 register allocator (uses %g regs)\
+ - Contains hardcoded SparcV9 binary machine code fragments used for instrumentation\
+LightWtProfiling/UnpackTraceFunction.cpp\
+ - Assumes use of SparcV9 instruction set and register allocator (uses %g regs)\
+ - Depends on exact sequences of code produced by SparcV9's CreateCodeToLoadConst() method\
+LightWtProfiling/scheduler.cpp\
+ - Uses POSIX interval timers; only compiles correctly on Linux & Solaris\
+ScratchMemory/ScratchMemory.cpp\
+ - Depends on assembler ".skip" directive.\
+TraceCache/InstrUtils.cpp\
+ - Assumes use of SparcV9 instruction set\
+ - Contains hardcoded SparcV9 binary machine code bit patterns used for identifying & constructing code\
+TraceCache/VirtualMem.cpp\
+ - Contains hardcoded SparcV9 binary machine code bit patterns used for identifying & constructing code\
+TraceCache/TraceCache.cpp\
+ - Uses "doFlush" to flush instruction cache after modifying instructions in memory\
+\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+
 \f1\b \cf0 \ul Known problems with the current implementation\
 \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
 
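The new section's note about replacing "uint64_t" with "uintptr_t" for pointer values can be illustrated with a minimal sketch. The helper names below (addrToInt, intToAddr, advanceWords, roundTrips) are hypothetical and do not come from the reoptimizer sources; the 32-bit instruction word size is taken from the "Assumes 32-bit instruction words" entries above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not reoptimizer code.

// Convert a pointer to an integer wide enough to hold it on this target.
inline std::uintptr_t addrToInt(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p);
}

// Convert such an integer back to a pointer.
inline void *intToAddr(std::uintptr_t a) {
  return reinterpret_cast<void *>(a);
}

// Advance an instruction address by nWords 32-bit instruction words.
// Assumes fixed-width, word-aligned instructions, as on SparcV9.
inline std::uintptr_t advanceWords(std::uintptr_t addr, unsigned nWords) {
  assert(addr % sizeof(std::uint32_t) == 0 && "address must be word-aligned");
  return addr + static_cast<std::uintptr_t>(nWords) * sizeof(std::uint32_t);
}

// True if a pointer survives the trip through uintptr_t unchanged.
inline bool roundTrips(const void *p) {
  return intToAddr(addrToInt(p)) == p;
}
```

Unlike "uint64_t", "uintptr_t" follows the pointer width of the target, so code written this way does not bake in a 64-bit address size.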





