[llvm-commits] CVS: reopt/docs/ReoptUsersGuide.rtf
Brian Gaeke
gaeke at cs.uiuc.edu
Tue Dec 14 12:34:46 PST 2004
Changes in directory reopt/docs:
ReoptUsersGuide.rtf updated: 1.5 -> 1.6
---
Log message:
Add section on machine-dependent parts of the reoptimizer.
Other minor updates.
---
Diffs of the changes: (+74 -25)
Index: reopt/docs/ReoptUsersGuide.rtf
diff -u reopt/docs/ReoptUsersGuide.rtf:1.5 reopt/docs/ReoptUsersGuide.rtf:1.6
--- reopt/docs/ReoptUsersGuide.rtf:1.5 Fri Oct 29 16:50:08 2004
+++ reopt/docs/ReoptUsersGuide.rtf Tue Dec 14 14:34:35 2004
@@ -70,6 +70,7 @@
\f0 - list LLVM assembly for a trace's basic blocks (
\f3 reopt/tools/dumptrace
\f0 )\
+ Machine-dependent parts of the reoptimizer\
Known problems with the current implementation\
UnpackTraceFunction generates inefficient code\
TraceJITEmitter is poorly integrated with the TraceCache\
@@ -80,7 +81,7 @@
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
\f0\b0 \cf0 \ulnone \
-The reoptimizer is LLVM's dynamic optimization framework. It consists of three basic parts: a collection of profiling instrumentation passes, a runtime profiler and trace generator, and a runtime trace optimizer.\
+The reoptimizer is LLVM's dynamic optimization framework. It consists of three basic parts: a collection of profiling instrumentation passes, a runtime profiler and trace generator, and a runtime trace optimizer. For more general background information on the reoptimizer, read the presentation slides titled "The LLVM Reoptimizer", presented by Brian Gaeke on Dec. 6, 2004, which should be available along with this manual.\
\
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
\cf0 \ul Link-time profiling instrumentation\ulnone \
@@ -90,8 +91,7 @@
* Function inlining (
\f3 lib/Transforms/IPO/InlineSimple.cpp
\f0 )\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 Performing more inlining is supposed to increase the effectiveness of interprocedural tracing.\
+Performing more inlining is supposed to increase the effectiveness of interprocedural tracing.\
\
* Lower LLVM 'switch' instructions to branches (
\f3 lib/Transforms/Scalar/LowerSwitch.cpp
@@ -118,16 +118,11 @@
\f3 reopt/lib/LightWtProfiling/FirstTrigger.cpp
\f0 .\
\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 \ul Runtime profiler and trace generator\ulnone \
+\ul Runtime profiler and trace generator\ulnone \
\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-
-\f1\b \cf0 [FIXME - not done.]
-\f0\b0 For more information about the reoptimizer's instrumentation passes and runtime profiler, consult Anand Shukla's M.S. thesis, "Lightweight Cross-Procedure Tracing for Runtime Optimization", July 2003.\
+For more information about the reoptimizer's instrumentation passes and runtime profiler, consult Anand Shukla's M.S. thesis, "Lightweight Cross-Procedure Tracing for Runtime Optimization", July 2003.\
\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 \ul Trace optimizer\ulnone \
+\ul Trace optimizer\ulnone \
\
See the section "How the trace optimizer works", below.\
\
@@ -382,10 +377,8 @@
-sli-threshold=
\f5\i count
\f0\i0 Number of iterations of SLI before path counters are sampled.\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 The default is 50.\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 \
+ The default is 50.\
+\
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
\cf0 \ul Phase detection options\
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
@@ -399,11 +392,9 @@
-timer-int-s=
\f5\i seconds
\f0\i0 Interval (in seconds) between phase detection sweeps.\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 The default is 3.0 seconds. You can provide a decimal fraction\
+ The default is 3.0 seconds. You can provide a decimal fraction\
to this option.\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 \
+\
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
\cf0 \ul Trace layout engine options\
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
@@ -434,18 +425,19 @@
-opt-trace-cache-size=
\f5\i size
\f0\i0 Trace cache size for optimized code\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\f1\b [FIXME: what are the defaults for these? what are the units in which they are specified?]\
+\f1\b \cf0 [FIXME: what are the defaults for these? what are the units in which they are specified?]\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\f0\b0 \
+\f0\b0 \cf0 \
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
\cf0 \ul Debugging options\
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
\cf0 \ulnone -debug Turn on debugging printouts from TraceToFunction,\
UnpackTraceFunction, and the various other reoptimizer libraries\
(in addition to those available from the other options listed below).\
-\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
-\cf0 -print-machineinstrs Print generated machine code for traces.\
+ -print-machineinstrs Print generated machine code for traces.\
-skip-trace=
\f5\i n
\f0\i0 Don't optimize the
@@ -493,7 +485,18 @@
\f3 \ul reopt/lib/TraceToFunction
\f0 \ul )\ulnone \
\
-The first pass that runs over newly-identified Traces is the TraceToFunction pass, also known as the TraceFunctionBuilder. TraceToFunction has three major goals: first, to find the live-in and live-out sets from the trace; second, to remove the reoptimizer's first-level loop instrumentation from the trace, if it is found; and third, to construct a function containing the same code as the trace. This function is called the TraceFunction, and it is the principal output of the TraceToFunction pass.\
+The first pass that runs over newly-identified Traces is the TraceToFunction pass, also known as the TraceFunctionBuilder. It is called from optimizeTrace() when a new trace is available, and implemented in
+\f3 reopt/lib/TraceToFunction/TraceToFunction.cpp
+\f0 . TraceToFunction has three major goals: first, to find the live-in and live-out sets from the trace; second, to remove the reoptimizer's first-level loop instrumentation from the trace, if it is found; and third, to construct a function containing the same code as the trace. This function is called the TraceFunction, and it is the principal output of the TraceToFunction pass.\
+\
+\ul Principal functions that make up TraceToFunction\ulnone (in order of execution)\
+BuildTraceFunction() -- Runs the TraceFunctionBuilder Pass on a given Trace (calls each of the following)\
+buildFLIMap() -- identifies FLI blocks (using identifyFLIEdge()) on the trace so they can be removed\
+findAlternateEntryPoints() -- identifies entrances to the trace other than the trace entry BB\
+buildTraceLiveInSet(), buildTraceLiveOutSet() -- find LLVM Values which are live-in at entry to the trace and live-out at exits from the trace\
+fillInFunctionBody() -- inlines (clones) the trace's BBs into the new Function (using cloneTraceBBsIntoFunction()), removes FLI (using threadFLIEdges()), then calls fixupFunctionBodyBB()\
+fixupFunctionBodyBB() -- adds compensation code at trace edges, makes cloned code internally consistent\
+fixupPhis() -- removes off-trace sources from Phi nodes TraceToFunction\
\
A TraceFunction has several components besides the newly-generated LLVM Function containing the trace code. It also contains a pointer to the original LLVM Function which the trace came from; this is known as the trace's "matrix function". It also contains the trace's live-in and live-out sets (represented as vectors of LLVM Values), and mapping information relating LLVM Values in the matrix function to those in the TraceFunction. Also, since live-in and live-out values are expressed as arguments to the TraceFunction, there are separate maps that relate the TraceFunction's arguments to the live values that they represent.\
\
@@ -535,7 +538,13 @@
\f3 '-dregalloc=y -print-machineinstrs'
\f0 flags to
\f3 reopt-llc
-\f0 , and save the output, so that you can look up the registers to which LLVM Values were allocated when you are single-stepping through traces.\
+\f0 , and save the output, so that you can look up the registers to which LLVM Values were allocated when you are single-stepping through traces. If you run
+\f3 'gmake TEST=reopt'
+\f0 with a debug build, the TEST.reopt.Makefile rules will save this information for each test in the file
+\f3 'Output/
+\f4\i testname
+\f3\i0 .reopt-llc.s.log'
+\f0 .\
\
2. Set your
\f3 LLVM_REOPT
@@ -656,6 +665,46 @@
\
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\f1\b \cf0 \ul \ulc0 Machine-dependent parts of the reoptimizer\
+
+\f0\b0 \ulnone \
+In addition to the aspects listed individually below, almost all of the files below use "uint64_t" to hold pointer values at some point. These should be changed to use either "void *" or "uintptr_t". (There may be other parts of the code that also use "uint64_t" to hold pointers.)\
+\
+MappingInfo/ValueAllocState.cpp\
+ - Assumes use of SparcV9 register allocator\
+LightWtProfiling/SecondTrigger.cpp\
+ - Depends on libcpc.h (cputrack interface) from Solaris; could be ported to use PAPI\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 - Uses SparcV9 inline assembly for saving and restoring registers\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 - Assumes use of SparcV9 register allocator (uses %g regs)\
+ - Contains hardcoded SparcV9 binary machine code fragments used for instrumentation\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+\cf0 LightWtProfiling/FirstTrigger.cpp\
+ - Assumes 32-bit instruction words\
+ - Uses SparcV9 inline assembly for saving and restoring registers\
+ - Uses "doFlush" to flush instruction cache after modifying instructions in memory\
+LightWtProfiling/SLI.cpp\
+ - Assumes 32-bit instruction words\
+ - Assumes use of SparcV9 register allocator (uses %g regs)\
+ - Contains hardcoded SparcV9 binary machine code fragments used for instrumentation\
+LightWtProfiling/UnpackTraceFunction.cpp\
+ - Assumes use of SparcV9 instruction set and register allocator (uses %g regs)\
+ - Depends on exact sequences of code produced by SparcV9's CreateCodeToLoadConst() method\
+LightWtProfiling/scheduler.cpp\
+ - Uses POSIX interval timers; only compiles correctly on Linux & Solaris\
+ScratchMemory/ScratchMemory.cpp\
+ - Depends on assembler ".skip" directive.\
+TraceCache/InstrUtils.cpp\
+ - Assumes use of SparcV9 instruction set\
+ - Contains hardcoded SparcV9 binary machine code bit patterns used for identifying & constructing code\
+TraceCache/VirtualMem.cpp\
+ - Contains hardcoded SparcV9 binary machine code bit patterns used for identifying & constructing code\
+TraceCache/TraceCache.cpp\
+ - Uses "doFlush" to flush instruction cache after modifying instructions in memory\
+\
+\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural
+
\f1\b \cf0 \ul Known problems with the current implementation\
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardeftab720\ql\qnatural