[llvm-commits] CVS: llvm/lib/Reoptimizer/Inst/Phase2.cpp design.txt PerfInst.cpp
Joel Stanley
jstanley at cs.uiuc.edu
Fri Apr 4 17:09:01 PST 2003
Changes in directory llvm/lib/Reoptimizer/Inst:
Phase2.cpp added (r1.1)
design.txt updated: 1.5 -> 1.6
PerfInst.cpp (r1.4) removed
Log message:
Moved PerfInst.cpp contents -> Phase2.cpp, removing PerfInst.cpp.
Diffs of the changes:
Index: llvm/lib/Reoptimizer/Inst/design.txt
diff -u llvm/lib/Reoptimizer/Inst/design.txt:1.5 llvm/lib/Reoptimizer/Inst/design.txt:1.6
--- llvm/lib/Reoptimizer/Inst/design.txt:1.5 Thu Apr 3 15:00:51 2003
+++ llvm/lib/Reoptimizer/Inst/design.txt Fri Apr 4 17:08:56 2003
@@ -865,64 +865,117 @@
{{{ TODO
-- Read EEL paper to get a better feel for binary modification issues
+ - Investigate trace-cache dummy function mechanisms, decide on approach A or B
+ in phase outline
+ - Implement phase outline
-- Use the existing mechanisms at your disposal
- (ELF/tracecache/BinInterface/VirtualMem/etc) to do the following.
- For each function, locate the load-volatile instructions that define
- interval and point metrics (potentially recording some information about
- them for later use); also find the padding region at the end of the
- function (this may be hard). Write code into the padding region to call
- the "phase 3 transformation function", and over-write the *first*
- load-volatile in the function that corresponds to an instrumentation point
- (or interval start point) with a direct branch down to the padded region.
- Vikram's comment on this last step:
- [Finding "the first" load-volatile in the function is not easy because of
- control-flow. Furthermore, I don't think Step 2 needs to find
- load-volatiles for actual instrumentations at all since many functions may
- never be executed. We should leave that to step 3.
- Therefore, I would simplify as follows:
- For each function, find the load-volatile instructions that define the
- entry of the padded region. Over-write the first instruction of the
- function with a direct branch to a trampoline in the padded region. This
- trampoline executes the first instruction and then calls the Phase 3
- routine to instrument the function.]
- Scratch that. I think this needs to be rephrased again (assuming we
- have only one pad region in the function body):
- For each function, find the load-volatile instructions that define the
- padded region so we know where it is. Then, replace the first instruction
- in the function w/ a branch down to the padded region. The padded region
- contains an indirect branch to a dynamically-allocated body of code into
- which the entire function body is copied. Phase 2 then manipulates the
- code in the copied region, replacing candidate load-volatiles w/ if/else
- blocks that call the appropriate instrumentation function if the
- load-volatile is actually an instrumentation function or executing the
- original code otherwise.
- On phase 3 transformation function invocation:
- Performs all of the tracecache-like magic, copying the original code to a
- region of memory where the code can grow, rewriting the pad region so that
- it will execute the indirect jump to the new code region, etc. The
- majority of the actions required here are still fairly unclear. To
- accomplish this step, we must first determine how to make the branch- and
- call-maps that the TraceCache addTrace() routine(s) require, and how to
- otherwise use the existing tracecache stuff to accomplish what we want.
+ - Read EEL paper to get a better feel for binary modification issues
+ Below, Approach A refers to using *only* dummy functions, and Approach B
+ refers to using *only* dynamically-allocated, heap-managed memory. Approach C
+ (to come later) is the approach that combines the two, and is slightly more
+ complex.
+ In phase 1:
+ Phase 1 actions as described in earlier work: building the GBT, handling
+ sigfuns properly (i.e. adding a pair-of-sigfuns mechanism for point
+ metrics), comparing against the by-hand example for phase 1 actions, etc. Also
+ might need to record information about which volatiles are associated with
+ each of the start/end points of intervals and point-scopes.
+ Insert a call to phase2 in main.
+ Handling storage for new code & instrumentation calls:
+ Approach A: Construct a dummy function and record its address in the GBT
+ for use by the other phases.
+ Approach B: Other phases use heap-managed dynamic memory; no dummy
+ function needed.
+ In phase 2:
+ On program startup ("phase 2" function called from main()):
+ 1. Build a starting-address-to-function-extent map for use by later phases.
+ 2. For each function F (only those in the text segment preferably), set up phase 3 branches.
+ Approach A:
+ 2a. Replace the first instruction in F with a branch to a new slot in
+ the dummy function.
-Notes on using the total-copy approach in the prototype implementation.
+ 2b. At the new slot write first the (replaced) first instruction in F,
+ followed by code to call the phase 3 function with the address of F as
+ an argument.
+ Approach B:
+ 2a. Save the first N instructions of F in an F -> [instructions] record
+ of some kind. Phase 3 will restore them later.
+ 2b. Over the top of the original instructions (now saved), write a call
+ to phase 3, passing the address of F as an argument.
+ In phase 3:
+ 1. Obtain the code region specified for F by the starting address to function
+ extent table built in phase 2.
+ 2.
+ Approach A: Do nothing.
+ Approach B: Copy the body of F into the heap-managed "instruction buffer"
+ (call the start location of the copy F') and over-write the first
+ instructions of F with an indirect jump to F'. Rewrite all branches within
+ the boundaries of F' as needed. Overwrite the first instructions of F' with
+ the instructions saved in the F -> [instructions] record constructed by
+ phase 2.
+ 3. "Slots" refer to the properly-sized segments of memory containing whatever
+ code needs to be written. As a KIS concession, slots are not partitionable or
+ reusable.
+ Approach A: For each candidate load instruction I within F, at location C:
+ Approach B: For each candidate load instruction I within F', at location C:
+ 3a. Grab a new slot.
+ 3b. Save I's load/save instructions (L and S, respectively) in slot.
+ 3c. Replace the L with a branch to slot.
+ 3d. Replace S with a nop.
+ Approach A:
+ 3e. Write phase 4 code in slot:
+ if(actually an instrumentation site)
+ rewrite branch at C to next instruction
+ call proper instrumentation function <- C branches to here
+ branch back to C
+ else
+ restore original instructions
+ branch back to C
+ Approach B:
+ 3e. Write phase 4 code in slot:
+ if(actually an instrumentation site)
+ Grow code at C to call proper instrumentation function,
+ replacing branch placed there by 3c.
+ branch back to C
+ else
+ restore original instructions
+ branch back to C
+ In phase 4: No special action needed.
+{{{ Notes on using the total-copy approach in the prototype implementation.
Note that we will need to use the total-copy approach as a "fall-back" from the
dummy function (or padded region approach) in the following cases:
@@ -954,7 +1007,6 @@
general, less efficient implementation. The most general, most efficient
implementation may not be obtainable in the short term, but it's reasonable to
try for.
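
Editorial sketch (not part of the commit): step 1 of phase 2 in the outline
above builds a starting-address-to-function-extent map, and step 1 of phase 3
uses it to obtain the code region for F. The C++17 fragment below is one
minimal way such a map could look. All names here (FunctionExtent, ExtentMap,
byStart, containing) are hypothetical and do not come from Phase2.cpp. The
sketch also assumes that extents are half-open [start, end) ranges that do not
overlap, and leaves open how the extents are discovered (e.g. from ELF symbol
information).

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical record for one function's code region, [start, end).
struct FunctionExtent {
    std::uint64_t start;  // address of the first instruction of F
    std::uint64_t end;    // one past the last byte of F
    std::string name;     // symbol name, kept only for debugging
};

// Hypothetical starting-address -> extent map. std::map keeps keys sorted,
// so an address inside a function can also be resolved to its extent.
class ExtentMap {
public:
    // Record the extent of one function (assumed not to overlap others).
    void add(std::uint64_t start, std::uint64_t end, std::string name) {
        extents_[start] = FunctionExtent{start, end, std::move(name)};
    }

    // Exact lookup by starting address. This is the case in the outline:
    // phase 3 is called with the address of F as an argument.
    std::optional<FunctionExtent> byStart(std::uint64_t start) const {
        auto it = extents_.find(start);
        if (it == extents_.end()) return std::nullopt;
        return it->second;
    }

    // Lookup of the function containing an arbitrary address, if any.
    std::optional<FunctionExtent> containing(std::uint64_t addr) const {
        auto it = extents_.upper_bound(addr);  // first extent starting after addr
        if (it == extents_.begin()) return std::nullopt;  // addr precedes all
        --it;                                  // last extent starting at or before addr
        if (addr >= it->second.end) return std::nullopt;  // falls in a gap
        return it->second;
    }

private:
    std::map<std::uint64_t, FunctionExtent> extents_;
};
```

With a map like this, phase 3 would only need byStart(); containing() is
included because a sorted map gives it almost for free, and it may be handy
when checking whether a branch target stays within the boundaries of F.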