[llvm-commits] CVS: llvm/lib/Reoptimizer/Inst/Phase2.cpp design.txt PerfInst.cpp

Fri Apr 4 17:09:01 PST 2003

Changes in directory llvm/lib/Reoptimizer/Inst:

Phase2.cpp added (r1.1)
design.txt updated: 1.5 -> 1.6
PerfInst.cpp (r1.4) removed

---
Log message:

Moved PerfInst.cpp contents -> Phase2.cpp, removing PerfInst.cpp.



---
Diffs of the changes:

Index: llvm/lib/Reoptimizer/Inst/design.txt
diff -u llvm/lib/Reoptimizer/Inst/design.txt:1.5 llvm/lib/Reoptimizer/Inst/design.txt:1.6

--- llvm/lib/Reoptimizer/Inst/design.txt:1.5	Thu Apr  3 15:00:51 2003
+++ llvm/lib/Reoptimizer/Inst/design.txt	Fri Apr  4 17:08:56 2003
@@ -865,64 +865,117 @@
 
 {{{ TODO
 
-- Read EEL paper to get a better feel for binary modification issues
+  - Investigate trace-cache dummy function mechanisms, decide on approach A or B
+    in phase outline
 
-{{{ OLD PHASE DESCRIPTION 
+  - Implement phase outline
 
-- Use the existing mechanisms at your disposal
-  (ELF/tracecache/BinInterface/VirtualMem/etc) to do the following.
-
-      For each function, locate the load-volatile instructions that define
-      interval and point metrics (potentially recording some information about
-      them for later use); also find the padding region at the end of the
-      function (this may be hard).  Write code into the padding region to call
-      the "phase 3 transformation function", and over-write the *first*
-      load-volatile in the function that corresponds to an instrumentation point
-      (or interval start point) with a direct branch down to the padded region.
-
-      Vikram's comment on this last step:
-
-      [Finding "the first" load-volatile in the function is not easy because of
-      control-flow.  Furthermore, I don't think Step 2 needs to find
-      load-volatiles for actual instrumentations at all since many functions may
-      never be executed.  We should leave that to step 3.
-
-      Therefore, I would simplify as follows:
-
-      For each function, find the load-volatile instructions that define the
-      entry of the padded region.  Over-write the first instruction of the
-      function with a direct branch to a trampoline in the padded region.  This
-      trampoline executes the first instruction and then calls the Phase 3
-      routine to instrument the function.]
-
-      Scratch that. I think this needs to be rephrased again to (assuming we
-      have only one pad region in the function body:
-
-      For each function, find the load-volatile instructions that define the
-      padded region so we know where it is.  Then, replace the first instruction
-      in the function w/ a branch down to the padded region. The padded region
-      contains an indirect branch to a dynamically-allocated body of code into
-      which the entire function body is copied.  Phase 2 then manipulates the
-      code in the copied region, replacing candidate load-volatiles w/ if/else
-      blocks that call the appropriate instrumentation function if the
-      load-volatile is actually an instrumentation function or executing the
-      original code otherwise.
-
-  On phase 3 transformation function invocation:
-
-      Performs all of tracecache-like magic, copying the original code to a
-      region of memory where the code can grow, rewriting the pad region so that
-      it will execute the indirect jump to the new code region, etc.  The
-      majority of the actions required here are still fairly unclear.  To
-      accomplish this step, we must first determine how to make the branch- and
-      call-maps that the TraceCache addTrace() routine(s) require, and how to
-      otherwise use the existing tracecache stuff to accomplish what we want.
+  - Read EEL paper to get a better feel for binary modification issues
 
 }}}
 
-{{{ NEW PHASE DESCRIPTION
+{{{ PHASE OUTLINE
+
+  Below, Approach A refers to using *only* dummy functions, and Approach B
+  refers to using *only* dynamically-allocated, heap-managed memory.  Approach C
+  (to come later) is the approach that combines the two, and is slightly more
+  complex.
+
+  In phase 1:
+
+      Phase 1 actions as described in earlier work (building the GBT, handling
+      sigfuns properly (i.e. adding a pair-of-sigfuns mechanism for point
+      metrics), comparing against the by-hand example for phase 1 actions,
+      etc.).  Might also need to record which volatiles are associated with
+      each of the start/end points of intervals and point-scopes.
+
+      Insert a call to phase2 in main.
+
+      Handling storage for new code & instrumentation calls:
+
+      Approach A: Construct a dummy function and record its address in the GBT
+      for use by the other phases.
+
+      Approach B: Other phases use heap-managed dynamic memory; no dummy
+      function needed.
+
+  In phase 2:
+
+  On program startup ("phase 2" function called from main()):
+
+      1. Build a starting-address-to-function-extent map for use by later phases.
+
+      2. For each function F (preferably only those in the text segment), set up phase 3 branches.
+
+      Approach A: 
+
+          2a. Replace the first instruction in F with a branch to a new slot in
+          the dummy function.  
 
-Notes on using the total-copy approach in the prototype implementation.
+	  2b. At the new slot write first the (replaced) first instruction in F,
+	  followed by code to call the phase 3 function with the address of F as
+	  an argument.
+
+      Approach B: 
+    
+          2a. Save the first few instructions of F in an F -> [instructions]
+          record of some kind.  Phase 3 will restore them later.
+
+	  2b. Over the top of the original instructions (now saved), write a call
+	  to phase 3, passing the address of F as an argument.
+
+  In phase 3:
+ 
+  1. Obtain the code region specified for F by the
+  starting-address-to-function-extent map built in phase 2.
+
+  2. 
+    Approach A: Do nothing.
+
+    Approach B: Copy the body of F into the heap-managed "instruction buffer"
+    (call the start location of the copy F') and over-write the first
+    instructions of F with an indirect jump to F'.  Rewrite all branches within
+    the boundaries of F' as needed.  Overwrite the first instructions of F' with
+    the instructions saved in the F -> [instructions] record constructed by
+    phase 2.
+
+  3. "Slots" refer to the properly-sized segments of memory containing whatever
+  code needs to be written. As a KIS concession, slots are not partitionable or
+  reusable.
+  
+  Approach A: For each candidate load instruction I within F, at location C:
+  Approach B: For each candidate load instruction I within F', at location C:
+
+  3a. Grab a new slot.
+  3b. Save I's load/save instructions (L and S, respectively) in slot.
+  3c. Replace the L with a branch to slot.
+  3d. Replace S with a nop.
+
+  Approach A:
+
+  3e. Write phase 4 code in slot:
+      if(actually an instrumentation site)
+          rewrite branch at C to next instruction
+          call proper instrumentation function <- C branches to here
+          branch back to C
+      else
+          restore original instructions
+          branch back to C
+
+  Approach B:
+
+  3e. Write phase 4 code in slot:
+      if(actually an instrumentation site)
+ 	  Grow code at C to call proper instrumentation function,
+ 	    replacing branch placed there by 3c.
+ 	  branch back to C
+      else
+ 	restore original instructions
+ 	branch back to C
+
+  In phase 4: No special action needed.
+
+{{{ Notes on using the total-copy approach in the prototype implementation.
 
 Note that we will need to use the total-copy approach as a "fall-back" from the
 dummy function (or padded region approach) in the following cases:
@@ -954,7 +1007,6 @@
 general, less efficient implementation.  The most general, most efficient
 implementation may not be obtainable in the short term, but it's reasonable to
 try for.
-
 
 }}}
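
Illustration only, not part of the commit: step 1 of phase 2 in the outline
above calls for a starting-address-to-function-extent map that phase 3 later
consults to find the code region for F.  A minimal C++ sketch of such a map
follows, assuming half-open [start, end) extents and 64-bit addresses.  The
names FunctionExtent and ExtentMap are hypothetical and do not come from the
Reoptimizer sources.

```cpp
#include <cstdint>
#include <map>

// Hypothetical record: the address range occupied by one function.
struct FunctionExtent {
    std::uint64_t start; // address of the first instruction
    std::uint64_t end;   // one past the last byte (half-open range)
};

// Hypothetical map from function start address to its extent.
class ExtentMap {
public:
    // Record the extent of a function beginning at 'start'.
    void add(std::uint64_t start, std::uint64_t end) {
        FunctionExtent e;
        e.start = start;
        e.end = end;
        extents_[start] = e;
    }

    // Return the extent containing 'addr', or nullptr if no recorded
    // function covers that address.
    const FunctionExtent *find(std::uint64_t addr) const {
        // upper_bound yields the first entry whose start is > addr, so the
        // entry just before it is the only possible container of addr.
        std::map<std::uint64_t, FunctionExtent>::const_iterator it =
            extents_.upper_bound(addr);
        if (it == extents_.begin())
            return nullptr;
        --it;
        return addr < it->second.end ? &it->second : nullptr;
    }

private:
    std::map<std::uint64_t, FunctionExtent> extents_;
};
```

Under these assumptions, phase 3 would call find() with the address of F that
phase 2 passed along, and treat a null result as "not a known function".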