[llvm-commits] CVS: llvm/lib/Reoptimizer/Inst/design.txt
Joel Stanley
jstanley at cs.uiuc.edu
Thu Mar 20 08:42:01 PST 2003
Changes in directory llvm/lib/Reoptimizer/Inst:
design.txt updated: 1.2 -> 1.3
---
Log message:
---
Diffs of the changes:
Index: llvm/lib/Reoptimizer/Inst/design.txt
diff -u llvm/lib/Reoptimizer/Inst/design.txt:1.2 llvm/lib/Reoptimizer/Inst/design.txt:1.3
--- llvm/lib/Reoptimizer/Inst/design.txt:1.2 Mon Mar 17 18:49:31 2003
+++ llvm/lib/Reoptimizer/Inst/design.txt Thu Mar 20 08:49:06 2003
@@ -1,4 +1,4 @@
-{{{ OVERALL GOALS OF PHASE 2
+{{{ OVERALL GOALS OF PHASE 2 AND GENERAL STUFF
- identify all loads of global volatile variables and the
corresponding stores to temporaries
@@ -17,7 +17,6 @@
Optimization in the compiler is on. This means that we can't really
rely on a particular "signature" of generated assembly code.
-
}}}
{{{ Problems:
@@ -59,9 +58,89 @@
analysis than we want to do (and potentially, not even then!), so we don't
overwrite address arithmetic instructions, etc.
- }}}
+ }}}
+ {{{ Inlined functions (not answered)
+
+ How do we best deal with functions that are inlined by the black-box
+ compiler? In particular, the naive approach of recording (in the
+ global bookkeeping table or GBT) the names of instrumented functions
+ in phase 1 so that they can be looked up via the ELF symtable in
+ phase 2 doesn't work if the instrumented functions got inlined. We
+ could try a 2-step approach to finding the function bodies that
+ contain instrumentation points: 1) For each function name in the
+ GBT, look it up in the ELF symtab; if present, done, otherwise try
+ step 2. 2) Scan the entire program for load-volatile instructions,
+ obtain the address of those instructions, and then find out what
+ function body address interval contains that address. This approach
+ seems like a lot of work, but might just do it.
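Step 2 is an interval query. Once the address ranges of the function bodies are known (for example, from STT_FUNC entries in the ELF symtab), the function that contains a given load address can be found by binary search over the ranges sorted by start address. The sketch below assumes non-overlapping [start, end) ranges. `func_interval`, `sort_intervals` and `find_containing` are hypothetical names and are not part of any existing tool.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one function body: [start, end) in the image. */
typedef struct {
    const char *name;
    uintptr_t start; /* first byte of the function body */
    uintptr_t end;   /* one past the last byte */
} func_interval;

static int cmp_by_start(const void *a, const void *b) {
    const func_interval *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sort once so that every later lookup can use binary search. */
void sort_intervals(func_interval *fns, size_t n) {
    qsort(fns, n, sizeof *fns, cmp_by_start);
}

/* Return the function whose body contains addr (e.g. the address of a load
   of an instSite global), or NULL if none does. fns must be sorted. */
const func_interval *find_containing(const func_interval *fns, size_t n,
                                     uintptr_t addr) {
    size_t lo = 0, hi = n; /* count the intervals with start <= addr */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (fns[mid].start <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL; /* addr lies below every function */
    const func_interval *f = &fns[lo - 1]; /* last start <= addr */
    return addr < f->end ? f : NULL;
}
```

If a function was inlined everywhere, it has no range of its own. Its instrumentation loads are therefore attributed to the function it was inlined into, and step 2 depends on exactly that behavior.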
+
+ The previous approach, whether or not it is viable, assumes that we
+ can actually obtain the GBT contents. Our previous
+ plan for obtaining the GBT base address was to take its address in
+ the documentation function pre-compilation, then (post-compilation)
+ look up the documentation function (by name), parse the load
+ instruction (or however the GBT address was witnessed), and obtain
+ the GBT. However, this approach is flawed since the documentation
+ function might get removed (if it's dead) or inlined (if it's
+ called). Perhaps the address of the GBT should just be taken in a
+ non-dead way at the entry to main itself? A write of the GBT address
+ to a volatile global (yet another one!) should ensure that the copy
+ isn't removed.
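Here is a minimal C sketch of the benign, non-dead reference suggested above. Storing the GBT's address into a volatile global is an observable side effect, so the compiler must keep the store, and the code that performs the store keeps `the_gbt` referenced for the linker. The fields of `struct GBT`, along with `gbt_addr_sink` and `anchor_gbt`, are hypothetical placeholders for what phase 1 would emit.

```c
/* Hypothetical stand-in for the bookkeeping table emitted by phase 1. */
struct GBT {
    int num_sites;
    const char *func_names[2];
};

struct GBT the_gbt = {2, {"bar", "baz"}};

/* A volatile pointer (not a pointer to volatile data): every store to it
   is observable, so the store, and thus the reference, must be kept. */
void *volatile gbt_addr_sink;

/* Phase 1 would call this (or emit the store inline) at main's entry. */
void anchor_gbt(void) {
    gbt_addr_sink = &the_gbt;
}
```

The sink is declared `void *volatile`. Writing `volatile void *` instead would make the pointed-to data volatile rather than the store itself, and then the store could be removed.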
+
+ Another approach for finding the GBT base address (since we're
+ operating exclusively on ELF) is to simply look it up in the ELF
+ symtab. This will work because a statically-initialized struct can't
+ be eliminated if it is global (not declared static), since other
+ compilation units may refer to it directly using extern. However,
+ the linker itself may omit it from the final executable if there
+ are no references to it. Perhaps we can introduce a benign use of
+ the GBT (taking its address and storing the result into a global
+ volatile) simply to ensure that there is *a* reference to the
+ structure. E-mail Chris about this.
+
+ {{{ Response from Chris
+
+ > If I've got a global statically-initialized struct that isn't used
+ > anywhere within its compliation unit, any compiler wouldn't be able to
+ > remove it because some other compilation unit may refer to it in an extern
+ > manner, correct?
+
+ True, unless it's declared static.
+
+ > However, if the above is the case, *and* no other compilation unit refers
+ > to the struct in an extern manner, a clever linker would be able to delete
+ > the structure because nothing needed to bind to the symbol. Right?
+
+ Yes, or simple IPO.
+
+ > So...I need to introduce a global struct that can't be removed by the
+ > compiler or linker, without changing the semantics of the program. Then I
+ > need to read the contents of the struct directly from the ELF executable
+ > after looking up its name in the ELF symbol table. The way I'm currently
+ > planning on doing this is by inserting an un-removable & benign reference
+ > to the global struct in, say, main() so that a clever linker can't remove
+ > it. Does this sound lame? :)
+
+ That should work. Note that normal linkers won't delete these structure
+ references, so it may not even be a problem unless you're trying to be more
+ portable...
+
+ -Chris
+
+ }}}
+
+ KIS concession: Grab the base address of the GBT directly from the ELF symtab,
+ and worry about it getting deleted if/when that actually occurs.
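The KIS concession comes down to a symbol table lookup. The sketch below assumes a 64-bit ELF file and a Linux-style `<elf.h>`; a 32-bit target would use the `Elf32_` types instead. `elf_lookup_symbol` is a hypothetical helper and is not the API of the ELF library mentioned elsewhere in this document. It maps the file, finds the `SHT_SYMTAB` section and its linked string table, and returns the symbol's `st_value` and `st_size`. It fails on a stripped executable, because stripping removes `.symtab`.

```c
#include <elf.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Look up `name` in .symtab of the 64-bit ELF file at `path`. On success,
   store st_value/st_size and return 0; otherwise return -1. */
int elf_lookup_symbol(const char *path, const char *name,
                      uint64_t *value, uint64_t *size) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || (size_t)st.st_size < sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    const unsigned char *img = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (img == MAP_FAILED)
        return -1;

    int rc = -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 &&
        eh->e_shentsize == sizeof(Elf64_Shdr) && eh->e_shoff != 0 &&
        eh->e_shoff + (uint64_t)eh->e_shnum * sizeof(Elf64_Shdr) <= len) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(img + eh->e_shoff);
        for (size_t i = 0; i < eh->e_shnum && rc != 0; i++) {
            if (sh[i].sh_type != SHT_SYMTAB || sh[i].sh_link >= eh->e_shnum)
                continue;
            const Elf64_Shdr *str = &sh[sh[i].sh_link]; /* linked .strtab */
            if (sh[i].sh_offset + sh[i].sh_size > len ||
                str->sh_offset + str->sh_size > len)
                continue;
            const Elf64_Sym *syms = (const Elf64_Sym *)(img + sh[i].sh_offset);
            const char *names = (const char *)(img + str->sh_offset);
            size_t n = sh[i].sh_size / sizeof(Elf64_Sym);
            for (size_t j = 0; j < n; j++) {
                if (syms[j].st_name < str->sh_size &&
                    strcmp(names + syms[j].st_name, name) == 0) {
                    *value = syms[j].st_value;
                    *size = syms[j].st_size;
                    rc = 0;
                    break;
                }
            }
        }
    }
    munmap((void *)img, len);
    return rc;
}
```

For `the_gbt`, `st_value` is its link-time address and `st_size` is the size of the initialized structure. In a position-independent executable, the load bias must be added before the address can be dereferenced in the running process.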
-}}}
+ }}}
+ {{{ Violation of register schedules (issue?)
+
+ What about violation of register schedules when inserting new code?
+ Is this even an issue?
+
+ }}}
+
+}}}
{{{ Musings on trampolines:
- Assuming that we can leave the address calculation in place for the
@@ -149,8 +228,8 @@
instructions would be required to pack the register with the target
address.
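For illustration only, assume a SPARC target (this passage does not name the architecture). The shortest sequence that packs a register with a 32-bit target address and jumps through it is `sethi %hi(target), %g1` followed by `jmpl %g1 + %lo(target), %g0`, with a `nop` in the delay slot. The sketch encodes those three words. The helper names are hypothetical. The sequence clobbers `%g1`, so it is only safe where `%g1` is known to be dead.

```c
#include <stdint.h>

enum { REG_G0 = 0, REG_G1 = 1 };

/* sethi %hi(target), rd   (format 2: op=00, op2=100, imm22 = target >> 10) */
static uint32_t sparc_sethi(uint32_t target, uint32_t rd) {
    return (rd << 25) | (4u << 22) | (target >> 10);
}

/* jmpl rs1 + simm13, rd   (format 3: op=10, op3=111000, i=1) */
static uint32_t sparc_jmpl_imm(uint32_t rs1, uint32_t simm13, uint32_t rd) {
    return (2u << 30) | (rd << 25) | (0x38u << 19) | (rs1 << 14) |
           (1u << 13) | (simm13 & 0x1fffu);
}

#define SPARC_NOP 0x01000000u /* sethi 0, %g0 */

/* Emit a jump to a 32-bit absolute target into out[0..2]; returns 3. */
int emit_far_jump32(uint32_t target, uint32_t out[3]) {
    out[0] = sparc_sethi(target, REG_G1);                  /* %g1 = hi bits */
    out[1] = sparc_jmpl_imm(REG_G1, target & 0x3ffu, REG_G0); /* jump, no link */
    out[2] = SPARC_NOP;                                    /* delay slot */
    return 3;
}
```

A full 64-bit SPARC V9 address needs a longer sequence (several `sethi`/`or`/`sllx` instructions), and the trampoline grows to match.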
-}}}
-{{{ Trampoline-related ideas:
+
+Trampoline-related ideas:
(Thanks Brian!) :)
@@ -371,7 +450,44 @@
didn't use arbitrary instrumentation points, we can do *exactly* what
MDL can do (we think) without source access.
-Implementation sketch:
+}}}
+
+{{{ MEETING MINUTES 20 Mar 2003
+
+Agenda:
+ - Address pending issues already sent via e-mail.
+ - Confidence of approach, assurance of validity w.r.t. time commitment.
+ - Inlining of functions and how to handle it.
+
+ - Register schedule violation; or "how do we determine what registers should
+ hold values when we insert code?". Rather, should we simply adopt the policy
+ of 'always spill' or is doing otherwise an optimization that should be
+ considered later? In particular, if we always spill, AND can't remove the
+ address arithmetic instructions (for volatile temps) without more robust
+ analysis, at what point do we consider phase 2 "too expensive" when compared
+ with plain old opaque function calls at instrumentation points?
+
+ - From the e-mail(s):
+
+ (a) We have to balance the benefit of a vendor-independent implementation
+ vs. the opportunity to do something "more conceptually novel" with the
+ metrics.
+
+ (b) We can discuss instrumenting functions at function entry; of course,
+ this point is moot if we do not take the binary editing approach.
+
+ - What is the purpose of "exit stubs" in Trigger/TraceCache? What is the
+ role of the branch map and call map?
+
+ - As long as the new code fits within the 64KB segment, we have the
+ capability to add new code, right?
+
+Minutes:
+
+
+}}}
+
+{{{ IMPLEMENTATION SKETCH
At a high level, in broad sweeping strokes, we're going to use the
trace cache tool as a framework for runtime manipulation of the binary
@@ -469,30 +585,222 @@
{{{ MILESTONES
-- Extract and report bookkeeping data structure contents from raw
-compiled binary.
-
-- Determine if/how the tracecache framework can be used for a CFG
-subgraph "copy" to a new area of memory; determine whether or not it's
-worth the effort or whether it should be "done from scratch".
+- Perform the "tracecache experiment" described in the TODO section.
}}}
{{{ TODO
+- Answer the following questions about the tracecache:
+ {{{
+
+ - To what extent does it use the LLVM bytecode and/or mapping information
+ to map a particular path into the cache?
+
+ It appears that the code in the TraceCache object itself doesn't require
+ any of the LLVM mapping information. However, as inputs to addTrace(), it
+ does need a "call map", a "branch map", and a vector called
+ "exitStubs". I'm not clear what the exit stubs are for yet, exactly, nor
+ the precise role of the call/branch maps (although I think they are just
+ the redirected branch destinations or some such thing). The trigger
+ routine *does* use the LLVM mapping information to construct these maps,
+ so it may be difficult to determine how to form the maps without the
+ specific mapping information...but it might be possible.
+
+ - What kind of modifications would be needed to map an entire function body
+ into the tracecache region such that "hot paths" weren't considered and
+ path activity wasn't tracked? What kind of dependence does this induce on
+ the LLVM mapping and/or bytecode representation?
+
+ Good news: the "hot path" and LLVM-specific stuff seems confined to the
+ trigger routine. The TraceCache class itself seems to operate on raw
+ instruction ranges, etc. NB: No mmapping of the executable is performed
+ because all of the contextual information about a particular function is
+ obtained via the LLVM mapping information.
+
+ - Perform the following experiment to help answer these questions:
+
+ Use the tracecache/BinInterface/VirtualMem/etc. mechanisms as they
+ currently exist, together with the ELF library and phase 1, to do the
+ following:
+
+ Insert a call to our phase2 function in main; the phase2 function will
+ be responsible for doing all of the binary analysis and
+ transformations.
+
+ For the ELF mechanisms we need, determine whether (and how) the
+ tracecache currently mmaps the executable, and how to direct the ELF
+ library to use the executable image in memory instead of loading it
+ from disk.
+
+ Given the name of a function that exists in the ELF object file,
+ obtain its starting and ending address _in the address space of the
+ running application_.
+
+ ^^^ At this point, the application should be running and, at RUNTIME,
+ spit out (at the very least) the function boundary addresses;
+ preferably, it can spit out the BinInterface-obtained disassembly as
+ well so that we can compare it against the static disassembly.
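The symtab gives a function's link-time `st_value` and `st_size`. To turn those into an address range in the address space of the running application, add the executable's load bias. The bias is 0 for an executable that is not position-independent. On glibc-based systems it can be read with `dl_iterate_phdr`, which in practice visits the main program first. Here is a minimal sketch under those assumptions, with `runtime_bounds` as a hypothetical helper:

```c
#define _GNU_SOURCE /* for dl_iterate_phdr in <link.h> (glibc) */
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Callback: record the load bias of the first object visited (the main
   executable on glibc), then stop iterating. */
static int record_main_bias(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    *(uintptr_t *)data = (uintptr_t)info->dlpi_addr;
    return 1; /* a nonzero return value ends the iteration */
}

/* Convert a symtab entry (link-time st_value, st_size) into the [start, end)
   range the function occupies in this running process. */
void runtime_bounds(uint64_t st_value, uint64_t st_size,
                    uintptr_t *start, uintptr_t *end) {
    uintptr_t bias = 0; /* stays 0 for a non-PIE executable */
    dl_iterate_phdr(record_main_bias, &bias);
    *start = bias + (uintptr_t)st_value;
    *end = *start + (uintptr_t)st_size;
}
```

Passing in the `st_value`/`st_size` pair from a symtab lookup gives the function boundary addresses that this step is supposed to print.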
+
+ Copy this address region to the cache and reroute execution,
+ preferably modifying some code in the cache so that the rerouting is
+ observable when the program runs. [This step is really the key
+ investigatory point: do we need to access the LLVM-bytecode CFG to do
+ this? Does the copy mechanism only support copying a specified path
+ into the cache, or will it operate on an arbitrary CFG/CFG subgraph?]
+
+ }}}
+
- Read EEL paper to get a better feel for binary modification issues
-- Do sample by hand and revisit actions of both phases
-- Extract bookkeeping data structure contents, function stats/ends,
- etc, using low-level POSIX/ELF mechanisms.
}}}
-{{{ PENDING QUESTIONS
+{{{ BY-HAND EXAMPLE OF PHASE ACTIONS
-[What about violation of register schedules when inserting new code?
-Is this an issue?]
+ {{{ High-level code (i.e. no sigfuns):
-}}}
+pp_interval<bounded_series, elapsedTimeStart, elapsedTimeEnd, size=20> eth;
+
+void bar() {
+ int cnt = 0;
+
+ {
+ sample eth;
+ while(cnt++ != 15) {
+ foo();
+ printf(...);
+ }
+ }
+ ...
+ printf("avg reading was %f\n", pp_avg(eth));
+}
+
+ }}}
+ {{{ Sigfun-level code (input to phase 1)
+
+void main() {
+ pp_interval("eth", elapsedTimeStart, elapsedTimeEnd, "bounded_series", "size=20");
+}
+
+[[Processing the pp_interval call in main() results in the declaration:
+ double eth[20];
+which is used by name elsewhere...
+]]
+
+void bar() {
+ int cnt = 0;
+
+ {
+ pp_sigfun_interval_start("eth", elapsedTimeStart);
+
+ while(cnt++ != 15) {
+ foo();
+ printf(...);
+ }
+
+ pp_sigfun_interval_end("eth", elapsedTimeEnd);
+ }
+ ...
+ printf("avg reading was %f\n", pp_avg(eth));
+}
+
+ }}}
+ {{{ Post-phase1 code (quasi-high-level)
+
+struct GBT {
+ // fields for GBT go here...
+} the_gbt = { initializer };
+
+volatile global instSite1; // instSite1 = start of region
+volatile global instSite1_tmp;
+volatile global instSite2; // instSite2 = end of region
+volatile global instSite2_tmp;
+
+double eth[20];
+
+void bar() {
+ int cnt = 0;
+ double z; // <-- inserted for the ret val of end of region
+ // inst call (call inserted by phase 2)
+
+ {
+ instSite1_tmp = instSite1; // <-- record the address of this instSite1; a
+ // load of this address identifies this location
+ // in the code; the code:
+ // double y = elapsedTimeStart() is to be
+ // inserted here by phase 2 [replacing ld]
+ // Was: pp_sigfun_interval_start("eth", elapsedTimeStart);
+
+ while(cnt++ != 15) {
+ foo();
+ printf(...);
+ }
+
+ instSite2_tmp = instSite2; // <-- record the address of this instSite2; a
+ // load of this address identifies this location
+ // in this code; the code:
+ // z = elapsedTimeEnd(&y) is to be
+ // inserted by phase 2 [replacing ld]
+ // Was: pp_sigfun_interval_end("eth", elapsedTimeEnd);
+
+ pp_series_add(eth, z); // inserted by phase 1, uses z even though it
+ // hasn't been written to. z and eth both exist.
+ }
+ ...
+ printf("avg reading was %f\n", pp_avg(eth));
+}
+
+ }}}
+ {{{ Post-phase2 code (high level)
+
+struct GBT {
+ // fields for GBT go here...
+} the_gbt = { initializer };
+
+volatile global instSite1; // instSite1 = start of region
+volatile global instSite1_tmp;
+volatile global instSite2; // instSite2 = end of region
+volatile global instSite2_tmp;
+
+double eth[20];
+
+void bar() {
+ int cnt = 0;
+ double z;
+
+ {
+ double y = elapsedTimeStart(); // <-- y must be alloca'd, ugh.
+
+ while(cnt++ != 15) {
+ foo();
+ printf(...);
+ }
+
+ z = elapsedTimeEnd(&y);
+ pp_series_add(eth, z);
+ }
+ ...
+ printf("avg reading was %f\n", pp_avg(eth));
+}
+
+ }}}
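As a check on the intended semantics of the post-phase2 code, here is a compilable C version of it. Only the names `elapsedTimeStart`, `elapsedTimeEnd`, `pp_series_add`, `pp_avg` and `eth` come from the example. The following are assumptions: the function bodies, the `pp_series` struct (used in place of the bare `double eth[20]` so the sample count has a place to live), and the reading of `bounded_series` as "keep the most recent 20 samples".

```c
#include <time.h>

#define ETH_SIZE 20 /* "size=20" in the pp_interval declaration */

/* Assumed bounded series: a ring buffer of the most recent samples. */
typedef struct {
    double samples[ETH_SIZE];
    int count; /* valid samples, at most ETH_SIZE */
    int next;  /* slot that receives the next sample */
} pp_series;

/* Wall-clock seconds via C11 timespec_get; a monotonic clock would be a
   better choice in a real implementation. */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Start-of-region function: its result is kept in the temporary y. */
double elapsedTimeStart(void) { return now_seconds(); }

/* End-of-region function: receives &y and returns the elapsed interval. */
double elapsedTimeEnd(const double *start) { return now_seconds() - *start; }

void pp_series_add(pp_series *s, double v) {
    s->samples[s->next] = v;
    s->next = (s->next + 1) % ETH_SIZE;
    if (s->count < ETH_SIZE)
        s->count++;
}

/* Average of the valid samples; slots [0, count) are valid in both the
   filling and the full (wrapped) state. */
double pp_avg(const pp_series *s) {
    double sum = 0.0;
    for (int i = 0; i < s->count; i++)
        sum += s->samples[i];
    return s->count ? sum / s->count : 0.0;
}

pp_series eth; /* zero-initialized global, like double eth[20] */

/* The post-phase2 body of bar(), with foo() and printf() elided. */
void bar(void) {
    int cnt = 0;
    double z;
    {
        double y = elapsedTimeStart();
        while (cnt++ != 15) {
            /* foo(); printf(...); */
        }
        z = elapsedTimeEnd(&y);
        pp_series_add(&eth, z);
    }
}
```

At the source level `y` is an ordinary local. The "must be alloca'd" problem only appears in the binary, where phase 2 has to create that stack slot itself.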
+Aside from the obvious difficulties with phase 2 (finding the load locations,
+etc.), an additional difficulty exists: we must alloca the temporary for the
+return value of the first instrumentation function for a region. Originally, I
+thought this meant we'd need enough reserved space at the entry of the
+instrumented function to place 'n' alloca calls: one for each temporary.
+However, since our current approach (supposedly) allows essentially arbitrary
+code to be inserted into the tracecache region, I believe we no longer have
+this problem: invoke alloca immediately before the call. The only problem with
+this is finding available registers to use, something which I don't understand
+at all. Using a temporary that already exists at the end of phase 1 is also a
+possibility, but then we've got to place un-removable uses of all of those
+temporaries at the start of main (or something) so that they do not get
+eliminated. This shouldn't be a problem, though. The more general problem of
+"register schedule violation potential", however, may still be a problem:
+consider, for example, taking the address of the alloca'd temporary and
+passing it to the end-region instrumentation function.
+}}}