[llvm-commits] CVS: llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt 2000-11-18-EarlyDesignIdeasResp.txt 2000-12-06-EncodingIdea.txt 2000-12-06-MeetingSummary.txt 2001-01-31-UniversalIRIdea.txt 2001-02-06-TypeNotationDebate.txt 2001-02-06-TypeNotationDebateResp1.txt 2001-02-06-TypeNotationDebateResp2.txt 2001-02-06-TypeNotationDebateResp4.txt 2001-02-09-AdveComments.txt 2001-02-09-AdveCommentsResponse.txt 2001-02-13-Reference-Memory.txt 2001-02-13-Reference-MemoryResponse.txt 2001-04-16-DynamicCompilation.txt 2001-05-18-ExceptionHandling.txt 2001-05-19-ExceptionResponse.txt 2001-06-01-GCCOptimizations.txt 2001-06-01-GCCOptimizations2.txt 2001-06-20-.NET-Differences.txt 2001-07-06-LoweringIRForCodeGen.txt 2001-07-08-InstructionSelection.txt 2001-07-08-InstructionSelection2.txt 2001-09-18-OptimizeExceptions.txt 2002-05-12-InstListChange.txt 2002-06-25-MegaPatchInfo.txt 2003-01-23-CygwinNotes.txt 2003-06-25-Reoptimizer1.txt 2003-06-26-Reoptimizer2.txt
Tanya Lattner
tonic at nondot.org
Sun Nov 19 23:27:58 PST 2006
Changes in directory llvm-www/releases/1.9/docs/HistoricalNotes:
2000-11-18-EarlyDesignIdeas.txt added (r1.1)
2000-11-18-EarlyDesignIdeasResp.txt added (r1.1)
2000-12-06-EncodingIdea.txt added (r1.1)
2000-12-06-MeetingSummary.txt added (r1.1)
2001-01-31-UniversalIRIdea.txt added (r1.1)
2001-02-06-TypeNotationDebate.txt added (r1.1)
2001-02-06-TypeNotationDebateResp1.txt added (r1.1)
2001-02-06-TypeNotationDebateResp2.txt added (r1.1)
2001-02-06-TypeNotationDebateResp4.txt added (r1.1)
2001-02-09-AdveComments.txt added (r1.1)
2001-02-09-AdveCommentsResponse.txt added (r1.1)
2001-02-13-Reference-Memory.txt added (r1.1)
2001-02-13-Reference-MemoryResponse.txt added (r1.1)
2001-04-16-DynamicCompilation.txt added (r1.1)
2001-05-18-ExceptionHandling.txt added (r1.1)
2001-05-19-ExceptionResponse.txt added (r1.1)
2001-06-01-GCCOptimizations.txt added (r1.1)
2001-06-01-GCCOptimizations2.txt added (r1.1)
2001-06-20-.NET-Differences.txt added (r1.1)
2001-07-06-LoweringIRForCodeGen.txt added (r1.1)
2001-07-08-InstructionSelection.txt added (r1.1)
2001-07-08-InstructionSelection2.txt added (r1.1)
2001-09-18-OptimizeExceptions.txt added (r1.1)
2002-05-12-InstListChange.txt added (r1.1)
2002-06-25-MegaPatchInfo.txt added (r1.1)
2003-01-23-CygwinNotes.txt added (r1.1)
2003-06-25-Reoptimizer1.txt added (r1.1)
2003-06-26-Reoptimizer2.txt added (r1.1)
---
Log message:
1.9 docs
---
Diffs of the changes: (+2185 -0)
2000-11-18-EarlyDesignIdeas.txt | 74 +++++++++
2000-11-18-EarlyDesignIdeasResp.txt | 199 +++++++++++++++++++++++++
2000-12-06-EncodingIdea.txt | 30 +++
2000-12-06-MeetingSummary.txt | 83 ++++++++++
2001-01-31-UniversalIRIdea.txt | 39 +++++
2001-02-06-TypeNotationDebate.txt | 67 ++++++++
2001-02-06-TypeNotationDebateResp1.txt | 75 +++++++++
2001-02-06-TypeNotationDebateResp2.txt | 53 ++++++
2001-02-06-TypeNotationDebateResp4.txt | 89 +++++++++++
2001-02-09-AdveComments.txt | 120 +++++++++++++++
2001-02-09-AdveCommentsResponse.txt | 245 ++++++++++++++++++++++++++++++++
2001-02-13-Reference-Memory.txt | 39 +++++
2001-02-13-Reference-MemoryResponse.txt | 47 ++++++
2001-04-16-DynamicCompilation.txt | 49 ++++++
2001-05-18-ExceptionHandling.txt | 202 ++++++++++++++++++++++++++
2001-05-19-ExceptionResponse.txt | 45 +++++
2001-06-01-GCCOptimizations.txt | 63 ++++++++
2001-06-01-GCCOptimizations2.txt | 71 +++++++++
2001-06-20-.NET-Differences.txt | 30 +++
2001-07-06-LoweringIRForCodeGen.txt | 31 ++++
2001-07-08-InstructionSelection.txt | 51 ++++++
2001-07-08-InstructionSelection2.txt | 25 +++
2001-09-18-OptimizeExceptions.txt | 56 +++++++
2002-05-12-InstListChange.txt | 55 +++++++
2002-06-25-MegaPatchInfo.txt | 72 +++++++++
2003-01-23-CygwinNotes.txt | 28 +++
2003-06-25-Reoptimizer1.txt | 137 +++++++++++++++++
2003-06-26-Reoptimizer2.txt | 110 ++++++++++++++
28 files changed, 2185 insertions(+)
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt:1.1
*** /dev/null Mon Nov 20 01:27:55 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,74 ----
+ Date: Sat, 18 Nov 2000 09:19:35 -0600 (CST)
+ From: Vikram Adve <vadve at cs.uiuc.edu>
+ To: Chris Lattner <lattner at cs.uiuc.edu>
+ Subject: a few thoughts
+
+ I've been mulling over the virtual machine problem and I had some
+ thoughts about some things for us to think about and discuss:
+
+ 1. We need to be clear on our goals for the VM. Do we want to emphasize
+ portability and safety like the Java VM? Or shall we focus on the
+ architecture interface first (i.e., consider the code generation and
+ processor issues), since the architecture interface question is also
+ important for portable Java-type VMs?
+
+ This is important because the audiences for these two goals are very
+ different. Architects and many compiler people care much more about
+ the second question. The Java compiler and OS community care much more
+ about the first one.
+
+ Also, while the architecture interface question is important for
+ Java-type VMs, the design constraints are very different.
+
+
+ 2. Design issues to consider (an initial list that we should continue
+ to modify). Note that I'm not trying to suggest actual solutions here,
+ but just various directions we can pursue:
+
+ a. A single-assignment VM, which we've both already been thinking about.
+
+ b. A strongly-typed VM. One question is do we need the types to be
+ explicitly declared or should they be inferred by the dynamic compiler?
+
+ c. How do we get more high-level information into the VM while keeping
+ to a low-level VM design?
+
+ o Explicit array references as operands? An alternative is
+ to have just an array type, and let the index computations be
+ separate 3-operand instructions.
+
+ o Explicit instructions to handle aliasing, e.g.:
+ -- an instruction to say "I speculate that these two values are not
+ aliased, but check at runtime", like speculative execution in
+ EPIC?
+ -- or an instruction to check whether two values are aliased and
+ execute different code depending on the answer, somewhat like
+ predicated code in EPIC
+
+ o (This one is a difficult but powerful idea.)
+ A "thread-id" field on every instruction that allows the static
+ compiler to generate a set of parallel threads, and then have
+ the runtime compiler and hardware do what they please with it.
+ This has very powerful uses, but thread-id on every instruction
+ is expensive in terms of instruction size and code size.
+ We would need to compactly encode it somehow.
+
+ Also, this will require some reading on at least two other
+ projects:
+ -- Multiscalar architecture from Wisconsin
+ -- Simultaneous multithreading architecture from Washington
+
+ o Or forget all this and stick to a traditional instruction set?
+
+
+ BTW, on an unrelated note, after the meeting yesterday, I did remember
+ that you had suggested doing instruction scheduling on SSA form instead
+ of a dependence DAG earlier in the semester. When we talked about
+ it yesterday, I didn't remember where the idea had come from but I
+ remembered later. Just giving credit where it's due...
+
+ Perhaps you can save the above as a file under RCS so you and I can
+ continue to expand on this.
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt:1.1
*** /dev/null Mon Nov 20 01:27:57 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,199 ----
+ Date: Sun, 19 Nov 2000 16:23:57 -0600 (CST)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram Adve <vadve at cs.uiuc.edu>
+ Subject: Re: a few thoughts
+
+ Okay... here are a few of my thoughts on this (it's good to know that we
+ think so alike!):
+
+ > 1. We need to be clear on our goals for the VM. Do we want to emphasize
+ > portability and safety like the Java VM? Or shall we focus on the
+ > architecture interface first (i.e., consider the code generation and
+ > processor issues), since the architecture interface question is also
+ > important for portable Java-type VMs?
+
+ I foresee the architecture looking kinda like this: (which is completely
+ subject to change)
+
+ 1. The VM code is NOT guaranteed safe in a java sense. Doing so makes it
+ basically impossible to support C like languages. Besides that,
+ certifying a register based language as safe at run time would be a
+ pretty expensive operation to have to do. Additionally, we would like
+ to be able to statically eliminate many bounds checks in Java
+ programs... for example.
+
+ 2. Instead, we can do the following (eventually):
+ * Java bytecode is used as our "safe" representation (to avoid
+ reinventing something that we don't add much value to). When the
+ user chooses to execute Java bytecodes directly (ie, not
+ precompiled) the runtime compiler can do some very simple
+ transformations (JIT style) to convert it into valid input for our
+ VM. Performance is not wonderful, but it works right.
+ * The file is scheduled to be compiled (rigorously) at a later
+ time. This could be done by some background process or by a second
+ processor in the system during idle time or something...
+ * To keep things "safe" ie to enforce a sandbox on Java/foreign code,
+ we could sign the generated VM code with a host specific private
+ key. Then before the code is executed/loaded, we can check to see if
+ the trusted compiler generated the code. This would be much quicker
+ than having to validate consistency (especially if bounds checks have
+ been removed, for example)
+
+ > This is important because the audiences for these two goals are very
+ > different. Architects and many compiler people care much more about
+ > the second question. The Java compiler and OS community care much more
+ > about the first one.
+
+ 3. By focusing on a more low level virtual machine, we have much more room
+ for value add. The nice safe "sandbox" VM can be provided as a layer
+ on top of it. It also lets us focus on the more interesting compilers
+ related projects.
+
+ > 2. Design issues to consider (an initial list that we should continue
+ > to modify). Note that I'm not trying to suggest actual solutions here,
+ > but just various directions we can pursue:
+
+ Understood. :)
+
+ > a. A single-assignment VM, which we've both already been thinking
+ > about.
+
+ Yup, I think that this makes a lot of sense. I am still intrigued,
+ however, by the prospect of a minimally allocated VM representation... I
+ think that it could have definite advantages for certain applications
+ (think very small machines, like PDAs). I don't, however, think that our
+ initial implementations should focus on this. :)
+
+ Here are some other auxiliary goals that I think we should consider:
+
+ 1. Primary goal: Support a high performance dynamic compilation
+ system. This means that we have an "ideal" division of labor between
+ the runtime and static compilers. Of course, the other goals of the
+ system somewhat reduce the importance of this point (f.e. portability
+ reduces performance, but hopefully not much)
+ 2. Portability to different processors. Since we are most familiar with
+ x86 and solaris, I think that these two are excellent candidates when
+ we get that far...
+ 3. Support for all languages & styles of programming (general purpose
+ VM). This is the point that disallows Java-style bytecodes, where all
+ array refs are checked for bounds, etc...
+ 4. Support linking between different language families. For example, call
+ C functions directly from Java without using the nasty/slow/gross JNI
+ layer. This involves several subpoints:
+ A. Support for languages that require garbage collectors and integration
+ with languages that don't. As a base point, we could insist on
+ always using a conservative GC, but implement free as a noop, f.e.
+
+ > b. A strongly-typed VM. One question is do we need the types to be
+ > explicitly declared or should they be inferred by the dynamic
+ > compiler?
+
+ B. This is kind of similar to another idea that I have: make OOP
+ constructs (virtual function tables, class hierarchies, etc) explicit
+ in the VM representation. I believe that the number of additional
+ constructs would be fairly low, but would give us lots of important
+ information... something else that would/could be important is to
+ have exceptions as first class types so that they would be handled in
+ a uniform way for the entire VM... so that C functions can call Java
+ functions for example...
+
+ > c. How do we get more high-level information into the VM while keeping
+ > to a low-level VM design?
+ > o Explicit array references as operands? An alternative is
+ > to have just an array type, and let the index computations be
+ > separate 3-operand instructions.
+
+ C. In the model I was thinking of (subject to change of course), we
+ would just have an array type (distinct from the pointer
+ types). This would allow us to have arbitrarily complex index
+ expressions, while still distinguishing "load" from "Array load",
+ for example. Perhaps also, switch jump tables would be first class
+ types as well? This would allow better reasoning about the program.
+
+ 5. Support dynamic loading of code from various sources. Already
+ mentioned above was the example of loading java bytecodes, but we want
+ to support dynamic loading of VM code as well. This makes the job of
+ the runtime compiler much more interesting: it can do interprocedural
+ optimizations that the static compiler can't do, because it doesn't
+ have all of the required information (for example, inlining from
+ shared libraries, etc...)
+
+ 6. Define a set of generally useful annotations to add to the VM
+ representation. For example, a function can be analysed to see if it
+ has any side effects when run... also, the MOD/REF sets could be
+ calculated, etc... we would have to determine what is reasonable. This
+ would generally be used to make IP optimizations cheaper for the
+ runtime compiler...
+
+ > o Explicit instructions to handle aliasing, e.g.s:
+ > -- an instruction to say "I speculate that these two values are not
+ > aliased, but check at runtime", like speculative execution in
+ > EPIC?
+ > -- or an instruction to check whether two values are aliased and
+ > execute different code depending on the answer, somewhat like
+ > predicated code in EPIC
+
+ These are also very good points... if this can be determined at compile
+ time. I think that an EPIC style of representation (not the instruction
+ packing, just the information presented) could be a very interesting model
+ to use... more later...
+
+ > o (This one is a difficult but powerful idea.)
+ > A "thread-id" field on every instruction that allows the static
+ > compiler to generate a set of parallel threads, and then have
+ > the runtime compiler and hardware do what they please with it.
+ > This has very powerful uses, but thread-id on every instruction
+ > is expensive in terms of instruction size and code size.
+ > We would need to compactly encode it somehow.
+
+ Yes yes yes! :) I think it would be *VERY* useful to include this kind
+ of information (which EPIC architectures *implicitly* encode). The trend
+ that we are seeing supports this greatly:
+
+ 1. Commodity processors are getting massive SIMD support:
+ * Intel/AMD MMX/MMX2
+ * AMD's 3Dnow!
+ * Intel's SSE/SSE2
+ * Sun's VIS
+ 2. SMP is becoming much more common, especially in the server space.
+ 3. Multiple processors on a die are right around the corner.
+
+ If nothing else, not designing this in would severely limit our future
+ expansion of the project...
+
+ > Also, this will require some reading on at least two other
+ > projects:
+ > -- Multiscalar architecture from Wisconsin
+ > -- Simultaneous multithreading architecture from Washington
+ >
+ > o Or forget all this and stick to a traditional instruction set?
+
+ Heh... :) Well, from a pure research point of view, it is almost more
+ attractive to go with the most extreme/different ISA possible. On one axis
+ you get safety and conservatism, and on the other you get degree of
+ influence that the results have. Of course the problem with pure research
+ is that oftentimes there is no concrete product of the research... :)
+
+ > BTW, on an unrelated note, after the meeting yesterday, I did remember
+ > that you had suggested doing instruction scheduling on SSA form instead
+ > of a dependence DAG earlier in the semester. When we talked about
+ > it yesterday, I didn't remember where the idea had come from but I
+ > remembered later. Just giving credit where its due...
+
+ :) Thanks.
+
+ > Perhaps you can save the above as a file under RCS so you and I can
+ > continue to expand on this.
+
+ I think it makes sense to do so when we get our ideas more formalized and
+ bounce it back and forth a couple of times... then I'll do a more formal
+ writeup of our goals and ideas. Obviously our first implementation will
+ not want to do all of the stuff that I pointed out above... but we will
+ want to design the project so that we do not artificially limit ourselves
+ at some time in the future...
+
+ Anyways, let me know what you think about these ideas... and if they sound
+ reasonable...
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt:1.1
*** /dev/null Mon Nov 20 01:27:57 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,30 ----
+ From: Chris Lattner [mailto:sabre at nondot.org]
+ Sent: Wednesday, December 06, 2000 6:41 PM
+ To: Vikram S. Adve
+ Subject: Additional idea with respect to encoding
+
+ Here's another idea with respect to keeping the common case instruction
+ size down (less than 32 bits ideally):
+
+ Instead of encoding an instruction to operate on two register numbers,
+ have it operate on two negative offsets based on the current register
+ number. Therefore, instead of using:
+
+ r57 = add r55, r56 (r57 is the implicit dest register, of course)
+
+ We could use:
+
+ r57 = add -2, -1
+
+ My guess is that most SSA references are to recent values (especially if
+ they correspond to expressions like (x+y*z+p*q/ ...)), so the negative
+ numbers would tend to stay small, even at the end of the procedure (where
+ the implicit register destination number could be quite large). Of course
+ the negative sign is redundant, so you would be storing small integers
+ almost all of the time, and 5-6 bits worth of register number would be
+ plenty for most cases...
+
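+ (Illustrative sketch, not part of the original email: the relative
+ encoding in C++; all names here are hypothetical.)
+
+   #include <cassert>
+
+   // Store the distance back from the defining register rather than the
+   // absolute register number; references to recent values stay small.
+   unsigned encodeOperand(unsigned DefReg, unsigned UseReg) {
+     assert(UseReg < DefReg && "operand must be defined earlier");
+     return DefReg - UseReg; // r57 = add r55, r56 encodes offsets 2 and 1
+   }
+
+   unsigned decodeOperand(unsigned DefReg, unsigned Offset) {
+     return DefReg - Offset; // invert the encoding
+   }
+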
+ What do you think?
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt:1.1
*** /dev/null Mon Nov 20 01:27:57 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,83 ----
+ SUMMARY
+ -------
+
+ We met to discuss the LLVM instruction format and bytecode representation:
+
+ ISSUES RESOLVED
+ ---------------
+
+ 1. We decided that we shall use a flat namespace to represent our
+ variables in SSA form, as opposed to having a two dimensional namespace
+ of the original variable and the SSA instance subscript.
+
+ ARGUMENT AGAINST:
+ * A two dimensional namespace would be valuable when doing alias
+ analysis because the extra information can help limit the scope of
+ analysis.
+
+ ARGUMENT FOR:
+ * Including this information would require all users of the LLVM
+ bytecode to parse and handle it. This would slow down the
+ common case and inflate the instruction representation with another
+ infinite variable space.
+
+ REASONING:
+ * It was decided that because original variable sources could be
+ reconstructed from SSA form in linear time, it would be an
+ unjustified expense for the common case to include the extra
+ information for one optimization. Alias analysis itself is typically
+ greater than linear in asymptotic complexity, so this extra analysis
+ would not affect the runtime of the optimization in a significant
+ way. Additionally, this would be an unlikely optimization to do at
+ runtime.
+
+
+ IDEAS TO CONSIDER
+ -----------------
+
+ 1. Including dominator information in the LLVM bytecode
+ representation. This is one example of an analysis result that may be
+ packaged with the bytecodes themselves. As a conceptual implementation
+ idea, we could include an immediate dominator number for each basic block
+ in the LLVM bytecode program. Basic blocks could be numbered according
+ to the order of occurrence in the bytecode representation (a rough
+ sketch of one possible record appears after idea 2 below).
+
+ 2. Including loop header and body information. This would facilitate
+ detection of intervals and natural loops.
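+
+ (Illustrative sketch for idea 1, not part of the original notes: one
+ possible per-block record; the names are hypothetical.)
+
+   #include <cstdint>
+
+   // One entry per basic block, in order of occurrence in the bytecode.
+   struct BlockRecord {
+     uint32_t BlockNum; // position of this block in the bytecode
+     uint32_t IDomNum;  // BlockNum of this block's immediate dominator
+   };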
+
+ UNRESOLVED ISSUES
+ -----------------
+
+ 1. Will oSUIF provide enough of an infrastructure to support the research
+ that we will be doing? We know that it has less than stellar
+ performance, but hope that this will be of little importance for our
+ static compiler. This could affect us if we decided to do some IP
+ research. Also we do not yet understand the level of exception support
+ currently implemented.
+
+ 2. Should we consider the requirements of a direct hardware implementation
+ of the LLVM when we design it? If so, several design issues should
+ have their priorities shifted. The other option is to focus on a
+ software layer interpreting the LLVM in all cases.
+
+ 3. Should we use some form of packetized format to improve forward
+ compatibility? For example, we could design the system to encode a
+ packet type and length field before analysis information, to allow a
+ runtime to skip information that it didn't understand in a bytecode
+ stream. The obvious benefit would be for compatibility; the drawback
+ is that it would tend to splinter the 'standard' LLVM definition.
+
+ 4. Should we use fixed length instructions or variable length
+ instructions? Fetching variable length instructions is expensive (for
+ either hardware or software based LLVM runtimes), but we have several
+ 'infinite' spaces that instructions operate in (SSA register numbers,
+ type spaces, or packet length [if packets were implemented]). Several
+ options were mentioned, including:
+ A. Using 16 or 32 bit numbers, which would be 'big enough'
+ B. A scheme similar to how UTF-8 works, to encode infinite numbers
+ while keeping small numbers small (see the sketch after this list).
+ C. Use something similar to Huffman encoding, so that the most common
+ numbers are the smallest.
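+
+ (Illustrative sketch of option B, not part of the original notes: a
+ UTF-8-like continuation-bit encoder in C++; the names are hypothetical.)
+
+   #include <cstdint>
+   #include <vector>
+
+   // Emit Val seven bits at a time, low bits first; the high bit of each
+   // byte says whether another byte follows, so small numbers stay small.
+   void emitVBR(uint64_t Val, std::vector<uint8_t> &Out) {
+     do {
+       uint8_t Byte = Val & 0x7F;
+       Val >>= 7;
+       if (Val) Byte |= 0x80; // continuation bit
+       Out.push_back(Byte);
+     } while (Val);
+   }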
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,39 ----
+ Date: Wed, 31 Jan 2001 12:04:33 -0600
+ From: Vikram S. Adve <vadve at cs.uiuc.edu>
+ To: Chris Lattner <lattner at cs.uiuc.edu>
+ Subject: another thought
+
+ I have a budding idea about making LLVM a little more ambitious: a
+ customizable runtime system that can be used to implement language-specific
+ virtual machines for many different languages. E.g., a C vm, a C++ vm, a
+ Java vm, a Lisp vm, ..
+
+ The idea would be that LLVM would provide a standard set of runtime features
+ (some low-level like standard assembly instructions with code generation and
+ static and runtime optimization; some higher-level like type-safety and
+ perhaps a garbage collection library). Each language vm would select the
+ runtime features needed for that language, extending or customizing them as
+ needed. Most of the machine-dependent code-generation and optimization
+ features as well as low-level machine-independent optimizations (like PRE)
+ could be provided by LLVM and should be sufficient for any language,
+ simplifying the language compiler. (This would also help interoperability
+ between languages.) Also, some or most of the higher-level
+ machine-independent features like type-safety and access safety should be
+ reusable by different languages, with minor extensions. The language
+ compiler could then focus on language-specific analyses and optimizations.
+
+ The risk is that this sounds like a universal IR -- something that the
+ compiler community has tried and failed to develop for decades, and is
+ universally skeptical about. No matter what we say, we won't be able to
+ convince anyone that we have a universal IR that will work. We need to
+ think about whether LLVM is different or if it has something novel that might
+ convince people. E.g., the idea of providing a package of separable
+ features that different languages select from. Also, using SSA with or
+ without type-safety as the intermediate representation.
+
+ One interesting starting point would be to discuss how a JVM would be
+ implemented on top of LLVM a bit more. That might give us clues on how to
+ structure LLVM to support one or more language VMs.
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,67 ----
+ Date: Tue, 6 Feb 2001 20:27:37 -0600 (CST)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: Type notation debate...
+
+ This is the way that I am currently planning on implementing types:
+
+ Primitive Types:
+ type ::= void|bool|sbyte|ubyte|short|ushort|int|uint|long|ulong
+
+ Method:
+ typelist ::= typelisth | /*empty*/
+ typelisth ::= type | typelisth ',' type
+ type ::= type '(' typelist ')'
+
+ Arrays (without and with size):
+ type ::= '[' type ']' | '[' INT ',' type ']'
+
+ Pointer:
+ type ::= type '*'
+
+ Structure:
+ type ::= '{' typelist '}'
+
+ Packed:
+ type ::= '<' INT ',' type '>'
+
+ Simple examples:
+
+ [[ %4, int ]] - array of (array of 4 (int))
+ [ { int, int } ] - Array of structure
+ [ < %4, int > ] - Array of 128 bit SIMD packets
+ int (int, [[ %4, int ]]) - Method taking a 2d array and int, returning int
+
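+ (Illustrative sketch, not part of the original email: a minimal C++
+ reader for just the sized-array/pointer/primitive subset of the grammar
+ above, using the grammar's plain-INT dimension form; all names are
+ hypothetical.)
+
+   #include <cctype>
+   #include <string>
+
+   struct TypeReader {
+     std::string S; size_t I = 0;
+     void skip() { while (I < S.size() && isspace((unsigned char)S[I])) ++I; }
+     // type ::= primitive | '[' INT ',' type ']' | type '*'
+     std::string parseType() {
+       skip();
+       std::string T;
+       if (I < S.size() && S[I] == '[') { // sized array
+         ++I; skip();
+         std::string N;
+         while (I < S.size() && isdigit((unsigned char)S[I])) N += S[I++];
+         skip(); ++I; // consume ','
+         T = "array of " + N + " " + parseType();
+         skip(); ++I; // consume ']'
+       } else { // primitive name
+         while (I < S.size() && isalpha((unsigned char)S[I])) T += S[I++];
+       }
+       skip();
+       while (I < S.size() && S[I] == '*') { ++I; T = "pointer to " + T; skip(); }
+       return T;
+     }
+   };
+
+   // TypeReader{"[10, [4, int]]*"}.parseType()
+   //   yields "pointer to array of 10 array of 4 int"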
+
+ Okay before you comment, please look at:
+
+ http://www.research.att.com/~bs/devXinterview.html
+
+ Search for "In another interview, you defined the C declarator syntax as
+ an experiment that failed. However, this syntactic construct has been
+ around for 27 years and perhaps more; why do you consider it problematic
+ (except for its cumbersome syntax)?" and read that response for me. :)
+
+ Now with this syntax, his example would be represented as:
+
+ [ %10, bool (int, int) * ] *
+
+ vs
+
+ bool (*(*)[10])(int, int)
+
+ in C.
+
+ Basically, my argument for this type construction system is that it is
+ VERY simple to use and understand (although it IS different than C, it is
+ very simple and straightforward, which C is NOT). In fact, I would assert
+ that most programmers TODAY do not understand pointers to member
+ functions, and have to look up an example when they have to write them.
+
+ In my opinion, it is critically important to have clear and concise type
+ specifications, because types are going to be all over the programs.
+
+ Let me know your thoughts on this. :)
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,75 ----
+ Date: Thu, 8 Feb 2001 08:42:04 -0600
+ From: Vikram S. Adve <vadve at cs.uiuc.edu>
+ To: Chris Lattner <sabre at nondot.org>
+ Subject: RE: Type notation debate...
+
+ Chris,
+
+ > Okay before you comment, please look at:
+ >
+ > http://www.research.att.com/~bs/devXinterview.html
+
+ I read this argument. Even before that, I was already in agreement with you
+ and him that the C declarator syntax is difficult and confusing.
+
+ But in fact, if you read the entire answer carefully, he came to the same
+ conclusion I do: that you have to go with familiar syntax over logical
+ syntax because familiarity is such a strong force:
+
+ "However, familiarity is a strong force. To compare, in English, we live
+ more or less happily with the absurd rules for "to be" (am, are, is, been,
+ was, were, ...) and all attempts to simplify are treated with contempt or
+ (preferably) humor. It be a curious world and it always beed."
+
+ > Basically, my argument for this type construction system is that it is
+ > VERY simple to use and understand (although it IS different than C, it is
+ > very simple and straightforward, which C is NOT). In fact, I would assert
+ > that most programmers TODAY do not understand pointers to member
+ > functions, and have to look up an example when they have to write them.
+
+ Again, I don't disagree with this at all. But to some extent this
+ particular problem is inherently difficult. Your syntax for the above
+ example may be easier for you to read because this is the way you have been
+ thinking about it. Honestly, I don't find it much easier than the C syntax.
+ In either case, I would have to look up an example to write pointers to
+ member functions.
+
+ But pointers to member functions are nowhere near as common as arrays. And
+ the old array syntax:
+ type [ int, int, ...]
+ is just much more familiar and clear to people than anything new you
+ introduce, no matter how logical it is. Introducing a new syntax that may
+ make function pointers easier but makes arrays much more difficult seems
+ very risky to me.
+
+ > In my opinion, it is critically important to have clear and concise type
+ > specifications, because types are going to be all over the programs.
+
+ I absolutely agree. But the question is, what is more clear and concise?
+ The syntax programmers are used to out of years of experience or a new
+ syntax that they have never seen that has a more logical structure. I think
+ the answer is the former. Sometimes, you have to give up a better idea
+ because you can't overcome sociological barriers to it. Qwerty keyboards
+ and Windows are two classic examples of bad technology that are difficult to
+ root out.
+
+ P.S. Also, while I agree that most of your syntax is more logical, there is
+ one part that isn't:
+
+ Arrays (without and with size):
+ type ::= '[' type ']' | '[' INT ',' type ']'.
+
+ The array-with-size form lists the dimensions and the type in a single list.
+ That is just too confusing:
+ [10, 40, int]
+ This seems to be a 3-D array where the third dimension is something strange.
+ It is too confusing to have a list of 3 things, some of which are dimensions
+ and one is a type. Either of the following would be better:
+
+ array [10, 40] of int
+ or
+ int [10, 40]
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,53 ----
+ Date: Thu, 8 Feb 2001 14:31:05 -0600 (CST)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: RE: Type notation debate...
+
+ > Arrays (without and with size):
+ > type ::= '[' type ']' | '[' INT ',' type ']'.
+ >
+ > The array-with-size form lists the dimensions and the type in a single list.
+ > That is just too confusing:
+
+ > [10, 40, int]
+ > This seems to be a 3-D array where the third dimension is something strange.
+ > It is too confusing to have a list of 3 things, some of which are dimensions
+ > and one is a type.
+
+ The above grammar indicates that there is only one integer parameter, ie
+ the upper bound. The lower bound is always implied to be zero, for
+ several reasons:
+
+ * As a low level VM, we want to expose addressing computations
+ explicitly. Since the lower bound must always be known in a high level
+ language statically, the language front end can do the translation
+ automatically.
+ * This fits more closely with what Java needs, ie what we need in the
+ short term. Java arrays are always zero based.
+
+ If a two element list is too confusing, I would recommend an alternate
+ syntax of:
+
+ type ::= '[' type ']' | '[' INT 'x' type ']'.
+
+ For example:
+ [12 x int]
+ [12x int]
+ [ 12 x [ 4x int ]]
+
+ Which is syntactically nicer, and more explicit.
+
+ > Either of the following would be better:
+ > array [10, 40] of int
+
+ I considered this approach for arrays in general (ie array of int/ array
+ of 12 int), but found that it made declarations WAY too long. Remember
+ that because of the nature of LLVM, you get a lot of types strewn all over
+ the program, and using the 'typedef' like facility is not a wonderful
+ option, because then types aren't explicit anymore.
+
+ I find this email interesting, because you contradict the previous email
+ you sent, where you recommend that we stick to C syntax....
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,89 ----
+ > But in fact, if you read the entire answer carefully, he came to the same
+ > conclusion I do: that you have to go with familiar syntax over logical
+ > syntax because familiarity is such a strong force:
+ > "However, familiarity is a strong force. To compare, in English, we live
+ > more or less happily with the absurd rules for "to be" (am, are, is, been,
+ > was, were, ...) and all attempts to simplify are treated with contempt or
+ > (preferably) humor. It be a curious world and it always beed."
+
+ Although you have to remember that his situation was considerably
+ different than ours. He was in a position where he was designing a high
+ level language that had to be COMPATIBLE with C. Our language is such
+ that a new person would have to learn the new, different, syntax
+ anyways. Making them learn about the type system does not seem like much
+ of a stretch from learning the opcodes and how SSA form works, and how
+ everything ties together...
+
+ > > Basically, my argument for this type construction system is that it is
+ > > VERY simple to use and understand (although it IS different than C, it is
+ > > very simple and straightforward, which C is NOT). In fact, I would assert
+ > > that most programmers TODAY do not understand pointers to member
+ > > functions, and have to look up an example when they have to write them.
+
+ > Again, I don't disagree with this at all. But to some extent this
+ > particular problem is inherently difficult. Your syntax for the above
+ > example may be easier for you to read because this is the way you have been
+ > thinking about it. Honestly, I don't find it much easier than the C syntax.
+ > In either case, I would have to look up an example to write pointers to
+ > member functions.
+
+ I would argue that because the lexical structure of the language is self
+ consistent, any person who spent a significant amount of time programming
+ in LLVM directly would understand how to do it without looking it up in a
+ manual. The reason this does not work for C is because you rarely have to
+ declare these pointers, and the syntax is inconsistent with the method
+ declaration and calling syntax.
+
+ > But pointers to member functions are nowhere near as common as arrays.
+
+ Very true. If you're implementing an object oriented language, however,
+ remember that you have to do all the pointer to member function stuff
+ yourself.... so every time you invoke a virtual method one is involved
+ (instead of having C++ hide it for you behind "syntactic sugar").
+
+ > And the old array syntax:
+ > type [ int, int, ...]
+ > is just much more familiar and clear to people than anything new you
+ > introduce, no matter how logical it is.
+
+ Erm... excuse me but how is this the "old array syntax"? If you are
+ arguing for consistency with C, you should be asking for 'type int []',
+ which is significantly different than the above (besides, the above
+ introduces a new operator and duplicates information
+ needlessly). Basically what I am suggesting is exactly the above without
+ the fluff. So instead of:
+
+ type [ int, int, ...]
+
+ you use:
+
+ type [ int ]
+
+ > Introducing a new syntax that may
+ > make function pointers easier but makes arrays much more difficult seems
+ > very risky to me.
+
+ This is not about function pointers. This is about consistency in the
+ type system, and consistency with the rest of the language. The point
+ above does not make arrays any more difficult to use, and makes the
+ structure of types much more obvious than the "c way".
+
+ > > In my opinion, it is critically important to have clear and concise type
+ > > specifications, because types are going to be all over the programs.
+ >
+ > I absolutely agree. But the question is, what is more clear and concise?
+ > The syntax programmers are used to out of years of experience or a new
+ > syntax that they have never seen that has a more logical structure. I think
+ > the answer is the former. Sometimes, you have to give up a better idea
+ > because you can't overcome sociological barriers to it. Qwerty keyboards
+ > and Windows are two classic examples of bad technology that are difficult to
+ > root out.
+
+ Very true, but you seem to be advocating a completely different type
+ system than C has, in addition to it not offering the advantages of clear
+ structure that the system I recommended does... so you seem to not have a
+ problem with changing this, just with what I change it to. :)
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-09-AdveComments.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-09-AdveComments.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-09-AdveComments.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,120 ----
+ Ok, here are my comments and suggestions about the LLVM instruction set.
+ We should discuss some now, but can discuss many of them later, when we
+ revisit synchronization, type inference, and other issues.
+ (We have discussed some of the comments already.)
+
+
+ o We should consider eliminating the type annotation in cases where it is
+ essentially obvious from the instruction type, e.g., in br, it is obvious
+ that the first arg. should be a bool and the other args should be labels:
+
+ br bool <cond>, label <iftrue>, label <iffalse>
+
+ I think your point was that making all types explicit improves clarity
+ and readability. I agree to some extent, but it also comes at the cost
+ of verbosity. And when the types are obvious from people's experience
+ (e.g., in the br instruction), it doesn't seem to help as much.
+
+
+ o On reflection, I really like your idea of having the two different switch
+ types (even though they encode implementation techniques rather than
+ semantics). It should simplify building the CFG and my guess is it could
+ enable some significant optimizations, though we should think about which.
+
+
+ o In the lookup-indirect form of the switch, is there a reason not to make
+ the val-type uint? Most HLL switch statements (including Java and C++)
+ require that anyway. And it would also make the val-type uniform
+ in the two forms of the switch.
+
+ I did see the switch-on-bool examples and, while cute, we can just use
+ the branch instructions in that particular case.
+
+
+ o I agree with your comment that we don't need 'neg'.
+
+
+ o There's a trade-off with the cast instruction:
+ + it avoids having to define all the upcasts and downcasts that are
+ valid for the operands of each instruction (you probably have thought
+ of other benefits also)
+ - it could make the bytecode significantly larger because there could
+ be a lot of cast operations
+
+
+ o Making the second arg. to 'shl' a ubyte seems good enough to me.
+ 255 positions seems adequate for several generations of machines
+ and is more compact than uint.
+
+
+ o I still have some major concerns about including malloc and free in the
+ language (either as builtin functions or instructions). LLVM must be
+ able to represent code from many different languages. Languages such as
+ C, C++, Java, and Fortran 90 would not be able to use our malloc anyway
+ because each of them will want to provide a library implementation of it.
+
+ This gets even worse when code from different languages is linked
+ into a single executable (which is fairly common in large apps).
+ Having a single malloc would just not suffice, and instead would simply
+ complicate the picture further because it adds an extra variant in
+ addition to the one each language provides.
+
+ Instead, providing a default library version of malloc and free
+ (and perhaps a malloc_gc with garbage collection instead of free)
+ would make a good implementation available to anyone who wants it.
+
+ I don't recall all your arguments in favor so let's discuss this again,
+ and soon.
+
+
+ o 'alloca' on the other hand sounds like a good idea, and the
+ implementation seems fairly language-independent so it doesn't have the
+ problems with malloc listed above.
+
+
+ o About indirect call:
+ Your option #2 sounded good to me. I'm not sure I understand your
+ concern about an explicit 'icall' instruction?
+
+
+ o A pair of important synchronization instr'ns to think about:
+ load-linked
+ store-conditional
+
+
+ o Other classes of instructions that are valuable for pipeline performance:
+ conditional-move
+ predicated instructions
+
+
+ o I believe tail calls are relatively easy to identify; do you know why
+ .NET has a tailcall instruction?
+
+
+ o I agree that we need a static data space. Otherwise, emulating global
+ data gets unnecessarily complex.
+
+
+ o About explicit parallelism:
+
+ We once talked about adding a symbolic thread-id field to each
+ instruction. (It could be optional so single-threaded codes are
+ not penalized.) This could map well to multi-threaded architectures
+ while providing easy ILP for single-threaded ones. But it is probably
+ too radical an idea to include in a base version of LLVM. Instead, it
+ could be a great topic for a separate study.
+
+ What is the semantics of the IA64 stop bit?
+
+
+
+
+ o And finally, another thought about the syntax for arrays :-)
+
+ Although this syntax:
+ array <dimension-list> of <type>
+ is verbose, it will be used only in the human-readable assembly code so
+ size should not matter. I think we should consider it because I find it
+ to be the clearest syntax. It could even make arrays of function
+ pointers somewhat readable.
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,245 ----
+ From: Chris Lattner <sabre at nondot.org>
+ To: "Vikram S. Adve" <vadve at cs.uiuc.edu>
+ Subject: Re: LLVM Feedback
+
+ I've included your feedback in the /home/vadve/lattner/llvm/docs directory
+ so that it will live in CVS eventually with the rest of LLVM. I've
+ significantly updated the documentation to reflect the changes you
+ suggested, as specified below:
+
+ > We should consider eliminating the type annotation in cases where it is
+ > essentially obvious from the instruction type:
+ > br bool <cond>, label <iftrue>, label <iffalse>
+ > I think your point was that making all types explicit improves clarity
+ > and readability. I agree to some extent, but it also comes at the
+ > cost of verbosity. And when the types are obvious from people's
+ > experience (e.g., in the br instruction), it doesn't seem to help as
+ > much.
+
+ Very true. We should discuss this more, but my reasoning is more of a
+ consistency argument. There are VERY few instructions that can have all
+ of the types eliminated, and doing so when available unnecessarily makes
+ the language more difficult to handle. Especially when you see 'int
+ %this' and 'bool %that' all over the place, I think it would be
+ disorienting to see:
+
+ br %predicate, %iftrue, %iffalse
+
+ for branches. Even just typing that once gives me the creeps. ;) Like I
+ said, we should probably discuss this further in person...
+
+ > On reflection, I really like your idea of having the two different
+ > switch types (even though they encode implementation techniques rather
+ > than semantics). It should simplify building the CFG and my guess is it
+ > could enable some significant optimizations, though we should think
+ > about which.
+
+ Great. I added a note to the switch section commenting on how the VM
+ should just use the instruction type as a hint, and that the
+ implementation may choose alternate representations (such as predicated
+ branches).
+
+ > In the lookup-indirect form of the switch, is there a reason not to
+ > make the val-type uint?
+
+ No. This was something I was debating for a while, and didn't really feel
+ strongly about either way. It is common to switch on other types in HLL's
+ (for example signed int's are particularly common), but in this case, all
+ that will be added is an additional 'cast' instruction. I removed that
+ from the spec.
+
+ > I agree with your comment that we don't need 'neg'
+
+ Removed.
+
+ > There's a trade-off with the cast instruction:
+ > + it avoids having to define all the upcasts and downcasts that are
+ > valid for the operands of each instruction (you probably have
+ > thought of other benefits also)
+ > - it could make the bytecode significantly larger because there could
+ > be a lot of cast operations
+
+ + You NEED casts to represent things like:
+ void foo(float);
+ ...
+ int x;
+ ...
+ foo(x);
+ in a language like C. Even in a Java like language, you need upcasts
+ and some way to implement dynamic downcasts.
+ + Not all forms of instructions take every type (for example you can't
+ shift by a floating point number of bits), thus SOME programs will need
+ implicit casts.
+
+ To be efficient and to avoid your '-' point above, we just have to be
+ careful to specify that the instructions shall operate on all common
+ types, therefore casting should be relatively uncommon. For example all
+ of the arithmetic operations work on almost all data types.
+
+ > Making the second arg. to 'shl' a ubyte seems good enough to me.
+ > 255 positions seems adequate for several generations of machines
+
+ Okay, that comment is removed.
+
+ > and is more compact than uint.
+
+ No, it isn't. Remember that the bytecode encoding saves value slots into
+ the bytecode instructions themselves, not constant values. This is
+ another case where we may introduce more cast instructions (but we will
+ also reduce the number of opcode variants that must be supported by a
+ virtual machine). Because most shifts are by constant values, I don't
+ think that we'll have to cast many shifts. :)
+
+ > I still have some major concerns about including malloc and free in the
+ > language (either as builtin functions or instructions).
+
+ Agreed. How about this proposal:
+
+ malloc/free are either built in functions or actual opcodes. They provide
+ all of the type safety that the document would indicate, blah blah
+ blah. :)
+
+ Now, because of all of the excellent points that you raised, an
+ implementation may want to override the default malloc/free behavior of
+ the program. To do this, they simply implement a "malloc" and
+ "free" function. The virtual machine will then be defined to use the user
+ defined malloc/free function (which return/take void*'s, not type'd
+ pointers like the builtin function would) if one is available, otherwise
+ fall back on a system malloc/free.
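+
+ (Illustrative sketch, not part of the original email: the proposed
+ resolution rule in C++; the names are hypothetical.)
+
+   #include <cstdlib>
+
+   using UserMallocFn = void *(*)(unsigned long); // user "malloc", if any
+
+   void *vmMalloc(UserMallocFn UserFn, unsigned long Size) {
+     if (UserFn)               // program defined its own "malloc"
+       return UserFn(Size);    // takes/returns plain void*
+     return std::malloc(Size); // otherwise fall back on the system one
+   }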
+
+ Does this sound like a good compromise? It would give us all of the
+ typesafety/elegance in the language while still allowing the user to do
+ all the cool stuff they want to...
+
+ > 'alloca' on the other hand sounds like a good idea, and the
+ > implementation seems fairly language-independent so it doesn't have the
+ > problems with malloc listed above.
+
+ Okay, once we get the above stuff figured out, I'll put it all in the
+ spec.
+
+ > About indirect call:
+ > Your option #2 sounded good to me. I'm not sure I understand your
+ > concern about an explicit 'icall' instruction?
+
+ I worry too much. :) The other alternative has been removed. 'icall' is
+ now up in the instruction list next to 'call'.
+
+ > I believe tail calls are relatively easy to identify; do you know why
+ > .NET has a tailcall instruction?
+
+ Although I am just guessing, I believe it probably has to do with the fact
+ that they want languages like Haskell and Lisp to be efficiently runnable
+ on their VM. Of course this means that the VM MUST implement tail calls
+ 'correctly', or else life will suck. :) I would put this into a future
+ feature bin, because it could be pretty handy...
+
+ > A pair of important synchronization instr'ns to think about:
+ > load-linked
+ > store-conditional
+
+ What is 'load-linked'? I think that (at least for now) I should add these
+ to the 'possible extensions' section, because they are not immediately
+ needed...
+
+ > Other classes of instructions that are valuable for pipeline
+ > performance:
+ > conditional-move
+ > predicated instructions
+
+ Conditional move is effectively a special case of a predicated
+ instruction... and I think that all predicated instructions can possibly
+ be implemented later in LLVM. It would significantly change things, and
+ it doesn't seem to be very necessary right now. It would seem to
+ complicate flow control analysis a LOT in the virtual machine. I would
+ tend to prefer that a predicated architecture like IA64 convert from a
+ "basic block" representation to a predicated rep as part of it's dynamic
+ complication phase. Also, if a basic block contains ONLY a move, then
+ that can be trivally translated into a conditional move...
+
+ > I agree that we need a static data space. Otherwise, emulating global
+ > data gets unnecessarily complex.
+
+ Definitely. Also a later item though. :)
+
+ > We once talked about adding a symbolic thread-id field to each
+ > ..
+ > Instead, it could be a great topic for a separate study.
+
+ Agreed. :)
+
+ > What is the semantics of the IA64 stop bit?
+
+ Basically, the IA64 writes instructions like this:
+ mov ...
+ add ...
+ sub ...
+ op xxx
+ op xxx
+ ;;
+ mov ...
+ add ...
+ sub ...
+ op xxx
+ op xxx
+ ;;
+
+ Where the ;; delimits a group of instructions with no dependencies between
+ them, which can all be executed concurrently (to the limits of the
+ available functional units). The ;; gets translated into a bit set in one
+ of the opcodes.
+
+ The advantage of this representation is that you don't have to do some
+ kind of 'thread id scheduling' pass by having to specify ahead of time how
+ many threads to use, and the representation doesn't have a per instruction
+ overhead...
+
+ > And finally, another thought about the syntax for arrays :-)
+ > Although this syntax:
+ > array <dimension-list> of <type>
+ > is verbose, it will be used only in the human-readable assembly code so
+ > size should not matter. I think we should consider it because I find it
+ > to be the clearest syntax. It could even make arrays of function
+ > pointers somewhat readable.
+
+ My only comment will be to give you an example of why this is a bad
+ idea. :)
+
+ Here is an example of using the switch statement (with my recommended
+ syntax):
+
+ switch uint %val, label %otherwise,
+ [%3 x {uint, label}] [ { uint %57, label %l1 },
+ { uint %20, label %l2 },
+ { uint %14, label %l3 } ]
+
+ Here it is with the syntax you are proposing:
+
+ switch uint %val, label %otherwise,
+ array %3 of {uint, label}
+ array of {uint, label}
+ { uint %57, label %l1 },
+ { uint %20, label %l2 },
+ { uint %14, label %l3 }
+
+ Which is ambiguous and very verbose. It would be possible to specify
+ constants with [] brackets as in my syntax, which would look like this:
+
+ switch uint %val, label %otherwise,
+ array %3 of {uint, label} [ { uint %57, label %l1 },
+ { uint %20, label %l2 },
+ { uint %14, label %l3 } ]
+
+ But then the syntax is inconsistent between type definition and constant
+ definition (why do []'s enclose the constants but not the types??).
+
+ Anyways, I'm sure that there is much debate still to be had over
+ this... :)
+
+ -Chris
+
+ http://www.nondot.org/~sabre/os/
+ http://www.nondot.org/MagicStats/
+ http://korbit.sourceforge.net/
+
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,39 ----
+ Date: Tue, 13 Feb 2001 13:29:52 -0600 (CST)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: LLVM Concerns...
+
+
+ I've updated the documentation to include load, store, and allocation
+ instructions (please take a look and let me know if I'm on the right
+ track):
+
+ file:/home/vadve/lattner/llvm/docs/LangRef.html#memoryops
+
+ I have a couple of concerns I would like to bring up:
+
+ 1. Reference types
+ Right now, I've spec'd out the language to have a pointer type, which
+ works fine for lots of stuff... except that Java really has
+ references: constrained pointers that cannot be manipulated: added and
+ subtracted, moved, etc... Do we want to have a type like this? It
+ could be very nice for analysis (pointer always points to the start of
+ an object, etc...) and more closely matches Java semantics. The
+ pointer type would be kept for C++ like semantics. Through analysis,
+ C++ pointers could be promoted to references in the LLVM
+ representation.
+
+ 2. Our "implicit" memory references in assembly language:
+ After thinking about it, this model has two problems:
+ A. If you do pointer analysis and realize that two stores are
+ independent and can share the same memory source object, there is
+ no way to represent this in either the bytecode or assembly.
+ B. When parsing assembly/bytecode, we effectively have to do a full
+ SSA generation/PHI node insertion pass to build the dependencies
+ when we don't want the "pinned" representation. This is not
+ cool.
+ I'm tempted to make memory references explicit in both the assembly and
+ bytecode to get around this... what do you think?
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,47 ----
+ Date: Tue, 13 Feb 2001 18:25:42 -0600
+ From: Vikram S. Adve <vadve at cs.uiuc.edu>
+ To: Chris Lattner <sabre at nondot.org>
+ Subject: RE: LLVM Concerns...
+
+ > 1. Reference types
+ > Right now, I've spec'd out the language to have a pointer type, which
+ > works fine for lots of stuff... except that Java really has
+ > references: constrained pointers that cannot be manipulated: added and
+ > subtracted, moved, etc... Do we want to have a type like this? It
+ > could be very nice for analysis (pointer always points to the start of
+ > an object, etc...) and more closely matches Java semantics. The
+ > pointer type would be kept for C++ like semantics. Through analysis,
+ > C++ pointers could be promoted to references in the LLVM
+ > representation.
+
+
+ You're right, having references would be useful. Even for C++ the *static*
+ compiler could generate references instead of pointers with fairly
+ straightforward analysis. Let's include a reference type for now. But I'm
+ also really concerned that LLVM is becoming big and complex and (perhaps)
+ too high-level. After we get some initial performance results, we may have
+ a clearer idea of what our goals should be and we should revisit this
+ question then.
+
+ > 2. Our "implicit" memory references in assembly language:
+ > After thinking about it, this model has two problems:
+ > A. If you do pointer analysis and realize that two stores are
+ > independent and can share the same memory source object,
+
+ not sure what you meant by "share the same memory source object"
+
+ > there is
+ > no way to represent this in either the bytecode or assembly.
+ > B. When parsing assembly/bytecode, we effectively have to do a full
+ > SSA generation/PHI node insertion pass to build the dependencies
+ > when we don't want the "pinned" representation. This is not
+ > cool.
+
+ I understand the concern. But again, let's focus on the performance first
+ and then look at the language design issues. E.g., it would be good to know
+ how big the bytecode files are before expanding them further. I am pretty
+ keen to explore the implications of LLVM for mobile devices. Both bytecode
+ size and power consumption are important to consider there.
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,49 ----
+ By Chris:
+
+ LLVM has been designed with two primary goals in mind. First, we strive to
+ enable the best possible division of labor between static and dynamic
+ compilers, and second, we need a flexible and powerful interface
+ between these two complementary stages of compilation. We feel that
+ providing a solution to these two goals will yield an excellent solution
+ to the performance problem faced by modern architectures and programming
+ languages.
+
+ A key insight into current compiler and runtime systems is that a
+ compiler may fall anywhere in a "continuum of compilation" to do its
+ job. On one side, scripting languages statically compile nothing and
+ dynamically compile (or equivalently, interpret) everything. On the far
+ other side, traditional static compilers process everything statically and
+ nothing dynamically. These approaches have typically been seen as a
+ tradeoff between performance and portability. On a deeper level, however,
+ there are two reasons that optimal system performance may be obtained by a
+ system somewhere in between these two extremes: Dynamic application
+ behavior and social constraints.
+
+ From a technical perspective, pure static compilation cannot ever give
+ optimal performance in all cases, because applications have varying dynamic
+ behavior that the static compiler cannot take into consideration. Even
+ compilers that support profile-guided optimization generate poor code in
+ the real world, because such optimization tunes the application
+ to one particular usage pattern, whereas real programs (as opposed to
+ benchmarks) often have several different usage patterns.
+
+ On a social level, static compilation is a very shortsighted solution to
+ the performance problem. Instruction set architectures (ISAs) continuously
+ evolve, and each implementation of an ISA (a processor) must choose a set
+ of tradeoffs that make sense in the market context that it is designed for.
+ With every new processor introduced, the vendor faces two fundamental
+ problems: First, there is a lag time between when a processor is introduced
+ to when compilers generate quality code for the architecture. Secondly,
+ even when compilers catch up to the new architecture there is often a large
+ body of legacy code that was compiled for previous generations and will
+ not or cannot be upgraded. Thus a large percentage of code running on a
+ processor may be compiled quite sub-optimally for the current
+ characteristics of the dynamic execution environment.
+
+ For these reasons, LLVM has been designed from the beginning as a long-term
+ solution to these problems. Its design allows the large body of platform-
+ independent, static program optimizations currently in compilers to be
+ reused unchanged. It also provides important static
+ type information to enable powerful dynamic and link time optimizations
+ to be performed quickly and efficiently. This combination enables an
+ increase in effective system performance for real world environments.
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,202 ----
+ Meeting notes: Implementation idea: Exception Handling in C++/Java
+
+ The 5/18/01 meeting discussed ideas for implementing exceptions in LLVM.
+ We decided that the best solution requires a set of library calls provided by
+ the VM, as well as an extension to the LLVM function invocation syntax.
+
+ The LLVM function invocation instruction previously looks like this (ignoring
+ types):
+
+ call func(arg1, arg2, arg3)
+
+ The extension discussed today adds an optional "with" clause that
+ associates a label with the call site. The new syntax looks like this:
+
+ call func(arg1, arg2, arg3) with funcCleanup
+
+ This funcCleanup label always stays tightly associated with the call site (being
+ encoded directly into the call opcode itself), and should be used whenever
+ there is cleanup work that needs to be done for the current function if
+ an exception is thrown by func (or if we are in a try block).
+
+ To support this, the VM/Runtime provide the following simple library
+ functions (all syntax in this document is very abstract):
+
+ typedef struct { something } %frame;
+ The VM must export a "frame type": an opaque structure used to
+ implement different types of stack walking that may be used by various
+ language runtime libraries. We imagine that it would be typical to
+ represent a frame with a PC and frame pointer pair, although that is not
+ required.
+
+ %frame getStackCurrentFrame();
+ Get a frame object for the current function. Note that if the current
+ function was inlined into its caller, the "current" frame will belong to
+ the "caller".
+
+ bool isFirstFrame(%frame f);
+ Returns true if the specified frame is the top level (first activated) frame
+ for this thread. For the main thread, this corresponds to the main()
+ function; for a spawned thread, it corresponds to the thread function.
+
+ %frame getNextFrame(%frame f);
+ Return the previous frame on the stack. This function is undefined if f
+ satisfies the predicate isFirstFrame(f).
+
+ Label *getFrameLabel(%frame f);
+ If a label was associated with f (as discussed below), this function returns
+ it. Otherwise, it returns a null pointer.
+
+ doNonLocalBranch(Label *L);
+ At this point, it is not clear whether this should be a function or
+ intrinsic. It should probably be an intrinsic in LLVM, but we'll deal with
+ this issue later.
+
+
+ Here is a motivating example that illustrates how these facilities could be
+ used to implement the C++ exception model:
+
+ void TestFunction(...) {
+ A a; B b;
+ foo(); // Any function call may throw
+ bar();
+ C c;
+
+ try {
+ D d;
+ baz();
+ } catch (int) {
+ ...int Stuff...
+ // execution continues after the try block: the exception is consumed
+ } catch (double) {
+ ...double stuff...
+ throw; // Exception is propagated
+ }
+ }
+
+ This function would compile to approximately the following code (heavy
+ pseudo code follows):
+
+ Func:
+ %a = alloca A
+ A::A(%a) // These ctors & dtors could throw, but we ignore this
+ %b = alloca B // minor detail for this example
+ B::B(%b)
+
+ call foo() with fooCleanup // An exception in foo is propagated to fooCleanup
+ call bar() with barCleanup // An exception in bar is propagated to barCleanup
+
+ %c = alloca C
+ C::C(%c)
+ %d = alloca D
+ D::D(%d)
+ call baz() with bazCleanup // An exception in baz is propagated to bazCleanup
+ d->~D();
+ EndTry: // This label corresponds to the end of the try block
+ c->~C() // These could also throw, these are also ignored
+ b->~B()
+ a->~A()
+ return
+
+ Note that this is a very straightforward and literal translation: exactly
+ what we want for zero cost (when unused) exception handling. Especially on
+ platforms with many registers (e.g., IA64), setjmp/longjmp style exception
+ handling is *very* impractical. Also, the "with" clauses describe the
+ control flow paths explicitly so that analysis is not adversely affected.
+
+ The foo/barCleanup labels are implemented as:
+
+ TryCleanup: // Executed if an exception escapes the try block
+ c->~C()
+ barCleanup: // Executed if an exception escapes from bar()
+ // fall through
+ fooCleanup: // Executed if an exception escapes from foo()
+ b->~B()
+ a->~A()
+ Exception *E = getThreadLocalException()
+ call throw(E) // Implemented by the C++ runtime, described below
+
+ Which does the work one would expect. getThreadLocalException is a function
+ implemented by the C++ support library. It returns the current exception
+ object for the current thread. Note that we do not attempt to recycle the
+ shutdown code from before, because performance of the mainline code is
+ critically important. Also, obviously fooCleanup and barCleanup may be
+ merged and one of them eliminated. This just shows how the code generator
+ would most likely emit code.
+
+ The bazCleanup label is more interesting. Because the exception may be caught
+ by the try block, we must dispatch to its handler... but it does not exist
+ on the call stack (it does not have a VM Call->Label mapping installed), so
+ we must dispatch statically with a goto. The bazCleanup code thus appears as:
+
+ bazCleanup:
+ d->~D(); // destruct D as it goes out of scope when entering catch clauses
+ goto TryHandler
+
+ In general, TryHandler is not the same as bazCleanup, because multiple
+ function calls could be made from the try block. In this case, trivial
+ optimization could merge the two basic blocks. TryHandler is the code
+ that actually determines the type of exception, based on the Exception object
+ itself. For this discussion, assume that the exception object contains *at
+ least*:
+
+ 1. A pointer to the RTTI info for the contained object
+ 2. A pointer to the dtor for the contained object
+ 3. The contained object itself
+
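+ For concreteness, here is a minimal C++ sketch of such an exception
+ object, using the field names that appear in the pseudo code below
+ (illustrative only; not a real runtime layout):
+
+   struct Exception {
+     const void *RTTIType;    // #1: RTTI info for the contained object
+     void (*dtor)(void *);    // #2: dtor for the contained object, if any
+     void *object;            // #3: the contained object itself
+   };
+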
+ Note that it is necessary to maintain #1 & #2 in the exception object itself
+ because objects without virtual function tables may be thrown (as in this
+ example). Assuming this, TryHandler would look something like this:
+
+ TryHandler:
+ Exception *E = getThreadLocalException();
+ switch (E->RTTIType) {
+ case IntRTTIInfo:
+ ...int Stuff... // The action to perform from the catch block
+ break;
+ case DoubleRTTIInfo:
+ ...double Stuff... // The action to perform from the catch block
+ goto TryCleanup // This catch block rethrows the exception
+ break; // Redundant, eliminated by the optimizer
+ default:
+ goto TryCleanup // Exception not caught, rethrow
+ }
+
+ // Exception was consumed
+ if (E->dtor)
+ E->dtor(E->object) // Invoke the dtor on the object if it exists
+ goto EndTry // Continue mainline code...
+
+ And that is all there is to it.
+
+ The throw(E) function would then be implemented like this (which may be
+ inlined into the caller through standard optimization):
+
+ function throw(Exception *E) {
+ // Get the start of the stack trace...
+ %frame %f = call getStackCurrentFrame()
+
+ // Get the label information that corresponds to it
+ label * %L = call getFrameLabel(%f)
+ while (%L == 0 && !isFirstFrame(%f)) {
+ // Loop until a cleanup handler is found
+ %f = call getNextFrame(%f)
+ %L = call getFrameLabel(%f)
+ }
+
+ if (%L != 0) {
+ call setThreadLocalException(E) // Allow handlers access to this...
+ call doNonLocalBranch(%L)
+ }
+ // No handler found!
+ call BlowUp() // Ends up calling the terminate() method in use
+ }
+
+ That's a brief rundown of how C++ exception handling could be implemented in
+ LLVM. Java would be very similar, except it only uses destructors to unlock
+ synchronized blocks, not to destroy data. Also, it uses two stack walks: a
+ nondestructive walk that builds a stack trace, then a destructive walk that
+ unwinds the stack as shown here.
+
+ It would be trivial to get exception interoperability between C++ and Java.
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,45 ----
+ Date: Sat, 19 May 2001 19:09:13 -0500 (CDT)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: RE: Meeting writeup
+
+ > I read it through and it looks great!
+
+ Thanks!
+
+ > The finally clause in Java may need more thought. The code for this clause
+ > is like a subroutine because it needs to be entered from many points (end of
+ > try block and beginning of each catch block), and then needs to *return to
+ > the place from where the code was entered*. That's why JVM has the
+ > jsr/jsr_w instruction.
+
+ Hrm... I guess that is an implementation decision. It can either be
+ modelled as a subroutine (as Java bytecodes do), which is really
+ gross... or it can be modelled as code duplication (emitted once inline,
+ then once in the exception path). Because this could, at worst,
+ slightly less than double the amount of code in a function (it is
+ bounded), I don't think this is a big deal. One of the really nice things
+ about the LLVM representation is that it still allows for runtime code
+ generation for exception paths (exception paths are not compiled until
+ needed). Obviously a static compiler couldn't do this though. :)
+
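+ To make the duplication strategy concrete, here is a C++ sketch of the
+ idea (the function names are made up; a real implementation would
+ duplicate IR, not source):
+
+   #include <cstdio>
+
+   void work() { /* may throw in a real program */ }
+   void cleanup() { std::puts("finally body"); }
+
+   // "try { work(); } finally { cleanup(); }" with the finally body
+   // duplicated onto both paths instead of made a jsr-style subroutine:
+   void lowered() {
+     try {
+       work();
+       cleanup();   // copy #1: normal exit from the try block
+     } catch (...) {
+       cleanup();   // copy #2: exception path, can be compiled lazily
+       throw;       // propagate, exactly as the original finally would
+     }
+   }
+
+   int main() { lowered(); }
+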
+ In this case, only one copy of the code would be compiled... until the
+ other one is needed on demand. Also this strategy fits with the "zero
+ cost" exception model... the standard case is not burdened with extra
+ branches or "call"s.
+
+ > I suppose you could save the return address in a particular register
+ > (specific to this finally block), jump to the finally block, and then at the
+ > end of the finally block, jump back indirectly through this register. It
+ > will complicate building the CFG but I suppose that can be handled. It is
+ > also unsafe in terms of checking where control returns (which is I suppose
+ > why the JVM doesn't use this).
+
+ I think that a code duplication method would be cleaner, and would avoid
+ the caveats that you mention. Also, it does not slow down the normal case
+ with an indirect branch...
+
+ Like everything, we can probably defer a final decision until later. :)
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,63 ----
+ Date: Fri, 1 Jun 2001 16:38:17 -0500 (CDT)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: Interesting: GCC passes
+
+
+ Take a look at this document (which describes the order of optimizations
+ that GCC performs):
+
+ http://gcc.gnu.org/onlinedocs/gcc_17.html
+
+ The rundown is that after RTL generation, the following happens:
+
+ 1 . [t] jump optimization (jumps to jumps, etc)
+ 2 . [t] Delete unreachable code
+ 3 . Compute live ranges for CSE
+ 4 . [t] Jump threading (jumps to jumps with identical or inverse conditions)
+ 5 . [t] CSE
+ 6 . *** Conversion to SSA
+ 7 . [t] SSA Based DCE
+ 8 . *** Conversion to LLVM
+ 9 . UnSSA
+ 10. GCSE
+ 11. LICM
+ 12. Strength Reduction
+ 13. Loop unrolling
+ 14. [t] CSE
+ 15. [t] DCE
+ 16. Instruction combination, register movement, scheduling... etc.
+
+ I've marked optimizations with a [t] to indicate things that I believe to
+ be relatively trivial to implement in LLVM itself. The time-consuming
+ things to reimplement would be SSA-based PRE, strength reduction & loop
+ unrolling... these would be the major things we would miss out on if we
+ did LLVM creation from tree code [inlining and other high level
+ optimizations are done on the tree representation].
+
+ Given the lack of "strong" optimizations that would take a long time to
+ reimplement, I am leaning a bit more towards creating LLVM from the tree
+ code. Especially given that SGI has GPL'd their compiler, including many
+ SSA based optimizations that could be adapted (besides the fact that their
+ code looks MUCH nicer than GCC :)
+
+ Even if we choose to do LLVM code emission from RTL, we will almost
+ certainly want to move LLVM emission from step 8 down until at least CSE
+ has been rerun... which causes me to wonder if the SSA generation code
+ will still work (due to global variable dependencies and stuff). I assume
+ that it can be made to work, but might be a little more involved than we
+ would like.
+
+ I'm continuing to look at the Tree -> RTL code. It is pretty gross
+ because they do some of the translation a statement at a time, and some
+ of it a function at a time... I'm not quite clear why and how the
+ distinction is drawn, but it does not appear that there is a wonderful
+ place to attach extra info.
+
+ Anyways, I'm proceeding with the RTL -> LLVM conversion phase for now. We
+ can talk about this more on Monday.
+
+ Wouldn't it be nice if there were an obvious decision to be made? :)
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,71 ----
+ Date: Fri, 1 Jun 2001 17:08:44 -0500 (CDT)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: RE: Interesting: GCC passes
+
+ > That is very interesting. I agree that some of these could be done on LLVM
+ > at link-time, but it is the extra time required that concerns me. Link-time
+ > optimization is severely time-constrained.
+
+ If we were to reimplement any of these optimizations, I assume that we
+ could do them a translation unit at a time, just as GCC does now. This
+ would lead to a pipeline like this:
+
+ Static optimizations, xlation unit at a time:
+ .c --GCC--> .llvm --llvmopt--> .llvm
+
+ Link time optimizations:
+ .llvm --llvm-ld--> .llvm --llvm-link-opt--> .llvm
+
+ Of course, many optimizations could be shared between llvmopt and
+ llvm-link-opt, but they wouldn't need to be shared... Thus compile time
+ could be faster, because we are using a "smarter" IR (SSA based).
+
+ > BTW, about SGI, "borrowing" SSA-based optimizations from one compiler and
+ > putting it into another is not necessarily easier than re-doing it.
+ > Optimization code is usually heavily tied in to the specific IR they use.
+
+ Understood. The only reason that I brought this up is because SGI's IR is
+ more similar to LLVM than it is different in many respects (SSA based,
+ relatively low level, etc), and could be easily adapted. Also their
+ optimizations are written in C++ and are actually somewhat
+ structured... of course it would be no walk in the park, but it would be
+ much less time consuming to adapt, say, SSA-PRE than to rewrite it.
+
+ > But your larger point is valid that adding SSA based optimizations is
+ > feasible and should be fun. (Again, link time cost is the issue.)
+
+ Assuming linktime cost wasn't an issue, the question is:
+ Does using GCC's backend buy us anything?
+
+ > It also occurs to me that GCC is probably doing quite a bit of back-end
+ > optimization (step 16 in your list). Do you have a breakdown of that?
+
+ Not really. The irritating part of GCC is that it mixes it all up and
+ doesn't have a clean separation of concerns. A lot of the "back end
+ optimization" happens right along with other data optimizations (i.e., CSE
+ of machine-specific things).
+
+ As far as REAL back end optimizations go, it looks something like this:
+
+ 1. Instruction combination: try to make CISCy instructions, if available
+ 2. Register movement: try to get registers in the right places for the
+ architecture to avoid register to register moves. For example, try to get
+ the first argument of a function to naturally land in %o0 for SPARC.
+ 3. Instruction scheduling: 'nuff said :)
+ 4. Register class preferencing: ??
+ 5. Local register allocation
+ 6. Global register allocation
+ 7. Spilling
+ 8. Local regalloc
+ 9. Jump optimization
+ 10. Delay slot scheduling
+ 11. Branch shortening for CISC machines
+ 12. Instruction selection & peephole optimization
+ 13. Debug info output
+
+ But none of this would be usable for LLVM anyways, unless we were using
+ GCC as a static compiler.
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,30 ----
+ Date: Wed, 20 Jun 2001 12:32:22 -0500
+ From: Vikram Adve <vadve at cs.uiuc.edu>
+ To: Chris Lattner <lattner at cs.uiuc.edu>
+ Subject: .NET vs. our VM
+
+ One significant difference between .NET CLR and our VM is that the CLR
+ includes full information about classes and inheritance. In fact, I just
+ sat through the paper on adding templates to .NET CLR, and the speaker
+ indicated that the goal seems to be to do simple static compilation (very
+ little lowering or optimization). Also, the templates implementation in CLR
+ "relies on dynamic class loading and JIT compilation".
+
+ This is an important difference because I think there are some significant
+ advantages to having a much lower-level VM layer and doing significant static
+ analysis and optimization.
+
+ I also talked to the lead guy for KAI's C++ compiler (Arch Robison) and he
+ said that SGI and other commercial compilers have included options to export
+ their *IR* next to the object code (i.e., .il files) and use them for
+ link-time code generation. In fact, he said that the .o file was nearly
+ empty and was entirely generated from the .il at link-time. But he agreed
+ that this limited the link-time interprocedural optimization to modules
+ compiled by the same compiler, whereas our approach allows us to link and
+ optimize modules from multiple different compilers. (Also, of course, they
+ don't do anything for runtime optimization).
+
+ All issues to bring up in Related Work.
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,31 ----
+ Date: Fri, 6 Jul 2001 16:56:56 -0500
+ From: Vikram S. Adve <vadve at cs.uiuc.edu>
+ To: Chris Lattner <lattner at cs.uiuc.edu>
+ Subject: lowering the IR
+
+ BTW, I do think that we should consider lowering the IR as you said. I
+ didn't get time to raise it today, but it comes up with the SPARC
+ move-conditional instruction. I don't think we want to put that in the core
+ VM -- it is a little too specialized. But without a corresponding
+ conditional move instruction in the VM, it is pretty difficult to maintain a
+ close mapping between VM and machine code. Other architectures may have
+ other such instructions.
+
+ What I was going to suggest was that for a particular processor, we define
+ additional VM instructions that match some of the unusual opcodes on the
+ processor but have VM semantics otherwise, i.e., all operands are in SSA
+ form and typed. This means that we can re-generate core VM code from the
+ more specialized code any time we want (so that portability is not lost).
+
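+ As a tiny C++ sketch of the semantics (illustrative only), a SPARC-style
+ conditional move viewed as a pure operation on typed values:
+
+   // Because the operation is just a function of its operands (SSA
+   // semantics), it can always be re-expanded into a compare, a branch,
+   // and a merge when core VM code needs to be re-generated.
+   int movcc(bool cond, int ifTrue, int ifFalse) {
+     return cond ? ifTrue : ifFalse;
+   }
+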
+ Typically, a static compiler like gcc would generate just the core VM, which
+ is relatively portable. Anyone (an offline tool, the linker, etc., or even
+ the static compiler itself if it chooses) can transform that into more
+ specialized target-specific VM code for a particular architecture. If the
+ linker does it, it can do it after all machine-independent optimizations.
+ This would be the most convenient, but not necessary.
+
+ The main benefit of lowering will be that we will be able to retain a close
+ mapping between VM and machine code.
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-08-InstructionSelection.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-08-InstructionSelection.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-08-InstructionSelection.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,51 ----
+ Date: Sun, 8 Jul 2001 09:37:22 -0500
+ From: Vikram S. Adve <vadve at cs.uiuc.edu>
+ To: Ruchira Sasanka <sasanka at students.uiuc.edu>
+ Cc: Chris Lattner <lattner at cs.uiuc.edu>
+ Subject: machine instruction operands
+
+ Ruchira,
+
+ When generating machine instructions, I have to make several choices about
+ operands. Where a register is required, there are 3 cases:
+
+ 1. The register is for a Value* that is already in the VM code.
+
+ 2. The register is for a value that is not in the VM code, usually because 2
+ machine instructions get generated for a single VM instruction (and the
+ register holds the result of the first m/c instruction and is used by the
+ second m/c instruction).
+
+ 3. The register is a pre-determined machine register.
+
+ E.g., for this VM instruction:
+ ptr = alloca type, numElements
+ I have to generate 2 machine instructions:
+ reg = mul constant, numElements
+ ptr = add %sp, reg
+
+ Each machine instruction is of class MachineInstr.
+ It has a vector of operands. All register operands have type MO_REGISTER.
+ The 3 types of register operands are marked using this enum:
+
+ enum VirtualRegisterType {
+ MO_VMVirtualReg, // virtual register for *value
+ MO_MInstrVirtualReg, // virtual register for result of *minstr
+ MO_MachineReg // pre-assigned machine register `regNum'
+ } vregType;
+
+ Here's how this affects register allocation:
+
+ 1. MO_VMVirtualReg is the standard case: you just do the register
+ allocation.
+
+ 2. MO_MInstrVirtualReg is the case where there is a hidden register being
+ used. You should decide how you want to handle it, e.g., do you want to
+ create a Value object during the preprocessing phase to make the value
+ explicit (like the address register for the RETURN instruction).
+
+ 3. For case MO_MachineReg, you don't need to do anything, at least for
+ SPARC. The only machine regs I am using so far are %g0 and %sp.
+
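+ As a rough C++ sketch of how these pieces might fit together (a guess
+ for illustration, not the real MachineInstr classes):
+
+   struct Value;         // a VM value
+   struct MachineInstr;  // a machine instruction, as above
+
+   // VirtualRegisterType repeated from earlier in this mail:
+   enum VirtualRegisterType { MO_VMVirtualReg, MO_MInstrVirtualReg,
+                              MO_MachineReg };
+
+   struct RegisterOperand {
+     VirtualRegisterType vregType;
+     union {
+       Value        *value;   // MO_VMVirtualReg: the VM value
+       MachineInstr *minstr;  // MO_MInstrVirtualReg: defining m/c instr
+       int           regNum;  // MO_MachineReg: pre-assigned register
+     };
+   };
+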
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-08-InstructionSelection2.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-08-InstructionSelection2.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-08-InstructionSelection2.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,25 ----
+ Date: Sun, 8 Jul 2001 10:02:20 -0500
+ From: Vikram S. Adve <vadve at cs.uiuc.edu>
+ To: vadve at cs.uiuc.edu, Ruchira Sasanka <sasanka at students.uiuc.edu>
+ Cc: Chris Lattner <lattner at cs.uiuc.edu>
+ Subject: RE: machine instruction operands
+
+ I got interrupted and forgot to explain the example. In that case:
+
+ reg will be the 3rd operand of MUL and it will be of type
+ MO_MInstrVirtualReg. The field MachineInstr* minstr will point to the
+ instruction that computes reg.
+
+ numElements will be an immediate constant, not a register.
+
+ %sp will be operand 1 of ADD and it will be of type MO_MachineReg. The
+ field regNum identifies the register.
+
+ reg will be operand 2 of ADD and it will also be of type
+ MO_MInstrVirtualReg. The field MachineInstr* minstr identifies the MUL.
+
+ ptr will be operand 3 of ADD and will also be %sp, i.e., of
+ type MO_MachineReg. regNum identifies the register.
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,56 ----
+ Date: Tue, 18 Sep 2001 00:38:37 -0500 (CDT)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: Idea for a simple, useful link time optimization
+
+
+ In C++ programs, exceptions suck, and here's why:
+
+ 1. In virtually all function calls, you must assume that the function
+ throws an exception, unless it is defined as 'nothrow'. This means
+ that every function call has to have code to invoke dtors on objects
+ locally if one is thrown by the function. Most functions don't throw
+ exceptions, so this code is dead [with all the bad effects of dead
+ code, including icache pollution].
+ 2. Declaring a function nothrow causes catch blocks to be added to every
+ call that is not provably nothrow. This makes them very slow.
+ 3. Extraneous exception edges reduce the opportunity for code
+ motion.
+ 4. EH is typically implemented with large lookup tables. Ours is going to
+ be much smaller (than the "standard" way of doing it) to start with,
+ but eliminating it entirely would be nice. :)
+ 5. It is physically impossible to correctly put (accurate, correct)
+ exception specifications on generic, templated code. But it is trivial
+ to analyze instantiations of said code.
+ 6. Most large C++ programs throw few exceptions. Most well designed
+ programs only throw exceptions in specific planned portions of the
+ code.
+
+ Given our _planned_ model of handling exceptions, all of this would be
+ pretty trivial to eliminate through some simplistic interprocedural
+ analysis. The DCE factor alone could probably be pretty significant. The
+ extra code motion opportunities could also be exploited though...
+
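+ To make "simplistic interprocedural analysis" concrete, here is one
+ possible shape for it in C++ (a sketch over assumed data structures,
+ not a real pass): a function is nothrow if its body contains no throw
+ and every callee is already proven nothrow, iterated to a fixpoint so
+ that recursion conservatively stays "may throw".
+
+   #include <vector>
+
+   struct Func {
+     bool hasExplicitThrow = false;  // does the body itself throw?
+     bool knownNothrow = false;      // proven so far
+     std::vector<Func *> callees;    // assumed call graph edges
+   };
+
+   void inferNothrow(std::vector<Func *> &funcs) {
+     for (bool changed = true; changed;) {
+       changed = false;
+       for (Func *F : funcs) {
+         if (F->knownNothrow || F->hasExplicitThrow)
+           continue;
+         bool calleesOk = true;
+         for (Func *C : F->callees)
+           if (!C->knownNothrow) { calleesOk = false; break; }
+         if (calleesOk) {  // promote and re-scan
+           F->knownNothrow = true;
+           changed = true;
+         }
+       }
+     }
+   }
+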
+ Additionally, this optimization can be implemented in a straightforward,
+ conservative manner, allowing libraries, or even individual files, to be
+ optimized (if there are leaf functions visible in the translation unit
+ that are called).
+
+ I think it's a reasonable optimization that hasn't really been addressed
+ (because assembly is way too low level for this), and could have decent
+ payoffs... without being an overly complex optimization.
+
+ After I wrote all of that, I found this page that is talking about
+ basically the same thing I just wrote, except that it is a translation-
+ unit-at-a-time, tree-based approach:
+ http://www.ocston.org/~jls/ehopt.html
+
+ but it is very useful from an "expected gain" and references perspective. Note
+ that their compiler is apparently unable to inline functions that use
+ exceptions, so their numbers are pretty worthless... also our results
+ would (hopefully) be better because it's interprocedural...
+
+ What do you think?
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2002-05-12-InstListChange.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2002-05-12-InstListChange.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2002-05-12-InstListChange.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,55 ----
+ Date: Sun, 12 May 2002 17:12:53 -0500 (CDT)
+ From: Chris Lattner <sabre at nondot.org>
+ To: "Vikram S. Adve" <vadve at cs.uiuc.edu>
+ Subject: LLVM change
+
+ There is a fairly fundamental change that I would like to make to the LLVM
+ infrastructure, but I'd like to know if you see any drawbacks that I
+ don't...
+
+ Basically right now at the basic block level, each basic block contains an
+ instruction list (returned by getInstList()) that is a ValueHolder of
+ instructions. To iterate over instructions, we must actually iterate over
+ the instlist, and access the instructions through the instlist.
+
+ To add or remove an instruction from a basic block, we need to get an
+ iterator to an instruction, which, given just an Instruction*, requires a
+ linear search of the basic block the instruction is contained in... just
+ to insert an instruction before another instruction, or to delete an
+ instruction! This complicates algorithms that should be very simple (like
+ simple constant propagation), because they aren't actually sparse anymore;
+ they have to traverse basic blocks to remove constant-propagated
+ instructions.
+
+ Additionally, adding or removing instructions to a basic block
+ _invalidates all iterators_ pointing into that block, which is really
+ irritating.
+
+ To fix these problems (and others), I would like to make the ordering of
+ the instructions be represented with a doubly linked list in the
+ instructions themselves, instead of an external data structure. This is
+ how many other representations do it, and frankly I can't remember why I
+ originally implemented it the way I did.
+
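+ The gist of the intrusive scheme, sketched in C++ (a hypothetical
+ minimal version, not the actual class):
+
+   // The ordering lives in the instructions themselves, so an
+   // Instruction* alone is enough to unlink in O(1), with no linear
+   // search of the enclosing basic block.
+   struct Instruction {
+     Instruction *Prev = nullptr;
+     Instruction *Next = nullptr;
+   };
+
+   void removeFromBlock(Instruction *I) {
+     if (I->Prev) I->Prev->Next = I->Next;
+     if (I->Next) I->Next->Prev = I->Prev;
+     I->Prev = I->Next = nullptr;  // iterators to other nodes stay valid
+   }
+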
+ Long term, all of the code that depends on the nasty features in the
+ instruction list (which can be found by grep'ing for getInstList()) will
+ be changed to do nice local transformations. In the short term, I'll
+ change the representation, but preserve the interface (including
+ getInstList()) so that all of the code doesn't have to change.
+
+ Iteration over the instructions in a basic block remains simple:
+ for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) ...
+
+ But we will also support:
+ for (Instruction *I = BB->front(); I; I = I->getNext()) ...
+
+ After converting instructions over, I'll convert basic blocks and
+ functions to have a similar interface.
+
+ The only negative aspect of this change that I see is that it increases
+ the amount of memory consumed by one pointer per instruction. Given the
+ benefits, I think this is a very reasonable tradeoff.
+
+ What do you think?
+
+ -Chris
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,72 ----
+ Changes:
+ * Change the casting code to be const correct. Now, doing this is invalid:
+ const Value *V = ...;
+ Instruction *I = dyn_cast<Instruction>(V);
+ instead, the second line should be:
+ const Instruction *I = dyn_cast<Instruction>(V);
+
+ * Change the casting code to allow casting a reference value thus:
+ const Value &V = ...;
+ Instruction &I = cast<Instruction>(V);
+
+ dyn_cast does not work with references, because it must return a null pointer
+ on failure.
+
+ * Fundamentally change how instructions and other values are represented.
+ Before, every LLVM container was an instance of the ValueHolder template,
+ instantiated for each container type. This ValueHolder was effectively a
+ wrapper around a vector of pointers to the sub-objects.
+
+ Now, instead of having a vector of pointers to objects, the objects are
+ maintained in a doubly linked list of values (i.e., each Instruction now has
+ Next & Previous fields). The containers are now instances of ilist (intrusive
+ linked list class), which use the next and previous fields to chain them
+ together. The advantage of this implementation is that iterators can be
+ formed directly from pointers to the LLVM value, and invalidation is much
+ easier to handle.
+
+ * As part of the above change, dereferencing an iterator (for example:
+ BasicBlock::iterator) now produces a reference to the underlying type (same
+ example: Instruction&) instead of a pointer to the underlying object. This
+ makes it much easier to write nested loops that iterate over things, changing
+ this:
+
+ for (Function::iterator BI = Func->begin(); BI != Func->end(); ++BI)
+ for (BasicBlock::iterator II = (*BI)->begin(); II != (*BI)->end(); ++II)
+ (*II)->dump();
+
+ into:
+
+ for (Function::iterator BI = Func->begin(); BI != Func->end(); ++BI)
+ for (BasicBlock::iterator II = BI->begin(); II != BI->end(); ++II)
+ II->dump();
+
+ which is much more natural and what users expect.
+
+ * Simplification of #include's: Before, it was necessary for a .cpp file to
+ include every .h file that it used. Now things are batched a little bit more
+ to make it easier to use. Specifically, the include graph now includes these
+ edges:
+ Module.h -> Function.h, GlobalVariable.h
+ Function.h -> BasicBlock.h, Argument.h
+ BasicBlock.h -> Instruction.h
+
+ Which means that #including Function.h is usually sufficient for getting the
+ lower level #includes.
+
+ * Printing out a Value* has now changed: Printing a Value* will soon print out
+ the address of the value instead of the contents of the Value. To print out
+ the contents, you must convert it to a reference with (for example)
+ 'cout << *I' instead of 'cout << I;'. This conversion is not yet complete,
+ but will be eventually. In the meantime, both forms print out the contents.
+
+ * References are used much more throughout the code base. In general, if a
+ pointer is known to never be null, it is passed in as a reference instead of a
+ pointer. For example, the instruction visitor class uses references instead
+ of pointers, and Pass subclasses now all receive references to Values
+ instead of pointers, because they may never be null.
+
+ * The Function class now has helper functions for accessing the Arguments list.
+ Instead of having to go through getArgumentList for simple things like
+ iterating over the arguments, the a*() methods can now be used to access them.
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,28 ----
+ Date: Mon, 20 Jan 2003 00:00:28 -0600
+ From: Brian R. Gaeke <gaeke at uiuc.edu>
+ Subject: windows vs. llvm
+
+ If you're interested, here are some of the major problems compiling LLVM
+ under Cygwin and/or Mingw.
+
+ 1. Cygwin doesn't have <inttypes.h> or <stdint.h>, so all the INT*_MAX
+ symbols and standard int*_t types are off in limbo somewhere. Mingw has
+ <stdint.h>, but Cygwin doesn't like it.
+
+ 2. Mingw doesn't have <dlfcn.h> (because Windows doesn't have it.)
+
+ 3. SA_SIGINFO and friends are not around; only signal() seems to work.
+
+ 4. Relink, aka ld -r, doesn't work (probably an ld bug); you need
+ DONT_BUILD_RELINKED. This breaks all the tools' makefiles; you just need to
+ change them to have .a's.
+
+ 5. There isn't a <values.h>.
+
+ 6. There isn't a mallinfo() (or, at least, it's documented, but it doesn't seem
+ to link).
+
+ 7. The version of Bison that Cygwin (and newer Linux versions) comes with
+ does not like = signs in rules. Burg's gram.yc source file uses them. I think
+ you can just take them out.
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,137 ----
+ Wed Jun 25 15:13:51 CDT 2003
+
+ First-level instrumentation
+ ---------------------------
+
+ We use opt to do bytecode-to-bytecode instrumentation. We look at
+ back-edges and insert a call to llvm_first_trigger(), which takes
+ no arguments and returns no value. This instrumentation is designed to
+ be easy to remove, for instance by writing a NOP over the function
+ call instruction.
+
+ Keep count of every call to llvm_first_trigger(), and maintain
+ counters in a map indexed by return address. If the trigger count
+ exceeds a threshold, we identify a hot loop and perform second-level
+ instrumentation on the hot loop region (the instructions between the
+ target of the back-edge and the branch that causes the back-edge). We
+ do not move code across basic-block boundaries.
+
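+ A sketch of that bookkeeping in C++ (the threshold constant and the map
+ are assumptions for illustration, not values from these notes):
+
+   #include <cstdint>
+   #include <map>
+
+   std::map<void *, uint32_t> triggerCounts;  // keyed by return address
+   const uint32_t hotThreshold = 50;          // assumed value
+
+   extern "C" void llvm_first_trigger() {
+     void *retAddr = __builtin_return_address(0);  // GCC/Clang builtin
+     if (++triggerCounts[retAddr] > hotThreshold) {
+       // Hot loop found: instrument the region between the back-edge
+       // target and the branch that causes the back-edge.
+     }
+   }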
+
+ Second-level instrumentation
+ ----------------------------
+
+ We remove the first-level instrumentation by overwriting the CALL to
+ llvm_first_trigger() with a NOP.
+
+ The reoptimizer maintains a map between machine-code basic blocks and
+ LLVM BasicBlock*s. We only keep track of paths that start at the
+ first machine-code basic block of the hot loop region.
+
+ How do we keep track of which edges to instrument, and which edges are
+ exits from the hot region? A 3-step process:
+
+ 1) Do a DFS from the first machine-code basic block of the hot loop
+ region and mark reachable edges.
+
+ 2) Do a DFS from the last machine-code basic block of the hot loop
+ region IGNORING back edges, and mark the edges which are reachable in
+ 1) and also in 2) (i.e., must be reachable from both the start BB and
+ the end BB of the hot region).
+
+ 3) Mark BBs which end in edges that exit the hot region; we need to
+ instrument these differently.
+
+ Assume that there is 1 free register. On SPARC we use %g1, which LLC
+ has agreed not to use. Shift a 1 into it at the beginning. At every
+ edge which corresponds to a conditional branch, we shift 0 for not
+ taken and 1 for taken into a register. This uniquely numbers the paths
+ through the hot region. Silently fail if we need more than 64 bits.
+
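+ The same numbering scheme in portable C++ (a sketch; the real code
+ keeps this in %g1, and overflow past 64 bits is deliberately ignored):
+
+   #include <cstdint>
+
+   uint64_t pathId = 1;  // the seeded 1 marks how many bits follow
+
+   void onConditionalBranch(bool taken) {
+     pathId = (pathId << 1) | (taken ? 1 : 0);
+   }
+   // E.g. taken, not taken, taken yields 0b1101 = 13, a unique path id.
+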
+ At the end BB we call countPath and increment the counter based on %g1
+ and the return address of the countPath call. We keep track of the
+ number of iterations and the number of paths. We only run this
+ version 30 or 40 times.
+
+ Find the BBs that total 90% or more of execution, and aggregate them
+ together to form our trace. But we do not allow more than 5 paths; if
+ we have more than 5 we take the ones that are executed the most. We
+ verify our assumption that we picked a hot back-edge in first-level
+ instrumentation, by making sure that the number of times we took an
+ exit edge from the hot trace is less than 10% of the number of
+ iterations.
+
+ LLC has been taught to recognize llvm_first_trigger() calls and NOT
+ generate saves and restores of caller-saved registers around these
+ calls.
+
+
+ Phase behavior
+ --------------
+
+ We turn off llvm_first_trigger() calls with NOPs, but this would hide
+ phase behavior from us (when some funcs/traces stop being hot and
+ others become hot.)
+
+ We have a SIGALRM timer that counts time for us. Every time we get a
+ SIGALRM we look at our priority queue of locations where we have
+ removed llvm_first_trigger() calls. Each location is inserted along
+ with a time when we will next turn instrumentation back on for that
+ call site. If the time has arrived for a particular call site, we pop
+ that off the priority queue and turn instrumentation back on for that
+ call site.
+
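+ A sketch of that priority queue in C++ (all names assumed):
+
+   #include <cstdint>
+   #include <queue>
+   #include <vector>
+
+   struct PatchedSite {
+     uint64_t reenableAt;  // when to turn instrumentation back on
+     void    *callSite;    // address of the NOP'd trigger call
+   };
+   struct Later {
+     bool operator()(const PatchedSite &a, const PatchedSite &b) const {
+       return a.reenableAt > b.reenableAt;  // min-heap ordered by time
+     }
+   };
+   std::priority_queue<PatchedSite, std::vector<PatchedSite>, Later> sites;
+
+   void onSigAlrm(uint64_t now) {
+     while (!sites.empty() && sites.top().reenableAt <= now) {
+       // Rewrite the NOP back into a call to llvm_first_trigger().
+       sites.pop();
+     }
+   }
+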
+
+ Generating traces
+ -----------------
+
+ When we finally generate an optimized trace we first copy the code
+ into the trace cache. This leaves us with 3 copies of the code: the
+ original code, the instrumented code, and the optimized trace. The
+ optimized trace does not have instrumentation. The original code and
+ the instrumented code are modified to have a branch to the trace
+ cache, where the optimized traces are kept.
+
+ We copy the code from the original to the instrumented version
+ by tracing the LLVM-to-machine-code basic block map and then copying
+ each machine code basic block we think is in the hot region into the
+ trace cache. Then we instrument that code. The process is similar for
+ generating the final optimized trace; we copy the same basic blocks
+ because we might need to put in fixup code for exit BBs.
+
+ LLVM basic blocks are not typically used in the Reoptimizer except
+ for the mapping information.
+
+ We are restricted to using single instructions to branch between the
+ original code, trace, and instrumented code. So we have to keep the
+ code copies in memory near the original code (they can't be far enough
+ away that a single pc-relative branch would not work.) Malloc() or
+ data region space is too far away. this impacts the design of the
+ trace cache.
+
+ We use a dummy function that is full of a bunch of for loops which we
+ overwrite with trace-cache code. The trace manager keeps track of
+ whether or not we have enough space in the trace cache, etc.
+
+ The trace insertion routine takes an original start address, a vector
+ of machine instructions representing the trace, the indices of branches and
+ their corresponding absolute targets, and the indices of calls and their
+ corresponding absolute targets.
+
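+ One plausible C++ shape for that interface (purely a guess at the
+ signature for illustration; not the real Reoptimizer API):
+
+   #include <cstdint>
+   #include <vector>
+
+   struct Fixup {
+     unsigned instrIndex;      // which instruction in the trace
+     uint64_t absoluteTarget;  // where it must end up pointing
+   };
+
+   void insertTrace(void *originalStart,
+                    const std::vector<uint32_t> &traceInstrs,
+                    const std::vector<Fixup> &branches,
+                    const std::vector<Fixup> &calls);
+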
+ The trace insertion routine is responsible for inserting branches from
+ the beginning of the original code to the beginning of the optimized
+ trace. This is because at some point the trace cache may run out of
+ space and it may have to evict a trace, at which point the branch to
+ the trace would also have to be removed. It uses a round-robin
+ replacement policy; we have found that this is almost as good as LRU
+ and better than random (especially because of problems fitting the new
+ trace in.)
+
+ We cannot deal with discontiguous trace cache areas. The trace cache
+ is supposed to be cache-line-aligned, but it is not page-aligned.
+
+ We generate instrumentation traces and optimized traces into separate
+ trace caches. We keep the instrumented code around because you don't
+ want to delete a trace when you still might have to return to it
+ (i.e., return from an llvm_first_trigger() or countPath() call.)
+
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,110 ----
+ Thu Jun 26 14:43:04 CDT 2003
+
+ Information about BinInterface
+ ------------------------------
+
+ BinInterface takes in a set of instructions with some particular register
+ allocation. It allows you to add, modify, or delete instructions,
+ in SSA form (kind of like LLVM's MachineInstrs), and then re-allocate
+ registers. It assumes that the transformations you are doing are safe.
+ It does not update the mapping information or the LLVM representation
+ for the modified trace (so it would not, for instance, support
+ multiple optimization passes; passes have to be aware of the mapping
+ information and update it manually.)
+
+ The way you use it is you take the original code and provide it to
+ BinInterface; then you do optimizations to it, then you put it in the
+ trace cache.
+
+ The BinInterface tries to find live-outs for traces so that it can do
+ register allocation on just the trace, and stitch the trace back into
+ the original code. It has to preserve the live-ins and live-outs when
+ it does its register allocation. (On exits from the trace we have
+ epilogues that copy live-outs back into the right registers, but
+ live-ins have to be in the right registers.)
+
+
+ Limitations of BinInterface
+ ---------------------------
+
+ It does copy insertions for PHIs, which it infers from the machine
+ code. The mapping info inserted by LLC is not sufficient to determine
+ the PHIs.
+
+ It does not handle integer or floating-point condition codes and it
+ does not handle floating-point register allocation.
+
+ It is not able to use lots of registers aggressively.
+
+ There is a problem with alloca: we cannot find our spill space for
+ spilling registers, normally allocated on the stack, if the trace
+ follows an alloca(). What might be an acceptable solution would be to
+ disable trace generation on functions that have variable-sized
+ alloca()s. Variable-sized allocas in the trace would also probably
+ screw things up.
+
+ Because of the FP and alloca limitations, the BinInterface is
+ completely disabled right now.
+
+
+ Demo
+ ----
+
+ This is a demo of the Ball & Larus version that does NOT use 2-level
+ profiling.
+
+ 1. Compile program with llvm-gcc.
+ 2. Run opt -lowerswitch -paths -emitfuncs on the bytecode.
+ -lowerswitch change switch statements to branches
+ -paths Ball & Larus path-profiling algorithm
+ -emitfuncs emit the table of functions
+ 3. Run llc to generate SPARC assembly code for the result of step 2.
+ 4. Use g++ to link the (instrumented) assembly code.
+
+ We use a script to do all this:
+ ------------------------------------------------------------------------------
+ #!/bin/sh
+ llvm-gcc $1.c -o $1
+ opt -lowerswitch -paths -emitfuncs $1.bc > $1.run.bc
+ llc -f $1.run.bc
+ LIBS=$HOME/llvm_sparc/lib/Debug
+ GXX=/usr/dcs/software/evaluation/bin/g++
+ $GXX -g -L $LIBS $1.run.s -o $1.run.llc \
+ $LIBS/tracecache.o \
+ $LIBS/mapinfo.o \
+ $LIBS/trigger.o \
+ $LIBS/profpaths.o \
+ $LIBS/bininterface.o \
+ $LIBS/support.o \
+ $LIBS/vmcore.o \
+ $LIBS/transformutils.o \
+ $LIBS/bcreader.o \
+ -lscalaropts -lscalaropts -lanalysis \
+ -lmalloc -lcpc -lm -ldl
+ ------------------------------------------------------------------------------
+
+ 5. Run the resulting binary. You will see output from BinInterface
+ (described below) intermixed with the output from the program.
+
+
+ Output from BinInterface
+ ------------------------
+
+ BinInterface's debugging code prints out the following stuff in order:
+
+ 1. Initial code provided to BinInterface with original register
+ allocation.
+
+ 2. Section 0 is the trace prolog, consisting mainly of live-ins and
+ register saves which will be restored in epilogs.
+
+ 3. Section 1 is the trace itself, in SSA form used by BinInterface,
+ along with the PHIs that are inserted.
+ PHIs are followed by the copies that implement them.
+ Each branch (i.e., out of the trace) is annotated with the
+ section number that represents the epilog it branches to.
+
+ 4. All the other sections starting with Section 2 are trace epilogs.
+ Every branch from the trace has to go to some epilog.
+
+ 5. After the last section is the register allocation output.