[llvm-commits] CVS: llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt 2000-11-18-EarlyDesignIdeasResp.txt 2000-12-06-EncodingIdea.txt 2000-12-06-MeetingSummary.txt 2001-01-31-UniversalIRIdea.txt 2001-02-06-TypeNotationDebate.txt 2001-02-06-TypeNotationDebateResp1.txt 2001-02-06-TypeNotationDebateResp2.txt 2001-02-06-TypeNotationDebateResp4.txt 2001-02-09-AdveComments.txt 2001-02-09-AdveCommentsResponse.txt 2001-02-13-Reference-Memory.txt 2001-02-13-Reference-MemoryResponse.txt 2001-04-16-DynamicCompilation.txt 2001-05-18-ExceptionHandling.txt 2001-05-19-ExceptionResponse.txt 2001-06-01-GCCOptimizations.txt 2001-06-01-GCCOptimizations2.txt 2001-06-20-.NET-Differences.txt 2001-07-06-LoweringIRForCodeGen.txt 2001-07-08-InstructionSelection.txt 2001-07-08-InstructionSelection2.txt 2001-09-18-OptimizeExceptions.txt 2002-05-12-InstListChange.txt 2002-06-25-MegaPatchInfo.txt 2003-01-23-CygwinNotes.txt 2003-06-25-Reoptimizer1.txt 2003-06-26-Reoptimizer2.txt
Tanya Lattner
tonic at nondot.org
Sun Nov 19 23:27:58 PST 2006
Changes in directory llvm-www/releases/1.9/docs/HistoricalNotes:
2000-11-18-EarlyDesignIdeas.txt added (r1.1)
2000-11-18-EarlyDesignIdeasResp.txt added (r1.1)
2000-12-06-EncodingIdea.txt added (r1.1)
2000-12-06-MeetingSummary.txt added (r1.1)
2001-01-31-UniversalIRIdea.txt added (r1.1)
2001-02-06-TypeNotationDebate.txt added (r1.1)
2001-02-06-TypeNotationDebateResp1.txt added (r1.1)
2001-02-06-TypeNotationDebateResp2.txt added (r1.1)
2001-02-06-TypeNotationDebateResp4.txt added (r1.1)
2001-02-09-AdveComments.txt added (r1.1)
2001-02-09-AdveCommentsResponse.txt added (r1.1)
2001-02-13-Reference-Memory.txt added (r1.1)
2001-02-13-Reference-MemoryResponse.txt added (r1.1)
2001-04-16-DynamicCompilation.txt added (r1.1)
2001-05-18-ExceptionHandling.txt added (r1.1)
2001-05-19-ExceptionResponse.txt added (r1.1)
2001-06-01-GCCOptimizations.txt added (r1.1)
2001-06-01-GCCOptimizations2.txt added (r1.1)
2001-06-20-.NET-Differences.txt added (r1.1)
2001-07-06-LoweringIRForCodeGen.txt added (r1.1)
2001-07-08-InstructionSelection.txt added (r1.1)
2001-07-08-InstructionSelection2.txt added (r1.1)
2001-09-18-OptimizeExceptions.txt added (r1.1)
2002-05-12-InstListChange.txt added (r1.1)
2002-06-25-MegaPatchInfo.txt added (r1.1)
2003-01-23-CygwinNotes.txt added (r1.1)
2003-06-25-Reoptimizer1.txt added (r1.1)
2003-06-26-Reoptimizer2.txt added (r1.1)
---
Log message:
1.9 docs
---
Diffs of the changes: (+2185 -0)
2000-11-18-EarlyDesignIdeas.txt | 74 +++++++++
2000-11-18-EarlyDesignIdeasResp.txt | 199 +++++++++++++++++++++++++
2000-12-06-EncodingIdea.txt | 30 +++
2000-12-06-MeetingSummary.txt | 83 ++++++++++
2001-01-31-UniversalIRIdea.txt | 39 +++++
2001-02-06-TypeNotationDebate.txt | 67 ++++++++
2001-02-06-TypeNotationDebateResp1.txt | 75 +++++++++
2001-02-06-TypeNotationDebateResp2.txt | 53 ++++++
2001-02-06-TypeNotationDebateResp4.txt | 89 +++++++++++
2001-02-09-AdveComments.txt | 120 +++++++++++++++
2001-02-09-AdveCommentsResponse.txt | 245 ++++++++++++++++++++++++++++++++
2001-02-13-Reference-Memory.txt | 39 +++++
2001-02-13-Reference-MemoryResponse.txt | 47 ++++++
2001-04-16-DynamicCompilation.txt | 49 ++++++
2001-05-18-ExceptionHandling.txt | 202 ++++++++++++++++++++++++++
2001-05-19-ExceptionResponse.txt | 45 +++++
2001-06-01-GCCOptimizations.txt | 63 ++++++++
2001-06-01-GCCOptimizations2.txt | 71 +++++++++
2001-06-20-.NET-Differences.txt | 30 +++
2001-07-06-LoweringIRForCodeGen.txt | 31 ++++
2001-07-08-InstructionSelection.txt | 51 ++++++
2001-07-08-InstructionSelection2.txt | 25 +++
2001-09-18-OptimizeExceptions.txt | 56 +++++++
2002-05-12-InstListChange.txt | 55 +++++++
2002-06-25-MegaPatchInfo.txt | 72 +++++++++
2003-01-23-CygwinNotes.txt | 28 +++
2003-06-25-Reoptimizer1.txt | 137 +++++++++++++++++
2003-06-26-Reoptimizer2.txt | 110 ++++++++++++++
28 files changed, 2185 insertions(+)
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt:1.1
*** /dev/null Mon Nov 20 01:27:55 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeas.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,74 ----
+ Date: Sat, 18 Nov 2000 09:19:35 -0600 (CST)
+ From: Vikram Adve <vadve at cs.uiuc.edu>
+ To: Chris Lattner <lattner at cs.uiuc.edu>
+ Subject: a few thoughts
+
+ I've been mulling over the virtual machine problem and I had some
+ thoughts about some things for us to think about and discuss:
+
+ 1. We need to be clear on our goals for the VM. Do we want to emphasize
+ portability and safety like the Java VM? Or shall we focus on the
+ architecture interface first (i.e., consider the code generation and
+ processor issues), since the architecture interface question is also
+ important for portable Java-type VMs?
+
+ This is important because the audiences for these two goals are very
+ different. Architects and many compiler people care much more about
+ the second question. The Java compiler and OS community care much more
+ about the first one.
+
+ Also, while the architecture interface question is important for
+ Java-type VMs, the design constraints are very different.
+
+
+ 2. Design issues to consider (an initial list that we should continue
+ to modify). Note that I'm not trying to suggest actual solutions here,
+ but just various directions we can pursue:
+
+ a. A single-assignment VM, which we've both already been thinking about.
+
+ b. A strongly-typed VM. One question is do we need the types to be
+ explicitly declared or should they be inferred by the dynamic compiler?
+
+ c. How do we get more high-level information into the VM while keeping
+ to a low-level VM design?
+
+ o Explicit array references as operands? An alternative is
+ to have just an array type, and let the index computations be
+ separate 3-operand instructions.
+
+ o Explicit instructions to handle aliasing, e.g.:
+ -- an instruction to say "I speculate that these two values are not
+ aliased, but check at runtime", like speculative execution in
+ EPIC?
+ -- or an instruction to check whether two values are aliased and
+ execute different code depending on the answer, somewhat like
+ predicated code in EPIC
+
+ o (This one is a difficult but powerful idea.)
+ A "thread-id" field on every instruction that allows the static
+ compiler to generate a set of parallel threads, and then have
+ the runtime compiler and hardware do what they please with it.
+ This has very powerful uses, but thread-id on every instruction
+ is expensive in terms of instruction size and code size.
+ We would need to compactly encode it somehow.
+
+ Also, this will require some reading on at least two other
+ projects:
+ -- Multiscalar architecture from Wisconsin
+ -- Simultaneous multithreading architecture from Washington
+
+ o Or forget all this and stick to a traditional instruction set?
+
+
+ BTW, on an unrelated note, after the meeting yesterday, I did remember
+ that you had suggested doing instruction scheduling on SSA form instead
+ of a dependence DAG earlier in the semester. When we talked about
+ it yesterday, I didn't remember where the idea had come from but I
+ remembered later. Just giving credit where it's due...
+
+ Perhaps you can save the above as a file under RCS so you and I can
+ continue to expand on this.
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt:1.1
*** /dev/null Mon Nov 20 01:27:57 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2000-11-18-EarlyDesignIdeasResp.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,199 ----
+ Date: Sun, 19 Nov 2000 16:23:57 -0600 (CST)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram Adve <vadve at cs.uiuc.edu>
+ Subject: Re: a few thoughts
+
+ Okay... here are a few of my thoughts on this (it's good to know that we
+ think so alike!):
+
+ > 1. We need to be clear on our goals for the VM. Do we want to emphasize
+ > portability and safety like the Java VM? Or shall we focus on the
+ > architecture interface first (i.e., consider the code generation and
+ > processor issues), since the architecture interface question is also
+ > important for portable Java-type VMs?
+
+ I foresee the architecture looking kinda like this: (which is completely
+ subject to change)
+
+ 1. The VM code is NOT guaranteed safe in a java sense. Doing so makes it
+ basically impossible to support C like languages. Besides that,
+ certifying a register based language as safe at run time would be a
+ pretty expensive operation to have to do. Additionally, we would like
+ to be able to statically eliminate many bounds checks in Java
+ programs... for example.
+
+ 2. Instead, we can do the following (eventually):
+ * Java bytecode is used as our "safe" representation (to avoid
+ reinventing something that we don't add much value to). When the
+ user chooses to execute Java bytecodes directly (ie, not
+ precompiled) the runtime compiler can do some very simple
+ transformations (JIT style) to convert it into valid input for our
+ VM. Performance is not wonderful, but it works right.
+ * The file is scheduled to be compiled (rigorously) at a later
+ time. This could be done by some background process or by a second
+ processor in the system during idle time or something...
+ * To keep things "safe" ie to enforce a sandbox on Java/foreign code,
+ we could sign the generated VM code with a host specific private
+ key. Then before the code is executed/loaded, we can check to see if
+ the trusted compiler generated the code. This would be much quicker
+ than having to validate consistency (especially if bounds checks have
+ been removed, for example)
+
+ > This is important because the audiences for these two goals are very
+ > different. Architects and many compiler people care much more about
+ > the second question. The Java compiler and OS community care much more
+ > about the first one.
+
+ 3. By focusing on a more low level virtual machine, we have much more room
+ for value add. The nice safe "sandbox" VM can be provided as a layer
+ on top of it. It also lets us focus on the more interesting compilers
+ related projects.
+
+ > 2. Design issues to consider (an initial list that we should continue
+ > to modify). Note that I'm not trying to suggest actual solutions here,
+ > but just various directions we can pursue:
+
+ Understood. :)
+
+ > a. A single-assignment VM, which we've both already been thinking
+ > about.
+
+ Yup, I think that this makes a lot of sense. I am still intrigued,
+ however, by the prospect of a minimally allocated VM representation... I
+ think that it could have definite advantages for certain applications
+ (think very small machines, like PDAs). I don't, however, think that our
+ initial implementations should focus on this. :)
+
+ Here are some other auxiliary goals that I think we should consider:
+
+ 1. Primary goal: Support a high performance dynamic compilation
+ system. This means that we have an "ideal" division of labor between
+ the runtime and static compilers. Of course, the other goals of the
+ system somewhat reduce the importance of this point (f.e. portability
+ reduces performance, but hopefully not much)
+ 2. Portability to different processors. Since we are most familiar with
+ x86 and solaris, I think that these two are excellent candidates when
+ we get that far...
+ 3. Support for all languages & styles of programming (general purpose
+ VM). This is the point that disallows Java-style bytecodes, where all
+ array refs are checked for bounds, etc...
+ 4. Support linking between different language families. For example, call
+ C functions directly from Java without using the nasty/slow/gross JNI
+ layer. This involves several subpoints:
+ A. Support for languages that require garbage collectors and integration
+ with languages that don't. As a base point, we could insist on
+ always using a conservative GC, but implement free as a noop, f.e.
+
+ > b. A strongly-typed VM. One question is do we need the types to be
+ > explicitly declared or should they be inferred by the dynamic
+ > compiler?
+
+ B. This is kind of similar to another idea that I have: make OOP
+ constructs (virtual function tables, class hierarchies, etc) explicit
+ in the VM representation. I believe that the number of additional
+ constructs would be fairly low, but would give us lots of important
+ information... something else that would/could be important is to
+ have exceptions as first class types so that they would be handled in
+ a uniform way for the entire VM... so that C functions can call Java
+ functions for example...
+
+ > c. How do we get more high-level information into the VM while keeping
+ > to a low-level VM design?
+ > o Explicit array references as operands? An alternative is
+ > to have just an array type, and let the index computations be
+ > separate 3-operand instructions.
+
+ C. In the model I was thinking of (subject to change of course), we
+ would just have an array type (distinct from the pointer
+ types). This would allow us to have arbitrarily complex index
+ expressions, while still distinguishing "load" from "Array load",
+ for example. Perhaps also, switch jump tables would be first class
+ types as well? This would allow better reasoning about the program.
+
+ 5. Support dynamic loading of code from various sources. Already
+ mentioned above was the example of loading java bytecodes, but we want
+ to support dynamic loading of VM code as well. This makes the job of
+ the runtime compiler much more interesting: it can do interprocedural
+ optimizations that the static compiler can't do, because it doesn't
+ have all of the required information (for example, inlining from
+ shared libraries, etc...)
+
+ 6. Define a set of generally useful annotations to add to the VM
+ representation. For example, a function can be analysed to see if it
+ has any side effects when run... also, the MOD/REF sets could be
+ calculated, etc... we would have to determine what is reasonable. This
+ would generally be used to make IP optimizations cheaper for the
+ runtime compiler...
+
+ > o Explicit instructions to handle aliasing, e.g.s:
+ > -- an instruction to say "I speculate that these two values are not
+ > aliased, but check at runtime", like speculative execution in
+ > EPIC?
+ > -- or an instruction to check whether two values are aliased and
+ > execute different code depending on the answer, somewhat like
+ > predicated code in EPIC
+
+ These are also very good points... if this can be determined at compile
+ time. I think that an EPIC style of representation (not the instruction
+ packing, just the information presented) could be a very interesting model
+ to use... more later...
+
+ > o (This one is a difficult but powerful idea.)
+ > A "thread-id" field on every instruction that allows the static
+ > compiler to generate a set of parallel threads, and then have
+ > the runtime compiler and hardware do what they please with it.
+ > This has very powerful uses, but thread-id on every instruction
+ > is expensive in terms of instruction size and code size.
+ > We would need to compactly encode it somehow.
+
+ Yes yes yes! :) I think it would be *VERY* useful to include this kind
+ of information (which EPIC architectures *implicitly* encode). The trend
+ that we are seeing supports this greatly:
+
+ 1. Commodity processors are getting massive SIMD support:
+ * Intel/AMD MMX/MMX2
+ * AMD's 3Dnow!
+ * Intel's SSE/SSE2
+ * Sun's VIS
+ 2. SMP is becoming much more common, especially in the server space.
+ 3. Multiple processors on a die are right around the corner.
+
+ If nothing else, not designing this in would severely limit our future
+ expansion of the project...
+
+ > Also, this will require some reading on at least two other
+ > projects:
+ > -- Multiscalar architecture from Wisconsin
+ > -- Simultaneous multithreading architecture from Washington
+ >
+ > o Or forget all this and stick to a traditional instruction set?
+
+ Heh... :) Well, from a pure research point of view, it is almost more
+ attractive to go with the most extreme/different ISA possible. On one axis
+ you get safety and conservatism, and on the other you get degree of
+ influence that the results have. Of course the problem with pure research
+ is that oftentimes there is no concrete product of the research... :)
+
+ > BTW, on an unrelated note, after the meeting yesterday, I did remember
+ > that you had suggested doing instruction scheduling on SSA form instead
+ > of a dependence DAG earlier in the semester. When we talked about
+ > it yesterday, I didn't remember where the idea had come from but I
+ > remembered later. Just giving credit where its due...
+
+ :) Thanks.
+
+ > Perhaps you can save the above as a file under RCS so you and I can
+ > continue to expand on this.
+
+ I think it makes sense to do so when we get our ideas more formalized and
+ bounce it back and forth a couple of times... then I'll do a more formal
+ writeup of our goals and ideas. Obviously our first implementation will
+ not want to do all of the stuff that I pointed out above... but we will
+ want to design the project so that we do not artificially limit ourselves
+ at some time in the future...
+
+ Anyways, let me know what you think about these ideas... and if they sound
+ reasonable...
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt:1.1
*** /dev/null Mon Nov 20 01:27:57 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2000-12-06-EncodingIdea.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,30 ----
+ From: Chris Lattner [mailto:sabre at nondot.org]
+ Sent: Wednesday, December 06, 2000 6:41 PM
+ To: Vikram S. Adve
+ Subject: Additional idea with respect to encoding
+
+ Here's another idea with respect to keeping the common case instruction
+ size down (less than 32 bits ideally):
+
+ Instead of encoding an instruction to operate on two register numbers,
+ have it operate on two negative offsets based on the current register
+ number. Therefore, instead of using:
+
+ r57 = add r55, r56 (r57 is the implicit dest register, of course)
+
+ We could use:
+
+ r57 = add -2, -1
+
+ My guess is that most SSA references are to recent values (especially if
+ they correspond to expressions like (x+y*z+p*q/ ...)), so the negative
+ numbers would tend to stay small, even at the end of the procedure (where
+ the implicit register destination number could be quite large). Of course
+ the negative sign is redundant, so you would be storing small integers
+ almost all of the time, and 5-6 bits worth of register number would be
+ plenty for most cases...
+
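+ (Illustrative sketch, not part of the original email: the relative
+ encoding in C++; all names here are hypothetical.)
+
+   #include <cassert>
+
+   // Store the distance back from the defining register rather than the
+   // absolute register number; references to recent values stay small.
+   unsigned encodeOperand(unsigned DefReg, unsigned UseReg) {
+     assert(UseReg < DefReg && "operand must be defined earlier");
+     return DefReg - UseReg; // r57 = add r55, r56 encodes offsets 2 and 1
+   }
+
+   unsigned decodeOperand(unsigned DefReg, unsigned Offset) {
+     return DefReg - Offset; // invert the encoding
+   }
+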
+ What do you think?
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt:1.1
*** /dev/null Mon Nov 20 01:27:57 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2000-12-06-MeetingSummary.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,83 ----
+ SUMMARY
+ -------
+
+ We met to discuss the LLVM instruction format and bytecode representation:
+
+ ISSUES RESOLVED
+ ---------------
+
+ 1. We decided that we shall use a flat namespace to represent our
+ variables in SSA form, as opposed to having a two dimensional namespace
+ of the original variable and the SSA instance subscript.
+
+ ARGUMENT AGAINST:
+ * A two dimensional namespace would be valuable when doing alias
+ analysis because the extra information can help limit the scope of
+ analysis.
+
+ ARGUMENT FOR:
+ * Including this information would require all users of the LLVM
+ bytecode to parse and handle it. This would slow down the
+ common case and inflate the instruction representation with another
+ infinite variable space.
+
+ REASONING:
+ * It was decided that because original variable sources could be
+ reconstructed from SSA form in linear time, it would be an
+ unjustified expense for the common case to include the extra
+ information for one optimization. Alias analysis itself is typically
+ greater than linear in asymptotic complexity, so this extra analysis
+ would not affect the runtime of the optimization in a significant
+ way. Additionally, this would be an unlikely optimization to do at
+ runtime.
+
+
+ IDEAS TO CONSIDER
+ -----------------
+
+ 1. Including dominator information in the LLVM bytecode
+ representation. This is one example of an analysis result that may be
+ packaged with the bytecodes themselves. As a conceptual implementation
+ idea, we could include an immediate dominator number for each basic block
+ in the LLVM bytecode program. Basic blocks could be numbered according
+ to the order of occurrence in the bytecode representation (a rough
+ sketch of one possible record appears after idea 2 below).
+
+ 2. Including loop header and body information. This would facilitate
+ detection of intervals and natural loops.
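+
+ (Illustrative sketch for idea 1, not part of the original notes: one
+ possible per-block record; the names are hypothetical.)
+
+   #include <cstdint>
+
+   // One entry per basic block, in order of occurrence in the bytecode.
+   struct BlockRecord {
+     uint32_t BlockNum; // position of this block in the bytecode
+     uint32_t IDomNum;  // BlockNum of this block's immediate dominator
+   };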
+
+ UNRESOLVED ISSUES
+ -----------------
+
+ 1. Will oSUIF provide enough of an infrastructure to support the research
+ that we will be doing? We know that it has less than stellar
+ performance, but hope that this will be of little importance for our
+ static compiler. This could affect us if we decided to do some IP
+ research. Also we do not yet understand the level of exception support
+ currently implemented.
+
+ 2. Should we consider the requirements of a direct hardware implementation
+ of the LLVM when we design it? If so, several design issues should
+ have their priorities shifted. The other option is to focus on a
+ software layer interpreting the LLVM in all cases.
+
+ 3. Should we use some form of packetized format to improve forward
+ compatibility? For example, we could design the system to encode a
+ packet type and length field before analysis information, to allow a
+ runtime to skip information that it didn't understand in a bytecode
+ stream. The obvious benefit would be for compatibility; the drawback
+ is that it would tend to splinter the 'standard' LLVM definition.
+
+ 4. Should we use fixed length instructions or variable length
+ instructions? Fetching variable length instructions is expensive (for
+ either hardware or software based LLVM runtimes), but we have several
+ 'infinite' spaces that instructions operate in (SSA register numbers,
+ type spaces, or packet length [if packets were implemented]). Several
+ options were mentioned, including:
+ A. Using 16 or 32 bit numbers, which would be 'big enough'
+ B. A scheme similar to how UTF-8 works, to encode infinite numbers
+ while keeping small numbers small (see the sketch after this list).
+ C. Use something similar to Huffman encoding, so that the most common
+ numbers are the smallest.
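+
+ (Illustrative sketch of option B, not part of the original notes: a
+ UTF-8-like continuation-bit encoder in C++; the names are hypothetical.)
+
+   #include <cstdint>
+   #include <vector>
+
+   // Emit Val seven bits at a time, low bits first; the high bit of each
+   // byte says whether another byte follows, so small numbers stay small.
+   void emitVBR(uint64_t Val, std::vector<uint8_t> &Out) {
+     do {
+       uint8_t Byte = Val & 0x7F;
+       Val >>= 7;
+       if (Val) Byte |= 0x80; // continuation bit
+       Out.push_back(Byte);
+     } while (Val);
+   }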
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-01-31-UniversalIRIdea.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,39 ----
+ Date: Wed, 31 Jan 2001 12:04:33 -0600
+ From: Vikram S. Adve <vadve at cs.uiuc.edu>
+ To: Chris Lattner <lattner at cs.uiuc.edu>
+ Subject: another thought
+
+ I have a budding idea about making LLVM a little more ambitious: a
+ customizable runtime system that can be used to implement language-specific
+ virtual machines for many different languages. E.g., a C vm, a C++ vm, a
+ Java vm, a Lisp vm, ..
+
+ The idea would be that LLVM would provide a standard set of runtime features
+ (some low-level like standard assembly instructions with code generation and
+ static and runtime optimization; some higher-level like type-safety and
+ perhaps a garbage collection library). Each language vm would select the
+ runtime features needed for that language, extending or customizing them as
+ needed. Most of the machine-dependent code-generation and optimization
+ features as well as low-level machine-independent optimizations (like PRE)
+ could be provided by LLVM and should be sufficient for any language,
+ simplifying the language compiler. (This would also help interoperability
+ between languages.) Also, some or most of the higher-level
+ machine-independent features like type-safety and access safety should be
+ reusable by different languages, with minor extensions. The language
+ compiler could then focus on language-specific analyses and optimizations.
+
+ The risk is that this sounds like a universal IR -- something that the
+ compiler community has tried and failed to develop for decades, and is
+ universally skeptical about. No matter what we say, we won't be able to
+ convince anyone that we have a universal IR that will work. We need to
+ think about whether LLVM is different or if it has something novel that might
+ convince people. E.g., the idea of providing a package of separable
+ features that different languages select from. Also, using SSA with or
+ without type-safety as the intermediate representation.
+
+ One interesting starting point would be to discuss how a JVM would be
+ implemented on top of LLVM a bit more. That might give us clues on how to
+ structure LLVM to support one or more language VMs.
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebate.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,67 ----
+ Date: Tue, 6 Feb 2001 20:27:37 -0600 (CST)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: Type notation debate...
+
+ This is the way that I am currently planning on implementing types:
+
+ Primitive Types:
+ type ::= void|bool|sbyte|ubyte|short|ushort|int|uint|long|ulong
+
+ Method:
+ typelist ::= typelisth | /*empty*/
+ typelisth ::= type | typelisth ',' type
+ type ::= type '(' typelist ')'
+
+ Arrays (without and with size):
+ type ::= '[' type ']' | '[' INT ',' type ']'
+
+ Pointer:
+ type ::= type '*'
+
+ Structure:
+ type ::= '{' typelist '}'
+
+ Packed:
+ type ::= '<' INT ',' type '>'
+
+ Simple examples:
+
+ [[ %4, int ]] - array of (array of 4 (int))
+ [ { int, int } ] - Array of structure
+ [ < %4, int > ] - Array of 128 bit SIMD packets
+ int (int, [[ %4, int ]]) - Method taking a 2d array and int, returning int
+
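+ (Illustrative sketch, not part of the original email: a minimal C++
+ reader for just the sized-array/pointer/primitive subset of the grammar
+ above, using the grammar's plain-INT dimension form; all names are
+ hypothetical.)
+
+   #include <cctype>
+   #include <string>
+
+   struct TypeReader {
+     std::string S; size_t I = 0;
+     void skip() { while (I < S.size() && isspace((unsigned char)S[I])) ++I; }
+     // type ::= primitive | '[' INT ',' type ']' | type '*'
+     std::string parseType() {
+       skip();
+       std::string T;
+       if (I < S.size() && S[I] == '[') { // sized array
+         ++I; skip();
+         std::string N;
+         while (I < S.size() && isdigit((unsigned char)S[I])) N += S[I++];
+         skip(); ++I; // consume ','
+         T = "array of " + N + " " + parseType();
+         skip(); ++I; // consume ']'
+       } else { // primitive name
+         while (I < S.size() && isalpha((unsigned char)S[I])) T += S[I++];
+       }
+       skip();
+       while (I < S.size() && S[I] == '*') { ++I; T = "pointer to " + T; skip(); }
+       return T;
+     }
+   };
+
+   // TypeReader{"[10, [4, int]]*"}.parseType()
+   //   yields "pointer to array of 10 array of 4 int"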
+
+ Okay before you comment, please look at:
+
+ http://www.research.att.com/~bs/devXinterview.html
+
+ Search for "In another interview, you defined the C declarator syntax as
+ an experiment that failed. However, this syntactic construct has been
+ around for 27 years and perhaps more; why do you consider it problematic
+ (except for its cumbersome syntax)?" and read that response for me. :)
+
+ Now with this syntax, his example would be represented as:
+
+ [ %10, bool (int, int) * ] *
+
+ vs
+
+ bool (*(*)[10])(int, int)
+
+ in C.
+
+ Basically, my argument for this type construction system is that it is
+ VERY simple to use and understand (although it IS different than C, it is
+ very simple and straightforward, which C is NOT). In fact, I would assert
+ that most programmers TODAY do not understand pointers to member
+ functions, and have to look up an example when they have to write them.
+
+ In my opinion, it is critically important to have clear and concise type
+ specifications, because types are going to be all over the programs.
+
+ Let me know your thoughts on this. :)
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp1.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,75 ----
+ Date: Thu, 8 Feb 2001 08:42:04 -0600
+ From: Vikram S. Adve <vadve at cs.uiuc.edu>
+ To: Chris Lattner <sabre at nondot.org>
+ Subject: RE: Type notation debate...
+
+ Chris,
+
+ > Okay before you comment, please look at:
+ >
+ > http://www.research.att.com/~bs/devXinterview.html
+
+ I read this argument. Even before that, I was already in agreement with you
+ and him that the C declarator syntax is difficult and confusing.
+
+ But in fact, if you read the entire answer carefully, he came to the same
+ conclusion I do: that you have to go with familiar syntax over logical
+ syntax because familiarity is such a strong force:
+
+ "However, familiarity is a strong force. To compare, in English, we live
+ more or less happily with the absurd rules for "to be" (am, are, is, been,
+ was, were, ...) and all attempts to simplify are treated with contempt or
+ (preferably) humor. It be a curious world and it always beed."
+
+ > Basically, my argument for this type construction system is that it is
+ > VERY simple to use and understand (although it IS different than C, it is
+ > very simple and straightforward, which C is NOT). In fact, I would assert
+ > that most programmers TODAY do not understand pointers to member
+ > functions, and have to look up an example when they have to write them.
+
+ Again, I don't disagree with this at all. But to some extent this
+ particular problem is inherently difficult. Your syntax for the above
+ example may be easier for you to read because this is the way you have been
+ thinking about it. Honestly, I don't find it much easier than the C syntax.
+ In either case, I would have to look up an example to write pointers to
+ member functions.
+
+ But pointers to member functions are nowhere near as common as arrays. And
+ the old array syntax:
+ type [ int, int, ...]
+ is just much more familiar and clear to people than anything new you
+ introduce, no matter how logical it is. Introducing a new syntax that may
+ make function pointers easier but makes arrays much more difficult seems
+ very risky to me.
+
+ > In my opinion, it is critically important to have clear and concise type
+ > specifications, because types are going to be all over the programs.
+
+ I absolutely agree. But the question is, what is more clear and concise?
+ The syntax programmers are used to out of years of experience or a new
+ syntax that they have never seen that has a more logical structure. I think
+ the answer is the former. Sometimes, you have to give up a better idea
+ because you can't overcome sociological barriers to it. Qwerty keyboards
+ and Windows are two classic examples of bad technology that are difficult to
+ root out.
+
+ P.S. Also, while I agree that most of your syntax is more logical, there is
+ one part that isn't:
+
+ Arrays (without and with size):
+ type ::= '[' type ']' | '[' INT ',' type ']'.
+
+ The array-with-size form lists the dimensions and the type in a single list.
+ That is just too confusing:
+ [10, 40, int]
+ This seems to be a 3-D array where the third dimension is something strange.
+ It is too confusing to have a list of 3 things, some of which are dimensions
+ and one is a type. Either of the following would be better:
+
+ array [10, 40] of int
+ or
+ int [10, 40]
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp2.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,53 ----
+ Date: Thu, 8 Feb 2001 14:31:05 -0600 (CST)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: RE: Type notation debate...
+
+ > Arrays (without and with size):
+ > type ::= '[' type ']' | '[' INT ',' type ']'.
+ >
+ > The array-with-size form lists the dimensions and the type in a single list.
+ > That is just too confusing:
+
+ > [10, 40, int]
+ > This seems to be a 3-D array where the third dimension is something strange.
+ > It is too confusing to have a list of 3 things, some of which are dimensions
+ > and one is a type.
+
+ The above grammar indicates that there is only one integer parameter, ie
+ the upper bound. The lower bound is always implied to be zero, for
+ several reasons:
+
+ * As a low level VM, we want to expose addressing computations
+ explicitly. Since the lower bound must always be known in a high level
+ language statically, the language front end can do the translation
+ automatically.
+ * This fits more closely with what Java needs, ie what we need in the
+ short term. Java arrays are always zero based.
+
+ If a two element list is too confusing, I would recommend an alternate
+ syntax of:
+
+ type ::= '[' type ']' | '[' INT 'x' type ']'.
+
+ For example:
+ [12 x int]
+ [12x int]
+ [ 12 x [ 4x int ]]
+
+ Which is syntactically nicer, and more explicit.
+
+ > Either of the following would be better:
+ > array [10, 40] of int
+
+ I considered this approach for arrays in general (ie array of int/ array
+ of 12 int), but found that it made declarations WAY too long. Remember
+ that because of the nature of LLVM, you get a lot of types strewn all over
+ the program, and using the 'typedef' like facility is not a wonderful
+ option, because then types aren't explicit anymore.
+
+ I find this email interesting, because you contradict the previous email
+ you sent, where you recommend that we stick to C syntax....
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-06-TypeNotationDebateResp4.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,89 ----
+ > But in fact, if you read the entire answer carefully, he came to the same
+ > conclusion I do: that you have to go with familiar syntax over logical
+ > syntax because familiarity is such a strong force:
+ > "However, familiarity is a strong force. To compare, in English, we live
+ > more or less happily with the absurd rules for "to be" (am, are, is, been,
+ > was, were, ...) and all attempts to simplify are treated with contempt or
+ > (preferably) humor. It be a curious world and it always beed."
+
+ Although you have to remember that his situation was considerably
+ different than ours. He was in a position where he was designing a high
+ level language that had to be COMPATIBLE with C. Our language is such
+ that a new person would have to learn the new, different, syntax
+ anyways. Making them learn about the type system does not seem like much
+ of a stretch from learning the opcodes and how SSA form works, and how
+ everything ties together...
+
+ > > Basically, my argument for this type construction system is that it is
+ > > VERY simple to use and understand (although it IS different than C, it is
+ > > very simple and straightforward, which C is NOT). In fact, I would assert
+ > > that most programmers TODAY do not understand pointers to member
+ > > functions, and have to look up an example when they have to write them.
+
+ > Again, I don't disagree with this at all. But to some extent this
+ > particular problem is inherently difficult. Your syntax for the above
+ > example may be easier for you to read because this is the way you have been
+ > thinking about it. Honestly, I don't find it much easier than the C syntax.
+ > In either case, I would have to look up an example to write pointers to
+ > member functions.
+
+ I would argue that because the lexical structure of the language is self
+ consistent, any person who spent a significant amount of time programming
+ in LLVM directly would understand how to do it without looking it up in a
+ manual. The reason this does not work for C is because you rarely have to
+ declare these pointers, and the syntax is inconsistent with the method
+ declaration and calling syntax.
+
+ > But pointers to member functions are nowhere near as common as arrays.
+
+ Very true. If you're implementing an object oriented language, however,
+ remember that you have to do all the pointer to member function stuff
+ yourself.... so every time you invoke a virtual method one is involved
+ (instead of having C++ hide it for you behind "syntactic sugar").
+
+ > And the old array syntax:
+ > type [ int, int, ...]
+ > is just much more familiar and clear to people than anything new you
+ > introduce, no matter how logical it is.
+
+ Erm... excuse me but how is this the "old array syntax"? If you are
+ arguing for consistency with C, you should be asking for 'type int []',
+ which is significantly different than the above (besides, the above
+ introduces a new operator and duplicates information
+ needlessly). Basically what I am suggesting is exactly the above without
+ the fluff. So instead of:
+
+ type [ int, int, ...]
+
+ you use:
+
+ type [ int ]
+
+ > Introducing a new syntax that may
+ > make function pointers easier but makes arrays much more difficult seems
+ > very risky to me.
+
+ This is not about function pointers. This is about consistency in the
+ type system, and consistency with the rest of the language. The point
+ above does not make arrays any more difficult to use, and makes the
+ structure of types much more obvious than the "c way".
+
+ > > In my opinion, it is critically important to have clear and concise type
+ > > specifications, because types are going to be all over the programs.
+ >
+ > I absolutely agree. But the question is, what is more clear and concise?
+ > The syntax programmers are used to out of years of experience or a new
+ > syntax that they have never seen that has a more logical structure. I think
+ > the answer is the former. Sometimes, you have to give up a better idea
+ > because you can't overcome sociological barriers to it. Qwerty keyboards
+ > and Windows are two classic examples of bad technology that are difficult to
+ > root out.
+
+ Very true, but you seem to be advocating a completely different type
+ system than C has, in addition to it not offering the advantages of clear
+ structure that the system I recommended does... so you seem to not have a
+ problem with changing this, just with what I change it to. :)
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-09-AdveComments.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-09-AdveComments.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-09-AdveComments.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,120 ----
+ Ok, here are my comments and suggestions about the LLVM instruction set.
+ We should discuss some now, but can discuss many of them later, when we
+ revisit synchronization, type inference, and other issues.
+ (We have discussed some of the comments already.)
+
+
+ o We should consider eliminating the type annotation in cases where it is
+ essentially obvious from the instruction type, e.g., in br, it is obvious
+ that the first arg. should be a bool and the other args should be labels:
+
+ br bool <cond>, label <iftrue>, label <iffalse>
+
+ I think your point was that making all types explicit improves clarity
+ and readability. I agree to some extent, but it also comes at the cost
+ of verbosity. And when the types are obvious from people's experience
+ (e.g., in the br instruction), it doesn't seem to help as much.
+
+
+ o On reflection, I really like your idea of having the two different switch
+ types (even though they encode implementation techniques rather than
+ semantics). It should simplify building the CFG and my guess is it could
+ enable some significant optimizations, though we should think about which.
+
+
+ o In the lookup-indirect form of the switch, is there a reason not to make
+ the val-type uint? Most HLL switch statements (including Java and C++)
+ require that anyway. And it would also make the val-type uniform
+ in the two forms of the switch.
+
+ I did see the switch-on-bool examples and, while cute, we can just use
+ the branch instructions in that particular case.
+
+
+ o I agree with your comment that we don't need 'neg'.
+
+
+ o There's a trade-off with the cast instruction:
+ + it avoids having to define all the upcasts and downcasts that are
+ valid for the operands of each instruction (you probably have thought
+ of other benefits also)
+ - it could make the bytecode significantly larger because there could
+ be a lot of cast operations
+
+
+ o Making the second arg. to 'shl' a ubyte seems good enough to me.
+ 255 positions seems adequate for several generations of machines
+ and is more compact than uint.
+
+
+ o I still have some major concerns about including malloc and free in the
+ language (either as builtin functions or instructions). LLVM must be
+ able to represent code from many different languages. Languages such as
+ C, C++, Java, and Fortran 90 would not be able to use our malloc anyway
+ because each of them will want to provide a library implementation of it.
+
+ This gets even worse when code from different languages is linked
+ into a single executable (which is fairly common in large apps).
+ Having a single malloc would just not suffice, and instead would simply
+ complicate the picture further because it adds an extra variant in
+ addition to the one each language provides.
+
+ Instead, providing a default library version of malloc and free
+ (and perhaps a malloc_gc with garbage collection instead of free)
+ would make a good implementation available to anyone who wants it.
+
+ I don't recall all your arguments in favor so let's discuss this again,
+ and soon.
+
+
+ o 'alloca' on the other hand sounds like a good idea, and the
+ implementation seems fairly language-independent so it doesn't have the
+ problems with malloc listed above.
+
+
+ o About indirect call:
+ Your option #2 sounded good to me. I'm not sure I understand your
+ concern about an explicit 'icall' instruction?
+
+
+ o A pair of important synchronization instr'ns to think about:
+ load-linked
+ store-conditional
+
+
+ o Other classes of instructions that are valuable for pipeline performance:
+ conditional-move
+ predicated instructions
+
+
+ o I believe tail calls are relatively easy to identify; do you know why
+ .NET has a tailcall instruction?
+
+
+ o I agree that we need a static data space. Otherwise, emulating global
+ data gets unnecessarily complex.
+
+
+ o About explicit parallelism:
+
+ We once talked about adding a symbolic thread-id field to each
+ instruction. (It could be optional so single-threaded codes are
+ not penalized.) This could map well to multi-threaded architectures
+ while providing easy ILP for single-threaded ones. But it is probably
+ too radical an idea to include in a base version of LLVM. Instead, it
+ could be a great topic for a separate study.
+
+ What is the semantics of the IA64 stop bit?
+
+
+
+
+ o And finally, another thought about the syntax for arrays :-)
+
+ Although this syntax:
+ array <dimension-list> of <type>
+ is verbose, it will be used only in the human-readable assembly code so
+ size should not matter. I think we should consider it because I find it
+ to be the clearest syntax. It could even make arrays of function
+ pointers somewhat readable.
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-09-AdveCommentsResponse.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,245 ----
+ From: Chris Lattner <sabre at nondot.org>
+ To: "Vikram S. Adve" <vadve at cs.uiuc.edu>
+ Subject: Re: LLVM Feedback
+
+ I've included your feedback in the /home/vadve/lattner/llvm/docs directory
+ so that it will live in CVS eventually with the rest of LLVM. I've
+ significantly updated the documentation to reflect the changes you
+ suggested, as specified below:
+
+ > We should consider eliminating the type annotation in cases where it is
+ > essentially obvious from the instruction type:
+ > br bool <cond>, label <iftrue>, label <iffalse>
+ > I think your point was that making all types explicit improves clarity
+ > and readability. I agree to some extent, but it also comes at the
+ > cost of verbosity. And when the types are obvious from people's
+ > experience (e.g., in the br instruction), it doesn't seem to help as
+ > much.
+
+ Very true. We should discuss this more, but my reasoning is more of a
+ consistency argument. There are VERY few instructions that can have all
+ of the types eliminated, and doing so when available unnecessarily makes
+ the language more difficult to handle. Especially when you see 'int
+ %this' and 'bool %that' all over the place, I think it would be
+ disorienting to see:
+
+ br %predicate, %iftrue, %iffalse
+
+ for branches. Even just typing that once gives me the creeps. ;) Like I
+ said, we should probably discuss this further in person...
+
+ > On reflection, I really like your idea of having the two different
+ > switch types (even though they encode implementation techniques rather
+ > than semantics). It should simplify building the CFG and my guess is it
+ > could enable some significant optimizations, though we should think
+ > about which.
+
+ Great. I added a note to the switch section commenting on how the VM
+ should just use the instruction type as a hint, and that the
+ implementation may choose alternate representations (such as predicated
+ branches).
+
+ > In the lookup-indirect form of the switch, is there a reason not to
+ > make the val-type uint?
+
+ No. This was something I was debating for a while, and didn't really feel
+ strongly about either way. It is common to switch on other types in HLL's
+ (for example signed int's are particularly common), but in this case, all
+ that will be added is an additional 'cast' instruction. I removed that
+ from the spec.
+
+ > I agree with your comment that we don't need 'neg'
+
+ Removed.
+
+ > There's a trade-off with the cast instruction:
+ > + it avoids having to define all the upcasts and downcasts that are
+ > valid for the operands of each instruction (you probably have
+ > thought of other benefits also)
+ > - it could make the bytecode significantly larger because there could
+ > be a lot of cast operations
+
+ + You NEED casts to represent things like:
+ void foo(float);
+ ...
+ int x;
+ ...
+ foo(x);
+ in a language like C. Even in a Java like language, you need upcasts
+ and some way to implement dynamic downcasts.
+ + Not all forms of instructions take every type (for example you can't
+ shift by a floating point number of bits), thus SOME programs will need
+ implicit casts.
+
+ To be efficient and to avoid your '-' point above, we just have to be
+ careful to specify that the instructions shall operate on all common
+ types, therefore casting should be relatively uncommon. For example all
+ of the arithmetic operations work on almost all data types.
+
+ > Making the second arg. to 'shl' a ubyte seems good enough to me.
+ > 255 positions seems adequate for several generations of machines
+
+ Okay, that comment is removed.
+
+ > and is more compact than uint.
+
+ No, it isn't. Remember that the bytecode encoding saves value slots into
+ the bytecode instructions themselves, not constant values. This is
+ another case where we may introduce more cast instructions (but we will
+ also reduce the number of opcode variants that must be supported by a
+ virtual machine). Because most shifts are by constant values, I don't
+ think that we'll have to cast many shifts. :)
+
+ > I still have some major concerns about including malloc and free in the
+ > language (either as builtin functions or instructions).
+
+ Agreed. How about this proposal:
+
+ malloc/free are either built in functions or actual opcodes. They provide
+ all of the type safety that the document would indicate, blah blah
+ blah. :)
+
+ Now, because of all of the excellent points that you raised, an
+ implementation may want to override the default malloc/free behavior of
+ the program. To do this, they simply implement a "malloc" and
+ "free" function. The virtual machine will then be defined to use the user
+ defined malloc/free function (which return/take void*'s, not type'd
+ pointers like the builtin function would) if one is available, otherwise
+ fall back on a system malloc/free.
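+
+ (Illustrative sketch, not part of the original email: the proposed
+ resolution rule in C++; the names are hypothetical.)
+
+   #include <cstdlib>
+
+   using UserMallocFn = void *(*)(unsigned long); // user "malloc", if any
+
+   void *vmMalloc(UserMallocFn UserFn, unsigned long Size) {
+     if (UserFn)               // program defined its own "malloc"
+       return UserFn(Size);    // takes/returns plain void*
+     return std::malloc(Size); // otherwise fall back on the system one
+   }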
+
+ Does this sound like a good compromise? It would give us all of the
+ typesafety/elegance in the language while still allowing the user to do
+ all the cool stuff they want to...
+
+ > 'alloca' on the other hand sounds like a good idea, and the
+ > implementation seems fairly language-independent so it doesn't have the
+ > problems with malloc listed above.
+
+ Okay, once we get the above stuff figured out, I'll put it all in the
+ spec.
+
+ > About indirect call:
+ > Your option #2 sounded good to me. I'm not sure I understand your
+ > concern about an explicit 'icall' instruction?
+
+ I worry too much. :) The other alternative has been removed. 'icall' is
+ now up in the instruction list next to 'call'.
+
+ > I believe tail calls are relatively easy to identify; do you know why
+ > .NET has a tailcall instruction?
+
+ Although I am just guessing, I believe it probably has to do with the fact
+ that they want languages like Haskell and Lisp to be efficiently runnable
+ on their VM. Of course this means that the VM MUST implement tail calls
+ 'correctly', or else life will suck. :) I would put this into a future
+ feature bin, because it could be pretty handy...
+
+ > A pair of important synchronization instr'ns to think about:
+ > load-linked
+ > store-conditional
+
+ What is 'load-linked'? I think that (at least for now) I should add these
+ to the 'possible extensions' section, because they are not immediately
+ needed...
+
+ > Other classes of instructions that are valuable for pipeline
+ > performance:
+ > conditional-move
+ > predicated instructions
+
+ Conditional move is effectively a special case of a predicated
+ instruction... and I think that all predicated instructions can possibly
+ be implemented later in LLVM. It would significantly change things, and
+ it doesn't seem to be very necessary right now. It would seem to
+ complicate flow control analysis a LOT in the virtual machine. I would
+ tend to prefer that a predicated architecture like IA64 convert from a
+ "basic block" representation to a predicated rep as part of it's dynamic
+ complication phase. Also, if a basic block contains ONLY a move, then
+ that can be trivally translated into a conditional move...
+
+ > I agree that we need a static data space. Otherwise, emulating global
+ > data gets unnecessarily complex.
+
+ Definitely. Also a later item though. :)
+
+ > We once talked about adding a symbolic thread-id field to each
+ > ..
+ > Instead, it could be a great topic for a separate study.
+
+ Agreed. :)
+
+ > What is the semantics of the IA64 stop bit?
+
+ Basically, the IA64 writes instructions like this:
+ mov ...
+ add ...
+ sub ...
+ op xxx
+ op xxx
+ ;;
+ mov ...
+ add ...
+ sub ...
+ op xxx
+ op xxx
+ ;;
+
+ Where the ;; delimits a group of instructions with no dependencies between
+ them, which can all be executed concurrently (to the limits of the
+ available functional units). The ;; gets translated into a bit set in one
+ of the opcodes.
+
+ The advantage of this representation is that you don't have to do some
+ kind of 'thread id scheduling' pass by having to specify ahead of time how
+ many threads to use, and the representation doesn't have a per instruction
+ overhead...
+
+ > And finally, another thought about the syntax for arrays :-)
+ > Although this syntax:
+ > array <dimension-list> of <type>
+ > is verbose, it will be used only in the human-readable assembly code so
+ > size should not matter. I think we should consider it because I find it
+ > to be the clearest syntax. It could even make arrays of function
+ > pointers somewhat readable.
+
+ My only comment will be to give you an example of why this is a bad
+ idea. :)
+
+ Here is an example of using the switch statement (with my recommended
+ syntax):
+
+ switch uint %val, label %otherwise,
+ [%3 x {uint, label}] [ { uint %57, label %l1 },
+ { uint %20, label %l2 },
+ { uint %14, label %l3 } ]
+
+ Here it is with the syntax you are proposing:
+
+ switch uint %val, label %otherwise,
+ array %3 of {uint, label}
+ array of {uint, label}
+ { uint %57, label %l1 },
+ { uint %20, label %l2 },
+ { uint %14, label %l3 }
+
+ Which is ambiguous and very verbose. It would be possible to specify
+ constants with [] brackets as in my syntax, which would look like this:
+
+ switch uint %val, label %otherwise,
+ array %3 of {uint, label} [ { uint %57, label %l1 },
+ { uint %20, label %l2 },
+ { uint %14, label %l3 } ]
+
+ But then the syntax is inconsistent between type definition and constant
+ definition (why do []'s enclose the constants but not the types??).
+
+ Anyways, I'm sure that there is much debate still to be had over
+ this... :)
+
+ -Chris
+
+ http://www.nondot.org/~sabre/os/
+ http://www.nondot.org/MagicStats/
+ http://korbit.sourceforge.net/
+
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-13-Reference-Memory.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,39 ----
+ Date: Tue, 13 Feb 2001 13:29:52 -0600 (CST)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: LLVM Concerns...
+
+
+ I've updated the documentation to include load, store, and allocation
+ instructions (please take a look and let me know if I'm on the right
+ track):
+
+ file:/home/vadve/lattner/llvm/docs/LangRef.html#memoryops
+
+ I have a couple of concerns I would like to bring up:
+
+ 1. Reference types
+ Right now, I've spec'd out the language to have a pointer type, which
+ works fine for lots of stuff... except that Java really has
+ references: constrained pointers that cannot be manipulated: added and
+ subtracted, moved, etc... Do we want to have a type like this? It
+ could be very nice for analysis (pointer always points to the start of
+ an object, etc...) and more closely matches Java semantics. The
+ pointer type would be kept for C++ like semantics. Through analysis,
+ C++ pointers could be promoted to references in the LLVM
+ representation.
+
+ 2. Our "implicit" memory references in assembly language:
+ After thinking about it, this model has two problems:
+ A. If you do pointer analysis and realize that two stores are
+ independent and can share the same memory source object, there is
+ no way to represent this in either the bytecode or assembly.
+ B. When parsing assembly/bytecode, we effectively have to do a full
+ SSA generation/PHI node insertion pass to build the dependencies
+ when we don't want the "pinned" representation. This is not
+ cool.
+ I'm tempted to make memory references explicit in both the assembly and
+ bytecode to get around this... what do you think?
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-02-13-Reference-MemoryResponse.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,47 ----
+ Date: Tue, 13 Feb 2001 18:25:42 -0600
+ From: Vikram S. Adve <vadve at cs.uiuc.edu>
+ To: Chris Lattner <sabre at nondot.org>
+ Subject: RE: LLVM Concerns...
+
+ > 1. Reference types
+ > Right now, I've spec'd out the language to have a pointer type, which
+ > works fine for lots of stuff... except that Java really has
+ > references: constrained pointers that cannot be manipulated: added and
+ > subtracted, moved, etc... Do we want to have a type like this? It
+ > could be very nice for analysis (pointer always points to the start of
+ > an object, etc...) and more closely matches Java semantics. The
+ > pointer type would be kept for C++ like semantics. Through analysis,
+ > C++ pointers could be promoted to references in the LLVM
+ > representation.
+
+
+ You're right, having references would be useful. Even for C++ the *static*
+ compiler could generate references instead of pointers with fairly
+ straightforward analysis. Let's include a reference type for now. But I'm
+ also really concerned that LLVM is becoming big and complex and (perhaps)
+ too high-level. After we get some initial performance results, we may have
+ a clearer idea of what our goals should be and we should revisit this
+ question then.
+
+ > 2. Our "implicit" memory references in assembly language:
+ > After thinking about it, this model has two problems:
+ > A. If you do pointer analysis and realize that two stores are
+ > independent and can share the same memory source object,
+
+ not sure what you meant by "share the same memory source object"
+
+ > there is
+ > no way to represent this in either the bytecode or assembly.
+ > B. When parsing assembly/bytecode, we effectively have to do a full
+ > SSA generation/PHI node insertion pass to build the dependencies
+ > when we don't want the "pinned" representation. This is not
+ > cool.
+
+ I understand the concern. But again, let's focus on the performance first
+ and then look at the language design issues. E.g., it would be good to know
+ how big the bytecode files are before expanding them further. I am pretty
+ keen to explore the implications of LLVM for mobile devices. Both bytecode
+ size and power consumption are important to consider there.
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-04-16-DynamicCompilation.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,49 ----
+ By Chris:
+
+ LLVM has been designed with two primary goals in mind. First, we strive to
+ enable the best possible division of labor between static and dynamic
+ compilers, and second, we need a flexible and powerful interface
+ between these two complementary stages of compilation. We feel that
+ providing a solution to these two goals will yield an excellent solution
+ to the performance problem faced by modern architectures and programming
+ languages.
+
+ A key insight into current compiler and runtime systems is that a
+ compiler may fall anywhere in a "continuum of compilation" to do its
+ job. On one side, scripting languages statically compile nothing and
+ dynamically compile (or equivalently, interpret) everything. On the far
+ other side, traditional static compilers process everything statically and
+ nothing dynamically. These approaches have typically been seen as a
+ tradeoff between performance and portability. On a deeper level, however,
+ there are two reasons that optimal system performance may be obtained by a
+ system somewhere in between these two extremes: Dynamic application
+ behavior and social constraints.
+
+ From a technical perspective, pure static compilation cannot ever give
+ optimal performance in all cases, because applications have varying dynamic
+ behavior that the static compiler cannot take into consideration. Even
+ compilers that support profile-guided optimization generate poor code in
+ the real world, because such optimization tunes the application
+ to one particular usage pattern, whereas real programs (as opposed to
+ benchmarks) often have several different usage patterns.
+
+ On a social level, static compilation is a very shortsighted solution to
+ the performance problem. Instruction set architectures (ISAs) continuously
+ evolve, and each implementation of an ISA (a processor) must choose a set
+ of tradeoffs that make sense in the market context that it is designed for.
+ With every new processor introduced, the vendor faces two fundamental
+ problems: First, there is a lag time between when a processor is introduced
+ to when compilers generate quality code for the architecture. Secondly,
+ even when compilers catch up to the new architecture there is often a large
+ body of legacy code that was compiled for previous generations and will
+ not or cannot be upgraded. Thus a large percentage of code running on a
+ processor may be compiled quite sub-optimally for the current
+ characteristics of the dynamic execution environment.
+
+ For these reasons, LLVM has been designed from the beginning as a long-term
+ solution to these problems. Its design allows the large body of platform-
+ independent, static program optimizations currently in compilers to be
+ reused unchanged. It also provides important static
+ type information to enable powerful dynamic and link time optimizations
+ to be performed quickly and efficiently. This combination enables an
+ increase in effective system performance for real world environments.
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-05-18-ExceptionHandling.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,202 ----
+ Meeting notes: Implementation idea: Exception Handling in C++/Java
+
+ The 5/18/01 meeting discussed ideas for implementing exceptions in LLVM.
+ We decided that the best solution requires a set of library calls provided by
+ the VM, as well as an extension to the LLVM function invocation syntax.
+
+ The LLVM function invocation instruction previously looks like this (ignoring
+ types):
+
+ call func(arg1, arg2, arg3)
+
+ The extension discussed today adds an optional "with" clause that
+ associates a label with the call site. The new syntax looks like this:
+
+ call func(arg1, arg2, arg3) with funcCleanup
+
+ This funcCleanup label always stays tightly associated with the call site (being
+ encoded directly into the call opcode itself), and should be used whenever
+ there is cleanup work that needs to be done for the current function if
+ an exception is thrown by func (or if we are in a try block).
+
+ To support this, the VM/Runtime provide the following simple library
+ functions (all syntax in this document is very abstract):
+
+ typedef struct { something } %frame;
+ The VM must export a "frame type": an opaque structure used to
+ implement different types of stack walking that may be used by various
+ language runtime libraries. We imagine that it would be typical to
+ represent a frame with a PC and frame pointer pair, although that is not
+ required.
+
+ %frame getStackCurrentFrame();
+ Get a frame object for the current function. Note that if the current
+ function was inlined into its caller, the "current" frame will belong to
+ the "caller".
+
+ bool isFirstFrame(%frame f);
+ Returns true if the specified frame is the top level (first activated) frame
+ for this thread. For the main thread, this corresponds to the main()
+ function; for a spawned thread, it corresponds to the thread function.
+
+ %frame getNextFrame(%frame f);
+ Return the previous frame on the stack. This function is undefined if f
+ satisfies the predicate isFirstFrame(f).
+
+ Label *getFrameLabel(%frame f);
+ If a label was associated with f (as discussed below), this function returns
+ it. Otherwise, it returns a null pointer.
+
+ doNonLocalBranch(Label *L);
+ At this point, it is not clear whether this should be a function or
+ intrinsic. It should probably be an intrinsic in LLVM, but we'll deal with
+ this issue later.
+
+
+ Here is a motivating example that illustrates how these facilities could be
+ used to implement the C++ exception model:
+
+ void TestFunction(...) {
+ A a; B b;
+ foo(); // Any function call may throw
+ bar();
+ C c;
+
+ try {
+ D d;
+ baz();
+ } catch (int) {
+ ...int Stuff...
+ // execution continues after the try block: the exception is consumed
+ } catch (double) {
+ ...double stuff...
+ throw; // Exception is propagated
+ }
+ }
+
+ This function would compile to approximately the following code (heavy
+ pseudo code follows):
+
+ Func:
+ %a = alloca A
+ A::A(%a) // These ctors & dtors could throw, but we ignore this
+ %b = alloca B // minor detail for this example
+ B::B(%b)
+
+ call foo() with fooCleanup // An exception in foo is propagated to fooCleanup
+ call bar() with barCleanup // An exception in bar is propagated to barCleanup
+
+ %c = alloca C
+ C::C(%c)
+ %d = alloca D
+ D::D(%d)
+ call baz() with bazCleanup // An exception in baz is propagated to bazCleanup
+ d->~D();
+ EndTry: // This label corresponds to the end of the try block
+ c->~C() // These could also throw, these are also ignored
+ b->~B()
+ a->~A()
+ return
+
+ Note that this is a very straightforward and literal translation: exactly
+ what we want for zero cost (when unused) exception handling. Especially on
+ platforms with many registers (e.g., IA64), setjmp/longjmp style exception
+ handling is *very* impractical. Also, the "with" clauses describe the
+ control flow paths explicitly so that analysis is not adversely affected.
+
+ The foo/barCleanup labels are implemented as:
+
+ TryCleanup: // Executed if an exception escapes the try block
+ c->~C()
+ barCleanup: // Executed if an exception escapes from bar()
+ // fall through
+ fooCleanup: // Executed if an exception escapes from foo()
+ b->~B()
+ a->~A()
+ Exception *E = getThreadLocalException()
+ call throw(E) // Implemented by the C++ runtime, described below
+
+ Which does the work one would expect. getThreadLocalException is a function
+ implemented by the C++ support library. It returns the current exception
+ object for the current thread. Note that we do not attempt to recycle the
+ shutdown code from before, because performance of the mainline code is
+ critically important. Also, obviously fooCleanup and barCleanup may be
+ merged and one of them eliminated. This just shows how the code generator
+ would most likely emit code.
+
+ The bazCleanup label is more interesting. Because the exception may be caught
+ by the try block, we must dispatch to its handler... but it does not exist
+ on the call stack (it does not have a VM Call->Label mapping installed), so
+ we must dispatch statically with a goto. The bazCleanup code thus appears as:
+
+ bazCleanup:
+ d->~D(); // destruct D as it goes out of scope when entering catch clauses
+ goto TryHandler
+
+ In general, TryHandler is not the same as bazCleanup, because multiple
+ function calls could be made from the try block. In this case, trivial
+ optimization could merge the two basic blocks. TryHandler is the code
+ that actually determines the type of exception, based on the Exception object
+ itself. For this discussion, assume that the exception object contains *at
+ least*:
+
+ 1. A pointer to the RTTI info for the contained object
+ 2. A pointer to the dtor for the contained object
+ 3. The contained object itself
+
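+ For concreteness, here is a minimal C++ sketch of such an exception
+ object, using the field names that appear in the pseudo code below
+ (illustrative only; not a real runtime layout):
+
+   struct Exception {
+     const void *RTTIType;    // #1: RTTI info for the contained object
+     void (*dtor)(void *);    // #2: dtor for the contained object, if any
+     void *object;            // #3: the contained object itself
+   };
+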
+ Note that it is necessary to maintain #1 & #2 in the exception object itself
+ because objects without virtual function tables may be thrown (as in this
+ example). Assuming this, TryHandler would look something like this:
+
+ TryHandler:
+ Exception *E = getThreadLocalException();
+ switch (E->RTTIType) {
+ case IntRTTIInfo:
+ ...int Stuff... // The action to perform from the catch block
+ break;
+ case DoubleRTTIInfo:
+ ...double Stuff... // The action to perform from the catch block
+ goto TryCleanup // This catch block rethrows the exception
+ break; // Redundant, eliminated by the optimizer
+ default:
+ goto TryCleanup // Exception not caught, rethrow
+ }
+
+ // Exception was consumed
+ if (E->dtor)
+ E->dtor(E->object) // Invoke the dtor on the object if it exists
+ goto EndTry // Continue mainline code...
+
+ And that is all there is to it.
+
+ The throw(E) function would then be implemented like this (which may be
+ inlined into the caller through standard optimization):
+
+ function throw(Exception *E) {
+ // Get the start of the stack trace...
+ %frame %f = call getStackCurrentFrame()
+
+ // Get the label information that corresponds to it
+ label * %L = call getFrameLabel(%f)
+ while (%L == 0 && !isFirstFrame(%f)) {
+ // Loop until a cleanup handler is found
+ %f = call getNextFrame(%f)
+ %L = call getFrameLabel(%f)
+ }
+
+ if (%L != 0) {
+ call setThreadLocalException(E) // Allow handlers access to this...
+ call doNonLocalBranch(%L)
+ }
+ // No handler found!
+ call BlowUp() // Ends up calling the terminate() method in use
+ }
+
+ That's a brief rundown of how C++ exception handling could be implemented in
+ LLVM. Java would be very similar, except it only uses destructors to unlock
+ synchronized blocks, not to destroy data. Also, it uses two stack walks: a
+ nondestructive walk that builds a stack trace, then a destructive walk that
+ unwinds the stack as shown here.
+
+ It would be trivial to get exception interoperability between C++ and Java.
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-05-19-ExceptionResponse.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,45 ----
+ Date: Sat, 19 May 2001 19:09:13 -0500 (CDT)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: RE: Meeting writeup
+
+ > I read it through and it looks great!
+
+ Thanks!
+
+ > The finally clause in Java may need more thought. The code for this clause
+ > is like a subroutine because it needs to be entered from many points (end of
+ > try block and beginning of each catch block), and then needs to *return to
+ > the place from where the code was entered*. That's why JVM has the
+ > jsr/jsr_w instruction.
+
+ Hrm... I guess that is an implementation decision. It can either be
+ modelled as a subroutine (as Java bytecodes do), which is really
+ gross... or it can be modelled as code duplication (emitted once inline,
+ then once in the exception path). Because this could, at worst,
+ slightly less than double the amount of code in a function (it is
+ bounded), I don't think this is a big deal. One of the really nice things
+ about the LLVM representation is that it still allows for runtime code
+ generation for exception paths (exception paths are not compiled until
+ needed). Obviously a static compiler couldn't do this though. :)
+
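+ To make the duplication strategy concrete, here is a C++ sketch of the
+ idea (the function names are made up; a real implementation would
+ duplicate IR, not source):
+
+   #include <cstdio>
+
+   void work() { /* may throw in a real program */ }
+   void cleanup() { std::puts("finally body"); }
+
+   // "try { work(); } finally { cleanup(); }" with the finally body
+   // duplicated onto both paths instead of made a jsr-style subroutine:
+   void lowered() {
+     try {
+       work();
+       cleanup();   // copy #1: normal exit from the try block
+     } catch (...) {
+       cleanup();   // copy #2: exception path, can be compiled lazily
+       throw;       // propagate, exactly as the original finally would
+     }
+   }
+
+   int main() { lowered(); }
+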
+ In this case, only one copy of the code would be compiled... until the
+ other one is needed on demand. Also this strategy fits with the "zero
+ cost" exception model... the standard case is not burdened with extra
+ branches or "call"s.
+
+ > I suppose you could save the return address in a particular register
+ > (specific to this finally block), jump to the finally block, and then at the
+ > end of the finally block, jump back indirectly through this register. It
+ > will complicate building the CFG but I suppose that can be handled. It is
+ > also unsafe in terms of checking where control returns (which is I suppose
+ > why the JVM doesn't use this).
+
+ I think that a code duplication method would be cleaner, and would avoid
+ the caveats that you mention. Also, it does not slow down the normal case
+ with an indirect branch...
+
+ Like everything, we can probably defer a final decision until later. :)
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-01-GCCOptimizations.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,63 ----
+ Date: Fri, 1 Jun 2001 16:38:17 -0500 (CDT)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: Interesting: GCC passes
+
+
+ Take a look at this document (which describes the order of optimizations
+ that GCC performs):
+
+ http://gcc.gnu.org/onlinedocs/gcc_17.html
+
+ The rundown is that after RTL generation, the following happens:
+
+ 1 . [t] jump optimization (jumps to jumps, etc)
+ 2 . [t] Delete unreachable code
+ 3 . Compute live ranges for CSE
+ 4 . [t] Jump threading (jumps to jumps with identical or inverse conditions)
+ 5 . [t] CSE
+ 6 . *** Conversion to SSA
+ 7 . [t] SSA Based DCE
+ 8 . *** Conversion to LLVM
+ 9 . UnSSA
+ 10. GCSE
+ 11. LICM
+ 12. Strength Reduction
+ 13. Loop unrolling
+ 14. [t] CSE
+ 15. [t] DCE
+ 16. Instruction combination, register movement, scheduling... etc.
+
+ I've marked optimizations with a [t] to indicate things that I believe to
+ be relatively trivial to implement in LLVM itself. The time-consuming
+ things to reimplement would be SSA-based PRE, strength reduction & loop
+ unrolling... these would be the major things we would miss out on if we
+ did LLVM creation from tree code [inlining and other high level
+ optimizations are done on the tree representation].
+
+ Given the lack of "strong" optimizations that would take a long time to
+ reimplement, I am leaning a bit more towards creating LLVM from the tree
+ code. Especially given that SGI has GPL'd their compiler, including many
+ SSA based optimizations that could be adapted (besides the fact that their
+ code looks MUCH nicer than GCC :)
+
+ Even if we choose to do LLVM code emission from RTL, we will almost
+ certainly want to move LLVM emission from step 8 down until at least CSE
+ has been rerun... which causes me to wonder if the SSA generation code
+ will still work (due to global variable dependencies and stuff). I assume
+ that it can be made to work, but might be a little more involved than we
+ would like.
+
+ I'm continuing to look at the Tree -> RTL code. It is pretty gross
+ because they do some of the translation a statement at a time, and some
+ of it a function at a time... I'm not quite clear why and how the
+ distinction is drawn, but it does not appear that there is a wonderful
+ place to attach extra info.
+
+ Anyways, I'm proceeding with the RTL -> LLVM conversion phase for now. We
+ can talk about this more on Monday.
+
+ Wouldn't it be nice if there were an obvious decision to be made? :)
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-01-GCCOptimizations2.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,71 ----
+ Date: Fri, 1 Jun 2001 17:08:44 -0500 (CDT)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: RE: Interesting: GCC passes
+
+ > That is very interesting. I agree that some of these could be done on LLVM
+ > at link-time, but it is the extra time required that concerns me. Link-time
+ > optimization is severely time-constrained.
+
+ If we were to reimplement any of these optimizations, I assume that we
+ could do them a translation unit at a time, just as GCC does now. This
+ would lead to a pipeline like this:
+
+ Static optimizations, xlation unit at a time:
+ .c --GCC--> .llvm --llvmopt--> .llvm
+
+ Link time optimizations:
+ .llvm --llvm-ld--> .llvm --llvm-link-opt--> .llvm
+
+ Of course, many optimizations could be shared between llvmopt and
+ llvm-link-opt, but they wouldn't need to be shared... Thus compile time
+ could be faster, because we are using a "smarter" IR (SSA based).
+
+ > BTW, about SGI, "borrowing" SSA-based optimizations from one compiler and
+ > putting it into another is not necessarily easier than re-doing it.
+ > Optimization code is usually heavily tied in to the specific IR they use.
+
+ Understood. The only reason that I brought this up is because SGI's IR is
+ more similar to LLVM than it is different in many respects (SSA based,
+ relatively low level, etc), and could be easily adapted. Also their
+ optimizations are written in C++ and are actually somewhat
+ structured... of course it would be no walk in the park, but it would be
+ much less time consuming to adapt, say, SSA-PRE than to rewrite it.
+
+ > But your larger point is valid that adding SSA based optimizations is
+ > feasible and should be fun. (Again, link time cost is the issue.)
+
+ Assuming linktime cost wasn't an issue, the question is:
+ Does using GCC's backend buy us anything?
+
+ > It also occurs to me that GCC is probably doing quite a bit of back-end
+ > optimization (step 16 in your list). Do you have a breakdown of that?
+
+ Not really. The irritating part of GCC is that it mixes it all up and
+ doesn't have a clean separation of concerns. A lot of the "back end
+ optimization" happens right along with other data optimizations (i.e., CSE
+ of machine-specific things).
+
+ As far as REAL back end optimizations go, it looks something like this:
+
+ 1. Instruction combination: try to make CISCy instructions, if available
+ 2. Register movement: try to get registers in the right places for the
+ architecture to avoid register to register moves. For example, try to get
+ the first argument of a function to naturally land in %o0 for SPARC.
+ 3. Instruction scheduling: 'nuff said :)
+ 4. Register class preferencing: ??
+ 5. Local register allocation
+ 6. Global register allocation
+ 7. Spilling
+ 8. Local regalloc
+ 9. Jump optimization
+ 10. Delay slot scheduling
+ 11. Branch shortening for CISC machines
+ 12. Instruction selection & peephole optimization
+ 13. Debug info output
+
+ But none of this would be usable for LLVM anyways, unless we were using
+ GCC as a static compiler.
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-06-20-.NET-Differences.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,30 ----
+ Date: Wed, 20 Jun 2001 12:32:22 -0500
+ From: Vikram Adve <vadve at cs.uiuc.edu>
+ To: Chris Lattner <lattner at cs.uiuc.edu>
+ Subject: .NET vs. our VM
+
+ One significant difference between .NET CLR and our VM is that the CLR
+ includes full information about classes and inheritance. In fact, I just
+ sat through the paper on adding templates to .NET CLR, and the speaker
+ indicated that the goal seems to be to do simple static compilation (very
+ little lowering or optimization). Also, the templates implementation in CLR
+ "relies on dynamic class loading and JIT compilation".
+
+ This is an important difference because I think there are some significant
+ advantages to having a much lower-level VM layer and doing significant static
+ analysis and optimization.
+
+ I also talked to the lead guy for KAI's C++ compiler (Arch Robison) and he
+ said that SGI and other commercial compilers have included options to export
+ their *IR* next to the object code (i.e., .il files) and use them for
+ link-time code generation. In fact, he said that the .o file was nearly
+ empty and was entirely generated from the .il at link-time. But he agreed
+ that this limited the link-time interprocedural optimization to modules
+ compiled by the same compiler, whereas our approach allows us to link and
+ optimize modules from multiple different compilers. (Also, of course, they
+ don't do anything for runtime optimization).
+
+ All issues to bring up in Related Work.
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-06-LoweringIRForCodeGen.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,31 ----
+ Date: Fri, 6 Jul 2001 16:56:56 -0500
+ From: Vikram S. Adve <vadve at cs.uiuc.edu>
+ To: Chris Lattner <lattner at cs.uiuc.edu>
+ Subject: lowering the IR
+
+ BTW, I do think that we should consider lowering the IR as you said. I
+ didn't get time to raise it today, but it comes up with the SPARC
+ move-conditional instruction. I don't think we want to put that in the core
+ VM -- it is a little too specialized. But without a corresponding
+ conditional move instruction in the VM, it is pretty difficult to maintain a
+ close mapping between VM and machine code. Other architectures may have
+ other such instructions.
+
+ What I was going to suggest was that for a particular processor, we define
+ additional VM instructions that match some of the unusual opcodes on the
+ processor but have VM semantics otherwise, i.e., all operands are in SSA
+ form and typed. This means that we can re-generate core VM code from the
+ more specialized code any time we want (so that portability is not lost).
+
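+ As a tiny C++ sketch of the semantics (illustrative only), a SPARC-style
+ conditional move viewed as a pure operation on typed values:
+
+   // Because the operation is just a function of its operands (SSA
+   // semantics), it can always be re-expanded into a compare, a branch,
+   // and a merge when core VM code needs to be re-generated.
+   int movcc(bool cond, int ifTrue, int ifFalse) {
+     return cond ? ifTrue : ifFalse;
+   }
+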
+ Typically, a static compiler like gcc would generate just the core VM, which
+ is relatively portable. Anyone (an offline tool, the linker, etc., or even
+ the static compiler itself if it chooses) can transform that into more
+ specialized target-specific VM code for a particular architecture. If the
+ linker does it, it can do it after all machine-independent optimizations.
+ This would be the most convenient, but not necessary.
+
+ The main benefit of lowering will be that we will be able to retain a close
+ mapping between VM and machine code.
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-08-InstructionSelection.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-08-InstructionSelection.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-08-InstructionSelection.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,51 ----
+ Date: Sun, 8 Jul 2001 09:37:22 -0500
+ From: Vikram S. Adve <vadve at cs.uiuc.edu>
+ To: Ruchira Sasanka <sasanka at students.uiuc.edu>
+ Cc: Chris Lattner <lattner at cs.uiuc.edu>
+ Subject: machine instruction operands
+
+ Ruchira,
+
+ When generating machine instructions, I have to make several choices about
+ operands. Where a register is required, there are 3 cases:
+
+ 1. The register is for a Value* that is already in the VM code.
+
+ 2. The register is for a value that is not in the VM code, usually because 2
+ machine instructions get generated for a single VM instruction (and the
+ register holds the result of the first m/c instruction and is used by the
+ second m/c instruction).
+
+ 3. The register is a pre-determined machine register.
+
+ E.g., for this VM instruction:
+ ptr = alloca type, numElements
+ I have to generate 2 machine instructions:
+ reg = mul constant, numElements
+ ptr = add %sp, reg
+
+ Each machine instruction is of class MachineInstr.
+ It has a vector of operands. All register operands have type MO_REGISTER.
+ The 3 types of register operands are marked using this enum:
+
+ enum VirtualRegisterType {
+ MO_VMVirtualReg, // virtual register for *value
+ MO_MInstrVirtualReg, // virtual register for result of *minstr
+ MO_MachineReg // pre-assigned machine register `regNum'
+ } vregType;
+
+ Here's how this affects register allocation:
+
+ 1. MO_VMVirtualReg is the standard case: you just do the register
+ allocation.
+
+ 2. MO_MInstrVirtualReg is the case where there is a hidden register being
+ used. You should decide how you want to handle it, e.g., do you want to
+ create a Value object during the preprocessing phase to make the value
+ explicit (like the address register for the RETURN instruction).
+
+ 3. For case MO_MachineReg, you don't need to do anything, at least for
+ SPARC. The only machine regs I am using so far are %g0 and %sp.
+
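+ As a rough C++ sketch of how these pieces might fit together (a guess
+ for illustration, not the real MachineInstr classes):
+
+   struct Value;         // a VM value
+   struct MachineInstr;  // a machine instruction, as above
+
+   // VirtualRegisterType repeated from earlier in this mail:
+   enum VirtualRegisterType { MO_VMVirtualReg, MO_MInstrVirtualReg,
+                              MO_MachineReg };
+
+   struct RegisterOperand {
+     VirtualRegisterType vregType;
+     union {
+       Value        *value;   // MO_VMVirtualReg: the VM value
+       MachineInstr *minstr;  // MO_MInstrVirtualReg: defining m/c instr
+       int           regNum;  // MO_MachineReg: pre-assigned register
+     };
+   };
+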
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-08-InstructionSelection2.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-08-InstructionSelection2.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-07-08-InstructionSelection2.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,25 ----
+ Date: Sun, 8 Jul 2001 10:02:20 -0500
+ From: Vikram S. Adve <vadve at cs.uiuc.edu>
+ To: vadve at cs.uiuc.edu, Ruchira Sasanka <sasanka at students.uiuc.edu>
+ Cc: Chris Lattner <lattner at cs.uiuc.edu>
+ Subject: RE: machine instruction operands
+
+ I got interrupted and forgot to explain the example. In that case:
+
+ reg will be the 3rd operand of MUL and it will be of type
+ MO_MInstrVirtualReg. The field MachineInstr* minstr will point to the
+ instruction that computes reg.
+
+ numElements will be an immediate constant, not a register.
+
+ %sp will be operand 1 of ADD and it will be of type MO_MachineReg. The
+ field regNum identifies the register.
+
+ reg will be operand 2 of ADD and it will also be of type
+ MO_MInstrVirtualReg. The field MachineInstr* minstr identifies the MUL.
+
+ ptr will be operand 3 of ADD and will also be %sp, i.e., of
+ type MO_MachineReg. regNum identifies the register.
+
+ --Vikram
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2001-09-18-OptimizeExceptions.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,56 ----
+ Date: Tue, 18 Sep 2001 00:38:37 -0500 (CDT)
+ From: Chris Lattner <sabre at nondot.org>
+ To: Vikram S. Adve <vadve at cs.uiuc.edu>
+ Subject: Idea for a simple, useful link time optimization
+
+
+ In C++ programs, exceptions suck, and here's why:
+
+ 1. In virtually all function calls, you must assume that the function
+ throws an exception, unless it is defined as 'nothrow'. This means
+ that every function call has to have code to invoke dtors on objects
+ locally if one is thrown by the function. Most functions don't throw
+ exceptions, so this code is dead [with all the bad effects of dead
+ code, including icache pollution].
+ 2. Declaring a function nothrow causes catch blocks to be added to every
+ call that is not provably nothrow. This makes them very slow.
+ 3. Extraneous exception edges reduce the opportunity for code
+ motion.
+ 4. EH is typically implemented with large lookup tables. Ours is going to
+ be much smaller (than the "standard" way of doing it) to start with,
+ but eliminating it entirely would be nice. :)
+ 5. It is physically impossible to correctly put (accurate, correct)
+ exception specifications on generic, templated code. But it is trivial
+ to analyze instantiations of said code.
+ 6. Most large C++ programs throw few exceptions. Most well designed
+ programs only throw exceptions in specific planned portions of the
+ code.
+
+ Given our _planned_ model of handling exceptions, all of this would be
+ pretty trivial to eliminate through some simplistic interprocedural
+ analysis. The DCE factor alone could probably be pretty significant. The
+ extra code motion opportunities could also be exploited though...
+
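+ To make "simplistic interprocedural analysis" concrete, here is one
+ possible shape for it in C++ (a sketch over assumed data structures,
+ not a real pass): a function is nothrow if its body contains no throw
+ and every callee is already proven nothrow, iterated to a fixpoint so
+ that recursion conservatively stays "may throw".
+
+   #include <vector>
+
+   struct Func {
+     bool hasExplicitThrow = false;  // does the body itself throw?
+     bool knownNothrow = false;      // proven so far
+     std::vector<Func *> callees;    // assumed call graph edges
+   };
+
+   void inferNothrow(std::vector<Func *> &funcs) {
+     for (bool changed = true; changed;) {
+       changed = false;
+       for (Func *F : funcs) {
+         if (F->knownNothrow || F->hasExplicitThrow)
+           continue;
+         bool calleesOk = true;
+         for (Func *C : F->callees)
+           if (!C->knownNothrow) { calleesOk = false; break; }
+         if (calleesOk) {  // promote and re-scan
+           F->knownNothrow = true;
+           changed = true;
+         }
+       }
+     }
+   }
+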
+ Additionally, this optimization can be implemented in a straightforward,
+ conservative manner, allowing libraries, or even individual files, to be
+ optimized (if there are leaf functions visible in the translation unit
+ that are called).
+
+ I think it's a reasonable optimization that hasn't really been addressed
+ (because assembly is way too low level for this), and could have decent
+ payoffs... without being an overly complex optimization.
+
+ After I wrote all of that, I found this page that is talking about
+ basically the same thing I just wrote, except that it is a translation-
+ unit-at-a-time, tree-based approach:
+ http://www.ocston.org/~jls/ehopt.html
+
+ but it is very useful from an "expected gain" and references perspective. Note
+ that their compiler is apparently unable to inline functions that use
+ exceptions, so their numbers are pretty worthless... also our results
+ would (hopefully) be better because it's interprocedural...
+
+ What do you think?
+
+ -Chris
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2002-05-12-InstListChange.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2002-05-12-InstListChange.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2002-05-12-InstListChange.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,55 ----
+ Date: Sun, 12 May 2002 17:12:53 -0500 (CDT)
+ From: Chris Lattner <sabre at nondot.org>
+ To: "Vikram S. Adve" <vadve at cs.uiuc.edu>
+ Subject: LLVM change
+
+ There is a fairly fundamental change that I would like to make to the LLVM
+ infrastructure, but I'd like to know if you see any drawbacks that I
+ don't...
+
+ Basically right now at the basic block level, each basic block contains an
+ instruction list (returned by getInstList()) that is a ValueHolder of
+ instructions. To iterate over instructions, we must actually iterate over
+ the instlist, and access the instructions through the instlist.
+
+ To add or remove an instruction from a basic block, we need to get an
+ iterator to an instruction, which, given just an Instruction*, requires a
+ linear search of the basic block the instruction is contained in... just
+ to insert an instruction before another instruction, or to delete an
+ instruction! This complicates algorithms that should be very simple (like
+ simple constant propagation), because they aren't actually sparse anymore;
+ they have to traverse basic blocks to remove constant-propagated
+ instructions.
+
+ Additionally, adding or removing instructions to a basic block
+ _invalidates all iterators_ pointing into that block, which is really
+ irritating.
+
+ To fix these problems (and others), I would like to make the ordering of
+ the instructions be represented with a doubly linked list in the
+ instructions themselves, instead of an external data structure. This is
+ how many other representations do it, and frankly I can't remember why I
+ originally implemented it the way I did.
+
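+ The gist of the intrusive scheme, sketched in C++ (a hypothetical
+ minimal version, not the actual class):
+
+   // The ordering lives in the instructions themselves, so an
+   // Instruction* alone is enough to unlink in O(1), with no linear
+   // search of the enclosing basic block.
+   struct Instruction {
+     Instruction *Prev = nullptr;
+     Instruction *Next = nullptr;
+   };
+
+   void removeFromBlock(Instruction *I) {
+     if (I->Prev) I->Prev->Next = I->Next;
+     if (I->Next) I->Next->Prev = I->Prev;
+     I->Prev = I->Next = nullptr;  // iterators to other nodes stay valid
+   }
+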
+ Long term, all of the code that depends on the nasty features in the
+ instruction list (which can be found by grep'ing for getInstList()) will
+ be changed to do nice local transformations. In the short term, I'll
+ change the representation, but preserve the interface (including
+ getInstList()) so that all of the code doesn't have to change.
+
+ Iteration over the instructions in a basic block remains simple:
+ for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) ...
+
+ But we will also support:
+ for (Instruction *I = BB->front(); I; I = I->getNext()) ...
+
+ After converting instructions over, I'll convert basic blocks and
+ functions to have a similar interface.
+
+ The only negative aspect of this change that I see is that it increases
+ the amount of memory consumed by one pointer per instruction. Given the
+ benefits, I think this is a very reasonable tradeoff.
+
+ What do you think?
+
+ -Chris
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2002-06-25-MegaPatchInfo.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,72 ----
+ Changes:
+ * Change the casting code to be const correct. Now, doing this is invalid:
+ const Value *V = ...;
+ Instruction *I = dyn_cast<Instruction>(V);
+ instead, the second line should be:
+ const Instruction *I = dyn_cast<Instruction>(V);
+
+ * Change the casting code to allow casting a reference value thus:
+ const Value &V = ...;
+ Instruction &I = cast<Instruction>(V);
+
+ dyn_cast does not work with references, because it must return a null pointer
+ on failure.
+
+ * Fundamentally change how instructions and other values are represented.
+ Before, every LLVM container was an instance of the ValueHolder template,
+ instantiated for each container type. This ValueHolder was effectively a
+ wrapper around a vector of pointers to the sub-objects.
+
+ Now, instead of having a vector of pointers to objects, the objects are
+ maintained in a doubly linked list of values (i.e., each Instruction now has
+ Next & Previous fields). The containers are now instances of ilist (intrusive
+ linked list class), which use the next and previous fields to chain them
+ together. The advantage of this implementation is that iterators can be
+ formed directly from pointers to the LLVM value, and invalidation is much
+ easier to handle.
+
+ * As part of the above change, dereferencing an iterator (for example:
+ BasicBlock::iterator) now produces a reference to the underlying type (same
+ example: Instruction&) instead of a pointer to the underlying object. This
+ makes it much easier to write nested loops that iterate over things, changing
+ this:
+
+ for (Function::iterator BI = Func->begin(); BI != Func->end(); ++BI)
+ for (BasicBlock::iterator II = (*BI)->begin(); II != (*BI)->end(); ++II)
+ (*II)->dump();
+
+ into:
+
+ for (Function::iterator BI = Func->begin(); BI != Func->end(); ++BI)
+ for (BasicBlock::iterator II = BI->begin(); II != BI->end(); ++II)
+ II->dump();
+
+ which is much more natural and what users expect.
+
+ * Simplification of #include's: Before, it was necessary for a .cpp file to
+ include every .h file that it used. Now things are batched a little bit more
+ to make it easier to use. Specifically, the include graph now includes these
+ edges:
+ Module.h -> Function.h, GlobalVariable.h
+ Function.h -> BasicBlock.h, Argument.h
+ BasicBlock.h -> Instruction.h
+
+ Which means that #including Function.h is usually sufficient for getting the
+ lower level #includes.
+
+ * Printing out a Value* has now changed: Printing a Value* will soon print out
+ the address of the value instead of the contents of the Value. To print out
+ the contents, you must convert it to a reference with (for example)
+ 'cout << *I' instead of 'cout << I;'. This conversion is not yet complete,
+ but will be eventually. In the meantime, both forms print out the contents.
+
+ * References are used much more throughout the code base. In general, if a
+ pointer is known to never be null, it is passed in as a reference instead of a
+ pointer. For example, the instruction visitor class uses references instead
+ of pointers, and Pass subclasses now all receive references to Values
+ instead of pointers, because they may never be null.
+
+ * The Function class now has helper functions for accessing the Arguments list.
+ Instead of having to go through getArgumentList for simple things like
+ iterating over the arguments, the a*() methods can now be used to access them.
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2003-01-23-CygwinNotes.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,28 ----
+ Date: Mon, 20 Jan 2003 00:00:28 -0600
+ From: Brian R. Gaeke <gaeke at uiuc.edu>
+ Subject: windows vs. llvm
+
+ If you're interested, here are some of the major problems compiling LLVM
+ under Cygwin and/or Mingw.
+
+ 1. Cygwin doesn't have <inttypes.h> or <stdint.h>, so all the INT*_MAX
+ symbols and standard int*_t types are off in limbo somewhere. Mingw has
+ <stdint.h>, but Cygwin doesn't like it.
+
+ 2. Mingw doesn't have <dlfcn.h> (because Windows doesn't have it.)
+
+ 3. SA_SIGINFO and friends are not around; only signal() seems to work.
+
+ 4. Relink, aka ld -r, doesn't work (probably an ld bug); you need
+ DONT_BUILD_RELINKED. This breaks all the tools' makefiles; you just need to
+ change them to have .a's.
+
+ 5. There isn't a <values.h>.
+
+ 6. There isn't a mallinfo() (or, at least, it's documented, but it doesn't seem
+ to link).
+
+ 7. The version of Bison that Cygwin (and newer Linux versions) comes with
+ does not like = signs in rules. Burg's gram.yc source file uses them. I think
+ you can just take them out.
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2003-06-25-Reoptimizer1.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,137 ----
+ Wed Jun 25 15:13:51 CDT 2003
+
+ First-level instrumentation
+ ---------------------------
+
+ We use opt to do bytecode-to-bytecode instrumentation. We look at
+ back-edges and insert a call to llvm_first_trigger(), which takes
+ no arguments and returns no value. This instrumentation is designed to
+ be easy to remove, for instance by writing a NOP over the function
+ call instruction.
+
+ Keep count of every call to llvm_first_trigger(), and maintain
+ counters in a map indexed by return address. If the trigger count
+ exceeds a threshold, we identify a hot loop and perform second-level
+ instrumentation on the hot loop region (the instructions between the
+ target of the back-edge and the branch that causes the back-edge). We
+ do not move code across basic-block boundaries.
+
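+ A sketch of that bookkeeping in C++ (the threshold constant and the map
+ are assumptions for illustration, not values from these notes):
+
+   #include <cstdint>
+   #include <map>
+
+   std::map<void *, uint32_t> triggerCounts;  // keyed by return address
+   const uint32_t hotThreshold = 50;          // assumed value
+
+   extern "C" void llvm_first_trigger() {
+     void *retAddr = __builtin_return_address(0);  // GCC/Clang builtin
+     if (++triggerCounts[retAddr] > hotThreshold) {
+       // Hot loop found: instrument the region between the back-edge
+       // target and the branch that causes the back-edge.
+     }
+   }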
+
+ Second-level instrumentation
+ ----------------------------
+
+ We remove the first-level instrumentation by overwriting the CALL to
+ llvm_first_trigger() with a NOP.
+
+ The reoptimizer maintains a map between machine-code basic blocks and
+ LLVM BasicBlock*s. We only keep track of paths that start at the
+ first machine-code basic block of the hot loop region.
+
+ How do we keep track of which edges to instrument, and which edges are
+ exits from the hot region? A 3-step process:
+
+ 1) Do a DFS from the first machine-code basic block of the hot loop
+ region and mark reachable edges.
+
+ 2) Do a DFS from the last machine-code basic block of the hot loop
+ region IGNORING back edges, and mark the edges which are reachable in
+ 1) and also in 2) (i.e., must be reachable from both the start BB and
+ the end BB of the hot region).
+
+ 3) Mark BBs which end in edges that exit the hot region; we need to
+ instrument these differently.
+
+ Assume that there is 1 free register. On SPARC we use %g1, which LLC
+ has agreed not to use. Shift a 1 into it at the beginning. At every
+ edge which corresponds to a conditional branch, we shift 0 for not
+ taken and 1 for taken into a register. This uniquely numbers the paths
+ through the hot region. Silently fail if we need more than 64 bits.
+
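+ The same numbering scheme in portable C++ (a sketch; the real code
+ keeps this in %g1, and overflow past 64 bits is deliberately ignored):
+
+   #include <cstdint>
+
+   uint64_t pathId = 1;  // the seeded 1 marks how many bits follow
+
+   void onConditionalBranch(bool taken) {
+     pathId = (pathId << 1) | (taken ? 1 : 0);
+   }
+   // E.g. taken, not taken, taken yields 0b1101 = 13, a unique path id.
+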
+ At the end BB we call countPath and increment the counter based on %g1
+ and the return address of the countPath call. We keep track of the
+ number of iterations and the number of paths. We only run this
+ version 30 or 40 times.
+
+ Find the BBs that total 90% or more of execution, and aggregate them
+ together to form our trace. But we do not allow more than 5 paths; if
+ we have more than 5 we take the ones that are executed the most. We
+ verify our assumption that we picked a hot back-edge in first-level
+ instrumentation, by making sure that the number of times we took an
+ exit edge from the hot trace is less than 10% of the number of
+ iterations.
+
+ LLC has been taught to recognize llvm_first_trigger() calls and NOT
+ generate saves and restores of caller-saved registers around these
+ calls.
+
+
+ Phase behavior
+ --------------
+
+ We turn off llvm_first_trigger() calls with NOPs, but this would hide
+ phase behavior from us (when some funcs/traces stop being hot and
+ others become hot.)
+
+ We have a SIGALRM timer that counts time for us. Every time we get a
+ SIGALRM we look at our priority queue of locations where we have
+ removed llvm_first_trigger() calls. Each location is inserted along
+ with a time when we will next turn instrumentation back on for that
+ call site. If the time has arrived for a particular call site, we pop
+ that off the priority queue and turn instrumentation back on for that
+ call site.
+
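+ A sketch of that priority queue in C++ (all names assumed):
+
+   #include <cstdint>
+   #include <queue>
+   #include <vector>
+
+   struct PatchedSite {
+     uint64_t reenableAt;  // when to turn instrumentation back on
+     void    *callSite;    // address of the NOP'd trigger call
+   };
+   struct Later {
+     bool operator()(const PatchedSite &a, const PatchedSite &b) const {
+       return a.reenableAt > b.reenableAt;  // min-heap ordered by time
+     }
+   };
+   std::priority_queue<PatchedSite, std::vector<PatchedSite>, Later> sites;
+
+   void onSigAlrm(uint64_t now) {
+     while (!sites.empty() && sites.top().reenableAt <= now) {
+       // Rewrite the NOP back into a call to llvm_first_trigger().
+       sites.pop();
+     }
+   }
+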
+
+ Generating traces
+ -----------------
+
+ When we finally generate an optimized trace we first copy the code
+ into the trace cache. This leaves us with 3 copies of the code: the
+ original code, the instrumented code, and the optimized trace. The
+ optimized trace does not have instrumentation. The original code and
+ the instrumented code are modified to have a branch to the trace
+ cache, where the optimized traces are kept.
+
+ We copy the code from the original to the instrumented version
+ by tracing the LLVM-to-machine-code basic block map and then copying
+ each machine code basic block we think is in the hot region into the
+ trace cache. Then we instrument that code. The process is similar for
+ generating the final optimized trace; we copy the same basic blocks
+ because we might need to put in fixup code for exit BBs.
+
+ LLVM basic blocks are not typically used in the Reoptimizer except
+ for the mapping information.
+
+ We are restricted to using single instructions to branch between the
+ original code, trace, and instrumented code. So we have to keep the
+ code copies in memory near the original code (they can't be far enough
+ away that a single pc-relative branch would not work.) Malloc() or
+ data region space is too far away. this impacts the design of the
+ trace cache.
+
+ We use a dummy function that is full of a bunch of for loops which we
+ overwrite with trace-cache code. The trace manager keeps track of
+ whether or not we have enough space in the trace cache, etc.
+
+ The trace insertion routine takes an original start address, a vector
+ of machine instructions representing the trace, the indices of branches and
+ their corresponding absolute targets, and the indices of calls and their
+ corresponding absolute targets.
+
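+ One plausible C++ shape for that interface (purely a guess at the
+ signature for illustration; not the real Reoptimizer API):
+
+   #include <cstdint>
+   #include <vector>
+
+   struct Fixup {
+     unsigned instrIndex;      // which instruction in the trace
+     uint64_t absoluteTarget;  // where it must end up pointing
+   };
+
+   void insertTrace(void *originalStart,
+                    const std::vector<uint32_t> &traceInstrs,
+                    const std::vector<Fixup> &branches,
+                    const std::vector<Fixup> &calls);
+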
+ The trace insertion routine is responsible for inserting branches from
+ the beginning of the original code to the beginning of the optimized
+ trace. This is because at some point the trace cache may run out of
+ space and it may have to evict a trace, at which point the branch to
+ the trace would also have to be removed. It uses a round-robin
+ replacement policy; we have found that this is almost as good as LRU
+ and better than random (especially because of problems fitting the new
+ trace in.)
+
+ We cannot deal with discontiguous trace cache areas. The trace cache
+ is supposed to be cache-line-aligned, but it is not page-aligned.
+
+ We generate instrumentation traces and optimized traces into separate
+ trace caches. We keep the instrumented code around because you don't
+ want to delete a trace when you still might have to return to it
+ (i.e., return from an llvm_first_trigger() or countPath() call.)
+
+
Index: llvm-www/releases/1.9/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt
diff -c /dev/null llvm-www/releases/1.9/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt:1.1
*** /dev/null Mon Nov 20 01:27:58 2006
--- llvm-www/releases/1.9/docs/HistoricalNotes/2003-06-26-Reoptimizer2.txt Mon Nov 20 01:27:45 2006
***************
*** 0 ****
--- 1,110 ----
+ Thu Jun 26 14:43:04 CDT 2003
+
+ Information about BinInterface
+ ------------------------------
+
+ BinInterface takes in a set of instructions with some particular register
+ allocation. It allows you to add, modify, or delete instructions,
+ in SSA form (kind of like LLVM's MachineInstrs), and then re-allocate
+ registers. It assumes that the transformations you are doing are safe.
+ It does not update the mapping information or the LLVM representation
+ for the modified trace (so it would not, for instance, support
+ multiple optimization passes; passes have to be aware of the mapping
+ information and update it manually.)
+
+ The way you use it is you take the original code and provide it to
+ BinInterface; then you do optimizations to it, then you put it in the
+ trace cache.
+
+ The BinInterface tries to find live-outs for traces so that it can do
+ register allocation on just the trace, and stitch the trace back into
+ the original code. It has to preserve the live-ins and live-outs when
+ it does its register allocation. (On exits from the trace we have
+ epilogues that copy live-outs back into the right registers, but
+ live-ins have to be in the right registers.)
+
+
+ Limitations of BinInterface
+ ---------------------------
+
+ It does copy insertions for PHIs, which it infers from the machine
+ code. The mapping info inserted by LLC is not sufficient to determine
+ the PHIs.
+
+ It does not handle integer or floating-point condition codes and it
+ does not handle floating-point register allocation.
+
+ It is not able to use lots of registers aggressively.
+
+ There is a problem with alloca: we cannot find our spill space for
+ spilling registers, normally allocated on the stack, if the trace
+ follows an alloca(). What might be an acceptable solution would be to
+ disable trace generation on functions that have variable-sized
+ alloca()s. Variable-sized allocas in the trace would also probably
+ screw things up.
+
+ Because of the FP and alloca limitations, the BinInterface is
+ completely disabled right now.
+
+
+ Demo
+ ----
+
+ This is a demo of the Ball & Larus version that does NOT use 2-level
+ profiling.
+
+ 1. Compile program with llvm-gcc.
+ 2. Run opt -lowerswitch -paths -emitfuncs on the bytecode.
+ -lowerswitch change switch statements to branches
+ -paths Ball & Larus path-profiling algorithm
+ -emitfuncs emit the table of functions
+ 3. Run llc to generate SPARC assembly code for the result of step 2.
+ 4. Use g++ to link the (instrumented) assembly code.
+
+ We use a script to do all this:
+ ------------------------------------------------------------------------------
+ #!/bin/sh
+ llvm-gcc $1.c -o $1
+ opt -lowerswitch -paths -emitfuncs $1.bc > $1.run.bc
+ llc -f $1.run.bc
+ LIBS=$HOME/llvm_sparc/lib/Debug
+ GXX=/usr/dcs/software/evaluation/bin/g++
+ $GXX -g -L $LIBS $1.run.s -o $1.run.llc \
+ $LIBS/tracecache.o \
+ $LIBS/mapinfo.o \
+ $LIBS/trigger.o \
+ $LIBS/profpaths.o \
+ $LIBS/bininterface.o \
+ $LIBS/support.o \
+ $LIBS/vmcore.o \
+ $LIBS/transformutils.o \
+ $LIBS/bcreader.o \
+ -lscalaropts -lscalaropts -lanalysis \
+ -lmalloc -lcpc -lm -ldl
+ ------------------------------------------------------------------------------
+
+ 5. Run the resulting binary. You will see output from BinInterface
+ (described below) intermixed with the output from the program.
+
+
+ Output from BinInterface
+ ------------------------
+
+ BinInterface's debugging code prints out the following stuff in order:
+
+ 1. Initial code provided to BinInterface with original register
+ allocation.
+
+ 2. Section 0 is the trace prolog, consisting mainly of live-ins and
+ register saves which will be restored in epilogs.
+
+ 3. Section 1 is the trace itself, in SSA form used by BinInterface,
+ along with the PHIs that are inserted.
+ PHIs are followed by the copies that implement them.
+ Each branch (i.e., out of the trace) is annotated with the
+ section number that represents the epilog it branches to.
+
+ 4. All the other sections starting with Section 2 are trace epilogs.
+ Every branch from the trace has to go to some epilog.
+
+ 5. After the last section is the register allocation output.