[llvm] r274441 - New Kaleidoscope chapter: Creating object files

Wilfred Hughes via llvm-commits llvm-commits at lists.llvm.org
Sat Jul 2 10:02:00 PDT 2016


Author: wilfred
Date: Sat Jul  2 12:01:59 2016
New Revision: 274441

URL: http://llvm.org/viewvc/llvm-project?rev=274441&view=rev
Log:
New Kaleidoscope chapter: Creating object files

This new chapter describes compiling LLVM IR to object files.

The new chaper is chapter 8, so later chapters have been renumbered.
Since this brings us to 10 chapters total, I've also needed to rename
the other chapters to use two digit numbering.

Differential Revision: http://reviews.llvm.org/D18070


Added:
    llvm/trunk/docs/tutorial/LangImpl01.rst
    llvm/trunk/docs/tutorial/LangImpl02.rst
    llvm/trunk/docs/tutorial/LangImpl03.rst
    llvm/trunk/docs/tutorial/LangImpl04.rst
    llvm/trunk/docs/tutorial/LangImpl05-cfg.png   (with props)
    llvm/trunk/docs/tutorial/LangImpl05.rst
    llvm/trunk/docs/tutorial/LangImpl06.rst
    llvm/trunk/docs/tutorial/LangImpl07.rst
    llvm/trunk/docs/tutorial/LangImpl08.rst
    llvm/trunk/docs/tutorial/LangImpl09.rst
    llvm/trunk/docs/tutorial/LangImpl10.rst
    llvm/trunk/examples/Kaleidoscope/Chapter9/
    llvm/trunk/examples/Kaleidoscope/Chapter9/CMakeLists.txt
    llvm/trunk/examples/Kaleidoscope/Chapter9/toy.cpp
Removed:
    llvm/trunk/docs/tutorial/LangImpl1.rst
    llvm/trunk/docs/tutorial/LangImpl2.rst
    llvm/trunk/docs/tutorial/LangImpl3.rst
    llvm/trunk/docs/tutorial/LangImpl4.rst
    llvm/trunk/docs/tutorial/LangImpl5-cfg.png
    llvm/trunk/docs/tutorial/LangImpl5.rst
    llvm/trunk/docs/tutorial/LangImpl6.rst
    llvm/trunk/docs/tutorial/LangImpl7.rst
    llvm/trunk/docs/tutorial/LangImpl8.rst
    llvm/trunk/docs/tutorial/LangImpl9.rst
Modified:
    llvm/trunk/docs/tutorial/OCamlLangImpl5.rst
    llvm/trunk/examples/Kaleidoscope/Chapter8/CMakeLists.txt
    llvm/trunk/examples/Kaleidoscope/Chapter8/toy.cpp

Added: llvm/trunk/docs/tutorial/LangImpl01.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl01.rst?rev=274441&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl01.rst (added)
+++ llvm/trunk/docs/tutorial/LangImpl01.rst Sat Jul  2 12:01:59 2016
@@ -0,0 +1,293 @@
+=================================================
+Kaleidoscope: Tutorial Introduction and the Lexer
+=================================================
+
+.. contents::
+   :local:
+
+Tutorial Introduction
+=====================
+
+Welcome to the "Implementing a language with LLVM" tutorial. This
+tutorial runs through the implementation of a simple language, showing
+how fun and easy it can be. This tutorial will get you up and started as
+well as help to build a framework you can extend to other languages. The
+code in this tutorial can also be used as a playground to hack on other
+LLVM specific things.
+
+The goal of this tutorial is to progressively unveil our language,
+describing how it is built up over time. This will let us cover a fairly
+broad range of language design and LLVM-specific usage issues, showing
+and explaining the code for it all along the way, without overwhelming
+you with tons of details up front.
+
+It is useful to point out ahead of time that this tutorial is really
+about teaching compiler techniques and LLVM specifically, *not* about
+teaching modern and sane software engineering principles. In practice,
+this means that we'll take a number of shortcuts to simplify the
+exposition. For example, the code uses global variables
+all over the place, doesn't use nice design patterns like
+`visitors <http://en.wikipedia.org/wiki/Visitor_pattern>`_, etc... but
+it is very simple. If you dig in and use the code as a basis for future
+projects, fixing these deficiencies shouldn't be hard.
+
+I've tried to put this tutorial together in a way that makes chapters
+easy to skip over if you are already familiar with or are uninterested
+in the various pieces. The structure of the tutorial is:
+
+-  `Chapter #1 <#language>`_: Introduction to the Kaleidoscope
+   language, and the definition of its Lexer - This shows where we are
+   going and the basic functionality that we want it to do. In order to
+   make this tutorial maximally understandable and hackable, we choose
+   to implement everything in C++ instead of using lexer and parser
+   generators. LLVM obviously works just fine with such tools, feel free
+   to use one if you prefer.
+-  `Chapter #2 <LangImpl02.html>`_: Implementing a Parser and AST -
+   With the lexer in place, we can talk about parsing techniques and
+   basic AST construction. This tutorial describes recursive descent
+   parsing and operator precedence parsing. Nothing in Chapters 1 or 2
+   is LLVM-specific, the code doesn't even link in LLVM at this point.
+   :)
+-  `Chapter #3 <LangImpl03.html>`_: Code generation to LLVM IR - With
+   the AST ready, we can show off how easy generation of LLVM IR really
+   is.
+-  `Chapter #4 <LangImpl04.html>`_: Adding JIT and Optimizer Support
+   - Because a lot of people are interested in using LLVM as a JIT,
+   we'll dive right into it and show you the 3 lines it takes to add JIT
+   support. LLVM is also useful in many other ways, but this is one
+   simple and "sexy" way to show off its power. :)
+-  `Chapter #5 <LangImpl05.html>`_: Extending the Language: Control
+   Flow - With the language up and running, we show how to extend it
+   with control flow operations (if/then/else and a 'for' loop). This
+   gives us a chance to talk about simple SSA construction and control
+   flow.
+-  `Chapter #6 <LangImpl06.html>`_: Extending the Language:
+   User-defined Operators - This is a silly but fun chapter that talks
+   about extending the language to let the user program define their own
+   arbitrary unary and binary operators (with assignable precedence!).
+   This lets us build a significant piece of the "language" as library
+   routines.
+-  `Chapter #7 <LangImpl07.html>`_: Extending the Language: Mutable
+   Variables - This chapter talks about adding user-defined local
+   variables along with an assignment operator. The interesting part
+   about this is how easy and trivial it is to construct SSA form in
+   LLVM: no, LLVM does *not* require your front-end to construct SSA
+   form!
+-  `Chapter #8 <LangImpl08.html>`_: Compiling to Object Files - This
+   chapter explains how to take LLVM IR and compile it down to object
+   files.
+-  `Chapter #9 <LangImpl09.html>`_: Extending the Language: Debug
+   Information - Having built a decent little programming language with
+   control flow, functions and mutable variables, we consider what it
+   takes to add debug information to standalone executables. This debug
+   information will allow you to set breakpoints in Kaleidoscope
+   functions, print out argument variables, and call functions - all
+   from within the debugger!
+-  `Chapter #10 <LangImpl10.html>`_: Conclusion and other useful LLVM
+   tidbits - This chapter wraps up the series by talking about
+   potential ways to extend the language, but also includes a bunch of
+   pointers to info about "special topics" like adding garbage
+   collection support, exceptions, debugging, support for "spaghetti
+   stacks", and a bunch of other tips and tricks.
+
+By the end of the tutorial, we'll have written a bit less than 1000 lines
+of non-comment, non-blank, lines of code. With this small amount of
+code, we'll have built up a very reasonable compiler for a non-trivial
+language including a hand-written lexer, parser, AST, as well as code
+generation support with a JIT compiler. While other systems may have
+interesting "hello world" tutorials, I think the breadth of this
+tutorial is a great testament to the strengths of LLVM and why you
+should consider it if you're interested in language or compiler design.
+
+A note about this tutorial: we expect you to extend the language and
+play with it on your own. Take the code and go crazy hacking away at it,
+compilers don't need to be scary creatures - it can be a lot of fun to
+play with languages!
+
+The Basic Language
+==================
+
+This tutorial will be illustrated with a toy language that we'll call
+"`Kaleidoscope <http://en.wikipedia.org/wiki/Kaleidoscope>`_" (derived
+from "meaning beautiful, form, and view"). Kaleidoscope is a procedural
+language that allows you to define functions, use conditionals, math,
+etc. Over the course of the tutorial, we'll extend Kaleidoscope to
+support the if/then/else construct, a for loop, user defined operators,
+JIT compilation with a simple command line interface, etc.
+
+Because we want to keep things simple, the only datatype in Kaleidoscope
+is a 64-bit floating point type (aka 'double' in C parlance). As such,
+all values are implicitly double precision and the language doesn't
+require type declarations. This gives the language a very nice and
+simple syntax. For example, the following simple example computes
+`Fibonacci numbers: <http://en.wikipedia.org/wiki/Fibonacci_number>`_
+
+::
+
+    # Compute the x'th fibonacci number.
+    def fib(x)
+      if x < 3 then
+        1
+      else
+        fib(x-1)+fib(x-2)
+
+    # This expression will compute the 40th number.
+    fib(40)
+
+We also allow Kaleidoscope to call into standard library functions (the
+LLVM JIT makes this completely trivial). This means that you can use the
+'extern' keyword to define a function before you use it (this is also
+useful for mutually recursive functions). For example:
+
+::
+
+    extern sin(arg);
+    extern cos(arg);
+    extern atan2(arg1 arg2);
+
+    atan2(sin(.4), cos(42))
+
+A more interesting example is included in Chapter 6 where we write a
+little Kaleidoscope application that `displays a Mandelbrot
+Set <LangImpl06.html#kicking-the-tires>`_ at various levels of magnification.
+
+Lets dive into the implementation of this language!
+
+The Lexer
+=========
+
+When it comes to implementing a language, the first thing needed is the
+ability to process a text file and recognize what it says. The
+traditional way to do this is to use a
+"`lexer <http://en.wikipedia.org/wiki/Lexical_analysis>`_" (aka
+'scanner') to break the input up into "tokens". Each token returned by
+the lexer includes a token code and potentially some metadata (e.g. the
+numeric value of a number). First, we define the possibilities:
+
+.. code-block:: c++
+
+    // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
+    // of these for known things.
+    enum Token {
+      tok_eof = -1,
+
+      // commands
+      tok_def = -2,
+      tok_extern = -3,
+
+      // primary
+      tok_identifier = -4,
+      tok_number = -5,
+    };
+
+    static std::string IdentifierStr; // Filled in if tok_identifier
+    static double NumVal;             // Filled in if tok_number
+
+Each token returned by our lexer will either be one of the Token enum
+values or it will be an 'unknown' character like '+', which is returned
+as its ASCII value. If the current token is an identifier, the
+``IdentifierStr`` global variable holds the name of the identifier. If
+the current token is a numeric literal (like 1.0), ``NumVal`` holds its
+value. Note that we use global variables for simplicity, this is not the
+best choice for a real language implementation :).
+
+The actual implementation of the lexer is a single function named
+``gettok``. The ``gettok`` function is called to return the next token
+from standard input. Its definition starts as:
+
+.. code-block:: c++
+
+    /// gettok - Return the next token from standard input.
+    static int gettok() {
+      static int LastChar = ' ';
+
+      // Skip any whitespace.
+      while (isspace(LastChar))
+        LastChar = getchar();
+
+``gettok`` works by calling the C ``getchar()`` function to read
+characters one at a time from standard input. It eats them as it
+recognizes them and stores the last character read, but not processed,
+in LastChar. The first thing that it has to do is ignore whitespace
+between tokens. This is accomplished with the loop above.
+
+The next thing ``gettok`` needs to do is recognize identifiers and
+specific keywords like "def". Kaleidoscope does this with this simple
+loop:
+
+.. code-block:: c++
+
+      if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
+        IdentifierStr = LastChar;
+        while (isalnum((LastChar = getchar())))
+          IdentifierStr += LastChar;
+
+        if (IdentifierStr == "def")
+          return tok_def;
+        if (IdentifierStr == "extern")
+          return tok_extern;
+        return tok_identifier;
+      }
+
+Note that this code sets the '``IdentifierStr``' global whenever it
+lexes an identifier. Also, since language keywords are matched by the
+same loop, we handle them here inline. Numeric values are similar:
+
+.. code-block:: c++
+
+      if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
+        std::string NumStr;
+        do {
+          NumStr += LastChar;
+          LastChar = getchar();
+        } while (isdigit(LastChar) || LastChar == '.');
+
+        NumVal = strtod(NumStr.c_str(), 0);
+        return tok_number;
+      }
+
+This is all pretty straight-forward code for processing input. When
+reading a numeric value from input, we use the C ``strtod`` function to
+convert it to a numeric value that we store in ``NumVal``. Note that
+this isn't doing sufficient error checking: it will incorrectly read
+"1.23.45.67" and handle it as if you typed in "1.23". Feel free to
+extend it :). Next we handle comments:
+
+.. code-block:: c++
+
+      if (LastChar == '#') {
+        // Comment until end of line.
+        do
+          LastChar = getchar();
+        while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
+
+        if (LastChar != EOF)
+          return gettok();
+      }
+
+We handle comments by skipping to the end of the line and then return
+the next token. Finally, if the input doesn't match one of the above
+cases, it is either an operator character like '+' or the end of the
+file. These are handled with this code:
+
+.. code-block:: c++
+
+      // Check for end of file.  Don't eat the EOF.
+      if (LastChar == EOF)
+        return tok_eof;
+
+      // Otherwise, just return the character as its ascii value.
+      int ThisChar = LastChar;
+      LastChar = getchar();
+      return ThisChar;
+    }
+
+With this, we have the complete lexer for the basic Kaleidoscope
+language (the `full code listing <LangImpl02.html#full-code-listing>`_ for the Lexer
+is available in the `next chapter <LangImpl02.html>`_ of the tutorial).
+Next we'll `build a simple parser that uses this to build an Abstract
+Syntax Tree <LangImpl02.html>`_. When we have that, we'll include a
+driver so that you can use the lexer and parser together.
+
+`Next: Implementing a Parser and AST <LangImpl02.html>`_
+

Added: llvm/trunk/docs/tutorial/LangImpl02.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl02.rst?rev=274441&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl02.rst (added)
+++ llvm/trunk/docs/tutorial/LangImpl02.rst Sat Jul  2 12:01:59 2016
@@ -0,0 +1,735 @@
+===========================================
+Kaleidoscope: Implementing a Parser and AST
+===========================================
+
+.. contents::
+   :local:
+
+Chapter 2 Introduction
+======================
+
+Welcome to Chapter 2 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. This chapter shows you how to use the
+lexer, built in `Chapter 1 <LangImpl1.html>`_, to build a full
+`parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope
+language. Once we have a parser, we'll define and build an `Abstract
+Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
+
+The parser we will build uses a combination of `Recursive Descent
+Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
+`Operator-Precedence
+Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
+parse the Kaleidoscope language (the latter for binary expressions and
+the former for everything else). Before we get to parsing though, lets
+talk about the output of the parser: the Abstract Syntax Tree.
+
+The Abstract Syntax Tree (AST)
+==============================
+
+The AST for a program captures its behavior in such a way that it is
+easy for later stages of the compiler (e.g. code generation) to
+interpret. We basically want one object for each construct in the
+language, and the AST should closely model the language. In
+Kaleidoscope, we have expressions, a prototype, and a function object.
+We'll start with expressions first:
+
+.. code-block:: c++
+
+    /// ExprAST - Base class for all expression nodes.
+    class ExprAST {
+    public:
+      virtual ~ExprAST() {}
+    };
+
+    /// NumberExprAST - Expression class for numeric literals like "1.0".
+    class NumberExprAST : public ExprAST {
+      double Val;
+
+    public:
+      NumberExprAST(double Val) : Val(Val) {}
+    };
+
+The code above shows the definition of the base ExprAST class and one
+subclass which we use for numeric literals. The important thing to note
+about this code is that the NumberExprAST class captures the numeric
+value of the literal as an instance variable. This allows later phases
+of the compiler to know what the stored numeric value is.
+
+Right now we only create the AST, so there are no useful accessor
+methods on them. It would be very easy to add a virtual method to pretty
+print the code, for example. Here are the other expression AST node
+definitions that we'll use in the basic form of the Kaleidoscope
+language:
+
+.. code-block:: c++
+
+    /// VariableExprAST - Expression class for referencing a variable, like "a".
+    class VariableExprAST : public ExprAST {
+      std::string Name;
+
+    public:
+      VariableExprAST(const std::string &Name) : Name(Name) {}
+    };
+
+    /// BinaryExprAST - Expression class for a binary operator.
+    class BinaryExprAST : public ExprAST {
+      char Op;
+      std::unique_ptr<ExprAST> LHS, RHS;
+
+    public:
+      BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS,
+                    std::unique_ptr<ExprAST> RHS)
+        : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
+    };
+
+    /// CallExprAST - Expression class for function calls.
+    class CallExprAST : public ExprAST {
+      std::string Callee;
+      std::vector<std::unique_ptr<ExprAST>> Args;
+
+    public:
+      CallExprAST(const std::string &Callee,
+                  std::vector<std::unique_ptr<ExprAST>> Args)
+        : Callee(Callee), Args(std::move(Args)) {}
+    };
+
+This is all (intentionally) rather straight-forward: variables capture
+the variable name, binary operators capture their opcode (e.g. '+'), and
+calls capture a function name as well as a list of any argument
+expressions. One thing that is nice about our AST is that it captures
+the language features without talking about the syntax of the language.
+Note that there is no discussion about precedence of binary operators,
+lexical structure, etc.
+
+For our basic language, these are all of the expression nodes we'll
+define. Because it doesn't have conditional control flow, it isn't
+Turing-complete; we'll fix that in a later installment. The two things
+we need next are a way to talk about the interface to a function, and a
+way to talk about functions themselves:
+
+.. code-block:: c++
+
+    /// PrototypeAST - This class represents the "prototype" for a function,
+    /// which captures its name, and its argument names (thus implicitly the number
+    /// of arguments the function takes).
+    class PrototypeAST {
+      std::string Name;
+      std::vector<std::string> Args;
+
+    public:
+      PrototypeAST(const std::string &name, std::vector<std::string> Args)
+        : Name(name), Args(std::move(Args)) {}
+    };
+
+    /// FunctionAST - This class represents a function definition itself.
+    class FunctionAST {
+      std::unique_ptr<PrototypeAST> Proto;
+      std::unique_ptr<ExprAST> Body;
+
+    public:
+      FunctionAST(std::unique_ptr<PrototypeAST> Proto,
+                  std::unique_ptr<ExprAST> Body)
+        : Proto(std::move(Proto)), Body(std::move(Body)) {}
+    };
+
+In Kaleidoscope, functions are typed with just a count of their
+arguments. Since all values are double precision floating point, the
+type of each argument doesn't need to be stored anywhere. In a more
+aggressive and realistic language, the "ExprAST" class would probably
+have a type field.
+
+With this scaffolding, we can now talk about parsing expressions and
+function bodies in Kaleidoscope.
+
+Parser Basics
+=============
+
+Now that we have an AST to build, we need to define the parser code to
+build it. The idea here is that we want to parse something like "x+y"
+(which is returned as three tokens by the lexer) into an AST that could
+be generated with calls like this:
+
+.. code-block:: c++
+
+      auto LHS = llvm::make_unique<VariableExprAST>("x");
+      auto RHS = llvm::make_unique<VariableExprAST>("y");
+      auto Result = std::make_unique<BinaryExprAST>('+', std::move(LHS),
+                                                    std::move(RHS));
+
+In order to do this, we'll start by defining some basic helper routines:
+
+.. code-block:: c++
+
+    /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
+    /// token the parser is looking at.  getNextToken reads another token from the
+    /// lexer and updates CurTok with its results.
+    static int CurTok;
+    static int getNextToken() {
+      return CurTok = gettok();
+    }
+
+This implements a simple token buffer around the lexer. This allows us
+to look one token ahead at what the lexer is returning. Every function
+in our parser will assume that CurTok is the current token that needs to
+be parsed.
+
+.. code-block:: c++
+
+
+    /// LogError* - These are little helper functions for error handling.
+    std::unique_ptr<ExprAST> LogError(const char *Str) {
+      fprintf(stderr, "LogError: %s\n", Str);
+      return nullptr;
+    }
+    std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) {
+      LogError(Str);
+      return nullptr;
+    }
+
+The ``LogError`` routines are simple helper routines that our parser will
+use to handle errors. The error recovery in our parser will not be the
+best and is not particular user-friendly, but it will be enough for our
+tutorial. These routines make it easier to handle errors in routines
+that have various return types: they always return null.
+
+With these basic helper functions, we can implement the first piece of
+our grammar: numeric literals.
+
+Basic Expression Parsing
+========================
+
+We start with numeric literals, because they are the simplest to
+process. For each production in our grammar, we'll define a function
+which parses that production. For numeric literals, we have:
+
+.. code-block:: c++
+
+    /// numberexpr ::= number
+    static std::unique_ptr<ExprAST> ParseNumberExpr() {
+      auto Result = llvm::make_unique<NumberExprAST>(NumVal);
+      getNextToken(); // consume the number
+      return std::move(Result);
+    }
+
+This routine is very simple: it expects to be called when the current
+token is a ``tok_number`` token. It takes the current number value,
+creates a ``NumberExprAST`` node, advances the lexer to the next token,
+and finally returns.
+
+There are some interesting aspects to this. The most important one is
+that this routine eats all of the tokens that correspond to the
+production and returns the lexer buffer with the next token (which is
+not part of the grammar production) ready to go. This is a fairly
+standard way to go for recursive descent parsers. For a better example,
+the parenthesis operator is defined like this:
+
+.. code-block:: c++
+
+    /// parenexpr ::= '(' expression ')'
+    static std::unique_ptr<ExprAST> ParseParenExpr() {
+      getNextToken(); // eat (.
+      auto V = ParseExpression();
+      if (!V)
+        return nullptr;
+
+      if (CurTok != ')')
+        return LogError("expected ')'");
+      getNextToken(); // eat ).
+      return V;
+    }
+
+This function illustrates a number of interesting things about the
+parser:
+
+1) It shows how we use the LogError routines. When called, this function
+expects that the current token is a '(' token, but after parsing the
+subexpression, it is possible that there is no ')' waiting. For example,
+if the user types in "(4 x" instead of "(4)", the parser should emit an
+error. Because errors can occur, the parser needs a way to indicate that
+they happened: in our parser, we return null on an error.
+
+2) Another interesting aspect of this function is that it uses recursion
+by calling ``ParseExpression`` (we will soon see that
+``ParseExpression`` can call ``ParseParenExpr``). This is powerful
+because it allows us to handle recursive grammars, and keeps each
+production very simple. Note that parentheses do not cause construction
+of AST nodes themselves. While we could do it this way, the most
+important role of parentheses are to guide the parser and provide
+grouping. Once the parser constructs the AST, parentheses are not
+needed.
+
+The next simple production is for handling variable references and
+function calls:
+
+.. code-block:: c++
+
+    /// identifierexpr
+    ///   ::= identifier
+    ///   ::= identifier '(' expression* ')'
+    static std::unique_ptr<ExprAST> ParseIdentifierExpr() {
+      std::string IdName = IdentifierStr;
+
+      getNextToken();  // eat identifier.
+
+      if (CurTok != '(') // Simple variable ref.
+        return llvm::make_unique<VariableExprAST>(IdName);
+
+      // Call.
+      getNextToken();  // eat (
+      std::vector<std::unique_ptr<ExprAST>> Args;
+      if (CurTok != ')') {
+        while (1) {
+          if (auto Arg = ParseExpression())
+            Args.push_back(std::move(Arg));
+          else
+            return nullptr;
+
+          if (CurTok == ')')
+            break;
+
+          if (CurTok != ',')
+            return LogError("Expected ')' or ',' in argument list");
+          getNextToken();
+        }
+      }
+
+      // Eat the ')'.
+      getNextToken();
+
+      return llvm::make_unique<CallExprAST>(IdName, std::move(Args));
+    }
+
+This routine follows the same style as the other routines. (It expects
+to be called if the current token is a ``tok_identifier`` token). It
+also has recursion and error handling. One interesting aspect of this is
+that it uses *look-ahead* to determine if the current identifier is a
+stand alone variable reference or if it is a function call expression.
+It handles this by checking to see if the token after the identifier is
+a '(' token, constructing either a ``VariableExprAST`` or
+``CallExprAST`` node as appropriate.
+
+Now that we have all of our simple expression-parsing logic in place, we
+can define a helper function to wrap it together into one entry point.
+We call this class of expressions "primary" expressions, for reasons
+that will become more clear `later in the
+tutorial <LangImpl6.html#user-defined-unary-operators>`_. In order to parse an arbitrary
+primary expression, we need to determine what sort of expression it is:
+
+.. code-block:: c++
+
+    /// primary
+    ///   ::= identifierexpr
+    ///   ::= numberexpr
+    ///   ::= parenexpr
+    static std::unique_ptr<ExprAST> ParsePrimary() {
+      switch (CurTok) {
+      default:
+        return LogError("unknown token when expecting an expression");
+      case tok_identifier:
+        return ParseIdentifierExpr();
+      case tok_number:
+        return ParseNumberExpr();
+      case '(':
+        return ParseParenExpr();
+      }
+    }
+
+Now that you see the definition of this function, it is more obvious why
+we can assume the state of CurTok in the various functions. This uses
+look-ahead to determine which sort of expression is being inspected, and
+then parses it with a function call.
+
+Now that basic expressions are handled, we need to handle binary
+expressions. They are a bit more complex.
+
+Binary Expression Parsing
+=========================
+
+Binary expressions are significantly harder to parse because they are
+often ambiguous. For example, when given the string "x+y\*z", the parser
+can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
+definitions from mathematics, we expect the later parse, because "\*"
+(multiplication) has higher *precedence* than "+" (addition).
+
+There are many ways to handle this, but an elegant and efficient way is
+to use `Operator-Precedence
+Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
+This parsing technique uses the precedence of binary operators to guide
+recursion. To start with, we need a table of precedences:
+
+.. code-block:: c++
+
+    /// BinopPrecedence - This holds the precedence for each binary operator that is
+    /// defined.
+    static std::map<char, int> BinopPrecedence;
+
+    /// GetTokPrecedence - Get the precedence of the pending binary operator token.
+    static int GetTokPrecedence() {
+      if (!isascii(CurTok))
+        return -1;
+
+      // Make sure it's a declared binop.
+      int TokPrec = BinopPrecedence[CurTok];
+      if (TokPrec <= 0) return -1;
+      return TokPrec;
+    }
+
+    int main() {
+      // Install standard binary operators.
+      // 1 is lowest precedence.
+      BinopPrecedence['<'] = 10;
+      BinopPrecedence['+'] = 20;
+      BinopPrecedence['-'] = 20;
+      BinopPrecedence['*'] = 40;  // highest.
+      ...
+    }
+
+For the basic form of Kaleidoscope, we will only support 4 binary
+operators (this can obviously be extended by you, our brave and intrepid
+reader). The ``GetTokPrecedence`` function returns the precedence for
+the current token, or -1 if the token is not a binary operator. Having a
+map makes it easy to add new operators and makes it clear that the
+algorithm doesn't depend on the specific operators involved, but it
+would be easy enough to eliminate the map and do the comparisons in the
+``GetTokPrecedence`` function. (Or just use a fixed-size array).
+
+With the helper above defined, we can now start parsing binary
+expressions. The basic idea of operator precedence parsing is to break
+down an expression with potentially ambiguous binary operators into
+pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g".
+Operator precedence parsing considers this as a stream of primary
+expressions separated by binary operators. As such, it will first parse
+the leading primary expression "a", then it will see the pairs [+, b]
+[+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
+primary expressions, the binary expression parser doesn't need to worry
+about nested subexpressions like (c+d) at all.
+
+To start, an expression is a primary expression potentially followed by
+a sequence of [binop,primaryexpr] pairs:
+
+.. code-block:: c++
+
+    /// expression
+    ///   ::= primary binoprhs
+    ///
+    static std::unique_ptr<ExprAST> ParseExpression() {
+      auto LHS = ParsePrimary();
+      if (!LHS)
+        return nullptr;
+
+      return ParseBinOpRHS(0, std::move(LHS));
+    }
+
+``ParseBinOpRHS`` is the function that parses the sequence of pairs for
+us. It takes a precedence and a pointer to an expression for the part
+that has been parsed so far. Note that "x" is a perfectly valid
+expression: As such, "binoprhs" is allowed to be empty, in which case it
+returns the expression that is passed into it. In our example above, the
+code passes the expression for "a" into ``ParseBinOpRHS`` and the
+current token is "+".
+
+The precedence value passed into ``ParseBinOpRHS`` indicates the
+*minimal operator precedence* that the function is allowed to eat. For
+example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
+passed in a precedence of 40, it will not consume any tokens (because
+the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
+starts with:
+
+.. code-block:: c++
+
+    /// binoprhs
+    ///   ::= ('+' primary)*
+    static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
+                                                  std::unique_ptr<ExprAST> LHS) {
+      // If this is a binop, find its precedence.
+      while (1) {
+        int TokPrec = GetTokPrecedence();
+
+        // If this is a binop that binds at least as tightly as the current binop,
+        // consume it, otherwise we are done.
+        if (TokPrec < ExprPrec)
+          return LHS;
+
+This code gets the precedence of the current token and checks to see if
+if is too low. Because we defined invalid tokens to have a precedence of
+-1, this check implicitly knows that the pair-stream ends when the token
+stream runs out of binary operators. If this check succeeds, we know
+that the token is a binary operator and that it will be included in this
+expression:
+
+.. code-block:: c++
+
+        // Okay, we know this is a binop.
+        int BinOp = CurTok;
+        getNextToken();  // eat binop
+
+        // Parse the primary expression after the binary operator.
+        auto RHS = ParsePrimary();
+        if (!RHS)
+          return nullptr;
+
+As such, this code eats (and remembers) the binary operator and then
+parses the primary expression that follows. This builds up the whole
+pair, the first of which is [+, b] for the running example.
+
+Now that we parsed the left-hand side of an expression and one pair of
+the RHS sequence, we have to decide which way the expression associates.
+In particular, we could have "(a+b) binop unparsed" or "a + (b binop
+unparsed)". To determine this, we look ahead at "binop" to determine its
+precedence and compare it to BinOp's precedence (which is '+' in this
+case):
+
+.. code-block:: c++
+
+        // If BinOp binds less tightly with RHS than the operator after RHS, let
+        // the pending operator take RHS as its LHS.
+        int NextPrec = GetTokPrecedence();
+        if (TokPrec < NextPrec) {
+
+If the precedence of the binop to the right of "RHS" is lower or equal
+to the precedence of our current operator, then we know that the
+parentheses associate as "(a+b) binop ...". In our example, the current
+operator is "+" and the next operator is "+", we know that they have the
+same precedence. In this case we'll create the AST node for "a+b", and
+then continue parsing:
+
+.. code-block:: c++
+
+          ... if body omitted ...
+        }
+
+        // Merge LHS/RHS.
+        LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
+                                               std::move(RHS));
+      }  // loop around to the top of the while loop.
+    }
+
+In our example above, this will turn "a+b+" into "(a+b)" and execute the
+next iteration of the loop, with "+" as the current token. The code
+above will eat, remember, and parse "(c+d)" as the primary expression,
+which makes the current pair equal to [+, (c+d)]. It will then evaluate
+the 'if' conditional above with "\*" as the binop to the right of the
+primary. In this case, the precedence of "\*" is higher than the
+precedence of "+" so the if condition will be entered.
+
+The critical question left here is "how can the if condition parse the
+right hand side in full"? In particular, to build the AST correctly for
+our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
+variable. The code to do this is surprisingly simple (code from the
+above two blocks duplicated for context):
+
+.. code-block:: c++
+
+        // If BinOp binds less tightly with RHS than the operator after RHS, let
+        // the pending operator take RHS as its LHS.
+        int NextPrec = GetTokPrecedence();
+        if (TokPrec < NextPrec) {
+          RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS));
+          if (!RHS)
+            return nullptr;
+        }
+        // Merge LHS/RHS.
+        LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
+                                               std::move(RHS));
+      }  // loop around to the top of the while loop.
+    }
+
+At this point, we know that the binary operator to the RHS of our
+primary has higher precedence than the binop we are currently parsing.
+As such, we know that any sequence of pairs whose operators are all
+higher precedence than "+" should be parsed together and returned as
+"RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function
+specifying "TokPrec+1" as the minimum precedence required for it to
+continue. In our example above, this will cause it to return the AST
+node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+'
+expression.
+
+Finally, on the next iteration of the while loop, the "+g" piece is
+parsed and added to the AST. With this little bit of code (14
+non-trivial lines), we correctly handle fully general binary expression
+parsing in a very elegant way. This was a whirlwind tour of this code,
+and it is somewhat subtle. I recommend running through it with a few
+tough examples to see how it works.
+
+This wraps up handling of expressions. At this point, we can point the
+parser at an arbitrary token stream and build an expression from it,
+stopping at the first token that is not part of the expression. Next up
+we need to handle function definitions, etc.
+
+Parsing the Rest
+================
+
+The next thing missing is handling of function prototypes. In
+Kaleidoscope, these are used both for 'extern' function declarations as
+well as function body definitions. The code to do this is
+straight-forward and not very interesting (once you've survived
+expressions):
+
+.. code-block:: c++
+
+    /// prototype
+    ///   ::= id '(' id* ')'
+    static std::unique_ptr<PrototypeAST> ParsePrototype() {
+      if (CurTok != tok_identifier)
+        return LogErrorP("Expected function name in prototype");
+
+      std::string FnName = IdentifierStr;
+      getNextToken();
+
+      if (CurTok != '(')
+        return LogErrorP("Expected '(' in prototype");
+
+      // Read the list of argument names.
+      std::vector<std::string> ArgNames;
+      while (getNextToken() == tok_identifier)
+        ArgNames.push_back(IdentifierStr);
+      if (CurTok != ')')
+        return LogErrorP("Expected ')' in prototype");
+
+      // success.
+      getNextToken();  // eat ')'.
+
+      return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames));
+    }
+
+Given this, a function definition is very simple, just a prototype plus
+an expression to implement the body:
+
+.. code-block:: c++
+
+    /// definition ::= 'def' prototype expression
+    static std::unique_ptr<FunctionAST> ParseDefinition() {
+      getNextToken();  // eat def.
+      auto Proto = ParsePrototype();
+      if (!Proto) return nullptr;
+
+      if (auto E = ParseExpression())
+        return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
+      return nullptr;
+    }
+
+In addition, we support 'extern' to declare functions like 'sin' and
+'cos' as well as to support forward declaration of user functions. These
+'extern's are just prototypes with no body:
+
+.. code-block:: c++
+
+    /// external ::= 'extern' prototype
+    static std::unique_ptr<PrototypeAST> ParseExtern() {
+      getNextToken();  // eat extern.
+      return ParsePrototype();
+    }
+
+Finally, we'll also let the user type in arbitrary top-level expressions
+and evaluate them on the fly. We will handle this by defining anonymous
+nullary (zero argument) functions for them:
+
+.. code-block:: c++
+
+    /// toplevelexpr ::= expression
+    static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
+      if (auto E = ParseExpression()) {
+        // Make an anonymous proto.
+        auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
+        return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
+      }
+      return nullptr;
+    }
+
+Now that we have all the pieces, let's build a little driver that will
+let us actually *execute* this code we've built!
+
+The Driver
+==========
+
+The driver for this simply invokes all of the parsing pieces with a
+top-level dispatch loop. There isn't much interesting here, so I'll just
+include the top-level loop. See `below <#full-code-listing>`_ for full code in the
+"Top-Level Parsing" section.
+
+.. code-block:: c++
+
+    /// top ::= definition | external | expression | ';'
+    static void MainLoop() {
+      while (1) {
+        fprintf(stderr, "ready> ");
+        switch (CurTok) {
+        case tok_eof:
+          return;
+        case ';': // ignore top-level semicolons.
+          getNextToken();
+          break;
+        case tok_def:
+          HandleDefinition();
+          break;
+        case tok_extern:
+          HandleExtern();
+          break;
+        default:
+          HandleTopLevelExpression();
+          break;
+        }
+      }
+    }
+
+The most interesting part of this is that we ignore top-level
+semicolons. Why is this, you ask? The basic reason is that if you type
+"4 + 5" at the command line, the parser doesn't know whether that is the
+end of what you will type or not. For example, on the next line you
+could type "def foo..." in which case 4+5 is the end of a top-level
+expression. Alternatively you could type "\* 6", which would continue
+the expression. Having top-level semicolons allows you to type "4+5;",
+and the parser will know you are done.
+
+Conclusions
+===========
+
+With just under 400 lines of commented code (240 lines of non-comment,
+non-blank code), we fully defined our minimal language, including a
+lexer, parser, and AST builder. With this done, the executable will
+validate Kaleidoscope code and tell us if it is grammatically invalid.
+For example, here is a sample interaction:
+
+.. code-block:: bash
+
+    $ ./a.out
+    ready> def foo(x y) x+foo(y, 4.0);
+    Parsed a function definition.
+    ready> def foo(x y) x+y y;
+    Parsed a function definition.
+    Parsed a top-level expr
+    ready> def foo(x y) x+y );
+    Parsed a function definition.
+    Error: unknown token when expecting an expression
+    ready> extern sin(a);
+    ready> Parsed an extern
+    ready> ^D
+    $
+
+There is a lot of room for extension here. You can define new AST nodes,
+extend the language in many ways, etc. In the `next
+installment <LangImpl3.html>`_, we will describe how to generate LLVM
+Intermediate Representation (IR) from the AST.
+
+Full Code Listing
+=================
+
+Here is the complete code listing for this and the previous chapter.
+Note that it is fully self-contained: you don't need LLVM or any
+external libraries at all for this. (Besides the C and C++ standard
+libraries, of course.) To build this, just compile with:
+
+.. code-block:: bash
+
+    # Compile
+    clang++ -g -O3 toy.cpp
+    # Run
+    ./a.out
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp
+   :language: c++
+
+`Next: Implementing Code Generation to LLVM IR <LangImpl03.html>`_
+

Added: llvm/trunk/docs/tutorial/LangImpl03.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl03.rst?rev=274441&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl03.rst (added)
+++ llvm/trunk/docs/tutorial/LangImpl03.rst Sat Jul  2 12:01:59 2016
@@ -0,0 +1,567 @@
+========================================
+Kaleidoscope: Code generation to LLVM IR
+========================================
+
+.. contents::
+   :local:
+
+Chapter 3 Introduction
+======================
+
+Welcome to Chapter 3 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. This chapter shows you how to transform
+the `Abstract Syntax Tree <LangImpl2.html>`_, built in Chapter 2, into
+LLVM IR. This will teach you a little bit about how LLVM does things, as
+well as demonstrate how easy it is to use. It's much more work to build
+a lexer and parser than it is to generate LLVM IR code. :)
+
+**Please note**: the code in this chapter and later require LLVM 3.7 or
+later. LLVM 3.6 and before will not work with it. Also note that you
+need to use a version of this tutorial that matches your LLVM release:
+If you are using an official LLVM release, use the version of the
+documentation included with your release or on the `llvm.org releases
+page <http://llvm.org/releases/>`_.
+
+Code Generation Setup
+=====================
+
+In order to generate LLVM IR, we want some simple setup to get started.
+First we define virtual code generation (codegen) methods in each AST
+class:
+
+.. code-block:: c++
+
+    /// ExprAST - Base class for all expression nodes.
+    class ExprAST {
+    public:
+      virtual ~ExprAST() {}
+      virtual Value *codegen() = 0;
+    };
+
+    /// NumberExprAST - Expression class for numeric literals like "1.0".
+    class NumberExprAST : public ExprAST {
+      double Val;
+
+    public:
+      NumberExprAST(double Val) : Val(Val) {}
+      virtual Value *codegen();
+    };
+    ...
+
+The codegen() method says to emit IR for that AST node along with all
+the things it depends on, and they all return an LLVM Value object.
+"Value" is the class used to represent a "`Static Single Assignment
+(SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
+register" or "SSA value" in LLVM. The most distinct aspect of SSA values
+is that their value is computed as the related instruction executes, and
+it does not get a new value until (and if) the instruction re-executes.
+In other words, there is no way to "change" an SSA value. For more
+information, please read up on `Static Single
+Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
+- the concepts are really quite natural once you grok them.
+
+Note that instead of adding virtual methods to the ExprAST class
+hierarchy, it could also make sense to use a `visitor
+pattern <http://en.wikipedia.org/wiki/Visitor_pattern>`_ or some other
+way to model this. Again, this tutorial won't dwell on good software
+engineering practices: for our purposes, adding a virtual method is
+simplest.
+
+The second thing we want is an "LogError" method like we used for the
+parser, which will be used to report errors found during code generation
+(for example, use of an undeclared parameter):
+
+.. code-block:: c++
+
+    static LLVMContext TheContext;
+    static IRBuilder<> Builder(TheContext);
+    static std::unique_ptr<Module> TheModule;
+    static std::map<std::string, Value *> NamedValues;
+
+    Value *LogErrorV(const char *Str) {
+      LogError(Str);
+      return nullptr;
+    }
+
+The static variables will be used during code generation. ``TheContext``
+is an opaque object that owns a lot of core LLVM data structures, such as
+the type and constant value tables. We don't need to understand it in
+detail, we just need a single instance to pass into APIs that require it.
+
+The ``Builder`` object is a helper object that makes it easy to generate
+LLVM instructions. Instances of the
+`IRBuilder <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_
+class template keep track of the current place to insert instructions
+and has methods to create new instructions.
+
+``TheModule`` is an LLVM construct that contains functions and global
+variables. In many ways, it is the top-level structure that the LLVM IR
+uses to contain code. It will own the memory for all of the IR that we
+generate, which is why the codegen() method returns a raw Value\*,
+rather than a unique_ptr<Value>.
+
+The ``NamedValues`` map keeps track of which values are defined in the
+current scope and what their LLVM representation is. (In other words, it
+is a symbol table for the code). In this form of Kaleidoscope, the only
+things that can be referenced are function parameters. As such, function
+parameters will be in this map when generating code for their function
+body.
+
+With these basics in place, we can start talking about how to generate
+code for each expression. Note that this assumes that the ``Builder``
+has been set up to generate code *into* something. For now, we'll assume
+that this has already been done, and we'll just use it to emit code.
+
+Expression Code Generation
+==========================
+
+Generating LLVM code for expression nodes is very straightforward: less
+than 45 lines of commented code for all four of our expression nodes.
+First we'll do numeric literals:
+
+.. code-block:: c++
+
+    Value *NumberExprAST::codegen() {
+      return ConstantFP::get(LLVMContext, APFloat(Val));
+    }
+
+In the LLVM IR, numeric constants are represented with the
+``ConstantFP`` class, which holds the numeric value in an ``APFloat``
+internally (``APFloat`` has the capability of holding floating point
+constants of Arbitrary Precision). This code basically just creates
+and returns a ``ConstantFP``. Note that in the LLVM IR that constants
+are all uniqued together and shared. For this reason, the API uses the
+"foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)".
+
+.. code-block:: c++
+
+    Value *VariableExprAST::codegen() {
+      // Look this variable up in the function.
+      Value *V = NamedValues[Name];
+      if (!V)
+        LogErrorV("Unknown variable name");
+      return V;
+    }
+
+References to variables are also quite simple using LLVM. In the simple
+version of Kaleidoscope, we assume that the variable has already been
+emitted somewhere and its value is available. In practice, the only
+values that can be in the ``NamedValues`` map are function arguments.
+This code simply checks to see that the specified name is in the map (if
+not, an unknown variable is being referenced) and returns the value for
+it. In future chapters, we'll add support for `loop induction
+variables <LangImpl5.html#for-loop-expression>`_ in the symbol table, and for `local
+variables <LangImpl7.html#user-defined-local-variables>`_.
+
+.. code-block:: c++
+
+    Value *BinaryExprAST::codegen() {
+      Value *L = LHS->codegen();
+      Value *R = RHS->codegen();
+      if (!L || !R)
+        return nullptr;
+
+      switch (Op) {
+      case '+':
+        return Builder.CreateFAdd(L, R, "addtmp");
+      case '-':
+        return Builder.CreateFSub(L, R, "subtmp");
+      case '*':
+        return Builder.CreateFMul(L, R, "multmp");
+      case '<':
+        L = Builder.CreateFCmpULT(L, R, "cmptmp");
+        // Convert bool 0/1 to double 0.0 or 1.0
+        return Builder.CreateUIToFP(L, Type::getDoubleTy(LLVMContext),
+                                    "booltmp");
+      default:
+        return LogErrorV("invalid binary operator");
+      }
+    }
+
+Binary operators start to get more interesting. The basic idea here is
+that we recursively emit code for the left-hand side of the expression,
+then the right-hand side, then we compute the result of the binary
+expression. In this code, we do a simple switch on the opcode to create
+the right LLVM instruction.
+
+In the example above, the LLVM builder class is starting to show its
+value. IRBuilder knows where to insert the newly created instruction,
+all you have to do is specify what instruction to create (e.g. with
+``CreateFAdd``), which operands to use (``L`` and ``R`` here) and
+optionally provide a name for the generated instruction.
+
+One nice thing about LLVM is that the name is just a hint. For instance,
+if the code above emits multiple "addtmp" variables, LLVM will
+automatically provide each one with an increasing, unique numeric
+suffix. Local value names for instructions are purely optional, but it
+makes it much easier to read the IR dumps.
+
+`LLVM instructions <../LangRef.html#instruction-reference>`_ are constrained by strict
+rules: for example, the Left and Right operators of an `add
+instruction <../LangRef.html#add-instruction>`_ must have the same type, and the
+result type of the add must match the operand types. Because all values
+in Kaleidoscope are doubles, this makes for very simple code for add,
+sub and mul.
+
+On the other hand, LLVM specifies that the `fcmp
+instruction <../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a
+one bit integer). The problem with this is that Kaleidoscope wants the
+value to be a 0.0 or 1.0 value. In order to get these semantics, we
+combine the fcmp instruction with a `uitofp
+instruction <../LangRef.html#uitofp-to-instruction>`_. This instruction converts its
+input integer into a floating point value by treating the input as an
+unsigned value. In contrast, if we used the `sitofp
+instruction <../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator
+would return 0.0 and -1.0, depending on the input value.
+
+.. code-block:: c++
+
+    Value *CallExprAST::codegen() {
+      // Look up the name in the global module table.
+      Function *CalleeF = TheModule->getFunction(Callee);
+      if (!CalleeF)
+        return LogErrorV("Unknown function referenced");
+
+      // If argument mismatch error.
+      if (CalleeF->arg_size() != Args.size())
+        return LogErrorV("Incorrect # arguments passed");
+
+      std::vector<Value *> ArgsV;
+      for (unsigned i = 0, e = Args.size(); i != e; ++i) {
+        ArgsV.push_back(Args[i]->codegen());
+        if (!ArgsV.back())
+          return nullptr;
+      }
+
+      return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
+    }
+
+Code generation for function calls is quite straightforward with LLVM. The code
+above initially does a function name lookup in the LLVM Module's symbol table.
+Recall that the LLVM Module is the container that holds the functions we are
+JIT'ing. By giving each function the same name as what the user specifies, we
+can use the LLVM symbol table to resolve function names for us.
+
+Once we have the function to call, we recursively codegen each argument
+that is to be passed in, and create an LLVM `call
+instruction <../LangRef.html#call-instruction>`_. Note that LLVM uses the native C
+calling conventions by default, allowing these calls to also call into
+standard library functions like "sin" and "cos", with no additional
+effort.
+
+This wraps up our handling of the four basic expressions that we have so
+far in Kaleidoscope. Feel free to go in and add some more. For example,
+by browsing the `LLVM language reference <../LangRef.html>`_ you'll find
+several other interesting instructions that are really easy to plug into
+our basic framework.
+
+Function Code Generation
+========================
+
+Code generation for prototypes and functions must handle a number of
+details, which make their code less beautiful than expression code
+generation, but allows us to illustrate some important points. First,
+lets talk about code generation for prototypes: they are used both for
+function bodies and external function declarations. The code starts
+with:
+
+.. code-block:: c++
+
+    Function *PrototypeAST::codegen() {
+      // Make the function type:  double(double,double) etc.
+      std::vector<Type*> Doubles(Args.size(),
+                                 Type::getDoubleTy(LLVMContext));
+      FunctionType *FT =
+        FunctionType::get(Type::getDoubleTy(LLVMContext), Doubles, false);
+
+      Function *F =
+        Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
+
+This code packs a lot of power into a few lines. Note first that this
+function returns a "Function\*" instead of a "Value\*". Because a
+"prototype" really talks about the external interface for a function
+(not the value computed by an expression), it makes sense for it to
+return the LLVM Function it corresponds to when codegen'd.
+
+The call to ``FunctionType::get`` creates the ``FunctionType`` that
+should be used for a given Prototype. Since all function arguments in
+Kaleidoscope are of type double, the first line creates a vector of "N"
+LLVM double types. It then uses the ``Functiontype::get`` method to
+create a function type that takes "N" doubles as arguments, returns one
+double as a result, and that is not vararg (the false parameter
+indicates this). Note that Types in LLVM are uniqued just like Constants
+are, so you don't "new" a type, you "get" it.
+
+The final line above actually creates the IR Function corresponding to
+the Prototype. This indicates the type, linkage and name to use, as
+well as which module to insert into. "`external
+linkage <../LangRef.html#linkage>`_" means that the function may be
+defined outside the current module and/or that it is callable by
+functions outside the module. The Name passed in is the name the user
+specified: since "``TheModule``" is specified, this name is registered
+in "``TheModule``"s symbol table.
+
+.. code-block:: c++
+
+  // Set names for all arguments.
+  unsigned Idx = 0;
+  for (auto &Arg : F->args())
+    Arg.setName(Args[Idx++]);
+
+  return F;
+
+Finally, we set the name of each of the function's arguments according to the
+names given in the Prototype. This step isn't strictly necessary, but keeping
+the names consistent makes the IR more readable, and allows subsequent code to
+refer directly to the arguments for their names, rather than having to look up
+them up in the Prototype AST.
+
+At this point we have a function prototype with no body. This is how LLVM IR
+represents function declarations. For extern statements in Kaleidoscope, this
+is as far as we need to go. For function definitions however, we need to
+codegen and attach a function body.
+
+.. code-block:: c++
+
+  Function *FunctionAST::codegen() {
+      // First, check for an existing function from a previous 'extern' declaration.
+    Function *TheFunction = TheModule->getFunction(Proto->getName());
+
+    if (!TheFunction)
+      TheFunction = Proto->codegen();
+
+    if (!TheFunction)
+      return nullptr;
+
+    if (!TheFunction->empty())
+      return (Function*)LogErrorV("Function cannot be redefined.");
+
+
+For function definitions, we start by searching TheModule's symbol table for an
+existing version of this function, in case one has already been created using an
+'extern' statement. If Module::getFunction returns null then no previous version
+exists, so we'll codegen one from the Prototype. In either case, we want to
+assert that the function is empty (i.e. has no body yet) before we start.
+
+.. code-block:: c++
+
+  // Create a new basic block to start insertion into.
+  BasicBlock *BB = BasicBlock::Create(LLVMContext, "entry", TheFunction);
+  Builder.SetInsertPoint(BB);
+
+  // Record the function arguments in the NamedValues map.
+  NamedValues.clear();
+  for (auto &Arg : TheFunction->args())
+    NamedValues[Arg.getName()] = &Arg;
+
+Now we get to the point where the ``Builder`` is set up. The first line
+creates a new `basic block <http://en.wikipedia.org/wiki/Basic_block>`_
+(named "entry"), which is inserted into ``TheFunction``. The second line
+then tells the builder that new instructions should be inserted into the
+end of the new basic block. Basic blocks in LLVM are an important part
+of functions that define the `Control Flow
+Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we
+don't have any control flow, our functions will only contain one block
+at this point. We'll fix this in `Chapter 5 <LangImpl5.html>`_ :).
+
+Next we add the function arguments to the NamedValues map (after first clearing
+it out) so that they're accessible to ``VariableExprAST`` nodes.
+
+.. code-block:: c++
+
+      if (Value *RetVal = Body->codegen()) {
+        // Finish off the function.
+        Builder.CreateRet(RetVal);
+
+        // Validate the generated code, checking for consistency.
+        verifyFunction(*TheFunction);
+
+        return TheFunction;
+      }
+
+Once the insertion point has been set up and the NamedValues map populated,
+we call the ``codegen()`` method for the root expression of the function. If no
+error happens, this emits code to compute the expression into the entry block
+and returns the value that was computed. Assuming no error, we then create an
+LLVM `ret instruction <../LangRef.html#ret-instruction>`_, which completes the function.
+Once the function is built, we call ``verifyFunction``, which is
+provided by LLVM. This function does a variety of consistency checks on
+the generated code, to determine if our compiler is doing everything
+right. Using this is important: it can catch a lot of bugs. Once the
+function is finished and validated, we return it.
+
+.. code-block:: c++
+
+      // Error reading body, remove function.
+      TheFunction->eraseFromParent();
+      return nullptr;
+    }
+
+The only piece left here is handling of the error case. For simplicity,
+we handle this by merely deleting the function we produced with the
+``eraseFromParent`` method. This allows the user to redefine a function
+that they incorrectly typed in before: if we didn't delete it, it would
+live in the symbol table, with a body, preventing future redefinition.
+
+This code does have a bug, though: If the ``FunctionAST::codegen()`` method
+finds an existing IR Function, it does not validate its signature against the
+definition's own prototype. This means that an earlier 'extern' declaration will
+take precedence over the function definition's signature, which can cause
+codegen to fail, for instance if the function arguments are named differently.
+There are a number of ways to fix this bug, see what you can come up with! Here
+is a testcase:
+
+::
+
+    extern foo(a);     # ok, defines foo.
+    def foo(b) b;      # Error: Unknown variable name. (decl using 'a' takes precedence).
+
+Driver Changes and Closing Thoughts
+===================================
+
+For now, code generation to LLVM doesn't really get us much, except that
+we can look at the pretty IR calls. The sample code inserts calls to
+codegen into the "``HandleDefinition``", "``HandleExtern``" etc
+functions, and then dumps out the LLVM IR. This gives a nice way to look
+at the LLVM IR for simple functions. For example:
+
+::
+
+    ready> 4+5;
+    Read top-level expression:
+    define double @0() {
+    entry:
+      ret double 9.000000e+00
+    }
+
+Note how the parser turns the top-level expression into anonymous
+functions for us. This will be handy when we add `JIT
+support <LangImpl4.html#adding-a-jit-compiler>`_ in the next chapter. Also note that the
+code is very literally transcribed, no optimizations are being performed
+except simple constant folding done by IRBuilder. We will `add
+optimizations <LangImpl4.html#trivial-constant-folding>`_ explicitly in the next
+chapter.
+
+::
+
+    ready> def foo(a b) a*a + 2*a*b + b*b;
+    Read function definition:
+    define double @foo(double %a, double %b) {
+    entry:
+      %multmp = fmul double %a, %a
+      %multmp1 = fmul double 2.000000e+00, %a
+      %multmp2 = fmul double %multmp1, %b
+      %addtmp = fadd double %multmp, %multmp2
+      %multmp3 = fmul double %b, %b
+      %addtmp4 = fadd double %addtmp, %multmp3
+      ret double %addtmp4
+    }
+
+This shows some simple arithmetic. Notice the striking similarity to the
+LLVM builder calls that we use to create the instructions.
+
+::
+
+    ready> def bar(a) foo(a, 4.0) + bar(31337);
+    Read function definition:
+    define double @bar(double %a) {
+    entry:
+      %calltmp = call double @foo(double %a, double 4.000000e+00)
+      %calltmp1 = call double @bar(double 3.133700e+04)
+      %addtmp = fadd double %calltmp, %calltmp1
+      ret double %addtmp
+    }
+
+This shows some function calls. Note that this function will take a long
+time to execute if you call it. In the future we'll add conditional
+control flow to actually make recursion useful :).
+
+::
+
+    ready> extern cos(x);
+    Read extern:
+    declare double @cos(double)
+
+    ready> cos(1.234);
+    Read top-level expression:
+    define double @1() {
+    entry:
+      %calltmp = call double @cos(double 1.234000e+00)
+      ret double %calltmp
+    }
+
+This shows an extern for the libm "cos" function, and a call to it.
+
+.. TODO:: Abandon Pygments' horrible `llvm` lexer. It just totally gives up
+   on highlighting this due to the first line.
+
+::
+
+    ready> ^D
+    ; ModuleID = 'my cool jit'
+
+    define double @0() {
+    entry:
+      %addtmp = fadd double 4.000000e+00, 5.000000e+00
+      ret double %addtmp
+    }
+
+    define double @foo(double %a, double %b) {
+    entry:
+      %multmp = fmul double %a, %a
+      %multmp1 = fmul double 2.000000e+00, %a
+      %multmp2 = fmul double %multmp1, %b
+      %addtmp = fadd double %multmp, %multmp2
+      %multmp3 = fmul double %b, %b
+      %addtmp4 = fadd double %addtmp, %multmp3
+      ret double %addtmp4
+    }
+
+    define double @bar(double %a) {
+    entry:
+      %calltmp = call double @foo(double %a, double 4.000000e+00)
+      %calltmp1 = call double @bar(double 3.133700e+04)
+      %addtmp = fadd double %calltmp, %calltmp1
+      ret double %addtmp
+    }
+
+    declare double @cos(double)
+
+    define double @1() {
+    entry:
+      %calltmp = call double @cos(double 1.234000e+00)
+      ret double %calltmp
+    }
+
+When you quit the current demo, it dumps out the IR for the entire
+module generated. Here you can see the big picture with all the
+functions referencing each other.
+
+This wraps up the third chapter of the Kaleidoscope tutorial. Up next,
+we'll describe how to `add JIT codegen and optimizer
+support <LangImpl4.html>`_ to this so we can actually start running
+code!
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example, enhanced with
+the LLVM code generator. Because this uses the LLVM libraries, we need
+to link them in. To do this, we use the
+`llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform
+our makefile/command line about which options to use:
+
+.. code-block:: bash
+
+    # Compile
+    clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core` -o toy
+    # Run
+    ./toy
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/Chapter3/toy.cpp
+   :language: c++
+
+`Next: Adding JIT and Optimizer Support <LangImpl04.html>`_
+

Added: llvm/trunk/docs/tutorial/LangImpl04.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl04.rst?rev=274441&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl04.rst (added)
+++ llvm/trunk/docs/tutorial/LangImpl04.rst Sat Jul  2 12:01:59 2016
@@ -0,0 +1,610 @@
+==============================================
+Kaleidoscope: Adding JIT and Optimizer Support
+==============================================
+
+.. contents::
+   :local:
+
+Chapter 4 Introduction
+======================
+
+Welcome to Chapter 4 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
+of a simple language and added support for generating LLVM IR. This
+chapter describes two new techniques: adding optimizer support to your
+language, and adding JIT compiler support. These additions will
+demonstrate how to get nice, efficient code for the Kaleidoscope
+language.
+
+Trivial Constant Folding
+========================
+
+Our demonstration for Chapter 3 is elegant and easy to extend.
+Unfortunately, it does not produce wonderful code. The IRBuilder,
+however, does give us obvious optimizations when compiling simple code:
+
+::
+
+    ready> def test(x) 1+2+x;
+    Read function definition:
+    define double @test(double %x) {
+    entry:
+            %addtmp = fadd double 3.000000e+00, %x
+            ret double %addtmp
+    }
+
+This code is not a literal transcription of the AST built by parsing the
+input. That would be:
+
+::
+
+    ready> def test(x) 1+2+x;
+    Read function definition:
+    define double @test(double %x) {
+    entry:
+            %addtmp = fadd double 2.000000e+00, 1.000000e+00
+            %addtmp1 = fadd double %addtmp, %x
+            ret double %addtmp1
+    }
+
+Constant folding, as seen above, in particular, is a very common and
+very important optimization: so much so that many language implementors
+implement constant folding support in their AST representation.
+
+With LLVM, you don't need this support in the AST. Since all calls to
+build LLVM IR go through the LLVM IR builder, the builder itself checked
+to see if there was a constant folding opportunity when you call it. If
+so, it just does the constant fold and return the constant instead of
+creating an instruction.
+
+Well, that was easy :). In practice, we recommend always using
+``IRBuilder`` when generating code like this. It has no "syntactic
+overhead" for its use (you don't have to uglify your compiler with
+constant checks everywhere) and it can dramatically reduce the amount of
+LLVM IR that is generated in some cases (particular for languages with a
+macro preprocessor or that use a lot of constants).
+
+On the other hand, the ``IRBuilder`` is limited by the fact that it does
+all of its analysis inline with the code as it is built. If you take a
+slightly more complex example:
+
+::
+
+    ready> def test(x) (1+2+x)*(x+(1+2));
+    ready> Read function definition:
+    define double @test(double %x) {
+    entry:
+            %addtmp = fadd double 3.000000e+00, %x
+            %addtmp1 = fadd double %x, 3.000000e+00
+            %multmp = fmul double %addtmp, %addtmp1
+            ret double %multmp
+    }
+
+In this case, the LHS and RHS of the multiplication are the same value.
+We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
+instead of computing "``x+3``" twice.
+
+Unfortunately, no amount of local analysis will be able to detect and
+correct this. This requires two transformations: reassociation of
+expressions (to make the add's lexically identical) and Common
+Subexpression Elimination (CSE) to delete the redundant add instruction.
+Fortunately, LLVM provides a broad range of optimizations that you can
+use, in the form of "passes".
+
+LLVM Optimization Passes
+========================
+
+LLVM provides many optimization passes, which do many different sorts of
+things and have different tradeoffs. Unlike other systems, LLVM doesn't
+hold to the mistaken notion that one set of optimizations is right for
+all languages and for all situations. LLVM allows a compiler implementor
+to make complete decisions about what optimizations to use, in which
+order, and in what situation.
+
+As a concrete example, LLVM supports both "whole module" passes, which
+look across as large of body of code as they can (often a whole file,
+but if run at link time, this can be a substantial portion of the whole
+program). It also supports and includes "per-function" passes which just
+operate on a single function at a time, without looking at other
+functions. For more information on passes and how they are run, see the
+`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the
+`List of LLVM Passes <../Passes.html>`_.
+
+For Kaleidoscope, we are currently generating functions on the fly, one
+at a time, as the user types them in. We aren't shooting for the
+ultimate optimization experience in this setting, but we also want to
+catch the easy and quick stuff where possible. As such, we will choose
+to run a few per-function optimizations as the user types the function
+in. If we wanted to make a "static Kaleidoscope compiler", we would use
+exactly the code we have now, except that we would defer running the
+optimizer until the entire file has been parsed.
+
+In order to get per-function optimizations going, we need to set up a
+`FunctionPassManager <../WritingAnLLVMPass.html#what-passmanager-doesr>`_ to hold
+and organize the LLVM optimizations that we want to run. Once we have
+that, we can add a set of optimizations to run. We'll need a new
+FunctionPassManager for each module that we want to optimize, so we'll
+write a function to create and initialize both the module and pass manager
+for us:
+
+.. code-block:: c++
+
+    void InitializeModuleAndPassManager(void) {
+      // Open a new module.
+      Context LLVMContext;
+      TheModule = llvm::make_unique<Module>("my cool jit", LLVMContext);
+      TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());
+
+      // Create a new pass manager attached to it.
+      TheFPM = llvm::make_unique<FunctionPassManager>(TheModule.get());
+
+      // Provide basic AliasAnalysis support for GVN.
+      TheFPM.add(createBasicAliasAnalysisPass());
+      // Do simple "peephole" optimizations and bit-twiddling optzns.
+      TheFPM.add(createInstructionCombiningPass());
+      // Reassociate expressions.
+      TheFPM.add(createReassociatePass());
+      // Eliminate Common SubExpressions.
+      TheFPM.add(createGVNPass());
+      // Simplify the control flow graph (deleting unreachable blocks, etc).
+      TheFPM.add(createCFGSimplificationPass());
+
+      TheFPM.doInitialization();
+    }
+
+This code initializes the global module ``TheModule``, and the function pass
+manager ``TheFPM``, which is attached to ``TheModule``. Once the pass manager is
+set up, we use a series of "add" calls to add a bunch of LLVM passes.
+
+In this case, we choose to add five passes: one analysis pass (alias analysis),
+and four optimization passes. The passes we choose here are a pretty standard set
+of "cleanup" optimizations that are useful for a wide variety of code. I won't
+delve into what they do but, believe me, they are a good starting place :).
+
+Once the PassManager is set up, we need to make use of it. We do this by
+running it after our newly created function is constructed (in
+``FunctionAST::codegen()``), but before it is returned to the client:
+
+.. code-block:: c++
+
+      if (Value *RetVal = Body->codegen()) {
+        // Finish off the function.
+        Builder.CreateRet(RetVal);
+
+        // Validate the generated code, checking for consistency.
+        verifyFunction(*TheFunction);
+
+        // Optimize the function.
+        TheFPM->run(*TheFunction);
+
+        return TheFunction;
+      }
+
+As you can see, this is pretty straightforward. The
+``FunctionPassManager`` optimizes and updates the LLVM Function\* in
+place, improving (hopefully) its body. With this in place, we can try
+our test above again:
+
+::
+
+    ready> def test(x) (1+2+x)*(x+(1+2));
+    ready> Read function definition:
+    define double @test(double %x) {
+    entry:
+            %addtmp = fadd double %x, 3.000000e+00
+            %multmp = fmul double %addtmp, %addtmp
+            ret double %multmp
+    }
+
+As expected, we now get our nicely optimized code, saving a floating
+point add instruction from every execution of this function.
+
+LLVM provides a wide variety of optimizations that can be used in
+certain circumstances. Some `documentation about the various
+passes <../Passes.html>`_ is available, but it isn't very complete.
+Another good source of ideas can come from looking at the passes that
+``Clang`` runs to get started. The "``opt``" tool allows you to
+experiment with passes from the command line, so you can see if they do
+anything.
+
+Now that we have reasonable code coming out of our front-end, lets talk
+about executing it!
+
+Adding a JIT Compiler
+=====================
+
+Code that is available in LLVM IR can have a wide variety of tools
+applied to it. For example, you can run optimizations on it (as we did
+above), you can dump it out in textual or binary forms, you can compile
+the code to an assembly file (.s) for some target, or you can JIT
+compile it. The nice thing about the LLVM IR representation is that it
+is the "common currency" between many different parts of the compiler.
+
+In this section, we'll add JIT compiler support to our interpreter. The
+basic idea that we want for Kaleidoscope is to have the user enter
+function bodies as they do now, but immediately evaluate the top-level
+expressions they type in. For example, if they type in "1 + 2;", we
+should evaluate and print out 3. If they define a function, they should
+be able to call it from the command line.
+
+In order to do this, we first declare and initialize the JIT. This is
+done by adding a global variable ``TheJIT``, and initializing it in
+``main``:
+
+.. code-block:: c++
+
+    static std::unique_ptr<KaleidoscopeJIT> TheJIT;
+    ...
+    int main() {
+      ..
+      TheJIT = llvm::make_unique<KaleidoscopeJIT>();
+
+      // Run the main "interpreter loop" now.
+      MainLoop();
+
+      return 0;
+    }
+
+The KaleidoscopeJIT class is a simple JIT built specifically for these
+tutorials. In later chapters we will look at how it works and extend it with
+new features, but for now we will take it as given. Its API is very simple::
+``addModule`` adds an LLVM IR module to the JIT, making its functions
+available for execution; ``removeModule`` removes a module, freeing any
+memory associated with the code in that module; and ``findSymbol`` allows us
+to look up pointers to the compiled code.
+
+We can take this simple API and change our code that parses top-level expressions to
+look like this:
+
+.. code-block:: c++
+
+    static void HandleTopLevelExpression() {
+      // Evaluate a top-level expression into an anonymous function.
+      if (auto FnAST = ParseTopLevelExpr()) {
+        if (FnAST->codegen()) {
+
+          // JIT the module containing the anonymous expression, keeping a handle so
+          // we can free it later.
+          auto H = TheJIT->addModule(std::move(TheModule));
+          InitializeModuleAndPassManager();
+
+          // Search the JIT for the __anon_expr symbol.
+          auto ExprSymbol = TheJIT->findSymbol("__anon_expr");
+          assert(ExprSymbol && "Function not found");
+
+          // Get the symbol's address and cast it to the right type (takes no
+          // arguments, returns a double) so we can call it as a native function.
+          double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
+          fprintf(stderr, "Evaluated to %f\n", FP());
+
+          // Delete the anonymous expression module from the JIT.
+          TheJIT->removeModule(H);
+        }
+
+If parsing and codegen succeeed, the next step is to add the module containing
+the top-level expression to the JIT. We do this by calling addModule, which
+triggers code generation for all the functions in the module, and returns a
+handle that can be used to remove the module from the JIT later. Once the module
+has been added to the JIT it can no longer be modified, so we also open a new
+module to hold subsequent code by calling ``InitializeModuleAndPassManager()``.
+
+Once we've added the module to the JIT we need to get a pointer to the final
+generated code. We do this by calling the JIT's findSymbol method, and passing
+the name of the top-level expression function: ``__anon_expr``. Since we just
+added this function, we assert that findSymbol returned a result.
+
+Next, we get the in-memory address of the ``__anon_expr`` function by calling
+``getAddress()`` on the symbol. Recall that we compile top-level expressions
+into a self-contained LLVM function that takes no arguments and returns the
+computed double. Because the LLVM JIT compiler matches the native platform ABI,
+this means that you can just cast the result pointer to a function pointer of
+that type and call it directly. This means, there is no difference between JIT
+compiled code and native machine code that is statically linked into your
+application.
+
+Finally, since we don't support re-evaluation of top-level expressions, we
+remove the module from the JIT when we're done to free the associated memory.
+Recall, however, that the module we created a few lines earlier (via
+``InitializeModuleAndPassManager``) is still open and waiting for new code to be
+added.
+
+With just these two changes, lets see how Kaleidoscope works now!
+
+::
+
+    ready> 4+5;
+    Read top-level expression:
+    define double @0() {
+    entry:
+      ret double 9.000000e+00
+    }
+
+    Evaluated to 9.000000
+
+Well this looks like it is basically working. The dump of the function
+shows the "no argument function that always returns double" that we
+synthesize for each top-level expression that is typed in. This
+demonstrates very basic functionality, but can we do more?
+
+::
+
+    ready> def testfunc(x y) x + y*2;
+    Read function definition:
+    define double @testfunc(double %x, double %y) {
+    entry:
+      %multmp = fmul double %y, 2.000000e+00
+      %addtmp = fadd double %multmp, %x
+      ret double %addtmp
+    }
+
+    ready> testfunc(4, 10);
+    Read top-level expression:
+    define double @1() {
+    entry:
+      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
+      ret double %calltmp
+    }
+
+    Evaluated to 24.000000
+
+    ready> testfunc(5, 10);
+    ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!
+
+
+Function definitions and calls also work, but something went very wrong on that
+last line. The call looks valid, so what happened? As you may have guessed from
+the the API a Module is a unit of allocation for the JIT, and testfunc was part
+of the same module that contained anonymous expression. When we removed that
+module from the JIT to free the memory for the anonymous expression, we deleted
+the definition of ``testfunc`` along with it. Then, when we tried to call
+testfunc a second time, the JIT could no longer find it.
+
+The easiest way to fix this is to put the anonymous expression in a separate
+module from the rest of the function definitions. The JIT will happily resolve
+function calls across module boundaries, as long as each of the functions called
+has a prototype, and is added to the JIT before it is called. By putting the
+anonymous expression in a different module we can delete it without affecting
+the rest of the functions.
+
+In fact, we're going to go a step further and put every function in its own
+module. Doing so allows us to exploit a useful property of the KaleidoscopeJIT
+that will make our environment more REPL-like: Functions can be added to the
+JIT more than once (unlike a module where every function must have a unique
+definition). When you look up a symbol in KaleidoscopeJIT it will always return
+the most recent definition:
+
+::
+
+    ready> def foo(x) x + 1;
+    Read function definition:
+    define double @foo(double %x) {
+    entry:
+      %addtmp = fadd double %x, 1.000000e+00
+      ret double %addtmp
+    }
+
+    ready> foo(2);
+    Evaluated to 3.000000
+
+    ready> def foo(x) x + 2;
+    define double @foo(double %x) {
+    entry:
+      %addtmp = fadd double %x, 2.000000e+00
+      ret double %addtmp
+    }
+
+    ready> foo(2);
+    Evaluated to 4.000000
+
+
+To allow each function to live in its own module we'll need a way to
+re-generate previous function declarations into each new module we open:
+
+.. code-block:: c++
+
+    static std::unique_ptr<KaleidoscopeJIT> TheJIT;
+
+    ...
+
+    Function *getFunction(std::string Name) {
+      // First, see if the function has already been added to the current module.
+      if (auto *F = TheModule->getFunction(Name))
+        return F;
+
+      // If not, check whether we can codegen the declaration from some existing
+      // prototype.
+      auto FI = FunctionProtos.find(Name);
+      if (FI != FunctionProtos.end())
+        return FI->second->codegen();
+
+      // If no existing prototype exists, return null.
+      return nullptr;
+    }
+
+    ...
+
+    Value *CallExprAST::codegen() {
+      // Look up the name in the global module table.
+      Function *CalleeF = getFunction(Callee);
+
+    ...
+
+    Function *FunctionAST::codegen() {
+      // Transfer ownership of the prototype to the FunctionProtos map, but keep a
+      // reference to it for use below.
+      auto &P = *Proto;
+      FunctionProtos[Proto->getName()] = std::move(Proto);
+      Function *TheFunction = getFunction(P.getName());
+      if (!TheFunction)
+        return nullptr;
+
+
+To enable this, we'll start by adding a new global, ``FunctionProtos``, that
+holds the most recent prototype for each function. We'll also add a convenience
+method, ``getFunction()``, to replace calls to ``TheModule->getFunction()``.
+Our convenience method searches ``TheModule`` for an existing function
+declaration, falling back to generating a new declaration from FunctionProtos if
+it doesn't find one. In ``CallExprAST::codegen()`` we just need to replace the
+call to ``TheModule->getFunction()``. In ``FunctionAST::codegen()`` we need to
+update the FunctionProtos map first, then call ``getFunction()``. With this
+done, we can always obtain a function declaration in the current module for any
+previously declared function.
+
+We also need to update HandleDefinition and HandleExtern:
+
+.. code-block:: c++
+
+    static void HandleDefinition() {
+      if (auto FnAST = ParseDefinition()) {
+        if (auto *FnIR = FnAST->codegen()) {
+          fprintf(stderr, "Read function definition:");
+          FnIR->dump();
+          TheJIT->addModule(std::move(TheModule));
+          InitializeModuleAndPassManager();
+        }
+      } else {
+        // Skip token for error recovery.
+         getNextToken();
+      }
+    }
+
+    static void HandleExtern() {
+      if (auto ProtoAST = ParseExtern()) {
+        if (auto *FnIR = ProtoAST->codegen()) {
+          fprintf(stderr, "Read extern: ");
+          FnIR->dump();
+          FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);
+        }
+      } else {
+        // Skip token for error recovery.
+        getNextToken();
+      }
+    }
+
+In HandleDefinition, we add two lines to transfer the newly defined function to
+the JIT and open a new module. In HandleExtern, we just need to add one line to
+add the prototype to FunctionProtos.
+
+With these changes made, lets try our REPL again (I removed the dump of the
+anonymous functions this time, you should get the idea by now :) :
+
+::
+
+    ready> def foo(x) x + 1;
+    ready> foo(2);
+    Evaluated to 3.000000
+
+    ready> def foo(x) x + 2;
+    ready> foo(2);
+    Evaluated to 4.000000
+
+It works!
+
+Even with this simple code, we get some surprisingly powerful capabilities -
+check this out:
+
+::
+
+    ready> extern sin(x);
+    Read extern:
+    declare double @sin(double)
+
+    ready> extern cos(x);
+    Read extern:
+    declare double @cos(double)
+
+    ready> sin(1.0);
+    Read top-level expression:
+    define double @2() {
+    entry:
+      ret double 0x3FEAED548F090CEE
+    }
+
+    Evaluated to 0.841471
+
+    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
+    Read function definition:
+    define double @foo(double %x) {
+    entry:
+      %calltmp = call double @sin(double %x)
+      %multmp = fmul double %calltmp, %calltmp
+      %calltmp2 = call double @cos(double %x)
+      %multmp4 = fmul double %calltmp2, %calltmp2
+      %addtmp = fadd double %multmp, %multmp4
+      ret double %addtmp
+    }
+
+    ready> foo(4.0);
+    Read top-level expression:
+    define double @3() {
+    entry:
+      %calltmp = call double @foo(double 4.000000e+00)
+      ret double %calltmp
+    }
+
+    Evaluated to 1.000000
+
+Whoa, how does the JIT know about sin and cos? The answer is surprisingly
+simple: The KaleidoscopeJIT has a straightforward symbol resolution rule that
+it uses to find symbols that aren't available in any given module: First
+it searches all the modules that have already been added to the JIT, from the
+most recent to the oldest, to find the newest definition. If no definition is
+found inside the JIT, it falls back to calling "``dlsym("sin")``" on the
+Kaleidoscope process itself. Since "``sin``" is defined within the JIT's
+address space, it simply patches up calls in the module to call the libm
+version of ``sin`` directly.
+
+In the future we'll see how tweaking this symbol resolution rule can be used to
+enable all sorts of useful features, from security (restricting the set of
+symbols available to JIT'd code), to dynamic code generation based on symbol
+names, and even lazy compilation.
+
+One immediate benefit of the symbol resolution rule is that we can now extend
+the language by writing arbitrary C++ code to implement operations. For example,
+if we add:
+
+.. code-block:: c++
+
+    /// putchard - putchar that takes a double and returns 0.
+    extern "C" double putchard(double X) {
+      fputc((char)X, stderr);
+      return 0;
+    }
+
+Now we can produce simple output to the console by using things like:
+"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
+on the console (120 is the ASCII code for 'x'). Similar code could be
+used to implement file I/O, console input, and many other capabilities
+in Kaleidoscope.
+
+This completes the JIT and optimizer chapter of the Kaleidoscope
+tutorial. At this point, we can compile a non-Turing-complete
+programming language, optimize and JIT compile it in a user-driven way.
+Next up we'll look into `extending the language with control flow
+constructs <LangImpl5.html>`_, tackling some interesting LLVM IR issues
+along the way.
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example, enhanced with
+the LLVM JIT and optimizer. To build this example, use:
+
+.. code-block:: bash
+
+    # Compile
+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
+    # Run
+    ./toy
+
+If you are compiling this on Linux, make sure to add the "-rdynamic"
+option as well. This makes sure that the external functions are resolved
+properly at runtime.
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/Chapter4/toy.cpp
+   :language: c++
+
+`Next: Extending the language: control flow <LangImpl05.html>`_
+

Added: llvm/trunk/docs/tutorial/LangImpl05-cfg.png
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl05-cfg.png?rev=274441&view=auto
==============================================================================
Binary file - no diff available.

Propchange: llvm/trunk/docs/tutorial/LangImpl05-cfg.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: llvm/trunk/docs/tutorial/LangImpl05.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl05.rst?rev=274441&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl05.rst (added)
+++ llvm/trunk/docs/tutorial/LangImpl05.rst Sat Jul  2 12:01:59 2016
@@ -0,0 +1,790 @@
+==================================================
+Kaleidoscope: Extending the Language: Control Flow
+==================================================
+
+.. contents::
+   :local:
+
+Chapter 5 Introduction
+======================
+
+Welcome to Chapter 5 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. Parts 1-4 described the implementation of
+the simple Kaleidoscope language and included support for generating
+LLVM IR, followed by optimizations and a JIT compiler. Unfortunately, as
+presented, Kaleidoscope is mostly useless: it has no control flow other
+than call and return. This means that you can't have conditional
+branches in the code, significantly limiting its power. In this episode
+of "build that compiler", we'll extend Kaleidoscope to have an
+if/then/else expression plus a simple 'for' loop.
+
+If/Then/Else
+============
+
+Extending Kaleidoscope to support if/then/else is quite straightforward.
+It basically requires adding support for this "new" concept to the
+lexer, parser, AST, and LLVM code emitter. This example is nice, because
+it shows how easy it is to "grow" a language over time, incrementally
+extending it as new ideas are discovered.
+
+Before we get going on "how" we add this extension, lets talk about
+"what" we want. The basic idea is that we want to be able to write this
+sort of thing:
+
+::
+
+    def fib(x)
+      if x < 3 then
+        1
+      else
+        fib(x-1)+fib(x-2);
+
+In Kaleidoscope, every construct is an expression: there are no
+statements. As such, the if/then/else expression needs to return a value
+like any other. Since we're using a mostly functional form, we'll have
+it evaluate its conditional, then return the 'then' or 'else' value
+based on how the condition was resolved. This is very similar to the C
+"?:" expression.
+
+The semantics of the if/then/else expression is that it evaluates the
+condition to a boolean equality value: 0.0 is considered to be false and
+everything else is considered to be true. If the condition is true, the
+first subexpression is evaluated and returned, if the condition is
+false, the second subexpression is evaluated and returned. Since
+Kaleidoscope allows side-effects, this behavior is important to nail
+down.
+
+Now that we know what we "want", lets break this down into its
+constituent pieces.
+
+Lexer Extensions for If/Then/Else
+---------------------------------
+
+The lexer extensions are straightforward. First we add new enum values
+for the relevant tokens:
+
+.. code-block:: c++
+
+      // control
+      tok_if = -6,
+      tok_then = -7,
+      tok_else = -8,
+
+Once we have that, we recognize the new keywords in the lexer. This is
+pretty simple stuff:
+
+.. code-block:: c++
+
+        ...
+        if (IdentifierStr == "def")
+          return tok_def;
+        if (IdentifierStr == "extern")
+          return tok_extern;
+        if (IdentifierStr == "if")
+          return tok_if;
+        if (IdentifierStr == "then")
+          return tok_then;
+        if (IdentifierStr == "else")
+          return tok_else;
+        return tok_identifier;
+
+AST Extensions for If/Then/Else
+-------------------------------
+
+To represent the new expression we add a new AST node for it:
+
+.. code-block:: c++
+
+    /// IfExprAST - Expression class for if/then/else.
+    class IfExprAST : public ExprAST {
+      std::unique_ptr<ExprAST> Cond, Then, Else;
+
+    public:
+      IfExprAST(std::unique_ptr<ExprAST> Cond, std::unique_ptr<ExprAST> Then,
+                std::unique_ptr<ExprAST> Else)
+        : Cond(std::move(Cond)), Then(std::move(Then)), Else(std::move(Else)) {}
+      virtual Value *codegen();
+    };
+
+The AST node just has pointers to the various subexpressions.
+
+Parser Extensions for If/Then/Else
+----------------------------------
+
+Now that we have the relevant tokens coming from the lexer and we have
+the AST node to build, our parsing logic is relatively straightforward.
+First we define a new parsing function:
+
+.. code-block:: c++
+
+    /// ifexpr ::= 'if' expression 'then' expression 'else' expression
+    static std::unique_ptr<ExprAST> ParseIfExpr() {
+      getNextToken();  // eat the if.
+
+      // condition.
+      auto Cond = ParseExpression();
+      if (!Cond)
+        return nullptr;
+
+      if (CurTok != tok_then)
+        return LogError("expected then");
+      getNextToken();  // eat the then
+
+      auto Then = ParseExpression();
+      if (!Then)
+        return nullptr;
+
+      if (CurTok != tok_else)
+        return LogError("expected else");
+
+      getNextToken();
+
+      auto Else = ParseExpression();
+      if (!Else)
+        return nullptr;
+
+      return llvm::make_unique<IfExprAST>(std::move(Cond), std::move(Then),
+                                          std::move(Else));
+    }
+
+Next we hook it up as a primary expression:
+
+.. code-block:: c++
+
+    static std::unique_ptr<ExprAST> ParsePrimary() {
+      switch (CurTok) {
+      default:
+        return LogError("unknown token when expecting an expression");
+      case tok_identifier:
+        return ParseIdentifierExpr();
+      case tok_number:
+        return ParseNumberExpr();
+      case '(':
+        return ParseParenExpr();
+      case tok_if:
+        return ParseIfExpr();
+      }
+    }
+
+LLVM IR for If/Then/Else
+------------------------
+
+Now that we have it parsing and building the AST, the final piece is
+adding LLVM code generation support. This is the most interesting part
+of the if/then/else example, because this is where it starts to
+introduce new concepts. All of the code above has been thoroughly
+described in previous chapters.
+
+To motivate the code we want to produce, lets take a look at a simple
+example. Consider:
+
+::
+
+    extern foo();
+    extern bar();
+    def baz(x) if x then foo() else bar();
+
+If you disable optimizations, the code you'll (soon) get from
+Kaleidoscope looks like this:
+
+.. code-block:: llvm
+
+    declare double @foo()
+
+    declare double @bar()
+
+    define double @baz(double %x) {
+    entry:
+      %ifcond = fcmp one double %x, 0.000000e+00
+      br i1 %ifcond, label %then, label %else
+
+    then:       ; preds = %entry
+      %calltmp = call double @foo()
+      br label %ifcont
+
+    else:       ; preds = %entry
+      %calltmp1 = call double @bar()
+      br label %ifcont
+
+    ifcont:     ; preds = %else, %then
+      %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
+      ret double %iftmp
+    }
+
+To visualize the control flow graph, you can use a nifty feature of the
+LLVM '`opt <http://llvm.org/cmds/opt.html>`_' tool. If you put this LLVM
+IR into "t.ll" and run "``llvm-as < t.ll | opt -analyze -view-cfg``", `a
+window will pop up <../ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ and you'll
+see this graph:
+
+.. figure:: LangImpl05-cfg.png
+   :align: center
+   :alt: Example CFG
+
+   Example CFG
+
+Another way to get this is to call "``F->viewCFG()``" or
+"``F->viewCFGOnly()``" (where F is a "``Function*``") either by
+inserting actual calls into the code and recompiling or by calling these
+in the debugger. LLVM has many nice features for visualizing various
+graphs.
+
+Getting back to the generated code, it is fairly simple: the entry block
+evaluates the conditional expression ("x" in our case here) and compares
+the result to 0.0 with the "``fcmp one``" instruction ('one' is "Ordered
+and Not Equal"). Based on the result of this expression, the code jumps
+to either the "then" or "else" blocks, which contain the expressions for
+the true/false cases.
+
+Once the then/else blocks are finished executing, they both branch back
+to the 'ifcont' block to execute the code that happens after the
+if/then/else. In this case the only thing left to do is to return to the
+caller of the function. The question then becomes: how does the code
+know which expression to return?
+
+The answer to this question involves an important SSA operation: the
+`Phi
+operation <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
+If you're not familiar with SSA, `the wikipedia
+article <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
+is a good introduction and there are various other introductions to it
+available on your favorite search engine. The short version is that
+"execution" of the Phi operation requires "remembering" which block
+control came from. The Phi operation takes on the value corresponding to
+the input control block. In this case, if control comes in from the
+"then" block, it gets the value of "calltmp". If control comes from the
+"else" block, it gets the value of "calltmp1".
+
+At this point, you are probably starting to think "Oh no! This means my
+simple and elegant front-end will have to start generating SSA form in
+order to use LLVM!". Fortunately, this is not the case, and we strongly
+advise *not* implementing an SSA construction algorithm in your
+front-end unless there is an amazingly good reason to do so. In
+practice, there are two sorts of values that float around in code
+written for your average imperative programming language that might need
+Phi nodes:
+
+#. Code that involves user variables: ``x = 1; x = x + 1;``
+#. Values that are implicit in the structure of your AST, such as the
+   Phi node in this case.
+
+In `Chapter 7 <LangImpl7.html>`_ of this tutorial ("mutable variables"),
+we'll talk about #1 in depth. For now, just believe me that you don't
+need SSA construction to handle this case. For #2, you have the choice
+of using the techniques that we will describe for #1, or you can insert
+Phi nodes directly, if convenient. In this case, it is really
+easy to generate the Phi node, so we choose to do it directly.
+
+Okay, enough of the motivation and overview, lets generate code!
+
+Code Generation for If/Then/Else
+--------------------------------
+
+In order to generate code for this, we implement the ``codegen`` method
+for ``IfExprAST``:
+
+.. code-block:: c++
+
+    Value *IfExprAST::codegen() {
+      Value *CondV = Cond->codegen();
+      if (!CondV)
+        return nullptr;
+
+      // Convert condition to a bool by comparing equal to 0.0.
+      CondV = Builder.CreateFCmpONE(
+          CondV, ConstantFP::get(LLVMContext, APFloat(0.0)), "ifcond");
+
+This code is straightforward and similar to what we saw before. We emit
+the expression for the condition, then compare that value to zero to get
+a truth value as a 1-bit (bool) value.
+
+.. code-block:: c++
+
+      Function *TheFunction = Builder.GetInsertBlock()->getParent();
+
+      // Create blocks for the then and else cases.  Insert the 'then' block at the
+      // end of the function.
+      BasicBlock *ThenBB =
+          BasicBlock::Create(LLVMContext, "then", TheFunction);
+      BasicBlock *ElseBB = BasicBlock::Create(LLVMContext, "else");
+      BasicBlock *MergeBB = BasicBlock::Create(LLVMContext, "ifcont");
+
+      Builder.CreateCondBr(CondV, ThenBB, ElseBB);
+
+This code creates the basic blocks that are related to the if/then/else
+statement, and correspond directly to the blocks in the example above.
+The first line gets the current Function object that is being built. It
+gets this by asking the builder for the current BasicBlock, and asking
+that block for its "parent" (the function it is currently embedded
+into).
+
+Once it has that, it creates three blocks. Note that it passes
+"TheFunction" into the constructor for the "then" block. This causes the
+constructor to automatically insert the new block into the end of the
+specified function. The other two blocks are created, but aren't yet
+inserted into the function.
+
+Once the blocks are created, we can emit the conditional branch that
+chooses between them. Note that creating new blocks does not implicitly
+affect the IRBuilder, so it is still inserting into the block that the
+condition went into. Also note that it is creating a branch to the
+"then" block and the "else" block, even though the "else" block isn't
+inserted into the function yet. This is all ok: it is the standard way
+that LLVM supports forward references.
+
+.. code-block:: c++
+
+      // Emit then value.
+      Builder.SetInsertPoint(ThenBB);
+
+      Value *ThenV = Then->codegen();
+      if (!ThenV)
+        return nullptr;
+
+      Builder.CreateBr(MergeBB);
+      // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
+      ThenBB = Builder.GetInsertBlock();
+
+After the conditional branch is inserted, we move the builder to start
+inserting into the "then" block. Strictly speaking, this call moves the
+insertion point to be at the end of the specified block. However, since
+the "then" block is empty, it also starts out by inserting at the
+beginning of the block. :)
+
+Once the insertion point is set, we recursively codegen the "then"
+expression from the AST. To finish off the "then" block, we create an
+unconditional branch to the merge block. One interesting (and very
+important) aspect of the LLVM IR is that it `requires all basic blocks
+to be "terminated" <../LangRef.html#functionstructure>`_ with a `control
+flow instruction <../LangRef.html#terminators>`_ such as return or
+branch. This means that all control flow, *including fall throughs* must
+be made explicit in the LLVM IR. If you violate this rule, the verifier
+will emit an error.
+
+The final line here is quite subtle, but is very important. The basic
+issue is that when we create the Phi node in the merge block, we need to
+set up the block/value pairs that indicate how the Phi will work.
+Importantly, the Phi node expects to have an entry for each predecessor
+of the block in the CFG. Why then, are we getting the current block when
+we just set it to ThenBB 5 lines above? The problem is that the "Then"
+expression may actually itself change the block that the Builder is
+emitting into if, for example, it contains a nested "if/then/else"
+expression. Because calling ``codegen()`` recursively could arbitrarily change
+the notion of the current block, we are required to get an up-to-date
+value for code that will set up the Phi node.
+
+.. code-block:: c++
+
+      // Emit else block.
+      TheFunction->getBasicBlockList().push_back(ElseBB);
+      Builder.SetInsertPoint(ElseBB);
+
+      Value *ElseV = Else->codegen();
+      if (!ElseV)
+        return nullptr;
+
+      Builder.CreateBr(MergeBB);
+      // codegen of 'Else' can change the current block, update ElseBB for the PHI.
+      ElseBB = Builder.GetInsertBlock();
+
+Code generation for the 'else' block is basically identical to codegen
+for the 'then' block. The only significant difference is the first line,
+which adds the 'else' block to the function. Recall previously that the
+'else' block was created, but not added to the function. Now that the
+'then' and 'else' blocks are emitted, we can finish up with the merge
+code:
+
+.. code-block:: c++
+
+      // Emit merge block.
+      TheFunction->getBasicBlockList().push_back(MergeBB);
+      Builder.SetInsertPoint(MergeBB);
+      PHINode *PN =
+        Builder.CreatePHI(Type::getDoubleTy(LLVMContext), 2, "iftmp");
+
+      PN->addIncoming(ThenV, ThenBB);
+      PN->addIncoming(ElseV, ElseBB);
+      return PN;
+    }
+
+The first two lines here are now familiar: the first adds the "merge"
+block to the Function object (it was previously floating, like the else
+block above). The second changes the insertion point so that newly
+created code will go into the "merge" block. Once that is done, we need
+to create the PHI node and set up the block/value pairs for the PHI.
+
+Finally, the CodeGen function returns the phi node as the value computed
+by the if/then/else expression. In our example above, this returned
+value will feed into the code for the top-level function, which will
+create the return instruction.
+
+Overall, we now have the ability to execute conditional code in
+Kaleidoscope. With this extension, Kaleidoscope is a fairly complete
+language that can calculate a wide variety of numeric functions. Next up
+we'll add another useful expression that is familiar from non-functional
+languages...
+
+'for' Loop Expression
+=====================
+
+Now that we know how to add basic control flow constructs to the
+language, we have the tools to add more powerful things. Lets add
+something more aggressive, a 'for' expression:
+
+::
+
+     extern putchard(char)
+     def printstar(n)
+       for i = 1, i < n, 1.0 in
+         putchard(42);  # ascii 42 = '*'
+
+     # print 100 '*' characters
+     printstar(100);
+
+This expression defines a new variable ("i" in this case) which iterates
+from a starting value, while the condition ("i < n" in this case) is
+true, incrementing by an optional step value ("1.0" in this case). If
+the step value is omitted, it defaults to 1.0. While the loop is true,
+it executes its body expression. Because we don't have anything better
+to return, we'll just define the loop as always returning 0.0. In the
+future when we have mutable variables, it will get more useful.
+
+As before, lets talk about the changes that we need to Kaleidoscope to
+support this.
+
+Lexer Extensions for the 'for' Loop
+-----------------------------------
+
+The lexer extensions are the same sort of thing as for if/then/else:
+
+.. code-block:: c++
+
+      ... in enum Token ...
+      // control
+      tok_if = -6, tok_then = -7, tok_else = -8,
+      tok_for = -9, tok_in = -10
+
+      ... in gettok ...
+      if (IdentifierStr == "def")
+        return tok_def;
+      if (IdentifierStr == "extern")
+        return tok_extern;
+      if (IdentifierStr == "if")
+        return tok_if;
+      if (IdentifierStr == "then")
+        return tok_then;
+      if (IdentifierStr == "else")
+        return tok_else;
+      if (IdentifierStr == "for")
+        return tok_for;
+      if (IdentifierStr == "in")
+        return tok_in;
+      return tok_identifier;
+
+AST Extensions for the 'for' Loop
+---------------------------------
+
+The AST node is just as simple. It basically boils down to capturing the
+variable name and the constituent expressions in the node.
+
+.. code-block:: c++
+
+    /// ForExprAST - Expression class for for/in.
+    class ForExprAST : public ExprAST {
+      std::string VarName;
+      std::unique_ptr<ExprAST> Start, End, Step, Body;
+
+    public:
+      ForExprAST(const std::string &VarName, std::unique_ptr<ExprAST> Start,
+                 std::unique_ptr<ExprAST> End, std::unique_ptr<ExprAST> Step,
+                 std::unique_ptr<ExprAST> Body)
+        : VarName(VarName), Start(std::move(Start)), End(std::move(End)),
+          Step(std::move(Step)), Body(std::move(Body)) {}
+      virtual Value *codegen();
+    };
+
+Parser Extensions for the 'for' Loop
+------------------------------------
+
+The parser code is also fairly standard. The only interesting thing here
+is handling of the optional step value. The parser code handles it by
+checking to see if the second comma is present. If not, it sets the step
+value to null in the AST node:
+
+.. code-block:: c++
+
+    /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
+    static std::unique_ptr<ExprAST> ParseForExpr() {
+      getNextToken();  // eat the for.
+
+      if (CurTok != tok_identifier)
+        return LogError("expected identifier after for");
+
+      std::string IdName = IdentifierStr;
+      getNextToken();  // eat identifier.
+
+      if (CurTok != '=')
+        return LogError("expected '=' after for");
+      getNextToken();  // eat '='.
+
+
+      auto Start = ParseExpression();
+      if (!Start)
+        return nullptr;
+      if (CurTok != ',')
+        return LogError("expected ',' after for start value");
+      getNextToken();
+
+      auto End = ParseExpression();
+      if (!End)
+        return nullptr;
+
+      // The step value is optional.
+      std::unique_ptr<ExprAST> Step;
+      if (CurTok == ',') {
+        getNextToken();
+        Step = ParseExpression();
+        if (!Step)
+          return nullptr;
+      }
+
+      if (CurTok != tok_in)
+        return LogError("expected 'in' after for");
+      getNextToken();  // eat 'in'.
+
+      auto Body = ParseExpression();
+      if (!Body)
+        return nullptr;
+
+      return llvm::make_unique<ForExprAST>(IdName, std::move(Start),
+                                           std::move(End), std::move(Step),
+                                           std::move(Body));
+    }
+
+LLVM IR for the 'for' Loop
+--------------------------
+
+Now we get to the good part: the LLVM IR we want to generate for this
+thing. With the simple example above, we get this LLVM IR (note that
+this dump is generated with optimizations disabled for clarity):
+
+.. code-block:: llvm
+
+    declare double @putchard(double)
+
+    define double @printstar(double %n) {
+    entry:
+      ; initial value = 1.0 (inlined into phi)
+      br label %loop
+
+    loop:       ; preds = %loop, %entry
+      %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
+      ; body
+      %calltmp = call double @putchard(double 4.200000e+01)
+      ; increment
+      %nextvar = fadd double %i, 1.000000e+00
+
+      ; termination test
+      %cmptmp = fcmp ult double %i, %n
+      %booltmp = uitofp i1 %cmptmp to double
+      %loopcond = fcmp one double %booltmp, 0.000000e+00
+      br i1 %loopcond, label %loop, label %afterloop
+
+    afterloop:      ; preds = %loop
+      ; loop always returns 0.0
+      ret double 0.000000e+00
+    }
+
+This loop contains all the same constructs we saw before: a phi node,
+several expressions, and some basic blocks. Lets see how this fits
+together.
+
+Code Generation for the 'for' Loop
+----------------------------------
+
+The first part of codegen is very simple: we just output the start
+expression for the loop value:
+
+.. code-block:: c++
+
+    Value *ForExprAST::codegen() {
+      // Emit the start code first, without 'variable' in scope.
+      Value *StartVal = Start->codegen();
+      if (StartVal == 0) return 0;
+
+With this out of the way, the next step is to set up the LLVM basic
+block for the start of the loop body. In the case above, the whole loop
+body is one block, but remember that the body code itself could consist
+of multiple blocks (e.g. if it contains an if/then/else or a for/in
+expression).
+
+.. code-block:: c++
+
+      // Make the new basic block for the loop header, inserting after current
+      // block.
+      Function *TheFunction = Builder.GetInsertBlock()->getParent();
+      BasicBlock *PreheaderBB = Builder.GetInsertBlock();
+      BasicBlock *LoopBB =
+          BasicBlock::Create(LLVMContext, "loop", TheFunction);
+
+      // Insert an explicit fall through from the current block to the LoopBB.
+      Builder.CreateBr(LoopBB);
+
+This code is similar to what we saw for if/then/else. Because we will
+need it to create the Phi node, we remember the block that falls through
+into the loop. Once we have that, we create the actual block that starts
+the loop and create an unconditional branch for the fall-through between
+the two blocks.
+
+.. code-block:: c++
+
+      // Start insertion in LoopBB.
+      Builder.SetInsertPoint(LoopBB);
+
+      // Start the PHI node with an entry for Start.
+      PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(LLVMContext),
+                                            2, VarName.c_str());
+      Variable->addIncoming(StartVal, PreheaderBB);
+
+Now that the "preheader" for the loop is set up, we switch to emitting
+code for the loop body. To begin with, we move the insertion point and
+create the PHI node for the loop induction variable. Since we already
+know the incoming value for the starting value, we add it to the Phi
+node. Note that the Phi will eventually get a second value for the
+backedge, but we can't set it up yet (because it doesn't exist!).
+
+.. code-block:: c++
+
+      // Within the loop, the variable is defined equal to the PHI node.  If it
+      // shadows an existing variable, we have to restore it, so save it now.
+      Value *OldVal = NamedValues[VarName];
+      NamedValues[VarName] = Variable;
+
+      // Emit the body of the loop.  This, like any other expr, can change the
+      // current BB.  Note that we ignore the value computed by the body, but don't
+      // allow an error.
+      if (!Body->codegen())
+        return nullptr;
+
+Now the code starts to get more interesting. Our 'for' loop introduces a
+new variable to the symbol table. This means that our symbol table can
+now contain either function arguments or loop variables. To handle this,
+before we codegen the body of the loop, we add the loop variable as the
+current value for its name. Note that it is possible that there is a
+variable of the same name in the outer scope. It would be easy to make
+this an error (emit an error and return null if there is already an
+entry for VarName) but we choose to allow shadowing of variables. In
+order to handle this correctly, we remember the Value that we are
+potentially shadowing in ``OldVal`` (which will be null if there is no
+shadowed variable).
+
+Once the loop variable is set into the symbol table, the code
+recursively codegen's the body. This allows the body to use the loop
+variable: any references to it will naturally find it in the symbol
+table.
+
+.. code-block:: c++
+
+      // Emit the step value.
+      Value *StepVal = nullptr;
+      if (Step) {
+        StepVal = Step->codegen();
+        if (!StepVal)
+          return nullptr;
+      } else {
+        // If not specified, use 1.0.
+        StepVal = ConstantFP::get(LLVMContext, APFloat(1.0));
+      }
+
+      Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar");
+
+Now that the body is emitted, we compute the next value of the iteration
+variable by adding the step value, or 1.0 if it isn't present.
+'``NextVar``' will be the value of the loop variable on the next
+iteration of the loop.
+
+.. code-block:: c++
+
+      // Compute the end condition.
+      Value *EndCond = End->codegen();
+      if (!EndCond)
+        return nullptr;
+
+      // Convert condition to a bool by comparing equal to 0.0.
+      EndCond = Builder.CreateFCmpONE(
+          EndCond, ConstantFP::get(LLVMContext, APFloat(0.0)), "loopcond");
+
+Finally, we evaluate the exit value of the loop, to determine whether
+the loop should exit. This mirrors the condition evaluation for the
+if/then/else statement.
+
+.. code-block:: c++
+
+      // Create the "after loop" block and insert it.
+      BasicBlock *LoopEndBB = Builder.GetInsertBlock();
+      BasicBlock *AfterBB =
+          BasicBlock::Create(LLVMContext, "afterloop", TheFunction);
+
+      // Insert the conditional branch into the end of LoopEndBB.
+      Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
+
+      // Any new code will be inserted in AfterBB.
+      Builder.SetInsertPoint(AfterBB);
+
+With the code for the body of the loop complete, we just need to finish
+up the control flow for it. This code remembers the end block (for the
+phi node), then creates the block for the loop exit ("afterloop"). Based
+on the value of the exit condition, it creates a conditional branch that
+chooses between executing the loop again and exiting the loop. Any
+future code is emitted in the "afterloop" block, so it sets the
+insertion position to it.
+
+.. code-block:: c++
+
+      // Add a new entry to the PHI node for the backedge.
+      Variable->addIncoming(NextVar, LoopEndBB);
+
+      // Restore the unshadowed variable.
+      if (OldVal)
+        NamedValues[VarName] = OldVal;
+      else
+        NamedValues.erase(VarName);
+
+      // for expr always returns 0.0.
+      return Constant::getNullValue(Type::getDoubleTy(LLVMContext));
+    }
+
+The final code handles various cleanups: now that we have the "NextVar"
+value, we can add the incoming value to the loop PHI node. After that,
+we remove the loop variable from the symbol table, so that it isn't in
+scope after the for loop. Finally, code generation of the for loop
+always returns 0.0, so that is what we return from
+``ForExprAST::codegen()``.
+
+With this, we conclude the "adding control flow to Kaleidoscope" chapter
+of the tutorial. In this chapter we added two control flow constructs,
+and used them to motivate a couple of aspects of the LLVM IR that are
+important for front-end implementors to know. In the next chapter of our
+saga, we will get a bit crazier and add `user-defined
+operators <LangImpl6.html>`_ to our poor innocent language.
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example, enhanced with
+the if/then/else and for expressions.. To build this example, use:
+
+.. code-block:: bash
+
+    # Compile
+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
+    # Run
+    ./toy
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/Chapter5/toy.cpp
+   :language: c++
+
+`Next: Extending the language: user-defined operators <LangImpl06.html>`_
+

Added: llvm/trunk/docs/tutorial/LangImpl06.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl06.rst?rev=274441&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl06.rst (added)
+++ llvm/trunk/docs/tutorial/LangImpl06.rst Sat Jul  2 12:01:59 2016
@@ -0,0 +1,768 @@
+============================================================
+Kaleidoscope: Extending the Language: User-defined Operators
+============================================================
+
+.. contents::
+   :local:
+
+Chapter 6 Introduction
+======================
+
+Welcome to Chapter 6 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. At this point in our tutorial, we now
+have a fully functional language that is fairly minimal, but also
+useful. There is still one big problem with it, however. Our language
+doesn't have many useful operators (like division, logical negation, or
+even any comparisons besides less-than).
+
+This chapter of the tutorial takes a wild digression into adding
+user-defined operators to the simple and beautiful Kaleidoscope
+language. This digression now gives us a simple and ugly language in
+some ways, but also a powerful one at the same time. One of the great
+things about creating your own language is that you get to decide what
+is good or bad. In this tutorial we'll assume that it is okay to use
+this as a way to show some interesting parsing techniques.
+
+At the end of this tutorial, we'll run through an example Kaleidoscope
+application that `renders the Mandelbrot set <#kicking-the-tires>`_. This gives an
+example of what you can build with Kaleidoscope and its feature set.
+
+User-defined Operators: the Idea
+================================
+
+The "operator overloading" that we will add to Kaleidoscope is more
+general than languages like C++. In C++, you are only allowed to
+redefine existing operators: you can't programatically change the
+grammar, introduce new operators, change precedence levels, etc. In this
+chapter, we will add this capability to Kaleidoscope, which will let the
+user round out the set of operators that are supported.
+
+The point of going into user-defined operators in a tutorial like this
+is to show the power and flexibility of using a hand-written parser.
+Thus far, the parser we have been implementing uses recursive descent
+for most parts of the grammar and operator precedence parsing for the
+expressions. See `Chapter 2 <LangImpl2.html>`_ for details. Without
+using operator precedence parsing, it would be very difficult to allow
+the programmer to introduce new operators into the grammar: the grammar
+is dynamically extensible as the JIT runs.
+
+The two specific features we'll add are programmable unary operators
+(right now, Kaleidoscope has no unary operators at all) as well as
+binary operators. An example of this is:
+
+::
+
+    # Logical unary not.
+    def unary!(v)
+      if v then
+        0
+      else
+        1;
+
+    # Define > with the same precedence as <.
+    def binary> 10 (LHS RHS)
+      RHS < LHS;
+
+    # Binary "logical or", (note that it does not "short circuit")
+    def binary| 5 (LHS RHS)
+      if LHS then
+        1
+      else if RHS then
+        1
+      else
+        0;
+
+    # Define = with slightly lower precedence than relationals.
+    def binary= 9 (LHS RHS)
+      !(LHS < RHS | LHS > RHS);
+
+Many languages aspire to being able to implement their standard runtime
+library in the language itself. In Kaleidoscope, we can implement
+significant parts of the language in the library!
+
+We will break down implementation of these features into two parts:
+implementing support for user-defined binary operators and adding unary
+operators.
+
+User-defined Binary Operators
+=============================
+
+Adding support for user-defined binary operators is pretty simple with
+our current framework. We'll first add support for the unary/binary
+keywords:
+
+.. code-block:: c++
+
+    enum Token {
+      ...
+      // operators
+      tok_binary = -11,
+      tok_unary = -12
+    };
+    ...
+    static int gettok() {
+    ...
+        if (IdentifierStr == "for")
+          return tok_for;
+        if (IdentifierStr == "in")
+          return tok_in;
+        if (IdentifierStr == "binary")
+          return tok_binary;
+        if (IdentifierStr == "unary")
+          return tok_unary;
+        return tok_identifier;
+
+This just adds lexer support for the unary and binary keywords, like we
+did in `previous chapters <LangImpl5.html#lexer-extensions-for-if-then-else>`_. One nice thing
+about our current AST, is that we represent binary operators with full
+generalisation by using their ASCII code as the opcode. For our extended
+operators, we'll use this same representation, so we don't need any new
+AST or parser support.
+
+On the other hand, we have to be able to represent the definitions of
+these new operators, in the "def binary\| 5" part of the function
+definition. In our grammar so far, the "name" for the function
+definition is parsed as the "prototype" production and into the
+``PrototypeAST`` AST node. To represent our new user-defined operators
+as prototypes, we have to extend the ``PrototypeAST`` AST node like
+this:
+
+.. code-block:: c++
+
+    /// PrototypeAST - This class represents the "prototype" for a function,
+    /// which captures its argument names as well as if it is an operator.
+    class PrototypeAST {
+      std::string Name;
+      std::vector<std::string> Args;
+      bool IsOperator;
+      unsigned Precedence;  // Precedence if a binary op.
+
+    public:
+      PrototypeAST(const std::string &name, std::vector<std::string> Args,
+                   bool IsOperator = false, unsigned Prec = 0)
+      : Name(name), Args(std::move(Args)), IsOperator(IsOperator),
+        Precedence(Prec) {}
+
+      bool isUnaryOp() const { return IsOperator && Args.size() == 1; }
+      bool isBinaryOp() const { return IsOperator && Args.size() == 2; }
+
+      char getOperatorName() const {
+        assert(isUnaryOp() || isBinaryOp());
+        return Name[Name.size()-1];
+      }
+
+      unsigned getBinaryPrecedence() const { return Precedence; }
+
+      Function *codegen();
+    };
+
+Basically, in addition to knowing a name for the prototype, we now keep
+track of whether it was an operator, and if it was, what precedence
+level the operator is at. The precedence is only used for binary
+operators (as you'll see below, it just doesn't apply for unary
+operators). Now that we have a way to represent the prototype for a
+user-defined operator, we need to parse it:
+
+.. code-block:: c++
+
+    /// prototype
+    ///   ::= id '(' id* ')'
+    ///   ::= binary LETTER number? (id, id)
+    static std::unique_ptr<PrototypeAST> ParsePrototype() {
+      std::string FnName;
+
+      unsigned Kind = 0;  // 0 = identifier, 1 = unary, 2 = binary.
+      unsigned BinaryPrecedence = 30;
+
+      switch (CurTok) {
+      default:
+        return LogErrorP("Expected function name in prototype");
+      case tok_identifier:
+        FnName = IdentifierStr;
+        Kind = 0;
+        getNextToken();
+        break;
+      case tok_binary:
+        getNextToken();
+        if (!isascii(CurTok))
+          return LogErrorP("Expected binary operator");
+        FnName = "binary";
+        FnName += (char)CurTok;
+        Kind = 2;
+        getNextToken();
+
+        // Read the precedence if present.
+        if (CurTok == tok_number) {
+          if (NumVal < 1 || NumVal > 100)
+            return LogErrorP("Invalid precedecnce: must be 1..100");
+          BinaryPrecedence = (unsigned)NumVal;
+          getNextToken();
+        }
+        break;
+      }
+
+      if (CurTok != '(')
+        return LogErrorP("Expected '(' in prototype");
+
+      std::vector<std::string> ArgNames;
+      while (getNextToken() == tok_identifier)
+        ArgNames.push_back(IdentifierStr);
+      if (CurTok != ')')
+        return LogErrorP("Expected ')' in prototype");
+
+      // success.
+      getNextToken();  // eat ')'.
+
+      // Verify right number of names for operator.
+      if (Kind && ArgNames.size() != Kind)
+        return LogErrorP("Invalid number of operands for operator");
+
+      return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames), Kind != 0,
+                                             BinaryPrecedence);
+    }
+
+This is all fairly straightforward parsing code, and we have already
+seen a lot of similar code in the past. One interesting part about the
+code above is the couple lines that set up ``FnName`` for binary
+operators. This builds names like "binary@" for a newly defined "@"
+operator. This then takes advantage of the fact that symbol names in the
+LLVM symbol table are allowed to have any character in them, including
+embedded nul characters.
+
+The next interesting thing to add, is codegen support for these binary
+operators. Given our current structure, this is a simple addition of a
+default case for our existing binary operator node:
+
+.. code-block:: c++
+
+    Value *BinaryExprAST::codegen() {
+      Value *L = LHS->codegen();
+      Value *R = RHS->codegen();
+      if (!L || !R)
+        return nullptr;
+
+      switch (Op) {
+      case '+':
+        return Builder.CreateFAdd(L, R, "addtmp");
+      case '-':
+        return Builder.CreateFSub(L, R, "subtmp");
+      case '*':
+        return Builder.CreateFMul(L, R, "multmp");
+      case '<':
+        L = Builder.CreateFCmpULT(L, R, "cmptmp");
+        // Convert bool 0/1 to double 0.0 or 1.0
+        return Builder.CreateUIToFP(L, Type::getDoubleTy(LLVMContext),
+                                    "booltmp");
+      default:
+        break;
+      }
+
+      // If it wasn't a builtin binary operator, it must be a user defined one. Emit
+      // a call to it.
+      Function *F = TheModule->getFunction(std::string("binary") + Op);
+      assert(F && "binary operator not found!");
+
+      Value *Ops[2] = { L, R };
+      return Builder.CreateCall(F, Ops, "binop");
+    }
+
+As you can see above, the new code is actually really simple. It just
+does a lookup for the appropriate operator in the symbol table and
+generates a function call to it. Since user-defined operators are just
+built as normal functions (because the "prototype" boils down to a
+function with the right name) everything falls into place.
+
+The final piece of code we are missing, is a bit of top-level magic:
+
+.. code-block:: c++
+
+    Function *FunctionAST::codegen() {
+      NamedValues.clear();
+
+      Function *TheFunction = Proto->codegen();
+      if (!TheFunction)
+        return nullptr;
+
+      // If this is an operator, install it.
+      if (Proto->isBinaryOp())
+        BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
+
+      // Create a new basic block to start insertion into.
+      BasicBlock *BB = BasicBlock::Create(LLVMContext, "entry", TheFunction);
+      Builder.SetInsertPoint(BB);
+
+      if (Value *RetVal = Body->codegen()) {
+        ...
+
+Basically, before codegening a function, if it is a user-defined
+operator, we register it in the precedence table. This allows the binary
+operator parsing logic we already have in place to handle it. Since we
+are working on a fully-general operator precedence parser, this is all
+we need to do to "extend the grammar".
+
+Now we have useful user-defined binary operators. This builds a lot on
+the previous framework we built for other operators. Adding unary
+operators is a bit more challenging, because we don't have any framework
+for it yet - lets see what it takes.
+
+User-defined Unary Operators
+============================
+
+Since we don't currently support unary operators in the Kaleidoscope
+language, we'll need to add everything to support them. Above, we added
+simple support for the 'unary' keyword to the lexer. In addition to
+that, we need an AST node:
+
+.. code-block:: c++
+
+    /// UnaryExprAST - Expression class for a unary operator.
+    class UnaryExprAST : public ExprAST {
+      char Opcode;
+      std::unique_ptr<ExprAST> Operand;
+
+    public:
+      UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)
+        : Opcode(Opcode), Operand(std::move(Operand)) {}
+      virtual Value *codegen();
+    };
+
+This AST node is very simple and obvious by now. It directly mirrors the
+binary operator AST node, except that it only has one child. With this,
+we need to add the parsing logic. Parsing a unary operator is pretty
+simple: we'll add a new function to do it:
+
+.. code-block:: c++
+
+    /// unary
+    ///   ::= primary
+    ///   ::= '!' unary
+    static std::unique_ptr<ExprAST> ParseUnary() {
+      // If the current token is not an operator, it must be a primary expr.
+      if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
+        return ParsePrimary();
+
+      // If this is a unary operator, read it.
+      int Opc = CurTok;
+      getNextToken();
+      if (auto Operand = ParseUnary())
+        return llvm::unique_ptr<UnaryExprAST>(Opc, std::move(Operand));
+      return nullptr;
+    }
+
+The grammar we add is pretty straightforward here. If we see a unary
+operator when parsing a primary operator, we eat the operator as a
+prefix and parse the remaining piece as another unary operator. This
+allows us to handle multiple unary operators (e.g. "!!x"). Note that
+unary operators can't have ambiguous parses like binary operators can,
+so there is no need for precedence information.
+
+The problem with this function, is that we need to call ParseUnary from
+somewhere. To do this, we change previous callers of ParsePrimary to
+call ParseUnary instead:
+
+.. code-block:: c++
+
+    /// binoprhs
+    ///   ::= ('+' unary)*
+    static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
+                                                  std::unique_ptr<ExprAST> LHS) {
+      ...
+        // Parse the unary expression after the binary operator.
+        auto RHS = ParseUnary();
+        if (!RHS)
+          return nullptr;
+      ...
+    }
+    /// expression
+    ///   ::= unary binoprhs
+    ///
+    static std::unique_ptr<ExprAST> ParseExpression() {
+      auto LHS = ParseUnary();
+      if (!LHS)
+        return nullptr;
+
+      return ParseBinOpRHS(0, std::move(LHS));
+    }
+
+With these two simple changes, we are now able to parse unary operators
+and build the AST for them. Next up, we need to add parser support for
+prototypes, to parse the unary operator prototype. We extend the binary
+operator code above with:
+
+.. code-block:: c++
+
+    /// prototype
+    ///   ::= id '(' id* ')'
+    ///   ::= binary LETTER number? (id, id)
+    ///   ::= unary LETTER (id)
+    static std::unique_ptr<PrototypeAST> ParsePrototype() {
+      std::string FnName;
+
+      unsigned Kind = 0;  // 0 = identifier, 1 = unary, 2 = binary.
+      unsigned BinaryPrecedence = 30;
+
+      switch (CurTok) {
+      default:
+        return LogErrorP("Expected function name in prototype");
+      case tok_identifier:
+        FnName = IdentifierStr;
+        Kind = 0;
+        getNextToken();
+        break;
+      case tok_unary:
+        getNextToken();
+        if (!isascii(CurTok))
+          return LogErrorP("Expected unary operator");
+        FnName = "unary";
+        FnName += (char)CurTok;
+        Kind = 1;
+        getNextToken();
+        break;
+      case tok_binary:
+        ...
+
+As with binary operators, we name unary operators with a name that
+includes the operator character. This assists us at code generation
+time. Speaking of, the final piece we need to add is codegen support for
+unary operators. It looks like this:
+
+.. code-block:: c++
+
+    Value *UnaryExprAST::codegen() {
+      Value *OperandV = Operand->codegen();
+      if (!OperandV)
+        return nullptr;
+
+      Function *F = TheModule->getFunction(std::string("unary")+Opcode);
+      if (!F)
+        return LogErrorV("Unknown unary operator");
+
+      return Builder.CreateCall(F, OperandV, "unop");
+    }
+
+This code is similar to, but simpler than, the code for binary
+operators. It is simpler primarily because it doesn't need to handle any
+predefined operators.
+
+Kicking the Tires
+=================
+
+It is somewhat hard to believe, but with a few simple extensions we've
+covered in the last chapters, we have grown a real-ish language. With
+this, we can do a lot of interesting things, including I/O, math, and a
+bunch of other things. For example, we can now add a nice sequencing
+operator (printd is defined to print out the specified value and a
+newline):
+
+::
+
+    ready> extern printd(x);
+    Read extern:
+    declare double @printd(double)
+
+    ready> def binary : 1 (x y) 0;  # Low-precedence operator that ignores operands.
+    ..
+    ready> printd(123) : printd(456) : printd(789);
+    123.000000
+    456.000000
+    789.000000
+    Evaluated to 0.000000
+
+We can also define a bunch of other "primitive" operations, such as:
+
+::
+
+    # Logical unary not.
+    def unary!(v)
+      if v then
+        0
+      else
+        1;
+
+    # Unary negate.
+    def unary-(v)
+      0-v;
+
+    # Define > with the same precedence as <.
+    def binary> 10 (LHS RHS)
+      RHS < LHS;
+
+    # Binary logical or, which does not short circuit.
+    def binary| 5 (LHS RHS)
+      if LHS then
+        1
+      else if RHS then
+        1
+      else
+        0;
+
+    # Binary logical and, which does not short circuit.
+    def binary& 6 (LHS RHS)
+      if !LHS then
+        0
+      else
+        !!RHS;
+
+    # Define = with slightly lower precedence than relationals.
+    def binary = 9 (LHS RHS)
+      !(LHS < RHS | LHS > RHS);
+
+    # Define ':' for sequencing: as a low-precedence operator that ignores operands
+    # and just returns the RHS.
+    def binary : 1 (x y) y;
+
+Given the previous if/then/else support, we can also define interesting
+functions for I/O. For example, the following prints out a character
+whose "density" reflects the value passed in: the lower the value, the
+denser the character:
+
+::
+
+    ready>
+
+    extern putchard(char)
+    def printdensity(d)
+      if d > 8 then
+        putchard(32)  # ' '
+      else if d > 4 then
+        putchard(46)  # '.'
+      else if d > 2 then
+        putchard(43)  # '+'
+      else
+        putchard(42); # '*'
+    ...
+    ready> printdensity(1): printdensity(2): printdensity(3):
+           printdensity(4): printdensity(5): printdensity(9):
+           putchard(10);
+    **++.
+    Evaluated to 0.000000
+
+Based on these simple primitive operations, we can start to define more
+interesting things. For example, here's a little function that solves
+for the number of iterations it takes a function in the complex plane to
+converge:
+
+::
+
+    # Determine whether the specific location diverges.
+    # Solve for z = z^2 + c in the complex plane.
+    def mandelconverger(real imag iters creal cimag)
+      if iters > 255 | (real*real + imag*imag > 4) then
+        iters
+      else
+        mandelconverger(real*real - imag*imag + creal,
+                        2*real*imag + cimag,
+                        iters+1, creal, cimag);
+
+    # Return the number of iterations required for the iteration to escape
+    def mandelconverge(real imag)
+      mandelconverger(real, imag, 0, real, imag);
+
+This "``z = z2 + c``" function is a beautiful little creature that is
+the basis for computation of the `Mandelbrot
+Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our
+``mandelconverge`` function returns the number of iterations that it
+takes for a complex orbit to escape, saturating to 255. This is not a
+very useful function by itself, but if you plot its value over a
+two-dimensional plane, you can see the Mandelbrot set. Given that we are
+limited to using putchard here, our amazing graphical output is limited,
+but we can whip together something using the density plotter above:
+
+::
+
+    # Compute and plot the mandelbrot set with the specified 2 dimensional range
+    # info.
+    def mandelhelp(xmin xmax xstep   ymin ymax ystep)
+      for y = ymin, y < ymax, ystep in (
+        (for x = xmin, x < xmax, xstep in
+           printdensity(mandelconverge(x,y)))
+        : putchard(10)
+      )
+
+    # mandel - This is a convenient helper function for plotting the mandelbrot set
+    # from the specified position with the specified Magnification.
+    def mandel(realstart imagstart realmag imagmag)
+      mandelhelp(realstart, realstart+realmag*78, realmag,
+                 imagstart, imagstart+imagmag*40, imagmag);
+
+Given this, we can try plotting out the mandelbrot set! Lets try it out:
+
+::
+
+    ready> mandel(-2.3, -1.3, 0.05, 0.07);
+    *******************************+++++++++++*************************************
+    *************************+++++++++++++++++++++++*******************************
+    **********************+++++++++++++++++++++++++++++****************************
+    *******************+++++++++++++++++++++.. ...++++++++*************************
+    *****************++++++++++++++++++++++.... ...+++++++++***********************
+    ***************+++++++++++++++++++++++.....   ...+++++++++*********************
+    **************+++++++++++++++++++++++....     ....+++++++++********************
+    *************++++++++++++++++++++++......      .....++++++++*******************
+    ************+++++++++++++++++++++.......       .......+++++++******************
+    ***********+++++++++++++++++++....                ... .+++++++*****************
+    **********+++++++++++++++++.......                     .+++++++****************
+    *********++++++++++++++...........                    ...+++++++***************
+    ********++++++++++++............                      ...++++++++**************
+    ********++++++++++... ..........                        .++++++++**************
+    *******+++++++++.....                                   .+++++++++*************
+    *******++++++++......                                  ..+++++++++*************
+    *******++++++.......                                   ..+++++++++*************
+    *******+++++......                                     ..+++++++++*************
+    *******.... ....                                      ...+++++++++*************
+    *******.... .                                         ...+++++++++*************
+    *******+++++......                                    ...+++++++++*************
+    *******++++++.......                                   ..+++++++++*************
+    *******++++++++......                                   .+++++++++*************
+    *******+++++++++.....                                  ..+++++++++*************
+    ********++++++++++... ..........                        .++++++++**************
+    ********++++++++++++............                      ...++++++++**************
+    *********++++++++++++++..........                     ...+++++++***************
+    **********++++++++++++++++........                     .+++++++****************
+    **********++++++++++++++++++++....                ... ..+++++++****************
+    ***********++++++++++++++++++++++.......       .......++++++++*****************
+    ************+++++++++++++++++++++++......      ......++++++++******************
+    **************+++++++++++++++++++++++....      ....++++++++********************
+    ***************+++++++++++++++++++++++.....   ...+++++++++*********************
+    *****************++++++++++++++++++++++....  ...++++++++***********************
+    *******************+++++++++++++++++++++......++++++++*************************
+    *********************++++++++++++++++++++++.++++++++***************************
+    *************************+++++++++++++++++++++++*******************************
+    ******************************+++++++++++++************************************
+    *******************************************************************************
+    *******************************************************************************
+    *******************************************************************************
+    Evaluated to 0.000000
+    ready> mandel(-2, -1, 0.02, 0.04);
+    **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
+    ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+    *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
+    *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
+    *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
+    ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
+    **************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
+    ************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
+    ***********++++++++++++++++++++++++++++++++++++++++++++++++++........        .
+    **********++++++++++++++++++++++++++++++++++++++++++++++.............
+    ********+++++++++++++++++++++++++++++++++++++++++++..................
+    *******+++++++++++++++++++++++++++++++++++++++.......................
+    ******+++++++++++++++++++++++++++++++++++...........................
+    *****++++++++++++++++++++++++++++++++............................
+    *****++++++++++++++++++++++++++++...............................
+    ****++++++++++++++++++++++++++......   .........................
+    ***++++++++++++++++++++++++.........     ......    ...........
+    ***++++++++++++++++++++++............
+    **+++++++++++++++++++++..............
+    **+++++++++++++++++++................
+    *++++++++++++++++++.................
+    *++++++++++++++++............ ...
+    *++++++++++++++..............
+    *+++....++++................
+    *..........  ...........
+    *
+    *..........  ...........
+    *+++....++++................
+    *++++++++++++++..............
+    *++++++++++++++++............ ...
+    *++++++++++++++++++.................
+    **+++++++++++++++++++................
+    **+++++++++++++++++++++..............
+    ***++++++++++++++++++++++............
+    ***++++++++++++++++++++++++.........     ......    ...........
+    ****++++++++++++++++++++++++++......   .........................
+    *****++++++++++++++++++++++++++++...............................
+    *****++++++++++++++++++++++++++++++++............................
+    ******+++++++++++++++++++++++++++++++++++...........................
+    *******+++++++++++++++++++++++++++++++++++++++.......................
+    ********+++++++++++++++++++++++++++++++++++++++++++..................
+    Evaluated to 0.000000
+    ready> mandel(-0.9, -1.4, 0.02, 0.03);
+    *******************************************************************************
+    *******************************************************************************
+    *******************************************************************************
+    **********+++++++++++++++++++++************************************************
+    *+++++++++++++++++++++++++++++++++++++++***************************************
+    +++++++++++++++++++++++++++++++++++++++++++++**********************************
+    ++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
+    ++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
+    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
+    +++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
+    +++++++++++++++++++++++++++++++....   ......+++++++++++++++++++****************
+    +++++++++++++++++++++++++++++.......  ........+++++++++++++++++++**************
+    ++++++++++++++++++++++++++++........   ........++++++++++++++++++++************
+    +++++++++++++++++++++++++++.........     ..  ...+++++++++++++++++++++**********
+    ++++++++++++++++++++++++++...........        ....++++++++++++++++++++++********
+    ++++++++++++++++++++++++.............       .......++++++++++++++++++++++******
+    +++++++++++++++++++++++.............        ........+++++++++++++++++++++++****
+    ++++++++++++++++++++++...........           ..........++++++++++++++++++++++***
+    ++++++++++++++++++++...........                .........++++++++++++++++++++++*
+    ++++++++++++++++++............                  ...........++++++++++++++++++++
+    ++++++++++++++++...............                 .............++++++++++++++++++
+    ++++++++++++++.................                 ...............++++++++++++++++
+    ++++++++++++..................                  .................++++++++++++++
+    +++++++++..................                      .................+++++++++++++
+    ++++++........        .                               .........  ..++++++++++++
+    ++............                                         ......    ....++++++++++
+    ..............                                                    ...++++++++++
+    ..............                                                    ....+++++++++
+    ..............                                                    .....++++++++
+    .............                                                    ......++++++++
+    ...........                                                     .......++++++++
+    .........                                                       ........+++++++
+    .........                                                       ........+++++++
+    .........                                                           ....+++++++
+    ........                                                             ...+++++++
+    .......                                                              ...+++++++
+                                                                        ....+++++++
+                                                                       .....+++++++
+                                                                        ....+++++++
+                                                                        ....+++++++
+                                                                        ....+++++++
+    Evaluated to 0.000000
+    ready> ^D
+
+At this point, you may be starting to realize that Kaleidoscope is a
+real and powerful language. It may not be self-similar :), but it can be
+used to plot things that are!
+
+With this, we conclude the "adding user-defined operators" chapter of
+the tutorial. We have successfully augmented our language, adding the
+ability to extend the language in the library, and we have shown how
+this can be used to build a simple but interesting end-user application
+in Kaleidoscope. At this point, Kaleidoscope can build a variety of
+applications that are functional and can call functions with
+side-effects, but it can't actually define and mutate a variable itself.
+
+Strikingly, variable mutation is an important feature of some languages,
+and it is not at all obvious how to `add support for mutable
+variables <LangImpl7.html>`_ without having to add an "SSA construction"
+phase to your front-end. In the next chapter, we will describe how you
+can add variable mutation without building SSA in your front-end.
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example, enhanced with
+the if/then/else and for expressions.. To build this example, use:
+
+.. code-block:: bash
+
+    # Compile
+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
+    # Run
+    ./toy
+
+On some platforms, you will need to specify -rdynamic or
+-Wl,--export-dynamic when linking. This ensures that symbols defined in
+the main executable are exported to the dynamic linker and so are
+available for symbol resolution at run time. This is not needed if you
+compile your support code into a shared library, although doing that
+will cause problems on Windows.
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/Chapter6/toy.cpp
+   :language: c++
+
+`Next: Extending the language: mutable variables / SSA
+construction <LangImpl07.html>`_
+

Added: llvm/trunk/docs/tutorial/LangImpl07.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl07.rst?rev=274441&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl07.rst (added)
+++ llvm/trunk/docs/tutorial/LangImpl07.rst Sat Jul  2 12:01:59 2016
@@ -0,0 +1,881 @@
+=======================================================
+Kaleidoscope: Extending the Language: Mutable Variables
+=======================================================
+
+.. contents::
+   :local:
+
+Chapter 7 Introduction
+======================
+
+Welcome to Chapter 7 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. In chapters 1 through 6, we've built a
+very respectable, albeit simple, `functional programming
+language <http://en.wikipedia.org/wiki/Functional_programming>`_. In our
+journey, we learned some parsing techniques, how to build and represent
+an AST, how to build LLVM IR, and how to optimize the resultant code as
+well as JIT compile it.
+
+While Kaleidoscope is interesting as a functional language, the fact
+that it is functional makes it "too easy" to generate LLVM IR for it. In
+particular, a functional language makes it very easy to build LLVM IR
+directly in `SSA
+form <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
+Since LLVM requires that the input code be in SSA form, this is a very
+nice property and it is often unclear to newcomers how to generate code
+for an imperative language with mutable variables.
+
+The short (and happy) summary of this chapter is that there is no need
+for your front-end to build SSA form: LLVM provides highly tuned and
+well tested support for this, though the way it works is a bit
+unexpected for some.
+
+Why is this a hard problem?
+===========================
+
+To understand why mutable variables cause complexities in SSA
+construction, consider this extremely simple C example:
+
+.. code-block:: c
+
+    int G, H;
+    int test(_Bool Condition) {
+      int X;
+      if (Condition)
+        X = G;
+      else
+        X = H;
+      return X;
+    }
+
+In this case, we have the variable "X", whose value depends on the path
+executed in the program. Because there are two different possible values
+for X before the return instruction, a PHI node is inserted to merge the
+two values. The LLVM IR that we want for this example looks like this:
+
+.. code-block:: llvm
+
+    @G = weak global i32 0   ; type of @G is i32*
+    @H = weak global i32 0   ; type of @H is i32*
+
+    define i32 @test(i1 %Condition) {
+    entry:
+      br i1 %Condition, label %cond_true, label %cond_false
+
+    cond_true:
+      %X.0 = load i32* @G
+      br label %cond_next
+
+    cond_false:
+      %X.1 = load i32* @H
+      br label %cond_next
+
+    cond_next:
+      %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
+      ret i32 %X.2
+    }
+
+In this example, the loads from the G and H global variables are
+explicit in the LLVM IR, and they live in the then/else branches of the
+if statement (cond\_true/cond\_false). In order to merge the incoming
+values, the X.2 phi node in the cond\_next block selects the right value
+to use based on where control flow is coming from: if control flow comes
+from the cond\_false block, X.2 gets the value of X.1. Alternatively, if
+control flow comes from cond\_true, it gets the value of X.0. The intent
+of this chapter is not to explain the details of SSA form. For more
+information, see one of the many `online
+references <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
+
+The question for this article is "who places the phi nodes when lowering
+assignments to mutable variables?". The issue here is that LLVM
+*requires* that its IR be in SSA form: there is no "non-ssa" mode for
+it. However, SSA construction requires non-trivial algorithms and data
+structures, so it is inconvenient and wasteful for every front-end to
+have to reproduce this logic.
+
+Memory in LLVM
+==============
+
+The 'trick' here is that while LLVM does require all register values to
+be in SSA form, it does not require (or permit) memory objects to be in
+SSA form. In the example above, note that the loads from G and H are
+direct accesses to G and H: they are not renamed or versioned. This
+differs from some other compiler systems, which do try to version memory
+objects. In LLVM, instead of encoding dataflow analysis of memory into
+the LLVM IR, it is handled with `Analysis
+Passes <../WritingAnLLVMPass.html>`_ which are computed on demand.
+
+With this in mind, the high-level idea is that we want to make a stack
+variable (which lives in memory, because it is on the stack) for each
+mutable object in a function. To take advantage of this trick, we need
+to talk about how LLVM represents stack variables.
+
+In LLVM, all memory accesses are explicit with load/store instructions,
+and it is carefully designed not to have (or need) an "address-of"
+operator. Notice how the type of the @G/@H global variables is actually
+"i32\*" even though the variable is defined as "i32". What this means is
+that @G defines *space* for an i32 in the global data area, but its
+*name* actually refers to the address for that space. Stack variables
+work the same way, except that instead of being declared with global
+variable definitions, they are declared with the `LLVM alloca
+instruction <../LangRef.html#alloca-instruction>`_:
+
+.. code-block:: llvm
+
+    define i32 @example() {
+    entry:
+      %X = alloca i32           ; type of %X is i32*.
+      ...
+      %tmp = load i32* %X       ; load the stack value %X from the stack.
+      %tmp2 = add i32 %tmp, 1   ; increment it
+      store i32 %tmp2, i32* %X  ; store it back
+      ...
+
+This code shows an example of how you can declare and manipulate a stack
+variable in the LLVM IR. Stack memory allocated with the alloca
+instruction is fully general: you can pass the address of the stack slot
+to functions, you can store it in other variables, etc. In our example
+above, we could rewrite the example to use the alloca technique to avoid
+using a PHI node:
+
+.. code-block:: llvm
+
+    @G = weak global i32 0   ; type of @G is i32*
+    @H = weak global i32 0   ; type of @H is i32*
+
+    define i32 @test(i1 %Condition) {
+    entry:
+      %X = alloca i32           ; type of %X is i32*.
+      br i1 %Condition, label %cond_true, label %cond_false
+
+    cond_true:
+      %X.0 = load i32* @G
+      store i32 %X.0, i32* %X   ; Update X
+      br label %cond_next
+
+    cond_false:
+      %X.1 = load i32* @H
+      store i32 %X.1, i32* %X   ; Update X
+      br label %cond_next
+
+    cond_next:
+      %X.2 = load i32* %X       ; Read X
+      ret i32 %X.2
+    }
+
+With this, we have discovered a way to handle arbitrary mutable
+variables without the need to create Phi nodes at all:
+
+#. Each mutable variable becomes a stack allocation.
+#. Each read of the variable becomes a load from the stack.
+#. Each update of the variable becomes a store to the stack.
+#. Taking the address of a variable just uses the stack address
+   directly.
+
+While this solution has solved our immediate problem, it introduced
+another one: we have now apparently introduced a lot of stack traffic
+for very simple and common operations, a major performance problem.
+Fortunately for us, the LLVM optimizer has a highly-tuned optimization
+pass named "mem2reg" that handles this case, promoting allocas like this
+into SSA registers, inserting Phi nodes as appropriate. If you run this
+example through the pass, for example, you'll get:
+
+.. code-block:: bash
+
+    $ llvm-as < example.ll | opt -mem2reg | llvm-dis
+    @G = weak global i32 0
+    @H = weak global i32 0
+
+    define i32 @test(i1 %Condition) {
+    entry:
+      br i1 %Condition, label %cond_true, label %cond_false
+
+    cond_true:
+      %X.0 = load i32* @G
+      br label %cond_next
+
+    cond_false:
+      %X.1 = load i32* @H
+      br label %cond_next
+
+    cond_next:
+      %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
+      ret i32 %X.01
+    }
+
+The mem2reg pass implements the standard "iterated dominance frontier"
+algorithm for constructing SSA form and has a number of optimizations
+that speed up (very common) degenerate cases. The mem2reg optimization
+pass is the answer to dealing with mutable variables, and we highly
+recommend that you depend on it. Note that mem2reg only works on
+variables in certain circumstances:
+
+#. mem2reg is alloca-driven: it looks for allocas and if it can handle
+   them, it promotes them. It does not apply to global variables or heap
+   allocations.
+#. mem2reg only looks for alloca instructions in the entry block of the
+   function. Being in the entry block guarantees that the alloca is only
+   executed once, which makes analysis simpler.
+#. mem2reg only promotes allocas whose uses are direct loads and stores.
+   If the address of the stack object is passed to a function, or if any
+   funny pointer arithmetic is involved, the alloca will not be
+   promoted.
+#. mem2reg only works on allocas of `first
+   class <../LangRef.html#first-class-types>`_ values (such as pointers,
+   scalars and vectors), and only if the array size of the allocation is
+   1 (or missing in the .ll file). mem2reg is not capable of promoting
+   structs or arrays to registers. Note that the "sroa" pass is
+   more powerful and can promote structs, "unions", and arrays in many
+   cases.
+
+All of these properties are easy to satisfy for most imperative
+languages, and we'll illustrate it below with Kaleidoscope. The final
+question you may be asking is: should I bother with this nonsense for my
+front-end? Wouldn't it be better if I just did SSA construction
+directly, avoiding use of the mem2reg optimization pass? In short, we
+strongly recommend that you use this technique for building SSA form,
+unless there is an extremely good reason not to. Using this technique
+is:
+
+-  Proven and well tested: clang uses this technique
+   for local mutable variables. As such, the most common clients of LLVM
+   are using this to handle a bulk of their variables. You can be sure
+   that bugs are found fast and fixed early.
+-  Extremely Fast: mem2reg has a number of special cases that make it
+   fast in common cases as well as fully general. For example, it has
+   fast-paths for variables that are only used in a single block,
+   variables that only have one assignment point, good heuristics to
+   avoid insertion of unneeded phi nodes, etc.
+-  Needed for debug info generation: `Debug information in
+   LLVM <../SourceLevelDebugging.html>`_ relies on having the address of
+   the variable exposed so that debug info can be attached to it. This
+   technique dovetails very naturally with this style of debug info.
+
+If nothing else, this makes it much easier to get your front-end up and
+running, and is very simple to implement. Let's extend Kaleidoscope with
+mutable variables now!
+
+Mutable Variables in Kaleidoscope
+=================================
+
+Now that we know the sort of problem we want to tackle, let's see what
+this looks like in the context of our little Kaleidoscope language.
+We're going to add two features:
+
+#. The ability to mutate variables with the '=' operator.
+#. The ability to define new variables.
+
+While the first item is really what this is about, we only have
+variables for incoming arguments as well as for induction variables, and
+redefining those only goes so far :). Also, the ability to define new
+variables is a useful thing regardless of whether you will be mutating
+them. Here's a motivating example that shows how we could use these:
+
+::
+
+    # Define ':' for sequencing: as a low-precedence operator that ignores operands
+    # and just returns the RHS.
+    def binary : 1 (x y) y;
+
+    # Recursive fib, we could do this before.
+    def fib(x)
+      if (x < 3) then
+        1
+      else
+        fib(x-1)+fib(x-2);
+
+    # Iterative fib.
+    def fibi(x)
+      var a = 1, b = 1, c in
+      (for i = 3, i < x in
+         c = a + b :
+         a = b :
+         b = c) :
+      b;
+
+    # Call it.
+    fibi(10);
+
+In order to mutate variables, we have to change our existing variables
+to use the "alloca trick". Once we have that, we'll add our new
+operator, then extend Kaleidoscope to support new variable definitions.
+
+Adjusting Existing Variables for Mutation
+=========================================
+
+The symbol table in Kaleidoscope is managed at code generation time by
+the '``NamedValues``' map. This map currently keeps track of the LLVM
+"Value\*" that holds the double value for the named variable. In order
+to support mutation, we need to change this slightly, so that
+``NamedValues`` holds the *memory location* of the variable in question.
+Note that this change is a refactoring: it changes the structure of the
+code, but does not (by itself) change the behavior of the compiler. All
+of these changes are isolated in the Kaleidoscope code generator.
+
+At this point in Kaleidoscope's development, it only supports variables
+for two things: incoming arguments to functions and the induction
+variable of 'for' loops. For consistency, we'll allow mutation of these
+variables in addition to other user-defined variables. This means that
+these will both need memory locations.
+
+To start our transformation of Kaleidoscope, we'll change the
+NamedValues map so that it maps to AllocaInst\* instead of Value\*. Once
+we do this, the C++ compiler will tell us what parts of the code we need
+to update:
+
+.. code-block:: c++
+
+    static std::map<std::string, AllocaInst*> NamedValues;
+
+Also, since we will need to create these alloca's, we'll use a helper
+function that ensures that the allocas are created in the entry block of
+the function:
+
+.. code-block:: c++
+
+    /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
+    /// the function.  This is used for mutable variables etc.
+    static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
+                                              const std::string &VarName) {
+      IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
+                     TheFunction->getEntryBlock().begin());
+      return TmpB.CreateAlloca(Type::getDoubleTy(LLVMContext), 0,
+                               VarName.c_str());
+    }
+
+This funny looking code creates an IRBuilder object that is pointing at
+the first instruction (.begin()) of the entry block. It then creates an
+alloca with the expected name and returns it. Because all values in
+Kaleidoscope are doubles, there is no need to pass in a type to use.
+
+With this in place, the first functionality change we want to make is to
+variable references. In our new scheme, variables live on the stack, so
+code generating a reference to them actually needs to produce a load
+from the stack slot:
+
+.. code-block:: c++
+
+    Value *VariableExprAST::codegen() {
+      // Look this variable up in the function.
+      Value *V = NamedValues[Name];
+      if (!V)
+        return LogErrorV("Unknown variable name");
+
+      // Load the value.
+      return Builder.CreateLoad(V, Name.c_str());
+    }
+
+As you can see, this is pretty straightforward. Now we need to update
+the things that define the variables to set up the alloca. We'll start
+with ``ForExprAST::codegen()`` (see the `full code listing <#id1>`_ for
+the unabridged code):
+
+.. code-block:: c++
+
+      Function *TheFunction = Builder.GetInsertBlock()->getParent();
+
+      // Create an alloca for the variable in the entry block.
+      AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
+
+        // Emit the start code first, without 'variable' in scope.
+      Value *StartVal = Start->codegen();
+      if (!StartVal)
+        return nullptr;
+
+      // Store the value into the alloca.
+      Builder.CreateStore(StartVal, Alloca);
+      ...
+
+      // Compute the end condition.
+      Value *EndCond = End->codegen();
+      if (!EndCond)
+        return nullptr;
+
+      // Reload, increment, and restore the alloca.  This handles the case where
+      // the body of the loop mutates the variable.
+      Value *CurVar = Builder.CreateLoad(Alloca);
+      Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
+      Builder.CreateStore(NextVar, Alloca);
+      ...
+
+This code is virtually identical to the code `before we allowed mutable
+variables <LangImpl5.html#code-generation-for-the-for-loop>`_. The big difference is that we
+no longer have to construct a PHI node, and we use load/store to access
+the variable as needed.
+
+To support mutable argument variables, we need to also make allocas for
+them. The code for this is also pretty simple:
+
+.. code-block:: c++
+
+    /// CreateArgumentAllocas - Create an alloca for each argument and register the
+    /// argument in the symbol table so that references to it will succeed.
+    void PrototypeAST::CreateArgumentAllocas(Function *F) {
+      Function::arg_iterator AI = F->arg_begin();
+      for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
+        // Create an alloca for this variable.
+        AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
+
+        // Store the initial value into the alloca.
+        Builder.CreateStore(AI, Alloca);
+
+        // Add arguments to variable symbol table.
+        NamedValues[Args[Idx]] = Alloca;
+      }
+    }
+
+For each argument, we make an alloca, store the input value to the
+function into the alloca, and register the alloca as the memory location
+for the argument. This method gets invoked by ``FunctionAST::codegen()``
+right after it sets up the entry block for the function.
+
+The final missing piece is adding the mem2reg pass, which allows us to
+get good codegen once again:
+
+.. code-block:: c++
+
+        // Set up the optimizer pipeline.  Start with registering info about how the
+        // target lays out data structures.
+        OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
+        // Promote allocas to registers.
+        OurFPM.add(createPromoteMemoryToRegisterPass());
+        // Do simple "peephole" optimizations and bit-twiddling optzns.
+        OurFPM.add(createInstructionCombiningPass());
+        // Reassociate expressions.
+        OurFPM.add(createReassociatePass());
+
+It is interesting to see what the code looks like before and after the
+mem2reg optimization runs. For example, this is the before/after code
+for our recursive fib function. Before the optimization:
+
+.. code-block:: llvm
+
+    define double @fib(double %x) {
+    entry:
+      %x1 = alloca double
+      store double %x, double* %x1
+      %x2 = load double* %x1
+      %cmptmp = fcmp ult double %x2, 3.000000e+00
+      %booltmp = uitofp i1 %cmptmp to double
+      %ifcond = fcmp one double %booltmp, 0.000000e+00
+      br i1 %ifcond, label %then, label %else
+
+    then:       ; preds = %entry
+      br label %ifcont
+
+    else:       ; preds = %entry
+      %x3 = load double* %x1
+      %subtmp = fsub double %x3, 1.000000e+00
+      %calltmp = call double @fib(double %subtmp)
+      %x4 = load double* %x1
+      %subtmp5 = fsub double %x4, 2.000000e+00
+      %calltmp6 = call double @fib(double %subtmp5)
+      %addtmp = fadd double %calltmp, %calltmp6
+      br label %ifcont
+
+    ifcont:     ; preds = %else, %then
+      %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
+      ret double %iftmp
+    }
+
+Here there is only one variable (x, the input argument) but you can
+still see the extremely simple-minded code generation strategy we are
+using. In the entry block, an alloca is created, and the initial input
+value is stored into it. Each reference to the variable does a reload
+from the stack. Also, note that we didn't modify the if/then/else
+expression, so it still inserts a PHI node. While we could make an
+alloca for it, it is actually easier to create a PHI node for it, so we
+still just make the PHI.
+
+Here is the code after the mem2reg pass runs:
+
+.. code-block:: llvm
+
+    define double @fib(double %x) {
+    entry:
+      %cmptmp = fcmp ult double %x, 3.000000e+00
+      %booltmp = uitofp i1 %cmptmp to double
+      %ifcond = fcmp one double %booltmp, 0.000000e+00
+      br i1 %ifcond, label %then, label %else
+
+    then:
+      br label %ifcont
+
+    else:
+      %subtmp = fsub double %x, 1.000000e+00
+      %calltmp = call double @fib(double %subtmp)
+      %subtmp5 = fsub double %x, 2.000000e+00
+      %calltmp6 = call double @fib(double %subtmp5)
+      %addtmp = fadd double %calltmp, %calltmp6
+      br label %ifcont
+
+    ifcont:     ; preds = %else, %then
+      %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
+      ret double %iftmp
+    }
+
+This is a trivial case for mem2reg, since there are no redefinitions of
+the variable. The point of showing this is to calm your tension about
+inserting such blatent inefficiencies :).
+
+After the rest of the optimizers run, we get:
+
+.. code-block:: llvm
+
+    define double @fib(double %x) {
+    entry:
+      %cmptmp = fcmp ult double %x, 3.000000e+00
+      %booltmp = uitofp i1 %cmptmp to double
+      %ifcond = fcmp ueq double %booltmp, 0.000000e+00
+      br i1 %ifcond, label %else, label %ifcont
+
+    else:
+      %subtmp = fsub double %x, 1.000000e+00
+      %calltmp = call double @fib(double %subtmp)
+      %subtmp5 = fsub double %x, 2.000000e+00
+      %calltmp6 = call double @fib(double %subtmp5)
+      %addtmp = fadd double %calltmp, %calltmp6
+      ret double %addtmp
+
+    ifcont:
+      ret double 1.000000e+00
+    }
+
+Here we see that the simplifycfg pass decided to clone the return
+instruction into the end of the 'else' block. This allowed it to
+eliminate some branches and the PHI node.
+
+Now that all symbol table references are updated to use stack variables,
+we'll add the assignment operator.
+
+New Assignment Operator
+=======================
+
+With our current framework, adding a new assignment operator is really
+simple. We will parse it just like any other binary operator, but handle
+it internally (instead of allowing the user to define it). The first
+step is to set a precedence:
+
+.. code-block:: c++
+
+     int main() {
+       // Install standard binary operators.
+       // 1 is lowest precedence.
+       BinopPrecedence['='] = 2;
+       BinopPrecedence['<'] = 10;
+       BinopPrecedence['+'] = 20;
+       BinopPrecedence['-'] = 20;
+
+Now that the parser knows the precedence of the binary operator, it
+takes care of all the parsing and AST generation. We just need to
+implement codegen for the assignment operator. This looks like:
+
+.. code-block:: c++
+
+    Value *BinaryExprAST::codegen() {
+      // Special case '=' because we don't want to emit the LHS as an expression.
+      if (Op == '=') {
+        // Assignment requires the LHS to be an identifier.
+        VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS.get());
+        if (!LHSE)
+          return LogErrorV("destination of '=' must be a variable");
+
+Unlike the rest of the binary operators, our assignment operator doesn't
+follow the "emit LHS, emit RHS, do computation" model. As such, it is
+handled as a special case before the other binary operators are handled.
+The other strange thing is that it requires the LHS to be a variable. It
+is invalid to have "(x+1) = expr" - only things like "x = expr" are
+allowed.
+
+.. code-block:: c++
+
+        // Codegen the RHS.
+        Value *Val = RHS->codegen();
+        if (!Val)
+          return nullptr;
+
+        // Look up the name.
+        Value *Variable = NamedValues[LHSE->getName()];
+        if (!Variable)
+          return LogErrorV("Unknown variable name");
+
+        Builder.CreateStore(Val, Variable);
+        return Val;
+      }
+      ...
+
+Once we have the variable, codegen'ing the assignment is
+straightforward: we emit the RHS of the assignment, create a store, and
+return the computed value. Returning a value allows for chained
+assignments like "X = (Y = Z)".
+
+Now that we have an assignment operator, we can mutate loop variables
+and arguments. For example, we can now run code like this:
+
+::
+
+    # Function to print a double.
+    extern printd(x);
+
+    # Define ':' for sequencing: as a low-precedence operator that ignores operands
+    # and just returns the RHS.
+    def binary : 1 (x y) y;
+
+    def test(x)
+      printd(x) :
+      x = 4 :
+      printd(x);
+
+    test(123);
+
+When run, this example prints "123" and then "4", showing that we did
+actually mutate the value! Okay, we have now officially implemented our
+goal: getting this to work requires SSA construction in the general
+case. However, to be really useful, we want the ability to define our
+own local variables, let's add this next!
+
+User-defined Local Variables
+============================
+
+Adding var/in is just like any other extension we made to
+Kaleidoscope: we extend the lexer, the parser, the AST and the code
+generator. The first step for adding our new 'var/in' construct is to
+extend the lexer. As before, this is pretty trivial, the code looks like
+this:
+
+.. code-block:: c++
+
+    enum Token {
+      ...
+      // var definition
+      tok_var = -13
+    ...
+    }
+    ...
+    static int gettok() {
+    ...
+        if (IdentifierStr == "in")
+          return tok_in;
+        if (IdentifierStr == "binary")
+          return tok_binary;
+        if (IdentifierStr == "unary")
+          return tok_unary;
+        if (IdentifierStr == "var")
+          return tok_var;
+        return tok_identifier;
+    ...
+
+The next step is to define the AST node that we will construct. For
+var/in, it looks like this:
+
+.. code-block:: c++
+
+    /// VarExprAST - Expression class for var/in
+    class VarExprAST : public ExprAST {
+      std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;
+      std::unique_ptr<ExprAST> Body;
+
+    public:
+      VarExprAST(std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames,
+                 std::unique_ptr<ExprAST> body)
+      : VarNames(std::move(VarNames)), Body(std::move(Body)) {}
+
+      virtual Value *codegen();
+    };
+
+var/in allows a list of names to be defined all at once, and each name
+can optionally have an initializer value. As such, we capture this
+information in the VarNames vector. Also, var/in has a body, this body
+is allowed to access the variables defined by the var/in.
+
+With this in place, we can define the parser pieces. The first thing we
+do is add it as a primary expression:
+
+.. code-block:: c++
+
+    /// primary
+    ///   ::= identifierexpr
+    ///   ::= numberexpr
+    ///   ::= parenexpr
+    ///   ::= ifexpr
+    ///   ::= forexpr
+    ///   ::= varexpr
+    static std::unique_ptr<ExprAST> ParsePrimary() {
+      switch (CurTok) {
+      default:
+        return LogError("unknown token when expecting an expression");
+      case tok_identifier:
+        return ParseIdentifierExpr();
+      case tok_number:
+        return ParseNumberExpr();
+      case '(':
+        return ParseParenExpr();
+      case tok_if:
+        return ParseIfExpr();
+      case tok_for:
+        return ParseForExpr();
+      case tok_var:
+        return ParseVarExpr();
+      }
+    }
+
+Next we define ParseVarExpr:
+
+.. code-block:: c++
+
+    /// varexpr ::= 'var' identifier ('=' expression)?
+    //                    (',' identifier ('=' expression)?)* 'in' expression
+    static std::unique_ptr<ExprAST> ParseVarExpr() {
+      getNextToken();  // eat the var.
+
+      std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;
+
+      // At least one variable name is required.
+      if (CurTok != tok_identifier)
+        return LogError("expected identifier after var");
+
+The first part of this code parses the list of identifier/expr pairs
+into the local ``VarNames`` vector.
+
+.. code-block:: c++
+
+      while (1) {
+        std::string Name = IdentifierStr;
+        getNextToken();  // eat identifier.
+
+        // Read the optional initializer.
+        std::unique_ptr<ExprAST> Init;
+        if (CurTok == '=') {
+          getNextToken(); // eat the '='.
+
+          Init = ParseExpression();
+          if (!Init) return nullptr;
+        }
+
+        VarNames.push_back(std::make_pair(Name, std::move(Init)));
+
+        // End of var list, exit loop.
+        if (CurTok != ',') break;
+        getNextToken(); // eat the ','.
+
+        if (CurTok != tok_identifier)
+          return LogError("expected identifier list after var");
+      }
+
+Once all the variables are parsed, we then parse the body and create the
+AST node:
+
+.. code-block:: c++
+
+      // At this point, we have to have 'in'.
+      if (CurTok != tok_in)
+        return LogError("expected 'in' keyword after 'var'");
+      getNextToken();  // eat 'in'.
+
+      auto Body = ParseExpression();
+      if (!Body)
+        return nullptr;
+
+      return llvm::make_unique<VarExprAST>(std::move(VarNames),
+                                           std::move(Body));
+    }
+
+Now that we can parse and represent the code, we need to support
+emission of LLVM IR for it. This code starts out with:
+
+.. code-block:: c++
+
+    Value *VarExprAST::codegen() {
+      std::vector<AllocaInst *> OldBindings;
+
+      Function *TheFunction = Builder.GetInsertBlock()->getParent();
+
+      // Register all variables and emit their initializer.
+      for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
+        const std::string &VarName = VarNames[i].first;
+        ExprAST *Init = VarNames[i].second.get();
+
+Basically it loops over all the variables, installing them one at a
+time. For each variable we put into the symbol table, we remember the
+previous value that we replace in OldBindings.
+
+.. code-block:: c++
+
+        // Emit the initializer before adding the variable to scope, this prevents
+        // the initializer from referencing the variable itself, and permits stuff
+        // like this:
+        //  var a = 1 in
+        //    var a = a in ...   # refers to outer 'a'.
+        Value *InitVal;
+        if (Init) {
+          InitVal = Init->codegen();
+          if (!InitVal)
+            return nullptr;
+        } else { // If not specified, use 0.0.
+          InitVal = ConstantFP::get(LLVMContext, APFloat(0.0));
+        }
+
+        AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
+        Builder.CreateStore(InitVal, Alloca);
+
+        // Remember the old variable binding so that we can restore the binding when
+        // we unrecurse.
+        OldBindings.push_back(NamedValues[VarName]);
+
+        // Remember this binding.
+        NamedValues[VarName] = Alloca;
+      }
+
+There are more comments here than code. The basic idea is that we emit
+the initializer, create the alloca, then update the symbol table to
+point to it. Once all the variables are installed in the symbol table,
+we evaluate the body of the var/in expression:
+
+.. code-block:: c++
+
+      // Codegen the body, now that all vars are in scope.
+      Value *BodyVal = Body->codegen();
+      if (!BodyVal)
+        return nullptr;
+
+Finally, before returning, we restore the previous variable bindings:
+
+.. code-block:: c++
+
+      // Pop all our variables from scope.
+      for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
+        NamedValues[VarNames[i].first] = OldBindings[i];
+
+      // Return the body computation.
+      return BodyVal;
+    }
+
+The end result of all of this is that we get properly scoped variable
+definitions, and we even (trivially) allow mutation of them :).
+
+With this, we completed what we set out to do. Our nice iterative fib
+example from the intro compiles and runs just fine. The mem2reg pass
+optimizes all of our stack variables into SSA registers, inserting PHI
+nodes where needed, and our front-end remains simple: no "iterated
+dominance frontier" computation anywhere in sight.
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example, enhanced with
+mutable variables and var/in support. To build this example, use:
+
+.. code-block:: bash
+
+    # Compile
+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
+    # Run
+    ./toy
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/Chapter7/toy.cpp
+   :language: c++
+
+`Next: Compiling to Object Code <LangImpl08.html>`_
+

Added: llvm/trunk/docs/tutorial/LangImpl08.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl08.rst?rev=274441&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl08.rst (added)
+++ llvm/trunk/docs/tutorial/LangImpl08.rst Sat Jul  2 12:01:59 2016
@@ -0,0 +1,218 @@
+========================================
+ Kaleidoscope: Compiling to Object Code
+========================================
+
+.. contents::
+   :local:
+
+Chapter 8 Introduction
+======================
+
+Welcome to Chapter 8 of the "`Implementing a language with LLVM
+<index.html>`_" tutorial. This chapter describes how to compile our
+language down to object files.
+
+Choosing a target
+=================
+
+LLVM has native support for cross-compilation. You can compile to the
+architecture of your current machine, or just as easily compile for
+other architectures. In this tutorial, we'll target the current
+machine.
+
+To specify the architecture that you want to target, we use a string
+called a "target triple". This takes the form
+``<arch><sub>-<vendor>-<sys>-<abi>`` (see the `cross compilation docs
+<http://clang.llvm.org/docs/CrossCompilation.html#target-triple>`_).
+
+As an example, we can see what clang thinks is our current target
+triple:
+
+::
+
+    $ clang --version | grep Target
+    Target: x86_64-unknown-linux-gnu
+
+Running this command may show something different on your machine as
+you might be using a different architecture or operating system to me.
+
+Fortunately, we don't need to hard-code a target triple to target the
+current machine. LLVM provides ``sys::getDefaultTargetTriple``, which
+returns the target triple of the current machine.
+
+.. code-block:: c++
+
+    auto TargetTriple = sys::getDefaultTargetTriple();
+
+LLVM doesn't require us to to link in all the target
+functionality. For example, if we're just using the JIT, we don't need
+the assembly printers. Similarly, if we're only targetting certain
+architectures, we can only link in the functionality for those
+architectures.
+
+For this example, we'll initialize all the targets for emitting object
+code.
+
+.. code-block:: c++
+
+    InitializeAllTargetInfos();
+    InitializeAllTargets();
+    InitializeAllTargetMCs();
+    InitializeAllAsmParsers();
+    InitializeAllAsmPrinters();
+
+We can now use our target triple to get a ``Target``:
+
+.. code-block:: c++
+
+  std::string Error;
+  auto Target = TargetRegistry::lookupTarget(TargetTriple, Error);
+
+  // Print an error and exit if we couldn't find the requested target.
+  // This generally occurs if we've forgotten to initialise the
+  // TargetRegistry or we have a bogus target triple.
+  if (!Target) {
+    errs() << Error;
+    return 1;
+  }
+
+Target Machine
+==============
+
+We will also need a ``TargetMachine``. This class provides a complete
+machine description of the machine we're targetting. If we want to
+target a specific feature (such as SSE) or a specific CPU (such as
+Intel's Sandylake), we do so now.
+
+To see which features and CPUs that LLVM knows about, we can use
+``llc``. For example, let's look at x86:
+
+::
+
+    $ llvm-as < /dev/null | llc -march=x86 -mattr=help
+    Available CPUs for this target:
+
+      amdfam10      - Select the amdfam10 processor.
+      athlon        - Select the athlon processor.
+      athlon-4      - Select the athlon-4 processor.
+      ...
+
+    Available features for this target:
+
+      16bit-mode            - 16-bit mode (i8086).
+      32bit-mode            - 32-bit mode (80386).
+      3dnow                 - Enable 3DNow! instructions.
+      3dnowa                - Enable 3DNow! Athlon instructions.
+      ...
+
+For our example, we'll use the generic CPU without any additional
+features, options or relocation model.
+
+.. code-block:: c++
+
+  auto CPU = "generic";
+  auto Features = "";
+
+  TargetOptions opt;
+  auto RM = Optional<Reloc::Model>();
+  auto TargetMachine = Target->createTargetMachine(TargetTriple, CPU, Features, opt, RM);
+
+
+Configuring the Module
+======================
+
+We're now ready to configure our module, to specify the target and
+data layout. This isn't strictly necessary, but the `frontend
+performance guide <../Frontend/PerformanceTips.html>`_ recommends
+this. Optimizations benefit from knowing about the target and data
+layout.
+
+.. code-block:: c++
+
+  TheModule->setDataLayout(TargetMachine->createDataLayout());
+  TheModule->setTargetTriple(TargetTriple);   
+  
+Emit Object Code
+================
+
+We're ready to emit object code! Let's define where we want to write
+our file to:
+
+.. code-block:: c++
+
+  auto Filename = "output.o";
+  std::error_code EC;
+  raw_fd_ostream dest(Filename, EC, sys::fs::F_None);
+
+  if (EC) {
+    errs() << "Could not open file: " << EC.message();
+    return 1;
+  }
+
+Finally, we define a pass that emits object code, then we run that
+pass:
+
+.. code-block:: c++
+
+  legacy::PassManager pass;
+  auto FileType = TargetMachine::CGFT_ObjectFile;
+
+  if (TargetMachine->addPassesToEmitFile(pass, dest, FileType)) {
+    errs() << "TargetMachine can't emit a file of this type";
+    return 1;
+  }
+
+  pass.run(*TheModule);
+  dest.flush();
+
+Putting It All Together
+=======================
+
+Does it work? Let's give it a try. We need to compile our code, but
+note that the arguments to ``llvm-config`` are different to the previous chapters.
+
+::
+
+    $ clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs all` -o toy
+
+Let's run it, and define a simple ``average`` function. Press Ctrl-D
+when you're done.
+
+::
+   
+    $ ./toy
+    ready> def average(x y) (x + y) * 0.5;
+    ^D
+    Wrote output.o
+
+We have an object file! To test it, let's write a simple program and
+link it with our output. Here's the source code:
+
+.. code-block:: c++
+
+    #include <iostream>
+
+    extern "C" {
+        double average(double, double);
+    }
+
+    int main() {
+        std::cout << "average of 3.0 and 4.0: " << average(3.0, 4.0) << std::endl;
+    }
+
+We link our program to output.o and check the result is what we
+expected:
+
+::
+
+    $ clang++ main.cpp output.o -o main
+    $ ./main
+    average of 3.0 and 4.0: 3.5
+
+Full Code Listing
+=================
+
+.. literalinclude:: ../../examples/Kaleidoscope/Chapter8/toy.cpp
+   :language: c++
+
+`Next: Adding Debug Information <LangImpl09.html>`_

Added: llvm/trunk/docs/tutorial/LangImpl09.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl09.rst?rev=274441&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl09.rst (added)
+++ llvm/trunk/docs/tutorial/LangImpl09.rst Sat Jul  2 12:01:59 2016
@@ -0,0 +1,462 @@
+======================================
+Kaleidoscope: Adding Debug Information
+======================================
+
+.. contents::
+   :local:
+
+Chapter 9 Introduction
+======================
+
+Welcome to Chapter 9 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. In chapters 1 through 8, we've built a
+decent little programming language with functions and variables.
+What happens if something goes wrong though, how do you debug your
+program?
+
+Source level debugging uses formatted data that helps a debugger
+translate from binary and the state of the machine back to the
+source that the programmer wrote. In LLVM we generally use a format
+called `DWARF <http://dwarfstd.org>`_. DWARF is a compact encoding
+that represents types, source locations, and variable locations. 
+
+The short summary of this chapter is that we'll go through the
+various things you have to add to a programming language to
+support debug info, and how you translate that into DWARF.
+
+Caveat: For now we can't debug via the JIT, so we'll need to compile
+our program down to something small and standalone. As part of this
+we'll make a few modifications to the running of the language and
+how programs are compiled. This means that we'll have a source file
+with a simple program written in Kaleidoscope rather than the
+interactive JIT. It does involve a limitation that we can only
+have one "top level" command at a time to reduce the number of
+changes necessary.
+
+Here's the sample program we'll be compiling:
+
+.. code-block:: python
+
+   def fib(x)
+     if x < 3 then
+       1
+     else
+       fib(x-1)+fib(x-2);
+
+   fib(10)
+
+
+Why is this a hard problem?
+===========================
+
+Debug information is a hard problem for a few different reasons - mostly
+centered around optimized code. First, optimization makes keeping source
+locations more difficult. In LLVM IR we keep the original source location
+for each IR level instruction on the instruction. Optimization passes
+should keep the source locations for newly created instructions, but merged
+instructions only get to keep a single location - this can cause jumping
+around when stepping through optimized programs. Secondly, optimization
+can move variables in ways that are either optimized out, shared in memory
+with other variables, or difficult to track. For the purposes of this
+tutorial we're going to avoid optimization (as you'll see with one of the
+next sets of patches).
+
+Ahead-of-Time Compilation Mode
+==============================
+
+To highlight only the aspects of adding debug information to a source
+language without needing to worry about the complexities of JIT debugging
+we're going to make a few changes to Kaleidoscope to support compiling
+the IR emitted by the front end into a simple standalone program that
+you can execute, debug, and see results.
+
+First we make our anonymous function that contains our top level
+statement be our "main":
+
+.. code-block:: udiff
+
+  -    auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
+  +    auto Proto = llvm::make_unique<PrototypeAST>("main", std::vector<std::string>());
+
+just with the simple change of giving it a name.
+
+Then we're going to remove the command line code wherever it exists:
+
+.. code-block:: udiff
+
+  @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() {
+   /// top ::= definition | external | expression | ';'
+   static void MainLoop() {
+     while (1) {
+  -    fprintf(stderr, "ready> ");
+       switch (CurTok) {
+       case tok_eof:
+         return;
+  @@ -1184,7 +1183,6 @@ int main() {
+     BinopPrecedence['*'] = 40; // highest.
+ 
+     // Prime the first token.
+  -  fprintf(stderr, "ready> ");
+     getNextToken();
+ 
+Lastly we're going to disable all of the optimization passes and the JIT so
+that the only thing that happens after we're done parsing and generating
+code is that the llvm IR goes to standard error:
+
+.. code-block:: udiff
+
+  @@ -1108,17 +1108,8 @@ static void HandleExtern() {
+   static void HandleTopLevelExpression() {
+     // Evaluate a top-level expression into an anonymous function.
+     if (auto FnAST = ParseTopLevelExpr()) {
+  -    if (auto *FnIR = FnAST->codegen()) {
+  -      // We're just doing this to make sure it executes.
+  -      TheExecutionEngine->finalizeObject();
+  -      // JIT the function, returning a function pointer.
+  -      void *FPtr = TheExecutionEngine->getPointerToFunction(FnIR);
+  -
+  -      // Cast it to the right type (takes no arguments, returns a double) so we
+  -      // can call it as a native function.
+  -      double (*FP)() = (double (*)())(intptr_t)FPtr;
+  -      // Ignore the return value for this.
+  -      (void)FP;
+  +    if (!F->codegen()) {
+  +      fprintf(stderr, "Error generating code for top level expr");
+       }
+     } else {
+       // Skip token for error recovery.
+  @@ -1439,11 +1459,11 @@ int main() {
+     // target lays out data structures.
+     TheModule->setDataLayout(TheExecutionEngine->getDataLayout());
+     OurFPM.add(new DataLayoutPass());
+  +#if 0
+     OurFPM.add(createBasicAliasAnalysisPass());
+     // Promote allocas to registers.
+     OurFPM.add(createPromoteMemoryToRegisterPass());
+  @@ -1218,7 +1210,7 @@ int main() {
+     OurFPM.add(createGVNPass());
+     // Simplify the control flow graph (deleting unreachable blocks, etc).
+     OurFPM.add(createCFGSimplificationPass());
+  -
+  +  #endif
+     OurFPM.doInitialization();
+ 
+     // Set the global so the code gen can use this.
+
+This relatively small set of changes get us to the point that we can compile
+our piece of Kaleidoscope language down to an executable program via this
+command line:
+
+.. code-block:: bash
+
+  Kaleidoscope-Ch9 < fib.ks | & clang -x ir -
+
+which gives an a.out/a.exe in the current working directory.
+
+Compile Unit
+============
+
+The top level container for a section of code in DWARF is a compile unit.
+This contains the type and function data for an individual translation unit
+(read: one file of source code). So the first thing we need to do is
+construct one for our fib.ks file.
+
+DWARF Emission Setup
+====================
+
+Similar to the ``IRBuilder`` class we have a
+`DIBuilder <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class
+that helps in constructing debug metadata for an llvm IR file. It
+corresponds 1:1 similarly to ``IRBuilder`` and llvm IR, but with nicer names.
+Using it does require that you be more familiar with DWARF terminology than
+you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you
+read through the general documentation on the
+`Metadata Format <http://llvm.org/docs/SourceLevelDebugging.html>`_ it
+should be a little more clear. We'll be using this class to construct all
+of our IR level descriptions. Construction for it takes a module so we
+need to construct it shortly after we construct our module. We've left it
+as a global static variable to make it a bit easier to use.
+
+Next we're going to create a small container to cache some of our frequent
+data. The first will be our compile unit, but we'll also write a bit of
+code for our one type since we won't have to worry about multiple typed
+expressions:
+
+.. code-block:: c++
+
+  static DIBuilder *DBuilder;
+
+  struct DebugInfo {
+    DICompileUnit *TheCU;
+    DIType *DblTy;
+
+    DIType *getDoubleTy();
+  } KSDbgInfo;
+
+  DIType *DebugInfo::getDoubleTy() {
+    if (DblTy.isValid())
+      return DblTy;
+
+    DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);
+    return DblTy;
+  }
+
+And then later on in ``main`` when we're constructing our module:
+
+.. code-block:: c++
+
+  DBuilder = new DIBuilder(*TheModule);
+
+  KSDbgInfo.TheCU = DBuilder->createCompileUnit(
+      dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);
+
+There are a couple of things to note here. First, while we're producing a
+compile unit for a language called Kaleidoscope we used the language
+constant for C. This is because a debugger wouldn't necessarily understand
+the calling conventions or default ABI for a language it doesn't recognize
+and we follow the C ABI in our llvm code generation so it's the closest
+thing to accurate. This ensures we can actually call functions from the
+debugger and have them execute. Secondly, you'll see the "fib.ks" in the
+call to ``createCompileUnit``. This is a default hard coded value since
+we're using shell redirection to put our source into the Kaleidoscope
+compiler. In a usual front end you'd have an input file name and it would
+go there.
+
+One last thing as part of emitting debug information via DIBuilder is that
+we need to "finalize" the debug information. The reasons are part of the
+underlying API for DIBuilder, but make sure you do this near the end of
+main:
+
+.. code-block:: c++
+
+  DBuilder->finalize();
+
+before you dump out the module.
+
+Functions
+=========
+
+Now that we have our ``Compile Unit`` and our source locations, we can add
+function definitions to the debug info. So in ``PrototypeAST::codegen()`` we
+add a few lines of code to describe a context for our subprogram, in this
+case the "File", and the actual definition of the function itself.
+
+So the context:
+
+.. code-block:: c++
+
+  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(),
+                                      KSDbgInfo.TheCU.getDirectory());
+
+giving us an DIFile and asking the ``Compile Unit`` we created above for the
+directory and filename where we are currently. Then, for now, we use some
+source locations of 0 (since our AST doesn't currently have source location
+information) and construct our function definition:
+
+.. code-block:: c++
+
+  DIScope *FContext = Unit;
+  unsigned LineNo = 0;
+  unsigned ScopeLine = 0;
+  DISubprogram *SP = DBuilder->createFunction(
+      FContext, Name, StringRef(), Unit, LineNo,
+      CreateFunctionType(Args.size(), Unit), false /* internal linkage */,
+      true /* definition */, ScopeLine, DINode::FlagPrototyped, false);
+  F->setSubprogram(SP);
+
+and we now have an DISubprogram that contains a reference to all of our
+metadata for the function.
+
+Source Locations
+================
+
+The most important thing for debug information is accurate source location -
+this makes it possible to map your source code back. We have a problem though,
+Kaleidoscope really doesn't have any source location information in the lexer
+or parser so we'll need to add it.
+
+.. code-block:: c++
+
+   struct SourceLocation {
+     int Line;
+     int Col;
+   };
+   static SourceLocation CurLoc;
+   static SourceLocation LexLoc = {1, 0};
+
+   static int advance() {
+     int LastChar = getchar();
+
+     if (LastChar == '\n' || LastChar == '\r') {
+       LexLoc.Line++;
+       LexLoc.Col = 0;
+     } else
+       LexLoc.Col++;
+     return LastChar;
+   }
+
+In this set of code we've added some functionality on how to keep track of the
+line and column of the "source file". As we lex every token we set our current
+current "lexical location" to the assorted line and column for the beginning
+of the token. We do this by overriding all of the previous calls to
+``getchar()`` with our new ``advance()`` that keeps track of the information
+and then we have added to all of our AST classes a source location:
+
+.. code-block:: c++
+
+   class ExprAST {
+     SourceLocation Loc;
+
+     public:
+       ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
+       virtual ~ExprAST() {}
+       virtual Value* codegen() = 0;
+       int getLine() const { return Loc.Line; }
+       int getCol() const { return Loc.Col; }
+       virtual raw_ostream &dump(raw_ostream &out, int ind) {
+         return out << ':' << getLine() << ':' << getCol() << '\n';
+       }
+
+that we pass down through when we create a new expression:
+
+.. code-block:: c++
+
+   LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS),
+                                          std::move(RHS));
+
+giving us locations for each of our expressions and variables.
+
+From this we can make sure to tell ``DIBuilder`` when we're at a new source
+location so it can use that when we generate the rest of our code and make
+sure that each instruction has source location information. We do this
+by constructing another small function:
+
+.. code-block:: c++
+
+  void DebugInfo::emitLocation(ExprAST *AST) {
+    DIScope *Scope;
+    if (LexicalBlocks.empty())
+      Scope = TheCU;
+    else
+      Scope = LexicalBlocks.back();
+    Builder.SetCurrentDebugLocation(
+        DebugLoc::get(AST->getLine(), AST->getCol(), Scope));
+  }
+
+that both tells the main ``IRBuilder`` where we are, but also what scope
+we're in. Since we've just created a function above we can either be in
+the main file scope (like when we created our function), or now we can be
+in the function scope we just created. To represent this we create a stack
+of scopes:
+
+.. code-block:: c++
+
+   std::vector<DIScope *> LexicalBlocks;
+   std::map<const PrototypeAST *, DIScope *> FnScopeMap;
+
+and keep a map of each function to the scope that it represents (an
+DISubprogram is also an DIScope).
+
+Then we make sure to:
+
+.. code-block:: c++
+
+   KSDbgInfo.emitLocation(this);
+
+emit the location every time we start to generate code for a new AST, and
+also:
+
+.. code-block:: c++
+
+  KSDbgInfo.FnScopeMap[this] = SP;
+
+store the scope (function) when we create it and use it:
+
+  KSDbgInfo.LexicalBlocks.push_back(&KSDbgInfo.FnScopeMap[Proto]);
+
+when we start generating the code for each function.
+
+also, don't forget to pop the scope back off of your scope stack at the
+end of the code generation for the function:
+
+.. code-block:: c++
+
+  // Pop off the lexical block for the function since we added it
+  // unconditionally.
+  KSDbgInfo.LexicalBlocks.pop_back();
+
+Variables
+=========
+
+Now that we have functions, we need to be able to print out the variables
+we have in scope. Let's get our function arguments set up so we can get
+decent backtraces and see how our functions are being called. It isn't
+a lot of code, and we generally handle it when we're creating the
+argument allocas in ``PrototypeAST::CreateArgumentAllocas``.
+
+.. code-block:: c++
+
+  DIScope *Scope = KSDbgInfo.LexicalBlocks.back();
+  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(),
+                                      KSDbgInfo.TheCU.getDirectory());
+  DILocalVariable D = DBuilder->createParameterVariable(
+      Scope, Args[Idx], Idx + 1, Unit, Line, KSDbgInfo.getDoubleTy(), true);
+
+  DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),
+                          DebugLoc::get(Line, 0, Scope),
+                          Builder.GetInsertBlock());
+
+Here we're doing a few things. First, we're grabbing our current scope
+for the variable so we can say what range of code our variable is valid
+through. Second, we're creating the variable, giving it the scope,
+the name, source location, type, and since it's an argument, the argument
+index. Third, we create an ``lvm.dbg.declare`` call to indicate at the IR
+level that we've got a variable in an alloca (and it gives a starting
+location for the variable), and setting a source location for the
+beginning of the scope on the declare.
+
+One interesting thing to note at this point is that various debuggers have
+assumptions based on how code and debug information was generated for them
+in the past. In this case we need to do a little bit of a hack to avoid
+generating line information for the function prologue so that the debugger
+knows to skip over those instructions when setting a breakpoint. So in
+``FunctionAST::CodeGen`` we add a couple of lines:
+
+.. code-block:: c++
+
+  // Unset the location for the prologue emission (leading instructions with no
+  // location in a function are considered part of the prologue and the debugger
+  // will run past them when breaking on a function)
+  KSDbgInfo.emitLocation(nullptr);
+
+and then emit a new location when we actually start generating code for the
+body of the function:
+
+.. code-block:: c++
+
+  KSDbgInfo.emitLocation(Body);
+
+With this we have enough debug information to set breakpoints in functions,
+print out argument variables, and call functions. Not too bad for just a
+few simple lines of code!
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example, enhanced with
+debug information. To build this example, use:
+
+.. code-block:: bash
+
+    # Compile
+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
+    # Run
+    ./toy
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/Chapter9/toy.cpp
+   :language: c++
+
+`Next: Conclusion and other useful LLVM tidbits <LangImpl10.html>`_
+

Removed: llvm/trunk/docs/tutorial/LangImpl1.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl1.rst?rev=274440&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl1.rst (original)
+++ llvm/trunk/docs/tutorial/LangImpl1.rst (removed)
@@ -1,290 +0,0 @@
-=================================================
-Kaleidoscope: Tutorial Introduction and the Lexer
-=================================================
-
-.. contents::
-   :local:
-
-Tutorial Introduction
-=====================
-
-Welcome to the "Implementing a language with LLVM" tutorial. This
-tutorial runs through the implementation of a simple language, showing
-how fun and easy it can be. This tutorial will get you up and started as
-well as help to build a framework you can extend to other languages. The
-code in this tutorial can also be used as a playground to hack on other
-LLVM specific things.
-
-The goal of this tutorial is to progressively unveil our language,
-describing how it is built up over time. This will let us cover a fairly
-broad range of language design and LLVM-specific usage issues, showing
-and explaining the code for it all along the way, without overwhelming
-you with tons of details up front.
-
-It is useful to point out ahead of time that this tutorial is really
-about teaching compiler techniques and LLVM specifically, *not* about
-teaching modern and sane software engineering principles. In practice,
-this means that we'll take a number of shortcuts to simplify the
-exposition. For example, the code uses global variables
-all over the place, doesn't use nice design patterns like
-`visitors <http://en.wikipedia.org/wiki/Visitor_pattern>`_, etc... but
-it is very simple. If you dig in and use the code as a basis for future
-projects, fixing these deficiencies shouldn't be hard.
-
-I've tried to put this tutorial together in a way that makes chapters
-easy to skip over if you are already familiar with or are uninterested
-in the various pieces. The structure of the tutorial is:
-
--  `Chapter #1 <#language>`_: Introduction to the Kaleidoscope
-   language, and the definition of its Lexer - This shows where we are
-   going and the basic functionality that we want it to do. In order to
-   make this tutorial maximally understandable and hackable, we choose
-   to implement everything in C++ instead of using lexer and parser
-   generators. LLVM obviously works just fine with such tools, feel free
-   to use one if you prefer.
--  `Chapter #2 <LangImpl2.html>`_: Implementing a Parser and AST -
-   With the lexer in place, we can talk about parsing techniques and
-   basic AST construction. This tutorial describes recursive descent
-   parsing and operator precedence parsing. Nothing in Chapters 1 or 2
-   is LLVM-specific, the code doesn't even link in LLVM at this point.
-   :)
--  `Chapter #3 <LangImpl3.html>`_: Code generation to LLVM IR - With
-   the AST ready, we can show off how easy generation of LLVM IR really
-   is.
--  `Chapter #4 <LangImpl4.html>`_: Adding JIT and Optimizer Support
-   - Because a lot of people are interested in using LLVM as a JIT,
-   we'll dive right into it and show you the 3 lines it takes to add JIT
-   support. LLVM is also useful in many other ways, but this is one
-   simple and "sexy" way to show off its power. :)
--  `Chapter #5 <LangImpl5.html>`_: Extending the Language: Control
-   Flow - With the language up and running, we show how to extend it
-   with control flow operations (if/then/else and a 'for' loop). This
-   gives us a chance to talk about simple SSA construction and control
-   flow.
--  `Chapter #6 <LangImpl6.html>`_: Extending the Language:
-   User-defined Operators - This is a silly but fun chapter that talks
-   about extending the language to let the user program define their own
-   arbitrary unary and binary operators (with assignable precedence!).
-   This lets us build a significant piece of the "language" as library
-   routines.
--  `Chapter #7 <LangImpl7.html>`_: Extending the Language: Mutable
-   Variables - This chapter talks about adding user-defined local
-   variables along with an assignment operator. The interesting part
-   about this is how easy and trivial it is to construct SSA form in
-   LLVM: no, LLVM does *not* require your front-end to construct SSA
-   form!
--  `Chapter #8 <LangImpl8.html>`_: Extending the Language: Debug
-   Information - Having built a decent little programming language with
-   control flow, functions and mutable variables, we consider what it
-   takes to add debug information to standalone executables. This debug
-   information will allow you to set breakpoints in Kaleidoscope
-   functions, print out argument variables, and call functions - all
-   from within the debugger!
--  `Chapter #9 <LangImpl9.html>`_: Conclusion and other useful LLVM
-   tidbits - This chapter wraps up the series by talking about
-   potential ways to extend the language, but also includes a bunch of
-   pointers to info about "special topics" like adding garbage
-   collection support, exceptions, debugging, support for "spaghetti
-   stacks", and a bunch of other tips and tricks.
-
-By the end of the tutorial, we'll have written a bit less than 1000 lines
-of non-comment, non-blank, lines of code. With this small amount of
-code, we'll have built up a very reasonable compiler for a non-trivial
-language including a hand-written lexer, parser, AST, as well as code
-generation support with a JIT compiler. While other systems may have
-interesting "hello world" tutorials, I think the breadth of this
-tutorial is a great testament to the strengths of LLVM and why you
-should consider it if you're interested in language or compiler design.
-
-A note about this tutorial: we expect you to extend the language and
-play with it on your own. Take the code and go crazy hacking away at it,
-compilers don't need to be scary creatures - it can be a lot of fun to
-play with languages!
-
-The Basic Language
-==================
-
-This tutorial will be illustrated with a toy language that we'll call
-"`Kaleidoscope <http://en.wikipedia.org/wiki/Kaleidoscope>`_" (derived
-from "meaning beautiful, form, and view"). Kaleidoscope is a procedural
-language that allows you to define functions, use conditionals, math,
-etc. Over the course of the tutorial, we'll extend Kaleidoscope to
-support the if/then/else construct, a for loop, user defined operators,
-JIT compilation with a simple command line interface, etc.
-
-Because we want to keep things simple, the only datatype in Kaleidoscope
-is a 64-bit floating point type (aka 'double' in C parlance). As such,
-all values are implicitly double precision and the language doesn't
-require type declarations. This gives the language a very nice and
-simple syntax. For example, the following simple example computes
-`Fibonacci numbers: <http://en.wikipedia.org/wiki/Fibonacci_number>`_
-
-::
-
-    # Compute the x'th fibonacci number.
-    def fib(x)
-      if x < 3 then
-        1
-      else
-        fib(x-1)+fib(x-2)
-
-    # This expression will compute the 40th number.
-    fib(40)
-
-We also allow Kaleidoscope to call into standard library functions (the
-LLVM JIT makes this completely trivial). This means that you can use the
-'extern' keyword to define a function before you use it (this is also
-useful for mutually recursive functions). For example:
-
-::
-
-    extern sin(arg);
-    extern cos(arg);
-    extern atan2(arg1 arg2);
-
-    atan2(sin(.4), cos(42))
-
-A more interesting example is included in Chapter 6 where we write a
-little Kaleidoscope application that `displays a Mandelbrot
-Set <LangImpl6.html#kicking-the-tires>`_ at various levels of magnification.
-
-Lets dive into the implementation of this language!
-
-The Lexer
-=========
-
-When it comes to implementing a language, the first thing needed is the
-ability to process a text file and recognize what it says. The
-traditional way to do this is to use a
-"`lexer <http://en.wikipedia.org/wiki/Lexical_analysis>`_" (aka
-'scanner') to break the input up into "tokens". Each token returned by
-the lexer includes a token code and potentially some metadata (e.g. the
-numeric value of a number). First, we define the possibilities:
-
-.. code-block:: c++
-
-    // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
-    // of these for known things.
-    enum Token {
-      tok_eof = -1,
-
-      // commands
-      tok_def = -2,
-      tok_extern = -3,
-
-      // primary
-      tok_identifier = -4,
-      tok_number = -5,
-    };
-
-    static std::string IdentifierStr; // Filled in if tok_identifier
-    static double NumVal;             // Filled in if tok_number
-
-Each token returned by our lexer will either be one of the Token enum
-values or it will be an 'unknown' character like '+', which is returned
-as its ASCII value. If the current token is an identifier, the
-``IdentifierStr`` global variable holds the name of the identifier. If
-the current token is a numeric literal (like 1.0), ``NumVal`` holds its
-value. Note that we use global variables for simplicity, this is not the
-best choice for a real language implementation :).
-
-The actual implementation of the lexer is a single function named
-``gettok``. The ``gettok`` function is called to return the next token
-from standard input. Its definition starts as:
-
-.. code-block:: c++
-
-    /// gettok - Return the next token from standard input.
-    static int gettok() {
-      static int LastChar = ' ';
-
-      // Skip any whitespace.
-      while (isspace(LastChar))
-        LastChar = getchar();
-
-``gettok`` works by calling the C ``getchar()`` function to read
-characters one at a time from standard input. It eats them as it
-recognizes them and stores the last character read, but not processed,
-in LastChar. The first thing that it has to do is ignore whitespace
-between tokens. This is accomplished with the loop above.
-
-The next thing ``gettok`` needs to do is recognize identifiers and
-specific keywords like "def". Kaleidoscope does this with this simple
-loop:
-
-.. code-block:: c++
-
-      if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
-        IdentifierStr = LastChar;
-        while (isalnum((LastChar = getchar())))
-          IdentifierStr += LastChar;
-
-        if (IdentifierStr == "def")
-          return tok_def;
-        if (IdentifierStr == "extern")
-          return tok_extern;
-        return tok_identifier;
-      }
-
-Note that this code sets the '``IdentifierStr``' global whenever it
-lexes an identifier. Also, since language keywords are matched by the
-same loop, we handle them here inline. Numeric values are similar:
-
-.. code-block:: c++
-
-      if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
-        std::string NumStr;
-        do {
-          NumStr += LastChar;
-          LastChar = getchar();
-        } while (isdigit(LastChar) || LastChar == '.');
-
-        NumVal = strtod(NumStr.c_str(), 0);
-        return tok_number;
-      }
-
-This is all pretty straight-forward code for processing input. When
-reading a numeric value from input, we use the C ``strtod`` function to
-convert it to a numeric value that we store in ``NumVal``. Note that
-this isn't doing sufficient error checking: it will incorrectly read
-"1.23.45.67" and handle it as if you typed in "1.23". Feel free to
-extend it :). Next we handle comments:
-
-.. code-block:: c++
-
-      if (LastChar == '#') {
-        // Comment until end of line.
-        do
-          LastChar = getchar();
-        while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
-
-        if (LastChar != EOF)
-          return gettok();
-      }
-
-We handle comments by skipping to the end of the line and then return
-the next token. Finally, if the input doesn't match one of the above
-cases, it is either an operator character like '+' or the end of the
-file. These are handled with this code:
-
-.. code-block:: c++
-
-      // Check for end of file.  Don't eat the EOF.
-      if (LastChar == EOF)
-        return tok_eof;
-
-      // Otherwise, just return the character as its ascii value.
-      int ThisChar = LastChar;
-      LastChar = getchar();
-      return ThisChar;
-    }
-
-With this, we have the complete lexer for the basic Kaleidoscope
-language (the `full code listing <LangImpl2.html#full-code-listing>`_ for the Lexer
-is available in the `next chapter <LangImpl2.html>`_ of the tutorial).
-Next we'll `build a simple parser that uses this to build an Abstract
-Syntax Tree <LangImpl2.html>`_. When we have that, we'll include a
-driver so that you can use the lexer and parser together.
-
-`Next: Implementing a Parser and AST <LangImpl2.html>`_
-

Added: llvm/trunk/docs/tutorial/LangImpl10.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl10.rst?rev=274441&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl10.rst (added)
+++ llvm/trunk/docs/tutorial/LangImpl10.rst Sat Jul  2 12:01:59 2016
@@ -0,0 +1,259 @@
+======================================================
+Kaleidoscope: Conclusion and other useful LLVM tidbits
+======================================================
+
+.. contents::
+   :local:
+
+Tutorial Conclusion
+===================
+
+Welcome to the final chapter of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. In the course of this tutorial, we have
+grown our little Kaleidoscope language from being a useless toy, to
+being a semi-interesting (but probably still useless) toy. :)
+
+It is interesting to see how far we've come, and how little code it has
+taken. We built the entire lexer, parser, AST, code generator, an
+interactive run-loop (with a JIT!), and emitted debug information in
+standalone executables - all in under 1000 lines of (non-comment/non-blank)
+code.
+
+Our little language supports a couple of interesting features: it
+supports user defined binary and unary operators, it uses JIT
+compilation for immediate evaluation, and it supports a few control flow
+constructs with SSA construction.
+
+Part of the idea of this tutorial was to show you how easy and fun it
+can be to define, build, and play with languages. Building a compiler
+need not be a scary or mystical process! Now that you've seen some of
+the basics, I strongly encourage you to take the code and hack on it.
+For example, try adding:
+
+-  **global variables** - While global variables have questional value
+   in modern software engineering, they are often useful when putting
+   together quick little hacks like the Kaleidoscope compiler itself.
+   Fortunately, our current setup makes it very easy to add global
+   variables: just have value lookup check to see if an unresolved
+   variable is in the global variable symbol table before rejecting it.
+   To create a new global variable, make an instance of the LLVM
+   ``GlobalVariable`` class.
+-  **typed variables** - Kaleidoscope currently only supports variables
+   of type double. This gives the language a very nice elegance, because
+   only supporting one type means that you never have to specify types.
+   Different languages have different ways of handling this. The easiest
+   way is to require the user to specify types for every variable
+   definition, and record the type of the variable in the symbol table
+   along with its Value\*.
+-  **arrays, structs, vectors, etc** - Once you add types, you can start
+   extending the type system in all sorts of interesting ways. Simple
+   arrays are very easy and are quite useful for many different
+   applications. Adding them is mostly an exercise in learning how the
+   LLVM `getelementptr <../LangRef.html#getelementptr-instruction>`_ instruction
+   works: it is so nifty/unconventional, it `has its own
+   FAQ <../GetElementPtr.html>`_!
+-  **standard runtime** - Our current language allows the user to access
+   arbitrary external functions, and we use it for things like "printd"
+   and "putchard". As you extend the language to add higher-level
+   constructs, often these constructs make the most sense if they are
+   lowered to calls into a language-supplied runtime. For example, if
+   you add hash tables to the language, it would probably make sense to
+   add the routines to a runtime, instead of inlining them all the way.
+-  **memory management** - Currently we can only access the stack in
+   Kaleidoscope. It would also be useful to be able to allocate heap
+   memory, either with calls to the standard libc malloc/free interface
+   or with a garbage collector. If you would like to use garbage
+   collection, note that LLVM fully supports `Accurate Garbage
+   Collection <../GarbageCollection.html>`_ including algorithms that
+   move objects and need to scan/update the stack.
+-  **exception handling support** - LLVM supports generation of `zero
+   cost exceptions <../ExceptionHandling.html>`_ which interoperate with
+   code compiled in other languages. You could also generate code by
+   implicitly making every function return an error value and checking
+   it. You could also make explicit use of setjmp/longjmp. There are
+   many different ways to go here.
+-  **object orientation, generics, database access, complex numbers,
+   geometric programming, ...** - Really, there is no end of crazy
+   features that you can add to the language.
+-  **unusual domains** - We've been talking about applying LLVM to a
+   domain that many people are interested in: building a compiler for a
+   specific language. However, there are many other domains that can use
+   compiler technology that are not typically considered. For example,
+   LLVM has been used to implement OpenGL graphics acceleration,
+   translate C++ code to ActionScript, and many other cute and clever
+   things. Maybe you will be the first to JIT compile a regular
+   expression interpreter into native code with LLVM?
+
+Have fun - try doing something crazy and unusual. Building a language
+like everyone else always has, is much less fun than trying something a
+little crazy or off the wall and seeing how it turns out. If you get
+stuck or want to talk about it, feel free to email the `llvm-dev mailing
+list <http://lists.llvm.org/mailman/listinfo/llvm-dev>`_: it has lots
+of people who are interested in languages and are often willing to help
+out.
+
+Before we end this tutorial, I want to talk about some "tips and tricks"
+for generating LLVM IR. These are some of the more subtle things that
+may not be obvious, but are very useful if you want to take advantage of
+LLVM's capabilities.
+
+Properties of the LLVM IR
+=========================
+
+We have a couple of common questions about code in the LLVM IR form -
+let's just get these out of the way right now, shall we?
+
+Target Independence
+-------------------
+
+Kaleidoscope is an example of a "portable language": any program written
+in Kaleidoscope will work the same way on any target that it runs on.
+Many other languages have this property, e.g. lisp, java, haskell,
+javascript, python, etc (note that while these languages are portable,
+not all their libraries are).
+
+One nice aspect of LLVM is that it is often capable of preserving target
+independence in the IR: you can take the LLVM IR for a
+Kaleidoscope-compiled program and run it on any target that LLVM
+supports, even emitting C code and compiling that on targets that LLVM
+doesn't support natively. You can trivially tell that the Kaleidoscope
+compiler generates target-independent code because it never queries for
+any target-specific information when generating code.
+
+The fact that LLVM provides a compact, target-independent,
+representation for code gets a lot of people excited. Unfortunately,
+these people are usually thinking about C or a language from the C
+family when they are asking questions about language portability. I say
+"unfortunately", because there is really no way to make (fully general)
+C code portable, other than shipping the source code around (and of
+course, C source code is not actually portable in general either - ever
+port a really old application from 32- to 64-bits?).
+
+The problem with C (again, in its full generality) is that it is heavily
+laden with target specific assumptions. As one simple example, the
+preprocessor often destructively removes target-independence from the
+code when it processes the input text:
+
+.. code-block:: c
+
+    #ifdef __i386__
+      int X = 1;
+    #else
+      int X = 42;
+    #endif
+
+While it is possible to engineer more and more complex solutions to
+problems like this, it cannot be solved in full generality in a way that
+is better than shipping the actual source code.
+
+That said, there are interesting subsets of C that can be made portable.
+If you are willing to fix primitive types to a fixed size (say int =
+32-bits, and long = 64-bits), don't care about ABI compatibility with
+existing binaries, and are willing to give up some other minor features,
+you can have portable code. This can make sense for specialized domains
+such as an in-kernel language.
+
+Safety Guarantees
+-----------------
+
+Many of the languages above are also "safe" languages: it is impossible
+for a program written in Java to corrupt its address space and crash the
+process (assuming the JVM has no bugs). Safety is an interesting
+property that requires a combination of language design, runtime
+support, and often operating system support.
+
+It is certainly possible to implement a safe language in LLVM, but LLVM
+IR does not itself guarantee safety. The LLVM IR allows unsafe pointer
+casts, use after free bugs, buffer over-runs, and a variety of other
+problems. Safety needs to be implemented as a layer on top of LLVM and,
+conveniently, several groups have investigated this. Ask on the `llvm-dev
+mailing list <http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ if
+you are interested in more details.
+
+Language-Specific Optimizations
+-------------------------------
+
+One thing about LLVM that turns off many people is that it does not
+solve all the world's problems in one system (sorry 'world hunger',
+someone else will have to solve you some other day). One specific
+complaint is that people perceive LLVM as being incapable of performing
+high-level language-specific optimization: LLVM "loses too much
+information".
+
+Unfortunately, this is really not the place to give you a full and
+unified version of "Chris Lattner's theory of compiler design". Instead,
+I'll make a few observations:
+
+First, you're right that LLVM does lose information. For example, as of
+this writing, there is no way to distinguish in the LLVM IR whether an
+SSA-value came from a C "int" or a C "long" on an ILP32 machine (other
+than debug info). Both get compiled down to an 'i32' value and the
+information about what it came from is lost. The more general issue
+here, is that the LLVM type system uses "structural equivalence" instead
+of "name equivalence". Another place this surprises people is if you
+have two types in a high-level language that have the same structure
+(e.g. two different structs that have a single int field): these types
+will compile down into a single LLVM type and it will be impossible to
+tell what it came from.
+
+Second, while LLVM does lose information, LLVM is not a fixed target: we
+continue to enhance and improve it in many different ways. In addition
+to adding new features (LLVM did not always support exceptions or debug
+info), we also extend the IR to capture important information for
+optimization (e.g. whether an argument is sign or zero extended,
+information about pointers aliasing, etc). Many of the enhancements are
+user-driven: people want LLVM to include some specific feature, so they
+go ahead and extend it.
+
+Third, it is *possible and easy* to add language-specific optimizations,
+and you have a number of choices in how to do it. As one trivial
+example, it is easy to add language-specific optimization passes that
+"know" things about code compiled for a language. In the case of the C
+family, there is an optimization pass that "knows" about the standard C
+library functions. If you call "exit(0)" in main(), it knows that it is
+safe to optimize that into "return 0;" because C specifies what the
+'exit' function does.
+
+In addition to simple library knowledge, it is possible to embed a
+variety of other language-specific information into the LLVM IR. If you
+have a specific need and run into a wall, please bring the topic up on
+the llvm-dev list. At the very worst, you can always treat LLVM as if it
+were a "dumb code generator" and implement the high-level optimizations
+you desire in your front-end, on the language-specific AST.
+
+Tips and Tricks
+===============
+
+There is a variety of useful tips and tricks that you come to know after
+working on/with LLVM that aren't obvious at first glance. Instead of
+letting everyone rediscover them, this section talks about some of these
+issues.
+
+Implementing portable offsetof/sizeof
+-------------------------------------
+
+One interesting thing that comes up, if you are trying to keep the code
+generated by your compiler "target independent", is that you often need
+to know the size of some LLVM type or the offset of some field in an
+llvm structure. For example, you might need to pass the size of a type
+into a function that allocates memory.
+
+Unfortunately, this can vary widely across targets: for example the
+width of a pointer is trivially target-specific. However, there is a
+`clever way to use the getelementptr
+instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_
+that allows you to compute this in a portable way.
+
+Garbage Collected Stack Frames
+------------------------------
+
+Some languages want to explicitly manage their stack frames, often so
+that they are garbage collected or to allow easy implementation of
+closures. There are often better ways to implement these features than
+explicit stack frames, but `LLVM does support
+them, <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_
+if you want. It requires your front-end to convert the code into
+`Continuation Passing
+Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and
+the use of tail calls (which LLVM also supports).
+

Removed: llvm/trunk/docs/tutorial/LangImpl2.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl2.rst?rev=274440&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl2.rst (original)
+++ llvm/trunk/docs/tutorial/LangImpl2.rst (removed)
@@ -1,735 +0,0 @@
-===========================================
-Kaleidoscope: Implementing a Parser and AST
-===========================================
-
-.. contents::
-   :local:
-
-Chapter 2 Introduction
-======================
-
-Welcome to Chapter 2 of the "`Implementing a language with
-LLVM <index.html>`_" tutorial. This chapter shows you how to use the
-lexer, built in `Chapter 1 <LangImpl1.html>`_, to build a full
-`parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope
-language. Once we have a parser, we'll define and build an `Abstract
-Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
-
-The parser we will build uses a combination of `Recursive Descent
-Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
-`Operator-Precedence
-Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
-parse the Kaleidoscope language (the latter for binary expressions and
-the former for everything else). Before we get to parsing though, lets
-talk about the output of the parser: the Abstract Syntax Tree.
-
-The Abstract Syntax Tree (AST)
-==============================
-
-The AST for a program captures its behavior in such a way that it is
-easy for later stages of the compiler (e.g. code generation) to
-interpret. We basically want one object for each construct in the
-language, and the AST should closely model the language. In
-Kaleidoscope, we have expressions, a prototype, and a function object.
-We'll start with expressions first:
-
-.. code-block:: c++
-
-    /// ExprAST - Base class for all expression nodes.
-    class ExprAST {
-    public:
-      virtual ~ExprAST() {}
-    };
-
-    /// NumberExprAST - Expression class for numeric literals like "1.0".
-    class NumberExprAST : public ExprAST {
-      double Val;
-
-    public:
-      NumberExprAST(double Val) : Val(Val) {}
-    };
-
-The code above shows the definition of the base ExprAST class and one
-subclass which we use for numeric literals. The important thing to note
-about this code is that the NumberExprAST class captures the numeric
-value of the literal as an instance variable. This allows later phases
-of the compiler to know what the stored numeric value is.
-
-Right now we only create the AST, so there are no useful accessor
-methods on them. It would be very easy to add a virtual method to pretty
-print the code, for example. Here are the other expression AST node
-definitions that we'll use in the basic form of the Kaleidoscope
-language:
-
-.. code-block:: c++
-
-    /// VariableExprAST - Expression class for referencing a variable, like "a".
-    class VariableExprAST : public ExprAST {
-      std::string Name;
-
-    public:
-      VariableExprAST(const std::string &Name) : Name(Name) {}
-    };
-
-    /// BinaryExprAST - Expression class for a binary operator.
-    class BinaryExprAST : public ExprAST {
-      char Op;
-      std::unique_ptr<ExprAST> LHS, RHS;
-
-    public:
-      BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS,
-                    std::unique_ptr<ExprAST> RHS)
-        : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
-    };
-
-    /// CallExprAST - Expression class for function calls.
-    class CallExprAST : public ExprAST {
-      std::string Callee;
-      std::vector<std::unique_ptr<ExprAST>> Args;
-
-    public:
-      CallExprAST(const std::string &Callee,
-                  std::vector<std::unique_ptr<ExprAST>> Args)
-        : Callee(Callee), Args(std::move(Args)) {}
-    };
-
-This is all (intentionally) rather straight-forward: variables capture
-the variable name, binary operators capture their opcode (e.g. '+'), and
-calls capture a function name as well as a list of any argument
-expressions. One thing that is nice about our AST is that it captures
-the language features without talking about the syntax of the language.
-Note that there is no discussion about precedence of binary operators,
-lexical structure, etc.
-
-For our basic language, these are all of the expression nodes we'll
-define. Because it doesn't have conditional control flow, it isn't
-Turing-complete; we'll fix that in a later installment. The two things
-we need next are a way to talk about the interface to a function, and a
-way to talk about functions themselves:
-
-.. code-block:: c++
-
-    /// PrototypeAST - This class represents the "prototype" for a function,
-    /// which captures its name, and its argument names (thus implicitly the number
-    /// of arguments the function takes).
-    class PrototypeAST {
-      std::string Name;
-      std::vector<std::string> Args;
-
-    public:
-      PrototypeAST(const std::string &name, std::vector<std::string> Args)
-        : Name(name), Args(std::move(Args)) {}
-    };
-
-    /// FunctionAST - This class represents a function definition itself.
-    class FunctionAST {
-      std::unique_ptr<PrototypeAST> Proto;
-      std::unique_ptr<ExprAST> Body;
-
-    public:
-      FunctionAST(std::unique_ptr<PrototypeAST> Proto,
-                  std::unique_ptr<ExprAST> Body)
-        : Proto(std::move(Proto)), Body(std::move(Body)) {}
-    };
-
-In Kaleidoscope, functions are typed with just a count of their
-arguments. Since all values are double precision floating point, the
-type of each argument doesn't need to be stored anywhere. In a more
-aggressive and realistic language, the "ExprAST" class would probably
-have a type field.
-
-With this scaffolding, we can now talk about parsing expressions and
-function bodies in Kaleidoscope.
-
-Parser Basics
-=============
-
-Now that we have an AST to build, we need to define the parser code to
-build it. The idea here is that we want to parse something like "x+y"
-(which is returned as three tokens by the lexer) into an AST that could
-be generated with calls like this:
-
-.. code-block:: c++
-
-      auto LHS = llvm::make_unique<VariableExprAST>("x");
-      auto RHS = llvm::make_unique<VariableExprAST>("y");
-      auto Result = std::make_unique<BinaryExprAST>('+', std::move(LHS),
-                                                    std::move(RHS));
-
-In order to do this, we'll start by defining some basic helper routines:
-
-.. code-block:: c++
-
-    /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
-    /// token the parser is looking at.  getNextToken reads another token from the
-    /// lexer and updates CurTok with its results.
-    static int CurTok;
-    static int getNextToken() {
-      return CurTok = gettok();
-    }
-
-This implements a simple token buffer around the lexer. This allows us
-to look one token ahead at what the lexer is returning. Every function
-in our parser will assume that CurTok is the current token that needs to
-be parsed.
-
-.. code-block:: c++
-
-
-    /// LogError* - These are little helper functions for error handling.
-    std::unique_ptr<ExprAST> LogError(const char *Str) {
-      fprintf(stderr, "LogError: %s\n", Str);
-      return nullptr;
-    }
-    std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) {
-      LogError(Str);
-      return nullptr;
-    }
-
-The ``LogError`` routines are simple helper routines that our parser will
-use to handle errors. The error recovery in our parser will not be the
-best and is not particular user-friendly, but it will be enough for our
-tutorial. These routines make it easier to handle errors in routines
-that have various return types: they always return null.
-
-With these basic helper functions, we can implement the first piece of
-our grammar: numeric literals.
-
-Basic Expression Parsing
-========================
-
-We start with numeric literals, because they are the simplest to
-process. For each production in our grammar, we'll define a function
-which parses that production. For numeric literals, we have:
-
-.. code-block:: c++
-
-    /// numberexpr ::= number
-    static std::unique_ptr<ExprAST> ParseNumberExpr() {
-      auto Result = llvm::make_unique<NumberExprAST>(NumVal);
-      getNextToken(); // consume the number
-      return std::move(Result);
-    }
-
-This routine is very simple: it expects to be called when the current
-token is a ``tok_number`` token. It takes the current number value,
-creates a ``NumberExprAST`` node, advances the lexer to the next token,
-and finally returns.
-
-There are some interesting aspects to this. The most important one is
-that this routine eats all of the tokens that correspond to the
-production and returns the lexer buffer with the next token (which is
-not part of the grammar production) ready to go. This is a fairly
-standard way to go for recursive descent parsers. For a better example,
-the parenthesis operator is defined like this:
-
-.. code-block:: c++
-
-    /// parenexpr ::= '(' expression ')'
-    static std::unique_ptr<ExprAST> ParseParenExpr() {
-      getNextToken(); // eat (.
-      auto V = ParseExpression();
-      if (!V)
-        return nullptr;
-
-      if (CurTok != ')')
-        return LogError("expected ')'");
-      getNextToken(); // eat ).
-      return V;
-    }
-
-This function illustrates a number of interesting things about the
-parser:
-
-1) It shows how we use the LogError routines. When called, this function
-expects that the current token is a '(' token, but after parsing the
-subexpression, it is possible that there is no ')' waiting. For example,
-if the user types in "(4 x" instead of "(4)", the parser should emit an
-error. Because errors can occur, the parser needs a way to indicate that
-they happened: in our parser, we return null on an error.
-
-2) Another interesting aspect of this function is that it uses recursion
-by calling ``ParseExpression`` (we will soon see that
-``ParseExpression`` can call ``ParseParenExpr``). This is powerful
-because it allows us to handle recursive grammars, and keeps each
-production very simple. Note that parentheses do not cause construction
-of AST nodes themselves. While we could do it this way, the most
-important role of parentheses are to guide the parser and provide
-grouping. Once the parser constructs the AST, parentheses are not
-needed.
-
-The next simple production is for handling variable references and
-function calls:
-
-.. code-block:: c++
-
-    /// identifierexpr
-    ///   ::= identifier
-    ///   ::= identifier '(' expression* ')'
-    static std::unique_ptr<ExprAST> ParseIdentifierExpr() {
-      std::string IdName = IdentifierStr;
-
-      getNextToken();  // eat identifier.
-
-      if (CurTok != '(') // Simple variable ref.
-        return llvm::make_unique<VariableExprAST>(IdName);
-
-      // Call.
-      getNextToken();  // eat (
-      std::vector<std::unique_ptr<ExprAST>> Args;
-      if (CurTok != ')') {
-        while (1) {
-          if (auto Arg = ParseExpression())
-            Args.push_back(std::move(Arg));
-          else
-            return nullptr;
-
-          if (CurTok == ')')
-            break;
-
-          if (CurTok != ',')
-            return LogError("Expected ')' or ',' in argument list");
-          getNextToken();
-        }
-      }
-
-      // Eat the ')'.
-      getNextToken();
-
-      return llvm::make_unique<CallExprAST>(IdName, std::move(Args));
-    }
-
-This routine follows the same style as the other routines. (It expects
-to be called if the current token is a ``tok_identifier`` token). It
-also has recursion and error handling. One interesting aspect of this is
-that it uses *look-ahead* to determine if the current identifier is a
-stand alone variable reference or if it is a function call expression.
-It handles this by checking to see if the token after the identifier is
-a '(' token, constructing either a ``VariableExprAST`` or
-``CallExprAST`` node as appropriate.
-
-Now that we have all of our simple expression-parsing logic in place, we
-can define a helper function to wrap it together into one entry point.
-We call this class of expressions "primary" expressions, for reasons
-that will become more clear `later in the
-tutorial <LangImpl6.html#user-defined-unary-operators>`_. In order to parse an arbitrary
-primary expression, we need to determine what sort of expression it is:
-
-.. code-block:: c++
-
-    /// primary
-    ///   ::= identifierexpr
-    ///   ::= numberexpr
-    ///   ::= parenexpr
-    static std::unique_ptr<ExprAST> ParsePrimary() {
-      switch (CurTok) {
-      default:
-        return LogError("unknown token when expecting an expression");
-      case tok_identifier:
-        return ParseIdentifierExpr();
-      case tok_number:
-        return ParseNumberExpr();
-      case '(':
-        return ParseParenExpr();
-      }
-    }
-
-Now that you see the definition of this function, it is more obvious why
-we can assume the state of CurTok in the various functions. This uses
-look-ahead to determine which sort of expression is being inspected, and
-then parses it with a function call.
-
-Now that basic expressions are handled, we need to handle binary
-expressions. They are a bit more complex.
-
-Binary Expression Parsing
-=========================
-
-Binary expressions are significantly harder to parse because they are
-often ambiguous. For example, when given the string "x+y\*z", the parser
-can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
-definitions from mathematics, we expect the later parse, because "\*"
-(multiplication) has higher *precedence* than "+" (addition).
-
-There are many ways to handle this, but an elegant and efficient way is
-to use `Operator-Precedence
-Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
-This parsing technique uses the precedence of binary operators to guide
-recursion. To start with, we need a table of precedences:
-
-.. code-block:: c++
-
-    /// BinopPrecedence - This holds the precedence for each binary operator that is
-    /// defined.
-    static std::map<char, int> BinopPrecedence;
-
-    /// GetTokPrecedence - Get the precedence of the pending binary operator token.
-    static int GetTokPrecedence() {
-      if (!isascii(CurTok))
-        return -1;
-
-      // Make sure it's a declared binop.
-      int TokPrec = BinopPrecedence[CurTok];
-      if (TokPrec <= 0) return -1;
-      return TokPrec;
-    }
-
-    int main() {
-      // Install standard binary operators.
-      // 1 is lowest precedence.
-      BinopPrecedence['<'] = 10;
-      BinopPrecedence['+'] = 20;
-      BinopPrecedence['-'] = 20;
-      BinopPrecedence['*'] = 40;  // highest.
-      ...
-    }
-
-For the basic form of Kaleidoscope, we will only support 4 binary
-operators (this can obviously be extended by you, our brave and intrepid
-reader). The ``GetTokPrecedence`` function returns the precedence for
-the current token, or -1 if the token is not a binary operator. Having a
-map makes it easy to add new operators and makes it clear that the
-algorithm doesn't depend on the specific operators involved, but it
-would be easy enough to eliminate the map and do the comparisons in the
-``GetTokPrecedence`` function. (Or just use a fixed-size array).
-
-With the helper above defined, we can now start parsing binary
-expressions. The basic idea of operator precedence parsing is to break
-down an expression with potentially ambiguous binary operators into
-pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g".
-Operator precedence parsing considers this as a stream of primary
-expressions separated by binary operators. As such, it will first parse
-the leading primary expression "a", then it will see the pairs [+, b]
-[+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
-primary expressions, the binary expression parser doesn't need to worry
-about nested subexpressions like (c+d) at all.
-
-To start, an expression is a primary expression potentially followed by
-a sequence of [binop,primaryexpr] pairs:
-
-.. code-block:: c++
-
-    /// expression
-    ///   ::= primary binoprhs
-    ///
-    static std::unique_ptr<ExprAST> ParseExpression() {
-      auto LHS = ParsePrimary();
-      if (!LHS)
-        return nullptr;
-
-      return ParseBinOpRHS(0, std::move(LHS));
-    }
-
-``ParseBinOpRHS`` is the function that parses the sequence of pairs for
-us. It takes a precedence and a pointer to an expression for the part
-that has been parsed so far. Note that "x" is a perfectly valid
-expression: As such, "binoprhs" is allowed to be empty, in which case it
-returns the expression that is passed into it. In our example above, the
-code passes the expression for "a" into ``ParseBinOpRHS`` and the
-current token is "+".
-
-The precedence value passed into ``ParseBinOpRHS`` indicates the
-*minimal operator precedence* that the function is allowed to eat. For
-example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
-passed in a precedence of 40, it will not consume any tokens (because
-the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
-starts with:
-
-.. code-block:: c++
-
-    /// binoprhs
-    ///   ::= ('+' primary)*
-    static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
-                                                  std::unique_ptr<ExprAST> LHS) {
-      // If this is a binop, find its precedence.
-      while (1) {
-        int TokPrec = GetTokPrecedence();
-
-        // If this is a binop that binds at least as tightly as the current binop,
-        // consume it, otherwise we are done.
-        if (TokPrec < ExprPrec)
-          return LHS;
-
-This code gets the precedence of the current token and checks to see if
-if is too low. Because we defined invalid tokens to have a precedence of
--1, this check implicitly knows that the pair-stream ends when the token
-stream runs out of binary operators. If this check succeeds, we know
-that the token is a binary operator and that it will be included in this
-expression:
-
-.. code-block:: c++
-
-        // Okay, we know this is a binop.
-        int BinOp = CurTok;
-        getNextToken();  // eat binop
-
-        // Parse the primary expression after the binary operator.
-        auto RHS = ParsePrimary();
-        if (!RHS)
-          return nullptr;
-
-As such, this code eats (and remembers) the binary operator and then
-parses the primary expression that follows. This builds up the whole
-pair, the first of which is [+, b] for the running example.
-
-Now that we parsed the left-hand side of an expression and one pair of
-the RHS sequence, we have to decide which way the expression associates.
-In particular, we could have "(a+b) binop unparsed" or "a + (b binop
-unparsed)". To determine this, we look ahead at "binop" to determine its
-precedence and compare it to BinOp's precedence (which is '+' in this
-case):
-
-.. code-block:: c++
-
-        // If BinOp binds less tightly with RHS than the operator after RHS, let
-        // the pending operator take RHS as its LHS.
-        int NextPrec = GetTokPrecedence();
-        if (TokPrec < NextPrec) {
-
-If the precedence of the binop to the right of "RHS" is lower or equal
-to the precedence of our current operator, then we know that the
-parentheses associate as "(a+b) binop ...". In our example, the current
-operator is "+" and the next operator is "+", we know that they have the
-same precedence. In this case we'll create the AST node for "a+b", and
-then continue parsing:
-
-.. code-block:: c++
-
-          ... if body omitted ...
-        }
-
-        // Merge LHS/RHS.
-        LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
-                                               std::move(RHS));
-      }  // loop around to the top of the while loop.
-    }
-
-In our example above, this will turn "a+b+" into "(a+b)" and execute the
-next iteration of the loop, with "+" as the current token. The code
-above will eat, remember, and parse "(c+d)" as the primary expression,
-which makes the current pair equal to [+, (c+d)]. It will then evaluate
-the 'if' conditional above with "\*" as the binop to the right of the
-primary. In this case, the precedence of "\*" is higher than the
-precedence of "+" so the if condition will be entered.
-
-The critical question left here is "how can the if condition parse the
-right hand side in full"? In particular, to build the AST correctly for
-our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
-variable. The code to do this is surprisingly simple (code from the
-above two blocks duplicated for context):
-
-.. code-block:: c++
-
-        // If BinOp binds less tightly with RHS than the operator after RHS, let
-        // the pending operator take RHS as its LHS.
-        int NextPrec = GetTokPrecedence();
-        if (TokPrec < NextPrec) {
-          RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS));
-          if (!RHS)
-            return nullptr;
-        }
-        // Merge LHS/RHS.
-        LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
-                                               std::move(RHS));
-      }  // loop around to the top of the while loop.
-    }
-
-At this point, we know that the binary operator to the RHS of our
-primary has higher precedence than the binop we are currently parsing.
-As such, we know that any sequence of pairs whose operators are all
-higher precedence than "+" should be parsed together and returned as
-"RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function
-specifying "TokPrec+1" as the minimum precedence required for it to
-continue. In our example above, this will cause it to return the AST
-node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+'
-expression.
-
-Finally, on the next iteration of the while loop, the "+g" piece is
-parsed and added to the AST. With this little bit of code (14
-non-trivial lines), we correctly handle fully general binary expression
-parsing in a very elegant way. This was a whirlwind tour of this code,
-and it is somewhat subtle. I recommend running through it with a few
-tough examples to see how it works.
-
-This wraps up handling of expressions. At this point, we can point the
-parser at an arbitrary token stream and build an expression from it,
-stopping at the first token that is not part of the expression. Next up
-we need to handle function definitions, etc.
-
-Parsing the Rest
-================
-
-The next thing missing is handling of function prototypes. In
-Kaleidoscope, these are used both for 'extern' function declarations as
-well as function body definitions. The code to do this is
-straight-forward and not very interesting (once you've survived
-expressions):
-
-.. code-block:: c++
-
-    /// prototype
-    ///   ::= id '(' id* ')'
-    static std::unique_ptr<PrototypeAST> ParsePrototype() {
-      if (CurTok != tok_identifier)
-        return LogErrorP("Expected function name in prototype");
-
-      std::string FnName = IdentifierStr;
-      getNextToken();
-
-      if (CurTok != '(')
-        return LogErrorP("Expected '(' in prototype");
-
-      // Read the list of argument names.
-      std::vector<std::string> ArgNames;
-      while (getNextToken() == tok_identifier)
-        ArgNames.push_back(IdentifierStr);
-      if (CurTok != ')')
-        return LogErrorP("Expected ')' in prototype");
-
-      // success.
-      getNextToken();  // eat ')'.
-
-      return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames));
-    }
-
-Given this, a function definition is very simple, just a prototype plus
-an expression to implement the body:
-
-.. code-block:: c++
-
-    /// definition ::= 'def' prototype expression
-    static std::unique_ptr<FunctionAST> ParseDefinition() {
-      getNextToken();  // eat def.
-      auto Proto = ParsePrototype();
-      if (!Proto) return nullptr;
-
-      if (auto E = ParseExpression())
-        return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
-      return nullptr;
-    }
-
-In addition, we support 'extern' to declare functions like 'sin' and
-'cos' as well as to support forward declaration of user functions. These
-'extern's are just prototypes with no body:
-
-.. code-block:: c++
-
-    /// external ::= 'extern' prototype
-    static std::unique_ptr<PrototypeAST> ParseExtern() {
-      getNextToken();  // eat extern.
-      return ParsePrototype();
-    }
-
-Finally, we'll also let the user type in arbitrary top-level expressions
-and evaluate them on the fly. We will handle this by defining anonymous
-nullary (zero argument) functions for them:
-
-.. code-block:: c++
-
-    /// toplevelexpr ::= expression
-    static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
-      if (auto E = ParseExpression()) {
-        // Make an anonymous proto.
-        auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
-        return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
-      }
-      return nullptr;
-    }
-
-Now that we have all the pieces, let's build a little driver that will
-let us actually *execute* this code we've built!
-
-The Driver
-==========
-
-The driver for this simply invokes all of the parsing pieces with a
-top-level dispatch loop. There isn't much interesting here, so I'll just
-include the top-level loop. See `below <#full-code-listing>`_ for full code in the
-"Top-Level Parsing" section.
-
-.. code-block:: c++
-
-    /// top ::= definition | external | expression | ';'
-    static void MainLoop() {
-      while (1) {
-        fprintf(stderr, "ready> ");
-        switch (CurTok) {
-        case tok_eof:
-          return;
-        case ';': // ignore top-level semicolons.
-          getNextToken();
-          break;
-        case tok_def:
-          HandleDefinition();
-          break;
-        case tok_extern:
-          HandleExtern();
-          break;
-        default:
-          HandleTopLevelExpression();
-          break;
-        }
-      }
-    }
-
-The most interesting part of this is that we ignore top-level
-semicolons. Why is this, you ask? The basic reason is that if you type
-"4 + 5" at the command line, the parser doesn't know whether that is the
-end of what you will type or not. For example, on the next line you
-could type "def foo..." in which case 4+5 is the end of a top-level
-expression. Alternatively you could type "\* 6", which would continue
-the expression. Having top-level semicolons allows you to type "4+5;",
-and the parser will know you are done.
-
-Conclusions
-===========
-
-With just under 400 lines of commented code (240 lines of non-comment,
-non-blank code), we fully defined our minimal language, including a
-lexer, parser, and AST builder. With this done, the executable will
-validate Kaleidoscope code and tell us if it is grammatically invalid.
-For example, here is a sample interaction:
-
-.. code-block:: bash
-
-    $ ./a.out
-    ready> def foo(x y) x+foo(y, 4.0);
-    Parsed a function definition.
-    ready> def foo(x y) x+y y;
-    Parsed a function definition.
-    Parsed a top-level expr
-    ready> def foo(x y) x+y );
-    Parsed a function definition.
-    Error: unknown token when expecting an expression
-    ready> extern sin(a);
-    ready> Parsed an extern
-    ready> ^D
-    $
-
-There is a lot of room for extension here. You can define new AST nodes,
-extend the language in many ways, etc. In the `next
-installment <LangImpl3.html>`_, we will describe how to generate LLVM
-Intermediate Representation (IR) from the AST.
-
-Full Code Listing
-=================
-
-Here is the complete code listing for this and the previous chapter.
-Note that it is fully self-contained: you don't need LLVM or any
-external libraries at all for this. (Besides the C and C++ standard
-libraries, of course.) To build this, just compile with:
-
-.. code-block:: bash
-
-    # Compile
-    clang++ -g -O3 toy.cpp
-    # Run
-    ./a.out
-
-Here is the code:
-
-.. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp
-   :language: c++
-
-`Next: Implementing Code Generation to LLVM IR <LangImpl3.html>`_
-

Removed: llvm/trunk/docs/tutorial/LangImpl3.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl3.rst?rev=274440&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl3.rst (original)
+++ llvm/trunk/docs/tutorial/LangImpl3.rst (removed)
@@ -1,567 +0,0 @@
-========================================
-Kaleidoscope: Code generation to LLVM IR
-========================================
-
-.. contents::
-   :local:
-
-Chapter 3 Introduction
-======================
-
-Welcome to Chapter 3 of the "`Implementing a language with
-LLVM <index.html>`_" tutorial. This chapter shows you how to transform
-the `Abstract Syntax Tree <LangImpl2.html>`_, built in Chapter 2, into
-LLVM IR. This will teach you a little bit about how LLVM does things, as
-well as demonstrate how easy it is to use. It's much more work to build
-a lexer and parser than it is to generate LLVM IR code. :)
-
-**Please note**: the code in this chapter and later require LLVM 3.7 or
-later. LLVM 3.6 and before will not work with it. Also note that you
-need to use a version of this tutorial that matches your LLVM release:
-If you are using an official LLVM release, use the version of the
-documentation included with your release or on the `llvm.org releases
-page <http://llvm.org/releases/>`_.
-
-Code Generation Setup
-=====================
-
-In order to generate LLVM IR, we want some simple setup to get started.
-First we define virtual code generation (codegen) methods in each AST
-class:
-
-.. code-block:: c++
-
-    /// ExprAST - Base class for all expression nodes.
-    class ExprAST {
-    public:
-      virtual ~ExprAST() {}
-      virtual Value *codegen() = 0;
-    };
-
-    /// NumberExprAST - Expression class for numeric literals like "1.0".
-    class NumberExprAST : public ExprAST {
-      double Val;
-
-    public:
-      NumberExprAST(double Val) : Val(Val) {}
-      virtual Value *codegen();
-    };
-    ...
-
-The codegen() method says to emit IR for that AST node along with all
-the things it depends on, and they all return an LLVM Value object.
-"Value" is the class used to represent a "`Static Single Assignment
-(SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
-register" or "SSA value" in LLVM. The most distinct aspect of SSA values
-is that their value is computed as the related instruction executes, and
-it does not get a new value until (and if) the instruction re-executes.
-In other words, there is no way to "change" an SSA value. For more
-information, please read up on `Static Single
-Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
-- the concepts are really quite natural once you grok them.
-
-Note that instead of adding virtual methods to the ExprAST class
-hierarchy, it could also make sense to use a `visitor
-pattern <http://en.wikipedia.org/wiki/Visitor_pattern>`_ or some other
-way to model this. Again, this tutorial won't dwell on good software
-engineering practices: for our purposes, adding a virtual method is
-simplest.
-
-The second thing we want is an "LogError" method like we used for the
-parser, which will be used to report errors found during code generation
-(for example, use of an undeclared parameter):
-
-.. code-block:: c++
-
-    static LLVMContext TheContext;
-    static IRBuilder<> Builder(TheContext);
-    static std::unique_ptr<Module> TheModule;
-    static std::map<std::string, Value *> NamedValues;
-
-    Value *LogErrorV(const char *Str) {
-      LogError(Str);
-      return nullptr;
-    }
-
-The static variables will be used during code generation. ``TheContext``
-is an opaque object that owns a lot of core LLVM data structures, such as
-the type and constant value tables. We don't need to understand it in
-detail, we just need a single instance to pass into APIs that require it.
-
-The ``Builder`` object is a helper object that makes it easy to generate
-LLVM instructions. Instances of the
-`IRBuilder <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_
-class template keep track of the current place to insert instructions
-and has methods to create new instructions.
-
-``TheModule`` is an LLVM construct that contains functions and global
-variables. In many ways, it is the top-level structure that the LLVM IR
-uses to contain code. It will own the memory for all of the IR that we
-generate, which is why the codegen() method returns a raw Value\*,
-rather than a unique_ptr<Value>.
-
-The ``NamedValues`` map keeps track of which values are defined in the
-current scope and what their LLVM representation is. (In other words, it
-is a symbol table for the code). In this form of Kaleidoscope, the only
-things that can be referenced are function parameters. As such, function
-parameters will be in this map when generating code for their function
-body.
-
-With these basics in place, we can start talking about how to generate
-code for each expression. Note that this assumes that the ``Builder``
-has been set up to generate code *into* something. For now, we'll assume
-that this has already been done, and we'll just use it to emit code.
-
-Expression Code Generation
-==========================
-
-Generating LLVM code for expression nodes is very straightforward: less
-than 45 lines of commented code for all four of our expression nodes.
-First we'll do numeric literals:
-
-.. code-block:: c++
-
-    Value *NumberExprAST::codegen() {
-      return ConstantFP::get(LLVMContext, APFloat(Val));
-    }
-
-In the LLVM IR, numeric constants are represented with the
-``ConstantFP`` class, which holds the numeric value in an ``APFloat``
-internally (``APFloat`` has the capability of holding floating point
-constants of Arbitrary Precision). This code basically just creates
-and returns a ``ConstantFP``. Note that in the LLVM IR that constants
-are all uniqued together and shared. For this reason, the API uses the
-"foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)".
-
-.. code-block:: c++
-
-    Value *VariableExprAST::codegen() {
-      // Look this variable up in the function.
-      Value *V = NamedValues[Name];
-      if (!V)
-        LogErrorV("Unknown variable name");
-      return V;
-    }
-
-References to variables are also quite simple using LLVM. In the simple
-version of Kaleidoscope, we assume that the variable has already been
-emitted somewhere and its value is available. In practice, the only
-values that can be in the ``NamedValues`` map are function arguments.
-This code simply checks to see that the specified name is in the map (if
-not, an unknown variable is being referenced) and returns the value for
-it. In future chapters, we'll add support for `loop induction
-variables <LangImpl5.html#for-loop-expression>`_ in the symbol table, and for `local
-variables <LangImpl7.html#user-defined-local-variables>`_.
-
-.. code-block:: c++
-
-    Value *BinaryExprAST::codegen() {
-      Value *L = LHS->codegen();
-      Value *R = RHS->codegen();
-      if (!L || !R)
-        return nullptr;
-
-      switch (Op) {
-      case '+':
-        return Builder.CreateFAdd(L, R, "addtmp");
-      case '-':
-        return Builder.CreateFSub(L, R, "subtmp");
-      case '*':
-        return Builder.CreateFMul(L, R, "multmp");
-      case '<':
-        L = Builder.CreateFCmpULT(L, R, "cmptmp");
-        // Convert bool 0/1 to double 0.0 or 1.0
-        return Builder.CreateUIToFP(L, Type::getDoubleTy(LLVMContext),
-                                    "booltmp");
-      default:
-        return LogErrorV("invalid binary operator");
-      }
-    }
-
-Binary operators start to get more interesting. The basic idea here is
-that we recursively emit code for the left-hand side of the expression,
-then the right-hand side, then we compute the result of the binary
-expression. In this code, we do a simple switch on the opcode to create
-the right LLVM instruction.
-
-In the example above, the LLVM builder class is starting to show its
-value. IRBuilder knows where to insert the newly created instruction,
-all you have to do is specify what instruction to create (e.g. with
-``CreateFAdd``), which operands to use (``L`` and ``R`` here) and
-optionally provide a name for the generated instruction.
-
-One nice thing about LLVM is that the name is just a hint. For instance,
-if the code above emits multiple "addtmp" variables, LLVM will
-automatically provide each one with an increasing, unique numeric
-suffix. Local value names for instructions are purely optional, but it
-makes it much easier to read the IR dumps.
-
-`LLVM instructions <../LangRef.html#instruction-reference>`_ are constrained by strict
-rules: for example, the Left and Right operators of an `add
-instruction <../LangRef.html#add-instruction>`_ must have the same type, and the
-result type of the add must match the operand types. Because all values
-in Kaleidoscope are doubles, this makes for very simple code for add,
-sub and mul.
-
-On the other hand, LLVM specifies that the `fcmp
-instruction <../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a
-one bit integer). The problem with this is that Kaleidoscope wants the
-value to be a 0.0 or 1.0 value. In order to get these semantics, we
-combine the fcmp instruction with a `uitofp
-instruction <../LangRef.html#uitofp-to-instruction>`_. This instruction converts its
-input integer into a floating point value by treating the input as an
-unsigned value. In contrast, if we used the `sitofp
-instruction <../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator
-would return 0.0 and -1.0, depending on the input value.
-
-.. code-block:: c++
-
-    Value *CallExprAST::codegen() {
-      // Look up the name in the global module table.
-      Function *CalleeF = TheModule->getFunction(Callee);
-      if (!CalleeF)
-        return LogErrorV("Unknown function referenced");
-
-      // If argument mismatch error.
-      if (CalleeF->arg_size() != Args.size())
-        return LogErrorV("Incorrect # arguments passed");
-
-      std::vector<Value *> ArgsV;
-      for (unsigned i = 0, e = Args.size(); i != e; ++i) {
-        ArgsV.push_back(Args[i]->codegen());
-        if (!ArgsV.back())
-          return nullptr;
-      }
-
-      return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
-    }
-
-Code generation for function calls is quite straightforward with LLVM. The code
-above initially does a function name lookup in the LLVM Module's symbol table.
-Recall that the LLVM Module is the container that holds the functions we are
-JIT'ing. By giving each function the same name as what the user specifies, we
-can use the LLVM symbol table to resolve function names for us.
-
-Once we have the function to call, we recursively codegen each argument
-that is to be passed in, and create an LLVM `call
-instruction <../LangRef.html#call-instruction>`_. Note that LLVM uses the native C
-calling conventions by default, allowing these calls to also call into
-standard library functions like "sin" and "cos", with no additional
-effort.
-
-This wraps up our handling of the four basic expressions that we have so
-far in Kaleidoscope. Feel free to go in and add some more. For example,
-by browsing the `LLVM language reference <../LangRef.html>`_ you'll find
-several other interesting instructions that are really easy to plug into
-our basic framework.
-
-Function Code Generation
-========================
-
-Code generation for prototypes and functions must handle a number of
-details, which make their code less beautiful than expression code
-generation, but allows us to illustrate some important points. First,
-lets talk about code generation for prototypes: they are used both for
-function bodies and external function declarations. The code starts
-with:
-
-.. code-block:: c++
-
-    Function *PrototypeAST::codegen() {
-      // Make the function type:  double(double,double) etc.
-      std::vector<Type*> Doubles(Args.size(),
-                                 Type::getDoubleTy(LLVMContext));
-      FunctionType *FT =
-        FunctionType::get(Type::getDoubleTy(LLVMContext), Doubles, false);
-
-      Function *F =
-        Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
-
-This code packs a lot of power into a few lines. Note first that this
-function returns a "Function\*" instead of a "Value\*". Because a
-"prototype" really talks about the external interface for a function
-(not the value computed by an expression), it makes sense for it to
-return the LLVM Function it corresponds to when codegen'd.
-
-The call to ``FunctionType::get`` creates the ``FunctionType`` that
-should be used for a given Prototype. Since all function arguments in
-Kaleidoscope are of type double, the first line creates a vector of "N"
-LLVM double types. It then uses the ``Functiontype::get`` method to
-create a function type that takes "N" doubles as arguments, returns one
-double as a result, and that is not vararg (the false parameter
-indicates this). Note that Types in LLVM are uniqued just like Constants
-are, so you don't "new" a type, you "get" it.
-
-The final line above actually creates the IR Function corresponding to
-the Prototype. This indicates the type, linkage and name to use, as
-well as which module to insert into. "`external
-linkage <../LangRef.html#linkage>`_" means that the function may be
-defined outside the current module and/or that it is callable by
-functions outside the module. The Name passed in is the name the user
-specified: since "``TheModule``" is specified, this name is registered
-in "``TheModule``"s symbol table.
-
-.. code-block:: c++
-
-  // Set names for all arguments.
-  unsigned Idx = 0;
-  for (auto &Arg : F->args())
-    Arg.setName(Args[Idx++]);
-
-  return F;
-
-Finally, we set the name of each of the function's arguments according to the
-names given in the Prototype. This step isn't strictly necessary, but keeping
-the names consistent makes the IR more readable, and allows subsequent code to
-refer directly to the arguments for their names, rather than having to look up
-them up in the Prototype AST.
-
-At this point we have a function prototype with no body. This is how LLVM IR
-represents function declarations. For extern statements in Kaleidoscope, this
-is as far as we need to go. For function definitions however, we need to
-codegen and attach a function body.
-
-.. code-block:: c++
-
-  Function *FunctionAST::codegen() {
-      // First, check for an existing function from a previous 'extern' declaration.
-    Function *TheFunction = TheModule->getFunction(Proto->getName());
-
-    if (!TheFunction)
-      TheFunction = Proto->codegen();
-
-    if (!TheFunction)
-      return nullptr;
-
-    if (!TheFunction->empty())
-      return (Function*)LogErrorV("Function cannot be redefined.");
-
-
-For function definitions, we start by searching TheModule's symbol table for an
-existing version of this function, in case one has already been created using an
-'extern' statement. If Module::getFunction returns null then no previous version
-exists, so we'll codegen one from the Prototype. In either case, we want to
-assert that the function is empty (i.e. has no body yet) before we start.
-
-.. code-block:: c++
-
-  // Create a new basic block to start insertion into.
-  BasicBlock *BB = BasicBlock::Create(LLVMContext, "entry", TheFunction);
-  Builder.SetInsertPoint(BB);
-
-  // Record the function arguments in the NamedValues map.
-  NamedValues.clear();
-  for (auto &Arg : TheFunction->args())
-    NamedValues[Arg.getName()] = &Arg;
-
-Now we get to the point where the ``Builder`` is set up. The first line
-creates a new `basic block <http://en.wikipedia.org/wiki/Basic_block>`_
-(named "entry"), which is inserted into ``TheFunction``. The second line
-then tells the builder that new instructions should be inserted into the
-end of the new basic block. Basic blocks in LLVM are an important part
-of functions that define the `Control Flow
-Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we
-don't have any control flow, our functions will only contain one block
-at this point. We'll fix this in `Chapter 5 <LangImpl5.html>`_ :).
-
-Next we add the function arguments to the NamedValues map (after first clearing
-it out) so that they're accessible to ``VariableExprAST`` nodes.
-
-.. code-block:: c++
-
-      if (Value *RetVal = Body->codegen()) {
-        // Finish off the function.
-        Builder.CreateRet(RetVal);
-
-        // Validate the generated code, checking for consistency.
-        verifyFunction(*TheFunction);
-
-        return TheFunction;
-      }
-
-Once the insertion point has been set up and the NamedValues map populated,
-we call the ``codegen()`` method for the root expression of the function. If no
-error happens, this emits code to compute the expression into the entry block
-and returns the value that was computed. Assuming no error, we then create an
-LLVM `ret instruction <../LangRef.html#ret-instruction>`_, which completes the function.
-Once the function is built, we call ``verifyFunction``, which is
-provided by LLVM. This function does a variety of consistency checks on
-the generated code, to determine if our compiler is doing everything
-right. Using this is important: it can catch a lot of bugs. Once the
-function is finished and validated, we return it.
-
-.. code-block:: c++
-
-      // Error reading body, remove function.
-      TheFunction->eraseFromParent();
-      return nullptr;
-    }
-
-The only piece left here is handling of the error case. For simplicity,
-we handle this by merely deleting the function we produced with the
-``eraseFromParent`` method. This allows the user to redefine a function
-that they incorrectly typed in before: if we didn't delete it, it would
-live in the symbol table, with a body, preventing future redefinition.
-
-This code does have a bug, though: If the ``FunctionAST::codegen()`` method
-finds an existing IR Function, it does not validate its signature against the
-definition's own prototype. This means that an earlier 'extern' declaration will
-take precedence over the function definition's signature, which can cause
-codegen to fail, for instance if the function arguments are named differently.
-There are a number of ways to fix this bug, see what you can come up with! Here
-is a testcase:
-
-::
-
-    extern foo(a);     # ok, defines foo.
-    def foo(b) b;      # Error: Unknown variable name. (decl using 'a' takes precedence).
-
-Driver Changes and Closing Thoughts
-===================================
-
-For now, code generation to LLVM doesn't really get us much, except that
-we can look at the pretty IR calls. The sample code inserts calls to
-codegen into the "``HandleDefinition``", "``HandleExtern``" etc
-functions, and then dumps out the LLVM IR. This gives a nice way to look
-at the LLVM IR for simple functions. For example:
-
-::
-
-    ready> 4+5;
-    Read top-level expression:
-    define double @0() {
-    entry:
-      ret double 9.000000e+00
-    }
-
-Note how the parser turns the top-level expression into anonymous
-functions for us. This will be handy when we add `JIT
-support <LangImpl4.html#adding-a-jit-compiler>`_ in the next chapter. Also note that the
-code is very literally transcribed, no optimizations are being performed
-except simple constant folding done by IRBuilder. We will `add
-optimizations <LangImpl4.html#trivial-constant-folding>`_ explicitly in the next
-chapter.
-
-::
-
-    ready> def foo(a b) a*a + 2*a*b + b*b;
-    Read function definition:
-    define double @foo(double %a, double %b) {
-    entry:
-      %multmp = fmul double %a, %a
-      %multmp1 = fmul double 2.000000e+00, %a
-      %multmp2 = fmul double %multmp1, %b
-      %addtmp = fadd double %multmp, %multmp2
-      %multmp3 = fmul double %b, %b
-      %addtmp4 = fadd double %addtmp, %multmp3
-      ret double %addtmp4
-    }
-
-This shows some simple arithmetic. Notice the striking similarity to the
-LLVM builder calls that we use to create the instructions.
-
-::
-
-    ready> def bar(a) foo(a, 4.0) + bar(31337);
-    Read function definition:
-    define double @bar(double %a) {
-    entry:
-      %calltmp = call double @foo(double %a, double 4.000000e+00)
-      %calltmp1 = call double @bar(double 3.133700e+04)
-      %addtmp = fadd double %calltmp, %calltmp1
-      ret double %addtmp
-    }
-
-This shows some function calls. Note that this function will take a long
-time to execute if you call it. In the future we'll add conditional
-control flow to actually make recursion useful :).
-
-::
-
-    ready> extern cos(x);
-    Read extern:
-    declare double @cos(double)
-
-    ready> cos(1.234);
-    Read top-level expression:
-    define double @1() {
-    entry:
-      %calltmp = call double @cos(double 1.234000e+00)
-      ret double %calltmp
-    }
-
-This shows an extern for the libm "cos" function, and a call to it.
-
-.. TODO:: Abandon Pygments' horrible `llvm` lexer. It just totally gives up
-   on highlighting this due to the first line.
-
-::
-
-    ready> ^D
-    ; ModuleID = 'my cool jit'
-
-    define double @0() {
-    entry:
-      %addtmp = fadd double 4.000000e+00, 5.000000e+00
-      ret double %addtmp
-    }
-
-    define double @foo(double %a, double %b) {
-    entry:
-      %multmp = fmul double %a, %a
-      %multmp1 = fmul double 2.000000e+00, %a
-      %multmp2 = fmul double %multmp1, %b
-      %addtmp = fadd double %multmp, %multmp2
-      %multmp3 = fmul double %b, %b
-      %addtmp4 = fadd double %addtmp, %multmp3
-      ret double %addtmp4
-    }
-
-    define double @bar(double %a) {
-    entry:
-      %calltmp = call double @foo(double %a, double 4.000000e+00)
-      %calltmp1 = call double @bar(double 3.133700e+04)
-      %addtmp = fadd double %calltmp, %calltmp1
-      ret double %addtmp
-    }
-
-    declare double @cos(double)
-
-    define double @1() {
-    entry:
-      %calltmp = call double @cos(double 1.234000e+00)
-      ret double %calltmp
-    }
-
-When you quit the current demo, it dumps out the IR for the entire
-module generated. Here you can see the big picture with all the
-functions referencing each other.
-
-This wraps up the third chapter of the Kaleidoscope tutorial. Up next,
-we'll describe how to `add JIT codegen and optimizer
-support <LangImpl4.html>`_ to this so we can actually start running
-code!
-
-Full Code Listing
-=================
-
-Here is the complete code listing for our running example, enhanced with
-the LLVM code generator. Because this uses the LLVM libraries, we need
-to link them in. To do this, we use the
-`llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform
-our makefile/command line about which options to use:
-
-.. code-block:: bash
-
-    # Compile
-    clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core` -o toy
-    # Run
-    ./toy
-
-Here is the code:
-
-.. literalinclude:: ../../examples/Kaleidoscope/Chapter3/toy.cpp
-   :language: c++
-
-`Next: Adding JIT and Optimizer Support <LangImpl4.html>`_
-

Removed: llvm/trunk/docs/tutorial/LangImpl4.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl4.rst?rev=274440&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl4.rst (original)
+++ llvm/trunk/docs/tutorial/LangImpl4.rst (removed)
@@ -1,610 +0,0 @@
-==============================================
-Kaleidoscope: Adding JIT and Optimizer Support
-==============================================
-
-.. contents::
-   :local:
-
-Chapter 4 Introduction
-======================
-
-Welcome to Chapter 4 of the "`Implementing a language with
-LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
-of a simple language and added support for generating LLVM IR. This
-chapter describes two new techniques: adding optimizer support to your
-language, and adding JIT compiler support. These additions will
-demonstrate how to get nice, efficient code for the Kaleidoscope
-language.
-
-Trivial Constant Folding
-========================
-
-Our demonstration for Chapter 3 is elegant and easy to extend.
-Unfortunately, it does not produce wonderful code. The IRBuilder,
-however, does give us obvious optimizations when compiling simple code:
-
-::
-
-    ready> def test(x) 1+2+x;
-    Read function definition:
-    define double @test(double %x) {
-    entry:
-            %addtmp = fadd double 3.000000e+00, %x
-            ret double %addtmp
-    }
-
-This code is not a literal transcription of the AST built by parsing the
-input. That would be:
-
-::
-
-    ready> def test(x) 1+2+x;
-    Read function definition:
-    define double @test(double %x) {
-    entry:
-            %addtmp = fadd double 2.000000e+00, 1.000000e+00
-            %addtmp1 = fadd double %addtmp, %x
-            ret double %addtmp1
-    }
-
-Constant folding, as seen above, in particular, is a very common and
-very important optimization: so much so that many language implementors
-implement constant folding support in their AST representation.
-
-With LLVM, you don't need this support in the AST. Since all calls to
-build LLVM IR go through the LLVM IR builder, the builder itself checked
-to see if there was a constant folding opportunity when you call it. If
-so, it just does the constant fold and return the constant instead of
-creating an instruction.
-
-Well, that was easy :). In practice, we recommend always using
-``IRBuilder`` when generating code like this. It has no "syntactic
-overhead" for its use (you don't have to uglify your compiler with
-constant checks everywhere) and it can dramatically reduce the amount of
-LLVM IR that is generated in some cases (particular for languages with a
-macro preprocessor or that use a lot of constants).
-
-On the other hand, the ``IRBuilder`` is limited by the fact that it does
-all of its analysis inline with the code as it is built. If you take a
-slightly more complex example:
-
-::
-
-    ready> def test(x) (1+2+x)*(x+(1+2));
-    ready> Read function definition:
-    define double @test(double %x) {
-    entry:
-            %addtmp = fadd double 3.000000e+00, %x
-            %addtmp1 = fadd double %x, 3.000000e+00
-            %multmp = fmul double %addtmp, %addtmp1
-            ret double %multmp
-    }
-
-In this case, the LHS and RHS of the multiplication are the same value.
-We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
-instead of computing "``x+3``" twice.
-
-Unfortunately, no amount of local analysis will be able to detect and
-correct this. This requires two transformations: reassociation of
-expressions (to make the add's lexically identical) and Common
-Subexpression Elimination (CSE) to delete the redundant add instruction.
-Fortunately, LLVM provides a broad range of optimizations that you can
-use, in the form of "passes".
-
-LLVM Optimization Passes
-========================
-
-LLVM provides many optimization passes, which do many different sorts of
-things and have different tradeoffs. Unlike other systems, LLVM doesn't
-hold to the mistaken notion that one set of optimizations is right for
-all languages and for all situations. LLVM allows a compiler implementor
-to make complete decisions about what optimizations to use, in which
-order, and in what situation.
-
-As a concrete example, LLVM supports both "whole module" passes, which
-look across as large of body of code as they can (often a whole file,
-but if run at link time, this can be a substantial portion of the whole
-program). It also supports and includes "per-function" passes which just
-operate on a single function at a time, without looking at other
-functions. For more information on passes and how they are run, see the
-`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the
-`List of LLVM Passes <../Passes.html>`_.
-
-For Kaleidoscope, we are currently generating functions on the fly, one
-at a time, as the user types them in. We aren't shooting for the
-ultimate optimization experience in this setting, but we also want to
-catch the easy and quick stuff where possible. As such, we will choose
-to run a few per-function optimizations as the user types the function
-in. If we wanted to make a "static Kaleidoscope compiler", we would use
-exactly the code we have now, except that we would defer running the
-optimizer until the entire file has been parsed.
-
-In order to get per-function optimizations going, we need to set up a
-`FunctionPassManager <../WritingAnLLVMPass.html#what-passmanager-doesr>`_ to hold
-and organize the LLVM optimizations that we want to run. Once we have
-that, we can add a set of optimizations to run. We'll need a new
-FunctionPassManager for each module that we want to optimize, so we'll
-write a function to create and initialize both the module and pass manager
-for us:
-
-.. code-block:: c++
-
-    void InitializeModuleAndPassManager(void) {
-      // Open a new module.
-      Context LLVMContext;
-      TheModule = llvm::make_unique<Module>("my cool jit", LLVMContext);
-      TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());
-
-      // Create a new pass manager attached to it.
-      TheFPM = llvm::make_unique<FunctionPassManager>(TheModule.get());
-
-      // Provide basic AliasAnalysis support for GVN.
-      TheFPM.add(createBasicAliasAnalysisPass());
-      // Do simple "peephole" optimizations and bit-twiddling optzns.
-      TheFPM.add(createInstructionCombiningPass());
-      // Reassociate expressions.
-      TheFPM.add(createReassociatePass());
-      // Eliminate Common SubExpressions.
-      TheFPM.add(createGVNPass());
-      // Simplify the control flow graph (deleting unreachable blocks, etc).
-      TheFPM.add(createCFGSimplificationPass());
-
-      TheFPM.doInitialization();
-    }
-
-This code initializes the global module ``TheModule``, and the function pass
-manager ``TheFPM``, which is attached to ``TheModule``. Once the pass manager is
-set up, we use a series of "add" calls to add a bunch of LLVM passes.
-
-In this case, we choose to add five passes: one analysis pass (alias analysis),
-and four optimization passes. The passes we choose here are a pretty standard set
-of "cleanup" optimizations that are useful for a wide variety of code. I won't
-delve into what they do but, believe me, they are a good starting place :).
-
-Once the PassManager is set up, we need to make use of it. We do this by
-running it after our newly created function is constructed (in
-``FunctionAST::codegen()``), but before it is returned to the client:
-
-.. code-block:: c++
-
-      if (Value *RetVal = Body->codegen()) {
-        // Finish off the function.
-        Builder.CreateRet(RetVal);
-
-        // Validate the generated code, checking for consistency.
-        verifyFunction(*TheFunction);
-
-        // Optimize the function.
-        TheFPM->run(*TheFunction);
-
-        return TheFunction;
-      }
-
-As you can see, this is pretty straightforward. The
-``FunctionPassManager`` optimizes and updates the LLVM Function\* in
-place, improving (hopefully) its body. With this in place, we can try
-our test above again:
-
-::
-
-    ready> def test(x) (1+2+x)*(x+(1+2));
-    ready> Read function definition:
-    define double @test(double %x) {
-    entry:
-            %addtmp = fadd double %x, 3.000000e+00
-            %multmp = fmul double %addtmp, %addtmp
-            ret double %multmp
-    }
-
-As expected, we now get our nicely optimized code, saving a floating
-point add instruction from every execution of this function.
-
-LLVM provides a wide variety of optimizations that can be used in
-certain circumstances. Some `documentation about the various
-passes <../Passes.html>`_ is available, but it isn't very complete.
-Another good source of ideas can come from looking at the passes that
-``Clang`` runs to get started. The "``opt``" tool allows you to
-experiment with passes from the command line, so you can see if they do
-anything.
-
-Now that we have reasonable code coming out of our front-end, lets talk
-about executing it!
-
-Adding a JIT Compiler
-=====================
-
-Code that is available in LLVM IR can have a wide variety of tools
-applied to it. For example, you can run optimizations on it (as we did
-above), you can dump it out in textual or binary forms, you can compile
-the code to an assembly file (.s) for some target, or you can JIT
-compile it. The nice thing about the LLVM IR representation is that it
-is the "common currency" between many different parts of the compiler.
-
-In this section, we'll add JIT compiler support to our interpreter. The
-basic idea that we want for Kaleidoscope is to have the user enter
-function bodies as they do now, but immediately evaluate the top-level
-expressions they type in. For example, if they type in "1 + 2;", we
-should evaluate and print out 3. If they define a function, they should
-be able to call it from the command line.
-
-In order to do this, we first declare and initialize the JIT. This is
-done by adding a global variable ``TheJIT``, and initializing it in
-``main``:
-
-.. code-block:: c++
-
-    static std::unique_ptr<KaleidoscopeJIT> TheJIT;
-    ...
-    int main() {
-      ..
-      TheJIT = llvm::make_unique<KaleidoscopeJIT>();
-
-      // Run the main "interpreter loop" now.
-      MainLoop();
-
-      return 0;
-    }
-
-The KaleidoscopeJIT class is a simple JIT built specifically for these
-tutorials. In later chapters we will look at how it works and extend it with
-new features, but for now we will take it as given. Its API is very simple::
-``addModule`` adds an LLVM IR module to the JIT, making its functions
-available for execution; ``removeModule`` removes a module, freeing any
-memory associated with the code in that module; and ``findSymbol`` allows us
-to look up pointers to the compiled code.
-
-We can take this simple API and change our code that parses top-level expressions to
-look like this:
-
-.. code-block:: c++
-
-    static void HandleTopLevelExpression() {
-      // Evaluate a top-level expression into an anonymous function.
-      if (auto FnAST = ParseTopLevelExpr()) {
-        if (FnAST->codegen()) {
-
-          // JIT the module containing the anonymous expression, keeping a handle so
-          // we can free it later.
-          auto H = TheJIT->addModule(std::move(TheModule));
-          InitializeModuleAndPassManager();
-
-          // Search the JIT for the __anon_expr symbol.
-          auto ExprSymbol = TheJIT->findSymbol("__anon_expr");
-          assert(ExprSymbol && "Function not found");
-
-          // Get the symbol's address and cast it to the right type (takes no
-          // arguments, returns a double) so we can call it as a native function.
-          double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
-          fprintf(stderr, "Evaluated to %f\n", FP());
-
-          // Delete the anonymous expression module from the JIT.
-          TheJIT->removeModule(H);
-        }
-
-If parsing and codegen succeeed, the next step is to add the module containing
-the top-level expression to the JIT. We do this by calling addModule, which
-triggers code generation for all the functions in the module, and returns a
-handle that can be used to remove the module from the JIT later. Once the module
-has been added to the JIT it can no longer be modified, so we also open a new
-module to hold subsequent code by calling ``InitializeModuleAndPassManager()``.
-
-Once we've added the module to the JIT we need to get a pointer to the final
-generated code. We do this by calling the JIT's findSymbol method, and passing
-the name of the top-level expression function: ``__anon_expr``. Since we just
-added this function, we assert that findSymbol returned a result.
-
-Next, we get the in-memory address of the ``__anon_expr`` function by calling
-``getAddress()`` on the symbol. Recall that we compile top-level expressions
-into a self-contained LLVM function that takes no arguments and returns the
-computed double. Because the LLVM JIT compiler matches the native platform ABI,
-this means that you can just cast the result pointer to a function pointer of
-that type and call it directly. This means, there is no difference between JIT
-compiled code and native machine code that is statically linked into your
-application.
-
-Finally, since we don't support re-evaluation of top-level expressions, we
-remove the module from the JIT when we're done to free the associated memory.
-Recall, however, that the module we created a few lines earlier (via
-``InitializeModuleAndPassManager``) is still open and waiting for new code to be
-added.
-
-With just these two changes, lets see how Kaleidoscope works now!
-
-::
-
-    ready> 4+5;
-    Read top-level expression:
-    define double @0() {
-    entry:
-      ret double 9.000000e+00
-    }
-
-    Evaluated to 9.000000
-
-Well this looks like it is basically working. The dump of the function
-shows the "no argument function that always returns double" that we
-synthesize for each top-level expression that is typed in. This
-demonstrates very basic functionality, but can we do more?
-
-::
-
-    ready> def testfunc(x y) x + y*2;
-    Read function definition:
-    define double @testfunc(double %x, double %y) {
-    entry:
-      %multmp = fmul double %y, 2.000000e+00
-      %addtmp = fadd double %multmp, %x
-      ret double %addtmp
-    }
-
-    ready> testfunc(4, 10);
-    Read top-level expression:
-    define double @1() {
-    entry:
-      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
-      ret double %calltmp
-    }
-
-    Evaluated to 24.000000
-
-    ready> testfunc(5, 10);
-    ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!
-
-
-Function definitions and calls also work, but something went very wrong on that
-last line. The call looks valid, so what happened? As you may have guessed from
-the the API a Module is a unit of allocation for the JIT, and testfunc was part
-of the same module that contained anonymous expression. When we removed that
-module from the JIT to free the memory for the anonymous expression, we deleted
-the definition of ``testfunc`` along with it. Then, when we tried to call
-testfunc a second time, the JIT could no longer find it.
-
-The easiest way to fix this is to put the anonymous expression in a separate
-module from the rest of the function definitions. The JIT will happily resolve
-function calls across module boundaries, as long as each of the functions called
-has a prototype, and is added to the JIT before it is called. By putting the
-anonymous expression in a different module we can delete it without affecting
-the rest of the functions.
-
-In fact, we're going to go a step further and put every function in its own
-module. Doing so allows us to exploit a useful property of the KaleidoscopeJIT
-that will make our environment more REPL-like: Functions can be added to the
-JIT more than once (unlike a module where every function must have a unique
-definition). When you look up a symbol in KaleidoscopeJIT it will always return
-the most recent definition:
-
-::
-
-    ready> def foo(x) x + 1;
-    Read function definition:
-    define double @foo(double %x) {
-    entry:
-      %addtmp = fadd double %x, 1.000000e+00
-      ret double %addtmp
-    }
-
-    ready> foo(2);
-    Evaluated to 3.000000
-
-    ready> def foo(x) x + 2;
-    define double @foo(double %x) {
-    entry:
-      %addtmp = fadd double %x, 2.000000e+00
-      ret double %addtmp
-    }
-
-    ready> foo(2);
-    Evaluated to 4.000000
-
-
-To allow each function to live in its own module we'll need a way to
-re-generate previous function declarations into each new module we open:
-
-.. code-block:: c++
-
-    static std::unique_ptr<KaleidoscopeJIT> TheJIT;
-
-    ...
-
-    Function *getFunction(std::string Name) {
-      // First, see if the function has already been added to the current module.
-      if (auto *F = TheModule->getFunction(Name))
-        return F;
-
-      // If not, check whether we can codegen the declaration from some existing
-      // prototype.
-      auto FI = FunctionProtos.find(Name);
-      if (FI != FunctionProtos.end())
-        return FI->second->codegen();
-
-      // If no existing prototype exists, return null.
-      return nullptr;
-    }
-
-    ...
-
-    Value *CallExprAST::codegen() {
-      // Look up the name in the global module table.
-      Function *CalleeF = getFunction(Callee);
-
-    ...
-
-    Function *FunctionAST::codegen() {
-      // Transfer ownership of the prototype to the FunctionProtos map, but keep a
-      // reference to it for use below.
-      auto &P = *Proto;
-      FunctionProtos[Proto->getName()] = std::move(Proto);
-      Function *TheFunction = getFunction(P.getName());
-      if (!TheFunction)
-        return nullptr;
-
-
-To enable this, we'll start by adding a new global, ``FunctionProtos``, that
-holds the most recent prototype for each function. We'll also add a convenience
-method, ``getFunction()``, to replace calls to ``TheModule->getFunction()``.
-Our convenience method searches ``TheModule`` for an existing function
-declaration, falling back to generating a new declaration from FunctionProtos if
-it doesn't find one. In ``CallExprAST::codegen()`` we just need to replace the
-call to ``TheModule->getFunction()``. In ``FunctionAST::codegen()`` we need to
-update the FunctionProtos map first, then call ``getFunction()``. With this
-done, we can always obtain a function declaration in the current module for any
-previously declared function.
-
-We also need to update HandleDefinition and HandleExtern:
-
-.. code-block:: c++
-
-    static void HandleDefinition() {
-      if (auto FnAST = ParseDefinition()) {
-        if (auto *FnIR = FnAST->codegen()) {
-          fprintf(stderr, "Read function definition:");
-          FnIR->dump();
-          TheJIT->addModule(std::move(TheModule));
-          InitializeModuleAndPassManager();
-        }
-      } else {
-        // Skip token for error recovery.
-         getNextToken();
-      }
-    }
-
-    static void HandleExtern() {
-      if (auto ProtoAST = ParseExtern()) {
-        if (auto *FnIR = ProtoAST->codegen()) {
-          fprintf(stderr, "Read extern: ");
-          FnIR->dump();
-          FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);
-        }
-      } else {
-        // Skip token for error recovery.
-        getNextToken();
-      }
-    }
-
-In HandleDefinition, we add two lines to transfer the newly defined function to
-the JIT and open a new module. In HandleExtern, we just need to add one line to
-add the prototype to FunctionProtos.
-
-With these changes made, lets try our REPL again (I removed the dump of the
-anonymous functions this time, you should get the idea by now :) :
-
-::
-
-    ready> def foo(x) x + 1;
-    ready> foo(2);
-    Evaluated to 3.000000
-
-    ready> def foo(x) x + 2;
-    ready> foo(2);
-    Evaluated to 4.000000
-
-It works!
-
-Even with this simple code, we get some surprisingly powerful capabilities -
-check this out:
-
-::
-
-    ready> extern sin(x);
-    Read extern:
-    declare double @sin(double)
-
-    ready> extern cos(x);
-    Read extern:
-    declare double @cos(double)
-
-    ready> sin(1.0);
-    Read top-level expression:
-    define double @2() {
-    entry:
-      ret double 0x3FEAED548F090CEE
-    }
-
-    Evaluated to 0.841471
-
-    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
-    Read function definition:
-    define double @foo(double %x) {
-    entry:
-      %calltmp = call double @sin(double %x)
-      %multmp = fmul double %calltmp, %calltmp
-      %calltmp2 = call double @cos(double %x)
-      %multmp4 = fmul double %calltmp2, %calltmp2
-      %addtmp = fadd double %multmp, %multmp4
-      ret double %addtmp
-    }
-
-    ready> foo(4.0);
-    Read top-level expression:
-    define double @3() {
-    entry:
-      %calltmp = call double @foo(double 4.000000e+00)
-      ret double %calltmp
-    }
-
-    Evaluated to 1.000000
-
-Whoa, how does the JIT know about sin and cos? The answer is surprisingly
-simple: The KaleidoscopeJIT has a straightforward symbol resolution rule that
-it uses to find symbols that aren't available in any given module: First
-it searches all the modules that have already been added to the JIT, from the
-most recent to the oldest, to find the newest definition. If no definition is
-found inside the JIT, it falls back to calling "``dlsym("sin")``" on the
-Kaleidoscope process itself. Since "``sin``" is defined within the JIT's
-address space, it simply patches up calls in the module to call the libm
-version of ``sin`` directly.
-
-In the future we'll see how tweaking this symbol resolution rule can be used to
-enable all sorts of useful features, from security (restricting the set of
-symbols available to JIT'd code), to dynamic code generation based on symbol
-names, and even lazy compilation.
-
-One immediate benefit of the symbol resolution rule is that we can now extend
-the language by writing arbitrary C++ code to implement operations. For example,
-if we add:
-
-.. code-block:: c++
-
-    /// putchard - putchar that takes a double and returns 0.
-    extern "C" double putchard(double X) {
-      fputc((char)X, stderr);
-      return 0;
-    }
-
-Now we can produce simple output to the console by using things like:
-"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
-on the console (120 is the ASCII code for 'x'). Similar code could be
-used to implement file I/O, console input, and many other capabilities
-in Kaleidoscope.
-
-This completes the JIT and optimizer chapter of the Kaleidoscope
-tutorial. At this point, we can compile a non-Turing-complete
-programming language, optimize and JIT compile it in a user-driven way.
-Next up we'll look into `extending the language with control flow
-constructs <LangImpl5.html>`_, tackling some interesting LLVM IR issues
-along the way.
-
-Full Code Listing
-=================
-
-Here is the complete code listing for our running example, enhanced with
-the LLVM JIT and optimizer. To build this example, use:
-
-.. code-block:: bash
-
-    # Compile
-    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
-    # Run
-    ./toy
-
-If you are compiling this on Linux, make sure to add the "-rdynamic"
-option as well. This makes sure that the external functions are resolved
-properly at runtime.
-
-Here is the code:
-
-.. literalinclude:: ../../examples/Kaleidoscope/Chapter4/toy.cpp
-   :language: c++
-
-`Next: Extending the language: control flow <LangImpl5.html>`_
-

Removed: llvm/trunk/docs/tutorial/LangImpl5-cfg.png
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl5-cfg.png?rev=274440&view=auto
==============================================================================
Binary file - no diff available.

Removed: llvm/trunk/docs/tutorial/LangImpl5.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl5.rst?rev=274440&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl5.rst (original)
+++ llvm/trunk/docs/tutorial/LangImpl5.rst (removed)
@@ -1,790 +0,0 @@
-==================================================
-Kaleidoscope: Extending the Language: Control Flow
-==================================================
-
-.. contents::
-   :local:
-
-Chapter 5 Introduction
-======================
-
-Welcome to Chapter 5 of the "`Implementing a language with
-LLVM <index.html>`_" tutorial. Parts 1-4 described the implementation of
-the simple Kaleidoscope language and included support for generating
-LLVM IR, followed by optimizations and a JIT compiler. Unfortunately, as
-presented, Kaleidoscope is mostly useless: it has no control flow other
-than call and return. This means that you can't have conditional
-branches in the code, significantly limiting its power. In this episode
-of "build that compiler", we'll extend Kaleidoscope to have an
-if/then/else expression plus a simple 'for' loop.
-
-If/Then/Else
-============
-
-Extending Kaleidoscope to support if/then/else is quite straightforward.
-It basically requires adding support for this "new" concept to the
-lexer, parser, AST, and LLVM code emitter. This example is nice, because
-it shows how easy it is to "grow" a language over time, incrementally
-extending it as new ideas are discovered.
-
-Before we get going on "how" we add this extension, lets talk about
-"what" we want. The basic idea is that we want to be able to write this
-sort of thing:
-
-::
-
-    def fib(x)
-      if x < 3 then
-        1
-      else
-        fib(x-1)+fib(x-2);
-
-In Kaleidoscope, every construct is an expression: there are no
-statements. As such, the if/then/else expression needs to return a value
-like any other. Since we're using a mostly functional form, we'll have
-it evaluate its conditional, then return the 'then' or 'else' value
-based on how the condition was resolved. This is very similar to the C
-"?:" expression.
-
-The semantics of the if/then/else expression is that it evaluates the
-condition to a boolean equality value: 0.0 is considered to be false and
-everything else is considered to be true. If the condition is true, the
-first subexpression is evaluated and returned, if the condition is
-false, the second subexpression is evaluated and returned. Since
-Kaleidoscope allows side-effects, this behavior is important to nail
-down.
-
-Now that we know what we "want", lets break this down into its
-constituent pieces.
-
-Lexer Extensions for If/Then/Else
----------------------------------
-
-The lexer extensions are straightforward. First we add new enum values
-for the relevant tokens:
-
-.. code-block:: c++
-
-      // control
-      tok_if = -6,
-      tok_then = -7,
-      tok_else = -8,
-
-Once we have that, we recognize the new keywords in the lexer. This is
-pretty simple stuff:
-
-.. code-block:: c++
-
-        ...
-        if (IdentifierStr == "def")
-          return tok_def;
-        if (IdentifierStr == "extern")
-          return tok_extern;
-        if (IdentifierStr == "if")
-          return tok_if;
-        if (IdentifierStr == "then")
-          return tok_then;
-        if (IdentifierStr == "else")
-          return tok_else;
-        return tok_identifier;
-
-AST Extensions for If/Then/Else
--------------------------------
-
-To represent the new expression we add a new AST node for it:
-
-.. code-block:: c++
-
-    /// IfExprAST - Expression class for if/then/else.
-    class IfExprAST : public ExprAST {
-      std::unique_ptr<ExprAST> Cond, Then, Else;
-
-    public:
-      IfExprAST(std::unique_ptr<ExprAST> Cond, std::unique_ptr<ExprAST> Then,
-                std::unique_ptr<ExprAST> Else)
-        : Cond(std::move(Cond)), Then(std::move(Then)), Else(std::move(Else)) {}
-      virtual Value *codegen();
-    };
-
-The AST node just has pointers to the various subexpressions.
-
-Parser Extensions for If/Then/Else
-----------------------------------
-
-Now that we have the relevant tokens coming from the lexer and we have
-the AST node to build, our parsing logic is relatively straightforward.
-First we define a new parsing function:
-
-.. code-block:: c++
-
-    /// ifexpr ::= 'if' expression 'then' expression 'else' expression
-    static std::unique_ptr<ExprAST> ParseIfExpr() {
-      getNextToken();  // eat the if.
-
-      // condition.
-      auto Cond = ParseExpression();
-      if (!Cond)
-        return nullptr;
-
-      if (CurTok != tok_then)
-        return LogError("expected then");
-      getNextToken();  // eat the then
-
-      auto Then = ParseExpression();
-      if (!Then)
-        return nullptr;
-
-      if (CurTok != tok_else)
-        return LogError("expected else");
-
-      getNextToken();
-
-      auto Else = ParseExpression();
-      if (!Else)
-        return nullptr;
-
-      return llvm::make_unique<IfExprAST>(std::move(Cond), std::move(Then),
-                                          std::move(Else));
-    }
-
-Next we hook it up as a primary expression:
-
-.. code-block:: c++
-
-    static std::unique_ptr<ExprAST> ParsePrimary() {
-      switch (CurTok) {
-      default:
-        return LogError("unknown token when expecting an expression");
-      case tok_identifier:
-        return ParseIdentifierExpr();
-      case tok_number:
-        return ParseNumberExpr();
-      case '(':
-        return ParseParenExpr();
-      case tok_if:
-        return ParseIfExpr();
-      }
-    }
-
-LLVM IR for If/Then/Else
-------------------------
-
-Now that we have it parsing and building the AST, the final piece is
-adding LLVM code generation support. This is the most interesting part
-of the if/then/else example, because this is where it starts to
-introduce new concepts. All of the code above has been thoroughly
-described in previous chapters.
-
-To motivate the code we want to produce, lets take a look at a simple
-example. Consider:
-
-::
-
-    extern foo();
-    extern bar();
-    def baz(x) if x then foo() else bar();
-
-If you disable optimizations, the code you'll (soon) get from
-Kaleidoscope looks like this:
-
-.. code-block:: llvm
-
-    declare double @foo()
-
-    declare double @bar()
-
-    define double @baz(double %x) {
-    entry:
-      %ifcond = fcmp one double %x, 0.000000e+00
-      br i1 %ifcond, label %then, label %else
-
-    then:       ; preds = %entry
-      %calltmp = call double @foo()
-      br label %ifcont
-
-    else:       ; preds = %entry
-      %calltmp1 = call double @bar()
-      br label %ifcont
-
-    ifcont:     ; preds = %else, %then
-      %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
-      ret double %iftmp
-    }
-
-To visualize the control flow graph, you can use a nifty feature of the
-LLVM '`opt <http://llvm.org/cmds/opt.html>`_' tool. If you put this LLVM
-IR into "t.ll" and run "``llvm-as < t.ll | opt -analyze -view-cfg``", `a
-window will pop up <../ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ and you'll
-see this graph:
-
-.. figure:: LangImpl5-cfg.png
-   :align: center
-   :alt: Example CFG
-
-   Example CFG
-
-Another way to get this is to call "``F->viewCFG()``" or
-"``F->viewCFGOnly()``" (where F is a "``Function*``") either by
-inserting actual calls into the code and recompiling or by calling these
-in the debugger. LLVM has many nice features for visualizing various
-graphs.
-
-Getting back to the generated code, it is fairly simple: the entry block
-evaluates the conditional expression ("x" in our case here) and compares
-the result to 0.0 with the "``fcmp one``" instruction ('one' is "Ordered
-and Not Equal"). Based on the result of this expression, the code jumps
-to either the "then" or "else" blocks, which contain the expressions for
-the true/false cases.
-
-Once the then/else blocks are finished executing, they both branch back
-to the 'ifcont' block to execute the code that happens after the
-if/then/else. In this case the only thing left to do is to return to the
-caller of the function. The question then becomes: how does the code
-know which expression to return?
-
-The answer to this question involves an important SSA operation: the
-`Phi
-operation <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
-If you're not familiar with SSA, `the wikipedia
-article <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
-is a good introduction and there are various other introductions to it
-available on your favorite search engine. The short version is that
-"execution" of the Phi operation requires "remembering" which block
-control came from. The Phi operation takes on the value corresponding to
-the input control block. In this case, if control comes in from the
-"then" block, it gets the value of "calltmp". If control comes from the
-"else" block, it gets the value of "calltmp1".
-
-At this point, you are probably starting to think "Oh no! This means my
-simple and elegant front-end will have to start generating SSA form in
-order to use LLVM!". Fortunately, this is not the case, and we strongly
-advise *not* implementing an SSA construction algorithm in your
-front-end unless there is an amazingly good reason to do so. In
-practice, there are two sorts of values that float around in code
-written for your average imperative programming language that might need
-Phi nodes:
-
-#. Code that involves user variables: ``x = 1; x = x + 1;``
-#. Values that are implicit in the structure of your AST, such as the
-   Phi node in this case.
-
-In `Chapter 7 <LangImpl7.html>`_ of this tutorial ("mutable variables"),
-we'll talk about #1 in depth. For now, just believe me that you don't
-need SSA construction to handle this case. For #2, you have the choice
-of using the techniques that we will describe for #1, or you can insert
-Phi nodes directly, if convenient. In this case, it is really
-easy to generate the Phi node, so we choose to do it directly.
-
-Okay, enough of the motivation and overview, lets generate code!
-
-Code Generation for If/Then/Else
---------------------------------
-
-In order to generate code for this, we implement the ``codegen`` method
-for ``IfExprAST``:
-
-.. code-block:: c++
-
-    Value *IfExprAST::codegen() {
-      Value *CondV = Cond->codegen();
-      if (!CondV)
-        return nullptr;
-
-      // Convert condition to a bool by comparing equal to 0.0.
-      CondV = Builder.CreateFCmpONE(
-          CondV, ConstantFP::get(LLVMContext, APFloat(0.0)), "ifcond");
-
-This code is straightforward and similar to what we saw before. We emit
-the expression for the condition, then compare that value to zero to get
-a truth value as a 1-bit (bool) value.
-
-.. code-block:: c++
-
-      Function *TheFunction = Builder.GetInsertBlock()->getParent();
-
-      // Create blocks for the then and else cases.  Insert the 'then' block at the
-      // end of the function.
-      BasicBlock *ThenBB =
-          BasicBlock::Create(LLVMContext, "then", TheFunction);
-      BasicBlock *ElseBB = BasicBlock::Create(LLVMContext, "else");
-      BasicBlock *MergeBB = BasicBlock::Create(LLVMContext, "ifcont");
-
-      Builder.CreateCondBr(CondV, ThenBB, ElseBB);
-
-This code creates the basic blocks that are related to the if/then/else
-statement, and correspond directly to the blocks in the example above.
-The first line gets the current Function object that is being built. It
-gets this by asking the builder for the current BasicBlock, and asking
-that block for its "parent" (the function it is currently embedded
-into).
-
-Once it has that, it creates three blocks. Note that it passes
-"TheFunction" into the constructor for the "then" block. This causes the
-constructor to automatically insert the new block into the end of the
-specified function. The other two blocks are created, but aren't yet
-inserted into the function.
-
-Once the blocks are created, we can emit the conditional branch that
-chooses between them. Note that creating new blocks does not implicitly
-affect the IRBuilder, so it is still inserting into the block that the
-condition went into. Also note that it is creating a branch to the
-"then" block and the "else" block, even though the "else" block isn't
-inserted into the function yet. This is all ok: it is the standard way
-that LLVM supports forward references.
-
-.. code-block:: c++
-
-      // Emit then value.
-      Builder.SetInsertPoint(ThenBB);
-
-      Value *ThenV = Then->codegen();
-      if (!ThenV)
-        return nullptr;
-
-      Builder.CreateBr(MergeBB);
-      // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
-      ThenBB = Builder.GetInsertBlock();
-
-After the conditional branch is inserted, we move the builder to start
-inserting into the "then" block. Strictly speaking, this call moves the
-insertion point to be at the end of the specified block. However, since
-the "then" block is empty, it also starts out by inserting at the
-beginning of the block. :)
-
-Once the insertion point is set, we recursively codegen the "then"
-expression from the AST. To finish off the "then" block, we create an
-unconditional branch to the merge block. One interesting (and very
-important) aspect of the LLVM IR is that it `requires all basic blocks
-to be "terminated" <../LangRef.html#functionstructure>`_ with a `control
-flow instruction <../LangRef.html#terminators>`_ such as return or
-branch. This means that all control flow, *including fall throughs* must
-be made explicit in the LLVM IR. If you violate this rule, the verifier
-will emit an error.
-
-The final line here is quite subtle, but is very important. The basic
-issue is that when we create the Phi node in the merge block, we need to
-set up the block/value pairs that indicate how the Phi will work.
-Importantly, the Phi node expects to have an entry for each predecessor
-of the block in the CFG. Why then, are we getting the current block when
-we just set it to ThenBB 5 lines above? The problem is that the "Then"
-expression may actually itself change the block that the Builder is
-emitting into if, for example, it contains a nested "if/then/else"
-expression. Because calling ``codegen()`` recursively could arbitrarily change
-the notion of the current block, we are required to get an up-to-date
-value for code that will set up the Phi node.
-
-.. code-block:: c++
-
-      // Emit else block.
-      TheFunction->getBasicBlockList().push_back(ElseBB);
-      Builder.SetInsertPoint(ElseBB);
-
-      Value *ElseV = Else->codegen();
-      if (!ElseV)
-        return nullptr;
-
-      Builder.CreateBr(MergeBB);
-      // codegen of 'Else' can change the current block, update ElseBB for the PHI.
-      ElseBB = Builder.GetInsertBlock();
-
-Code generation for the 'else' block is basically identical to codegen
-for the 'then' block. The only significant difference is the first line,
-which adds the 'else' block to the function. Recall previously that the
-'else' block was created, but not added to the function. Now that the
-'then' and 'else' blocks are emitted, we can finish up with the merge
-code:
-
-.. code-block:: c++
-
-      // Emit merge block.
-      TheFunction->getBasicBlockList().push_back(MergeBB);
-      Builder.SetInsertPoint(MergeBB);
-      PHINode *PN =
-        Builder.CreatePHI(Type::getDoubleTy(LLVMContext), 2, "iftmp");
-
-      PN->addIncoming(ThenV, ThenBB);
-      PN->addIncoming(ElseV, ElseBB);
-      return PN;
-    }
-
-The first two lines here are now familiar: the first adds the "merge"
-block to the Function object (it was previously floating, like the else
-block above). The second changes the insertion point so that newly
-created code will go into the "merge" block. Once that is done, we need
-to create the PHI node and set up the block/value pairs for the PHI.
-
-Finally, the CodeGen function returns the phi node as the value computed
-by the if/then/else expression. In our example above, this returned
-value will feed into the code for the top-level function, which will
-create the return instruction.
-
-Overall, we now have the ability to execute conditional code in
-Kaleidoscope. With this extension, Kaleidoscope is a fairly complete
-language that can calculate a wide variety of numeric functions. Next up
-we'll add another useful expression that is familiar from non-functional
-languages...
-
-'for' Loop Expression
-=====================
-
-Now that we know how to add basic control flow constructs to the
-language, we have the tools to add more powerful things. Lets add
-something more aggressive, a 'for' expression:
-
-::
-
-     extern putchard(char)
-     def printstar(n)
-       for i = 1, i < n, 1.0 in
-         putchard(42);  # ascii 42 = '*'
-
-     # print 100 '*' characters
-     printstar(100);
-
-This expression defines a new variable ("i" in this case) which iterates
-from a starting value, while the condition ("i < n" in this case) is
-true, incrementing by an optional step value ("1.0" in this case). If
-the step value is omitted, it defaults to 1.0. While the loop is true,
-it executes its body expression. Because we don't have anything better
-to return, we'll just define the loop as always returning 0.0. In the
-future when we have mutable variables, it will get more useful.
-
-As before, lets talk about the changes that we need to Kaleidoscope to
-support this.
-
-Lexer Extensions for the 'for' Loop
------------------------------------
-
-The lexer extensions are the same sort of thing as for if/then/else:
-
-.. code-block:: c++
-
-      ... in enum Token ...
-      // control
-      tok_if = -6, tok_then = -7, tok_else = -8,
-      tok_for = -9, tok_in = -10
-
-      ... in gettok ...
-      if (IdentifierStr == "def")
-        return tok_def;
-      if (IdentifierStr == "extern")
-        return tok_extern;
-      if (IdentifierStr == "if")
-        return tok_if;
-      if (IdentifierStr == "then")
-        return tok_then;
-      if (IdentifierStr == "else")
-        return tok_else;
-      if (IdentifierStr == "for")
-        return tok_for;
-      if (IdentifierStr == "in")
-        return tok_in;
-      return tok_identifier;
-
-AST Extensions for the 'for' Loop
----------------------------------
-
-The AST node is just as simple. It basically boils down to capturing the
-variable name and the constituent expressions in the node.
-
-.. code-block:: c++
-
-    /// ForExprAST - Expression class for for/in.
-    class ForExprAST : public ExprAST {
-      std::string VarName;
-      std::unique_ptr<ExprAST> Start, End, Step, Body;
-
-    public:
-      ForExprAST(const std::string &VarName, std::unique_ptr<ExprAST> Start,
-                 std::unique_ptr<ExprAST> End, std::unique_ptr<ExprAST> Step,
-                 std::unique_ptr<ExprAST> Body)
-        : VarName(VarName), Start(std::move(Start)), End(std::move(End)),
-          Step(std::move(Step)), Body(std::move(Body)) {}
-      virtual Value *codegen();
-    };
-
-Parser Extensions for the 'for' Loop
-------------------------------------
-
-The parser code is also fairly standard. The only interesting thing here
-is handling of the optional step value. The parser code handles it by
-checking to see if the second comma is present. If not, it sets the step
-value to null in the AST node:
-
-.. code-block:: c++
-
-    /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
-    static std::unique_ptr<ExprAST> ParseForExpr() {
-      getNextToken();  // eat the for.
-
-      if (CurTok != tok_identifier)
-        return LogError("expected identifier after for");
-
-      std::string IdName = IdentifierStr;
-      getNextToken();  // eat identifier.
-
-      if (CurTok != '=')
-        return LogError("expected '=' after for");
-      getNextToken();  // eat '='.
-
-
-      auto Start = ParseExpression();
-      if (!Start)
-        return nullptr;
-      if (CurTok != ',')
-        return LogError("expected ',' after for start value");
-      getNextToken();
-
-      auto End = ParseExpression();
-      if (!End)
-        return nullptr;
-
-      // The step value is optional.
-      std::unique_ptr<ExprAST> Step;
-      if (CurTok == ',') {
-        getNextToken();
-        Step = ParseExpression();
-        if (!Step)
-          return nullptr;
-      }
-
-      if (CurTok != tok_in)
-        return LogError("expected 'in' after for");
-      getNextToken();  // eat 'in'.
-
-      auto Body = ParseExpression();
-      if (!Body)
-        return nullptr;
-
-      return llvm::make_unique<ForExprAST>(IdName, std::move(Start),
-                                           std::move(End), std::move(Step),
-                                           std::move(Body));
-    }
-
-LLVM IR for the 'for' Loop
---------------------------
-
-Now we get to the good part: the LLVM IR we want to generate for this
-thing. With the simple example above, we get this LLVM IR (note that
-this dump is generated with optimizations disabled for clarity):
-
-.. code-block:: llvm
-
-    declare double @putchard(double)
-
-    define double @printstar(double %n) {
-    entry:
-      ; initial value = 1.0 (inlined into phi)
-      br label %loop
-
-    loop:       ; preds = %loop, %entry
-      %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
-      ; body
-      %calltmp = call double @putchard(double 4.200000e+01)
-      ; increment
-      %nextvar = fadd double %i, 1.000000e+00
-
-      ; termination test
-      %cmptmp = fcmp ult double %i, %n
-      %booltmp = uitofp i1 %cmptmp to double
-      %loopcond = fcmp one double %booltmp, 0.000000e+00
-      br i1 %loopcond, label %loop, label %afterloop
-
-    afterloop:      ; preds = %loop
-      ; loop always returns 0.0
-      ret double 0.000000e+00
-    }
-
-This loop contains all the same constructs we saw before: a phi node,
-several expressions, and some basic blocks. Lets see how this fits
-together.
-
-Code Generation for the 'for' Loop
-----------------------------------
-
-The first part of codegen is very simple: we just output the start
-expression for the loop value:
-
-.. code-block:: c++
-
-    Value *ForExprAST::codegen() {
-      // Emit the start code first, without 'variable' in scope.
-      Value *StartVal = Start->codegen();
-      if (StartVal == 0) return 0;
-
-With this out of the way, the next step is to set up the LLVM basic
-block for the start of the loop body. In the case above, the whole loop
-body is one block, but remember that the body code itself could consist
-of multiple blocks (e.g. if it contains an if/then/else or a for/in
-expression).
-
-.. code-block:: c++
-
-      // Make the new basic block for the loop header, inserting after current
-      // block.
-      Function *TheFunction = Builder.GetInsertBlock()->getParent();
-      BasicBlock *PreheaderBB = Builder.GetInsertBlock();
-      BasicBlock *LoopBB =
-          BasicBlock::Create(LLVMContext, "loop", TheFunction);
-
-      // Insert an explicit fall through from the current block to the LoopBB.
-      Builder.CreateBr(LoopBB);
-
-This code is similar to what we saw for if/then/else. Because we will
-need it to create the Phi node, we remember the block that falls through
-into the loop. Once we have that, we create the actual block that starts
-the loop and create an unconditional branch for the fall-through between
-the two blocks.
-
-.. code-block:: c++
-
-      // Start insertion in LoopBB.
-      Builder.SetInsertPoint(LoopBB);
-
-      // Start the PHI node with an entry for Start.
-      PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(LLVMContext),
-                                            2, VarName.c_str());
-      Variable->addIncoming(StartVal, PreheaderBB);
-
-Now that the "preheader" for the loop is set up, we switch to emitting
-code for the loop body. To begin with, we move the insertion point and
-create the PHI node for the loop induction variable. Since we already
-know the incoming value for the starting value, we add it to the Phi
-node. Note that the Phi will eventually get a second value for the
-backedge, but we can't set it up yet (because it doesn't exist!).
-
-.. code-block:: c++
-
-      // Within the loop, the variable is defined equal to the PHI node.  If it
-      // shadows an existing variable, we have to restore it, so save it now.
-      Value *OldVal = NamedValues[VarName];
-      NamedValues[VarName] = Variable;
-
-      // Emit the body of the loop.  This, like any other expr, can change the
-      // current BB.  Note that we ignore the value computed by the body, but don't
-      // allow an error.
-      if (!Body->codegen())
-        return nullptr;
-
-Now the code starts to get more interesting. Our 'for' loop introduces a
-new variable to the symbol table. This means that our symbol table can
-now contain either function arguments or loop variables. To handle this,
-before we codegen the body of the loop, we add the loop variable as the
-current value for its name. Note that it is possible that there is a
-variable of the same name in the outer scope. It would be easy to make
-this an error (emit an error and return null if there is already an
-entry for VarName) but we choose to allow shadowing of variables. In
-order to handle this correctly, we remember the Value that we are
-potentially shadowing in ``OldVal`` (which will be null if there is no
-shadowed variable).
-
-Once the loop variable is set into the symbol table, the code
-recursively codegen's the body. This allows the body to use the loop
-variable: any references to it will naturally find it in the symbol
-table.
-
-.. code-block:: c++
-
-      // Emit the step value.
-      Value *StepVal = nullptr;
-      if (Step) {
-        StepVal = Step->codegen();
-        if (!StepVal)
-          return nullptr;
-      } else {
-        // If not specified, use 1.0.
-        StepVal = ConstantFP::get(LLVMContext, APFloat(1.0));
-      }
-
-      Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar");
-
-Now that the body is emitted, we compute the next value of the iteration
-variable by adding the step value, or 1.0 if it isn't present.
-'``NextVar``' will be the value of the loop variable on the next
-iteration of the loop.
-
-.. code-block:: c++
-
-      // Compute the end condition.
-      Value *EndCond = End->codegen();
-      if (!EndCond)
-        return nullptr;
-
-      // Convert condition to a bool by comparing equal to 0.0.
-      EndCond = Builder.CreateFCmpONE(
-          EndCond, ConstantFP::get(LLVMContext, APFloat(0.0)), "loopcond");
-
-Finally, we evaluate the exit value of the loop, to determine whether
-the loop should exit. This mirrors the condition evaluation for the
-if/then/else statement.
-
-.. code-block:: c++
-
-      // Create the "after loop" block and insert it.
-      BasicBlock *LoopEndBB = Builder.GetInsertBlock();
-      BasicBlock *AfterBB =
-          BasicBlock::Create(LLVMContext, "afterloop", TheFunction);
-
-      // Insert the conditional branch into the end of LoopEndBB.
-      Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
-
-      // Any new code will be inserted in AfterBB.
-      Builder.SetInsertPoint(AfterBB);
-
-With the code for the body of the loop complete, we just need to finish
-up the control flow for it. This code remembers the end block (for the
-phi node), then creates the block for the loop exit ("afterloop"). Based
-on the value of the exit condition, it creates a conditional branch that
-chooses between executing the loop again and exiting the loop. Any
-future code is emitted in the "afterloop" block, so it sets the
-insertion position to it.
-
-.. code-block:: c++
-
-      // Add a new entry to the PHI node for the backedge.
-      Variable->addIncoming(NextVar, LoopEndBB);
-
-      // Restore the unshadowed variable.
-      if (OldVal)
-        NamedValues[VarName] = OldVal;
-      else
-        NamedValues.erase(VarName);
-
-      // for expr always returns 0.0.
-      return Constant::getNullValue(Type::getDoubleTy(LLVMContext));
-    }
-
-The final code handles various cleanups: now that we have the "NextVar"
-value, we can add the incoming value to the loop PHI node. After that,
-we remove the loop variable from the symbol table, so that it isn't in
-scope after the for loop. Finally, code generation of the for loop
-always returns 0.0, so that is what we return from
-``ForExprAST::codegen()``.
-
-With this, we conclude the "adding control flow to Kaleidoscope" chapter
-of the tutorial. In this chapter we added two control flow constructs,
-and used them to motivate a couple of aspects of the LLVM IR that are
-important for front-end implementors to know. In the next chapter of our
-saga, we will get a bit crazier and add `user-defined
-operators <LangImpl6.html>`_ to our poor innocent language.
-
-Full Code Listing
-=================
-
-Here is the complete code listing for our running example, enhanced with
-the if/then/else and for expressions.. To build this example, use:
-
-.. code-block:: bash
-
-    # Compile
-    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
-    # Run
-    ./toy
-
-Here is the code:
-
-.. literalinclude:: ../../examples/Kaleidoscope/Chapter5/toy.cpp
-   :language: c++
-
-`Next: Extending the language: user-defined operators <LangImpl6.html>`_
-

Removed: llvm/trunk/docs/tutorial/LangImpl6.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl6.rst?rev=274440&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl6.rst (original)
+++ llvm/trunk/docs/tutorial/LangImpl6.rst (removed)
@@ -1,768 +0,0 @@
-============================================================
-Kaleidoscope: Extending the Language: User-defined Operators
-============================================================
-
-.. contents::
-   :local:
-
-Chapter 6 Introduction
-======================
-
-Welcome to Chapter 6 of the "`Implementing a language with
-LLVM <index.html>`_" tutorial. At this point in our tutorial, we now
-have a fully functional language that is fairly minimal, but also
-useful. There is still one big problem with it, however. Our language
-doesn't have many useful operators (like division, logical negation, or
-even any comparisons besides less-than).
-
-This chapter of the tutorial takes a wild digression into adding
-user-defined operators to the simple and beautiful Kaleidoscope
-language. This digression now gives us a simple and ugly language in
-some ways, but also a powerful one at the same time. One of the great
-things about creating your own language is that you get to decide what
-is good or bad. In this tutorial we'll assume that it is okay to use
-this as a way to show some interesting parsing techniques.
-
-At the end of this tutorial, we'll run through an example Kaleidoscope
-application that `renders the Mandelbrot set <#kicking-the-tires>`_. This gives an
-example of what you can build with Kaleidoscope and its feature set.
-
-User-defined Operators: the Idea
-================================
-
-The "operator overloading" that we will add to Kaleidoscope is more
-general than languages like C++. In C++, you are only allowed to
-redefine existing operators: you can't programatically change the
-grammar, introduce new operators, change precedence levels, etc. In this
-chapter, we will add this capability to Kaleidoscope, which will let the
-user round out the set of operators that are supported.
-
-The point of going into user-defined operators in a tutorial like this
-is to show the power and flexibility of using a hand-written parser.
-Thus far, the parser we have been implementing uses recursive descent
-for most parts of the grammar and operator precedence parsing for the
-expressions. See `Chapter 2 <LangImpl2.html>`_ for details. Without
-using operator precedence parsing, it would be very difficult to allow
-the programmer to introduce new operators into the grammar: the grammar
-is dynamically extensible as the JIT runs.
-
-The two specific features we'll add are programmable unary operators
-(right now, Kaleidoscope has no unary operators at all) as well as
-binary operators. An example of this is:
-
-::
-
-    # Logical unary not.
-    def unary!(v)
-      if v then
-        0
-      else
-        1;
-
-    # Define > with the same precedence as <.
-    def binary> 10 (LHS RHS)
-      RHS < LHS;
-
-    # Binary "logical or", (note that it does not "short circuit")
-    def binary| 5 (LHS RHS)
-      if LHS then
-        1
-      else if RHS then
-        1
-      else
-        0;
-
-    # Define = with slightly lower precedence than relationals.
-    def binary= 9 (LHS RHS)
-      !(LHS < RHS | LHS > RHS);
-
-Many languages aspire to being able to implement their standard runtime
-library in the language itself. In Kaleidoscope, we can implement
-significant parts of the language in the library!
-
-We will break down implementation of these features into two parts:
-implementing support for user-defined binary operators and adding unary
-operators.
-
-User-defined Binary Operators
-=============================
-
-Adding support for user-defined binary operators is pretty simple with
-our current framework. We'll first add support for the unary/binary
-keywords:
-
-.. code-block:: c++
-
-    enum Token {
-      ...
-      // operators
-      tok_binary = -11,
-      tok_unary = -12
-    };
-    ...
-    static int gettok() {
-    ...
-        if (IdentifierStr == "for")
-          return tok_for;
-        if (IdentifierStr == "in")
-          return tok_in;
-        if (IdentifierStr == "binary")
-          return tok_binary;
-        if (IdentifierStr == "unary")
-          return tok_unary;
-        return tok_identifier;
-
-This just adds lexer support for the unary and binary keywords, like we
-did in `previous chapters <LangImpl5.html#lexer-extensions-for-if-then-else>`_. One nice thing
-about our current AST, is that we represent binary operators with full
-generalisation by using their ASCII code as the opcode. For our extended
-operators, we'll use this same representation, so we don't need any new
-AST or parser support.
-
-On the other hand, we have to be able to represent the definitions of
-these new operators, in the "def binary\| 5" part of the function
-definition. In our grammar so far, the "name" for the function
-definition is parsed as the "prototype" production and into the
-``PrototypeAST`` AST node. To represent our new user-defined operators
-as prototypes, we have to extend the ``PrototypeAST`` AST node like
-this:
-
-.. code-block:: c++
-
-    /// PrototypeAST - This class represents the "prototype" for a function,
-    /// which captures its argument names as well as if it is an operator.
-    class PrototypeAST {
-      std::string Name;
-      std::vector<std::string> Args;
-      bool IsOperator;
-      unsigned Precedence;  // Precedence if a binary op.
-
-    public:
-      PrototypeAST(const std::string &name, std::vector<std::string> Args,
-                   bool IsOperator = false, unsigned Prec = 0)
-      : Name(name), Args(std::move(Args)), IsOperator(IsOperator),
-        Precedence(Prec) {}
-
-      bool isUnaryOp() const { return IsOperator && Args.size() == 1; }
-      bool isBinaryOp() const { return IsOperator && Args.size() == 2; }
-
-      char getOperatorName() const {
-        assert(isUnaryOp() || isBinaryOp());
-        return Name[Name.size()-1];
-      }
-
-      unsigned getBinaryPrecedence() const { return Precedence; }
-
-      Function *codegen();
-    };
-
-Basically, in addition to knowing a name for the prototype, we now keep
-track of whether it was an operator, and if it was, what precedence
-level the operator is at. The precedence is only used for binary
-operators (as you'll see below, it just doesn't apply for unary
-operators). Now that we have a way to represent the prototype for a
-user-defined operator, we need to parse it:
-
-.. code-block:: c++
-
-    /// prototype
-    ///   ::= id '(' id* ')'
-    ///   ::= binary LETTER number? (id, id)
-    static std::unique_ptr<PrototypeAST> ParsePrototype() {
-      std::string FnName;
-
-      unsigned Kind = 0;  // 0 = identifier, 1 = unary, 2 = binary.
-      unsigned BinaryPrecedence = 30;
-
-      switch (CurTok) {
-      default:
-        return LogErrorP("Expected function name in prototype");
-      case tok_identifier:
-        FnName = IdentifierStr;
-        Kind = 0;
-        getNextToken();
-        break;
-      case tok_binary:
-        getNextToken();
-        if (!isascii(CurTok))
-          return LogErrorP("Expected binary operator");
-        FnName = "binary";
-        FnName += (char)CurTok;
-        Kind = 2;
-        getNextToken();
-
-        // Read the precedence if present.
-        if (CurTok == tok_number) {
-          if (NumVal < 1 || NumVal > 100)
-            return LogErrorP("Invalid precedecnce: must be 1..100");
-          BinaryPrecedence = (unsigned)NumVal;
-          getNextToken();
-        }
-        break;
-      }
-
-      if (CurTok != '(')
-        return LogErrorP("Expected '(' in prototype");
-
-      std::vector<std::string> ArgNames;
-      while (getNextToken() == tok_identifier)
-        ArgNames.push_back(IdentifierStr);
-      if (CurTok != ')')
-        return LogErrorP("Expected ')' in prototype");
-
-      // success.
-      getNextToken();  // eat ')'.
-
-      // Verify right number of names for operator.
-      if (Kind && ArgNames.size() != Kind)
-        return LogErrorP("Invalid number of operands for operator");
-
-      return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames), Kind != 0,
-                                             BinaryPrecedence);
-    }
-
-This is all fairly straightforward parsing code, and we have already
-seen a lot of similar code in the past. One interesting part about the
-code above is the couple lines that set up ``FnName`` for binary
-operators. This builds names like "binary@" for a newly defined "@"
-operator. This then takes advantage of the fact that symbol names in the
-LLVM symbol table are allowed to have any character in them, including
-embedded nul characters.
-
-The next interesting thing to add, is codegen support for these binary
-operators. Given our current structure, this is a simple addition of a
-default case for our existing binary operator node:
-
-.. code-block:: c++
-
-    Value *BinaryExprAST::codegen() {
-      Value *L = LHS->codegen();
-      Value *R = RHS->codegen();
-      if (!L || !R)
-        return nullptr;
-
-      switch (Op) {
-      case '+':
-        return Builder.CreateFAdd(L, R, "addtmp");
-      case '-':
-        return Builder.CreateFSub(L, R, "subtmp");
-      case '*':
-        return Builder.CreateFMul(L, R, "multmp");
-      case '<':
-        L = Builder.CreateFCmpULT(L, R, "cmptmp");
-        // Convert bool 0/1 to double 0.0 or 1.0
-        return Builder.CreateUIToFP(L, Type::getDoubleTy(LLVMContext),
-                                    "booltmp");
-      default:
-        break;
-      }
-
-      // If it wasn't a builtin binary operator, it must be a user defined one. Emit
-      // a call to it.
-      Function *F = TheModule->getFunction(std::string("binary") + Op);
-      assert(F && "binary operator not found!");
-
-      Value *Ops[2] = { L, R };
-      return Builder.CreateCall(F, Ops, "binop");
-    }
-
-As you can see above, the new code is actually really simple. It just
-does a lookup for the appropriate operator in the symbol table and
-generates a function call to it. Since user-defined operators are just
-built as normal functions (because the "prototype" boils down to a
-function with the right name) everything falls into place.
-
-The final piece of code we are missing, is a bit of top-level magic:
-
-.. code-block:: c++
-
-    Function *FunctionAST::codegen() {
-      NamedValues.clear();
-
-      Function *TheFunction = Proto->codegen();
-      if (!TheFunction)
-        return nullptr;
-
-      // If this is an operator, install it.
-      if (Proto->isBinaryOp())
-        BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
-
-      // Create a new basic block to start insertion into.
-      BasicBlock *BB = BasicBlock::Create(LLVMContext, "entry", TheFunction);
-      Builder.SetInsertPoint(BB);
-
-      if (Value *RetVal = Body->codegen()) {
-        ...
-
-Basically, before codegening a function, if it is a user-defined
-operator, we register it in the precedence table. This allows the binary
-operator parsing logic we already have in place to handle it. Since we
-are working on a fully-general operator precedence parser, this is all
-we need to do to "extend the grammar".
-
-Now we have useful user-defined binary operators. This builds a lot on
-the previous framework we built for other operators. Adding unary
-operators is a bit more challenging, because we don't have any framework
-for it yet - lets see what it takes.
-
-User-defined Unary Operators
-============================
-
-Since we don't currently support unary operators in the Kaleidoscope
-language, we'll need to add everything to support them. Above, we added
-simple support for the 'unary' keyword to the lexer. In addition to
-that, we need an AST node:
-
-.. code-block:: c++
-
-    /// UnaryExprAST - Expression class for a unary operator.
-    class UnaryExprAST : public ExprAST {
-      char Opcode;
-      std::unique_ptr<ExprAST> Operand;
-
-    public:
-      UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)
-        : Opcode(Opcode), Operand(std::move(Operand)) {}
-      virtual Value *codegen();
-    };
-
-This AST node is very simple and obvious by now. It directly mirrors the
-binary operator AST node, except that it only has one child. With this,
-we need to add the parsing logic. Parsing a unary operator is pretty
-simple: we'll add a new function to do it:
-
-.. code-block:: c++
-
-    /// unary
-    ///   ::= primary
-    ///   ::= '!' unary
-    static std::unique_ptr<ExprAST> ParseUnary() {
-      // If the current token is not an operator, it must be a primary expr.
-      if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
-        return ParsePrimary();
-
-      // If this is a unary operator, read it.
-      int Opc = CurTok;
-      getNextToken();
-      if (auto Operand = ParseUnary())
-        return llvm::unique_ptr<UnaryExprAST>(Opc, std::move(Operand));
-      return nullptr;
-    }
-
-The grammar we add is pretty straightforward here. If we see a unary
-operator when parsing a primary operator, we eat the operator as a
-prefix and parse the remaining piece as another unary operator. This
-allows us to handle multiple unary operators (e.g. "!!x"). Note that
-unary operators can't have ambiguous parses like binary operators can,
-so there is no need for precedence information.
-
-The problem with this function, is that we need to call ParseUnary from
-somewhere. To do this, we change previous callers of ParsePrimary to
-call ParseUnary instead:
-
-.. code-block:: c++
-
-    /// binoprhs
-    ///   ::= ('+' unary)*
-    static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
-                                                  std::unique_ptr<ExprAST> LHS) {
-      ...
-        // Parse the unary expression after the binary operator.
-        auto RHS = ParseUnary();
-        if (!RHS)
-          return nullptr;
-      ...
-    }
-    /// expression
-    ///   ::= unary binoprhs
-    ///
-    static std::unique_ptr<ExprAST> ParseExpression() {
-      auto LHS = ParseUnary();
-      if (!LHS)
-        return nullptr;
-
-      return ParseBinOpRHS(0, std::move(LHS));
-    }
-
-With these two simple changes, we are now able to parse unary operators
-and build the AST for them. Next up, we need to add parser support for
-prototypes, to parse the unary operator prototype. We extend the binary
-operator code above with:
-
-.. code-block:: c++
-
-    /// prototype
-    ///   ::= id '(' id* ')'
-    ///   ::= binary LETTER number? (id, id)
-    ///   ::= unary LETTER (id)
-    static std::unique_ptr<PrototypeAST> ParsePrototype() {
-      std::string FnName;
-
-      unsigned Kind = 0;  // 0 = identifier, 1 = unary, 2 = binary.
-      unsigned BinaryPrecedence = 30;
-
-      switch (CurTok) {
-      default:
-        return LogErrorP("Expected function name in prototype");
-      case tok_identifier:
-        FnName = IdentifierStr;
-        Kind = 0;
-        getNextToken();
-        break;
-      case tok_unary:
-        getNextToken();
-        if (!isascii(CurTok))
-          return LogErrorP("Expected unary operator");
-        FnName = "unary";
-        FnName += (char)CurTok;
-        Kind = 1;
-        getNextToken();
-        break;
-      case tok_binary:
-        ...
-
-As with binary operators, we name unary operators with a name that
-includes the operator character. This assists us at code generation
-time. Speaking of, the final piece we need to add is codegen support for
-unary operators. It looks like this:
-
-.. code-block:: c++
-
-    Value *UnaryExprAST::codegen() {
-      Value *OperandV = Operand->codegen();
-      if (!OperandV)
-        return nullptr;
-
-      Function *F = TheModule->getFunction(std::string("unary")+Opcode);
-      if (!F)
-        return LogErrorV("Unknown unary operator");
-
-      return Builder.CreateCall(F, OperandV, "unop");
-    }
-
-This code is similar to, but simpler than, the code for binary
-operators. It is simpler primarily because it doesn't need to handle any
-predefined operators.
-
-Kicking the Tires
-=================
-
-It is somewhat hard to believe, but with a few simple extensions we've
-covered in the last chapters, we have grown a real-ish language. With
-this, we can do a lot of interesting things, including I/O, math, and a
-bunch of other things. For example, we can now add a nice sequencing
-operator (printd is defined to print out the specified value and a
-newline):
-
-::
-
-    ready> extern printd(x);
-    Read extern:
-    declare double @printd(double)
-
-    ready> def binary : 1 (x y) 0;  # Low-precedence operator that ignores operands.
-    ..
-    ready> printd(123) : printd(456) : printd(789);
-    123.000000
-    456.000000
-    789.000000
-    Evaluated to 0.000000
-
-We can also define a bunch of other "primitive" operations, such as:
-
-::
-
-    # Logical unary not.
-    def unary!(v)
-      if v then
-        0
-      else
-        1;
-
-    # Unary negate.
-    def unary-(v)
-      0-v;
-
-    # Define > with the same precedence as <.
-    def binary> 10 (LHS RHS)
-      RHS < LHS;
-
-    # Binary logical or, which does not short circuit.
-    def binary| 5 (LHS RHS)
-      if LHS then
-        1
-      else if RHS then
-        1
-      else
-        0;
-
-    # Binary logical and, which does not short circuit.
-    def binary& 6 (LHS RHS)
-      if !LHS then
-        0
-      else
-        !!RHS;
-
-    # Define = with slightly lower precedence than relationals.
-    def binary = 9 (LHS RHS)
-      !(LHS < RHS | LHS > RHS);
-
-    # Define ':' for sequencing: as a low-precedence operator that ignores operands
-    # and just returns the RHS.
-    def binary : 1 (x y) y;
-
-Given the previous if/then/else support, we can also define interesting
-functions for I/O. For example, the following prints out a character
-whose "density" reflects the value passed in: the lower the value, the
-denser the character:
-
-::
-
-    ready>
-
-    extern putchard(char)
-    def printdensity(d)
-      if d > 8 then
-        putchard(32)  # ' '
-      else if d > 4 then
-        putchard(46)  # '.'
-      else if d > 2 then
-        putchard(43)  # '+'
-      else
-        putchard(42); # '*'
-    ...
-    ready> printdensity(1): printdensity(2): printdensity(3):
-           printdensity(4): printdensity(5): printdensity(9):
-           putchard(10);
-    **++.
-    Evaluated to 0.000000
-
-Based on these simple primitive operations, we can start to define more
-interesting things. For example, here's a little function that solves
-for the number of iterations it takes a function in the complex plane to
-converge:
-
-::
-
-    # Determine whether the specific location diverges.
-    # Solve for z = z^2 + c in the complex plane.
-    def mandelconverger(real imag iters creal cimag)
-      if iters > 255 | (real*real + imag*imag > 4) then
-        iters
-      else
-        mandelconverger(real*real - imag*imag + creal,
-                        2*real*imag + cimag,
-                        iters+1, creal, cimag);
-
-    # Return the number of iterations required for the iteration to escape
-    def mandelconverge(real imag)
-      mandelconverger(real, imag, 0, real, imag);
-
-This "``z = z2 + c``" function is a beautiful little creature that is
-the basis for computation of the `Mandelbrot
-Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our
-``mandelconverge`` function returns the number of iterations that it
-takes for a complex orbit to escape, saturating to 255. This is not a
-very useful function by itself, but if you plot its value over a
-two-dimensional plane, you can see the Mandelbrot set. Given that we are
-limited to using putchard here, our amazing graphical output is limited,
-but we can whip together something using the density plotter above:
-
-::
-
-    # Compute and plot the mandelbrot set with the specified 2 dimensional range
-    # info.
-    def mandelhelp(xmin xmax xstep   ymin ymax ystep)
-      for y = ymin, y < ymax, ystep in (
-        (for x = xmin, x < xmax, xstep in
-           printdensity(mandelconverge(x,y)))
-        : putchard(10)
-      )
-
-    # mandel - This is a convenient helper function for plotting the mandelbrot set
-    # from the specified position with the specified Magnification.
-    def mandel(realstart imagstart realmag imagmag)
-      mandelhelp(realstart, realstart+realmag*78, realmag,
-                 imagstart, imagstart+imagmag*40, imagmag);
-
-Given this, we can try plotting out the mandelbrot set! Lets try it out:
-
-::
-
-    ready> mandel(-2.3, -1.3, 0.05, 0.07);
-    *******************************+++++++++++*************************************
-    *************************+++++++++++++++++++++++*******************************
-    **********************+++++++++++++++++++++++++++++****************************
-    *******************+++++++++++++++++++++.. ...++++++++*************************
-    *****************++++++++++++++++++++++.... ...+++++++++***********************
-    ***************+++++++++++++++++++++++.....   ...+++++++++*********************
-    **************+++++++++++++++++++++++....     ....+++++++++********************
-    *************++++++++++++++++++++++......      .....++++++++*******************
-    ************+++++++++++++++++++++.......       .......+++++++******************
-    ***********+++++++++++++++++++....                ... .+++++++*****************
-    **********+++++++++++++++++.......                     .+++++++****************
-    *********++++++++++++++...........                    ...+++++++***************
-    ********++++++++++++............                      ...++++++++**************
-    ********++++++++++... ..........                        .++++++++**************
-    *******+++++++++.....                                   .+++++++++*************
-    *******++++++++......                                  ..+++++++++*************
-    *******++++++.......                                   ..+++++++++*************
-    *******+++++......                                     ..+++++++++*************
-    *******.... ....                                      ...+++++++++*************
-    *******.... .                                         ...+++++++++*************
-    *******+++++......                                    ...+++++++++*************
-    *******++++++.......                                   ..+++++++++*************
-    *******++++++++......                                   .+++++++++*************
-    *******+++++++++.....                                  ..+++++++++*************
-    ********++++++++++... ..........                        .++++++++**************
-    ********++++++++++++............                      ...++++++++**************
-    *********++++++++++++++..........                     ...+++++++***************
-    **********++++++++++++++++........                     .+++++++****************
-    **********++++++++++++++++++++....                ... ..+++++++****************
-    ***********++++++++++++++++++++++.......       .......++++++++*****************
-    ************+++++++++++++++++++++++......      ......++++++++******************
-    **************+++++++++++++++++++++++....      ....++++++++********************
-    ***************+++++++++++++++++++++++.....   ...+++++++++*********************
-    *****************++++++++++++++++++++++....  ...++++++++***********************
-    *******************+++++++++++++++++++++......++++++++*************************
-    *********************++++++++++++++++++++++.++++++++***************************
-    *************************+++++++++++++++++++++++*******************************
-    ******************************+++++++++++++************************************
-    *******************************************************************************
-    *******************************************************************************
-    *******************************************************************************
-    Evaluated to 0.000000
-    ready> mandel(-2, -1, 0.02, 0.04);
-    **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
-    ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-    *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
-    *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
-    *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
-    ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
-    **************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
-    ************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
-    ***********++++++++++++++++++++++++++++++++++++++++++++++++++........        .
-    **********++++++++++++++++++++++++++++++++++++++++++++++.............
-    ********+++++++++++++++++++++++++++++++++++++++++++..................
-    *******+++++++++++++++++++++++++++++++++++++++.......................
-    ******+++++++++++++++++++++++++++++++++++...........................
-    *****++++++++++++++++++++++++++++++++............................
-    *****++++++++++++++++++++++++++++...............................
-    ****++++++++++++++++++++++++++......   .........................
-    ***++++++++++++++++++++++++.........     ......    ...........
-    ***++++++++++++++++++++++............
-    **+++++++++++++++++++++..............
-    **+++++++++++++++++++................
-    *++++++++++++++++++.................
-    *++++++++++++++++............ ...
-    *++++++++++++++..............
-    *+++....++++................
-    *..........  ...........
-    *
-    *..........  ...........
-    *+++....++++................
-    *++++++++++++++..............
-    *++++++++++++++++............ ...
-    *++++++++++++++++++.................
-    **+++++++++++++++++++................
-    **+++++++++++++++++++++..............
-    ***++++++++++++++++++++++............
-    ***++++++++++++++++++++++++.........     ......    ...........
-    ****++++++++++++++++++++++++++......   .........................
-    *****++++++++++++++++++++++++++++...............................
-    *****++++++++++++++++++++++++++++++++............................
-    ******+++++++++++++++++++++++++++++++++++...........................
-    *******+++++++++++++++++++++++++++++++++++++++.......................
-    ********+++++++++++++++++++++++++++++++++++++++++++..................
-    Evaluated to 0.000000
-    ready> mandel(-0.9, -1.4, 0.02, 0.03);
-    *******************************************************************************
-    *******************************************************************************
-    *******************************************************************************
-    **********+++++++++++++++++++++************************************************
-    *+++++++++++++++++++++++++++++++++++++++***************************************
-    +++++++++++++++++++++++++++++++++++++++++++++**********************************
-    ++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
-    ++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
-    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
-    +++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
-    +++++++++++++++++++++++++++++++....   ......+++++++++++++++++++****************
-    +++++++++++++++++++++++++++++.......  ........+++++++++++++++++++**************
-    ++++++++++++++++++++++++++++........   ........++++++++++++++++++++************
-    +++++++++++++++++++++++++++.........     ..  ...+++++++++++++++++++++**********
-    ++++++++++++++++++++++++++...........        ....++++++++++++++++++++++********
-    ++++++++++++++++++++++++.............       .......++++++++++++++++++++++******
-    +++++++++++++++++++++++.............        ........+++++++++++++++++++++++****
-    ++++++++++++++++++++++...........           ..........++++++++++++++++++++++***
-    ++++++++++++++++++++...........                .........++++++++++++++++++++++*
-    ++++++++++++++++++............                  ...........++++++++++++++++++++
-    ++++++++++++++++...............                 .............++++++++++++++++++
-    ++++++++++++++.................                 ...............++++++++++++++++
-    ++++++++++++..................                  .................++++++++++++++
-    +++++++++..................                      .................+++++++++++++
-    ++++++........        .                               .........  ..++++++++++++
-    ++............                                         ......    ....++++++++++
-    ..............                                                    ...++++++++++
-    ..............                                                    ....+++++++++
-    ..............                                                    .....++++++++
-    .............                                                    ......++++++++
-    ...........                                                     .......++++++++
-    .........                                                       ........+++++++
-    .........                                                       ........+++++++
-    .........                                                           ....+++++++
-    ........                                                             ...+++++++
-    .......                                                              ...+++++++
-                                                                        ....+++++++
-                                                                       .....+++++++
-                                                                        ....+++++++
-                                                                        ....+++++++
-                                                                        ....+++++++
-    Evaluated to 0.000000
-    ready> ^D
-
-At this point, you may be starting to realize that Kaleidoscope is a
-real and powerful language. It may not be self-similar :), but it can be
-used to plot things that are!
-
-With this, we conclude the "adding user-defined operators" chapter of
-the tutorial. We have successfully augmented our language, adding the
-ability to extend the language in the library, and we have shown how
-this can be used to build a simple but interesting end-user application
-in Kaleidoscope. At this point, Kaleidoscope can build a variety of
-applications that are functional and can call functions with
-side-effects, but it can't actually define and mutate a variable itself.
-
-Strikingly, variable mutation is an important feature of some languages,
-and it is not at all obvious how to `add support for mutable
-variables <LangImpl7.html>`_ without having to add an "SSA construction"
-phase to your front-end. In the next chapter, we will describe how you
-can add variable mutation without building SSA in your front-end.
-
-Full Code Listing
-=================
-
-Here is the complete code listing for our running example, enhanced with
-the if/then/else and for expressions.. To build this example, use:
-
-.. code-block:: bash
-
-    # Compile
-    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
-    # Run
-    ./toy
-
-On some platforms, you will need to specify -rdynamic or
--Wl,--export-dynamic when linking. This ensures that symbols defined in
-the main executable are exported to the dynamic linker and so are
-available for symbol resolution at run time. This is not needed if you
-compile your support code into a shared library, although doing that
-will cause problems on Windows.
-
-Here is the code:
-
-.. literalinclude:: ../../examples/Kaleidoscope/Chapter6/toy.cpp
-   :language: c++
-
-`Next: Extending the language: mutable variables / SSA
-construction <LangImpl7.html>`_
-

Removed: llvm/trunk/docs/tutorial/LangImpl7.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl7.rst?rev=274440&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl7.rst (original)
+++ llvm/trunk/docs/tutorial/LangImpl7.rst (removed)
@@ -1,881 +0,0 @@
-=======================================================
-Kaleidoscope: Extending the Language: Mutable Variables
-=======================================================
-
-.. contents::
-   :local:
-
-Chapter 7 Introduction
-======================
-
-Welcome to Chapter 7 of the "`Implementing a language with
-LLVM <index.html>`_" tutorial. In chapters 1 through 6, we've built a
-very respectable, albeit simple, `functional programming
-language <http://en.wikipedia.org/wiki/Functional_programming>`_. In our
-journey, we learned some parsing techniques, how to build and represent
-an AST, how to build LLVM IR, and how to optimize the resultant code as
-well as JIT compile it.
-
-While Kaleidoscope is interesting as a functional language, the fact
-that it is functional makes it "too easy" to generate LLVM IR for it. In
-particular, a functional language makes it very easy to build LLVM IR
-directly in `SSA
-form <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
-Since LLVM requires that the input code be in SSA form, this is a very
-nice property and it is often unclear to newcomers how to generate code
-for an imperative language with mutable variables.
-
-The short (and happy) summary of this chapter is that there is no need
-for your front-end to build SSA form: LLVM provides highly tuned and
-well tested support for this, though the way it works is a bit
-unexpected for some.
-
-Why is this a hard problem?
-===========================
-
-To understand why mutable variables cause complexities in SSA
-construction, consider this extremely simple C example:
-
-.. code-block:: c
-
-    int G, H;
-    int test(_Bool Condition) {
-      int X;
-      if (Condition)
-        X = G;
-      else
-        X = H;
-      return X;
-    }
-
-In this case, we have the variable "X", whose value depends on the path
-executed in the program. Because there are two different possible values
-for X before the return instruction, a PHI node is inserted to merge the
-two values. The LLVM IR that we want for this example looks like this:
-
-.. code-block:: llvm
-
-    @G = weak global i32 0   ; type of @G is i32*
-    @H = weak global i32 0   ; type of @H is i32*
-
-    define i32 @test(i1 %Condition) {
-    entry:
-      br i1 %Condition, label %cond_true, label %cond_false
-
-    cond_true:
-      %X.0 = load i32* @G
-      br label %cond_next
-
-    cond_false:
-      %X.1 = load i32* @H
-      br label %cond_next
-
-    cond_next:
-      %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
-      ret i32 %X.2
-    }
-
-In this example, the loads from the G and H global variables are
-explicit in the LLVM IR, and they live in the then/else branches of the
-if statement (cond\_true/cond\_false). In order to merge the incoming
-values, the X.2 phi node in the cond\_next block selects the right value
-to use based on where control flow is coming from: if control flow comes
-from the cond\_false block, X.2 gets the value of X.1. Alternatively, if
-control flow comes from cond\_true, it gets the value of X.0. The intent
-of this chapter is not to explain the details of SSA form. For more
-information, see one of the many `online
-references <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
-
-The question for this article is "who places the phi nodes when lowering
-assignments to mutable variables?". The issue here is that LLVM
-*requires* that its IR be in SSA form: there is no "non-ssa" mode for
-it. However, SSA construction requires non-trivial algorithms and data
-structures, so it is inconvenient and wasteful for every front-end to
-have to reproduce this logic.
-
-Memory in LLVM
-==============
-
-The 'trick' here is that while LLVM does require all register values to
-be in SSA form, it does not require (or permit) memory objects to be in
-SSA form. In the example above, note that the loads from G and H are
-direct accesses to G and H: they are not renamed or versioned. This
-differs from some other compiler systems, which do try to version memory
-objects. In LLVM, instead of encoding dataflow analysis of memory into
-the LLVM IR, it is handled with `Analysis
-Passes <../WritingAnLLVMPass.html>`_ which are computed on demand.
-
-With this in mind, the high-level idea is that we want to make a stack
-variable (which lives in memory, because it is on the stack) for each
-mutable object in a function. To take advantage of this trick, we need
-to talk about how LLVM represents stack variables.
-
-In LLVM, all memory accesses are explicit with load/store instructions,
-and it is carefully designed not to have (or need) an "address-of"
-operator. Notice how the type of the @G/@H global variables is actually
-"i32\*" even though the variable is defined as "i32". What this means is
-that @G defines *space* for an i32 in the global data area, but its
-*name* actually refers to the address for that space. Stack variables
-work the same way, except that instead of being declared with global
-variable definitions, they are declared with the `LLVM alloca
-instruction <../LangRef.html#alloca-instruction>`_:
-
-.. code-block:: llvm
-
-    define i32 @example() {
-    entry:
-      %X = alloca i32           ; type of %X is i32*.
-      ...
-      %tmp = load i32* %X       ; load the stack value %X from the stack.
-      %tmp2 = add i32 %tmp, 1   ; increment it
-      store i32 %tmp2, i32* %X  ; store it back
-      ...
-
-This code shows an example of how you can declare and manipulate a stack
-variable in the LLVM IR. Stack memory allocated with the alloca
-instruction is fully general: you can pass the address of the stack slot
-to functions, you can store it in other variables, etc. In our example
-above, we could rewrite the example to use the alloca technique to avoid
-using a PHI node:
-
-.. code-block:: llvm
-
-    @G = weak global i32 0   ; type of @G is i32*
-    @H = weak global i32 0   ; type of @H is i32*
-
-    define i32 @test(i1 %Condition) {
-    entry:
-      %X = alloca i32           ; type of %X is i32*.
-      br i1 %Condition, label %cond_true, label %cond_false
-
-    cond_true:
-      %X.0 = load i32* @G
-      store i32 %X.0, i32* %X   ; Update X
-      br label %cond_next
-
-    cond_false:
-      %X.1 = load i32* @H
-      store i32 %X.1, i32* %X   ; Update X
-      br label %cond_next
-
-    cond_next:
-      %X.2 = load i32* %X       ; Read X
-      ret i32 %X.2
-    }
-
-With this, we have discovered a way to handle arbitrary mutable
-variables without the need to create Phi nodes at all:
-
-#. Each mutable variable becomes a stack allocation.
-#. Each read of the variable becomes a load from the stack.
-#. Each update of the variable becomes a store to the stack.
-#. Taking the address of a variable just uses the stack address
-   directly.
-
-While this solution has solved our immediate problem, it introduced
-another one: we have now apparently introduced a lot of stack traffic
-for very simple and common operations, a major performance problem.
-Fortunately for us, the LLVM optimizer has a highly-tuned optimization
-pass named "mem2reg" that handles this case, promoting allocas like this
-into SSA registers, inserting Phi nodes as appropriate. If you run this
-example through the pass, for example, you'll get:
-
-.. code-block:: bash
-
-    $ llvm-as < example.ll | opt -mem2reg | llvm-dis
-    @G = weak global i32 0
-    @H = weak global i32 0
-
-    define i32 @test(i1 %Condition) {
-    entry:
-      br i1 %Condition, label %cond_true, label %cond_false
-
-    cond_true:
-      %X.0 = load i32* @G
-      br label %cond_next
-
-    cond_false:
-      %X.1 = load i32* @H
-      br label %cond_next
-
-    cond_next:
-      %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
-      ret i32 %X.01
-    }
-
-The mem2reg pass implements the standard "iterated dominance frontier"
-algorithm for constructing SSA form and has a number of optimizations
-that speed up (very common) degenerate cases. The mem2reg optimization
-pass is the answer to dealing with mutable variables, and we highly
-recommend that you depend on it. Note that mem2reg only works on
-variables in certain circumstances:
-
-#. mem2reg is alloca-driven: it looks for allocas and if it can handle
-   them, it promotes them. It does not apply to global variables or heap
-   allocations.
-#. mem2reg only looks for alloca instructions in the entry block of the
-   function. Being in the entry block guarantees that the alloca is only
-   executed once, which makes analysis simpler.
-#. mem2reg only promotes allocas whose uses are direct loads and stores.
-   If the address of the stack object is passed to a function, or if any
-   funny pointer arithmetic is involved, the alloca will not be
-   promoted.
-#. mem2reg only works on allocas of `first
-   class <../LangRef.html#first-class-types>`_ values (such as pointers,
-   scalars and vectors), and only if the array size of the allocation is
-   1 (or missing in the .ll file). mem2reg is not capable of promoting
-   structs or arrays to registers. Note that the "sroa" pass is
-   more powerful and can promote structs, "unions", and arrays in many
-   cases.
-
-All of these properties are easy to satisfy for most imperative
-languages, and we'll illustrate it below with Kaleidoscope. The final
-question you may be asking is: should I bother with this nonsense for my
-front-end? Wouldn't it be better if I just did SSA construction
-directly, avoiding use of the mem2reg optimization pass? In short, we
-strongly recommend that you use this technique for building SSA form,
-unless there is an extremely good reason not to. Using this technique
-is:
-
--  Proven and well tested: clang uses this technique
-   for local mutable variables. As such, the most common clients of LLVM
-   are using this to handle a bulk of their variables. You can be sure
-   that bugs are found fast and fixed early.
--  Extremely Fast: mem2reg has a number of special cases that make it
-   fast in common cases as well as fully general. For example, it has
-   fast-paths for variables that are only used in a single block,
-   variables that only have one assignment point, good heuristics to
-   avoid insertion of unneeded phi nodes, etc.
--  Needed for debug info generation: `Debug information in
-   LLVM <../SourceLevelDebugging.html>`_ relies on having the address of
-   the variable exposed so that debug info can be attached to it. This
-   technique dovetails very naturally with this style of debug info.
-
-If nothing else, this makes it much easier to get your front-end up and
-running, and is very simple to implement. Let's extend Kaleidoscope with
-mutable variables now!
-
-Mutable Variables in Kaleidoscope
-=================================
-
-Now that we know the sort of problem we want to tackle, let's see what
-this looks like in the context of our little Kaleidoscope language.
-We're going to add two features:
-
-#. The ability to mutate variables with the '=' operator.
-#. The ability to define new variables.
-
-While the first item is really what this is about, we only have
-variables for incoming arguments as well as for induction variables, and
-redefining those only goes so far :). Also, the ability to define new
-variables is a useful thing regardless of whether you will be mutating
-them. Here's a motivating example that shows how we could use these:
-
-::
-
-    # Define ':' for sequencing: as a low-precedence operator that ignores operands
-    # and just returns the RHS.
-    def binary : 1 (x y) y;
-
-    # Recursive fib, we could do this before.
-    def fib(x)
-      if (x < 3) then
-        1
-      else
-        fib(x-1)+fib(x-2);
-
-    # Iterative fib.
-    def fibi(x)
-      var a = 1, b = 1, c in
-      (for i = 3, i < x in
-         c = a + b :
-         a = b :
-         b = c) :
-      b;
-
-    # Call it.
-    fibi(10);
-
-In order to mutate variables, we have to change our existing variables
-to use the "alloca trick". Once we have that, we'll add our new
-operator, then extend Kaleidoscope to support new variable definitions.
-
-Adjusting Existing Variables for Mutation
-=========================================
-
-The symbol table in Kaleidoscope is managed at code generation time by
-the '``NamedValues``' map. This map currently keeps track of the LLVM
-"Value\*" that holds the double value for the named variable. In order
-to support mutation, we need to change this slightly, so that
-``NamedValues`` holds the *memory location* of the variable in question.
-Note that this change is a refactoring: it changes the structure of the
-code, but does not (by itself) change the behavior of the compiler. All
-of these changes are isolated in the Kaleidoscope code generator.
-
-At this point in Kaleidoscope's development, it only supports variables
-for two things: incoming arguments to functions and the induction
-variable of 'for' loops. For consistency, we'll allow mutation of these
-variables in addition to other user-defined variables. This means that
-these will both need memory locations.
-
-To start our transformation of Kaleidoscope, we'll change the
-NamedValues map so that it maps to AllocaInst\* instead of Value\*. Once
-we do this, the C++ compiler will tell us what parts of the code we need
-to update:
-
-.. code-block:: c++
-
-    static std::map<std::string, AllocaInst*> NamedValues;
-
-Also, since we will need to create these alloca's, we'll use a helper
-function that ensures that the allocas are created in the entry block of
-the function:
-
-.. code-block:: c++
-
-    /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
-    /// the function.  This is used for mutable variables etc.
-    static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
-                                              const std::string &VarName) {
-      IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
-                     TheFunction->getEntryBlock().begin());
-      return TmpB.CreateAlloca(Type::getDoubleTy(LLVMContext), 0,
-                               VarName.c_str());
-    }
-
-This funny looking code creates an IRBuilder object that is pointing at
-the first instruction (.begin()) of the entry block. It then creates an
-alloca with the expected name and returns it. Because all values in
-Kaleidoscope are doubles, there is no need to pass in a type to use.
-
-With this in place, the first functionality change we want to make is to
-variable references. In our new scheme, variables live on the stack, so
-code generating a reference to them actually needs to produce a load
-from the stack slot:
-
-.. code-block:: c++
-
-    Value *VariableExprAST::codegen() {
-      // Look this variable up in the function.
-      Value *V = NamedValues[Name];
-      if (!V)
-        return LogErrorV("Unknown variable name");
-
-      // Load the value.
-      return Builder.CreateLoad(V, Name.c_str());
-    }
-
-As you can see, this is pretty straightforward. Now we need to update
-the things that define the variables to set up the alloca. We'll start
-with ``ForExprAST::codegen()`` (see the `full code listing <#id1>`_ for
-the unabridged code):
-
-.. code-block:: c++
-
-      Function *TheFunction = Builder.GetInsertBlock()->getParent();
-
-      // Create an alloca for the variable in the entry block.
-      AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
-
-        // Emit the start code first, without 'variable' in scope.
-      Value *StartVal = Start->codegen();
-      if (!StartVal)
-        return nullptr;
-
-      // Store the value into the alloca.
-      Builder.CreateStore(StartVal, Alloca);
-      ...
-
-      // Compute the end condition.
-      Value *EndCond = End->codegen();
-      if (!EndCond)
-        return nullptr;
-
-      // Reload, increment, and restore the alloca.  This handles the case where
-      // the body of the loop mutates the variable.
-      Value *CurVar = Builder.CreateLoad(Alloca);
-      Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
-      Builder.CreateStore(NextVar, Alloca);
-      ...
-
-This code is virtually identical to the code `before we allowed mutable
-variables <LangImpl5.html#code-generation-for-the-for-loop>`_. The big difference is that we
-no longer have to construct a PHI node, and we use load/store to access
-the variable as needed.
-
-To support mutable argument variables, we need to also make allocas for
-them. The code for this is also pretty simple:
-
-.. code-block:: c++
-
-    /// CreateArgumentAllocas - Create an alloca for each argument and register the
-    /// argument in the symbol table so that references to it will succeed.
-    void PrototypeAST::CreateArgumentAllocas(Function *F) {
-      Function::arg_iterator AI = F->arg_begin();
-      for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
-        // Create an alloca for this variable.
-        AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
-
-        // Store the initial value into the alloca.
-        Builder.CreateStore(AI, Alloca);
-
-        // Add arguments to variable symbol table.
-        NamedValues[Args[Idx]] = Alloca;
-      }
-    }
-
-For each argument, we make an alloca, store the input value to the
-function into the alloca, and register the alloca as the memory location
-for the argument. This method gets invoked by ``FunctionAST::codegen()``
-right after it sets up the entry block for the function.
-
-The final missing piece is adding the mem2reg pass, which allows us to
-get good codegen once again:
-
-.. code-block:: c++
-
-        // Set up the optimizer pipeline.  Start with registering info about how the
-        // target lays out data structures.
-        OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
-        // Promote allocas to registers.
-        OurFPM.add(createPromoteMemoryToRegisterPass());
-        // Do simple "peephole" optimizations and bit-twiddling optzns.
-        OurFPM.add(createInstructionCombiningPass());
-        // Reassociate expressions.
-        OurFPM.add(createReassociatePass());
-
-It is interesting to see what the code looks like before and after the
-mem2reg optimization runs. For example, this is the before/after code
-for our recursive fib function. Before the optimization:
-
-.. code-block:: llvm
-
-    define double @fib(double %x) {
-    entry:
-      %x1 = alloca double
-      store double %x, double* %x1
-      %x2 = load double* %x1
-      %cmptmp = fcmp ult double %x2, 3.000000e+00
-      %booltmp = uitofp i1 %cmptmp to double
-      %ifcond = fcmp one double %booltmp, 0.000000e+00
-      br i1 %ifcond, label %then, label %else
-
-    then:       ; preds = %entry
-      br label %ifcont
-
-    else:       ; preds = %entry
-      %x3 = load double* %x1
-      %subtmp = fsub double %x3, 1.000000e+00
-      %calltmp = call double @fib(double %subtmp)
-      %x4 = load double* %x1
-      %subtmp5 = fsub double %x4, 2.000000e+00
-      %calltmp6 = call double @fib(double %subtmp5)
-      %addtmp = fadd double %calltmp, %calltmp6
-      br label %ifcont
-
-    ifcont:     ; preds = %else, %then
-      %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
-      ret double %iftmp
-    }
-
-Here there is only one variable (x, the input argument) but you can
-still see the extremely simple-minded code generation strategy we are
-using. In the entry block, an alloca is created, and the initial input
-value is stored into it. Each reference to the variable does a reload
-from the stack. Also, note that we didn't modify the if/then/else
-expression, so it still inserts a PHI node. While we could make an
-alloca for it, it is actually easier to create a PHI node for it, so we
-still just make the PHI.
-
-Here is the code after the mem2reg pass runs:
-
-.. code-block:: llvm
-
-    define double @fib(double %x) {
-    entry:
-      %cmptmp = fcmp ult double %x, 3.000000e+00
-      %booltmp = uitofp i1 %cmptmp to double
-      %ifcond = fcmp one double %booltmp, 0.000000e+00
-      br i1 %ifcond, label %then, label %else
-
-    then:
-      br label %ifcont
-
-    else:
-      %subtmp = fsub double %x, 1.000000e+00
-      %calltmp = call double @fib(double %subtmp)
-      %subtmp5 = fsub double %x, 2.000000e+00
-      %calltmp6 = call double @fib(double %subtmp5)
-      %addtmp = fadd double %calltmp, %calltmp6
-      br label %ifcont
-
-    ifcont:     ; preds = %else, %then
-      %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
-      ret double %iftmp
-    }
-
-This is a trivial case for mem2reg, since there are no redefinitions of
-the variable. The point of showing this is to calm your tension about
-inserting such blatent inefficiencies :).
-
-After the rest of the optimizers run, we get:
-
-.. code-block:: llvm
-
-    define double @fib(double %x) {
-    entry:
-      %cmptmp = fcmp ult double %x, 3.000000e+00
-      %booltmp = uitofp i1 %cmptmp to double
-      %ifcond = fcmp ueq double %booltmp, 0.000000e+00
-      br i1 %ifcond, label %else, label %ifcont
-
-    else:
-      %subtmp = fsub double %x, 1.000000e+00
-      %calltmp = call double @fib(double %subtmp)
-      %subtmp5 = fsub double %x, 2.000000e+00
-      %calltmp6 = call double @fib(double %subtmp5)
-      %addtmp = fadd double %calltmp, %calltmp6
-      ret double %addtmp
-
-    ifcont:
-      ret double 1.000000e+00
-    }
-
-Here we see that the simplifycfg pass decided to clone the return
-instruction into the end of the 'else' block. This allowed it to
-eliminate some branches and the PHI node.
-
-Now that all symbol table references are updated to use stack variables,
-we'll add the assignment operator.
-
-New Assignment Operator
-=======================
-
-With our current framework, adding a new assignment operator is really
-simple. We will parse it just like any other binary operator, but handle
-it internally (instead of allowing the user to define it). The first
-step is to set a precedence:
-
-.. code-block:: c++
-
-     int main() {
-       // Install standard binary operators.
-       // 1 is lowest precedence.
-       BinopPrecedence['='] = 2;
-       BinopPrecedence['<'] = 10;
-       BinopPrecedence['+'] = 20;
-       BinopPrecedence['-'] = 20;
-
-Now that the parser knows the precedence of the binary operator, it
-takes care of all the parsing and AST generation. We just need to
-implement codegen for the assignment operator. This looks like:
-
-.. code-block:: c++
-
-    Value *BinaryExprAST::codegen() {
-      // Special case '=' because we don't want to emit the LHS as an expression.
-      if (Op == '=') {
-        // Assignment requires the LHS to be an identifier.
-        VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS.get());
-        if (!LHSE)
-          return LogErrorV("destination of '=' must be a variable");
-
-Unlike the rest of the binary operators, our assignment operator doesn't
-follow the "emit LHS, emit RHS, do computation" model. As such, it is
-handled as a special case before the other binary operators are handled.
-The other strange thing is that it requires the LHS to be a variable. It
-is invalid to have "(x+1) = expr" - only things like "x = expr" are
-allowed.
-
-.. code-block:: c++
-
-        // Codegen the RHS.
-        Value *Val = RHS->codegen();
-        if (!Val)
-          return nullptr;
-
-        // Look up the name.
-        Value *Variable = NamedValues[LHSE->getName()];
-        if (!Variable)
-          return LogErrorV("Unknown variable name");
-
-        Builder.CreateStore(Val, Variable);
-        return Val;
-      }
-      ...
-
-Once we have the variable, codegen'ing the assignment is
-straightforward: we emit the RHS of the assignment, create a store, and
-return the computed value. Returning a value allows for chained
-assignments like "X = (Y = Z)".
-
-Now that we have an assignment operator, we can mutate loop variables
-and arguments. For example, we can now run code like this:
-
-::
-
-    # Function to print a double.
-    extern printd(x);
-
-    # Define ':' for sequencing: as a low-precedence operator that ignores operands
-    # and just returns the RHS.
-    def binary : 1 (x y) y;
-
-    def test(x)
-      printd(x) :
-      x = 4 :
-      printd(x);
-
-    test(123);
-
-When run, this example prints "123" and then "4", showing that we did
-actually mutate the value! Okay, we have now officially implemented our
-goal: getting this to work requires SSA construction in the general
-case. However, to be really useful, we want the ability to define our
-own local variables, let's add this next!
-
-User-defined Local Variables
-============================
-
-Adding var/in is just like any other extension we made to
-Kaleidoscope: we extend the lexer, the parser, the AST and the code
-generator. The first step for adding our new 'var/in' construct is to
-extend the lexer. As before, this is pretty trivial, the code looks like
-this:
-
-.. code-block:: c++
-
-    enum Token {
-      ...
-      // var definition
-      tok_var = -13
-    ...
-    }
-    ...
-    static int gettok() {
-    ...
-        if (IdentifierStr == "in")
-          return tok_in;
-        if (IdentifierStr == "binary")
-          return tok_binary;
-        if (IdentifierStr == "unary")
-          return tok_unary;
-        if (IdentifierStr == "var")
-          return tok_var;
-        return tok_identifier;
-    ...
-
-The next step is to define the AST node that we will construct. For
-var/in, it looks like this:
-
-.. code-block:: c++
-
-    /// VarExprAST - Expression class for var/in
-    class VarExprAST : public ExprAST {
-      std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;
-      std::unique_ptr<ExprAST> Body;
-
-    public:
-      VarExprAST(std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames,
-                 std::unique_ptr<ExprAST> body)
-      : VarNames(std::move(VarNames)), Body(std::move(Body)) {}
-
-      virtual Value *codegen();
-    };
-
-var/in allows a list of names to be defined all at once, and each name
-can optionally have an initializer value. As such, we capture this
-information in the VarNames vector. Also, var/in has a body, this body
-is allowed to access the variables defined by the var/in.
-
-With this in place, we can define the parser pieces. The first thing we
-do is add it as a primary expression:
-
-.. code-block:: c++
-
-    /// primary
-    ///   ::= identifierexpr
-    ///   ::= numberexpr
-    ///   ::= parenexpr
-    ///   ::= ifexpr
-    ///   ::= forexpr
-    ///   ::= varexpr
-    static std::unique_ptr<ExprAST> ParsePrimary() {
-      switch (CurTok) {
-      default:
-        return LogError("unknown token when expecting an expression");
-      case tok_identifier:
-        return ParseIdentifierExpr();
-      case tok_number:
-        return ParseNumberExpr();
-      case '(':
-        return ParseParenExpr();
-      case tok_if:
-        return ParseIfExpr();
-      case tok_for:
-        return ParseForExpr();
-      case tok_var:
-        return ParseVarExpr();
-      }
-    }
-
-Next we define ParseVarExpr:
-
-.. code-block:: c++
-
-    /// varexpr ::= 'var' identifier ('=' expression)?
-    //                    (',' identifier ('=' expression)?)* 'in' expression
-    static std::unique_ptr<ExprAST> ParseVarExpr() {
-      getNextToken();  // eat the var.
-
-      std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;
-
-      // At least one variable name is required.
-      if (CurTok != tok_identifier)
-        return LogError("expected identifier after var");
-
-The first part of this code parses the list of identifier/expr pairs
-into the local ``VarNames`` vector.
-
-.. code-block:: c++
-
-      while (1) {
-        std::string Name = IdentifierStr;
-        getNextToken();  // eat identifier.
-
-        // Read the optional initializer.
-        std::unique_ptr<ExprAST> Init;
-        if (CurTok == '=') {
-          getNextToken(); // eat the '='.
-
-          Init = ParseExpression();
-          if (!Init) return nullptr;
-        }
-
-        VarNames.push_back(std::make_pair(Name, std::move(Init)));
-
-        // End of var list, exit loop.
-        if (CurTok != ',') break;
-        getNextToken(); // eat the ','.
-
-        if (CurTok != tok_identifier)
-          return LogError("expected identifier list after var");
-      }
-
-Once all the variables are parsed, we then parse the body and create the
-AST node:
-
-.. code-block:: c++
-
-      // At this point, we have to have 'in'.
-      if (CurTok != tok_in)
-        return LogError("expected 'in' keyword after 'var'");
-      getNextToken();  // eat 'in'.
-
-      auto Body = ParseExpression();
-      if (!Body)
-        return nullptr;
-
-      return llvm::make_unique<VarExprAST>(std::move(VarNames),
-                                           std::move(Body));
-    }
-
-Now that we can parse and represent the code, we need to support
-emission of LLVM IR for it. This code starts out with:
-
-.. code-block:: c++
-
-    Value *VarExprAST::codegen() {
-      std::vector<AllocaInst *> OldBindings;
-
-      Function *TheFunction = Builder.GetInsertBlock()->getParent();
-
-      // Register all variables and emit their initializer.
-      for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
-        const std::string &VarName = VarNames[i].first;
-        ExprAST *Init = VarNames[i].second.get();
-
-Basically it loops over all the variables, installing them one at a
-time. For each variable we put into the symbol table, we remember the
-previous value that we replace in OldBindings.
-
-.. code-block:: c++
-
-        // Emit the initializer before adding the variable to scope, this prevents
-        // the initializer from referencing the variable itself, and permits stuff
-        // like this:
-        //  var a = 1 in
-        //    var a = a in ...   # refers to outer 'a'.
-        Value *InitVal;
-        if (Init) {
-          InitVal = Init->codegen();
-          if (!InitVal)
-            return nullptr;
-        } else { // If not specified, use 0.0.
-          InitVal = ConstantFP::get(LLVMContext, APFloat(0.0));
-        }
-
-        AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
-        Builder.CreateStore(InitVal, Alloca);
-
-        // Remember the old variable binding so that we can restore the binding when
-        // we unrecurse.
-        OldBindings.push_back(NamedValues[VarName]);
-
-        // Remember this binding.
-        NamedValues[VarName] = Alloca;
-      }
-
-There are more comments here than code. The basic idea is that we emit
-the initializer, create the alloca, then update the symbol table to
-point to it. Once all the variables are installed in the symbol table,
-we evaluate the body of the var/in expression:
-
-.. code-block:: c++
-
-      // Codegen the body, now that all vars are in scope.
-      Value *BodyVal = Body->codegen();
-      if (!BodyVal)
-        return nullptr;
-
-Finally, before returning, we restore the previous variable bindings:
-
-.. code-block:: c++
-
-      // Pop all our variables from scope.
-      for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
-        NamedValues[VarNames[i].first] = OldBindings[i];
-
-      // Return the body computation.
-      return BodyVal;
-    }
-
-The end result of all of this is that we get properly scoped variable
-definitions, and we even (trivially) allow mutation of them :).
-
-With this, we completed what we set out to do. Our nice iterative fib
-example from the intro compiles and runs just fine. The mem2reg pass
-optimizes all of our stack variables into SSA registers, inserting PHI
-nodes where needed, and our front-end remains simple: no "iterated
-dominance frontier" computation anywhere in sight.
-
-Full Code Listing
-=================
-
-Here is the complete code listing for our running example, enhanced with
-mutable variables and var/in support. To build this example, use:
-
-.. code-block:: bash
-
-    # Compile
-    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
-    # Run
-    ./toy
-
-Here is the code:
-
-.. literalinclude:: ../../examples/Kaleidoscope/Chapter7/toy.cpp
-   :language: c++
-
-`Next: Adding Debug Information <LangImpl8.html>`_
-

Removed: llvm/trunk/docs/tutorial/LangImpl8.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl8.rst?rev=274440&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl8.rst (original)
+++ llvm/trunk/docs/tutorial/LangImpl8.rst (removed)
@@ -1,462 +0,0 @@
-======================================
-Kaleidoscope: Adding Debug Information
-======================================
-
-.. contents::
-   :local:
-
-Chapter 8 Introduction
-======================
-
-Welcome to Chapter 8 of the "`Implementing a language with
-LLVM <index.html>`_" tutorial. In chapters 1 through 7, we've built a
-decent little programming language with functions and variables.
-What happens if something goes wrong though, how do you debug your
-program?
-
-Source level debugging uses formatted data that helps a debugger
-translate from binary and the state of the machine back to the
-source that the programmer wrote. In LLVM we generally use a format
-called `DWARF <http://dwarfstd.org>`_. DWARF is a compact encoding
-that represents types, source locations, and variable locations. 
-
-The short summary of this chapter is that we'll go through the
-various things you have to add to a programming language to
-support debug info, and how you translate that into DWARF.
-
-Caveat: For now we can't debug via the JIT, so we'll need to compile
-our program down to something small and standalone. As part of this
-we'll make a few modifications to the running of the language and
-how programs are compiled. This means that we'll have a source file
-with a simple program written in Kaleidoscope rather than the
-interactive JIT. It does involve a limitation that we can only
-have one "top level" command at a time to reduce the number of
-changes necessary.
-
-Here's the sample program we'll be compiling:
-
-.. code-block:: python
-
-   def fib(x)
-     if x < 3 then
-       1
-     else
-       fib(x-1)+fib(x-2);
-
-   fib(10)
-
-
-Why is this a hard problem?
-===========================
-
-Debug information is a hard problem for a few different reasons - mostly
-centered around optimized code. First, optimization makes keeping source
-locations more difficult. In LLVM IR we keep the original source location
-for each IR level instruction on the instruction. Optimization passes
-should keep the source locations for newly created instructions, but merged
-instructions only get to keep a single location - this can cause jumping
-around when stepping through optimized programs. Secondly, optimization
-can move variables in ways that are either optimized out, shared in memory
-with other variables, or difficult to track. For the purposes of this
-tutorial we're going to avoid optimization (as you'll see with one of the
-next sets of patches).
-
-Ahead-of-Time Compilation Mode
-==============================
-
-To highlight only the aspects of adding debug information to a source
-language without needing to worry about the complexities of JIT debugging
-we're going to make a few changes to Kaleidoscope to support compiling
-the IR emitted by the front end into a simple standalone program that
-you can execute, debug, and see results.
-
-First we make our anonymous function that contains our top level
-statement be our "main":
-
-.. code-block:: udiff
-
-  -    auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
-  +    auto Proto = llvm::make_unique<PrototypeAST>("main", std::vector<std::string>());
-
-just with the simple change of giving it a name.
-
-Then we're going to remove the command line code wherever it exists:
-
-.. code-block:: udiff
-
-  @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() {
-   /// top ::= definition | external | expression | ';'
-   static void MainLoop() {
-     while (1) {
-  -    fprintf(stderr, "ready> ");
-       switch (CurTok) {
-       case tok_eof:
-         return;
-  @@ -1184,7 +1183,6 @@ int main() {
-     BinopPrecedence['*'] = 40; // highest.
- 
-     // Prime the first token.
-  -  fprintf(stderr, "ready> ");
-     getNextToken();
- 
-Lastly we're going to disable all of the optimization passes and the JIT so
-that the only thing that happens after we're done parsing and generating
-code is that the llvm IR goes to standard error:
-
-.. code-block:: udiff
-
-  @@ -1108,17 +1108,8 @@ static void HandleExtern() {
-   static void HandleTopLevelExpression() {
-     // Evaluate a top-level expression into an anonymous function.
-     if (auto FnAST = ParseTopLevelExpr()) {
-  -    if (auto *FnIR = FnAST->codegen()) {
-  -      // We're just doing this to make sure it executes.
-  -      TheExecutionEngine->finalizeObject();
-  -      // JIT the function, returning a function pointer.
-  -      void *FPtr = TheExecutionEngine->getPointerToFunction(FnIR);
-  -
-  -      // Cast it to the right type (takes no arguments, returns a double) so we
-  -      // can call it as a native function.
-  -      double (*FP)() = (double (*)())(intptr_t)FPtr;
-  -      // Ignore the return value for this.
-  -      (void)FP;
-  +    if (!F->codegen()) {
-  +      fprintf(stderr, "Error generating code for top level expr");
-       }
-     } else {
-       // Skip token for error recovery.
-  @@ -1439,11 +1459,11 @@ int main() {
-     // target lays out data structures.
-     TheModule->setDataLayout(TheExecutionEngine->getDataLayout());
-     OurFPM.add(new DataLayoutPass());
-  +#if 0
-     OurFPM.add(createBasicAliasAnalysisPass());
-     // Promote allocas to registers.
-     OurFPM.add(createPromoteMemoryToRegisterPass());
-  @@ -1218,7 +1210,7 @@ int main() {
-     OurFPM.add(createGVNPass());
-     // Simplify the control flow graph (deleting unreachable blocks, etc).
-     OurFPM.add(createCFGSimplificationPass());
-  -
-  +  #endif
-     OurFPM.doInitialization();
- 
-     // Set the global so the code gen can use this.
-
-This relatively small set of changes get us to the point that we can compile
-our piece of Kaleidoscope language down to an executable program via this
-command line:
-
-.. code-block:: bash
-
-  Kaleidoscope-Ch8 < fib.ks | & clang -x ir -
-
-which gives an a.out/a.exe in the current working directory.
-
-Compile Unit
-============
-
-The top level container for a section of code in DWARF is a compile unit.
-This contains the type and function data for an individual translation unit
-(read: one file of source code). So the first thing we need to do is
-construct one for our fib.ks file.
-
-DWARF Emission Setup
-====================
-
-Similar to the ``IRBuilder`` class we have a
-`DIBuilder <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class
-that helps in constructing debug metadata for an llvm IR file. It
-corresponds 1:1 similarly to ``IRBuilder`` and llvm IR, but with nicer names.
-Using it does require that you be more familiar with DWARF terminology than
-you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you
-read through the general documentation on the
-`Metadata Format <http://llvm.org/docs/SourceLevelDebugging.html>`_ it
-should be a little more clear. We'll be using this class to construct all
-of our IR level descriptions. Construction for it takes a module so we
-need to construct it shortly after we construct our module. We've left it
-as a global static variable to make it a bit easier to use.
-
-Next we're going to create a small container to cache some of our frequent
-data. The first will be our compile unit, but we'll also write a bit of
-code for our one type since we won't have to worry about multiple typed
-expressions:
-
-.. code-block:: c++
-
-  static DIBuilder *DBuilder;
-
-  struct DebugInfo {
-    DICompileUnit *TheCU;
-    DIType *DblTy;
-
-    DIType *getDoubleTy();
-  } KSDbgInfo;
-
-  DIType *DebugInfo::getDoubleTy() {
-    if (DblTy.isValid())
-      return DblTy;
-
-    DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);
-    return DblTy;
-  }
-
-And then later on in ``main`` when we're constructing our module:
-
-.. code-block:: c++
-
-  DBuilder = new DIBuilder(*TheModule);
-
-  KSDbgInfo.TheCU = DBuilder->createCompileUnit(
-      dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);
-
-There are a couple of things to note here. First, while we're producing a
-compile unit for a language called Kaleidoscope we used the language
-constant for C. This is because a debugger wouldn't necessarily understand
-the calling conventions or default ABI for a language it doesn't recognize
-and we follow the C ABI in our llvm code generation so it's the closest
-thing to accurate. This ensures we can actually call functions from the
-debugger and have them execute. Secondly, you'll see the "fib.ks" in the
-call to ``createCompileUnit``. This is a default hard coded value since
-we're using shell redirection to put our source into the Kaleidoscope
-compiler. In a usual front end you'd have an input file name and it would
-go there.
-
-One last thing as part of emitting debug information via DIBuilder is that
-we need to "finalize" the debug information. The reasons are part of the
-underlying API for DIBuilder, but make sure you do this near the end of
-main:
-
-.. code-block:: c++
-
-  DBuilder->finalize();
-
-before you dump out the module.
-
-Functions
-=========
-
-Now that we have our ``Compile Unit`` and our source locations, we can add
-function definitions to the debug info. So in ``PrototypeAST::codegen()`` we
-add a few lines of code to describe a context for our subprogram, in this
-case the "File", and the actual definition of the function itself.
-
-So the context:
-
-.. code-block:: c++
-
-  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(),
-                                      KSDbgInfo.TheCU.getDirectory());
-
-giving us an DIFile and asking the ``Compile Unit`` we created above for the
-directory and filename where we are currently. Then, for now, we use some
-source locations of 0 (since our AST doesn't currently have source location
-information) and construct our function definition:
-
-.. code-block:: c++
-
-  DIScope *FContext = Unit;
-  unsigned LineNo = 0;
-  unsigned ScopeLine = 0;
-  DISubprogram *SP = DBuilder->createFunction(
-      FContext, Name, StringRef(), Unit, LineNo,
-      CreateFunctionType(Args.size(), Unit), false /* internal linkage */,
-      true /* definition */, ScopeLine, DINode::FlagPrototyped, false);
-  F->setSubprogram(SP);
-
-and we now have an DISubprogram that contains a reference to all of our
-metadata for the function.
-
-Source Locations
-================
-
-The most important thing for debug information is accurate source location -
-this makes it possible to map your source code back. We have a problem though,
-Kaleidoscope really doesn't have any source location information in the lexer
-or parser so we'll need to add it.
-
-.. code-block:: c++
-
-   struct SourceLocation {
-     int Line;
-     int Col;
-   };
-   static SourceLocation CurLoc;
-   static SourceLocation LexLoc = {1, 0};
-
-   static int advance() {
-     int LastChar = getchar();
-
-     if (LastChar == '\n' || LastChar == '\r') {
-       LexLoc.Line++;
-       LexLoc.Col = 0;
-     } else
-       LexLoc.Col++;
-     return LastChar;
-   }
-
-In this set of code we've added some functionality on how to keep track of the
-line and column of the "source file". As we lex every token we set our current
-current "lexical location" to the assorted line and column for the beginning
-of the token. We do this by overriding all of the previous calls to
-``getchar()`` with our new ``advance()`` that keeps track of the information
-and then we have added to all of our AST classes a source location:
-
-.. code-block:: c++
-
-   class ExprAST {
-     SourceLocation Loc;
-
-     public:
-       ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
-       virtual ~ExprAST() {}
-       virtual Value* codegen() = 0;
-       int getLine() const { return Loc.Line; }
-       int getCol() const { return Loc.Col; }
-       virtual raw_ostream &dump(raw_ostream &out, int ind) {
-         return out << ':' << getLine() << ':' << getCol() << '\n';
-       }
-
-that we pass down through when we create a new expression:
-
-.. code-block:: c++
-
-   LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS),
-                                          std::move(RHS));
-
-giving us locations for each of our expressions and variables.
-
-From this we can make sure to tell ``DIBuilder`` when we're at a new source
-location so it can use that when we generate the rest of our code and make
-sure that each instruction has source location information. We do this
-by constructing another small function:
-
-.. code-block:: c++
-
-  void DebugInfo::emitLocation(ExprAST *AST) {
-    DIScope *Scope;
-    if (LexicalBlocks.empty())
-      Scope = TheCU;
-    else
-      Scope = LexicalBlocks.back();
-    Builder.SetCurrentDebugLocation(
-        DebugLoc::get(AST->getLine(), AST->getCol(), Scope));
-  }
-
-that both tells the main ``IRBuilder`` where we are, but also what scope
-we're in. Since we've just created a function above we can either be in
-the main file scope (like when we created our function), or now we can be
-in the function scope we just created. To represent this we create a stack
-of scopes:
-
-.. code-block:: c++
-
-   std::vector<DIScope *> LexicalBlocks;
-   std::map<const PrototypeAST *, DIScope *> FnScopeMap;
-
-and keep a map of each function to the scope that it represents (an
-DISubprogram is also an DIScope).
-
-Then we make sure to:
-
-.. code-block:: c++
-
-   KSDbgInfo.emitLocation(this);
-
-emit the location every time we start to generate code for a new AST, and
-also:
-
-.. code-block:: c++
-
-  KSDbgInfo.FnScopeMap[this] = SP;
-
-store the scope (function) when we create it and use it:
-
-  KSDbgInfo.LexicalBlocks.push_back(&KSDbgInfo.FnScopeMap[Proto]);
-
-when we start generating the code for each function.
-
-also, don't forget to pop the scope back off of your scope stack at the
-end of the code generation for the function:
-
-.. code-block:: c++
-
-  // Pop off the lexical block for the function since we added it
-  // unconditionally.
-  KSDbgInfo.LexicalBlocks.pop_back();
-
-Variables
-=========
-
-Now that we have functions, we need to be able to print out the variables
-we have in scope. Let's get our function arguments set up so we can get
-decent backtraces and see how our functions are being called. It isn't
-a lot of code, and we generally handle it when we're creating the
-argument allocas in ``PrototypeAST::CreateArgumentAllocas``.
-
-.. code-block:: c++
-
-  DIScope *Scope = KSDbgInfo.LexicalBlocks.back();
-  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU.getFilename(),
-                                      KSDbgInfo.TheCU.getDirectory());
-  DILocalVariable D = DBuilder->createParameterVariable(
-      Scope, Args[Idx], Idx + 1, Unit, Line, KSDbgInfo.getDoubleTy(), true);
-
-  DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),
-                          DebugLoc::get(Line, 0, Scope),
-                          Builder.GetInsertBlock());
-
-Here we're doing a few things. First, we're grabbing our current scope
-for the variable so we can say what range of code our variable is valid
-through. Second, we're creating the variable, giving it the scope,
-the name, source location, type, and since it's an argument, the argument
-index. Third, we create an ``lvm.dbg.declare`` call to indicate at the IR
-level that we've got a variable in an alloca (and it gives a starting
-location for the variable), and setting a source location for the
-beginning of the scope on the declare.
-
-One interesting thing to note at this point is that various debuggers have
-assumptions based on how code and debug information was generated for them
-in the past. In this case we need to do a little bit of a hack to avoid
-generating line information for the function prologue so that the debugger
-knows to skip over those instructions when setting a breakpoint. So in
-``FunctionAST::CodeGen`` we add a couple of lines:
-
-.. code-block:: c++
-
-  // Unset the location for the prologue emission (leading instructions with no
-  // location in a function are considered part of the prologue and the debugger
-  // will run past them when breaking on a function)
-  KSDbgInfo.emitLocation(nullptr);
-
-and then emit a new location when we actually start generating code for the
-body of the function:
-
-.. code-block:: c++
-
-  KSDbgInfo.emitLocation(Body);
-
-With this we have enough debug information to set breakpoints in functions,
-print out argument variables, and call functions. Not too bad for just a
-few simple lines of code!
-
-Full Code Listing
-=================
-
-Here is the complete code listing for our running example, enhanced with
-debug information. To build this example, use:
-
-.. code-block:: bash
-
-    # Compile
-    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
-    # Run
-    ./toy
-
-Here is the code:
-
-.. literalinclude:: ../../examples/Kaleidoscope/Chapter8/toy.cpp
-   :language: c++
-
-`Next: Conclusion and other useful LLVM tidbits <LangImpl9.html>`_
-

Removed: llvm/trunk/docs/tutorial/LangImpl9.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl9.rst?rev=274440&view=auto
==============================================================================
--- llvm/trunk/docs/tutorial/LangImpl9.rst (original)
+++ llvm/trunk/docs/tutorial/LangImpl9.rst (removed)
@@ -1,259 +0,0 @@
-======================================================
-Kaleidoscope: Conclusion and other useful LLVM tidbits
-======================================================
-
-.. contents::
-   :local:
-
-Tutorial Conclusion
-===================
-
-Welcome to the final chapter of the "`Implementing a language with
-LLVM <index.html>`_" tutorial. In the course of this tutorial, we have
-grown our little Kaleidoscope language from being a useless toy, to
-being a semi-interesting (but probably still useless) toy. :)
-
-It is interesting to see how far we've come, and how little code it has
-taken. We built the entire lexer, parser, AST, code generator, an
-interactive run-loop (with a JIT!), and emitted debug information in
-standalone executables - all in under 1000 lines of (non-comment/non-blank)
-code.
-
-Our little language supports a couple of interesting features: it
-supports user defined binary and unary operators, it uses JIT
-compilation for immediate evaluation, and it supports a few control flow
-constructs with SSA construction.
-
-Part of the idea of this tutorial was to show you how easy and fun it
-can be to define, build, and play with languages. Building a compiler
-need not be a scary or mystical process! Now that you've seen some of
-the basics, I strongly encourage you to take the code and hack on it.
-For example, try adding:
-
--  **global variables** - While global variables have questional value
-   in modern software engineering, they are often useful when putting
-   together quick little hacks like the Kaleidoscope compiler itself.
-   Fortunately, our current setup makes it very easy to add global
-   variables: just have value lookup check to see if an unresolved
-   variable is in the global variable symbol table before rejecting it.
-   To create a new global variable, make an instance of the LLVM
-   ``GlobalVariable`` class.
--  **typed variables** - Kaleidoscope currently only supports variables
-   of type double. This gives the language a very nice elegance, because
-   only supporting one type means that you never have to specify types.
-   Different languages have different ways of handling this. The easiest
-   way is to require the user to specify types for every variable
-   definition, and record the type of the variable in the symbol table
-   along with its Value\*.
--  **arrays, structs, vectors, etc** - Once you add types, you can start
-   extending the type system in all sorts of interesting ways. Simple
-   arrays are very easy and are quite useful for many different
-   applications. Adding them is mostly an exercise in learning how the
-   LLVM `getelementptr <../LangRef.html#getelementptr-instruction>`_ instruction
-   works: it is so nifty/unconventional, it `has its own
-   FAQ <../GetElementPtr.html>`_!
--  **standard runtime** - Our current language allows the user to access
-   arbitrary external functions, and we use it for things like "printd"
-   and "putchard". As you extend the language to add higher-level
-   constructs, often these constructs make the most sense if they are
-   lowered to calls into a language-supplied runtime. For example, if
-   you add hash tables to the language, it would probably make sense to
-   add the routines to a runtime, instead of inlining them all the way.
--  **memory management** - Currently we can only access the stack in
-   Kaleidoscope. It would also be useful to be able to allocate heap
-   memory, either with calls to the standard libc malloc/free interface
-   or with a garbage collector. If you would like to use garbage
-   collection, note that LLVM fully supports `Accurate Garbage
-   Collection <../GarbageCollection.html>`_ including algorithms that
-   move objects and need to scan/update the stack.
--  **exception handling support** - LLVM supports generation of `zero
-   cost exceptions <../ExceptionHandling.html>`_ which interoperate with
-   code compiled in other languages. You could also generate code by
-   implicitly making every function return an error value and checking
-   it. You could also make explicit use of setjmp/longjmp. There are
-   many different ways to go here.
--  **object orientation, generics, database access, complex numbers,
-   geometric programming, ...** - Really, there is no end of crazy
-   features that you can add to the language.
--  **unusual domains** - We've been talking about applying LLVM to a
-   domain that many people are interested in: building a compiler for a
-   specific language. However, there are many other domains that can use
-   compiler technology that are not typically considered. For example,
-   LLVM has been used to implement OpenGL graphics acceleration,
-   translate C++ code to ActionScript, and many other cute and clever
-   things. Maybe you will be the first to JIT compile a regular
-   expression interpreter into native code with LLVM?
-
-Have fun - try doing something crazy and unusual. Building a language
-like everyone else always has, is much less fun than trying something a
-little crazy or off the wall and seeing how it turns out. If you get
-stuck or want to talk about it, feel free to email the `llvm-dev mailing
-list <http://lists.llvm.org/mailman/listinfo/llvm-dev>`_: it has lots
-of people who are interested in languages and are often willing to help
-out.
-
-Before we end this tutorial, I want to talk about some "tips and tricks"
-for generating LLVM IR. These are some of the more subtle things that
-may not be obvious, but are very useful if you want to take advantage of
-LLVM's capabilities.
-
-Properties of the LLVM IR
-=========================
-
-We have a couple of common questions about code in the LLVM IR form -
-let's just get these out of the way right now, shall we?
-
-Target Independence
--------------------
-
-Kaleidoscope is an example of a "portable language": any program written
-in Kaleidoscope will work the same way on any target that it runs on.
-Many other languages have this property, e.g. lisp, java, haskell,
-javascript, python, etc (note that while these languages are portable,
-not all their libraries are).
-
-One nice aspect of LLVM is that it is often capable of preserving target
-independence in the IR: you can take the LLVM IR for a
-Kaleidoscope-compiled program and run it on any target that LLVM
-supports, even emitting C code and compiling that on targets that LLVM
-doesn't support natively. You can trivially tell that the Kaleidoscope
-compiler generates target-independent code because it never queries for
-any target-specific information when generating code.
-
-The fact that LLVM provides a compact, target-independent,
-representation for code gets a lot of people excited. Unfortunately,
-these people are usually thinking about C or a language from the C
-family when they are asking questions about language portability. I say
-"unfortunately", because there is really no way to make (fully general)
-C code portable, other than shipping the source code around (and of
-course, C source code is not actually portable in general either - ever
-port a really old application from 32- to 64-bits?).
-
-The problem with C (again, in its full generality) is that it is heavily
-laden with target specific assumptions. As one simple example, the
-preprocessor often destructively removes target-independence from the
-code when it processes the input text:
-
-.. code-block:: c
-
-    #ifdef __i386__
-      int X = 1;
-    #else
-      int X = 42;
-    #endif
-
-While it is possible to engineer more and more complex solutions to
-problems like this, it cannot be solved in full generality in a way that
-is better than shipping the actual source code.
-
-That said, there are interesting subsets of C that can be made portable.
-If you are willing to fix primitive types to a fixed size (say int =
-32-bits, and long = 64-bits), don't care about ABI compatibility with
-existing binaries, and are willing to give up some other minor features,
-you can have portable code. This can make sense for specialized domains
-such as an in-kernel language.
-
-Safety Guarantees
------------------
-
-Many of the languages above are also "safe" languages: it is impossible
-for a program written in Java to corrupt its address space and crash the
-process (assuming the JVM has no bugs). Safety is an interesting
-property that requires a combination of language design, runtime
-support, and often operating system support.
-
-It is certainly possible to implement a safe language in LLVM, but LLVM
-IR does not itself guarantee safety. The LLVM IR allows unsafe pointer
-casts, use after free bugs, buffer over-runs, and a variety of other
-problems. Safety needs to be implemented as a layer on top of LLVM and,
-conveniently, several groups have investigated this. Ask on the `llvm-dev
-mailing list <http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ if
-you are interested in more details.
-
-Language-Specific Optimizations
--------------------------------
-
-One thing about LLVM that turns off many people is that it does not
-solve all the world's problems in one system (sorry 'world hunger',
-someone else will have to solve you some other day). One specific
-complaint is that people perceive LLVM as being incapable of performing
-high-level language-specific optimization: LLVM "loses too much
-information".
-
-Unfortunately, this is really not the place to give you a full and
-unified version of "Chris Lattner's theory of compiler design". Instead,
-I'll make a few observations:
-
-First, you're right that LLVM does lose information. For example, as of
-this writing, there is no way to distinguish in the LLVM IR whether an
-SSA-value came from a C "int" or a C "long" on an ILP32 machine (other
-than debug info). Both get compiled down to an 'i32' value and the
-information about what it came from is lost. The more general issue
-here, is that the LLVM type system uses "structural equivalence" instead
-of "name equivalence". Another place this surprises people is if you
-have two types in a high-level language that have the same structure
-(e.g. two different structs that have a single int field): these types
-will compile down into a single LLVM type and it will be impossible to
-tell what it came from.
-
-Second, while LLVM does lose information, LLVM is not a fixed target: we
-continue to enhance and improve it in many different ways. In addition
-to adding new features (LLVM did not always support exceptions or debug
-info), we also extend the IR to capture important information for
-optimization (e.g. whether an argument is sign or zero extended,
-information about pointers aliasing, etc). Many of the enhancements are
-user-driven: people want LLVM to include some specific feature, so they
-go ahead and extend it.
-
-Third, it is *possible and easy* to add language-specific optimizations,
-and you have a number of choices in how to do it. As one trivial
-example, it is easy to add language-specific optimization passes that
-"know" things about code compiled for a language. In the case of the C
-family, there is an optimization pass that "knows" about the standard C
-library functions. If you call "exit(0)" in main(), it knows that it is
-safe to optimize that into "return 0;" because C specifies what the
-'exit' function does.
-
-In addition to simple library knowledge, it is possible to embed a
-variety of other language-specific information into the LLVM IR. If you
-have a specific need and run into a wall, please bring the topic up on
-the llvm-dev list. At the very worst, you can always treat LLVM as if it
-were a "dumb code generator" and implement the high-level optimizations
-you desire in your front-end, on the language-specific AST.
-
-Tips and Tricks
-===============
-
-There is a variety of useful tips and tricks that you come to know after
-working on/with LLVM that aren't obvious at first glance. Instead of
-letting everyone rediscover them, this section talks about some of these
-issues.
-
-Implementing portable offsetof/sizeof
--------------------------------------
-
-One interesting thing that comes up, if you are trying to keep the code
-generated by your compiler "target independent", is that you often need
-to know the size of some LLVM type or the offset of some field in an
-llvm structure. For example, you might need to pass the size of a type
-into a function that allocates memory.
-
-Unfortunately, this can vary widely across targets: for example the
-width of a pointer is trivially target-specific. However, there is a
-`clever way to use the getelementptr
-instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_
-that allows you to compute this in a portable way.
-
-Garbage Collected Stack Frames
-------------------------------
-
-Some languages want to explicitly manage their stack frames, often so
-that they are garbage collected or to allow easy implementation of
-closures. There are often better ways to implement these features than
-explicit stack frames, but `LLVM does support
-them, <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_
-if you want. It requires your front-end to convert the code into
-`Continuation Passing
-Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and
-the use of tail calls (which LLVM also supports).
-

Modified: llvm/trunk/docs/tutorial/OCamlLangImpl5.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/OCamlLangImpl5.rst?rev=274441&r1=274440&r2=274441&view=diff
==============================================================================
--- llvm/trunk/docs/tutorial/OCamlLangImpl5.rst (original)
+++ llvm/trunk/docs/tutorial/OCamlLangImpl5.rst Sat Jul  2 12:01:59 2016
@@ -178,7 +178,7 @@ IR into "t.ll" and run "``llvm-as < t.ll
 window will pop up <../ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ and you'll
 see this graph:
 
-.. figure:: LangImpl5-cfg.png
+.. figure:: LangImpl05-cfg.png
    :align: center
    :alt: Example CFG
 

Modified: llvm/trunk/examples/Kaleidoscope/Chapter8/CMakeLists.txt
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/examples/Kaleidoscope/Chapter8/CMakeLists.txt?rev=274441&r1=274440&r2=274441&view=diff
==============================================================================
--- llvm/trunk/examples/Kaleidoscope/Chapter8/CMakeLists.txt (original)
+++ llvm/trunk/examples/Kaleidoscope/Chapter8/CMakeLists.txt Sat Jul  2 12:01:59 2016
@@ -1,9 +1,5 @@
 set(LLVM_LINK_COMPONENTS
-  Core
-  ExecutionEngine
-  Object
-  Support
-  native
+  all
   )
 
 add_kaleidoscope_chapter(Kaleidoscope-Ch8

Modified: llvm/trunk/examples/Kaleidoscope/Chapter8/toy.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/examples/Kaleidoscope/Chapter8/toy.cpp?rev=274441&r1=274440&r2=274441&view=diff
==============================================================================
--- llvm/trunk/examples/Kaleidoscope/Chapter8/toy.cpp (original)
+++ llvm/trunk/examples/Kaleidoscope/Chapter8/toy.cpp Sat Jul  2 12:01:59 2016
@@ -1,28 +1,20 @@
 #include "llvm/ADT/APFloat.h"
-#include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/STLExtras.h"
-#include "llvm/ADT/StringRef.h"
-#include "llvm/ADT/Triple.h"
-#include "llvm/IR/BasicBlock.h"
-#include "llvm/IR/Constants.h"
-#include "llvm/IR/DebugInfoMetadata.h"
-#include "llvm/IR/DebugLoc.h"
-#include "llvm/IR/DerivedTypes.h"
-#include "llvm/IR/DIBuilder.h"
-#include "llvm/IR/Function.h"
-#include "llvm/IR/Instructions.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Analysis/Passes.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/LegacyPassManager.h"
 #include "llvm/IR/Metadata.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/Type.h"
 #include "llvm/IR/Verifier.h"
-#include "llvm/Support/Host.h"
-#include "llvm/Support/raw_ostream.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/TargetRegistry.h"
 #include "llvm/Support/TargetSelect.h"
 #include "llvm/Target/TargetMachine.h"
-#include "../include/KaleidoscopeJIT.h"
-#include <cassert>
+#include "llvm/Target/TargetOptions.h"
+#include "llvm/Transforms/Scalar.h"
 #include <cctype>
 #include <cstdio>
 #include <cstdlib>
@@ -33,7 +25,7 @@
 #include <vector>
 
 using namespace llvm;
-using namespace llvm::orc;
+using namespace llvm::sys;
 
 //===----------------------------------------------------------------------===//
 // Lexer
@@ -67,71 +59,6 @@ enum Token {
   tok_var = -13
 };
 
-std::string getTokName(int Tok) {
-  switch (Tok) {
-  case tok_eof:
-    return "eof";
-  case tok_def:
-    return "def";
-  case tok_extern:
-    return "extern";
-  case tok_identifier:
-    return "identifier";
-  case tok_number:
-    return "number";
-  case tok_if:
-    return "if";
-  case tok_then:
-    return "then";
-  case tok_else:
-    return "else";
-  case tok_for:
-    return "for";
-  case tok_in:
-    return "in";
-  case tok_binary:
-    return "binary";
-  case tok_unary:
-    return "unary";
-  case tok_var:
-    return "var";
-  }
-  return std::string(1, (char)Tok);
-}
-
-namespace {
-class ExprAST;
-} // end anonymous namespace
-
-static LLVMContext TheContext;
-static IRBuilder<> Builder(TheContext);
-struct DebugInfo {
-  DICompileUnit *TheCU;
-  DIType *DblTy;
-  std::vector<DIScope *> LexicalBlocks;
-
-  void emitLocation(ExprAST *AST);
-  DIType *getDoubleTy();
-} KSDbgInfo;
-
-struct SourceLocation {
-  int Line;
-  int Col;
-};
-static SourceLocation CurLoc;
-static SourceLocation LexLoc = {1, 0};
-
-static int advance() {
-  int LastChar = getchar();
-
-  if (LastChar == '\n' || LastChar == '\r') {
-    LexLoc.Line++;
-    LexLoc.Col = 0;
-  } else
-    LexLoc.Col++;
-  return LastChar;
-}
-
 static std::string IdentifierStr; // Filled in if tok_identifier
 static double NumVal;             // Filled in if tok_number
 
@@ -141,13 +68,11 @@ static int gettok() {
 
   // Skip any whitespace.
   while (isspace(LastChar))
-    LastChar = advance();
-
-  CurLoc = LexLoc;
+    LastChar = getchar();
 
   if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
     IdentifierStr = LastChar;
-    while (isalnum((LastChar = advance())))
+    while (isalnum((LastChar = getchar())))
       IdentifierStr += LastChar;
 
     if (IdentifierStr == "def")
@@ -177,7 +102,7 @@ static int gettok() {
     std::string NumStr;
     do {
       NumStr += LastChar;
-      LastChar = advance();
+      LastChar = getchar();
     } while (isdigit(LastChar) || LastChar == '.');
 
     NumVal = strtod(NumStr.c_str(), nullptr);
@@ -187,7 +112,7 @@ static int gettok() {
   if (LastChar == '#') {
     // Comment until end of line.
     do
-      LastChar = advance();
+      LastChar = getchar();
     while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
 
     if (LastChar != EOF)
@@ -200,7 +125,7 @@ static int gettok() {
 
   // Otherwise, just return the character as its ascii value.
   int ThisChar = LastChar;
-  LastChar = advance();
+  LastChar = getchar();
   return ThisChar;
 }
 
@@ -208,25 +133,11 @@ static int gettok() {
 // Abstract Syntax Tree (aka Parse Tree)
 //===----------------------------------------------------------------------===//
 namespace {
-
-raw_ostream &indent(raw_ostream &O, int size) {
-  return O << std::string(size, ' ');
-}
-
 /// ExprAST - Base class for all expression nodes.
 class ExprAST {
-  SourceLocation Loc;
-
 public:
-  ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
   virtual ~ExprAST() {}
   virtual Value *codegen() = 0;
-  int getLine() const { return Loc.Line; }
-  int getCol() const { return Loc.Col; }
-
-  virtual raw_ostream &dump(raw_ostream &out, int ind) {
-    return out << ':' << getLine() << ':' << getCol() << '\n';
-  }
 };
 
 /// NumberExprAST - Expression class for numeric literals like "1.0".
@@ -236,10 +147,6 @@ class NumberExprAST : public ExprAST {
 public:
   NumberExprAST(double Val) : Val(Val) {}
   Value *codegen() override;
-
-  raw_ostream &dump(raw_ostream &out, int ind) override {
-    return ExprAST::dump(out << Val, ind);
-  }
 };
 
 /// VariableExprAST - Expression class for referencing a variable, like "a".
@@ -247,14 +154,9 @@ class VariableExprAST : public ExprAST {
   std::string Name;
 
 public:
-  VariableExprAST(SourceLocation Loc, const std::string &Name)
-      : ExprAST(Loc), Name(Name) {}
+  VariableExprAST(const std::string &Name) : Name(Name) {}
   const std::string &getName() const { return Name; }
   Value *codegen() override;
-
-  raw_ostream &dump(raw_ostream &out, int ind) override {
-    return ExprAST::dump(out << Name, ind);
-  }
 };
 
 /// UnaryExprAST - Expression class for a unary operator.
@@ -266,12 +168,6 @@ public:
   UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)
       : Opcode(Opcode), Operand(std::move(Operand)) {}
   Value *codegen() override;
-
-  raw_ostream &dump(raw_ostream &out, int ind) override {
-    ExprAST::dump(out << "unary" << Opcode, ind);
-    Operand->dump(out, ind + 1);
-    return out;
-  }
 };
 
 /// BinaryExprAST - Expression class for a binary operator.
@@ -280,17 +176,10 @@ class BinaryExprAST : public ExprAST {
   std::unique_ptr<ExprAST> LHS, RHS;
 
 public:
-  BinaryExprAST(SourceLocation Loc, char Op, std::unique_ptr<ExprAST> LHS,
+  BinaryExprAST(char Op, std::unique_ptr<ExprAST> LHS,
                 std::unique_ptr<ExprAST> RHS)
-      : ExprAST(Loc), Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
+      : Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
   Value *codegen() override;
-
-  raw_ostream &dump(raw_ostream &out, int ind) override {
-    ExprAST::dump(out << "binary" << Op, ind);
-    LHS->dump(indent(out, ind) << "LHS:", ind + 1);
-    RHS->dump(indent(out, ind) << "RHS:", ind + 1);
-    return out;
-  }
 };
 
 /// CallExprAST - Expression class for function calls.
@@ -299,17 +188,10 @@ class CallExprAST : public ExprAST {
   std::vector<std::unique_ptr<ExprAST>> Args;
 
 public:
-  CallExprAST(SourceLocation Loc, const std::string &Callee,
+  CallExprAST(const std::string &Callee,
               std::vector<std::unique_ptr<ExprAST>> Args)
-      : ExprAST(Loc), Callee(Callee), Args(std::move(Args)) {}
+      : Callee(Callee), Args(std::move(Args)) {}
   Value *codegen() override;
-
-  raw_ostream &dump(raw_ostream &out, int ind) override {
-    ExprAST::dump(out << "call " << Callee, ind);
-    for (const auto &Arg : Args)
-      Arg->dump(indent(out, ind + 1), ind + 1);
-    return out;
-  }
 };
 
 /// IfExprAST - Expression class for if/then/else.
@@ -317,19 +199,10 @@ class IfExprAST : public ExprAST {
   std::unique_ptr<ExprAST> Cond, Then, Else;
 
 public:
-  IfExprAST(SourceLocation Loc, std::unique_ptr<ExprAST> Cond,
-            std::unique_ptr<ExprAST> Then, std::unique_ptr<ExprAST> Else)
-      : ExprAST(Loc), Cond(std::move(Cond)), Then(std::move(Then)),
-        Else(std::move(Else)) {}
+  IfExprAST(std::unique_ptr<ExprAST> Cond, std::unique_ptr<ExprAST> Then,
+            std::unique_ptr<ExprAST> Else)
+      : Cond(std::move(Cond)), Then(std::move(Then)), Else(std::move(Else)) {}
   Value *codegen() override;
-
-  raw_ostream &dump(raw_ostream &out, int ind) override {
-    ExprAST::dump(out << "if", ind);
-    Cond->dump(indent(out, ind) << "Cond:", ind + 1);
-    Then->dump(indent(out, ind) << "Then:", ind + 1);
-    Else->dump(indent(out, ind) << "Else:", ind + 1);
-    return out;
-  }
 };
 
 /// ForExprAST - Expression class for for/in.
@@ -344,15 +217,6 @@ public:
       : VarName(VarName), Start(std::move(Start)), End(std::move(End)),
         Step(std::move(Step)), Body(std::move(Body)) {}
   Value *codegen() override;
-
-  raw_ostream &dump(raw_ostream &out, int ind) override {
-    ExprAST::dump(out << "for", ind);
-    Start->dump(indent(out, ind) << "Cond:", ind + 1);
-    End->dump(indent(out, ind) << "End:", ind + 1);
-    Step->dump(indent(out, ind) << "Step:", ind + 1);
-    Body->dump(indent(out, ind) << "Body:", ind + 1);
-    return out;
-  }
 };
 
 /// VarExprAST - Expression class for var/in
@@ -366,14 +230,6 @@ public:
       std::unique_ptr<ExprAST> Body)
       : VarNames(std::move(VarNames)), Body(std::move(Body)) {}
   Value *codegen() override;
-
-  raw_ostream &dump(raw_ostream &out, int ind) override {
-    ExprAST::dump(out << "var", ind);
-    for (const auto &NamedVar : VarNames)
-      NamedVar.second->dump(indent(out, ind) << NamedVar.first << ':', ind + 1);
-    Body->dump(indent(out, ind) << "Body:", ind + 1);
-    return out;
-  }
 };
 
 /// PrototypeAST - This class represents the "prototype" for a function,
@@ -384,14 +240,12 @@ class PrototypeAST {
   std::vector<std::string> Args;
   bool IsOperator;
   unsigned Precedence; // Precedence if a binary op.
-  int Line;
 
 public:
-  PrototypeAST(SourceLocation Loc, const std::string &Name,
-               std::vector<std::string> Args, bool IsOperator = false,
-               unsigned Prec = 0)
+  PrototypeAST(const std::string &Name, std::vector<std::string> Args,
+               bool IsOperator = false, unsigned Prec = 0)
       : Name(Name), Args(std::move(Args)), IsOperator(IsOperator),
-        Precedence(Prec), Line(Loc.Line) {}
+        Precedence(Prec) {}
   Function *codegen();
   const std::string &getName() const { return Name; }
 
@@ -404,7 +258,6 @@ public:
   }
 
   unsigned getBinaryPrecedence() const { return Precedence; }
-  int getLine() const { return Line; }
 };
 
 /// FunctionAST - This class represents a function definition itself.
@@ -417,13 +270,6 @@ public:
               std::unique_ptr<ExprAST> Body)
       : Proto(std::move(Proto)), Body(std::move(Body)) {}
   Function *codegen();
-
-  raw_ostream &dump(raw_ostream &out, int ind) {
-    indent(out, ind) << "FunctionAST\n";
-    ++ind;
-    indent(out, ind) << "Body:";
-    return Body ? Body->dump(out, ind) : out << "null\n";
-  }
 };
 } // end anonymous namespace
 
@@ -492,12 +338,10 @@ static std::unique_ptr<ExprAST> ParsePar
 static std::unique_ptr<ExprAST> ParseIdentifierExpr() {
   std::string IdName = IdentifierStr;
 
-  SourceLocation LitLoc = CurLoc;
-
   getNextToken(); // eat identifier.
 
   if (CurTok != '(') // Simple variable ref.
-    return llvm::make_unique<VariableExprAST>(LitLoc, IdName);
+    return llvm::make_unique<VariableExprAST>(IdName);
 
   // Call.
   getNextToken(); // eat (
@@ -521,13 +365,11 @@ static std::unique_ptr<ExprAST> ParseIde
   // Eat the ')'.
   getNextToken();
 
-  return llvm::make_unique<CallExprAST>(LitLoc, IdName, std::move(Args));
+  return llvm::make_unique<CallExprAST>(IdName, std::move(Args));
 }
 
 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
 static std::unique_ptr<ExprAST> ParseIfExpr() {
-  SourceLocation IfLoc = CurLoc;
-
   getNextToken(); // eat the if.
 
   // condition.
@@ -552,7 +394,7 @@ static std::unique_ptr<ExprAST> ParseIfE
   if (!Else)
     return nullptr;
 
-  return llvm::make_unique<IfExprAST>(IfLoc, std::move(Cond), std::move(Then),
+  return llvm::make_unique<IfExprAST>(std::move(Cond), std::move(Then),
                                       std::move(Else));
 }
 
@@ -707,7 +549,6 @@ static std::unique_ptr<ExprAST> ParseBin
 
     // Okay, we know this is a binop.
     int BinOp = CurTok;
-    SourceLocation BinLoc = CurLoc;
     getNextToken(); // eat binop
 
     // Parse the unary expression after the binary operator.
@@ -725,8 +566,8 @@ static std::unique_ptr<ExprAST> ParseBin
     }
 
     // Merge LHS/RHS.
-    LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS),
-                                           std::move(RHS));
+    LHS =
+        llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), std::move(RHS));
   }
 }
 
@@ -748,8 +589,6 @@ static std::unique_ptr<ExprAST> ParseExp
 static std::unique_ptr<PrototypeAST> ParsePrototype() {
   std::string FnName;
 
-  SourceLocation FnLoc = CurLoc;
-
   unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
   unsigned BinaryPrecedence = 30;
 
@@ -805,7 +644,7 @@ static std::unique_ptr<PrototypeAST> Par
   if (Kind && ArgNames.size() != Kind)
     return LogErrorP("Invalid number of operands for operator");
 
-  return llvm::make_unique<PrototypeAST>(FnLoc, FnName, ArgNames, Kind != 0,
+  return llvm::make_unique<PrototypeAST>(FnName, ArgNames, Kind != 0,
                                          BinaryPrecedence);
 }
 
@@ -823,10 +662,9 @@ static std::unique_ptr<FunctionAST> Pars
 
 /// toplevelexpr ::= expression
 static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
-  SourceLocation FnLoc = CurLoc;
   if (auto E = ParseExpression()) {
     // Make an anonymous proto.
-    auto Proto = llvm::make_unique<PrototypeAST>(FnLoc, "__anon_expr",
+    auto Proto = llvm::make_unique<PrototypeAST>("__anon_expr",
                                                  std::vector<std::string>());
     return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
   }
@@ -840,51 +678,13 @@ static std::unique_ptr<PrototypeAST> Par
 }
 
 //===----------------------------------------------------------------------===//
-// Debug Info Support
-//===----------------------------------------------------------------------===//
-
-static std::unique_ptr<DIBuilder> DBuilder;
-
-DIType *DebugInfo::getDoubleTy() {
-  if (DblTy)
-    return DblTy;
-
-  DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);
-  return DblTy;
-}
-
-void DebugInfo::emitLocation(ExprAST *AST) {
-  if (!AST)
-    return Builder.SetCurrentDebugLocation(DebugLoc());
-  DIScope *Scope;
-  if (LexicalBlocks.empty())
-    Scope = TheCU;
-  else
-    Scope = LexicalBlocks.back();
-  Builder.SetCurrentDebugLocation(
-      DebugLoc::get(AST->getLine(), AST->getCol(), Scope));
-}
-
-static DISubroutineType *CreateFunctionType(unsigned NumArgs, DIFile *Unit) {
-  SmallVector<Metadata *, 8> EltTys;
-  DIType *DblTy = KSDbgInfo.getDoubleTy();
-
-  // Add the result type.
-  EltTys.push_back(DblTy);
-
-  for (unsigned i = 0, e = NumArgs; i != e; ++i)
-    EltTys.push_back(DblTy);
-
-  return DBuilder->createSubroutineType(DBuilder->getOrCreateTypeArray(EltTys));
-}
-
-//===----------------------------------------------------------------------===//
 // Code Generation
 //===----------------------------------------------------------------------===//
 
+static LLVMContext TheContext;
+static IRBuilder<> Builder(TheContext);
 static std::unique_ptr<Module> TheModule;
 static std::map<std::string, AllocaInst *> NamedValues;
-static std::unique_ptr<KaleidoscopeJIT> TheJIT;
 static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos;
 
 Value *LogErrorV(const char *Str) {
@@ -917,7 +717,6 @@ static AllocaInst *CreateEntryBlockAlloc
 }
 
 Value *NumberExprAST::codegen() {
-  KSDbgInfo.emitLocation(this);
   return ConstantFP::get(TheContext, APFloat(Val));
 }
 
@@ -927,7 +726,6 @@ Value *VariableExprAST::codegen() {
   if (!V)
     return LogErrorV("Unknown variable name");
 
-  KSDbgInfo.emitLocation(this);
   // Load the value.
   return Builder.CreateLoad(V, Name.c_str());
 }
@@ -941,13 +739,10 @@ Value *UnaryExprAST::codegen() {
   if (!F)
     return LogErrorV("Unknown unary operator");
 
-  KSDbgInfo.emitLocation(this);
   return Builder.CreateCall(F, OperandV, "unop");
 }
 
 Value *BinaryExprAST::codegen() {
-  KSDbgInfo.emitLocation(this);
-
   // Special case '=' because we don't want to emit the LHS as an expression.
   if (Op == '=') {
     // Assignment requires the LHS to be an identifier.
@@ -1001,8 +796,6 @@ Value *BinaryExprAST::codegen() {
 }
 
 Value *CallExprAST::codegen() {
-  KSDbgInfo.emitLocation(this);
-
   // Look up the name in the global module table.
   Function *CalleeF = getFunction(Callee);
   if (!CalleeF)
@@ -1023,8 +816,6 @@ Value *CallExprAST::codegen() {
 }
 
 Value *IfExprAST::codegen() {
-  KSDbgInfo.emitLocation(this);
-
   Value *CondV = Cond->codegen();
   if (!CondV)
     return nullptr;
@@ -1101,8 +892,6 @@ Value *ForExprAST::codegen() {
   // Create an alloca for the variable in the entry block.
   AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
 
-  KSDbgInfo.emitLocation(this);
-
   // Emit the start code first, without 'variable' in scope.
   Value *StartVal = Start->codegen();
   if (!StartVal)
@@ -1213,8 +1002,6 @@ Value *VarExprAST::codegen() {
     NamedValues[VarName] = Alloca;
   }
 
-  KSDbgInfo.emitLocation(this);
-
   // Codegen the body, now that all vars are in scope.
   Value *BodyVal = Body->codegen();
   if (!BodyVal)
@@ -1262,43 +1049,12 @@ Function *FunctionAST::codegen() {
   BasicBlock *BB = BasicBlock::Create(TheContext, "entry", TheFunction);
   Builder.SetInsertPoint(BB);
 
-  // Create a subprogram DIE for this function.
-  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
-                                      KSDbgInfo.TheCU->getDirectory());
-  DIScope *FContext = Unit;
-  unsigned LineNo = P.getLine();
-  unsigned ScopeLine = LineNo;
-  DISubprogram *SP = DBuilder->createFunction(
-      FContext, P.getName(), StringRef(), Unit, LineNo,
-      CreateFunctionType(TheFunction->arg_size(), Unit),
-      false /* internal linkage */, true /* definition */, ScopeLine,
-      DINode::FlagPrototyped, false);
-  TheFunction->setSubprogram(SP);
-
-  // Push the current scope.
-  KSDbgInfo.LexicalBlocks.push_back(SP);
-
-  // Unset the location for the prologue emission (leading instructions with no
-  // location in a function are considered part of the prologue and the debugger
-  // will run past them when breaking on a function)
-  KSDbgInfo.emitLocation(nullptr);
-
   // Record the function arguments in the NamedValues map.
   NamedValues.clear();
-  unsigned ArgIdx = 0;
   for (auto &Arg : TheFunction->args()) {
     // Create an alloca for this variable.
     AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName());
 
-    // Create a debug descriptor for the variable.
-    DILocalVariable *D = DBuilder->createParameterVariable(
-        SP, Arg.getName(), ++ArgIdx, Unit, LineNo, KSDbgInfo.getDoubleTy(),
-        true);
-
-    DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),
-                            DebugLoc::get(LineNo, 0, SP),
-                            Builder.GetInsertBlock());
-
     // Store the initial value into the alloca.
     Builder.CreateStore(&Arg, Alloca);
 
@@ -1306,15 +1062,10 @@ Function *FunctionAST::codegen() {
     NamedValues[Arg.getName()] = Alloca;
   }
 
-  KSDbgInfo.emitLocation(Body.get());
-
   if (Value *RetVal = Body->codegen()) {
     // Finish off the function.
     Builder.CreateRet(RetVal);
 
-    // Pop off the lexical block for the function.
-    KSDbgInfo.LexicalBlocks.pop_back();
-
     // Validate the generated code, checking for consistency.
     verifyFunction(*TheFunction);
 
@@ -1326,11 +1077,6 @@ Function *FunctionAST::codegen() {
 
   if (P.isBinaryOp())
     BinopPrecedence.erase(Proto->getOperatorName());
-
-  // Pop off the lexical block for the function since we added it
-  // unconditionally.
-  KSDbgInfo.LexicalBlocks.pop_back();
-
   return nullptr;
 }
 
@@ -1338,16 +1084,17 @@ Function *FunctionAST::codegen() {
 // Top-Level parsing and JIT Driver
 //===----------------------------------------------------------------------===//
 
-static void InitializeModule() {
+static void InitializeModuleAndPassManager() {
   // Open a new module.
   TheModule = llvm::make_unique<Module>("my cool jit", TheContext);
-  TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());
 }
 
 static void HandleDefinition() {
   if (auto FnAST = ParseDefinition()) {
-    if (!FnAST->codegen())
-      fprintf(stderr, "Error reading function definition:");
+    if (auto *FnIR = FnAST->codegen()) {
+      fprintf(stderr, "Read function definition:");
+      FnIR->dump();
+    }
   } else {
     // Skip token for error recovery.
     getNextToken();
@@ -1356,10 +1103,11 @@ static void HandleDefinition() {
 
 static void HandleExtern() {
   if (auto ProtoAST = ParseExtern()) {
-    if (!ProtoAST->codegen())
-      fprintf(stderr, "Error reading extern");
-    else
+    if (auto *FnIR = ProtoAST->codegen()) {
+      fprintf(stderr, "Read extern: ");
+      FnIR->dump();
       FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);
+    }
   } else {
     // Skip token for error recovery.
     getNextToken();
@@ -1369,9 +1117,7 @@ static void HandleExtern() {
 static void HandleTopLevelExpression() {
   // Evaluate a top-level expression into an anonymous function.
   if (auto FnAST = ParseTopLevelExpr()) {
-    if (!FnAST->codegen()) {
-      fprintf(stderr, "Error generating code for top level expr");
-    }
+    FnAST->codegen();
   } else {
     // Skip token for error recovery.
     getNextToken();
@@ -1421,50 +1167,74 @@ extern "C" double printd(double X) {
 //===----------------------------------------------------------------------===//
 
 int main() {
-  InitializeNativeTarget();
-  InitializeNativeTargetAsmPrinter();
-  InitializeNativeTargetAsmParser();
-
   // Install standard binary operators.
   // 1 is lowest precedence.
-  BinopPrecedence['='] = 2;
   BinopPrecedence['<'] = 10;
   BinopPrecedence['+'] = 20;
   BinopPrecedence['-'] = 20;
   BinopPrecedence['*'] = 40; // highest.
 
   // Prime the first token.
+  fprintf(stderr, "ready> ");
   getNextToken();
 
-  TheJIT = llvm::make_unique<KaleidoscopeJIT>();
-
-  InitializeModule();
-
-  // Add the current debug info version into the module.
-  TheModule->addModuleFlag(Module::Warning, "Debug Info Version",
-                           DEBUG_METADATA_VERSION);
-
-  // Darwin only supports dwarf2.
-  if (Triple(sys::getProcessTriple()).isOSDarwin())
-    TheModule->addModuleFlag(llvm::Module::Warning, "Dwarf Version", 2);
-
-  // Construct the DIBuilder, we do this here because we need the module.
-  DBuilder = llvm::make_unique<DIBuilder>(*TheModule);
-
-  // Create the compile unit for the module.
-  // Currently down as "fib.ks" as a filename since we're redirecting stdin
-  // but we'd like actual source locations.
-  KSDbgInfo.TheCU = DBuilder->createCompileUnit(
-      dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", false, "", 0);
+  InitializeModuleAndPassManager();
 
   // Run the main "interpreter loop" now.
   MainLoop();
 
-  // Finalize the debug info.
-  DBuilder->finalize();
+  // Initialize the target registry etc.
+  InitializeAllTargetInfos();
+  InitializeAllTargets();
+  InitializeAllTargetMCs();
+  InitializeAllAsmParsers();
+  InitializeAllAsmPrinters();
+
+  auto TargetTriple = sys::getDefaultTargetTriple();
+  TheModule->setTargetTriple(TargetTriple);
+
+  std::string Error;
+  auto Target = TargetRegistry::lookupTarget(TargetTriple, Error);
+
+  // Print an error and exit if we couldn't find the requested target.
+  // This generally occurs if we've forgotten to initialise the
+  // TargetRegistry or we have a bogus target triple.
+  if (!Target) {
+    errs() << Error;
+    return 1;
+  }
+
+  auto CPU = "generic";
+  auto Features = "";
+
+  TargetOptions opt;
+  auto RM = Optional<Reloc::Model>();
+  auto TheTargetMachine =
+      Target->createTargetMachine(TargetTriple, CPU, Features, opt, RM);
+
+  TheModule->setDataLayout(TheTargetMachine->createDataLayout());
+
+  auto Filename = "output.o";
+  std::error_code EC;
+  raw_fd_ostream dest(Filename, EC, sys::fs::F_None);
+
+  if (EC) {
+    errs() << "Could not open file: " << EC.message();
+    return 1;
+  }
+
+  legacy::PassManager pass;
+  auto FileType = TargetMachine::CGFT_ObjectFile;
+
+  if (TheTargetMachine->addPassesToEmitFile(pass, dest, FileType)) {
+    errs() << "TheTargetMachine can't emit a file of this type";
+    return 1;
+  }
+
+  pass.run(*TheModule);
+  dest.flush();
 
-  // Print out all of the generated code.
-  TheModule->dump();
+  outs() << "Wrote " << Filename << "\n";
 
   return 0;
 }

Added: llvm/trunk/examples/Kaleidoscope/Chapter9/CMakeLists.txt
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/examples/Kaleidoscope/Chapter9/CMakeLists.txt?rev=274441&view=auto
==============================================================================
--- llvm/trunk/examples/Kaleidoscope/Chapter9/CMakeLists.txt (added)
+++ llvm/trunk/examples/Kaleidoscope/Chapter9/CMakeLists.txt Sat Jul  2 12:01:59 2016
@@ -0,0 +1,13 @@
+set(LLVM_LINK_COMPONENTS
+  Core
+  ExecutionEngine
+  Object
+  Support
+  native
+  )
+
+add_kaleidoscope_chapter(Kaleidoscope-Ch9
+  toy.cpp
+  )
+
+export_executable_symbols(Kaleidoscope-Ch9)

Added: llvm/trunk/examples/Kaleidoscope/Chapter9/toy.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/examples/Kaleidoscope/Chapter9/toy.cpp?rev=274441&view=auto
==============================================================================
--- llvm/trunk/examples/Kaleidoscope/Chapter9/toy.cpp (added)
+++ llvm/trunk/examples/Kaleidoscope/Chapter9/toy.cpp Sat Jul  2 12:01:59 2016
@@ -0,0 +1,1445 @@
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/Analysis/BasicAliasAnalysis.h"
+#include "llvm/Analysis/Passes.h"
+#include "llvm/IR/DIBuilder.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/LegacyPassManager.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/Verifier.h"
+#include "llvm/Support/TargetSelect.h"
+#include "llvm/Transforms/Scalar.h"
+#include <cctype>
+#include <cstdio>
+#include <map>
+#include <string>
+#include <vector>
+#include "../include/KaleidoscopeJIT.h"
+
+using namespace llvm;
+using namespace llvm::orc;
+
+//===----------------------------------------------------------------------===//
+// Lexer
+//===----------------------------------------------------------------------===//
+
+// The lexer returns tokens [0-255] if it is an unknown character, otherwise one
+// of these for known things.
+enum Token {
+  tok_eof = -1,
+
+  // commands
+  tok_def = -2,
+  tok_extern = -3,
+
+  // primary
+  tok_identifier = -4,
+  tok_number = -5,
+
+  // control
+  tok_if = -6,
+  tok_then = -7,
+  tok_else = -8,
+  tok_for = -9,
+  tok_in = -10,
+
+  // operators
+  tok_binary = -11,
+  tok_unary = -12,
+
+  // var definition
+  tok_var = -13
+};
+
+std::string getTokName(int Tok) {
+  switch (Tok) {
+  case tok_eof:
+    return "eof";
+  case tok_def:
+    return "def";
+  case tok_extern:
+    return "extern";
+  case tok_identifier:
+    return "identifier";
+  case tok_number:
+    return "number";
+  case tok_if:
+    return "if";
+  case tok_then:
+    return "then";
+  case tok_else:
+    return "else";
+  case tok_for:
+    return "for";
+  case tok_in:
+    return "in";
+  case tok_binary:
+    return "binary";
+  case tok_unary:
+    return "unary";
+  case tok_var:
+    return "var";
+  }
+  return std::string(1, (char)Tok);
+}
+
+namespace {
+class PrototypeAST;
+class ExprAST;
+}
+static LLVMContext TheContext;
+static IRBuilder<> Builder(TheContext);
+struct DebugInfo {
+  DICompileUnit *TheCU;
+  DIType *DblTy;
+  std::vector<DIScope *> LexicalBlocks;
+
+  void emitLocation(ExprAST *AST);
+  DIType *getDoubleTy();
+} KSDbgInfo;
+
+struct SourceLocation {
+  int Line;
+  int Col;
+};
+static SourceLocation CurLoc;
+static SourceLocation LexLoc = {1, 0};
+
+static int advance() {
+  int LastChar = getchar();
+
+  if (LastChar == '\n' || LastChar == '\r') {
+    LexLoc.Line++;
+    LexLoc.Col = 0;
+  } else
+    LexLoc.Col++;
+  return LastChar;
+}
+
+static std::string IdentifierStr; // Filled in if tok_identifier
+static double NumVal;             // Filled in if tok_number
+
+/// gettok - Return the next token from standard input.
+static int gettok() {
+  static int LastChar = ' ';
+
+  // Skip any whitespace.
+  while (isspace(LastChar))
+    LastChar = advance();
+
+  CurLoc = LexLoc;
+
+  if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
+    IdentifierStr = LastChar;
+    while (isalnum((LastChar = advance())))
+      IdentifierStr += LastChar;
+
+    if (IdentifierStr == "def")
+      return tok_def;
+    if (IdentifierStr == "extern")
+      return tok_extern;
+    if (IdentifierStr == "if")
+      return tok_if;
+    if (IdentifierStr == "then")
+      return tok_then;
+    if (IdentifierStr == "else")
+      return tok_else;
+    if (IdentifierStr == "for")
+      return tok_for;
+    if (IdentifierStr == "in")
+      return tok_in;
+    if (IdentifierStr == "binary")
+      return tok_binary;
+    if (IdentifierStr == "unary")
+      return tok_unary;
+    if (IdentifierStr == "var")
+      return tok_var;
+    return tok_identifier;
+  }
+
+  if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+
+    std::string NumStr;
+    do {
+      NumStr += LastChar;
+      LastChar = advance();
+    } while (isdigit(LastChar) || LastChar == '.');
+
+    NumVal = strtod(NumStr.c_str(), nullptr);
+    return tok_number;
+  }
+
+  if (LastChar == '#') {
+    // Comment until end of line.
+    do
+      LastChar = advance();
+    while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
+
+    if (LastChar != EOF)
+      return gettok();
+  }
+
+  // Check for end of file.  Don't eat the EOF.
+  if (LastChar == EOF)
+    return tok_eof;
+
+  // Otherwise, just return the character as its ascii value.
+  int ThisChar = LastChar;
+  LastChar = advance();
+  return ThisChar;
+}
+
+//===----------------------------------------------------------------------===//
+// Abstract Syntax Tree (aka Parse Tree)
+//===----------------------------------------------------------------------===//
+namespace {
+
+raw_ostream &indent(raw_ostream &O, int size) {
+  return O << std::string(size, ' ');
+}
+
+/// ExprAST - Base class for all expression nodes.
+class ExprAST {
+  SourceLocation Loc;
+
+public:
+  ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
+  virtual ~ExprAST() {}
+  virtual Value *codegen() = 0;
+  int getLine() const { return Loc.Line; }
+  int getCol() const { return Loc.Col; }
+  virtual raw_ostream &dump(raw_ostream &out, int ind) {
+    return out << ':' << getLine() << ':' << getCol() << '\n';
+  }
+};
+
+/// NumberExprAST - Expression class for numeric literals like "1.0".
+class NumberExprAST : public ExprAST {
+  double Val;
+
+public:
+  NumberExprAST(double Val) : Val(Val) {}
+  raw_ostream &dump(raw_ostream &out, int ind) override {
+    return ExprAST::dump(out << Val, ind);
+  }
+  Value *codegen() override;
+};
+
+/// VariableExprAST - Expression class for referencing a variable, like "a".
+class VariableExprAST : public ExprAST {
+  std::string Name;
+
+public:
+  VariableExprAST(SourceLocation Loc, const std::string &Name)
+      : ExprAST(Loc), Name(Name) {}
+  const std::string &getName() const { return Name; }
+  Value *codegen() override;
+  raw_ostream &dump(raw_ostream &out, int ind) override {
+    return ExprAST::dump(out << Name, ind);
+  }
+};
+
+/// UnaryExprAST - Expression class for a unary operator.
+class UnaryExprAST : public ExprAST {
+  char Opcode;
+  std::unique_ptr<ExprAST> Operand;
+
+public:
+  UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)
+      : Opcode(Opcode), Operand(std::move(Operand)) {}
+  Value *codegen() override;
+  raw_ostream &dump(raw_ostream &out, int ind) override {
+    ExprAST::dump(out << "unary" << Opcode, ind);
+    Operand->dump(out, ind + 1);
+    return out;
+  }
+};
+
+/// BinaryExprAST - Expression class for a binary operator.
+class BinaryExprAST : public ExprAST {
+  char Op;
+  std::unique_ptr<ExprAST> LHS, RHS;
+
+public:
+  BinaryExprAST(SourceLocation Loc, char Op, std::unique_ptr<ExprAST> LHS,
+                std::unique_ptr<ExprAST> RHS)
+      : ExprAST(Loc), Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
+  Value *codegen() override;
+  raw_ostream &dump(raw_ostream &out, int ind) override {
+    ExprAST::dump(out << "binary" << Op, ind);
+    LHS->dump(indent(out, ind) << "LHS:", ind + 1);
+    RHS->dump(indent(out, ind) << "RHS:", ind + 1);
+    return out;
+  }
+};
+
+/// CallExprAST - Expression class for function calls.
+class CallExprAST : public ExprAST {
+  std::string Callee;
+  std::vector<std::unique_ptr<ExprAST>> Args;
+
+public:
+  CallExprAST(SourceLocation Loc, const std::string &Callee,
+              std::vector<std::unique_ptr<ExprAST>> Args)
+      : ExprAST(Loc), Callee(Callee), Args(std::move(Args)) {}
+  Value *codegen() override;
+  raw_ostream &dump(raw_ostream &out, int ind) override {
+    ExprAST::dump(out << "call " << Callee, ind);
+    for (const auto &Arg : Args)
+      Arg->dump(indent(out, ind + 1), ind + 1);
+    return out;
+  }
+};
+
+/// IfExprAST - Expression class for if/then/else.
+class IfExprAST : public ExprAST {
+  std::unique_ptr<ExprAST> Cond, Then, Else;
+
+public:
+  IfExprAST(SourceLocation Loc, std::unique_ptr<ExprAST> Cond,
+            std::unique_ptr<ExprAST> Then, std::unique_ptr<ExprAST> Else)
+      : ExprAST(Loc), Cond(std::move(Cond)), Then(std::move(Then)),
+        Else(std::move(Else)) {}
+  Value *codegen() override;
+  raw_ostream &dump(raw_ostream &out, int ind) override {
+    ExprAST::dump(out << "if", ind);
+    Cond->dump(indent(out, ind) << "Cond:", ind + 1);
+    Then->dump(indent(out, ind) << "Then:", ind + 1);
+    Else->dump(indent(out, ind) << "Else:", ind + 1);
+    return out;
+  }
+};
+
+/// ForExprAST - Expression class for for/in.
+class ForExprAST : public ExprAST {
+  std::string VarName;
+  std::unique_ptr<ExprAST> Start, End, Step, Body;
+
+public:
+  ForExprAST(const std::string &VarName, std::unique_ptr<ExprAST> Start,
+             std::unique_ptr<ExprAST> End, std::unique_ptr<ExprAST> Step,
+             std::unique_ptr<ExprAST> Body)
+      : VarName(VarName), Start(std::move(Start)), End(std::move(End)),
+        Step(std::move(Step)), Body(std::move(Body)) {}
+  Value *codegen() override;
+  raw_ostream &dump(raw_ostream &out, int ind) override {
+    ExprAST::dump(out << "for", ind);
+    Start->dump(indent(out, ind) << "Cond:", ind + 1);
+    End->dump(indent(out, ind) << "End:", ind + 1);
+    Step->dump(indent(out, ind) << "Step:", ind + 1);
+    Body->dump(indent(out, ind) << "Body:", ind + 1);
+    return out;
+  }
+};
+
+/// VarExprAST - Expression class for var/in
+class VarExprAST : public ExprAST {
+  std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;
+  std::unique_ptr<ExprAST> Body;
+
+public:
+  VarExprAST(
+      std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames,
+      std::unique_ptr<ExprAST> Body)
+      : VarNames(std::move(VarNames)), Body(std::move(Body)) {}
+  Value *codegen() override;
+  raw_ostream &dump(raw_ostream &out, int ind) override {
+    ExprAST::dump(out << "var", ind);
+    for (const auto &NamedVar : VarNames)
+      NamedVar.second->dump(indent(out, ind) << NamedVar.first << ':', ind + 1);
+    Body->dump(indent(out, ind) << "Body:", ind + 1);
+    return out;
+  }
+};
+
+/// PrototypeAST - This class represents the "prototype" for a function,
+/// which captures its name, and its argument names (thus implicitly the number
+/// of arguments the function takes), as well as if it is an operator.
+class PrototypeAST {
+  std::string Name;
+  std::vector<std::string> Args;
+  bool IsOperator;
+  unsigned Precedence; // Precedence if a binary op.
+  int Line;
+
+public:
+  PrototypeAST(SourceLocation Loc, const std::string &Name,
+               std::vector<std::string> Args, bool IsOperator = false,
+               unsigned Prec = 0)
+      : Name(Name), Args(std::move(Args)), IsOperator(IsOperator),
+        Precedence(Prec), Line(Loc.Line) {}
+  Function *codegen();
+  const std::string &getName() const { return Name; }
+
+  bool isUnaryOp() const { return IsOperator && Args.size() == 1; }
+  bool isBinaryOp() const { return IsOperator && Args.size() == 2; }
+
+  char getOperatorName() const {
+    assert(isUnaryOp() || isBinaryOp());
+    return Name[Name.size() - 1];
+  }
+
+  unsigned getBinaryPrecedence() const { return Precedence; }
+  int getLine() const { return Line; }
+};
+
+/// FunctionAST - This class represents a function definition itself.
+class FunctionAST {
+  std::unique_ptr<PrototypeAST> Proto;
+  std::unique_ptr<ExprAST> Body;
+
+public:
+  FunctionAST(std::unique_ptr<PrototypeAST> Proto,
+              std::unique_ptr<ExprAST> Body)
+      : Proto(std::move(Proto)), Body(std::move(Body)) {}
+  Function *codegen();
+  raw_ostream &dump(raw_ostream &out, int ind) {
+    indent(out, ind) << "FunctionAST\n";
+    ++ind;
+    indent(out, ind) << "Body:";
+    return Body ? Body->dump(out, ind) : out << "null\n";
+  }
+};
+} // end anonymous namespace
+
+//===----------------------------------------------------------------------===//
+// Parser
+//===----------------------------------------------------------------------===//
+
+/// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
+/// token the parser is looking at.  getNextToken reads another token from the
+/// lexer and updates CurTok with its results.
+static int CurTok;
+static int getNextToken() { return CurTok = gettok(); }
+
+/// BinopPrecedence - This holds the precedence for each binary operator that is
+/// defined.
+static std::map<char, int> BinopPrecedence;
+
+/// GetTokPrecedence - Get the precedence of the pending binary operator token.
+static int GetTokPrecedence() {
+  if (!isascii(CurTok))
+    return -1;
+
+  // Make sure it's a declared binop.
+  int TokPrec = BinopPrecedence[CurTok];
+  if (TokPrec <= 0)
+    return -1;
+  return TokPrec;
+}
+
+/// LogError* - These are little helper functions for error handling.
+std::unique_ptr<ExprAST> LogError(const char *Str) {
+  fprintf(stderr, "Error: %s\n", Str);
+  return nullptr;
+}
+
+std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) {
+  LogError(Str);
+  return nullptr;
+}
+
+static std::unique_ptr<ExprAST> ParseExpression();
+
+/// numberexpr ::= number
+static std::unique_ptr<ExprAST> ParseNumberExpr() {
+  auto Result = llvm::make_unique<NumberExprAST>(NumVal);
+  getNextToken(); // consume the number
+  return std::move(Result);
+}
+
+/// parenexpr ::= '(' expression ')'
+static std::unique_ptr<ExprAST> ParseParenExpr() {
+  getNextToken(); // eat (.
+  auto V = ParseExpression();
+  if (!V)
+    return nullptr;
+
+  if (CurTok != ')')
+    return LogError("expected ')'");
+  getNextToken(); // eat ).
+  return V;
+}
+
+/// identifierexpr
+///   ::= identifier
+///   ::= identifier '(' expression* ')'
+static std::unique_ptr<ExprAST> ParseIdentifierExpr() {
+  std::string IdName = IdentifierStr;
+
+  SourceLocation LitLoc = CurLoc;
+
+  getNextToken(); // eat identifier.
+
+  if (CurTok != '(') // Simple variable ref.
+    return llvm::make_unique<VariableExprAST>(LitLoc, IdName);
+
+  // Call.
+  getNextToken(); // eat (
+  std::vector<std::unique_ptr<ExprAST>> Args;
+  if (CurTok != ')') {
+    while (1) {
+      if (auto Arg = ParseExpression())
+        Args.push_back(std::move(Arg));
+      else
+        return nullptr;
+
+      if (CurTok == ')')
+        break;
+
+      if (CurTok != ',')
+        return LogError("Expected ')' or ',' in argument list");
+      getNextToken();
+    }
+  }
+
+  // Eat the ')'.
+  getNextToken();
+
+  return llvm::make_unique<CallExprAST>(LitLoc, IdName, std::move(Args));
+}
+
+/// ifexpr ::= 'if' expression 'then' expression 'else' expression
+static std::unique_ptr<ExprAST> ParseIfExpr() {
+  SourceLocation IfLoc = CurLoc;
+
+  getNextToken(); // eat the if.
+
+  // condition.
+  auto Cond = ParseExpression();
+  if (!Cond)
+    return nullptr;
+
+  if (CurTok != tok_then)
+    return LogError("expected then");
+  getNextToken(); // eat the then
+
+  auto Then = ParseExpression();
+  if (!Then)
+    return nullptr;
+
+  if (CurTok != tok_else)
+    return LogError("expected else");
+
+  getNextToken();
+
+  auto Else = ParseExpression();
+  if (!Else)
+    return nullptr;
+
+  return llvm::make_unique<IfExprAST>(IfLoc, std::move(Cond), std::move(Then),
+                                      std::move(Else));
+}
+
+/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
+static std::unique_ptr<ExprAST> ParseForExpr() {
+  getNextToken(); // eat the for.
+
+  if (CurTok != tok_identifier)
+    return LogError("expected identifier after for");
+
+  std::string IdName = IdentifierStr;
+  getNextToken(); // eat identifier.
+
+  if (CurTok != '=')
+    return LogError("expected '=' after for");
+  getNextToken(); // eat '='.
+
+  auto Start = ParseExpression();
+  if (!Start)
+    return nullptr;
+  if (CurTok != ',')
+    return LogError("expected ',' after for start value");
+  getNextToken();
+
+  auto End = ParseExpression();
+  if (!End)
+    return nullptr;
+
+  // The step value is optional.
+  std::unique_ptr<ExprAST> Step;
+  if (CurTok == ',') {
+    getNextToken();
+    Step = ParseExpression();
+    if (!Step)
+      return nullptr;
+  }
+
+  if (CurTok != tok_in)
+    return LogError("expected 'in' after for");
+  getNextToken(); // eat 'in'.
+
+  auto Body = ParseExpression();
+  if (!Body)
+    return nullptr;
+
+  return llvm::make_unique<ForExprAST>(IdName, std::move(Start), std::move(End),
+                                       std::move(Step), std::move(Body));
+}
+
+/// varexpr ::= 'var' identifier ('=' expression)?
+//                    (',' identifier ('=' expression)?)* 'in' expression
+static std::unique_ptr<ExprAST> ParseVarExpr() {
+  getNextToken(); // eat the var.
+
+  std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;
+
+  // At least one variable name is required.
+  if (CurTok != tok_identifier)
+    return LogError("expected identifier after var");
+
+  while (1) {
+    std::string Name = IdentifierStr;
+    getNextToken(); // eat identifier.
+
+    // Read the optional initializer.
+    std::unique_ptr<ExprAST> Init = nullptr;
+    if (CurTok == '=') {
+      getNextToken(); // eat the '='.
+
+      Init = ParseExpression();
+      if (!Init)
+        return nullptr;
+    }
+
+    VarNames.push_back(std::make_pair(Name, std::move(Init)));
+
+    // End of var list, exit loop.
+    if (CurTok != ',')
+      break;
+    getNextToken(); // eat the ','.
+
+    if (CurTok != tok_identifier)
+      return LogError("expected identifier list after var");
+  }
+
+  // At this point, we have to have 'in'.
+  if (CurTok != tok_in)
+    return LogError("expected 'in' keyword after 'var'");
+  getNextToken(); // eat 'in'.
+
+  auto Body = ParseExpression();
+  if (!Body)
+    return nullptr;
+
+  return llvm::make_unique<VarExprAST>(std::move(VarNames), std::move(Body));
+}
+
+/// primary
+///   ::= identifierexpr
+///   ::= numberexpr
+///   ::= parenexpr
+///   ::= ifexpr
+///   ::= forexpr
+///   ::= varexpr
+static std::unique_ptr<ExprAST> ParsePrimary() {
+  switch (CurTok) {
+  default:
+    return LogError("unknown token when expecting an expression");
+  case tok_identifier:
+    return ParseIdentifierExpr();
+  case tok_number:
+    return ParseNumberExpr();
+  case '(':
+    return ParseParenExpr();
+  case tok_if:
+    return ParseIfExpr();
+  case tok_for:
+    return ParseForExpr();
+  case tok_var:
+    return ParseVarExpr();
+  }
+}
+
+/// unary
+///   ::= primary
+///   ::= '!' unary
+static std::unique_ptr<ExprAST> ParseUnary() {
+  // If the current token is not an operator, it must be a primary expr.
+  if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
+    return ParsePrimary();
+
+  // If this is a unary operator, read it.
+  int Opc = CurTok;
+  getNextToken();
+  if (auto Operand = ParseUnary())
+    return llvm::make_unique<UnaryExprAST>(Opc, std::move(Operand));
+  return nullptr;
+}
+
+/// binoprhs
+///   ::= ('+' unary)*
+static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
+                                              std::unique_ptr<ExprAST> LHS) {
+  // If this is a binop, find its precedence.
+  while (1) {
+    int TokPrec = GetTokPrecedence();
+
+    // If this is a binop that binds at least as tightly as the current binop,
+    // consume it, otherwise we are done.
+    if (TokPrec < ExprPrec)
+      return LHS;
+
+    // Okay, we know this is a binop.
+    int BinOp = CurTok;
+    SourceLocation BinLoc = CurLoc;
+    getNextToken(); // eat binop
+
+    // Parse the unary expression after the binary operator.
+    auto RHS = ParseUnary();
+    if (!RHS)
+      return nullptr;
+
+    // If BinOp binds less tightly with RHS than the operator after RHS, let
+    // the pending operator take RHS as its LHS.
+    int NextPrec = GetTokPrecedence();
+    if (TokPrec < NextPrec) {
+      RHS = ParseBinOpRHS(TokPrec + 1, std::move(RHS));
+      if (!RHS)
+        return nullptr;
+    }
+
+    // Merge LHS/RHS.
+    LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS),
+                                           std::move(RHS));
+  }
+}
+
+/// expression
+///   ::= unary binoprhs
+///
+static std::unique_ptr<ExprAST> ParseExpression() {
+  auto LHS = ParseUnary();
+  if (!LHS)
+    return nullptr;
+
+  return ParseBinOpRHS(0, std::move(LHS));
+}
+
+/// prototype
+///   ::= id '(' id* ')'
+///   ::= binary LETTER number? (id, id)
+///   ::= unary LETTER (id)
+static std::unique_ptr<PrototypeAST> ParsePrototype() {
+  std::string FnName;
+
+  SourceLocation FnLoc = CurLoc;
+
+  unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
+  unsigned BinaryPrecedence = 30;
+
+  switch (CurTok) {
+  default:
+    return LogErrorP("Expected function name in prototype");
+  case tok_identifier:
+    FnName = IdentifierStr;
+    Kind = 0;
+    getNextToken();
+    break;
+  case tok_unary:
+    getNextToken();
+    if (!isascii(CurTok))
+      return LogErrorP("Expected unary operator");
+    FnName = "unary";
+    FnName += (char)CurTok;
+    Kind = 1;
+    getNextToken();
+    break;
+  case tok_binary:
+    getNextToken();
+    if (!isascii(CurTok))
+      return LogErrorP("Expected binary operator");
+    FnName = "binary";
+    FnName += (char)CurTok;
+    Kind = 2;
+    getNextToken();
+
+    // Read the precedence if present.
+    if (CurTok == tok_number) {
+      if (NumVal < 1 || NumVal > 100)
+        return LogErrorP("Invalid precedecnce: must be 1..100");
+      BinaryPrecedence = (unsigned)NumVal;
+      getNextToken();
+    }
+    break;
+  }
+
+  if (CurTok != '(')
+    return LogErrorP("Expected '(' in prototype");
+
+  std::vector<std::string> ArgNames;
+  while (getNextToken() == tok_identifier)
+    ArgNames.push_back(IdentifierStr);
+  if (CurTok != ')')
+    return LogErrorP("Expected ')' in prototype");
+
+  // success.
+  getNextToken(); // eat ')'.
+
+  // Verify right number of names for operator.
+  if (Kind && ArgNames.size() != Kind)
+    return LogErrorP("Invalid number of operands for operator");
+
+  return llvm::make_unique<PrototypeAST>(FnLoc, FnName, ArgNames, Kind != 0,
+                                         BinaryPrecedence);
+}
+
+/// definition ::= 'def' prototype expression
+static std::unique_ptr<FunctionAST> ParseDefinition() {
+  getNextToken(); // eat def.
+  auto Proto = ParsePrototype();
+  if (!Proto)
+    return nullptr;
+
+  if (auto E = ParseExpression())
+    return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
+  return nullptr;
+}
+
+/// toplevelexpr ::= expression
+static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
+  SourceLocation FnLoc = CurLoc;
+  if (auto E = ParseExpression()) {
+    // Make an anonymous proto.
+    auto Proto = llvm::make_unique<PrototypeAST>(FnLoc, "__anon_expr",
+                                                 std::vector<std::string>());
+    return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));
+  }
+  return nullptr;
+}
+
+/// external ::= 'extern' prototype
+static std::unique_ptr<PrototypeAST> ParseExtern() {
+  getNextToken(); // eat extern.
+  return ParsePrototype();
+}
+
+//===----------------------------------------------------------------------===//
+// Debug Info Support
+//===----------------------------------------------------------------------===//
+
+static std::unique_ptr<DIBuilder> DBuilder;
+
+DIType *DebugInfo::getDoubleTy() {
+  if (DblTy)
+    return DblTy;
+
+  DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);
+  return DblTy;
+}
+
+void DebugInfo::emitLocation(ExprAST *AST) {
+  if (!AST)
+    return Builder.SetCurrentDebugLocation(DebugLoc());
+  DIScope *Scope;
+  if (LexicalBlocks.empty())
+    Scope = TheCU;
+  else
+    Scope = LexicalBlocks.back();
+  Builder.SetCurrentDebugLocation(
+      DebugLoc::get(AST->getLine(), AST->getCol(), Scope));
+}
+
+static DISubroutineType *CreateFunctionType(unsigned NumArgs, DIFile *Unit) {
+  SmallVector<Metadata *, 8> EltTys;
+  DIType *DblTy = KSDbgInfo.getDoubleTy();
+
+  // Add the result type.
+  EltTys.push_back(DblTy);
+
+  for (unsigned i = 0, e = NumArgs; i != e; ++i)
+    EltTys.push_back(DblTy);
+
+  return DBuilder->createSubroutineType(DBuilder->getOrCreateTypeArray(EltTys));
+}
+
+//===----------------------------------------------------------------------===//
+// Code Generation
+//===----------------------------------------------------------------------===//
+
+static std::unique_ptr<Module> TheModule;
+static std::map<std::string, AllocaInst *> NamedValues;
+static std::unique_ptr<KaleidoscopeJIT> TheJIT;
+static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos;
+
+Value *LogErrorV(const char *Str) {
+  LogError(Str);
+  return nullptr;
+}
+
+Function *getFunction(std::string Name) {
+  // First, see if the function has already been added to the current module.
+  if (auto *F = TheModule->getFunction(Name))
+    return F;
+
+  // If not, check whether we can codegen the declaration from some existing
+  // prototype.
+  auto FI = FunctionProtos.find(Name);
+  if (FI != FunctionProtos.end())
+    return FI->second->codegen();
+
+  // If no existing prototype exists, return null.
+  return nullptr;
+}
+
+/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
+/// the function.  This is used for mutable variables etc.
+static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
+                                          const std::string &VarName) {
+  IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
+                   TheFunction->getEntryBlock().begin());
+  return TmpB.CreateAlloca(Type::getDoubleTy(TheContext), nullptr,
+                           VarName.c_str());
+}
+
+Value *NumberExprAST::codegen() {
+  KSDbgInfo.emitLocation(this);
+  return ConstantFP::get(TheContext, APFloat(Val));
+}
+
+Value *VariableExprAST::codegen() {
+  // Look this variable up in the function.
+  Value *V = NamedValues[Name];
+  if (!V)
+    return LogErrorV("Unknown variable name");
+
+  KSDbgInfo.emitLocation(this);
+  // Load the value.
+  return Builder.CreateLoad(V, Name.c_str());
+}
+
+Value *UnaryExprAST::codegen() {
+  Value *OperandV = Operand->codegen();
+  if (!OperandV)
+    return nullptr;
+
+  Function *F = getFunction(std::string("unary") + Opcode);
+  if (!F)
+    return LogErrorV("Unknown unary operator");
+
+  KSDbgInfo.emitLocation(this);
+  return Builder.CreateCall(F, OperandV, "unop");
+}
+
+Value *BinaryExprAST::codegen() {
+  KSDbgInfo.emitLocation(this);
+
+  // Special case '=' because we don't want to emit the LHS as an expression.
+  if (Op == '=') {
+    // Assignment requires the LHS to be an identifier.
+    // This assume we're building without RTTI because LLVM builds that way by
+    // default.  If you build LLVM with RTTI this can be changed to a
+    // dynamic_cast for automatic error checking.
+    VariableExprAST *LHSE = static_cast<VariableExprAST *>(LHS.get());
+    if (!LHSE)
+      return LogErrorV("destination of '=' must be a variable");
+    // Codegen the RHS.
+    Value *Val = RHS->codegen();
+    if (!Val)
+      return nullptr;
+
+    // Look up the name.
+    Value *Variable = NamedValues[LHSE->getName()];
+    if (!Variable)
+      return LogErrorV("Unknown variable name");
+
+    Builder.CreateStore(Val, Variable);
+    return Val;
+  }
+
+  Value *L = LHS->codegen();
+  Value *R = RHS->codegen();
+  if (!L || !R)
+    return nullptr;
+
+  switch (Op) {
+  case '+':
+    return Builder.CreateFAdd(L, R, "addtmp");
+  case '-':
+    return Builder.CreateFSub(L, R, "subtmp");
+  case '*':
+    return Builder.CreateFMul(L, R, "multmp");
+  case '<':
+    L = Builder.CreateFCmpULT(L, R, "cmptmp");
+    // Convert bool 0/1 to double 0.0 or 1.0
+    return Builder.CreateUIToFP(L, Type::getDoubleTy(TheContext), "booltmp");
+  default:
+    break;
+  }
+
+  // If it wasn't a builtin binary operator, it must be a user defined one. Emit
+  // a call to it.
+  Function *F = getFunction(std::string("binary") + Op);
+  assert(F && "binary operator not found!");
+
+  Value *Ops[] = {L, R};
+  return Builder.CreateCall(F, Ops, "binop");
+}
+
+Value *CallExprAST::codegen() {
+  KSDbgInfo.emitLocation(this);
+
+  // Look up the name in the global module table.
+  Function *CalleeF = getFunction(Callee);
+  if (!CalleeF)
+    return LogErrorV("Unknown function referenced");
+
+  // If argument mismatch error.
+  if (CalleeF->arg_size() != Args.size())
+    return LogErrorV("Incorrect # arguments passed");
+
+  std::vector<Value *> ArgsV;
+  for (unsigned i = 0, e = Args.size(); i != e; ++i) {
+    ArgsV.push_back(Args[i]->codegen());
+    if (!ArgsV.back())
+      return nullptr;
+  }
+
+  return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
+}
+
+Value *IfExprAST::codegen() {
+  KSDbgInfo.emitLocation(this);
+
+  Value *CondV = Cond->codegen();
+  if (!CondV)
+    return nullptr;
+
+  // Convert condition to a bool by comparing equal to 0.0.
+  CondV = Builder.CreateFCmpONE(
+      CondV, ConstantFP::get(TheContext, APFloat(0.0)), "ifcond");
+
+  Function *TheFunction = Builder.GetInsertBlock()->getParent();
+
+  // Create blocks for the then and else cases.  Insert the 'then' block at the
+  // end of the function.
+  BasicBlock *ThenBB = BasicBlock::Create(TheContext, "then", TheFunction);
+  BasicBlock *ElseBB = BasicBlock::Create(TheContext, "else");
+  BasicBlock *MergeBB = BasicBlock::Create(TheContext, "ifcont");
+
+  Builder.CreateCondBr(CondV, ThenBB, ElseBB);
+
+  // Emit then value.
+  Builder.SetInsertPoint(ThenBB);
+
+  Value *ThenV = Then->codegen();
+  if (!ThenV)
+    return nullptr;
+
+  Builder.CreateBr(MergeBB);
+  // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
+  ThenBB = Builder.GetInsertBlock();
+
+  // Emit else block.
+  TheFunction->getBasicBlockList().push_back(ElseBB);
+  Builder.SetInsertPoint(ElseBB);
+
+  Value *ElseV = Else->codegen();
+  if (!ElseV)
+    return nullptr;
+
+  Builder.CreateBr(MergeBB);
+  // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
+  ElseBB = Builder.GetInsertBlock();
+
+  // Emit merge block.
+  TheFunction->getBasicBlockList().push_back(MergeBB);
+  Builder.SetInsertPoint(MergeBB);
+  PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(TheContext), 2, "iftmp");
+
+  PN->addIncoming(ThenV, ThenBB);
+  PN->addIncoming(ElseV, ElseBB);
+  return PN;
+}
+
+// Output for-loop as:
+//   var = alloca double
+//   ...
+//   start = startexpr
+//   store start -> var
+//   goto loop
+// loop:
+//   ...
+//   bodyexpr
+//   ...
+// loopend:
+//   step = stepexpr
+//   endcond = endexpr
+//
+//   curvar = load var
+//   nextvar = curvar + step
+//   store nextvar -> var
+//   br endcond, loop, endloop
+// outloop:
+Value *ForExprAST::codegen() {
+  Function *TheFunction = Builder.GetInsertBlock()->getParent();
+
+  // Create an alloca for the variable in the entry block.
+  AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
+
+  KSDbgInfo.emitLocation(this);
+
+  // Emit the start code first, without 'variable' in scope.
+  Value *StartVal = Start->codegen();
+  if (!StartVal)
+    return nullptr;
+
+  // Store the value into the alloca.
+  Builder.CreateStore(StartVal, Alloca);
+
+  // Make the new basic block for the loop header, inserting after current
+  // block.
+  BasicBlock *LoopBB = BasicBlock::Create(TheContext, "loop", TheFunction);
+
+  // Insert an explicit fall through from the current block to the LoopBB.
+  Builder.CreateBr(LoopBB);
+
+  // Start insertion in LoopBB.
+  Builder.SetInsertPoint(LoopBB);
+
+  // Within the loop, the variable is defined equal to the PHI node.  If it
+  // shadows an existing variable, we have to restore it, so save it now.
+  AllocaInst *OldVal = NamedValues[VarName];
+  NamedValues[VarName] = Alloca;
+
+  // Emit the body of the loop.  This, like any other expr, can change the
+  // current BB.  Note that we ignore the value computed by the body, but don't
+  // allow an error.
+  if (!Body->codegen())
+    return nullptr;
+
+  // Emit the step value.
+  Value *StepVal = nullptr;
+  if (Step) {
+    StepVal = Step->codegen();
+    if (!StepVal)
+      return nullptr;
+  } else {
+    // If not specified, use 1.0.
+    StepVal = ConstantFP::get(TheContext, APFloat(1.0));
+  }
+
+  // Compute the end condition.
+  Value *EndCond = End->codegen();
+  if (!EndCond)
+    return nullptr;
+
+  // Reload, increment, and restore the alloca.  This handles the case where
+  // the body of the loop mutates the variable.
+  Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
+  Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
+  Builder.CreateStore(NextVar, Alloca);
+
+  // Convert condition to a bool by comparing equal to 0.0.
+  EndCond = Builder.CreateFCmpONE(
+      EndCond, ConstantFP::get(TheContext, APFloat(0.0)), "loopcond");
+
+  // Create the "after loop" block and insert it.
+  BasicBlock *AfterBB =
+      BasicBlock::Create(TheContext, "afterloop", TheFunction);
+
+  // Insert the conditional branch into the end of LoopEndBB.
+  Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
+
+  // Any new code will be inserted in AfterBB.
+  Builder.SetInsertPoint(AfterBB);
+
+  // Restore the unshadowed variable.
+  if (OldVal)
+    NamedValues[VarName] = OldVal;
+  else
+    NamedValues.erase(VarName);
+
+  // for expr always returns 0.0.
+  return Constant::getNullValue(Type::getDoubleTy(TheContext));
+}
+
+Value *VarExprAST::codegen() {
+  std::vector<AllocaInst *> OldBindings;
+
+  Function *TheFunction = Builder.GetInsertBlock()->getParent();
+
+  // Register all variables and emit their initializer.
+  for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
+    const std::string &VarName = VarNames[i].first;
+    ExprAST *Init = VarNames[i].second.get();
+
+    // Emit the initializer before adding the variable to scope, this prevents
+    // the initializer from referencing the variable itself, and permits stuff
+    // like this:
+    //  var a = 1 in
+    //    var a = a in ...   # refers to outer 'a'.
+    Value *InitVal;
+    if (Init) {
+      InitVal = Init->codegen();
+      if (!InitVal)
+        return nullptr;
+    } else { // If not specified, use 0.0.
+      InitVal = ConstantFP::get(TheContext, APFloat(0.0));
+    }
+
+    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
+    Builder.CreateStore(InitVal, Alloca);
+
+    // Remember the old variable binding so that we can restore the binding when
+    // we unrecurse.
+    OldBindings.push_back(NamedValues[VarName]);
+
+    // Remember this binding.
+    NamedValues[VarName] = Alloca;
+  }
+
+  KSDbgInfo.emitLocation(this);
+
+  // Codegen the body, now that all vars are in scope.
+  Value *BodyVal = Body->codegen();
+  if (!BodyVal)
+    return nullptr;
+
+  // Pop all our variables from scope.
+  for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
+    NamedValues[VarNames[i].first] = OldBindings[i];
+
+  // Return the body computation.
+  return BodyVal;
+}
+
+Function *PrototypeAST::codegen() {
+  // Make the function type:  double(double,double) etc.
+  std::vector<Type *> Doubles(Args.size(), Type::getDoubleTy(TheContext));
+  FunctionType *FT =
+      FunctionType::get(Type::getDoubleTy(TheContext), Doubles, false);
+
+  Function *F =
+      Function::Create(FT, Function::ExternalLinkage, Name, TheModule.get());
+
+  // Set names for all arguments.
+  unsigned Idx = 0;
+  for (auto &Arg : F->args())
+    Arg.setName(Args[Idx++]);
+
+  return F;
+}
+
+Function *FunctionAST::codegen() {
+  // Transfer ownership of the prototype to the FunctionProtos map, but keep a
+  // reference to it for use below.
+  auto &P = *Proto;
+  FunctionProtos[Proto->getName()] = std::move(Proto);
+  Function *TheFunction = getFunction(P.getName());
+  if (!TheFunction)
+    return nullptr;
+
+  // If this is an operator, install it.
+  if (P.isBinaryOp())
+    BinopPrecedence[P.getOperatorName()] = P.getBinaryPrecedence();
+
+  // Create a new basic block to start insertion into.
+  BasicBlock *BB = BasicBlock::Create(TheContext, "entry", TheFunction);
+  Builder.SetInsertPoint(BB);
+
+  // Create a subprogram DIE for this function.
+  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
+                                      KSDbgInfo.TheCU->getDirectory());
+  DIScope *FContext = Unit;
+  unsigned LineNo = P.getLine();
+  unsigned ScopeLine = LineNo;
+  DISubprogram *SP = DBuilder->createFunction(
+      FContext, P.getName(), StringRef(), Unit, LineNo,
+      CreateFunctionType(TheFunction->arg_size(), Unit),
+      false /* internal linkage */, true /* definition */, ScopeLine,
+      DINode::FlagPrototyped, false);
+  TheFunction->setSubprogram(SP);
+
+  // Push the current scope.
+  KSDbgInfo.LexicalBlocks.push_back(SP);
+
+  // Unset the location for the prologue emission (leading instructions with no
+  // location in a function are considered part of the prologue and the debugger
+  // will run past them when breaking on a function)
+  KSDbgInfo.emitLocation(nullptr);
+
+  // Record the function arguments in the NamedValues map.
+  NamedValues.clear();
+  unsigned ArgIdx = 0;
+  for (auto &Arg : TheFunction->args()) {
+    // Create an alloca for this variable.
+    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName());
+
+    // Create a debug descriptor for the variable.
+    DILocalVariable *D = DBuilder->createParameterVariable(
+        SP, Arg.getName(), ++ArgIdx, Unit, LineNo, KSDbgInfo.getDoubleTy(),
+        true);
+
+    DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),
+                            DebugLoc::get(LineNo, 0, SP),
+                            Builder.GetInsertBlock());
+
+    // Store the initial value into the alloca.
+    Builder.CreateStore(&Arg, Alloca);
+
+    // Add arguments to variable symbol table.
+    NamedValues[Arg.getName()] = Alloca;
+  }
+
+  KSDbgInfo.emitLocation(Body.get());
+
+  if (Value *RetVal = Body->codegen()) {
+    // Finish off the function.
+    Builder.CreateRet(RetVal);
+
+    // Pop off the lexical block for the function.
+    KSDbgInfo.LexicalBlocks.pop_back();
+
+    // Validate the generated code, checking for consistency.
+    verifyFunction(*TheFunction);
+
+    return TheFunction;
+  }
+
+  // Error reading body, remove function.
+  TheFunction->eraseFromParent();
+
+  if (P.isBinaryOp())
+    BinopPrecedence.erase(Proto->getOperatorName());
+
+  // Pop off the lexical block for the function since we added it
+  // unconditionally.
+  KSDbgInfo.LexicalBlocks.pop_back();
+
+  return nullptr;
+}
+
+//===----------------------------------------------------------------------===//
+// Top-Level parsing and JIT Driver
+//===----------------------------------------------------------------------===//
+
+static void InitializeModule() {
+  // Open a new module.
+  TheModule = llvm::make_unique<Module>("my cool jit", TheContext);
+  TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());
+}
+
+static void HandleDefinition() {
+  if (auto FnAST = ParseDefinition()) {
+    if (!FnAST->codegen())
+      fprintf(stderr, "Error reading function definition:");
+  } else {
+    // Skip token for error recovery.
+    getNextToken();
+  }
+}
+
+static void HandleExtern() {
+  if (auto ProtoAST = ParseExtern()) {
+    if (!ProtoAST->codegen())
+      fprintf(stderr, "Error reading extern");
+    else
+      FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);
+  } else {
+    // Skip token for error recovery.
+    getNextToken();
+  }
+}
+
+static void HandleTopLevelExpression() {
+  // Evaluate a top-level expression into an anonymous function.
+  if (auto FnAST = ParseTopLevelExpr()) {
+    if (!FnAST->codegen()) {
+      fprintf(stderr, "Error generating code for top level expr");
+    }
+  } else {
+    // Skip token for error recovery.
+    getNextToken();
+  }
+}
+
+/// top ::= definition | external | expression | ';'
+static void MainLoop() {
+  while (1) {
+    switch (CurTok) {
+    case tok_eof:
+      return;
+    case ';': // ignore top-level semicolons.
+      getNextToken();
+      break;
+    case tok_def:
+      HandleDefinition();
+      break;
+    case tok_extern:
+      HandleExtern();
+      break;
+    default:
+      HandleTopLevelExpression();
+      break;
+    }
+  }
+}
+
+//===----------------------------------------------------------------------===//
+// "Library" functions that can be "extern'd" from user code.
+//===----------------------------------------------------------------------===//
+
+/// putchard - putchar that takes a double and returns 0.
+extern "C" double putchard(double X) {
+  fputc((char)X, stderr);
+  return 0;
+}
+
+/// printd - printf that takes a double prints it as "%f\n", returning 0.
+extern "C" double printd(double X) {
+  fprintf(stderr, "%f\n", X);
+  return 0;
+}
+
+//===----------------------------------------------------------------------===//
+// Main driver code.
+//===----------------------------------------------------------------------===//
+
+int main() {
+  InitializeNativeTarget();
+  InitializeNativeTargetAsmPrinter();
+  InitializeNativeTargetAsmParser();
+
+  // Install standard binary operators.
+  // 1 is lowest precedence.
+  BinopPrecedence['='] = 2;
+  BinopPrecedence['<'] = 10;
+  BinopPrecedence['+'] = 20;
+  BinopPrecedence['-'] = 20;
+  BinopPrecedence['*'] = 40; // highest.
+
+  // Prime the first token.
+  getNextToken();
+
+  TheJIT = llvm::make_unique<KaleidoscopeJIT>();
+
+  InitializeModule();
+
+  // Add the current debug info version into the module.
+  TheModule->addModuleFlag(Module::Warning, "Debug Info Version",
+                           DEBUG_METADATA_VERSION);
+
+  // Darwin only supports dwarf2.
+  if (Triple(sys::getProcessTriple()).isOSDarwin())
+    TheModule->addModuleFlag(llvm::Module::Warning, "Dwarf Version", 2);
+
+  // Construct the DIBuilder, we do this here because we need the module.
+  DBuilder = llvm::make_unique<DIBuilder>(*TheModule);
+
+  // Create the compile unit for the module.
+  // Currently down as "fib.ks" as a filename since we're redirecting stdin
+  // but we'd like actual source locations.
+  KSDbgInfo.TheCU = DBuilder->createCompileUnit(
+      dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);
+
+  // Run the main "interpreter loop" now.
+  MainLoop();
+
+  // Finalize the debug info.
+  DBuilder->finalize();
+
+  // Print out all of the generated code.
+  TheModule->dump();
+
+  return 0;
+}




More information about the llvm-commits mailing list