<div dir="ltr">This is great. Thanks Wilfred!<div><br></div><div>- Lang.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Jul 2, 2016 at 10:02 AM, Wilfred Hughes via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: wilfred<br>

Date: Sat Jul  2 12:01:59 2016<br>

New Revision: 274441<br>

<br>

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=274441&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=274441&view=rev</a><br>

Log:<br>

New Kaleidoscope chapter: Creating object files<br>

<br>

This new chapter describes compiling LLVM IR to object files.<br>

<br>

The new chapter is chapter 8, so later chapters have been renumbered.<br>

Since this brings us to 10 chapters total, I've also needed to rename<br>

the other chapters to use two digit numbering.<br>

<br>

Differential Revision: <a href="http://reviews.llvm.org/D18070" rel="noreferrer" target="_blank">http://reviews.llvm.org/D18070</a><br>

<br>

<br>

Added:<br>

    llvm/trunk/docs/tutorial/LangImpl01.rst<br>

    llvm/trunk/docs/tutorial/LangImpl02.rst<br>

    llvm/trunk/docs/tutorial/LangImpl03.rst<br>

    llvm/trunk/docs/tutorial/LangImpl04.rst<br>

    llvm/trunk/docs/tutorial/LangImpl05-cfg.png   (with props)<br>

    llvm/trunk/docs/tutorial/LangImpl05.rst<br>

    llvm/trunk/docs/tutorial/LangImpl06.rst<br>

    llvm/trunk/docs/tutorial/LangImpl07.rst<br>

    llvm/trunk/docs/tutorial/LangImpl08.rst<br>

    llvm/trunk/docs/tutorial/LangImpl09.rst<br>

    llvm/trunk/docs/tutorial/LangImpl10.rst<br>

    llvm/trunk/examples/Kaleidoscope/Chapter9/<br>

    llvm/trunk/examples/Kaleidoscope/Chapter9/CMakeLists.txt<br>

    llvm/trunk/examples/Kaleidoscope/Chapter9/toy.cpp<br>

Removed:<br>

    llvm/trunk/docs/tutorial/LangImpl1.rst<br>

    llvm/trunk/docs/tutorial/LangImpl2.rst<br>

    llvm/trunk/docs/tutorial/LangImpl3.rst<br>

    llvm/trunk/docs/tutorial/LangImpl4.rst<br>

    llvm/trunk/docs/tutorial/LangImpl5-cfg.png<br>

    llvm/trunk/docs/tutorial/LangImpl5.rst<br>

    llvm/trunk/docs/tutorial/LangImpl6.rst<br>

    llvm/trunk/docs/tutorial/LangImpl7.rst<br>

    llvm/trunk/docs/tutorial/LangImpl8.rst<br>

    llvm/trunk/docs/tutorial/LangImpl9.rst<br>

Modified:<br>

    llvm/trunk/docs/tutorial/OCamlLangImpl5.rst<br>

    llvm/trunk/examples/Kaleidoscope/Chapter8/CMakeLists.txt<br>

    llvm/trunk/examples/Kaleidoscope/Chapter8/toy.cpp<br>

<br>

Added: llvm/trunk/docs/tutorial/LangImpl01.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl01.rst?rev=274441&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl01.rst?rev=274441&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl01.rst (added)<br>

+++ llvm/trunk/docs/tutorial/LangImpl01.rst Sat Jul  2 12:01:59 2016<br>

@@ -0,0 +1,293 @@<br>

+=================================================<br>

+Kaleidoscope: Tutorial Introduction and the Lexer<br>

+=================================================<br>

+<br>

+.. contents::<br>

+   :local:<br>

+<br>

+Tutorial Introduction<br>

+=====================<br>

+<br>

+Welcome to the "Implementing a language with LLVM" tutorial. This<br>

+tutorial runs through the implementation of a simple language, showing<br>

+how fun and easy it can be. This tutorial will get you up and started as<br>

+well as help to build a framework you can extend to other languages. The<br>

+code in this tutorial can also be used as a playground to hack on other<br>

+LLVM specific things.<br>

+<br>

+The goal of this tutorial is to progressively unveil our language,<br>

+describing how it is built up over time. This will let us cover a fairly<br>

+broad range of language design and LLVM-specific usage issues, showing<br>

+and explaining the code for it all along the way, without overwhelming<br>

+you with tons of details up front.<br>

+<br>

+It is useful to point out ahead of time that this tutorial is really<br>

+about teaching compiler techniques and LLVM specifically, *not* about<br>

+teaching modern and sane software engineering principles. In practice,<br>

+this means that we'll take a number of shortcuts to simplify the<br>

+exposition. For example, the code uses global variables<br>

+all over the place, doesn't use nice design patterns like<br>

+`visitors <<a href="http://en.wikipedia.org/wiki/Visitor_pattern" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Visitor_pattern</a>>`_, etc... but<br>

+it is very simple. If you dig in and use the code as a basis for future<br>

+projects, fixing these deficiencies shouldn't be hard.<br>

+<br>

+I've tried to put this tutorial together in a way that makes chapters<br>

+easy to skip over if you are already familiar with or are uninterested<br>

+in the various pieces. The structure of the tutorial is:<br>

+<br>

+-  `Chapter #1 <#language>`_: Introduction to the Kaleidoscope<br>

+   language, and the definition of its Lexer - This shows where we are<br>

+   going and the basic functionality that we want it to do. In order to<br>

+   make this tutorial maximally understandable and hackable, we choose<br>

+   to implement everything in C++ instead of using lexer and parser<br>

+   generators. LLVM obviously works just fine with such tools; feel free<br>

+   to use one if you prefer.<br>

+-  `Chapter #2 <LangImpl02.html>`_: Implementing a Parser and AST -<br>

+   With the lexer in place, we can talk about parsing techniques and<br>

+   basic AST construction. This tutorial describes recursive descent<br>

+   parsing and operator precedence parsing. Nothing in Chapters 1 or 2<br>

+   is LLVM-specific; the code doesn't even link in LLVM at this point.<br>

+   :)<br>

+-  `Chapter #3 <LangImpl03.html>`_: Code generation to LLVM IR - With<br>

+   the AST ready, we can show off how easy generation of LLVM IR really<br>

+   is.<br>

+-  `Chapter #4 <LangImpl04.html>`_: Adding JIT and Optimizer Support<br>

+   - Because a lot of people are interested in using LLVM as a JIT,<br>

+   we'll dive right into it and show you the 3 lines it takes to add JIT<br>

+   support. LLVM is also useful in many other ways, but this is one<br>

+   simple and "sexy" way to show off its power. :)<br>

+-  `Chapter #5 <LangImpl05.html>`_: Extending the Language: Control<br>

+   Flow - With the language up and running, we show how to extend it<br>

+   with control flow operations (if/then/else and a 'for' loop). This<br>

+   gives us a chance to talk about simple SSA construction and control<br>

+   flow.<br>

+-  `Chapter #6 <LangImpl06.html>`_: Extending the Language:<br>

+   User-defined Operators - This is a silly but fun chapter that talks<br>

+   about extending the language to let the user program define their own<br>

+   arbitrary unary and binary operators (with assignable precedence!).<br>

+   This lets us build a significant piece of the "language" as library<br>

+   routines.<br>

+-  `Chapter #7 <LangImpl07.html>`_: Extending the Language: Mutable<br>

+   Variables - This chapter talks about adding user-defined local<br>

+   variables along with an assignment operator. The interesting part<br>

+   about this is how easy and trivial it is to construct SSA form in<br>

+   LLVM: no, LLVM does *not* require your front-end to construct SSA<br>

+   form!<br>

+-  `Chapter #8 <LangImpl08.html>`_: Compiling to Object Files - This<br>

+   chapter explains how to take LLVM IR and compile it down to object<br>

+   files.<br>

+-  `Chapter #9 <LangImpl09.html>`_: Extending the Language: Debug<br>

+   Information - Having built a decent little programming language with<br>

+   control flow, functions and mutable variables, we consider what it<br>

+   takes to add debug information to standalone executables. This debug<br>

+   information will allow you to set breakpoints in Kaleidoscope<br>

+   functions, print out argument variables, and call functions - all<br>

+   from within the debugger!<br>

+-  `Chapter #10 <LangImpl10.html>`_: Conclusion and other useful LLVM<br>

+   tidbits - This chapter wraps up the series by talking about<br>

+   potential ways to extend the language, but also includes a bunch of<br>

+   pointers to info about "special topics" like adding garbage<br>

+   collection support, exceptions, debugging, support for "spaghetti<br>

+   stacks", and a bunch of other tips and tricks.<br>

+<br>

+By the end of the tutorial, we'll have written a bit less than 1000<br>

+non-comment, non-blank lines of code. With this small amount of<br>

+code, we'll have built up a very reasonable compiler for a non-trivial<br>

+language including a hand-written lexer, parser, AST, as well as code<br>

+generation support with a JIT compiler. While other systems may have<br>

+interesting "hello world" tutorials, I think the breadth of this<br>

+tutorial is a great testament to the strengths of LLVM and why you<br>

+should consider it if you're interested in language or compiler design.<br>

+<br>

+A note about this tutorial: we expect you to extend the language and<br>

+play with it on your own. Take the code and go crazy hacking away at it;<br>

+compilers don't need to be scary creatures - it can be a lot of fun to<br>

+play with languages!<br>

+<br>

+The Basic Language<br>

+==================<br>

+<br>

+This tutorial will be illustrated with a toy language that we'll call<br>

+"`Kaleidoscope <<a href="http://en.wikipedia.org/wiki/Kaleidoscope" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Kaleidoscope</a>>`_" (derived<br>

+from "meaning beautiful, form, and view"). Kaleidoscope is a procedural<br>

+language that allows you to define functions, use conditionals, math,<br>

+etc. Over the course of the tutorial, we'll extend Kaleidoscope to<br>

+support the if/then/else construct, a for loop, user-defined operators,<br>

+JIT compilation with a simple command line interface, etc.<br>

+<br>

+Because we want to keep things simple, the only datatype in Kaleidoscope<br>

+is a 64-bit floating point type (aka 'double' in C parlance). As such,<br>

+all values are implicitly double precision and the language doesn't<br>

+require type declarations. This gives the language a very nice and<br>

+simple syntax. For example, the following simple example computes<br>

+`Fibonacci numbers: <<a href="http://en.wikipedia.org/wiki/Fibonacci_number" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Fibonacci_number</a>>`_<br>

+<br>

+::<br>

+<br>

+    # Compute the x'th fibonacci number.<br>

+    def fib(x)<br>

+      if x < 3 then<br>

+        1<br>

+      else<br>

+        fib(x-1)+fib(x-2)<br>

+<br>

+    # This expression will compute the 40th number.<br>

+    fib(40)<br>

+<br>

+We also allow Kaleidoscope to call into standard library functions (the<br>

+LLVM JIT makes this completely trivial). This means that you can use the<br>

+'extern' keyword to define a function before you use it (this is also<br>

+useful for mutually recursive functions). For example:<br>

+<br>

+::<br>

+<br>

+    extern sin(arg);<br>

+    extern cos(arg);<br>

+    extern atan2(arg1 arg2);<br>

+<br>

+    atan2(sin(.4), cos(42))<br>

+<br>

+A more interesting example is included in Chapter 6 where we write a<br>

+little Kaleidoscope application that `displays a Mandelbrot<br>

+Set <LangImpl06.html#kicking-the-tires>`_ at various levels of magnification.<br>

+<br>

+Let's dive into the implementation of this language!<br>

+<br>

+The Lexer<br>

+=========<br>

+<br>

+When it comes to implementing a language, the first thing needed is the<br>

+ability to process a text file and recognize what it says. The<br>

+traditional way to do this is to use a<br>

+"`lexer <<a href="http://en.wikipedia.org/wiki/Lexical_analysis" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Lexical_analysis</a>>`_" (aka<br>

+'scanner') to break the input up into "tokens". Each token returned by<br>

+the lexer includes a token code and potentially some metadata (e.g. the<br>

+numeric value of a number). First, we define the possibilities:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    // The lexer returns tokens [0-255] if it is an unknown character, otherwise one<br>

+    // of these for known things.<br>

+    enum Token {<br>

+      tok_eof = -1,<br>

+<br>

+      // commands<br>

+      tok_def = -2,<br>

+      tok_extern = -3,<br>

+<br>

+      // primary<br>

+      tok_identifier = -4,<br>

+      tok_number = -5,<br>

+    };<br>

+<br>

+    static std::string IdentifierStr; // Filled in if tok_identifier<br>

+    static double NumVal;             // Filled in if tok_number<br>

+<br>

+Each token returned by our lexer will either be one of the Token enum<br>

+values or it will be an 'unknown' character like '+', which is returned<br>

+as its ASCII value. If the current token is an identifier, the<br>

+``IdentifierStr`` global variable holds the name of the identifier. If<br>

+the current token is a numeric literal (like 1.0), ``NumVal`` holds its<br>

+value. Note that we use global variables for simplicity; this is not the<br>

+best choice for a real language implementation :).<br>
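
The token encoding described above (negative enum values for known tokens, the character value itself for everything else) can be illustrated with a small debugging helper. This is only a sketch: ``describeToken`` is a hypothetical name that does not appear in the tutorial code, and the enum is repeated here so that the snippet stands on its own.

```cpp
#include <string>

// Same token codes as the tutorial's enum.
enum Token {
  tok_eof = -1,
  tok_def = -2,
  tok_extern = -3,
  tok_identifier = -4,
  tok_number = -5,
};

// Hypothetical helper (not part of the tutorial): turns a token code into a
// readable description. Values in [0, 255] are "unknown" characters that the
// lexer passes through unchanged.
std::string describeToken(int Tok) {
  switch (Tok) {
  case tok_eof:
    return "eof";
  case tok_def:
    return "def";
  case tok_extern:
    return "extern";
  case tok_identifier:
    return "identifier";
  case tok_number:
    return "number";
  default:
    if (Tok >= 0 && Tok <= 255)
      return std::string("char '") + static_cast<char>(Tok) + "'";
    return "invalid";
  }
}
```

Because the known tokens are all negative, they can never collide with a character value, which is what lets a single ``int`` carry both kinds of token.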

+<br>

+The actual implementation of the lexer is a single function named<br>

+``gettok``. The ``gettok`` function is called to return the next token<br>

+from standard input. Its definition starts as:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// gettok - Return the next token from standard input.<br>

+    static int gettok() {<br>

+      static int LastChar = ' ';<br>

+<br>

+      // Skip any whitespace.<br>

+      while (isspace(LastChar))<br>

+        LastChar = getchar();<br>

+<br>

+``gettok`` works by calling the C ``getchar()`` function to read<br>

+characters one at a time from standard input. It eats them as it<br>

+recognizes them and stores the last character read, but not processed,<br>

+in LastChar. The first thing that it has to do is ignore whitespace<br>

+between tokens. This is accomplished with the loop above.<br>

+<br>

+The next thing ``gettok`` needs to do is recognize identifiers and<br>

+specific keywords like "def". Kaleidoscope does this with this simple<br>

+loop:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*<br>

+        IdentifierStr = LastChar;<br>

+        while (isalnum((LastChar = getchar())))<br>

+          IdentifierStr += LastChar;<br>

+<br>

+        if (IdentifierStr == "def")<br>

+          return tok_def;<br>

+        if (IdentifierStr == "extern")<br>

+          return tok_extern;<br>

+        return tok_identifier;<br>

+      }<br>

+<br>

+Note that this code sets the '``IdentifierStr``' global whenever it<br>

+lexes an identifier. Also, since language keywords are matched by the<br>

+same loop, we handle them here inline. Numeric values are similar:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+<br>

+        std::string NumStr;<br>

+        do {<br>

+          NumStr += LastChar;<br>

+          LastChar = getchar();<br>

+        } while (isdigit(LastChar) || LastChar == '.');<br>

+<br>

+        NumVal = strtod(NumStr.c_str(), 0);<br>

+        return tok_number;<br>

+      }<br>

+<br>

+This is all pretty straight-forward code for processing input. When<br>

+reading a numeric value from input, we use the C ``strtod`` function to<br>

+convert it to a numeric value that we store in ``NumVal``. Note that<br>

+this isn't doing sufficient error checking: it will incorrectly read<br>

+"1.23.45.67" and handle it as if you typed in "1.23". Feel free to<br>

+extend it :). Next we handle comments:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      if (LastChar == '#') {<br>

+        // Comment until end of line.<br>

+        do<br>

+          LastChar = getchar();<br>

+        while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');<br>

+<br>

+        if (LastChar != EOF)<br>

+          return gettok();<br>

+      }<br>

+<br>

+We handle comments by skipping to the end of the line and then return<br>

+the next token. Finally, if the input doesn't match one of the above<br>

+cases, it is either an operator character like '+' or the end of the<br>

+file. These are handled with this code:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // Check for end of file.  Don't eat the EOF.<br>

+      if (LastChar == EOF)<br>

+        return tok_eof;<br>

+<br>

+      // Otherwise, just return the character as its ascii value.<br>

+      int ThisChar = LastChar;<br>

+      LastChar = getchar();<br>

+      return ThisChar;<br>

+    }<br>

+<br>

+With this, we have the complete lexer for the basic Kaleidoscope<br>

+language (the `full code listing <LangImpl02.html#full-code-listing>`_ for the Lexer<br>

+is available in the `next chapter <LangImpl02.html>`_ of the tutorial).<br>

+Next we'll `build a simple parser that uses this to build an Abstract<br>

+Syntax Tree <LangImpl02.html>`_. When we have that, we'll include a<br>

+driver so that you can use the lexer and parser together.<br>
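
To experiment with the lexer before the parser exists, one option is to run the same logic over an in-memory string instead of standard input. The following is a minimal sketch under that assumption: ``StringLexer``, ``nextChar`` and ``lexAll`` are hypothetical names introduced for illustration, and the state that the tutorial keeps in globals is held in class members here.

```cpp
#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

enum Token {
  tok_eof = -1,
  tok_def = -2,
  tok_extern = -3,
  tok_identifier = -4,
  tok_number = -5,
};

// Hypothetical string-backed variant of the tutorial's gettok().
class StringLexer {
  std::string Input;
  size_t Pos = 0;
  int LastChar = ' ';

  // Stand-in for getchar(): next character, or EOF when input is exhausted.
  int nextChar() {
    return Pos < Input.size() ? static_cast<unsigned char>(Input[Pos++]) : EOF;
  }

public:
  std::string IdentifierStr; // Filled in if tok_identifier
  double NumVal = 0;         // Filled in if tok_number

  explicit StringLexer(std::string In) : Input(std::move(In)) {}

  int gettok() {
    // Skip any whitespace.
    while (LastChar != EOF && std::isspace(LastChar))
      LastChar = nextChar();

    if (LastChar != EOF && std::isalpha(LastChar)) { // [a-zA-Z][a-zA-Z0-9]*
      IdentifierStr = static_cast<char>(LastChar);
      while ((LastChar = nextChar()) != EOF && std::isalnum(LastChar))
        IdentifierStr += static_cast<char>(LastChar);
      if (IdentifierStr == "def")
        return tok_def;
      if (IdentifierStr == "extern")
        return tok_extern;
      return tok_identifier;
    }

    if (LastChar != EOF && (std::isdigit(LastChar) || LastChar == '.')) {
      std::string NumStr; // [0-9.]+
      do {
        NumStr += static_cast<char>(LastChar);
        LastChar = nextChar();
      } while (LastChar != EOF && (std::isdigit(LastChar) || LastChar == '.'));
      NumVal = std::strtod(NumStr.c_str(), nullptr);
      return tok_number;
    }

    if (LastChar == '#') { // Comment until end of line.
      do
        LastChar = nextChar();
      while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
      if (LastChar != EOF)
        return gettok();
    }

    if (LastChar == EOF) // Don't eat the EOF.
      return tok_eof;

    int ThisChar = LastChar; // Any other character is returned as itself.
    LastChar = nextChar();
    return ThisChar;
  }
};

// Collects every token code up to and including tok_eof.
std::vector<int> lexAll(const std::string &Source) {
  StringLexer Lex(Source);
  std::vector<int> Toks;
  int T;
  do {
    T = Lex.gettok();
    Toks.push_back(T);
  } while (T != tok_eof);
  return Toks;
}
```

With this sketch, ``lexAll("def foo(x) x+1")`` yields ``tok_def``, ``tok_identifier``, ``'('``, ``tok_identifier``, ``')'``, ``tok_identifier``, ``'+'``, ``tok_number`` and finally ``tok_eof``.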

+<br>

+`Next: Implementing a Parser and AST <LangImpl02.html>`_<br>

+<br>

<br>

Added: llvm/trunk/docs/tutorial/LangImpl02.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl02.rst?rev=274441&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl02.rst?rev=274441&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl02.rst (added)<br>

+++ llvm/trunk/docs/tutorial/LangImpl02.rst Sat Jul  2 12:01:59 2016<br>

@@ -0,0 +1,735 @@<br>

+===========================================<br>

+Kaleidoscope: Implementing a Parser and AST<br>

+===========================================<br>

+<br>

+.. contents::<br>

+   :local:<br>

+<br>

+Chapter 2 Introduction<br>

+======================<br>

+<br>

+Welcome to Chapter 2 of the "`Implementing a language with<br>

+LLVM <index.html>`_" tutorial. This chapter shows you how to use the<br>

+lexer, built in `Chapter 1 <LangImpl01.html>`_, to build a full<br>

+`parser <<a href="http://en.wikipedia.org/wiki/Parsing" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Parsing</a>>`_ for our Kaleidoscope<br>

+language. Once we have a parser, we'll define and build an `Abstract<br>

+Syntax Tree <<a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Abstract_syntax_tree</a>>`_ (AST).<br>

+<br>

+The parser we will build uses a combination of `Recursive Descent<br>

+Parsing <<a href="http://en.wikipedia.org/wiki/Recursive_descent_parser" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Recursive_descent_parser</a>>`_ and<br>

+`Operator-Precedence<br>

+Parsing <<a href="http://en.wikipedia.org/wiki/Operator-precedence_parser" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Operator-precedence_parser</a>>`_ to<br>

+parse the Kaleidoscope language (the latter for binary expressions and<br>

+the former for everything else). Before we get to parsing though, let's<br>

+talk about the output of the parser: the Abstract Syntax Tree.<br>

+<br>

+The Abstract Syntax Tree (AST)<br>

+==============================<br>

+<br>

+The AST for a program captures its behavior in such a way that it is<br>

+easy for later stages of the compiler (e.g. code generation) to<br>

+interpret. We basically want one object for each construct in the<br>

+language, and the AST should closely model the language. In<br>

+Kaleidoscope, we have expressions, a prototype, and a function object.<br>

+We'll start with expressions first:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// ExprAST - Base class for all expression nodes.<br>

+    class ExprAST {<br>

+    public:<br>

+      virtual ~ExprAST() {}<br>

+    };<br>

+<br>

+    /// NumberExprAST - Expression class for numeric literals like "1.0".<br>

+    class NumberExprAST : public ExprAST {<br>

+      double Val;<br>

+<br>

+    public:<br>

+      NumberExprAST(double Val) : Val(Val) {}<br>

+    };<br>

+<br>

+The code above shows the definition of the base ExprAST class and one<br>

+subclass which we use for numeric literals. The important thing to note<br>

+about this code is that the NumberExprAST class captures the numeric<br>

+value of the literal as an instance variable. This allows later phases<br>

+of the compiler to know what the stored numeric value is.<br>

+<br>

+Right now we only create the AST, so there are no useful accessor<br>

+methods on them. It would be very easy to add a virtual method to pretty<br>

+print the code, for example. Here are the other expression AST node<br>

+definitions that we'll use in the basic form of the Kaleidoscope<br>

+language:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// VariableExprAST - Expression class for referencing a variable, like "a".<br>

+    class VariableExprAST : public ExprAST {<br>

+      std::string Name;<br>

+<br>

+    public:<br>

+      VariableExprAST(const std::string &Name) : Name(Name) {}<br>

+    };<br>

+<br>

+    /// BinaryExprAST - Expression class for a binary operator.<br>

+    class BinaryExprAST : public ExprAST {<br>

+      char Op;<br>

+      std::unique_ptr<ExprAST> LHS, RHS;<br>

+<br>

+    public:<br>

+      BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS,<br>

+                    std::unique_ptr<ExprAST> RHS)<br>

+        : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}<br>

+    };<br>

+<br>

+    /// CallExprAST - Expression class for function calls.<br>

+    class CallExprAST : public ExprAST {<br>

+      std::string Callee;<br>

+      std::vector<std::unique_ptr<ExprAST>> Args;<br>

+<br>

+    public:<br>

+      CallExprAST(const std::string &Callee,<br>

+                  std::vector<std::unique_ptr<ExprAST>> Args)<br>

+        : Callee(Callee), Args(std::move(Args)) {}<br>

+    };<br>

+<br>

+This is all (intentionally) rather straight-forward: variables capture<br>

+the variable name, binary operators capture their opcode (e.g. '+'), and<br>

+calls capture a function name as well as a list of any argument<br>

+expressions. One thing that is nice about our AST is that it captures<br>

+the language features without talking about the syntax of the language.<br>

+Note that there is no discussion about precedence of binary operators,<br>

+lexical structure, etc.<br>
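
The earlier remark that a virtual pretty-printing method would be easy to add can be made concrete. The sketch below is one possible way to do it and is not part of the tutorial code: the ``dump`` method is a hypothetical addition, and the node classes are repeated in reduced form so that the snippet is self-contained.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

class ExprAST {
public:
  virtual ~ExprAST() = default;
  // Hypothetical addition: render the node back into source-like text.
  virtual std::string dump() const = 0;
};

class NumberExprAST : public ExprAST {
  double Val;

public:
  explicit NumberExprAST(double Val) : Val(Val) {}
  std::string dump() const override {
    std::ostringstream OS;
    OS << Val;
    return OS.str();
  }
};

class VariableExprAST : public ExprAST {
  std::string Name;

public:
  explicit VariableExprAST(const std::string &Name) : Name(Name) {}
  std::string dump() const override { return Name; }
};

class BinaryExprAST : public ExprAST {
  char Op;
  std::unique_ptr<ExprAST> LHS, RHS;

public:
  BinaryExprAST(char Op, std::unique_ptr<ExprAST> LHS,
                std::unique_ptr<ExprAST> RHS)
      : Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
  // Parentheses make the grouping explicit, since the AST itself stores
  // no precedence information.
  std::string dump() const override {
    return "(" + LHS->dump() + " " + Op + " " + RHS->dump() + ")";
  }
};

class CallExprAST : public ExprAST {
  std::string Callee;
  std::vector<std::unique_ptr<ExprAST>> Args;

public:
  CallExprAST(const std::string &Callee,
              std::vector<std::unique_ptr<ExprAST>> Args)
      : Callee(Callee), Args(std::move(Args)) {}
  std::string dump() const override {
    std::string Out = Callee + "(";
    for (size_t I = 0; I < Args.size(); ++I) {
      if (I != 0)
        Out += ", ";
      Out += Args[I]->dump();
    }
    return Out + ")";
  }
};
```

A tree for ``x + f(4)`` built from these classes would print as ``(x + f(4))``, which shows that grouping lives in the shape of the tree rather than in stored syntax.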

+<br>

+For our basic language, these are all of the expression nodes we'll<br>

+define. Because it doesn't have conditional control flow, it isn't<br>

+Turing-complete; we'll fix that in a later installment. The two things<br>

+we need next are a way to talk about the interface to a function, and a<br>

+way to talk about functions themselves:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// PrototypeAST - This class represents the "prototype" for a function,<br>

+    /// which captures its name, and its argument names (thus implicitly the number<br>

+    /// of arguments the function takes).<br>

+    class PrototypeAST {<br>

+      std::string Name;<br>

+      std::vector<std::string> Args;<br>

+<br>

+    public:<br>

+      PrototypeAST(const std::string &name, std::vector<std::string> Args)<br>

+        : Name(name), Args(std::move(Args)) {}<br>

+    };<br>

+<br>

+    /// FunctionAST - This class represents a function definition itself.<br>

+    class FunctionAST {<br>

+      std::unique_ptr<PrototypeAST> Proto;<br>

+      std::unique_ptr<ExprAST> Body;<br>

+<br>

+    public:<br>

+      FunctionAST(std::unique_ptr<PrototypeAST> Proto,<br>

+                  std::unique_ptr<ExprAST> Body)<br>

+        : Proto(std::move(Proto)), Body(std::move(Body)) {}<br>

+    };<br>

+<br>

+In Kaleidoscope, functions are typed with just a count of their<br>

+arguments. Since all values are double precision floating point, the<br>

+type of each argument doesn't need to be stored anywhere. In a more<br>

+aggressive and realistic language, the "ExprAST" class would probably<br>

+have a type field.<br>

+<br>

+With this scaffolding, we can now talk about parsing expressions and<br>

+function bodies in Kaleidoscope.<br>

+<br>

+Parser Basics<br>

+=============<br>

+<br>

+Now that we have an AST to build, we need to define the parser code to<br>

+build it. The idea here is that we want to parse something like "x+y"<br>

+(which is returned as three tokens by the lexer) into an AST that could<br>

+be generated with calls like this:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      auto LHS = llvm::make_unique<VariableExprAST>("x");<br>

+      auto RHS = llvm::make_unique<VariableExprAST>("y");<br>

+      auto Result = llvm::make_unique<BinaryExprAST>('+', std::move(LHS),<br>

+                                                     std::move(RHS));<br>

+<br>

+In order to do this, we'll start by defining some basic helper routines:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current<br>

+    /// token the parser is looking at.  getNextToken reads another token from the<br>

+    /// lexer and updates CurTok with its results.<br>

+    static int CurTok;<br>

+    static int getNextToken() {<br>

+      return CurTok = gettok();<br>

+    }<br>

+<br>

+This implements a simple token buffer around the lexer. This allows us<br>

+to look one token ahead at what the lexer is returning. Every function<br>

+in our parser will assume that CurTok is the current token that needs to<br>

+be parsed.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+<br>

+    /// LogError* - These are little helper functions for error handling.<br>

+    std::unique_ptr<ExprAST> LogError(const char *Str) {<br>

+      fprintf(stderr, "LogError: %s\n", Str);<br>

+      return nullptr;<br>

+    }<br>

+    std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) {<br>

+      LogError(Str);<br>

+      return nullptr;<br>

+    }<br>

+<br>

+The ``LogError`` routines are simple helper routines that our parser will<br>

+use to handle errors. The error recovery in our parser will not be the<br>

+best and is not particularly user-friendly, but it will be enough for our<br>

+tutorial. These routines make it easier to handle errors in routines<br>

+that have various return types: they always return null.<br>

+<br>

+With these basic helper functions, we can implement the first piece of<br>

+our grammar: numeric literals.<br>

+<br>

+Basic Expression Parsing<br>

+========================<br>

+<br>

+We start with numeric literals, because they are the simplest to<br>

+process. For each production in our grammar, we'll define a function<br>

+which parses that production. For numeric literals, we have:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// numberexpr ::= number<br>

+    static std::unique_ptr<ExprAST> ParseNumberExpr() {<br>

+      auto Result = llvm::make_unique<NumberExprAST>(NumVal);<br>

+      getNextToken(); // consume the number<br>

+      return std::move(Result);<br>

+    }<br>

+<br>

+This routine is very simple: it expects to be called when the current<br>

+token is a ``tok_number`` token. It takes the current number value,<br>

+creates a ``NumberExprAST`` node, advances the lexer to the next token,<br>

+and finally returns.<br>

+<br>

+There are some interesting aspects to this. The most important one is<br>

+that this routine eats all of the tokens that correspond to the<br>

+production and returns the lexer buffer with the next token (which is<br>

+not part of the grammar production) ready to go. This is a fairly<br>

+standard way to go for recursive descent parsers. For a better example,<br>

+the parenthesis operator is defined like this:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// parenexpr ::= '(' expression ')'<br>

+    static std::unique_ptr<ExprAST> ParseParenExpr() {<br>

+      getNextToken(); // eat (.<br>

+      auto V = ParseExpression();<br>

+      if (!V)<br>

+        return nullptr;<br>

+<br>

+      if (CurTok != ')')<br>

+        return LogError("expected ')'");<br>

+      getNextToken(); // eat ).<br>

+      return V;<br>

+    }<br>

+<br>

+This function illustrates a number of interesting things about the<br>

+parser:<br>

+<br>

+1) It shows how we use the LogError routines. When called, this function<br>

+expects that the current token is a '(' token, but after parsing the<br>

+subexpression, it is possible that there is no ')' waiting. For example,<br>

+if the user types in "(4 x" instead of "(4)", the parser should emit an<br>

+error. Because errors can occur, the parser needs a way to indicate that<br>

+they happened: in our parser, we return null on an error.<br>

+<br>

+2) Another interesting aspect of this function is that it uses recursion<br>

+by calling ``ParseExpression`` (we will soon see that<br>

+``ParseExpression`` can call ``ParseParenExpr``). This is powerful<br>

+because it allows us to handle recursive grammars, and keeps each<br>

+production very simple. Note that parentheses do not cause construction<br>

+of AST nodes themselves. While we could do it this way, the most<br>

+important role of parentheses is to guide the parser and provide<br>

+grouping. Once the parser constructs the AST, parentheses are not<br>

+needed.<br>

+<br>

+The next simple production is for handling variable references and<br>

+function calls:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// identifierexpr<br>

+    ///   ::= identifier<br>

+    ///   ::= identifier '(' expression* ')'<br>

+    static std::unique_ptr<ExprAST> ParseIdentifierExpr() {<br>

+      std::string IdName = IdentifierStr;<br>

+<br>

+      getNextToken();  // eat identifier.<br>

+<br>

+      if (CurTok != '(') // Simple variable ref.<br>

+        return llvm::make_unique<VariableExprAST>(IdName);<br>

+<br>

+      // Call.<br>

+      getNextToken();  // eat (<br>

+      std::vector<std::unique_ptr<ExprAST>> Args;<br>

+      if (CurTok != ')') {<br>

+        while (1) {<br>

+          if (auto Arg = ParseExpression())<br>

+            Args.push_back(std::move(Arg));<br>

+          else<br>

+            return nullptr;<br>

+<br>

+          if (CurTok == ')')<br>

+            break;<br>

+<br>

+          if (CurTok != ',')<br>

+            return LogError("Expected ')' or ',' in argument list");<br>

+          getNextToken();<br>

+        }<br>

+      }<br>

+<br>

+      // Eat the ')'.<br>

+      getNextToken();<br>

+<br>

+      return llvm::make_unique<CallExprAST>(IdName, std::move(Args));<br>

+    }<br>

+<br>

+This routine follows the same style as the other routines (it expects<br>

+to be called if the current token is a ``tok_identifier`` token). It<br>

+also has recursion and error handling. One interesting aspect of this is<br>

+that it uses *look-ahead* to determine if the current identifier is a<br>

+standalone variable reference or if it is a function call expression.<br>

+It handles this by checking to see if the token after the identifier is<br>

+a '(' token, constructing either a ``VariableExprAST`` or<br>

+``CallExprAST`` node as appropriate.<br>

+<br>

+Now that we have all of our simple expression-parsing logic in place, we<br>

+can define a helper function to wrap it together into one entry point.<br>

+We call this class of expressions "primary" expressions, for reasons<br>

+that will become more clear `later in the<br>

+tutorial <LangImpl06.html#user-defined-unary-operators>`_. In order to parse an arbitrary<br>

+primary expression, we need to determine what sort of expression it is:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// primary<br>

+    ///   ::= identifierexpr<br>

+    ///   ::= numberexpr<br>

+    ///   ::= parenexpr<br>

+    static std::unique_ptr<ExprAST> ParsePrimary() {<br>

+      switch (CurTok) {<br>

+      default:<br>

+        return LogError("unknown token when expecting an expression");<br>

+      case tok_identifier:<br>

+        return ParseIdentifierExpr();<br>

+      case tok_number:<br>

+        return ParseNumberExpr();<br>

+      case '(':<br>

+        return ParseParenExpr();<br>

+      }<br>

+    }<br>

+<br>

+Now that you see the definition of this function, it is more obvious why<br>

+we can assume the state of CurTok in the various functions. This uses<br>

+look-ahead to determine which sort of expression is being inspected, and<br>

+then parses it with a function call.<br>

+<br>

+Now that basic expressions are handled, we need to handle binary<br>

+expressions. They are a bit more complex.<br>

+<br>

+Binary Expression Parsing<br>

+=========================<br>

+<br>

+Binary expressions are significantly harder to parse because they are<br>

+often ambiguous. For example, when given the string "x+y\*z", the parser<br>

+can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common<br>

+definitions from mathematics, we expect the latter parse, because "\*"<br>

+(multiplication) has higher *precedence* than "+" (addition).<br>

+<br>

+There are many ways to handle this, but an elegant and efficient way is<br>

+to use `Operator-Precedence<br>

+Parsing <<a href="http://en.wikipedia.org/wiki/Operator-precedence_parser" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Operator-precedence_parser</a>>`_.<br>

+This parsing technique uses the precedence of binary operators to guide<br>

+recursion. To start with, we need a table of precedences:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// BinopPrecedence - This holds the precedence for each binary operator that is<br>

+    /// defined.<br>

+    static std::map<char, int> BinopPrecedence;<br>

+<br>

+    /// GetTokPrecedence - Get the precedence of the pending binary operator token.<br>

+    static int GetTokPrecedence() {<br>

+      if (!isascii(CurTok))<br>

+        return -1;<br>

+<br>

+      // Make sure it's a declared binop.<br>

+      int TokPrec = BinopPrecedence[CurTok];<br>

+      if (TokPrec <= 0) return -1;<br>

+      return TokPrec;<br>

+    }<br>

+<br>

+    int main() {<br>

+      // Install standard binary operators.<br>

+      // 1 is lowest precedence.<br>

+      BinopPrecedence['<'] = 10;<br>

+      BinopPrecedence['+'] = 20;<br>

+      BinopPrecedence['-'] = 20;<br>

+      BinopPrecedence['*'] = 40;  // highest.<br>

+      ...<br>

+    }<br>

+<br>

+For the basic form of Kaleidoscope, we will only support 4 binary<br>

+operators (this can obviously be extended by you, our brave and intrepid<br>

+reader). The ``GetTokPrecedence`` function returns the precedence for<br>

+the current token, or -1 if the token is not a binary operator. Having a<br>

+map makes it easy to add new operators and makes it clear that the<br>

+algorithm doesn't depend on the specific operators involved, but it<br>

+would be easy enough to eliminate the map and do the comparisons in the<br>

+``GetTokPrecedence`` function. (Or just use a fixed-size array).<br>

+<br>
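
+As a rough sketch of the map-free alternative mentioned above, the<br>

+comparisons can be folded into a ``switch``. This is not the tutorial's<br>

+code: ``GetPrecedenceOf`` is a hypothetical name, and it takes the token<br>

+as a parameter (instead of reading the global ``CurTok``) only so that<br>

+the snippet compiles on its own:<br>

+<br>

```cpp
#include <cassert>

// Hypothetical, map-free variant of GetTokPrecedence. It takes the token as a
// parameter instead of reading the tutorial's global CurTok.
static int GetPrecedenceOf(int Tok) {
  switch (Tok) {
  case '<':
    return 10; // lowest precedence
  case '+':
  case '-':
    return 20;
  case '*':
    return 40; // highest precedence
  default:
    return -1; // not a binary operator
  }
}
```

+The trade-off is that adding an operator now means editing the function,<br>

+whereas the map can be extended at run time.<br>

+<br>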

+With the helper above defined, we can now start parsing binary<br>

+expressions. The basic idea of operator precedence parsing is to break<br>

+down an expression with potentially ambiguous binary operators into<br>

+pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g".<br>

+Operator precedence parsing considers this as a stream of primary<br>

+expressions separated by binary operators. As such, it will first parse<br>

+the leading primary expression "a", then it will see the pairs [+, b]<br>

+[+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are<br>

+primary expressions, the binary expression parser doesn't need to worry<br>

+about nested subexpressions like (c+d) at all.<br>

+<br>

+To start, an expression is a primary expression potentially followed by<br>

+a sequence of [binop,primaryexpr] pairs:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// expression<br>

+    ///   ::= primary binoprhs<br>

+    ///<br>

+    static std::unique_ptr<ExprAST> ParseExpression() {<br>

+      auto LHS = ParsePrimary();<br>

+      if (!LHS)<br>

+        return nullptr;<br>

+<br>

+      return ParseBinOpRHS(0, std::move(LHS));<br>

+    }<br>

+<br>

+``ParseBinOpRHS`` is the function that parses the sequence of pairs for<br>

+us. It takes a precedence and a pointer to an expression for the part<br>

+that has been parsed so far. Note that "x" is a perfectly valid<br>

+expression: As such, "binoprhs" is allowed to be empty, in which case it<br>

+returns the expression that is passed into it. In our example above, the<br>

+code passes the expression for "a" into ``ParseBinOpRHS`` and the<br>

+current token is "+".<br>

+<br>

+The precedence value passed into ``ParseBinOpRHS`` indicates the<br>

+*minimal operator precedence* that the function is allowed to eat. For<br>

+example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is<br>

+passed in a precedence of 40, it will not consume any tokens (because<br>

+the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``<br>

+starts with:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// binoprhs<br>

+    ///   ::= ('+' primary)*<br>

+    static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,<br>

+                                                  std::unique_ptr<ExprAST> LHS) {<br>

+      // If this is a binop, find its precedence.<br>

+      while (1) {<br>

+        int TokPrec = GetTokPrecedence();<br>

+<br>

+        // If this is a binop that binds at least as tightly as the current binop,<br>

+        // consume it, otherwise we are done.<br>

+        if (TokPrec < ExprPrec)<br>

+          return LHS;<br>

+<br>

+This code gets the precedence of the current token and checks to see if<br>

+it is too low. Because we defined invalid tokens to have a precedence of<br>

+-1, this check implicitly knows that the pair-stream ends when the token<br>

+stream runs out of binary operators. If this check succeeds, we know<br>

+that the token is a binary operator and that it will be included in this<br>

+expression:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+        // Okay, we know this is a binop.<br>

+        int BinOp = CurTok;<br>

+        getNextToken();  // eat binop<br>

+<br>

+        // Parse the primary expression after the binary operator.<br>

+        auto RHS = ParsePrimary();<br>

+        if (!RHS)<br>

+          return nullptr;<br>

+<br>

+As such, this code eats (and remembers) the binary operator and then<br>

+parses the primary expression that follows. This builds up the whole<br>

+pair, the first of which is [+, b] for the running example.<br>

+<br>

+Now that we parsed the left-hand side of an expression and one pair of<br>

+the RHS sequence, we have to decide which way the expression associates.<br>

+In particular, we could have "(a+b) binop unparsed" or "a + (b binop<br>

+unparsed)". To determine this, we look ahead at "binop" to determine its<br>

+precedence and compare it to BinOp's precedence (which is '+' in this<br>

+case):<br>

+<br>

+.. code-block:: c++<br>

+<br>

+        // If BinOp binds less tightly with RHS than the operator after RHS, let<br>

+        // the pending operator take RHS as its LHS.<br>

+        int NextPrec = GetTokPrecedence();<br>

+        if (TokPrec < NextPrec) {<br>

+<br>

+If the precedence of the binop to the right of "RHS" is lower or equal<br>

+to the precedence of our current operator, then we know that the<br>

+parentheses associate as "(a+b) binop ...". In our example, the current<br>

+operator is "+" and the next operator is "+", so we know that they have the<br>

+same precedence. In this case we'll create the AST node for "a+b", and<br>

+then continue parsing:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+          ... if body omitted ...<br>

+        }<br>

+<br>

+        // Merge LHS/RHS.<br>

+        LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),<br>

+                                               std::move(RHS));<br>

+      }  // loop around to the top of the while loop.<br>

+    }<br>

+<br>

+In our example above, this will turn "a+b+" into "(a+b)" and execute the<br>

+next iteration of the loop, with "+" as the current token. The code<br>

+above will eat, remember, and parse "(c+d)" as the primary expression,<br>

+which makes the current pair equal to [+, (c+d)]. It will then evaluate<br>

+the 'if' conditional above with "\*" as the binop to the right of the<br>

+primary. In this case, the precedence of "\*" is higher than the<br>

+precedence of "+" so the body of the 'if' will be entered.<br>

+<br>

+The critical question left here is "how can the if condition parse the<br>

+right hand side in full"? In particular, to build the AST correctly for<br>

+our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression<br>

+variable. The code to do this is surprisingly simple (code from the<br>

+above two blocks duplicated for context):<br>

+<br>

+.. code-block:: c++<br>

+<br>

+        // If BinOp binds less tightly with RHS than the operator after RHS, let<br>

+        // the pending operator take RHS as its LHS.<br>

+        int NextPrec = GetTokPrecedence();<br>

+        if (TokPrec < NextPrec) {<br>

+          RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS));<br>

+          if (!RHS)<br>

+            return nullptr;<br>

+        }<br>

+        // Merge LHS/RHS.<br>

+        LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),<br>

+                                               std::move(RHS));<br>

+      }  // loop around to the top of the while loop.<br>

+    }<br>

+<br>

+At this point, we know that the binary operator to the RHS of our<br>

+primary has higher precedence than the binop we are currently parsing.<br>

+As such, we know that any sequence of pairs whose operators are all<br>

+higher precedence than "+" should be parsed together and returned as<br>

+"RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function<br>

+specifying "TokPrec+1" as the minimum precedence required for it to<br>

+continue. In our example above, this will cause it to return the AST<br>

+node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+'<br>

+expression.<br>

+<br>

+Finally, on the next iteration of the while loop, the "+g" piece is<br>

+parsed and added to the AST. With this little bit of code (14<br>

+non-trivial lines), we correctly handle fully general binary expression<br>

+parsing in a very elegant way. This was a whirlwind tour of this code,<br>

+and it is somewhat subtle. I recommend running through it with a few<br>

+tough examples to see how it works.<br>

+<br>
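
+To make the walk-through above easier to replay, here is a<br>

+self-contained sketch of the same loop that needs no lexer or AST<br>

+classes. It is not the tutorial's code: ``ToyParser``, ``Parenthesize``<br>

+and the single-character tokens are assumptions made for illustration,<br>

+and the "AST" it builds is just a fully parenthesized string:<br>

+<br>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Toy stand-in for the tutorial's parser: tokens are single characters,
// primaries are single letters or parenthesized expressions, and the "AST" is
// a fully parenthesized string. All names here are illustrative.
struct ToyParser {
  std::string Src;
  std::size_t Pos = 0;

  explicit ToyParser(std::string S) : Src(std::move(S)) {}

  // Current token, or '\0' at end of input.
  char Cur() const { return Pos < Src.size() ? Src[Pos] : '\0'; }

  // Same precedences as the tutorial's BinopPrecedence table.
  int Prec() const {
    switch (Cur()) {
    case '<': return 10;
    case '+': case '-': return 20;
    case '*': return 40;
    default: return -1; // not a binary operator
    }
  }

  // An empty string signals a parse error throughout this sketch.
  std::string ParsePrimary() {
    if (Cur() == '(') {
      ++Pos; // eat '('
      std::string E = ParseExpression();
      if (E.empty() || Cur() != ')')
        return "";
      ++Pos; // eat ')'
      return E;
    }
    if (std::isalpha(static_cast<unsigned char>(Cur())))
      return std::string(1, Src[Pos++]);
    return "";
  }

  std::string ParseBinOpRHS(int ExprPrec, std::string LHS) {
    while (true) {
      int TokPrec = Prec();
      if (TokPrec < ExprPrec)
        return LHS;
      char BinOp = Src[Pos++]; // eat binop
      std::string RHS = ParsePrimary();
      if (RHS.empty())
        return "";
      // If the next operator binds tighter, let it take RHS as its LHS.
      if (TokPrec < Prec()) {
        RHS = ParseBinOpRHS(TokPrec + 1, std::move(RHS));
        if (RHS.empty())
          return "";
      }
      LHS = "(" + LHS + BinOp + RHS + ")"; // merge LHS/RHS
    }
  }

  std::string ParseExpression() {
    std::string LHS = ParsePrimary();
    if (LHS.empty())
      return "";
    return ParseBinOpRHS(0, std::move(LHS));
  }
};

// Convenience wrapper: parse a whole string, returning "" on error or if
// trailing input is left over.
std::string Parenthesize(const std::string &Src) {
  ToyParser P(Src);
  std::string E = P.ParseExpression();
  return P.Cur() == '\0' ? E : "";
}
```

+With these definitions, ``Parenthesize("a+b+(c+d)*e*f+g")`` yields<br>

+``(((a+b)+(((c+d)*e)*f))+g)``, which matches the grouping described<br>

+above, and ``Parenthesize("x+y*z")`` yields ``(x+(y*z))``.<br>

+<br>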

+This wraps up handling of expressions. At this point, we can point the<br>

+parser at an arbitrary token stream and build an expression from it,<br>

+stopping at the first token that is not part of the expression. Next up<br>

+we need to handle function definitions, etc.<br>

+<br>

+Parsing the Rest<br>

+================<br>

+<br>

+The next thing missing is handling of function prototypes. In<br>

+Kaleidoscope, these are used both for 'extern' function declarations as<br>

+well as function body definitions. The code to do this is<br>

+straightforward and not very interesting (once you've survived<br>

+expressions):<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// prototype<br>

+    ///   ::= id '(' id* ')'<br>

+    static std::unique_ptr<PrototypeAST> ParsePrototype() {<br>

+      if (CurTok != tok_identifier)<br>

+        return LogErrorP("Expected function name in prototype");<br>

+<br>

+      std::string FnName = IdentifierStr;<br>

+      getNextToken();<br>

+<br>

+      if (CurTok != '(')<br>

+        return LogErrorP("Expected '(' in prototype");<br>

+<br>

+      // Read the list of argument names.<br>

+      std::vector<std::string> ArgNames;<br>

+      while (getNextToken() == tok_identifier)<br>

+        ArgNames.push_back(IdentifierStr);<br>

+      if (CurTok != ')')<br>

+        return LogErrorP("Expected ')' in prototype");<br>

+<br>

+      // success.<br>

+      getNextToken();  // eat ')'.<br>

+<br>

+      return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames));<br>

+    }<br>

+<br>

+Given this, a function definition is very simple, just a prototype plus<br>

+an expression to implement the body:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// definition ::= 'def' prototype expression<br>

+    static std::unique_ptr<FunctionAST> ParseDefinition() {<br>

+      getNextToken();  // eat def.<br>

+      auto Proto = ParsePrototype();<br>

+      if (!Proto) return nullptr;<br>

+<br>

+      if (auto E = ParseExpression())<br>

+        return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));<br>

+      return nullptr;<br>

+    }<br>

+<br>

+In addition, we support 'extern' to declare functions like 'sin' and<br>

+'cos' as well as to support forward declaration of user functions. These<br>

+'extern's are just prototypes with no body:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// external ::= 'extern' prototype<br>

+    static std::unique_ptr<PrototypeAST> ParseExtern() {<br>

+      getNextToken();  // eat extern.<br>

+      return ParsePrototype();<br>

+    }<br>

+<br>

+Finally, we'll also let the user type in arbitrary top-level expressions<br>

+and evaluate them on the fly. We will handle this by defining anonymous<br>

+nullary (zero argument) functions for them:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// toplevelexpr ::= expression<br>

+    static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {<br>

+      if (auto E = ParseExpression()) {<br>

+        // Make an anonymous proto.<br>

+        auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());<br>

+        return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));<br>

+      }<br>

+      return nullptr;<br>

+    }<br>

+<br>

+Now that we have all the pieces, let's build a little driver that will<br>

+let us actually *execute* this code we've built!<br>

+<br>

+The Driver<br>

+==========<br>

+<br>

+The driver for this simply invokes all of the parsing pieces with a<br>

+top-level dispatch loop. There isn't much interesting here, so I'll just<br>

+include the top-level loop. See `below <#full-code-listing>`_ for full code in the<br>

+"Top-Level Parsing" section.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// top ::= definition | external | expression | ';'<br>

+    static void MainLoop() {<br>

+      while (1) {<br>

+        fprintf(stderr, "ready> ");<br>

+        switch (CurTok) {<br>

+        case tok_eof:<br>

+          return;<br>

+        case ';': // ignore top-level semicolons.<br>

+          getNextToken();<br>

+          break;<br>

+        case tok_def:<br>

+          HandleDefinition();<br>

+          break;<br>

+        case tok_extern:<br>

+          HandleExtern();<br>

+          break;<br>

+        default:<br>

+          HandleTopLevelExpression();<br>

+          break;<br>

+        }<br>

+      }<br>

+    }<br>

+<br>

+The most interesting part of this is that we ignore top-level<br>

+semicolons. Why is this, you ask? The basic reason is that if you type<br>

+"4 + 5" at the command line, the parser doesn't know whether that is the<br>

+end of what you will type or not. For example, on the next line you<br>

+could type "def foo..." in which case 4+5 is the end of a top-level<br>

+expression. Alternatively you could type "\* 6", which would continue<br>

+the expression. Having top-level semicolons allows you to type "4+5;",<br>

+and the parser will know you are done.<br>

+<br>

+Conclusions<br>

+===========<br>

+<br>

+With just under 400 lines of commented code (240 lines of non-comment,<br>

+non-blank code), we fully defined our minimal language, including a<br>

+lexer, parser, and AST builder. With this done, the executable will<br>

+validate Kaleidoscope code and tell us if it is grammatically invalid.<br>

+For example, here is a sample interaction:<br>

+<br>

+.. code-block:: bash<br>

+<br>

+    $ ./a.out<br>

+    ready> def foo(x y) x+foo(y, 4.0);<br>

+    Parsed a function definition.<br>

+    ready> def foo(x y) x+y y;<br>

+    Parsed a function definition.<br>

+    Parsed a top-level expr<br>

+    ready> def foo(x y) x+y );<br>

+    Parsed a function definition.<br>

+    Error: unknown token when expecting an expression<br>

+    ready> extern sin(a);<br>

+    ready> Parsed an extern<br>

+    ready> ^D<br>

+    $<br>

+<br>

+There is a lot of room for extension here. You can define new AST nodes,<br>

+extend the language in many ways, etc. In the `next<br>

+installment <LangImpl03.html>`_, we will describe how to generate LLVM<br>

+Intermediate Representation (IR) from the AST.<br>

+<br>

+Full Code Listing<br>

+=================<br>

+<br>

+Here is the complete code listing for this and the previous chapter.<br>

+Note that it is fully self-contained: you don't need LLVM or any<br>

+external libraries at all for this. (Besides the C and C++ standard<br>

+libraries, of course.) To build this, just compile with:<br>

+<br>

+.. code-block:: bash<br>

+<br>

+    # Compile<br>

+    clang++ -g -O3 toy.cpp<br>

+    # Run<br>

+    ./a.out<br>

+<br>

+Here is the code:<br>

+<br>

+.. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp<br>

+   :language: c++<br>

+<br>

+`Next: Implementing Code Generation to LLVM IR <LangImpl03.html>`_<br>

+<br>

<br>

Added: llvm/trunk/docs/tutorial/LangImpl03.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl03.rst?rev=274441&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl03.rst?rev=274441&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl03.rst (added)<br>

+++ llvm/trunk/docs/tutorial/LangImpl03.rst Sat Jul  2 12:01:59 2016<br>

@@ -0,0 +1,567 @@<br>

+========================================<br>

+Kaleidoscope: Code generation to LLVM IR<br>

+========================================<br>

+<br>

+.. contents::<br>

+   :local:<br>

+<br>

+Chapter 3 Introduction<br>

+======================<br>

+<br>

+Welcome to Chapter 3 of the "`Implementing a language with<br>

+LLVM <index.html>`_" tutorial. This chapter shows you how to transform<br>

+the `Abstract Syntax Tree <LangImpl02.html>`_, built in Chapter 2, into<br>

+LLVM IR. This will teach you a little bit about how LLVM does things, as<br>

+well as demonstrate how easy it is to use. It's much more work to build<br>

+a lexer and parser than it is to generate LLVM IR code. :)<br>

+<br>

+**Please note**: the code in this chapter and later requires LLVM 3.7 or<br>

+later. LLVM 3.6 and before will not work with it. Also note that you<br>

+need to use a version of this tutorial that matches your LLVM release:<br>

+If you are using an official LLVM release, use the version of the<br>

+documentation included with your release or on the `<a href="http://llvm.org" rel="noreferrer" target="_blank">llvm.org</a> releases<br>

+page <<a href="http://llvm.org/releases/" rel="noreferrer" target="_blank">http://llvm.org/releases/</a>>`_.<br>

+<br>

+Code Generation Setup<br>

+=====================<br>

+<br>

+In order to generate LLVM IR, we want some simple setup to get started.<br>

+First we define virtual code generation (codegen) methods in each AST<br>

+class:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// ExprAST - Base class for all expression nodes.<br>

+    class ExprAST {<br>

+    public:<br>

+      virtual ~ExprAST() {}<br>

+      virtual Value *codegen() = 0;<br>

+    };<br>

+<br>

+    /// NumberExprAST - Expression class for numeric literals like "1.0".<br>

+    class NumberExprAST : public ExprAST {<br>

+      double Val;<br>

+<br>

+    public:<br>

+      NumberExprAST(double Val) : Val(Val) {}<br>

+      virtual Value *codegen();<br>

+    };<br>

+    ...<br>

+<br>

+The codegen() method says to emit IR for that AST node along with all<br>

+the things it depends on, and they all return an LLVM Value object.<br>

+"Value" is the class used to represent a "`Static Single Assignment<br>

+(SSA) <<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Static_single_assignment_form</a>>`_<br>

+register" or "SSA value" in LLVM. The most distinct aspect of SSA values<br>

+is that their value is computed as the related instruction executes, and<br>

+it does not get a new value until (and if) the instruction re-executes.<br>

+In other words, there is no way to "change" an SSA value. For more<br>

+information, please read up on `Static Single<br>

+Assignment <<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Static_single_assignment_form</a>>`_<br>

+- the concepts are really quite natural once you grok them.<br>

+<br>

+Note that instead of adding virtual methods to the ExprAST class<br>

+hierarchy, it could also make sense to use a `visitor<br>

+pattern <<a href="http://en.wikipedia.org/wiki/Visitor_pattern" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Visitor_pattern</a>>`_ or some other<br>

+way to model this. Again, this tutorial won't dwell on good software<br>

+engineering practices: for our purposes, adding a virtual method is<br>

+simplest.<br>

+<br>

+The second thing we want is a "LogError" method like we used for the<br>

+parser, which will be used to report errors found during code generation<br>

+(for example, use of an undeclared parameter):<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    static LLVMContext TheContext;<br>

+    static IRBuilder<> Builder(TheContext);<br>

+    static std::unique_ptr<Module> TheModule;<br>

+    static std::map<std::string, Value *> NamedValues;<br>

+<br>

+    Value *LogErrorV(const char *Str) {<br>

+      LogError(Str);<br>

+      return nullptr;<br>

+    }<br>

+<br>

+The static variables will be used during code generation. ``TheContext``<br>

+is an opaque object that owns a lot of core LLVM data structures, such as<br>

+the type and constant value tables. We don't need to understand it in<br>

+detail, we just need a single instance to pass into APIs that require it.<br>

+<br>

+The ``Builder`` object is a helper object that makes it easy to generate<br>

+LLVM instructions. Instances of the<br>

+`IRBuilder <<a href="http://llvm.org/doxygen/IRBuilder_8h-source.html" rel="noreferrer" target="_blank">http://llvm.org/doxygen/IRBuilder_8h-source.html</a>>`_<br>

+class template keep track of the current place to insert instructions<br>

+and have methods to create new instructions.<br>

+<br>

+``TheModule`` is an LLVM construct that contains functions and global<br>

+variables. In many ways, it is the top-level structure that the LLVM IR<br>

+uses to contain code. It will own the memory for all of the IR that we<br>

+generate, which is why the codegen() method returns a raw Value\*,<br>

+rather than a unique_ptr<Value>.<br>

+<br>

+The ``NamedValues`` map keeps track of which values are defined in the<br>

+current scope and what their LLVM representation is. (In other words, it<br>

+is a symbol table for the code). In this form of Kaleidoscope, the only<br>

+things that can be referenced are function parameters. As such, function<br>

+parameters will be in this map when generating code for their function<br>

+body.<br>

+<br>

+With these basics in place, we can start talking about how to generate<br>

+code for each expression. Note that this assumes that the ``Builder``<br>

+has been set up to generate code *into* something. For now, we'll assume<br>

+that this has already been done, and we'll just use it to emit code.<br>

+<br>

+Expression Code Generation<br>

+==========================<br>

+<br>

+Generating LLVM code for expression nodes is very straightforward: less<br>

+than 45 lines of commented code for all four of our expression nodes.<br>

+First we'll do numeric literals:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    Value *NumberExprAST::codegen() {<br>

+      return ConstantFP::get(TheContext, APFloat(Val));<br>

+    }<br>

+<br>

+In the LLVM IR, numeric constants are represented with the<br>

+``ConstantFP`` class, which holds the numeric value in an ``APFloat``<br>

+internally (``APFloat`` has the capability of holding floating point<br>

+constants of Arbitrary Precision). This code basically just creates<br>

+and returns a ``ConstantFP``. Note that in the LLVM IR, constants<br>

+are all uniqued together and shared. For this reason, the API uses the<br>

+"foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)".<br>

+<br>
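
+The shape of the "get" idiom can be sketched without LLVM. The classes<br>

+below, ``ToyContext`` and ``ToyConstant``, are made-up names and not<br>

+LLVM's implementation, whose uniquing is more involved. The sketch only<br>

+shows why a static ``get`` taking a context fits shared, uniqued<br>

+objects better than a public constructor:<br>

+<br>

```cpp
#include <cassert>
#include <map>
#include <memory>

// Toy illustration of the "Foo::get(...)" uniquing idiom. ToyContext and
// ToyConstant are made-up names, not LLVM classes.
class ToyContext;

class ToyConstant {
  double Val;
  explicit ToyConstant(double V) : Val(V) {} // private: callers must use get()

public:
  double getValue() const { return Val; }

  // Return the one shared object for this value, creating it on first use.
  static ToyConstant *get(ToyContext &Ctx, double V);
};

class ToyContext {
  friend class ToyConstant;
  // The context owns every constant, keyed by value.
  std::map<double, std::unique_ptr<ToyConstant>> Constants;
};

ToyConstant *ToyConstant::get(ToyContext &Ctx, double V) {
  std::unique_ptr<ToyConstant> &Slot = Ctx.Constants[V];
  if (!Slot)
    Slot.reset(new ToyConstant(V));
  return Slot.get();
}
```

+Two calls to ``ToyConstant::get`` with the same context and value return<br>

+the same pointer, so callers can compare constants by address and never<br>

+own or delete them.<br>

+<br>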

+.. code-block:: c++<br>

+<br>

+    Value *VariableExprAST::codegen() {<br>

+      // Look this variable up in the function.<br>

+      Value *V = NamedValues[Name];<br>

+      if (!V)<br>

+        LogErrorV("Unknown variable name");<br>

+      return V;<br>

+    }<br>

+<br>

+References to variables are also quite simple using LLVM. In the simple<br>

+version of Kaleidoscope, we assume that the variable has already been<br>

+emitted somewhere and its value is available. In practice, the only<br>

+values that can be in the ``NamedValues`` map are function arguments.<br>

+This code simply checks to see that the specified name is in the map (if<br>

+not, an unknown variable is being referenced) and returns the value for<br>

+it. In future chapters, we'll add support for `loop induction<br>

+variables <LangImpl05.html#for-loop-expression>`_ in the symbol table, and for `local<br>

+variables <LangImpl07.html#user-defined-local-variables>`_.<br>

+<br>
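
+One detail worth knowing about the lookup above: ``std::map::operator[]``<br>

+inserts a default (null) entry when the key is missing, so every unknown<br>

+name leaves a null entry behind in ``NamedValues``. A possible<br>

+alternative, sketched below with a stand-in ``Value`` struct so that it<br>

+compiles without LLVM, uses ``find`` instead. ``LookupVariable`` is a<br>

+hypothetical helper, not part of the tutorial's code:<br>

+<br>

```cpp
#include <cassert>
#include <map>
#include <string>

struct Value {}; // stand-in for llvm::Value, so the sketch needs no LLVM headers

static std::map<std::string, Value *> NamedValues;

// Look up Name without modifying the table; returns nullptr if it is unknown.
static Value *LookupVariable(const std::string &Name) {
  auto It = NamedValues.find(Name);
  return It == NamedValues.end() ? nullptr : It->second;
}
```

+For a table this small the extra null entries are harmless, which may be<br>

+why the tutorial keeps the shorter ``operator[]`` form.<br>

+<br>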

+.. code-block:: c++<br>

+<br>

+    Value *BinaryExprAST::codegen() {<br>

+      Value *L = LHS->codegen();<br>

+      Value *R = RHS->codegen();<br>

+      if (!L || !R)<br>

+        return nullptr;<br>

+<br>

+      switch (Op) {<br>

+      case '+':<br>

+        return Builder.CreateFAdd(L, R, "addtmp");<br>

+      case '-':<br>

+        return Builder.CreateFSub(L, R, "subtmp");<br>

+      case '*':<br>

+        return Builder.CreateFMul(L, R, "multmp");<br>

+      case '<':<br>

+        L = Builder.CreateFCmpULT(L, R, "cmptmp");<br>

+        // Convert bool 0/1 to double 0.0 or 1.0<br>

+        return Builder.CreateUIToFP(L, Type::getDoubleTy(TheContext),<br>

+                                    "booltmp");<br>

+      default:<br>

+        return LogErrorV("invalid binary operator");<br>

+      }<br>

+    }<br>

+<br>

+Binary operators start to get more interesting. The basic idea here is<br>

+that we recursively emit code for the left-hand side of the expression,<br>

+then the right-hand side, then we compute the result of the binary<br>

+expression. In this code, we do a simple switch on the opcode to create<br>

+the right LLVM instruction.<br>

+<br>

+In the example above, the LLVM builder class is starting to show its<br>

+value. IRBuilder knows where to insert the newly created instruction,<br>

+all you have to do is specify what instruction to create (e.g. with<br>

+``CreateFAdd``), which operands to use (``L`` and ``R`` here) and<br>

+optionally provide a name for the generated instruction.<br>

+<br>

+One nice thing about LLVM is that the name is just a hint. For instance,<br>

+if the code above emits multiple "addtmp" variables, LLVM will<br>

+automatically provide each one with an increasing, unique numeric<br>

+suffix. Local value names for instructions are purely optional, but it<br>

+makes it much easier to read the IR dumps.<br>

+<br>
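
+The idea of treating a name as a hint can be sketched in a few lines of<br>

+plain C++. ``ToyNamer`` below is a made-up class and not how LLVM<br>

+implements this; the exact suffixes LLVM chooses may differ. It only<br>

+shows how a requested name can be kept when free and given a numeric<br>

+suffix when taken:<br>

+<br>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy name uniquer (not LLVM's implementation): a hint is used unchanged if it
// is still free, otherwise an increasing numeric suffix is appended until the
// result is unused.
class ToyNamer {
  std::set<std::string> Used;
  std::map<std::string, unsigned> NextSuffix;

public:
  std::string makeUnique(const std::string &Hint) {
    std::string Name = Hint;
    // insert() reports via .second whether the name was newly added.
    while (!Used.insert(Name).second)
      Name = Hint + std::to_string(++NextSuffix[Hint]);
    return Name;
  }
};
```

+Calling ``makeUnique("addtmp")`` three times on one ``ToyNamer`` returns<br>

+``addtmp``, ``addtmp1`` and ``addtmp2``.<br>

+<br>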

+`LLVM instructions <../LangRef.html#instruction-reference>`_ are constrained by strict<br>

+rules: for example, the Left and Right operands of an `add<br>

+instruction <../LangRef.html#add-instruction>`_ must have the same type, and the<br>

+result type of the add must match the operand types. Because all values<br>

+in Kaleidoscope are doubles, this makes for very simple code for add,<br>

+sub and mul.<br>

+<br>

+On the other hand, LLVM specifies that the `fcmp<br>

+instruction <../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a<br>

+one bit integer). The problem with this is that Kaleidoscope wants the<br>

+value to be a 0.0 or 1.0 value. In order to get these semantics, we<br>

+combine the fcmp instruction with a `uitofp<br>

+instruction <../LangRef.html#uitofp-to-instruction>`_. This instruction converts its<br>

+input integer into a floating point value by treating the input as an<br>

+unsigned value. In contrast, if we used the `sitofp<br>

+instruction <../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator<br>

+would return 0.0 and -1.0, depending on the input value.<br>

+<br>
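
+The difference between the two conversions can be sketched in plain C++<br>

+without LLVM. The helper names below are made up for illustration; they<br>

+model the integer as the low ``Bits`` bits of a 64-bit word and assume<br>

+``Bits`` is between 1 and 63:<br>

+<br>

```cpp
#include <cassert>
#include <cstdint>

// Interpret the low Bits bits of Raw as an unsigned integer, then convert to
// double (the idea behind uitofp). Assumes 1 <= Bits <= 63.
static double UnsignedBitsToDouble(std::uint64_t Raw, unsigned Bits) {
  std::uint64_t Mask = (std::uint64_t(1) << Bits) - 1;
  return static_cast<double>(Raw & Mask);
}

// Interpret the low Bits bits of Raw as a two's complement signed integer,
// then convert to double (the idea behind sitofp). Assumes 1 <= Bits <= 63.
static double SignedBitsToDouble(std::uint64_t Raw, unsigned Bits) {
  std::uint64_t Mask = (std::uint64_t(1) << Bits) - 1;
  std::uint64_t SignBit = std::uint64_t(1) << (Bits - 1);
  std::uint64_t V = Raw & Mask;
  // Subtract 2^Bits when the sign bit is set.
  if (V & SignBit)
    return static_cast<double>(V) - static_cast<double>(Mask) - 1.0;
  return static_cast<double>(V);
}
```

+For a one-bit "true", ``UnsignedBitsToDouble(1, 1)`` gives 1.0 while<br>

+``SignedBitsToDouble(1, 1)`` gives -1.0, because the only bit of a<br>

+one-bit signed integer is its sign bit.<br>

+<br>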

+.. code-block:: c++<br>

+<br>

+    Value *CallExprAST::codegen() {<br>

+      // Look up the name in the global module table.<br>

+      Function *CalleeF = TheModule->getFunction(Callee);<br>

+      if (!CalleeF)<br>

+        return LogErrorV("Unknown function referenced");<br>

+<br>

+      // If argument mismatch error.<br>

+      if (CalleeF->arg_size() != Args.size())<br>

+        return LogErrorV("Incorrect # arguments passed");<br>

+<br>

+      std::vector<Value *> ArgsV;<br>

+      for (unsigned i = 0, e = Args.size(); i != e; ++i) {<br>

+        ArgsV.push_back(Args[i]->codegen());<br>

+        if (!ArgsV.back())<br>

+          return nullptr;<br>

+      }<br>

+<br>

+      return Builder.CreateCall(CalleeF, ArgsV, "calltmp");<br>

+    }<br>

+<br>

+Code generation for function calls is quite straightforward with LLVM. The code<br>

+above initially does a function name lookup in the LLVM Module's symbol table.<br>

+Recall that the LLVM Module is the container that holds the functions we are<br>

+JIT'ing. By giving each function the same name as what the user specifies, we<br>

+can use the LLVM symbol table to resolve function names for us.<br>

+<br>

+Once we have the function to call, we recursively codegen each argument<br>

+that is to be passed in, and create an LLVM `call<br>

+instruction <../LangRef.html#call-instruction>`_. Note that LLVM uses the native C<br>

+calling conventions by default, allowing these calls to also call into<br>

+standard library functions like "sin" and "cos", with no additional<br>

+effort.<br>

+<br>

+This wraps up our handling of the four basic expressions that we have so<br>

+far in Kaleidoscope. Feel free to go in and add some more. For example,<br>

+by browsing the `LLVM language reference <../LangRef.html>`_ you'll find<br>

+several other interesting instructions that are really easy to plug into<br>

+our basic framework.<br>

+<br>

+Function Code Generation<br>

+========================<br>

+<br>

+Code generation for prototypes and functions must handle a number of<br>

+details, which make their code less beautiful than expression code<br>

+generation, but allow us to illustrate some important points. First,<br>

+let's talk about code generation for prototypes: they are used both for<br>

+function bodies and external function declarations. The code starts<br>

+with:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    Function *PrototypeAST::codegen() {<br>

+      // Make the function type:  double(double,double) etc.<br>

+      std::vector<Type*> Doubles(Args.size(),<br>

+                                 Type::getDoubleTy(LLVMContext));<br>

+      FunctionType *FT =<br>

+        FunctionType::get(Type::getDoubleTy(LLVMContext), Doubles, false);<br>

+<br>

+      Function *F =<br>

+        Function::Create(FT, Function::ExternalLinkage, Name, TheModule);<br>

+<br>

+This code packs a lot of power into a few lines. Note first that this<br>

+function returns a "Function\*" instead of a "Value\*". Because a<br>

+"prototype" really talks about the external interface for a function<br>

+(not the value computed by an expression), it makes sense for it to<br>

+return the LLVM Function it corresponds to when codegen'd.<br>

+<br>

+The call to ``FunctionType::get`` creates the ``FunctionType`` that<br>

+should be used for a given Prototype. Since all function arguments in<br>

+Kaleidoscope are of type double, the first line creates a vector of "N"<br>

+LLVM double types. It then uses the ``FunctionType::get`` method to<br>

+create a function type that takes "N" doubles as arguments, returns one<br>

+double as a result, and that is not vararg (the false parameter<br>

+indicates this). Note that Types in LLVM are uniqued just like Constants<br>

+are, so you don't "new" a type, you "get" it.<br>
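
To make the "get, don't new" idea concrete, here is a minimal sketch of uniquing that uses only the C++ standard library. ``ToyContext`` and ``ToyFunctionType`` are hypothetical names invented for this illustration; they are not LLVM classes, and LLVM's real implementation may differ. The sketch only shows why asking for the same type twice can hand back the same object.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Hypothetical function type: N double parameters, optionally variadic.
struct ToyFunctionType {
  unsigned NumParams;
  bool IsVarArg;
};

// Hypothetical context that owns one canonical object per distinct type.
class ToyContext {
  std::map<std::pair<unsigned, bool>, std::unique_ptr<ToyFunctionType>> Types;

public:
  // "get" rather than "new": create the type on first request, and return
  // the already-existing object on every later request.
  const ToyFunctionType *getFunctionType(unsigned NumParams, bool IsVarArg) {
    auto &Slot = Types[std::make_pair(NumParams, IsVarArg)];
    if (!Slot)
      Slot.reset(new ToyFunctionType{NumParams, IsVarArg});
    return Slot.get();
  }
};
```

Because equal types share one object in this sketch, two types can be compared for equality with a plain pointer comparison.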

+<br>

+The final line above actually creates the IR Function corresponding to<br>

+the Prototype. This indicates the type, linkage and name to use, as<br>

+well as which module to insert into. "`external<br>

+linkage <../LangRef.html#linkage>`_" means that the function may be<br>

+defined outside the current module and/or that it is callable by<br>

+functions outside the module. The Name passed in is the name the user<br>

+specified: since "``TheModule``" is specified, this name is registered<br>

+in ``TheModule``'s symbol table.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  // Set names for all arguments.<br>

+  unsigned Idx = 0;<br>

+  for (auto &Arg : F->args())<br>

+    Arg.setName(Args[Idx++]);<br>

+<br>

+  return F;<br>

+<br>

+Finally, we set the name of each of the function's arguments according to the<br>

+names given in the Prototype. This step isn't strictly necessary, but keeping<br>

+the names consistent makes the IR more readable, and allows subsequent code to<br>

+refer directly to the arguments for their names, rather than having to look<br>

+them up in the Prototype AST.<br>

+<br>

+At this point we have a function prototype with no body. This is how LLVM IR<br>

+represents function declarations. For extern statements in Kaleidoscope, this<br>

+is as far as we need to go. For function definitions however, we need to<br>

+codegen and attach a function body.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  Function *FunctionAST::codegen() {<br>

+    // First, check for an existing function from a previous 'extern' declaration.<br>

+    Function *TheFunction = TheModule->getFunction(Proto->getName());<br>

+<br>

+    if (!TheFunction)<br>

+      TheFunction = Proto->codegen();<br>

+<br>

+    if (!TheFunction)<br>

+      return nullptr;<br>

+<br>

+    if (!TheFunction->empty())<br>

+      return (Function*)LogErrorV("Function cannot be redefined.");<br>

+<br>

+<br>

+For function definitions, we start by searching TheModule's symbol table for an<br>

+existing version of this function, in case one has already been created using an<br>

+'extern' statement. If Module::getFunction returns null then no previous version<br>

+exists, so we'll codegen one from the Prototype. In either case, we want to<br>

+assert that the function is empty (i.e. has no body yet) before we start.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  // Create a new basic block to start insertion into.<br>

+  BasicBlock *BB = BasicBlock::Create(LLVMContext, "entry", TheFunction);<br>

+  Builder.SetInsertPoint(BB);<br>

+<br>

+  // Record the function arguments in the NamedValues map.<br>

+  NamedValues.clear();<br>

+  for (auto &Arg : TheFunction->args())<br>

+    NamedValues[Arg.getName()] = &Arg;<br>

+<br>

+Now we get to the point where the ``Builder`` is set up. The first line<br>

+creates a new `basic block <<a href="http://en.wikipedia.org/wiki/Basic_block" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Basic_block</a>>`_<br>

+(named "entry"), which is inserted into ``TheFunction``. The second line<br>

+then tells the builder that new instructions should be inserted into the<br>

+end of the new basic block. Basic blocks in LLVM are an important part<br>

+of functions that define the `Control Flow<br>

+Graph <<a href="http://en.wikipedia.org/wiki/Control_flow_graph" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Control_flow_graph</a>>`_. Since we<br>

+don't have any control flow, our functions will only contain one block<br>

+at this point. We'll fix this in `Chapter 5 <LangImpl05.html>`_ :).<br>

+<br>

+Next we add the function arguments to the NamedValues map (after first clearing<br>

+it out) so that they're accessible to ``VariableExprAST`` nodes.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      if (Value *RetVal = Body->codegen()) {<br>

+        // Finish off the function.<br>

+        Builder.CreateRet(RetVal);<br>

+<br>

+        // Validate the generated code, checking for consistency.<br>

+        verifyFunction(*TheFunction);<br>

+<br>

+        return TheFunction;<br>

+      }<br>

+<br>

+Once the insertion point has been set up and the NamedValues map populated,<br>

+we call the ``codegen()`` method for the root expression of the function. If no<br>

+error happens, this emits code to compute the expression into the entry block<br>

+and returns the value that was computed. Assuming no error, we then create an<br>

+LLVM `ret instruction <../LangRef.html#ret-instruction>`_, which completes the function.<br>

+Once the function is built, we call ``verifyFunction``, which is<br>

+provided by LLVM. This function does a variety of consistency checks on<br>

+the generated code, to determine if our compiler is doing everything<br>

+right. Using this is important: it can catch a lot of bugs. Once the<br>

+function is finished and validated, we return it.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // Error reading body, remove function.<br>

+      TheFunction->eraseFromParent();<br>

+      return nullptr;<br>

+    }<br>

+<br>

+The only piece left here is handling of the error case. For simplicity,<br>

+we handle this by merely deleting the function we produced with the<br>

+``eraseFromParent`` method. This allows the user to redefine a function<br>

+that they incorrectly typed in before: if we didn't delete it, it would<br>

+live in the symbol table, with a body, preventing future redefinition.<br>

+<br>

+This code does have a bug, though: If the ``FunctionAST::codegen()`` method<br>

+finds an existing IR Function, it does not validate its signature against the<br>

+definition's own prototype. This means that an earlier 'extern' declaration will<br>

+take precedence over the function definition's signature, which can cause<br>

+codegen to fail, for instance if the function arguments are named differently.<br>

+There are a number of ways to fix this bug; see what you can come up with! Here<br>

+is a testcase:<br>

+<br>

+::<br>

+<br>

+    extern foo(a);     # ok, defines foo.<br>

+    def foo(b) b;      # Error: Unknown variable name. (decl using 'a' takes precedence).<br>

+<br>

+Driver Changes and Closing Thoughts<br>

+===================================<br>

+<br>

+For now, code generation to LLVM doesn't really get us much, except that<br>

+we can look at the pretty IR calls. The sample code inserts calls to<br>

+codegen into the "``HandleDefinition``", "``HandleExtern``" etc<br>

+functions, and then dumps out the LLVM IR. This gives a nice way to look<br>

+at the LLVM IR for simple functions. For example:<br>

+<br>

+::<br>

+<br>

+    ready> 4+5;<br>

+    Read top-level expression:<br>

+    define double @0() {<br>

+    entry:<br>

+      ret double 9.000000e+00<br>

+    }<br>

+<br>

+Note how the parser turns the top-level expression into anonymous<br>

+functions for us. This will be handy when we add `JIT<br>

+support <LangImpl04.html#adding-a-jit-compiler>`_ in the next chapter. Also note that the<br>

+code is very literally transcribed: no optimizations are being performed<br>

+except simple constant folding done by IRBuilder. We will `add<br>

+optimizations <LangImpl04.html#trivial-constant-folding>`_ explicitly in the next<br>

+chapter.<br>

+<br>

+::<br>

+<br>

+    ready> def foo(a b) a*a + 2*a*b + b*b;<br>

+    Read function definition:<br>

+    define double @foo(double %a, double %b) {<br>

+    entry:<br>

+      %multmp = fmul double %a, %a<br>

+      %multmp1 = fmul double 2.000000e+00, %a<br>

+      %multmp2 = fmul double %multmp1, %b<br>

+      %addtmp = fadd double %multmp, %multmp2<br>

+      %multmp3 = fmul double %b, %b<br>

+      %addtmp4 = fadd double %addtmp, %multmp3<br>

+      ret double %addtmp4<br>

+    }<br>

+<br>

+This shows some simple arithmetic. Notice the striking similarity to the<br>

+LLVM builder calls that we use to create the instructions.<br>

+<br>

+::<br>

+<br>

+    ready> def bar(a) foo(a, 4.0) + bar(31337);<br>

+    Read function definition:<br>

+    define double @bar(double %a) {<br>

+    entry:<br>

+      %calltmp = call double @foo(double %a, double 4.000000e+00)<br>

+      %calltmp1 = call double @bar(double 3.133700e+04)<br>

+      %addtmp = fadd double %calltmp, %calltmp1<br>

+      ret double %addtmp<br>

+    }<br>

+<br>

+This shows some function calls. Note that this function will take a long<br>

+time to execute if you call it. In the future we'll add conditional<br>

+control flow to actually make recursion useful :).<br>

+<br>

+::<br>

+<br>

+    ready> extern cos(x);<br>

+    Read extern:<br>

+    declare double @cos(double)<br>

+<br>

+    ready> cos(1.234);<br>

+    Read top-level expression:<br>

+    define double @1() {<br>

+    entry:<br>

+      %calltmp = call double @cos(double 1.234000e+00)<br>

+      ret double %calltmp<br>

+    }<br>

+<br>

+This shows an extern for the libm "cos" function, and a call to it.<br>

+<br>

+.. TODO:: Abandon Pygments' horrible `llvm` lexer. It just totally gives up<br>

+   on highlighting this due to the first line.<br>

+<br>

+::<br>

+<br>

+    ready> ^D<br>

+    ; ModuleID = 'my cool jit'<br>

+<br>

+    define double @0() {<br>

+    entry:<br>

+      %addtmp = fadd double 4.000000e+00, 5.000000e+00<br>

+      ret double %addtmp<br>

+    }<br>

+<br>

+    define double @foo(double %a, double %b) {<br>

+    entry:<br>

+      %multmp = fmul double %a, %a<br>

+      %multmp1 = fmul double 2.000000e+00, %a<br>

+      %multmp2 = fmul double %multmp1, %b<br>

+      %addtmp = fadd double %multmp, %multmp2<br>

+      %multmp3 = fmul double %b, %b<br>

+      %addtmp4 = fadd double %addtmp, %multmp3<br>

+      ret double %addtmp4<br>

+    }<br>

+<br>

+    define double @bar(double %a) {<br>

+    entry:<br>

+      %calltmp = call double @foo(double %a, double 4.000000e+00)<br>

+      %calltmp1 = call double @bar(double 3.133700e+04)<br>

+      %addtmp = fadd double %calltmp, %calltmp1<br>

+      ret double %addtmp<br>

+    }<br>

+<br>

+    declare double @cos(double)<br>

+<br>

+    define double @1() {<br>

+    entry:<br>

+      %calltmp = call double @cos(double 1.234000e+00)<br>

+      ret double %calltmp<br>

+    }<br>

+<br>

+When you quit the current demo, it dumps out the IR for the entire<br>

+module generated. Here you can see the big picture with all the<br>

+functions referencing each other.<br>

+<br>

+This wraps up the third chapter of the Kaleidoscope tutorial. Up next,<br>

+we'll describe how to `add JIT codegen and optimizer<br>

+support <LangImpl04.html>`_ to this so we can actually start running<br>

+code!<br>

+<br>

+Full Code Listing<br>

+=================<br>

+<br>

+Here is the complete code listing for our running example, enhanced with<br>

+the LLVM code generator. Because this uses the LLVM libraries, we need<br>

+to link them in. To do this, we use the<br>

+`llvm-config <<a href="http://llvm.org/cmds/llvm-config.html" rel="noreferrer" target="_blank">http://llvm.org/cmds/llvm-config.html</a>>`_ tool to inform<br>

+our makefile/command line about which options to use:<br>

+<br>

+.. code-block:: bash<br>

+<br>

+    # Compile<br>

+    clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core` -o toy<br>

+    # Run<br>

+    ./toy<br>

+<br>

+Here is the code:<br>

+<br>

+.. literalinclude:: ../../examples/Kaleidoscope/Chapter3/toy.cpp<br>

+   :language: c++<br>

+<br>

+`Next: Adding JIT and Optimizer Support <LangImpl04.html>`_<br>

+<br>

<br>

Added: llvm/trunk/docs/tutorial/LangImpl04.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl04.rst?rev=274441&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl04.rst?rev=274441&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl04.rst (added)<br>

+++ llvm/trunk/docs/tutorial/LangImpl04.rst Sat Jul  2 12:01:59 2016<br>

@@ -0,0 +1,610 @@<br>

+==============================================<br>

+Kaleidoscope: Adding JIT and Optimizer Support<br>

+==============================================<br>

+<br>

+.. contents::<br>

+   :local:<br>

+<br>

+Chapter 4 Introduction<br>

+======================<br>

+<br>

+Welcome to Chapter 4 of the "`Implementing a language with<br>

+LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation<br>

+of a simple language and added support for generating LLVM IR. This<br>

+chapter describes two new techniques: adding optimizer support to your<br>

+language, and adding JIT compiler support. These additions will<br>

+demonstrate how to get nice, efficient code for the Kaleidoscope<br>

+language.<br>

+<br>

+Trivial Constant Folding<br>

+========================<br>

+<br>

+Our demonstration for Chapter 3 is elegant and easy to extend.<br>

+Unfortunately, it does not produce wonderful code. The IRBuilder,<br>

+however, does give us obvious optimizations when compiling simple code:<br>

+<br>

+::<br>

+<br>

+    ready> def test(x) 1+2+x;<br>

+    Read function definition:<br>

+    define double @test(double %x) {<br>

+    entry:<br>

+            %addtmp = fadd double 3.000000e+00, %x<br>

+            ret double %addtmp<br>

+    }<br>

+<br>

+This code is not a literal transcription of the AST built by parsing the<br>

+input. That would be:<br>

+<br>

+::<br>

+<br>

+    ready> def test(x) 1+2+x;<br>

+    Read function definition:<br>

+    define double @test(double %x) {<br>

+    entry:<br>

+            %addtmp = fadd double 2.000000e+00, 1.000000e+00<br>

+            %addtmp1 = fadd double %addtmp, %x<br>

+            ret double %addtmp1<br>

+    }<br>

+<br>

+Constant folding, as seen above, is a very common and<br>

+very important optimization: so much so that many language implementors<br>

+implement constant folding support in their AST representation.<br>

+<br>

+With LLVM, you don't need this support in the AST. Since all calls to<br>

+build LLVM IR go through the LLVM IR builder, the builder itself checks<br>

+to see if there is a constant folding opportunity when you call it. If<br>

+so, it just does the constant fold and returns the constant instead of<br>

+creating an instruction.<br>
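
The idea of folding inside the builder can be sketched without LLVM at all. The block below is a toy model using only the standard library; ``ToyValue`` and ``ToyBuilder`` are hypothetical names made up for this illustration and are not part of the LLVM API. It assumes a value is either a constant or an opaque instruction, and shows the builder returning a constant instead of emitting an add when both operands are constants.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature IR value: either a constant or an instruction.
struct ToyValue {
  bool IsConstant;
  double Constant;  // meaningful only when IsConstant is true
  std::string Text; // printable form of the value
};

class ToyBuilder {
  std::vector<std::unique_ptr<ToyValue>> Owned; // keeps every value alive
  unsigned NumInstructions = 0;

  ToyValue *make(bool IsConstant, double C, std::string Text) {
    Owned.push_back(std::make_unique<ToyValue>(
        ToyValue{IsConstant, C, std::move(Text)}));
    return Owned.back().get();
  }

public:
  ToyValue *getConstant(double C) { return make(true, C, std::to_string(C)); }
  ToyValue *getArgument(const std::string &Name) {
    return make(false, 0.0, "%" + Name);
  }

  // Fold when both operands are constants; otherwise record an instruction.
  ToyValue *createFAdd(ToyValue *L, ToyValue *R) {
    if (L->IsConstant && R->IsConstant)
      return getConstant(L->Constant + R->Constant);
    ++NumInstructions;
    return make(false, 0.0, "fadd " + L->Text + ", " + R->Text);
  }

  unsigned numInstructions() const { return NumInstructions; }
};
```

With this sketch, building ``1+2+x`` as ``createFAdd(createFAdd(1, 2), x)`` records a single add, because the inner call folds to the constant 3 before the outer call ever sees it.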

+<br>

+Well, that was easy :). In practice, we recommend always using<br>

+``IRBuilder`` when generating code like this. It has no "syntactic<br>

+overhead" for its use (you don't have to uglify your compiler with<br>

+constant checks everywhere) and it can dramatically reduce the amount of<br>

+LLVM IR that is generated in some cases (particularly for languages with a<br>

+macro preprocessor or that use a lot of constants).<br>

+<br>

+On the other hand, the ``IRBuilder`` is limited by the fact that it does<br>

+all of its analysis inline with the code as it is built. If you take a<br>

+slightly more complex example:<br>

+<br>

+::<br>

+<br>

+    ready> def test(x) (1+2+x)*(x+(1+2));<br>

+    ready> Read function definition:<br>

+    define double @test(double %x) {<br>

+    entry:<br>

+            %addtmp = fadd double 3.000000e+00, %x<br>

+            %addtmp1 = fadd double %x, 3.000000e+00<br>

+            %multmp = fmul double %addtmp, %addtmp1<br>

+            ret double %multmp<br>

+    }<br>

+<br>

+In this case, the LHS and RHS of the multiplication are the same value.<br>

+We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"<br>

+instead of computing "``x+3``" twice.<br>

+<br>

+Unfortunately, no amount of local analysis will be able to detect and<br>

+correct this. This requires two transformations: reassociation of<br>

+expressions (to make the add's lexically identical) and Common<br>

+Subexpression Elimination (CSE) to delete the redundant add instruction.<br>

+Fortunately, LLVM provides a broad range of optimizations that you can<br>

+use, in the form of "passes".<br>
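
To see why both transformations are needed, consider a toy common-subexpression table keyed on the operands of an add. This is a standard-library-only sketch; ``ToyCSE`` and its ``Canonicalize`` flag are hypothetical names for this illustration and only loosely mimic the operand-ordering part of what a reassociation pass does. Without a canonical operand order, ``3.0 + %x`` and ``%x + 3.0`` look like different expressions, so nothing is shared.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical value-numbering table keyed on the operand pair of an add.
class ToyCSE {
  std::map<std::pair<std::string, std::string>, std::string> Seen;
  unsigned NextId = 0;

public:
  bool Canonicalize = false;

  // Returns the name of the value holding LHS + RHS, reusing an earlier
  // add with an identical key when one has already been recorded.
  std::string add(std::string LHS, std::string RHS) {
    if (Canonicalize && RHS < LHS)
      std::swap(LHS, RHS); // addition is commutative: fix one operand order
    auto Key = std::make_pair(LHS, RHS);
    auto It = Seen.find(Key);
    if (It != Seen.end())
      return It->second;
    std::string Name = "%addtmp" + std::to_string(NextId++);
    Seen.emplace(Key, Name);
    return Name;
  }
};
```

With ``Canonicalize`` left off, the two adds get different names; with it on, the second request returns the first add's name, which is the sharing we want for ``(1+2+x)*(x+(1+2))``.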

+<br>

+LLVM Optimization Passes<br>

+========================<br>

+<br>

+LLVM provides many optimization passes, which do many different sorts of<br>

+things and have different tradeoffs. Unlike other systems, LLVM doesn't<br>

+hold to the mistaken notion that one set of optimizations is right for<br>

+all languages and for all situations. LLVM allows a compiler implementor<br>

+to make complete decisions about what optimizations to use, in which<br>

+order, and in what situation.<br>

+<br>

+As a concrete example, LLVM supports "whole module" passes, which<br>

+look across as large a body of code as they can (often a whole file,<br>

+but if run at link time, this can be a substantial portion of the whole<br>

+program). It also supports and includes "per-function" passes which just<br>

+operate on a single function at a time, without looking at other<br>

+functions. For more information on passes and how they are run, see the<br>

+`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the<br>

+`List of LLVM Passes <../Passes.html>`_.<br>

+<br>

+For Kaleidoscope, we are currently generating functions on the fly, one<br>

+at a time, as the user types them in. We aren't shooting for the<br>

+ultimate optimization experience in this setting, but we also want to<br>

+catch the easy and quick stuff where possible. As such, we will choose<br>

+to run a few per-function optimizations as the user types the function<br>

+in. If we wanted to make a "static Kaleidoscope compiler", we would use<br>

+exactly the code we have now, except that we would defer running the<br>

+optimizer until the entire file has been parsed.<br>

+<br>

+In order to get per-function optimizations going, we need to set up a<br>

+`FunctionPassManager <../WritingAnLLVMPass.html#what-passmanager-does>`_ to hold<br>

+and organize the LLVM optimizations that we want to run. Once we have<br>

+that, we can add a set of optimizations to run. We'll need a new<br>

+FunctionPassManager for each module that we want to optimize, so we'll<br>

+write a function to create and initialize both the module and pass manager<br>

+for us:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    void InitializeModuleAndPassManager(void) {<br>

+      // Open a new module, reusing the LLVMContext that the rest of codegen uses.<br>

+      TheModule = llvm::make_unique<Module>("my cool jit", LLVMContext);<br>

+      TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());<br>

+<br>

+      // Create a new pass manager attached to it.<br>

+      TheFPM = llvm::make_unique<FunctionPassManager>(TheModule.get());<br>

+<br>

+      // Provide basic AliasAnalysis support for GVN.<br>

+      TheFPM->add(createBasicAliasAnalysisPass());<br>

+      // Do simple "peephole" optimizations and bit-twiddling optzns.<br>

+      TheFPM->add(createInstructionCombiningPass());<br>

+      // Reassociate expressions.<br>

+      TheFPM->add(createReassociatePass());<br>

+      // Eliminate Common SubExpressions.<br>

+      TheFPM->add(createGVNPass());<br>

+      // Simplify the control flow graph (deleting unreachable blocks, etc).<br>

+      TheFPM->add(createCFGSimplificationPass());<br>

+<br>

+      TheFPM->doInitialization();<br>

+    }<br>

+<br>

+This code initializes the global module ``TheModule``, and the function pass<br>

+manager ``TheFPM``, which is attached to ``TheModule``. Once the pass manager is<br>

+set up, we use a series of "add" calls to add a bunch of LLVM passes.<br>

+<br>

+In this case, we choose to add five passes: one analysis pass (alias analysis),<br>

+and four optimization passes. The passes we choose here are a pretty standard set<br>

+of "cleanup" optimizations that are useful for a wide variety of code. I won't<br>

+delve into what they do but, believe me, they are a good starting place :).<br>

+<br>

+Once the PassManager is set up, we need to make use of it. We do this by<br>

+running it after our newly created function is constructed (in<br>

+``FunctionAST::codegen()``), but before it is returned to the client:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      if (Value *RetVal = Body->codegen()) {<br>

+        // Finish off the function.<br>

+        Builder.CreateRet(RetVal);<br>

+<br>

+        // Validate the generated code, checking for consistency.<br>

+        verifyFunction(*TheFunction);<br>

+<br>

+        // Optimize the function.<br>

+        TheFPM->run(*TheFunction);<br>

+<br>

+        return TheFunction;<br>

+      }<br>

+<br>

+As you can see, this is pretty straightforward. The<br>

+``FunctionPassManager`` optimizes and updates the LLVM Function\* in<br>

+place, improving (hopefully) its body. With this in place, we can try<br>

+our test above again:<br>

+<br>

+::<br>

+<br>

+    ready> def test(x) (1+2+x)*(x+(1+2));<br>

+    ready> Read function definition:<br>

+    define double @test(double %x) {<br>

+    entry:<br>

+            %addtmp = fadd double %x, 3.000000e+00<br>

+            %multmp = fmul double %addtmp, %addtmp<br>

+            ret double %multmp<br>

+    }<br>

+<br>

+As expected, we now get our nicely optimized code, saving a floating<br>

+point add instruction from every execution of this function.<br>

+<br>

+LLVM provides a wide variety of optimizations that can be used in<br>

+certain circumstances. Some `documentation about the various<br>

+passes <../Passes.html>`_ is available, but it isn't very complete.<br>

+Another good source of ideas can come from looking at the passes that<br>

+``Clang`` runs to get started. The "``opt``" tool allows you to<br>

+experiment with passes from the command line, so you can see if they do<br>

+anything.<br>

+<br>

+Now that we have reasonable code coming out of our front-end, let's talk<br>

+about executing it!<br>

+<br>

+Adding a JIT Compiler<br>

+=====================<br>

+<br>

+Code that is available in LLVM IR can have a wide variety of tools<br>

+applied to it. For example, you can run optimizations on it (as we did<br>

+above), you can dump it out in textual or binary forms, you can compile<br>

+the code to an assembly file (.s) for some target, or you can JIT<br>

+compile it. The nice thing about the LLVM IR representation is that it<br>

+is the "common currency" between many different parts of the compiler.<br>

+<br>

+In this section, we'll add JIT compiler support to our interpreter. The<br>

+basic idea that we want for Kaleidoscope is to have the user enter<br>

+function bodies as they do now, but immediately evaluate the top-level<br>

+expressions they type in. For example, if they type in "1 + 2;", we<br>

+should evaluate and print out 3. If they define a function, they should<br>

+be able to call it from the command line.<br>

+<br>

+In order to do this, we first declare and initialize the JIT. This is<br>

+done by adding a global variable ``TheJIT``, and initializing it in<br>

+``main``:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    static std::unique_ptr<KaleidoscopeJIT> TheJIT;<br>

+    ...<br>

+    int main() {<br>

+      ..<br>

+      TheJIT = llvm::make_unique<KaleidoscopeJIT>();<br>

+<br>

+      // Run the main "interpreter loop" now.<br>

+      MainLoop();<br>

+<br>

+      return 0;<br>

+    }<br>

+<br>

+The KaleidoscopeJIT class is a simple JIT built specifically for these<br>

+tutorials. In later chapters we will look at how it works and extend it with<br>

+new features, but for now we will take it as given. Its API is very simple:<br>

+``addModule`` adds an LLVM IR module to the JIT, making its functions<br>

+available for execution; ``removeModule`` removes a module, freeing any<br>

+memory associated with the code in that module; and ``findSymbol`` allows us<br>

+to look up pointers to the compiled code.<br>

+<br>

+We can take this simple API and change our code that parses top-level expressions to<br>

+look like this:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    static void HandleTopLevelExpression() {<br>

+      // Evaluate a top-level expression into an anonymous function.<br>

+      if (auto FnAST = ParseTopLevelExpr()) {<br>

+        if (FnAST->codegen()) {<br>

+<br>

+          // JIT the module containing the anonymous expression, keeping a handle so<br>

+          // we can free it later.<br>

+          auto H = TheJIT->addModule(std::move(TheModule));<br>

+          InitializeModuleAndPassManager();<br>

+<br>

+          // Search the JIT for the __anon_expr symbol.<br>

+          auto ExprSymbol = TheJIT->findSymbol("__anon_expr");<br>

+          assert(ExprSymbol && "Function not found");<br>

+<br>

+          // Get the symbol's address and cast it to the right type (takes no<br>

+          // arguments, returns a double) so we can call it as a native function.<br>

+          double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();<br>

+          fprintf(stderr, "Evaluated to %f\n", FP());<br>

+<br>

+          // Delete the anonymous expression module from the JIT.<br>

+          TheJIT->removeModule(H);<br>

+        }<br>

+<br>

+If parsing and codegen succeed, the next step is to add the module containing<br>

+the top-level expression to the JIT. We do this by calling addModule, which<br>

+triggers code generation for all the functions in the module, and returns a<br>

+handle that can be used to remove the module from the JIT later. Once the module<br>

+has been added to the JIT it can no longer be modified, so we also open a new<br>

+module to hold subsequent code by calling ``InitializeModuleAndPassManager()``.<br>

+<br>

+Once we've added the module to the JIT we need to get a pointer to the final<br>

+generated code. We do this by calling the JIT's findSymbol method, and passing<br>

+the name of the top-level expression function: ``__anon_expr``. Since we just<br>

+added this function, we assert that findSymbol returned a result.<br>

+<br>

+Next, we get the in-memory address of the ``__anon_expr`` function by calling<br>

+``getAddress()`` on the symbol. Recall that we compile top-level expressions<br>

+into a self-contained LLVM function that takes no arguments and returns the<br>

+computed double. Because the LLVM JIT compiler matches the native platform ABI,<br>

+this means that you can just cast the result pointer to a function pointer of<br>

+that type and call it directly. This means there is no difference between JIT<br>

+compiled code and native machine code that is statically linked into your<br>

+application.<br>
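
The cast itself is ordinary C++ and can be tried without a JIT. In the sketch below, ``toyGetAddress`` is a hypothetical stand-in for the symbol lookup: it simply returns the address of a normally compiled function as an integer, which we assume is the shape of value a JIT hands back. ``callAsNative`` then performs the same cast-and-call step as the code above.

```cpp
#include <cassert>
#include <cstdint>

// An ordinary function standing in for JIT-compiled code: it takes no
// arguments and returns a double, like a top-level expression.
double toyAnonExpr() { return 9.0; }

// Hypothetical stand-in for a JIT symbol lookup: the address comes back
// as a plain integer rather than as a typed function pointer.
std::uintptr_t toyGetAddress() {
  return reinterpret_cast<std::uintptr_t>(&toyAnonExpr);
}

// Cast the raw address to the right function pointer type and call it.
double callAsNative(std::uintptr_t Addr) {
  auto FP = reinterpret_cast<double (*)()>(Addr);
  return FP();
}
```

The caller is responsible for picking the correct function type: the address carries no signature information, so casting to the wrong type is undefined behavior.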

+<br>

+Finally, since we don't support re-evaluation of top-level expressions, we<br>

+remove the module from the JIT when we're done to free the associated memory.<br>

+Recall, however, that the module we created a few lines earlier (via<br>

+``InitializeModuleAndPassManager``) is still open and waiting for new code to be<br>

+added.<br>

+<br>

+With just these two changes, let's see how Kaleidoscope works now!<br>

+<br>

+::<br>

+<br>

+    ready> 4+5;<br>

+    Read top-level expression:<br>

+    define double @0() {<br>

+    entry:<br>

+      ret double 9.000000e+00<br>

+    }<br>

+<br>

+    Evaluated to 9.000000<br>

+<br>

+Well, this looks like it is basically working. The dump of the function<br>

+shows the "no argument function that always returns double" that we<br>

+synthesize for each top-level expression that is typed in. This<br>

+demonstrates very basic functionality, but can we do more?<br>

+<br>

+::<br>

+<br>

+    ready> def testfunc(x y) x + y*2;<br>

+    Read function definition:<br>

+    define double @testfunc(double %x, double %y) {<br>

+    entry:<br>

+      %multmp = fmul double %y, 2.000000e+00<br>

+      %addtmp = fadd double %multmp, %x<br>

+      ret double %addtmp<br>

+    }<br>

+<br>

+    ready> testfunc(4, 10);<br>

+    Read top-level expression:<br>

+    define double @1() {<br>

+    entry:<br>

+      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)<br>

+      ret double %calltmp<br>

+    }<br>

+<br>

+    Evaluated to 24.000000<br>

+<br>

+    ready> testfunc(5, 10);<br>

+    ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!<br>

+<br>

+<br>

+Function definitions and calls also work, but something went very wrong on that<br>

+last line. The call looks valid, so what happened? As you may have guessed from<br>

+the API, a Module is a unit of allocation for the JIT, and testfunc was part<br>

+of the same module that contained the anonymous expression. When we removed that<br>

+module from the JIT to free the memory for the anonymous expression, we deleted<br>

+the definition of ``testfunc`` along with it. Then, when we tried to call<br>

+testfunc a second time, the JIT could no longer find it.<br>

+<br>

+The easiest way to fix this is to put the anonymous expression in a separate<br>

+module from the rest of the function definitions. The JIT will happily resolve<br>

+function calls across module boundaries, as long as each of the functions called<br>

+has a prototype, and is added to the JIT before it is called. By putting the<br>

+anonymous expression in a different module we can delete it without affecting<br>

+the rest of the functions.<br>

+<br>

+In fact, we're going to go a step further and put every function in its own<br>

+module. Doing so allows us to exploit a useful property of the KaleidoscopeJIT<br>

+that will make our environment more REPL-like: Functions can be added to the<br>

+JIT more than once (unlike a module where every function must have a unique<br>

+definition). When you look up a symbol in KaleidoscopeJIT it will always return<br>

+the most recent definition:<br>

+<br>

+::<br>

+<br>

+    ready> def foo(x) x + 1;<br>

+    Read function definition:<br>

+    define double @foo(double %x) {<br>

+    entry:<br>

+      %addtmp = fadd double %x, 1.000000e+00<br>

+      ret double %addtmp<br>

+    }<br>

+<br>

+    ready> foo(2);<br>

+    Evaluated to 3.000000<br>

+<br>

+    ready> def foo(x) x + 2;<br>

+    define double @foo(double %x) {<br>

+    entry:<br>

+      %addtmp = fadd double %x, 2.000000e+00<br>

+      ret double %addtmp<br>

+    }<br>

+<br>

+    ready> foo(2);<br>

+    Evaluated to 4.000000<br>

+<br>

+<br>

+To allow each function to live in its own module we'll need a way to<br>

+re-generate previous function declarations into each new module we open:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    static std::unique_ptr<KaleidoscopeJIT> TheJIT;<br>

+<br>

+    ...<br>

+<br>

+    Function *getFunction(std::string Name) {<br>

+      // First, see if the function has already been added to the current module.<br>

+      if (auto *F = TheModule->getFunction(Name))<br>

+        return F;<br>

+<br>

+      // If not, check whether we can codegen the declaration from some existing<br>

+      // prototype.<br>

+      auto FI = FunctionProtos.find(Name);<br>

+      if (FI != FunctionProtos.end())<br>

+        return FI->second->codegen();<br>

+<br>

+      // If no existing prototype exists, return null.<br>

+      return nullptr;<br>

+    }<br>

+<br>

+    ...<br>

+<br>

+    Value *CallExprAST::codegen() {<br>

+      // Look up the name in the global module table.<br>

+      Function *CalleeF = getFunction(Callee);<br>

+<br>

+    ...<br>

+<br>

+    Function *FunctionAST::codegen() {<br>

+      // Transfer ownership of the prototype to the FunctionProtos map, but keep a<br>

+      // reference to it for use below.<br>

+      auto &P = *Proto;<br>

+      FunctionProtos[Proto->getName()] = std::move(Proto);<br>

+      Function *TheFunction = getFunction(P.getName());<br>

+      if (!TheFunction)<br>

+        return nullptr;<br>

+<br>

+<br>

+To enable this, we'll start by adding a new global, ``FunctionProtos``, that<br>

+holds the most recent prototype for each function. We'll also add a convenience<br>

+method, ``getFunction()``, to replace calls to ``TheModule->getFunction()``.<br>

+Our convenience method searches ``TheModule`` for an existing function<br>

+declaration, falling back to generating a new declaration from FunctionProtos if<br>

+it doesn't find one. In ``CallExprAST::codegen()`` we just need to replace the<br>

+call to ``TheModule->getFunction()``. In ``FunctionAST::codegen()`` we need to<br>

+update the FunctionProtos map first, then call ``getFunction()``. With this<br>

+done, we can always obtain a function declaration in the current module for any<br>

+previously declared function.<br>
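The lookup order can be sketched without LLVM. This is only an illustrative model: ``ToyProto``, ``ToyModule`` and ``openNewModule()`` are hypothetical stand-ins for ``PrototypeAST``, ``llvm::Module`` and the step that hands a module to the JIT, and a "declaration" is reduced to a name in a set.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical stand-in for PrototypeAST: just a name and an argument count.
struct ToyProto {
  std::string Name;
  unsigned NumArgs;
};

// Hypothetical stand-in for llvm::Module: the set of names declared in it.
struct ToyModule {
  std::set<std::string> Declared;
};

// Most recent prototype for each function, shared across all modules.
static std::map<std::string, std::unique_ptr<ToyProto>> FunctionProtos;
static std::unique_ptr<ToyModule> TheModule = std::make_unique<ToyModule>();

// Returns true if Name is (or can be made) available in the current module.
bool getFunction(const std::string &Name) {
  assert(TheModule && "no module is currently open");
  // First, see if the function is already declared in the current module.
  if (TheModule->Declared.count(Name))
    return true;
  // If not, re-create the declaration from the most recent prototype.
  auto FI = FunctionProtos.find(Name);
  if (FI != FunctionProtos.end()) {
    TheModule->Declared.insert(FI->second->Name);
    return true;
  }
  // No prototype is known for this name.
  return false;
}

// Models "hand the current module to the JIT and open a fresh, empty one".
void openNewModule() { TheModule = std::make_unique<ToyModule>(); }
```

In this model, a name declared before ``openNewModule()`` is missing from the fresh module until ``getFunction()`` is asked for it again, at which point the saved prototype is used to re-declare it.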

+<br>

+We also need to update HandleDefinition and HandleExtern:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    static void HandleDefinition() {<br>

+      if (auto FnAST = ParseDefinition()) {<br>

+        if (auto *FnIR = FnAST->codegen()) {<br>

+          fprintf(stderr, "Read function definition:");<br>

+          FnIR->dump();<br>

+          TheJIT->addModule(std::move(TheModule));<br>

+          InitializeModuleAndPassManager();<br>

+        }<br>

+      } else {<br>

+        // Skip token for error recovery.<br>

+        getNextToken();<br>

+      }<br>

+    }<br>

+<br>

+    static void HandleExtern() {<br>

+      if (auto ProtoAST = ParseExtern()) {<br>

+        if (auto *FnIR = ProtoAST->codegen()) {<br>

+          fprintf(stderr, "Read extern: ");<br>

+          FnIR->dump();<br>

+          FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);<br>

+        }<br>

+      } else {<br>

+        // Skip token for error recovery.<br>

+        getNextToken();<br>

+      }<br>

+    }<br>

+<br>

+In HandleDefinition, we add two lines to transfer the newly defined function to<br>

+the JIT and open a new module. In HandleExtern, we just need to add one line to<br>

+add the prototype to FunctionProtos.<br>

+<br>

+With these changes made, let's try our REPL again (I removed the dump of the<br>

+anonymous functions this time, you should get the idea by now :) :<br>

+<br>

+::<br>

+<br>

+    ready> def foo(x) x + 1;<br>

+    ready> foo(2);<br>

+    Evaluated to 3.000000<br>

+<br>

+    ready> def foo(x) x + 2;<br>

+    ready> foo(2);<br>

+    Evaluated to 4.000000<br>

+<br>

+It works!<br>

+<br>

+Even with this simple code, we get some surprisingly powerful capabilities -<br>

+check this out:<br>

+<br>

+::<br>

+<br>

+    ready> extern sin(x);<br>

+    Read extern:<br>

+    declare double @sin(double)<br>

+<br>

+    ready> extern cos(x);<br>

+    Read extern:<br>

+    declare double @cos(double)<br>

+<br>

+    ready> sin(1.0);<br>

+    Read top-level expression:<br>

+    define double @2() {<br>

+    entry:<br>

+      ret double 0x3FEAED548F090CEE<br>

+    }<br>

+<br>

+    Evaluated to 0.841471<br>

+<br>

+    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);<br>

+    Read function definition:<br>

+    define double @foo(double %x) {<br>

+    entry:<br>

+      %calltmp = call double @sin(double %x)<br>

+      %multmp = fmul double %calltmp, %calltmp<br>

+      %calltmp2 = call double @cos(double %x)<br>

+      %multmp4 = fmul double %calltmp2, %calltmp2<br>

+      %addtmp = fadd double %multmp, %multmp4<br>

+      ret double %addtmp<br>

+    }<br>

+<br>

+    ready> foo(4.0);<br>

+    Read top-level expression:<br>

+    define double @3() {<br>

+    entry:<br>

+      %calltmp = call double @foo(double 4.000000e+00)<br>

+      ret double %calltmp<br>

+    }<br>

+<br>

+    Evaluated to 1.000000<br>

+<br>

+Whoa, how does the JIT know about sin and cos? The answer is surprisingly<br>

+simple: The KaleidoscopeJIT has a straightforward symbol resolution rule that<br>

+it uses to find symbols that aren't available in any given module: First<br>

+it searches all the modules that have already been added to the JIT, from the<br>

+most recent to the oldest, to find the newest definition. If no definition is<br>

+found inside the JIT, it falls back to calling "``dlsym("sin")``" on the<br>

+Kaleidoscope process itself. Since "``sin``" is defined within the JIT's<br>

+address space, it simply patches up calls in the module to call the libm<br>

+version of ``sin`` directly.<br>
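To make the resolution rule concrete, here is a minimal sketch of it in plain C++. It is not the KaleidoscopeJIT implementation: ``ToyJIT``, ``ToyModule`` and the ``HostSymbols`` table (standing in for the ``dlsym`` fallback) are hypothetical names, and "compiled" functions are ordinary function pointers.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <utility>
#include <vector>

using FnPtr = double (*)(double);

// Hypothetical model of a JIT'd module: symbol name -> compiled function.
using ToyModule = std::map<std::string, FnPtr>;

struct ToyJIT {
  std::vector<ToyModule> Modules;           // oldest first, newest last
  std::map<std::string, FnPtr> HostSymbols; // stand-in for dlsym on the process

  void addModule(ToyModule M) { Modules.push_back(std::move(M)); }

  FnPtr findSymbol(const std::string &Name) const {
    // Search modules from most recent to oldest so the newest definition wins.
    for (auto It = Modules.rbegin(); It != Modules.rend(); ++It) {
      auto Sym = It->find(Name);
      if (Sym != It->end())
        return Sym->second;
    }
    // Nothing inside the JIT: fall back to symbols exported by the host.
    auto Host = HostSymbols.find(Name);
    return Host != HostSymbols.end() ? Host->second : nullptr;
  }
};

// Two "definitions" of foo, as in the REPL session earlier in this chapter.
double fooV1(double X) { return X + 1; }
double fooV2(double X) { return X + 2; }
// A host-provided function, playing the role of libm's sin.
double hostSin(double X) { return std::sin(X); }
```

Adding a module that maps ``foo`` to ``fooV1`` and then one that maps it to ``fooV2`` makes ``findSymbol("foo")`` return ``fooV2``, while ``findSymbol("sin")`` falls through to the host table.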

+<br>

+In the future we'll see how tweaking this symbol resolution rule can be used to<br>

+enable all sorts of useful features, from security (restricting the set of<br>

+symbols available to JIT'd code), to dynamic code generation based on symbol<br>

+names, and even lazy compilation.<br>

+<br>

+One immediate benefit of the symbol resolution rule is that we can now extend<br>

+the language by writing arbitrary C++ code to implement operations. For example,<br>

+if we add:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// putchard - putchar that takes a double and returns 0.<br>

+    extern "C" double putchard(double X) {<br>

+      fputc((char)X, stderr);<br>

+      return 0;<br>

+    }<br>

+<br>

+Now we can produce simple output to the console by using things like:<br>

+"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'<br>

+on the console (120 is the ASCII code for 'x'). Similar code could be<br>

+used to implement file I/O, console input, and many other capabilities<br>

+in Kaleidoscope.<br>

+<br>

+This completes the JIT and optimizer chapter of the Kaleidoscope<br>

+tutorial. At this point, we can compile a non-Turing-complete<br>

+programming language, then optimize and JIT compile it in a user-driven way.<br>

+Next up we'll look into `extending the language with control flow<br>

+constructs <LangImpl05.html>`_, tackling some interesting LLVM IR issues<br>

+along the way.<br>

+<br>

+Full Code Listing<br>

+=================<br>

+<br>

+Here is the complete code listing for our running example, enhanced with<br>

+the LLVM JIT and optimizer. To build this example, use:<br>

+<br>

+.. code-block:: bash<br>

+<br>

+    # Compile<br>

+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy<br>

+    # Run<br>

+    ./toy<br>

+<br>

+If you are compiling this on Linux, make sure to add the "-rdynamic"<br>

+option as well. This makes sure that the external functions are resolved<br>

+properly at runtime.<br>

+<br>

+Here is the code:<br>

+<br>

+.. literalinclude:: ../../examples/Kaleidoscope/Chapter4/toy.cpp<br>

+   :language: c++<br>

+<br>

+`Next: Extending the language: control flow <LangImpl05.html>`_<br>

+<br>

<br>

Added: llvm/trunk/docs/tutorial/LangImpl05-cfg.png<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl05-cfg.png?rev=274441&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl05-cfg.png?rev=274441&view=auto</a><br>

==============================================================================<br>

Binary file - no diff available.<br>

<br>

Propchange: llvm/trunk/docs/tutorial/LangImpl05-cfg.png<br>

------------------------------------------------------------------------------<br>

    svn:mime-type = image/png<br>

<br>

Added: llvm/trunk/docs/tutorial/LangImpl05.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl05.rst?rev=274441&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl05.rst?rev=274441&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl05.rst (added)<br>

+++ llvm/trunk/docs/tutorial/LangImpl05.rst Sat Jul  2 12:01:59 2016<br>

@@ -0,0 +1,790 @@<br>

+==================================================<br>

+Kaleidoscope: Extending the Language: Control Flow<br>

+==================================================<br>

+<br>

+.. contents::<br>

+   :local:<br>

+<br>

+Chapter 5 Introduction<br>

+======================<br>

+<br>

+Welcome to Chapter 5 of the "`Implementing a language with<br>

+LLVM <index.html>`_" tutorial. Parts 1-4 described the implementation of<br>

+the simple Kaleidoscope language and included support for generating<br>

+LLVM IR, followed by optimizations and a JIT compiler. Unfortunately, as<br>

+presented, Kaleidoscope is mostly useless: it has no control flow other<br>

+than call and return. This means that you can't have conditional<br>

+branches in the code, significantly limiting its power. In this episode<br>

+of "build that compiler", we'll extend Kaleidoscope to have an<br>

+if/then/else expression plus a simple 'for' loop.<br>

+<br>

+If/Then/Else<br>

+============<br>

+<br>

+Extending Kaleidoscope to support if/then/else is quite straightforward.<br>

+It basically requires adding support for this "new" concept to the<br>

+lexer, parser, AST, and LLVM code emitter. This example is nice, because<br>

+it shows how easy it is to "grow" a language over time, incrementally<br>

+extending it as new ideas are discovered.<br>

+<br>

+Before we get going on "how" we add this extension, let's talk about<br>

+"what" we want. The basic idea is that we want to be able to write this<br>

+sort of thing:<br>

+<br>

+::<br>

+<br>

+    def fib(x)<br>

+      if x < 3 then<br>

+        1<br>

+      else<br>

+        fib(x-1)+fib(x-2);<br>

+<br>

+In Kaleidoscope, every construct is an expression: there are no<br>

+statements. As such, the if/then/else expression needs to return a value<br>

+like any other. Since we're using a mostly functional form, we'll have<br>

+it evaluate its conditional, then return the 'then' or 'else' value<br>

+based on how the condition was resolved. This is very similar to the C<br>

+"?:" expression.<br>

+<br>

+The semantics of the if/then/else expression is that it evaluates the<br>

+condition to a boolean equality value: 0.0 is considered to be false and<br>

+everything else is considered to be true. If the condition is true, the<br>

+first subexpression is evaluated and returned, if the condition is<br>

+false, the second subexpression is evaluated and returned. Since<br>

+Kaleidoscope allows side-effects, this behavior is important to nail<br>

+down.<br>

+<br>

+Now that we know what we "want", let's break this down into its<br>

+constituent pieces.<br>

+<br>

+Lexer Extensions for If/Then/Else<br>

+---------------------------------<br>

+<br>

+The lexer extensions are straightforward. First we add new enum values<br>

+for the relevant tokens:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // control<br>

+      tok_if = -6,<br>

+      tok_then = -7,<br>

+      tok_else = -8,<br>

+<br>

+Once we have that, we recognize the new keywords in the lexer. This is<br>

+pretty simple stuff:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+        ...<br>

+        if (IdentifierStr == "def")<br>

+          return tok_def;<br>

+        if (IdentifierStr == "extern")<br>

+          return tok_extern;<br>

+        if (IdentifierStr == "if")<br>

+          return tok_if;<br>

+        if (IdentifierStr == "then")<br>

+          return tok_then;<br>

+        if (IdentifierStr == "else")<br>

+          return tok_else;<br>

+        return tok_identifier;<br>

+<br>

+AST Extensions for If/Then/Else<br>

+-------------------------------<br>

+<br>

+To represent the new expression we add a new AST node for it:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// IfExprAST - Expression class for if/then/else.<br>

+    class IfExprAST : public ExprAST {<br>

+      std::unique_ptr<ExprAST> Cond, Then, Else;<br>

+<br>

+    public:<br>

+      IfExprAST(std::unique_ptr<ExprAST> Cond, std::unique_ptr<ExprAST> Then,<br>

+                std::unique_ptr<ExprAST> Else)<br>

+        : Cond(std::move(Cond)), Then(std::move(Then)), Else(std::move(Else)) {}<br>

+      virtual Value *codegen();<br>

+    };<br>

+<br>

+The AST node just has pointers to the various subexpressions.<br>

+<br>

+Parser Extensions for If/Then/Else<br>

+----------------------------------<br>

+<br>

+Now that we have the relevant tokens coming from the lexer and we have<br>

+the AST node to build, our parsing logic is relatively straightforward.<br>

+First we define a new parsing function:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// ifexpr ::= 'if' expression 'then' expression 'else' expression<br>

+    static std::unique_ptr<ExprAST> ParseIfExpr() {<br>

+      getNextToken();  // eat the if.<br>

+<br>

+      // condition.<br>

+      auto Cond = ParseExpression();<br>

+      if (!Cond)<br>

+        return nullptr;<br>

+<br>

+      if (CurTok != tok_then)<br>

+        return LogError("expected then");<br>

+      getNextToken();  // eat the then<br>

+<br>

+      auto Then = ParseExpression();<br>

+      if (!Then)<br>

+        return nullptr;<br>

+<br>

+      if (CurTok != tok_else)<br>

+        return LogError("expected else");<br>

+<br>

+      getNextToken();<br>

+<br>

+      auto Else = ParseExpression();<br>

+      if (!Else)<br>

+        return nullptr;<br>

+<br>

+      return llvm::make_unique<IfExprAST>(std::move(Cond), std::move(Then),<br>

+                                          std::move(Else));<br>

+    }<br>

+<br>

+Next we hook it up as a primary expression:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    static std::unique_ptr<ExprAST> ParsePrimary() {<br>

+      switch (CurTok) {<br>

+      default:<br>

+        return LogError("unknown token when expecting an expression");<br>

+      case tok_identifier:<br>

+        return ParseIdentifierExpr();<br>

+      case tok_number:<br>

+        return ParseNumberExpr();<br>

+      case '(':<br>

+        return ParseParenExpr();<br>

+      case tok_if:<br>

+        return ParseIfExpr();<br>

+      }<br>

+    }<br>
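For readers who want to experiment with the grammar rule in isolation, the following is a self-contained sketch of the same recursive-descent idea. It is an assumption-laden simplification, not the tutorial's parser: ``Node`` and ``Parser`` are hypothetical types, tokens are pre-split strings, and the only primary expression is a number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified AST: a node is either a number or an if/then/else.
struct Node {
  bool IsIf = false;
  double Num = 0;
  std::unique_ptr<Node> Cond, Then, Else;
};

// Hypothetical token stream: words that have already been split apart.
struct Parser {
  std::vector<std::string> Toks;
  std::size_t Pos = 0;

  bool atEnd() const { return Pos >= Toks.size(); }

  // Consumes the next token if it equals Word.
  bool eat(const std::string &Word) {
    if (atEnd() || Toks[Pos] != Word)
      return false;
    ++Pos;
    return true;
  }

  /// expr ::= ifexpr | number
  std::unique_ptr<Node> parseExpr() {
    if (atEnd())
      return nullptr;
    if (Toks[Pos] == "if")
      return parseIf();
    // Accept only tokens that are entirely a valid number.
    char *End = nullptr;
    double V = std::strtod(Toks[Pos].c_str(), &End);
    if (End == Toks[Pos].c_str() || *End != '\0')
      return nullptr;
    ++Pos;
    auto N = std::make_unique<Node>();
    N->Num = V;
    return N;
  }

  /// ifexpr ::= 'if' expr 'then' expr 'else' expr
  std::unique_ptr<Node> parseIf() {
    eat("if");
    auto N = std::make_unique<Node>();
    N->IsIf = true;
    if (!(N->Cond = parseExpr()) || !eat("then"))
      return nullptr;
    if (!(N->Then = parseExpr()) || !eat("else"))
      return nullptr;
    if (!(N->Else = parseExpr()))
      return nullptr;
    return N;
  }
};
```

Because ``parseExpr`` dispatches back into ``parseIf``, nested forms such as ``if 1 then if 0 then 2 else 3 else 4`` parse without any extra work, which mirrors how ``ParseIfExpr`` is reached through ``ParsePrimary``.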

+<br>

+LLVM IR for If/Then/Else<br>

+------------------------<br>

+<br>

+Now that we have it parsing and building the AST, the final piece is<br>

+adding LLVM code generation support. This is the most interesting part<br>

+of the if/then/else example, because this is where it starts to<br>

+introduce new concepts. All of the code above has been thoroughly<br>

+described in previous chapters.<br>

+<br>

+To motivate the code we want to produce, let's take a look at a simple<br>

+example. Consider:<br>

+<br>

+::<br>

+<br>

+    extern foo();<br>

+    extern bar();<br>

+    def baz(x) if x then foo() else bar();<br>

+<br>

+If you disable optimizations, the code you'll (soon) get from<br>

+Kaleidoscope looks like this:<br>

+<br>

+.. code-block:: llvm<br>

+<br>

+    declare double @foo()<br>

+<br>

+    declare double @bar()<br>

+<br>

+    define double @baz(double %x) {<br>

+    entry:<br>

+      %ifcond = fcmp one double %x, 0.000000e+00<br>

+      br i1 %ifcond, label %then, label %else<br>

+<br>

+    then:       ; preds = %entry<br>

+      %calltmp = call double @foo()<br>

+      br label %ifcont<br>

+<br>

+    else:       ; preds = %entry<br>

+      %calltmp1 = call double @bar()<br>

+      br label %ifcont<br>

+<br>

+    ifcont:     ; preds = %else, %then<br>

+      %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]<br>

+      ret double %iftmp<br>

+    }<br>

+<br>

+To visualize the control flow graph, you can use a nifty feature of the<br>

+LLVM '`opt <<a href="http://llvm.org/cmds/opt.html" rel="noreferrer" target="_blank">http://llvm.org/cmds/opt.html</a>>`_' tool. If you put this LLVM<br>

+IR into "t.ll" and run "``llvm-as < t.ll | opt -analyze -view-cfg``", `a<br>

+window will pop up <../ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ and you'll<br>

+see this graph:<br>

+<br>

+.. figure:: LangImpl05-cfg.png<br>

+   :align: center<br>

+   :alt: Example CFG<br>

+<br>

+   Example CFG<br>

+<br>

+Another way to get this is to call "``F->viewCFG()``" or<br>

+"``F->viewCFGOnly()``" (where F is a "``Function*``") either by<br>

+inserting actual calls into the code and recompiling or by calling these<br>

+in the debugger. LLVM has many nice features for visualizing various<br>

+graphs.<br>

+<br>

+Getting back to the generated code, it is fairly simple: the entry block<br>

+evaluates the conditional expression ("x" in our case here) and compares<br>

+the result to 0.0 with the "``fcmp one``" instruction ('one' is "Ordered<br>

+and Not Equal"). Based on the result of this expression, the code jumps<br>

+to either the "then" or "else" blocks, which contain the expressions for<br>

+the true/false cases.<br>
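The effect of "``fcmp one``" can be mimicked in ordinary C++. The helper names below are hypothetical; the sketch only models the comparison and assumes default IEEE floating-point behaviour (no fast-math flags).

```cpp
#include <cassert>
#include <cmath>

// Models LLVM's "fcmp one": true only if neither operand is NaN (ordered)
// and the operands are not equal.
bool fcmpONE(double A, double B) { return A < B || A > B; }

// Models how the entry block turns a Kaleidoscope value into a branch
// condition: true selects the 'then' block, false selects 'else'.
bool toBranchCondition(double CondV) { return fcmpONE(CondV, 0.0); }
```

One consequence of the "ordered" part is that a NaN condition selects the 'else' block, which differs from writing ``x != 0.0`` in C++.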

+<br>

+Once the then/else blocks are finished executing, they both branch back<br>

+to the 'ifcont' block to execute the code that happens after the<br>

+if/then/else. In this case the only thing left to do is to return to the<br>

+caller of the function. The question then becomes: how does the code<br>

+know which expression to return?<br>

+<br>

+The answer to this question involves an important SSA operation: the<br>

+`Phi<br>

+operation <<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Static_single_assignment_form</a>>`_.<br>

+If you're not familiar with SSA, `the wikipedia<br>

+article <<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Static_single_assignment_form</a>>`_<br>

+is a good introduction and there are various other introductions to it<br>

+available on your favorite search engine. The short version is that<br>

+"execution" of the Phi operation requires "remembering" which block<br>

+control came from. The Phi operation takes on the value corresponding to<br>

+the input control block. In this case, if control comes in from the<br>

+"then" block, it gets the value of "calltmp". If control comes from the<br>

+"else" block, it gets the value of "calltmp1".<br>
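As a rough mental model, a Phi node behaves like a lookup keyed on the predecessor block. The sketch below uses a hypothetical ``ToyPhi`` type and a hypothetical ``runBaz`` driver; unlike real IR, it takes both branch results as arguments up front, whereas the generated code only executes one of the two calls.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a Phi node: a list of (value, predecessor block) pairs.
struct ToyPhi {
  std::vector<std::pair<double, std::string>> Incoming;

  void addIncoming(double V, const std::string &Pred) {
    Incoming.emplace_back(V, Pred);
  }

  // "Executing" the Phi picks the value paired with the block control came from.
  double resolve(const std::string &CameFrom) const {
    for (const auto &In : Incoming)
      if (In.second == CameFrom)
        return In.first;
    assert(false && "Phi has no entry for this predecessor");
    return 0.0;
  }
};

// Models baz(x): pick 'then' or 'else', then resolve the Phi in 'ifcont'.
double runBaz(double X, double FooResult, double BarResult) {
  ToyPhi IfTmp;
  IfTmp.addIncoming(FooResult, "then");
  IfTmp.addIncoming(BarResult, "else");
  // Same test as "fcmp one double %x, 0.0".
  std::string CameFrom = (X < 0.0 || X > 0.0) ? "then" : "else";
  return IfTmp.resolve(CameFrom);
}
```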

+<br>

+At this point, you are probably starting to think "Oh no! This means my<br>

+simple and elegant front-end will have to start generating SSA form in<br>

+order to use LLVM!". Fortunately, this is not the case, and we strongly<br>

+advise *not* implementing an SSA construction algorithm in your<br>

+front-end unless there is an amazingly good reason to do so. In<br>

+practice, there are two sorts of values that float around in code<br>

+written for your average imperative programming language that might need<br>

+Phi nodes:<br>

+<br>

+#. Code that involves user variables: ``x = 1; x = x + 1;``<br>

+#. Values that are implicit in the structure of your AST, such as the<br>

+   Phi node in this case.<br>

+<br>

+In `Chapter 7 <LangImpl07.html>`_ of this tutorial ("mutable variables"),<br>

+we'll talk about #1 in depth. For now, just believe me that you don't<br>

+need SSA construction to handle this case. For #2, you have the choice<br>

+of using the techniques that we will describe for #1, or you can insert<br>

+Phi nodes directly, if convenient. In this case, it is really<br>

+easy to generate the Phi node, so we choose to do it directly.<br>

+<br>

+Okay, enough of the motivation and overview, let's generate code!<br>

+<br>

+Code Generation for If/Then/Else<br>

+--------------------------------<br>

+<br>

+In order to generate code for this, we implement the ``codegen`` method<br>

+for ``IfExprAST``:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    Value *IfExprAST::codegen() {<br>

+      Value *CondV = Cond->codegen();<br>

+      if (!CondV)<br>

+        return nullptr;<br>

+<br>

+      // Convert condition to a bool by comparing non-equal to 0.0.<br>

+      CondV = Builder.CreateFCmpONE(<br>

+          CondV, ConstantFP::get(LLVMContext, APFloat(0.0)), "ifcond");<br>

+<br>

+This code is straightforward and similar to what we saw before. We emit<br>

+the expression for the condition, then compare that value to zero to get<br>

+a truth value as a 1-bit (bool) value.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      Function *TheFunction = Builder.GetInsertBlock()->getParent();<br>

+<br>

+      // Create blocks for the then and else cases.  Insert the 'then' block at the<br>

+      // end of the function.<br>

+      BasicBlock *ThenBB =<br>

+          BasicBlock::Create(LLVMContext, "then", TheFunction);<br>

+      BasicBlock *ElseBB = BasicBlock::Create(LLVMContext, "else");<br>

+      BasicBlock *MergeBB = BasicBlock::Create(LLVMContext, "ifcont");<br>

+<br>

+      Builder.CreateCondBr(CondV, ThenBB, ElseBB);<br>

+<br>

+This code creates the basic blocks that are related to the if/then/else<br>

+expression, and correspond directly to the blocks in the example above.<br>

+The first line gets the current Function object that is being built. It<br>

+gets this by asking the builder for the current BasicBlock, and asking<br>

+that block for its "parent" (the function it is currently embedded<br>

+into).<br>

+<br>

+Once it has that, it creates three blocks. Note that it passes<br>

+"TheFunction" into the constructor for the "then" block. This causes the<br>

+constructor to automatically insert the new block into the end of the<br>

+specified function. The other two blocks are created, but aren't yet<br>

+inserted into the function.<br>

+<br>

+Once the blocks are created, we can emit the conditional branch that<br>

+chooses between them. Note that creating new blocks does not implicitly<br>

+affect the IRBuilder, so it is still inserting into the block that the<br>

+condition went into. Also note that it is creating a branch to the<br>

+"then" block and the "else" block, even though the "else" block isn't<br>

+inserted into the function yet. This is all ok: it is the standard way<br>

+that LLVM supports forward references.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // Emit then value.<br>

+      Builder.SetInsertPoint(ThenBB);<br>

+<br>

+      Value *ThenV = Then->codegen();<br>

+      if (!ThenV)<br>

+        return nullptr;<br>

+<br>

+      Builder.CreateBr(MergeBB);<br>

+      // Codegen of 'Then' can change the current block, update ThenBB for the PHI.<br>

+      ThenBB = Builder.GetInsertBlock();<br>

+<br>

+After the conditional branch is inserted, we move the builder to start<br>

+inserting into the "then" block. Strictly speaking, this call moves the<br>

+insertion point to be at the end of the specified block. However, since<br>

+the "then" block is empty, it also starts out by inserting at the<br>

+beginning of the block. :)<br>

+<br>

+Once the insertion point is set, we recursively codegen the "then"<br>

+expression from the AST. To finish off the "then" block, we create an<br>

+unconditional branch to the merge block. One interesting (and very<br>

+important) aspect of the LLVM IR is that it `requires all basic blocks<br>

+to be "terminated" <../LangRef.html#functionstructure>`_ with a `control<br>

+flow instruction <../LangRef.html#terminators>`_ such as return or<br>

+branch. This means that all control flow, *including fall throughs* must<br>

+be made explicit in the LLVM IR. If you violate this rule, the verifier<br>

+will emit an error.<br>

+<br>

+The final line here is quite subtle, but is very important. The basic<br>

+issue is that when we create the Phi node in the merge block, we need to<br>

+set up the block/value pairs that indicate how the Phi will work.<br>

+Importantly, the Phi node expects to have an entry for each predecessor<br>

+of the block in the CFG. Why then, are we getting the current block when<br>

+we just set it to ThenBB 5 lines above? The problem is that the "Then"<br>

+expression may actually itself change the block that the Builder is<br>

+emitting into if, for example, it contains a nested "if/then/else"<br>

+expression. Because calling ``codegen()`` recursively could arbitrarily change<br>

+the notion of the current block, we are required to get an up-to-date<br>

+value for code that will set up the Phi node.<br>
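The effect is easy to reproduce in a toy model. In the sketch below, ``ToyBuilder`` and ``codegenIf`` are hypothetical: the builder only tracks the name of the current block, the 'else' side is left out, and ``Depth`` says how many if/then/else expressions are nested in the 'then' position.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for IRBuilder: it only tracks the current block name.
struct ToyBuilder {
  std::string InsertBlock = "entry";
  int NextId = 0;
  // Records which block each Phi was told its 'then' value comes from.
  std::vector<std::string> ThenPreds;
};

// Hypothetical codegen for Depth nested ifs in the 'then' position.
void codegenIf(ToyBuilder &B, int Depth) {
  assert(Depth >= 0);
  if (Depth == 0)
    return; // A leaf expression emits into the current block and stays there.

  std::string Id = std::to_string(B.NextId++);
  std::string ThenBB = "then" + Id;
  B.InsertBlock = ThenBB;  // SetInsertPoint(ThenBB)
  codegenIf(B, Depth - 1); // Then->codegen() may move the insertion point.
  ThenBB = B.InsertBlock;  // Re-read the block that really ends the 'then' code.
  B.ThenPreds.push_back(ThenBB);
  B.InsertBlock = "ifcont" + Id; // SetInsertPoint(MergeBB)
}
```

With ``Depth == 2``, the outer Phi's predecessor is recorded as the inner merge block (``ifcont1``) rather than ``then0``, which is the situation the re-read of the insert block guards against.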

+<br>

+.. code-block:: c++<br>

+<br>

+      // Emit else block.<br>

+      TheFunction->getBasicBlockList().push_back(ElseBB);<br>

+      Builder.SetInsertPoint(ElseBB);<br>

+<br>

+      Value *ElseV = Else->codegen();<br>

+      if (!ElseV)<br>

+        return nullptr;<br>

+<br>

+      Builder.CreateBr(MergeBB);<br>

+      // codegen of 'Else' can change the current block, update ElseBB for the PHI.<br>

+      ElseBB = Builder.GetInsertBlock();<br>

+<br>

+Code generation for the 'else' block is basically identical to codegen<br>

+for the 'then' block. The only significant difference is the first line,<br>

+which adds the 'else' block to the function. Recall previously that the<br>

+'else' block was created, but not added to the function. Now that the<br>

+'then' and 'else' blocks are emitted, we can finish up with the merge<br>

+code:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // Emit merge block.<br>

+      TheFunction->getBasicBlockList().push_back(MergeBB);<br>

+      Builder.SetInsertPoint(MergeBB);<br>

+      PHINode *PN =<br>

+        Builder.CreatePHI(Type::getDoubleTy(LLVMContext), 2, "iftmp");<br>

+<br>

+      PN->addIncoming(ThenV, ThenBB);<br>

+      PN->addIncoming(ElseV, ElseBB);<br>

+      return PN;<br>

+    }<br>

+<br>

+The first two lines here are now familiar: the first adds the "merge"<br>

+block to the Function object (it was previously floating, like the else<br>

+block above). The second changes the insertion point so that newly<br>

+created code will go into the "merge" block. Once that is done, we need<br>

+to create the PHI node and set up the block/value pairs for the PHI.<br>

+<br>

+Finally, the ``codegen`` method returns the phi node as the value computed<br>

+by the if/then/else expression. In our example above, this returned<br>

+value will feed into the code for the top-level function, which will<br>

+create the return instruction.<br>

+<br>

+Overall, we now have the ability to execute conditional code in<br>

+Kaleidoscope. With this extension, Kaleidoscope is a fairly complete<br>

+language that can calculate a wide variety of numeric functions. Next up<br>

+we'll add another useful expression that is familiar from non-functional<br>

+languages...<br>

+<br>

+'for' Loop Expression<br>

+=====================<br>

+<br>

+Now that we know how to add basic control flow constructs to the<br>

+language, we have the tools to add more powerful things. Let's add<br>

+something more aggressive, a 'for' expression:<br>

+<br>

+::<br>

+<br>

+     extern putchard(char);<br>

+     def printstar(n)<br>

+       for i = 1, i < n, 1.0 in<br>

+         putchard(42);  # ascii 42 = '*'<br>

+<br>

+     # print 100 '*' characters<br>

+     printstar(100);<br>

+<br>

+This expression defines a new variable ("i" in this case) which iterates<br>

+from a starting value, while the condition ("i < n" in this case) is<br>

+true, incrementing by an optional step value ("1.0" in this case). If<br>

+the step value is omitted, it defaults to 1.0. While the condition is true,<br>

+the loop executes its body expression. Because we don't have anything better<br>

+to return, we'll just define the loop as always returning 0.0. In the<br>

+future when we have mutable variables, it will get more useful.<br>
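A plain C++ model may help pin down the evaluation order. It is a sketch under stated assumptions: ``runFor`` is a hypothetical helper, the end condition is fixed to "``i < End``", and the order (body, then increment, then a test against the value used in that iteration) is taken from the ordering of the generated IR shown later in this chapter.

```cpp
#include <cassert>
#include <functional>

// Models the loop's evaluation order: body, then increment, then the test
// against the value the variable had during that iteration. Returns 0.0 like
// the Kaleidoscope expression does.
double runFor(double Start, double End, double Step,
              const std::function<void(double)> &Body) {
  double I = Start;
  bool Continue = true;
  while (Continue) {
    Body(I);                // body
    double Next = I + Step; // increment
    Continue = I < End;     // test uses the current value, not Next
    I = Next;
  }
  return 0.0;
}
```

Under this reading the body always runs at least once, and a call shaped like ``printstar(100)`` runs the body for i = 1 through 100.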

+<br>

+As before, let's talk about the changes that we need to make to Kaleidoscope to<br>

+support this.<br>

+<br>

+Lexer Extensions for the 'for' Loop<br>

+-----------------------------------<br>

+<br>

+The lexer extensions are the same sort of thing as for if/then/else:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      ... in enum Token ...<br>

+      // control<br>

+      tok_if = -6, tok_then = -7, tok_else = -8,<br>

+      tok_for = -9, tok_in = -10<br>

+<br>

+      ... in gettok ...<br>

+      if (IdentifierStr == "def")<br>

+        return tok_def;<br>

+      if (IdentifierStr == "extern")<br>

+        return tok_extern;<br>

+      if (IdentifierStr == "if")<br>

+        return tok_if;<br>

+      if (IdentifierStr == "then")<br>

+        return tok_then;<br>

+      if (IdentifierStr == "else")<br>

+        return tok_else;<br>

+      if (IdentifierStr == "for")<br>

+        return tok_for;<br>

+      if (IdentifierStr == "in")<br>

+        return tok_in;<br>

+      return tok_identifier;<br>

+<br>

+AST Extensions for the 'for' Loop<br>

+---------------------------------<br>

+<br>

+The AST node is just as simple. It basically boils down to capturing the<br>

+variable name and the constituent expressions in the node.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// ForExprAST - Expression class for for/in.<br>

+    class ForExprAST : public ExprAST {<br>

+      std::string VarName;<br>

+      std::unique_ptr<ExprAST> Start, End, Step, Body;<br>

+<br>

+    public:<br>

+      ForExprAST(const std::string &VarName, std::unique_ptr<ExprAST> Start,<br>

+                 std::unique_ptr<ExprAST> End, std::unique_ptr<ExprAST> Step,<br>

+                 std::unique_ptr<ExprAST> Body)<br>

+        : VarName(VarName), Start(std::move(Start)), End(std::move(End)),<br>

+          Step(std::move(Step)), Body(std::move(Body)) {}<br>

+      virtual Value *codegen();<br>

+    };<br>

+<br>

+Parser Extensions for the 'for' Loop<br>

+------------------------------------<br>

+<br>

+The parser code is also fairly standard. The only interesting thing here<br>

+is handling of the optional step value. The parser code handles it by<br>

+checking to see if the second comma is present. If not, it sets the step<br>

+value to null in the AST node:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression<br>

+    static std::unique_ptr<ExprAST> ParseForExpr() {<br>

+      getNextToken();  // eat the for.<br>

+<br>

+      if (CurTok != tok_identifier)<br>

+        return LogError("expected identifier after for");<br>

+<br>

+      std::string IdName = IdentifierStr;<br>

+      getNextToken();  // eat identifier.<br>

+<br>

+      if (CurTok != '=')<br>

+        return LogError("expected '=' after for");<br>

+      getNextToken();  // eat '='.<br>

+<br>

+<br>

+      auto Start = ParseExpression();<br>

+      if (!Start)<br>

+        return nullptr;<br>

+      if (CurTok != ',')<br>

+        return LogError("expected ',' after for start value");<br>

+      getNextToken();<br>

+<br>

+      auto End = ParseExpression();<br>

+      if (!End)<br>

+        return nullptr;<br>

+<br>

+      // The step value is optional.<br>

+      std::unique_ptr<ExprAST> Step;<br>

+      if (CurTok == ',') {<br>

+        getNextToken();<br>

+        Step = ParseExpression();<br>

+        if (!Step)<br>

+          return nullptr;<br>

+      }<br>

+<br>

+      if (CurTok != tok_in)<br>

+        return LogError("expected 'in' after for");<br>

+      getNextToken();  // eat 'in'.<br>

+<br>

+      auto Body = ParseExpression();<br>

+      if (!Body)<br>

+        return nullptr;<br>

+<br>

+      return llvm::make_unique<ForExprAST>(IdName, std::move(Start),<br>

+                                           std::move(End), std::move(Step),<br>

+                                           std::move(Body));<br>

+    }<br>

+<br>

+LLVM IR for the 'for' Loop<br>

+--------------------------<br>

+<br>

+Now we get to the good part: the LLVM IR we want to generate for this<br>

+thing. With the simple example above, we get this LLVM IR (note that<br>

+this dump is generated with optimizations disabled for clarity):<br>

+<br>

+.. code-block:: llvm<br>

+<br>

+    declare double @putchard(double)<br>

+<br>

+    define double @printstar(double %n) {<br>

+    entry:<br>

+      ; initial value = 1.0 (inlined into phi)<br>

+      br label %loop<br>

+<br>

+    loop:       ; preds = %loop, %entry<br>

+      %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]<br>

+      ; body<br>

+      %calltmp = call double @putchard(double 4.200000e+01)<br>

+      ; increment<br>

+      %nextvar = fadd double %i, 1.000000e+00<br>

+<br>

+      ; termination test<br>

+      %cmptmp = fcmp ult double %i, %n<br>

+      %booltmp = uitofp i1 %cmptmp to double<br>

+      %loopcond = fcmp one double %booltmp, 0.000000e+00<br>

+      br i1 %loopcond, label %loop, label %afterloop<br>

+<br>

+    afterloop:      ; preds = %loop<br>

+      ; loop always returns 0.0<br>

+      ret double 0.000000e+00<br>

+    }<br>

+<br>

+This loop contains all the same constructs we saw before: a phi node,<br>

+several expressions, and some basic blocks. Let's see how this fits<br>

+together.<br>
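+<br>

+As an informal paraphrase (not part of the tutorial's code), the control<br>

+flow in this IR behaves roughly like the plain C++ sketch below.<br>

+``runForLoop`` and its parameters are hypothetical names. The point is<br>

+that the body runs before the end condition is tested, and that the test<br>

+sees the current value of the induction variable, not the incremented one:<br>

+<br>

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper mirroring the basic blocks in the IR above.
// Body and EndCond stand in for the loop body and the end expression.
double runForLoop(double Start, double Step,
                  const std::function<void(double)> &Body,
                  const std::function<bool(double)> &EndCond) {
  double I = Start;                 // phi: value coming from the entry block
  bool KeepGoing = true;
  while (KeepGoing) {               // "loop" block
    Body(I);                        // body
    double NextVar = I + Step;      // increment
    KeepGoing = EndCond(I);         // termination test uses I, not NextVar
    I = NextVar;                    // phi: value coming from the backedge
  }
  return 0.0;                       // "afterloop": the loop always yields 0.0
}
```

+<br>

+Under these assumptions the body always runs at least once, even when the<br>

+end condition is false for the start value.<br>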

+<br>

+Code Generation for the 'for' Loop<br>

+----------------------------------<br>

+<br>

+The first part of codegen is very simple: we just output the start<br>

+expression for the loop value:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    Value *ForExprAST::codegen() {<br>

+      // Emit the start code first, without 'variable' in scope.<br>

+      Value *StartVal = Start->codegen();<br>

+      if (!StartVal)<br>

+        return nullptr;<br>

+<br>

+With this out of the way, the next step is to set up the LLVM basic<br>

+block for the start of the loop body. In the case above, the whole loop<br>

+body is one block, but remember that the body code itself could consist<br>

+of multiple blocks (e.g. if it contains an if/then/else or a for/in<br>

+expression).<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // Make the new basic block for the loop header, inserting after current<br>

+      // block.<br>

+      Function *TheFunction = Builder.GetInsertBlock()->getParent();<br>

+      BasicBlock *PreheaderBB = Builder.GetInsertBlock();<br>

+      BasicBlock *LoopBB =<br>

+          BasicBlock::Create(LLVMContext, "loop", TheFunction);<br>

+<br>

+      // Insert an explicit fall through from the current block to the LoopBB.<br>

+      Builder.CreateBr(LoopBB);<br>

+<br>

+This code is similar to what we saw for if/then/else. Because we will<br>

+need it to create the Phi node, we remember the block that falls through<br>

+into the loop. Once we have that, we create the actual block that starts<br>

+the loop and create an unconditional branch for the fall-through between<br>

+the two blocks.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // Start insertion in LoopBB.<br>

+      Builder.SetInsertPoint(LoopBB);<br>

+<br>

+      // Start the PHI node with an entry for Start.<br>

+      PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(LLVMContext),<br>

+                                            2, VarName.c_str());<br>

+      Variable->addIncoming(StartVal, PreheaderBB);<br>

+<br>

+Now that the "preheader" for the loop is set up, we switch to emitting<br>

+code for the loop body. To begin with, we move the insertion point and<br>

+create the PHI node for the loop induction variable. Since we already<br>

+know the incoming value for the starting value, we add it to the Phi<br>

+node. Note that the Phi will eventually get a second value for the<br>

+backedge, but we can't set it up yet (because it doesn't exist!).<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // Within the loop, the variable is defined equal to the PHI node.  If it<br>

+      // shadows an existing variable, we have to restore it, so save it now.<br>

+      Value *OldVal = NamedValues[VarName];<br>

+      NamedValues[VarName] = Variable;<br>

+<br>

+      // Emit the body of the loop.  This, like any other expr, can change the<br>

+      // current BB.  Note that we ignore the value computed by the body, but don't<br>

+      // allow an error.<br>

+      if (!Body->codegen())<br>

+        return nullptr;<br>

+<br>

+Now the code starts to get more interesting. Our 'for' loop introduces a<br>

+new variable to the symbol table. This means that our symbol table can<br>

+now contain either function arguments or loop variables. To handle this,<br>

+before we codegen the body of the loop, we add the loop variable as the<br>

+current value for its name. Note that it is possible that there is a<br>

+variable of the same name in the outer scope. It would be easy to make<br>

+this an error (emit an error and return null if there is already an<br>

+entry for VarName) but we choose to allow shadowing of variables. In<br>

+order to handle this correctly, we remember the Value that we are<br>

+potentially shadowing in ``OldVal`` (which will be null if there is no<br>

+shadowed variable).<br>

+<br>

+Once the loop variable is set into the symbol table, the code<br>

+recursively codegen's the body. This allows the body to use the loop<br>

+variable: any references to it will naturally find it in the symbol<br>

+table.<br>
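+<br>

+The save-and-restore idea can be tried in isolation. The following is a<br>

+minimal sketch, not the tutorial's code: ``Symbols`` is a hypothetical<br>

+stand-in for ``NamedValues`` that maps names to plain pointers, and<br>

+``withLoopVariable`` is an illustrative helper that binds a name, runs a<br>

+callback, and then puts the previous binding back:<br>

+<br>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for the tutorial's NamedValues table; the mapped
// type is a plain pointer so that "no binding" reads as null.
static std::map<std::string, const int *> Symbols;

// Binds Name to NewVal while Body runs, then restores the previous binding.
void withLoopVariable(const std::string &Name, const int *NewVal,
                      const std::function<void()> &Body) {
  const int *OldVal = Symbols[Name]; // null when nothing is shadowed
  Symbols[Name] = NewVal;

  Body();

  if (OldVal)
    Symbols[Name] = OldVal;          // restore the shadowed binding
  else
    Symbols.erase(Name);             // nothing was shadowed: drop the entry
}
```

+<br>

+Note that ``operator[]`` inserts a null entry when the name is missing,<br>

+which is why the sketch erases the entry when nothing was shadowed.<br>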

+<br>

+.. code-block:: c++<br>

+<br>

+      // Emit the step value.<br>

+      Value *StepVal = nullptr;<br>

+      if (Step) {<br>

+        StepVal = Step->codegen();<br>

+        if (!StepVal)<br>

+          return nullptr;<br>

+      } else {<br>

+        // If not specified, use 1.0.<br>

+        StepVal = ConstantFP::get(LLVMContext, APFloat(1.0));<br>

+      }<br>

+<br>

+      Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar");<br>

+<br>

+Now that the body is emitted, we compute the next value of the iteration<br>

+variable by adding the step value, or 1.0 if it isn't present.<br>

+'``NextVar``' will be the value of the loop variable on the next<br>

+iteration of the loop.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // Compute the end condition.<br>

+      Value *EndCond = End->codegen();<br>

+      if (!EndCond)<br>

+        return nullptr;<br>

+<br>

+      // Convert condition to a bool by comparing non-equal to 0.0.<br>

+      EndCond = Builder.CreateFCmpONE(<br>

+          EndCond, ConstantFP::get(LLVMContext, APFloat(0.0)), "loopcond");<br>

+<br>

+Finally, we evaluate the exit value of the loop, to determine whether<br>

+the loop should exit. This mirrors the condition evaluation for the<br>

+if/then/else statement.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // Create the "after loop" block and insert it.<br>

+      BasicBlock *LoopEndBB = Builder.GetInsertBlock();<br>

+      BasicBlock *AfterBB =<br>

+          BasicBlock::Create(LLVMContext, "afterloop", TheFunction);<br>

+<br>

+      // Insert the conditional branch into the end of LoopEndBB.<br>

+      Builder.CreateCondBr(EndCond, LoopBB, AfterBB);<br>

+<br>

+      // Any new code will be inserted in AfterBB.<br>

+      Builder.SetInsertPoint(AfterBB);<br>

+<br>

+With the code for the body of the loop complete, we just need to finish<br>

+up the control flow for it. This code remembers the end block (for the<br>

+phi node), then creates the block for the loop exit ("afterloop"). Based<br>

+on the value of the exit condition, it creates a conditional branch that<br>

+chooses between executing the loop again and exiting the loop. Any<br>

+future code is emitted in the "afterloop" block, so it sets the<br>

+insertion position to it.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // Add a new entry to the PHI node for the backedge.<br>

+      Variable->addIncoming(NextVar, LoopEndBB);<br>

+<br>

+      // Restore the unshadowed variable.<br>

+      if (OldVal)<br>

+        NamedValues[VarName] = OldVal;<br>

+      else<br>

+        NamedValues.erase(VarName);<br>

+<br>

+      // for expr always returns 0.0.<br>

+      return Constant::getNullValue(Type::getDoubleTy(LLVMContext));<br>

+    }<br>

+<br>

+The final code handles various cleanups: now that we have the "NextVar"<br>

+value, we can add the incoming value to the loop PHI node. After that,<br>

+we restore any variable that the loop variable shadowed (or remove the<br>

+loop variable from the symbol table if nothing was shadowed), so that it<br>

+isn't in scope after the for loop. Finally, code generation of the for loop<br>

+always returns 0.0, so that is what we return from<br>

+``ForExprAST::codegen()``.<br>

+<br>

+With this, we conclude the "adding control flow to Kaleidoscope" chapter<br>

+of the tutorial. In this chapter we added two control flow constructs,<br>

+and used them to motivate a couple of aspects of the LLVM IR that are<br>

+important for front-end implementors to know. In the next chapter of our<br>

+saga, we will get a bit crazier and add `user-defined<br>

+operators <LangImpl06.html>`_ to our poor innocent language.<br>

+<br>

+Full Code Listing<br>

+=================<br>

+<br>

+Here is the complete code listing for our running example, enhanced with<br>

+the if/then/else and for expressions. To build this example, use:<br>

+<br>

+.. code-block:: bash<br>

+<br>

+    # Compile<br>

+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy<br>

+    # Run<br>

+    ./toy<br>

+<br>

+Here is the code:<br>

+<br>

+.. literalinclude:: ../../examples/Kaleidoscope/Chapter5/toy.cpp<br>

+   :language: c++<br>

+<br>

+`Next: Extending the language: user-defined operators <LangImpl06.html>`_<br>

+<br>

<br>

Added: llvm/trunk/docs/tutorial/LangImpl06.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl06.rst?rev=274441&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl06.rst?rev=274441&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl06.rst (added)<br>

+++ llvm/trunk/docs/tutorial/LangImpl06.rst Sat Jul  2 12:01:59 2016<br>

@@ -0,0 +1,768 @@<br>

+============================================================<br>

+Kaleidoscope: Extending the Language: User-defined Operators<br>

+============================================================<br>

+<br>

+.. contents::<br>

+   :local:<br>

+<br>

+Chapter 6 Introduction<br>

+======================<br>

+<br>

+Welcome to Chapter 6 of the "`Implementing a language with<br>

+LLVM <index.html>`_" tutorial. At this point in our tutorial, we now<br>

+have a fully functional language that is fairly minimal, but also<br>

+useful. There is still one big problem with it, however. Our language<br>

+doesn't have many useful operators (like division, logical negation, or<br>

+even any comparisons besides less-than).<br>

+<br>

+This chapter of the tutorial takes a wild digression into adding<br>

+user-defined operators to the simple and beautiful Kaleidoscope<br>

+language. This digression now gives us a simple and ugly language in<br>

+some ways, but also a powerful one at the same time. One of the great<br>

+things about creating your own language is that you get to decide what<br>

+is good or bad. In this tutorial we'll assume that it is okay to use<br>

+this as a way to show some interesting parsing techniques.<br>

+<br>

+At the end of this tutorial, we'll run through an example Kaleidoscope<br>

+application that `renders the Mandelbrot set <#kicking-the-tires>`_. This gives an<br>

+example of what you can build with Kaleidoscope and its feature set.<br>

+<br>

+User-defined Operators: the Idea<br>

+================================<br>

+<br>

+The "operator overloading" that we will add to Kaleidoscope is more<br>

+general than in languages like C++. In C++, you are only allowed to<br>

+redefine existing operators: you can't programmatically change the<br>

+grammar, introduce new operators, change precedence levels, etc. In this<br>

+chapter, we will add this capability to Kaleidoscope, which will let the<br>

+user round out the set of operators that are supported.<br>

+<br>

+The point of going into user-defined operators in a tutorial like this<br>

+is to show the power and flexibility of using a hand-written parser.<br>

+Thus far, the parser we have been implementing uses recursive descent<br>

+for most parts of the grammar and operator precedence parsing for the<br>

+expressions. See `Chapter 2 <LangImpl02.html>`_ for details. Without<br>

+using operator precedence parsing, it would be very difficult to allow<br>

+the programmer to introduce new operators into the grammar: the grammar<br>

+is dynamically extensible as the JIT runs.<br>

+<br>

+The two specific features we'll add are programmable unary operators<br>

+(right now, Kaleidoscope has no unary operators at all) as well as<br>

+binary operators. An example of this is:<br>

+<br>

+::<br>

+<br>

+    # Logical unary not.<br>

+    def unary!(v)<br>

+      if v then<br>

+        0<br>

+      else<br>

+        1;<br>

+<br>

+    # Define > with the same precedence as <.<br>

+    def binary> 10 (LHS RHS)<br>

+      RHS < LHS;<br>

+<br>

+    # Binary "logical or", (note that it does not "short circuit")<br>

+    def binary| 5 (LHS RHS)<br>

+      if LHS then<br>

+        1<br>

+      else if RHS then<br>

+        1<br>

+      else<br>

+        0;<br>

+<br>

+    # Define = with slightly lower precedence than relationals.<br>

+    def binary= 9 (LHS RHS)<br>

+      !(LHS < RHS | LHS > RHS);<br>

+<br>

+Many languages aspire to being able to implement their standard runtime<br>

+library in the language itself. In Kaleidoscope, we can implement<br>

+significant parts of the language in the library!<br>

+<br>

+We will break down implementation of these features into two parts:<br>

+implementing support for user-defined binary operators and adding unary<br>

+operators.<br>

+<br>

+User-defined Binary Operators<br>

+=============================<br>

+<br>

+Adding support for user-defined binary operators is pretty simple with<br>

+our current framework. We'll first add support for the unary/binary<br>

+keywords:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    enum Token {<br>

+      ...<br>

+      // operators<br>

+      tok_binary = -11,<br>

+      tok_unary = -12<br>

+    };<br>

+    ...<br>

+    static int gettok() {<br>

+    ...<br>

+        if (IdentifierStr == "for")<br>

+          return tok_for;<br>

+        if (IdentifierStr == "in")<br>

+          return tok_in;<br>

+        if (IdentifierStr == "binary")<br>

+          return tok_binary;<br>

+        if (IdentifierStr == "unary")<br>

+          return tok_unary;<br>

+        return tok_identifier;<br>

+<br>

+This just adds lexer support for the unary and binary keywords, like we<br>

+did in `previous chapters <LangImpl05.html#lexer-extensions-for-if-then-else>`_. One nice thing<br>

+about our current AST is that we represent binary operators with full<br>

+generalisation by using their ASCII code as the opcode. For our extended<br>

+operators, we'll use this same representation, so we don't need any new<br>

+AST or parser support.<br>

+<br>

+On the other hand, we have to be able to represent the definitions of<br>

+these new operators, in the "def binary\| 5" part of the function<br>

+definition. In our grammar so far, the "name" for the function<br>

+definition is parsed as the "prototype" production and into the<br>

+``PrototypeAST`` AST node. To represent our new user-defined operators<br>

+as prototypes, we have to extend the ``PrototypeAST`` AST node like<br>

+this:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// PrototypeAST - This class represents the "prototype" for a function,<br>

+    /// which captures its argument names as well as if it is an operator.<br>

+    class PrototypeAST {<br>

+      std::string Name;<br>

+      std::vector<std::string> Args;<br>

+      bool IsOperator;<br>

+      unsigned Precedence;  // Precedence if a binary op.<br>

+<br>

+    public:<br>

+      PrototypeAST(const std::string &name, std::vector<std::string> Args,<br>

+                   bool IsOperator = false, unsigned Prec = 0)<br>

+      : Name(name), Args(std::move(Args)), IsOperator(IsOperator),<br>

+        Precedence(Prec) {}<br>

+<br>

+      bool isUnaryOp() const { return IsOperator && Args.size() == 1; }<br>

+      bool isBinaryOp() const { return IsOperator && Args.size() == 2; }<br>

+<br>

+      char getOperatorName() const {<br>

+        assert(isUnaryOp() || isBinaryOp());<br>

+        return Name[Name.size()-1];<br>

+      }<br>

+<br>

+      unsigned getBinaryPrecedence() const { return Precedence; }<br>

+<br>

+      Function *codegen();<br>

+    };<br>

+<br>

+Basically, in addition to knowing a name for the prototype, we now keep<br>

+track of whether it was an operator, and if it was, what precedence<br>

+level the operator is at. The precedence is only used for binary<br>

+operators (as you'll see below, it just doesn't apply for unary<br>

+operators). Now that we have a way to represent the prototype for a<br>

+user-defined operator, we need to parse it:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// prototype<br>

+    ///   ::= id '(' id* ')'<br>

+    ///   ::= binary LETTER number? (id, id)<br>

+    static std::unique_ptr<PrototypeAST> ParsePrototype() {<br>

+      std::string FnName;<br>

+<br>

+      unsigned Kind = 0;  // 0 = identifier, 1 = unary, 2 = binary.<br>

+      unsigned BinaryPrecedence = 30;<br>

+<br>

+      switch (CurTok) {<br>

+      default:<br>

+        return LogErrorP("Expected function name in prototype");<br>

+      case tok_identifier:<br>

+        FnName = IdentifierStr;<br>

+        Kind = 0;<br>

+        getNextToken();<br>

+        break;<br>

+      case tok_binary:<br>

+        getNextToken();<br>

+        if (!isascii(CurTok))<br>

+          return LogErrorP("Expected binary operator");<br>

+        FnName = "binary";<br>

+        FnName += (char)CurTok;<br>

+        Kind = 2;<br>

+        getNextToken();<br>

+<br>

+        // Read the precedence if present.<br>

+        if (CurTok == tok_number) {<br>

+          if (NumVal < 1 || NumVal > 100)<br>

+            return LogErrorP("Invalid precedence: must be 1..100");<br>

+          BinaryPrecedence = (unsigned)NumVal;<br>

+          getNextToken();<br>

+        }<br>

+        break;<br>

+      }<br>

+<br>

+      if (CurTok != '(')<br>

+        return LogErrorP("Expected '(' in prototype");<br>

+<br>

+      std::vector<std::string> ArgNames;<br>

+      while (getNextToken() == tok_identifier)<br>

+        ArgNames.push_back(IdentifierStr);<br>

+      if (CurTok != ')')<br>

+        return LogErrorP("Expected ')' in prototype");<br>

+<br>

+      // success.<br>

+      getNextToken();  // eat ')'.<br>

+<br>

+      // Verify right number of names for operator.<br>

+      if (Kind && ArgNames.size() != Kind)<br>

+        return LogErrorP("Invalid number of operands for operator");<br>

+<br>

+      return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames), Kind != 0,<br>

+                                             BinaryPrecedence);<br>

+    }<br>

+<br>

+This is all fairly straightforward parsing code, and we have already<br>

+seen a lot of similar code in the past. One interesting part about the<br>

+code above is the couple lines that set up ``FnName`` for binary<br>

+operators. This builds names like "binary@" for a newly defined "@"<br>

+operator. This then takes advantage of the fact that symbol names in the<br>

+LLVM symbol table are allowed to have any character in them, including<br>

+embedded nul characters.<br>
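+<br>

+The naming scheme is easy to check on its own. Below is a small sketch<br>

+with two hypothetical helpers (the tutorial builds the name inline in<br>

+``ParsePrototype`` and reads the operator back in ``getOperatorName``):<br>

+one appends the operator character to a prefix, the other recovers it<br>

+from the last character of the name:<br>

+<br>

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds names such as "binary@" from a prefix and
// the operator character.
std::string makeOperatorFunctionName(const std::string &Prefix, char Op) {
  std::string Name = Prefix;
  Name += Op;
  return Name;
}

// Hypothetical helper: the operator is always the last character.
char operatorFromFunctionName(const std::string &Name) {
  assert(!Name.empty() && "operator function names are never empty");
  return Name[Name.size() - 1];
}
```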

+<br>

+The next interesting thing to add is codegen support for these binary<br>

+operators. Given our current structure, this is a simple addition of a<br>

+default case for our existing binary operator node:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    Value *BinaryExprAST::codegen() {<br>

+      Value *L = LHS->codegen();<br>

+      Value *R = RHS->codegen();<br>

+      if (!L || !R)<br>

+        return nullptr;<br>

+<br>

+      switch (Op) {<br>

+      case '+':<br>

+        return Builder.CreateFAdd(L, R, "addtmp");<br>

+      case '-':<br>

+        return Builder.CreateFSub(L, R, "subtmp");<br>

+      case '*':<br>

+        return Builder.CreateFMul(L, R, "multmp");<br>

+      case '<':<br>

+        L = Builder.CreateFCmpULT(L, R, "cmptmp");<br>

+        // Convert bool 0/1 to double 0.0 or 1.0<br>

+        return Builder.CreateUIToFP(L, Type::getDoubleTy(LLVMContext),<br>

+                                    "booltmp");<br>

+      default:<br>

+        break;<br>

+      }<br>

+<br>

+      // If it wasn't a builtin binary operator, it must be a user defined one. Emit<br>

+      // a call to it.<br>

+      Function *F = TheModule->getFunction(std::string("binary") + Op);<br>

+      assert(F && "binary operator not found!");<br>

+<br>

+      Value *Ops[2] = { L, R };<br>

+      return Builder.CreateCall(F, Ops, "binop");<br>

+    }<br>

+<br>

+As you can see above, the new code is actually really simple. It just<br>

+does a lookup for the appropriate operator in the symbol table and<br>

+generates a function call to it. Since user-defined operators are just<br>

+built as normal functions (because the "prototype" boils down to a<br>

+function with the right name) everything falls into place.<br>

+<br>

+The final piece of code we are missing is a bit of top-level magic:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    Function *FunctionAST::codegen() {<br>

+      NamedValues.clear();<br>

+<br>

+      Function *TheFunction = Proto->codegen();<br>

+      if (!TheFunction)<br>

+        return nullptr;<br>

+<br>

+      // If this is an operator, install it.<br>

+      if (Proto->isBinaryOp())<br>

+        BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();<br>

+<br>

+      // Create a new basic block to start insertion into.<br>

+      BasicBlock *BB = BasicBlock::Create(LLVMContext, "entry", TheFunction);<br>

+      Builder.SetInsertPoint(BB);<br>

+<br>

+      if (Value *RetVal = Body->codegen()) {<br>

+        ...<br>

+<br>

+Basically, before codegening a function, if it is a user-defined<br>

+operator, we register it in the precedence table. This allows the binary<br>

+operator parsing logic we already have in place to handle it. Since we<br>

+are working on a fully-general operator precedence parser, this is all<br>

+we need to do to "extend the grammar".<br>
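+<br>

+To see why a table update is enough, consider the following standalone<br>

+sketch. ``Precedence``, ``getPrecedence`` and ``installBinaryOperator`` are<br>

+hypothetical stand-ins for the tutorial's ``BinopPrecedence`` map and the<br>

+code that reads and writes it; the seeded values are assumed to match the<br>

+builtin operators from the earlier chapters:<br>

+<br>

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the tutorial's BinopPrecedence table, seeded
// with assumed precedences for the builtin operators.
static std::map<char, int> Precedence = {
    {'<', 10}, {'+', 20}, {'-', 20}, {'*', 40}};

// Returns the precedence of Op, or -1 if Op is not a known binary operator.
int getPrecedence(char Op) {
  auto It = Precedence.find(Op);
  return It == Precedence.end() ? -1 : It->second;
}

// Roughly what FunctionAST::codegen() does for a binary operator prototype.
void installBinaryOperator(char Op, unsigned Prec) {
  Precedence[Op] = static_cast<int>(Prec);
}
```

+<br>

+Before installation the parser would see ``|`` as "not an operator"; after<br>

+it, the same lookup returns the user-chosen precedence.<br>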

+<br>

+Now we have useful user-defined binary operators. This builds a lot on<br>

+the previous framework we built for other operators. Adding unary<br>

+operators is a bit more challenging, because we don't have any framework<br>

+for it yet - let's see what it takes.<br>

+<br>

+User-defined Unary Operators<br>

+============================<br>

+<br>

+Since we don't currently support unary operators in the Kaleidoscope<br>

+language, we'll need to add everything to support them. Above, we added<br>

+simple support for the 'unary' keyword to the lexer. In addition to<br>

+that, we need an AST node:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// UnaryExprAST - Expression class for a unary operator.<br>

+    class UnaryExprAST : public ExprAST {<br>

+      char Opcode;<br>

+      std::unique_ptr<ExprAST> Operand;<br>

+<br>

+    public:<br>

+      UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)<br>

+        : Opcode(Opcode), Operand(std::move(Operand)) {}<br>

+      virtual Value *codegen();<br>

+    };<br>

+<br>

+This AST node is very simple and obvious by now. It directly mirrors the<br>

+binary operator AST node, except that it only has one child. With this,<br>

+we need to add the parsing logic. Parsing a unary operator is pretty<br>

+simple: we'll add a new function to do it:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// unary<br>

+    ///   ::= primary<br>

+    ///   ::= '!' unary<br>

+    static std::unique_ptr<ExprAST> ParseUnary() {<br>

+      // If the current token is not an operator, it must be a primary expr.<br>

+      if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')<br>

+        return ParsePrimary();<br>

+<br>

+      // If this is a unary operator, read it.<br>

+      int Opc = CurTok;<br>

+      getNextToken();<br>

+      if (auto Operand = ParseUnary())<br>

+        return llvm::make_unique<UnaryExprAST>(Opc, std::move(Operand));<br>

+      return nullptr;<br>

+    }<br>

+<br>

+The grammar we add is pretty straightforward here. If we see a unary<br>

+operator when parsing a primary expression, we eat the operator as a<br>

+prefix and parse the remaining piece as another unary operator. This<br>

+allows us to handle multiple unary operators (e.g. "!!x"). Note that<br>

+unary operators can't have ambiguous parses like binary operators can,<br>

+so there is no need for precedence information.<br>
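+<br>

+The recursion can be illustrated without the rest of the parser. This is<br>

+a simplified sketch under stated assumptions, not the tutorial's<br>

+``ParseUnary``: identifiers are the only primary expressions, every other<br>

+character is treated as a prefix operator, and the hypothetical<br>

+``parseUnarySketch`` returns a parenthesized string instead of an AST:<br>

+<br>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical mini-parser: identifiers are the only primary expressions,
// and any other character is treated as a prefix operator. Returns a fully
// parenthesized rendering of the parse, or "" on error.
std::string parseUnarySketch(const std::string &Src, std::size_t &Pos) {
  if (Pos >= Src.size())
    return "";                       // ran out of input: no operand

  unsigned char C = static_cast<unsigned char>(Src[Pos]);
  if (std::isalnum(C)) {             // primary: read an identifier
    std::string Id;
    while (Pos < Src.size() &&
           std::isalnum(static_cast<unsigned char>(Src[Pos])))
      Id += Src[Pos++];
    return Id;
  }

  ++Pos;                             // eat the prefix operator
  std::string Operand = parseUnarySketch(Src, Pos);
  if (Operand.empty())
    return "";                       // propagate the error
  return std::string("(") + static_cast<char>(C) + Operand + ")";
}
```

+<br>

+Because each operator simply wraps whatever the recursive call returns,<br>

+stacked operators nest from the inside out without any precedence table.<br>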

+<br>

+The problem with this function is that we need to call ParseUnary from<br>

+somewhere. To do this, we change previous callers of ParsePrimary to<br>

+call ParseUnary instead:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// binoprhs<br>

+    ///   ::= ('+' unary)*<br>

+    static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,<br>

+                                                  std::unique_ptr<ExprAST> LHS) {<br>

+      ...<br>

+        // Parse the unary expression after the binary operator.<br>

+        auto RHS = ParseUnary();<br>

+        if (!RHS)<br>

+          return nullptr;<br>

+      ...<br>

+    }<br>

+    /// expression<br>

+    ///   ::= unary binoprhs<br>

+    ///<br>

+    static std::unique_ptr<ExprAST> ParseExpression() {<br>

+      auto LHS = ParseUnary();<br>

+      if (!LHS)<br>

+        return nullptr;<br>

+<br>

+      return ParseBinOpRHS(0, std::move(LHS));<br>

+    }<br>

+<br>

+With these two simple changes, we are now able to parse unary operators<br>

+and build the AST for them. Next up, we need to add parser support for<br>

+prototypes, to parse the unary operator prototype. We extend the binary<br>

+operator code above with:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// prototype<br>

+    ///   ::= id '(' id* ')'<br>

+    ///   ::= binary LETTER number? (id, id)<br>

+    ///   ::= unary LETTER (id)<br>

+    static std::unique_ptr<PrototypeAST> ParsePrototype() {<br>

+      std::string FnName;<br>

+<br>

+      unsigned Kind = 0;  // 0 = identifier, 1 = unary, 2 = binary.<br>

+      unsigned BinaryPrecedence = 30;<br>

+<br>

+      switch (CurTok) {<br>

+      default:<br>

+        return LogErrorP("Expected function name in prototype");<br>

+      case tok_identifier:<br>

+        FnName = IdentifierStr;<br>

+        Kind = 0;<br>

+        getNextToken();<br>

+        break;<br>

+      case tok_unary:<br>

+        getNextToken();<br>

+        if (!isascii(CurTok))<br>

+          return LogErrorP("Expected unary operator");<br>

+        FnName = "unary";<br>

+        FnName += (char)CurTok;<br>

+        Kind = 1;<br>

+        getNextToken();<br>

+        break;<br>

+      case tok_binary:<br>

+        ...<br>

+<br>

+As with binary operators, we name unary operators with a name that<br>

+includes the operator character. This assists us at code generation<br>

+time. Speaking of, the final piece we need to add is codegen support for<br>

+unary operators. It looks like this:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    Value *UnaryExprAST::codegen() {<br>

+      Value *OperandV = Operand->codegen();<br>

+      if (!OperandV)<br>

+        return nullptr;<br>

+<br>

+      Function *F = TheModule->getFunction(std::string("unary")+Opcode);<br>

+      if (!F)<br>

+        return LogErrorV("Unknown unary operator");<br>

+<br>

+      return Builder.CreateCall(F, OperandV, "unop");<br>

+    }<br>

+<br>

+This code is similar to, but simpler than, the code for binary<br>

+operators. It is simpler primarily because it doesn't need to handle any<br>

+predefined operators.<br>

+<br>

+Kicking the Tires<br>

+=================<br>

+<br>

+It is somewhat hard to believe, but with a few simple extensions we've<br>

+covered in the last chapters, we have grown a real-ish language. With<br>

+this, we can do a lot of interesting things, including I/O, math, and a<br>

+bunch of other things. For example, we can now add a nice sequencing<br>

+operator (printd is defined to print out the specified value and a<br>

+newline):<br>

+<br>

+::<br>

+<br>

+    ready> extern printd(x);<br>

+    Read extern:<br>

+    declare double @printd(double)<br>

+<br>

+    ready> def binary : 1 (x y) 0;  # Low-precedence operator that ignores operands.<br>

+    ..<br>

+    ready> printd(123) : printd(456) : printd(789);<br>

+    123.000000<br>

+    456.000000<br>

+    789.000000<br>

+    Evaluated to 0.000000<br>

+<br>

+We can also define a bunch of other "primitive" operations, such as:<br>

+<br>

+::<br>

+<br>

+    # Logical unary not.<br>

+    def unary!(v)<br>

+      if v then<br>

+        0<br>

+      else<br>

+        1;<br>

+<br>

+    # Unary negate.<br>

+    def unary-(v)<br>

+      0-v;<br>

+<br>

+    # Define > with the same precedence as <.<br>

+    def binary> 10 (LHS RHS)<br>

+      RHS < LHS;<br>

+<br>

+    # Binary logical or, which does not short circuit.<br>

+    def binary| 5 (LHS RHS)<br>

+      if LHS then<br>

+        1<br>

+      else if RHS then<br>

+        1<br>

+      else<br>

+        0;<br>

+<br>

+    # Binary logical and, which does not short circuit.<br>

+    def binary& 6 (LHS RHS)<br>

+      if !LHS then<br>

+        0<br>

+      else<br>

+        !!RHS;<br>

+<br>

+    # Define = with slightly lower precedence than relationals.<br>

+    def binary = 9 (LHS RHS)<br>

+      !(LHS < RHS | LHS > RHS);<br>

+<br>

+    # Define ':' for sequencing: as a low-precedence operator that ignores operands<br>

+    # and just returns the RHS.<br>

+    def binary : 1 (x y) y;<br>

+<br>

+Given the previous if/then/else support, we can also define interesting<br>

+functions for I/O. For example, the following prints out a character<br>

+whose "density" reflects the value passed in: the lower the value, the<br>

+denser the character:<br>

+<br>

+::<br>

+<br>

+    ready><br>

+<br>

+    extern putchard(char);<br>

+    def printdensity(d)<br>

+      if d > 8 then<br>

+        putchard(32)  # ' '<br>

+      else if d > 4 then<br>

+        putchard(46)  # '.'<br>

+      else if d > 2 then<br>

+        putchard(43)  # '+'<br>

+      else<br>

+        putchard(42); # '*'<br>

+    ...<br>

+    ready> printdensity(1): printdensity(2): printdensity(3):<br>

+           printdensity(4): printdensity(5): printdensity(9):<br>

+           putchard(10);<br>

+    **++.<br>

+    Evaluated to 0.000000<br>

+<br>
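
+For readers following along in C++, the thresholds in ``printdensity``<br>

+can be sketched as below. This is only an illustration: the names<br>

+``densityChar`` and ``densityRow`` are assumptions and do not appear in<br>

+the tutorial code, and the sketch returns characters instead of calling<br>

+``putchard``.<br>

+<br>

```cpp
#include <cassert>
#include <string>

// Hypothetical C++ counterpart of printdensity: the lower the value,
// the denser the character. Returns the character instead of printing it.
char densityChar(double d) {
  if (d > 8) return ' ';
  if (d > 4) return '.';
  if (d > 2) return '+';
  return '*';
}

// Builds the row that a ':'-sequenced chain of printdensity calls prints.
std::string densityRow(const double *values, int count) {
  std::string row;
  for (int i = 0; i < count; ++i)
    row += densityChar(values[i]);
  return row;
}
```

+<br>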

+Based on these simple primitive operations, we can start to define more<br>

+interesting things. For example, here's a little function that determines<br>

+the number of iterations it takes for a function in the complex plane to<br>

+diverge:<br>

+<br>

+::<br>

+<br>

+    # Determine whether the specific location diverges.<br>

+    # Solve for z = z^2 + c in the complex plane.<br>

+    def mandelconverger(real imag iters creal cimag)<br>

+      if iters > 255 | (real*real + imag*imag > 4) then<br>

+        iters<br>

+      else<br>

+        mandelconverger(real*real - imag*imag + creal,<br>

+                        2*real*imag + cimag,<br>

+                        iters+1, creal, cimag);<br>

+<br>

+    # Return the number of iterations required for the iteration to escape<br>

+    def mandelconverge(real imag)<br>

+      mandelconverger(real, imag, 0, real, imag);<br>

+<br>
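
+The same escape-time iteration can be written as a loop. The following<br>

+is a minimal C++ sketch, not part of the tutorial sources: it reuses the<br>

+name ``mandelconverge`` for convenience and folds the recursive helper<br>

+into a ``while`` loop with the same exit test.<br>

+<br>

```cpp
#include <cassert>

// Iterative sketch of the recursive mandelconverger/mandelconverge pair.
// Starts at z = c and repeats z = z^2 + c until the orbit escapes
// (|z|^2 > 4) or the iteration count exceeds 255.
int mandelconverge(double creal, double cimag) {
  double real = creal, imag = cimag;
  int iters = 0;
  while (!(iters > 255 || real * real + imag * imag > 4)) {
    // Compute both components from the old values before overwriting them.
    double nextReal = real * real - imag * imag + creal;
    double nextImag = 2 * real * imag + cimag;
    real = nextReal;
    imag = nextImag;
    ++iters;
  }
  return iters;
}
```

+<br>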

+This "``z = z^2 + c``" function is a beautiful little creature that is<br>

+the basis for computation of the `Mandelbrot<br>

+Set <<a href="http://en.wikipedia.org/wiki/Mandelbrot_set" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Mandelbrot_set</a>>`_. Our<br>

+``mandelconverge`` function returns the number of iterations that it<br>

+takes for a complex orbit to escape, saturating to 255. This is not a<br>

+very useful function by itself, but if you plot its value over a<br>

+two-dimensional plane, you can see the Mandelbrot set. Given that we are<br>

+limited to using putchard here, our amazing graphical output is limited,<br>

+but we can whip together something using the density plotter above:<br>

+<br>

+::<br>

+<br>

+    # Compute and plot the mandelbrot set with the specified 2 dimensional range<br>

+    # info.<br>

+    def mandelhelp(xmin xmax xstep   ymin ymax ystep)<br>

+      for y = ymin, y < ymax, ystep in (<br>

+        (for x = xmin, x < xmax, xstep in<br>

+           printdensity(mandelconverge(x,y)))<br>

+        : putchard(10)<br>

+      )<br>

+<br>

+    # mandel - This is a convenient helper function for plotting the mandelbrot set<br>

+    # from the specified position with the specified Magnification.<br>

+    def mandel(realstart imagstart realmag imagmag)<br>

+      mandelhelp(realstart, realstart+realmag*78, realmag,<br>

+                 imagstart, imagstart+imagmag*40, imagmag);<br>

+<br>

+Given this, we can try plotting out the Mandelbrot set! Let's try it out:<br>

+<br>

+::<br>

+<br>

+    ready> mandel(-2.3, -1.3, 0.05, 0.07);<br>

+    *******************************+++++++++++*************************************<br>

+    *************************+++++++++++++++++++++++*******************************<br>

+    **********************+++++++++++++++++++++++++++++****************************<br>

+    *******************+++++++++++++++++++++.. ...++++++++*************************<br>

+    *****************++++++++++++++++++++++.... ...+++++++++***********************<br>

+    ***************+++++++++++++++++++++++.....   ...+++++++++*********************<br>

+    **************+++++++++++++++++++++++....     ....+++++++++********************<br>

+    *************++++++++++++++++++++++......      .....++++++++*******************<br>

+    ************+++++++++++++++++++++.......       .......+++++++******************<br>

+    ***********+++++++++++++++++++....                ... .+++++++*****************<br>

+    **********+++++++++++++++++.......                     .+++++++****************<br>

+    *********++++++++++++++...........                    ...+++++++***************<br>

+    ********++++++++++++............                      ...++++++++**************<br>

+    ********++++++++++... ..........                        .++++++++**************<br>

+    *******+++++++++.....                                   .+++++++++*************<br>

+    *******++++++++......                                  ..+++++++++*************<br>

+    *******++++++.......                                   ..+++++++++*************<br>

+    *******+++++......                                     ..+++++++++*************<br>

+    *******.... ....                                      ...+++++++++*************<br>

+    *******.... .                                         ...+++++++++*************<br>

+    *******+++++......                                    ...+++++++++*************<br>

+    *******++++++.......                                   ..+++++++++*************<br>

+    *******++++++++......                                   .+++++++++*************<br>

+    *******+++++++++.....                                  ..+++++++++*************<br>

+    ********++++++++++... ..........                        .++++++++**************<br>

+    ********++++++++++++............                      ...++++++++**************<br>

+    *********++++++++++++++..........                     ...+++++++***************<br>

+    **********++++++++++++++++........                     .+++++++****************<br>

+    **********++++++++++++++++++++....                ... ..+++++++****************<br>

+    ***********++++++++++++++++++++++.......       .......++++++++*****************<br>

+    ************+++++++++++++++++++++++......      ......++++++++******************<br>

+    **************+++++++++++++++++++++++....      ....++++++++********************<br>

+    ***************+++++++++++++++++++++++.....   ...+++++++++*********************<br>

+    *****************++++++++++++++++++++++....  ...++++++++***********************<br>

+    *******************+++++++++++++++++++++......++++++++*************************<br>

+    *********************++++++++++++++++++++++.++++++++***************************<br>

+    *************************+++++++++++++++++++++++*******************************<br>

+    ******************************+++++++++++++************************************<br>

+    *******************************************************************************<br>

+    *******************************************************************************<br>

+    *******************************************************************************<br>

+    Evaluated to 0.000000<br>

+    ready> mandel(-2, -1, 0.02, 0.04);<br>

+    **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++<br>

+    ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++<br>

+    *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.<br>

+    *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...<br>

+    *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....<br>

+    ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........<br>

+    **************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........<br>

+    ************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............<br>

+    ***********++++++++++++++++++++++++++++++++++++++++++++++++++........        .<br>

+    **********++++++++++++++++++++++++++++++++++++++++++++++.............<br>

+    ********+++++++++++++++++++++++++++++++++++++++++++..................<br>

+    *******+++++++++++++++++++++++++++++++++++++++.......................<br>

+    ******+++++++++++++++++++++++++++++++++++...........................<br>

+    *****++++++++++++++++++++++++++++++++............................<br>

+    *****++++++++++++++++++++++++++++...............................<br>

+    ****++++++++++++++++++++++++++......   .........................<br>

+    ***++++++++++++++++++++++++.........     ......    ...........<br>

+    ***++++++++++++++++++++++............<br>

+    **+++++++++++++++++++++..............<br>

+    **+++++++++++++++++++................<br>

+    *++++++++++++++++++.................<br>

+    *++++++++++++++++............ ...<br>

+    *++++++++++++++..............<br>

+    *+++....++++................<br>

+    *..........  ...........<br>

+    *<br>

+    *..........  ...........<br>

+    *+++....++++................<br>

+    *++++++++++++++..............<br>

+    *++++++++++++++++............ ...<br>

+    *++++++++++++++++++.................<br>

+    **+++++++++++++++++++................<br>

+    **+++++++++++++++++++++..............<br>

+    ***++++++++++++++++++++++............<br>

+    ***++++++++++++++++++++++++.........     ......    ...........<br>

+    ****++++++++++++++++++++++++++......   .........................<br>

+    *****++++++++++++++++++++++++++++...............................<br>

+    *****++++++++++++++++++++++++++++++++............................<br>

+    ******+++++++++++++++++++++++++++++++++++...........................<br>

+    *******+++++++++++++++++++++++++++++++++++++++.......................<br>

+    ********+++++++++++++++++++++++++++++++++++++++++++..................<br>

+    Evaluated to 0.000000<br>

+    ready> mandel(-0.9, -1.4, 0.02, 0.03);<br>

+    *******************************************************************************<br>

+    *******************************************************************************<br>

+    *******************************************************************************<br>

+    **********+++++++++++++++++++++************************************************<br>

+    *+++++++++++++++++++++++++++++++++++++++***************************************<br>

+    +++++++++++++++++++++++++++++++++++++++++++++**********************************<br>

+    ++++++++++++++++++++++++++++++++++++++++++++++++++*****************************<br>

+    ++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************<br>

+    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************<br>

+    +++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************<br>

+    +++++++++++++++++++++++++++++++....   ......+++++++++++++++++++****************<br>

+    +++++++++++++++++++++++++++++.......  ........+++++++++++++++++++**************<br>

+    ++++++++++++++++++++++++++++........   ........++++++++++++++++++++************<br>

+    +++++++++++++++++++++++++++.........     ..  ...+++++++++++++++++++++**********<br>

+    ++++++++++++++++++++++++++...........        ....++++++++++++++++++++++********<br>

+    ++++++++++++++++++++++++.............       .......++++++++++++++++++++++******<br>

+    +++++++++++++++++++++++.............        ........+++++++++++++++++++++++****<br>

+    ++++++++++++++++++++++...........           ..........++++++++++++++++++++++***<br>

+    ++++++++++++++++++++...........                .........++++++++++++++++++++++*<br>

+    ++++++++++++++++++............                  ...........++++++++++++++++++++<br>

+    ++++++++++++++++...............                 .............++++++++++++++++++<br>

+    ++++++++++++++.................                 ...............++++++++++++++++<br>

+    ++++++++++++..................                  .................++++++++++++++<br>

+    +++++++++..................                      .................+++++++++++++<br>

+    ++++++........        .                               .........  ..++++++++++++<br>

+    ++............                                         ......    ....++++++++++<br>

+    ..............                                                    ...++++++++++<br>

+    ..............                                                    ....+++++++++<br>

+    ..............                                                    .....++++++++<br>

+    .............                                                    ......++++++++<br>

+    ...........                                                     .......++++++++<br>

+    .........                                                       ........+++++++<br>

+    .........                                                       ........+++++++<br>

+    .........                                                           ....+++++++<br>

+    ........                                                             ...+++++++<br>

+    .......                                                              ...+++++++<br>

+                                                                        ....+++++++<br>

+                                                                       .....+++++++<br>

+                                                                        ....+++++++<br>

+                                                                        ....+++++++<br>

+                                                                        ....+++++++<br>

+    Evaluated to 0.000000<br>

+    ready> ^D<br>

+<br>

+At this point, you may be starting to realize that Kaleidoscope is a<br>

+real and powerful language. It may not be self-similar :), but it can be<br>

+used to plot things that are!<br>

+<br>

+With this, we conclude the "adding user-defined operators" chapter of<br>

+the tutorial. We have successfully augmented our language, adding the<br>

+ability to extend the language in the library, and we have shown how<br>

+this can be used to build a simple but interesting end-user application<br>

+in Kaleidoscope. At this point, Kaleidoscope can build a variety of<br>

+applications that are functional and can call functions with<br>

+side-effects, but it can't actually define and mutate a variable itself.<br>

+<br>

+Strikingly, variable mutation is an important feature of some languages,<br>

+and it is not at all obvious how to `add support for mutable<br>

+variables <LangImpl07.html>`_ without having to add an "SSA construction"<br>

+phase to your front-end. In the next chapter, we will describe how you<br>

+can add variable mutation without building SSA in your front-end.<br>

+<br>

+Full Code Listing<br>

+=================<br>

+<br>

+Here is the complete code listing for our running example, enhanced with<br>

+the support for user-defined operators. To build this example, use:<br>

+<br>

+.. code-block:: bash<br>

+<br>

+    # Compile<br>

+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy<br>

+    # Run<br>

+    ./toy<br>

+<br>

+On some platforms, you will need to specify -rdynamic or<br>

+-Wl,--export-dynamic when linking. This ensures that symbols defined in<br>

+the main executable are exported to the dynamic linker and so are<br>

+available for symbol resolution at run time. This is not needed if you<br>

+compile your support code into a shared library, although doing that<br>

+will cause problems on Windows.<br>

+<br>

+Here is the code:<br>

+<br>

+.. literalinclude:: ../../examples/Kaleidoscope/Chapter6/toy.cpp<br>

+   :language: c++<br>

+<br>

+`Next: Extending the language: mutable variables / SSA<br>

+construction <LangImpl07.html>`_<br>

+<br>

<br>

Added: llvm/trunk/docs/tutorial/LangImpl07.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl07.rst?rev=274441&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl07.rst?rev=274441&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl07.rst (added)<br>

+++ llvm/trunk/docs/tutorial/LangImpl07.rst Sat Jul  2 12:01:59 2016<br>

@@ -0,0 +1,881 @@<br>

+=======================================================<br>

+Kaleidoscope: Extending the Language: Mutable Variables<br>

+=======================================================<br>

+<br>

+.. contents::<br>

+   :local:<br>

+<br>

+Chapter 7 Introduction<br>

+======================<br>

+<br>

+Welcome to Chapter 7 of the "`Implementing a language with<br>

+LLVM <index.html>`_" tutorial. In chapters 1 through 6, we've built a<br>

+very respectable, albeit simple, `functional programming<br>

+language <<a href="http://en.wikipedia.org/wiki/Functional_programming" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Functional_programming</a>>`_. In our<br>

+journey, we learned some parsing techniques, how to build and represent<br>

+an AST, how to build LLVM IR, and how to optimize the resultant code as<br>

+well as JIT compile it.<br>

+<br>

+While Kaleidoscope is interesting as a functional language, the fact<br>

+that it is functional makes it "too easy" to generate LLVM IR for it. In<br>

+particular, a functional language makes it very easy to build LLVM IR<br>

+directly in `SSA<br>

+form <<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Static_single_assignment_form</a>>`_.<br>

+Since LLVM requires that the input code be in SSA form, this is a very<br>

+nice property and it is often unclear to newcomers how to generate code<br>

+for an imperative language with mutable variables.<br>

+<br>

+The short (and happy) summary of this chapter is that there is no need<br>

+for your front-end to build SSA form: LLVM provides highly tuned and<br>

+well tested support for this, though the way it works is a bit<br>

+unexpected for some.<br>

+<br>

+Why is this a hard problem?<br>

+===========================<br>

+<br>

+To understand why mutable variables cause complexities in SSA<br>

+construction, consider this extremely simple C example:<br>

+<br>

+.. code-block:: c<br>

+<br>

+    int G, H;<br>

+    int test(_Bool Condition) {<br>

+      int X;<br>

+      if (Condition)<br>

+        X = G;<br>

+      else<br>

+        X = H;<br>

+      return X;<br>

+    }<br>

+<br>

+In this case, we have the variable "X", whose value depends on the path<br>

+executed in the program. Because there are two different possible values<br>

+for X before the return instruction, a PHI node is inserted to merge the<br>

+two values. The LLVM IR that we want for this example looks like this:<br>

+<br>

+.. code-block:: llvm<br>

+<br>

+    @G = weak global i32 0   ; type of @G is i32*<br>

+    @H = weak global i32 0   ; type of @H is i32*<br>

+<br>

+    define i32 @test(i1 %Condition) {<br>

+    entry:<br>

+      br i1 %Condition, label %cond_true, label %cond_false<br>

+<br>

+    cond_true:<br>

+      %X.0 = load i32, i32* @G<br>

+      br label %cond_next<br>

+<br>

+    cond_false:<br>

+      %X.1 = load i32, i32* @H<br>

+      br label %cond_next<br>

+<br>

+    cond_next:<br>

+      %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]<br>

+      ret i32 %X.2<br>

+    }<br>

+<br>

+In this example, the loads from the G and H global variables are<br>

+explicit in the LLVM IR, and they live in the then/else branches of the<br>

+if statement (cond\_true/cond\_false). In order to merge the incoming<br>

+values, the X.2 phi node in the cond\_next block selects the right value<br>

+to use based on where control flow is coming from: if control flow comes<br>

+from the cond\_false block, X.2 gets the value of X.1. Alternatively, if<br>

+control flow comes from cond\_true, it gets the value of X.0. The intent<br>

+of this chapter is not to explain the details of SSA form. For more<br>

+information, see one of the many `online<br>

+references <<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Static_single_assignment_form</a>>`_.<br>

+<br>
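
+As a rough mental model only, the phi node can be pictured in C++ as a<br>

+selection keyed on the predecessor block. The names ``Pred``, ``phi``<br>

+and ``selectX`` below are assumptions made for this sketch; real phi<br>

+nodes are IR constructs, not function calls.<br>

+<br>

```cpp
#include <cassert>

// Which predecessor block control flow arrived from.
enum class Pred { CondTrue, CondFalse };

// Toy model of
//   %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
// The result depends only on the edge taken into the merge block.
int phi(Pred from, int x0FromTrue, int x1FromFalse) {
  return from == Pred::CondTrue ? x0FromTrue : x1FromFalse;
}

// Mirrors the C function in the text, with G and H passed as parameters.
int selectX(bool condition, int G, int H) {
  int x0 = 0, x1 = 0;  // zero-initialized only to keep the C++ well-defined
  Pred from;
  if (condition) {
    x0 = G;            // cond_true defines X.0
    from = Pred::CondTrue;
  } else {
    x1 = H;            // cond_false defines X.1
    from = Pred::CondFalse;
  }
  return phi(from, x0, x1);  // cond_next merges the two definitions
}
```

+<br>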

+The question for this article is "who places the phi nodes when lowering<br>

+assignments to mutable variables?". The issue here is that LLVM<br>

+*requires* that its IR be in SSA form: there is no "non-SSA" mode for<br>

+it. However, SSA construction requires non-trivial algorithms and data<br>

+structures, so it is inconvenient and wasteful for every front-end to<br>

+have to reproduce this logic.<br>

+<br>

+Memory in LLVM<br>

+==============<br>

+<br>

+The 'trick' here is that while LLVM does require all register values to<br>

+be in SSA form, it does not require (or permit) memory objects to be in<br>

+SSA form. In the example above, note that the loads from G and H are<br>

+direct accesses to G and H: they are not renamed or versioned. This<br>

+differs from some other compiler systems, which do try to version memory<br>

+objects. In LLVM, instead of encoding dataflow analysis of memory into<br>

+the LLVM IR, it is handled with `Analysis<br>

+Passes <../WritingAnLLVMPass.html>`_ which are computed on demand.<br>

+<br>

+With this in mind, the high-level idea is that we want to make a stack<br>

+variable (which lives in memory, because it is on the stack) for each<br>

+mutable object in a function. To take advantage of this trick, we need<br>

+to talk about how LLVM represents stack variables.<br>

+<br>

+In LLVM, all memory accesses are explicit with load/store instructions,<br>

+and it is carefully designed not to have (or need) an "address-of"<br>

+operator. Notice how the type of the @G/@H global variables is actually<br>

+"i32\*" even though the variable is defined as "i32". What this means is<br>

+that @G defines *space* for an i32 in the global data area, but its<br>

+*name* actually refers to the address for that space. Stack variables<br>

+work the same way, except that instead of being declared with global<br>

+variable definitions, they are declared with the `LLVM alloca<br>

+instruction <../LangRef.html#alloca-instruction>`_:<br>

+<br>

+.. code-block:: llvm<br>

+<br>

+    define i32 @example() {<br>

+    entry:<br>

+      %X = alloca i32           ; type of %X is i32*.<br>

+      ...<br>

+      %tmp = load i32, i32* %X  ; load the stack value %X from the stack.<br>

+      %tmp2 = add i32 %tmp, 1   ; increment it<br>

+      store i32 %tmp2, i32* %X  ; store it back<br>

+      ...<br>

+<br>

+This code shows an example of how you can declare and manipulate a stack<br>

+variable in the LLVM IR. Stack memory allocated with the alloca<br>

+instruction is fully general: you can pass the address of the stack slot<br>

+to functions, you can store it in other variables, etc. In our example<br>

+above, we could rewrite the example to use the alloca technique to avoid<br>

+using a PHI node:<br>

+<br>

+.. code-block:: llvm<br>

+<br>

+    @G = weak global i32 0   ; type of @G is i32*<br>

+    @H = weak global i32 0   ; type of @H is i32*<br>

+<br>

+    define i32 @test(i1 %Condition) {<br>

+    entry:<br>

+      %X = alloca i32           ; type of %X is i32*.<br>

+      br i1 %Condition, label %cond_true, label %cond_false<br>

+<br>

+    cond_true:<br>

+      %X.0 = load i32, i32* @G<br>

+      store i32 %X.0, i32* %X   ; Update X<br>

+      br label %cond_next<br>

+<br>

+    cond_false:<br>

+      %X.1 = load i32, i32* @H<br>

+      store i32 %X.1, i32* %X   ; Update X<br>

+      br label %cond_next<br>

+<br>

+    cond_next:<br>

+      %X.2 = load i32, i32* %X  ; Read X<br>

+      ret i32 %X.2<br>

+    }<br>

+<br>

+With this, we have discovered a way to handle arbitrary mutable<br>

+variables without the need to create Phi nodes at all:<br>

+<br>

+#. Each mutable variable becomes a stack allocation.<br>

+#. Each read of the variable becomes a load from the stack.<br>

+#. Each update of the variable becomes a store to the stack.<br>

+#. Taking the address of a variable just uses the stack address<br>

+   directly.<br>

+<br>
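
+The first three rules are mechanical enough to sketch as a tiny text<br>

+emitter. The ``Emitter`` type below is hypothetical and deliberately<br>

+avoids the LLVM API: it only appends IR-like strings, using the load<br>

+form with an explicit result type, and is meant to show the shape of<br>

+the lowering rather than a real code generator.<br>

+<br>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mini-emitter: records textual IR lines, not real LLVM calls.
struct Emitter {
  std::vector<std::string> lines;
  int nextTmp = 0;

  // Rule 1: a mutable variable becomes a stack allocation.
  void declare(const std::string &var) {
    lines.push_back("%" + var + " = alloca i32");
  }

  // Rule 2: reading the variable becomes a load from its stack slot.
  // Returns the name of the temporary that holds the loaded value.
  std::string read(const std::string &var) {
    std::string tmp = "%t" + std::to_string(nextTmp++);
    lines.push_back(tmp + " = load i32, i32* %" + var);
    return tmp;
  }

  // Rule 3: updating the variable becomes a store to its stack slot.
  void write(const std::string &var, const std::string &value) {
    lines.push_back("store i32 " + value + ", i32* %" + var);
  }
};
```

+Because every read and write goes through the slot, the emitter never<br>

+has to decide where a merge of values is needed.<br>

+<br>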

+While this solution has solved our immediate problem, it introduced<br>

+another one: we have now apparently introduced a lot of stack traffic<br>

+for very simple and common operations, a major performance problem.<br>

+Fortunately for us, the LLVM optimizer has a highly-tuned optimization<br>

+pass named "mem2reg" that handles this case, promoting allocas like this<br>

+into SSA registers, inserting Phi nodes as appropriate. If you run this<br>

+example through the pass, for example, you'll get:<br>

+<br>

+.. code-block:: bash<br>

+<br>

+    $ llvm-as < example.ll | opt -mem2reg | llvm-dis<br>

+    @G = weak global i32 0<br>

+    @H = weak global i32 0<br>

+<br>

+    define i32 @test(i1 %Condition) {<br>

+    entry:<br>

+      br i1 %Condition, label %cond_true, label %cond_false<br>

+<br>

+    cond_true:<br>

+      %X.0 = load i32, i32* @G<br>

+      br label %cond_next<br>

+<br>

+    cond_false:<br>

+      %X.1 = load i32, i32* @H<br>

+      br label %cond_next<br>

+<br>

+    cond_next:<br>

+      %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]<br>

+      ret i32 %X.01<br>

+    }<br>

+<br>

+The mem2reg pass implements the standard "iterated dominance frontier"<br>

+algorithm for constructing SSA form and has a number of optimizations<br>

+that speed up (very common) degenerate cases. The mem2reg optimization<br>

+pass is the answer to dealing with mutable variables, and we highly<br>

+recommend that you depend on it. Note that mem2reg only works on<br>

+variables in certain circumstances:<br>

+<br>

+#. mem2reg is alloca-driven: it looks for allocas and if it can handle<br>

+   them, it promotes them. It does not apply to global variables or heap<br>

+   allocations.<br>

+#. mem2reg only looks for alloca instructions in the entry block of the<br>

+   function. Being in the entry block guarantees that the alloca is only<br>

+   executed once, which makes analysis simpler.<br>

+#. mem2reg only promotes allocas whose uses are direct loads and stores.<br>

+   If the address of the stack object is passed to a function, or if any<br>

+   funny pointer arithmetic is involved, the alloca will not be<br>

+   promoted.<br>

+#. mem2reg only works on allocas of `first<br>

+   class <../LangRef.html#first-class-types>`_ values (such as pointers,<br>

+   scalars and vectors), and only if the array size of the allocation is<br>

+   1 (or missing in the .ll file). mem2reg is not capable of promoting<br>

+   structs or arrays to registers. Note that the "sroa" pass is<br>

+   more powerful and can promote structs, "unions", and arrays in many<br>

+   cases.<br>

+<br>
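
+The four conditions above can be summarized as a single predicate. The<br>

+following C++ is a toy model written for this explanation: ``ToyAlloca``,<br>

+``UseKind`` and ``looksPromotable`` are assumed names, not LLVM classes,<br>

+and the real pass inspects actual instructions rather than a summary<br>

+struct.<br>

+<br>

```cpp
#include <cassert>
#include <vector>

// How a stack slot's address is used. Illustrative categories only.
enum class UseKind { Load, Store, PassedToCall, PointerArithmetic };

// Toy summary of one alloca instruction.
struct ToyAlloca {
  bool inEntryBlock;           // condition 2: lives in the entry block
  bool scalarPointerOrVector;  // condition 4: not a struct or array
  unsigned arraySize;          // condition 4: 1 when no size is given
  std::vector<UseKind> uses;   // condition 3: every use of the address
};

// Returns true when the toy alloca meets all of the listed conditions.
bool looksPromotable(const ToyAlloca &a) {
  if (!a.inEntryBlock)
    return false;
  if (!a.scalarPointerOrVector || a.arraySize != 1)
    return false;
  for (UseKind u : a.uses)
    if (u != UseKind::Load && u != UseKind::Store)
      return false;  // address escapes or is manipulated
  return true;
}
```

+<br>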

+All of these properties are easy to satisfy for most imperative<br>

+languages, and we'll illustrate it below with Kaleidoscope. The final<br>

+question you may be asking is: should I bother with this nonsense for my<br>

+front-end? Wouldn't it be better if I just did SSA construction<br>

+directly, avoiding use of the mem2reg optimization pass? In short, we<br>

+strongly recommend that you use this technique for building SSA form,<br>

+unless there is an extremely good reason not to. Using this technique<br>

+is:<br>

+<br>

+-  Proven and well tested: clang uses this technique<br>

+   for local mutable variables. As such, the most common clients of LLVM<br>

+   are using this to handle the bulk of their variables. You can be sure<br>

+   that bugs are found fast and fixed early.<br>

+-  Extremely Fast: mem2reg has a number of special cases that make it<br>

+   fast in common cases as well as fully general. For example, it has<br>

+   fast-paths for variables that are only used in a single block,<br>

+   variables that only have one assignment point, good heuristics to<br>

+   avoid insertion of unneeded phi nodes, etc.<br>

+-  Needed for debug info generation: `Debug information in<br>

+   LLVM <../SourceLevelDebugging.html>`_ relies on having the address of<br>

+   the variable exposed so that debug info can be attached to it. This<br>

+   technique dovetails very naturally with this style of debug info.<br>

+<br>

+If nothing else, this makes it much easier to get your front-end up and<br>

+running, and is very simple to implement. Let's extend Kaleidoscope with<br>

+mutable variables now!<br>

+<br>

+Mutable Variables in Kaleidoscope<br>

+=================================<br>

+<br>

+Now that we know the sort of problem we want to tackle, let's see what<br>

+this looks like in the context of our little Kaleidoscope language.<br>

+We're going to add two features:<br>

+<br>

+#. The ability to mutate variables with the '=' operator.<br>

+#. The ability to define new variables.<br>

+<br>

+While the first item is really what this is about, we only have<br>

+variables for incoming arguments as well as for induction variables, and<br>

+redefining those only goes so far :). Also, the ability to define new<br>

+variables is a useful thing regardless of whether you will be mutating<br>

+them. Here's a motivating example that shows how we could use these:<br>

+<br>

+::<br>

+<br>

+    # Define ':' for sequencing: as a low-precedence operator that ignores operands<br>

+    # and just returns the RHS.<br>

+    def binary : 1 (x y) y;<br>

+<br>

+    # Recursive fib, we could do this before.<br>

+    def fib(x)<br>

+      if (x < 3) then<br>

+        1<br>

+      else<br>

+        fib(x-1)+fib(x-2);<br>

+<br>

+    # Iterative fib.<br>

+    def fibi(x)<br>

+      var a = 1, b = 1, c in<br>

+      (for i = 3, i < x in<br>

+         c = a + b :<br>

+         a = b :<br>

+         b = c) :<br>

+      b;<br>

+<br>

+    # Call it.<br>

+    fibi(10);<br>

+<br>

+In order to mutate variables, we have to change our existing variables<br>

+to use the "alloca trick". Once we have that, we'll add our new<br>

+operator, then extend Kaleidoscope to support new variable definitions.<br>

+<br>

+Adjusting Existing Variables for Mutation<br>

+=========================================<br>

+<br>

+The symbol table in Kaleidoscope is managed at code generation time by<br>

+the '``NamedValues``' map. This map currently keeps track of the LLVM<br>

+"Value\*" that holds the double value for the named variable. In order<br>

+to support mutation, we need to change this slightly, so that<br>

+``NamedValues`` holds the *memory location* of the variable in question.<br>

+Note that this change is a refactoring: it changes the structure of the<br>

+code, but does not (by itself) change the behavior of the compiler. All<br>

+of these changes are isolated in the Kaleidoscope code generator.<br>

+<br>

+At this point in Kaleidoscope's development, it only supports variables<br>

+for two things: incoming arguments to functions and the induction<br>

+variable of 'for' loops. For consistency, we'll allow mutation of these<br>

+variables in addition to other user-defined variables. This means that<br>

+these will both need memory locations.<br>

+<br>

+To start our transformation of Kaleidoscope, we'll change the<br>

+NamedValues map so that it maps to AllocaInst\* instead of Value\*. Once<br>

+we do this, the C++ compiler will tell us what parts of the code we need<br>

+to update:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    static std::map<std::string, AllocaInst*> NamedValues;<br>

+<br>

+Also, since we will need to create these allocas, we'll use a helper<br>

+function that ensures that the allocas are created in the entry block of<br>

+the function:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of<br>

+    /// the function.  This is used for mutable variables etc.<br>

+    static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,<br>

+                                              const std::string &VarName) {<br>

+      IRBuilder<> TmpB(&TheFunction->getEntryBlock(),<br>

+                     TheFunction->getEntryBlock().begin());<br>

+      return TmpB.CreateAlloca(Type::getDoubleTy(LLVMContext), 0,<br>

+                               VarName.c_str());<br>

+    }<br>

+<br>

+This funny-looking code creates an IRBuilder object that is pointing at<br>

+the first instruction (.begin()) of the entry block. It then creates an<br>

+alloca with the expected name and returns it. Because all values in<br>

+Kaleidoscope are doubles, there is no need to pass in a type to use.<br>

+<br>

+With this in place, the first functionality change we want to make is to<br>

+variable references. In our new scheme, variables live on the stack, so<br>

+code generating a reference to them actually needs to produce a load<br>

+from the stack slot:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    Value *VariableExprAST::codegen() {<br>

+      // Look this variable up in the function.<br>

+      Value *V = NamedValues[Name];<br>

+      if (!V)<br>

+        return LogErrorV("Unknown variable name");<br>

+<br>

+      // Load the value.<br>

+      return Builder.CreateLoad(V, Name.c_str());<br>

+    }<br>

+<br>

+As you can see, this is pretty straightforward. Now we need to update<br>

+the things that define the variables to set up the alloca. We'll start<br>

+with ``ForExprAST::codegen()`` (see the `full code listing <#id1>`_ for<br>

+the unabridged code):<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      Function *TheFunction = Builder.GetInsertBlock()->getParent();<br>

+<br>

+      // Create an alloca for the variable in the entry block.<br>

+      AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);<br>

+<br>

+      // Emit the start code first, without 'variable' in scope.<br>

+      Value *StartVal = Start->codegen();<br>

+      if (!StartVal)<br>

+        return nullptr;<br>

+<br>

+      // Store the value into the alloca.<br>

+      Builder.CreateStore(StartVal, Alloca);<br>

+      ...<br>

+<br>

+      // Compute the end condition.<br>

+      Value *EndCond = End->codegen();<br>

+      if (!EndCond)<br>

+        return nullptr;<br>

+<br>

+      // Reload, increment, and restore the alloca.  This handles the case where<br>

+      // the body of the loop mutates the variable.<br>

+      Value *CurVar = Builder.CreateLoad(Alloca);<br>

+      Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");<br>

+      Builder.CreateStore(NextVar, Alloca);<br>

+      ...<br>

+<br>

+This code is virtually identical to the code `before we allowed mutable<br>

+variables <LangImpl05.html#code-generation-for-the-for-loop>`_. The big difference is that we<br>

+no longer have to construct a PHI node, and we use load/store to access<br>

+the variable as needed.<br>

+<br>

+To support mutable argument variables, we need to also make allocas for<br>

+them. The code for this is also pretty simple:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// CreateArgumentAllocas - Create an alloca for each argument and register the<br>

+    /// argument in the symbol table so that references to it will succeed.<br>

+    void PrototypeAST::CreateArgumentAllocas(Function *F) {<br>

+      Function::arg_iterator AI = F->arg_begin();<br>

+      for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {<br>

+        // Create an alloca for this variable.<br>

+        AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);<br>

+<br>

+        // Store the initial value into the alloca.<br>

+        Builder.CreateStore(&*AI, Alloca);<br>

+<br>

+        // Add arguments to variable symbol table.<br>

+        NamedValues[Args[Idx]] = Alloca;<br>

+      }<br>

+    }<br>

+<br>

+For each argument, we make an alloca, store the input value to the<br>

+function into the alloca, and register the alloca as the memory location<br>

+for the argument. This method gets invoked by ``FunctionAST::codegen()``<br>

+right after it sets up the entry block for the function.<br>

+<br>

+The final missing piece is adding the mem2reg pass, which allows us to<br>

+get good codegen once again:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+        // Set up the optimizer pipeline.  Start with registering info about how the<br>

+        // target lays out data structures.<br>

+        OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));<br>

+        // Promote allocas to registers.<br>

+        OurFPM.add(createPromoteMemoryToRegisterPass());<br>

+        // Do simple "peephole" optimizations and bit-twiddling optzns.<br>

+        OurFPM.add(createInstructionCombiningPass());<br>

+        // Reassociate expressions.<br>

+        OurFPM.add(createReassociatePass());<br>

+<br>

+It is interesting to see what the code looks like before and after the<br>

+mem2reg optimization runs. For example, this is the before/after code<br>

+for our recursive fib function. Before the optimization:<br>

+<br>

+.. code-block:: llvm<br>

+<br>

+    define double @fib(double %x) {<br>

+    entry:<br>

+      %x1 = alloca double<br>

+      store double %x, double* %x1<br>

+      %x2 = load double, double* %x1<br>

+      %cmptmp = fcmp ult double %x2, 3.000000e+00<br>

+      %booltmp = uitofp i1 %cmptmp to double<br>

+      %ifcond = fcmp one double %booltmp, 0.000000e+00<br>

+      br i1 %ifcond, label %then, label %else<br>

+<br>

+    then:       ; preds = %entry<br>

+      br label %ifcont<br>

+<br>

+    else:       ; preds = %entry<br>

+      %x3 = load double, double* %x1<br>

+      %subtmp = fsub double %x3, 1.000000e+00<br>

+      %calltmp = call double @fib(double %subtmp)<br>

+      %x4 = load double, double* %x1<br>

+      %subtmp5 = fsub double %x4, 2.000000e+00<br>

+      %calltmp6 = call double @fib(double %subtmp5)<br>

+      %addtmp = fadd double %calltmp, %calltmp6<br>

+      br label %ifcont<br>

+<br>

+    ifcont:     ; preds = %else, %then<br>

+      %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]<br>

+      ret double %iftmp<br>

+    }<br>

+<br>

+Here there is only one variable (x, the input argument) but you can<br>

+still see the extremely simple-minded code generation strategy we are<br>

+using. In the entry block, an alloca is created, and the initial input<br>

+value is stored into it. Each reference to the variable does a reload<br>

+from the stack. Also, note that we didn't modify the if/then/else<br>

+expression, so it still inserts a PHI node. While we could make an<br>

+alloca for it, it is actually easier to create a PHI node for it, so we<br>

+still just make the PHI.<br>

+<br>

+Here is the code after the mem2reg pass runs:<br>

+<br>

+.. code-block:: llvm<br>

+<br>

+    define double @fib(double %x) {<br>

+    entry:<br>

+      %cmptmp = fcmp ult double %x, 3.000000e+00<br>

+      %booltmp = uitofp i1 %cmptmp to double<br>

+      %ifcond = fcmp one double %booltmp, 0.000000e+00<br>

+      br i1 %ifcond, label %then, label %else<br>

+<br>

+    then:<br>

+      br label %ifcont<br>

+<br>

+    else:<br>

+      %subtmp = fsub double %x, 1.000000e+00<br>

+      %calltmp = call double @fib(double %subtmp)<br>

+      %subtmp5 = fsub double %x, 2.000000e+00<br>

+      %calltmp6 = call double @fib(double %subtmp5)<br>

+      %addtmp = fadd double %calltmp, %calltmp6<br>

+      br label %ifcont<br>

+<br>

+    ifcont:     ; preds = %else, %then<br>

+      %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]<br>

+      ret double %iftmp<br>

+    }<br>

+<br>

+This is a trivial case for mem2reg, since there are no redefinitions of<br>

+the variable. The point of showing this is to ease any worry about<br>

+inserting such blatant inefficiencies :).<br>

+<br>

+After the rest of the optimizers run, we get:<br>

+<br>

+.. code-block:: llvm<br>

+<br>

+    define double @fib(double %x) {<br>

+    entry:<br>

+      %cmptmp = fcmp ult double %x, 3.000000e+00<br>

+      %booltmp = uitofp i1 %cmptmp to double<br>

+      %ifcond = fcmp ueq double %booltmp, 0.000000e+00<br>

+      br i1 %ifcond, label %else, label %ifcont<br>

+<br>

+    else:<br>

+      %subtmp = fsub double %x, 1.000000e+00<br>

+      %calltmp = call double @fib(double %subtmp)<br>

+      %subtmp5 = fsub double %x, 2.000000e+00<br>

+      %calltmp6 = call double @fib(double %subtmp5)<br>

+      %addtmp = fadd double %calltmp, %calltmp6<br>

+      ret double %addtmp<br>

+<br>

+    ifcont:<br>

+      ret double 1.000000e+00<br>

+    }<br>

+<br>

+Here we see that the simplifycfg pass decided to clone the return<br>

+instruction into the end of the 'else' block. This allowed it to<br>

+eliminate some branches and the PHI node.<br>

+<br>

+Now that all symbol table references are updated to use stack variables,<br>

+we'll add the assignment operator.<br>

+<br>

+New Assignment Operator<br>

+=======================<br>

+<br>

+With our current framework, adding a new assignment operator is really<br>

+simple. We will parse it just like any other binary operator, but handle<br>

+it internally (instead of allowing the user to define it). The first<br>

+step is to set a precedence:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+     int main() {<br>

+       // Install standard binary operators.<br>

+       // 1 is lowest precedence.<br>

+       BinopPrecedence['='] = 2;<br>

+       BinopPrecedence['<'] = 10;<br>

+       BinopPrecedence['+'] = 20;<br>

+       BinopPrecedence['-'] = 20;<br>

+<br>
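
+To see what this precedence table implies, here is a minimal standalone<br>

+sketch (not part of the tutorial's code) of precedence-based parsing. The<br>

+names ``precedenceOf`` and ``parenthesize`` are hypothetical, operands are<br>

+assumed to be single letters, and the input is assumed to be well formed:<br>

+<br>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the tutorial's BinopPrecedence table.
static const std::map<char, int> Precedence = {
    {'=', 2}, {'<', 10}, {'+', 20}, {'-', 20}, {'*', 40}};

// Returns the precedence of Op, or -1 if it is not a known binary operator.
static int precedenceOf(char Op) {
  auto It = Precedence.find(Op);
  return It == Precedence.end() ? -1 : It->second;
}

// Simplified operator-precedence parsing: consumes "<op><operand>" pairs
// starting at Pos and returns the expression with explicit parentheses.
static std::string parseBinOpRHS(const std::string &Src, std::size_t &Pos,
                                 int MinPrec, std::string LHS) {
  while (Pos < Src.size()) {
    char Op = Src[Pos];
    int Prec = precedenceOf(Op);
    if (Prec < MinPrec)
      return LHS;
    ++Pos;                          // eat the operator
    std::string RHS(1, Src[Pos++]); // eat the single-letter operand
    // If the next operator binds tighter, let it take RHS first.
    if (Pos < Src.size() && precedenceOf(Src[Pos]) > Prec)
      RHS = parseBinOpRHS(Src, Pos, Prec + 1, RHS);
    LHS = "(" + LHS + Op + RHS + ")";
  }
  return LHS;
}

// Parenthesizes a whole expression such as "x=a+b<c".
static std::string parenthesize(const std::string &Src) {
  std::size_t Pos = 1;
  return parseBinOpRHS(Src, Pos, 0, std::string(1, Src[0]));
}
```

+Under these assumptions, ``parenthesize("x=a+b<c")`` yields<br>

+``(x=((a+b)<c))``: because '=' has the lowest precedence, everything to its<br>

+right is grouped before the assignment is applied.<br>

+<br>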

+Now that the parser knows the precedence of the binary operator, it<br>

+takes care of all the parsing and AST generation. We just need to<br>

+implement codegen for the assignment operator. This looks like:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    Value *BinaryExprAST::codegen() {<br>

+      // Special case '=' because we don't want to emit the LHS as an expression.<br>

+      if (Op == '=') {<br>

+        // Assignment requires the LHS to be an identifier.<br>

+        VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS.get());<br>

+        if (!LHSE)<br>

+          return LogErrorV("destination of '=' must be a variable");<br>

+<br>

+Unlike the rest of the binary operators, our assignment operator doesn't<br>

+follow the "emit LHS, emit RHS, do computation" model. As such, it is<br>

+handled as a special case before the other binary operators are handled.<br>

+The other strange thing is that it requires the LHS to be a variable. It<br>

+is invalid to have "(x+1) = expr" - only things like "x = expr" are<br>

+allowed.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+        // Codegen the RHS.<br>

+        Value *Val = RHS->codegen();<br>

+        if (!Val)<br>

+          return nullptr;<br>

+<br>

+        // Look up the name.<br>

+        Value *Variable = NamedValues[LHSE->getName()];<br>

+        if (!Variable)<br>

+          return LogErrorV("Unknown variable name");<br>

+<br>

+        Builder.CreateStore(Val, Variable);<br>

+        return Val;<br>

+      }<br>

+      ...<br>

+<br>

+Once we have the variable, codegen'ing the assignment is<br>

+straightforward: we emit the RHS of the assignment, create a store, and<br>

+return the computed value. Returning a value allows for chained<br>

+assignments like "X = (Y = Z)".<br>

+<br>
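
+The "store, then return the value" behaviour can be modelled outside of<br>

+LLVM. The following is a minimal sketch only: ``Slots`` and ``assign`` are<br>

+hypothetical names, and a plain map stands in for the stack slots that<br>

+``NamedValues`` refers to:<br>

+<br>

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the stack slots that NamedValues points at.
static std::map<std::string, double> Slots;

// Models codegen for "Name = Val": store into the slot, then yield the
// stored value so that an enclosing expression can use it.
static double assign(const std::string &Name, double Val) {
  Slots[Name] = Val;
  return Val;
}
```

+Here ``assign("x", assign("y", 4.0))`` plays the role of "x = (y = 4)":<br>

+the inner call stores into ``y`` and hands its value to the outer store.<br>

+<br>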

+Now that we have an assignment operator, we can mutate loop variables<br>

+and arguments. For example, we can now run code like this:<br>

+<br>

+::<br>

+<br>

+    # Function to print a double.<br>

+    extern printd(x);<br>

+<br>

+    # Define ':' for sequencing: as a low-precedence operator that ignores operands<br>

+    # and just returns the RHS.<br>

+    def binary : 1 (x y) y;<br>

+<br>

+    def test(x)<br>

+      printd(x) :<br>

+      x = 4 :<br>

+      printd(x);<br>

+<br>

+    test(123);<br>

+<br>

+When run, this example prints "123" and then "4", showing that we did<br>

+actually mutate the value! Okay, we have now officially implemented our<br>

+goal: getting this to work requires SSA construction in the general<br>

+case. However, to be really useful, we want the ability to define our<br>

+own local variables. Let's add this next!<br>

+<br>

+User-defined Local Variables<br>

+============================<br>

+<br>

+Adding var/in is just like any other extension we made to<br>

+Kaleidoscope: we extend the lexer, the parser, the AST and the code<br>

+generator. The first step for adding our new 'var/in' construct is to<br>

+extend the lexer. As before, this is pretty trivial; the code looks like<br>

+this:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    enum Token {<br>

+      ...<br>

+      // var definition<br>

+      tok_var = -13<br>

+    ...<br>

+    };<br>

+    ...<br>

+    static int gettok() {<br>

+    ...<br>

+        if (IdentifierStr == "in")<br>

+          return tok_in;<br>

+        if (IdentifierStr == "binary")<br>

+          return tok_binary;<br>

+        if (IdentifierStr == "unary")<br>

+          return tok_unary;<br>

+        if (IdentifierStr == "var")<br>

+          return tok_var;<br>

+        return tok_identifier;<br>

+    ...<br>

+<br>

+The next step is to define the AST node that we will construct. For<br>

+var/in, it looks like this:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// VarExprAST - Expression class for var/in<br>

+    class VarExprAST : public ExprAST {<br>

+      std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;<br>

+      std::unique_ptr<ExprAST> Body;<br>

+<br>

+    public:<br>

+      VarExprAST(std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames,<br>

+                 std::unique_ptr<ExprAST> Body)<br>

+      : VarNames(std::move(VarNames)), Body(std::move(Body)) {}<br>

+<br>

+      virtual Value *codegen();<br>

+    };<br>

+<br>

+var/in allows a list of names to be defined all at once, and each name<br>

+can optionally have an initializer value. As such, we capture this<br>

+information in the VarNames vector. Also, var/in has a body, which<br>

+is allowed to access the variables defined by the var/in.<br>

+<br>

+With this in place, we can define the parser pieces. The first thing we<br>

+do is add it as a primary expression:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// primary<br>

+    ///   ::= identifierexpr<br>

+    ///   ::= numberexpr<br>

+    ///   ::= parenexpr<br>

+    ///   ::= ifexpr<br>

+    ///   ::= forexpr<br>

+    ///   ::= varexpr<br>

+    static std::unique_ptr<ExprAST> ParsePrimary() {<br>

+      switch (CurTok) {<br>

+      default:<br>

+        return LogError("unknown token when expecting an expression");<br>

+      case tok_identifier:<br>

+        return ParseIdentifierExpr();<br>

+      case tok_number:<br>

+        return ParseNumberExpr();<br>

+      case '(':<br>

+        return ParseParenExpr();<br>

+      case tok_if:<br>

+        return ParseIfExpr();<br>

+      case tok_for:<br>

+        return ParseForExpr();<br>

+      case tok_var:<br>

+        return ParseVarExpr();<br>

+      }<br>

+    }<br>

+<br>

+Next we define ParseVarExpr:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    /// varexpr ::= 'var' identifier ('=' expression)?<br>

+    ///                   (',' identifier ('=' expression)?)* 'in' expression<br>

+    static std::unique_ptr<ExprAST> ParseVarExpr() {<br>

+      getNextToken();  // eat the var.<br>

+<br>

+      std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;<br>

+<br>

+      // At least one variable name is required.<br>

+      if (CurTok != tok_identifier)<br>

+        return LogError("expected identifier after var");<br>

+<br>

+The first part of this code parses the list of identifier/expr pairs<br>

+into the local ``VarNames`` vector.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      while (1) {<br>

+        std::string Name = IdentifierStr;<br>

+        getNextToken();  // eat identifier.<br>

+<br>

+        // Read the optional initializer.<br>

+        std::unique_ptr<ExprAST> Init;<br>

+        if (CurTok == '=') {<br>

+          getNextToken(); // eat the '='.<br>

+<br>

+          Init = ParseExpression();<br>

+          if (!Init) return nullptr;<br>

+        }<br>

+<br>

+        VarNames.push_back(std::make_pair(Name, std::move(Init)));<br>

+<br>

+        // End of var list, exit loop.<br>

+        if (CurTok != ',') break;<br>

+        getNextToken(); // eat the ','.<br>

+<br>

+        if (CurTok != tok_identifier)<br>

+          return LogError("expected identifier list after var");<br>

+      }<br>

+<br>

+Once all the variables are parsed, we then parse the body and create the<br>

+AST node:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // At this point, we have to have 'in'.<br>

+      if (CurTok != tok_in)<br>

+        return LogError("expected 'in' keyword after 'var'");<br>

+      getNextToken();  // eat 'in'.<br>

+<br>

+      auto Body = ParseExpression();<br>

+      if (!Body)<br>

+        return nullptr;<br>

+<br>

+      return llvm::make_unique<VarExprAST>(std::move(VarNames),<br>

+                                           std::move(Body));<br>

+    }<br>

+<br>

+Now that we can parse and represent the code, we need to support<br>

+emission of LLVM IR for it. This code starts out with:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    Value *VarExprAST::codegen() {<br>

+      std::vector<AllocaInst *> OldBindings;<br>

+<br>

+      Function *TheFunction = Builder.GetInsertBlock()->getParent();<br>

+<br>

+      // Register all variables and emit their initializer.<br>

+      for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {<br>

+        const std::string &VarName = VarNames[i].first;<br>

+        ExprAST *Init = VarNames[i].second.get();<br>

+<br>

+Basically it loops over all the variables, installing them one at a<br>

+time. For each variable we put into the symbol table, we remember the<br>

+previous value that we replace in OldBindings.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+        // Emit the initializer before adding the variable to scope, this prevents<br>

+        // the initializer from referencing the variable itself, and permits stuff<br>

+        // like this:<br>

+        //  var a = 1 in<br>

+        //    var a = a in ...   # refers to outer 'a'.<br>

+        Value *InitVal;<br>

+        if (Init) {<br>

+          InitVal = Init->codegen();<br>

+          if (!InitVal)<br>

+            return nullptr;<br>

+        } else { // If not specified, use 0.0.<br>

+          InitVal = ConstantFP::get(LLVMContext, APFloat(0.0));<br>

+        }<br>

+<br>

+        AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);<br>

+        Builder.CreateStore(InitVal, Alloca);<br>

+<br>

+        // Remember the old variable binding so that we can restore the binding when<br>

+        // we unrecurse.<br>

+        OldBindings.push_back(NamedValues[VarName]);<br>

+<br>

+        // Remember this binding.<br>

+        NamedValues[VarName] = Alloca;<br>

+      }<br>

+<br>

+There are more comments here than code. The basic idea is that we emit<br>

+the initializer, create the alloca, then update the symbol table to<br>

+point to it. Once all the variables are installed in the symbol table,<br>

+we evaluate the body of the var/in expression:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // Codegen the body, now that all vars are in scope.<br>

+      Value *BodyVal = Body->codegen();<br>

+      if (!BodyVal)<br>

+        return nullptr;<br>

+<br>

+Finally, before returning, we restore the previous variable bindings:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+      // Pop all our variables from scope.<br>

+      for (unsigned i = 0, e = VarNames.size(); i != e; ++i)<br>

+        NamedValues[VarNames[i].first] = OldBindings[i];<br>

+<br>

+      // Return the body computation.<br>

+      return BodyVal;<br>

+    }<br>

+<br>

+The end result of all of this is that we get properly scoped variable<br>

+definitions, and we even (trivially) allow mutation of them :).<br>

+<br>
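
+The save/shadow/restore pattern used by ``VarExprAST::codegen()`` can be<br>

+sketched without LLVM. This is an illustration under stated assumptions:<br>

+``Table`` and ``withBindings`` are hypothetical names, values are stored<br>

+directly instead of ``AllocaInst*``, and a missing key plays the role of a<br>

+null binding:<br>

+<br>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical symbol table mapping a name to its current value.
static std::map<std::string, double> Table;

// Evaluates Body with Bindings in scope, then restores whatever each name
// was bound to before (including "not bound at all").
static double withBindings(
    const std::vector<std::pair<std::string, double>> &Bindings,
    const std::function<double()> &Body) {
  // Remember the old bindings; the bool records whether one existed.
  std::vector<std::pair<bool, double>> Old;
  for (const auto &B : Bindings) {
    auto It = Table.find(B.first);
    Old.push_back(It == Table.end() ? std::make_pair(false, 0.0)
                                    : std::make_pair(true, It->second));
    Table[B.first] = B.second; // shadow any outer binding
  }

  double Result = Body();

  // Pop our variables from scope. This sketch unwinds in reverse so that a
  // name listed twice is restored correctly; the tutorial's loop runs forward.
  for (std::size_t i = Bindings.size(); i-- > 0;) {
    if (Old[i].first)
      Table[Bindings[i].first] = Old[i].second;
    else
      Table.erase(Bindings[i].first);
  }
  return Result;
}
```

+With an outer ``a`` already in ``Table``, calling ``withBindings`` with new<br>

+values for ``a`` and ``b`` lets the body see the inner values, and afterwards<br>

+``a`` has its outer value again and ``b`` is no longer defined.<br>

+<br>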

+With this, we have completed what we set out to do. Our nice iterative fib<br>

+example from the intro compiles and runs just fine. The mem2reg pass<br>

+optimizes all of our stack variables into SSA registers, inserting PHI<br>

+nodes where needed, and our front-end remains simple: no "iterated<br>

+dominance frontier" computation anywhere in sight.<br>

+<br>

+Full Code Listing<br>

+=================<br>

+<br>

+Here is the complete code listing for our running example, enhanced with<br>

+mutable variables and var/in support. To build this example, use:<br>

+<br>

+.. code-block:: bash<br>

+<br>

+    # Compile<br>

+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy<br>

+    # Run<br>

+    ./toy<br>

+<br>

+Here is the code:<br>

+<br>

+.. literalinclude:: ../../examples/Kaleidoscope/Chapter7/toy.cpp<br>

+   :language: c++<br>

+<br>

+`Next: Compiling to Object Code <LangImpl08.html>`_<br>

+<br>

<br>

Added: llvm/trunk/docs/tutorial/LangImpl08.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl08.rst?rev=274441&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl08.rst?rev=274441&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl08.rst (added)<br>

+++ llvm/trunk/docs/tutorial/LangImpl08.rst Sat Jul  2 12:01:59 2016<br>

@@ -0,0 +1,218 @@<br>

+========================================<br>

+ Kaleidoscope: Compiling to Object Code<br>

+========================================<br>

+<br>

+.. contents::<br>

+   :local:<br>

+<br>

+Chapter 8 Introduction<br>

+======================<br>

+<br>

+Welcome to Chapter 8 of the "`Implementing a language with LLVM<br>

+<index.html>`_" tutorial. This chapter describes how to compile our<br>

+language down to object files.<br>

+<br>

+Choosing a target<br>

+=================<br>

+<br>

+LLVM has native support for cross-compilation. You can compile to the<br>

+architecture of your current machine, or just as easily compile for<br>

+other architectures. In this tutorial, we'll target the current<br>

+machine.<br>

+<br>

+To specify the architecture that you want to target, we use a string<br>

+called a "target triple". This takes the form<br>

+``<arch><sub>-<vendor>-<sys>-<abi>`` (see the `cross compilation docs<br>

+<<a href="http://clang.llvm.org/docs/CrossCompilation.html#target-triple" rel="noreferrer" target="_blank">http://clang.llvm.org/docs/CrossCompilation.html#target-triple</a>>`_).<br>

+<br>

+As an example, we can see what clang thinks is our current target<br>

+triple:<br>

+<br>

+::<br>

+<br>

+    $ clang --version | grep Target<br>

+    Target: x86_64-unknown-linux-gnu<br>

+<br>

+Running this command may show something different on your machine as<br>

+you might be using a different architecture or operating system from mine.<br>

+<br>
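
+To make the fields of a triple concrete, here is a minimal sketch that<br>

+splits one on '-'. The helper ``splitTriple`` is hypothetical and uses only<br>

+the standard library; real code would more likely rely on LLVM's own<br>

+``llvm::Triple`` class, which also normalizes abbreviated triples:<br>

+<br>

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits a target triple such as "x86_64-unknown-linux-gnu" on '-'.
// Performs no validation or normalization of the components.
static std::vector<std::string> splitTriple(const std::string &Triple) {
  std::vector<std::string> Parts;
  std::istringstream In(Triple);
  std::string Part;
  while (std::getline(In, Part, '-'))
    Parts.push_back(Part);
  return Parts;
}
```

+For the triple shown above, the parts are ``x86_64`` (architecture),<br>

+``unknown`` (vendor), ``linux`` (system) and ``gnu`` (ABI).<br>

+<br>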

+Fortunately, we don't need to hard-code a target triple to target the<br>

+current machine. LLVM provides ``sys::getDefaultTargetTriple``, which<br>

+returns the target triple of the current machine.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    auto TargetTriple = sys::getDefaultTargetTriple();<br>

+<br>

+LLVM doesn't require us to link in all the target<br>

+functionality. For example, if we're just using the JIT, we don't need<br>

+the assembly printers. Similarly, if we're only targeting certain<br>

+architectures, we can link in only the functionality for those<br>

+architectures.<br>

+<br>

+For this example, we'll initialize all the targets for emitting object<br>

+code.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    InitializeAllTargetInfos();<br>

+    InitializeAllTargets();<br>

+    InitializeAllTargetMCs();<br>

+    InitializeAllAsmParsers();<br>

+    InitializeAllAsmPrinters();<br>

+<br>

+We can now use our target triple to get a ``Target``:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  std::string Error;<br>

+  auto Target = TargetRegistry::lookupTarget(TargetTriple, Error);<br>

+<br>

+  // Print an error and exit if we couldn't find the requested target.<br>

+  // This generally occurs if we've forgotten to initialise the<br>

+  // TargetRegistry or we have a bogus target triple.<br>

+  if (!Target) {<br>

+    errs() << Error;<br>

+    return 1;<br>

+  }<br>

+<br>

+Target Machine<br>

+==============<br>

+<br>

+We will also need a ``TargetMachine``. This class provides a complete<br>

+machine description of the machine we're targeting. If we want to<br>

+target a specific feature (such as SSE) or a specific CPU (such as<br>

+Intel's Sandy Bridge), we do so now.<br>

+<br>

+To see which features and CPUs LLVM knows about, we can use<br>

+``llc``. For example, let's look at x86:<br>

+<br>

+::<br>

+<br>

+    $ llvm-as < /dev/null | llc -march=x86 -mattr=help<br>

+    Available CPUs for this target:<br>

+<br>

+      amdfam10      - Select the amdfam10 processor.<br>

+      athlon        - Select the athlon processor.<br>

+      athlon-4      - Select the athlon-4 processor.<br>

+      ...<br>

+<br>

+    Available features for this target:<br>

+<br>

+      16bit-mode            - 16-bit mode (i8086).<br>

+      32bit-mode            - 32-bit mode (80386).<br>

+      3dnow                 - Enable 3DNow! instructions.<br>

+      3dnowa                - Enable 3DNow! Athlon instructions.<br>

+      ...<br>

+<br>

+For our example, we'll use the generic CPU without any additional<br>

+features, options or relocation model.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  auto CPU = "generic";<br>

+  auto Features = "";<br>

+<br>

+  TargetOptions opt;<br>

+  auto RM = Optional<Reloc::Model>();<br>

+  auto TargetMachine = Target->createTargetMachine(TargetTriple, CPU, Features, opt, RM);<br>

+<br>

+<br>

+Configuring the Module<br>

+======================<br>

+<br>

+We're now ready to configure our module, to specify the target and<br>

+data layout. This isn't strictly necessary, but the `frontend<br>

+performance guide <../Frontend/PerformanceTips.html>`_ recommends<br>

+this. Optimizations benefit from knowing about the target and data<br>

+layout.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  TheModule->setDataLayout(TargetMachine->createDataLayout());<br>

+  TheModule->setTargetTriple(TargetTriple);<br>

+<br>

+Emit Object Code<br>

+================<br>

+<br>

+We're ready to emit object code! Let's define where we want to write<br>

+our file to:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  auto Filename = "output.o";<br>

+  std::error_code EC;<br>

+  raw_fd_ostream dest(Filename, EC, sys::fs::F_None);<br>

+<br>

+  if (EC) {<br>

+    errs() << "Could not open file: " << EC.message();<br>

+    return 1;<br>

+  }<br>

+<br>

+Finally, we define a pass that emits object code, then we run that<br>

+pass:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  legacy::PassManager pass;<br>

+  auto FileType = TargetMachine::CGFT_ObjectFile;<br>

+<br>

+  if (TargetMachine->addPassesToEmitFile(pass, dest, FileType)) {<br>

+    errs() << "TargetMachine can't emit a file of this type";<br>

+    return 1;<br>

+  }<br>

+<br>

+  pass.run(*TheModule);<br>

+  dest.flush();<br>

+<br>

+Putting It All Together<br>

+=======================<br>

+<br>

+Does it work? Let's give it a try. We need to compile our code, but<br>

+note that the arguments to ``llvm-config`` differ from those in the previous chapters.<br>

+<br>

+::<br>

+<br>

+    $ clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs all` -o toy<br>

+<br>

+Let's run it, and define a simple ``average`` function. Press Ctrl-D<br>

+when you're done.<br>

+<br>

+::<br>

+<br>

+    $ ./toy<br>

+    ready> def average(x y) (x + y) * 0.5;<br>

+    ^D<br>

+    Wrote output.o<br>

+<br>

+We have an object file! To test it, let's write a simple program and<br>

+link it with our output. Here's the source code:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+    #include <iostream><br>

+<br>

+    extern "C" {<br>

+        double average(double, double);<br>

+    }<br>

+<br>

+    int main() {<br>

+        std::cout << "average of 3.0 and 4.0: " << average(3.0, 4.0) << std::endl;<br>

+    }<br>

+<br>

+We link our program to output.o and check the result is what we<br>

+expected:<br>

+<br>

+::<br>

+<br>

+    $ clang++ main.cpp output.o -o main<br>

+    $ ./main<br>

+    average of 3.0 and 4.0: 3.5<br>

+<br>

+Full Code Listing<br>

+=================<br>

+<br>

+.. literalinclude:: ../../examples/Kaleidoscope/Chapter8/toy.cpp<br>

+   :language: c++<br>

+<br>

+`Next: Adding Debug Information <LangImpl09.html>`_<br>

<br>

Added: llvm/trunk/docs/tutorial/LangImpl09.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl09.rst?rev=274441&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl09.rst?rev=274441&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl09.rst (added)<br>

+++ llvm/trunk/docs/tutorial/LangImpl09.rst Sat Jul  2 12:01:59 2016<br>

@@ -0,0 +1,462 @@<br>

+======================================<br>

+Kaleidoscope: Adding Debug Information<br>

+======================================<br>

+<br>

+.. contents::<br>

+   :local:<br>

+<br>

+Chapter 9 Introduction<br>

+======================<br>

+<br>

+Welcome to Chapter 9 of the "`Implementing a language with<br>

+LLVM <index.html>`_" tutorial. In chapters 1 through 8, we've built a<br>

+decent little programming language with functions and variables.<br>

+What happens if something goes wrong, though? How do you debug your<br>

+program?<br>

+<br>

+Source level debugging uses formatted data that helps a debugger<br>

+translate from binary and the state of the machine back to the<br>

+source that the programmer wrote. In LLVM we generally use a format<br>

+called `DWARF <<a href="http://dwarfstd.org" rel="noreferrer" target="_blank">http://dwarfstd.org</a>>`_. DWARF is a compact encoding<br>

+that represents types, source locations, and variable locations.<br>

+<br>

+The short summary of this chapter is that we'll go through the<br>

+various things you have to add to a programming language to<br>

+support debug info, and how you translate that into DWARF.<br>

+<br>

+Caveat: For now we can't debug via the JIT, so we'll need to compile<br>

+our program down to something small and standalone. As part of this<br>

+we'll make a few modifications to the running of the language and<br>

+how programs are compiled. This means that we'll have a source file<br>

+with a simple program written in Kaleidoscope rather than the<br>

+interactive JIT. It does involve a limitation that we can only<br>

+have one "top level" command at a time to reduce the number of<br>

+changes necessary.<br>

+<br>

+Here's the sample program we'll be compiling:<br>

+<br>

+.. code-block:: python<br>

+<br>

+   def fib(x)<br>

+     if x < 3 then<br>

+       1<br>

+     else<br>

+       fib(x-1)+fib(x-2);<br>

+<br>

+   fib(10)<br>

+<br>

+<br>

+Why is this a hard problem?<br>

+===========================<br>

+<br>

+Debug information is a hard problem for a few different reasons - mostly<br>

+centered around optimized code. First, optimization makes keeping source<br>

+locations more difficult. In LLVM IR we keep the original source location<br>

+for each IR level instruction on the instruction. Optimization passes<br>

+should keep the source locations for newly created instructions, but merged<br>

+instructions only get to keep a single location - this can cause jumping<br>

+around when stepping through optimized programs. Second, optimization<br>

+can leave variables optimized out, sharing memory with other variables,<br>

+or otherwise difficult to track. For the purposes of this<br>

+tutorial we're going to avoid optimization (as you'll see with one of the<br>

+next sets of patches).<br>

+<br>

+Ahead-of-Time Compilation Mode<br>

+==============================<br>

+<br>

+To highlight only the aspects of adding debug information to a source<br>

+language without needing to worry about the complexities of JIT debugging<br>

+we're going to make a few changes to Kaleidoscope to support compiling<br>

+the IR emitted by the front end into a simple standalone program that<br>

+you can execute, debug, and see results.<br>

+<br>

+First we make our anonymous function that contains our top level<br>

+statement be our "main":<br>

+<br>

+.. code-block:: udiff<br>

+<br>

+  -    auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());<br>

+  +    auto Proto = llvm::make_unique<PrototypeAST>("main", std::vector<std::string>());<br>

+<br>

+just with the simple change of giving it a name.<br>

+<br>

+Then we're going to remove the command line code wherever it exists:<br>

+<br>

+.. code-block:: udiff<br>

+<br>

+  @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() {<br>

+   /// top ::= definition | external | expression | ';'<br>

+   static void MainLoop() {<br>

+     while (1) {<br>

+  -    fprintf(stderr, "ready> ");<br>

+       switch (CurTok) {<br>

+       case tok_eof:<br>

+         return;<br>

+  @@ -1184,7 +1183,6 @@ int main() {<br>

+     BinopPrecedence['*'] = 40; // highest.<br>

+<br>

+     // Prime the first token.<br>

+  -  fprintf(stderr, "ready> ");<br>

+     getNextToken();<br>

+<br>

+Lastly we're going to disable all of the optimization passes and the JIT so<br>

+that the only thing that happens after we're done parsing and generating<br>

+code is that the LLVM IR goes to standard error:<br>

+<br>

+.. code-block:: udiff<br>

+<br>

+  @@ -1108,17 +1108,8 @@ static void HandleExtern() {<br>

+   static void HandleTopLevelExpression() {<br>

+     // Evaluate a top-level expression into an anonymous function.<br>

+     if (auto FnAST = ParseTopLevelExpr()) {<br>

+  -    if (auto *FnIR = FnAST->codegen()) {<br>

+  -      // We're just doing this to make sure it executes.<br>

+  -      TheExecutionEngine->finalizeObject();<br>

+  -      // JIT the function, returning a function pointer.<br>

+  -      void *FPtr = TheExecutionEngine->getPointerToFunction(FnIR);<br>

+  -<br>

+  -      // Cast it to the right type (takes no arguments, returns a double) so we<br>

+  -      // can call it as a native function.<br>

+  -      double (*FP)() = (double (*)())(intptr_t)FPtr;<br>

+  -      // Ignore the return value for this.<br>

+  -      (void)FP;<br>

+  +    if (!FnAST->codegen()) {<br>

+  +      fprintf(stderr, "Error generating code for top level expr");<br>

+       }<br>

+     } else {<br>

+       // Skip token for error recovery.<br>

+  @@ -1439,11 +1459,11 @@ int main() {<br>

+     // target lays out data structures.<br>

+     TheModule->setDataLayout(TheExecutionEngine->getDataLayout());<br>

+     OurFPM.add(new DataLayoutPass());<br>

+  +#if 0<br>

+     OurFPM.add(createBasicAliasAnalysisPass());<br>

+     // Promote allocas to registers.<br>

+     OurFPM.add(createPromoteMemoryToRegisterPass());<br>

+  @@ -1218,7 +1210,7 @@ int main() {<br>

+     OurFPM.add(createGVNPass());<br>

+     // Simplify the control flow graph (deleting unreachable blocks, etc).<br>

+     OurFPM.add(createCFGSimplificationPass());<br>

+  -<br>

+  +#endif<br>

+     OurFPM.doInitialization();<br>

+<br>

+     // Set the global so the code gen can use this.<br>

+<br>

+This relatively small set of changes gets us to the point where we can compile<br>

+our piece of Kaleidoscope language down to an executable program via this<br>

+command line:<br>

+<br>

+.. code-block:: bash<br>

+<br>

+  Kaleidoscope-Ch9 < fib.ks 2>&1 | clang -x ir -<br>

+<br>

+which gives an a.out/a.exe in the current working directory.<br>

+<br>

+Compile Unit<br>

+============<br>

+<br>

+The top level container for a section of code in DWARF is a compile unit.<br>

+This contains the type and function data for an individual translation unit<br>

+(read: one file of source code). So the first thing we need to do is<br>

+construct one for our fib.ks file.<br>

+<br>

+DWARF Emission Setup<br>

+====================<br>

+<br>

+Similar to the ``IRBuilder`` class we have a<br>

+`DIBuilder <<a href="http://llvm.org/doxygen/classllvm_1_1DIBuilder.html" rel="noreferrer" target="_blank">http://llvm.org/doxygen/classllvm_1_1DIBuilder.html</a>>`_ class<br>

+that helps in constructing debug metadata for an LLVM IR file. It<br>

+maps 1:1 onto that metadata, much as ``IRBuilder`` maps onto LLVM IR, but with nicer names.<br>

+Using it does require that you be more familiar with DWARF terminology than<br>

+you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you<br>

+read through the general documentation on the<br>

+`Metadata Format <<a href="http://llvm.org/docs/SourceLevelDebugging.html" rel="noreferrer" target="_blank">http://llvm.org/docs/SourceLevelDebugging.html</a>>`_ it<br>

+should be a little more clear. We'll be using this class to construct all<br>

+of our IR level descriptions. Construction for it takes a module so we<br>

+need to construct it shortly after we construct our module. We've left it<br>

+as a global static variable to make it a bit easier to use.<br>

+<br>

+Next we're going to create a small container to cache some of our frequent<br>

+data. The first will be our compile unit, but we'll also write a bit of<br>

+code for our one type since we won't have to worry about multiple typed<br>

+expressions:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  static DIBuilder *DBuilder;<br>

+<br>

+  struct DebugInfo {<br>

+    DICompileUnit *TheCU;<br>

+    DIType *DblTy;<br>

+<br>

+    DIType *getDoubleTy();<br>

+  } KSDbgInfo;<br>

+<br>

+  DIType *DebugInfo::getDoubleTy() {<br>

+    if (DblTy)<br>

+      return DblTy;<br>

+<br>

+    DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);<br>

+    return DblTy;<br>

+  }<br>

+<br>

+And then later on in ``main`` when we're constructing our module:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  DBuilder = new DIBuilder(*TheModule);<br>

+<br>

+  KSDbgInfo.TheCU = DBuilder->createCompileUnit(<br>

+      dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);<br>

+<br>

+There are a couple of things to note here. First, while we're producing a<br>

+compile unit for a language called Kaleidoscope we used the language<br>

+constant for C. This is because a debugger wouldn't necessarily understand<br>

+the calling conventions or default ABI for a language it doesn't recognize<br>

+and we follow the C ABI in our LLVM code generation so it's the closest<br>

+thing to accurate. This ensures we can actually call functions from the<br>

+debugger and have them execute. Secondly, you'll see the "fib.ks" in the<br>

+call to ``createCompileUnit``. This is a default hard coded value since<br>

+we're using shell redirection to put our source into the Kaleidoscope<br>

+compiler. In a usual front end you'd have an input file name and it would<br>

+go there.<br>

+<br>

+One last thing as part of emitting debug information via DIBuilder is that<br>

+we need to "finalize" the debug information. The reasons are part of the<br>

+underlying API for DIBuilder, but make sure you do this near the end of<br>

+main:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  DBuilder->finalize();<br>

+<br>

+before you dump out the module.<br>

+<br>

+Functions<br>

+=========<br>

+<br>

+Now that we have our ``Compile Unit`` and our source locations, we can add<br>

+function definitions to the debug info. So in ``PrototypeAST::codegen()`` we<br>

+add a few lines of code to describe a context for our subprogram, in this<br>

+case the "File", and the actual definition of the function itself.<br>

+<br>

+So the context:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),<br>

+                                      KSDbgInfo.TheCU->getDirectory());<br>

+<br>

+giving us a DIFile and asking the ``Compile Unit`` we created above for the<br>

+directory and filename where we are currently. Then, for now, we use some<br>

+source locations of 0 (since our AST doesn't currently have source location<br>

+information) and construct our function definition:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  DIScope *FContext = Unit;<br>

+  unsigned LineNo = 0;<br>

+  unsigned ScopeLine = 0;<br>

+  DISubprogram *SP = DBuilder->createFunction(<br>

+      FContext, Name, StringRef(), Unit, LineNo,<br>

+      CreateFunctionType(Args.size(), Unit), false /* internal linkage */,<br>

+      true /* definition */, ScopeLine, DINode::FlagPrototyped, false);<br>

+  F->setSubprogram(SP);<br>

+<br>

+and we now have a DISubprogram that contains a reference to all of our<br>

+metadata for the function.<br>

+<br>

+Source Locations<br>

+================<br>

+<br>

+The most important thing for debug information is accurate source location -<br>

+this makes it possible to map machine code back to your source. We have a problem, though:<br>

+Kaleidoscope doesn't have any source location information in the lexer<br>

+or parser, so we'll need to add it.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+   struct SourceLocation {<br>

+     int Line;<br>

+     int Col;<br>

+   };<br>

+   static SourceLocation CurLoc;<br>

+   static SourceLocation LexLoc = {1, 0};<br>

+<br>

+   static int advance() {<br>

+     int LastChar = getchar();<br>

+<br>

+     if (LastChar == '\n' || LastChar == '\r') {<br>

+       LexLoc.Line++;<br>

+       LexLoc.Col = 0;<br>

+     } else<br>

+       LexLoc.Col++;<br>

+     return LastChar;<br>

+   }<br>

+<br>

+In this code we've added a way to keep track of the line and column of<br>

+the "source file". As we lex every token we set our current<br>

+"lexical location" to the line and column for the beginning<br>

+of the token. We do this by replacing all of the previous calls to<br>

+``getchar()`` with our new ``advance()``, which keeps track of the information.<br>

+We have also added a source location to all of our AST classes:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+   class ExprAST {<br>

+     SourceLocation Loc;<br>

+<br>

+     public:<br>

+       ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}<br>

+       virtual ~ExprAST() {}<br>

+       virtual Value* codegen() = 0;<br>

+       int getLine() const { return Loc.Line; }<br>

+       int getCol() const { return Loc.Col; }<br>

+       virtual raw_ostream &dump(raw_ostream &out, int ind) {<br>

+         return out << ':' << getLine() << ':' << getCol() << '\n';<br>

+       }<br>

+   };<br>

+<br>

+that we pass down through when we create a new expression:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+   LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS),<br>

+                                          std::move(RHS));<br>

+<br>

+giving us locations for each of our expressions and variables.<br>
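
+<br>

+The location bookkeeping above can be tried in isolation. What follows is a<br>

+minimal, self-contained sketch rather than the tutorial's code: it reads from<br>

+a string instead of ``getchar()``, and ``LocTracker`` and ``locationOf`` are<br>

+hypothetical names introduced only for this illustration.<br>

+<br>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Mirrors the tutorial's SourceLocation: a line and a column.
struct SourceLocation {
  int Line;
  int Col;
};

// Hypothetical helper: tracks the lexer position over an in-memory buffer
// (the tutorial reads from standard input instead).
class LocTracker {
  std::string Input;
  std::size_t Pos = 0;
  SourceLocation LexLoc = {1, 0};

public:
  explicit LocTracker(std::string In) : Input(std::move(In)) {}

  // Same rule as advance(): a line break starts a new line at column 0,
  // any other character moves one column to the right.
  int advance() {
    if (Pos >= Input.size())
      return -1; // stand-in for EOF; the position is left unchanged
    int LastChar = static_cast<unsigned char>(Input[Pos++]);
    if (LastChar == '\n' || LastChar == '\r') {
      LexLoc.Line++;
      LexLoc.Col = 0;
    } else
      LexLoc.Col++;
    return LastChar;
  }

  SourceLocation loc() const { return LexLoc; }
};

// Hypothetical helper: lexer location after consuming N characters of Src.
SourceLocation locationOf(const std::string &Src, std::size_t N) {
  LocTracker T(Src);
  for (std::size_t I = 0; I < N; ++I)
    T.advance();
  return T.loc();
}
```

+<br>

+With the input ``"def fib(x)\n  if x"``, ``locationOf`` reports line 2, column 2<br>

+after 13 characters. Note that, like ``advance()`` above, this logic counts a<br>

+``\r\n`` pair as two line breaks.<br>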

+<br>

+From this we can make sure to tell ``DIBuilder`` when we're at a new source<br>

+location so it can use that when we generate the rest of our code and make<br>

+sure that each instruction has source location information. We do this<br>

+by constructing another small function:<br>

+<br>

+.. code-block:: c++<br>

+<br>

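+  void DebugInfo::emitLocation(ExprAST *AST) {<br>

+    // A null AST clears the current location (used for the function prologue).<br>

+    if (!AST)<br>

+      return Builder.SetCurrentDebugLocation(DebugLoc());<br>

+    DIScope *Scope;<br>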

+    if (LexicalBlocks.empty())<br>

+      Scope = TheCU;<br>

+    else<br>

+      Scope = LexicalBlocks.back();<br>

+    Builder.SetCurrentDebugLocation(<br>

+        DebugLoc::get(AST->getLine(), AST->getCol(), Scope));<br>

+  }<br>

+<br>

+that tells the main ``IRBuilder`` both where we are and what scope<br>

+we're in. Since we've just created a function above we can either be in<br>

+the main file scope (like when we created our function), or now we can be<br>

+in the function scope we just created. To represent this we create a stack<br>

+of scopes:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+   std::vector<DIScope *> LexicalBlocks;<br>

+   std::map<const PrototypeAST *, DIScope *> FnScopeMap;<br>

+<br>

+and keep a map of each function to the scope that it represents (a<br>

+DISubprogram is also a DIScope).<br>

+<br>

+Then we make sure to:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+   KSDbgInfo.emitLocation(this);<br>

+<br>

+emit the location every time we start to generate code for a new AST, and<br>

+also:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  KSDbgInfo.FnScopeMap[this] = SP;<br>

+<br>

+store the scope (function) when we create it and use it:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  KSDbgInfo.LexicalBlocks.push_back(KSDbgInfo.FnScopeMap[Proto]);<br>

+<br>

+when we start generating the code for each function.<br>

+<br>

+Also, don't forget to pop the scope back off of your scope stack at the<br>

+end of the code generation for the function:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  // Pop off the lexical block for the function since we added it<br>

+  // unconditionally.<br>

+  KSDbgInfo.LexicalBlocks.pop_back();<br>
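
+<br>

+Forgetting the final ``pop_back()`` leaves a stale scope on the stack. One way<br>

+to make the push/pop pairing automatic is a small RAII guard. This is a sketch<br>

+under assumptions, not what the tutorial code does: scopes are modelled as plain<br>

+strings instead of ``DIScope *``, and ``ScopeStack``, ``ScopeGuard`` and<br>

+``scopeWhileGenerating`` are hypothetical names.<br>

+<br>

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the debug-info state: scopes are plain strings here,
// where the tutorial stores DIScope pointers.
struct ScopeStack {
  std::string TheCU = "fib.ks";           // plays the role of the compile unit
  std::vector<std::string> LexicalBlocks; // innermost scope is at the back

  // Same selection rule as emitLocation(): fall back to the compile unit
  // when no lexical block is open.
  const std::string &current() const {
    return LexicalBlocks.empty() ? TheCU : LexicalBlocks.back();
  }
};

// Hypothetical RAII guard: pushes a scope on construction and pops it on
// destruction, so every exit path from code generation restores the stack.
class ScopeGuard {
  ScopeStack &Stack;

public:
  ScopeGuard(ScopeStack &S, std::string Scope) : Stack(S) {
    Stack.LexicalBlocks.push_back(std::move(Scope));
  }
  ~ScopeGuard() { Stack.LexicalBlocks.pop_back(); }

  ScopeGuard(const ScopeGuard &) = delete;
  ScopeGuard &operator=(const ScopeGuard &) = delete;
};

// Hypothetical stand-in for generating one function body: reports which
// scope is current while the guard is alive.
std::string scopeWhileGenerating(ScopeStack &Stack, const std::string &Fn) {
  ScopeGuard Guard(Stack, Fn);
  return Stack.current();
}
```

+<br>

+While ``scopeWhileGenerating`` runs, the function's scope is current. Once it<br>

+returns, the stack is empty again and ``current()`` falls back to the compile unit.<br>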

+<br>

+Variables<br>

+=========<br>

+<br>

+Now that we have functions, we need to be able to print out the variables<br>

+we have in scope. Let's get our function arguments set up so we can get<br>

+decent backtraces and see how our functions are being called. It isn't<br>

+a lot of code, and we generally handle it when we're creating the<br>

+argument allocas in ``PrototypeAST::CreateArgumentAllocas``.<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  DIScope *Scope = KSDbgInfo.LexicalBlocks.back();<br>

+  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),<br>

+                                      KSDbgInfo.TheCU->getDirectory());<br>

+  DILocalVariable *D = DBuilder->createParameterVariable(<br>

+      Scope, Args[Idx], Idx + 1, Unit, Line, KSDbgInfo.getDoubleTy(), true);<br>

+<br>

+  DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),<br>

+                          DebugLoc::get(Line, 0, Scope),<br>

+                          Builder.GetInsertBlock());<br>

+<br>

+Here we're doing a few things. First, we're grabbing our current scope<br>

+for the variable so we can say what range of code our variable is valid<br>

+through. Second, we're creating the variable, giving it the scope,<br>

+the name, source location, type, and since it's an argument, the argument<br>

+index. Third, we create an ``llvm.dbg.declare`` call to indicate at the IR<br>

+level that we've got a variable in an alloca (and it gives a starting<br>

+location for the variable), and we set a source location for the<br>

+beginning of the scope on the declare.<br>

+<br>

+One interesting thing to note at this point is that various debuggers have<br>

+assumptions based on how code and debug information was generated for them<br>

+in the past. In this case we need to do a little bit of a hack to avoid<br>

+generating line information for the function prologue so that the debugger<br>

+knows to skip over those instructions when setting a breakpoint. So in<br>

+``FunctionAST::codegen()`` we add a couple of lines:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  // Unset the location for the prologue emission (leading instructions with no<br>

+  // location in a function are considered part of the prologue and the debugger<br>

+  // will run past them when breaking on a function)<br>

+  KSDbgInfo.emitLocation(nullptr);<br>

+<br>

+and then emit a new location when we actually start generating code for the<br>

+body of the function:<br>

+<br>

+.. code-block:: c++<br>

+<br>

+  KSDbgInfo.emitLocation(Body);<br>

+<br>

+With this we have enough debug information to set breakpoints in functions,<br>

+print out argument variables, and call functions. Not too bad for just a<br>

+few simple lines of code!<br>

+<br>

+Full Code Listing<br>

+=================<br>

+<br>

+Here is the complete code listing for our running example, enhanced with<br>

+debug information. To build this example, use:<br>

+<br>

+.. code-block:: bash<br>

+<br>

+    # Compile<br>

+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy<br>

+    # Run<br>

+    ./toy<br>

+<br>

+Here is the code:<br>

+<br>

+.. literalinclude:: ../../examples/Kaleidoscope/Chapter9/toy.cpp<br>

+   :language: c++<br>

+<br>

+`Next: Conclusion and other useful LLVM tidbits <LangImpl10.html>`_<br>

+<br>

<br>

Removed: llvm/trunk/docs/tutorial/LangImpl1.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl1.rst?rev=274440&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl1.rst?rev=274440&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl1.rst (original)<br>

+++ llvm/trunk/docs/tutorial/LangImpl1.rst (removed)<br>

@@ -1,290 +0,0 @@<br>

-=================================================<br>

-Kaleidoscope: Tutorial Introduction and the Lexer<br>

-=================================================<br>

-<br>

-.. contents::<br>

-   :local:<br>

-<br>

-Tutorial Introduction<br>

-=====================<br>

-<br>

-Welcome to the "Implementing a language with LLVM" tutorial. This<br>

-tutorial runs through the implementation of a simple language, showing<br>

-how fun and easy it can be. This tutorial will get you up and started as<br>

-well as help to build a framework you can extend to other languages. The<br>

-code in this tutorial can also be used as a playground to hack on other<br>

-LLVM specific things.<br>

-<br>

-The goal of this tutorial is to progressively unveil our language,<br>

-describing how it is built up over time. This will let us cover a fairly<br>

-broad range of language design and LLVM-specific usage issues, showing<br>

-and explaining the code for it all along the way, without overwhelming<br>

-you with tons of details up front.<br>

-<br>

-It is useful to point out ahead of time that this tutorial is really<br>

-about teaching compiler techniques and LLVM specifically, *not* about<br>

-teaching modern and sane software engineering principles. In practice,<br>

-this means that we'll take a number of shortcuts to simplify the<br>

-exposition. For example, the code uses global variables<br>

-all over the place, doesn't use nice design patterns like<br>

-`visitors <<a href="http://en.wikipedia.org/wiki/Visitor_pattern" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Visitor_pattern</a>>`_, etc... but<br>

-it is very simple. If you dig in and use the code as a basis for future<br>

-projects, fixing these deficiencies shouldn't be hard.<br>

-<br>

-I've tried to put this tutorial together in a way that makes chapters<br>

-easy to skip over if you are already familiar with or are uninterested<br>

-in the various pieces. The structure of the tutorial is:<br>

-<br>

--  `Chapter #1 <#language>`_: Introduction to the Kaleidoscope<br>

-   language, and the definition of its Lexer - This shows where we are<br>

-   going and the basic functionality that we want it to do. In order to<br>

-   make this tutorial maximally understandable and hackable, we choose<br>

-   to implement everything in C++ instead of using lexer and parser<br>

-   generators. LLVM obviously works just fine with such tools, feel free<br>

-   to use one if you prefer.<br>

--  `Chapter #2 <LangImpl2.html>`_: Implementing a Parser and AST -<br>

-   With the lexer in place, we can talk about parsing techniques and<br>

-   basic AST construction. This tutorial describes recursive descent<br>

-   parsing and operator precedence parsing. Nothing in Chapters 1 or 2<br>

-   is LLVM-specific, the code doesn't even link in LLVM at this point.<br>

-   :)<br>

--  `Chapter #3 <LangImpl3.html>`_: Code generation to LLVM IR - With<br>

-   the AST ready, we can show off how easy generation of LLVM IR really<br>

-   is.<br>

--  `Chapter #4 <LangImpl4.html>`_: Adding JIT and Optimizer Support<br>

-   - Because a lot of people are interested in using LLVM as a JIT,<br>

-   we'll dive right into it and show you the 3 lines it takes to add JIT<br>

-   support. LLVM is also useful in many other ways, but this is one<br>

-   simple and "sexy" way to show off its power. :)<br>

--  `Chapter #5 <LangImpl5.html>`_: Extending the Language: Control<br>

-   Flow - With the language up and running, we show how to extend it<br>

-   with control flow operations (if/then/else and a 'for' loop). This<br>

-   gives us a chance to talk about simple SSA construction and control<br>

-   flow.<br>

--  `Chapter #6 <LangImpl6.html>`_: Extending the Language:<br>

-   User-defined Operators - This is a silly but fun chapter that talks<br>

-   about extending the language to let the user program define their own<br>

-   arbitrary unary and binary operators (with assignable precedence!).<br>

-   This lets us build a significant piece of the "language" as library<br>

-   routines.<br>

--  `Chapter #7 <LangImpl7.html>`_: Extending the Language: Mutable<br>

-   Variables - This chapter talks about adding user-defined local<br>

-   variables along with an assignment operator. The interesting part<br>

-   about this is how easy and trivial it is to construct SSA form in<br>

-   LLVM: no, LLVM does *not* require your front-end to construct SSA<br>

-   form!<br>

--  `Chapter #8 <LangImpl8.html>`_: Extending the Language: Debug<br>

-   Information - Having built a decent little programming language with<br>

-   control flow, functions and mutable variables, we consider what it<br>

-   takes to add debug information to standalone executables. This debug<br>

-   information will allow you to set breakpoints in Kaleidoscope<br>

-   functions, print out argument variables, and call functions - all<br>

-   from within the debugger!<br>

--  `Chapter #9 <LangImpl9.html>`_: Conclusion and other useful LLVM<br>

-   tidbits - This chapter wraps up the series by talking about<br>

-   potential ways to extend the language, but also includes a bunch of<br>

-   pointers to info about "special topics" like adding garbage<br>

-   collection support, exceptions, debugging, support for "spaghetti<br>

-   stacks", and a bunch of other tips and tricks.<br>

-<br>

-By the end of the tutorial, we'll have written a bit less than 1000<br>

-non-comment, non-blank lines of code. With this small amount of<br>

-code, we'll have built up a very reasonable compiler for a non-trivial<br>

-language including a hand-written lexer, parser, AST, as well as code<br>

-generation support with a JIT compiler. While other systems may have<br>

-interesting "hello world" tutorials, I think the breadth of this<br>

-tutorial is a great testament to the strengths of LLVM and why you<br>

-should consider it if you're interested in language or compiler design.<br>

-<br>

-A note about this tutorial: we expect you to extend the language and<br>

-play with it on your own. Take the code and go crazy hacking away at it,<br>

-compilers don't need to be scary creatures - it can be a lot of fun to<br>

-play with languages!<br>

-<br>

-The Basic Language<br>

-==================<br>

-<br>

-This tutorial will be illustrated with a toy language that we'll call<br>

-"`Kaleidoscope <<a href="http://en.wikipedia.org/wiki/Kaleidoscope" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Kaleidoscope</a>>`_" (derived<br>

-from "meaning beautiful, form, and view"). Kaleidoscope is a procedural<br>

-language that allows you to define functions, use conditionals, math,<br>

-etc. Over the course of the tutorial, we'll extend Kaleidoscope to<br>

-support the if/then/else construct, a for loop, user defined operators,<br>

-JIT compilation with a simple command line interface, etc.<br>

-<br>

-Because we want to keep things simple, the only datatype in Kaleidoscope<br>

-is a 64-bit floating point type (aka 'double' in C parlance). As such,<br>

-all values are implicitly double precision and the language doesn't<br>

-require type declarations. This gives the language a very nice and<br>

-simple syntax. For example, the following simple example computes<br>

-`Fibonacci numbers: <<a href="http://en.wikipedia.org/wiki/Fibonacci_number" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Fibonacci_number</a>>`_<br>

-<br>

-::<br>

-<br>

-    # Compute the x'th fibonacci number.<br>

-    def fib(x)<br>

-      if x < 3 then<br>

-        1<br>

-      else<br>

-        fib(x-1)+fib(x-2)<br>

-<br>

-    # This expression will compute the 40th number.<br>

-    fib(40)<br>

-<br>

-We also allow Kaleidoscope to call into standard library functions (the<br>

-LLVM JIT makes this completely trivial). This means that you can use the<br>

-'extern' keyword to define a function before you use it (this is also<br>

-useful for mutually recursive functions). For example:<br>

-<br>

-::<br>

-<br>

-    extern sin(arg);<br>

-    extern cos(arg);<br>

-    extern atan2(arg1 arg2);<br>

-<br>

-    atan2(sin(.4), cos(42))<br>

-<br>

-A more interesting example is included in Chapter 6 where we write a<br>

-little Kaleidoscope application that `displays a Mandelbrot<br>

-Set <LangImpl6.html#kicking-the-tires>`_ at various levels of magnification.<br>

-<br>

-Let's dive into the implementation of this language!<br>

-<br>

-The Lexer<br>

-=========<br>

-<br>

-When it comes to implementing a language, the first thing needed is the<br>

-ability to process a text file and recognize what it says. The<br>

-traditional way to do this is to use a<br>

-"`lexer <<a href="http://en.wikipedia.org/wiki/Lexical_analysis" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Lexical_analysis</a>>`_" (aka<br>

-'scanner') to break the input up into "tokens". Each token returned by<br>

-the lexer includes a token code and potentially some metadata (e.g. the<br>

-numeric value of a number). First, we define the possibilities:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    // The lexer returns tokens [0-255] if it is an unknown character, otherwise one<br>

-    // of these for known things.<br>

-    enum Token {<br>

-      tok_eof = -1,<br>

-<br>

-      // commands<br>

-      tok_def = -2,<br>

-      tok_extern = -3,<br>

-<br>

-      // primary<br>

-      tok_identifier = -4,<br>

-      tok_number = -5,<br>

-    };<br>

-<br>

-    static std::string IdentifierStr; // Filled in if tok_identifier<br>

-    static double NumVal;             // Filled in if tok_number<br>

-<br>

-Each token returned by our lexer will either be one of the Token enum<br>

-values or it will be an 'unknown' character like '+', which is returned<br>

-as its ASCII value. If the current token is an identifier, the<br>

-``IdentifierStr`` global variable holds the name of the identifier. If<br>

-the current token is a numeric literal (like 1.0), ``NumVal`` holds its<br>

-value. Note that we use global variables for simplicity; this is not the<br>

-best choice for a real language implementation :).<br>

-<br>

-The actual implementation of the lexer is a single function named<br>

-``gettok``. The ``gettok`` function is called to return the next token<br>

-from standard input. Its definition starts as:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// gettok - Return the next token from standard input.<br>

-    static int gettok() {<br>

-      static int LastChar = ' ';<br>

-<br>

-      // Skip any whitespace.<br>

-      while (isspace(LastChar))<br>

-        LastChar = getchar();<br>

-<br>

-``gettok`` works by calling the C ``getchar()`` function to read<br>

-characters one at a time from standard input. It eats them as it<br>

-recognizes them and stores the last character read, but not processed,<br>

-in LastChar. The first thing that it has to do is ignore whitespace<br>

-between tokens. This is accomplished with the loop above.<br>

-<br>

-The next thing ``gettok`` needs to do is recognize identifiers and<br>

-specific keywords like "def". Kaleidoscope does this with this simple<br>

-loop:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*<br>

-        IdentifierStr = LastChar;<br>

-        while (isalnum((LastChar = getchar())))<br>

-          IdentifierStr += LastChar;<br>

-<br>

-        if (IdentifierStr == "def")<br>

-          return tok_def;<br>

-        if (IdentifierStr == "extern")<br>

-          return tok_extern;<br>

-        return tok_identifier;<br>

-      }<br>

-<br>

-Note that this code sets the '``IdentifierStr``' global whenever it<br>

-lexes an identifier. Also, since language keywords are matched by the<br>

-same loop, we handle them here inline. Numeric values are similar:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+<br>

-        std::string NumStr;<br>

-        do {<br>

-          NumStr += LastChar;<br>

-          LastChar = getchar();<br>

-        } while (isdigit(LastChar) || LastChar == '.');<br>

-<br>

-        NumVal = strtod(NumStr.c_str(), 0);<br>

-        return tok_number;<br>

-      }<br>

-<br>

-This is all pretty straight-forward code for processing input. When<br>

-reading a numeric value from input, we use the C ``strtod`` function to<br>

-convert it to a numeric value that we store in ``NumVal``. Note that<br>

-this isn't doing sufficient error checking: it will incorrectly read<br>

-"1.23.45.67" and handle it as if you typed in "1.23". Feel free to<br>

-extend it :). Next we handle comments:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      if (LastChar == '#') {<br>

-        // Comment until end of line.<br>

-        do<br>

-          LastChar = getchar();<br>

-        while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');<br>

-<br>

-        if (LastChar != EOF)<br>

-          return gettok();<br>

-      }<br>

-<br>

-We handle comments by skipping to the end of the line and then return<br>

-the next token. Finally, if the input doesn't match one of the above<br>

-cases, it is either an operator character like '+' or the end of the<br>

-file. These are handled with this code:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Check for end of file.  Don't eat the EOF.<br>

-      if (LastChar == EOF)<br>

-        return tok_eof;<br>

-<br>

-      // Otherwise, just return the character as its ascii value.<br>

-      int ThisChar = LastChar;<br>

-      LastChar = getchar();<br>

-      return ThisChar;<br>

-    }<br>

-<br>

-With this, we have the complete lexer for the basic Kaleidoscope<br>

-language (the `full code listing <LangImpl2.html#full-code-listing>`_ for the Lexer<br>

-is available in the `next chapter <LangImpl2.html>`_ of the tutorial).<br>

-Next we'll `build a simple parser that uses this to build an Abstract<br>

-Syntax Tree <LangImpl2.html>`_. When we have that, we'll include a<br>

-driver so that you can use the lexer and parser together.<br>

-<br>

-`Next: Implementing a Parser and AST <LangImpl2.html>`_<br>

-<br>

<br>

Added: llvm/trunk/docs/tutorial/LangImpl10.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl10.rst?rev=274441&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl10.rst?rev=274441&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl10.rst (added)<br>

+++ llvm/trunk/docs/tutorial/LangImpl10.rst Sat Jul  2 12:01:59 2016<br>

@@ -0,0 +1,259 @@<br>

+======================================================<br>

+Kaleidoscope: Conclusion and other useful LLVM tidbits<br>

+======================================================<br>

+<br>

+.. contents::<br>

+   :local:<br>

+<br>

+Tutorial Conclusion<br>

+===================<br>

+<br>

+Welcome to the final chapter of the "`Implementing a language with<br>

+LLVM <index.html>`_" tutorial. In the course of this tutorial, we have<br>

+grown our little Kaleidoscope language from being a useless toy, to<br>

+being a semi-interesting (but probably still useless) toy. :)<br>

+<br>

+It is interesting to see how far we've come, and how little code it has<br>

+taken. We built the entire lexer, parser, AST, code generator, an<br>

+interactive run-loop (with a JIT!), and emitted debug information in<br>

+standalone executables - all in under 1000 lines of (non-comment/non-blank)<br>

+code.<br>

+<br>

+Our little language supports a couple of interesting features: it<br>

+supports user defined binary and unary operators, it uses JIT<br>

+compilation for immediate evaluation, and it supports a few control flow<br>

+constructs with SSA construction.<br>

+<br>

+Part of the idea of this tutorial was to show you how easy and fun it<br>

+can be to define, build, and play with languages. Building a compiler<br>

+need not be a scary or mystical process! Now that you've seen some of<br>

+the basics, I strongly encourage you to take the code and hack on it.<br>

+For example, try adding:<br>

+<br>

+-  **global variables** - While global variables have questionable value<br>

+   in modern software engineering, they are often useful when putting<br>

+   together quick little hacks like the Kaleidoscope compiler itself.<br>

+   Fortunately, our current setup makes it very easy to add global<br>

+   variables: just have value lookup check to see if an unresolved<br>

+   variable is in the global variable symbol table before rejecting it.<br>

+   To create a new global variable, make an instance of the LLVM<br>

+   ``GlobalVariable`` class.<br>

+-  **typed variables** - Kaleidoscope currently only supports variables<br>

+   of type double. This gives the language a very nice elegance, because<br>

+   only supporting one type means that you never have to specify types.<br>

+   Different languages have different ways of handling this. The easiest<br>

+   way is to require the user to specify types for every variable<br>

+   definition, and record the type of the variable in the symbol table<br>

+   along with its Value\*.<br>

+-  **arrays, structs, vectors, etc** - Once you add types, you can start<br>

+   extending the type system in all sorts of interesting ways. Simple<br>

+   arrays are very easy and are quite useful for many different<br>

+   applications. Adding them is mostly an exercise in learning how the<br>

+   LLVM `getelementptr <../LangRef.html#getelementptr-instruction>`_ instruction<br>

+   works: it is so nifty/unconventional, it `has its own<br>

+   FAQ <../GetElementPtr.html>`_!<br>

+-  **standard runtime** - Our current language allows the user to access<br>

+   arbitrary external functions, and we use it for things like "printd"<br>

+   and "putchard". As you extend the language to add higher-level<br>

+   constructs, often these constructs make the most sense if they are<br>

+   lowered to calls into a language-supplied runtime. For example, if<br>

+   you add hash tables to the language, it would probably make sense to<br>

+   add the routines to a runtime, instead of inlining them all the way.<br>

+-  **memory management** - Currently we can only access the stack in<br>

+   Kaleidoscope. It would also be useful to be able to allocate heap<br>

+   memory, either with calls to the standard libc malloc/free interface<br>

+   or with a garbage collector. If you would like to use garbage<br>

+   collection, note that LLVM fully supports `Accurate Garbage<br>

+   Collection <../GarbageCollection.html>`_ including algorithms that<br>

+   move objects and need to scan/update the stack.<br>

+-  **exception handling support** - LLVM supports generation of `zero<br>

+   cost exceptions <../ExceptionHandling.html>`_ which interoperate with<br>

+   code compiled in other languages. You could also generate code by<br>

+   implicitly making every function return an error value and checking<br>

+   it. You could also make explicit use of setjmp/longjmp. There are<br>

+   many different ways to go here.<br>

+-  **object orientation, generics, database access, complex numbers,<br>

+   geometric programming, ...** - Really, there is no end of crazy<br>

+   features that you can add to the language.<br>

+-  **unusual domains** - We've been talking about applying LLVM to a<br>

+   domain that many people are interested in: building a compiler for a<br>

+   specific language. However, there are many other domains that can use<br>

+   compiler technology that are not typically considered. For example,<br>

+   LLVM has been used to implement OpenGL graphics acceleration,<br>

+   translate C++ code to ActionScript, and many other cute and clever<br>

+   things. Maybe you will be the first to JIT compile a regular<br>

+   expression interpreter into native code with LLVM?<br>

+<br>

+Have fun - try doing something crazy and unusual. Building a language<br>

+like everyone else always has is much less fun than trying something a<br>

+little crazy or off the wall and seeing how it turns out. If you get<br>

+stuck or want to talk about it, feel free to email the `llvm-dev mailing<br>

+list <<a href="http://lists.llvm.org/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/mailman/listinfo/llvm-dev</a>>`_: it has lots<br>

+of people who are interested in languages and are often willing to help<br>

+out.<br>

+<br>

+Before we end this tutorial, I want to talk about some "tips and tricks"<br>

+for generating LLVM IR. These are some of the more subtle things that<br>

+may not be obvious, but are very useful if you want to take advantage of<br>

+LLVM's capabilities.<br>

+<br>

+Properties of the LLVM IR<br>

+=========================<br>

+<br>

+We have a couple of common questions about code in the LLVM IR form -<br>

+let's just get these out of the way right now, shall we?<br>

+<br>

+Target Independence<br>

+-------------------<br>

+<br>

+Kaleidoscope is an example of a "portable language": any program written<br>

+in Kaleidoscope will work the same way on any target that it runs on.<br>

+Many other languages have this property, e.g. Lisp, Java, Haskell,<br>

+JavaScript, Python, etc. (note that while these languages are portable,<br>

+not all their libraries are).<br>

+<br>

+One nice aspect of LLVM is that it is often capable of preserving target<br>

+independence in the IR: you can take the LLVM IR for a<br>

+Kaleidoscope-compiled program and run it on any target that LLVM<br>

+supports, even emitting C code and compiling that on targets that LLVM<br>

+doesn't support natively. You can trivially tell that the Kaleidoscope<br>

+compiler generates target-independent code because it never queries for<br>

+any target-specific information when generating code.<br>

+<br>

+The fact that LLVM provides a compact, target-independent,<br>

+representation for code gets a lot of people excited. Unfortunately,<br>

+these people are usually thinking about C or a language from the C<br>

+family when they are asking questions about language portability. I say<br>

+"unfortunately", because there is really no way to make (fully general)<br>

+C code portable, other than shipping the source code around (and of<br>

+course, C source code is not actually portable in general either - ever<br>

+port a really old application from 32- to 64-bits?).<br>

+<br>

+The problem with C (again, in its full generality) is that it is heavily<br>

+laden with target specific assumptions. As one simple example, the<br>

+preprocessor often destructively removes target-independence from the<br>

+code when it processes the input text:<br>

+<br>

+.. code-block:: c<br>

+<br>

+    #ifdef __i386__<br>

+      int X = 1;<br>

+    #else<br>

+      int X = 42;<br>

+    #endif<br>

+<br>

+While it is possible to engineer more and more complex solutions to<br>

+problems like this, it cannot be solved in full generality in a way that<br>

+is better than shipping the actual source code.<br>

+<br>

+That said, there are interesting subsets of C that can be made portable.<br>

+If you are willing to fix primitive types to a fixed size (say int =<br>

+32-bits, and long = 64-bits), don't care about ABI compatibility with<br>

+existing binaries, and are willing to give up some other minor features,<br>

+you can have portable code. This can make sense for specialized domains<br>

+such as an in-kernel language.<br>

+<br>

+Safety Guarantees<br>

+-----------------<br>

+<br>

+Many of the languages above are also "safe" languages: it is impossible<br>

+for a program written in Java to corrupt its address space and crash the<br>

+process (assuming the JVM has no bugs). Safety is an interesting<br>

+property that requires a combination of language design, runtime<br>

+support, and often operating system support.<br>

+<br>

+It is certainly possible to implement a safe language in LLVM, but LLVM<br>

+IR does not itself guarantee safety. The LLVM IR allows unsafe pointer<br>

+casts, use-after-free bugs, buffer overruns, and a variety of other<br>

+problems. Safety needs to be implemented as a layer on top of LLVM and,<br>

+conveniently, several groups have investigated this. Ask on the `llvm-dev<br>

+mailing list <<a href="http://lists.llvm.org/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/mailman/listinfo/llvm-dev</a>>`_ if<br>

+you are interested in more details.<br>

+<br>

+Language-Specific Optimizations<br>

+-------------------------------<br>

+<br>

+One thing about LLVM that turns off many people is that it does not<br>

+solve all the world's problems in one system (sorry 'world hunger',<br>

+someone else will have to solve you some other day). One specific<br>

+complaint is that people perceive LLVM as being incapable of performing<br>

+high-level language-specific optimization: LLVM "loses too much<br>

+information".<br>

+<br>

+Unfortunately, this is really not the place to give you a full and<br>

+unified version of "Chris Lattner's theory of compiler design". Instead,<br>

+I'll make a few observations:<br>

+<br>

+First, you're right that LLVM does lose information. For example, as of<br>

+this writing, there is no way to distinguish in the LLVM IR whether an<br>

+SSA-value came from a C "int" or a C "long" on an ILP32 machine (other<br>

+than debug info). Both get compiled down to an 'i32' value and the<br>

+information about what it came from is lost. The more general issue<br>

+here is that the LLVM type system uses "structural equivalence" instead<br>

+of "name equivalence". Another place this surprises people is if you<br>

+have two types in a high-level language that have the same structure<br>

+(e.g. two different structs that have a single int field): these types<br>

+will compile down into a single LLVM type and it will be impossible to<br>

+tell what it came from.<br>

+<br>

+Second, while LLVM does lose information, LLVM is not a fixed target: we<br>

+continue to enhance and improve it in many different ways. In addition<br>

+to adding new features (LLVM did not always support exceptions or debug<br>

+info), we also extend the IR to capture important information for<br>

+optimization (e.g. whether an argument is sign or zero extended,<br>

+information about pointers aliasing, etc). Many of the enhancements are<br>

+user-driven: people want LLVM to include some specific feature, so they<br>

+go ahead and extend it.<br>

+<br>

+Third, it is *possible and easy* to add language-specific optimizations,<br>

+and you have a number of choices in how to do it. As one trivial<br>

+example, it is easy to add language-specific optimization passes that<br>

+"know" things about code compiled for a language. In the case of the C<br>

+family, there is an optimization pass that "knows" about the standard C<br>

+library functions. If you call "exit(0)" in main(), it knows that it is<br>

+safe to optimize that into "return 0;" because C specifies what the<br>

+'exit' function does.<br>

+<br>

+In addition to simple library knowledge, it is possible to embed a<br>

+variety of other language-specific information into the LLVM IR. If you<br>

+have a specific need and run into a wall, please bring the topic up on<br>

+the llvm-dev list. At the very worst, you can always treat LLVM as if it<br>

+were a "dumb code generator" and implement the high-level optimizations<br>

+you desire in your front-end, on the language-specific AST.<br>

+<br>

+Tips and Tricks<br>

+===============<br>

+<br>

+There is a variety of useful tips and tricks that you come to know after<br>

+working on/with LLVM that aren't obvious at first glance. Instead of<br>

+letting everyone rediscover them, this section talks about some of these<br>

+issues.<br>

+<br>

+Implementing portable offsetof/sizeof<br>

+-------------------------------------<br>

+<br>

+One interesting thing that comes up, if you are trying to keep the code<br>

+generated by your compiler "target independent", is that you often need<br>

+to know the size of some LLVM type or the offset of some field in an<br>

+LLVM structure. For example, you might need to pass the size of a type<br>

+into a function that allocates memory.<br>

+<br>

+Unfortunately, this can vary widely across targets: for example the<br>

+width of a pointer is trivially target-specific. However, there is a<br>

+`clever way to use the getelementptr<br>

+instruction <<a href="http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt" rel="noreferrer" target="_blank">http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt</a>>`_<br>

+that allows you to compute this in a portable way.<br>

+<br>

+Garbage Collected Stack Frames<br>

+------------------------------<br>

+<br>

+Some languages want to explicitly manage their stack frames, often so<br>

+that they are garbage collected or to allow easy implementation of<br>

+closures. There are often better ways to implement these features than<br>

+explicit stack frames, but `LLVM does support<br>

+them, <<a href="http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt" rel="noreferrer" target="_blank">http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt</a>>`_<br>

+if you want. It requires your front-end to convert the code into<br>

+`Continuation Passing<br>

+Style <<a href="http://en.wikipedia.org/wiki/Continuation-passing_style" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Continuation-passing_style</a>>`_ and<br>

+to use tail calls (which LLVM also supports).<br>

+<br>

<br>

Removed: llvm/trunk/docs/tutorial/LangImpl2.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl2.rst?rev=274440&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl2.rst?rev=274440&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl2.rst (original)<br>

+++ llvm/trunk/docs/tutorial/LangImpl2.rst (removed)<br>

@@ -1,735 +0,0 @@<br>

-===========================================<br>

-Kaleidoscope: Implementing a Parser and AST<br>

-===========================================<br>

-<br>

-.. contents::<br>

-   :local:<br>

-<br>

-Chapter 2 Introduction<br>

-======================<br>

-<br>

-Welcome to Chapter 2 of the "`Implementing a language with<br>

-LLVM <index.html>`_" tutorial. This chapter shows you how to use the<br>

-lexer, built in `Chapter 1 <LangImpl1.html>`_, to build a full<br>

-`parser <<a href="http://en.wikipedia.org/wiki/Parsing" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Parsing</a>>`_ for our Kaleidoscope<br>

-language. Once we have a parser, we'll define and build an `Abstract<br>

-Syntax Tree <<a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Abstract_syntax_tree</a>>`_ (AST).<br>

-<br>

-The parser we will build uses a combination of `Recursive Descent<br>

-Parsing <<a href="http://en.wikipedia.org/wiki/Recursive_descent_parser" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Recursive_descent_parser</a>>`_ and<br>

-`Operator-Precedence<br>

-Parsing <<a href="http://en.wikipedia.org/wiki/Operator-precedence_parser" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Operator-precedence_parser</a>>`_ to<br>

-parse the Kaleidoscope language (the latter for binary expressions and<br>

-the former for everything else). Before we get to parsing though, let's<br>

-talk about the output of the parser: the Abstract Syntax Tree.<br>

-<br>

-The Abstract Syntax Tree (AST)<br>

-==============================<br>

-<br>

-The AST for a program captures its behavior in such a way that it is<br>

-easy for later stages of the compiler (e.g. code generation) to<br>

-interpret. We basically want one object for each construct in the<br>

-language, and the AST should closely model the language. In<br>

-Kaleidoscope, we have expressions, a prototype, and a function object.<br>

-We'll start with expressions first:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// ExprAST - Base class for all expression nodes.<br>

-    class ExprAST {<br>

-    public:<br>

-      virtual ~ExprAST() {}<br>

-    };<br>

-<br>

-    /// NumberExprAST - Expression class for numeric literals like "1.0".<br>

-    class NumberExprAST : public ExprAST {<br>

-      double Val;<br>

-<br>

-    public:<br>

-      NumberExprAST(double Val) : Val(Val) {}<br>

-    };<br>

-<br>

-The code above shows the definition of the base ExprAST class and one<br>

-subclass which we use for numeric literals. The important thing to note<br>

-about this code is that the NumberExprAST class captures the numeric<br>

-value of the literal as an instance variable. This allows later phases<br>

-of the compiler to know what the stored numeric value is.<br>

-<br>

-Right now we only create the AST, so there are no useful accessor<br>

-methods on its nodes. It would be very easy to add a virtual method to pretty<br>

-print the code, for example. Here are the other expression AST node<br>

-definitions that we'll use in the basic form of the Kaleidoscope<br>

-language:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// VariableExprAST - Expression class for referencing a variable, like "a".<br>

-    class VariableExprAST : public ExprAST {<br>

-      std::string Name;<br>

-<br>

-    public:<br>

-      VariableExprAST(const std::string &Name) : Name(Name) {}<br>

-    };<br>

-<br>

-    /// BinaryExprAST - Expression class for a binary operator.<br>

-    class BinaryExprAST : public ExprAST {<br>

-      char Op;<br>

-      std::unique_ptr<ExprAST> LHS, RHS;<br>

-<br>

-    public:<br>

-      BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS,<br>

-                    std::unique_ptr<ExprAST> RHS)<br>

-        : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}<br>

-    };<br>

-<br>

-    /// CallExprAST - Expression class for function calls.<br>

-    class CallExprAST : public ExprAST {<br>

-      std::string Callee;<br>

-      std::vector<std::unique_ptr<ExprAST>> Args;<br>

-<br>

-    public:<br>

-      CallExprAST(const std::string &Callee,<br>

-                  std::vector<std::unique_ptr<ExprAST>> Args)<br>

-        : Callee(Callee), Args(std::move(Args)) {}<br>

-    };<br>

-<br>

-This is all (intentionally) rather straight-forward: variables capture<br>

-the variable name, binary operators capture their opcode (e.g. '+'), and<br>

-calls capture a function name as well as a list of any argument<br>

-expressions. One thing that is nice about our AST is that it captures<br>

-the language features without talking about the syntax of the language.<br>

-Note that there is no discussion about precedence of binary operators,<br>

-lexical structure, etc.<br>

-<br>
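As a rough illustration of the "virtual method to pretty print" idea mentioned above, here is a minimal sketch. It assumes a cut-down copy of the AST classes with a hypothetical ``dump()`` method that is not part of the tutorial code, and it uses ``std::make_unique`` (C++14) rather than ``llvm::make_unique`` so that it compiles without LLVM headers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Cut-down AST with a hypothetical dump() method (not in the tutorial).
class ExprAST {
public:
  virtual ~ExprAST() = default;
  virtual std::string dump() const = 0;
};

// Variable reference, like "a": prints as its name.
class VariableExprAST : public ExprAST {
  std::string Name;

public:
  explicit VariableExprAST(const std::string &Name) : Name(Name) {}
  std::string dump() const override { return Name; }
};

// Binary operator: prints as a fully parenthesized infix expression.
class BinaryExprAST : public ExprAST {
  char Op;
  std::unique_ptr<ExprAST> LHS, RHS;

public:
  BinaryExprAST(char Op, std::unique_ptr<ExprAST> LHS,
                std::unique_ptr<ExprAST> RHS)
      : Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
  std::string dump() const override {
    return "(" + LHS->dump() + " " + Op + " " + RHS->dump() + ")";
  }
};

// Builds the AST for "x+y" by hand and renders it back to text.
std::string dumpXPlusY() {
  auto LHS = std::make_unique<VariableExprAST>("x");
  auto RHS = std::make_unique<VariableExprAST>("y");
  BinaryExprAST Sum('+', std::move(LHS), std::move(RHS));
  return Sum.dump();
}
```

Because the AST stores no precedence information, the sketch parenthesizes every binary expression when printing; that keeps the output unambiguous without consulting a precedence table.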

-For our basic language, these are all of the expression nodes we'll<br>

-define. Because it doesn't have conditional control flow, it isn't<br>

-Turing-complete; we'll fix that in a later installment. The two things<br>

-we need next are a way to talk about the interface to a function, and a<br>

-way to talk about functions themselves:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// PrototypeAST - This class represents the "prototype" for a function,<br>

-    /// which captures its name, and its argument names (thus implicitly the number<br>

-    /// of arguments the function takes).<br>

-    class PrototypeAST {<br>

-      std::string Name;<br>

-      std::vector<std::string> Args;<br>

-<br>

-    public:<br>

-      PrototypeAST(const std::string &name, std::vector<std::string> Args)<br>

-        : Name(name), Args(std::move(Args)) {}<br>

-    };<br>

-<br>

-    /// FunctionAST - This class represents a function definition itself.<br>

-    class FunctionAST {<br>

-      std::unique_ptr<PrototypeAST> Proto;<br>

-      std::unique_ptr<ExprAST> Body;<br>

-<br>

-    public:<br>

-      FunctionAST(std::unique_ptr<PrototypeAST> Proto,<br>

-                  std::unique_ptr<ExprAST> Body)<br>

-        : Proto(std::move(Proto)), Body(std::move(Body)) {}<br>

-    };<br>

-<br>

-In Kaleidoscope, functions are typed with just a count of their<br>

-arguments. Since all values are double precision floating point, the<br>

-type of each argument doesn't need to be stored anywhere. In a more<br>

-aggressive and realistic language, the "ExprAST" class would probably<br>

-have a type field.<br>

-<br>

-With this scaffolding, we can now talk about parsing expressions and<br>

-function bodies in Kaleidoscope.<br>

-<br>

-Parser Basics<br>

-=============<br>

-<br>

-Now that we have an AST to build, we need to define the parser code to<br>

-build it. The idea here is that we want to parse something like "x+y"<br>

-(which is returned as three tokens by the lexer) into an AST that could<br>

-be generated with calls like this:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      auto LHS = llvm::make_unique<VariableExprAST>("x");<br>

-      auto RHS = llvm::make_unique<VariableExprAST>("y");<br>

-      auto Result = llvm::make_unique<BinaryExprAST>('+', std::move(LHS),<br>

-                                                     std::move(RHS));<br>

-<br>

-In order to do this, we'll start by defining some basic helper routines:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current<br>

-    /// token the parser is looking at.  getNextToken reads another token from the<br>

-    /// lexer and updates CurTok with its results.<br>

-    static int CurTok;<br>

-    static int getNextToken() {<br>

-      return CurTok = gettok();<br>

-    }<br>

-<br>

-This implements a simple token buffer around the lexer. This allows us<br>

-to look one token ahead at what the lexer is returning. Every function<br>

-in our parser will assume that CurTok is the current token that needs to<br>

-be parsed.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-<br>

-    /// LogError* - These are little helper functions for error handling.<br>

-    std::unique_ptr<ExprAST> LogError(const char *Str) {<br>

-      fprintf(stderr, "LogError: %s\n", Str);<br>

-      return nullptr;<br>

-    }<br>

-    std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) {<br>

-      LogError(Str);<br>

-      return nullptr;<br>

-    }<br>

-<br>

-The ``LogError`` routines are simple helper routines that our parser will<br>

-use to handle errors. The error recovery in our parser will not be the<br>

-best and is not particularly user-friendly, but it will be enough for our<br>

-tutorial. These routines make it easier to handle errors in routines<br>

-that have various return types: they always return null.<br>

-<br>

-With these basic helper functions, we can implement the first piece of<br>

-our grammar: numeric literals.<br>

-<br>

-Basic Expression Parsing<br>

-========================<br>

-<br>

-We start with numeric literals, because they are the simplest to<br>

-process. For each production in our grammar, we'll define a function<br>

-which parses that production. For numeric literals, we have:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// numberexpr ::= number<br>

-    static std::unique_ptr<ExprAST> ParseNumberExpr() {<br>

-      auto Result = llvm::make_unique<NumberExprAST>(NumVal);<br>

-      getNextToken(); // consume the number<br>

-      return std::move(Result);<br>

-    }<br>

-<br>

-This routine is very simple: it expects to be called when the current<br>

-token is a ``tok_number`` token. It takes the current number value,<br>

-creates a ``NumberExprAST`` node, advances the lexer to the next token,<br>

-and finally returns.<br>

-<br>

-There are some interesting aspects to this. The most important one is<br>

-that this routine eats all of the tokens that correspond to the<br>

-production and returns the lexer buffer with the next token (which is<br>

-not part of the grammar production) ready to go. This is a fairly<br>

-standard way to go for recursive descent parsers. For a better example,<br>

-the parenthesis operator is defined like this:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// parenexpr ::= '(' expression ')'<br>

-    static std::unique_ptr<ExprAST> ParseParenExpr() {<br>

-      getNextToken(); // eat (.<br>

-      auto V = ParseExpression();<br>

-      if (!V)<br>

-        return nullptr;<br>

-<br>

-      if (CurTok != ')')<br>

-        return LogError("expected ')'");<br>

-      getNextToken(); // eat ).<br>

-      return V;<br>

-    }<br>

-<br>

-This function illustrates a number of interesting things about the<br>

-parser:<br>

-<br>

-1) It shows how we use the LogError routines. When called, this function<br>

-expects that the current token is a '(' token, but after parsing the<br>

-subexpression, it is possible that there is no ')' waiting. For example,<br>

-if the user types in "(4 x" instead of "(4)", the parser should emit an<br>

-error. Because errors can occur, the parser needs a way to indicate that<br>

-they happened: in our parser, we return null on an error.<br>

-<br>

-2) Another interesting aspect of this function is that it uses recursion<br>

-by calling ``ParseExpression`` (we will soon see that<br>

-``ParseExpression`` can call ``ParseParenExpr``). This is powerful<br>

-because it allows us to handle recursive grammars, and keeps each<br>

-production very simple. Note that parentheses do not cause construction<br>

-of AST nodes themselves. While we could do it this way, the most<br>

-important role of parentheses is to guide the parser and provide<br>

-grouping. Once the parser constructs the AST, parentheses are not<br>

-needed.<br>

-<br>
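The two points above (reporting an error by returning a failure value, and recursing for nested parentheses) can be sketched in isolation. This is only an illustration under stated assumptions: ``MiniParser`` and ``parsesTo`` are hypothetical names, the grammar is reduced to ``primary ::= digits | '(' primary ')'``, and a ``bool`` return stands in for the tutorial's null ``unique_ptr``.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical mini-parser over a string; not the tutorial's lexer or AST.
class MiniParser {
  std::string Src;
  std::size_t Pos = 0;

  // Returns the next non-space character without consuming it,
  // or '\0' at the end of the input.
  char peek() {
    while (Pos < Src.size() &&
           std::isspace(static_cast<unsigned char>(Src[Pos])))
      ++Pos;
    return Pos < Src.size() ? Src[Pos] : '\0';
  }

public:
  explicit MiniParser(std::string S) : Src(std::move(S)) {}

  // primary ::= digits | '(' primary ')'
  // Returns false on a parse error, mirroring the tutorial's null return.
  bool parsePrimary(double &Out) {
    char C = peek();
    if (C == '(') {
      ++Pos; // eat '('
      if (!parsePrimary(Out))
        return false;
      if (peek() != ')')
        return false; // e.g. "(4 x": no ')' waiting
      ++Pos; // eat ')'
      return true;
    }
    if (std::isdigit(static_cast<unsigned char>(C))) {
      Out = 0;
      while (Pos < Src.size() &&
             std::isdigit(static_cast<unsigned char>(Src[Pos])))
        Out = Out * 10 + (Src[Pos++] - '0');
      return true;
    }
    return false; // unknown token when expecting a primary
  }
};

// Convenience wrapper: true if Text parses and evaluates to Expected.
bool parsesTo(const std::string &Text, double Expected) {
  MiniParser P(Text);
  double V = 0;
  return P.parsePrimary(V) && V == Expected;
}
```

With this sketch, ``parsesTo("((4))", 4)`` succeeds because each ``'('`` triggers one level of recursion, while ``parsesTo("(4 x", 4)`` fails at the missing ``')'``. As in the tutorial, the parentheses leave no trace in the result.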

-The next simple production is for handling variable references and<br>

-function calls:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// identifierexpr<br>

-    ///   ::= identifier<br>

-    ///   ::= identifier '(' expression* ')'<br>

-    static std::unique_ptr<ExprAST> ParseIdentifierExpr() {<br>

-      std::string IdName = IdentifierStr;<br>

-<br>

-      getNextToken();  // eat identifier.<br>

-<br>

-      if (CurTok != '(') // Simple variable ref.<br>

-        return llvm::make_unique<VariableExprAST>(IdName);<br>

-<br>

-      // Call.<br>

-      getNextToken();  // eat (<br>

-      std::vector<std::unique_ptr<ExprAST>> Args;<br>

-      if (CurTok != ')') {<br>

-        while (1) {<br>

-          if (auto Arg = ParseExpression())<br>

-            Args.push_back(std::move(Arg));<br>

-          else<br>

-            return nullptr;<br>

-<br>

-          if (CurTok == ')')<br>

-            break;<br>

-<br>

-          if (CurTok != ',')<br>

-            return LogError("Expected ')' or ',' in argument list");<br>

-          getNextToken();<br>

-        }<br>

-      }<br>

-<br>

-      // Eat the ')'.<br>

-      getNextToken();<br>

-<br>

-      return llvm::make_unique<CallExprAST>(IdName, std::move(Args));<br>

-    }<br>

-<br>

-This routine follows the same style as the other routines. (It expects<br>

-to be called if the current token is a ``tok_identifier`` token). It<br>

-also has recursion and error handling. One interesting aspect of this is<br>

-that it uses *look-ahead* to determine if the current identifier is a<br>

-standalone variable reference or if it is a function call expression.<br>

-It handles this by checking to see if the token after the identifier is<br>

-a '(' token, constructing either a ``VariableExprAST`` or<br>

-``CallExprAST`` node as appropriate.<br>

-<br>

-Now that we have all of our simple expression-parsing logic in place, we<br>

-can define a helper function to wrap it together into one entry point.<br>

-We call this class of expressions "primary" expressions, for reasons<br>

-that will become more clear `later in the<br>

-tutorial <LangImpl6.html#user-defined-unary-operators>`_. In order to parse an arbitrary<br>

-primary expression, we need to determine what sort of expression it is:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// primary<br>

-    ///   ::= identifierexpr<br>

-    ///   ::= numberexpr<br>

-    ///   ::= parenexpr<br>

-    static std::unique_ptr<ExprAST> ParsePrimary() {<br>

-      switch (CurTok) {<br>

-      default:<br>

-        return LogError("unknown token when expecting an expression");<br>

-      case tok_identifier:<br>

-        return ParseIdentifierExpr();<br>

-      case tok_number:<br>

-        return ParseNumberExpr();<br>

-      case '(':<br>

-        return ParseParenExpr();<br>

-      }<br>

-    }<br>

-<br>

-Now that you see the definition of this function, it is more obvious why<br>

-we can assume the state of CurTok in the various functions. This uses<br>

-look-ahead to determine which sort of expression is being inspected, and<br>

-then parses it with a function call.<br>

-<br>

-Now that basic expressions are handled, we need to handle binary<br>

-expressions. They are a bit more complex.<br>

-<br>

-Binary Expression Parsing<br>

-=========================<br>

-<br>

-Binary expressions are significantly harder to parse because they are<br>

-often ambiguous. For example, when given the string "x+y\*z", the parser<br>

-can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common<br>

-definitions from mathematics, we expect the latter parse, because "\*"<br>

-(multiplication) has higher *precedence* than "+" (addition).<br>

-<br>

-There are many ways to handle this, but an elegant and efficient way is<br>

-to use `Operator-Precedence<br>

-Parsing <<a href="http://en.wikipedia.org/wiki/Operator-precedence_parser" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Operator-precedence_parser</a>>`_.<br>

-This parsing technique uses the precedence of binary operators to guide<br>

-recursion. To start with, we need a table of precedences:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// BinopPrecedence - This holds the precedence for each binary operator that is<br>

-    /// defined.<br>

-    static std::map<char, int> BinopPrecedence;<br>

-<br>

-    /// GetTokPrecedence - Get the precedence of the pending binary operator token.<br>

-    static int GetTokPrecedence() {<br>

-      if (!isascii(CurTok))<br>

-        return -1;<br>

-<br>

-      // Make sure it's a declared binop.<br>

-      int TokPrec = BinopPrecedence[CurTok];<br>

-      if (TokPrec <= 0) return -1;<br>

-      return TokPrec;<br>

-    }<br>

-<br>

-    int main() {<br>

-      // Install standard binary operators.<br>

-      // 1 is lowest precedence.<br>

-      BinopPrecedence['<'] = 10;<br>

-      BinopPrecedence['+'] = 20;<br>

-      BinopPrecedence['-'] = 20;<br>

-      BinopPrecedence['*'] = 40;  // highest.<br>

-      ...<br>

-    }<br>

-<br>

-For the basic form of Kaleidoscope, we will only support 4 binary<br>

-operators (this can obviously be extended by you, our brave and intrepid<br>

-reader). The ``GetTokPrecedence`` function returns the precedence for<br>

-the current token, or -1 if the token is not a binary operator. Having a<br>

-map makes it easy to add new operators and makes it clear that the<br>

-algorithm doesn't depend on the specific operators involved, but it<br>

-would be easy enough to eliminate the map and do the comparisons in the<br>

-``GetTokPrecedence`` function. (Or just use a fixed-size array).<br>
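
As a rough illustration of the fixed-size array alternative mentioned above, here is a minimal sketch. It assumes tokens are plain ints with negative values for non-character tokens, as in the tutorial's lexer; ``BinopPrecedenceTable`` and ``InstallBinop`` are hypothetical names introduced only for this example, and ``CurTok`` is redeclared here so the snippet stands alone.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stand-in for the tutorial's global current-token variable (assumption:
// character tokens are their ASCII value, other tokens are negative).
static int CurTok = 0;

// One slot per ASCII character; 0 means "not a binary operator".
static std::array<int, 128> BinopPrecedenceTable = {};

// Hypothetical helper: register a binary operator with a positive precedence.
static void InstallBinop(char Op, int Prec) {
  unsigned char Index = static_cast<unsigned char>(Op);
  assert(Index < 128 && "operator must be an ASCII character");
  assert(Prec > 0 && "precedence must be positive");
  BinopPrecedenceTable[Index] = Prec;
}

// Same contract as the map-based version: the precedence of the pending
// token, or -1 if it is not a declared binary operator.
static int GetTokPrecedence() {
  if (CurTok < 0 || CurTok >= 128)
    return -1;
  int TokPrec = BinopPrecedenceTable[static_cast<std::size_t>(CurTok)];
  return TokPrec > 0 ? TokPrec : -1;
}
```

One difference from the map version is that a lookup never inserts a new entry, so probing an undeclared operator leaves the table unchanged.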

-<br>

-With the helper above defined, we can now start parsing binary<br>

-expressions. The basic idea of operator precedence parsing is to break<br>

-down an expression with potentially ambiguous binary operators into<br>

-pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g".<br>

-Operator precedence parsing considers this as a stream of primary<br>

-expressions separated by binary operators. As such, it will first parse<br>

-the leading primary expression "a", then it will see the pairs [+, b]<br>

-[+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are<br>

-primary expressions, the binary expression parser doesn't need to worry<br>

-about nested subexpressions like (c+d) at all.<br>

-<br>

-To start, an expression is a primary expression potentially followed by<br>

-a sequence of [binop,primaryexpr] pairs:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// expression<br>

-    ///   ::= primary binoprhs<br>

-    ///<br>

-    static std::unique_ptr<ExprAST> ParseExpression() {<br>

-      auto LHS = ParsePrimary();<br>

-      if (!LHS)<br>

-        return nullptr;<br>

-<br>

-      return ParseBinOpRHS(0, std::move(LHS));<br>

-    }<br>

-<br>

-``ParseBinOpRHS`` is the function that parses the sequence of pairs for<br>

-us. It takes a precedence and a pointer to an expression for the part<br>

-that has been parsed so far. Note that "x" is a perfectly valid<br>

-expression: As such, "binoprhs" is allowed to be empty, in which case it<br>

-returns the expression that is passed into it. In our example above, the<br>

-code passes the expression for "a" into ``ParseBinOpRHS`` and the<br>

-current token is "+".<br>

-<br>

-The precedence value passed into ``ParseBinOpRHS`` indicates the<br>

-*minimal operator precedence* that the function is allowed to eat. For<br>

-example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is<br>

-passed in a precedence of 40, it will not consume any tokens (because<br>

-the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``<br>

-starts with:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// binoprhs<br>

-    ///   ::= ('+' primary)*<br>

-    static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,<br>

-                                                  std::unique_ptr<ExprAST> LHS) {<br>

-      // If this is a binop, find its precedence.<br>

-      while (1) {<br>

-        int TokPrec = GetTokPrecedence();<br>

-<br>

-        // If this is a binop that binds at least as tightly as the current binop,<br>

-        // consume it, otherwise we are done.<br>

-        if (TokPrec < ExprPrec)<br>

-          return LHS;<br>

-<br>

-This code gets the precedence of the current token and checks to see if<br>

-it is too low. Because we defined invalid tokens to have a precedence of<br>

--1, this check implicitly knows that the pair-stream ends when the token<br>

-stream runs out of binary operators. If this check succeeds, we know<br>

-that the token is a binary operator and that it will be included in this<br>

-expression:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-        // Okay, we know this is a binop.<br>

-        int BinOp = CurTok;<br>

-        getNextToken();  // eat binop<br>

-<br>

-        // Parse the primary expression after the binary operator.<br>

-        auto RHS = ParsePrimary();<br>

-        if (!RHS)<br>

-          return nullptr;<br>

-<br>

-As such, this code eats (and remembers) the binary operator and then<br>

-parses the primary expression that follows. This builds up the whole<br>

-pair, the first of which is [+, b] for the running example.<br>

-<br>

-Now that we parsed the left-hand side of an expression and one pair of<br>

-the RHS sequence, we have to decide which way the expression associates.<br>

-In particular, we could have "(a+b) binop unparsed" or "a + (b binop<br>

-unparsed)". To determine this, we look ahead at "binop" to determine its<br>

-precedence and compare it to BinOp's precedence (which is '+' in this<br>

-case):<br>

-<br>

-.. code-block:: c++<br>

-<br>

-        // If BinOp binds less tightly with RHS than the operator after RHS, let<br>

-        // the pending operator take RHS as its LHS.<br>

-        int NextPrec = GetTokPrecedence();<br>

-        if (TokPrec < NextPrec) {<br>

-<br>

-If the precedence of the binop to the right of "RHS" is lower or equal<br>

-to the precedence of our current operator, then we know that the<br>

-parentheses associate as "(a+b) binop ...". In our example, the current<br>

-operator is "+" and the next operator is "+", so we know that they have the<br>

-same precedence. In this case we'll create the AST node for "a+b", and<br>

-then continue parsing:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-          ... if body omitted ...<br>

-        }<br>

-<br>

-        // Merge LHS/RHS.<br>

-        LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),<br>

-                                               std::move(RHS));<br>

-      }  // loop around to the top of the while loop.<br>

-    }<br>

-<br>

-In our example above, this will turn "a+b+" into "(a+b)" and execute the<br>

-next iteration of the loop, with "+" as the current token. The code<br>

-above will eat, remember, and parse "(c+d)" as the primary expression,<br>

-which makes the current pair equal to [+, (c+d)]. It will then evaluate<br>

-the 'if' conditional above with "\*" as the binop to the right of the<br>

-primary. In this case, the precedence of "\*" is higher than the<br>

-precedence of "+" so the if condition will be entered.<br>

-<br>

-The critical question left here is "how can the if condition parse the<br>

-right hand side in full"? In particular, to build the AST correctly for<br>

-our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression<br>

-variable. The code to do this is surprisingly simple (code from the<br>

-above two blocks duplicated for context):<br>

-<br>

-.. code-block:: c++<br>

-<br>

-        // If BinOp binds less tightly with RHS than the operator after RHS, let<br>

-        // the pending operator take RHS as its LHS.<br>

-        int NextPrec = GetTokPrecedence();<br>

-        if (TokPrec < NextPrec) {<br>

-          RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS));<br>

-          if (!RHS)<br>

-            return nullptr;<br>

-        }<br>

-        // Merge LHS/RHS.<br>

-        LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS),<br>

-                                               std::move(RHS));<br>

-      }  // loop around to the top of the while loop.<br>

-    }<br>

-<br>

-At this point, we know that the binary operator to the RHS of our<br>

-primary has higher precedence than the binop we are currently parsing.<br>

-As such, we know that any sequence of pairs whose operators are all<br>

-higher precedence than "+" should be parsed together and returned as<br>

-"RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function<br>

-specifying "TokPrec+1" as the minimum precedence required for it to<br>

-continue. In our example above, this will cause it to return the AST<br>

-node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+'<br>

-expression.<br>

-<br>

-Finally, on the next iteration of the while loop, the "+g" piece is<br>

-parsed and added to the AST. With this little bit of code (14<br>

-non-trivial lines), we correctly handle fully general binary expression<br>

-parsing in a very elegant way. This was a whirlwind tour of this code,<br>

-and it is somewhat subtle. I recommend running through it with a few<br>

-tough examples to see how it works.<br>
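
To make it easier to run through examples, the sketch below applies the same loop to a stripped-down setting. It is an illustration under stated assumptions, not the tutorial's parser: tokens are single characters with no whitespace, operands are single letters, and the "AST" is just a fully parenthesized string. ``MiniParser`` and its member names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Simplified stand-in for the lexer, parser and AST (all assumptions).
struct MiniParser {
  std::string Src;
  std::size_t Pos = 0;
  std::map<char, int> Prec{{'<', 10}, {'+', 20}, {'-', 20}, {'*', 40}};

  explicit MiniParser(std::string S) : Src(std::move(S)) {}

  // Current character, or '\0' once the input is exhausted.
  char cur() const { return Pos < Src.size() ? Src[Pos] : '\0'; }

  // Precedence of the pending operator, or -1 if it is not a binop.
  int tokPrecedence() const {
    auto It = Prec.find(cur());
    return It == Prec.end() ? -1 : It->second;
  }

  // primary ::= letter | '(' expression ')'
  std::string parsePrimary() {
    if (cur() == '(') {
      ++Pos; // eat '('
      std::string E = parseExpression();
      assert(cur() == ')' && "expected ')'");
      ++Pos; // eat ')'
      return E;
    }
    assert(std::isalpha(static_cast<unsigned char>(cur())) &&
           "expected an operand");
    return std::string(1, Src[Pos++]);
  }

  // binoprhs ::= (binop primary)*
  std::string parseBinOpRHS(int ExprPrec, std::string LHS) {
    while (true) {
      int TokPrec = tokPrecedence();
      // Stop when the pending operator binds less tightly than required.
      if (TokPrec < ExprPrec)
        return LHS;

      char BinOp = Src[Pos++]; // eat binop
      std::string RHS = parsePrimary();

      // If the next operator binds tighter, let it take RHS as its LHS.
      if (TokPrec < tokPrecedence())
        RHS = parseBinOpRHS(TokPrec + 1, std::move(RHS));

      // Merge LHS/RHS.
      LHS = "(" + LHS + BinOp + RHS + ")";
    }
  }

  std::string parseExpression() { return parseBinOpRHS(0, parsePrimary()); }
};
```

With these assumptions, ``MiniParser("x+y*z").parseExpression()`` yields ``(x+(y*z))``, and the running example ``a+b+(c+d)*e*f+g`` yields ``(((a+b)+(((c+d)*e)*f))+g)``.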

-<br>

-This wraps up handling of expressions. At this point, we can point the<br>

-parser at an arbitrary token stream and build an expression from it,<br>

-stopping at the first token that is not part of the expression. Next up<br>

-we need to handle function definitions, etc.<br>

-<br>

-Parsing the Rest<br>

-================<br>

-<br>

-The next thing missing is handling of function prototypes. In<br>

-Kaleidoscope, these are used both for 'extern' function declarations as<br>

-well as function body definitions. The code to do this is<br>

-straightforward and not very interesting (once you've survived<br>

-expressions):<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// prototype<br>

-    ///   ::= id '(' id* ')'<br>

-    static std::unique_ptr<PrototypeAST> ParsePrototype() {<br>

-      if (CurTok != tok_identifier)<br>

-        return LogErrorP("Expected function name in prototype");<br>

-<br>

-      std::string FnName = IdentifierStr;<br>

-      getNextToken();<br>

-<br>

-      if (CurTok != '(')<br>

-        return LogErrorP("Expected '(' in prototype");<br>

-<br>

-      // Read the list of argument names.<br>

-      std::vector<std::string> ArgNames;<br>

-      while (getNextToken() == tok_identifier)<br>

-        ArgNames.push_back(IdentifierStr);<br>

-      if (CurTok != ')')<br>

-        return LogErrorP("Expected ')' in prototype");<br>

-<br>

-      // success.<br>

-      getNextToken();  // eat ')'.<br>

-<br>

-      return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames));<br>

-    }<br>

-<br>

-Given this, a function definition is very simple, just a prototype plus<br>

-an expression to implement the body:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// definition ::= 'def' prototype expression<br>

-    static std::unique_ptr<FunctionAST> ParseDefinition() {<br>

-      getNextToken();  // eat def.<br>

-      auto Proto = ParsePrototype();<br>

-      if (!Proto) return nullptr;<br>

-<br>

-      if (auto E = ParseExpression())<br>

-        return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));<br>

-      return nullptr;<br>

-    }<br>

-<br>

-In addition, we support 'extern' to declare functions like 'sin' and<br>

-'cos' as well as to support forward declaration of user functions. These<br>

-'extern's are just prototypes with no body:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// external ::= 'extern' prototype<br>

-    static std::unique_ptr<PrototypeAST> ParseExtern() {<br>

-      getNextToken();  // eat extern.<br>

-      return ParsePrototype();<br>

-    }<br>

-<br>

-Finally, we'll also let the user type in arbitrary top-level expressions<br>

-and evaluate them on the fly. We will handle this by defining anonymous<br>

-nullary (zero argument) functions for them:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// toplevelexpr ::= expression<br>

-    static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {<br>

-      if (auto E = ParseExpression()) {<br>

-        // Make an anonymous proto.<br>

-        auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());<br>

-        return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));<br>

-      }<br>

-      return nullptr;<br>

-    }<br>

-<br>

-Now that we have all the pieces, let's build a little driver that will<br>

-let us actually *execute* this code we've built!<br>

-<br>

-The Driver<br>

-==========<br>

-<br>

-The driver for this simply invokes all of the parsing pieces with a<br>

-top-level dispatch loop. There isn't much interesting here, so I'll just<br>

-include the top-level loop. See `below <#full-code-listing>`_ for full code in the<br>

-"Top-Level Parsing" section.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// top ::= definition | external | expression | ';'<br>

-    static void MainLoop() {<br>

-      while (1) {<br>

-        fprintf(stderr, "ready> ");<br>

-        switch (CurTok) {<br>

-        case tok_eof:<br>

-          return;<br>

-        case ';': // ignore top-level semicolons.<br>

-          getNextToken();<br>

-          break;<br>

-        case tok_def:<br>

-          HandleDefinition();<br>

-          break;<br>

-        case tok_extern:<br>

-          HandleExtern();<br>

-          break;<br>

-        default:<br>

-          HandleTopLevelExpression();<br>

-          break;<br>

-        }<br>

-      }<br>

-    }<br>

-<br>

-The most interesting part of this is that we ignore top-level<br>

-semicolons. Why is this, you ask? The basic reason is that if you type<br>

-"4 + 5" at the command line, the parser doesn't know whether that is the<br>

-end of what you will type or not. For example, on the next line you<br>

-could type "def foo..." in which case 4+5 is the end of a top-level<br>

-expression. Alternatively you could type "\* 6", which would continue<br>

-the expression. Having top-level semicolons allows you to type "4+5;",<br>

-and the parser will know you are done.<br>

-<br>

-Conclusions<br>

-===========<br>

-<br>

-With just under 400 lines of commented code (240 lines of non-comment,<br>

-non-blank code), we fully defined our minimal language, including a<br>

-lexer, parser, and AST builder. With this done, the executable will<br>

-validate Kaleidoscope code and tell us if it is grammatically invalid.<br>

-For example, here is a sample interaction:<br>

-<br>

-.. code-block:: bash<br>

-<br>

-    $ ./a.out<br>

-    ready> def foo(x y) x+foo(y, 4.0);<br>

-    Parsed a function definition.<br>

-    ready> def foo(x y) x+y y;<br>

-    Parsed a function definition.<br>

-    Parsed a top-level expr<br>

-    ready> def foo(x y) x+y );<br>

-    Parsed a function definition.<br>

-    Error: unknown token when expecting an expression<br>

-    ready> extern sin(a);<br>

-    ready> Parsed an extern<br>

-    ready> ^D<br>

-    $<br>

-<br>

-There is a lot of room for extension here. You can define new AST nodes,<br>

-extend the language in many ways, etc. In the `next<br>

-installment <LangImpl3.html>`_, we will describe how to generate LLVM<br>

-Intermediate Representation (IR) from the AST.<br>

-<br>

-Full Code Listing<br>

-=================<br>

-<br>

-Here is the complete code listing for this and the previous chapter.<br>

-Note that it is fully self-contained: you don't need LLVM or any<br>

-external libraries at all for this. (Besides the C and C++ standard<br>

-libraries, of course.) To build this, just compile with:<br>

-<br>

-.. code-block:: bash<br>

-<br>

-    # Compile<br>

-    clang++ -g -O3 toy.cpp<br>

-    # Run<br>

-    ./a.out<br>

-<br>

-Here is the code:<br>

-<br>

-.. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp<br>

-   :language: c++<br>

-<br>

-`Next: Implementing Code Generation to LLVM IR <LangImpl3.html>`_<br>

-<br>

<br>

Removed: llvm/trunk/docs/tutorial/LangImpl3.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl3.rst?rev=274440&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl3.rst?rev=274440&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl3.rst (original)<br>

+++ llvm/trunk/docs/tutorial/LangImpl3.rst (removed)<br>

@@ -1,567 +0,0 @@<br>

-========================================<br>

-Kaleidoscope: Code generation to LLVM IR<br>

-========================================<br>

-<br>

-.. contents::<br>

-   :local:<br>

-<br>

-Chapter 3 Introduction<br>

-======================<br>

-<br>

-Welcome to Chapter 3 of the "`Implementing a language with<br>

-LLVM <index.html>`_" tutorial. This chapter shows you how to transform<br>

-the `Abstract Syntax Tree <LangImpl2.html>`_, built in Chapter 2, into<br>

-LLVM IR. This will teach you a little bit about how LLVM does things, as<br>

-well as demonstrate how easy it is to use. It's much more work to build<br>

-a lexer and parser than it is to generate LLVM IR code. :)<br>

-<br>

-**Please note**: the code in this chapter and later requires LLVM 3.7 or<br>

-later. LLVM 3.6 and before will not work with it. Also note that you<br>

-need to use a version of this tutorial that matches your LLVM release:<br>

-If you are using an official LLVM release, use the version of the<br>

-documentation included with your release or on the `<a href="http://llvm.org" rel="noreferrer" target="_blank">llvm.org</a> releases<br>

-page <<a href="http://llvm.org/releases/" rel="noreferrer" target="_blank">http://llvm.org/releases/</a>>`_.<br>

-<br>

-Code Generation Setup<br>

-=====================<br>

-<br>

-In order to generate LLVM IR, we want some simple setup to get started.<br>

-First we define virtual code generation (codegen) methods in each AST<br>

-class:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// ExprAST - Base class for all expression nodes.<br>

-    class ExprAST {<br>

-    public:<br>

-      virtual ~ExprAST() {}<br>

-      virtual Value *codegen() = 0;<br>

-    };<br>

-<br>

-    /// NumberExprAST - Expression class for numeric literals like "1.0".<br>

-    class NumberExprAST : public ExprAST {<br>

-      double Val;<br>

-<br>

-    public:<br>

-      NumberExprAST(double Val) : Val(Val) {}<br>

-      virtual Value *codegen();<br>

-    };<br>

-    ...<br>

-<br>

-The codegen() method says to emit IR for that AST node along with all<br>

-the things it depends on, and they all return an LLVM Value object.<br>

-"Value" is the class used to represent a "`Static Single Assignment<br>

-(SSA) <<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Static_single_assignment_form</a>>`_<br>

-register" or "SSA value" in LLVM. The most distinct aspect of SSA values<br>

-is that their value is computed as the related instruction executes, and<br>

-it does not get a new value until (and if) the instruction re-executes.<br>

-In other words, there is no way to "change" an SSA value. For more<br>

-information, please read up on `Static Single<br>

-Assignment <<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Static_single_assignment_form</a>>`_<br>

-- the concepts are really quite natural once you grok them.<br>

-<br>

-Note that instead of adding virtual methods to the ExprAST class<br>

-hierarchy, it could also make sense to use a `visitor<br>

-pattern <<a href="http://en.wikipedia.org/wiki/Visitor_pattern" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Visitor_pattern</a>>`_ or some other<br>

-way to model this. Again, this tutorial won't dwell on good software<br>

-engineering practices: for our purposes, adding a virtual method is<br>

-simplest.<br>

-<br>

-The second thing we want is a "LogError" method like we used for the<br>

-parser, which will be used to report errors found during code generation<br>

-(for example, use of an undeclared parameter):<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    static LLVMContext TheContext;<br>

-    static IRBuilder<> Builder(TheContext);<br>

-    static std::unique_ptr<Module> TheModule;<br>

-    static std::map<std::string, Value *> NamedValues;<br>

-<br>

-    Value *LogErrorV(const char *Str) {<br>

-      LogError(Str);<br>

-      return nullptr;<br>

-    }<br>

-<br>

-The static variables will be used during code generation. ``TheContext``<br>

-is an opaque object that owns a lot of core LLVM data structures, such as<br>

-the type and constant value tables. We don't need to understand it in<br>

-detail, we just need a single instance to pass into APIs that require it.<br>

-<br>

-The ``Builder`` object is a helper object that makes it easy to generate<br>

-LLVM instructions. Instances of the<br>

-`IRBuilder <<a href="http://llvm.org/doxygen/IRBuilder_8h-source.html" rel="noreferrer" target="_blank">http://llvm.org/doxygen/IRBuilder_8h-source.html</a>>`_<br>

-class template keep track of the current place to insert instructions<br>

-and have methods to create new instructions.<br>

-<br>

-``TheModule`` is an LLVM construct that contains functions and global<br>

-variables. In many ways, it is the top-level structure that the LLVM IR<br>

-uses to contain code. It will own the memory for all of the IR that we<br>

-generate, which is why the codegen() method returns a raw Value\*,<br>

-rather than a unique_ptr<Value>.<br>

-<br>

-The ``NamedValues`` map keeps track of which values are defined in the<br>

-current scope and what their LLVM representation is. (In other words, it<br>

-is a symbol table for the code). In this form of Kaleidoscope, the only<br>

-things that can be referenced are function parameters. As such, function<br>

-parameters will be in this map when generating code for their function<br>

-body.<br>

-<br>

-With these basics in place, we can start talking about how to generate<br>

-code for each expression. Note that this assumes that the ``Builder``<br>

-has been set up to generate code *into* something. For now, we'll assume<br>

-that this has already been done, and we'll just use it to emit code.<br>

-<br>

-Expression Code Generation<br>

-==========================<br>

-<br>

-Generating LLVM code for expression nodes is very straightforward: less<br>

-than 45 lines of commented code for all four of our expression nodes.<br>

-First we'll do numeric literals:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    Value *NumberExprAST::codegen() {<br>

-      return ConstantFP::get(TheContext, APFloat(Val));<br>

-    }<br>

-<br>

-In the LLVM IR, numeric constants are represented with the<br>

-``ConstantFP`` class, which holds the numeric value in an ``APFloat``<br>

-internally (``APFloat`` has the capability of holding floating point<br>

-constants of Arbitrary Precision). This code basically just creates<br>

-and returns a ``ConstantFP``. Note that in the LLVM IR, constants<br>

-are all uniqued together and shared. For this reason, the API uses the<br>

-"foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)".<br>

-<br>
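
The "get" idiom described above can be illustrated without LLVM. The following is only a sketch of the general uniquing pattern, not how LLVM implements ``ConstantFP``: ``UniquedDouble`` is a hypothetical class, and keying the pool by ``double`` is a simplification (NaN keys, for example, would not behave well in a ``std::map``).

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical uniqued value type: objects can only be obtained through
// get(), so two requests for the same value share one object.
class UniquedDouble {
  double Val;
  explicit UniquedDouble(double V) : Val(V) {}

public:
  static const UniquedDouble *get(double V) {
    // The pool owns every instance ever created.
    static std::map<double, std::unique_ptr<UniquedDouble>> Pool;
    std::unique_ptr<UniquedDouble> &Slot = Pool[V];
    if (!Slot)
      Slot.reset(new UniquedDouble(V));
    return Slot.get();
  }

  double value() const { return Val; }
};
```

Because instances are shared, equality of two such constants reduces to a pointer comparison: ``UniquedDouble::get(1.0) == UniquedDouble::get(1.0)`` holds, while ``get(1.0)`` and ``get(2.0)`` return different pointers.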

-.. code-block:: c++<br>

-<br>

-    Value *VariableExprAST::codegen() {<br>

-      // Look this variable up in the function.<br>

-      Value *V = NamedValues[Name];<br>

-      if (!V)<br>

-        LogErrorV("Unknown variable name");<br>

-      return V;<br>

-    }<br>

-<br>

-References to variables are also quite simple using LLVM. In the simple<br>

-version of Kaleidoscope, we assume that the variable has already been<br>

-emitted somewhere and its value is available. In practice, the only<br>

-values that can be in the ``NamedValues`` map are function arguments.<br>

-This code simply checks to see that the specified name is in the map (if<br>

-not, an unknown variable is being referenced) and returns the value for<br>

-it. In future chapters, we'll add support for `loop induction<br>

-variables <LangImpl5.html#for-loop-expression>`_ in the symbol table, and for `local<br>

-variables <LangImpl7.html#user-defined-local-variables>`_.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    Value *BinaryExprAST::codegen() {<br>

-      Value *L = LHS->codegen();<br>

-      Value *R = RHS->codegen();<br>

-      if (!L || !R)<br>

-        return nullptr;<br>

-<br>

-      switch (Op) {<br>

-      case '+':<br>

-        return Builder.CreateFAdd(L, R, "addtmp");<br>

-      case '-':<br>

-        return Builder.CreateFSub(L, R, "subtmp");<br>

-      case '*':<br>

-        return Builder.CreateFMul(L, R, "multmp");<br>

-      case '<':<br>

-        L = Builder.CreateFCmpULT(L, R, "cmptmp");<br>

-        // Convert bool 0/1 to double 0.0 or 1.0<br>

-        return Builder.CreateUIToFP(L, Type::getDoubleTy(TheContext),<br>

-                                    "booltmp");<br>

-      default:<br>

-        return LogErrorV("invalid binary operator");<br>

-      }<br>

-    }<br>

-<br>

-Binary operators start to get more interesting. The basic idea here is<br>

-that we recursively emit code for the left-hand side of the expression,<br>

-then the right-hand side, then we compute the result of the binary<br>

-expression. In this code, we do a simple switch on the opcode to create<br>

-the right LLVM instruction.<br>

-<br>

-In the example above, the LLVM builder class is starting to show its<br>

-value. IRBuilder knows where to insert the newly created instruction;<br>

-all you have to do is specify what instruction to create (e.g. with<br>

-``CreateFAdd``), which operands to use (``L`` and ``R`` here) and<br>

-optionally provide a name for the generated instruction.<br>

-<br>

-One nice thing about LLVM is that the name is just a hint. For instance,<br>

-if the code above emits multiple "addtmp" variables, LLVM will<br>

-automatically provide each one with an increasing, unique numeric<br>

-suffix. Local value names for instructions are purely optional, but it<br>

-makes it much easier to read the IR dumps.<br>

-<br>

-`LLVM instructions <../LangRef.html#instruction-reference>`_ are constrained by strict<br>

-rules: for example, the Left and Right operands of an `add<br>

-instruction <../LangRef.html#add-instruction>`_ must have the same type, and the<br>

-result type of the add must match the operand types. Because all values<br>

-in Kaleidoscope are doubles, this makes for very simple code for add,<br>

-sub and mul.<br>

-<br>

-On the other hand, LLVM specifies that the `fcmp<br>

-instruction <../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a<br>

-one bit integer). The problem with this is that Kaleidoscope wants the<br>

-value to be a 0.0 or 1.0 value. In order to get these semantics, we<br>

-combine the fcmp instruction with a `uitofp<br>

-instruction <../LangRef.html#uitofp-to-instruction>`_. This instruction converts its<br>

-input integer into a floating point value by treating the input as an<br>

-unsigned value. In contrast, if we used the `sitofp<br>

-instruction <../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator<br>

-would return 0.0 and -1.0, depending on the input value.<br>
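
The difference between the two conversions comes down to how the single bit is interpreted. The sketch below mimics that in plain C++ for an N-bit integer; it does not call any LLVM API, and ``UnsignedToDouble`` and ``SignedToDouble`` are hypothetical helpers written only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Treat the low `Bits` bits of `Raw` as an unsigned integer, which is the
// interpretation uitofp uses.
static double UnsignedToDouble(std::uint64_t Raw, unsigned Bits) {
  assert(Bits >= 1 && Bits < 64);
  std::uint64_t Mask = (std::uint64_t{1} << Bits) - 1;
  return static_cast<double>(Raw & Mask);
}

// Treat the low `Bits` bits of `Raw` as a two's-complement signed integer,
// which is the interpretation sitofp uses.
static double SignedToDouble(std::uint64_t Raw, unsigned Bits) {
  assert(Bits >= 1 && Bits < 64);
  std::uint64_t Mask = (std::uint64_t{1} << Bits) - 1;
  std::uint64_t V = Raw & Mask;
  std::uint64_t SignBit = std::uint64_t{1} << (Bits - 1);
  double D = static_cast<double>(V);
  // When the top bit is set, the value represents V - 2^Bits.
  if (V & SignBit)
    D -= static_cast<double>(std::uint64_t{1} << Bits);
  return D;
}
```

For a one-bit "true", ``UnsignedToDouble(1, 1)`` gives 1.0 while ``SignedToDouble(1, 1)`` gives -1.0, because the only bit of an i1 is also its sign bit.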

-<br>

-.. code-block:: c++<br>

-<br>

-    Value *CallExprAST::codegen() {<br>

-      // Look up the name in the global module table.<br>

-      Function *CalleeF = TheModule->getFunction(Callee);<br>

-      if (!CalleeF)<br>

-        return LogErrorV("Unknown function referenced");<br>

-<br>

-      // If argument mismatch error.<br>

-      if (CalleeF->arg_size() != Args.size())<br>

-        return LogErrorV("Incorrect # arguments passed");<br>

-<br>

-      std::vector<Value *> ArgsV;<br>

-      for (unsigned i = 0, e = Args.size(); i != e; ++i) {<br>

-        ArgsV.push_back(Args[i]->codegen());<br>

-        if (!ArgsV.back())<br>

-          return nullptr;<br>

-      }<br>

-<br>

-      return Builder.CreateCall(CalleeF, ArgsV, "calltmp");<br>

-    }<br>

-<br>

-Code generation for function calls is quite straightforward with LLVM. The code<br>

-above initially does a function name lookup in the LLVM Module's symbol table.<br>

-Recall that the LLVM Module is the container that holds the functions we are<br>

-JIT'ing. By giving each function the same name as what the user specifies, we<br>

-can use the LLVM symbol table to resolve function names for us.<br>

-<br>

-Once we have the function to call, we recursively codegen each argument<br>

-that is to be passed in, and create an LLVM `call<br>

-instruction <../LangRef.html#call-instruction>`_. Note that LLVM uses the native C<br>

-calling conventions by default, allowing these calls to also call into<br>

-standard library functions like "sin" and "cos", with no additional<br>

-effort.<br>

-<br>

-This wraps up our handling of the four basic expressions that we have so<br>

-far in Kaleidoscope. Feel free to go in and add some more. For example,<br>

-by browsing the `LLVM language reference <../LangRef.html>`_ you'll find<br>

-several other interesting instructions that are really easy to plug into<br>

-our basic framework.<br>

-<br>

-Function Code Generation<br>

-========================<br>

-<br>

-Code generation for prototypes and functions must handle a number of<br>

-details, which make their code less beautiful than expression code<br>

-generation, but allow us to illustrate some important points. First,<br>

-let's talk about code generation for prototypes: they are used both for<br>

-function bodies and external function declarations. The code starts<br>

-with:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    Function *PrototypeAST::codegen() {<br>

-      // Make the function type:  double(double,double) etc.<br>

-      std::vector<Type*> Doubles(Args.size(),<br>

-                                 Type::getDoubleTy(TheContext));<br>

-      FunctionType *FT =<br>

-        FunctionType::get(Type::getDoubleTy(TheContext), Doubles, false);<br>

-<br>

-      Function *F =<br>

-        Function::Create(FT, Function::ExternalLinkage, Name, TheModule.get());<br>

-<br>

-This code packs a lot of power into a few lines. Note first that this<br>

-function returns a "Function\*" instead of a "Value\*". Because a<br>

-"prototype" really talks about the external interface for a function<br>

-(not the value computed by an expression), it makes sense for it to<br>

-return the LLVM Function it corresponds to when codegen'd.<br>

-<br>

-The call to ``FunctionType::get`` creates the ``FunctionType`` that<br>

-should be used for a given Prototype. Since all function arguments in<br>

-Kaleidoscope are of type double, the first line creates a vector of "N"<br>

-LLVM double types. It then uses the ``FunctionType::get`` method to<br>

-create a function type that takes "N" doubles as arguments, returns one<br>

-double as a result, and that is not vararg (the false parameter<br>

-indicates this). Note that Types in LLVM are uniqued just like Constants<br>

-are, so you don't "new" a type, you "get" it.<br>

-<br>

-The final line above actually creates the IR Function corresponding to<br>

-the Prototype. This indicates the type, linkage and name to use, as<br>

-well as which module to insert into. "`external<br>

-linkage <../LangRef.html#linkage>`_" means that the function may be<br>

-defined outside the current module and/or that it is callable by<br>

-functions outside the module. The Name passed in is the name the user<br>

-specified: since "``TheModule``" is specified, this name is registered<br>

-in "``TheModule``"s symbol table.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  // Set names for all arguments.<br>

-  unsigned Idx = 0;<br>

-  for (auto &Arg : F->args())<br>

-    Arg.setName(Args[Idx++]);<br>

-<br>

-  return F;<br>

-<br>

-Finally, we set the name of each of the function's arguments according to the<br>

-names given in the Prototype. This step isn't strictly necessary, but keeping<br>

-the names consistent makes the IR more readable, and allows subsequent code to<br>

-refer directly to the arguments for their names, rather than having to look<br>

-them up in the Prototype AST.<br>

-<br>

-At this point we have a function prototype with no body. This is how LLVM IR<br>

-represents function declarations. For extern statements in Kaleidoscope, this<br>

-is as far as we need to go. For function definitions however, we need to<br>

-codegen and attach a function body.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  Function *FunctionAST::codegen() {<br>

-    // First, check for an existing function from a previous 'extern' declaration.<br>

-    Function *TheFunction = TheModule->getFunction(Proto->getName());<br>

-<br>

-    if (!TheFunction)<br>

-      TheFunction = Proto->codegen();<br>

-<br>

-    if (!TheFunction)<br>

-      return nullptr;<br>

-<br>

-    if (!TheFunction->empty())<br>

-      return (Function*)LogErrorV("Function cannot be redefined.");<br>

-<br>

-<br>

-For function definitions, we start by searching TheModule's symbol table for an<br>

-existing version of this function, in case one has already been created using an<br>

-'extern' statement. If Module::getFunction returns null then no previous version<br>

-exists, so we'll codegen one from the Prototype. In either case, we want to<br>

-assert that the function is empty (i.e. has no body yet) before we start.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  // Create a new basic block to start insertion into.<br>

-  BasicBlock *BB = BasicBlock::Create(LLVMContext, "entry", TheFunction);<br>

-  Builder.SetInsertPoint(BB);<br>

-<br>

-  // Record the function arguments in the NamedValues map.<br>

-  NamedValues.clear();<br>

-  for (auto &Arg : TheFunction->args())<br>

-    NamedValues[Arg.getName()] = &Arg;<br>

-<br>

-Now we get to the point where the ``Builder`` is set up. The first line<br>

-creates a new `basic block <<a href="http://en.wikipedia.org/wiki/Basic_block" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Basic_block</a>>`_<br>

-(named "entry"), which is inserted into ``TheFunction``. The second line<br>

-then tells the builder that new instructions should be inserted into the<br>

-end of the new basic block. Basic blocks in LLVM are an important part<br>

-of functions that define the `Control Flow<br>

-Graph <<a href="http://en.wikipedia.org/wiki/Control_flow_graph" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Control_flow_graph</a>>`_. Since we<br>

-don't have any control flow, our functions will only contain one block<br>

-at this point. We'll fix this in `Chapter 5 <LangImpl5.html>`_ :).<br>

-<br>

-Next we add the function arguments to the NamedValues map (after first clearing<br>

-it out) so that they're accessible to ``VariableExprAST`` nodes.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      if (Value *RetVal = Body->codegen()) {<br>

-        // Finish off the function.<br>

-        Builder.CreateRet(RetVal);<br>

-<br>

-        // Validate the generated code, checking for consistency.<br>

-        verifyFunction(*TheFunction);<br>

-<br>

-        return TheFunction;<br>

-      }<br>

-<br>

-Once the insertion point has been set up and the NamedValues map populated,<br>

-we call the ``codegen()`` method for the root expression of the function. If no<br>

-error happens, this emits code to compute the expression into the entry block<br>

-and returns the value that was computed. Assuming no error, we then create an<br>

-LLVM `ret instruction <../LangRef.html#ret-instruction>`_, which completes the function.<br>

-Once the function is built, we call ``verifyFunction``, which is<br>

-provided by LLVM. This function does a variety of consistency checks on<br>

-the generated code, to determine if our compiler is doing everything<br>

-right. Using this is important: it can catch a lot of bugs. Once the<br>

-function is finished and validated, we return it.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Error reading body, remove function.<br>

-      TheFunction->eraseFromParent();<br>

-      return nullptr;<br>

-    }<br>

-<br>

-The only piece left here is handling of the error case. For simplicity,<br>

-we handle this by merely deleting the function we produced with the<br>

-``eraseFromParent`` method. This allows the user to redefine a function<br>

-that they incorrectly typed in before: if we didn't delete it, it would<br>

-live in the symbol table, with a body, preventing future redefinition.<br>

-<br>

-This code does have a bug, though: If the ``FunctionAST::codegen()`` method<br>

-finds an existing IR Function, it does not validate its signature against the<br>

-definition's own prototype. This means that an earlier 'extern' declaration will<br>

-take precedence over the function definition's signature, which can cause<br>

-codegen to fail, for instance if the function arguments are named differently.<br>

-There are a number of ways to fix this bug; see what you can come up with! Here<br>

-is a testcase:<br>

-<br>

-::<br>

-<br>

-    extern foo(a);     # ok, defines foo.<br>

-    def foo(b) b;      # Error: Unknown variable name. (decl using 'a' takes precedence).<br>

-<br>

-Driver Changes and Closing Thoughts<br>

-===================================<br>

-<br>

-For now, code generation to LLVM doesn't really get us much, except that<br>

-we can look at the pretty IR calls. The sample code inserts calls to<br>

-codegen into the "``HandleDefinition``", "``HandleExtern``" etc<br>

-functions, and then dumps out the LLVM IR. This gives a nice way to look<br>

-at the LLVM IR for simple functions. For example:<br>

-<br>

-::<br>

-<br>

-    ready> 4+5;<br>

-    Read top-level expression:<br>

-    define double @0() {<br>

-    entry:<br>

-      ret double 9.000000e+00<br>

-    }<br>

-<br>

-Note how the parser turns the top-level expression into anonymous<br>

-functions for us. This will be handy when we add `JIT<br>

-support <LangImpl4.html#adding-a-jit-compiler>`_ in the next chapter. Also note that the<br>

-code is very literally transcribed, no optimizations are being performed<br>

-except simple constant folding done by IRBuilder. We will `add<br>

-optimizations <LangImpl4.html#trivial-constant-folding>`_ explicitly in the next<br>

-chapter.<br>

-<br>

-::<br>

-<br>

-    ready> def foo(a b) a*a + 2*a*b + b*b;<br>

-    Read function definition:<br>

-    define double @foo(double %a, double %b) {<br>

-    entry:<br>

-      %multmp = fmul double %a, %a<br>

-      %multmp1 = fmul double 2.000000e+00, %a<br>

-      %multmp2 = fmul double %multmp1, %b<br>

-      %addtmp = fadd double %multmp, %multmp2<br>

-      %multmp3 = fmul double %b, %b<br>

-      %addtmp4 = fadd double %addtmp, %multmp3<br>

-      ret double %addtmp4<br>

-    }<br>

-<br>

-This shows some simple arithmetic. Notice the striking similarity to the<br>

-LLVM builder calls that we use to create the instructions.<br>

-<br>

-::<br>

-<br>

-    ready> def bar(a) foo(a, 4.0) + bar(31337);<br>

-    Read function definition:<br>

-    define double @bar(double %a) {<br>

-    entry:<br>

-      %calltmp = call double @foo(double %a, double 4.000000e+00)<br>

-      %calltmp1 = call double @bar(double 3.133700e+04)<br>

-      %addtmp = fadd double %calltmp, %calltmp1<br>

-      ret double %addtmp<br>

-    }<br>

-<br>

-This shows some function calls. Note that this function will take a long<br>

-time to execute if you call it. In the future we'll add conditional<br>

-control flow to actually make recursion useful :).<br>

-<br>

-::<br>

-<br>

-    ready> extern cos(x);<br>

-    Read extern:<br>

-    declare double @cos(double)<br>

-<br>

-    ready> cos(1.234);<br>

-    Read top-level expression:<br>

-    define double @1() {<br>

-    entry:<br>

-      %calltmp = call double @cos(double 1.234000e+00)<br>

-      ret double %calltmp<br>

-    }<br>

-<br>

-This shows an extern for the libm "cos" function, and a call to it.<br>

-<br>

-.. TODO:: Abandon Pygments' horrible `llvm` lexer. It just totally gives up<br>

-   on highlighting this due to the first line.<br>

-<br>

-::<br>

-<br>

-    ready> ^D<br>

-    ; ModuleID = 'my cool jit'<br>

-<br>

-    define double @0() {<br>

-    entry:<br>

-      %addtmp = fadd double 4.000000e+00, 5.000000e+00<br>

-      ret double %addtmp<br>

-    }<br>

-<br>

-    define double @foo(double %a, double %b) {<br>

-    entry:<br>

-      %multmp = fmul double %a, %a<br>

-      %multmp1 = fmul double 2.000000e+00, %a<br>

-      %multmp2 = fmul double %multmp1, %b<br>

-      %addtmp = fadd double %multmp, %multmp2<br>

-      %multmp3 = fmul double %b, %b<br>

-      %addtmp4 = fadd double %addtmp, %multmp3<br>

-      ret double %addtmp4<br>

-    }<br>

-<br>

-    define double @bar(double %a) {<br>

-    entry:<br>

-      %calltmp = call double @foo(double %a, double 4.000000e+00)<br>

-      %calltmp1 = call double @bar(double 3.133700e+04)<br>

-      %addtmp = fadd double %calltmp, %calltmp1<br>

-      ret double %addtmp<br>

-    }<br>

-<br>

-    declare double @cos(double)<br>

-<br>

-    define double @1() {<br>

-    entry:<br>

-      %calltmp = call double @cos(double 1.234000e+00)<br>

-      ret double %calltmp<br>

-    }<br>

-<br>

-When you quit the current demo, it dumps out the IR for the entire<br>

-module generated. Here you can see the big picture with all the<br>

-functions referencing each other.<br>

-<br>

-This wraps up the third chapter of the Kaleidoscope tutorial. Up next,<br>

-we'll describe how to `add JIT codegen and optimizer<br>

-support <LangImpl4.html>`_ to this so we can actually start running<br>

-code!<br>

-<br>

-Full Code Listing<br>

-=================<br>

-<br>

-Here is the complete code listing for our running example, enhanced with<br>

-the LLVM code generator. Because this uses the LLVM libraries, we need<br>

-to link them in. To do this, we use the<br>

-`llvm-config <<a href="http://llvm.org/cmds/llvm-config.html" rel="noreferrer" target="_blank">http://llvm.org/cmds/llvm-config.html</a>>`_ tool to inform<br>

-our makefile/command line about which options to use:<br>

-<br>

-.. code-block:: bash<br>

-<br>

-    # Compile<br>

-    clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core` -o toy<br>

-    # Run<br>

-    ./toy<br>

-<br>

-Here is the code:<br>

-<br>

-.. literalinclude:: ../../examples/Kaleidoscope/Chapter3/toy.cpp<br>

-   :language: c++<br>

-<br>

-`Next: Adding JIT and Optimizer Support <LangImpl4.html>`_<br>

-<br>

<br>

Removed: llvm/trunk/docs/tutorial/LangImpl4.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl4.rst?rev=274440&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl4.rst?rev=274440&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl4.rst (original)<br>

+++ llvm/trunk/docs/tutorial/LangImpl4.rst (removed)<br>

@@ -1,610 +0,0 @@<br>

-==============================================<br>

-Kaleidoscope: Adding JIT and Optimizer Support<br>

-==============================================<br>

-<br>

-.. contents::<br>

-   :local:<br>

-<br>

-Chapter 4 Introduction<br>

-======================<br>

-<br>

-Welcome to Chapter 4 of the "`Implementing a language with<br>

-LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation<br>

-of a simple language and added support for generating LLVM IR. This<br>

-chapter describes two new techniques: adding optimizer support to your<br>

-language, and adding JIT compiler support. These additions will<br>

-demonstrate how to get nice, efficient code for the Kaleidoscope<br>

-language.<br>

-<br>

-Trivial Constant Folding<br>

-========================<br>

-<br>

-Our demonstration for Chapter 3 is elegant and easy to extend.<br>

-Unfortunately, it does not produce wonderful code. The IRBuilder,<br>

-however, does give us obvious optimizations when compiling simple code:<br>

-<br>

-::<br>

-<br>

-    ready> def test(x) 1+2+x;<br>

-    Read function definition:<br>

-    define double @test(double %x) {<br>

-    entry:<br>

-            %addtmp = fadd double 3.000000e+00, %x<br>

-            ret double %addtmp<br>

-    }<br>

-<br>

-This code is not a literal transcription of the AST built by parsing the<br>

-input. That would be:<br>

-<br>

-::<br>

-<br>

-    ready> def test(x) 1+2+x;<br>

-    Read function definition:<br>

-    define double @test(double %x) {<br>

-    entry:<br>

-            %addtmp = fadd double 2.000000e+00, 1.000000e+00<br>

-            %addtmp1 = fadd double %addtmp, %x<br>

-            ret double %addtmp1<br>

-    }<br>

-<br>

-Constant folding, as seen above, is a very common and<br>

-very important optimization: so much so that many language implementors<br>

-implement constant folding support in their AST representation.<br>

-<br>

-With LLVM, you don't need this support in the AST. Since all calls to<br>

-build LLVM IR go through the LLVM IR builder, the builder itself checks<br>

-to see if there is a constant folding opportunity when you call it. If<br>

-so, it just does the constant fold and returns the constant instead of<br>

-creating an instruction.<br>

-<br>

-Well, that was easy :). In practice, we recommend always using<br>

-``IRBuilder`` when generating code like this. It has no "syntactic<br>

-overhead" for its use (you don't have to uglify your compiler with<br>

-constant checks everywhere) and it can dramatically reduce the amount of<br>

-LLVM IR that is generated in some cases (particularly for languages with a<br>

-macro preprocessor or that use a lot of constants).<br>

-<br>

-On the other hand, the ``IRBuilder`` is limited by the fact that it does<br>

-all of its analysis inline with the code as it is built. If you take a<br>

-slightly more complex example:<br>

-<br>

-::<br>

-<br>

-    ready> def test(x) (1+2+x)*(x+(1+2));<br>

-    ready> Read function definition:<br>

-    define double @test(double %x) {<br>

-    entry:<br>

-            %addtmp = fadd double 3.000000e+00, %x<br>

-            %addtmp1 = fadd double %x, 3.000000e+00<br>

-            %multmp = fmul double %addtmp, %addtmp1<br>

-            ret double %multmp<br>

-    }<br>

-<br>

-In this case, the LHS and RHS of the multiplication are the same value.<br>

-We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"<br>

-instead of computing "``x+3``" twice.<br>

-<br>

-Unfortunately, no amount of local analysis will be able to detect and<br>

-correct this. This requires two transformations: reassociation of<br>

-expressions (to make the adds lexically identical) and Common<br>

-Subexpression Elimination (CSE) to delete the redundant add instruction.<br>

-Fortunately, LLVM provides a broad range of optimizations that you can<br>

-use, in the form of "passes".<br>

-<br>

-LLVM Optimization Passes<br>

-========================<br>

-<br>

-LLVM provides many optimization passes, which do many different sorts of<br>

-things and have different tradeoffs. Unlike other systems, LLVM doesn't<br>

-hold to the mistaken notion that one set of optimizations is right for<br>

-all languages and for all situations. LLVM allows a compiler implementor<br>

-to make complete decisions about what optimizations to use, in which<br>

-order, and in what situation.<br>

-<br>

-As a concrete example, LLVM supports "whole module" passes, which<br>

-look across as large a body of code as they can (often a whole file,<br>

-but if run at link time, this can be a substantial portion of the whole<br>

-program). It also supports and includes "per-function" passes which just<br>

-operate on a single function at a time, without looking at other<br>

-functions. For more information on passes and how they are run, see the<br>

-`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the<br>

-`List of LLVM Passes <../Passes.html>`_.<br>

-<br>

-For Kaleidoscope, we are currently generating functions on the fly, one<br>

-at a time, as the user types them in. We aren't shooting for the<br>

-ultimate optimization experience in this setting, but we also want to<br>

-catch the easy and quick stuff where possible. As such, we will choose<br>

-to run a few per-function optimizations as the user types the function<br>

-in. If we wanted to make a "static Kaleidoscope compiler", we would use<br>

-exactly the code we have now, except that we would defer running the<br>

-optimizer until the entire file has been parsed.<br>

-<br>

-In order to get per-function optimizations going, we need to set up a<br>

-`FunctionPassManager <../WritingAnLLVMPass.html#what-passmanager-doesr>`_ to hold<br>

-and organize the LLVM optimizations that we want to run. Once we have<br>

-that, we can add a set of optimizations to run. We'll need a new<br>

-FunctionPassManager for each module that we want to optimize, so we'll<br>

-write a function to create and initialize both the module and pass manager<br>

-for us:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    void InitializeModuleAndPassManager(void) {<br>

-      // Open a new module.<br>

-      Context LLVMContext;<br>

-      TheModule = llvm::make_unique<Module>("my cool jit", LLVMContext);<br>

-      TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());<br>

-<br>

-      // Create a new pass manager attached to it.<br>

-      TheFPM = llvm::make_unique<FunctionPassManager>(TheModule.get());<br>

-<br>

-      // Provide basic AliasAnalysis support for GVN.<br>

-      TheFPM->add(createBasicAliasAnalysisPass());<br>

-      // Do simple "peephole" optimizations and bit-twiddling optzns.<br>

-      TheFPM->add(createInstructionCombiningPass());<br>

-      // Reassociate expressions.<br>

-      TheFPM->add(createReassociatePass());<br>

-      // Eliminate Common SubExpressions.<br>

-      TheFPM->add(createGVNPass());<br>

-      // Simplify the control flow graph (deleting unreachable blocks, etc).<br>

-      TheFPM->add(createCFGSimplificationPass());<br>

-<br>

-      TheFPM->doInitialization();<br>

-    }<br>

-<br>

-This code initializes the global module ``TheModule``, and the function pass<br>

-manager ``TheFPM``, which is attached to ``TheModule``. Once the pass manager is<br>

-set up, we use a series of "add" calls to add a bunch of LLVM passes.<br>

-<br>

-In this case, we choose to add five passes: one analysis pass (alias analysis),<br>

-and four optimization passes. The passes we choose here are a pretty standard set<br>

-of "cleanup" optimizations that are useful for a wide variety of code. I won't<br>

-delve into what they do but, believe me, they are a good starting place :).<br>

-<br>

-Once the PassManager is set up, we need to make use of it. We do this by<br>

-running it after our newly created function is constructed (in<br>

-``FunctionAST::codegen()``), but before it is returned to the client:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      if (Value *RetVal = Body->codegen()) {<br>

-        // Finish off the function.<br>

-        Builder.CreateRet(RetVal);<br>

-<br>

-        // Validate the generated code, checking for consistency.<br>

-        verifyFunction(*TheFunction);<br>

-<br>

-        // Optimize the function.<br>

-        TheFPM->run(*TheFunction);<br>

-<br>

-        return TheFunction;<br>

-      }<br>

-<br>

-As you can see, this is pretty straightforward. The<br>

-``FunctionPassManager`` optimizes and updates the LLVM Function\* in<br>

-place, improving (hopefully) its body. With this in place, we can try<br>

-our test above again:<br>

-<br>

-::<br>

-<br>

-    ready> def test(x) (1+2+x)*(x+(1+2));<br>

-    ready> Read function definition:<br>

-    define double @test(double %x) {<br>

-    entry:<br>

-            %addtmp = fadd double %x, 3.000000e+00<br>

-            %multmp = fmul double %addtmp, %addtmp<br>

-            ret double %multmp<br>

-    }<br>

-<br>

-As expected, we now get our nicely optimized code, saving a floating<br>

-point add instruction from every execution of this function.<br>

-<br>

-LLVM provides a wide variety of optimizations that can be used in<br>

-certain circumstances. Some `documentation about the various<br>

-passes <../Passes.html>`_ is available, but it isn't very complete.<br>

-Another good source of ideas can come from looking at the passes that<br>

-``Clang`` runs to get started. The "``opt``" tool allows you to<br>

-experiment with passes from the command line, so you can see if they do<br>

-anything.<br>

-<br>

-Now that we have reasonable code coming out of our front-end, let's talk<br>

-about executing it!<br>

-<br>

-Adding a JIT Compiler<br>

-=====================<br>

-<br>

-Code that is available in LLVM IR can have a wide variety of tools<br>

-applied to it. For example, you can run optimizations on it (as we did<br>

-above), you can dump it out in textual or binary forms, you can compile<br>

-the code to an assembly file (.s) for some target, or you can JIT<br>

-compile it. The nice thing about the LLVM IR representation is that it<br>

-is the "common currency" between many different parts of the compiler.<br>

-<br>

-In this section, we'll add JIT compiler support to our interpreter. The<br>

-basic idea that we want for Kaleidoscope is to have the user enter<br>

-function bodies as they do now, but immediately evaluate the top-level<br>

-expressions they type in. For example, if they type in "1 + 2;", we<br>

-should evaluate and print out 3. If they define a function, they should<br>

-be able to call it from the command line.<br>

-<br>

-In order to do this, we first declare and initialize the JIT. This is<br>

-done by adding a global variable ``TheJIT``, and initializing it in<br>

-``main``:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    static std::unique_ptr<KaleidoscopeJIT> TheJIT;<br>

-    ...<br>

-    int main() {<br>

-      ..<br>

-      TheJIT = llvm::make_unique<KaleidoscopeJIT>();<br>

-<br>

-      // Run the main "interpreter loop" now.<br>

-      MainLoop();<br>

-<br>

-      return 0;<br>

-    }<br>

-<br>

-The KaleidoscopeJIT class is a simple JIT built specifically for these<br>

-tutorials. In later chapters we will look at how it works and extend it with<br>

-new features, but for now we will take it as given. Its API is very simple:<br>

-``addModule`` adds an LLVM IR module to the JIT, making its functions<br>

-available for execution; ``removeModule`` removes a module, freeing any<br>

-memory associated with the code in that module; and ``findSymbol`` allows us<br>

-to look up pointers to the compiled code.<br>

-<br>

-We can take this simple API and change our code that parses top-level expressions to<br>

-look like this:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    static void HandleTopLevelExpression() {<br>

-      // Evaluate a top-level expression into an anonymous function.<br>

-      if (auto FnAST = ParseTopLevelExpr()) {<br>

-        if (FnAST->codegen()) {<br>

-<br>

-          // JIT the module containing the anonymous expression, keeping a handle so<br>

-          // we can free it later.<br>

-          auto H = TheJIT->addModule(std::move(TheModule));<br>

-          InitializeModuleAndPassManager();<br>

-<br>

-          // Search the JIT for the __anon_expr symbol.<br>

-          auto ExprSymbol = TheJIT->findSymbol("__anon_expr");<br>

-          assert(ExprSymbol && "Function not found");<br>

-<br>

-          // Get the symbol's address and cast it to the right type (takes no<br>

-          // arguments, returns a double) so we can call it as a native function.<br>

-          double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();<br>

-          fprintf(stderr, "Evaluated to %f\n", FP());<br>

-<br>

-          // Delete the anonymous expression module from the JIT.<br>

-          TheJIT->removeModule(H);<br>

-        }<br>

-<br>

-If parsing and codegen succeed, the next step is to add the module containing<br>

-the top-level expression to the JIT. We do this by calling addModule, which<br>

-triggers code generation for all the functions in the module, and returns a<br>

-handle that can be used to remove the module from the JIT later. Once the module<br>

-has been added to the JIT it can no longer be modified, so we also open a new<br>

-module to hold subsequent code by calling ``InitializeModuleAndPassManager()``.<br>

-<br>

-Once we've added the module to the JIT we need to get a pointer to the final<br>

-generated code. We do this by calling the JIT's findSymbol method, and passing<br>

-the name of the top-level expression function: ``__anon_expr``. Since we just<br>

-added this function, we assert that findSymbol returned a result.<br>

-<br>

-Next, we get the in-memory address of the ``__anon_expr`` function by calling<br>

-``getAddress()`` on the symbol. Recall that we compile top-level expressions<br>

-into a self-contained LLVM function that takes no arguments and returns the<br>

-computed double. Because the LLVM JIT compiler matches the native platform ABI,<br>

-this means that you can just cast the result pointer to a function pointer of<br>

-that type and call it directly. This means there is no difference between JIT<br>

-compiled code and native machine code that is statically linked into your<br>

-application.<br>

-<br>

-Finally, since we don't support re-evaluation of top-level expressions, we<br>

-remove the module from the JIT when we're done to free the associated memory.<br>

-Recall, however, that the module we created a few lines earlier (via<br>

-``InitializeModuleAndPassManager``) is still open and waiting for new code to be<br>

-added.<br>

-<br>

-With just these two changes, let's see how Kaleidoscope works now!<br>

-<br>

-::<br>

-<br>

-    ready> 4+5;<br>

-    Read top-level expression:<br>

-    define double @0() {<br>

-    entry:<br>

-      ret double 9.000000e+00<br>

-    }<br>

-<br>

-    Evaluated to 9.000000<br>

-<br>

-Well, this looks like it is basically working. The dump of the function<br>

-shows the "no argument function that always returns double" that we<br>

-synthesize for each top-level expression that is typed in. This<br>

-demonstrates very basic functionality, but can we do more?<br>

-<br>

-::<br>

-<br>

-    ready> def testfunc(x y) x + y*2;<br>

-    Read function definition:<br>

-    define double @testfunc(double %x, double %y) {<br>

-    entry:<br>

-      %multmp = fmul double %y, 2.000000e+00<br>

-      %addtmp = fadd double %multmp, %x<br>

-      ret double %addtmp<br>

-    }<br>

-<br>

-    ready> testfunc(4, 10);<br>

-    Read top-level expression:<br>

-    define double @1() {<br>

-    entry:<br>

-      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)<br>

-      ret double %calltmp<br>

-    }<br>

-<br>

-    Evaluated to 24.000000<br>

-<br>

-    ready> testfunc(5, 10);<br>

-    ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!<br>

-<br>

-<br>

-Function definitions and calls also work, but something went very wrong on that<br>

-last line. The call looks valid, so what happened? As you may have guessed from<br>

-the API, a Module is a unit of allocation for the JIT, and testfunc was part<br>

-of the same module that contained the anonymous expression. When we removed that<br>

-module from the JIT to free the memory for the anonymous expression, we deleted<br>

-the definition of ``testfunc`` along with it. Then, when we tried to call<br>

-testfunc a second time, the JIT could no longer find it.<br>

-<br>

-The easiest way to fix this is to put the anonymous expression in a separate<br>

-module from the rest of the function definitions. The JIT will happily resolve<br>

-function calls across module boundaries, as long as each of the functions called<br>

-has a prototype, and is added to the JIT before it is called. By putting the<br>

-anonymous expression in a different module we can delete it without affecting<br>

-the rest of the functions.<br>

-<br>

-In fact, we're going to go a step further and put every function in its own<br>

-module. Doing so allows us to exploit a useful property of the KaleidoscopeJIT<br>

-that will make our environment more REPL-like: Functions can be added to the<br>

-JIT more than once (unlike a module where every function must have a unique<br>

-definition). When you look up a symbol in KaleidoscopeJIT it will always return<br>

-the most recent definition:<br>

-<br>

-::<br>

-<br>

-    ready> def foo(x) x + 1;<br>

-    Read function definition:<br>

-    define double @foo(double %x) {<br>

-    entry:<br>

-      %addtmp = fadd double %x, 1.000000e+00<br>

-      ret double %addtmp<br>

-    }<br>

-<br>

-    ready> foo(2);<br>

-    Evaluated to 3.000000<br>

-<br>

-    ready> def foo(x) x + 2;<br>

-    define double @foo(double %x) {<br>

-    entry:<br>

-      %addtmp = fadd double %x, 2.000000e+00<br>

-      ret double %addtmp<br>

-    }<br>

-<br>

-    ready> foo(2);<br>

-    Evaluated to 4.000000<br>

-<br>

-<br>

-To allow each function to live in its own module we'll need a way to<br>

-re-generate previous function declarations into each new module we open:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    static std::unique_ptr<KaleidoscopeJIT> TheJIT;<br>

-<br>

-    ...<br>

-<br>

-    Function *getFunction(std::string Name) {<br>

-      // First, see if the function has already been added to the current module.<br>

-      if (auto *F = TheModule->getFunction(Name))<br>

-        return F;<br>

-<br>

-      // If not, check whether we can codegen the declaration from some existing<br>

-      // prototype.<br>

-      auto FI = FunctionProtos.find(Name);<br>

-      if (FI != FunctionProtos.end())<br>

-        return FI->second->codegen();<br>

-<br>

-      // If no existing prototype exists, return null.<br>

-      return nullptr;<br>

-    }<br>

-<br>

-    ...<br>

-<br>

-    Value *CallExprAST::codegen() {<br>

-      // Look up the name in the global module table.<br>

-      Function *CalleeF = getFunction(Callee);<br>

-<br>

-    ...<br>

-<br>

-    Function *FunctionAST::codegen() {<br>

-      // Transfer ownership of the prototype to the FunctionProtos map, but keep a<br>

-      // reference to it for use below.<br>

-      auto &P = *Proto;<br>

-      FunctionProtos[Proto->getName()] = std::move(Proto);<br>

-      Function *TheFunction = getFunction(P.getName());<br>

-      if (!TheFunction)<br>

-        return nullptr;<br>

-<br>

-<br>

-To enable this, we'll start by adding a new global, ``FunctionProtos``, that<br>

-holds the most recent prototype for each function. We'll also add a convenience<br>

-method, ``getFunction()``, to replace calls to ``TheModule->getFunction()``.<br>

-Our convenience method searches ``TheModule`` for an existing function<br>

-declaration, falling back to generating a new declaration from FunctionProtos if<br>

-it doesn't find one. In ``CallExprAST::codegen()`` we just need to replace the<br>

-call to ``TheModule->getFunction()``. In ``FunctionAST::codegen()`` we need to<br>

-update the FunctionProtos map first, then call ``getFunction()``. With this<br>

-done, we can always obtain a function declaration in the current module for any<br>

-previously declared function.<br>
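
-<br>

-The lookup order described above can be pictured with a small stand-alone<br>

-sketch. It does not use LLVM at all: ``ToyModule``, ``ToyProtos`` and<br>

-``toyGetFunction`` are hypothetical stand-ins invented for illustration, with<br>

-a module reduced to a set of declared names and a prototype reduced to an<br>

-argument count.<br>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins (not LLVM types): a "module" is the set of names
// declared in it, and a "prototype" is just an argument count.
using ToyModule = std::set<std::string>;
static std::map<std::string, unsigned> ToyProtos; // latest prototype per name

// Same lookup order as getFunction(): check the current module first, then
// re-create the declaration from the saved prototype, otherwise fail.
bool toyGetFunction(ToyModule &M, const std::string &Name) {
  if (M.count(Name))
    return true; // already declared in the current module

  auto It = ToyProtos.find(Name);
  if (It != ToyProtos.end()) {
    M.insert(Name); // stands in for "codegen the declaration into M"
    return true;
  }

  return false; // no prototype known for this name
}
```

-In this model a fresh, empty module can still "see" any name recorded in<br>

-``ToyProtos``, which is the property the real code relies on after each<br>

-module is handed to the JIT.<br>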

-<br>

-We also need to update HandleDefinition and HandleExtern:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    static void HandleDefinition() {<br>

-      if (auto FnAST = ParseDefinition()) {<br>

-        if (auto *FnIR = FnAST->codegen()) {<br>

-          fprintf(stderr, "Read function definition:");<br>

-          FnIR->dump();<br>

-          TheJIT->addModule(std::move(TheModule));<br>

-          InitializeModuleAndPassManager();<br>

-        }<br>

-      } else {<br>

-        // Skip token for error recovery.<br>

-        getNextToken();<br>

-      }<br>

-    }<br>

-<br>

-    static void HandleExtern() {<br>

-      if (auto ProtoAST = ParseExtern()) {<br>

-        if (auto *FnIR = ProtoAST->codegen()) {<br>

-          fprintf(stderr, "Read extern: ");<br>

-          FnIR->dump();<br>

-          FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);<br>

-        }<br>

-      } else {<br>

-        // Skip token for error recovery.<br>

-        getNextToken();<br>

-      }<br>

-    }<br>

-<br>

-In HandleDefinition, we add two lines to transfer the newly defined function to<br>

-the JIT and open a new module. In HandleExtern, we just need to add one line to<br>

-add the prototype to FunctionProtos.<br>

-<br>

-With these changes made, let's try our REPL again (I removed the dump of the<br>

-anonymous functions this time, you should get the idea by now :) :<br>

-<br>

-::<br>

-<br>

-    ready> def foo(x) x + 1;<br>

-    ready> foo(2);<br>

-    Evaluated to 3.000000<br>

-<br>

-    ready> def foo(x) x + 2;<br>

-    ready> foo(2);<br>

-    Evaluated to 4.000000<br>

-<br>

-It works!<br>

-<br>

-Even with this simple code, we get some surprisingly powerful capabilities -<br>

-check this out:<br>

-<br>

-::<br>

-<br>

-    ready> extern sin(x);<br>

-    Read extern:<br>

-    declare double @sin(double)<br>

-<br>

-    ready> extern cos(x);<br>

-    Read extern:<br>

-    declare double @cos(double)<br>

-<br>

-    ready> sin(1.0);<br>

-    Read top-level expression:<br>

-    define double @2() {<br>

-    entry:<br>

-      ret double 0x3FEAED548F090CEE<br>

-    }<br>

-<br>

-    Evaluated to 0.841471<br>

-<br>

-    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);<br>

-    Read function definition:<br>

-    define double @foo(double %x) {<br>

-    entry:<br>

-      %calltmp = call double @sin(double %x)<br>

-      %multmp = fmul double %calltmp, %calltmp<br>

-      %calltmp2 = call double @cos(double %x)<br>

-      %multmp4 = fmul double %calltmp2, %calltmp2<br>

-      %addtmp = fadd double %multmp, %multmp4<br>

-      ret double %addtmp<br>

-    }<br>

-<br>

-    ready> foo(4.0);<br>

-    Read top-level expression:<br>

-    define double @3() {<br>

-    entry:<br>

-      %calltmp = call double @foo(double 4.000000e+00)<br>

-      ret double %calltmp<br>

-    }<br>

-<br>

-    Evaluated to 1.000000<br>

-<br>

-Whoa, how does the JIT know about sin and cos? The answer is surprisingly<br>

-simple: The KaleidoscopeJIT has a straightforward symbol resolution rule that<br>

-it uses to find symbols that aren't available in any given module: First<br>

-it searches all the modules that have already been added to the JIT, from the<br>

-most recent to the oldest, to find the newest definition. If no definition is<br>

-found inside the JIT, it falls back to calling "``dlsym("sin")``" on the<br>

-Kaleidoscope process itself. Since "``sin``" is defined within the JIT's<br>

-address space, it simply patches up calls in the module to call the libm<br>

-version of ``sin`` directly.<br>
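
-<br>

-As a rough illustration of that resolution order, here is a stand-alone<br>

-sketch. It is not the KaleidoscopeJIT implementation: ``ToySymbols`` and<br>

-``toyResolve`` are hypothetical names, symbols are plain integers rather than<br>

-addresses, and an ordinary map plays the role of the ``dlsym`` fallback.<br>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model: each module maps symbol names to an address-like id.
using ToySymbols = std::map<std::string, int>;

// Search the added modules from newest to oldest; if none defines the symbol,
// fall back to the symbols exported by the host process. Returns nullptr when
// the symbol cannot be found anywhere.
const int *toyResolve(const std::vector<ToySymbols> &Modules,
                      const ToySymbols &Process, const std::string &Name) {
  for (auto M = Modules.rbegin(); M != Modules.rend(); ++M) {
    auto It = M->find(Name);
    if (It != M->end())
      return &It->second; // newest definition wins
  }

  auto It = Process.find(Name);
  if (It != Process.end())
    return &It->second; // found in the process itself

  return nullptr;
}
```

-Searching newest-first is what lets the second ``def foo`` in the earlier<br>

-session shadow the first one.<br>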

-<br>

-In the future we'll see how tweaking this symbol resolution rule can be used to<br>

-enable all sorts of useful features, from security (restricting the set of<br>

-symbols available to JIT'd code), to dynamic code generation based on symbol<br>

-names, and even lazy compilation.<br>

-<br>

-One immediate benefit of the symbol resolution rule is that we can now extend<br>

-the language by writing arbitrary C++ code to implement operations. For example,<br>

-if we add:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// putchard - putchar that takes a double and returns 0.<br>

-    extern "C" double putchard(double X) {<br>

-      fputc((char)X, stderr);<br>

-      return 0;<br>

-    }<br>

-<br>

-Now we can produce simple output to the console by using things like:<br>

-"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'<br>

-on the console (120 is the ASCII code for 'x'). Similar code could be<br>

-used to implement file I/O, console input, and many other capabilities<br>

-in Kaleidoscope.<br>

-<br>

-This completes the JIT and optimizer chapter of the Kaleidoscope<br>

-tutorial. At this point, we can compile a non-Turing-complete<br>

-programming language, optimize and JIT compile it in a user-driven way.<br>

-Next up we'll look into `extending the language with control flow<br>

-constructs <LangImpl5.html>`_, tackling some interesting LLVM IR issues<br>

-along the way.<br>

-<br>

-Full Code Listing<br>

-=================<br>

-<br>

-Here is the complete code listing for our running example, enhanced with<br>

-the LLVM JIT and optimizer. To build this example, use:<br>

-<br>

-.. code-block:: bash<br>

-<br>

-    # Compile<br>

-    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy<br>

-    # Run<br>

-    ./toy<br>

-<br>

-If you are compiling this on Linux, make sure to add the "-rdynamic"<br>

-option as well. This makes sure that the external functions are resolved<br>

-properly at runtime.<br>

-<br>

-Here is the code:<br>

-<br>

-.. literalinclude:: ../../examples/Kaleidoscope/Chapter4/toy.cpp<br>

-   :language: c++<br>

-<br>

-`Next: Extending the language: control flow <LangImpl5.html>`_<br>

-<br>

<br>

Removed: llvm/trunk/docs/tutorial/LangImpl5-cfg.png<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl5-cfg.png?rev=274440&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl5-cfg.png?rev=274440&view=auto</a><br>

==============================================================================<br>

Binary file - no diff available.<br>

<br>

Removed: llvm/trunk/docs/tutorial/LangImpl5.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl5.rst?rev=274440&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl5.rst?rev=274440&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl5.rst (original)<br>

+++ llvm/trunk/docs/tutorial/LangImpl5.rst (removed)<br>

@@ -1,790 +0,0 @@<br>

-==================================================<br>

-Kaleidoscope: Extending the Language: Control Flow<br>

-==================================================<br>

-<br>

-.. contents::<br>

-   :local:<br>

-<br>

-Chapter 5 Introduction<br>

-======================<br>

-<br>

-Welcome to Chapter 5 of the "`Implementing a language with<br>

-LLVM <index.html>`_" tutorial. Parts 1-4 described the implementation of<br>

-the simple Kaleidoscope language and included support for generating<br>

-LLVM IR, followed by optimizations and a JIT compiler. Unfortunately, as<br>

-presented, Kaleidoscope is mostly useless: it has no control flow other<br>

-than call and return. This means that you can't have conditional<br>

-branches in the code, significantly limiting its power. In this episode<br>

-of "build that compiler", we'll extend Kaleidoscope to have an<br>

-if/then/else expression plus a simple 'for' loop.<br>

-<br>

-If/Then/Else<br>

-============<br>

-<br>

-Extending Kaleidoscope to support if/then/else is quite straightforward.<br>

-It basically requires adding support for this "new" concept to the<br>

-lexer, parser, AST, and LLVM code emitter. This example is nice, because<br>

-it shows how easy it is to "grow" a language over time, incrementally<br>

-extending it as new ideas are discovered.<br>

-<br>

-Before we get going on "how" we add this extension, let's talk about<br>

-"what" we want. The basic idea is that we want to be able to write this<br>

-sort of thing:<br>

-<br>

-::<br>

-<br>

-    def fib(x)<br>

-      if x < 3 then<br>

-        1<br>

-      else<br>

-        fib(x-1)+fib(x-2);<br>

-<br>

-In Kaleidoscope, every construct is an expression: there are no<br>

-statements. As such, the if/then/else expression needs to return a value<br>

-like any other. Since we're using a mostly functional form, we'll have<br>

-it evaluate its conditional, then return the 'then' or 'else' value<br>

-based on how the condition was resolved. This is very similar to the C<br>

-"?:" expression.<br>

-<br>

-The semantics of the if/then/else expression is that it evaluates the<br>

-condition to a boolean value: 0.0 is considered to be false and<br>

-everything else is considered to be true. If the condition is true, the<br>

-first subexpression is evaluated and returned; if the condition is<br>

-false, the second subexpression is evaluated and returned. Since<br>

-Kaleidoscope allows side-effects, this behavior is important to nail<br>

-down.<br>

-<br>

-Now that we know what we "want", let's break this down into its<br>

-constituent pieces.<br>

-<br>

-Lexer Extensions for If/Then/Else<br>

----------------------------------<br>

-<br>

-The lexer extensions are straightforward. First we add new enum values<br>

-for the relevant tokens:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // control<br>

-      tok_if = -6,<br>

-      tok_then = -7,<br>

-      tok_else = -8,<br>

-<br>

-Once we have that, we recognize the new keywords in the lexer. This is<br>

-pretty simple stuff:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-        ...<br>

-        if (IdentifierStr == "def")<br>

-          return tok_def;<br>

-        if (IdentifierStr == "extern")<br>

-          return tok_extern;<br>

-        if (IdentifierStr == "if")<br>

-          return tok_if;<br>

-        if (IdentifierStr == "then")<br>

-          return tok_then;<br>

-        if (IdentifierStr == "else")<br>

-          return tok_else;<br>

-        return tok_identifier;<br>

-<br>

-AST Extensions for If/Then/Else<br>

--------------------------------<br>

-<br>

-To represent the new expression we add a new AST node for it:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// IfExprAST - Expression class for if/then/else.<br>

-    class IfExprAST : public ExprAST {<br>

-      std::unique_ptr<ExprAST> Cond, Then, Else;<br>

-<br>

-    public:<br>

-      IfExprAST(std::unique_ptr<ExprAST> Cond, std::unique_ptr<ExprAST> Then,<br>

-                std::unique_ptr<ExprAST> Else)<br>

-        : Cond(std::move(Cond)), Then(std::move(Then)), Else(std::move(Else)) {}<br>

-      virtual Value *codegen();<br>

-    };<br>

-<br>

-The AST node just has pointers to the various subexpressions.<br>

-<br>

-Parser Extensions for If/Then/Else<br>

-----------------------------------<br>

-<br>

-Now that we have the relevant tokens coming from the lexer and we have<br>

-the AST node to build, our parsing logic is relatively straightforward.<br>

-First we define a new parsing function:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// ifexpr ::= 'if' expression 'then' expression 'else' expression<br>

-    static std::unique_ptr<ExprAST> ParseIfExpr() {<br>

-      getNextToken();  // eat the if.<br>

-<br>

-      // condition.<br>

-      auto Cond = ParseExpression();<br>

-      if (!Cond)<br>

-        return nullptr;<br>

-<br>

-      if (CurTok != tok_then)<br>

-        return LogError("expected then");<br>

-      getNextToken();  // eat the then<br>

-<br>

-      auto Then = ParseExpression();<br>

-      if (!Then)<br>

-        return nullptr;<br>

-<br>

-      if (CurTok != tok_else)<br>

-        return LogError("expected else");<br>

-<br>

-      getNextToken();<br>

-<br>

-      auto Else = ParseExpression();<br>

-      if (!Else)<br>

-        return nullptr;<br>

-<br>

-      return llvm::make_unique<IfExprAST>(std::move(Cond), std::move(Then),<br>

-                                          std::move(Else));<br>

-    }<br>

-<br>

-Next we hook it up as a primary expression:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    static std::unique_ptr<ExprAST> ParsePrimary() {<br>

-      switch (CurTok) {<br>

-      default:<br>

-        return LogError("unknown token when expecting an expression");<br>

-      case tok_identifier:<br>

-        return ParseIdentifierExpr();<br>

-      case tok_number:<br>

-        return ParseNumberExpr();<br>

-      case '(':<br>

-        return ParseParenExpr();<br>

-      case tok_if:<br>

-        return ParseIfExpr();<br>

-      }<br>

-    }<br>

-<br>

-LLVM IR for If/Then/Else<br>

-------------------------<br>

-<br>

-Now that we have it parsing and building the AST, the final piece is<br>

-adding LLVM code generation support. This is the most interesting part<br>

-of the if/then/else example, because this is where it starts to<br>

-introduce new concepts. All of the code above has been thoroughly<br>

-described in previous chapters.<br>

-<br>

-To motivate the code we want to produce, let's take a look at a simple<br>

-example. Consider:<br>

-<br>

-::<br>

-<br>

-    extern foo();<br>

-    extern bar();<br>

-    def baz(x) if x then foo() else bar();<br>

-<br>

-If you disable optimizations, the code you'll (soon) get from<br>

-Kaleidoscope looks like this:<br>

-<br>

-.. code-block:: llvm<br>

-<br>

-    declare double @foo()<br>

-<br>

-    declare double @bar()<br>

-<br>

-    define double @baz(double %x) {<br>

-    entry:<br>

-      %ifcond = fcmp one double %x, 0.000000e+00<br>

-      br i1 %ifcond, label %then, label %else<br>

-<br>

-    then:       ; preds = %entry<br>

-      %calltmp = call double @foo()<br>

-      br label %ifcont<br>

-<br>

-    else:       ; preds = %entry<br>

-      %calltmp1 = call double @bar()<br>

-      br label %ifcont<br>

-<br>

-    ifcont:     ; preds = %else, %then<br>

-      %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]<br>

-      ret double %iftmp<br>

-    }<br>

-<br>

-To visualize the control flow graph, you can use a nifty feature of the<br>

-LLVM '`opt <<a href="http://llvm.org/cmds/opt.html" rel="noreferrer" target="_blank">http://llvm.org/cmds/opt.html</a>>`_' tool. If you put this LLVM<br>

-IR into "t.ll" and run "``llvm-as < t.ll | opt -analyze -view-cfg``", `a<br>

-window will pop up <../ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ and you'll<br>

-see this graph:<br>

-<br>

-.. figure:: LangImpl5-cfg.png<br>

-   :align: center<br>

-   :alt: Example CFG<br>

-<br>

-   Example CFG<br>

-<br>

-Another way to get this is to call "``F->viewCFG()``" or<br>

-"``F->viewCFGOnly()``" (where F is a "``Function*``") either by<br>

-inserting actual calls into the code and recompiling or by calling these<br>

-in the debugger. LLVM has many nice features for visualizing various<br>

-graphs.<br>

-<br>

-Getting back to the generated code, it is fairly simple: the entry block<br>

-evaluates the conditional expression ("x" in our case here) and compares<br>

-the result to 0.0 with the "``fcmp one``" instruction ('one' is "Ordered<br>

-and Not Equal"). Based on the result of this expression, the code jumps<br>

-to either the "then" or "else" blocks, which contain the expressions for<br>

-the true/false cases.<br>
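
-<br>

-The same behavior can be mirrored in plain C++ as a sanity check. This is<br>

-only a sketch: ``toyIsTrue`` and ``toyIfExpr`` are hypothetical helpers, not<br>

-part of the tutorial code. Note that because ``fcmp one`` is an *ordered*<br>

-comparison, a NaN condition selects the "else" branch in this model.<br>

```cpp
#include <cassert>
#include <cmath>

// Models `fcmp one X, 0.0`: true only when X is ordered (not NaN) and
// not equal to 0.0.
bool toyIsTrue(double X) { return X < 0.0 || X > 0.0; }

// Models the if/then/else expression. Then and Else are callables so that
// only the selected branch is actually evaluated.
template <typename ThenFn, typename ElseFn>
double toyIfExpr(double Cond, ThenFn Then, ElseFn Else) {
  return toyIsTrue(Cond) ? Then() : Else();
}
```

-Passing the branches as callables keeps the side effects of the branch that<br>

-is not taken from happening, matching the two separate blocks in the IR.<br>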

-<br>

-Once the then/else blocks are finished executing, they both branch back<br>

-to the 'ifcont' block to execute the code that happens after the<br>

-if/then/else. In this case the only thing left to do is to return to the<br>

-caller of the function. The question then becomes: how does the code<br>

-know which expression to return?<br>

-<br>

-The answer to this question involves an important SSA operation: the<br>

-`Phi<br>

-operation <<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Static_single_assignment_form</a>>`_.<br>

-If you're not familiar with SSA, `the Wikipedia<br>

-article <<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Static_single_assignment_form</a>>`_<br>

-is a good introduction and there are various other introductions to it<br>

-available on your favorite search engine. The short version is that<br>

-"execution" of the Phi operation requires "remembering" which block<br>

-control came from. The Phi operation takes on the value corresponding to<br>

-the input control block. In this case, if control comes in from the<br>

-"then" block, it gets the value of "calltmp". If control comes from the<br>

-"else" block, it gets the value of "calltmp1".<br>
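
-<br>

-One way to picture this is a tiny stand-alone model in which a Phi is just a<br>

-list of (value, predecessor) pairs. This is a sketch for intuition only:<br>

-``ToyBlock`` and ``ToyPhi`` are hypothetical and unrelated to LLVM's<br>

-``PHINode`` class.<br>

```cpp
#include <cassert>
#include <utility>
#include <vector>

enum ToyBlock { EntryBB, ThenBB, ElseBB };

// A phi is a list of (incoming value, predecessor block) pairs.
struct ToyPhi {
  std::vector<std::pair<double, ToyBlock>> Incoming;

  void addIncoming(double V, ToyBlock BB) { Incoming.push_back({V, BB}); }

  // "Executing" the phi selects the value paired with the block that control
  // actually arrived from.
  double evaluate(ToyBlock CameFrom) const {
    for (const auto &In : Incoming)
      if (In.second == CameFrom)
        return In.first;
    assert(false && "phi has no entry for this predecessor");
    return 0.0;
  }
};
```

-The failing assertion corresponds to a malformed Phi: every predecessor of<br>

-the block must have an entry.<br>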

-<br>

-At this point, you are probably starting to think "Oh no! This means my<br>

-simple and elegant front-end will have to start generating SSA form in<br>

-order to use LLVM!". Fortunately, this is not the case, and we strongly<br>

-advise *not* implementing an SSA construction algorithm in your<br>

-front-end unless there is an amazingly good reason to do so. In<br>

-practice, there are two sorts of values that float around in code<br>

-written for your average imperative programming language that might need<br>

-Phi nodes:<br>

-<br>

-#. Code that involves user variables: ``x = 1; x = x + 1;``<br>

-#. Values that are implicit in the structure of your AST, such as the<br>

-   Phi node in this case.<br>

-<br>

-In `Chapter 7 <LangImpl7.html>`_ of this tutorial ("mutable variables"),<br>

-we'll talk about #1 in depth. For now, just believe me that you don't<br>

-need SSA construction to handle this case. For #2, you have the choice<br>

-of using the techniques that we will describe for #1, or you can insert<br>

-Phi nodes directly, if convenient. In this case, it is really<br>

-easy to generate the Phi node, so we choose to do it directly.<br>

-<br>

-Okay, enough of the motivation and overview, let's generate code!<br>

-<br>

-Code Generation for If/Then/Else<br>

---------------------------------<br>

-<br>

-In order to generate code for this, we implement the ``codegen`` method<br>

-for ``IfExprAST``:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    Value *IfExprAST::codegen() {<br>

-      Value *CondV = Cond->codegen();<br>

-      if (!CondV)<br>

-        return nullptr;<br>

-<br>

-      // Convert condition to a bool by comparing non-equal to 0.0.<br>

-      CondV = Builder.CreateFCmpONE(<br>

-          CondV, ConstantFP::get(LLVMContext, APFloat(0.0)), "ifcond");<br>

-<br>

-This code is straightforward and similar to what we saw before. We emit<br>

-the expression for the condition, then compare that value to zero to get<br>

-a truth value as a 1-bit (bool) value.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      Function *TheFunction = Builder.GetInsertBlock()->getParent();<br>

-<br>

-      // Create blocks for the then and else cases.  Insert the 'then' block at the<br>

-      // end of the function.<br>

-      BasicBlock *ThenBB =<br>

-          BasicBlock::Create(LLVMContext, "then", TheFunction);<br>

-      BasicBlock *ElseBB = BasicBlock::Create(LLVMContext, "else");<br>

-      BasicBlock *MergeBB = BasicBlock::Create(LLVMContext, "ifcont");<br>

-<br>

-      Builder.CreateCondBr(CondV, ThenBB, ElseBB);<br>

-<br>

-This code creates the basic blocks that are related to the if/then/else<br>

-expression, and correspond directly to the blocks in the example above.<br>

-The first line gets the current Function object that is being built. It<br>

-gets this by asking the builder for the current BasicBlock, and asking<br>

-that block for its "parent" (the function it is currently embedded<br>

-into).<br>

-<br>

-Once it has that, it creates three blocks. Note that it passes<br>

-"TheFunction" into the constructor for the "then" block. This causes the<br>

-constructor to automatically insert the new block into the end of the<br>

-specified function. The other two blocks are created, but aren't yet<br>

-inserted into the function.<br>

-<br>

-Once the blocks are created, we can emit the conditional branch that<br>

-chooses between them. Note that creating new blocks does not implicitly<br>

-affect the IRBuilder, so it is still inserting into the block that the<br>

-condition went into. Also note that it is creating a branch to the<br>

-"then" block and the "else" block, even though the "else" block isn't<br>

-inserted into the function yet. This is all ok: it is the standard way<br>

-that LLVM supports forward references.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Emit then value.<br>

-      Builder.SetInsertPoint(ThenBB);<br>

-<br>

-      Value *ThenV = Then->codegen();<br>

-      if (!ThenV)<br>

-        return nullptr;<br>

-<br>

-      Builder.CreateBr(MergeBB);<br>

-      // Codegen of 'Then' can change the current block, update ThenBB for the PHI.<br>

-      ThenBB = Builder.GetInsertBlock();<br>

-<br>

-After the conditional branch is inserted, we move the builder to start<br>

-inserting into the "then" block. Strictly speaking, this call moves the<br>

-insertion point to be at the end of the specified block. However, since<br>

-the "then" block is empty, it also starts out by inserting at the<br>

-beginning of the block. :)<br>

-<br>

-Once the insertion point is set, we recursively codegen the "then"<br>

-expression from the AST. To finish off the "then" block, we create an<br>

-unconditional branch to the merge block. One interesting (and very<br>

-important) aspect of the LLVM IR is that it `requires all basic blocks<br>

-to be "terminated" <../LangRef.html#functionstructure>`_ with a `control<br>

-flow instruction <../LangRef.html#terminators>`_ such as return or<br>

-branch. This means that all control flow, *including fall throughs* must<br>

-be made explicit in the LLVM IR. If you violate this rule, the verifier<br>

-will emit an error.<br>

-<br>

-The final line here is quite subtle, but is very important. The basic<br>

-issue is that when we create the Phi node in the merge block, we need to<br>

-set up the block/value pairs that indicate how the Phi will work.<br>

-Importantly, the Phi node expects to have an entry for each predecessor<br>

-of the block in the CFG. Why then, are we getting the current block when<br>

-we just set it to ThenBB 5 lines above? The problem is that the "Then"<br>

-expression may actually itself change the block that the Builder is<br>

-emitting into if, for example, it contains a nested "if/then/else"<br>

-expression. Because calling ``codegen()`` recursively could arbitrarily change<br>

-the notion of the current block, we are required to get an up-to-date<br>

-value for code that will set up the Phi node.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Emit else block.<br>

-      TheFunction->getBasicBlockList().push_back(ElseBB);<br>

-      Builder.SetInsertPoint(ElseBB);<br>

-<br>

-      Value *ElseV = Else->codegen();<br>

-      if (!ElseV)<br>

-        return nullptr;<br>

-<br>

-      Builder.CreateBr(MergeBB);<br>

-      // codegen of 'Else' can change the current block, update ElseBB for the PHI.<br>

-      ElseBB = Builder.GetInsertBlock();<br>

-<br>

-Code generation for the 'else' block is basically identical to codegen<br>

-for the 'then' block. The only significant difference is the first line,<br>

-which adds the 'else' block to the function. Recall previously that the<br>

-'else' block was created, but not added to the function. Now that the<br>

-'then' and 'else' blocks are emitted, we can finish up with the merge<br>

-code:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Emit merge block.<br>

-      TheFunction->getBasicBlockList().push_back(MergeBB);<br>

-      Builder.SetInsertPoint(MergeBB);<br>

-      PHINode *PN =<br>

-        Builder.CreatePHI(Type::getDoubleTy(LLVMContext), 2, "iftmp");<br>

-<br>

-      PN->addIncoming(ThenV, ThenBB);<br>

-      PN->addIncoming(ElseV, ElseBB);<br>

-      return PN;<br>

-    }<br>

-<br>

-The first two lines here are now familiar: the first adds the "merge"<br>

-block to the Function object (it was previously floating, like the else<br>

-block above). The second changes the insertion point so that newly<br>

-created code will go into the "merge" block. Once that is done, we need<br>

-to create the PHI node and set up the block/value pairs for the PHI.<br>

-<br>

-Finally, the CodeGen function returns the phi node as the value computed<br>

-by the if/then/else expression. In our example above, this returned<br>

-value will feed into the code for the top-level function, which will<br>

-create the return instruction.<br>

-<br>

-Overall, we now have the ability to execute conditional code in<br>

-Kaleidoscope. With this extension, Kaleidoscope is a fairly complete<br>

-language that can calculate a wide variety of numeric functions. Next up<br>

-we'll add another useful expression that is familiar from non-functional<br>

-languages...<br>

-<br>

-'for' Loop Expression<br>

-=====================<br>

-<br>

-Now that we know how to add basic control flow constructs to the<br>

-language, we have the tools to add more powerful things. Let's add<br>

-something more aggressive, a 'for' expression:<br>

-<br>

-::<br>

-<br>

-     extern putchard(char);<br>

-     def printstar(n)<br>

-       for i = 1, i < n, 1.0 in<br>

-         putchard(42);  # ascii 42 = '*'<br>

-<br>

-     # print 100 '*' characters<br>

-     printstar(100);<br>

-<br>

-This expression defines a new variable ("i" in this case) which iterates<br>

-from a starting value, while the condition ("i < n" in this case) is<br>

-true, incrementing by an optional step value ("1.0" in this case). If<br>

-the step value is omitted, it defaults to 1.0. While the condition is true,<br>

-it executes its body expression. Because we don't have anything better<br>

-to return, we'll just define the loop as always returning 0.0. In the<br>

-future when we have mutable variables, it will get more useful.<br>

-<br>

-As before, let's talk about the changes that we need to make to Kaleidoscope to<br>

-support this.<br>

-<br>

-Lexer Extensions for the 'for' Loop<br>

------------------------------------<br>

-<br>

-The lexer extensions are the same sort of thing as for if/then/else:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      ... in enum Token ...<br>

-      // control<br>

-      tok_if = -6, tok_then = -7, tok_else = -8,<br>

-      tok_for = -9, tok_in = -10<br>

-<br>

-      ... in gettok ...<br>

-      if (IdentifierStr == "def")<br>

-        return tok_def;<br>

-      if (IdentifierStr == "extern")<br>

-        return tok_extern;<br>

-      if (IdentifierStr == "if")<br>

-        return tok_if;<br>

-      if (IdentifierStr == "then")<br>

-        return tok_then;<br>

-      if (IdentifierStr == "else")<br>

-        return tok_else;<br>

-      if (IdentifierStr == "for")<br>

-        return tok_for;<br>

-      if (IdentifierStr == "in")<br>

-        return tok_in;<br>

-      return tok_identifier;<br>

-<br>

-AST Extensions for the 'for' Loop<br>

----------------------------------<br>

-<br>

-The AST node is just as simple. It basically boils down to capturing the<br>

-variable name and the constituent expressions in the node.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// ForExprAST - Expression class for for/in.<br>

-    class ForExprAST : public ExprAST {<br>

-      std::string VarName;<br>

-      std::unique_ptr<ExprAST> Start, End, Step, Body;<br>

-<br>

-    public:<br>

-      ForExprAST(const std::string &VarName, std::unique_ptr<ExprAST> Start,<br>

-                 std::unique_ptr<ExprAST> End, std::unique_ptr<ExprAST> Step,<br>

-                 std::unique_ptr<ExprAST> Body)<br>

-        : VarName(VarName), Start(std::move(Start)), End(std::move(End)),<br>

-          Step(std::move(Step)), Body(std::move(Body)) {}<br>

-      virtual Value *codegen();<br>

-    };<br>

-<br>

-Parser Extensions for the 'for' Loop<br>

-------------------------------------<br>

-<br>

-The parser code is also fairly standard. The only interesting thing here<br>

-is handling of the optional step value. The parser code handles it by<br>

-checking to see if the second comma is present. If not, it sets the step<br>

-value to null in the AST node:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression<br>

-    static std::unique_ptr<ExprAST> ParseForExpr() {<br>

-      getNextToken();  // eat the for.<br>

-<br>

-      if (CurTok != tok_identifier)<br>

-        return LogError("expected identifier after for");<br>

-<br>

-      std::string IdName = IdentifierStr;<br>

-      getNextToken();  // eat identifier.<br>

-<br>

-      if (CurTok != '=')<br>

-        return LogError("expected '=' after for");<br>

-      getNextToken();  // eat '='.<br>

-<br>

-<br>

-      auto Start = ParseExpression();<br>

-      if (!Start)<br>

-        return nullptr;<br>

-      if (CurTok != ',')<br>

-        return LogError("expected ',' after for start value");<br>

-      getNextToken();<br>

-<br>

-      auto End = ParseExpression();<br>

-      if (!End)<br>

-        return nullptr;<br>

-<br>

-      // The step value is optional.<br>

-      std::unique_ptr<ExprAST> Step;<br>

-      if (CurTok == ',') {<br>

-        getNextToken();<br>

-        Step = ParseExpression();<br>

-        if (!Step)<br>

-          return nullptr;<br>

-      }<br>

-<br>

-      if (CurTok != tok_in)<br>

-        return LogError("expected 'in' after for");<br>

-      getNextToken();  // eat 'in'.<br>

-<br>

-      auto Body = ParseExpression();<br>

-      if (!Body)<br>

-        return nullptr;<br>

-<br>

-      return llvm::make_unique<ForExprAST>(IdName, std::move(Start),<br>

-                                           std::move(End), std::move(Step),<br>

-                                           std::move(Body));<br>

-    }<br>

-<br>

-LLVM IR for the 'for' Loop<br>

---------------------------<br>

-<br>

-Now we get to the good part: the LLVM IR we want to generate for this<br>

-thing. With the simple example above, we get this LLVM IR (note that<br>

-this dump is generated with optimizations disabled for clarity):<br>

-<br>

-.. code-block:: llvm<br>

-<br>

-    declare double @putchard(double)<br>

-<br>

-    define double @printstar(double %n) {<br>

-    entry:<br>

-      ; initial value = 1.0 (inlined into phi)<br>

-      br label %loop<br>

-<br>

-    loop:       ; preds = %loop, %entry<br>

-      %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]<br>

-      ; body<br>

-      %calltmp = call double @putchard(double 4.200000e+01)<br>

-      ; increment<br>

-      %nextvar = fadd double %i, 1.000000e+00<br>

-<br>

-      ; termination test<br>

-      %cmptmp = fcmp ult double %i, %n<br>

-      %booltmp = uitofp i1 %cmptmp to double<br>

-      %loopcond = fcmp one double %booltmp, 0.000000e+00<br>

-      br i1 %loopcond, label %loop, label %afterloop<br>

-<br>

-    afterloop:      ; preds = %loop<br>

-      ; loop always returns 0.0<br>

-      ret double 0.000000e+00<br>

-    }<br>

-<br>

-This loop contains all the same constructs we saw before: a phi node,<br>

-several expressions, and some basic blocks. Let's see how this fits<br>

-together.<br>
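
-<br>

-Before walking through the codegen, it may help to mirror the control flow<br>

-of the IR above in plain C++. This is a hedged sketch, not tutorial code:<br>

-``toyForLoop`` is a hypothetical helper, and it returns the number of body<br>

-executions purely so the behavior can be checked (the real loop expression<br>

-always evaluates to 0.0). Two details of the IR show up directly: the body<br>

-runs before the first test, and the test uses the value of ``%i`` from before<br>

-the increment.<br>

```cpp
#include <cassert>

// Hypothetical model of the IR above. Body is any callable taking the loop
// variable; the result is how many times the body ran.
template <typename BodyFn>
int toyForLoop(double Start, double N, double Step, BodyFn Body) {
  int Iterations = 0;
  double I = Start; // phi: incoming value from %entry
  while (true) {
    Body(I); // body
    ++Iterations;
    double NextVar = I + Step;           // increment
    bool CmpTmp = !(I >= N);             // fcmp ult: unordered or less than
    double BoolTmp = CmpTmp ? 1.0 : 0.0; // uitofp i1 to double
    bool LoopCond = BoolTmp < 0.0 || BoolTmp > 0.0; // fcmp one against 0.0
    if (!LoopCond)
      break;     // branch to %afterloop
    I = NextVar; // phi: incoming value from %loop
  }
  return Iterations;
}
```

-With a start of 1, a step of 1 and ``n`` equal to 100, this model runs the<br>

-body 100 times, matching the comment in the Kaleidoscope example. With ``n``<br>

-equal to 0 the body still runs once, because the test happens after the body.<br>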

-<br>

-Code Generation for the 'for' Loop<br>

-----------------------------------<br>

-<br>

-The first part of codegen is very simple: we just output the start<br>

-expression for the loop value:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    Value *ForExprAST::codegen() {<br>

-      // Emit the start code first, without 'variable' in scope.<br>

-      Value *StartVal = Start->codegen();<br>

-      if (!StartVal) return nullptr;<br>

-<br>

-With this out of the way, the next step is to set up the LLVM basic<br>

-block for the start of the loop body. In the case above, the whole loop<br>

-body is one block, but remember that the body code itself could consist<br>

-of multiple blocks (e.g. if it contains an if/then/else or a for/in<br>

-expression).<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Make the new basic block for the loop header, inserting after current<br>

-      // block.<br>

-      Function *TheFunction = Builder.GetInsertBlock()->getParent();<br>

-      BasicBlock *PreheaderBB = Builder.GetInsertBlock();<br>

-      BasicBlock *LoopBB =<br>

-          BasicBlock::Create(LLVMContext, "loop", TheFunction);<br>

-<br>

-      // Insert an explicit fall through from the current block to the LoopBB.<br>

-      Builder.CreateBr(LoopBB);<br>

-<br>

-This code is similar to what we saw for if/then/else. Because we will<br>

-need it to create the Phi node, we remember the block that falls through<br>

-into the loop. Once we have that, we create the actual block that starts<br>

-the loop and create an unconditional branch for the fall-through between<br>

-the two blocks.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Start insertion in LoopBB.<br>

-      Builder.SetInsertPoint(LoopBB);<br>

-<br>

-      // Start the PHI node with an entry for Start.<br>

-      PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(LLVMContext),<br>

-                                            2, VarName.c_str());<br>

-      Variable->addIncoming(StartVal, PreheaderBB);<br>

-<br>

-Now that the "preheader" for the loop is set up, we switch to emitting<br>

-code for the loop body. To begin with, we move the insertion point and<br>

-create the PHI node for the loop induction variable. Since we already<br>

-know the incoming value for the starting value, we add it to the Phi<br>

-node. Note that the Phi will eventually get a second value for the<br>

-backedge, but we can't set it up yet (because it doesn't exist!).<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Within the loop, the variable is defined equal to the PHI node.  If it<br>

-      // shadows an existing variable, we have to restore it, so save it now.<br>

-      Value *OldVal = NamedValues[VarName];<br>

-      NamedValues[VarName] = Variable;<br>

-<br>

-      // Emit the body of the loop.  This, like any other expr, can change the<br>

-      // current BB.  Note that we ignore the value computed by the body, but don't<br>

-      // allow an error.<br>

-      if (!Body->codegen())<br>

-        return nullptr;<br>

-<br>

-Now the code starts to get more interesting. Our 'for' loop introduces a<br>

-new variable to the symbol table. This means that our symbol table can<br>

-now contain either function arguments or loop variables. To handle this,<br>

-before we codegen the body of the loop, we add the loop variable as the<br>

-current value for its name. Note that it is possible that there is a<br>

-variable of the same name in the outer scope. It would be easy to make<br>

-this an error (emit an error and return null if there is already an<br>

-entry for VarName) but we choose to allow shadowing of variables. In<br>

-order to handle this correctly, we remember the Value that we are<br>

-potentially shadowing in ``OldVal`` (which will be null if there is no<br>

-shadowed variable).<br>

-<br>

-Once the loop variable is set into the symbol table, the code<br>

-recursively codegen's the body. This allows the body to use the loop<br>

-variable: any references to it will naturally find it in the symbol<br>

-table.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Emit the step value.<br>

-      Value *StepVal = nullptr;<br>

-      if (Step) {<br>

-        StepVal = Step->codegen();<br>

-        if (!StepVal)<br>

-          return nullptr;<br>

-      } else {<br>

-        // If not specified, use 1.0.<br>

-        StepVal = ConstantFP::get(LLVMContext, APFloat(1.0));<br>

-      }<br>

-<br>

-      Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar");<br>

-<br>

-Now that the body is emitted, we compute the next value of the iteration<br>

-variable by adding the step value, or 1.0 if it isn't present.<br>

-'``NextVar``' will be the value of the loop variable on the next<br>

-iteration of the loop.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Compute the end condition.<br>

-      Value *EndCond = End->codegen();<br>

-      if (!EndCond)<br>

-        return nullptr;<br>

-<br>

-      // Convert condition to a bool by comparing non-equal to 0.0.<br>

-      EndCond = Builder.CreateFCmpONE(<br>

-          EndCond, ConstantFP::get(LLVMContext, APFloat(0.0)), "loopcond");<br>

-<br>

-Finally, we evaluate the exit value of the loop, to determine whether<br>

-the loop should exit. This mirrors the condition evaluation for the<br>

-if/then/else statement.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Create the "after loop" block and insert it.<br>

-      BasicBlock *LoopEndBB = Builder.GetInsertBlock();<br>

-      BasicBlock *AfterBB =<br>

-          BasicBlock::Create(LLVMContext, "afterloop", TheFunction);<br>

-<br>

-      // Insert the conditional branch into the end of LoopEndBB.<br>

-      Builder.CreateCondBr(EndCond, LoopBB, AfterBB);<br>

-<br>

-      // Any new code will be inserted in AfterBB.<br>

-      Builder.SetInsertPoint(AfterBB);<br>

-<br>

-With the code for the body of the loop complete, we just need to finish<br>

-up the control flow for it. This code remembers the end block (for the<br>

-phi node), then creates the block for the loop exit ("afterloop"). Based<br>

-on the value of the exit condition, it creates a conditional branch that<br>

-chooses between executing the loop again and exiting the loop. Any<br>

-future code is emitted in the "afterloop" block, so it sets the<br>

-insertion position to it.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Add a new entry to the PHI node for the backedge.<br>

-      Variable->addIncoming(NextVar, LoopEndBB);<br>

-<br>

-      // Restore the unshadowed variable.<br>

-      if (OldVal)<br>

-        NamedValues[VarName] = OldVal;<br>

-      else<br>

-        NamedValues.erase(VarName);<br>

-<br>

-      // for expr always returns 0.0.<br>

-      return Constant::getNullValue(Type::getDoubleTy(LLVMContext));<br>

-    }<br>

-<br>

-The final code handles various cleanups: now that we have the "NextVar"<br>

-value, we can add the incoming value to the loop PHI node. After that,<br>

-we remove the loop variable from the symbol table, so that it isn't in<br>

-scope after the for loop. Finally, code generation of the for loop<br>

-always returns 0.0, so that is what we return from<br>

-``ForExprAST::codegen()``.<br>

-<br>

-With this, we conclude the "adding control flow to Kaleidoscope" chapter<br>

-of the tutorial. In this chapter we added two control flow constructs,<br>

-and used them to motivate a couple of aspects of the LLVM IR that are<br>

-important for front-end implementors to know. In the next chapter of our<br>

-saga, we will get a bit crazier and add `user-defined<br>

-operators <LangImpl6.html>`_ to our poor innocent language.<br>

-<br>

-Full Code Listing<br>

-=================<br>

-<br>

-Here is the complete code listing for our running example, enhanced with<br>

-the if/then/else and for expressions. To build this example, use:<br>

-<br>

-.. code-block:: bash<br>

-<br>

-    # Compile<br>

-    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy<br>

-    # Run<br>

-    ./toy<br>

-<br>

-Here is the code:<br>

-<br>

-.. literalinclude:: ../../examples/Kaleidoscope/Chapter5/toy.cpp<br>

-   :language: c++<br>

-<br>

-`Next: Extending the language: user-defined operators <LangImpl6.html>`_<br>

-<br>

<br>

Removed: llvm/trunk/docs/tutorial/LangImpl6.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl6.rst?rev=274440&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl6.rst?rev=274440&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl6.rst (original)<br>

+++ llvm/trunk/docs/tutorial/LangImpl6.rst (removed)<br>

@@ -1,768 +0,0 @@<br>

-============================================================<br>

-Kaleidoscope: Extending the Language: User-defined Operators<br>

-============================================================<br>

-<br>

-.. contents::<br>

-   :local:<br>

-<br>

-Chapter 6 Introduction<br>

-======================<br>

-<br>

-Welcome to Chapter 6 of the "`Implementing a language with<br>

-LLVM <index.html>`_" tutorial. At this point in our tutorial, we now<br>

-have a fully functional language that is fairly minimal, but also<br>

-useful. There is still one big problem with it, however. Our language<br>

-doesn't have many useful operators (like division, logical negation, or<br>

-even any comparisons besides less-than).<br>

-<br>

-This chapter of the tutorial takes a wild digression into adding<br>

-user-defined operators to the simple and beautiful Kaleidoscope<br>

-language. This digression now gives us a simple and ugly language in<br>

-some ways, but also a powerful one at the same time. One of the great<br>

-things about creating your own language is that you get to decide what<br>

-is good or bad. In this tutorial we'll assume that it is okay to use<br>

-this as a way to show some interesting parsing techniques.<br>

-<br>

-At the end of this tutorial, we'll run through an example Kaleidoscope<br>

-application that `renders the Mandelbrot set <#kicking-the-tires>`_. This gives an<br>

-example of what you can build with Kaleidoscope and its feature set.<br>

-<br>

-User-defined Operators: the Idea<br>

-================================<br>

-<br>

-The "operator overloading" that we will add to Kaleidoscope is more<br>

-general than in languages like C++. In C++, you are only allowed to<br>

-redefine existing operators: you can't programmatically change the<br>

-grammar, introduce new operators, change precedence levels, etc. In this<br>

-chapter, we will add this capability to Kaleidoscope, which will let the<br>

-user round out the set of operators that are supported.<br>

-<br>

-The point of going into user-defined operators in a tutorial like this<br>

-is to show the power and flexibility of using a hand-written parser.<br>

-Thus far, the parser we have been implementing uses recursive descent<br>

-for most parts of the grammar and operator precedence parsing for the<br>

-expressions. See `Chapter 2 <LangImpl2.html>`_ for details. Without<br>

-using operator precedence parsing, it would be very difficult to allow<br>

-the programmer to introduce new operators into the grammar: the grammar<br>

-is dynamically extensible as the JIT runs.<br>

-<br>

-The two specific features we'll add are programmable unary operators<br>

-(right now, Kaleidoscope has no unary operators at all) as well as<br>

-binary operators. An example of this is:<br>

-<br>

-::<br>

-<br>

-    # Logical unary not.<br>

-    def unary!(v)<br>

-      if v then<br>

-        0<br>

-      else<br>

-        1;<br>

-<br>

-    # Define > with the same precedence as <.<br>

-    def binary> 10 (LHS RHS)<br>

-      RHS < LHS;<br>

-<br>

-    # Binary "logical or", (note that it does not "short circuit")<br>

-    def binary| 5 (LHS RHS)<br>

-      if LHS then<br>

-        1<br>

-      else if RHS then<br>

-        1<br>

-      else<br>

-        0;<br>

-<br>

-    # Define = with slightly lower precedence than relationals.<br>

-    def binary= 9 (LHS RHS)<br>

-      !(LHS < RHS | LHS > RHS);<br>

-<br>

-Many languages aspire to being able to implement their standard runtime<br>

-library in the language itself. In Kaleidoscope, we can implement<br>

-significant parts of the language in the library!<br>

-<br>

-We will break down implementation of these features into two parts:<br>

-implementing support for user-defined binary operators and adding unary<br>

-operators.<br>

-<br>

-User-defined Binary Operators<br>

-=============================<br>

-<br>

-Adding support for user-defined binary operators is pretty simple with<br>

-our current framework. We'll first add support for the unary/binary<br>

-keywords:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    enum Token {<br>

-      ...<br>

-      // operators<br>

-      tok_binary = -11,<br>

-      tok_unary = -12<br>

-    };<br>

-    ...<br>

-    static int gettok() {<br>

-    ...<br>

-        if (IdentifierStr == "for")<br>

-          return tok_for;<br>

-        if (IdentifierStr == "in")<br>

-          return tok_in;<br>

-        if (IdentifierStr == "binary")<br>

-          return tok_binary;<br>

-        if (IdentifierStr == "unary")<br>

-          return tok_unary;<br>

-        return tok_identifier;<br>

-<br>

-This just adds lexer support for the unary and binary keywords, like we<br>

-did in `previous chapters <LangImpl5.html#lexer-extensions-for-if-then-else>`_. One nice thing<br>

-about our current AST is that we represent binary operators with full<br>

-generalisation by using their ASCII code as the opcode. For our extended<br>

-operators, we'll use this same representation, so we don't need any new<br>

-AST or parser support.<br>

-<br>

-On the other hand, we have to be able to represent the definitions of<br>

-these new operators, in the "def binary\| 5" part of the function<br>

-definition. In our grammar so far, the "name" for the function<br>

-definition is parsed as the "prototype" production and into the<br>

-``PrototypeAST`` AST node. To represent our new user-defined operators<br>

-as prototypes, we have to extend the ``PrototypeAST`` AST node like<br>

-this:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// PrototypeAST - This class represents the "prototype" for a function,<br>

-    /// which captures its argument names as well as if it is an operator.<br>

-    class PrototypeAST {<br>

-      std::string Name;<br>

-      std::vector<std::string> Args;<br>

-      bool IsOperator;<br>

-      unsigned Precedence;  // Precedence if a binary op.<br>

-<br>

-    public:<br>

-      PrototypeAST(const std::string &name, std::vector<std::string> Args,<br>

-                   bool IsOperator = false, unsigned Prec = 0)<br>

-      : Name(name), Args(std::move(Args)), IsOperator(IsOperator),<br>

-        Precedence(Prec) {}<br>

-<br>

-      bool isUnaryOp() const { return IsOperator && Args.size() == 1; }<br>

-      bool isBinaryOp() const { return IsOperator && Args.size() == 2; }<br>

-<br>

-      char getOperatorName() const {<br>

-        assert(isUnaryOp() || isBinaryOp());<br>

-        return Name[Name.size()-1];<br>

-      }<br>

-<br>

-      unsigned getBinaryPrecedence() const { return Precedence; }<br>

-<br>

-      Function *codegen();<br>

-    };<br>

-<br>

-Basically, in addition to knowing a name for the prototype, we now keep<br>

-track of whether it was an operator, and if it was, what precedence<br>

-level the operator is at. The precedence is only used for binary<br>

-operators (as you'll see below, it just doesn't apply for unary<br>

-operators). Now that we have a way to represent the prototype for a<br>

-user-defined operator, we need to parse it:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// prototype<br>

-    ///   ::= id '(' id* ')'<br>

-    ///   ::= binary LETTER number? (id, id)<br>

-    static std::unique_ptr<PrototypeAST> ParsePrototype() {<br>

-      std::string FnName;<br>

-<br>

-      unsigned Kind = 0;  // 0 = identifier, 1 = unary, 2 = binary.<br>

-      unsigned BinaryPrecedence = 30;<br>

-<br>

-      switch (CurTok) {<br>

-      default:<br>

-        return LogErrorP("Expected function name in prototype");<br>

-      case tok_identifier:<br>

-        FnName = IdentifierStr;<br>

-        Kind = 0;<br>

-        getNextToken();<br>

-        break;<br>

-      case tok_binary:<br>

-        getNextToken();<br>

-        if (!isascii(CurTok))<br>

-          return LogErrorP("Expected binary operator");<br>

-        FnName = "binary";<br>

-        FnName += (char)CurTok;<br>

-        Kind = 2;<br>

-        getNextToken();<br>

-<br>

-        // Read the precedence if present.<br>

-        if (CurTok == tok_number) {<br>

-          if (NumVal < 1 || NumVal > 100)<br>

-            return LogErrorP("Invalid precedence: must be 1..100");<br>

-          BinaryPrecedence = (unsigned)NumVal;<br>

-          getNextToken();<br>

-        }<br>

-        break;<br>

-      }<br>

-<br>

-      if (CurTok != '(')<br>

-        return LogErrorP("Expected '(' in prototype");<br>

-<br>

-      std::vector<std::string> ArgNames;<br>

-      while (getNextToken() == tok_identifier)<br>

-        ArgNames.push_back(IdentifierStr);<br>

-      if (CurTok != ')')<br>

-        return LogErrorP("Expected ')' in prototype");<br>

-<br>

-      // success.<br>

-      getNextToken();  // eat ')'.<br>

-<br>

-      // Verify right number of names for operator.<br>

-      if (Kind && ArgNames.size() != Kind)<br>

-        return LogErrorP("Invalid number of operands for operator");<br>

-<br>

-      return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames), Kind != 0,<br>

-                                             BinaryPrecedence);<br>

-    }<br>

-<br>

-This is all fairly straightforward parsing code, and we have already<br>

-seen a lot of similar code in the past. One interesting part about the<br>

-code above is the couple of lines that set up ``FnName`` for binary<br>

-operators. This builds names like "binary@" for a newly defined "@"<br>

-operator. This then takes advantage of the fact that symbol names in the<br>

-LLVM symbol table are allowed to have any character in them, including<br>

-embedded nul characters.<br>

-<br>

-The next interesting thing to add is codegen support for these binary<br>

-operators. Given our current structure, this is a simple addition of a<br>

-default case for our existing binary operator node:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    Value *BinaryExprAST::codegen() {<br>

-      Value *L = LHS->codegen();<br>

-      Value *R = RHS->codegen();<br>

-      if (!L || !R)<br>

-        return nullptr;<br>

-<br>

-      switch (Op) {<br>

-      case '+':<br>

-        return Builder.CreateFAdd(L, R, "addtmp");<br>

-      case '-':<br>

-        return Builder.CreateFSub(L, R, "subtmp");<br>

-      case '*':<br>

-        return Builder.CreateFMul(L, R, "multmp");<br>

-      case '<':<br>

-        L = Builder.CreateFCmpULT(L, R, "cmptmp");<br>

-        // Convert bool 0/1 to double 0.0 or 1.0<br>

-        return Builder.CreateUIToFP(L, Type::getDoubleTy(LLVMContext),<br>

-                                    "booltmp");<br>

-      default:<br>

-        break;<br>

-      }<br>

-<br>

-      // If it wasn't a builtin binary operator, it must be a user defined one. Emit<br>

-      // a call to it.<br>

-      Function *F = TheModule->getFunction(std::string("binary") + Op);<br>

-      assert(F && "binary operator not found!");<br>

-<br>

-      Value *Ops[2] = { L, R };<br>

-      return Builder.CreateCall(F, Ops, "binop");<br>

-    }<br>

-<br>

-As you can see above, the new code is actually really simple. It just<br>

-does a lookup for the appropriate operator in the symbol table and<br>

-generates a function call to it. Since user-defined operators are just<br>

-built as normal functions (because the "prototype" boils down to a<br>

-function with the right name) everything falls into place.<br>

-<br>

-The final piece of code we are missing is a bit of top-level magic:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    Function *FunctionAST::codegen() {<br>

-      NamedValues.clear();<br>

-<br>

-      Function *TheFunction = Proto->codegen();<br>

-      if (!TheFunction)<br>

-        return nullptr;<br>

-<br>

-      // If this is an operator, install it.<br>

-      if (Proto->isBinaryOp())<br>

-        BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();<br>

-<br>

-      // Create a new basic block to start insertion into.<br>

-      BasicBlock *BB = BasicBlock::Create(LLVMContext, "entry", TheFunction);<br>

-      Builder.SetInsertPoint(BB);<br>

-<br>

-      if (Value *RetVal = Body->codegen()) {<br>

-        ...<br>

-<br>

-Basically, before codegening a function, if it is a user-defined<br>

-operator, we register it in the precedence table. This allows the binary<br>

-operator parsing logic we already have in place to handle it. Since we<br>

-are working on a fully-general operator precedence parser, this is all<br>

-we need to do to "extend the grammar".<br>

-<br>

-Now we have useful user-defined binary operators. This builds a lot on<br>

-the previous framework we built for other operators. Adding unary<br>

-operators is a bit more challenging, because we don't have any framework<br>

-for it yet - let's see what it takes.<br>

-<br>

-User-defined Unary Operators<br>

-============================<br>

-<br>

-Since we don't currently support unary operators in the Kaleidoscope<br>

-language, we'll need to add everything to support them. Above, we added<br>

-simple support for the 'unary' keyword to the lexer. In addition to<br>

-that, we need an AST node:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// UnaryExprAST - Expression class for a unary operator.<br>

-    class UnaryExprAST : public ExprAST {<br>

-      char Opcode;<br>

-      std::unique_ptr<ExprAST> Operand;<br>

-<br>

-    public:<br>

-      UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)<br>

-        : Opcode(Opcode), Operand(std::move(Operand)) {}<br>

-      virtual Value *codegen();<br>

-    };<br>

-<br>

-This AST node is very simple and obvious by now. It directly mirrors the<br>

-binary operator AST node, except that it only has one child. With this,<br>

-we need to add the parsing logic. Parsing a unary operator is pretty<br>

-simple: we'll add a new function to do it:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// unary<br>

-    ///   ::= primary<br>

-    ///   ::= '!' unary<br>

-    static std::unique_ptr<ExprAST> ParseUnary() {<br>

-      // If the current token is not an operator, it must be a primary expr.<br>

-      if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')<br>

-        return ParsePrimary();<br>

-<br>

-      // If this is a unary operator, read it.<br>

-      int Opc = CurTok;<br>

-      getNextToken();<br>

-      if (auto Operand = ParseUnary())<br>

-        return llvm::make_unique<UnaryExprAST>(Opc, std::move(Operand));<br>

-      return nullptr;<br>

-    }<br>

-<br>

-The grammar we add is pretty straightforward here. If we see a unary<br>

-operator when parsing a primary expression, we eat the operator as a<br>

-prefix and parse the remaining piece as another unary operator. This<br>

-allows us to handle multiple unary operators (e.g. "!!x"). Note that<br>

-unary operators can't have ambiguous parses like binary operators can,<br>

-so there is no need for precedence information.<br>

-<br>

-The problem with this function is that we need to call ParseUnary from<br>

-somewhere. To do this, we change previous callers of ParsePrimary to<br>

-call ParseUnary instead:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// binoprhs<br>

-    ///   ::= ('+' unary)*<br>

-    static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,<br>

-                                                  std::unique_ptr<ExprAST> LHS) {<br>

-      ...<br>

-        // Parse the unary expression after the binary operator.<br>

-        auto RHS = ParseUnary();<br>

-        if (!RHS)<br>

-          return nullptr;<br>

-      ...<br>

-    }<br>

-    /// expression<br>

-    ///   ::= unary binoprhs<br>

-    ///<br>

-    static std::unique_ptr<ExprAST> ParseExpression() {<br>

-      auto LHS = ParseUnary();<br>

-      if (!LHS)<br>

-        return nullptr;<br>

-<br>

-      return ParseBinOpRHS(0, std::move(LHS));<br>

-    }<br>

-<br>

-With these two simple changes, we are now able to parse unary operators<br>

-and build the AST for them. Next up, we need to add parser support for<br>

-prototypes, to parse the unary operator prototype. We extend the binary<br>

-operator code above with:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// prototype<br>

-    ///   ::= id '(' id* ')'<br>

-    ///   ::= binary LETTER number? (id, id)<br>

-    ///   ::= unary LETTER (id)<br>

-    static std::unique_ptr<PrototypeAST> ParsePrototype() {<br>

-      std::string FnName;<br>

-<br>

-      unsigned Kind = 0;  // 0 = identifier, 1 = unary, 2 = binary.<br>

-      unsigned BinaryPrecedence = 30;<br>

-<br>

-      switch (CurTok) {<br>

-      default:<br>

-        return LogErrorP("Expected function name in prototype");<br>

-      case tok_identifier:<br>

-        FnName = IdentifierStr;<br>

-        Kind = 0;<br>

-        getNextToken();<br>

-        break;<br>

-      case tok_unary:<br>

-        getNextToken();<br>

-        if (!isascii(CurTok))<br>

-          return LogErrorP("Expected unary operator");<br>

-        FnName = "unary";<br>

-        FnName += (char)CurTok;<br>

-        Kind = 1;<br>

-        getNextToken();<br>

-        break;<br>

-      case tok_binary:<br>

-        ...<br>

-<br>

-As with binary operators, we name unary operators with a name that<br>

-includes the operator character. This assists us at code generation<br>

-time. Speaking of, the final piece we need to add is codegen support for<br>

-unary operators. It looks like this:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    Value *UnaryExprAST::codegen() {<br>

-      Value *OperandV = Operand->codegen();<br>

-      if (!OperandV)<br>

-        return nullptr;<br>

-<br>

-      Function *F = TheModule->getFunction(std::string("unary")+Opcode);<br>

-      if (!F)<br>

-        return LogErrorV("Unknown unary operator");<br>

-<br>

-      return Builder.CreateCall(F, OperandV, "unop");<br>

-    }<br>

-<br>

-This code is similar to, but simpler than, the code for binary<br>

-operators. It is simpler primarily because it doesn't need to handle any<br>

-predefined operators.<br>

-<br>

-Kicking the Tires<br>

-=================<br>

-<br>

-It is somewhat hard to believe, but with a few simple extensions we've<br>

-covered in the last chapters, we have grown a real-ish language. With<br>

-this, we can do a lot of interesting things, including I/O, math, and a<br>

-bunch of other things. For example, we can now add a nice sequencing<br>

-operator (printd is defined to print out the specified value and a<br>

-newline):<br>

-<br>

-::<br>

-<br>

-    ready> extern printd(x);<br>

-    Read extern:<br>

-    declare double @printd(double)<br>

-<br>

-    ready> def binary : 1 (x y) 0;  # Low-precedence operator that ignores operands.<br>

-    ..<br>

-    ready> printd(123) : printd(456) : printd(789);<br>

-    123.000000<br>

-    456.000000<br>

-    789.000000<br>

-    Evaluated to 0.000000<br>

-<br>

-We can also define a bunch of other "primitive" operations, such as:<br>

-<br>

-::<br>

-<br>

-    # Logical unary not.<br>

-    def unary!(v)<br>

-      if v then<br>

-        0<br>

-      else<br>

-        1;<br>

-<br>

-    # Unary negate.<br>

-    def unary-(v)<br>

-      0-v;<br>

-<br>

-    # Define > with the same precedence as <.<br>

-    def binary> 10 (LHS RHS)<br>

-      RHS < LHS;<br>

-<br>

-    # Binary logical or, which does not short circuit.<br>

-    def binary| 5 (LHS RHS)<br>

-      if LHS then<br>

-        1<br>

-      else if RHS then<br>

-        1<br>

-      else<br>

-        0;<br>

-<br>

-    # Binary logical and, which does not short circuit.<br>

-    def binary& 6 (LHS RHS)<br>

-      if !LHS then<br>

-        0<br>

-      else<br>

-        !!RHS;<br>

-<br>

-    # Define = with slightly lower precedence than relationals.<br>

-    def binary = 9 (LHS RHS)<br>

-      !(LHS < RHS | LHS > RHS);<br>

-<br>

-    # Define ':' for sequencing: as a low-precedence operator that ignores operands<br>

-    # and just returns the RHS.<br>

-    def binary : 1 (x y) y;<br>

-<br>

-Given the previous if/then/else support, we can also define interesting<br>

-functions for I/O. For example, the following prints out a character<br>

-whose "density" reflects the value passed in: the lower the value, the<br>

-denser the character:<br>

-<br>

-::<br>

-<br>

-    ready><br>

-<br>

-    extern putchard(char);<br>

-    def printdensity(d)<br>

-      if d > 8 then<br>

-        putchard(32)  # ' '<br>

-      else if d > 4 then<br>

-        putchard(46)  # '.'<br>

-      else if d > 2 then<br>

-        putchard(43)  # '+'<br>

-      else<br>

-        putchard(42); # '*'<br>

-    ...<br>

-    ready> printdensity(1): printdensity(2): printdensity(3):<br>

-           printdensity(4): printdensity(5): printdensity(9):<br>

-           putchard(10);<br>

-    **++.<br>

-    Evaluated to 0.000000<br>

-<br>

-Based on these simple primitive operations, we can start to define more<br>

-interesting things. For example, here's a little function that solves<br>

-for the number of iterations it takes a function in the complex plane to<br>

-converge:<br>

-<br>

-::<br>

-<br>

-    # Determine whether the specific location diverges.<br>

-    # Solve for z = z^2 + c in the complex plane.<br>

-    def mandelconverger(real imag iters creal cimag)<br>

-      if iters > 255 | (real*real + imag*imag > 4) then<br>

-        iters<br>

-      else<br>

-        mandelconverger(real*real - imag*imag + creal,<br>

-                        2*real*imag + cimag,<br>

-                        iters+1, creal, cimag);<br>

-<br>

-    # Return the number of iterations required for the iteration to escape<br>

-    def mandelconverge(real imag)<br>

-      mandelconverger(real, imag, 0, real, imag);<br>

-<br>

-This "``z = z^2 + c``" function is a beautiful little creature that is<br>

-the basis for computation of the `Mandelbrot<br>

-Set <<a href="http://en.wikipedia.org/wiki/Mandelbrot_set" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Mandelbrot_set</a>>`_. Our<br>

-``mandelconverge`` function returns the number of iterations that it<br>

-takes for a complex orbit to escape, saturating to 255. This is not a<br>

-very useful function by itself, but if you plot its value over a<br>

-two-dimensional plane, you can see the Mandelbrot set. Given that we are<br>

-limited to using putchard here, our amazing graphical output is limited,<br>

-but we can whip together something using the density plotter above:<br>

-<br>

-::<br>

-<br>

-    # Compute and plot the mandelbrot set with the specified 2 dimensional range<br>

-    # info.<br>

-    def mandelhelp(xmin xmax xstep   ymin ymax ystep)<br>

-      for y = ymin, y < ymax, ystep in (<br>

-        (for x = xmin, x < xmax, xstep in<br>

-           printdensity(mandelconverge(x,y)))<br>

-        : putchard(10)<br>

-      )<br>

-<br>

-    # mandel - This is a convenient helper function for plotting the mandelbrot set<br>

-    # from the specified position with the specified magnification.<br>

-    def mandel(realstart imagstart realmag imagmag)<br>

-      mandelhelp(realstart, realstart+realmag*78, realmag,<br>

-                 imagstart, imagstart+imagmag*40, imagmag);<br>

-<br>

-Given this, we can try plotting out the Mandelbrot set! Let's try it out:<br>

-<br>

-::<br>

-<br>

-    ready> mandel(-2.3, -1.3, 0.05, 0.07);<br>

-    *******************************+++++++++++*************************************<br>

-    *************************+++++++++++++++++++++++*******************************<br>

-    **********************+++++++++++++++++++++++++++++****************************<br>

-    *******************+++++++++++++++++++++.. ...++++++++*************************<br>

-    *****************++++++++++++++++++++++.... ...+++++++++***********************<br>

-    ***************+++++++++++++++++++++++.....   ...+++++++++*********************<br>

-    **************+++++++++++++++++++++++....     ....+++++++++********************<br>

-    *************++++++++++++++++++++++......      .....++++++++*******************<br>

-    ************+++++++++++++++++++++.......       .......+++++++******************<br>

-    ***********+++++++++++++++++++....                ... .+++++++*****************<br>

-    **********+++++++++++++++++.......                     .+++++++****************<br>

-    *********++++++++++++++...........                    ...+++++++***************<br>

-    ********++++++++++++............                      ...++++++++**************<br>

-    ********++++++++++... ..........                        .++++++++**************<br>

-    *******+++++++++.....                                   .+++++++++*************<br>

-    *******++++++++......                                  ..+++++++++*************<br>

-    *******++++++.......                                   ..+++++++++*************<br>

-    *******+++++......                                     ..+++++++++*************<br>

-    *******.... ....                                      ...+++++++++*************<br>

-    *******.... .                                         ...+++++++++*************<br>

-    *******+++++......                                    ...+++++++++*************<br>

-    *******++++++.......                                   ..+++++++++*************<br>

-    *******++++++++......                                   .+++++++++*************<br>

-    *******+++++++++.....                                  ..+++++++++*************<br>

-    ********++++++++++... ..........                        .++++++++**************<br>

-    ********++++++++++++............                      ...++++++++**************<br>

-    *********++++++++++++++..........                     ...+++++++***************<br>

-    **********++++++++++++++++........                     .+++++++****************<br>

-    **********++++++++++++++++++++....                ... ..+++++++****************<br>

-    ***********++++++++++++++++++++++.......       .......++++++++*****************<br>

-    ************+++++++++++++++++++++++......      ......++++++++******************<br>

-    **************+++++++++++++++++++++++....      ....++++++++********************<br>

-    ***************+++++++++++++++++++++++.....   ...+++++++++*********************<br>

-    *****************++++++++++++++++++++++....  ...++++++++***********************<br>

-    *******************+++++++++++++++++++++......++++++++*************************<br>

-    *********************++++++++++++++++++++++.++++++++***************************<br>

-    *************************+++++++++++++++++++++++*******************************<br>

-    ******************************+++++++++++++************************************<br>

-    *******************************************************************************<br>

-    *******************************************************************************<br>

-    *******************************************************************************<br>

-    Evaluated to 0.000000<br>

-    ready> mandel(-2, -1, 0.02, 0.04);<br>

-    **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++<br>

-    ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++<br>

-    *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.<br>

-    *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...<br>

-    *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....<br>

-    ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........<br>

-    **************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........<br>

-    ************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............<br>

-    ***********++++++++++++++++++++++++++++++++++++++++++++++++++........        .<br>

-    **********++++++++++++++++++++++++++++++++++++++++++++++.............<br>

-    ********+++++++++++++++++++++++++++++++++++++++++++..................<br>

-    *******+++++++++++++++++++++++++++++++++++++++.......................<br>

-    ******+++++++++++++++++++++++++++++++++++...........................<br>

-    *****++++++++++++++++++++++++++++++++............................<br>

-    *****++++++++++++++++++++++++++++...............................<br>

-    ****++++++++++++++++++++++++++......   .........................<br>

-    ***++++++++++++++++++++++++.........     ......    ...........<br>

-    ***++++++++++++++++++++++............<br>

-    **+++++++++++++++++++++..............<br>

-    **+++++++++++++++++++................<br>

-    *++++++++++++++++++.................<br>

-    *++++++++++++++++............ ...<br>

-    *++++++++++++++..............<br>

-    *+++....++++................<br>

-    *..........  ...........<br>

-    *<br>

-    *..........  ...........<br>

-    *+++....++++................<br>

-    *++++++++++++++..............<br>

-    *++++++++++++++++............ ...<br>

-    *++++++++++++++++++.................<br>

-    **+++++++++++++++++++................<br>

-    **+++++++++++++++++++++..............<br>

-    ***++++++++++++++++++++++............<br>

-    ***++++++++++++++++++++++++.........     ......    ...........<br>

-    ****++++++++++++++++++++++++++......   .........................<br>

-    *****++++++++++++++++++++++++++++...............................<br>

-    *****++++++++++++++++++++++++++++++++............................<br>

-    ******+++++++++++++++++++++++++++++++++++...........................<br>

-    *******+++++++++++++++++++++++++++++++++++++++.......................<br>

-    ********+++++++++++++++++++++++++++++++++++++++++++..................<br>

-    Evaluated to 0.000000<br>

-    ready> mandel(-0.9, -1.4, 0.02, 0.03);<br>

-    *******************************************************************************<br>

-    *******************************************************************************<br>

-    *******************************************************************************<br>

-    **********+++++++++++++++++++++************************************************<br>

-    *+++++++++++++++++++++++++++++++++++++++***************************************<br>

-    +++++++++++++++++++++++++++++++++++++++++++++**********************************<br>

-    ++++++++++++++++++++++++++++++++++++++++++++++++++*****************************<br>

-    ++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************<br>

-    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************<br>

-    +++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************<br>

-    +++++++++++++++++++++++++++++++....   ......+++++++++++++++++++****************<br>

-    +++++++++++++++++++++++++++++.......  ........+++++++++++++++++++**************<br>

-    ++++++++++++++++++++++++++++........   ........++++++++++++++++++++************<br>

-    +++++++++++++++++++++++++++.........     ..  ...+++++++++++++++++++++**********<br>

-    ++++++++++++++++++++++++++...........        ....++++++++++++++++++++++********<br>

-    ++++++++++++++++++++++++.............       .......++++++++++++++++++++++******<br>

-    +++++++++++++++++++++++.............        ........+++++++++++++++++++++++****<br>

-    ++++++++++++++++++++++...........           ..........++++++++++++++++++++++***<br>

-    ++++++++++++++++++++...........                .........++++++++++++++++++++++*<br>

-    ++++++++++++++++++............                  ...........++++++++++++++++++++<br>

-    ++++++++++++++++...............                 .............++++++++++++++++++<br>

-    ++++++++++++++.................                 ...............++++++++++++++++<br>

-    ++++++++++++..................                  .................++++++++++++++<br>

-    +++++++++..................                      .................+++++++++++++<br>

-    ++++++........        .                               .........  ..++++++++++++<br>

-    ++............                                         ......    ....++++++++++<br>

-    ..............                                                    ...++++++++++<br>

-    ..............                                                    ....+++++++++<br>

-    ..............                                                    .....++++++++<br>

-    .............                                                    ......++++++++<br>

-    ...........                                                     .......++++++++<br>

-    .........                                                       ........+++++++<br>

-    .........                                                       ........+++++++<br>

-    .........                                                           ....+++++++<br>

-    ........                                                             ...+++++++<br>

-    .......                                                              ...+++++++<br>

-                                                                        ....+++++++<br>

-                                                                       .....+++++++<br>

-                                                                        ....+++++++<br>

-                                                                        ....+++++++<br>

-                                                                        ....+++++++<br>

-    Evaluated to 0.000000<br>

-    ready> ^D<br>

-<br>

-At this point, you may be starting to realize that Kaleidoscope is a<br>

-real and powerful language. It may not be self-similar :), but it can be<br>

-used to plot things that are!<br>

-<br>

-With this, we conclude the "adding user-defined operators" chapter of<br>

-the tutorial. We have successfully augmented our language, adding the<br>

-ability to extend the language in the library, and we have shown how<br>

-this can be used to build a simple but interesting end-user application<br>

-in Kaleidoscope. At this point, Kaleidoscope can build a variety of<br>

-applications that are functional and can call functions with<br>

-side-effects, but it can't actually define and mutate a variable itself.<br>

-<br>

-Strikingly, variable mutation is an important feature of some languages,<br>

-and it is not at all obvious how to `add support for mutable<br>

-variables <LangImpl7.html>`_ without having to add an "SSA construction"<br>

-phase to your front-end. In the next chapter, we will describe how you<br>

-can add variable mutation without building SSA in your front-end.<br>

-<br>

-Full Code Listing<br>

-=================<br>

-<br>

-Here is the complete code listing for our running example, enhanced with<br>

-support for user-defined operators. To build this example, use:<br>

-<br>

-.. code-block:: bash<br>

-<br>

-    # Compile<br>

-    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy<br>

-    # Run<br>

-    ./toy<br>

-<br>

-On some platforms, you will need to specify -rdynamic or<br>

--Wl,--export-dynamic when linking. This ensures that symbols defined in<br>

-the main executable are exported to the dynamic linker and so are<br>

-available for symbol resolution at run time. This is not needed if you<br>

-compile your support code into a shared library, although doing that<br>

-will cause problems on Windows.<br>

-<br>

-Here is the code:<br>

-<br>

-.. literalinclude:: ../../examples/Kaleidoscope/Chapter6/toy.cpp<br>

-   :language: c++<br>

-<br>

-`Next: Extending the language: mutable variables / SSA<br>

-construction <LangImpl7.html>`_<br>

-<br>

<br>

Removed: llvm/trunk/docs/tutorial/LangImpl7.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl7.rst?rev=274440&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl7.rst?rev=274440&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl7.rst (original)<br>

+++ llvm/trunk/docs/tutorial/LangImpl7.rst (removed)<br>

@@ -1,881 +0,0 @@<br>

-=======================================================<br>

-Kaleidoscope: Extending the Language: Mutable Variables<br>

-=======================================================<br>

-<br>

-.. contents::<br>

-   :local:<br>

-<br>

-Chapter 7 Introduction<br>

-======================<br>

-<br>

-Welcome to Chapter 7 of the "`Implementing a language with<br>

-LLVM <index.html>`_" tutorial. In chapters 1 through 6, we've built a<br>

-very respectable, albeit simple, `functional programming<br>

-language <<a href="http://en.wikipedia.org/wiki/Functional_programming" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Functional_programming</a>>`_. In our<br>

-journey, we learned some parsing techniques, how to build and represent<br>

-an AST, how to build LLVM IR, and how to optimize the resultant code as<br>

-well as JIT compile it.<br>

-<br>

-While Kaleidoscope is interesting as a functional language, the fact<br>

-that it is functional makes it "too easy" to generate LLVM IR for it. In<br>

-particular, a functional language makes it very easy to build LLVM IR<br>

-directly in `SSA<br>

-form <<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Static_single_assignment_form</a>>`_.<br>

-Since LLVM requires that the input code be in SSA form, this is a very<br>

-nice property and it is often unclear to newcomers how to generate code<br>

-for an imperative language with mutable variables.<br>

-<br>

-The short (and happy) summary of this chapter is that there is no need<br>

-for your front-end to build SSA form: LLVM provides highly tuned and<br>

-well tested support for this, though the way it works is a bit<br>

-unexpected for some.<br>

-<br>

-Why is this a hard problem?<br>

-===========================<br>

-<br>

-To understand why mutable variables cause complexities in SSA<br>

-construction, consider this extremely simple C example:<br>

-<br>

-.. code-block:: c<br>

-<br>

-    int G, H;<br>

-    int test(_Bool Condition) {<br>

-      int X;<br>

-      if (Condition)<br>

-        X = G;<br>

-      else<br>

-        X = H;<br>

-      return X;<br>

-    }<br>

-<br>

-In this case, we have the variable "X", whose value depends on the path<br>

-executed in the program. Because there are two different possible values<br>

-for X before the return instruction, a PHI node is inserted to merge the<br>

-two values. The LLVM IR that we want for this example looks like this:<br>

-<br>

-.. code-block:: llvm<br>

-<br>

-    @G = weak global i32 0   ; type of @G is i32*<br>

-    @H = weak global i32 0   ; type of @H is i32*<br>

-<br>

-    define i32 @test(i1 %Condition) {<br>

-    entry:<br>

-      br i1 %Condition, label %cond_true, label %cond_false<br>

-<br>

-    cond_true:<br>

-      %X.0 = load i32, i32* @G<br>

-      br label %cond_next<br>

-<br>

-    cond_false:<br>

-      %X.1 = load i32, i32* @H<br>

-      br label %cond_next<br>

-<br>

-    cond_next:<br>

-      %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]<br>

-      ret i32 %X.2<br>

-    }<br>

-<br>

-In this example, the loads from the G and H global variables are<br>

-explicit in the LLVM IR, and they live in the then/else branches of the<br>

-if statement (cond\_true/cond\_false). In order to merge the incoming<br>

-values, the X.2 phi node in the cond\_next block selects the right value<br>

-to use based on where control flow is coming from: if control flow comes<br>

-from the cond\_false block, X.2 gets the value of X.1. Alternatively, if<br>

-control flow comes from cond\_true, it gets the value of X.0. The intent<br>

-of this chapter is not to explain the details of SSA form. For more<br>

-information, see one of the many `online<br>

-references <<a href="http://en.wikipedia.org/wiki/Static_single_assignment_form" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Static_single_assignment_form</a>>`_.<br>

-<br>

-The question for this article is "who places the phi nodes when lowering<br>

-assignments to mutable variables?". The issue here is that LLVM<br>

-*requires* that its IR be in SSA form: there is no "non-ssa" mode for<br>

-it. However, SSA construction requires non-trivial algorithms and data<br>

-structures, so it is inconvenient and wasteful for every front-end to<br>

-have to reproduce this logic.<br>

-<br>

-Memory in LLVM<br>

-==============<br>

-<br>

-The 'trick' here is that while LLVM does require all register values to<br>

-be in SSA form, it does not require (or permit) memory objects to be in<br>

-SSA form. In the example above, note that the loads from G and H are<br>

-direct accesses to G and H: they are not renamed or versioned. This<br>

-differs from some other compiler systems, which do try to version memory<br>

-objects. In LLVM, instead of encoding dataflow analysis of memory into<br>

-the LLVM IR, it is handled with `Analysis<br>

-Passes <../WritingAnLLVMPass.html>`_ which are computed on demand.<br>

-<br>

-With this in mind, the high-level idea is that we want to make a stack<br>

-variable (which lives in memory, because it is on the stack) for each<br>

-mutable object in a function. To take advantage of this trick, we need<br>

-to talk about how LLVM represents stack variables.<br>

-<br>

-In LLVM, all memory accesses are explicit with load/store instructions,<br>

-and it is carefully designed not to have (or need) an "address-of"<br>

-operator. Notice how the type of the @G/@H global variables is actually<br>

-"i32\*" even though the variable is defined as "i32". What this means is<br>

-that @G defines *space* for an i32 in the global data area, but its<br>

-*name* actually refers to the address for that space. Stack variables<br>

-work the same way, except that instead of being declared with global<br>

-variable definitions, they are declared with the `LLVM alloca<br>

-instruction <../LangRef.html#alloca-instruction>`_:<br>

-<br>

-.. code-block:: llvm<br>

-<br>

-    define i32 @example() {<br>

-    entry:<br>

-      %X = alloca i32           ; type of %X is i32*.<br>

-      ...<br>

-      %tmp = load i32, i32* %X  ; load the stack value %X from the stack.<br>

-      %tmp2 = add i32 %tmp, 1   ; increment it<br>

-      store i32 %tmp2, i32* %X  ; store it back<br>

-      ...<br>

-<br>

-This code shows an example of how you can declare and manipulate a stack<br>

-variable in the LLVM IR. Stack memory allocated with the alloca<br>

-instruction is fully general: you can pass the address of the stack slot<br>

-to functions, you can store it in other variables, etc. In our example<br>

-above, we could rewrite the example to use the alloca technique to avoid<br>

-using a PHI node:<br>

-<br>

-.. code-block:: llvm<br>

-<br>

-    @G = weak global i32 0   ; type of @G is i32*<br>

-    @H = weak global i32 0   ; type of @H is i32*<br>

-<br>

-    define i32 @test(i1 %Condition) {<br>

-    entry:<br>

-      %X = alloca i32           ; type of %X is i32*.<br>

-      br i1 %Condition, label %cond_true, label %cond_false<br>

-<br>

-    cond_true:<br>

-      %X.0 = load i32, i32* @G<br>

-      store i32 %X.0, i32* %X   ; Update X<br>

-      br label %cond_next<br>

-<br>

-    cond_false:<br>

-      %X.1 = load i32, i32* @H<br>

-      store i32 %X.1, i32* %X   ; Update X<br>

-      br label %cond_next<br>

-<br>

-    cond_next:<br>

-      %X.2 = load i32, i32* %X  ; Read X<br>

-      ret i32 %X.2<br>

-    }<br>

-<br>

-With this, we have discovered a way to handle arbitrary mutable<br>

-variables without the need to create Phi nodes at all:<br>

-<br>

-#. Each mutable variable becomes a stack allocation.<br>

-#. Each read of the variable becomes a load from the stack.<br>

-#. Each update of the variable becomes a store to the stack.<br>

-#. Taking the address of a variable just uses the stack address<br>

-   directly.<br>

-<br>
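The first three rules above can be sketched as a tiny text emitter. This is only an illustration under stated assumptions: ``SlotEmitter`` and ``emitIncrement`` are hypothetical names, not LLVM APIs, the variable is fixed to ``i32``, and the ``load`` lines use the explicit-result-type spelling (``load i32, i32* %X``) that LLVM 3.7 and later require.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of LLVM or the tutorial code): records
// textual IR for i32 variables handled with the stack-slot rules.
class SlotEmitter {
public:
  // Rule 1: each mutable variable becomes a stack allocation.
  void declare(const std::string &Name) {
    Lines.push_back("%" + Name + " = alloca i32");
  }

  // Rule 2: each read becomes a load from the slot.
  // Returns the name of the temporary that holds the loaded value.
  std::string read(const std::string &Name) {
    std::string Tmp = freshTemp();
    Lines.push_back(Tmp + " = load i32, i32* %" + Name);
    return Tmp;
  }

  // Rule 3: each update becomes a store to the slot.
  void write(const std::string &Name, const std::string &Value) {
    Lines.push_back("store i32 " + Value + ", i32* %" + Name);
  }

  // Ordinary arithmetic stays in SSA temporaries; no slot is involved.
  std::string add(const std::string &LHS, const std::string &RHS) {
    std::string Tmp = freshTemp();
    Lines.push_back(Tmp + " = add i32 " + LHS + ", " + RHS);
    return Tmp;
  }

  const std::vector<std::string> &lines() const { return Lines; }

private:
  std::string freshTemp() { return "%t" + std::to_string(NextTmp++); }

  std::vector<std::string> Lines;
  unsigned NextTmp = 0;
};

// Emits the instructions for the source-level statement "X = X + 1".
std::vector<std::string> emitIncrement() {
  SlotEmitter E;
  E.declare("X");
  std::string Old = E.read("X");
  std::string New = E.add(Old, "1");
  E.write("X", New);
  return E.lines();
}
```

Note that the emitter never needs to know about control flow: every read and write goes through the slot, so no value has to be merged by hand where paths join.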

-While this solution has solved our immediate problem, it introduced<br>

-another one: we have now apparently introduced a lot of stack traffic<br>

-for very simple and common operations, a major performance problem.<br>

-Fortunately for us, the LLVM optimizer has a highly-tuned optimization<br>

-pass named "mem2reg" that handles this case, promoting allocas like this<br>

-into SSA registers, inserting Phi nodes as appropriate. If you run this<br>

-example through the pass, you'll get:<br>

-<br>

-.. code-block:: bash<br>

-<br>

-    $ llvm-as < example.ll | opt -mem2reg | llvm-dis<br>

-    @G = weak global i32 0<br>

-    @H = weak global i32 0<br>

-<br>

-    define i32 @test(i1 %Condition) {<br>

-    entry:<br>

-      br i1 %Condition, label %cond_true, label %cond_false<br>

-<br>

-    cond_true:<br>

-      %X.0 = load i32, i32* @G<br>

-      br label %cond_next<br>

-<br>

-    cond_false:<br>

-      %X.1 = load i32, i32* @H<br>

-      br label %cond_next<br>

-<br>

-    cond_next:<br>

-      %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]<br>

-      ret i32 %X.01<br>

-    }<br>

-<br>

-The mem2reg pass implements the standard "iterated dominance frontier"<br>

-algorithm for constructing SSA form and has a number of optimizations<br>

-that speed up (very common) degenerate cases. The mem2reg optimization<br>

-pass is the answer to dealing with mutable variables, and we highly<br>

-recommend that you depend on it. Note that mem2reg only works on<br>

-variables in certain circumstances:<br>

-<br>

-#. mem2reg is alloca-driven: it looks for allocas and if it can handle<br>

-   them, it promotes them. It does not apply to global variables or heap<br>

-   allocations.<br>

-#. mem2reg only looks for alloca instructions in the entry block of the<br>

-   function. Being in the entry block guarantees that the alloca is only<br>

-   executed once, which makes analysis simpler.<br>

-#. mem2reg only promotes allocas whose uses are direct loads and stores.<br>

-   If the address of the stack object is passed to a function, or if any<br>

-   funny pointer arithmetic is involved, the alloca will not be<br>

-   promoted.<br>

-#. mem2reg only works on allocas of `first<br>

-   class <../LangRef.html#first-class-types>`_ values (such as pointers,<br>

-   scalars and vectors), and only if the array size of the allocation is<br>

-   1 (or missing in the .ll file). mem2reg is not capable of promoting<br>

-   structs or arrays to registers. Note that the "sroa" pass is<br>

-   more powerful and can promote structs, "unions", and arrays in many<br>

-   cases.<br>

-<br>

-All of these properties are easy to satisfy for most imperative<br>

-languages, and we'll illustrate it below with Kaleidoscope. The final<br>

-question you may be asking is: should I bother with this nonsense for my<br>

-front-end? Wouldn't it be better if I just did SSA construction<br>

-directly, avoiding use of the mem2reg optimization pass? In short, we<br>

-strongly recommend that you use this technique for building SSA form,<br>

-unless there is an extremely good reason not to. Using this technique<br>

-is:<br>

-<br>

--  Proven and well tested: clang uses this technique<br>

-   for local mutable variables. As such, the most common clients of LLVM<br>

-   are using this to handle the bulk of their variables. You can be sure<br>

-   that bugs are found fast and fixed early.<br>

--  Extremely Fast: mem2reg has a number of special cases that make it<br>

-   fast in common cases as well as fully general. For example, it has<br>

-   fast-paths for variables that are only used in a single block,<br>

-   variables that only have one assignment point, good heuristics to<br>

-   avoid insertion of unneeded phi nodes, etc.<br>

--  Needed for debug info generation: `Debug information in<br>

-   LLVM <../SourceLevelDebugging.html>`_ relies on having the address of<br>

-   the variable exposed so that debug info can be attached to it. This<br>

-   technique dovetails very naturally with this style of debug info.<br>

-<br>

-If nothing else, this makes it much easier to get your front-end up and<br>

-running, and is very simple to implement. Let's extend Kaleidoscope with<br>

-mutable variables now!<br>

-<br>

-Mutable Variables in Kaleidoscope<br>

-=================================<br>

-<br>

-Now that we know the sort of problem we want to tackle, let's see what<br>

-this looks like in the context of our little Kaleidoscope language.<br>

-We're going to add two features:<br>

-<br>

-#. The ability to mutate variables with the '=' operator.<br>

-#. The ability to define new variables.<br>

-<br>

-While the first item is really what this is about, we only have<br>

-variables for incoming arguments as well as for induction variables, and<br>

-redefining those only goes so far :). Also, the ability to define new<br>

-variables is a useful thing regardless of whether you will be mutating<br>

-them. Here's a motivating example that shows how we could use these:<br>

-<br>

-::<br>

-<br>

-    # Define ':' for sequencing: as a low-precedence operator that ignores operands<br>

-    # and just returns the RHS.<br>

-    def binary : 1 (x y) y;<br>

-<br>

-    # Recursive fib, we could do this before.<br>

-    def fib(x)<br>

-      if (x < 3) then<br>

-        1<br>

-      else<br>

-        fib(x-1)+fib(x-2);<br>

-<br>

-    # Iterative fib.<br>

-    def fibi(x)<br>

-      var a = 1, b = 1, c in<br>

-      (for i = 3, i < x in<br>

-         c = a + b :<br>

-         a = b :<br>

-         b = c) :<br>

-      b;<br>

-<br>

-    # Call it.<br>

-    fibi(10);<br>

-<br>
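To make the intended behavior of ``fibi`` concrete, here is a rough C++ rendering of both functions. It rests on two assumptions about the 'for' expression as generated by this tutorial's ``ForExprAST::codegen()``: the body runs before the end condition is tested, with the condition reading the loop variable before it is incremented, and an omitted step defaults to 1.0. It also assumes that ``c``, which has no initializer, starts at 0.0.

```cpp
#include <cassert>

// Recursive fib, mirroring the Kaleidoscope definition.
double fib(double x) {
  if (x < 3)
    return 1;
  return fib(x - 1) + fib(x - 2);
}

// Iterative fibi. The do/while shape mirrors the assumed loop semantics:
// run the body, evaluate the end condition, then step the loop variable.
double fibi(double x) {
  double a = 1, b = 1, c = 0; // var a = 1, b = 1, c in ...
  double i = 3;               // for i = 3, ...
  bool keepGoing;
  do {
    c = a + b;                // c = a + b :
    a = b;                    // a = b :
    b = c;                    // b = c
    keepGoing = i < x;        // end condition sees the pre-increment i
    i = i + 1.0;              // assumed default step
  } while (keepGoing);
  return b;
}
```

Under these assumptions, ``fib(10)`` and ``fibi(10)`` both evaluate to 55. Each assignment in ``fibi`` is exactly the kind of update that becomes a store to a stack slot.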

-In order to mutate variables, we have to change our existing variables<br>

-to use the "alloca trick". Once we have that, we'll add our new<br>

-operator, then extend Kaleidoscope to support new variable definitions.<br>

-<br>

-Adjusting Existing Variables for Mutation<br>

-=========================================<br>

-<br>

-The symbol table in Kaleidoscope is managed at code generation time by<br>

-the '``NamedValues``' map. This map currently keeps track of the LLVM<br>

-"Value\*" that holds the double value for the named variable. In order<br>

-to support mutation, we need to change this slightly, so that<br>

-``NamedValues`` holds the *memory location* of the variable in question.<br>

-Note that this change is a refactoring: it changes the structure of the<br>

-code, but does not (by itself) change the behavior of the compiler. All<br>

-of these changes are isolated in the Kaleidoscope code generator.<br>

-<br>

-At this point in Kaleidoscope's development, it only supports variables<br>

-for two things: incoming arguments to functions and the induction<br>

-variable of 'for' loops. For consistency, we'll allow mutation of these<br>

-variables in addition to other user-defined variables. This means that<br>

-these will both need memory locations.<br>

-<br>

-To start our transformation of Kaleidoscope, we'll change the<br>

-NamedValues map so that it maps to AllocaInst\* instead of Value\*. Once<br>

-we do this, the C++ compiler will tell us what parts of the code we need<br>

-to update:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    static std::map<std::string, AllocaInst*> NamedValues;<br>

-<br>

-Also, since we will need to create these allocas, we'll use a helper<br>

-function that ensures that the allocas are created in the entry block of<br>

-the function:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of<br>

-    /// the function.  This is used for mutable variables etc.<br>

-    static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,<br>

-                                              const std::string &VarName) {<br>

-      IRBuilder<> TmpB(&TheFunction->getEntryBlock(),<br>

-                     TheFunction->getEntryBlock().begin());<br>

-      return TmpB.CreateAlloca(Type::getDoubleTy(LLVMContext), 0,<br>

-                               VarName.c_str());<br>

-    }<br>

-<br>

-This funny looking code creates an IRBuilder object that is pointing at<br>

-the first instruction (.begin()) of the entry block. It then creates an<br>

-alloca with the expected name and returns it. Because all values in<br>

-Kaleidoscope are doubles, there is no need to pass in a type to use.<br>

-<br>

-With this in place, the first functionality change we want to make is to<br>

-variable references. In our new scheme, variables live on the stack, so<br>

-code generating a reference to them actually needs to produce a load<br>

-from the stack slot:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    Value *VariableExprAST::codegen() {<br>

-      // Look this variable up in the function.<br>

-      Value *V = NamedValues[Name];<br>

-      if (!V)<br>

-        return LogErrorV("Unknown variable name");<br>

-<br>

-      // Load the value.<br>

-      return Builder.CreateLoad(V, Name.c_str());<br>

-    }<br>

-<br>

-As you can see, this is pretty straightforward. Now we need to update<br>

-the things that define the variables to set up the alloca. We'll start<br>

-with ``ForExprAST::codegen()`` (see the `full code listing <#id1>`_ for<br>

-the unabridged code):<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      Function *TheFunction = Builder.GetInsertBlock()->getParent();<br>

-<br>

-      // Create an alloca for the variable in the entry block.<br>

-      AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);<br>

-<br>

-      // Emit the start code first, without 'variable' in scope.<br>

-      Value *StartVal = Start->codegen();<br>

-      if (!StartVal)<br>

-        return nullptr;<br>

-<br>

-      // Store the value into the alloca.<br>

-      Builder.CreateStore(StartVal, Alloca);<br>

-      ...<br>

-<br>

-      // Compute the end condition.<br>

-      Value *EndCond = End->codegen();<br>

-      if (!EndCond)<br>

-        return nullptr;<br>

-<br>

-      // Reload, increment, and restore the alloca.  This handles the case where<br>

-      // the body of the loop mutates the variable.<br>

-      Value *CurVar = Builder.CreateLoad(Alloca);<br>

-      Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");<br>

-      Builder.CreateStore(NextVar, Alloca);<br>

-      ...<br>

-<br>

-This code is virtually identical to the code `before we allowed mutable<br>

-variables <LangImpl5.html#code-generation-for-the-for-loop>`_. The big difference is that we<br>

-no longer have to construct a PHI node, and we use load/store to access<br>

-the variable as needed.<br>

-<br>

-To support mutable argument variables, we need to also make allocas for<br>

-them. The code for this is also pretty simple:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// CreateArgumentAllocas - Create an alloca for each argument and register the<br>

-    /// argument in the symbol table so that references to it will succeed.<br>

-    void PrototypeAST::CreateArgumentAllocas(Function *F) {<br>

-      Function::arg_iterator AI = F->arg_begin();<br>

-      for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {<br>

-        // Create an alloca for this variable.<br>

-        AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);<br>

-<br>

-        // Store the initial value into the alloca.<br>

-        Builder.CreateStore(AI, Alloca);<br>

-<br>

-        // Add arguments to variable symbol table.<br>

-        NamedValues[Args[Idx]] = Alloca;<br>

-      }<br>

-    }<br>

-<br>

-For each argument, we make an alloca, store the input value to the<br>

-function into the alloca, and register the alloca as the memory location<br>

-for the argument. This method gets invoked by ``FunctionAST::codegen()``<br>

-right after it sets up the entry block for the function.<br>

-<br>

-The final missing piece is adding the mem2reg pass, which allows us to<br>

-get good codegen once again:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-        // Set up the optimizer pipeline.  Start with registering info about how the<br>

-        // target lays out data structures.<br>

-        OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));<br>

-        // Promote allocas to registers.<br>

-        OurFPM.add(createPromoteMemoryToRegisterPass());<br>

-        // Do simple "peephole" optimizations and bit-twiddling optzns.<br>

-        OurFPM.add(createInstructionCombiningPass());<br>

-        // Reassociate expressions.<br>

-        OurFPM.add(createReassociatePass());<br>

-<br>

-It is interesting to see what the code looks like before and after the<br>

-mem2reg optimization runs. For example, this is the before/after code<br>

-for our recursive fib function. Before the optimization:<br>

-<br>

-.. code-block:: llvm<br>

-<br>

-    define double @fib(double %x) {<br>

-    entry:<br>

-      %x1 = alloca double<br>

-      store double %x, double* %x1<br>

-      %x2 = load double* %x1<br>

-      %cmptmp = fcmp ult double %x2, 3.000000e+00<br>

-      %booltmp = uitofp i1 %cmptmp to double<br>

-      %ifcond = fcmp one double %booltmp, 0.000000e+00<br>

-      br i1 %ifcond, label %then, label %else<br>

-<br>

-    then:       ; preds = %entry<br>

-      br label %ifcont<br>

-<br>

-    else:       ; preds = %entry<br>

-      %x3 = load double* %x1<br>

-      %subtmp = fsub double %x3, 1.000000e+00<br>

-      %calltmp = call double @fib(double %subtmp)<br>

-      %x4 = load double* %x1<br>

-      %subtmp5 = fsub double %x4, 2.000000e+00<br>

-      %calltmp6 = call double @fib(double %subtmp5)<br>

-      %addtmp = fadd double %calltmp, %calltmp6<br>

-      br label %ifcont<br>

-<br>

-    ifcont:     ; preds = %else, %then<br>

-      %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]<br>

-      ret double %iftmp<br>

-    }<br>

-<br>

-Here there is only one variable (x, the input argument) but you can<br>

-still see the extremely simple-minded code generation strategy we are<br>

-using. In the entry block, an alloca is created, and the initial input<br>

-value is stored into it. Each reference to the variable does a reload<br>

-from the stack. Also, note that we didn't modify the if/then/else<br>

-expression, so it still inserts a PHI node. While we could make an<br>

-alloca for it, it is actually easier to create a PHI node for it, so we<br>

-still just make the PHI.<br>

-<br>

-Here is the code after the mem2reg pass runs:<br>

-<br>

-.. code-block:: llvm<br>

-<br>

-    define double @fib(double %x) {<br>

-    entry:<br>

-      %cmptmp = fcmp ult double %x, 3.000000e+00<br>

-      %booltmp = uitofp i1 %cmptmp to double<br>

-      %ifcond = fcmp one double %booltmp, 0.000000e+00<br>

-      br i1 %ifcond, label %then, label %else<br>

-<br>

-    then:<br>

-      br label %ifcont<br>

-<br>

-    else:<br>

-      %subtmp = fsub double %x, 1.000000e+00<br>

-      %calltmp = call double @fib(double %subtmp)<br>

-      %subtmp5 = fsub double %x, 2.000000e+00<br>

-      %calltmp6 = call double @fib(double %subtmp5)<br>

-      %addtmp = fadd double %calltmp, %calltmp6<br>

-      br label %ifcont<br>

-<br>

-    ifcont:     ; preds = %else, %then<br>

-      %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]<br>

-      ret double %iftmp<br>

-    }<br>

-<br>

-This is a trivial case for mem2reg, since there are no redefinitions of<br>

-the variable. The point of showing this is to calm your tension about<br>

-inserting such blatant inefficiencies :).<br>

-<br>

-After the rest of the optimizers run, we get:<br>

-<br>

-.. code-block:: llvm<br>

-<br>

-    define double @fib(double %x) {<br>

-    entry:<br>

-      %cmptmp = fcmp ult double %x, 3.000000e+00<br>

-      %booltmp = uitofp i1 %cmptmp to double<br>

-      %ifcond = fcmp ueq double %booltmp, 0.000000e+00<br>

-      br i1 %ifcond, label %else, label %ifcont<br>

-<br>

-    else:<br>

-      %subtmp = fsub double %x, 1.000000e+00<br>

-      %calltmp = call double @fib(double %subtmp)<br>

-      %subtmp5 = fsub double %x, 2.000000e+00<br>

-      %calltmp6 = call double @fib(double %subtmp5)<br>

-      %addtmp = fadd double %calltmp, %calltmp6<br>

-      ret double %addtmp<br>

-<br>

-    ifcont:<br>

-      ret double 1.000000e+00<br>

-    }<br>

-<br>

-Here we see that the simplifycfg pass decided to clone the return<br>

-instruction into the end of the 'else' block. This allowed it to<br>

-eliminate some branches and the PHI node.<br>

-<br>

-Now that all symbol table references are updated to use stack variables,<br>

-we'll add the assignment operator.<br>

-<br>

-New Assignment Operator<br>

-=======================<br>

-<br>

-With our current framework, adding a new assignment operator is really<br>

-simple. We will parse it just like any other binary operator, but handle<br>

-it internally (instead of allowing the user to define it). The first<br>

-step is to set a precedence:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-     int main() {<br>

-       // Install standard binary operators.<br>

-       // 1 is lowest precedence.<br>

-       BinopPrecedence['='] = 2;<br>

-       BinopPrecedence['<'] = 10;<br>

-       BinopPrecedence['+'] = 20;<br>

-       BinopPrecedence['-'] = 20;<br>

-<br>

-Now that the parser knows the precedence of the binary operator, it<br>

-takes care of all the parsing and AST generation. We just need to<br>

-implement codegen for the assignment operator. This looks like:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    Value *BinaryExprAST::codegen() {<br>

-      // Special case '=' because we don't want to emit the LHS as an expression.<br>

-      if (Op == '=') {<br>

-        // Assignment requires the LHS to be an identifier.<br>

-        VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS.get());<br>

-        if (!LHSE)<br>

-          return LogErrorV("destination of '=' must be a variable");<br>

-<br>

-Unlike the rest of the binary operators, our assignment operator doesn't<br>

-follow the "emit LHS, emit RHS, do computation" model. As such, it is<br>

-handled as a special case before the other binary operators are handled.<br>

-The other strange thing is that it requires the LHS to be a variable. It<br>

-is invalid to have "(x+1) = expr" - only things like "x = expr" are<br>

-allowed.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-        // Codegen the RHS.<br>

-        Value *Val = RHS->codegen();<br>

-        if (!Val)<br>

-          return nullptr;<br>

-<br>

-        // Look up the name.<br>

-        Value *Variable = NamedValues[LHSE->getName()];<br>

-        if (!Variable)<br>

-          return LogErrorV("Unknown variable name");<br>

-<br>

-        Builder.CreateStore(Val, Variable);<br>

-        return Val;<br>

-      }<br>

-      ...<br>

-<br>

-Once we have the variable, codegen'ing the assignment is<br>

-straightforward: we emit the RHS of the assignment, create a store, and<br>

-return the computed value. Returning a value allows for chained<br>

-assignments like "X = (Y = Z)".<br>
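Why returning the stored value enables chaining can be seen outside LLVM. The following is only a minimal sketch, not the tutorial's code: `Env` and `assign` are hypothetical names, and a plain `std::map` of doubles stands in for the allocas held in `NamedValues`.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for NamedValues: maps a variable name to its value.
using Env = std::map<std::string, double>;

// Store Val into an existing variable and hand the same value back,
// mirroring "create a store, and return the computed value".
double assign(Env &Vars, const std::string &Name, double Val) {
  auto It = Vars.find(Name);
  if (It == Vars.end())
    throw std::runtime_error("Unknown variable name");
  It->second = Val;
  return Val;
}
```

Because `assign` yields the value it stored, a call such as `assign(Vars, "X", assign(Vars, "Y", Z))` behaves like "X = (Y = Z)": the inner call runs first and its result feeds the outer store.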

-<br>

-Now that we have an assignment operator, we can mutate loop variables<br>

-and arguments. For example, we can now run code like this:<br>

-<br>

-::<br>

-<br>

-    # Function to print a double.<br>

-    extern printd(x);<br>

-<br>

-    # Define ':' for sequencing: as a low-precedence operator that ignores operands<br>

-    # and just returns the RHS.<br>

-    def binary : 1 (x y) y;<br>

-<br>

-    def test(x)<br>

-      printd(x) :<br>

-      x = 4 :<br>

-      printd(x);<br>

-<br>

-    test(123);<br>

-<br>

-When run, this example prints "123" and then "4", showing that we did<br>

-actually mutate the value! Okay, we have now officially implemented our<br>

-goal: getting this to work requires SSA construction in the general<br>

-case. However, to be really useful, we want the ability to define our<br>

-own local variables. Let's add this next!<br>

-<br>

-User-defined Local Variables<br>

-============================<br>

-<br>

-Adding var/in is just like any other extension we made to<br>

-Kaleidoscope: we extend the lexer, the parser, the AST and the code<br>

-generator. The first step for adding our new 'var/in' construct is to<br>

-extend the lexer. As before, this is pretty trivial; the code looks like<br>

-this:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    enum Token {<br>

-      ...<br>

-      // var definition<br>

-      tok_var = -13<br>

-    ...<br>

-    }<br>

-    ...<br>

-    static int gettok() {<br>

-    ...<br>

-        if (IdentifierStr == "in")<br>

-          return tok_in;<br>

-        if (IdentifierStr == "binary")<br>

-          return tok_binary;<br>

-        if (IdentifierStr == "unary")<br>

-          return tok_unary;<br>

-        if (IdentifierStr == "var")<br>

-          return tok_var;<br>

-        return tok_identifier;<br>

-    ...<br>

-<br>

-The next step is to define the AST node that we will construct. For<br>

-var/in, it looks like this:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// VarExprAST - Expression class for var/in<br>

-    class VarExprAST : public ExprAST {<br>

-      std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;<br>

-      std::unique_ptr<ExprAST> Body;<br>

-<br>

-    public:<br>

-      VarExprAST(std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames,<br>

-                 std::unique_ptr<ExprAST> Body)<br>

-      : VarNames(std::move(VarNames)), Body(std::move(Body)) {}<br>

-<br>

-      virtual Value *codegen();<br>

-    };<br>

-<br>

-var/in allows a list of names to be defined all at once, and each name<br>

-can optionally have an initializer value. As such, we capture this<br>

-information in the VarNames vector. Also, var/in has a body; this body<br>

-is allowed to access the variables defined by the var/in.<br>

-<br>

-With this in place, we can define the parser pieces. The first thing we<br>

-do is add it as a primary expression:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// primary<br>

-    ///   ::= identifierexpr<br>

-    ///   ::= numberexpr<br>

-    ///   ::= parenexpr<br>

-    ///   ::= ifexpr<br>

-    ///   ::= forexpr<br>

-    ///   ::= varexpr<br>

-    static std::unique_ptr<ExprAST> ParsePrimary() {<br>

-      switch (CurTok) {<br>

-      default:<br>

-        return LogError("unknown token when expecting an expression");<br>

-      case tok_identifier:<br>

-        return ParseIdentifierExpr();<br>

-      case tok_number:<br>

-        return ParseNumberExpr();<br>

-      case '(':<br>

-        return ParseParenExpr();<br>

-      case tok_if:<br>

-        return ParseIfExpr();<br>

-      case tok_for:<br>

-        return ParseForExpr();<br>

-      case tok_var:<br>

-        return ParseVarExpr();<br>

-      }<br>

-    }<br>

-<br>

-Next we define ParseVarExpr:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    /// varexpr ::= 'var' identifier ('=' expression)?<br>

-    ///                   (',' identifier ('=' expression)?)* 'in' expression<br>

-    static std::unique_ptr<ExprAST> ParseVarExpr() {<br>

-      getNextToken();  // eat the var.<br>

-<br>

-      std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;<br>

-<br>

-      // At least one variable name is required.<br>

-      if (CurTok != tok_identifier)<br>

-        return LogError("expected identifier after var");<br>

-<br>

-The first part of this code parses the list of identifier/expr pairs<br>

-into the local ``VarNames`` vector.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      while (1) {<br>

-        std::string Name = IdentifierStr;<br>

-        getNextToken();  // eat identifier.<br>

-<br>

-        // Read the optional initializer.<br>

-        std::unique_ptr<ExprAST> Init;<br>

-        if (CurTok == '=') {<br>

-          getNextToken(); // eat the '='.<br>

-<br>

-          Init = ParseExpression();<br>

-          if (!Init) return nullptr;<br>

-        }<br>

-<br>

-        VarNames.push_back(std::make_pair(Name, std::move(Init)));<br>

-<br>

-        // End of var list, exit loop.<br>

-        if (CurTok != ',') break;<br>

-        getNextToken(); // eat the ','.<br>

-<br>

-        if (CurTok != tok_identifier)<br>

-          return LogError("expected identifier list after var");<br>

-      }<br>

-<br>

-Once all the variables are parsed, we then parse the body and create the<br>

-AST node:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // At this point, we have to have 'in'.<br>

-      if (CurTok != tok_in)<br>

-        return LogError("expected 'in' keyword after 'var'");<br>

-      getNextToken();  // eat 'in'.<br>

-<br>

-      auto Body = ParseExpression();<br>

-      if (!Body)<br>

-        return nullptr;<br>

-<br>

-      return llvm::make_unique<VarExprAST>(std::move(VarNames),<br>

-                                           std::move(Body));<br>

-    }<br>

-<br>

-Now that we can parse and represent the code, we need to support<br>

-emission of LLVM IR for it. This code starts out with:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-    Value *VarExprAST::codegen() {<br>

-      std::vector<AllocaInst *> OldBindings;<br>

-<br>

-      Function *TheFunction = Builder.GetInsertBlock()->getParent();<br>

-<br>

-      // Register all variables and emit their initializer.<br>

-      for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {<br>

-        const std::string &VarName = VarNames[i].first;<br>

-        ExprAST *Init = VarNames[i].second.get();<br>

-<br>

-Basically it loops over all the variables, installing them one at a<br>

-time. For each variable we put into the symbol table, we remember the<br>

-previous value that we replace in OldBindings.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-        // Emit the initializer before adding the variable to scope, this prevents<br>

-        // the initializer from referencing the variable itself, and permits stuff<br>

-        // like this:<br>

-        //  var a = 1 in<br>

-        //    var a = a in ...   # refers to outer 'a'.<br>

-        Value *InitVal;<br>

-        if (Init) {<br>

-          InitVal = Init->codegen();<br>

-          if (!InitVal)<br>

-            return nullptr;<br>

-        } else { // If not specified, use 0.0.<br>

-          InitVal = ConstantFP::get(LLVMContext, APFloat(0.0));<br>

-        }<br>

-<br>

-        AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);<br>

-        Builder.CreateStore(InitVal, Alloca);<br>

-<br>

-        // Remember the old variable binding so that we can restore the binding when<br>

-        // we unrecurse.<br>

-        OldBindings.push_back(NamedValues[VarName]);<br>

-<br>

-        // Remember this binding.<br>

-        NamedValues[VarName] = Alloca;<br>

-      }<br>

-<br>

-There are more comments here than code. The basic idea is that we emit<br>

-the initializer, create the alloca, then update the symbol table to<br>

-point to it. Once all the variables are installed in the symbol table,<br>

-we evaluate the body of the var/in expression:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Codegen the body, now that all vars are in scope.<br>

-      Value *BodyVal = Body->codegen();<br>

-      if (!BodyVal)<br>

-        return nullptr;<br>

-<br>

-Finally, before returning, we restore the previous variable bindings:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-      // Pop all our variables from scope.<br>

-      for (unsigned i = 0, e = VarNames.size(); i != e; ++i)<br>

-        NamedValues[VarNames[i].first] = OldBindings[i];<br>

-<br>

-      // Return the body computation.<br>

-      return BodyVal;<br>

-    }<br>

-<br>

-The end result of all of this is that we get properly scoped variable<br>

-definitions, and we even (trivially) allow mutation of them :).<br>
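The save-and-restore scheme used by `VarExprAST::codegen()` can be sketched without LLVM. This is an illustration under assumptions, not the tutorial's implementation: `SymbolTable` and `withBindings` are hypothetical names, and pointers to doubles stand in for `AllocaInst *`. Unlike the loop shown earlier, this sketch restores in reverse order, which is a deliberate change so that a name listed twice still ends up at its outer binding.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical symbol table: name -> current binding (nullptr if unbound).
using SymbolTable = std::map<std::string, const double *>;

// Bind each (name, storage) pair, run Body, then restore the old bindings.
template <typename Fn>
double withBindings(
    SymbolTable &Table,
    const std::vector<std::pair<std::string, const double *>> &Vars,
    Fn Body) {
  std::vector<const double *> OldBindings;
  for (const auto &Var : Vars) {
    // operator[] yields nullptr for a name that was not bound before.
    OldBindings.push_back(Table[Var.first]);
    Table[Var.first] = Var.second;
  }

  double Result = Body();

  // Pop our variables from scope, newest first (an assumption of this
  // sketch; the tutorial's loop restores in forward order).
  for (std::size_t I = Vars.size(); I-- > 0;)
    Table[Vars[I].first] = OldBindings[I];
  return Result;
}
```

Inside `Body` a shadowed name resolves to the inner storage; once `withBindings` returns, the outer binding is visible again and names that did not exist before map back to `nullptr`.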

-<br>

-With this, we completed what we set out to do. Our nice iterative fib<br>

-example from the intro compiles and runs just fine. The mem2reg pass<br>

-optimizes all of our stack variables into SSA registers, inserting PHI<br>

-nodes where needed, and our front-end remains simple: no "iterated<br>

-dominance frontier" computation anywhere in sight.<br>

-<br>

-Full Code Listing<br>

-=================<br>

-<br>

-Here is the complete code listing for our running example, enhanced with<br>

-mutable variables and var/in support. To build this example, use:<br>

-<br>

-.. code-block:: bash<br>

-<br>

-    # Compile<br>

-    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy<br>

-    # Run<br>

-    ./toy<br>

-<br>

-Here is the code:<br>

-<br>

-.. literalinclude:: ../../examples/Kaleidoscope/Chapter7/toy.cpp<br>

-   :language: c++<br>

-<br>

-`Next: Adding Debug Information <LangImpl8.html>`_<br>

-<br>

<br>

Removed: llvm/trunk/docs/tutorial/LangImpl8.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl8.rst?rev=274440&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl8.rst?rev=274440&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl8.rst (original)<br>

+++ llvm/trunk/docs/tutorial/LangImpl8.rst (removed)<br>

@@ -1,462 +0,0 @@<br>

-======================================<br>

-Kaleidoscope: Adding Debug Information<br>

-======================================<br>

-<br>

-.. contents::<br>

-   :local:<br>

-<br>

-Chapter 8 Introduction<br>

-======================<br>

-<br>

-Welcome to Chapter 8 of the "`Implementing a language with<br>

-LLVM <index.html>`_" tutorial. In chapters 1 through 7, we've built a<br>

-decent little programming language with functions and variables.<br>

-But what happens if something goes wrong? How do you debug your<br>

-program?<br>

-<br>

-Source level debugging uses formatted data that helps a debugger<br>

-translate from binary and the state of the machine back to the<br>

-source that the programmer wrote. In LLVM we generally use a format<br>

-called `DWARF <<a href="http://dwarfstd.org" rel="noreferrer" target="_blank">http://dwarfstd.org</a>>`_. DWARF is a compact encoding<br>

-that represents types, source locations, and variable locations.<br>

-<br>

-The short summary of this chapter is that we'll go through the<br>

-various things you have to add to a programming language to<br>

-support debug info, and how you translate that into DWARF.<br>

-<br>

-Caveat: For now we can't debug via the JIT, so we'll need to compile<br>

-our program down to something small and standalone. As part of this<br>

-we'll make a few modifications to the running of the language and<br>

-how programs are compiled. This means that we'll have a source file<br>

-with a simple program written in Kaleidoscope rather than the<br>

-interactive JIT. It does involve a limitation that we can only<br>

-have one "top level" command at a time to reduce the number of<br>

-changes necessary.<br>

-<br>

-Here's the sample program we'll be compiling:<br>

-<br>

-.. code-block:: python<br>

-<br>

-   def fib(x)<br>

-     if x < 3 then<br>

-       1<br>

-     else<br>

-       fib(x-1)+fib(x-2);<br>

-<br>

-   fib(10)<br>

-<br>

-<br>

-Why is this a hard problem?<br>

-===========================<br>

-<br>

-Debug information is a hard problem for a few different reasons - mostly<br>

-centered around optimized code. First, optimization makes keeping source<br>

-locations more difficult. In LLVM IR we keep the original source location<br>

-for each IR level instruction on the instruction. Optimization passes<br>

-should keep the source locations for newly created instructions, but merged<br>

-instructions only get to keep a single location - this can cause jumping<br>

-around when stepping through optimized programs. Secondly, optimization<br>

-can leave variables optimized out, sharing memory with other variables,<br>

-or otherwise difficult to track. For the purposes of this<br>

-tutorial we're going to avoid optimization (as you'll see with one of the<br>

-next sets of patches).<br>

-<br>

-Ahead-of-Time Compilation Mode<br>

-==============================<br>

-<br>

-To highlight only the aspects of adding debug information to a source<br>

-language without needing to worry about the complexities of JIT debugging<br>

-we're going to make a few changes to Kaleidoscope to support compiling<br>

-the IR emitted by the front end into a simple standalone program that<br>

-you can execute, debug, and see results.<br>

-<br>

-First we make our anonymous function that contains our top level<br>

-statement be our "main":<br>

-<br>

-.. code-block:: udiff<br>

-<br>

-  -    auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());<br>

-  +    auto Proto = llvm::make_unique<PrototypeAST>("main", std::vector<std::string>());<br>

-<br>

-just with the simple change of giving it a name.<br>

-<br>

-Then we're going to remove the command line code wherever it exists:<br>

-<br>

-.. code-block:: udiff<br>

-<br>

-  @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() {<br>

-   /// top ::= definition | external | expression | ';'<br>

-   static void MainLoop() {<br>

-     while (1) {<br>

-  -    fprintf(stderr, "ready> ");<br>

-       switch (CurTok) {<br>

-       case tok_eof:<br>

-         return;<br>

-  @@ -1184,7 +1183,6 @@ int main() {<br>

-     BinopPrecedence['*'] = 40; // highest.<br>

-<br>

-     // Prime the first token.<br>

-  -  fprintf(stderr, "ready> ");<br>

-     getNextToken();<br>

-<br>

-Lastly we're going to disable all of the optimization passes and the JIT so<br>

-that the only thing that happens after we're done parsing and generating<br>

-code is that the LLVM IR goes to standard error:<br>

-<br>

-.. code-block:: udiff<br>

-<br>

-  @@ -1108,17 +1108,8 @@ static void HandleExtern() {<br>

-   static void HandleTopLevelExpression() {<br>

-     // Evaluate a top-level expression into an anonymous function.<br>

-     if (auto FnAST = ParseTopLevelExpr()) {<br>

-  -    if (auto *FnIR = FnAST->codegen()) {<br>

-  -      // We're just doing this to make sure it executes.<br>

-  -      TheExecutionEngine->finalizeObject();<br>

-  -      // JIT the function, returning a function pointer.<br>

-  -      void *FPtr = TheExecutionEngine->getPointerToFunction(FnIR);<br>

-  -<br>

-  -      // Cast it to the right type (takes no arguments, returns a double) so we<br>

-  -      // can call it as a native function.<br>

-  -      double (*FP)() = (double (*)())(intptr_t)FPtr;<br>

-  -      // Ignore the return value for this.<br>

-  -      (void)FP;<br>

-  +    if (!FnAST->codegen()) {<br>

-  +      fprintf(stderr, "Error generating code for top level expr");<br>

-       }<br>

-     } else {<br>

-       // Skip token for error recovery.<br>

-  @@ -1439,11 +1459,11 @@ int main() {<br>

-     // target lays out data structures.<br>

-     TheModule->setDataLayout(TheExecutionEngine->getDataLayout());<br>

-     OurFPM.add(new DataLayoutPass());<br>

-  +#if 0<br>

-     OurFPM.add(createBasicAliasAnalysisPass());<br>

-     // Promote allocas to registers.<br>

-     OurFPM.add(createPromoteMemoryToRegisterPass());<br>

-  @@ -1218,7 +1210,7 @@ int main() {<br>

-     OurFPM.add(createGVNPass());<br>

-     // Simplify the control flow graph (deleting unreachable blocks, etc).<br>

-     OurFPM.add(createCFGSimplificationPass());<br>

-  -<br>

-  +  #endif<br>

-     OurFPM.doInitialization();<br>

-<br>

-     // Set the global so the code gen can use this.<br>

-<br>

-This relatively small set of changes gets us to the point that we can compile<br>

-our piece of Kaleidoscope language down to an executable program via this<br>

-command line:<br>

-<br>

-.. code-block:: bash<br>

-<br>

-  Kaleidoscope-Ch8 < fib.ks |& clang -x ir -<br>

-<br>

-which gives an a.out/a.exe in the current working directory.<br>

-<br>

-Compile Unit<br>

-============<br>

-<br>

-The top level container for a section of code in DWARF is a compile unit.<br>

-This contains the type and function data for an individual translation unit<br>

-(read: one file of source code). So the first thing we need to do is<br>

-construct one for our fib.ks file.<br>

-<br>

-DWARF Emission Setup<br>

-====================<br>

-<br>

-Similar to the ``IRBuilder`` class we have a<br>

-`DIBuilder <<a href="http://llvm.org/doxygen/classllvm_1_1DIBuilder.html" rel="noreferrer" target="_blank">http://llvm.org/doxygen/classllvm_1_1DIBuilder.html</a>>`_ class<br>

-that helps in constructing debug metadata for an LLVM IR file. It<br>

-corresponds 1:1 to that metadata, much as ``IRBuilder`` does to LLVM IR, but with nicer names.<br>

-Using it does require that you be more familiar with DWARF terminology than<br>

-you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you<br>

-read through the general documentation on the<br>

-`Metadata Format <<a href="http://llvm.org/docs/SourceLevelDebugging.html" rel="noreferrer" target="_blank">http://llvm.org/docs/SourceLevelDebugging.html</a>>`_ it<br>

-should be a little more clear. We'll be using this class to construct all<br>

-of our IR level descriptions. Construction for it takes a module so we<br>

-need to construct it shortly after we construct our module. We've left it<br>

-as a global static variable to make it a bit easier to use.<br>

-<br>

-Next we're going to create a small container to cache some of our frequent<br>

-data. The first will be our compile unit, but we'll also write a bit of<br>

-code for our one type since we won't have to worry about multiple typed<br>

-expressions:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  static DIBuilder *DBuilder;<br>

-<br>

-  struct DebugInfo {<br>

-    DICompileUnit *TheCU;<br>

-    DIType *DblTy;<br>

-<br>

-    DIType *getDoubleTy();<br>

-  } KSDbgInfo;<br>

-<br>

-  DIType *DebugInfo::getDoubleTy() {<br>

-    if (DblTy)<br>

-      return DblTy;<br>

-<br>

-    DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);<br>

-    return DblTy;<br>

-  }<br>

-<br>

-And then later on in ``main`` when we're constructing our module:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  DBuilder = new DIBuilder(*TheModule);<br>

-<br>

-  KSDbgInfo.TheCU = DBuilder->createCompileUnit(<br>

-      dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);<br>

-<br>

-There are a couple of things to note here. First, while we're producing a<br>

-compile unit for a language called Kaleidoscope we used the language<br>

-constant for C. This is because a debugger wouldn't necessarily understand<br>

-the calling conventions or default ABI for a language it doesn't recognize<br>

-and we follow the C ABI in our llvm code generation so it's the closest<br>

-thing to accurate. This ensures we can actually call functions from the<br>

-debugger and have them execute. Secondly, you'll see the "fib.ks" in the<br>

-call to ``createCompileUnit``. This is a default hard coded value since<br>

-we're using shell redirection to put our source into the Kaleidoscope<br>

-compiler. In a usual front end you'd have an input file name and it would<br>

-go there.<br>

-<br>

-One last thing as part of emitting debug information via DIBuilder is that<br>

-we need to "finalize" the debug information. The reasons are part of the<br>

-underlying API for DIBuilder, but make sure you do this near the end of<br>

-main:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  DBuilder->finalize();<br>

-<br>

-before you dump out the module.<br>

-<br>

-Functions<br>

-=========<br>

-<br>

-Now that we have our ``Compile Unit`` and our source locations, we can add<br>

-function definitions to the debug info. So in ``PrototypeAST::codegen()`` we<br>

-add a few lines of code to describe a context for our subprogram, in this<br>

-case the "File", and the actual definition of the function itself.<br>

-<br>

-So the context:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),<br>

-                                      KSDbgInfo.TheCU->getDirectory());<br>

-<br>

-giving us a DIFile and asking the ``Compile Unit`` we created above for the<br>

-directory and filename where we are currently. Then, for now, we use some<br>

-source locations of 0 (since our AST doesn't currently have source location<br>

-information) and construct our function definition:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  DIScope *FContext = Unit;<br>

-  unsigned LineNo = 0;<br>

-  unsigned ScopeLine = 0;<br>

-  DISubprogram *SP = DBuilder->createFunction(<br>

-      FContext, Name, StringRef(), Unit, LineNo,<br>

-      CreateFunctionType(Args.size(), Unit), false /* internal linkage */,<br>

-      true /* definition */, ScopeLine, DINode::FlagPrototyped, false);<br>

-  F->setSubprogram(SP);<br>

-<br>

-and we now have a DISubprogram that contains a reference to all of our<br>

-metadata for the function.<br>

-<br>

-Source Locations<br>

-================<br>

-<br>

-The most important thing for debug information is accurate source location -<br>

-this makes it possible to map your source code back. We have a problem though:<br>

-Kaleidoscope really doesn't have any source location information in the lexer<br>

-or parser so we'll need to add it.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-   struct SourceLocation {<br>

-     int Line;<br>

-     int Col;<br>

-   };<br>

-   static SourceLocation CurLoc;<br>

-   static SourceLocation LexLoc = {1, 0};<br>

-<br>

-   static int advance() {<br>

-     int LastChar = getchar();<br>

-<br>

-     if (LastChar == '\n' || LastChar == '\r') {<br>

-       LexLoc.Line++;<br>

-       LexLoc.Col = 0;<br>

-     } else<br>

-       LexLoc.Col++;<br>

-     return LastChar;<br>

-   }<br>

-<br>

-In this set of code we've added some functionality on how to keep track of the<br>

-line and column of the "source file". As we lex every token we set our current<br>

-"lexical location" to the assorted line and column for the beginning<br>

-of the token. We do this by replacing all of the previous calls to<br>

-``getchar()`` with our new ``advance()`` that keeps track of the information<br>

-and then we have added to all of our AST classes a source location:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-   class ExprAST {<br>

-     SourceLocation Loc;<br>

-<br>

-     public:<br>

-       ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}<br>

-       virtual ~ExprAST() {}<br>

-       virtual Value* codegen() = 0;<br>

-       int getLine() const { return Loc.Line; }<br>

-       int getCol() const { return Loc.Col; }<br>

-       virtual raw_ostream &dump(raw_ostream &out, int ind) {<br>

-         return out << ':' << getLine() << ':' << getCol() << '\n';<br>

-       }<br>

-<br>

-that we pass down through when we create a new expression:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-   LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS),<br>

-                                          std::move(RHS));<br>

-<br>

-giving us locations for each of our expressions and variables.<br>

-<br>

-From this we can make sure to tell ``DIBuilder`` when we're at a new source<br>

-location so it can use that when we generate the rest of our code and make<br>

-sure that each instruction has source location information. We do this<br>

-by constructing another small function:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  void DebugInfo::emitLocation(ExprAST *AST) {<br>

-    DIScope *Scope;<br>

-    if (LexicalBlocks.empty())<br>

-      Scope = TheCU;<br>

-    else<br>

-      Scope = LexicalBlocks.back();<br>

-    Builder.SetCurrentDebugLocation(<br>

-        DebugLoc::get(AST->getLine(), AST->getCol(), Scope));<br>

-  }<br>

-<br>

-that both tells the main ``IRBuilder`` where we are, but also what scope<br>

-we're in. Since we've just created a function above we can either be in<br>

-the main file scope (like when we created our function), or now we can be<br>

-in the function scope we just created. To represent this we create a stack<br>

-of scopes:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-   std::vector<DIScope *> LexicalBlocks;<br>

-   std::map<const PrototypeAST *, DIScope *> FnScopeMap;<br>

-<br>

-and keep a map of each function to the scope that it represents (a<br>

-DISubprogram is also a DIScope).<br>

-<br>

-Then we make sure to:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-   KSDbgInfo.emitLocation(this);<br>

-<br>

-emit the location every time we start to generate code for a new AST, and<br>

-also:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  KSDbgInfo.FnScopeMap[this] = SP;<br>

-<br>

-store the scope (function) when we create it and use it:<br>

-<br>

-  KSDbgInfo.LexicalBlocks.push_back(KSDbgInfo.FnScopeMap[Proto]);<br>

-<br>

-when we start generating the code for each function.<br>

-<br>

-Also, don't forget to pop the scope back off of your scope stack at the<br>

-end of the code generation for the function:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  // Pop off the lexical block for the function since we added it<br>

-  // unconditionally.<br>

-  KSDbgInfo.LexicalBlocks.pop_back();<br>

-<br>

-Variables<br>

-=========<br>

-<br>

-Now that we have functions, we need to be able to print out the variables<br>

-we have in scope. Let's get our function arguments set up so we can get<br>

-decent backtraces and see how our functions are being called. It isn't<br>

-a lot of code, and we generally handle it when we're creating the<br>

-argument allocas in ``PrototypeAST::CreateArgumentAllocas``.<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  DIScope *Scope = KSDbgInfo.LexicalBlocks.back();<br>

-  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),<br>

-                                      KSDbgInfo.TheCU->getDirectory());<br>

-  DILocalVariable *D = DBuilder->createParameterVariable(<br>

-      Scope, Args[Idx], Idx + 1, Unit, Line, KSDbgInfo.getDoubleTy(), true);<br>

-<br>

-  DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),<br>

-                          DebugLoc::get(Line, 0, Scope),<br>

-                          Builder.GetInsertBlock());<br>

-<br>

-Here we're doing a few things. First, we're grabbing our current scope<br>

-for the variable so we can say what range of code our variable is valid<br>

-through. Second, we're creating the variable, giving it the scope,<br>

-the name, source location, type, and since it's an argument, the argument<br>

-index. Third, we create an ``llvm.dbg.declare`` call to indicate at the IR<br>

-level that we've got a variable in an alloca (and it gives a starting<br>

-location for the variable), and setting a source location for the<br>

-beginning of the scope on the declare.<br>

-<br>

-One interesting thing to note at this point is that various debuggers have<br>

-assumptions based on how code and debug information was generated for them<br>

-in the past. In this case we need to do a little bit of a hack to avoid<br>

-generating line information for the function prologue so that the debugger<br>

-knows to skip over those instructions when setting a breakpoint. So in<br>

-``FunctionAST::codegen`` we add a couple of lines:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  // Unset the location for the prologue emission (leading instructions with no<br>

-  // location in a function are considered part of the prologue and the debugger<br>

-  // will run past them when breaking on a function)<br>

-  KSDbgInfo.emitLocation(nullptr);<br>

-<br>

-and then emit a new location when we actually start generating code for the<br>

-body of the function:<br>

-<br>

-.. code-block:: c++<br>

-<br>

-  KSDbgInfo.emitLocation(Body);<br>

-<br>

-With this we have enough debug information to set breakpoints in functions,<br>

-print out argument variables, and call functions. Not too bad for just a<br>

-few simple lines of code!<br>

-<br>

-Full Code Listing<br>

-=================<br>

-<br>

-Here is the complete code listing for our running example, enhanced with<br>

-debug information. To build this example, use:<br>

-<br>

-.. code-block:: bash<br>

-<br>

-    # Compile<br>

-    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy<br>

-    # Run<br>

-    ./toy<br>

-<br>

-Here is the code:<br>

-<br>

-.. literalinclude:: ../../examples/Kaleidoscope/Chapter8/toy.cpp<br>

-   :language: c++<br>

-<br>

-`Next: Conclusion and other useful LLVM tidbits <LangImpl9.html>`_<br>

-<br>

<br>

Removed: llvm/trunk/docs/tutorial/LangImpl9.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl9.rst?rev=274440&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/LangImpl9.rst?rev=274440&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/LangImpl9.rst (original)<br>

+++ llvm/trunk/docs/tutorial/LangImpl9.rst (removed)<br>

@@ -1,259 +0,0 @@<br>

-======================================================<br>

-Kaleidoscope: Conclusion and other useful LLVM tidbits<br>

-======================================================<br>

-<br>

-.. contents::<br>

-   :local:<br>

-<br>

-Tutorial Conclusion<br>

-===================<br>

-<br>

-Welcome to the final chapter of the "`Implementing a language with<br>

-LLVM <index.html>`_" tutorial. In the course of this tutorial, we have<br>

-grown our little Kaleidoscope language from being a useless toy, to<br>

-being a semi-interesting (but probably still useless) toy. :)<br>

-<br>

-It is interesting to see how far we've come, and how little code it has<br>

-taken. We built the entire lexer, parser, AST, code generator, an<br>

-interactive run-loop (with a JIT!), and emitted debug information in<br>

-standalone executables - all in under 1000 lines of (non-comment/non-blank)<br>

-code.<br>

-<br>

-Our little language supports a couple of interesting features: it<br>

-supports user defined binary and unary operators, it uses JIT<br>

-compilation for immediate evaluation, and it supports a few control flow<br>

-constructs with SSA construction.<br>

-<br>

-Part of the idea of this tutorial was to show you how easy and fun it<br>

-can be to define, build, and play with languages. Building a compiler<br>

-need not be a scary or mystical process! Now that you've seen some of<br>

-the basics, I strongly encourage you to take the code and hack on it.<br>

-For example, try adding:<br>

-<br>

--  **global variables** - While global variables have questionable value<br>

-   in modern software engineering, they are often useful when putting<br>

-   together quick little hacks like the Kaleidoscope compiler itself.<br>

-   Fortunately, our current setup makes it very easy to add global<br>

-   variables: just have value lookup check to see if an unresolved<br>

-   variable is in the global variable symbol table before rejecting it.<br>

-   To create a new global variable, make an instance of the LLVM<br>

-   ``GlobalVariable`` class.<br>

--  **typed variables** - Kaleidoscope currently only supports variables<br>

-   of type double. This gives the language a very nice elegance, because<br>

-   only supporting one type means that you never have to specify types.<br>

-   Different languages have different ways of handling this. The easiest<br>

-   way is to require the user to specify types for every variable<br>

-   definition, and record the type of the variable in the symbol table<br>

-   along with its Value\*.<br>

--  **arrays, structs, vectors, etc** - Once you add types, you can start<br>

-   extending the type system in all sorts of interesting ways. Simple<br>

-   arrays are very easy and are quite useful for many different<br>

-   applications. Adding them is mostly an exercise in learning how the<br>

-   LLVM `getelementptr <../LangRef.html#getelementptr-instruction>`_ instruction<br>

-   works: it is so nifty/unconventional, it `has its own<br>

-   FAQ <../GetElementPtr.html>`_!<br>

--  **standard runtime** - Our current language allows the user to access<br>

-   arbitrary external functions, and we use it for things like "printd"<br>

-   and "putchard". As you extend the language to add higher-level<br>

-   constructs, often these constructs make the most sense if they are<br>

-   lowered to calls into a language-supplied runtime. For example, if<br>

-   you add hash tables to the language, it would probably make sense to<br>

-   add the routines to a runtime, instead of inlining them all the way.<br>

--  **memory management** - Currently we can only access the stack in<br>

-   Kaleidoscope. It would also be useful to be able to allocate heap<br>

-   memory, either with calls to the standard libc malloc/free interface<br>

-   or with a garbage collector. If you would like to use garbage<br>

-   collection, note that LLVM fully supports `Accurate Garbage<br>

-   Collection <../GarbageCollection.html>`_ including algorithms that<br>

-   move objects and need to scan/update the stack.<br>

--  **exception handling support** - LLVM supports generation of `zero<br>

-   cost exceptions <../ExceptionHandling.html>`_ which interoperate with<br>

-   code compiled in other languages. You could also generate code by<br>

-   implicitly making every function return an error value and checking<br>

-   it. You could also make explicit use of setjmp/longjmp. There are<br>

-   many different ways to go here.<br>

--  **object orientation, generics, database access, complex numbers,<br>

-   geometric programming, ...** - Really, there is no end of crazy<br>

-   features that you can add to the language.<br>

--  **unusual domains** - We've been talking about applying LLVM to a<br>

-   domain that many people are interested in: building a compiler for a<br>

-   specific language. However, there are many other domains that can use<br>

-   compiler technology that are not typically considered. For example,<br>

-   LLVM has been used to implement OpenGL graphics acceleration,<br>

-   translate C++ code to ActionScript, and many other cute and clever<br>

-   things. Maybe you will be the first to JIT compile a regular<br>

-   expression interpreter into native code with LLVM?<br>

-<br>

-Have fun - try doing something crazy and unusual. Building a language<br>

-like everyone else always has, is much less fun than trying something a<br>

-little crazy or off the wall and seeing how it turns out. If you get<br>

-stuck or want to talk about it, feel free to email the `llvm-dev mailing<br>

-list <<a href="http://lists.llvm.org/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/mailman/listinfo/llvm-dev</a>>`_: it has lots<br>

-of people who are interested in languages and are often willing to help<br>

-out.<br>

-<br>

-Before we end this tutorial, I want to talk about some "tips and tricks"<br>

-for generating LLVM IR. These are some of the more subtle things that<br>

-may not be obvious, but are very useful if you want to take advantage of<br>

-LLVM's capabilities.<br>

-<br>

-Properties of the LLVM IR<br>

-=========================<br>

-<br>

-We have a couple of common questions about code in the LLVM IR form -<br>

-let's just get these out of the way right now, shall we?<br>

-<br>

-Target Independence<br>

--------------------<br>

-<br>

-Kaleidoscope is an example of a "portable language": any program written<br>

-in Kaleidoscope will work the same way on any target that it runs on.<br>

-Many other languages have this property, e.g. lisp, java, haskell,<br>

-javascript, python, etc (note that while these languages are portable,<br>

-not all their libraries are).<br>

-<br>

-One nice aspect of LLVM is that it is often capable of preserving target<br>

-independence in the IR: you can take the LLVM IR for a<br>

-Kaleidoscope-compiled program and run it on any target that LLVM<br>

-supports, even emitting C code and compiling that on targets that LLVM<br>

-doesn't support natively. You can trivially tell that the Kaleidoscope<br>

-compiler generates target-independent code because it never queries for<br>

-any target-specific information when generating code.<br>

-<br>

-The fact that LLVM provides a compact, target-independent,<br>

-representation for code gets a lot of people excited. Unfortunately,<br>

-these people are usually thinking about C or a language from the C<br>

-family when they are asking questions about language portability. I say<br>

-"unfortunately", because there is really no way to make (fully general)<br>

-C code portable, other than shipping the source code around (and of<br>

-course, C source code is not actually portable in general either - ever<br>

-port a really old application from 32- to 64-bits?).<br>

-<br>

-The problem with C (again, in its full generality) is that it is heavily<br>

-laden with target specific assumptions. As one simple example, the<br>

-preprocessor often destructively removes target-independence from the<br>

-code when it processes the input text:<br>

-<br>

-.. code-block:: c<br>

-<br>

-    #ifdef __i386__<br>

-      int X = 1;<br>

-    #else<br>

-      int X = 42;<br>

-    #endif<br>

-<br>

-While it is possible to engineer more and more complex solutions to<br>

-problems like this, it cannot be solved in full generality in a way that<br>

-is better than shipping the actual source code.<br>

-<br>

-That said, there are interesting subsets of C that can be made portable.<br>

-If you are willing to fix primitive types to a fixed size (say int =<br>

-32-bits, and long = 64-bits), don't care about ABI compatibility with<br>

-existing binaries, and are willing to give up some other minor features,<br>

-you can have portable code. This can make sense for specialized domains<br>

-such as an in-kernel language.<br>

-<br>

-Safety Guarantees<br>

------------------<br>

-<br>

-Many of the languages above are also "safe" languages: it is impossible<br>

-for a program written in Java to corrupt its address space and crash the<br>

-process (assuming the JVM has no bugs). Safety is an interesting<br>

-property that requires a combination of language design, runtime<br>

-support, and often operating system support.<br>

-<br>

-It is certainly possible to implement a safe language in LLVM, but LLVM<br>

-IR does not itself guarantee safety. The LLVM IR allows unsafe pointer<br>

-casts, use after free bugs, buffer over-runs, and a variety of other<br>

-problems. Safety needs to be implemented as a layer on top of LLVM and,<br>

-conveniently, several groups have investigated this. Ask on the `llvm-dev<br>

-mailing list <<a href="http://lists.llvm.org/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/mailman/listinfo/llvm-dev</a>>`_ if<br>

-you are interested in more details.<br>

-<br>

-Language-Specific Optimizations<br>

--------------------------------<br>

-<br>

-One thing about LLVM that turns off many people is that it does not<br>

-solve all the world's problems in one system (sorry 'world hunger',<br>

-someone else will have to solve you some other day). One specific<br>

-complaint is that people perceive LLVM as being incapable of performing<br>

-high-level language-specific optimization: LLVM "loses too much<br>

-information".<br>

-<br>

-Unfortunately, this is really not the place to give you a full and<br>

-unified version of "Chris Lattner's theory of compiler design". Instead,<br>

-I'll make a few observations:<br>

-<br>

-First, you're right that LLVM does lose information. For example, as of<br>

-this writing, there is no way to distinguish in the LLVM IR whether an<br>

-SSA-value came from a C "int" or a C "long" on an ILP32 machine (other<br>

-than debug info). Both get compiled down to an 'i32' value and the<br>

-information about what it came from is lost. The more general issue<br>

-here, is that the LLVM type system uses "structural equivalence" instead<br>

-of "name equivalence". Another place this surprises people is if you<br>

-have two types in a high-level language that have the same structure<br>

-(e.g. two different structs that have a single int field): these types<br>

-will compile down into a single LLVM type and it will be impossible to<br>

-tell what it came from.<br>

-<br>

-Second, while LLVM does lose information, LLVM is not a fixed target: we<br>

-continue to enhance and improve it in many different ways. In addition<br>

-to adding new features (LLVM did not always support exceptions or debug<br>

-info), we also extend the IR to capture important information for<br>

-optimization (e.g. whether an argument is sign or zero extended,<br>

-information about pointers aliasing, etc). Many of the enhancements are<br>

-user-driven: people want LLVM to include some specific feature, so they<br>

-go ahead and extend it.<br>

-<br>

-Third, it is *possible and easy* to add language-specific optimizations,<br>

-and you have a number of choices in how to do it. As one trivial<br>

-example, it is easy to add language-specific optimization passes that<br>

-"know" things about code compiled for a language. In the case of the C<br>

-family, there is an optimization pass that "knows" about the standard C<br>

-library functions. If you call "exit(0)" in main(), it knows that it is<br>

-safe to optimize that into "return 0;" because C specifies what the<br>

-'exit' function does.<br>

-<br>

-In addition to simple library knowledge, it is possible to embed a<br>

-variety of other language-specific information into the LLVM IR. If you<br>

-have a specific need and run into a wall, please bring the topic up on<br>

-the llvm-dev list. At the very worst, you can always treat LLVM as if it<br>

-were a "dumb code generator" and implement the high-level optimizations<br>

-you desire in your front-end, on the language-specific AST.<br>

-<br>

-Tips and Tricks<br>

-===============<br>

-<br>

-There is a variety of useful tips and tricks that you come to know after<br>

-working on/with LLVM that aren't obvious at first glance. Instead of<br>

-letting everyone rediscover them, this section talks about some of these<br>

-issues.<br>

-<br>

-Implementing portable offsetof/sizeof<br>

--------------------------------------<br>

-<br>

-One interesting thing that comes up, if you are trying to keep the code<br>

-generated by your compiler "target independent", is that you often need<br>

-to know the size of some LLVM type or the offset of some field in an<br>

-llvm structure. For example, you might need to pass the size of a type<br>

-into a function that allocates memory.<br>

-<br>

-Unfortunately, this can vary widely across targets: for example the<br>

-width of a pointer is trivially target-specific. However, there is a<br>

-`clever way to use the getelementptr<br>

-instruction <<a href="http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt" rel="noreferrer" target="_blank">http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt</a>>`_<br>

-that allows you to compute this in a portable way.<br>

-<br>

-Garbage Collected Stack Frames<br>

-------------------------------<br>

-<br>

-Some languages want to explicitly manage their stack frames, often so<br>

-that they are garbage collected or to allow easy implementation of<br>

-closures. There are often better ways to implement these features than<br>

-explicit stack frames, but `LLVM does support<br>

-them, <<a href="http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt" rel="noreferrer" target="_blank">http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt</a>>`_<br>

-if you want. It requires your front-end to convert the code into<br>

-`Continuation Passing<br>

-Style <<a href="http://en.wikipedia.org/wiki/Continuation-passing_style" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Continuation-passing_style</a>>`_ and<br>

-the use of tail calls (which LLVM also supports).<br>

-<br>

<br>

Modified: llvm/trunk/docs/tutorial/OCamlLangImpl5.rst<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/OCamlLangImpl5.rst?rev=274441&r1=274440&r2=274441&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/OCamlLangImpl5.rst?rev=274441&r1=274440&r2=274441&view=diff</a><br>

==============================================================================<br>

--- llvm/trunk/docs/tutorial/OCamlLangImpl5.rst (original)<br>

+++ llvm/trunk/docs/tutorial/OCamlLangImpl5.rst Sat Jul  2 12:01:59 2016<br>

@@ -178,7 +178,7 @@ IR into "t.ll" and run "``llvm-as < t.ll<br>

 window will pop up <../ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ and you'll<br>

 see this graph:<br>

<br>

-.. figure:: LangImpl5-cfg.png<br>

+.. figure:: LangImpl05-cfg.png<br>

    :align: center<br>

    :alt: Example CFG<br>

<br>

<br>

Modified: llvm/trunk/examples/Kaleidoscope/Chapter8/CMakeLists.txt<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/examples/Kaleidoscope/Chapter8/CMakeLists.txt?rev=274441&r1=274440&r2=274441&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/examples/Kaleidoscope/Chapter8/CMakeLists.txt?rev=274441&r1=274440&r2=274441&view=diff</a><br>

==============================================================================<br>

--- llvm/trunk/examples/Kaleidoscope/Chapter8/CMakeLists.txt (original)<br>

+++ llvm/trunk/examples/Kaleidoscope/Chapter8/CMakeLists.txt Sat Jul  2 12:01:59 2016<br>

@@ -1,9 +1,5 @@<br>

 set(LLVM_LINK_COMPONENTS<br>

-  Core<br>

-  ExecutionEngine<br>

-  Object<br>

-  Support<br>

-  native<br>

+  all<br>

   )<br>

<br>

 add_kaleidoscope_chapter(Kaleidoscope-Ch8<br>

<br>

Modified: llvm/trunk/examples/Kaleidoscope/Chapter8/toy.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/examples/Kaleidoscope/Chapter8/toy.cpp?rev=274441&r1=274440&r2=274441&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/examples/Kaleidoscope/Chapter8/toy.cpp?rev=274441&r1=274440&r2=274441&view=diff</a><br>

==============================================================================<br>

--- llvm/trunk/examples/Kaleidoscope/Chapter8/toy.cpp (original)<br>

+++ llvm/trunk/examples/Kaleidoscope/Chapter8/toy.cpp Sat Jul  2 12:01:59 2016<br>

@@ -1,28 +1,20 @@<br>

 #include "llvm/ADT/APFloat.h"<br>

-#include "llvm/ADT/SmallVector.h"<br>

 #include "llvm/ADT/STLExtras.h"<br>

-#include "llvm/ADT/StringRef.h"<br>

-#include "llvm/ADT/Triple.h"<br>

-#include "llvm/IR/BasicBlock.h"<br>

-#include "llvm/IR/Constants.h"<br>

-#include "llvm/IR/DebugInfoMetadata.h"<br>

-#include "llvm/IR/DebugLoc.h"<br>

-#include "llvm/IR/DerivedTypes.h"<br>

-#include "llvm/IR/DIBuilder.h"<br>

-#include "llvm/IR/Function.h"<br>

-#include "llvm/IR/Instructions.h"<br>

+#include "llvm/ADT/SmallVector.h"<br>

+#include "llvm/Analysis/Passes.h"<br>

 #include "llvm/IR/IRBuilder.h"<br>

 #include "llvm/IR/LLVMContext.h"<br>

+#include "llvm/IR/LegacyPassManager.h"<br>

 #include "llvm/IR/Metadata.h"<br>

 #include "llvm/IR/Module.h"<br>

 #include "llvm/IR/Type.h"<br>

 #include "llvm/IR/Verifier.h"<br>

-#include "llvm/Support/Host.h"<br>

-#include "llvm/Support/raw_ostream.h"<br>

+#include "llvm/Support/FileSystem.h"<br>

+#include "llvm/Support/TargetRegistry.h"<br>

 #include "llvm/Support/TargetSelect.h"<br>

 #include "llvm/Target/TargetMachine.h"<br>

-#include "../include/KaleidoscopeJIT.h"<br>

-#include <cassert><br>

+#include "llvm/Target/TargetOptions.h"<br>

+#include "llvm/Transforms/Scalar.h"<br>

 #include <cctype><br>

 #include <cstdio><br>

 #include <cstdlib><br>

@@ -33,7 +25,7 @@<br>

 #include <vector><br>

<br>

 using namespace llvm;<br>

-using namespace llvm::orc;<br>

+using namespace llvm::sys;<br>

<br>

 //===----------------------------------------------------------------------===//<br>

 // Lexer<br>

@@ -67,71 +59,6 @@ enum Token {<br>

   tok_var = -13<br>

 };<br>

<br>

-std::string getTokName(int Tok) {<br>

-  switch (Tok) {<br>

-  case tok_eof:<br>

-    return "eof";<br>

-  case tok_def:<br>

-    return "def";<br>

-  case tok_extern:<br>

-    return "extern";<br>

-  case tok_identifier:<br>

-    return "identifier";<br>

-  case tok_number:<br>

-    return "number";<br>

-  case tok_if:<br>

-    return "if";<br>

-  case tok_then:<br>

-    return "then";<br>

-  case tok_else:<br>

-    return "else";<br>

-  case tok_for:<br>

-    return "for";<br>

-  case tok_in:<br>

-    return "in";<br>

-  case tok_binary:<br>

-    return "binary";<br>

-  case tok_unary:<br>

-    return "unary";<br>

-  case tok_var:<br>

-    return "var";<br>

-  }<br>

-  return std::string(1, (char)Tok);<br>

-}<br>

-<br>

-namespace {<br>

-class ExprAST;<br>

-} // end anonymous namespace<br>

-<br>

-static LLVMContext TheContext;<br>

-static IRBuilder<> Builder(TheContext);<br>

-struct DebugInfo {<br>

-  DICompileUnit *TheCU;<br>

-  DIType *DblTy;<br>

-  std::vector<DIScope *> LexicalBlocks;<br>

-<br>

-  void emitLocation(ExprAST *AST);<br>

-  DIType *getDoubleTy();<br>

-} KSDbgInfo;<br>

-<br>

-struct SourceLocation {<br>

-  int Line;<br>

-  int Col;<br>

-};<br>

-static SourceLocation CurLoc;<br>

-static SourceLocation LexLoc = {1, 0};<br>

-<br>

-static int advance() {<br>

-  int LastChar = getchar();<br>

-<br>

-  if (LastChar == '\n' || LastChar == '\r') {<br>

-    LexLoc.Line++;<br>

-    LexLoc.Col = 0;<br>

-  } else<br>

-    LexLoc.Col++;<br>

-  return LastChar;<br>

-}<br>

-<br>

 static std::string IdentifierStr; // Filled in if tok_identifier<br>

 static double NumVal;             // Filled in if tok_number<br>

<br>

@@ -141,13 +68,11 @@ static int gettok() {<br>

<br>

   // Skip any whitespace.<br>

   while (isspace(LastChar))<br>

-    LastChar = advance();<br>

-<br>

-  CurLoc = LexLoc;<br>

+    LastChar = getchar();<br>

<br>

   if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*<br>

     IdentifierStr = LastChar;<br>

-    while (isalnum((LastChar = advance())))<br>

+    while (isalnum((LastChar = getchar())))<br>

       IdentifierStr += LastChar;<br>

<br>

     if (IdentifierStr == "def")<br>

@@ -177,7 +102,7 @@ static int gettok() {<br>

     std::string NumStr;<br>

     do {<br>

       NumStr += LastChar;<br>

-      LastChar = advance();<br>

+      LastChar = getchar();<br>

     } while (isdigit(LastChar) || LastChar == '.');<br>

<br>

     NumVal = strtod(NumStr.c_str(), nullptr);<br>

@@ -187,7 +112,7 @@ static int gettok() {<br>

   if (LastChar == '#') {<br>

     // Comment until end of line.<br>

     do<br>

-      LastChar = advance();<br>

+      LastChar = getchar();<br>

     while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');<br>

<br>

     if (LastChar != EOF)<br>

@@ -200,7 +125,7 @@ static int gettok() {<br>

<br>

   // Otherwise, just return the character as its ascii value.<br>

   int ThisChar = LastChar;<br>

-  LastChar = advance();<br>

+  LastChar = getchar();<br>

   return ThisChar;<br>

 }<br>

<br>

@@ -208,25 +133,11 @@ static int gettok() {<br>

 // Abstract Syntax Tree (aka Parse Tree)<br>

 //===----------------------------------------------------------------------===//<br>

 namespace {<br>

-<br>

-raw_ostream &indent(raw_ostream &O, int size) {<br>

-  return O << std::string(size, ' ');<br>

-}<br>

-<br>

 /// ExprAST - Base class for all expression nodes.<br>

 class ExprAST {<br>

-  SourceLocation Loc;<br>

-<br>

 public:<br>

-  ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}<br>

   virtual ~ExprAST() {}<br>

   virtual Value *codegen() = 0;<br>

-  int getLine() const { return Loc.Line; }<br>

-  int getCol() const { return Loc.Col; }<br>

-<br>

-  virtual raw_ostream &dump(raw_ostream &out, int ind) {<br>

-    return out << ':' << getLine() << ':' << getCol() << '\n';<br>

-  }<br>

 };<br>

<br>

 /// NumberExprAST - Expression class for numeric literals like "1.0".<br>

@@ -236,10 +147,6 @@ class NumberExprAST : public ExprAST {<br>

 public:<br>

   NumberExprAST(double Val) : Val(Val) {}<br>

   Value *codegen() override;<br>

-<br>

-  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

-    return ExprAST::dump(out << Val, ind);<br>

-  }<br>

 };<br>

<br>

 /// VariableExprAST - Expression class for referencing a variable, like "a".<br>

@@ -247,14 +154,9 @@ class VariableExprAST : public ExprAST {<br>

   std::string Name;<br>

<br>

 public:<br>

-  VariableExprAST(SourceLocation Loc, const std::string &Name)<br>

-      : ExprAST(Loc), Name(Name) {}<br>

+  VariableExprAST(const std::string &Name) : Name(Name) {}<br>

   const std::string &getName() const { return Name; }<br>

   Value *codegen() override;<br>

-<br>

-  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

-    return ExprAST::dump(out << Name, ind);<br>

-  }<br>

 };<br>

<br>

 /// UnaryExprAST - Expression class for a unary operator.<br>

@@ -266,12 +168,6 @@ public:<br>

   UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)<br>

       : Opcode(Opcode), Operand(std::move(Operand)) {}<br>

   Value *codegen() override;<br>

-<br>

-  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

-    ExprAST::dump(out << "unary" << Opcode, ind);<br>

-    Operand->dump(out, ind + 1);<br>

-    return out;<br>

-  }<br>

 };<br>

<br>

 /// BinaryExprAST - Expression class for a binary operator.<br>

@@ -280,17 +176,10 @@ class BinaryExprAST : public ExprAST {<br>

   std::unique_ptr<ExprAST> LHS, RHS;<br>

<br>

 public:<br>

-  BinaryExprAST(SourceLocation Loc, char Op, std::unique_ptr<ExprAST> LHS,<br>

+  BinaryExprAST(char Op, std::unique_ptr<ExprAST> LHS,<br>

                 std::unique_ptr<ExprAST> RHS)<br>

-      : ExprAST(Loc), Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}<br>

+      : Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}<br>

   Value *codegen() override;<br>

-<br>

-  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

-    ExprAST::dump(out << "binary" << Op, ind);<br>

-    LHS->dump(indent(out, ind) << "LHS:", ind + 1);<br>

-    RHS->dump(indent(out, ind) << "RHS:", ind + 1);<br>

-    return out;<br>

-  }<br>

 };<br>

<br>

 /// CallExprAST - Expression class for function calls.<br>

@@ -299,17 +188,10 @@ class CallExprAST : public ExprAST {<br>

   std::vector<std::unique_ptr<ExprAST>> Args;<br>

<br>

 public:<br>

-  CallExprAST(SourceLocation Loc, const std::string &Callee,<br>

+  CallExprAST(const std::string &Callee,<br>

               std::vector<std::unique_ptr<ExprAST>> Args)<br>

-      : ExprAST(Loc), Callee(Callee), Args(std::move(Args)) {}<br>

+      : Callee(Callee), Args(std::move(Args)) {}<br>

   Value *codegen() override;<br>

-<br>

-  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

-    ExprAST::dump(out << "call " << Callee, ind);<br>

-    for (const auto &Arg : Args)<br>

-      Arg->dump(indent(out, ind + 1), ind + 1);<br>

-    return out;<br>

-  }<br>

 };<br>

<br>

 /// IfExprAST - Expression class for if/then/else.<br>

@@ -317,19 +199,10 @@ class IfExprAST : public ExprAST {<br>

   std::unique_ptr<ExprAST> Cond, Then, Else;<br>

<br>

 public:<br>

-  IfExprAST(SourceLocation Loc, std::unique_ptr<ExprAST> Cond,<br>

-            std::unique_ptr<ExprAST> Then, std::unique_ptr<ExprAST> Else)<br>

-      : ExprAST(Loc), Cond(std::move(Cond)), Then(std::move(Then)),<br>

-        Else(std::move(Else)) {}<br>

+  IfExprAST(std::unique_ptr<ExprAST> Cond, std::unique_ptr<ExprAST> Then,<br>

+            std::unique_ptr<ExprAST> Else)<br>

+      : Cond(std::move(Cond)), Then(std::move(Then)), Else(std::move(Else)) {}<br>

   Value *codegen() override;<br>

-<br>

-  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

-    ExprAST::dump(out << "if", ind);<br>

-    Cond->dump(indent(out, ind) << "Cond:", ind + 1);<br>

-    Then->dump(indent(out, ind) << "Then:", ind + 1);<br>

-    Else->dump(indent(out, ind) << "Else:", ind + 1);<br>

-    return out;<br>

-  }<br>

 };<br>

<br>

 /// ForExprAST - Expression class for for/in.<br>

@@ -344,15 +217,6 @@ public:<br>

       : VarName(VarName), Start(std::move(Start)), End(std::move(End)),<br>

         Step(std::move(Step)), Body(std::move(Body)) {}<br>

   Value *codegen() override;<br>

-<br>

-  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

-    ExprAST::dump(out << "for", ind);<br>

-    Start->dump(indent(out, ind) << "Cond:", ind + 1);<br>

-    End->dump(indent(out, ind) << "End:", ind + 1);<br>

-    Step->dump(indent(out, ind) << "Step:", ind + 1);<br>

-    Body->dump(indent(out, ind) << "Body:", ind + 1);<br>

-    return out;<br>

-  }<br>

 };<br>

<br>

 /// VarExprAST - Expression class for var/in<br>

@@ -366,14 +230,6 @@ public:<br>

       std::unique_ptr<ExprAST> Body)<br>

       : VarNames(std::move(VarNames)), Body(std::move(Body)) {}<br>

   Value *codegen() override;<br>

-<br>

-  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

-    ExprAST::dump(out << "var", ind);<br>

-    for (const auto &NamedVar : VarNames)<br>

-      NamedVar.second->dump(indent(out, ind) << NamedVar.first << ':', ind + 1);<br>

-    Body->dump(indent(out, ind) << "Body:", ind + 1);<br>

-    return out;<br>

-  }<br>

 };<br>

<br>

 /// PrototypeAST - This class represents the "prototype" for a function,<br>

@@ -384,14 +240,12 @@ class PrototypeAST {<br>

   std::vector<std::string> Args;<br>

   bool IsOperator;<br>

   unsigned Precedence; // Precedence if a binary op.<br>

-  int Line;<br>

<br>

 public:<br>

-  PrototypeAST(SourceLocation Loc, const std::string &Name,<br>

-               std::vector<std::string> Args, bool IsOperator = false,<br>

-               unsigned Prec = 0)<br>

+  PrototypeAST(const std::string &Name, std::vector<std::string> Args,<br>

+               bool IsOperator = false, unsigned Prec = 0)<br>

       : Name(Name), Args(std::move(Args)), IsOperator(IsOperator),<br>

-        Precedence(Prec), Line(Loc.Line) {}<br>

+        Precedence(Prec) {}<br>

   Function *codegen();<br>

   const std::string &getName() const { return Name; }<br>

<br>

@@ -404,7 +258,6 @@ public:<br>

   }<br>

<br>

   unsigned getBinaryPrecedence() const { return Precedence; }<br>

-  int getLine() const { return Line; }<br>

 };<br>

<br>

 /// FunctionAST - This class represents a function definition itself.<br>

@@ -417,13 +270,6 @@ public:<br>

               std::unique_ptr<ExprAST> Body)<br>

       : Proto(std::move(Proto)), Body(std::move(Body)) {}<br>

   Function *codegen();<br>

-<br>

-  raw_ostream &dump(raw_ostream &out, int ind) {<br>

-    indent(out, ind) << "FunctionAST\n";<br>

-    ++ind;<br>

-    indent(out, ind) << "Body:";<br>

-    return Body ? Body->dump(out, ind) : out << "null\n";<br>

-  }<br>

 };<br>

 } // end anonymous namespace<br>

<br>

@@ -492,12 +338,10 @@ static std::unique_ptr<ExprAST> ParsePar<br>

 static std::unique_ptr<ExprAST> ParseIdentifierExpr() {<br>

   std::string IdName = IdentifierStr;<br>

<br>

-  SourceLocation LitLoc = CurLoc;<br>

-<br>

   getNextToken(); // eat identifier.<br>

<br>

   if (CurTok != '(') // Simple variable ref.<br>

-    return llvm::make_unique<VariableExprAST>(LitLoc, IdName);<br>

+    return llvm::make_unique<VariableExprAST>(IdName);<br>

<br>

   // Call.<br>

   getNextToken(); // eat (<br>

@@ -521,13 +365,11 @@ static std::unique_ptr<ExprAST> ParseIde<br>

   // Eat the ')'.<br>

   getNextToken();<br>

<br>

-  return llvm::make_unique<CallExprAST>(LitLoc, IdName, std::move(Args));<br>

+  return llvm::make_unique<CallExprAST>(IdName, std::move(Args));<br>

 }<br>

<br>

 /// ifexpr ::= 'if' expression 'then' expression 'else' expression<br>

 static std::unique_ptr<ExprAST> ParseIfExpr() {<br>

-  SourceLocation IfLoc = CurLoc;<br>

-<br>

   getNextToken(); // eat the if.<br>

<br>

   // condition.<br>

@@ -552,7 +394,7 @@ static std::unique_ptr<ExprAST> ParseIfE<br>

   if (!Else)<br>

     return nullptr;<br>

<br>

-  return llvm::make_unique<IfExprAST>(IfLoc, std::move(Cond), std::move(Then),<br>

+  return llvm::make_unique<IfExprAST>(std::move(Cond), std::move(Then),<br>

                                       std::move(Else));<br>

 }<br>

<br>

@@ -707,7 +549,6 @@ static std::unique_ptr<ExprAST> ParseBin<br>

<br>

     // Okay, we know this is a binop.<br>

     int BinOp = CurTok;<br>

-    SourceLocation BinLoc = CurLoc;<br>

     getNextToken(); // eat binop<br>

<br>

     // Parse the unary expression after the binary operator.<br>

@@ -725,8 +566,8 @@ static std::unique_ptr<ExprAST> ParseBin<br>

     }<br>

<br>

     // Merge LHS/RHS.<br>

-    LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS),<br>

-                                           std::move(RHS));<br>

+    LHS =<br>

+        llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), std::move(RHS));<br>

   }<br>

 }<br>

<br>

@@ -748,8 +589,6 @@ static std::unique_ptr<ExprAST> ParseExp<br>

 static std::unique_ptr<PrototypeAST> ParsePrototype() {<br>

   std::string FnName;<br>

<br>

-  SourceLocation FnLoc = CurLoc;<br>

-<br>

   unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.<br>

   unsigned BinaryPrecedence = 30;<br>

<br>

@@ -805,7 +644,7 @@ static std::unique_ptr<PrototypeAST> Par<br>

   if (Kind && ArgNames.size() != Kind)<br>

     return LogErrorP("Invalid number of operands for operator");<br>

<br>

-  return llvm::make_unique<PrototypeAST>(FnLoc, FnName, ArgNames, Kind != 0,<br>

+  return llvm::make_unique<PrototypeAST>(FnName, ArgNames, Kind != 0,<br>

                                          BinaryPrecedence);<br>

 }<br>

<br>

@@ -823,10 +662,9 @@ static std::unique_ptr<FunctionAST> Pars<br>

<br>

 /// toplevelexpr ::= expression<br>

 static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {<br>

-  SourceLocation FnLoc = CurLoc;<br>

   if (auto E = ParseExpression()) {<br>

     // Make an anonymous proto.<br>

-    auto Proto = llvm::make_unique<PrototypeAST>(FnLoc, "__anon_expr",<br>

+    auto Proto = llvm::make_unique<PrototypeAST>("__anon_expr",<br>

                                                  std::vector<std::string>());<br>

     return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));<br>

   }<br>

@@ -840,51 +678,13 @@ static std::unique_ptr<PrototypeAST> Par<br>

 }<br>

<br>

 //===----------------------------------------------------------------------===//<br>

-// Debug Info Support<br>

-//===----------------------------------------------------------------------===//<br>

-<br>

-static std::unique_ptr<DIBuilder> DBuilder;<br>

-<br>

-DIType *DebugInfo::getDoubleTy() {<br>

-  if (DblTy)<br>

-    return DblTy;<br>

-<br>

-  DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);<br>

-  return DblTy;<br>

-}<br>

-<br>

-void DebugInfo::emitLocation(ExprAST *AST) {<br>

-  if (!AST)<br>

-    return Builder.SetCurrentDebugLocation(DebugLoc());<br>

-  DIScope *Scope;<br>

-  if (LexicalBlocks.empty())<br>

-    Scope = TheCU;<br>

-  else<br>

-    Scope = LexicalBlocks.back();<br>

-  Builder.SetCurrentDebugLocation(<br>

-      DebugLoc::get(AST->getLine(), AST->getCol(), Scope));<br>

-}<br>

-<br>

-static DISubroutineType *CreateFunctionType(unsigned NumArgs, DIFile *Unit) {<br>

-  SmallVector<Metadata *, 8> EltTys;<br>

-  DIType *DblTy = KSDbgInfo.getDoubleTy();<br>

-<br>

-  // Add the result type.<br>

-  EltTys.push_back(DblTy);<br>

-<br>

-  for (unsigned i = 0, e = NumArgs; i != e; ++i)<br>

-    EltTys.push_back(DblTy);<br>

-<br>

-  return DBuilder->createSubroutineType(DBuilder->getOrCreateTypeArray(EltTys));<br>

-}<br>

-<br>

-//===----------------------------------------------------------------------===//<br>

 // Code Generation<br>

 //===----------------------------------------------------------------------===//<br>

<br>

+static LLVMContext TheContext;<br>

+static IRBuilder<> Builder(TheContext);<br>

 static std::unique_ptr<Module> TheModule;<br>

 static std::map<std::string, AllocaInst *> NamedValues;<br>

-static std::unique_ptr<KaleidoscopeJIT> TheJIT;<br>

 static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos;<br>

<br>

 Value *LogErrorV(const char *Str) {<br>

@@ -917,7 +717,6 @@ static AllocaInst *CreateEntryBlockAlloc<br>

 }<br>

<br>

 Value *NumberExprAST::codegen() {<br>

-  KSDbgInfo.emitLocation(this);<br>

   return ConstantFP::get(TheContext, APFloat(Val));<br>

 }<br>

<br>

@@ -927,7 +726,6 @@ Value *VariableExprAST::codegen() {<br>

   if (!V)<br>

     return LogErrorV("Unknown variable name");<br>

<br>

-  KSDbgInfo.emitLocation(this);<br>

   // Load the value.<br>

   return Builder.CreateLoad(V, Name.c_str());<br>

 }<br>

@@ -941,13 +739,10 @@ Value *UnaryExprAST::codegen() {<br>

   if (!F)<br>

     return LogErrorV("Unknown unary operator");<br>

<br>

-  KSDbgInfo.emitLocation(this);<br>

   return Builder.CreateCall(F, OperandV, "unop");<br>

 }<br>

<br>

 Value *BinaryExprAST::codegen() {<br>

-  KSDbgInfo.emitLocation(this);<br>

-<br>

   // Special case '=' because we don't want to emit the LHS as an expression.<br>

   if (Op == '=') {<br>

     // Assignment requires the LHS to be an identifier.<br>

@@ -1001,8 +796,6 @@ Value *BinaryExprAST::codegen() {<br>

 }<br>

<br>

 Value *CallExprAST::codegen() {<br>

-  KSDbgInfo.emitLocation(this);<br>

-<br>

   // Look up the name in the global module table.<br>

   Function *CalleeF = getFunction(Callee);<br>

   if (!CalleeF)<br>

@@ -1023,8 +816,6 @@ Value *CallExprAST::codegen() {<br>

 }<br>

<br>

 Value *IfExprAST::codegen() {<br>

-  KSDbgInfo.emitLocation(this);<br>

-<br>

   Value *CondV = Cond->codegen();<br>

   if (!CondV)<br>

     return nullptr;<br>

@@ -1101,8 +892,6 @@ Value *ForExprAST::codegen() {<br>

   // Create an alloca for the variable in the entry block.<br>

   AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);<br>

<br>

-  KSDbgInfo.emitLocation(this);<br>

-<br>

   // Emit the start code first, without 'variable' in scope.<br>

   Value *StartVal = Start->codegen();<br>

   if (!StartVal)<br>

@@ -1213,8 +1002,6 @@ Value *VarExprAST::codegen() {<br>

     NamedValues[VarName] = Alloca;<br>

   }<br>

<br>

-  KSDbgInfo.emitLocation(this);<br>

-<br>

   // Codegen the body, now that all vars are in scope.<br>

   Value *BodyVal = Body->codegen();<br>

   if (!BodyVal)<br>

@@ -1262,43 +1049,12 @@ Function *FunctionAST::codegen() {<br>

   BasicBlock *BB = BasicBlock::Create(TheContext, "entry", TheFunction);<br>

   Builder.SetInsertPoint(BB);<br>

<br>

-  // Create a subprogram DIE for this function.<br>

-  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),<br>

-                                      KSDbgInfo.TheCU->getDirectory());<br>

-  DIScope *FContext = Unit;<br>

-  unsigned LineNo = P.getLine();<br>

-  unsigned ScopeLine = LineNo;<br>

-  DISubprogram *SP = DBuilder->createFunction(<br>

-      FContext, P.getName(), StringRef(), Unit, LineNo,<br>

-      CreateFunctionType(TheFunction->arg_size(), Unit),<br>

-      false /* internal linkage */, true /* definition */, ScopeLine,<br>

-      DINode::FlagPrototyped, false);<br>

-  TheFunction->setSubprogram(SP);<br>

-<br>

-  // Push the current scope.<br>

-  KSDbgInfo.LexicalBlocks.push_back(SP);<br>

-<br>

-  // Unset the location for the prologue emission (leading instructions with no<br>

-  // location in a function are considered part of the prologue and the debugger<br>

-  // will run past them when breaking on a function)<br>

-  KSDbgInfo.emitLocation(nullptr);<br>

-<br>

   // Record the function arguments in the NamedValues map.<br>

   NamedValues.clear();<br>

-  unsigned ArgIdx = 0;<br>

   for (auto &Arg : TheFunction->args()) {<br>

     // Create an alloca for this variable.<br>

     AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName());<br>

<br>

-    // Create a debug descriptor for the variable.<br>

-    DILocalVariable *D = DBuilder->createParameterVariable(<br>

-        SP, Arg.getName(), ++ArgIdx, Unit, LineNo, KSDbgInfo.getDoubleTy(),<br>

-        true);<br>

-<br>

-    DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),<br>

-                            DebugLoc::get(LineNo, 0, SP),<br>

-                            Builder.GetInsertBlock());<br>

-<br>

     // Store the initial value into the alloca.<br>

     Builder.CreateStore(&Arg, Alloca);<br>

<br>

@@ -1306,15 +1062,10 @@ Function *FunctionAST::codegen() {<br>

     NamedValues[Arg.getName()] = Alloca;<br>

   }<br>

<br>

-  KSDbgInfo.emitLocation(Body.get());<br>

-<br>

   if (Value *RetVal = Body->codegen()) {<br>

     // Finish off the function.<br>

     Builder.CreateRet(RetVal);<br>

<br>

-    // Pop off the lexical block for the function.<br>

-    KSDbgInfo.LexicalBlocks.pop_back();<br>

-<br>

     // Validate the generated code, checking for consistency.<br>

     verifyFunction(*TheFunction);<br>

<br>

@@ -1326,11 +1077,6 @@ Function *FunctionAST::codegen() {<br>

<br>

   if (P.isBinaryOp())<br>

     BinopPrecedence.erase(Proto->getOperatorName());<br>

-<br>

-  // Pop off the lexical block for the function since we added it<br>

-  // unconditionally.<br>

-  KSDbgInfo.LexicalBlocks.pop_back();<br>

-<br>

   return nullptr;<br>

 }<br>

<br>

@@ -1338,16 +1084,17 @@ Function *FunctionAST::codegen() {<br>

 // Top-Level parsing and JIT Driver<br>

 //===----------------------------------------------------------------------===//<br>

<br>

-static void InitializeModule() {<br>

+static void InitializeModuleAndPassManager() {<br>

   // Open a new module.<br>

   TheModule = llvm::make_unique<Module>("my cool jit", TheContext);<br>

-  TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());<br>

 }<br>

<br>

 static void HandleDefinition() {<br>

   if (auto FnAST = ParseDefinition()) {<br>

-    if (!FnAST->codegen())<br>

-      fprintf(stderr, "Error reading function definition:");<br>

+    if (auto *FnIR = FnAST->codegen()) {<br>

+      fprintf(stderr, "Read function definition:");<br>

+      FnIR->dump();<br>

+    }<br>

   } else {<br>

     // Skip token for error recovery.<br>

     getNextToken();<br>

@@ -1356,10 +1103,11 @@ static void HandleDefinition() {<br>

<br>

 static void HandleExtern() {<br>

   if (auto ProtoAST = ParseExtern()) {<br>

-    if (!ProtoAST->codegen())<br>

-      fprintf(stderr, "Error reading extern");<br>

-    else<br>

+    if (auto *FnIR = ProtoAST->codegen()) {<br>

+      fprintf(stderr, "Read extern: ");<br>

+      FnIR->dump();<br>

       FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);<br>

+    }<br>

   } else {<br>

     // Skip token for error recovery.<br>

     getNextToken();<br>

@@ -1369,9 +1117,7 @@ static void HandleExtern() {<br>

 static void HandleTopLevelExpression() {<br>

   // Evaluate a top-level expression into an anonymous function.<br>

   if (auto FnAST = ParseTopLevelExpr()) {<br>

-    if (!FnAST->codegen()) {<br>

-      fprintf(stderr, "Error generating code for top level expr");<br>

-    }<br>

+    FnAST->codegen();<br>

   } else {<br>

     // Skip token for error recovery.<br>

     getNextToken();<br>

@@ -1421,50 +1167,74 @@ extern "C" double printd(double X) {<br>

 //===----------------------------------------------------------------------===//<br>

<br>

 int main() {<br>

-  InitializeNativeTarget();<br>

-  InitializeNativeTargetAsmPrinter();<br>

-  InitializeNativeTargetAsmParser();<br>

-<br>

   // Install standard binary operators.<br>

   // 1 is lowest precedence.<br>

-  BinopPrecedence['='] = 2;<br>

   BinopPrecedence['<'] = 10;<br>

   BinopPrecedence['+'] = 20;<br>

   BinopPrecedence['-'] = 20;<br>

   BinopPrecedence['*'] = 40; // highest.<br>

<br>

   // Prime the first token.<br>

+  fprintf(stderr, "ready> ");<br>

   getNextToken();<br>

<br>

-  TheJIT = llvm::make_unique<KaleidoscopeJIT>();<br>

-<br>

-  InitializeModule();<br>

-<br>

-  // Add the current debug info version into the module.<br>

-  TheModule->addModuleFlag(Module::Warning, "Debug Info Version",<br>

-                           DEBUG_METADATA_VERSION);<br>

-<br>

-  // Darwin only supports dwarf2.<br>

-  if (Triple(sys::getProcessTriple()).isOSDarwin())<br>

-    TheModule->addModuleFlag(llvm::Module::Warning, "Dwarf Version", 2);<br>

-<br>

-  // Construct the DIBuilder, we do this here because we need the module.<br>

-  DBuilder = llvm::make_unique<DIBuilder>(*TheModule);<br>

-<br>

-  // Create the compile unit for the module.<br>

-  // Currently down as "fib.ks" as a filename since we're redirecting stdin<br>

-  // but we'd like actual source locations.<br>

-  KSDbgInfo.TheCU = DBuilder->createCompileUnit(<br>

-      dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", false, "", 0);<br>

+  InitializeModuleAndPassManager();<br>

<br>

   // Run the main "interpreter loop" now.<br>

   MainLoop();<br>

<br>

-  // Finalize the debug info.<br>

-  DBuilder->finalize();<br>

+  // Initialize the target registry etc.<br>

+  InitializeAllTargetInfos();<br>

+  InitializeAllTargets();<br>

+  InitializeAllTargetMCs();<br>

+  InitializeAllAsmParsers();<br>

+  InitializeAllAsmPrinters();<br>

+<br>

+  auto TargetTriple = sys::getDefaultTargetTriple();<br>

+  TheModule->setTargetTriple(TargetTriple);<br>

+<br>

+  std::string Error;<br>

+  auto Target = TargetRegistry::lookupTarget(TargetTriple, Error);<br>

+<br>

+  // Print an error and exit if we couldn't find the requested target.<br>

+  // This generally occurs if we've forgotten to initialise the<br>

+  // TargetRegistry or we have a bogus target triple.<br>

+  if (!Target) {<br>

+    errs() << Error;<br>

+    return 1;<br>

+  }<br>

+<br>

+  auto CPU = "generic";<br>

+  auto Features = "";<br>

+<br>

+  TargetOptions opt;<br>

+  auto RM = Optional<Reloc::Model>();<br>

+  auto TheTargetMachine =<br>

+      Target->createTargetMachine(TargetTriple, CPU, Features, opt, RM);<br>

+<br>

+  TheModule->setDataLayout(TheTargetMachine->createDataLayout());<br>

+<br>

+  auto Filename = "output.o";<br>

+  std::error_code EC;<br>

+  raw_fd_ostream dest(Filename, EC, sys::fs::F_None);<br>

+<br>

+  if (EC) {<br>

+    errs() << "Could not open file: " << EC.message();<br>

+    return 1;<br>

+  }<br>

+<br>

+  legacy::PassManager pass;<br>

+  auto FileType = TargetMachine::CGFT_ObjectFile;<br>

+<br>

+  if (TheTargetMachine->addPassesToEmitFile(pass, dest, FileType)) {<br>

+    errs() << "TheTargetMachine can't emit a file of this type";<br>

+    return 1;<br>

+  }<br>

+<br>

+  pass.run(*TheModule);<br>

+  dest.flush();<br>

<br>

-  // Print out all of the generated code.<br>

-  TheModule->dump();<br>

+  outs() << "Wrote " << Filename << "\n";<br>

<br>

   return 0;<br>

 }<br>

<br>

Added: llvm/trunk/examples/Kaleidoscope/Chapter9/CMakeLists.txt<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/examples/Kaleidoscope/Chapter9/CMakeLists.txt?rev=274441&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/examples/Kaleidoscope/Chapter9/CMakeLists.txt?rev=274441&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/examples/Kaleidoscope/Chapter9/CMakeLists.txt (added)<br>

+++ llvm/trunk/examples/Kaleidoscope/Chapter9/CMakeLists.txt Sat Jul  2 12:01:59 2016<br>

@@ -0,0 +1,13 @@<br>

+set(LLVM_LINK_COMPONENTS<br>

+  Core<br>

+  ExecutionEngine<br>

+  Object<br>

+  Support<br>

+  native<br>

+  )<br>

+<br>

+add_kaleidoscope_chapter(Kaleidoscope-Ch9<br>

+  toy.cpp<br>

+  )<br>

+<br>

+export_executable_symbols(Kaleidoscope-Ch9)<br>

<br>

Added: llvm/trunk/examples/Kaleidoscope/Chapter9/toy.cpp<br>

URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/examples/Kaleidoscope/Chapter9/toy.cpp?rev=274441&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/examples/Kaleidoscope/Chapter9/toy.cpp?rev=274441&view=auto</a><br>

==============================================================================<br>

--- llvm/trunk/examples/Kaleidoscope/Chapter9/toy.cpp (added)<br>

+++ llvm/trunk/examples/Kaleidoscope/Chapter9/toy.cpp Sat Jul  2 12:01:59 2016<br>

@@ -0,0 +1,1445 @@<br>

+#include "llvm/ADT/STLExtras.h"<br>

+#include "llvm/Analysis/BasicAliasAnalysis.h"<br>

+#include "llvm/Analysis/Passes.h"<br>

+#include "llvm/IR/DIBuilder.h"<br>

+#include "llvm/IR/IRBuilder.h"<br>

+#include "llvm/IR/LLVMContext.h"<br>

+#include "llvm/IR/LegacyPassManager.h"<br>

+#include "llvm/IR/Module.h"<br>

+#include "llvm/IR/Verifier.h"<br>

+#include "llvm/Support/TargetSelect.h"<br>

+#include "llvm/Transforms/Scalar.h"<br>

+#include <cctype><br>

+#include <cstdio><br>

+#include <map><br>

+#include <string><br>

+#include <vector><br>

+#include "../include/KaleidoscopeJIT.h"<br>

+<br>

+using namespace llvm;<br>

+using namespace llvm::orc;<br>

+<br>

+//===----------------------------------------------------------------------===//<br>

+// Lexer<br>

+//===----------------------------------------------------------------------===//<br>

+<br>

+// The lexer returns tokens [0-255] if it is an unknown character, otherwise one<br>

+// of these for known things.<br>

+enum Token {<br>

+  tok_eof = -1,<br>

+<br>

+  // commands<br>

+  tok_def = -2,<br>

+  tok_extern = -3,<br>

+<br>

+  // primary<br>

+  tok_identifier = -4,<br>

+  tok_number = -5,<br>

+<br>

+  // control<br>

+  tok_if = -6,<br>

+  tok_then = -7,<br>

+  tok_else = -8,<br>

+  tok_for = -9,<br>

+  tok_in = -10,<br>

+<br>

+  // operators<br>

+  tok_binary = -11,<br>

+  tok_unary = -12,<br>

+<br>

+  // var definition<br>

+  tok_var = -13<br>

+};<br>

+<br>

+std::string getTokName(int Tok) {<br>

+  switch (Tok) {<br>

+  case tok_eof:<br>

+    return "eof";<br>

+  case tok_def:<br>

+    return "def";<br>

+  case tok_extern:<br>

+    return "extern";<br>

+  case tok_identifier:<br>

+    return "identifier";<br>

+  case tok_number:<br>

+    return "number";<br>

+  case tok_if:<br>

+    return "if";<br>

+  case tok_then:<br>

+    return "then";<br>

+  case tok_else:<br>

+    return "else";<br>

+  case tok_for:<br>

+    return "for";<br>

+  case tok_in:<br>

+    return "in";<br>

+  case tok_binary:<br>

+    return "binary";<br>

+  case tok_unary:<br>

+    return "unary";<br>

+  case tok_var:<br>

+    return "var";<br>

+  }<br>

+  return std::string(1, (char)Tok);<br>

+}<br>

+<br>

+namespace {<br>

+class PrototypeAST;<br>

+class ExprAST;<br>

+}<br>

+static LLVMContext TheContext;<br>

+static IRBuilder<> Builder(TheContext);<br>

+struct DebugInfo {<br>

+  DICompileUnit *TheCU;<br>

+  DIType *DblTy;<br>

+  std::vector<DIScope *> LexicalBlocks;<br>

+<br>

+  void emitLocation(ExprAST *AST);<br>

+  DIType *getDoubleTy();<br>

+} KSDbgInfo;<br>

+<br>

+struct SourceLocation {<br>

+  int Line;<br>

+  int Col;<br>

+};<br>

+static SourceLocation CurLoc;<br>

+static SourceLocation LexLoc = {1, 0};<br>

+<br>

+static int advance() {<br>

+  int LastChar = getchar();<br>

+<br>

+  if (LastChar == '\n' || LastChar == '\r') {<br>

+    LexLoc.Line++;<br>

+    LexLoc.Col = 0;<br>

+  } else<br>

+    LexLoc.Col++;<br>

+  return LastChar;<br>

+}<br>

+<br>

+static std::string IdentifierStr; // Filled in if tok_identifier<br>

+static double NumVal;             // Filled in if tok_number<br>

+<br>

+/// gettok - Return the next token from standard input.<br>

+static int gettok() {<br>

+  static int LastChar = ' ';<br>

+<br>

+  // Skip any whitespace.<br>

+  while (isspace(LastChar))<br>

+    LastChar = advance();<br>

+<br>

+  CurLoc = LexLoc;<br>

+<br>

+  if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*<br>

+    IdentifierStr = LastChar;<br>

+    while (isalnum((LastChar = advance())))<br>

+      IdentifierStr += LastChar;<br>

+<br>

+    if (IdentifierStr == "def")<br>

+      return tok_def;<br>

+    if (IdentifierStr == "extern")<br>

+      return tok_extern;<br>

+    if (IdentifierStr == "if")<br>

+      return tok_if;<br>

+    if (IdentifierStr == "then")<br>

+      return tok_then;<br>

+    if (IdentifierStr == "else")<br>

+      return tok_else;<br>

+    if (IdentifierStr == "for")<br>

+      return tok_for;<br>

+    if (IdentifierStr == "in")<br>

+      return tok_in;<br>

+    if (IdentifierStr == "binary")<br>

+      return tok_binary;<br>

+    if (IdentifierStr == "unary")<br>

+      return tok_unary;<br>

+    if (IdentifierStr == "var")<br>

+      return tok_var;<br>

+    return tok_identifier;<br>

+  }<br>

+<br>

+  if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+<br>

+    std::string NumStr;<br>

+    do {<br>

+      NumStr += LastChar;<br>

+      LastChar = advance();<br>

+    } while (isdigit(LastChar) || LastChar == '.');<br>

+<br>

+    NumVal = strtod(NumStr.c_str(), nullptr);<br>

+    return tok_number;<br>

+  }<br>

+<br>

+  if (LastChar == '#') {<br>

+    // Comment until end of line.<br>

+    do<br>

+      LastChar = advance();<br>

+    while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');<br>

+<br>

+    if (LastChar != EOF)<br>

+      return gettok();<br>

+  }<br>

+<br>

+  // Check for end of file.  Don't eat the EOF.<br>

+  if (LastChar == EOF)<br>

+    return tok_eof;<br>

+<br>

+  // Otherwise, just return the character as its ascii value.<br>

+  int ThisChar = LastChar;<br>

+  LastChar = advance();<br>

+  return ThisChar;<br>

+}<br>

+<br>

+//===----------------------------------------------------------------------===//<br>

+// Abstract Syntax Tree (aka Parse Tree)<br>

+//===----------------------------------------------------------------------===//<br>

+namespace {<br>

+<br>

+raw_ostream &indent(raw_ostream &O, int size) {<br>

+  return O << std::string(size, ' ');<br>

+}<br>

+<br>

+/// ExprAST - Base class for all expression nodes.<br>

+class ExprAST {<br>

+  SourceLocation Loc;<br>

+<br>

+public:<br>

+  ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}<br>

+  virtual ~ExprAST() {}<br>

+  virtual Value *codegen() = 0;<br>

+  int getLine() const { return Loc.Line; }<br>

+  int getCol() const { return Loc.Col; }<br>

+  virtual raw_ostream &dump(raw_ostream &out, int ind) {<br>

+    return out << ':' << getLine() << ':' << getCol() << '\n';<br>

+  }<br>

+};<br>

+<br>

+/// NumberExprAST - Expression class for numeric literals like "1.0".<br>

+class NumberExprAST : public ExprAST {<br>

+  double Val;<br>

+<br>

+public:<br>

+  NumberExprAST(double Val) : Val(Val) {}<br>

+  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

+    return ExprAST::dump(out << Val, ind);<br>

+  }<br>

+  Value *codegen() override;<br>

+};<br>

+<br>

+/// VariableExprAST - Expression class for referencing a variable, like "a".<br>

+class VariableExprAST : public ExprAST {<br>

+  std::string Name;<br>

+<br>

+public:<br>

+  VariableExprAST(SourceLocation Loc, const std::string &Name)<br>

+      : ExprAST(Loc), Name(Name) {}<br>

+  const std::string &getName() const { return Name; }<br>

+  Value *codegen() override;<br>

+  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

+    return ExprAST::dump(out << Name, ind);<br>

+  }<br>

+};<br>

+<br>

+/// UnaryExprAST - Expression class for a unary operator.<br>

+class UnaryExprAST : public ExprAST {<br>

+  char Opcode;<br>

+  std::unique_ptr<ExprAST> Operand;<br>

+<br>

+public:<br>

+  UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)<br>

+      : Opcode(Opcode), Operand(std::move(Operand)) {}<br>

+  Value *codegen() override;<br>

+  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

+    ExprAST::dump(out << "unary" << Opcode, ind);<br>

+    Operand->dump(out, ind + 1);<br>

+    return out;<br>

+  }<br>

+};<br>

+<br>

+/// BinaryExprAST - Expression class for a binary operator.<br>

+class BinaryExprAST : public ExprAST {<br>

+  char Op;<br>

+  std::unique_ptr<ExprAST> LHS, RHS;<br>

+<br>

+public:<br>

+  BinaryExprAST(SourceLocation Loc, char Op, std::unique_ptr<ExprAST> LHS,<br>

+                std::unique_ptr<ExprAST> RHS)<br>

+      : ExprAST(Loc), Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}<br>

+  Value *codegen() override;<br>

+  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

+    ExprAST::dump(out << "binary" << Op, ind);<br>

+    LHS->dump(indent(out, ind) << "LHS:", ind + 1);<br>

+    RHS->dump(indent(out, ind) << "RHS:", ind + 1);<br>

+    return out;<br>

+  }<br>

+};<br>

+<br>

+/// CallExprAST - Expression class for function calls.<br>

+class CallExprAST : public ExprAST {<br>

+  std::string Callee;<br>

+  std::vector<std::unique_ptr<ExprAST>> Args;<br>

+<br>

+public:<br>

+  CallExprAST(SourceLocation Loc, const std::string &Callee,<br>

+              std::vector<std::unique_ptr<ExprAST>> Args)<br>

+      : ExprAST(Loc), Callee(Callee), Args(std::move(Args)) {}<br>

+  Value *codegen() override;<br>

+  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

+    ExprAST::dump(out << "call " << Callee, ind);<br>

+    for (const auto &Arg : Args)<br>

+      Arg->dump(indent(out, ind + 1), ind + 1);<br>

+    return out;<br>

+  }<br>

+};<br>

+<br>

+/// IfExprAST - Expression class for if/then/else.<br>

+class IfExprAST : public ExprAST {<br>

+  std::unique_ptr<ExprAST> Cond, Then, Else;<br>

+<br>

+public:<br>

+  IfExprAST(SourceLocation Loc, std::unique_ptr<ExprAST> Cond,<br>

+            std::unique_ptr<ExprAST> Then, std::unique_ptr<ExprAST> Else)<br>

+      : ExprAST(Loc), Cond(std::move(Cond)), Then(std::move(Then)),<br>

+        Else(std::move(Else)) {}<br>

+  Value *codegen() override;<br>

+  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

+    ExprAST::dump(out << "if", ind);<br>

+    Cond->dump(indent(out, ind) << "Cond:", ind + 1);<br>

+    Then->dump(indent(out, ind) << "Then:", ind + 1);<br>

+    Else->dump(indent(out, ind) << "Else:", ind + 1);<br>

+    return out;<br>

+  }<br>

+};<br>

+<br>

+/// ForExprAST - Expression class for for/in.<br>

+class ForExprAST : public ExprAST {<br>

+  std::string VarName;<br>

+  std::unique_ptr<ExprAST> Start, End, Step, Body;<br>

+<br>

+public:<br>

+  ForExprAST(const std::string &VarName, std::unique_ptr<ExprAST> Start,<br>

+             std::unique_ptr<ExprAST> End, std::unique_ptr<ExprAST> Step,<br>

+             std::unique_ptr<ExprAST> Body)<br>

+      : VarName(VarName), Start(std::move(Start)), End(std::move(End)),<br>

+        Step(std::move(Step)), Body(std::move(Body)) {}<br>

+  Value *codegen() override;<br>

+  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

+    ExprAST::dump(out << "for", ind);<br>

+    Start->dump(indent(out, ind) << "Start:", ind + 1);<br>

+    End->dump(indent(out, ind) << "End:", ind + 1);<br>

+    Step->dump(indent(out, ind) << "Step:", ind + 1);<br>

+    Body->dump(indent(out, ind) << "Body:", ind + 1);<br>

+    return out;<br>

+  }<br>

+};<br>

+<br>

+/// VarExprAST - Expression class for var/in<br>

+class VarExprAST : public ExprAST {<br>

+  std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;<br>

+  std::unique_ptr<ExprAST> Body;<br>

+<br>

+public:<br>

+  VarExprAST(<br>

+      std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames,<br>

+      std::unique_ptr<ExprAST> Body)<br>

+      : VarNames(std::move(VarNames)), Body(std::move(Body)) {}<br>

+  Value *codegen() override;<br>

+  raw_ostream &dump(raw_ostream &out, int ind) override {<br>

+    ExprAST::dump(out << "var", ind);<br>

+    for (const auto &NamedVar : VarNames)<br>

+      NamedVar.second->dump(indent(out, ind) << NamedVar.first << ':', ind + 1);<br>

+    Body->dump(indent(out, ind) << "Body:", ind + 1);<br>

+    return out;<br>

+  }<br>

+};<br>

+<br>

+/// PrototypeAST - This class represents the "prototype" for a function,<br>

+/// which captures its name, and its argument names (thus implicitly the number<br>

+/// of arguments the function takes), as well as if it is an operator.<br>

+class PrototypeAST {<br>

+  std::string Name;<br>

+  std::vector<std::string> Args;<br>

+  bool IsOperator;<br>

+  unsigned Precedence; // Precedence if a binary op.<br>

+  int Line;<br>

+<br>

+public:<br>

+  PrototypeAST(SourceLocation Loc, const std::string &Name,<br>

+               std::vector<std::string> Args, bool IsOperator = false,<br>

+               unsigned Prec = 0)<br>

+      : Name(Name), Args(std::move(Args)), IsOperator(IsOperator),<br>

+        Precedence(Prec), Line(Loc.Line) {}<br>

+  Function *codegen();<br>

+  const std::string &getName() const { return Name; }<br>

+<br>

+  bool isUnaryOp() const { return IsOperator && Args.size() == 1; }<br>

+  bool isBinaryOp() const { return IsOperator && Args.size() == 2; }<br>

+<br>

+  char getOperatorName() const {<br>

+    assert(isUnaryOp() || isBinaryOp());<br>

+    return Name[Name.size() - 1];<br>

+  }<br>

+<br>

+  unsigned getBinaryPrecedence() const { return Precedence; }<br>

+  int getLine() const { return Line; }<br>

+};<br>

+<br>

+/// FunctionAST - This class represents a function definition itself.<br>

+class FunctionAST {<br>

+  std::unique_ptr<PrototypeAST> Proto;<br>

+  std::unique_ptr<ExprAST> Body;<br>

+<br>

+public:<br>

+  FunctionAST(std::unique_ptr<PrototypeAST> Proto,<br>

+              std::unique_ptr<ExprAST> Body)<br>

+      : Proto(std::move(Proto)), Body(std::move(Body)) {}<br>

+  Function *codegen();<br>

+  raw_ostream &dump(raw_ostream &out, int ind) {<br>

+    indent(out, ind) << "FunctionAST\n";<br>

+    ++ind;<br>

+    indent(out, ind) << "Body:";<br>

+    return Body ? Body->dump(out, ind) : out << "null\n";<br>

+  }<br>

+};<br>

+} // end anonymous namespace<br>

+<br>

+//===----------------------------------------------------------------------===//<br>

+// Parser<br>

+//===----------------------------------------------------------------------===//<br>

+<br>

+/// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current<br>

+/// token the parser is looking at.  getNextToken reads another token from the<br>

+/// lexer and updates CurTok with its results.<br>

+static int CurTok;<br>

+static int getNextToken() { return CurTok = gettok(); }<br>

+<br>

+/// BinopPrecedence - This holds the precedence for each binary operator that is<br>

+/// defined.<br>

+static std::map<char, int> BinopPrecedence;<br>

+<br>

+/// GetTokPrecedence - Get the precedence of the pending binary operator token.<br>

+static int GetTokPrecedence() {<br>

+  if (!isascii(CurTok))<br>

+    return -1;<br>

+<br>

+  // Make sure it's a declared binop.<br>

+  int TokPrec = BinopPrecedence[CurTok];<br>

+  if (TokPrec <= 0)<br>

+    return -1;<br>

+  return TokPrec;<br>

+}<br>

+<br>

+/// LogError* - These are little helper functions for error handling.<br>

+std::unique_ptr<ExprAST> LogError(const char *Str) {<br>

+  fprintf(stderr, "Error: %s\n", Str);<br>

+  return nullptr;<br>

+}<br>

+<br>

+std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) {<br>

+  LogError(Str);<br>

+  return nullptr;<br>

+}<br>

+<br>

+static std::unique_ptr<ExprAST> ParseExpression();<br>

+<br>

+/// numberexpr ::= number<br>

+static std::unique_ptr<ExprAST> ParseNumberExpr() {<br>

+  auto Result = llvm::make_unique<NumberExprAST>(NumVal);<br>

+  getNextToken(); // consume the number<br>

+  return std::move(Result);<br>

+}<br>

+<br>

+/// parenexpr ::= '(' expression ')'<br>

+static std::unique_ptr<ExprAST> ParseParenExpr() {<br>

+  getNextToken(); // eat (.<br>

+  auto V = ParseExpression();<br>

+  if (!V)<br>

+    return nullptr;<br>

+<br>

+  if (CurTok != ')')<br>

+    return LogError("expected ')'");<br>

+  getNextToken(); // eat ).<br>

+  return V;<br>

+}<br>

+<br>

+/// identifierexpr<br>

+///   ::= identifier<br>

+///   ::= identifier '(' expression* ')'<br>

+static std::unique_ptr<ExprAST> ParseIdentifierExpr() {<br>

+  std::string IdName = IdentifierStr;<br>

+<br>

+  SourceLocation LitLoc = CurLoc;<br>

+<br>

+  getNextToken(); // eat identifier.<br>

+<br>

+  if (CurTok != '(') // Simple variable ref.<br>

+    return llvm::make_unique<VariableExprAST>(LitLoc, IdName);<br>

+<br>

+  // Call.<br>

+  getNextToken(); // eat (<br>

+  std::vector<std::unique_ptr<ExprAST>> Args;<br>

+  if (CurTok != ')') {<br>

+    while (1) {<br>

+      if (auto Arg = ParseExpression())<br>

+        Args.push_back(std::move(Arg));<br>

+      else<br>

+        return nullptr;<br>

+<br>

+      if (CurTok == ')')<br>

+        break;<br>

+<br>

+      if (CurTok != ',')<br>

+        return LogError("Expected ')' or ',' in argument list");<br>

+      getNextToken();<br>

+    }<br>

+  }<br>

+<br>

+  // Eat the ')'.<br>

+  getNextToken();<br>

+<br>

+  return llvm::make_unique<CallExprAST>(LitLoc, IdName, std::move(Args));<br>

+}<br>

+<br>

+/// ifexpr ::= 'if' expression 'then' expression 'else' expression<br>

+static std::unique_ptr<ExprAST> ParseIfExpr() {<br>

+  SourceLocation IfLoc = CurLoc;<br>

+<br>

+  getNextToken(); // eat the if.<br>

+<br>

+  // condition.<br>

+  auto Cond = ParseExpression();<br>

+  if (!Cond)<br>

+    return nullptr;<br>

+<br>

+  if (CurTok != tok_then)<br>

+    return LogError("expected then");<br>

+  getNextToken(); // eat the then<br>

+<br>

+  auto Then = ParseExpression();<br>

+  if (!Then)<br>

+    return nullptr;<br>

+<br>

+  if (CurTok != tok_else)<br>

+    return LogError("expected else");<br>

+<br>

+  getNextToken();<br>

+<br>

+  auto Else = ParseExpression();<br>

+  if (!Else)<br>

+    return nullptr;<br>

+<br>

+  return llvm::make_unique<IfExprAST>(IfLoc, std::move(Cond), std::move(Then),<br>

+                                      std::move(Else));<br>

+}<br>

+<br>

+/// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression<br>

+static std::unique_ptr<ExprAST> ParseForExpr() {<br>

+  getNextToken(); // eat the for.<br>

+<br>

+  if (CurTok != tok_identifier)<br>

+    return LogError("expected identifier after for");<br>

+<br>

+  std::string IdName = IdentifierStr;<br>

+  getNextToken(); // eat identifier.<br>

+<br>

+  if (CurTok != '=')<br>

+    return LogError("expected '=' after for");<br>

+  getNextToken(); // eat '='.<br>

+<br>

+  auto Start = ParseExpression();<br>

+  if (!Start)<br>

+    return nullptr;<br>

+  if (CurTok != ',')<br>

+    return LogError("expected ',' after for start value");<br>

+  getNextToken();<br>

+<br>

+  auto End = ParseExpression();<br>

+  if (!End)<br>

+    return nullptr;<br>

+<br>

+  // The step value is optional.<br>

+  std::unique_ptr<ExprAST> Step;<br>

+  if (CurTok == ',') {<br>

+    getNextToken();<br>

+    Step = ParseExpression();<br>

+    if (!Step)<br>

+      return nullptr;<br>

+  }<br>

+<br>

+  if (CurTok != tok_in)<br>

+    return LogError("expected 'in' after for");<br>

+  getNextToken(); // eat 'in'.<br>

+<br>

+  auto Body = ParseExpression();<br>

+  if (!Body)<br>

+    return nullptr;<br>

+<br>

+  return llvm::make_unique<ForExprAST>(IdName, std::move(Start), std::move(End),<br>

+                                       std::move(Step), std::move(Body));<br>

+}<br>

+<br>

+/// varexpr ::= 'var' identifier ('=' expression)?<br>

+//                    (',' identifier ('=' expression)?)* 'in' expression<br>

+static std::unique_ptr<ExprAST> ParseVarExpr() {<br>

+  getNextToken(); // eat the var.<br>

+<br>

+  std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;<br>

+<br>

+  // At least one variable name is required.<br>

+  if (CurTok != tok_identifier)<br>

+    return LogError("expected identifier after var");<br>

+<br>

+  while (1) {<br>

+    std::string Name = IdentifierStr;<br>

+    getNextToken(); // eat identifier.<br>

+<br>

+    // Read the optional initializer.<br>

+    std::unique_ptr<ExprAST> Init = nullptr;<br>

+    if (CurTok == '=') {<br>

+      getNextToken(); // eat the '='.<br>

+<br>

+      Init = ParseExpression();<br>

+      if (!Init)<br>

+        return nullptr;<br>

+    }<br>

+<br>

+    VarNames.push_back(std::make_pair(Name, std::move(Init)));<br>

+<br>

+    // End of var list, exit loop.<br>

+    if (CurTok != ',')<br>

+      break;<br>

+    getNextToken(); // eat the ','.<br>

+<br>

+    if (CurTok != tok_identifier)<br>

+      return LogError("expected identifier list after var");<br>

+  }<br>

+<br>

+  // At this point, we have to have 'in'.<br>

+  if (CurTok != tok_in)<br>

+    return LogError("expected 'in' keyword after 'var'");<br>

+  getNextToken(); // eat 'in'.<br>

+<br>

+  auto Body = ParseExpression();<br>

+  if (!Body)<br>

+    return nullptr;<br>

+<br>

+  return llvm::make_unique<VarExprAST>(std::move(VarNames), std::move(Body));<br>

+}<br>

+<br>

+/// primary<br>

+///   ::= identifierexpr<br>

+///   ::= numberexpr<br>

+///   ::= parenexpr<br>

+///   ::= ifexpr<br>

+///   ::= forexpr<br>

+///   ::= varexpr<br>

+static std::unique_ptr<ExprAST> ParsePrimary() {<br>

+  switch (CurTok) {<br>

+  default:<br>

+    return LogError("unknown token when expecting an expression");<br>

+  case tok_identifier:<br>

+    return ParseIdentifierExpr();<br>

+  case tok_number:<br>

+    return ParseNumberExpr();<br>

+  case '(':<br>

+    return ParseParenExpr();<br>

+  case tok_if:<br>

+    return ParseIfExpr();<br>

+  case tok_for:<br>

+    return ParseForExpr();<br>

+  case tok_var:<br>

+    return ParseVarExpr();<br>

+  }<br>

+}<br>

+<br>

+/// unary<br>

+///   ::= primary<br>

+///   ::= '!' unary<br>

+static std::unique_ptr<ExprAST> ParseUnary() {<br>

+  // If the current token is not an operator, it must be a primary expr.<br>

+  if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')<br>

+    return ParsePrimary();<br>

+<br>

+  // If this is a unary operator, read it.<br>

+  int Opc = CurTok;<br>

+  getNextToken();<br>

+  if (auto Operand = ParseUnary())<br>

+    return llvm::make_unique<UnaryExprAST>(Opc, std::move(Operand));<br>

+  return nullptr;<br>

+}<br>

+<br>

+/// binoprhs<br>

+///   ::= ('+' unary)*<br>

+static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,<br>

+                                              std::unique_ptr<ExprAST> LHS) {<br>

+  // If this is a binop, find its precedence.<br>

+  while (1) {<br>

+    int TokPrec = GetTokPrecedence();<br>

+<br>

+    // If this is a binop that binds at least as tightly as the current binop,<br>

+    // consume it, otherwise we are done.<br>

+    if (TokPrec < ExprPrec)<br>

+      return LHS;<br>

+<br>

+    // Okay, we know this is a binop.<br>

+    int BinOp = CurTok;<br>

+    SourceLocation BinLoc = CurLoc;<br>

+    getNextToken(); // eat binop<br>

+<br>

+    // Parse the unary expression after the binary operator.<br>

+    auto RHS = ParseUnary();<br>

+    if (!RHS)<br>

+      return nullptr;<br>

+<br>

+    // If BinOp binds less tightly with RHS than the operator after RHS, let<br>

+    // the pending operator take RHS as its LHS.<br>

+    int NextPrec = GetTokPrecedence();<br>

+    if (TokPrec < NextPrec) {<br>

+      RHS = ParseBinOpRHS(TokPrec + 1, std::move(RHS));<br>

+      if (!RHS)<br>

+        return nullptr;<br>

+    }<br>

+<br>

+    // Merge LHS/RHS.<br>

+    LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS),<br>

+                                           std::move(RHS));<br>

+  }<br>

+}<br>

+<br>

+/// expression<br>

+///   ::= unary binoprhs<br>

+///<br>

+static std::unique_ptr<ExprAST> ParseExpression() {<br>

+  auto LHS = ParseUnary();<br>

+  if (!LHS)<br>

+    return nullptr;<br>

+<br>

+  return ParseBinOpRHS(0, std::move(LHS));<br>

+}<br>

+<br>

+/// prototype<br>

+///   ::= id '(' id* ')'<br>

+///   ::= binary LETTER number? (id, id)<br>

+///   ::= unary LETTER (id)<br>

+static std::unique_ptr<PrototypeAST> ParsePrototype() {<br>

+  std::string FnName;<br>

+<br>

+  SourceLocation FnLoc = CurLoc;<br>

+<br>

+  unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.<br>

+  unsigned BinaryPrecedence = 30;<br>

+<br>

+  switch (CurTok) {<br>

+  default:<br>

+    return LogErrorP("Expected function name in prototype");<br>

+  case tok_identifier:<br>

+    FnName = IdentifierStr;<br>

+    Kind = 0;<br>

+    getNextToken();<br>

+    break;<br>

+  case tok_unary:<br>

+    getNextToken();<br>

+    if (!isascii(CurTok))<br>

+      return LogErrorP("Expected unary operator");<br>

+    FnName = "unary";<br>

+    FnName += (char)CurTok;<br>

+    Kind = 1;<br>

+    getNextToken();<br>

+    break;<br>

+  case tok_binary:<br>

+    getNextToken();<br>

+    if (!isascii(CurTok))<br>

+      return LogErrorP("Expected binary operator");<br>

+    FnName = "binary";<br>

+    FnName += (char)CurTok;<br>

+    Kind = 2;<br>

+    getNextToken();<br>

+<br>

+    // Read the precedence if present.<br>

+    if (CurTok == tok_number) {<br>

+      if (NumVal < 1 || NumVal > 100)<br>

+        return LogErrorP("Invalid precedence: must be 1..100");<br>

+      BinaryPrecedence = (unsigned)NumVal;<br>

+      getNextToken();<br>

+    }<br>

+    break;<br>

+  }<br>

+<br>

+  if (CurTok != '(')<br>

+    return LogErrorP("Expected '(' in prototype");<br>

+<br>

+  std::vector<std::string> ArgNames;<br>

+  while (getNextToken() == tok_identifier)<br>

+    ArgNames.push_back(IdentifierStr);<br>

+  if (CurTok != ')')<br>

+    return LogErrorP("Expected ')' in prototype");<br>

+<br>

+  // success.<br>

+  getNextToken(); // eat ')'.<br>

+<br>

+  // Verify right number of names for operator.<br>

+  if (Kind && ArgNames.size() != Kind)<br>

+    return LogErrorP("Invalid number of operands for operator");<br>

+<br>

+  return llvm::make_unique<PrototypeAST>(FnLoc, FnName, ArgNames, Kind != 0,<br>

+                                         BinaryPrecedence);<br>

+}<br>

+<br>

+/// definition ::= 'def' prototype expression<br>

+static std::unique_ptr<FunctionAST> ParseDefinition() {<br>

+  getNextToken(); // eat def.<br>

+  auto Proto = ParsePrototype();<br>

+  if (!Proto)<br>

+    return nullptr;<br>

+<br>

+  if (auto E = ParseExpression())<br>

+    return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));<br>

+  return nullptr;<br>

+}<br>

+<br>

+/// toplevelexpr ::= expression<br>

+static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {<br>

+  SourceLocation FnLoc = CurLoc;<br>

+  if (auto E = ParseExpression()) {<br>

+    // Make an anonymous proto.<br>

+    auto Proto = llvm::make_unique<PrototypeAST>(FnLoc, "__anon_expr",<br>

+                                                 std::vector<std::string>());<br>

+    return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E));<br>

+  }<br>

+  return nullptr;<br>

+}<br>

+<br>

+/// external ::= 'extern' prototype<br>

+static std::unique_ptr<PrototypeAST> ParseExtern() {<br>

+  getNextToken(); // eat extern.<br>

+  return ParsePrototype();<br>

+}<br>

+<br>

+//===----------------------------------------------------------------------===//<br>

+// Debug Info Support<br>

+//===----------------------------------------------------------------------===//<br>

+<br>

+static std::unique_ptr<DIBuilder> DBuilder;<br>

+<br>

+DIType *DebugInfo::getDoubleTy() {<br>

+  if (DblTy)<br>

+    return DblTy;<br>

+<br>

+  DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float);<br>

+  return DblTy;<br>

+}<br>

+<br>

+void DebugInfo::emitLocation(ExprAST *AST) {<br>

+  if (!AST)<br>

+    return Builder.SetCurrentDebugLocation(DebugLoc());<br>

+  DIScope *Scope;<br>

+  if (LexicalBlocks.empty())<br>

+    Scope = TheCU;<br>

+  else<br>

+    Scope = LexicalBlocks.back();<br>

+  Builder.SetCurrentDebugLocation(<br>

+      DebugLoc::get(AST->getLine(), AST->getCol(), Scope));<br>

+}<br>

+<br>

+static DISubroutineType *CreateFunctionType(unsigned NumArgs, DIFile *Unit) {<br>

+  SmallVector<Metadata *, 8> EltTys;<br>

+  DIType *DblTy = KSDbgInfo.getDoubleTy();<br>

+<br>

+  // Add the result type.<br>

+  EltTys.push_back(DblTy);<br>

+<br>

+  for (unsigned i = 0, e = NumArgs; i != e; ++i)<br>

+    EltTys.push_back(DblTy);<br>

+<br>

+  return DBuilder->createSubroutineType(DBuilder->getOrCreateTypeArray(EltTys));<br>

+}<br>

+<br>

+//===----------------------------------------------------------------------===//<br>

+// Code Generation<br>

+//===----------------------------------------------------------------------===//<br>

+<br>

+static std::unique_ptr<Module> TheModule;<br>

+static std::map<std::string, AllocaInst *> NamedValues;<br>

+static std::unique_ptr<KaleidoscopeJIT> TheJIT;<br>

+static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos;<br>

+<br>

+Value *LogErrorV(const char *Str) {<br>

+  LogError(Str);<br>

+  return nullptr;<br>

+}<br>

+<br>

+Function *getFunction(std::string Name) {<br>

+  // First, see if the function has already been added to the current module.<br>

+  if (auto *F = TheModule->getFunction(Name))<br>

+    return F;<br>

+<br>

+  // If not, check whether we can codegen the declaration from some existing<br>

+  // prototype.<br>

+  auto FI = FunctionProtos.find(Name);<br>

+  if (FI != FunctionProtos.end())<br>

+    return FI->second->codegen();<br>

+<br>

+  // If no existing prototype exists, return null.<br>

+  return nullptr;<br>

+}<br>

+<br>

+/// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of<br>

+/// the function.  This is used for mutable variables etc.<br>

+static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,<br>

+                                          const std::string &VarName) {<br>

+  IRBuilder<> TmpB(&TheFunction->getEntryBlock(),<br>

+                   TheFunction->getEntryBlock().begin());<br>

+  return TmpB.CreateAlloca(Type::getDoubleTy(TheContext), nullptr,<br>

+                           VarName.c_str());<br>

+}<br>

+<br>

+Value *NumberExprAST::codegen() {<br>

+  KSDbgInfo.emitLocation(this);<br>

+  return ConstantFP::get(TheContext, APFloat(Val));<br>

+}<br>

+<br>

+Value *VariableExprAST::codegen() {<br>

+  // Look this variable up in the function.<br>

+  Value *V = NamedValues[Name];<br>

+  if (!V)<br>

+    return LogErrorV("Unknown variable name");<br>

+<br>

+  KSDbgInfo.emitLocation(this);<br>

+  // Load the value.<br>

+  return Builder.CreateLoad(V, Name.c_str());<br>

+}<br>

+<br>

+Value *UnaryExprAST::codegen() {<br>

+  Value *OperandV = Operand->codegen();<br>

+  if (!OperandV)<br>

+    return nullptr;<br>

+<br>

+  Function *F = getFunction(std::string("unary") + Opcode);<br>

+  if (!F)<br>

+    return LogErrorV("Unknown unary operator");<br>

+<br>

+  KSDbgInfo.emitLocation(this);<br>

+  return Builder.CreateCall(F, OperandV, "unop");<br>

+}<br>

+<br>

+Value *BinaryExprAST::codegen() {<br>

+  KSDbgInfo.emitLocation(this);<br>

+<br>

+  // Special case '=' because we don't want to emit the LHS as an expression.<br>

+  if (Op == '=') {<br>

+    // Assignment requires the LHS to be an identifier.<br>

+    // This assumes we're building without RTTI because LLVM builds that way by<br>

+    // default.  If you build LLVM with RTTI this can be changed to a<br>

+    // dynamic_cast for automatic error checking.<br>

+    VariableExprAST *LHSE = static_cast<VariableExprAST *>(LHS.get());<br>

+    if (!LHSE)<br>

+      return LogErrorV("destination of '=' must be a variable");<br>

+    // Codegen the RHS.<br>

+    Value *Val = RHS->codegen();<br>

+    if (!Val)<br>

+      return nullptr;<br>

+<br>

+    // Look up the name.<br>

+    Value *Variable = NamedValues[LHSE->getName()];<br>

+    if (!Variable)<br>

+      return LogErrorV("Unknown variable name");<br>

+<br>

+    Builder.CreateStore(Val, Variable);<br>

+    return Val;<br>

+  }<br>

+<br>

+  Value *L = LHS->codegen();<br>

+  Value *R = RHS->codegen();<br>

+  if (!L || !R)<br>

+    return nullptr;<br>

+<br>

+  switch (Op) {<br>

+  case '+':<br>

+    return Builder.CreateFAdd(L, R, "addtmp");<br>

+  case '-':<br>

+    return Builder.CreateFSub(L, R, "subtmp");<br>

+  case '*':<br>

+    return Builder.CreateFMul(L, R, "multmp");<br>

+  case '<':<br>

+    L = Builder.CreateFCmpULT(L, R, "cmptmp");<br>

+    // Convert bool 0/1 to double 0.0 or 1.0<br>

+    return Builder.CreateUIToFP(L, Type::getDoubleTy(TheContext), "booltmp");<br>

+  default:<br>

+    break;<br>

+  }<br>

+<br>

+  // If it wasn't a builtin binary operator, it must be a user defined one. Emit<br>

+  // a call to it.<br>

+  Function *F = getFunction(std::string("binary") + Op);<br>

+  assert(F && "binary operator not found!");<br>

+<br>

+  Value *Ops[] = {L, R};<br>

+  return Builder.CreateCall(F, Ops, "binop");<br>

+}<br>

+<br>

+Value *CallExprAST::codegen() {<br>

+  KSDbgInfo.emitLocation(this);<br>

+<br>

+  // Look up the name in the global module table.<br>

+  Function *CalleeF = getFunction(Callee);<br>

+  if (!CalleeF)<br>

+    return LogErrorV("Unknown function referenced");<br>

+<br>

+  // Report an error if the argument count does not match.<br>

+  if (CalleeF->arg_size() != Args.size())<br>

+    return LogErrorV("Incorrect # arguments passed");<br>

+<br>

+  std::vector<Value *> ArgsV;<br>

+  for (unsigned i = 0, e = Args.size(); i != e; ++i) {<br>

+    ArgsV.push_back(Args[i]->codegen());<br>

+    if (!ArgsV.back())<br>

+      return nullptr;<br>

+  }<br>

+<br>

+  return Builder.CreateCall(CalleeF, ArgsV, "calltmp");<br>

+}<br>

+<br>

+Value *IfExprAST::codegen() {<br>

+  KSDbgInfo.emitLocation(this);<br>

+<br>

+  Value *CondV = Cond->codegen();<br>

+  if (!CondV)<br>

+    return nullptr;<br>

+<br>

+  // Convert condition to a bool by comparing non-equal to 0.0.<br>

+  CondV = Builder.CreateFCmpONE(<br>

+      CondV, ConstantFP::get(TheContext, APFloat(0.0)), "ifcond");<br>

+<br>

+  Function *TheFunction = Builder.GetInsertBlock()->getParent();<br>

+<br>

+  // Create blocks for the then and else cases.  Insert the 'then' block at the<br>

+  // end of the function.<br>

+  BasicBlock *ThenBB = BasicBlock::Create(TheContext, "then", TheFunction);<br>

+  BasicBlock *ElseBB = BasicBlock::Create(TheContext, "else");<br>

+  BasicBlock *MergeBB = BasicBlock::Create(TheContext, "ifcont");<br>

+<br>

+  Builder.CreateCondBr(CondV, ThenBB, ElseBB);<br>

+<br>

+  // Emit then value.<br>

+  Builder.SetInsertPoint(ThenBB);<br>

+<br>

+  Value *ThenV = Then->codegen();<br>

+  if (!ThenV)<br>

+    return nullptr;<br>

+<br>

+  Builder.CreateBr(MergeBB);<br>

+  // Codegen of 'Then' can change the current block, update ThenBB for the PHI.<br>

+  ThenBB = Builder.GetInsertBlock();<br>

+<br>

+  // Emit else block.<br>

+  TheFunction->getBasicBlockList().push_back(ElseBB);<br>

+  Builder.SetInsertPoint(ElseBB);<br>

+<br>

+  Value *ElseV = Else->codegen();<br>

+  if (!ElseV)<br>

+    return nullptr;<br>

+<br>

+  Builder.CreateBr(MergeBB);<br>

+  // Codegen of 'Else' can change the current block, update ElseBB for the PHI.<br>

+  ElseBB = Builder.GetInsertBlock();<br>

+<br>

+  // Emit merge block.<br>

+  TheFunction->getBasicBlockList().push_back(MergeBB);<br>

+  Builder.SetInsertPoint(MergeBB);<br>

+  PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(TheContext), 2, "iftmp");<br>

+<br>

+  PN->addIncoming(ThenV, ThenBB);<br>

+  PN->addIncoming(ElseV, ElseBB);<br>

+  return PN;<br>

+}<br>

+<br>

+// Output for-loop as:<br>

+//   var = alloca double<br>

+//   ...<br>

+//   start = startexpr<br>

+//   store start -> var<br>

+//   goto loop<br>

+// loop:<br>

+//   ...<br>

+//   bodyexpr<br>

+//   ...<br>

+// loopend:<br>

+//   step = stepexpr<br>

+//   endcond = endexpr<br>

+//<br>

+//   curvar = load var<br>

+//   nextvar = curvar + step<br>

+//   store nextvar -> var<br>

+//   br endcond, loop, outloop<br>

+// outloop:<br>

+Value *ForExprAST::codegen() {<br>

+  Function *TheFunction = Builder.GetInsertBlock()->getParent();<br>

+<br>

+  // Create an alloca for the variable in the entry block.<br>

+  AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);<br>

+<br>

+  KSDbgInfo.emitLocation(this);<br>

+<br>

+  // Emit the start code first, without 'variable' in scope.<br>

+  Value *StartVal = Start->codegen();<br>

+  if (!StartVal)<br>

+    return nullptr;<br>

+<br>

+  // Store the value into the alloca.<br>

+  Builder.CreateStore(StartVal, Alloca);<br>

+<br>

+  // Make the new basic block for the loop header, inserting after current<br>

+  // block.<br>

+  BasicBlock *LoopBB = BasicBlock::Create(TheContext, "loop", TheFunction);<br>

+<br>

+  // Insert an explicit fall through from the current block to the LoopBB.<br>

+  Builder.CreateBr(LoopBB);<br>

+<br>

+  // Start insertion in LoopBB.<br>

+  Builder.SetInsertPoint(LoopBB);<br>

+<br>

+  // Within the loop, the variable is bound to its alloca.  If it<br>

+  // shadows an existing variable, we have to restore it, so save it now.<br>

+  AllocaInst *OldVal = NamedValues[VarName];<br>

+  NamedValues[VarName] = Alloca;<br>

+<br>

+  // Emit the body of the loop.  This, like any other expr, can change the<br>

+  // current BB.  Note that we ignore the value computed by the body, but don't<br>

+  // allow an error.<br>

+  if (!Body->codegen())<br>

+    return nullptr;<br>

+<br>

+  // Emit the step value.<br>

+  Value *StepVal = nullptr;<br>

+  if (Step) {<br>

+    StepVal = Step->codegen();<br>

+    if (!StepVal)<br>

+      return nullptr;<br>

+  } else {<br>

+    // If not specified, use 1.0.<br>

+    StepVal = ConstantFP::get(TheContext, APFloat(1.0));<br>

+  }<br>

+<br>

+  // Compute the end condition.<br>

+  Value *EndCond = End->codegen();<br>

+  if (!EndCond)<br>

+    return nullptr;<br>

+<br>

+  // Reload, increment, and restore the alloca.  This handles the case where<br>

+  // the body of the loop mutates the variable.<br>

+  Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());<br>

+  Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");<br>

+  Builder.CreateStore(NextVar, Alloca);<br>

+<br>

+  // Convert condition to a bool by comparing non-equal to 0.0.<br>

+  EndCond = Builder.CreateFCmpONE(<br>

+      EndCond, ConstantFP::get(TheContext, APFloat(0.0)), "loopcond");<br>

+<br>

+  // Create the "after loop" block and insert it.<br>

+  BasicBlock *AfterBB =<br>

+      BasicBlock::Create(TheContext, "afterloop", TheFunction);<br>

+<br>

+  // Insert the conditional branch at the end of the current block.<br>

+  Builder.CreateCondBr(EndCond, LoopBB, AfterBB);<br>

+<br>

+  // Any new code will be inserted in AfterBB.<br>

+  Builder.SetInsertPoint(AfterBB);<br>

+<br>

+  // Restore the unshadowed variable.<br>

+  if (OldVal)<br>

+    NamedValues[VarName] = OldVal;<br>

+  else<br>

+    NamedValues.erase(VarName);<br>

+<br>

+  // for expr always returns 0.0.<br>

+  return Constant::getNullValue(Type::getDoubleTy(TheContext));<br>

+}<br>

+<br>

+Value *VarExprAST::codegen() {<br>

+  std::vector<AllocaInst *> OldBindings;<br>

+<br>

+  Function *TheFunction = Builder.GetInsertBlock()->getParent();<br>

+<br>

+  // Register all variables and emit their initializer.<br>

+  for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {<br>

+    const std::string &VarName = VarNames[i].first;<br>

+    ExprAST *Init = VarNames[i].second.get();<br>

+<br>

+    // Emit the initializer before adding the variable to scope, this prevents<br>

+    // the initializer from referencing the variable itself, and permits stuff<br>

+    // like this:<br>

+    //  var a = 1 in<br>

+    //    var a = a in ...   # refers to outer 'a'.<br>

+    Value *InitVal;<br>

+    if (Init) {<br>

+      InitVal = Init->codegen();<br>

+      if (!InitVal)<br>

+        return nullptr;<br>

+    } else { // If not specified, use 0.0.<br>

+      InitVal = ConstantFP::get(TheContext, APFloat(0.0));<br>

+    }<br>

+<br>

+    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);<br>

+    Builder.CreateStore(InitVal, Alloca);<br>

+<br>

+    // Remember the old variable binding so that we can restore the binding when<br>

+    // we unrecurse.<br>

+    OldBindings.push_back(NamedValues[VarName]);<br>

+<br>

+    // Remember this binding.<br>

+    NamedValues[VarName] = Alloca;<br>

+  }<br>

+<br>

+  KSDbgInfo.emitLocation(this);<br>

+<br>

+  // Codegen the body, now that all vars are in scope.<br>

+  Value *BodyVal = Body->codegen();<br>

+  if (!BodyVal)<br>

+    return nullptr;<br>

+<br>

+  // Pop all our variables from scope.<br>

+  for (unsigned i = 0, e = VarNames.size(); i != e; ++i)<br>

+    NamedValues[VarNames[i].first] = OldBindings[i];<br>

+<br>

+  // Return the body computation.<br>

+  return BodyVal;<br>

+}<br>

+<br>

+Function *PrototypeAST::codegen() {<br>

+  // Make the function type:  double(double,double) etc.<br>

+  std::vector<Type *> Doubles(Args.size(), Type::getDoubleTy(TheContext));<br>

+  FunctionType *FT =<br>

+      FunctionType::get(Type::getDoubleTy(TheContext), Doubles, false);<br>

+<br>

+  Function *F =<br>

+      Function::Create(FT, Function::ExternalLinkage, Name, TheModule.get());<br>

+<br>

+  // Set names for all arguments.<br>

+  unsigned Idx = 0;<br>

+  for (auto &Arg : F->args())<br>

+    Arg.setName(Args[Idx++]);<br>

+<br>

+  return F;<br>

+}<br>

+<br>

+Function *FunctionAST::codegen() {<br>

+  // Transfer ownership of the prototype to the FunctionProtos map, but keep a<br>

+  // reference to it for use below.<br>

+  auto &P = *Proto;<br>

+  FunctionProtos[Proto->getName()] = std::move(Proto);<br>

+  Function *TheFunction = getFunction(P.getName());<br>

+  if (!TheFunction)<br>

+    return nullptr;<br>

+<br>

+  // If this is an operator, install it.<br>

+  if (P.isBinaryOp())<br>

+    BinopPrecedence[P.getOperatorName()] = P.getBinaryPrecedence();<br>

+<br>

+  // Create a new basic block to start insertion into.<br>

+  BasicBlock *BB = BasicBlock::Create(TheContext, "entry", TheFunction);<br>

+  Builder.SetInsertPoint(BB);<br>

+<br>

+  // Create a subprogram DIE for this function.<br>

+  DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),<br>

+                                      KSDbgInfo.TheCU->getDirectory());<br>

+  DIScope *FContext = Unit;<br>

+  unsigned LineNo = P.getLine();<br>

+  unsigned ScopeLine = LineNo;<br>

+  DISubprogram *SP = DBuilder->createFunction(<br>

+      FContext, P.getName(), StringRef(), Unit, LineNo,<br>

+      CreateFunctionType(TheFunction->arg_size(), Unit),<br>

+      false /* internal linkage */, true /* definition */, ScopeLine,<br>

+      DINode::FlagPrototyped, false);<br>

+  TheFunction->setSubprogram(SP);<br>

+<br>

+  // Push the current scope.<br>

+  KSDbgInfo.LexicalBlocks.push_back(SP);<br>

+<br>

+  // Unset the location for the prologue emission (leading instructions with no<br>

+  // location in a function are considered part of the prologue and the debugger<br>

+  // will run past them when breaking on a function)<br>

+  KSDbgInfo.emitLocation(nullptr);<br>

+<br>

+  // Record the function arguments in the NamedValues map.<br>

+  NamedValues.clear();<br>

+  unsigned ArgIdx = 0;<br>

+  for (auto &Arg : TheFunction->args()) {<br>

+    // Create an alloca for this variable.<br>

+    AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName());<br>

+<br>

+    // Create a debug descriptor for the variable.<br>

+    DILocalVariable *D = DBuilder->createParameterVariable(<br>

+        SP, Arg.getName(), ++ArgIdx, Unit, LineNo, KSDbgInfo.getDoubleTy(),<br>

+        true);<br>

+<br>

+    DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),<br>

+                            DebugLoc::get(LineNo, 0, SP),<br>

+                            Builder.GetInsertBlock());<br>

+<br>

+    // Store the initial value into the alloca.<br>

+    Builder.CreateStore(&Arg, Alloca);<br>

+<br>

+    // Add arguments to variable symbol table.<br>

+    NamedValues[Arg.getName()] = Alloca;<br>

+  }<br>

+<br>

+  KSDbgInfo.emitLocation(Body.get());<br>

+<br>

+  if (Value *RetVal = Body->codegen()) {<br>

+    // Finish off the function.<br>

+    Builder.CreateRet(RetVal);<br>

+<br>

+    // Pop off the lexical block for the function.<br>

+    KSDbgInfo.LexicalBlocks.pop_back();<br>

+<br>

+    // Validate the generated code, checking for consistency.<br>

+    verifyFunction(*TheFunction);<br>

+<br>

+    return TheFunction;<br>

+  }<br>

+<br>

+  // Error reading body, remove function.<br>

+  TheFunction->eraseFromParent();<br>

+<br>

+  if (P.isBinaryOp())<br>

+    BinopPrecedence.erase(P.getOperatorName()); // Proto was moved-from above.<br>

+<br>

+  // Pop off the lexical block for the function since we added it<br>

+  // unconditionally.<br>

+  KSDbgInfo.LexicalBlocks.pop_back();<br>

+<br>

+  return nullptr;<br>

+}<br>

+<br>

+//===----------------------------------------------------------------------===//<br>

+// Top-Level parsing and JIT Driver<br>

+//===----------------------------------------------------------------------===//<br>

+<br>

+static void InitializeModule() {<br>

+  // Open a new module.<br>

+  TheModule = llvm::make_unique<Module>("my cool jit", TheContext);<br>

+  TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());<br>

+}<br>

+<br>

+static void HandleDefinition() {<br>

+  if (auto FnAST = ParseDefinition()) {<br>

+    if (!FnAST->codegen())<br>

+      fprintf(stderr, "Error reading function definition\n");<br>

+  } else {<br>

+    // Skip token for error recovery.<br>

+    getNextToken();<br>

+  }<br>

+}<br>

+<br>

+static void HandleExtern() {<br>

+  if (auto ProtoAST = ParseExtern()) {<br>

+    if (!ProtoAST->codegen())<br>

+      fprintf(stderr, "Error reading extern\n");<br>

+    else<br>

+      FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);<br>

+  } else {<br>

+    // Skip token for error recovery.<br>

+    getNextToken();<br>

+  }<br>

+}<br>

+<br>

+static void HandleTopLevelExpression() {<br>

+  // Evaluate a top-level expression into an anonymous function.<br>

+  if (auto FnAST = ParseTopLevelExpr()) {<br>

+    if (!FnAST->codegen()) {<br>

+      fprintf(stderr, "Error generating code for top level expr\n");<br>

+    }<br>

+  } else {<br>

+    // Skip token for error recovery.<br>

+    getNextToken();<br>

+  }<br>

+}<br>

+<br>

+/// top ::= definition | external | expression | ';'<br>

+static void MainLoop() {<br>

+  while (1) {<br>

+    switch (CurTok) {<br>

+    case tok_eof:<br>

+      return;<br>

+    case ';': // ignore top-level semicolons.<br>

+      getNextToken();<br>

+      break;<br>

+    case tok_def:<br>

+      HandleDefinition();<br>

+      break;<br>

+    case tok_extern:<br>

+      HandleExtern();<br>

+      break;<br>

+    default:<br>

+      HandleTopLevelExpression();<br>

+      break;<br>

+    }<br>

+  }<br>

+}<br>

+<br>

+//===----------------------------------------------------------------------===//<br>

+// "Library" functions that can be "extern'd" from user code.<br>

+//===----------------------------------------------------------------------===//<br>

+<br>

+/// putchard - putchar that takes a double and returns 0.<br>

+extern "C" double putchard(double X) {<br>

+  fputc((char)X, stderr);<br>

+  return 0;<br>

+}<br>

+<br>

+/// printd - printf that takes a double and prints it as "%f\n", returning 0.<br>

+extern "C" double printd(double X) {<br>

+  fprintf(stderr, "%f\n", X);<br>

+  return 0;<br>

+}<br>

+<br>

+//===----------------------------------------------------------------------===//<br>

+// Main driver code.<br>

+//===----------------------------------------------------------------------===//<br>

+<br>

+int main() {<br>

+  InitializeNativeTarget();<br>

+  InitializeNativeTargetAsmPrinter();<br>

+  InitializeNativeTargetAsmParser();<br>

+<br>

+  // Install standard binary operators.<br>

+  // 1 is lowest precedence.<br>

+  BinopPrecedence['='] = 2;<br>

+  BinopPrecedence['<'] = 10;<br>

+  BinopPrecedence['+'] = 20;<br>

+  BinopPrecedence['-'] = 20;<br>

+  BinopPrecedence['*'] = 40; // highest.<br>

+<br>

+  // Prime the first token.<br>

+  getNextToken();<br>

+<br>

+  TheJIT = llvm::make_unique<KaleidoscopeJIT>();<br>

+<br>

+  InitializeModule();<br>

+<br>

+  // Add the current debug info version into the module.<br>

+  TheModule->addModuleFlag(Module::Warning, "Debug Info Version",<br>

+                           DEBUG_METADATA_VERSION);<br>

+<br>

+  // Darwin only supports dwarf2.<br>

+  if (Triple(sys::getProcessTriple()).isOSDarwin())<br>

+    TheModule->addModuleFlag(llvm::Module::Warning, "Dwarf Version", 2);<br>

+<br>

+  // Construct the DIBuilder; we do this here because it needs the module.<br>

+  DBuilder = llvm::make_unique<DIBuilder>(*TheModule);<br>

+<br>

+  // Create the compile unit for the module.<br>

+  // The filename is hard-coded as "fib.ks" because the source is read from<br>

+  // redirected stdin, but we would still like actual source locations.<br>

+  KSDbgInfo.TheCU = DBuilder->createCompileUnit(<br>

+      dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0);<br>

+<br>

+  // Run the main "interpreter loop" now.<br>

+  MainLoop();<br>

+<br>

+  // Finalize the debug info.<br>

+  DBuilder->finalize();<br>

+<br>

+  // Print out all of the generated code.<br>

+  TheModule->dump();<br>

+<br>

+  return 0;<br>

+}<br>


</blockquote></div><br></div>