[llvm] [OCaml] Adding back OCaml Kaleidoscope tutorial (PR #181394)

Shivam Gupta via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 13 10:10:06 PST 2026


https://github.com/xgupta created https://github.com/llvm/llvm-project/pull/181394

None

>From d4eceb86ba9f299c90f02bb43dfbfd882e6a6f64 Mon Sep 17 00:00:00 2001
From: Shivam Gupta <shivam98.tkg at gmail.com>
Date: Fri, 13 Feb 2026 13:55:45 +0530
Subject: [PATCH 1/3] [OCaml][Tutorial] Adding back OCaml Kaleidoscope tutorial

---
 llvm/docs/tutorial/OCamlLangImpl1.rst | 285 ++++++++++++++++++++++++++
 llvm/docs/tutorial/index.rst          |   5 +-
 2 files changed, 289 insertions(+), 1 deletion(-)
 create mode 100644 llvm/docs/tutorial/OCamlLangImpl1.rst

diff --git a/llvm/docs/tutorial/OCamlLangImpl1.rst b/llvm/docs/tutorial/OCamlLangImpl1.rst
new file mode 100644
index 0000000000000..3fed61d2d4e19
--- /dev/null
+++ b/llvm/docs/tutorial/OCamlLangImpl1.rst
@@ -0,0 +1,285 @@
+=================================================
+Kaleidoscope: Tutorial Introduction and the Lexer
+=================================================
+
+.. contents::
+   :local:
+
+Tutorial Introduction
+=====================
+
+Welcome to the "Implementing a language with LLVM" tutorial. This
+tutorial runs through the implementation of a simple language, showing
+how fun and easy it can be. This tutorial will get you up and started as
+well as help to build a framework you can extend to other languages. The
+code in this tutorial can also be used as a playground to hack on other
+LLVM specific things.
+
+The goal of this tutorial is to progressively unveil our language,
+describing how it is built up over time. This will let us cover a fairly
+broad range of language design and LLVM-specific usage issues, showing
+and explaining the code for it all along the way, without overwhelming
+you with tons of details up front.
+
+It is useful to point out ahead of time that this tutorial is really
+about teaching compiler techniques and LLVM specifically, *not* about
+teaching modern and sane software engineering principles. In practice,
+this means that we'll take a number of shortcuts to simplify the
+exposition. For example, the code leaks memory, uses global variables
+all over the place, doesn't use nice design patterns like
+`visitors <http://en.wikipedia.org/wiki/Visitor_pattern>`_, etc... but
+it is very simple. If you dig in and use the code as a basis for future
+projects, fixing these deficiencies shouldn't be hard.
+
+I've tried to put this tutorial together in a way that makes chapters
+easy to skip over if you are already familiar with or are uninterested
+in the various pieces. The structure of the tutorial is:
+
+-  `Chapter #1 <#language>`_: Introduction to the Kaleidoscope
+   language, and the definition of its Lexer - This shows where we are
+   going and the basic functionality that we want it to do. In order to
+   make this tutorial maximally understandable and hackable, we choose
+   to implement everything in Objective Caml instead of using lexer and
+   parser generators. LLVM obviously works just fine with such tools,
+   feel free to use one if you prefer.
+-  `Chapter #2 <OCamlLangImpl2.html>`_: Implementing a Parser and
+   AST - With the lexer in place, we can talk about parsing techniques
+   and basic AST construction. This tutorial describes recursive descent
+   parsing and operator precedence parsing. Nothing in Chapters 1 or 2
+   is LLVM-specific, the code doesn't even link in LLVM at this point.
+   :)
+-  `Chapter #3 <OCamlLangImpl3.html>`_: Code generation to LLVM IR -
+   With the AST ready, we can show off how easy generation of LLVM IR
+   really is.
+-  `Chapter #4 <OCamlLangImpl4.html>`_: Adding JIT and Optimizer
+   Support - Because a lot of people are interested in using LLVM as a
+   JIT, we'll dive right into it and show you the 3 lines it takes to
+   add JIT support. LLVM is also useful in many other ways, but this is
+   one simple and "sexy" way to shows off its power. :)
+-  `Chapter #5 <OCamlLangImpl5.html>`_: Extending the Language:
+   Control Flow - With the language up and running, we show how to
+   extend it with control flow operations (if/then/else and a 'for'
+   loop). This gives us a chance to talk about simple SSA construction
+   and control flow.
+-  `Chapter #6 <OCamlLangImpl6.html>`_: Extending the Language:
+   User-defined Operators - This is a silly but fun chapter that talks
+   about extending the language to let the user program define their own
+   arbitrary unary and binary operators (with assignable precedence!).
+   This lets us build a significant piece of the "language" as library
+   routines.
+-  `Chapter #7 <OCamlLangImpl7.html>`_: Extending the Language:
+   Mutable Variables - This chapter talks about adding user-defined
+   local variables along with an assignment operator. The interesting
+   part about this is how easy and trivial it is to construct SSA form
+   in LLVM: no, LLVM does *not* require your front-end to construct SSA
+   form!
+-  `Chapter #8 <OCamlLangImpl8.html>`_: Conclusion and other useful
+   LLVM tidbits - This chapter wraps up the series by talking about
+   potential ways to extend the language, but also includes a bunch of
+   pointers to info about "special topics" like adding garbage
+   collection support, exceptions, debugging, support for "spaghetti
+   stacks", and a bunch of other tips and tricks.
+
+By the end of the tutorial, we'll have written a bit less than 700 lines
+of non-comment, non-blank, lines of code. With this small amount of
+code, we'll have built up a very reasonable compiler for a non-trivial
+language including a hand-written lexer, parser, AST, as well as code
+generation support with a JIT compiler. While other systems may have
+interesting "hello world" tutorials, I think the breadth of this
+tutorial is a great testament to the strengths of LLVM and why you
+should consider it if you're interested in language or compiler design.
+
+A note about this tutorial: we expect you to extend the language and
+play with it on your own. Take the code and go crazy hacking away at it,
+compilers don't need to be scary creatures - it can be a lot of fun to
+play with languages!
+
+The Basic Language
+==================
+
+This tutorial will be illustrated with a toy language that we'll call
+"`Kaleidoscope <http://en.wikipedia.org/wiki/Kaleidoscope>`_" (derived
+from "meaning beautiful, form, and view"). Kaleidoscope is a procedural
+language that allows you to define functions, use conditionals, math,
+etc. Over the course of the tutorial, we'll extend Kaleidoscope to
+support the if/then/else construct, a for loop, user defined operators,
+JIT compilation with a simple command line interface, etc.
+
+Because we want to keep things simple, the only datatype in Kaleidoscope
+is a 64-bit floating point type (aka 'float' in OCaml parlance). As
+such, all values are implicitly double precision and the language
+doesn't require type declarations. This gives the language a very nice
+and simple syntax. For example, the following simple example computes
+`Fibonacci numbers: <http://en.wikipedia.org/wiki/Fibonacci_number>`_
+
+::
+
+    # Compute the x'th fibonacci number.
+    def fib(x)
+      if x < 3 then
+        1
+      else
+        fib(x-1)+fib(x-2)
+
+    # This expression will compute the 40th number.
+    fib(40)
+
+We also allow Kaleidoscope to call into standard library functions (the
+LLVM JIT makes this completely trivial). This means that you can use the
+'extern' keyword to define a function before you use it (this is also
+useful for mutually recursive functions). For example:
+
+::
+
+    extern sin(arg);
+    extern cos(arg);
+    extern atan2(arg1 arg2);
+
+    atan2(sin(.4), cos(42))
+
+A more interesting example is included in Chapter 6 where we write a
+little Kaleidoscope application that `displays a Mandelbrot
+Set <OCamlLangImpl6.html#kicking-the-tires>`_ at various levels of magnification.
+
+Lets dive into the implementation of this language!
+
+The Lexer
+=========
+
+When it comes to implementing a language, the first thing needed is the
+ability to process a text file and recognize what it says. The
+traditional way to do this is to use a
+"`lexer <http://en.wikipedia.org/wiki/Lexical_analysis>`_" (aka
+'scanner') to break the input up into "tokens". Each token returned by
+the lexer includes a token code and potentially some metadata (e.g. the
+numeric value of a number). First, we define the possibilities:
+
+.. code-block:: ocaml
+
+    (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
+     * these others for known things. *)
+    type token =
+      (* commands *)
+      | Def | Extern
+
+      (* primary *)
+      | Ident of string | Number of float
+
+      (* unknown *)
+      | Kwd of char
+
+Each token returned by our lexer will be one of the token variant
+values. An unknown character like '+' will be returned as
+``Token.Kwd '+'``. If the curr token is an identifier, the value will be
+``Token.Ident s``. If the current token is a numeric literal (like 1.0),
+the value will be ``Token.Number 1.0``.
+
+The actual implementation of the lexer is a collection of functions
+driven by a function named ``Lexer.lex``. The ``Lexer.lex`` function is
+called to return the next token from standard input. We will use
+`Camlp4 <http://caml.inria.fr/pub/docs/manual-camlp4/index.html>`_ to
+simplify the tokenization of the standard input. Its definition starts
+as:
+
+.. code-block:: ocaml
+
+    (*===----------------------------------------------------------------------===
+     * Lexer
+     *===----------------------------------------------------------------------===*)
+
+    let rec lex = parser
+      (* Skip any whitespace. *)
+      | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
+
+``Lexer.lex`` works by recursing over a ``char Stream.t`` to read
+characters one at a time from the standard input. It eats them as it
+recognizes them and stores them in a ``Token.token`` variant. The
+first thing that it has to do is ignore whitespace between tokens. This
+is accomplished with the recursive call above.
+
+The next thing ``Lexer.lex`` needs to do is recognize identifiers and
+specific keywords like "def". Kaleidoscope does this with a pattern
+match and a helper function.
+
+.. code-block:: ocaml
+
+      (* identifier: [a-zA-Z][a-zA-Z0-9] *)
+      | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
+          let buffer = Buffer.create 1 in
+          Buffer.add_char buffer c;
+          lex_ident buffer stream
+
+    ...
+
+    and lex_ident buffer = parser
+      | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
+          Buffer.add_char buffer c;
+          lex_ident buffer stream
+      | [< stream=lex >] ->
+          match Buffer.contents buffer with
+          | "def" -> [< 'Token.Def; stream >]
+          | "extern" -> [< 'Token.Extern; stream >]
+          | id -> [< 'Token.Ident id; stream >]
+
+Numeric values are similar:
+
+.. code-block:: ocaml
+
+      (* number: [0-9.]+ *)
+      | [< ' ('0' .. '9' as c); stream >] ->
+          let buffer = Buffer.create 1 in
+          Buffer.add_char buffer c;
+          lex_number buffer stream
+
+    ...
+
+    and lex_number buffer = parser
+      | [< ' ('0' .. '9' | '.' as c); stream >] ->
+          Buffer.add_char buffer c;
+          lex_number buffer stream
+      | [< stream=lex >] ->
+          [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
+
+This is all pretty straight-forward code for processing input. When
+reading a numeric value from input, we use the ocaml ``float_of_string``
+function to convert it to a numeric value that we store in
+``Token.Number``. Note that this isn't doing sufficient error checking:
+it will raise ``Failure`` if the string "1.23.45.67". Feel free to
+extend it :). Next we handle comments:
+
+.. code-block:: ocaml
+
+      (* Comment until end of line. *)
+      | [< ' ('#'); stream >] ->
+          lex_comment stream
+
+    ...
+
+    and lex_comment = parser
+      | [< ' ('\n'); stream=lex >] -> stream
+      | [< 'c; e=lex_comment >] -> e
+      | [< >] -> [< >]
+
+We handle comments by skipping to the end of the line and then return
+the next token. Finally, if the input doesn't match one of the above
+cases, it is either an operator character like '+' or the end of the
+file. These are handled with this code:
+
+.. code-block:: ocaml
+
+      (* Otherwise, just return the character as its ascii value. *)
+      | [< 'c; stream >] ->
+          [< 'Token.Kwd c; lex stream >]
+
+      (* end of stream. *)
+      | [< >] -> [< >]
+
+With this, we have the complete lexer for the basic Kaleidoscope
+language (the `full code listing <OCamlLangImpl2.html#full-code-listing>`_ for the
+Lexer is available in the `next chapter <OCamlLangImpl2.html>`_ of the
+tutorial). Next we'll `build a simple parser that uses this to build an
+Abstract Syntax Tree <OCamlLangImpl2.html>`_. When we have that, we'll
+include a driver so that you can use the lexer and parser together.
+
+`Next: Implementing a Parser and AST <OCamlLangImpl2.html>`_
+
diff --git a/llvm/docs/tutorial/index.rst b/llvm/docs/tutorial/index.rst
index c4c6d06cb4c7d..5df71639bbe18 100644
--- a/llvm/docs/tutorial/index.rst
+++ b/llvm/docs/tutorial/index.rst
@@ -14,12 +14,15 @@ Kaleidoscope: Implementing a Language with LLVM
   This is the "Kaleidoscope" Language tutorial, showing how to implement a simple
   language using LLVM components in C++.
 
+Kaleidoscope: Implementing a Language with LLVM in Objective Caml
+=================================================================
+
 .. toctree::
    :titlesonly:
    :glob:
    :numbered:
 
-   MyFirstLanguageFrontend/LangImpl*
+   OCamlLangImpl*
 
 Building a JIT in LLVM
 ===============================================

>From 8b1f6985fa7e80a96c275c2bfb80eeb757d95368 Mon Sep 17 00:00:00 2001
From: Shivam Gupta <shivam98.tkg at gmail.com>
Date: Fri, 13 Feb 2026 14:38:59 +0530
Subject: [PATCH 2/3] Add chapter 2

---
 llvm/docs/tutorial/OCamlLangImpl2.rst         | 904 ++++++++++++++++++
 .../OCaml-Kaleidoscope/Chapter2/_tags         |   1 +
 .../OCaml-Kaleidoscope/Chapter2/ast.ml        |  25 +
 .../OCaml-Kaleidoscope/Chapter2/lexer.ml      |  52 +
 .../OCaml-Kaleidoscope/Chapter2/parser.ml     | 122 +++
 .../OCaml-Kaleidoscope/Chapter2/token.ml      |  15 +
 .../OCaml-Kaleidoscope/Chapter2/toplevel.ml   |  34 +
 .../OCaml-Kaleidoscope/Chapter2/toy.ml        |  21 +
 8 files changed, 1174 insertions(+)
 create mode 100644 llvm/docs/tutorial/OCamlLangImpl2.rst
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter2/_tags
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter2/ast.ml
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter2/lexer.ml
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter2/parser.ml
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter2/token.ml
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter2/toplevel.ml
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter2/toy.ml

diff --git a/llvm/docs/tutorial/OCamlLangImpl2.rst b/llvm/docs/tutorial/OCamlLangImpl2.rst
new file mode 100644
index 0000000000000..401fa0dff88e9
--- /dev/null
+++ b/llvm/docs/tutorial/OCamlLangImpl2.rst
@@ -0,0 +1,904 @@
+===========================================
+Kaleidoscope: Implementing a Parser and AST
+===========================================
+
+.. contents::
+   :local:
+
+Chapter 2 Introduction
+======================
+
+Welcome to Chapter 2 of the "`Implementing a language with LLVM in
+Objective Caml <index.html>`_" tutorial. This chapter shows you how to
+use the lexer, built in `Chapter 1 <OCamlLangImpl1.html>`_, to build a
+full `parser <http://en.wikipedia.org/wiki/Parsing>`_ for our
+Kaleidoscope language. Once we have a parser, we'll define and build an
+`Abstract Syntax
+Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
+
+The parser we will build uses a combination of `Recursive Descent
+Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
+`Operator-Precedence
+Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
+parse the Kaleidoscope language (the latter for binary expressions and
+the former for everything else). Before we get to parsing though, lets
+talk about the output of the parser: the Abstract Syntax Tree.
+
+The Abstract Syntax Tree (AST)
+==============================
+
+The AST for a program captures its behavior in such a way that it is
+easy for later stages of the compiler (e.g. code generation) to
+interpret. We basically want one object for each construct in the
+language, and the AST should closely model the language. In
+Kaleidoscope, we have expressions, a prototype, and a function object.
+We'll start with expressions first:
+
+.. code-block:: ocaml
+
+    (* expr - Base type for all expression nodes. *)
+    type expr =
+      (* variant for numeric literals like "1.0". *)
+      | Number of float
+
+The code above shows the definition of the base ExprAST class and one
+subclass which we use for numeric literals. The important thing to note
+about this code is that the Number variant captures the numeric value of
+the literal as an instance variable. This allows later phases of the
+compiler to know what the stored numeric value is.
+
+Right now we only create the AST, so there are no useful functions on
+them. It would be very easy to add a function to pretty print the code,
+for example. Here are the other expression AST node definitions that
+we'll use in the basic form of the Kaleidoscope language:
+
+.. code-block:: ocaml
+
+      (* variant for referencing a variable, like "a". *)
+      | Variable of string
+
+      (* variant for a binary operator. *)
+      | Binary of char * expr * expr
+
+      (* variant for function calls. *)
+      | Call of string * expr array
+
+This is all (intentionally) rather straight-forward: variables capture
+the variable name, binary operators capture their opcode (e.g. '+'), and
+calls capture a function name as well as a list of any argument
+expressions. One thing that is nice about our AST is that it captures
+the language features without talking about the syntax of the language.
+Note that there is no discussion about precedence of binary operators,
+lexical structure, etc.
+
+For our basic language, these are all of the expression nodes we'll
+define. Because it doesn't have conditional control flow, it isn't
+Turing-complete; we'll fix that in a later installment. The two things
+we need next are a way to talk about the interface to a function, and a
+way to talk about functions themselves:
+
+.. code-block:: ocaml
+
+    (* proto - This type represents the "prototype" for a function, which captures
+     * its name, and its argument names (thus implicitly the number of arguments the
+     * function takes). *)
+    type proto = Prototype of string * string array
+
+    (* func - This type represents a function definition itself. *)
+    type func = Function of proto * expr
+
+In Kaleidoscope, functions are typed with just a count of their
+arguments. Since all values are double precision floating point, the
+type of each argument doesn't need to be stored anywhere. In a more
+aggressive and realistic language, the "expr" variants would probably
+have a type field.
+
+With this scaffolding, we can now talk about parsing expressions and
+function bodies in Kaleidoscope.
+
+Parser Basics
+=============
+
+Now that we have an AST to build, we need to define the parser code to
+build it. The idea here is that we want to parse something like "x+y"
+(which is returned as three tokens by the lexer) into an AST that could
+be generated with calls like this:
+
+.. code-block:: ocaml
+
+      let x = Variable "x" in
+      let y = Variable "y" in
+      let result = Binary ('+', x, y) in
+      ...
+
+The error handling routines make use of the builtin ``Stream.Failure``
+and ``Stream.Error``s. ``Stream.Failure`` is raised when the parser is
+unable to find any matching token in the first position of a pattern.
+``Stream.Error`` is raised when the first token matches, but the rest do
+not. The error recovery in our parser will not be the best and is not
+particular user-friendly, but it will be enough for our tutorial. These
+exceptions make it easier to handle errors in routines that have various
+return types.
+
+With these basic types and exceptions, we can implement the first piece
+of our grammar: numeric literals.
+
+Basic Expression Parsing
+========================
+
+We start with numeric literals, because they are the simplest to
+process. For each production in our grammar, we'll define a function
+which parses that production. We call this class of expressions
+"primary" expressions, for reasons that will become more clear `later in
+the tutorial <OCamlLangImpl6.html#user-defined-unary-operators>`_. In order to parse an
+arbitrary primary expression, we need to determine what sort of
+expression it is. For numeric literals, we have:
+
+.. code-block:: ocaml
+
+    (* primary
+     *   ::= identifier
+     *   ::= numberexpr
+     *   ::= parenexpr *)
+    parse_primary = parser
+      (* numberexpr ::= number *)
+      | [< 'Token.Number n >] -> Ast.Number n
+
+This routine is very simple: it expects to be called when the current
+token is a ``Token.Number`` token. It takes the current number value,
+creates a ``Ast.Number`` node, advances the lexer to the next token, and
+finally returns.
+
+There are some interesting aspects to this. The most important one is
+that this routine eats all of the tokens that correspond to the
+production and returns the lexer buffer with the next token (which is
+not part of the grammar production) ready to go. This is a fairly
+standard way to go for recursive descent parsers. For a better example,
+the parenthesis operator is defined like this:
+
+.. code-block:: ocaml
+
+      (* parenexpr ::= '(' expression ')' *)
+      | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
+
+This function illustrates a number of interesting things about the
+parser:
+
+1) It shows how we use the ``Stream.Error`` exception. When called, this
+function expects that the current token is a '(' token, but after
+parsing the subexpression, it is possible that there is no ')' waiting.
+For example, if the user types in "(4 x" instead of "(4)", the parser
+should emit an error. Because errors can occur, the parser needs a way
+to indicate that they happened. In our parser, we use the camlp4
+shortcut syntax ``token ?? "parse error"``, where if the token before
+the ``??`` does not match, then ``Stream.Error "parse error"`` will be
+raised.
+
+2) Another interesting aspect of this function is that it uses recursion
+by calling ``Parser.parse_primary`` (we will soon see that
+``Parser.parse_primary`` can call ``Parser.parse_primary``). This is
+powerful because it allows us to handle recursive grammars, and keeps
+each production very simple. Note that parentheses do not cause
+construction of AST nodes themselves. While we could do it this way, the
+most important role of parentheses are to guide the parser and provide
+grouping. Once the parser constructs the AST, parentheses are not
+needed.
+
+The next simple production is for handling variable references and
+function calls:
+
+.. code-block:: ocaml
+
+      (* identifierexpr
+       *   ::= identifier
+       *   ::= identifier '(' argumentexpr ')' *)
+      | [< 'Token.Ident id; stream >] ->
+          let rec parse_args accumulator = parser
+            | [< e=parse_expr; stream >] ->
+                begin parser
+                  | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
+                  | [< >] -> e :: accumulator
+                end stream
+            | [< >] -> accumulator
+          in
+          let rec parse_ident id = parser
+            (* Call. *)
+            | [< 'Token.Kwd '(';
+                 args=parse_args [];
+                 'Token.Kwd ')' ?? "expected ')'">] ->
+                Ast.Call (id, Array.of_list (List.rev args))
+
+            (* Simple variable ref. *)
+            | [< >] -> Ast.Variable id
+          in
+          parse_ident id stream
+
+This routine follows the same style as the other routines. (It expects
+to be called if the current token is a ``Token.Ident`` token). It also
+has recursion and error handling. One interesting aspect of this is that
+it uses *look-ahead* to determine if the current identifier is a stand
+alone variable reference or if it is a function call expression. It
+handles this by checking to see if the token after the identifier is a
+'(' token, constructing either a ``Ast.Variable`` or ``Ast.Call`` node
+as appropriate.
+
+We finish up by raising an exception if we received a token we didn't
+expect:
+
+.. code-block:: ocaml
+
+      | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
+
+Now that basic expressions are handled, we need to handle binary
+expressions. They are a bit more complex.
+
+Binary Expression Parsing
+=========================
+
+Binary expressions are significantly harder to parse because they are
+often ambiguous. For example, when given the string "x+y\*z", the parser
+can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
+definitions from mathematics, we expect the later parse, because "\*"
+(multiplication) has higher *precedence* than "+" (addition).
+
+There are many ways to handle this, but an elegant and efficient way is
+to use `Operator-Precedence
+Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
+This parsing technique uses the precedence of binary operators to guide
+recursion. To start with, we need a table of precedences:
+
+.. code-block:: ocaml
+
+    (* binop_precedence - This holds the precedence for each binary operator that is
+     * defined *)
+    let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
+
+    (* precedence - Get the precedence of the pending binary operator token. *)
+    let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
+
+    ...
+
+    let main () =
+      (* Install standard binary operators.
+       * 1 is the lowest precedence. *)
+      Hashtbl.add Parser.binop_precedence '<' 10;
+      Hashtbl.add Parser.binop_precedence '+' 20;
+      Hashtbl.add Parser.binop_precedence '-' 20;
+      Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)
+      ...
+
+For the basic form of Kaleidoscope, we will only support 4 binary
+operators (this can obviously be extended by you, our brave and intrepid
+reader). The ``Parser.precedence`` function returns the precedence for
+the current token, or -1 if the token is not a binary operator. Having a
+``Hashtbl.t`` makes it easy to add new operators and makes it clear that
+the algorithm doesn't depend on the specific operators involved, but it
+would be easy enough to eliminate the ``Hashtbl.t`` and do the
+comparisons in the ``Parser.precedence`` function. (Or just use a
+fixed-size array).
+
+With the helper above defined, we can now start parsing binary
+expressions. The basic idea of operator precedence parsing is to break
+down an expression with potentially ambiguous binary operators into
+pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g".
+Operator precedence parsing considers this as a stream of primary
+expressions separated by binary operators. As such, it will first parse
+the leading primary expression "a", then it will see the pairs [+, b]
+[+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
+primary expressions, the binary expression parser doesn't need to worry
+about nested subexpressions like (c+d) at all.
+
+To start, an expression is a primary expression potentially followed by
+a sequence of [binop,primaryexpr] pairs:
+
+.. code-block:: ocaml
+
+    (* expression
+     *   ::= primary binoprhs *)
+    and parse_expr = parser
+      | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
+
+``Parser.parse_bin_rhs`` is the function that parses the sequence of
+pairs for us. It takes a precedence and a pointer to an expression for
+the part that has been parsed so far. Note that "x" is a perfectly valid
+expression: As such, "binoprhs" is allowed to be empty, in which case it
+returns the expression that is passed into it. In our example above, the
+code passes the expression for "a" into ``Parser.parse_bin_rhs`` and the
+current token is "+".
+
+The precedence value passed into ``Parser.parse_bin_rhs`` indicates the
+*minimal operator precedence* that the function is allowed to eat. For
+example, if the current pair stream is [+, x] and
+``Parser.parse_bin_rhs`` is passed in a precedence of 40, it will not
+consume any tokens (because the precedence of '+' is only 20). With this
+in mind, ``Parser.parse_bin_rhs`` starts with:
+
+.. code-block:: ocaml
+
+    (* binoprhs
+     *   ::= ('+' primary)* *)
+    and parse_bin_rhs expr_prec lhs stream =
+      match Stream.peek stream with
+      (* If this is a binop, find its precedence. *)
+      | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
+          let token_prec = precedence c in
+
+          (* If this is a binop that binds at least as tightly as the current binop,
+           * consume it, otherwise we are done. *)
+          if token_prec < expr_prec then lhs else begin
+
+This code gets the precedence of the current token and checks to see if
+if is too low. Because we defined invalid tokens to have a precedence of
+-1, this check implicitly knows that the pair-stream ends when the token
+stream runs out of binary operators. If this check succeeds, we know
+that the token is a binary operator and that it will be included in this
+expression:
+
+.. code-block:: ocaml
+
+            (* Eat the binop. *)
+            Stream.junk stream;
+
+            (* Parse the primary expression after the binary operator *)
+            let rhs = parse_primary stream in
+
+            (* Okay, we know this is a binop. *)
+            let rhs =
+              match Stream.peek stream with
+              | Some (Token.Kwd c2) ->
+
+As such, this code eats (and remembers) the binary operator and then
+parses the primary expression that follows. This builds up the whole
+pair, the first of which is [+, b] for the running example.
+
+Now that we parsed the left-hand side of an expression and one pair of
+the RHS sequence, we have to decide which way the expression associates.
+In particular, we could have "(a+b) binop unparsed" or "a + (b binop
+unparsed)". To determine this, we look ahead at "binop" to determine its
+precedence and compare it to BinOp's precedence (which is '+' in this
+case):
+
+.. code-block:: ocaml
+
+                  (* If BinOp binds less tightly with rhs than the operator after
+                   * rhs, let the pending operator take rhs as its lhs. *)
+                  let next_prec = precedence c2 in
+                  if token_prec < next_prec
+
+If the precedence of the binop to the right of "RHS" is lower or equal
+to the precedence of our current operator, then we know that the
+parentheses associate as "(a+b) binop ...". In our example, the current
+operator is "+" and the next operator is "+", we know that they have the
+same precedence. In this case we'll create the AST node for "a+b", and
+then continue parsing:
+
+.. code-block:: ocaml
+
+              ... if body omitted ...
+            in
+
+            (* Merge lhs/rhs. *)
+            let lhs = Ast.Binary (c, lhs, rhs) in
+            parse_bin_rhs expr_prec lhs stream
+          end
+
+In our example above, this will turn "a+b+" into "(a+b)" and execute the
+next iteration of the loop, with "+" as the current token. The code
+above will eat, remember, and parse "(c+d)" as the primary expression,
+which makes the current pair equal to [+, (c+d)]. It will then evaluate
+the 'if' conditional above with "\*" as the binop to the right of the
+primary. In this case, the precedence of "\*" is higher than the
+precedence of "+" so the if condition will be entered.
+
+The critical question left here is "how can the if condition parse the
+right hand side in full"? In particular, to build the AST correctly for
+our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
+variable. The code to do this is surprisingly simple (code from the
+above two blocks duplicated for context):
+
+.. code-block:: ocaml
+
+              match Stream.peek stream with
+              | Some (Token.Kwd c2) ->
+                  (* If BinOp binds less tightly with rhs than the operator after
+                   * rhs, let the pending operator take rhs as its lhs. *)
+                  if token_prec < precedence c2
+                  then parse_bin_rhs (token_prec + 1) rhs stream
+                  else rhs
+              | _ -> rhs
+            in
+
+            (* Merge lhs/rhs. *)
+            let lhs = Ast.Binary (c, lhs, rhs) in
+            parse_bin_rhs expr_prec lhs stream
+          end
+
+At this point, we know that the binary operator to the RHS of our
+primary has higher precedence than the binop we are currently parsing.
+As such, we know that any sequence of pairs whose operators are all
+higher precedence than "+" should be parsed together and returned as
+"RHS". To do this, we recursively invoke the ``Parser.parse_bin_rhs``
+function specifying "token\_prec+1" as the minimum precedence required
+for it to continue. In our example above, this will cause it to return
+the AST node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of
+the '+' expression.
+
+Finally, on the next iteration of the while loop, the "+g" piece is
+parsed and added to the AST. With this little bit of code (14
+non-trivial lines), we correctly handle fully general binary expression
+parsing in a very elegant way. This was a whirlwind tour of this code,
+and it is somewhat subtle. I recommend running through it with a few
+tough examples to see how it works.
+
+This wraps up handling of expressions. At this point, we can point the
+parser at an arbitrary token stream and build an expression from it,
+stopping at the first token that is not part of the expression. Next up
+we need to handle function definitions, etc.
+
+Parsing the Rest
+================
+
+The next thing missing is handling of function prototypes. In
+Kaleidoscope, these are used both for 'extern' function declarations as
+well as function body definitions. The code to do this is
+straight-forward and not very interesting (once you've survived
+expressions):
+
+.. code-block:: ocaml
+
+    (* prototype
+     *   ::= id '(' id* ')' *)
+    let parse_prototype =
+      let rec parse_args accumulator = parser
+        | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
+        | [< >] -> accumulator
+      in
+
+      parser
+      | [< 'Token.Ident id;
+           'Token.Kwd '(' ?? "expected '(' in prototype";
+           args=parse_args [];
+           'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
+          (* success. *)
+          Ast.Prototype (id, Array.of_list (List.rev args))
+
+      | [< >] ->
+          raise (Stream.Error "expected function name in prototype")
+
+Given this, a function definition is very simple, just a prototype plus
+an expression to implement the body:
+
+.. code-block:: ocaml
+
+    (* definition ::= 'def' prototype expression *)
+    let parse_definition = parser
+      | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
+          Ast.Function (p, e)
+
+In addition, we support 'extern' to declare functions like 'sin' and
+'cos' as well as to support forward declaration of user functions. These
+'extern's are just prototypes with no body:
+
+.. code-block:: ocaml
+
+    (*  external ::= 'extern' prototype *)
+    let parse_extern = parser
+      | [< 'Token.Extern; e=parse_prototype >] -> e
+
+Finally, we'll also let the user type in arbitrary top-level expressions
+and evaluate them on the fly. We will handle this by defining anonymous
+nullary (zero argument) functions for them:
+
+.. code-block:: ocaml
+
+    (* toplevelexpr ::= expression *)
+    let parse_toplevel = parser
+      | [< e=parse_expr >] ->
+          (* Make an anonymous proto. *)
+          Ast.Function (Ast.Prototype ("", [||]), e)
+
+Now that we have all the pieces, let's build a little driver that will
+let us actually *execute* this code we've built!
+
+The Driver
+==========
+
+The driver for this simply invokes all of the parsing pieces with a
+top-level dispatch loop. There isn't much interesting here, so I'll just
+include the top-level loop. See `below <#full-code-listing>`_ for full code in the
+"Top-Level Parsing" section.
+
+.. code-block:: ocaml
+
+    (* top ::= definition | external | expression | ';' *)
+    let rec main_loop stream =
+      match Stream.peek stream with
+      | None -> ()
+
+      (* ignore top-level semicolons. *)
+      | Some (Token.Kwd ';') ->
+          Stream.junk stream;
+          main_loop stream
+
+      | Some token ->
+          begin
+            try match token with
+            | Token.Def ->
+                ignore(Parser.parse_definition stream);
+                print_endline "parsed a function definition.";
+            | Token.Extern ->
+                ignore(Parser.parse_extern stream);
+                print_endline "parsed an extern.";
+            | _ ->
+                (* Evaluate a top-level expression into an anonymous function. *)
+                ignore(Parser.parse_toplevel stream);
+                print_endline "parsed a top-level expr";
+            with Stream.Error s ->
+              (* Skip token for error recovery. *)
+              Stream.junk stream;
+              print_endline s;
+          end;
+          print_string "ready> "; flush stdout;
+          main_loop stream
+
+The most interesting part of this is that we ignore top-level
+semicolons. Why is this, you ask? The basic reason is that if you type
+"4 + 5" at the command line, the parser doesn't know whether that is the
+end of what you will type or not. For example, on the next line you
+could type "def foo..." in which case 4+5 is the end of a top-level
+expression. Alternatively you could type "\* 6", which would continue
+the expression. Having top-level semicolons allows you to type "4+5;",
+and the parser will know you are done.
+
+Conclusions
+===========
+
+With just under 300 lines of commented code (240 lines of non-comment,
+non-blank code), we fully defined our minimal language, including a
+lexer, parser, and AST builder. With this done, the executable will
+validate Kaleidoscope code and tell us if it is grammatically invalid.
+For example, here is a sample interaction:
+
+.. code-block:: bash
+
+    $ ./toy.byte
+    ready> def foo(x y) x+foo(y, 4.0);
+    Parsed a function definition.
+    ready> def foo(x y) x+y y;
+    Parsed a function definition.
+    Parsed a top-level expr
+    ready> def foo(x y) x+y );
+    Parsed a function definition.
+    Error: unknown token when expecting an expression
+    ready> extern sin(a);
+    ready> Parsed an extern
+    ready> ^D
+    $
+
+There is a lot of room for extension here. You can define new AST nodes,
+extend the language in many ways, etc. In the `next
+installment <OCamlLangImpl3.html>`_, we will describe how to generate
+LLVM Intermediate Representation (IR) from the AST.
+
+Full Code Listing
+=================
+
+Here is the complete code listing for this and the previous chapter.
+Note that it is fully self-contained: you don't need LLVM or any
+external libraries at all for this. (Besides the ocaml standard
+libraries, of course.) To build this, just compile with:
+
+.. code-block:: bash
+
+    # Compile
+    ocamlbuild -use-ocamlfind -pkg camlp-streams toy.byte
+    # Run
+    ./toy.byte
+
+.. important::
+
+    ``lexer.ml`` and ``parser.ml`` were written using Camlp4 syntax which is deprecated in newer OCaml installations.
+    You must downgrade OCaml package and install ``camlp4`` with ``opam``.  
+
+Here is the code:
+
+\_tags:
+    ::
+
+        <{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
+
+token.ml:
+    .. code-block:: ocaml
+
+        (*===----------------------------------------------------------------------===
+         * Lexer Tokens
+         *===----------------------------------------------------------------------===*)
+
+        (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
+         * these others for known things. *)
+        type token =
+          (* commands *)
+          | Def | Extern
+
+          (* primary *)
+          | Ident of string | Number of float
+
+          (* unknown *)
+          | Kwd of char
+
+lexer.ml:
+    .. code-block:: ocaml
+
+        (*===----------------------------------------------------------------------===
+         * Lexer
+         *===----------------------------------------------------------------------===*)
+
+        let rec lex = parser
+          (* Skip any whitespace. *)
+          | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
+
+          (* identifier: [a-zA-Z][a-zA-Z0-9] *)
+          | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
+              let buffer = Buffer.create 1 in
+              Buffer.add_char buffer c;
+              lex_ident buffer stream
+
+          (* number: [0-9.]+ *)
+          | [< ' ('0' .. '9' as c); stream >] ->
+              let buffer = Buffer.create 1 in
+              Buffer.add_char buffer c;
+              lex_number buffer stream
+
+          (* Comment until end of line. *)
+          | [< ' ('#'); stream >] ->
+              lex_comment stream
+
+          (* Otherwise, just return the character as its ascii value. *)
+          | [< 'c; stream >] ->
+              [< 'Token.Kwd c; lex stream >]
+
+          (* end of stream. *)
+          | [< >] -> [< >]
+
+        and lex_number buffer = parser
+          | [< ' ('0' .. '9' | '.' as c); stream >] ->
+              Buffer.add_char buffer c;
+              lex_number buffer stream
+          | [< stream=lex >] ->
+              [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
+
+        and lex_ident buffer = parser
+          | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
+              Buffer.add_char buffer c;
+              lex_ident buffer stream
+          | [< stream=lex >] ->
+              match Buffer.contents buffer with
+              | "def" -> [< 'Token.Def; stream >]
+              | "extern" -> [< 'Token.Extern; stream >]
+              | id -> [< 'Token.Ident id; stream >]
+
+        and lex_comment = parser
+          | [< ' ('\n'); stream=lex >] -> stream
+          | [< 'c; e=lex_comment >] -> e
+          | [< >] -> [< >]
+
+ast.ml:
+    .. code-block:: ocaml
+
+        (*===----------------------------------------------------------------------===
+         * Abstract Syntax Tree (aka Parse Tree)
+         *===----------------------------------------------------------------------===*)
+
+        (* expr - Base type for all expression nodes. *)
+        type expr =
+          (* variant for numeric literals like "1.0". *)
+          | Number of float
+
+          (* variant for referencing a variable, like "a". *)
+          | Variable of string
+
+          (* variant for a binary operator. *)
+          | Binary of char * expr * expr
+
+          (* variant for function calls. *)
+          | Call of string * expr array
+
+        (* proto - This type represents the "prototype" for a function, which captures
+         * its name, and its argument names (thus implicitly the number of arguments the
+         * function takes). *)
+        type proto = Prototype of string * string array
+
+        (* func - This type represents a function definition itself. *)
+        type func = Function of proto * expr
+
+parser.ml:
+    .. code-block:: ocaml
+
+        (*===---------------------------------------------------------------------===
+         * Parser
+         *===---------------------------------------------------------------------===*)
+
+        (* binop_precedence - This holds the precedence for each binary operator that is
+         * defined *)
+        let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
+
+        (* precedence - Get the precedence of the pending binary operator token. *)
+        let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
+
+        (* primary
+         *   ::= identifier
+         *   ::= numberexpr
+         *   ::= parenexpr *)
+        let rec parse_primary = parser
+          (* numberexpr ::= number *)
+          | [< 'Token.Number n >] -> Ast.Number n
+
+          (* parenexpr ::= '(' expression ')' *)
+          | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
+
+          (* identifierexpr
+           *   ::= identifier
+           *   ::= identifier '(' argumentexpr ')' *)
+          | [< 'Token.Ident id; stream >] ->
+              let rec parse_args accumulator = parser
+                | [< e=parse_expr; stream >] ->
+                    begin parser
+                      | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
+                      | [< >] -> e :: accumulator
+                    end stream
+                | [< >] -> accumulator
+              in
+              let rec parse_ident id = parser
+                (* Call. *)
+                | [< 'Token.Kwd '(';
+                     args=parse_args [];
+                     'Token.Kwd ')' ?? "expected ')'">] ->
+                    Ast.Call (id, Array.of_list (List.rev args))
+
+                (* Simple variable ref. *)
+                | [< >] -> Ast.Variable id
+              in
+              parse_ident id stream
+
+          | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
+
+        (* binoprhs
+         *   ::= ('+' primary)* *)
+        and parse_bin_rhs expr_prec lhs stream =
+          match Stream.peek stream with
+          (* If this is a binop, find its precedence. *)
+          | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
+              let token_prec = precedence c in
+
+              (* If this is a binop that binds at least as tightly as the current binop,
+               * consume it, otherwise we are done. *)
+              if token_prec < expr_prec then lhs else begin
+                (* Eat the binop. *)
+                Stream.junk stream;
+
+                (* Parse the primary expression after the binary operator. *)
+                let rhs = parse_primary stream in
+
+                (* Okay, we know this is a binop. *)
+                let rhs =
+                  match Stream.peek stream with
+                  | Some (Token.Kwd c2) ->
+                      (* If BinOp binds less tightly with rhs than the operator after
+                       * rhs, let the pending operator take rhs as its lhs. *)
+                      let next_prec = precedence c2 in
+                      if token_prec < next_prec
+                      then parse_bin_rhs (token_prec + 1) rhs stream
+                      else rhs
+                  | _ -> rhs
+                in
+
+                (* Merge lhs/rhs. *)
+                let lhs = Ast.Binary (c, lhs, rhs) in
+                parse_bin_rhs expr_prec lhs stream
+              end
+          | _ -> lhs
+
+        (* expression
+         *   ::= primary binoprhs *)
+        and parse_expr = parser
+          | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
+
+        (* prototype
+         *   ::= id '(' id* ')' *)
+        let parse_prototype =
+          let rec parse_args accumulator = parser
+            | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
+            | [< >] -> accumulator
+          in
+
+          parser
+          | [< 'Token.Ident id;
+               'Token.Kwd '(' ?? "expected '(' in prototype";
+               args=parse_args [];
+               'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
+              (* success. *)
+              Ast.Prototype (id, Array.of_list (List.rev args))
+
+          | [< >] ->
+              raise (Stream.Error "expected function name in prototype")
+
+        (* definition ::= 'def' prototype expression *)
+        let parse_definition = parser
+          | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
+              Ast.Function (p, e)
+
+        (* toplevelexpr ::= expression *)
+        let parse_toplevel = parser
+          | [< e=parse_expr >] ->
+              (* Make an anonymous proto. *)
+              Ast.Function (Ast.Prototype ("", [||]), e)
+
+        (*  external ::= 'extern' prototype *)
+        let parse_extern = parser
+          | [< 'Token.Extern; e=parse_prototype >] -> e
+
+toplevel.ml:
+    .. code-block:: ocaml
+
+        (*===----------------------------------------------------------------------===
+         * Top-Level parsing and JIT Driver
+         *===----------------------------------------------------------------------===*)
+
+        (* top ::= definition | external | expression | ';' *)
+        let rec main_loop stream =
+          match Stream.peek stream with
+          | None -> ()
+
+          (* ignore top-level semicolons. *)
+          | Some (Token.Kwd ';') ->
+              Stream.junk stream;
+              main_loop stream
+
+          | Some token ->
+              begin
+                try match token with
+                | Token.Def ->
+                    ignore(Parser.parse_definition stream);
+                    print_endline "parsed a function definition.";
+                | Token.Extern ->
+                    ignore(Parser.parse_extern stream);
+                    print_endline "parsed an extern.";
+                | _ ->
+                    (* Evaluate a top-level expression into an anonymous function. *)
+                    ignore(Parser.parse_toplevel stream);
+                    print_endline "parsed a top-level expr";
+                with Stream.Error s ->
+                  (* Skip token for error recovery. *)
+                  Stream.junk stream;
+                  print_endline s;
+              end;
+              print_string "ready> "; flush stdout;
+              main_loop stream
+
+toy.ml:
+    .. code-block:: ocaml
+
+        (*===----------------------------------------------------------------------===
+         * Main driver code.
+         *===----------------------------------------------------------------------===*)
+
+        let main () =
+          (* Install standard binary operators.
+           * 1 is the lowest precedence. *)
+          Hashtbl.add Parser.binop_precedence '<' 10;
+          Hashtbl.add Parser.binop_precedence '+' 20;
+          Hashtbl.add Parser.binop_precedence '-' 20;
+          Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)
+
+          (* Prime the first token. *)
+          print_string "ready> "; flush stdout;
+          let stream = Lexer.lex (Stream.of_channel stdin) in
+
+          (* Run the main "interpreter loop" now. *)
+          Toplevel.main_loop stream;
+        ;;
+
+        main ()
+
+`Next: Implementing Code Generation to LLVM IR <OCamlLangImpl3.html>`_
+
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/_tags b/llvm/examples/OCaml-Kaleidoscope/Chapter2/_tags
new file mode 100644
index 0000000000000..7b9b80b1e949c
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter2/_tags
@@ -0,0 +1 @@
+<{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/ast.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter2/ast.ml
new file mode 100644
index 0000000000000..4cc2dea86b78b
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter2/ast.ml
@@ -0,0 +1,25 @@
+(*===----------------------------------------------------------------------===
+ * Abstract Syntax Tree (aka Parse Tree)
+ *===----------------------------------------------------------------------===*)
+
+(* expr - Base type for all expression nodes. *)
+type expr =
+  (* variant for numeric literals like "1.0". *)
+  | Number of float
+
+  (* variant for referencing a variable, like "a". *)
+  | Variable of string
+
+  (* variant for a binary operator. *)
+  | Binary of char * expr * expr
+
+  (* variant for function calls. *)
+  | Call of string * expr array
+
+(* proto - This type represents the "prototype" for a function, which captures
+ * its name, and its argument names (thus implicitly the number of arguments the
+ * function takes). *)
+type proto = Prototype of string * string array
+
+(* func - This type represents a function definition itself. *)
+type func = Function of proto * expr
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/lexer.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter2/lexer.ml
new file mode 100644
index 0000000000000..22a915552f035
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter2/lexer.ml
@@ -0,0 +1,52 @@
+(*===----------------------------------------------------------------------===
+ * Lexer
+ *===----------------------------------------------------------------------===*)
+
+let rec lex = parser
+  (* Skip any whitespace. *)
+  | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
+
+  (* identifier: [a-zA-Z][a-zA-Z0-9] *)
+  | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
+      let buffer = Buffer.create 1 in
+      Buffer.add_char buffer c;
+      lex_ident buffer stream
+
+  (* number: [0-9.]+ *)
+  | [< ' ('0' .. '9' as c); stream >] ->
+      let buffer = Buffer.create 1 in
+      Buffer.add_char buffer c;
+      lex_number buffer stream
+
+  (* Comment until end of line. *)
+  | [< ' ('#'); stream >] ->
+      lex_comment stream
+
+  (* Otherwise, just return the character as its ascii value. *)
+  | [< 'c; stream >] ->
+      [< 'Token.Kwd c; lex stream >]
+
+  (* end of stream. *)
+  | [< >] -> [< >]
+
+and lex_number buffer = parser
+  | [< ' ('0' .. '9' | '.' as c); stream >] ->
+      Buffer.add_char buffer c;
+      lex_number buffer stream
+  | [< stream=lex >] ->
+      [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
+
+and lex_ident buffer = parser
+  | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
+      Buffer.add_char buffer c;
+      lex_ident buffer stream
+  | [< stream=lex >] ->
+      match Buffer.contents buffer with
+      | "def" -> [< 'Token.Def; stream >]
+      | "extern" -> [< 'Token.Extern; stream >]
+      | id -> [< 'Token.Ident id; stream >]
+
+and lex_comment = parser
+  | [< ' ('\n'); stream=lex >] -> stream
+  | [< 'c; e=lex_comment >] -> e
+  | [< >] -> [< >]
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/parser.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter2/parser.ml
new file mode 100644
index 0000000000000..83d9874a4ab6e
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter2/parser.ml
@@ -0,0 +1,122 @@
+(*===---------------------------------------------------------------------===
+ * Parser
+ *===---------------------------------------------------------------------===*)
+
+(* binop_precedence - This holds the precedence for each binary operator that is
+ * defined *)
+let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
+
+(* precedence - Get the precedence of the pending binary operator token. *)
+let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
+
+(* primary
+ *   ::= identifier
+ *   ::= numberexpr
+ *   ::= parenexpr *)
+let rec parse_primary = parser
+  (* numberexpr ::= number *)
+  | [< 'Token.Number n >] -> Ast.Number n
+
+  (* parenexpr ::= '(' expression ')' *)
+  | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
+
+  (* identifierexpr
+   *   ::= identifier
+   *   ::= identifier '(' argumentexpr ')' *)
+  | [< 'Token.Ident id; stream >] ->
+      let rec parse_args accumulator = parser
+        | [< e=parse_expr; stream >] ->
+            begin parser
+              | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
+              | [< >] -> e :: accumulator
+            end stream
+        | [< >] -> accumulator
+      in
+      let rec parse_ident id = parser
+        (* Call. *)
+        | [< 'Token.Kwd '(';
+             args=parse_args [];
+             'Token.Kwd ')' ?? "expected ')'">] ->
+            Ast.Call (id, Array.of_list (List.rev args))
+
+        (* Simple variable ref. *)
+        | [< >] -> Ast.Variable id
+      in
+      parse_ident id stream
+
+  | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
+
+(* binoprhs
+ *   ::= ('+' primary)* *)
+and parse_bin_rhs expr_prec lhs stream =
+  match Stream.peek stream with
+  (* If this is a binop, find its precedence. *)
+  | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
+      let token_prec = precedence c in
+
+      (* If this is a binop that binds at least as tightly as the current binop,
+       * consume it, otherwise we are done. *)
+      if token_prec < expr_prec then lhs else begin
+        (* Eat the binop. *)
+        Stream.junk stream;
+
+        (* Parse the primary expression after the binary operator. *)
+        let rhs = parse_primary stream in
+
+        (* Okay, we know this is a binop. *)
+        let rhs =
+          match Stream.peek stream with
+          | Some (Token.Kwd c2) ->
+              (* If BinOp binds less tightly with rhs than the operator after
+               * rhs, let the pending operator take rhs as its lhs. *)
+              let next_prec = precedence c2 in
+              if token_prec < next_prec
+              then parse_bin_rhs (token_prec + 1) rhs stream
+              else rhs
+          | _ -> rhs
+        in
+
+        (* Merge lhs/rhs. *)
+        let lhs = Ast.Binary (c, lhs, rhs) in
+        parse_bin_rhs expr_prec lhs stream
+      end
+  | _ -> lhs
+
+(* expression
+ *   ::= primary binoprhs *)
+and parse_expr = parser
+  | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
+
+(* prototype
+ *   ::= id '(' id* ')' *)
+let parse_prototype =
+  let rec parse_args accumulator = parser
+    | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
+    | [< >] -> accumulator
+  in
+
+  parser
+  | [< 'Token.Ident id;
+       'Token.Kwd '(' ?? "expected '(' in prototype";
+       args=parse_args [];
+       'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
+      (* success. *)
+      Ast.Prototype (id, Array.of_list (List.rev args))
+
+  | [< >] ->
+      raise (Stream.Error "expected function name in prototype")
+
+(* definition ::= 'def' prototype expression *)
+let parse_definition = parser
+  | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
+      Ast.Function (p, e)
+
+(* toplevelexpr ::= expression *)
+let parse_toplevel = parser
+  | [< e=parse_expr >] ->
+      (* Make an anonymous proto. *)
+      Ast.Function (Ast.Prototype ("", [||]), e)
+
+(*  external ::= 'extern' prototype *)
+let parse_extern = parser
+  | [< 'Token.Extern; e=parse_prototype >] -> e
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/token.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter2/token.ml
new file mode 100644
index 0000000000000..2ca782e149971
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter2/token.ml
@@ -0,0 +1,15 @@
+(*===----------------------------------------------------------------------===
+ * Lexer Tokens
+ *===----------------------------------------------------------------------===*)
+
+(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
+ * these others for known things. *)
+type token =
+  (* commands *)
+  | Def | Extern
+
+  (* primary *)
+  | Ident of string | Number of float
+
+  (* unknown *)
+  | Kwd of char
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/toplevel.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter2/toplevel.ml
new file mode 100644
index 0000000000000..01c85bd70d292
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter2/toplevel.ml
@@ -0,0 +1,34 @@
+(*===----------------------------------------------------------------------===
+ * Top-Level parsing and JIT Driver
+ *===----------------------------------------------------------------------===*)
+
+(* top ::= definition | external | expression | ';' *)
+let rec main_loop stream =
+  match Stream.peek stream with
+  | None -> ()
+
+  (* ignore top-level semicolons. *)
+  | Some (Token.Kwd ';') ->
+      Stream.junk stream;
+      main_loop stream
+
+  | Some token ->
+      begin
+        try match token with
+        | Token.Def ->
+            ignore(Parser.parse_definition stream);
+            print_endline "parsed a function definition.";
+        | Token.Extern ->
+            ignore(Parser.parse_extern stream);
+            print_endline "parsed an extern.";
+        | _ ->
+            (* Evaluate a top-level expression into an anonymous function. *)
+            ignore(Parser.parse_toplevel stream);
+            print_endline "parsed a top-level expr";
+        with Stream.Error s ->
+          (* Skip token for error recovery. *)
+          Stream.junk stream;
+          print_endline s;
+      end;
+      print_string "ready> "; flush stdout;
+      main_loop stream
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter2/toy.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter2/toy.ml
new file mode 100644
index 0000000000000..42b19fec00191
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter2/toy.ml
@@ -0,0 +1,21 @@
+(*===----------------------------------------------------------------------===
+ * Main driver code.
+ *===----------------------------------------------------------------------===*)
+
+let main () =
+  (* Install standard binary operators.
+   * 1 is the lowest precedence. *)
+  Hashtbl.add Parser.binop_precedence '<' 10;
+  Hashtbl.add Parser.binop_precedence '+' 20;
+  Hashtbl.add Parser.binop_precedence '-' 20;
+  Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)
+
+  (* Prime the first token. *)
+  print_string "ready> "; flush stdout;
+  let stream = Lexer.lex (Stream.of_channel stdin) in
+
+  (* Run the main "interpreter loop" now. *)
+  Toplevel.main_loop stream;
+;;
+
+main ()

>From d1644c0adbc45613262222fee43f89f079c0c8bf Mon Sep 17 00:00:00 2001
From: Shivam Gupta <shivam98.tkg at gmail.com>
Date: Fri, 13 Feb 2026 17:22:05 +0530
Subject: [PATCH 3/3] Add chapter 3

---
 llvm/docs/tutorial/OCamlLangImpl3.rst         | 1039 +++++++++++++++++
 .../OCaml-Kaleidoscope/Chapter3/_tags         |    2 +
 .../OCaml-Kaleidoscope/Chapter3/ast.ml        |   25 +
 .../OCaml-Kaleidoscope/Chapter3/codegen.ml    |  137 +++
 .../OCaml-Kaleidoscope/Chapter3/lexer.ml      |   52 +
 .../Chapter3/myocamlbuild.ml                  |    6 +
 .../OCaml-Kaleidoscope/Chapter3/parser.ml     |  122 ++
 .../OCaml-Kaleidoscope/Chapter3/token.ml      |   15 +
 .../OCaml-Kaleidoscope/Chapter3/toplevel.ml   |   39 +
 .../OCaml-Kaleidoscope/Chapter3/toy.ml        |   26 +
 10 files changed, 1463 insertions(+)
 create mode 100644 llvm/docs/tutorial/OCamlLangImpl3.rst
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter3/_tags
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter3/ast.ml
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter3/codegen.ml
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter3/lexer.ml
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter3/myocamlbuild.ml
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter3/parser.ml
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter3/token.ml
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter3/toplevel.ml
 create mode 100644 llvm/examples/OCaml-Kaleidoscope/Chapter3/toy.ml

diff --git a/llvm/docs/tutorial/OCamlLangImpl3.rst b/llvm/docs/tutorial/OCamlLangImpl3.rst
new file mode 100644
index 0000000000000..99aa7b0314563
--- /dev/null
+++ b/llvm/docs/tutorial/OCamlLangImpl3.rst
@@ -0,0 +1,1039 @@
+========================================
+Kaleidoscope: Code generation to LLVM IR
+========================================
+
+.. contents::
+   :local:
+
+Chapter 3 Introduction
+======================
+
+Welcome to Chapter 3 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. This chapter shows you how to transform
+the `Abstract Syntax Tree <OCamlLangImpl2.html>`_, built in Chapter 2,
+into LLVM IR. This will teach you a little bit about how LLVM does
+things, as well as demonstrate how easy it is to use. It's much more
+work to build a lexer and parser than it is to generate LLVM IR code. :)
+
+Code Generation Setup
+=====================
+
+In order to generate LLVM IR, we want some simple setup to get started.
+First we define virtual code generation (codegen) methods in each AST
+class:
+
+.. code-block:: ocaml
+
+    let rec codegen_expr = function
+      | Ast.Number n -> ...
+      | Ast.Variable name -> ...
+
+The ``Codegen.codegen_expr`` function says to emit IR for that AST node
+along with all the things it depends on, and they all return an LLVM
+Value object. "Value" is the class used to represent a "`Static Single
+Assignment
+(SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
+register" or "SSA value" in LLVM. The most distinct aspect of SSA values
+is that their value is computed as the related instruction executes, and
+it does not get a new value until (and if) the instruction re-executes.
+In other words, there is no way to "change" an SSA value. For more
+information, please read up on `Static Single
+Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
+- the concepts are really quite natural once you grok them.
+
+The second thing we want is an "Error" exception like we used for the
+parser, which will be used to report errors found during code generation
+(for example, use of an undeclared parameter):
+
+.. code-block:: ocaml
+
+    exception Error of string
+
+    let context = global_context ()
+    let the_module = create_module context "my cool jit"
+    let builder = builder context
+    let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
+    let function_types : (string, lltype) Hashtbl.t = Hashtbl.create 10
+    let double_type = double_type context
+
+The static variables will be used during code generation.
+``Codegen.the_module`` is the LLVM construct that contains all of the
+functions and global variables in a chunk of code. In many ways, it is
+the top-level structure that the LLVM IR uses to contain code.
+
+The ``Codegen.builder`` object is a helper object that makes it easy to
+generate LLVM instructions. Instances of the
+`IRBuilder <https://llvm.org/doxygen/IRBuilder_8h-source.html>`_
+class keep track of the current place to insert instructions and has
+methods to create new instructions.
+
+The ``Codegen.named_values`` map keeps track of which values are defined
+in the current scope and what their LLVM representation is. (In other
+words, it is a symbol table for the code). In this form of Kaleidoscope,
+the only things that can be referenced are function parameters. As such,
+function parameters will be in this map when generating code for their
+function body.
+
+The ``Codegeb.function_types`` map keeps track of functions types in
+separate table instead of recovering them from LLVM values. This is
+needed becuase newer versions of LLVM use opaque pointers, which no
+longer preserve the function signature information. Therefore, we explicitly
+track function types.
+
+With these basics in place, we can start talking about how to generate
+code for each expression. Note that this assumes that the
+``Codegen.builder`` has been set up to generate code *into* something.
+For now, we'll assume that this has already been done, and we'll just
+use it to emit code.
+
+Expression Code Generation
+==========================
+
+Generating LLVM code for expression nodes is very straightforward: less
+than 30 lines of commented code for all four of our expression nodes.
+First we'll do numeric literals:
+
+.. code-block:: ocaml
+
+      | Ast.Number n -> const_float double_type n
+
+In the LLVM IR, numeric constants are represented with the
+``ConstantFP`` class, which holds the numeric value in an ``APFloat``
+internally (``APFloat`` has the capability of holding floating point
+constants of Arbitrary Precision). This code basically just creates
+and returns a ``ConstantFP``. Note that in the LLVM IR that constants
+are all uniqued together and shared. For this reason, the API uses "the
+foo::get(..)" idiom instead of "new foo(..)" or "foo::Create(..)".
+
+.. code-block:: ocaml
+
+      | Ast.Variable name ->
+          (try Hashtbl.find named_values name with
+            | Not_found -> raise (Error "unknown variable name"))
+
+References to variables are also quite simple using LLVM. In the simple
+version of Kaleidoscope, we assume that the variable has already been
+emitted somewhere and its value is available. In practice, the only
+values that can be in the ``Codegen.named_values`` map are function
+arguments. This code simply checks to see that the specified name is in
+the map (if not, an unknown variable is being referenced) and returns
+the value for it. In future chapters, we'll add support for `loop
+induction variables <LangImpl5.html#for-loop-expression>`_ in the symbol table, and for
+`local variables <LangImpl7.html#user-defined-local-variables>`_.
+
+.. code-block:: ocaml
+
+      | Ast.Binary (op, lhs, rhs) ->
+          let lhs_val = codegen_expr lhs in
+          let rhs_val = codegen_expr rhs in
+          begin
+            match op with
+            | '+' -> build_fadd lhs_val rhs_val "addtmp" builder
+            | '-' -> build_fsub lhs_val rhs_val "subtmp" builder
+            | '*' -> build_fmul lhs_val rhs_val "multmp" builder
+            | '<' ->
+                (* Convert bool 0/1 to double 0.0 or 1.0 *)
+                let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
+                build_uitofp i double_type "booltmp" builder
+            | _ -> raise (Error "invalid binary operator")
+          end
+
+Binary operators start to get more interesting. The basic idea here is
+that we recursively emit code for the left-hand side of the expression,
+then the right-hand side, then we compute the result of the binary
+expression. In this code, we do a simple switch on the opcode to create
+the right LLVM instruction.
+
+In the example above, the LLVM builder class is starting to show its
+value. IRBuilder knows where to insert the newly created instruction,
+all you have to do is specify what instruction to create (e.g. with
+``Llvm.create_fadd``), which operands to use (``lhs`` and ``rhs`` here)
+and optionally provide a name for the generated instruction.
+
+One nice thing about LLVM is that the name is just a hint. For instance,
+if the code above emits multiple "addtmp" variables, LLVM will
+automatically provide each one with an increasing, unique numeric
+suffix. Local value names for instructions are purely optional, but it
+makes it much easier to read the IR dumps.
+
+`LLVM instructions <../LangRef.html#instruction-reference>`_ are constrained by strict
+rules: for example, the Left and Right operators of an `add
+instruction <../LangRef.html#add-instruction>`_ must have the same type, and the
+result type of the add must match the operand types. Because all values
+in Kaleidoscope are doubles, this makes for very simple code for add,
+sub and mul.
+
+On the other hand, LLVM specifies that the `fcmp
+instruction <../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a
+one bit integer). The problem with this is that Kaleidoscope wants the
+value to be a 0.0 or 1.0 value. In order to get these semantics, we
+combine the fcmp instruction with a `uitofp
+instruction <../LangRef.html#uitofp-to-instruction>`_. This instruction converts its
+input integer into a floating point value by treating the input as an
+unsigned value. In contrast, if we used the `sitofp
+instruction <../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator
+would return 0.0 and -1.0, depending on the input value.
+
+.. code-block:: ocaml
+
+
+      | Ast.Call (callee, args) ->
+          (* Look up the name in the module table. *)
+          let callee =
+            match lookup_function callee the_module with
+            | Some callee -> callee
+            | None -> raise (Error "unknown function referenced")
+          in
+          let params = params callee in
+
+          (* If argument mismatch error. *)
+          if Array.length params == Array.length args then () else
+            raise (Error "incorrect # arguments passed");
+
+          let fnty =
+            try Hashtbl.find function_types callee_name
+            with Not_found ->
+              raise (Error "unknown function type")
+          in
+
+          let args = Array.map codegen_expr args in
+          build_call fnty callee args "calltmp" builder
+
+Code generation for function calls is quite straightforward with LLVM.
+The code above initially does a function name lookup in the LLVM
+Module's symbol table. Recall that the LLVM Module is the container that
+holds all of the functions we are JIT'ing. By giving each function the
+same name as what the user specifies, we can use the LLVM symbol table
+to resolve function names for us.
+
+Once we have the function to call, we recursively codegen each argument
+that is to be passed in, and create an LLVM `call
+instruction <../LangRef.html#call-instruction>`_. Note that LLVM uses the native C
+calling conventions by default, allowing these calls to also call into
+standard library functions like "sin" and "cos", with no additional
+effort.
+
+This wraps up our handling of the four basic expressions that we have so
+far in Kaleidoscope. Feel free to go in and add some more. For example,
+by browsing the `LLVM language reference <../LangRef.html>`_ you'll find
+several other interesting instructions that are really easy to plug into
+our basic framework.
+
+Function Code Generation
+========================
+
+Code generation for prototypes and functions must handle a number of
+details, which make their code less beautiful than expression code
+generation, but allows us to illustrate some important points. First,
+lets talk about code generation for prototypes: they are used both for
+function bodies and external function declarations. The code starts
+with:
+
+.. code-block:: ocaml
+
+    let codegen_proto = function
+      | Ast.Prototype (name, args) ->
+          (* Make the function type: double(double,double) etc. *)
+          let doubles = Array.make (Array.length args) double_type in
+          let ft = function_type double_type doubles in
+	  Hashtbl.replace function_types name ft;
+          let f =
+            match lookup_function name the_module with
+
+This code packs a lot of power into a few lines. Note first that this
+function returns a "Function\*" instead of a "Value\*" (although at the
+moment they both are modeled by ``llvalue`` in ocaml). Because a
+"prototype" really talks about the external interface for a function
+(not the value computed by an expression), it makes sense for it to
+return the LLVM Function it corresponds to when codegen'd.
+
+The call to ``Llvm.function_type`` creates the ``Llvm.llvalue`` that
+should be used for a given Prototype. Since all function arguments in
+Kaleidoscope are of type double, the first line creates a vector of "N"
+LLVM double types. It then uses the ``Llvm.function_type`` method to
+create a function type that takes "N" doubles as arguments, returns one
+double as a result, and that is not vararg (that uses the function
+``Llvm.var_arg_function_type``). Note that Types in LLVM are uniqued
+just like ``Constant``'s are, so you don't "new" a type, you "get" it.
+
+In newer versions of LLVM, function pointer values no longer retain
+signature information (opaque pointers). Therefore, we explicitly
+record the function type in a separate table Hashtbl.
+
+The final line above checks if the function has already been defined in
+``Codegen.the_module``. If not, we will create it.
+
+.. code-block:: ocaml
+
+            | None -> declare_function name ft the_module
+
+This indicates the type and name to use, as well as which module to
+insert into. By default we assume a function has
+``Llvm.Linkage.ExternalLinkage``. "`external
+linkage <../LangRef.html#linkage>`_" means that the function may be defined
+outside the current module and/or that it is callable by functions
+outside the module. The "``name``" passed in is the name the user
+specified: this name is registered in "``Codegen.the_module``"s symbol
+table, which is used by the function call code above.
+
+In Kaleidoscope, I choose to allow redefinitions of functions in two
+cases: first, we want to allow 'extern'ing a function more than once, as
+long as the prototypes for the externs match (since all arguments have
+the same type, we just have to check that the number of arguments
+match). Second, we want to allow 'extern'ing a function and then
+defining a body for it. This is useful when defining mutually recursive
+functions.
+
+.. code-block:: ocaml
+
+            (* If 'f' conflicted, there was already something named 'name'. If it
+             * has a body, don't allow redefinition or reextern. *)
+            | Some f ->
+                (* If 'f' already has a body, reject this. *)
+                if Array.length (basic_blocks f) == 0 then () else
+                  raise (Error "redefinition of function");
+
+                (* If 'f' took a different number of arguments, reject. *)
+                if Array.length (params f) == Array.length args then () else
+                  raise (Error "redefinition of function with different # args");
+                f
+          in
+
+In order to verify the logic above, we first check to see if the
+pre-existing function is "empty". In this case, empty means that it has
+no basic blocks in it, which means it has no body. If it has no body, it
+is a forward declaration. Since we don't allow anything after a full
+definition of the function, the code rejects this case. If the previous
+reference to a function was an 'extern', we simply verify that the
+number of arguments for that definition and this one match up. If not,
+we emit an error.
+
+.. code-block:: ocaml
+
+          (* Set names for all arguments. *)
+          Array.iteri (fun i a ->
+            set_value_name args.(i) a
+          ) (params f);
+          f
+
+The last bit of code for prototypes loops over all of the arguments in
+the function, setting the name of the LLVM Argument objects to match.
+Once this is set up, it returns the Function object to the caller.
+Note that we don't check for conflicting argument names here
+(e.g. "extern foo(a b a)"). Doing so would be very straight-forward
+with the mechanics we have already used above.
+
+.. code-block:: ocaml
+
+	  let codegen_func = function
+  	    | Ast.Function (proto, body) ->
+      		Hashtbl.clear named_values;
+
+      		(* Get function name *)
+      		let name =
+        		match proto with
+        		| Ast.Prototype (n, _) -> n
+      		in
+
+      		(* Get or create prototype *)
+      		let the_function =
+        		match lookup_function name the_module with
+        		| Some f -> f
+        		| None -> codegen_proto proto in
+
+Code generation for function definitions start with clear out the
+``Codegen.named_values`` map to make sure that there isn't anything
+in it from the last function we compiled. Next we extract the
+function name from the prototype and checks whether a function with
+that name already exists in the module. If it does, that existing
+declaration is reused. Otherwise, we just codegen the prototype 
+(Proto) and verify that it is ok. Code generation of the prototype
+ensures that there is an LLVM Function object that is ready to go
+for us.
+
+.. code-block:: ocaml
+
+          (* Create a new basic block to start insertion into. *)
+          let bb = append_block context "entry" the_function in
+          position_at_end bb builder;
+
+          try
+            let ret_val = codegen_expr body in
+
+Now we get to the point where the ``Codegen.builder`` is set up. The
+first line creates a new `basic
+block <http://en.wikipedia.org/wiki/Basic_block>`_ (named "entry"),
+which is inserted into ``the_function``. The second line then tells the
+builder that new instructions should be inserted into the end of the new
+basic block. Basic blocks in LLVM are an important part of functions
+that define the `Control Flow
+Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we
+don't have any control flow, our functions will only contain one block
+at this point. We'll fix this in `Chapter 5 <OCamlLangImpl5.html>`_ :).
+
+.. code-block:: ocaml
+
+      	  (* Register arguments *)
+          Array.iter
+            (fun a ->
+              Hashtbl.add named_values (value_name a) a
+            )
+            (params the_function);
+
+Next, we register each function argument in the Codegen.named_values
+map. This establishes the initial variable bindings for the function
+body, allowing references to parameters to be resolved during
+expression code generation.
+
+.. code-block:: ocaml
+
+            let ret_val = codegen_expr body in
+
+            (* Finish off the function. *)
+            let _ = build_ret ret_val builder in
+
+            (* Validate the generated code, checking for consistency. *)
+            Llvm_analysis.assert_valid_function the_function;
+
+            the_function
+
+Once the insertion point is set up and function arguments are registered,
+we call the ``Codegen.codegen_func`` method for the root expression of
+the function. If no error happens, this emits code to compute the
+expression into the entry block and returns the value that was computed.
+Assuming no error, we then create an LLVM `ret instruction
+<../LangRef.html#ret-instruction>`_, which completes the function.
+Once the function is built, we call ``Llvm_analysis.assert_valid_function``,
+which is provided by LLVM. This function does a variety of consistency
+checks on the generated code, to determine if our compiler is doing
+everything right. Using this is important: it can catch a lot of bugs.
+Once the function is finished and validated, we return it.
+
+.. code-block:: ocaml
+
+          with e ->
+            delete_function the_function;
+            raise e
+
+The only piece left here is handling of the error case. For simplicity,
+we handle this by merely deleting the function we produced with the
+``Llvm.delete_function`` method. This allows the user to redefine a
+function that they incorrectly typed in before: if we didn't delete it,
+it would live in the symbol table, with a body, preventing future
+redefinition.
+
+This code does have a bug, though. Since the ``Codegen.codegen_proto``
+can return a previously defined forward declaration, our code can
+actually delete a forward declaration. There are a number of ways to fix
+this bug, see what you can come up with! Here is a testcase:
+
+::
+
+    extern foo(a b);     # ok, defines foo.
+    def foo(a b) c;      # error, 'c' is invalid.
+    def bar() foo(1, 2); # error, unknown function "foo"
+
+Driver Changes and Closing Thoughts
+===================================
+
+For now, code generation to LLVM doesn't really get us much, except that
+we can look at the pretty IR calls. The sample code inserts calls to
+Codegen into the "``Toplevel.main_loop``", and then dumps out the LLVM
+IR. This gives a nice way to look at the LLVM IR for simple functions.
+For example:
+
+::
+
+    ready> 4+5;
+    Read top-level expression:
+    define double @""() {
+    entry:
+            %addtmp = fadd double 4.000000e+00, 5.000000e+00
+            ret double %addtmp
+    }
+
+Note how the parser turns the top-level expression into anonymous
+functions for us. This will be handy when we add `JIT
+support <OCamlLangImpl4.html#adding-a-jit-compiler>`_ in the next chapter. Also note that
+the code is very literally transcribed, no optimizations are being
+performed. We will `add
+optimizations <OCamlLangImpl4.html#trivial-constant-folding>`_ explicitly in the
+next chapter.
+
+::
+
+    ready> def foo(a b) a*a + 2*a*b + b*b;
+    Read function definition:
+    define double @foo(double %a, double %b) {
+    entry:
+            %multmp = fmul double %a, %a
+            %multmp1 = fmul double 2.000000e+00, %a
+            %multmp2 = fmul double %multmp1, %b
+            %addtmp = fadd double %multmp, %multmp2
+            %multmp3 = fmul double %b, %b
+            %addtmp4 = fadd double %addtmp, %multmp3
+            ret double %addtmp4
+    }
+
+This shows some simple arithmetic. Notice the striking similarity to the
+LLVM builder calls that we use to create the instructions.
+
+::
+
+    ready> def bar(a) foo(a, 4.0) + bar(31337);
+    Read function definition:
+    define double @bar(double %a) {
+    entry:
+            %calltmp = call double @foo(double %a, double 4.000000e+00)
+            %calltmp1 = call double @bar(double 3.133700e+04)
+            %addtmp = fadd double %calltmp, %calltmp1
+            ret double %addtmp
+    }
+
+This shows some function calls. Note that this function will take a long
+time to execute if you call it. In the future we'll add conditional
+control flow to actually make recursion useful :).
+
+::
+
+    ready> extern cos(x);
+    Read extern:
+    declare double @cos(double)
+
+    ready> cos(1.234);
+    Read top-level expression:
+    define double @""() {
+    entry:
+            %calltmp = call double @cos(double 1.234000e+00)
+            ret double %calltmp
+    }
+
+This shows an extern for the libm "cos" function, and a call to it.
+
+::
+
+    ready> ^D
+    ; ModuleID = 'my cool jit'
+
+    define double @""() {
+    entry:
+            %addtmp = fadd double 4.000000e+00, 5.000000e+00
+            ret double %addtmp
+    }
+
+    define double @foo(double %a, double %b) {
+    entry:
+            %multmp = fmul double %a, %a
+            %multmp1 = fmul double 2.000000e+00, %a
+            %multmp2 = fmul double %multmp1, %b
+            %addtmp = fadd double %multmp, %multmp2
+            %multmp3 = fmul double %b, %b
+            %addtmp4 = fadd double %addtmp, %multmp3
+            ret double %addtmp4
+    }
+
+    define double @bar(double %a) {
+    entry:
+            %calltmp = call double @foo(double %a, double 4.000000e+00)
+            %calltmp1 = call double @bar(double 3.133700e+04)
+            %addtmp = fadd double %calltmp, %calltmp1
+            ret double %addtmp
+    }
+
+    declare double @cos(double)
+
+    define double @""() {
+    entry:
+            %calltmp = call double @cos(double 1.234000e+00)
+            ret double %calltmp
+    }
+
+When you quit the current demo, it dumps out the IR for the entire
+module generated. Here you can see the big picture with all the
+functions referencing each other.
+
+This wraps up the third chapter of the Kaleidoscope tutorial. Up next,
+we'll describe how to `add JIT codegen and optimizer
+support <OCamlLangImpl4.html>`_ to this so we can actually start running
+code!
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example, enhanced with
+the LLVM code generator. Because this uses the LLVM libraries, we need
+to link them in. To do this, we use the
+`llvm-config <https://llvm.org/cmds/llvm-config.html>`_ tool to inform
+our makefile/command line about which options to use:
+
+.. code-block:: bash
+
+    # Compile
+    ocamlbuild -use-ocamlfind -pkg camlp-streams,llvm,llvm.analysis toy.byte
+
+    # Run
+    ./toy.byte
+
+Here is the code:
+
+\_tags:
+    ::
+
+        <{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
+        <*.{byte,native}>: g++, use_llvm, use_llvm_analysis
+
+myocamlbuild.ml:
+    .. code-block:: ocaml
+
+        open Ocamlbuild_plugin;;
+
+        ocaml_lib ~extern:true "llvm";;
+        ocaml_lib ~extern:true "llvm_analysis";;
+
+        flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);;
+
+token.ml:
+    .. code-block:: ocaml
+
+        (*===----------------------------------------------------------------------===
+         * Lexer Tokens
+         *===----------------------------------------------------------------------===*)
+
+        (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
+         * these others for known things. *)
+        type token =
+          (* commands *)
+          | Def | Extern
+
+          (* primary *)
+          | Ident of string | Number of float
+
+          (* unknown *)
+          | Kwd of char
+
+lexer.ml:
+    .. code-block:: ocaml
+
+        (*===----------------------------------------------------------------------===
+         * Lexer
+         *===----------------------------------------------------------------------===*)
+
+        let rec lex = parser
+          (* Skip any whitespace. *)
+          | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
+
+          (* identifier: [a-zA-Z][a-zA-Z0-9] *)
+          | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
+              let buffer = Buffer.create 1 in
+              Buffer.add_char buffer c;
+              lex_ident buffer stream
+
+          (* number: [0-9.]+ *)
+          | [< ' ('0' .. '9' as c); stream >] ->
+              let buffer = Buffer.create 1 in
+              Buffer.add_char buffer c;
+              lex_number buffer stream
+
+          (* Comment until end of line. *)
+          | [< ' ('#'); stream >] ->
+              lex_comment stream
+
+          (* Otherwise, just return the character as its ascii value. *)
+          | [< 'c; stream >] ->
+              [< 'Token.Kwd c; lex stream >]
+
+          (* end of stream. *)
+          | [< >] -> [< >]
+
+        and lex_number buffer = parser
+          | [< ' ('0' .. '9' | '.' as c); stream >] ->
+              Buffer.add_char buffer c;
+              lex_number buffer stream
+          | [< stream=lex >] ->
+              [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
+
+        and lex_ident buffer = parser
+          | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
+              Buffer.add_char buffer c;
+              lex_ident buffer stream
+          | [< stream=lex >] ->
+              match Buffer.contents buffer with
+              | "def" -> [< 'Token.Def; stream >]
+              | "extern" -> [< 'Token.Extern; stream >]
+              | id -> [< 'Token.Ident id; stream >]
+
+        and lex_comment = parser
+          | [< ' ('\n'); stream=lex >] -> stream
+          | [< 'c; e=lex_comment >] -> e
+          | [< >] -> [< >]
+
+ast.ml:
+    .. code-block:: ocaml
+
+        (*===----------------------------------------------------------------------===
+         * Abstract Syntax Tree (aka Parse Tree)
+         *===----------------------------------------------------------------------===*)
+
+        (* expr - Base type for all expression nodes. *)
+        type expr =
+          (* variant for numeric literals like "1.0". *)
+          | Number of float
+
+          (* variant for referencing a variable, like "a". *)
+          | Variable of string
+
+          (* variant for a binary operator. *)
+          | Binary of char * expr * expr
+
+          (* variant for function calls. *)
+          | Call of string * expr array
+
+        (* proto - This type represents the "prototype" for a function, which captures
+         * its name, and its argument names (thus implicitly the number of arguments the
+         * function takes). *)
+        type proto = Prototype of string * string array
+
+        (* func - This type represents a function definition itself. *)
+        type func = Function of proto * expr
+
+parser.ml:
+    .. code-block:: ocaml
+
+        (*===---------------------------------------------------------------------===
+         * Parser
+         *===---------------------------------------------------------------------===*)
+
+        (* binop_precedence - This holds the precedence for each binary operator that is
+         * defined *)
+        let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
+
+        (* precedence - Get the precedence of the pending binary operator token. *)
+        let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
+
+        (* primary
+         *   ::= identifier
+         *   ::= numberexpr
+         *   ::= parenexpr *)
+        let rec parse_primary = parser
+          (* numberexpr ::= number *)
+          | [< 'Token.Number n >] -> Ast.Number n
+
+          (* parenexpr ::= '(' expression ')' *)
+          | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
+
+          (* identifierexpr
+           *   ::= identifier
+           *   ::= identifier '(' argumentexpr ')' *)
+          | [< 'Token.Ident id; stream >] ->
+              let rec parse_args accumulator = parser
+                | [< e=parse_expr; stream >] ->
+                    begin parser
+                      | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
+                      | [< >] -> e :: accumulator
+                    end stream
+                | [< >] -> accumulator
+              in
+              let rec parse_ident id = parser
+                (* Call. *)
+                | [< 'Token.Kwd '(';
+                     args=parse_args [];
+                     'Token.Kwd ')' ?? "expected ')'">] ->
+                    Ast.Call (id, Array.of_list (List.rev args))
+
+                (* Simple variable ref. *)
+                | [< >] -> Ast.Variable id
+              in
+              parse_ident id stream
+
+          | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
+
+        (* binoprhs
+         *   ::= ('+' primary)* *)
+        and parse_bin_rhs expr_prec lhs stream =
+          match Stream.peek stream with
+          (* If this is a binop, find its precedence. *)
+          | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
+              let token_prec = precedence c in
+
+              (* If this is a binop that binds at least as tightly as the current binop,
+               * consume it, otherwise we are done. *)
+              if token_prec < expr_prec then lhs else begin
+                (* Eat the binop. *)
+                Stream.junk stream;
+
+                (* Parse the primary expression after the binary operator. *)
+                let rhs = parse_primary stream in
+
+                (* Okay, we know this is a binop. *)
+                let rhs =
+                  match Stream.peek stream with
+                  | Some (Token.Kwd c2) ->
+                      (* If BinOp binds less tightly with rhs than the operator after
+                       * rhs, let the pending operator take rhs as its lhs. *)
+                      let next_prec = precedence c2 in
+                      if token_prec < next_prec
+                      then parse_bin_rhs (token_prec + 1) rhs stream
+                      else rhs
+                  | _ -> rhs
+                in
+
+                (* Merge lhs/rhs. *)
+                let lhs = Ast.Binary (c, lhs, rhs) in
+                parse_bin_rhs expr_prec lhs stream
+              end
+          | _ -> lhs
+
+        (* expression
+         *   ::= primary binoprhs *)
+        and parse_expr = parser
+          | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
+
+        (* prototype
+         *   ::= id '(' id* ')' *)
+        let parse_prototype =
+          let rec parse_args accumulator = parser
+            | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
+            | [< >] -> accumulator
+          in
+
+          parser
+          | [< 'Token.Ident id;
+               'Token.Kwd '(' ?? "expected '(' in prototype";
+               args=parse_args [];
+               'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
+              (* success. *)
+              Ast.Prototype (id, Array.of_list (List.rev args))
+
+          | [< >] ->
+              raise (Stream.Error "expected function name in prototype")
+
+        (* definition ::= 'def' prototype expression *)
+        let parse_definition = parser
+          | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
+              Ast.Function (p, e)
+
+        (* toplevelexpr ::= expression *)
+        let parse_toplevel = parser
+          | [< e=parse_expr >] ->
+              (* Make an anonymous proto. *)
+              Ast.Function (Ast.Prototype ("", [||]), e)
+
+        (*  external ::= 'extern' prototype *)
+        let parse_extern = parser
+          | [< 'Token.Extern; e=parse_prototype >] -> e
+
+codegen.ml:
+    .. code-block:: ocaml
+
+        (*===----------------------------------------------------------------------===
+         * Code Generation
+         *===----------------------------------------------------------------------===*)
+
+        open Llvm
+
+        exception Error of string
+
+        let context = global_context ()
+        let the_module = create_module context "my cool jit"
+        let builder = builder context
+        let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
+        let function_types:(string, lltype) Hashtbl.t = Hashtbl.create 10
+        let double_type = double_type context
+
+        let rec codegen_expr = function
+          | Ast.Number n -> const_float double_type n
+          | Ast.Variable name ->
+              (try Hashtbl.find named_values name with
+                | Not_found -> raise (Error "unknown variable name"))
+          | Ast.Binary (op, lhs, rhs) ->
+              let lhs_val = codegen_expr lhs in
+              let rhs_val = codegen_expr rhs in
+              begin
+                match op with
+                | '+' -> build_fadd lhs_val rhs_val "addtmp" builder
+                | '-' -> build_fsub lhs_val rhs_val "subtmp" builder
+                | '*' -> build_fmul lhs_val rhs_val "multmp" builder
+                | '<' ->
+                    (* Convert bool 0/1 to double 0.0 or 1.0 *)
+                    let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
+                    build_uitofp i double_type "booltmp" builder
+                | _ -> raise (Error "invalid binary operator")
+              end
+
+          | Ast.Call (callee_name, args) ->
+              let callee =
+                match lookup_function callee_name the_module with
+                | Some callee -> callee
+                | None -> raise (Error "unknown function referenced")
+              in
+              let params = params callee in
+
+              (* If argument mismatch error. *)
+              if Array.length params == Array.length args then () else
+                raise (Error "incorrect # arguments passed");
+
+              let fnty =
+                try Hashtbl.find function_types callee_name
+                with Not_found ->
+                  raise (Error "unknown function type")
+              in
+
+              let args = Array.map codegen_expr args in
+              build_call fnty callee args "calltmp" builder
+
+
+        let codegen_proto = function
+          | Ast.Prototype (name, args) ->
+
+              (* Make the function type: double(double,double) etc. *)
+              let arg_types =
+                Array.make (Array.length args) double_type
+              in
+        
+              let ft = function_type double_type arg_types in
+              Hashtbl.replace function_types name ft;
+
+              let f =
+                match lookup_function name the_module with
+                | None -> declare_function name ft the_module
+	        (* If 'f' conflicted, there was already something named 'name'. If it
+                 * has a body, don't allow redefinition or reextern. *)
+                | Some f ->
+                    (* If 'f' already has a body, reject this. *)
+                    if block_begin f <> At_end f then
+                      raise (Error "redefinition of function");
+
+                    (* If 'f' took a different number of arguments, reject. *)
+                    if element_type (type_of f) <> ft then
+                      raise (Error "redefinition of function with different # args");
+                    f
+              in
+
+              (* Set names for all arguments. *)
+              Array.iteri (fun i a ->
+                  set_value_name args.(i) a
+                ) (params f);
+
+              f
+
+        let codegen_func = function
+          | Ast.Function (proto, body) ->
+              Hashtbl.clear named_values;
+
+              (* Get function name *)
+              let name =
+                match proto with
+                | Ast.Prototype (n, _) -> n
+              in
+
+              (* Get or create prototype *)
+              let the_function =
+                match lookup_function name the_module with
+                | Some f -> f
+                | None -> codegen_proto proto
+              in
+
+              (* Reject redefinition *)
+              if Array.length (basic_blocks the_function) <> 0 then
+                raise (Error "redefinition of function");
+
+              (* Create a new basic block to start insertion into. *)
+              let bb = append_block context "entry" the_function in
+              position_at_end bb builder;
+
+              (* Register arguments *)
+              Array.iter
+                (fun a ->
+                  Hashtbl.add named_values (value_name a) a
+                )
+                (params the_function);
+        
+              try
+                let ret_val = codegen_expr body in
+
+                (* Finish off the function. *)
+                let _ = build_ret ret_val builder in
+        
+                (* Validate the generated code, checking for consistency. *)
+                Llvm_analysis.assert_valid_function the_function;
+
+                the_function
+              with e ->
+                delete_function the_function;
+                raise e
+
+toplevel.ml:
+    .. code-block:: ocaml
+
+        (*===----------------------------------------------------------------------===
+         * Top-Level parsing and JIT Driver
+         *===----------------------------------------------------------------------===*)
+
+        open Llvm
+
+        (* top ::= definition | external | expression | ';' *)
+        let rec main_loop stream =
+          match Stream.peek stream with
+          | None -> ()
+
+          (* ignore top-level semicolons. *)
+          | Some (Token.Kwd ';') ->
+              Stream.junk stream;
+              main_loop stream
+
+          | Some token ->
+              begin
+                try match token with
+                | Token.Def ->
+                    let e = Parser.parse_definition stream in
+                    print_endline "parsed a function definition.";
+                    dump_value (Codegen.codegen_func e);
+                | Token.Extern ->
+                    let e = Parser.parse_extern stream in
+                    print_endline "parsed an extern.";
+                    dump_value (Codegen.codegen_proto e);
+                | _ ->
+                    (* Evaluate a top-level expression into an anonymous function. *)
+                    let e = Parser.parse_toplevel stream in
+                    print_endline "parsed a top-level expr";
+                    dump_value (Codegen.codegen_func e);
+                with Stream.Error s | Codegen.Error s ->
+                  (* Skip token for error recovery. *)
+                  Stream.junk stream;
+                  print_endline s;
+              end;
+              print_string "ready> "; flush stdout;
+              main_loop stream
+
+toy.ml:
+    .. code-block:: ocaml
+
+        (*===----------------------------------------------------------------------===
+         * Main driver code.
+         *===----------------------------------------------------------------------===*)
+
+        open Llvm
+
+        let main () =
+          (* Install standard binary operators.
+           * 1 is the lowest precedence. *)
+          Hashtbl.add Parser.binop_precedence '<' 10;
+          Hashtbl.add Parser.binop_precedence '+' 20;
+          Hashtbl.add Parser.binop_precedence '-' 20;
+          Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)
+
+          (* Prime the first token. *)
+          print_string "ready> "; flush stdout;
+          let stream = Lexer.lex (Stream.of_channel stdin) in
+
+          (* Run the main "interpreter loop" now. *)
+          Toplevel.main_loop stream;
+
+          (* Print out all the generated code. *)
+          dump_module Codegen.the_module
+        ;;
+
+        main ()
+
+`Next: Adding JIT and Optimizer Support <OCamlLangImpl4.html>`_
+
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/_tags b/llvm/examples/OCaml-Kaleidoscope/Chapter3/_tags
new file mode 100644
index 0000000000000..990490a5db257
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter3/_tags
@@ -0,0 +1,2 @@
+<{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
+<*.{byte,native}>: g++, use_llvm, use_llvm_analysis
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/ast.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/ast.ml
new file mode 100644
index 0000000000000..4cc2dea86b78b
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter3/ast.ml
@@ -0,0 +1,25 @@
+(*===----------------------------------------------------------------------===
+ * Abstract Syntax Tree (aka Parse Tree)
+ *===----------------------------------------------------------------------===*)
+
+(* expr - Base type for all expression nodes. *)
+type expr =
+  (* variant for numeric literals like "1.0". *)
+  | Number of float
+
+  (* variant for referencing a variable, like "a". *)
+  | Variable of string
+
+  (* variant for a binary operator. *)
+  | Binary of char * expr * expr
+
+  (* variant for function calls. *)
+  | Call of string * expr array
+
+(* proto - This type represents the "prototype" for a function, which captures
+ * its name, and its argument names (thus implicitly the number of arguments the
+ * function takes). *)
+type proto = Prototype of string * string array
+
+(* func - This type represents a function definition itself. *)
+type func = Function of proto * expr
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/codegen.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/codegen.ml
new file mode 100644
index 0000000000000..3ed34799883bb
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter3/codegen.ml
@@ -0,0 +1,137 @@
+(*===----------------------------------------------------------------------===
+ * Code Generation
+ *===----------------------------------------------------------------------===*)
+
+open Llvm
+
+exception Error of string
+
+let context = global_context ()
+let the_module = create_module context "my cool jit"
+let builder = builder context
+let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
+let function_types:(string, lltype) Hashtbl.t = Hashtbl.create 10
+let double_type = double_type context
+
+let rec codegen_expr = function
+  | Ast.Number n -> const_float double_type n
+  | Ast.Variable name ->
+      (try Hashtbl.find named_values name with
+        | Not_found -> raise (Error "unknown variable name"))
+  | Ast.Binary (op, lhs, rhs) ->
+      let lhs_val = codegen_expr lhs in
+      let rhs_val = codegen_expr rhs in
+      begin
+        match op with
+        | '+' -> build_fadd lhs_val rhs_val "addtmp" builder
+        | '-' -> build_fsub lhs_val rhs_val "subtmp" builder
+        | '*' -> build_fmul lhs_val rhs_val "multmp" builder
+        | '<' ->
+            (* Convert bool 0/1 to double 0.0 or 1.0 *)
+            let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
+            build_uitofp i double_type "booltmp" builder
+        | _ -> raise (Error "invalid binary operator")
+      end
+
+  | Ast.Call (callee_name, args) ->
+      let callee =
+        match lookup_function callee_name the_module with
+        | Some callee -> callee
+        | None -> raise (Error "unknown function referenced")
+      in
+      let params = params callee in
+
+      (* If argument mismatch error. *)
+      if Array.length params == Array.length args then () else
+        raise (Error "incorrect # arguments passed");
+
+      let fnty =
+        try Hashtbl.find function_types callee_name
+        with Not_found ->
+          raise (Error "unknown function type")
+      in
+
+      let args = Array.map codegen_expr args in
+      build_call fnty callee args "calltmp" builder
+
+
+let codegen_proto = function
+  | Ast.Prototype (name, args) ->
+
+      (* Make the function type: double(double,double) etc. *)
+      let arg_types =
+        Array.make (Array.length args) double_type
+      in
+
+      let ft = function_type double_type arg_types in
+      Hashtbl.replace function_types name ft;
+
+      let f =
+        match lookup_function name the_module with
+        | None -> declare_function name ft the_module
+	(* If 'f' conflicted, there was already something named 'name'. If it
+         * has a body, don't allow redefinition or reextern. *)
+        | Some f ->
+            (* If 'f' already has a body, reject this. *)
+            if block_begin f <> At_end f then
+              raise (Error "redefinition of function");
+
+            (* If 'f' took a different number of arguments, reject. *)
+            if element_type (type_of f) <> ft then
+              raise (Error "redefinition of function with different # args");
+            f
+      in
+
+      (* Set names for all arguments. *)
+      Array.iteri (fun i a ->
+          set_value_name args.(i) a
+        ) (params f);
+
+      f
+
+let codegen_func = function
+  | Ast.Function (proto, body) ->
+      Hashtbl.clear named_values;
+
+      (* Get function name *)
+      let name =
+        match proto with
+        | Ast.Prototype (n, _) -> n
+      in
+
+      (* Get or create prototype *)
+      let the_function =
+        match lookup_function name the_module with
+        | Some f -> f
+        | None -> codegen_proto proto
+      in
+
+      (* Reject redefinition *)
+      if Array.length (basic_blocks the_function) <> 0 then
+        raise (Error "redefinition of function");
+
+      (* Create a new basic block to start insertion into. *)
+      let bb = append_block context "entry" the_function in
+      position_at_end bb builder;
+
+      (* Register arguments *)
+      Array.iter
+        (fun a ->
+          Hashtbl.add named_values (value_name a) a
+        )
+        (params the_function);
+
+      try
+        let ret_val = codegen_expr body in
+
+        (* Finish off the function. *)
+        let _ = build_ret ret_val builder in
+
+        (* Validate the generated code, checking for consistency. *)
+        Llvm_analysis.assert_valid_function the_function;
+
+        the_function
+      with e ->
+        delete_function the_function;
+        raise e
+
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/lexer.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/lexer.ml
new file mode 100644
index 0000000000000..22a915552f035
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter3/lexer.ml
@@ -0,0 +1,52 @@
+(*===----------------------------------------------------------------------===
+ * Lexer
+ *===----------------------------------------------------------------------===*)
+
+let rec lex = parser
+  (* Skip any whitespace. *)
+  | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
+
+  (* identifier: [a-zA-Z][a-zA-Z0-9] *)
+  | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
+      let buffer = Buffer.create 1 in
+      Buffer.add_char buffer c;
+      lex_ident buffer stream
+
+  (* number: [0-9.]+ *)
+  | [< ' ('0' .. '9' as c); stream >] ->
+      let buffer = Buffer.create 1 in
+      Buffer.add_char buffer c;
+      lex_number buffer stream
+
+  (* Comment until end of line. *)
+  | [< ' ('#'); stream >] ->
+      lex_comment stream
+
+  (* Otherwise, just return the character as its ascii value. *)
+  | [< 'c; stream >] ->
+      [< 'Token.Kwd c; lex stream >]
+
+  (* end of stream. *)
+  | [< >] -> [< >]
+
+and lex_number buffer = parser
+  | [< ' ('0' .. '9' | '.' as c); stream >] ->
+      Buffer.add_char buffer c;
+      lex_number buffer stream
+  | [< stream=lex >] ->
+      [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
+
+and lex_ident buffer = parser
+  | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
+      Buffer.add_char buffer c;
+      lex_ident buffer stream
+  | [< stream=lex >] ->
+      match Buffer.contents buffer with
+      | "def" -> [< 'Token.Def; stream >]
+      | "extern" -> [< 'Token.Extern; stream >]
+      | id -> [< 'Token.Ident id; stream >]
+
+and lex_comment = parser
+  | [< ' ('\n'); stream=lex >] -> stream
+  | [< 'c; e=lex_comment >] -> e
+  | [< >] -> [< >]
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/myocamlbuild.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/myocamlbuild.ml
new file mode 100644
index 0000000000000..b71f5d717ef7a
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter3/myocamlbuild.ml
@@ -0,0 +1,6 @@
+open Ocamlbuild_plugin;;
+
+ocaml_lib ~extern:true "llvm";;
+ocaml_lib ~extern:true "llvm_analysis";;
+
+flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);;
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/parser.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/parser.ml
new file mode 100644
index 0000000000000..83d9874a4ab6e
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter3/parser.ml
@@ -0,0 +1,122 @@
+(*===---------------------------------------------------------------------===
+ * Parser
+ *===---------------------------------------------------------------------===*)
+
+(* binop_precedence - This holds the precedence for each binary operator that is
+ * defined *)
+let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
+
+(* precedence - Get the precedence of the pending binary operator token. *)
+let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
+
+(* primary
+ *   ::= identifier
+ *   ::= numberexpr
+ *   ::= parenexpr *)
+let rec parse_primary = parser
+  (* numberexpr ::= number *)
+  | [< 'Token.Number n >] -> Ast.Number n
+
+  (* parenexpr ::= '(' expression ')' *)
+  | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
+
+  (* identifierexpr
+   *   ::= identifier
+   *   ::= identifier '(' argumentexpr ')' *)
+  | [< 'Token.Ident id; stream >] ->
+      let rec parse_args accumulator = parser
+        | [< e=parse_expr; stream >] ->
+            begin parser
+              | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
+              | [< >] -> e :: accumulator
+            end stream
+        | [< >] -> accumulator
+      in
+      let rec parse_ident id = parser
+        (* Call. *)
+        | [< 'Token.Kwd '(';
+             args=parse_args [];
+             'Token.Kwd ')' ?? "expected ')'">] ->
+            Ast.Call (id, Array.of_list (List.rev args))
+
+        (* Simple variable ref. *)
+        | [< >] -> Ast.Variable id
+      in
+      parse_ident id stream
+
+  | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
+
+(* binoprhs
+ *   ::= ('+' primary)* *)
+and parse_bin_rhs expr_prec lhs stream =
+  match Stream.peek stream with
+  (* If this is a binop, find its precedence. *)
+  | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
+      let token_prec = precedence c in
+
+      (* If this is a binop that binds at least as tightly as the current binop,
+       * consume it, otherwise we are done. *)
+      if token_prec < expr_prec then lhs else begin
+        (* Eat the binop. *)
+        Stream.junk stream;
+
+        (* Parse the primary expression after the binary operator. *)
+        let rhs = parse_primary stream in
+
+        (* Okay, we know this is a binop. *)
+        let rhs =
+          match Stream.peek stream with
+          | Some (Token.Kwd c2) ->
+              (* If BinOp binds less tightly with rhs than the operator after
+               * rhs, let the pending operator take rhs as its lhs. *)
+              let next_prec = precedence c2 in
+              if token_prec < next_prec
+              then parse_bin_rhs (token_prec + 1) rhs stream
+              else rhs
+          | _ -> rhs
+        in
+
+        (* Merge lhs/rhs. *)
+        let lhs = Ast.Binary (c, lhs, rhs) in
+        parse_bin_rhs expr_prec lhs stream
+      end
+  | _ -> lhs
+
+(* expression
+ *   ::= primary binoprhs *)
+and parse_expr = parser
+  | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
+
+(* prototype
+ *   ::= id '(' id* ')' *)
+let parse_prototype =
+  let rec parse_args accumulator = parser
+    | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
+    | [< >] -> accumulator
+  in
+
+  parser
+  | [< 'Token.Ident id;
+       'Token.Kwd '(' ?? "expected '(' in prototype";
+       args=parse_args [];
+       'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
+      (* success. *)
+      Ast.Prototype (id, Array.of_list (List.rev args))
+
+  | [< >] ->
+      raise (Stream.Error "expected function name in prototype")
+
+(* definition ::= 'def' prototype expression *)
+let parse_definition = parser
+  | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
+      Ast.Function (p, e)
+
+(* toplevelexpr ::= expression *)
+let parse_toplevel = parser
+  | [< e=parse_expr >] ->
+      (* Make an anonymous proto. *)
+      Ast.Function (Ast.Prototype ("", [||]), e)
+
+(*  external ::= 'extern' prototype *)
+let parse_extern = parser
+  | [< 'Token.Extern; e=parse_prototype >] -> e
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/token.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/token.ml
new file mode 100644
index 0000000000000..2ca782e149971
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter3/token.ml
@@ -0,0 +1,15 @@
+(*===----------------------------------------------------------------------===
+ * Lexer Tokens
+ *===----------------------------------------------------------------------===*)
+
+(* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
+ * these others for known things. *)
+type token =
+  (* commands *)
+  | Def | Extern
+
+  (* primary *)
+  | Ident of string | Number of float
+
+  (* unknown *)
+  | Kwd of char
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/toplevel.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/toplevel.ml
new file mode 100644
index 0000000000000..d1bf5d4c0c655
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter3/toplevel.ml
@@ -0,0 +1,39 @@
+(*===----------------------------------------------------------------------===
+ * Top-Level parsing and JIT Driver
+ *===----------------------------------------------------------------------===*)
+
+open Llvm
+
+(* top ::= definition | external | expression | ';' *)
+let rec main_loop stream =
+  match Stream.peek stream with
+  | None -> ()
+
+  (* ignore top-level semicolons. *)
+  | Some (Token.Kwd ';') ->
+      Stream.junk stream;
+      main_loop stream
+
+  | Some token ->
+      begin
+        try match token with
+        | Token.Def ->
+            let e = Parser.parse_definition stream in
+            print_endline "parsed a function definition.";
+            dump_value (Codegen.codegen_func e);
+        | Token.Extern ->
+            let e = Parser.parse_extern stream in
+            print_endline "parsed an extern.";
+            dump_value (Codegen.codegen_proto e);
+        | _ ->
+            (* Evaluate a top-level expression into an anonymous function. *)
+            let e = Parser.parse_toplevel stream in
+            print_endline "parsed a top-level expr";
+            dump_value (Codegen.codegen_func e);
+        with Stream.Error s | Codegen.Error s ->
+          (* Skip token for error recovery. *)
+          Stream.junk stream;
+          print_endline s;
+      end;
+      print_string "ready> "; flush stdout;
+      main_loop stream
diff --git a/llvm/examples/OCaml-Kaleidoscope/Chapter3/toy.ml b/llvm/examples/OCaml-Kaleidoscope/Chapter3/toy.ml
new file mode 100644
index 0000000000000..73c1a1ec62a28
--- /dev/null
+++ b/llvm/examples/OCaml-Kaleidoscope/Chapter3/toy.ml
@@ -0,0 +1,26 @@
+(*===----------------------------------------------------------------------===
+ * Main driver code.
+ *===----------------------------------------------------------------------===*)
+
+open Llvm
+
+let main () =
+  (* Install standard binary operators.
+   * 1 is the lowest precedence. *)
+  Hashtbl.add Parser.binop_precedence '<' 10;
+  Hashtbl.add Parser.binop_precedence '+' 20;
+  Hashtbl.add Parser.binop_precedence '-' 20;
+  Hashtbl.add Parser.binop_precedence '*' 40;    (* highest. *)
+
+  (* Prime the first token. *)
+  print_string "ready> "; flush stdout;
+  let stream = Lexer.lex (Stream.of_channel stdin) in
+
+  (* Run the main "interpreter loop" now. *)
+  Toplevel.main_loop stream;
+
+  (* Print out all the generated code. *)
+  dump_module Codegen.the_module
+;;
+
+main ()



More information about the llvm-commits mailing list