[llvm] r271225 - [Kaleidoscope][BuildingAJIT] Finish off Chapter 1.

Lang Hames via llvm-commits llvm-commits at lists.llvm.org
Mon May 30 12:03:29 PDT 2016


Author: lhames
Date: Mon May 30 14:03:26 2016
New Revision: 271225

URL: http://llvm.org/viewvc/llvm-project?rev=271225&view=rev
Log:
[Kaleidoscope][BuildingAJIT] Finish off Chapter 1.

* Various tidy-up and streamlining of existing discussion.
* Describes findSymbol and removeModule.

Chapter 1 is now rough but essentially complete in terms of content.

Feedback, patches etc. very welcome.


Modified:
    llvm/trunk/docs/tutorial/BuildingAJIT1.rst

Modified: llvm/trunk/docs/tutorial/BuildingAJIT1.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/tutorial/BuildingAJIT1.rst?rev=271225&r1=271224&r2=271225&view=diff
==============================================================================
--- llvm/trunk/docs/tutorial/BuildingAJIT1.rst (original)
+++ llvm/trunk/docs/tutorial/BuildingAJIT1.rst Mon May 30 14:03:26 2016
@@ -5,10 +5,6 @@ Building a JIT: Starting out with Kaleid
 .. contents::
    :local:
 
-**This tutorial is under active development. It is incomplete and details may
-change frequently.** Nonetheless we invite you to try it out as it stands, and
-we welcome any feedback.
-
 Chapter 1 Introduction
 ======================
 
@@ -141,24 +137,25 @@ usual include guards and #includes [2]_,
 
 Our class begins with four members: A TargetMachine, TM, which will be used
 to build our LLVM compiler instance; A DataLayout, DL, which will be used for
-symbol mangling (more on that later), and two ORC *layers*: An
-ObjectLinkingLayer, and an IRCompileLayer. The ObjectLinkingLayer is the
-foundation of our JIT: it takes in-memory object files produced by a
-compiler and links them on the fly to make them executable. This
-JIT-on-top-of-a-linker design was introduced in MCJIT, where the linker was
-hidden inside the MCJIT class itself. In ORC we expose the linker as a visible,
-reusable component so that clients can access and configure it directly
-if they need to. In this tutorial our ObjectLinkingLayer will just be used to
-support the next layer in our stack: the IRCompileLayer, which will be
-responsible for taking LLVM IR, compiling it, and passing the resulting
-in-memory object files down to the object linking layer below.
-
-After our member variables comes typedef: ModuleHandle. This is the handle
-type that will be returned from our JIT's addModule method, and which can be
-used to remove a module again using the removeModule method. The IRCompileLayer
-class already provides a convenient handle type
-(IRCompileLayer::ModuleSetHandleT), so we will just provide a type-alias for
-this.
+symbol mangling (more on that later), and two ORC *layers*: an
+ObjectLinkingLayer and an IRCompileLayer. We'll be talking more about layers in
+the next chapter, but for now you can think of them as analogous to LLVM
+Passes: they wrap up useful JIT utilities behind an easy-to-compose interface.
+The first layer, ObjectLinkingLayer, is the foundation of our JIT: it takes
+in-memory object files produced by a compiler and links them on the fly to make
+them executable. This JIT-on-top-of-a-linker design was introduced in MCJIT,
+though there the linker was hidden inside the MCJIT class. In ORC we expose the
+linker so that clients can access and configure it directly if they need to. In
+this tutorial our ObjectLinkingLayer will just be used to support the next layer
+in our stack: the IRCompileLayer, which will be responsible for taking LLVM IR,
+compiling it, and passing the resulting in-memory object files down to the
+object linking layer below.
+
+That's it for member variables. After those we have a single typedef:
+ModuleHandle. This is the handle type that will be returned from our JIT's
+addModule method, and can be passed to the removeModule method to remove a
+module. The IRCompileLayer class already provides a convenient handle type
+(IRCompileLayer::ModuleSetHandleT), so we just alias our ModuleHandle to this.
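+
+Concretely, given the two layer members described above, the alias amounts to
+something like the following (a sketch; the full listing at the end of the
+chapter uses the members' actual types):
+
+.. code-block:: c++
+
+  typedef IRCompileLayer<ObjectLinkingLayer<>>::ModuleSetHandleT ModuleHandle;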
 
 .. code-block:: c++
 
@@ -176,8 +173,7 @@ the current process. Next we use our new
 DL, our DataLayout. Then we initialize our IRCompileLayer. Our IRCompile layer
 needs two things: (1) A reference to our object linking layer, and (2) a
 compiler instance to use to perform the actual compilation from IR to object
-files. We use the off-the-shelf SimpleCompiler instance for now, but in later
-chapters we will substitute our own configurable compiler classes. Finally, in
+files. We use the off-the-shelf SimpleCompiler instance for now. Finally, in
 the body of the constructor, we call the DynamicLibrary::LoadLibraryPermanently
 method with a nullptr argument. Normally the LoadLibraryPermanently method is
 called with the path of a dynamic library to load, but when passed a null
@@ -215,68 +211,62 @@ available for execution.
                                      std::move(Resolver));
   }
 
-Now we come to the first of our central JIT API methods: addModule. This method
-is responsible for adding IR to the JIT and making it available for execution.
-In this initial implementation of our JIT we will make our modules "available
-for execution" by compiling them immediately as they are added to the JIT. In
-later chapters we will teach our JIT to be lazier and instead add the Modules
-to a "pending" list to be compiled if and when they are first executed.
-
-To add our module to the IRCompileLayer we need to supply two auxiliary
-objects: a memory manager and a symbol resolver. The memory manager will be
-responsible for managing the memory allocated to JIT'd machine code, applying
-memory protection permissions, and registering JIT'd exception handling tables
-(if the JIT'd code uses exceptions). In our simple use-case we can just supply
-an off-the-shelf SectionMemoryManager instance. The memory, exception handling
-tables, etc. will be released when we remove the module from the JIT again
-(using removeModule) or, if removeModule is never called, when the JIT class
-itself is destructed.
-
-The second auxiliary class, the symbol resolver, is more interesting for us. It
-exists to tell the JIT where to look when it encounters an *external symbol* in
-the module we are adding. External symbols are any symbol not defined within the
-module itself, including calls to functions outside the JIT and calls to
-functions defined in other modules that have already been added to the JIT. It
-may seem as though modules added to the JIT should "know about one another" by
-default, but since we would still have to supply a symbol resolver for
-references to code outside the JIT it turns out to re-use this one mechanism
-for all symbol resolution. This has the added benefit that the user has full
-control over the symbol resolution process. Should we search for definitions
-within the JIT first, then fall back on external definitions? Or should we
-prefer external definitions where available and only JIT code if we don't
-already have an available implementation? By using a single symbol resolution
-scheme we are free to choose whatever makes the most sense for any given use
-case.
-
-Building a symbol resolver is made especially easy by the
-*createLambdaResolver* function. This function takes two lambdas (actually
-they don't have to be lambdas, any object with a call operator will do) and
-returns a RuntimeDyld::SymbolResolver instance. The first lambda is used as
-the implementation of the resolver's findSymbolInLogicalDylib method. This
-method searches for symbol definitions that should be thought of as being part
-of the same "logical" dynamic library as this Module. If you are familiar with
-static linking: this means that findSymbolInLogicalDylib should expose symbols
-with common linkage and hidden visibility. If all this sounds foreign you can
-ignore the details and just remember that this is the first method that the
-linker will use to try to find a symbol definition. If the
-findSymbolInLogicalDylib method returns a null result then the linker will
-call the second symbol resolver method, called findSymbol. This searches for
-symbols that should be thought of as external to (but visibile from) the module
-and its logical dylib.
-
-In this tutorial we will use the following simple breakdown: All modules added
-to the JIT will behave as if they were linked into a single, ever-growing
-logical dylib. To implement this our first lambda (the one defining
-findSymbolInLogicalDylib) will just search for JIT'd code by calling the
-CompileLayer's findSymbol method. If we don't find a symbol in the JIT itself
-we'll fall back to our second lambda, which implements findSymbol. This will
-use the RTDyldMemoyrManager::getSymbolAddressInProcess method to search for
-the symbol within the program itself. If we can't find a symbol definition
-via either of these paths the JIT will refuse to accept our moudle, returning
-a "symbol not found" error.
+Now we come to the first of our JIT API methods: addModule. This method is
+responsible for adding IR to the JIT and making it available for execution. In
+this initial implementation of our JIT we will make our modules "available for
+execution" by adding them straight to the IRCompileLayer, which will
+immediately compile them. In later chapters we will teach our JIT to be lazier
+and instead add the Modules to a "pending" list to be compiled if and when they
+are first executed.
+
+To add our module to the IRCompileLayer we need to supply two auxiliary objects
+(as well as the module itself): a memory manager and a symbol resolver.  The
+memory manager will be responsible for managing the memory allocated to JIT'd
+machine code, setting memory permissions, and registering exception handling
+tables (if the JIT'd code uses exceptions). For our memory manager we will use
+the SectionMemoryManager class: another off-the-shelf utility that provides all
+the basic functionality we need. The second auxiliary class, the symbol
+resolver, is more interesting for us. It exists to tell the JIT where to look
+when it encounters an *external symbol* in the module we are adding.  External
+symbols are any symbol not defined within the module itself, including calls to
+functions outside the JIT and calls to functions defined in other modules that
+have already been added to the JIT. It may seem as though modules added to the
+JIT should "know about one another" by default, but since we would still have to
+supply a symbol resolver for references to code outside the JIT it turns out to
+be easier to just re-use this one mechanism for all symbol resolution. This has
+the added benefit that the user has full control over the symbol resolution
+process. Should we search for definitions within the JIT first, then fall back
+on external definitions? Or should we prefer external definitions where
+available and only JIT code if we don't already have an available
+implementation? By using a single symbol resolution scheme we are free to choose
+whatever makes the most sense for any given use case.
+
+Building a symbol resolver is made especially easy by the *createLambdaResolver*
+function. This function takes two lambdas [3]_ and returns a
+RuntimeDyld::SymbolResolver instance. The first lambda is used as the
+implementation of the resolver's findSymbolInLogicalDylib method, which searches
+for symbol definitions that should be thought of as being part of the same
+"logical" dynamic library as this Module. If you are familiar with static
+linking: this means that findSymbolInLogicalDylib should expose symbols with
+common linkage and hidden visibility. If all this sounds foreign you can ignore
+the details and just remember that this is the first method that the linker will
+use to try to find a symbol definition. If the findSymbolInLogicalDylib method
+returns a null result then the linker will call the second symbol resolver
+method, called findSymbol, which searches for symbols that should be thought of
+as external to (but visible from) the module and its logical dylib. In this
+tutorial we will adopt the following simple scheme: All modules added to the JIT
+will behave as if they were linked into a single, ever-growing logical dylib. To
+implement this our first lambda (the one defining findSymbolInLogicalDylib) will
+just search for JIT'd code by calling the CompileLayer's findSymbol method. If
+we don't find a symbol in the JIT itself we'll fall back to our second lambda,
+which implements findSymbol. This will use the
+RTDyldMemoryManager::getSymbolAddressInProcess method to search for the symbol
+within the program itself. If we can't find a symbol definition via either of
+these paths the JIT will refuse to accept our module, returning a "symbol not
+found" error.
 
 Now that we've built our symbol resolver we're ready to add our module to the
-JIT. We do this by calling the CompileLayer's addModuleSet method [3]_. Since
+JIT. We do this by calling the CompileLayer's addModuleSet method [4]_. Since
 we only have a single Module and addModuleSet expects a collection, we will
 create a vector of modules and add our module as the only member. Since we
 have already typedef'd our ModuleHandle type to be the same as the
@@ -296,11 +286,34 @@ directly from our addModule method.
     CompileLayer.removeModuleSet(H);
   }
 
-*To be done: describe findSymbol and removeModule -- why do we mangle? what's
-the relationship between findSymbol and resolvers, why remove modules...*
+Now that we can add code to our JIT, we need a way to find the symbols we've
+added to it. To do that we call the findSymbol method on our IRCompileLayer,
+but with a twist: We have to *mangle* the name of the symbol we're searching
+for first. The reason for this is that the ORC JIT components use mangled
+symbols internally the same way a static compiler and linker would, rather
+than using plain IR symbol names. The kind of mangling will depend on the
+DataLayout, which in turn depends on the target platform. To allow us to
+remain portable and search based on the un-mangled name, we just reproduce
+this mangling ourselves.
+
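+In code, the mangling step looks roughly like this (a sketch along the lines
+of the findSymbol implementation in the full listing below, using LLVM's
+Mangler utility and our DataLayout member, DL):
+
+.. code-block:: c++
+
+  JITSymbol findSymbol(const std::string Name) {
+    std::string MangledName;
+    raw_string_ostream MangledNameStream(MangledName);
+    Mangler::getNameWithPrefix(MangledNameStream, Name, DL);
+    return CompileLayer.findSymbol(MangledNameStream.str(), true);
+  }
+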
+We now come to the last method in our JIT API: removeModule. This method is
+responsible for destroying the MemoryManager and SymbolResolver that were
+added with a given module, freeing any resources they were using in the
+process. In our Kaleidoscope demo we rely on this method to remove the module
+representing the most recent top-level expression, preventing it from being
+treated as a duplicate definition when the next top-level expression is
+entered. It is generally good practice to remove any module that you know you
+won't call into again, freeing the resources dedicated to it. However, you
+don't strictly need to do this: All resources will be cleaned up when your
+JIT class is destroyed, if they haven't been freed before then.
+
+This brings us to the end of Chapter 1 of Building a JIT. You now have a basic
+but fully functioning JIT stack that you can use to take LLVM IR and make it
+executable within the context of your JIT process. In the next chapter we'll
+look at how to extend this JIT to produce better quality code, and in the
+process take a deeper look at the ORC layer concept.
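+
+As a quick recap of the API, hypothetical client code might look like the
+following (the module M and the __anon_expr symbol name are assumptions
+borrowed from the Kaleidoscope REPL, not requirements of the JIT):
+
+.. code-block:: c++
+
+  KaleidoscopeJIT JIT;
+
+  // Add a module containing a function named "__anon_expr" to the JIT...
+  auto H = JIT.addModule(std::move(M));
+
+  // ...look the function up by its un-mangled IR name and call it...
+  auto ExprSymbol = JIT.findSymbol("__anon_expr");
+  double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
+  fprintf(stderr, "Evaluated to %f\n", FP());
+
+  // ...then free the module's resources once we are done with it.
+  JIT.removeModule(H);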
 
-*To be done: Conclusion, exercises (maybe a utility for a standalone IR JIT,
-like a mini-LLI), feed to next chapter.*
+`Next: Extending the KaleidoscopeJIT <BuildingAJIT2.html>`_
 
 Full Code Listing
 =================
@@ -320,8 +333,6 @@ Here is the code:
 .. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter1/KaleidoscopeJIT.h
    :language: c++
 
-`Next: Extending the KaleidoscopeJIT <BuildingAJIT2.html>`_
-
 .. [1] Actually we use a cut-down version of KaleidoscopeJIT that makes a
        simplifying assumption: symbols cannot be re-defined. This will make it
        impossible to re-define symbols in the REPL, but will make our symbol
@@ -356,6 +367,9 @@ Here is the code:
        |                       | makes symbols in the host process searchable. |
        +-----------------------+-----------------------------------------------+
 
-.. [3] ORC layers accept sets of Modules, rather than individual ones, so that
+.. [3] Actually they don't have to be lambdas: any object with a call operator
+       will do, including plain old functions or std::functions.
+
+.. [4] ORC layers accept sets of Modules, rather than individual ones, so that
        all Modules in the set could be co-located by the memory manager, though
        this feature is not yet implemented.



