[www-releases] r368037 - Add 8.0.1 LLVM docs
Tom Stellard via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 6 06:51:06 PDT 2019
Added: www-releases/trunk/8.0.1/docs/_sources/Phabricator.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/Phabricator.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/Phabricator.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/Phabricator.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,250 @@
+.. _phabricator-reviews:
+
+=============================
+Code Reviews with Phabricator
+=============================
+
+.. contents::
+ :local:
+
+If you prefer to use a web user interface for code reviews, you can now submit
+your patches for Clang and LLVM at `LLVM's Phabricator`_ instance.
+
+While Phabricator is a useful tool for some, the relevant -commits mailing list
+is the system of record for all LLVM code review. The mailing list should be
+added as a subscriber on all reviews, and Phabricator users should be prepared
+to respond to free-form comments in mail sent to the commits list.
+
+Sign up
+-------
+
+To get started with Phabricator, navigate to `https://reviews.llvm.org`_ and
+click the power icon in the top right. You can register with a GitHub account,
+a Google account, or you can create your own profile.
+
+Make *sure* that the email address registered with Phabricator is subscribed
+to the relevant -commits mailing list. If you are not subscribed to the commit
+list, all mail sent by Phabricator on your behalf will be held for moderation.
+
+Note that if you use your Subversion user name as Phabricator user name,
+Phabricator will automatically connect your submits to your Phabricator user in
+the `Code Repository Browser`_.
+
+Requesting a review via the command line
+----------------------------------------
+
+Phabricator has a tool called *Arcanist* to upload patches from
+the command line. To get you set up, follow the
+`Arcanist Quick Start`_ instructions.
+
+You can learn more about how to use arc to interact with
+Phabricator in the `Arcanist User Guide`_.
+
+.. _phabricator-request-review-web:
+
+Requesting a review via the web interface
+-----------------------------------------
+
+The tool to create and review patches in Phabricator is called
+*Differential*.
+
+Note that you can upload patches created through various diff tools,
+including git and svn. To make reviews easier, please always include
+**as much context as possible** with your diff! Don't worry, Phabricator
+will automatically send a diff with a smaller context in the review
+email, but having the full file in the web interface will help the
+reviewer understand your code.
+
+To get a full diff, use one of the following commands (or just use Arcanist
+to upload your patch):
+
+* ``git show HEAD -U999999 > mypatch.patch``
+* ``git format-patch -U999999 @{u}``
+* ``svn diff --diff-cmd=diff -x -U999999``
+
+To upload a new patch:
+
+* Click *Differential*.
+* Click *+ Create Diff*.
+* Paste the text diff or browse to the patch file. Click *Create Diff*.
+* Leave this first Repository field blank. (We'll fill in the Repository
+ later, when sending the review.)
+* Leave the drop down on *Create a new Revision...* and click *Continue*.
+* Enter a descriptive title and summary. The title and summary are usually
+ in the form of a :ref:`commit message <commit messages>`.
+* Add reviewers (see below for advice). (If you set the Repository field
+ correctly, llvm-commits or cfe-commits will be subscribed automatically;
+ otherwise, you will have to manually subscribe them.)
+* In the Repository field, enter the name of the project (LLVM, Clang,
+ etc.) to which the review should be sent.
+* Click *Save*.
+
+To submit an updated patch:
+
+* Click *Differential*.
+* Click *+ Create Diff*.
+* Paste the updated diff or browse to the updated patch file. Click *Create Diff*.
+* Select the review you want to from the *Attach To* dropdown and click
+ *Continue*.
+* Leave the Repository field blank. (We previously filled out the Repository
+ for the review request.)
+* Add comments about the changes in the new diff. Click *Save*.
+
+Choosing reviewers: You typically pick one or two people as initial reviewers.
+This choice is not crucial, because you are merely suggesting and not requiring
+them to participate. Many people will see the email notification on cfe-commits
+or llvm-commits, and if the subject line suggests the patch is something they
+should look at, they will.
+
+
+.. _finding-potential-reviewers:
+
+Finding potential reviewers
+---------------------------
+
+Here are a couple of ways to pick the initial reviewer(s):
+
+* Use ``svn blame`` and the commit log to find names of people who have
+ recently modified the same area of code that you are modifying.
+* Look in CODE_OWNERS.TXT to see who might be responsible for that area.
+* If you've discussed the change on a dev list, the people who participated
+ might be appropriate reviewers.
+
+Even if you think the code owner is the busiest person in the world, it's still
+okay to put them as a reviewer. Being the code owner means they have accepted
+responsibility for making sure the review happens.
+
+Reviewing code with Phabricator
+-------------------------------
+
+Phabricator allows you to add inline comments as well as overall comments
+to a revision. To add an inline comment, select the lines of code you want
+to comment on by clicking and dragging the line numbers in the diff pane.
+When you have added all your comments, scroll to the bottom of the page and
+click the Submit button.
+
+You can add overall comments in the text box at the bottom of the page.
+When you're done, click the Submit button.
+
+Phabricator has many useful features, for example allowing you to select
+diffs between different versions of the patch as it was reviewed in the
+*Revision Update History*. Most features are self descriptive - explore, and
+if you have a question, drop by on #llvm in IRC to get help.
+
+Note that as e-mail is the system of reference for code reviews, and some
+people prefer it over a web interface, we do not generate automated mail
+when a review changes state, for example by clicking "Accept Revision" in
+the web interface. Thus, please type LGTM into the comment box to accept
+a change from Phabricator.
+
+Committing a change
+-------------------
+
+Once a patch has been reviewed and approved on Phabricator it can then be
+committed to trunk. If you do not have commit access, someone has to
+commit the change for you (with attribution). It is sufficient to add
+a comment to the approved review indicating you cannot commit the patch
+yourself. If you have commit access, there are multiple workflows to commit the
+change. Whichever method you follow it is recommended that your commit message
+ends with the line:
+
+::
+
+ Differential Revision: <URL>
+
+where ``<URL>`` is the URL for the code review, starting with
+``https://reviews.llvm.org/``.
+
+This allows people reading the version history to see the review for
+context. This also allows Phabricator to detect the commit, close the
+review, and add a link from the review to the commit.
+
+Note that if you use the Arcanist tool the ``Differential Revision`` line will
+be added automatically. If you don't want to use Arcanist, you can add the
+``Differential Revision`` line (as the last line) to the commit message
+yourself.
+
+Using the Arcanist tool can simplify the process of committing reviewed code as
+it will retrieve reviewers, the ``Differential Revision``, etc from the review
+and place it in the commit message. You may also commit an accepted change
+directly using ``git llvm push``, per the section in the :ref:`getting started
+guide <commit_from_git>`.
+
+Note that if you commit the change without using Arcanist and forget to add the
+``Differential Revision`` line to your commit message then it is recommended
+that you close the review manually. In the web UI, under "Leap Into Action" put
+the SVN revision number in the Comment, set the Action to "Close Revision" and
+click Submit. Note the review must have been Accepted first.
+
+
+Committing someone's change from Phabricator
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+On a clean Git repository on an up to date ``master`` branch run the
+following (where ``<Revision>`` is the Phabricator review number):
+
+::
+
+ arc patch D<Revision>
+
+
+This will create a new branch called ``arcpatch-D<Revision>`` based on the
+current ``master`` and will create a commit corresponding to ``D<Revision>`` with a
+commit message derived from information in the Phabricator review.
+
+Check you are happy with the commit message and amend it if necessary. Then,
+make sure the commit is up-to-date, and commit it. This can be done by running
+the following:
+
+::
+
+ git pull --rebase origin master
+ git show # Ensure the patch looks correct.
+ ninja check-$whatever # Rerun the appropriate tests if needed.
+ git llvm push
+
+Subversion and Arcanist (deprecated)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To download a change from Phabricator and commit it with subversion, you should
+first make sure you have a clean working directory. Then run the following
+(where ``<Revision>`` is the Phabricator review number):
+
+::
+
+ arc patch D<Revision>
+ arc commit --revision D<Revision>
+
+The first command will take the latest version of the reviewed patch and apply
+it to the working copy. The second command will commit this revision to trunk.
+
+Abandoning a change
+-------------------
+
+If you decide you should not commit the patch, you should explicitly abandon
+the review so that reviewers don't think it is still open. In the web UI,
+scroll to the bottom of the page where normally you would enter an overall
+comment. In the drop-down Action list, which defaults to "Comment," you should
+select "Abandon Revision" and then enter a comment explaining why. Click the
+Submit button to finish closing the review.
+
+Status
+------
+
+Please let us know whether you like it and what could be improved! We're still
+working on setting up a bug tracker, but you can email klimek-at-google-dot-com
+and chandlerc-at-gmail-dot-com and CC the llvm-dev mailing list with questions
+until then. We also could use help implementing improvements. This sadly is
+really painful and hard because the Phabricator codebase is in PHP and not as
+testable as you might like. However, we've put exactly what we're deploying up
+on an `llvm-reviews GitHub project`_ where folks can hack on it and post pull
+requests. We're looking into what the right long-term hosting for this is, but
+note that it is a derivative of an existing open source project, and so not
+trivially a good fit for an official LLVM project.
+
+.. _LLVM's Phabricator: https://reviews.llvm.org
+.. _`https://reviews.llvm.org`: https://reviews.llvm.org
+.. _Code Repository Browser: https://reviews.llvm.org/diffusion/
+.. _Arcanist Quick Start: https://secure.phabricator.com/book/phabricator/article/arcanist_quick_start/
+.. _Arcanist User Guide: https://secure.phabricator.com/book/phabricator/article/arcanist/
+.. _llvm-reviews GitHub project: https://github.com/r4nt/llvm-reviews/
Added: www-releases/trunk/8.0.1/docs/_sources/ProgrammersManual.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/ProgrammersManual.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/ProgrammersManual.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/ProgrammersManual.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,4099 @@
+========================
+LLVM Programmer's Manual
+========================
+
+.. contents::
+ :local:
+
+.. warning::
+ This is always a work in progress.
+
+.. _introduction:
+
+Introduction
+============
+
+This document is meant to highlight some of the important classes and interfaces
+available in the LLVM source-base. This manual is not intended to explain what
+LLVM is, how it works, and what LLVM code looks like. It assumes that you know
+the basics of LLVM and are interested in writing transformations or otherwise
+analyzing or manipulating the code.
+
+This document should get you oriented so that you can find your way in the
+continuously growing source code that makes up the LLVM infrastructure. Note
+that this manual is not intended to serve as a replacement for reading the
+source code, so if you think there should be a method in one of these classes to
+do something, but it's not listed, check the source. Links to the `doxygen
+<http://llvm.org/doxygen/>`__ sources are provided to make this as easy as
+possible.
+
+The first section of this document describes general information that is useful
+to know when working in the LLVM infrastructure, and the second describes the
+Core LLVM classes. In the future this manual will be extended with information
+describing how to use extension libraries, such as dominator information, CFG
+traversal routines, and useful utilities like the ``InstVisitor`` (`doxygen
+<http://llvm.org/doxygen/InstVisitor_8h_source.html>`__) template.
+
+.. _general:
+
+General Information
+===================
+
+This section contains general information that is useful if you are working in
+the LLVM source-base, but that isn't specific to any particular API.
+
+.. _stl:
+
+The C++ Standard Template Library
+---------------------------------
+
+LLVM makes heavy use of the C++ Standard Template Library (STL), perhaps much
+more than you are used to, or have seen before. Because of this, you might want
+to do a little background reading in the techniques used and capabilities of the
+library. There are many good pages that discuss the STL, and several books on
+the subject that you can get, so it will not be discussed in this document.
+
+Here are some useful links:
+
+#. `cppreference.com
+ <http://en.cppreference.com/w/>`_ - an excellent
+ reference for the STL and other parts of the standard C++ library.
+
+#. `C++ In a Nutshell <http://www.tempest-sw.com/cpp/>`_ - This is an O'Reilly
+ book in the making. It has a decent Standard Library Reference that rivals
+ Dinkumware's, and is unfortunately no longer free since the book has been
+ published.
+
+#. `C++ Frequently Asked Questions <http://www.parashift.com/c++-faq-lite/>`_.
+
+#. `SGI's STL Programmer's Guide <http://www.sgi.com/tech/stl/>`_ - Contains a
+ useful `Introduction to the STL
+ <http://www.sgi.com/tech/stl/stl_introduction.html>`_.
+
+#. `Bjarne Stroustrup's C++ Page
+ <http://www.research.att.com/%7Ebs/C++.html>`_.
+
+#. `Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0
+ (even better, get the book)
+ <http://www.mindview.net/Books/TICPP/ThinkingInCPP2e.html>`_.
+
+You are also encouraged to take a look at the :doc:`LLVM Coding Standards
+<CodingStandards>` guide which focuses on how to write maintainable code more
+than where to put your curly braces.
+
+.. _resources:
+
+Other useful references
+-----------------------
+
+#. `Using static and shared libraries across platforms
+ <http://www.fortran-2000.com/ArnaudRecipes/sharedlib.html>`_
+
+.. _apis:
+
+Important and useful LLVM APIs
+==============================
+
+Here we highlight some LLVM APIs that are generally useful and good to know
+about when writing transformations.
+
+.. _isa:
+
+The ``isa<>``, ``cast<>`` and ``dyn_cast<>`` templates
+------------------------------------------------------
+
+The LLVM source-base makes extensive use of a custom form of RTTI. These
+templates have many similarities to the C++ ``dynamic_cast<>`` operator, but
+they don't have some drawbacks (primarily stemming from the fact that
+``dynamic_cast<>`` only works on classes that have a v-table). Because they are
+used so often, you must know what they do and how they work. All of these
+templates are defined in the ``llvm/Support/Casting.h`` (`doxygen
+<http://llvm.org/doxygen/Casting_8h_source.html>`__) file (note that you very
+rarely have to include this file directly).
+
+``isa<>``:
+ The ``isa<>`` operator works exactly like the Java "``instanceof``" operator.
+ It returns true or false depending on whether a reference or pointer points to
+ an instance of the specified class. This can be very useful for constraint
+ checking of various sorts (example below).
+
+``cast<>``:
+ The ``cast<>`` operator is a "checked cast" operation. It converts a pointer
+ or reference from a base class to a derived class, causing an assertion
+ failure if it is not really an instance of the right type. This should be
+ used in cases where you have some information that makes you believe that
+ something is of the right type. An example of the ``isa<>`` and ``cast<>``
+ template is:
+
+ .. code-block:: c++
+
+ static bool isLoopInvariant(const Value *V, const Loop *L) {
+ if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V))
+ return true;
+
+ // Otherwise, it must be an instruction...
+ return !L->contains(cast<Instruction>(V)->getParent());
+ }
+
+ Note that you should **not** use an ``isa<>`` test followed by a ``cast<>``,
+ for that use the ``dyn_cast<>`` operator.
+
+``dyn_cast<>``:
+ The ``dyn_cast<>`` operator is a "checking cast" operation. It checks to see
+ if the operand is of the specified type, and if so, returns a pointer to it
+ (this operator does not work with references). If the operand is not of the
+ correct type, a null pointer is returned. Thus, this works very much like
+ the ``dynamic_cast<>`` operator in C++, and should be used in the same
+ circumstances. Typically, the ``dyn_cast<>`` operator is used in an ``if``
+ statement or some other flow control statement like this:
+
+ .. code-block:: c++
+
+ if (auto *AI = dyn_cast<AllocationInst>(Val)) {
+ // ...
+ }
+
+ This form of the ``if`` statement effectively combines together a call to
+ ``isa<>`` and a call to ``cast<>`` into one statement, which is very
+ convenient.
+
+ Note that the ``dyn_cast<>`` operator, like C++'s ``dynamic_cast<>`` or Java's
+ ``instanceof`` operator, can be abused. In particular, you should not use big
+ chained ``if/then/else`` blocks to check for lots of different variants of
+ classes. If you find yourself wanting to do this, it is much cleaner and more
+ efficient to use the ``InstVisitor`` class to dispatch over the instruction
+ type directly.
+
+``cast_or_null<>``:
+ The ``cast_or_null<>`` operator works just like the ``cast<>`` operator,
+ except that it allows for a null pointer as an argument (which it then
+ propagates). This can sometimes be useful, allowing you to combine several
+ null checks into one.
+
+``dyn_cast_or_null<>``:
+ The ``dyn_cast_or_null<>`` operator works just like the ``dyn_cast<>``
+ operator, except that it allows for a null pointer as an argument (which it
+ then propagates). This can sometimes be useful, allowing you to combine
+ several null checks into one.
+
+These five templates can be used with any classes, whether they have a v-table
+or not. If you want to add support for these templates, see the document
+:doc:`How to set up LLVM-style RTTI for your class hierarchy
+<HowToSetUpLLVMStyleRTTI>`
+
+.. _string_apis:
+
+Passing strings (the ``StringRef`` and ``Twine`` classes)
+---------------------------------------------------------
+
+Although LLVM generally does not do much string manipulation, we do have several
+important APIs which take strings. Two important examples are the Value class
+-- which has names for instructions, functions, etc. -- and the ``StringMap``
+class which is used extensively in LLVM and Clang.
+
+These are generic classes, and they need to be able to accept strings which may
+have embedded null characters. Therefore, they cannot simply take a ``const
+char *``, and taking a ``const std::string&`` requires clients to perform a heap
+allocation which is usually unnecessary. Instead, many LLVM APIs use a
+``StringRef`` or a ``const Twine&`` for passing strings efficiently.
+
+.. _StringRef:
+
+The ``StringRef`` class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``StringRef`` data type represents a reference to a constant string (a
+character array and a length) and supports the common operations available on
+``std::string``, but does not require heap allocation.
+
+It can be implicitly constructed using a C style null-terminated string, an
+``std::string``, or explicitly with a character pointer and length. For
+example, the ``StringRef`` find function is declared as:
+
+.. code-block:: c++
+
+ iterator find(StringRef Key);
+
+and clients can call it using any one of:
+
+.. code-block:: c++
+
+ Map.find("foo"); // Lookup "foo"
+ Map.find(std::string("bar")); // Lookup "bar"
+ Map.find(StringRef("\0baz", 4)); // Lookup "\0baz"
+
+Similarly, APIs which need to return a string may return a ``StringRef``
+instance, which can be used directly or converted to an ``std::string`` using
+the ``str`` member function. See ``llvm/ADT/StringRef.h`` (`doxygen
+<http://llvm.org/doxygen/StringRef_8h_source.html>`__) for more
+information.
+
+You should rarely use the ``StringRef`` class directly, because it contains
+pointers to external memory it is not generally safe to store an instance of the
+class (unless you know that the external storage will not be freed).
+``StringRef`` is small and pervasive enough in LLVM that it should always be
+passed by value.
+
+The ``Twine`` class
+^^^^^^^^^^^^^^^^^^^
+
+The ``Twine`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Twine.html>`__)
+class is an efficient way for APIs to accept concatenated strings. For example,
+a common LLVM paradigm is to name one instruction based on the name of another
+instruction with a suffix, for example:
+
+.. code-block:: c++
+
+ New = CmpInst::Create(..., SO->getName() + ".cmp");
+
+The ``Twine`` class is effectively a lightweight `rope
+<http://en.wikipedia.org/wiki/Rope_(computer_science)>`_ which points to
+temporary (stack allocated) objects. Twines can be implicitly constructed as
+the result of the plus operator applied to strings (i.e., a C strings, an
+``std::string``, or a ``StringRef``). The twine delays the actual concatenation
+of strings until it is actually required, at which point it can be efficiently
+rendered directly into a character array. This avoids unnecessary heap
+allocation involved in constructing the temporary results of string
+concatenation. See ``llvm/ADT/Twine.h`` (`doxygen
+<http://llvm.org/doxygen/Twine_8h_source.html>`__) and :ref:`here <dss_twine>`
+for more information.
+
+As with a ``StringRef``, ``Twine`` objects point to external memory and should
+almost never be stored or mentioned directly. They are intended solely for use
+when defining a function which should be able to efficiently accept concatenated
+strings.
+
+.. _formatting_strings:
+
+Formatting strings (the ``formatv`` function)
+---------------------------------------------
+While LLVM doesn't necessarily do a lot of string manipulation and parsing, it
+does do a lot of string formatting. From diagnostic messages, to llvm tool
+outputs such as ``llvm-readobj`` to printing verbose disassembly listings and
+LLDB runtime logging, the need for string formatting is pervasive.
+
+The ``formatv`` is similar in spirit to ``printf``, but uses a different syntax
+which borrows heavily from Python and C#. Unlike ``printf`` it deduces the type
+to be formatted at compile time, so it does not need a format specifier such as
+``%d``. This reduces the mental overhead of trying to construct portable format
+strings, especially for platform-specific types like ``size_t`` or pointer types.
+Unlike both ``printf`` and Python, it additionally fails to compile if LLVM does
+not know how to format the type. These two properties ensure that the function
+is both safer and simpler to use than traditional formatting methods such as
+the ``printf`` family of functions.
+
+Simple formatting
+^^^^^^^^^^^^^^^^^
+
+A call to ``formatv`` involves a single **format string** consisting of 0 or more
+**replacement sequences**, followed by a variable length list of **replacement values**.
+A replacement sequence is a string of the form ``{N[[,align]:style]}``.
+
+``N`` refers to the 0-based index of the argument from the list of replacement
+values. Note that this means it is possible to reference the same parameter
+multiple times, possibly with different style and/or alignment options, in any order.
+
+``align`` is an optional string specifying the width of the field to format
+the value into, and the alignment of the value within the field. It is specified as
+an optional **alignment style** followed by a positive integral **field width**. The
+alignment style can be one of the characters ``-`` (left align), ``=`` (center align),
+or ``+`` (right align). The default is right aligned.
+
+``style`` is an optional string consisting of a type specific that controls the
+formatting of the value. For example, to format a floating point value as a percentage,
+you can use the style option ``P``.
+
+Custom formatting
+^^^^^^^^^^^^^^^^^
+
+There are two ways to customize the formatting behavior for a type.
+
+1. Provide a template specialization of ``llvm::format_provider<T>`` for your
+ type ``T`` with the appropriate static format method.
+
+ .. code-block:: c++
+
+ namespace llvm {
+ template<>
+ struct format_provider<MyFooBar> {
+ static void format(const MyFooBar &V, raw_ostream &Stream, StringRef Style) {
+ // Do whatever is necessary to format `V` into `Stream`
+ }
+ };
+ void foo() {
+ MyFooBar X;
+ std::string S = formatv("{0}", X);
+ }
+ }
+
+ This is a useful extensibility mechanism for adding support for formatting your own
+ custom types with your own custom Style options. But it does not help when you want
+ to extend the mechanism for formatting a type that the library already knows how to
+ format. For that, we need something else.
+
+2. Provide a **format adapter** inheriting from ``llvm::FormatAdapter<T>``.
+
+ .. code-block:: c++
+
+ namespace anything {
+ struct format_int_custom : public llvm::FormatAdapter<int> {
+ explicit format_int_custom(int N) : llvm::FormatAdapter<int>(N) {}
+ void format(llvm::raw_ostream &Stream, StringRef Style) override {
+ // Do whatever is necessary to format ``this->Item`` into ``Stream``
+ }
+ };
+ }
+ namespace llvm {
+ void foo() {
+ std::string S = formatv("{0}", anything::format_int_custom(42));
+ }
+ }
+
+ If the type is detected to be derived from ``FormatAdapter<T>``, ``formatv``
+ will call the
+ ``format`` method on the argument passing in the specified style. This allows
+ one to provide custom formatting of any type, including one which already has
+ a builtin format provider.
+
+``formatv`` Examples
+^^^^^^^^^^^^^^^^^^^^
+Below is intended to provide an incomplete set of examples demonstrating
+the usage of ``formatv``. More information can be found by reading the
+doxygen documentation or by looking at the unit test suite.
+
+
+.. code-block:: c++
+
+ std::string S;
+ // Simple formatting of basic types and implicit string conversion.
+ S = formatv("{0} ({1:P})", 7, 0.35); // S == "7 (35.00%)"
+
+ // Out-of-order referencing and multi-referencing
+ outs() << formatv("{0} {2} {1} {0}", 1, "test", 3); // prints "1 3 test 1"
+
+ // Left, right, and center alignment
+ S = formatv("{0,7}", 'a'); // S == " a";
+ S = formatv("{0,-7}", 'a'); // S == "a ";
+ S = formatv("{0,=7}", 'a'); // S == " a ";
+ S = formatv("{0,+7}", 'a'); // S == " a";
+
+ // Custom styles
+ S = formatv("{0:N} - {0:x} - {1:E}", 12345, 123908342); // S == "12,345 - 0x3039 - 1.24E8"
+
+ // Adapters
+ S = formatv("{0}", fmt_align(42, AlignStyle::Center, 7)); // S == " 42 "
+ S = formatv("{0}", fmt_repeat("hi", 3)); // S == "hihihi"
+ S = formatv("{0}", fmt_pad("hi", 2, 6)); // S == " hi "
+
+ // Ranges
+ std::vector<int> V = {8, 9, 10};
+ S = formatv("{0}", make_range(V.begin(), V.end())); // S == "8, 9, 10"
+ S = formatv("{0:$[+]}", make_range(V.begin(), V.end())); // S == "8+9+10"
+ S = formatv("{0:$[ + ]@[x]}", make_range(V.begin(), V.end())); // S == "0x8 + 0x9 + 0xA"
+
+.. _error_apis:
+
+Error handling
+--------------
+
+Proper error handling helps us identify bugs in our code, and helps end-users
+understand errors in their tool usage. Errors fall into two broad categories:
+*programmatic* and *recoverable*, with different strategies for handling and
+reporting.
+
+Programmatic Errors
+^^^^^^^^^^^^^^^^^^^
+
+Programmatic errors are violations of program invariants or API contracts, and
+represent bugs within the program itself. Our aim is to document invariants, and
+to abort quickly at the point of failure (providing some basic diagnostic) when
+invariants are broken at runtime.
+
+The fundamental tools for handling programmatic errors are assertions and the
+llvm_unreachable function. Assertions are used to express invariant conditions,
+and should include a message describing the invariant:
+
+.. code-block:: c++
+
+ assert(isPhysReg(R) && "All virt regs should have been allocated already.");
+
+The llvm_unreachable function can be used to document areas of control flow
+that should never be entered if the program invariants hold:
+
+.. code-block:: c++
+
+ enum { Foo, Bar, Baz } X = foo();
+
+ switch (X) {
+ case Foo: /* Handle Foo */; break;
+ case Bar: /* Handle Bar */; break;
+ default:
+ llvm_unreachable("X should be Foo or Bar here");
+ }
+
+Recoverable Errors
+^^^^^^^^^^^^^^^^^^
+
+Recoverable errors represent an error in the program's environment, for example
+a resource failure (a missing file, a dropped network connection, etc.), or
+malformed input. These errors should be detected and communicated to a level of
+the program where they can be handled appropriately. Handling the error may be
+as simple as reporting the issue to the user, or it may involve attempts at
+recovery.
+
+.. note::
+
+ While it would be ideal to use this error handling scheme throughout
+ LLVM, there are places where this hasn't been practical to apply. In
+ situations where you absolutely must emit a non-programmatic error and
+ the ``Error`` model isn't workable you can call ``report_fatal_error``,
+ which will call installed error handlers, print a message, and exit the
+ program.
+
+Recoverable errors are modeled using LLVM's ``Error`` scheme. This scheme
+represents errors using function return values, similar to classic C integer
+error codes, or C++'s ``std::error_code``. However, the ``Error`` class is
+actually a lightweight wrapper for user-defined error types, allowing arbitrary
+information to be attached to describe the error. This is similar to the way C++
+exceptions allow throwing of user-defined types.
+
+Success values are created by calling ``Error::success()``, E.g.:
+
+.. code-block:: c++
+
+ Error foo() {
+ // Do something.
+ // Return success.
+ return Error::success();
+ }
+
+Success values are very cheap to construct and return - they have minimal
+impact on program performance.
+
+Failure values are constructed using ``make_error<T>``, where ``T`` is any class
+that inherits from the ErrorInfo utility, E.g.:
+
+.. code-block:: c++
+
+ class BadFileFormat : public ErrorInfo<BadFileFormat> {
+ public:
+ static char ID;
+ std::string Path;
+
+ BadFileFormat(StringRef Path) : Path(Path.str()) {}
+
+ void log(raw_ostream &OS) const override {
+ OS << Path << " is malformed";
+ }
+
+ std::error_code convertToErrorCode() const override {
+ return make_error_code(object_error::parse_failed);
+ }
+ };
+
+ char BadFileFormat::ID; // This should be declared in the C++ file.
+
+ Error printFormattedFile(StringRef Path) {
+ if (<check for valid format>)
+ return make_error<BadFileFormat>(Path);
+ // print file contents.
+ return Error::success();
+ }
+
+Error values can be implicitly converted to bool: true for error, false for
+success, enabling the following idiom:
+
+.. code-block:: c++
+
+ Error mayFail();
+
+ Error foo() {
+ if (auto Err = mayFail())
+ return Err;
+ // Success! We can proceed.
+ ...
+
+For functions that can fail but need to return a value the ``Expected<T>``
+utility can be used. Values of this type can be constructed with either a
+``T``, or an ``Error``. Expected<T> values are also implicitly convertible to
+boolean, but with the opposite convention to ``Error``: true for success, false
+for error. If success, the ``T`` value can be accessed via the dereference
+operator. If failure, the ``Error`` value can be extracted using the
+``takeError()`` method. Idiomatic usage looks like:
+
+.. code-block:: c++
+
+ Expected<FormattedFile> openFormattedFile(StringRef Path) {
+ // If badly formatted, return an error.
+ if (auto Err = checkFormat(Path))
+ return std::move(Err);
+ // Otherwise return a FormattedFile instance.
+ return FormattedFile(Path);
+ }
+
+ Error processFormattedFile(StringRef Path) {
+ // Try to open a formatted file
+ if (auto FileOrErr = openFormattedFile(Path)) {
+ // On success, grab a reference to the file and continue.
+ auto &File = *FileOrErr;
+ ...
+ } else
+ // On error, extract the Error value and return it.
+ return FileOrErr.takeError();
+ }
+
+If an ``Expected<T>`` value is in success mode then the ``takeError()`` method
+will return a success value. Using this fact, the above function can be
+rewritten as:
+
+.. code-block:: c++
+
+ Error processFormattedFile(StringRef Path) {
+ // Try to open a formatted file
+ auto FileOrErr = openFormattedFile(Path);
+ if (auto Err = FileOrErr.takeError())
+ // On error, extract the Error value and return it.
+ return Err;
+ // On success, grab a reference to the file and continue.
+ auto &File = *FileOrErr;
+ ...
+ }
+
+This second form is often more readable for functions that involve multiple
+``Expected<T>`` values as it limits the indentation required.
+
+All ``Error`` instances, whether success or failure, must be either checked or
+moved from (via ``std::move`` or a return) before they are destructed.
+Accidentally discarding an unchecked error will cause a program abort at the
+point where the unchecked value's destructor is run, making it easy to identify
+and fix violations of this rule.
+
+Success values are considered checked once they have been tested (by invoking
+the boolean conversion operator):
+
+.. code-block:: c++
+
+ if (auto Err = mayFail(...))
+ return Err; // Failure value - move error to caller.
+
+ // Safe to continue: Err was checked.
+
+In contrast, the following code will always cause an abort, even if ``mayFail``
+returns a success value:
+
+.. code-block:: c++
+
+ mayFail();
+ // Program will always abort here, even if mayFail() returns Success, since
+ // the value is not checked.
+
+Failure values are considered checked once a handler for the error type has
+been activated:
+
+.. code-block:: c++
+
+ handleErrors(
+ processFormattedFile(...),
+ [](const BadFileFormat &BFF) {
+ report("Unable to process " + BFF.Path + ": bad format");
+ },
+ [](const FileNotFound &FNF) {
+ report("File not found " + FNF.Path);
+ });
+
+The ``handleErrors`` function takes an error as its first argument, followed by
+a variadic list of "handlers", each of which must be a callable type (a
+function, lambda, or class with a call operator) with one argument. The
+``handleErrors`` function will visit each handler in the sequence and check its
+argument type against the dynamic type of the error, running the first handler
+that matches. This is the same decision process that is used decide which catch
+clause to run for a C++ exception.
+
+Since the list of handlers passed to ``handleErrors`` may not cover every error
+type that can occur, the ``handleErrors`` function also returns an Error value
+that must be checked or propagated. If the error value that is passed to
+``handleErrors`` does not match any of the handlers it will be returned from
+handleErrors. Idiomatic use of ``handleErrors`` thus looks like:
+
+.. code-block:: c++
+
+ if (auto Err =
+ handleErrors(
+ processFormattedFile(...),
+ [](const BadFileFormat &BFF) {
+ report("Unable to process " + BFF.Path + ": bad format");
+ },
+ [](const FileNotFound &FNF) {
+ report("File not found " + FNF.Path);
+ }))
+ return Err;
+
+In cases where you truly know that the handler list is exhaustive the
+``handleAllErrors`` function can be used instead. This is identical to
+``handleErrors`` except that it will terminate the program if an unhandled
+error is passed in, and can therefore return void. The ``handleAllErrors``
+function should generally be avoided: the introduction of a new error type
+elsewhere in the program can easily turn a formerly exhaustive list of errors
+into a non-exhaustive list, risking unexpected program termination. Where
+possible, use handleErrors and propagate unknown errors up the stack instead.
+
+For tool code, where errors can be handled by printing an error message then
+exiting with an error code, the :ref:`ExitOnError <err_exitonerr>` utility
+may be a better choice than handleErrors, as it simplifies control flow when
+calling fallible functions.
+
+In situations where it is known that a particular call to a fallible function
+will always succeed (for example, a call to a function that can only fail on a
+subset of inputs with an input that is known to be safe) the
+:ref:`cantFail <err_cantfail>` functions can be used to remove the error type,
+simplifying control flow.
+
+StringError
+"""""""""""
+
+Many kinds of errors have no recovery strategy, the only action that can be
+taken is to report them to the user so that the user can attempt to fix the
+environment. In this case representing the error as a string makes perfect
+sense. LLVM provides the ``StringError`` class for this purpose. It takes two
+arguments: A string error message, and an equivalent ``std::error_code`` for
+interoperability:
+
+.. code-block:: c++
+
+ make_error<StringError>("Bad executable",
+ make_error_code(errc::executable_format_error"));
+
+If you're certain that the error you're building will never need to be converted
+to a ``std::error_code`` you can use the ``inconvertibleErrorCode()`` function:
+
+.. code-block:: c++
+
+ make_error<StringError>("Bad executable", inconvertibleErrorCode());
+
+This should be done only after careful consideration. If any attempt is made to
+convert this error to a ``std::error_code`` it will trigger immediate program
+termination. Unless you are certain that your errors will not need
+interoperability you should look for an existing ``std::error_code`` that you
+can convert to, and even (as painful as it is) consider introducing a new one as
+a stopgap measure.
+
+Interoperability with std::error_code and ErrorOr
+"""""""""""""""""""""""""""""""""""""""""""""""""
+
+Many existing LLVM APIs use ``std::error_code`` and its partner ``ErrorOr<T>``
+(which plays the same role as ``Expected<T>``, but wraps a ``std::error_code``
+rather than an ``Error``). The infectious nature of error types means that an
+attempt to change one of these functions to return ``Error`` or ``Expected<T>``
+instead often results in an avalanche of changes to callers, callers of callers,
+and so on. (The first such attempt, returning an ``Error`` from
+MachOObjectFile's constructor, was abandoned after the diff reached 3000 lines,
+impacted half a dozen libraries, and was still growing).
+
+To solve this problem, the ``Error``/``std::error_code`` interoperability requirement was
+introduced. Two pairs of functions allow any ``Error`` value to be converted to a
+``std::error_code``, any ``Expected<T>`` to be converted to an ``ErrorOr<T>``, and vice
+versa:
+
+.. code-block:: c++
+
+ std::error_code errorToErrorCode(Error Err);
+ Error errorCodeToError(std::error_code EC);
+
+ template <typename T> ErrorOr<T> expectedToErrorOr(Expected<T> TOrErr);
+ template <typename T> Expected<T> errorOrToExpected(ErrorOr<T> TOrEC);
+
+
+Using these APIs it is easy to make surgical patches that update individual
+functions from ``std::error_code`` to ``Error``, and from ``ErrorOr<T>`` to
+``Expected<T>``.
+
+Returning Errors from error handlers
+""""""""""""""""""""""""""""""""""""
+
+Error recovery attempts may themselves fail. For that reason, ``handleErrors``
+actually recognises three different forms of handler signature:
+
+.. code-block:: c++
+
+ // Error must be handled, no new errors produced:
+ void(UserDefinedError &E);
+
+ // Error must be handled, new errors can be produced:
+ Error(UserDefinedError &E);
+
+ // Original error can be inspected, then re-wrapped and returned (or a new
+ // error can be produced):
+ Error(std::unique_ptr<UserDefinedError> E);
+
+Any error returned from a handler will be returned from the ``handleErrors``
+function so that it can be handled itself, or propagated up the stack.
+
+.. _err_exitonerr:
+
+Using ExitOnError to simplify tool code
+"""""""""""""""""""""""""""""""""""""""
+
+Library code should never call ``exit`` for a recoverable error, however in tool
+code (especially command line tools) this can be a reasonable approach. Calling
+``exit`` upon encountering an error dramatically simplifies control flow as the
+error no longer needs to be propagated up the stack. This allows code to be
+written in straight-line style, as long as each fallible call is wrapped in a
+check and call to exit. The ``ExitOnError`` class supports this pattern by
+providing call operators that inspect ``Error`` values, stripping the error away
+in the success case and logging to ``stderr`` then exiting in the failure case.
+
+To use this class, declare a global ``ExitOnError`` variable in your program:
+
+.. code-block:: c++
+
+ ExitOnError ExitOnErr;
+
+Calls to fallible functions can then be wrapped with a call to ``ExitOnErr``,
+turning them into non-failing calls:
+
+.. code-block:: c++
+
+ Error mayFail();
+ Expected<int> mayFail2();
+
+ void foo() {
+ ExitOnErr(mayFail());
+ int X = ExitOnErr(mayFail2());
+ }
+
+On failure, the error's log message will be written to ``stderr``, optionally
+preceded by a string "banner" that can be set by calling the setBanner method. A
+mapping can also be supplied from ``Error`` values to exit codes using the
+``setExitCodeMapper`` method:
+
+.. code-block:: c++
+
+ int main(int argc, char *argv[]) {
+ ExitOnErr.setBanner(std::string(argv[0]) + " error:");
+ ExitOnErr.setExitCodeMapper(
+ [](const Error &Err) {
+ if (Err.isA<BadFileFormat>())
+ return 2;
+ return 1;
+ });
+
+Use ``ExitOnError`` in your tool code where possible as it can greatly improve
+readability.
+
+.. _err_cantfail:
+
+Using cantFail to simplify safe callsites
+"""""""""""""""""""""""""""""""""""""""""
+
+Some functions may only fail for a subset of their inputs, so calls using known
+safe inputs can be assumed to succeed.
+
+The cantFail functions encapsulate this by wrapping an assertion that their
+argument is a success value and, in the case of Expected<T>, unwrapping the
+T value:
+
+.. code-block:: c++
+
+ Error onlyFailsForSomeXValues(int X);
+ Expected<int> onlyFailsForSomeXValues2(int X);
+
+ void foo() {
+ cantFail(onlyFailsForSomeXValues(KnownSafeValue));
+ int Y = cantFail(onlyFailsForSomeXValues2(KnownSafeValue));
+ ...
+ }
+
+Like the ExitOnError utility, cantFail simplifies control flow. Their treatment
+of error cases is very different however: Where ExitOnError is guaranteed to
+terminate the program on an error input, cantFile simply asserts that the result
+is success. In debug builds this will result in an assertion failure if an error
+is encountered. In release builds the behavior of cantFail for failure values is
+undefined. As such, care must be taken in the use of cantFail: clients must be
+certain that a cantFail wrapped call really can not fail with the given
+arguments.
+
+Use of the cantFail functions should be rare in library code, but they are
+likely to be of more use in tool and unit-test code where inputs and/or
+mocked-up classes or functions may be known to be safe.
+
+Fallible constructors
+"""""""""""""""""""""
+
+Some classes require resource acquisition or other complex initialization that
+can fail during construction. Unfortunately constructors can't return errors,
+and having clients test objects after they're constructed to ensure that they're
+valid is error prone as it's all too easy to forget the test. To work around
+this, use the named constructor idiom and return an ``Expected<T>``:
+
+.. code-block:: c++
+
+ class Foo {
+ public:
+
+ static Expected<Foo> Create(Resource R1, Resource R2) {
+ Error Err;
+ Foo F(R1, R2, Err);
+ if (Err)
+ return std::move(Err);
+ return std::move(F);
+ }
+
+ private:
+
+ Foo(Resource R1, Resource R2, Error &Err) {
+ ErrorAsOutParameter EAO(&Err);
+ if (auto Err2 = R1.acquire()) {
+ Err = std::move(Err2);
+ return;
+ }
+ Err = R2.acquire();
+ }
+ };
+
+
+Here, the named constructor passes an ``Error`` by reference into the actual
+constructor, which the constructor can then use to return errors. The
+``ErrorAsOutParameter`` utility sets the ``Error`` value's checked flag on entry
+to the constructor so that the error can be assigned to, then resets it on exit
+to force the client (the named constructor) to check the error.
+
+By using this idiom, clients attempting to construct a Foo receive either a
+well-formed Foo or an Error, never an object in an invalid state.
+
+Propagating and consuming errors based on types
+"""""""""""""""""""""""""""""""""""""""""""""""
+
+In some contexts, certain types of error are known to be benign. For example,
+when walking an archive, some clients may be happy to skip over badly formatted
+object files rather than terminating the walk immediately. Skipping badly
+formatted objects could be achieved using an elaborate handler method, but the
+Error.h header provides two utilities that make this idiom much cleaner: the
+type inspection method, ``isA``, and the ``consumeError`` function:
+
+.. code-block:: c++
+
+ Error walkArchive(Archive A) {
+ for (unsigned I = 0; I != A.numMembers(); ++I) {
+ auto ChildOrErr = A.getMember(I);
+ if (auto Err = ChildOrErr.takeError()) {
+ if (Err.isA<BadFileFormat>())
+ consumeError(std::move(Err))
+ else
+ return Err;
+ }
+ auto &Child = *ChildOrErr;
+ // Use Child
+ ...
+ }
+ return Error::success();
+ }
+
+Concatenating Errors with joinErrors
+""""""""""""""""""""""""""""""""""""
+
+In the archive walking example above ``BadFileFormat`` errors are simply
+consumed and ignored. If the client had wanted report these errors after
+completing the walk over the archive they could use the ``joinErrors`` utility:
+
+.. code-block:: c++
+
+ Error walkArchive(Archive A) {
+ Error DeferredErrs = Error::success();
+ for (unsigned I = 0; I != A.numMembers(); ++I) {
+ auto ChildOrErr = A.getMember(I);
+ if (auto Err = ChildOrErr.takeError())
+ if (Err.isA<BadFileFormat>())
+ DeferredErrs = joinErrors(std::move(DeferredErrs), std::move(Err));
+ else
+ return Err;
+ auto &Child = *ChildOrErr;
+ // Use Child
+ ...
+ }
+ return DeferredErrs;
+ }
+
+The ``joinErrors`` routine builds a special error type called ``ErrorList``,
+which holds a list of user defined errors. The ``handleErrors`` routine
+recognizes this type and will attempt to handle each of the contained errors in
+order. If all contained errors can be handled, ``handleErrors`` will return
+``Error::success()``, otherwise ``handleErrors`` will concatenate the remaining
+errors and return the resulting ``ErrorList``.
+
+Building fallible iterators and iterator ranges
+"""""""""""""""""""""""""""""""""""""""""""""""
+
+The archive walking examples above retrieve archive members by index, however
+this requires considerable boiler-plate for iteration and error checking. We can
+clean this up by using ``Error`` with the "fallible iterator" pattern. The usual
+C++ iterator patterns do not allow for failure on increment, but we can
+incorporate support for it by having iterators hold an Error reference through
+which they can report failure. In this pattern, if an increment operation fails
+the failure is recorded via the Error reference and the iterator value is set to
+the end of the range in order to terminate the loop. This ensures that the
+dereference operation is safe anywhere that an ordinary iterator dereference
+would be safe (i.e. when the iterator is not equal to end). Where this pattern
+is followed (as in the ``llvm::object::Archive`` class) the result is much
+cleaner iteration idiom:
+
+.. code-block:: c++
+
+ Error Err;
+ for (auto &Child : Ar->children(Err)) {
+ // Use Child - we only enter the loop when it's valid
+ ...
+ }
+ // Check Err after the loop to ensure it didn't break due to an error.
+ if (Err)
+ return Err;
+
+.. _function_apis:
+
+More information on Error and its related utilities can be found in the
+Error.h header file.
+
+Passing functions and other callable objects
+--------------------------------------------
+
+Sometimes you may want a function to be passed a callback object. In order to
+support lambda expressions and other function objects, you should not use the
+traditional C approach of taking a function pointer and an opaque cookie:
+
+.. code-block:: c++
+
+ void takeCallback(bool (*Callback)(Function *, void *), void *Cookie);
+
+Instead, use one of the following approaches:
+
+Function template
+^^^^^^^^^^^^^^^^^
+
+If you don't mind putting the definition of your function into a header file,
+make it a function template that is templated on the callable type.
+
+.. code-block:: c++
+
+ template<typename Callable>
+ void takeCallback(Callable Callback) {
+ Callback(1, 2, 3);
+ }
+
+The ``function_ref`` class template
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``function_ref``
+(`doxygen <http://llvm.org/doxygen/classllvm_1_1function__ref_3_01Ret_07Params_8_8_8_08_4.html>`__) class
+template represents a reference to a callable object, templated over the type
+of the callable. This is a good choice for passing a callback to a function,
+if you don't need to hold onto the callback after the function returns. In this
+way, ``function_ref`` is to ``std::function`` as ``StringRef`` is to
+``std::string``.
+
+``function_ref<Ret(Param1, Param2, ...)>`` can be implicitly constructed from
+any callable object that can be called with arguments of type ``Param1``,
+``Param2``, ..., and returns a value that can be converted to type ``Ret``.
+For example:
+
+.. code-block:: c++
+
+ void visitBasicBlocks(Function *F, function_ref<bool (BasicBlock*)> Callback) {
+ for (BasicBlock &BB : *F)
+ if (Callback(&BB))
+ return;
+ }
+
+can be called using:
+
+.. code-block:: c++
+
+ visitBasicBlocks(F, [&](BasicBlock *BB) {
+ if (process(BB))
+ return isEmpty(BB);
+ return false;
+ });
+
+Note that a ``function_ref`` object contains pointers to external memory, so it
+is not generally safe to store an instance of the class (unless you know that
+the external storage will not be freed). If you need this ability, consider
+using ``std::function``. ``function_ref`` is small enough that it should always
+be passed by value.
+
+.. _DEBUG:
+
+The ``LLVM_DEBUG()`` macro and ``-debug`` option
+------------------------------------------------
+
+Often when working on your pass you will put a bunch of debugging printouts and
+other code into your pass. After you get it working, you want to remove it, but
+you may need it again in the future (to work out new bugs that you run across).
+
+Naturally, because of this, you don't want to delete the debug printouts, but
+you don't want them to always be noisy. A standard compromise is to comment
+them out, allowing you to enable them if you need them in the future.
+
+The ``llvm/Support/Debug.h`` (`doxygen
+<http://llvm.org/doxygen/Debug_8h_source.html>`__) file provides a macro named
+``LLVM_DEBUG()`` that is a much nicer solution to this problem. Basically, you can
+put arbitrary code into the argument of the ``LLVM_DEBUG`` macro, and it is only
+executed if '``opt``' (or any other tool) is run with the '``-debug``' command
+line argument:
+
+.. code-block:: c++
+
+ LLVM_DEBUG(dbgs() << "I am here!\n");
+
+Then you can run your pass like this:
+
+.. code-block:: none
+
+ $ opt < a.bc > /dev/null -mypass
+ <no output>
+ $ opt < a.bc > /dev/null -mypass -debug
+ I am here!
+
+Using the ``LLVM_DEBUG()`` macro instead of a home-brewed solution allows you to not
+have to create "yet another" command line option for the debug output for your
+pass. Note that ``LLVM_DEBUG()`` macros are disabled for non-asserts builds, so they
+do not cause a performance impact at all (for the same reason, they should also
+not contain side-effects!).
+
+One additional nice thing about the ``LLVM_DEBUG()`` macro is that you can enable or
+disable it directly in gdb. Just use "``set DebugFlag=0``" or "``set
+DebugFlag=1``" from the gdb if the program is running. If the program hasn't
+been started yet, you can always just run it with ``-debug``.
+
+.. _DEBUG_TYPE:
+
+Fine grained debug info with ``DEBUG_TYPE`` and the ``-debug-only`` option
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Sometimes you may find yourself in a situation where enabling ``-debug`` just
+turns on **too much** information (such as when working on the code generator).
+If you want to enable debug information with more fine-grained control, you
+should define the ``DEBUG_TYPE`` macro and use the ``-debug-only`` option as
+follows:
+
+.. code-block:: c++
+
+ #define DEBUG_TYPE "foo"
+ LLVM_DEBUG(dbgs() << "'foo' debug type\n");
+ #undef DEBUG_TYPE
+ #define DEBUG_TYPE "bar"
+ LLVM_DEBUG(dbgs() << "'bar' debug type\n");
+ #undef DEBUG_TYPE
+
+Then you can run your pass like this:
+
+.. code-block:: none
+
+ $ opt < a.bc > /dev/null -mypass
+ <no output>
+ $ opt < a.bc > /dev/null -mypass -debug
+ 'foo' debug type
+ 'bar' debug type
+ $ opt < a.bc > /dev/null -mypass -debug-only=foo
+ 'foo' debug type
+ $ opt < a.bc > /dev/null -mypass -debug-only=bar
+ 'bar' debug type
+ $ opt < a.bc > /dev/null -mypass -debug-only=foo,bar
+ 'foo' debug type
+ 'bar' debug type
+
+Of course, in practice, you should only set ``DEBUG_TYPE`` at the top of a file,
+to specify the debug type for the entire module. Be careful that you only do
+this after including Debug.h and not around any #include of headers. Also, you
+should use names more meaningful than "foo" and "bar", because there is no
+system in place to ensure that names do not conflict. If two different modules
+use the same string, they will all be turned on when the name is specified.
+This allows, for example, all debug information for instruction scheduling to be
+enabled with ``-debug-only=InstrSched``, even if the source lives in multiple
+files. The name must not include a comma (,) as that is used to separate the
+arguments of the ``-debug-only`` option.
+
+For performance reasons, -debug-only is not available in optimized build
+(``--enable-optimized``) of LLVM.
+
+The ``DEBUG_WITH_TYPE`` macro is also available for situations where you would
+like to set ``DEBUG_TYPE``, but only for one specific ``DEBUG`` statement. It
+takes an additional first parameter, which is the type to use. For example, the
+preceding example could be written as:
+
+.. code-block:: c++
+
+ DEBUG_WITH_TYPE("foo", dbgs() << "'foo' debug type\n");
+ DEBUG_WITH_TYPE("bar", dbgs() << "'bar' debug type\n");
+
+.. _Statistic:
+
+The ``Statistic`` class & ``-stats`` option
+-------------------------------------------
+
+The ``llvm/ADT/Statistic.h`` (`doxygen
+<http://llvm.org/doxygen/Statistic_8h_source.html>`__) file provides a class
+named ``Statistic`` that is used as a unified way to keep track of what the LLVM
+compiler is doing and how effective various optimizations are. It is useful to
+see what optimizations are contributing to making a particular program run
+faster.
+
+Often you may run your pass on some big program, and you're interested to see
+how many times it makes a certain transformation. Although you can do this with
+hand inspection, or some ad-hoc method, this is a real pain and not very useful
+for big programs. Using the ``Statistic`` class makes it very easy to keep
+track of this information, and the calculated information is presented in a
+uniform manner with the rest of the passes being executed.
+
+There are many examples of ``Statistic`` uses, but the basics of using it are as
+follows:
+
+Define your statistic like this:
+
+.. code-block:: c++
+
+ #define DEBUG_TYPE "mypassname" // This goes before any #includes.
+ STATISTIC(NumXForms, "The # of times I did stuff");
+
+The ``STATISTIC`` macro defines a static variable, whose name is specified by
+the first argument. The pass name is taken from the ``DEBUG_TYPE`` macro, and
+the description is taken from the second argument. The variable defined
+("NumXForms" in this case) acts like an unsigned integer.
+
+Whenever you make a transformation, bump the counter:
+
+.. code-block:: c++
+
+ ++NumXForms; // I did stuff!
+
+That's all you have to do. To get '``opt``' to print out the statistics
+gathered, use the '``-stats``' option:
+
+.. code-block:: none
+
+ $ opt -stats -mypassname < program.bc > /dev/null
+ ... statistics output ...
+
+Note that in order to use the '``-stats``' option, LLVM must be
+compiled with assertions enabled.
+
+When running ``opt`` on a C file from the SPEC benchmark suite, it gives a
+report that looks like this:
+
+.. code-block:: none
+
+ 7646 bitcodewriter - Number of normal instructions
+ 725 bitcodewriter - Number of oversized instructions
+ 129996 bitcodewriter - Number of bitcode bytes written
+ 2817 raise - Number of insts DCEd or constprop'd
+ 3213 raise - Number of cast-of-self removed
+ 5046 raise - Number of expression trees converted
+ 75 raise - Number of other getelementptr's formed
+ 138 raise - Number of load/store peepholes
+ 42 deadtypeelim - Number of unused typenames removed from symtab
+ 392 funcresolve - Number of varargs functions resolved
+ 27 globaldce - Number of global variables removed
+ 2 adce - Number of basic blocks removed
+ 134 cee - Number of branches revectored
+ 49 cee - Number of setcc instruction eliminated
+ 532 gcse - Number of loads removed
+ 2919 gcse - Number of instructions removed
+ 86 indvars - Number of canonical indvars added
+ 87 indvars - Number of aux indvars removed
+ 25 instcombine - Number of dead inst eliminate
+ 434 instcombine - Number of insts combined
+ 248 licm - Number of load insts hoisted
+ 1298 licm - Number of insts hoisted to a loop pre-header
+ 3 licm - Number of insts hoisted to multiple loop preds (bad, no loop pre-header)
+ 75 mem2reg - Number of alloca's promoted
+ 1444 cfgsimplify - Number of blocks simplified
+
+Obviously, with so many optimizations, having a unified framework for this stuff
+is very nice. Making your pass fit well into the framework makes it more
+maintainable and useful.
+
+.. _DebugCounters:
+
+Adding debug counters to aid in debugging your code
+---------------------------------------------------
+
+Sometimes, when writing new passes, or trying to track down bugs, it
+is useful to be able to control whether certain things in your pass
+happen or not. For example, there are times the minimization tooling
+can only easily give you large testcases. You would like to narrow
+your bug down to a specific transformation happening or not happening,
+automatically, using bisection. This is where debug counters help.
+They provide a framework for making parts of your code only execute a
+certain number of times.
+
+The ``llvm/Support/DebugCounter.h`` (`doxygen
+<http://llvm.org/doxygen/DebugCounter_8h_source.html>`__) file
+provides a class named ``DebugCounter`` that can be used to create
+command line counter options that control execution of parts of your code.
+
+Define your DebugCounter like this:
+
+.. code-block:: c++
+
+ DEBUG_COUNTER(DeleteAnInstruction, "passname-delete-instruction",
+ "Controls which instructions get delete");
+
+The ``DEBUG_COUNTER`` macro defines a static variable, whose name
+is specified by the first argument. The name of the counter
+(which is used on the command line) is specified by the second
+argument, and the description used in the help is specified by the
+third argument.
+
+Whatever code you want that control, use ``DebugCounter::shouldExecute`` to control it.
+
+.. code-block:: c++
+
+ if (DebugCounter::shouldExecute(DeleteAnInstruction))
+ I->eraseFromParent();
+
+That's all you have to do. Now, using opt, you can control when this code triggers using
+the '``--debug-counter``' option. There are two counters provided, ``skip`` and ``count``.
+``skip`` is the number of times to skip execution of the codepath. ``count`` is the number
+of times, once we are done skipping, to execute the codepath.
+
+.. code-block:: none
+
+ $ opt --debug-counter=passname-delete-instruction-skip=1,passname-delete-instruction-count=2 -passname
+
+This will skip the above code the first time we hit it, then execute it twice, then skip the rest of the executions.
+
+So if executed on the following code:
+
+.. code-block:: llvm
+
+ %1 = add i32 %a, %b
+ %2 = add i32 %a, %b
+ %3 = add i32 %a, %b
+ %4 = add i32 %a, %b
+
+It would delete number ``%2`` and ``%3``.
+
+A utility is provided in `utils/bisect-skip-count` to binary search
+skip and count arguments. It can be used to automatically minimize the
+skip and count for a debug-counter variable.
+
+.. _ViewGraph:
+
+Viewing graphs while debugging code
+-----------------------------------
+
+Several of the important data structures in LLVM are graphs: for example CFGs
+made out of LLVM :ref:`BasicBlocks <BasicBlock>`, CFGs made out of LLVM
+:ref:`MachineBasicBlocks <MachineBasicBlock>`, and :ref:`Instruction Selection
+DAGs <SelectionDAG>`. In many cases, while debugging various parts of the
+compiler, it is nice to instantly visualize these graphs.
+
+LLVM provides several callbacks that are available in a debug build to do
+exactly that. If you call the ``Function::viewCFG()`` method, for example, the
+current LLVM tool will pop up a window containing the CFG for the function where
+each basic block is a node in the graph, and each node contains the instructions
+in the block. Similarly, there also exists ``Function::viewCFGOnly()`` (does
+not include the instructions), the ``MachineFunction::viewCFG()`` and
+``MachineFunction::viewCFGOnly()``, and the ``SelectionDAG::viewGraph()``
+methods. Within GDB, for example, you can usually use something like ``call
+DAG.viewGraph()`` to pop up a window. Alternatively, you can sprinkle calls to
+these functions in your code in places you want to debug.
+
+Getting this to work requires a small amount of setup. On Unix systems
+with X11, install the `graphviz <http://www.graphviz.org>`_ toolkit, and make
+sure 'dot' and 'gv' are in your path. If you are running on Mac OS X, download
+and install the Mac OS X `Graphviz program
+<http://www.pixelglow.com/graphviz/>`_ and add
+``/Applications/Graphviz.app/Contents/MacOS/`` (or wherever you install it) to
+your path. The programs need not be present when configuring, building or
+running LLVM and can simply be installed when needed during an active debug
+session.
+
+``SelectionDAG`` has been extended to make it easier to locate *interesting*
+nodes in large complex graphs. From gdb, if you ``call DAG.setGraphColor(node,
+"color")``, then the next ``call DAG.viewGraph()`` would highlight the node in
+the specified color (choices of colors can be found at `colors
+<http://www.graphviz.org/doc/info/colors.html>`_.) More complex node attributes
+can be provided with ``call DAG.setGraphAttrs(node, "attributes")`` (choices can
+be found at `Graph attributes <http://www.graphviz.org/doc/info/attrs.html>`_.)
+If you want to restart and clear all the current graph attributes, then you can
+``call DAG.clearGraphAttrs()``.
+
+Note that graph visualization features are compiled out of Release builds to
+reduce file size. This means that you need a Debug+Asserts or Release+Asserts
+build to use these features.
+
+.. _datastructure:
+
+Picking the Right Data Structure for a Task
+===========================================
+
+LLVM has a plethora of data structures in the ``llvm/ADT/`` directory, and we
+commonly use STL data structures. This section describes the trade-offs you
+should consider when you pick one.
+
+The first step is a choose your own adventure: do you want a sequential
+container, a set-like container, or a map-like container? The most important
+thing when choosing a container is the algorithmic properties of how you plan to
+access the container. Based on that, you should use:
+
+
+* a :ref:`map-like <ds_map>` container if you need efficient look-up of a
+ value based on another value. Map-like containers also support efficient
+ queries for containment (whether a key is in the map). Map-like containers
+ generally do not support efficient reverse mapping (values to keys). If you
+ need that, use two maps. Some map-like containers also support efficient
+ iteration through the keys in sorted order. Map-like containers are the most
+ expensive sort, only use them if you need one of these capabilities.
+
+* a :ref:`set-like <ds_set>` container if you need to put a bunch of stuff into
+ a container that automatically eliminates duplicates. Some set-like
+ containers support efficient iteration through the elements in sorted order.
+ Set-like containers are more expensive than sequential containers.
+
+* a :ref:`sequential <ds_sequential>` container provides the most efficient way
+ to add elements and keeps track of the order they are added to the collection.
+ They permit duplicates and support efficient iteration, but do not support
+ efficient look-up based on a key.
+
+* a :ref:`string <ds_string>` container is a specialized sequential container or
+ reference structure that is used for character or byte arrays.
+
+* a :ref:`bit <ds_bit>` container provides an efficient way to store and
+ perform set operations on sets of numeric id's, while automatically
+ eliminating duplicates. Bit containers require a maximum of 1 bit for each
+ identifier you want to store.
+
+Once the proper category of container is determined, you can fine tune the
+memory use, constant factors, and cache behaviors of access by intelligently
+picking a member of the category. Note that constant factors and cache behavior
+can be a big deal. If you have a vector that usually only contains a few
+elements (but could contain many), for example, it's much better to use
+:ref:`SmallVector <dss_smallvector>` than :ref:`vector <dss_vector>`. Doing so
+avoids (relatively) expensive malloc/free calls, which dwarf the cost of adding
+the elements to the container.
+
+.. _ds_sequential:
+
+Sequential Containers (std::vector, std::list, etc)
+---------------------------------------------------
+
+There are a variety of sequential containers available for you, based on your
+needs. Pick the first in this section that will do what you want.
+
+.. _dss_arrayref:
+
+llvm/ADT/ArrayRef.h
+^^^^^^^^^^^^^^^^^^^
+
+The ``llvm::ArrayRef`` class is the preferred class to use in an interface that
+accepts a sequential list of elements in memory and just reads from them. By
+taking an ``ArrayRef``, the API can be passed a fixed size array, an
+``std::vector``, an ``llvm::SmallVector`` and anything else that is contiguous
+in memory.
+
+.. _dss_fixedarrays:
+
+Fixed Size Arrays
+^^^^^^^^^^^^^^^^^
+
+Fixed size arrays are very simple and very fast. They are good if you know
+exactly how many elements you have, or you have a (low) upper bound on how many
+you have.
+
+.. _dss_heaparrays:
+
+Heap Allocated Arrays
+^^^^^^^^^^^^^^^^^^^^^
+
+Heap allocated arrays (``new[]`` + ``delete[]``) are also simple. They are good
+if the number of elements is variable, if you know how many elements you will
+need before the array is allocated, and if the array is usually large (if not,
+consider a :ref:`SmallVector <dss_smallvector>`). The cost of a heap allocated
+array is the cost of the new/delete (aka malloc/free). Also note that if you
+are allocating an array of a type with a constructor, the constructor and
+destructors will be run for every element in the array (re-sizable vectors only
+construct those elements actually used).
+
+.. _dss_tinyptrvector:
+
+llvm/ADT/TinyPtrVector.h
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+``TinyPtrVector<Type>`` is a highly specialized collection class that is
+optimized to avoid allocation in the case when a vector has zero or one
+elements. It has two major restrictions: 1) it can only hold values of pointer
+type, and 2) it cannot hold a null pointer.
+
+Since this container is highly specialized, it is rarely used.
+
+.. _dss_smallvector:
+
+llvm/ADT/SmallVector.h
+^^^^^^^^^^^^^^^^^^^^^^
+
+``SmallVector<Type, N>`` is a simple class that looks and smells just like
+``vector<Type>``: it supports efficient iteration, lays out elements in memory
+order (so you can do pointer arithmetic between elements), supports efficient
+push_back/pop_back operations, supports efficient random access to its elements,
+etc.
+
+The main advantage of SmallVector is that it allocates space for some number of
+elements (N) **in the object itself**. Because of this, if the SmallVector is
+dynamically smaller than N, no malloc is performed. This can be a big win in
+cases where the malloc/free call is far more expensive than the code that
+fiddles around with the elements.
+
+This is good for vectors that are "usually small" (e.g. the number of
+predecessors/successors of a block is usually less than 8). On the other hand,
+this makes the size of the SmallVector itself large, so you don't want to
+allocate lots of them (doing so will waste a lot of space). As such,
+SmallVectors are most useful when on the stack.
+
+SmallVector also provides a nice portable and efficient replacement for
+``alloca``.
+
+SmallVector has grown a few other minor advantages over std::vector, causing
+``SmallVector<Type, 0>`` to be preferred over ``std::vector<Type>``.
+
+#. std::vector is exception-safe, and some implementations have pessimizations
+ that copy elements when SmallVector would move them.
+
+#. SmallVector understands ``isPodLike<Type>`` and uses realloc aggressively.
+
+#. Many LLVM APIs take a SmallVectorImpl as an out parameter (see the note
+ below).
+
+#. SmallVector with N equal to 0 is smaller than std::vector on 64-bit
+ platforms, since it uses ``unsigned`` (instead of ``void*``) for its size
+ and capacity.
+
+.. note::
+
+ Prefer to use ``SmallVectorImpl<T>`` as a parameter type.
+
+ In APIs that don't care about the "small size" (most?), prefer to use
+ the ``SmallVectorImpl<T>`` class, which is basically just the "vector
+ header" (and methods) without the elements allocated after it. Note that
+ ``SmallVector<T, N>`` inherits from ``SmallVectorImpl<T>`` so the
+ conversion is implicit and costs nothing. E.g.
+
+ .. code-block:: c++
+
+ // BAD: Clients cannot pass e.g. SmallVector<Foo, 4>.
+ hardcodedSmallSize(SmallVector<Foo, 2> &Out);
+ // GOOD: Clients can pass any SmallVector<Foo, N>.
+ allowsAnySmallSize(SmallVectorImpl<Foo> &Out);
+
+ void someFunc() {
+ SmallVector<Foo, 8> Vec;
+ hardcodedSmallSize(Vec); // Error.
+ allowsAnySmallSize(Vec); // Works.
+ }
+
+ Even though it has "``Impl``" in the name, this is so widely used that
+ it really isn't "private to the implementation" anymore. A name like
+ ``SmallVectorHeader`` would be more appropriate.
+
+.. _dss_vector:
+
+<vector>
+^^^^^^^^
+
+``std::vector<T>`` is well loved and respected. However, ``SmallVector<T, 0>``
+is often a better option due to the advantages listed above. std::vector is
+still useful when you need to store more than ``UINT32_MAX`` elements or when
+interfacing with code that expects vectors :).
+
+One worthwhile note about std::vector: avoid code like this:
+
+.. code-block:: c++
+
+ for ( ... ) {
+ std::vector<foo> V;
+ // make use of V.
+ }
+
+Instead, write this as:
+
+.. code-block:: c++
+
+ std::vector<foo> V;
+ for ( ... ) {
+ // make use of V.
+ V.clear();
+ }
+
+Doing so will save (at least) one heap allocation and free per iteration of the
+loop.
+
+.. _dss_deque:
+
+<deque>
+^^^^^^^
+
+``std::deque`` is, in some senses, a generalized version of ``std::vector``.
+Like ``std::vector``, it provides constant time random access and other similar
+properties, but it also provides efficient access to the front of the list. It
+does not guarantee continuity of elements within memory.
+
+In exchange for this extra flexibility, ``std::deque`` has significantly higher
+constant factor costs than ``std::vector``. If possible, use ``std::vector`` or
+something cheaper.
+
+.. _dss_list:
+
+<list>
+^^^^^^
+
+``std::list`` is an extremely inefficient class that is rarely useful. It
+performs a heap allocation for every element inserted into it, thus having an
+extremely high constant factor, particularly for small data types.
+``std::list`` also only supports bidirectional iteration, not random access
+iteration.
+
+In exchange for this high cost, std::list supports efficient access to both ends
+of the list (like ``std::deque``, but unlike ``std::vector`` or
+``SmallVector``). In addition, the iterator invalidation characteristics of
+std::list are stronger than that of a vector class: inserting or removing an
+element into the list does not invalidate iterator or pointers to other elements
+in the list.
+
+.. _dss_ilist:
+
+llvm/ADT/ilist.h
+^^^^^^^^^^^^^^^^
+
+``ilist<T>`` implements an 'intrusive' doubly-linked list. It is intrusive,
+because it requires the element to store and provide access to the prev/next
+pointers for the list.
+
+``ilist`` has the same drawbacks as ``std::list``, and additionally requires an
+``ilist_traits`` implementation for the element type, but it provides some novel
+characteristics. In particular, it can efficiently store polymorphic objects,
+the traits class is informed when an element is inserted or removed from the
+list, and ``ilist``\ s are guaranteed to support a constant-time splice
+operation.
+
+These properties are exactly what we want for things like ``Instruction``\ s and
+basic blocks, which is why these are implemented with ``ilist``\ s.
+
+Related classes of interest are explained in the following subsections:
+
+* :ref:`ilist_traits <dss_ilist_traits>`
+
+* :ref:`iplist <dss_iplist>`
+
+* :ref:`llvm/ADT/ilist_node.h <dss_ilist_node>`
+
+* :ref:`Sentinels <dss_ilist_sentinel>`
+
+.. _dss_packedvector:
+
+llvm/ADT/PackedVector.h
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Useful for storing a vector of values using only a few number of bits for each
+value. Apart from the standard operations of a vector-like container, it can
+also perform an 'or' set operation.
+
+For example:
+
+.. code-block:: c++
+
+ enum State {
+ None = 0x0,
+ FirstCondition = 0x1,
+ SecondCondition = 0x2,
+ Both = 0x3
+ };
+
+ State get() {
+ PackedVector<State, 2> Vec1;
+ Vec1.push_back(FirstCondition);
+
+ PackedVector<State, 2> Vec2;
+ Vec2.push_back(SecondCondition);
+
+ Vec1 |= Vec2;
+ return Vec1[0]; // returns 'Both'.
+ }
+
+.. _dss_ilist_traits:
+
+ilist_traits
+^^^^^^^^^^^^
+
+``ilist_traits<T>`` is ``ilist<T>``'s customization mechanism. ``iplist<T>``
+(and consequently ``ilist<T>``) publicly derive from this traits class.
+
+.. _dss_iplist:
+
+iplist
+^^^^^^
+
+``iplist<T>`` is ``ilist<T>``'s base and as such supports a slightly narrower
+interface. Notably, inserters from ``T&`` are absent.
+
+``ilist_traits<T>`` is a public base of this class and can be used for a wide
+variety of customizations.
+
+.. _dss_ilist_node:
+
+llvm/ADT/ilist_node.h
+^^^^^^^^^^^^^^^^^^^^^
+
+``ilist_node<T>`` implements the forward and backward links that are expected
+by the ``ilist<T>`` (and analogous containers) in the default manner.
+
+``ilist_node<T>``\ s are meant to be embedded in the node type ``T``, usually
+``T`` publicly derives from ``ilist_node<T>``.
+
+.. _dss_ilist_sentinel:
+
+Sentinels
+^^^^^^^^^
+
+``ilist``\ s have another specialty that must be considered. To be a good
+citizen in the C++ ecosystem, it needs to support the standard container
+operations, such as ``begin`` and ``end`` iterators, etc. Also, the
+``operator--`` must work correctly on the ``end`` iterator in the case of
+non-empty ``ilist``\ s.
+
+The only sensible solution to this problem is to allocate a so-called *sentinel*
+along with the intrusive list, which serves as the ``end`` iterator, providing
+the back-link to the last element. However conforming to the C++ convention it
+is illegal to ``operator++`` beyond the sentinel and it also must not be
+dereferenced.
+
+These constraints allow for some implementation freedom to the ``ilist`` how to
+allocate and store the sentinel. The corresponding policy is dictated by
+``ilist_traits<T>``. By default a ``T`` gets heap-allocated whenever the need
+for a sentinel arises.
+
+While the default policy is sufficient in most cases, it may break down when
+``T`` does not provide a default constructor. Also, in the case of many
+instances of ``ilist``\ s, the memory overhead of the associated sentinels is
+wasted. To alleviate the situation with numerous and voluminous
+``T``-sentinels, sometimes a trick is employed, leading to *ghostly sentinels*.
+
+Ghostly sentinels are obtained by specially-crafted ``ilist_traits<T>`` which
+superpose the sentinel with the ``ilist`` instance in memory. Pointer
+arithmetic is used to obtain the sentinel, which is relative to the ``ilist``'s
+``this`` pointer. The ``ilist`` is augmented by an extra pointer, which serves
+as the back-link of the sentinel. This is the only field in the ghostly
+sentinel which can be legally accessed.
+
+.. _dss_other:
+
+Other Sequential Container options
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Other STL containers are available, such as ``std::string``.
+
+There are also various STL adapter classes such as ``std::queue``,
+``std::priority_queue``, ``std::stack``, etc. These provide simplified access
+to an underlying container but don't affect the cost of the container itself.
+
+.. _ds_string:
+
+String-like containers
+----------------------
+
+There are a variety of ways to pass around and use strings in C and C++, and
+LLVM adds a few new options to choose from. Pick the first option on this list
+that will do what you need, they are ordered according to their relative cost.
+
+Note that it is generally preferred to *not* pass strings around as ``const
+char*``'s. These have a number of problems, including the fact that they
+cannot represent embedded nul ("\0") characters, and do not have a length
+available efficiently. The general replacement for '``const char*``' is
+StringRef.
+
+For more information on choosing string containers for APIs, please see
+:ref:`Passing Strings <string_apis>`.
+
+.. _dss_stringref:
+
+llvm/ADT/StringRef.h
+^^^^^^^^^^^^^^^^^^^^
+
+The StringRef class is a simple value class that contains a pointer to a
+character and a length, and is quite related to the :ref:`ArrayRef
+<dss_arrayref>` class (but specialized for arrays of characters). Because
+StringRef carries a length with it, it safely handles strings with embedded nul
+characters in it, getting the length does not require a strlen call, and it even
+has very convenient APIs for slicing and dicing the character range that it
+represents.
+
+StringRef is ideal for passing simple strings around that are known to be live,
+either because they are C string literals, std::string, a C array, or a
+SmallVector. Each of these cases has an efficient implicit conversion to
+StringRef, which doesn't result in a dynamic strlen being executed.
+
+StringRef has a few major limitations which make more powerful string containers
+useful:
+
+#. You cannot directly convert a StringRef to a 'const char*' because there is
+ no way to add a trailing nul (unlike the .c_str() method on various stronger
+ classes).
+
+#. StringRef doesn't own or keep alive the underlying string bytes.
+ As such it can easily lead to dangling pointers, and is not suitable for
+ embedding in datastructures in most cases (instead, use an std::string or
+ something like that).
+
+#. For the same reason, StringRef cannot be used as the return value of a
+ method if the method "computes" the result string. Instead, use std::string.
+
+#. StringRef's do not allow you to mutate the pointed-to string bytes and it
+ doesn't allow you to insert or remove bytes from the range. For editing
+ operations like this, it interoperates with the :ref:`Twine <dss_twine>`
+ class.
+
+Because of its strengths and limitations, it is very common for a function to
+take a StringRef and for a method on an object to return a StringRef that points
+into some string that it owns.
+
+.. _dss_twine:
+
+llvm/ADT/Twine.h
+^^^^^^^^^^^^^^^^
+
+The Twine class is used as an intermediary datatype for APIs that want to take a
+string that can be constructed inline with a series of concatenations. Twine
+works by forming recursive instances of the Twine datatype (a simple value
+object) on the stack as temporary objects, linking them together into a tree
+which is then linearized when the Twine is consumed. Twine is only safe to use
+as the argument to a function, and should always be a const reference, e.g.:
+
+.. code-block:: c++
+
+ void foo(const Twine &T);
+ ...
+ StringRef X = ...
+ unsigned i = ...
+ foo(X + "." + Twine(i));
+
+This example forms a string like "blarg.42" by concatenating the values
+together, and does not form intermediate strings containing "blarg" or "blarg.".
+
+Because Twine is constructed with temporary objects on the stack, and because
+these instances are destroyed at the end of the current statement, it is an
+inherently dangerous API. For example, this simple variant contains undefined
+behavior and will probably crash:
+
+.. code-block:: c++
+
+ void foo(const Twine &T);
+ ...
+ StringRef X = ...
+ unsigned i = ...
+ const Twine &Tmp = X + "." + Twine(i);
+ foo(Tmp);
+
+... because the temporaries are destroyed before the call. That said, Twine's
+are much more efficient than intermediate std::string temporaries, and they work
+really well with StringRef. Just be aware of their limitations.
+
+.. _dss_smallstring:
+
+llvm/ADT/SmallString.h
+^^^^^^^^^^^^^^^^^^^^^^
+
+SmallString is a subclass of :ref:`SmallVector <dss_smallvector>` that adds some
+convenience APIs like += that takes StringRef's. SmallString avoids allocating
+memory in the case when the preallocated space is enough to hold its data, and
+it calls back to general heap allocation when required. Since it owns its data,
+it is very safe to use and supports full mutation of the string.
+
+Like SmallVector's, the big downside to SmallString is their sizeof. While they
+are optimized for small strings, they themselves are not particularly small.
+This means that they work great for temporary scratch buffers on the stack, but
+should not generally be put into the heap: it is very rare to see a SmallString
+as the member of a frequently-allocated heap data structure or returned
+by-value.
+
+.. _dss_stdstring:
+
+std::string
+^^^^^^^^^^^
+
+The standard C++ std::string class is a very general class that (like
+SmallString) owns its underlying data. sizeof(std::string) is very reasonable
+so it can be embedded into heap data structures and returned by-value. On the
+other hand, std::string is highly inefficient for inline editing (e.g.
+concatenating a bunch of stuff together) and because it is provided by the
+standard library, its performance characteristics depend a lot of the host
+standard library (e.g. libc++ and MSVC provide a highly optimized string class,
+GCC contains a really slow implementation).
+
+The major disadvantage of std::string is that almost every operation that makes
+them larger can allocate memory, which is slow. As such, it is better to use
+SmallVector or Twine as a scratch buffer, but then use std::string to persist
+the result.
+
+.. _ds_set:
+
+Set-Like Containers (std::set, SmallSet, SetVector, etc)
+--------------------------------------------------------
+
+Set-like containers are useful when you need to canonicalize multiple values
+into a single representation. There are several different choices for how to do
+this, providing various trade-offs.
+
+.. _dss_sortedvectorset:
+
+A sorted 'vector'
+^^^^^^^^^^^^^^^^^
+
+If you intend to insert a lot of elements, then do a lot of queries, a great
+approach is to use an std::vector (or other sequential container) with
+std::sort+std::unique to remove duplicates. This approach works really well if
+your usage pattern has these two distinct phases (insert then query), and can be
+coupled with a good choice of :ref:`sequential container <ds_sequential>`.
+
+This combination provides the several nice properties: the result data is
+contiguous in memory (good for cache locality), has few allocations, is easy to
+address (iterators in the final vector are just indices or pointers), and can be
+efficiently queried with a standard binary search (e.g.
+``std::lower_bound``; if you want the whole range of elements comparing
+equal, use ``std::equal_range``).
+
+.. _dss_smallset:
+
+llvm/ADT/SmallSet.h
+^^^^^^^^^^^^^^^^^^^
+
+If you have a set-like data structure that is usually small and whose elements
+are reasonably small, a ``SmallSet<Type, N>`` is a good choice. This set has
+space for N elements in place (thus, if the set is dynamically smaller than N,
+no malloc traffic is required) and accesses them with a simple linear search.
+When the set grows beyond N elements, it allocates a more expensive
+representation that guarantees efficient access (for most types, it falls back
+to :ref:`std::set <dss_set>`, but for pointers it uses something far better,
+:ref:`SmallPtrSet <dss_smallptrset>`.
+
+The magic of this class is that it handles small sets extremely efficiently, but
+gracefully handles extremely large sets without loss of efficiency.
+
+.. _dss_smallptrset:
+
+llvm/ADT/SmallPtrSet.h
+^^^^^^^^^^^^^^^^^^^^^^
+
+``SmallPtrSet`` has all the advantages of ``SmallSet`` (and a ``SmallSet`` of
+pointers is transparently implemented with a ``SmallPtrSet``). If more than N
+insertions are performed, a single quadratically probed hash table is allocated
+and grows as needed, providing extremely efficient access (constant time
+insertion/deleting/queries with low constant factors) and is very stingy with
+malloc traffic.
+
+Note that, unlike :ref:`std::set <dss_set>`, the iterators of ``SmallPtrSet``
+are invalidated whenever an insertion occurs. Also, the values visited by the
+iterators are not visited in sorted order.
+
+.. _dss_stringset:
+
+llvm/ADT/StringSet.h
+^^^^^^^^^^^^^^^^^^^^
+
+``StringSet`` is a thin wrapper around :ref:`StringMap\<char\> <dss_stringmap>`,
+and it allows efficient storage and retrieval of unique strings.
+
+Functionally analogous to ``SmallSet<StringRef>``, ``StringSet`` also supports
+iteration. (The iterator dereferences to a ``StringMapEntry<char>``, so you
+need to call ``i->getKey()`` to access the item of the StringSet.) On the
+other hand, ``StringSet`` doesn't support range-insertion and
+copy-construction, which :ref:`SmallSet <dss_smallset>` and :ref:`SmallPtrSet
+<dss_smallptrset>` do support.
+
+.. _dss_denseset:
+
+llvm/ADT/DenseSet.h
+^^^^^^^^^^^^^^^^^^^
+
+DenseSet is a simple quadratically probed hash table. It excels at supporting
+small values: it uses a single allocation to hold all of the pairs that are
+currently inserted in the set. DenseSet is a great way to unique small values
+that are not simple pointers (use :ref:`SmallPtrSet <dss_smallptrset>` for
+pointers). Note that DenseSet has the same requirements for the value type that
+:ref:`DenseMap <dss_densemap>` has.
+
+.. _dss_sparseset:
+
+llvm/ADT/SparseSet.h
+^^^^^^^^^^^^^^^^^^^^
+
+SparseSet holds a small number of objects identified by unsigned keys of
+moderate size. It uses a lot of memory, but provides operations that are almost
+as fast as a vector. Typical keys are physical registers, virtual registers, or
+numbered basic blocks.
+
+SparseSet is useful for algorithms that need very fast clear/find/insert/erase
+and fast iteration over small sets. It is not intended for building composite
+data structures.
+
+.. _dss_sparsemultiset:
+
+llvm/ADT/SparseMultiSet.h
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+SparseMultiSet adds multiset behavior to SparseSet, while retaining SparseSet's
+desirable attributes. Like SparseSet, it typically uses a lot of memory, but
+provides operations that are almost as fast as a vector. Typical keys are
+physical registers, virtual registers, or numbered basic blocks.
+
+SparseMultiSet is useful for algorithms that need very fast
+clear/find/insert/erase of the entire collection, and iteration over sets of
+elements sharing a key. It is often a more efficient choice than using composite
+data structures (e.g. vector-of-vectors, map-of-vectors). It is not intended for
+building composite data structures.
+
+.. _dss_FoldingSet:
+
+llvm/ADT/FoldingSet.h
+^^^^^^^^^^^^^^^^^^^^^
+
+FoldingSet is an aggregate class that is really good at uniquing
+expensive-to-create or polymorphic objects. It is a combination of a chained
+hash table with intrusive links (uniqued objects are required to inherit from
+FoldingSetNode) that uses :ref:`SmallVector <dss_smallvector>` as part of its ID
+process.
+
+Consider a case where you want to implement a "getOrCreateFoo" method for a
+complex object (for example, a node in the code generator). The client has a
+description of **what** it wants to generate (it knows the opcode and all the
+operands), but we don't want to 'new' a node, then try inserting it into a set
+only to find out it already exists, at which point we would have to delete it
+and return the node that already exists.
+
+To support this style of client, FoldingSet perform a query with a
+FoldingSetNodeID (which wraps SmallVector) that can be used to describe the
+element that we want to query for. The query either returns the element
+matching the ID or it returns an opaque ID that indicates where insertion should
+take place. Construction of the ID usually does not require heap traffic.
+
+Because FoldingSet uses intrusive links, it can support polymorphic objects in
+the set (for example, you can have SDNode instances mixed with LoadSDNodes).
+Because the elements are individually allocated, pointers to the elements are
+stable: inserting or removing elements does not invalidate any pointers to other
+elements.
+
+.. _dss_set:
+
+<set>
+^^^^^
+
+``std::set`` is a reasonable all-around set class, which is decent at many
+things but great at nothing. std::set allocates memory for each element
+inserted (thus it is very malloc intensive) and typically stores three pointers
+per element in the set (thus adding a large amount of per-element space
+overhead). It offers guaranteed log(n) performance, which is not particularly
+fast from a complexity standpoint (particularly if the elements of the set are
+expensive to compare, like strings), and has extremely high constant factors for
+lookup, insertion and removal.
+
+The advantages of std::set are that its iterators are stable (deleting or
+inserting an element from the set does not affect iterators or pointers to other
+elements) and that iteration over the set is guaranteed to be in sorted order.
+If the elements in the set are large, then the relative overhead of the pointers
+and malloc traffic is not a big deal, but if the elements of the set are small,
+std::set is almost never a good choice.
+
+.. _dss_setvector:
+
+llvm/ADT/SetVector.h
+^^^^^^^^^^^^^^^^^^^^
+
+LLVM's ``SetVector<Type>`` is an adapter class that combines your choice of a
+set-like container along with a :ref:`Sequential Container <ds_sequential>` The
+important property that this provides is efficient insertion with uniquing
+(duplicate elements are ignored) with iteration support. It implements this by
+inserting elements into both a set-like container and the sequential container,
+using the set-like container for uniquing and the sequential container for
+iteration.
+
+The difference between SetVector and other sets is that the order of iteration
+is guaranteed to match the order of insertion into the SetVector. This property
+is really important for things like sets of pointers. Because pointer values
+are non-deterministic (e.g. vary across runs of the program on different
+machines), iterating over the pointers in the set will not be in a well-defined
+order.
+
+The drawback of SetVector is that it requires twice as much space as a normal
+set and has the sum of constant factors from the set-like container and the
+sequential container that it uses. Use it **only** if you need to iterate over
+the elements in a deterministic order. SetVector is also expensive to delete
+elements out of (linear time), unless you use its "pop_back" method, which is
+faster.
+
+``SetVector`` is an adapter class that defaults to using ``std::vector`` and a
+size 16 ``SmallSet`` for the underlying containers, so it is quite expensive.
+However, ``"llvm/ADT/SetVector.h"`` also provides a ``SmallSetVector`` class,
+which defaults to using a ``SmallVector`` and ``SmallSet`` of a specified size.
+If you use this, and if your sets are dynamically smaller than ``N``, you will
+save a lot of heap traffic.
+
+.. _dss_uniquevector:
+
+llvm/ADT/UniqueVector.h
+^^^^^^^^^^^^^^^^^^^^^^^
+
+UniqueVector is similar to :ref:`SetVector <dss_setvector>` but it retains a
+unique ID for each element inserted into the set. It internally contains a map
+and a vector, and it assigns a unique ID for each value inserted into the set.
+
+UniqueVector is very expensive: its cost is the sum of the cost of maintaining
+both the map and vector, it has high complexity, high constant factors, and
+produces a lot of malloc traffic. It should be avoided.
+
+.. _dss_immutableset:
+
+llvm/ADT/ImmutableSet.h
+^^^^^^^^^^^^^^^^^^^^^^^
+
+ImmutableSet is an immutable (functional) set implementation based on an AVL
+tree. Adding or removing elements is done through a Factory object and results
+in the creation of a new ImmutableSet object. If an ImmutableSet already exists
+with the given contents, then the existing one is returned; equality is compared
+with a FoldingSetNodeID. The time and space complexity of add or remove
+operations is logarithmic in the size of the original set.
+
+There is no method for returning an element of the set, you can only check for
+membership.
+
+.. _dss_otherset:
+
+Other Set-Like Container Options
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The STL provides several other options, such as std::multiset and the various
+"hash_set" like containers (whether from C++ TR1 or from the SGI library). We
+never use hash_set and unordered_set because they are generally very expensive
+(each insertion requires a malloc) and very non-portable.
+
+std::multiset is useful if you're not interested in elimination of duplicates,
+but has all the drawbacks of :ref:`std::set <dss_set>`. A sorted vector
+(where you don't delete duplicate entries) or some other approach is almost
+always better.
+
+.. _ds_map:
+
+Map-Like Containers (std::map, DenseMap, etc)
+---------------------------------------------
+
+Map-like containers are useful when you want to associate data to a key. As
+usual, there are a lot of different ways to do this. :)
+
+.. _dss_sortedvectormap:
+
+A sorted 'vector'
+^^^^^^^^^^^^^^^^^
+
+If your usage pattern follows a strict insert-then-query approach, you can
+trivially use the same approach as :ref:`sorted vectors for set-like containers
+<dss_sortedvectorset>`. The only difference is that your query function (which
+uses std::lower_bound to get efficient log(n) lookup) should only compare the
+key, not both the key and value. This yields the same advantages as sorted
+vectors for sets.
+
+.. _dss_stringmap:
+
+llvm/ADT/StringMap.h
+^^^^^^^^^^^^^^^^^^^^
+
+Strings are commonly used as keys in maps, and they are difficult to support
+efficiently: they are variable length, inefficient to hash and compare when
+long, expensive to copy, etc. StringMap is a specialized container designed to
+cope with these issues. It supports mapping an arbitrary range of bytes to an
+arbitrary other object.
+
+The StringMap implementation uses a quadratically-probed hash table, where the
+buckets store a pointer to the heap allocated entries (and some other stuff).
+The entries in the map must be heap allocated because the strings are variable
+length. The string data (key) and the element object (value) are stored in the
+same allocation with the string data immediately after the element object.
+This container guarantees the "``(char*)(&Value+1)``" points to the key string
+for a value.
+
+The StringMap is very fast for several reasons: quadratic probing is very cache
+efficient for lookups, the hash value of strings in buckets is not recomputed
+when looking up an element, StringMap rarely has to touch the memory for
+unrelated objects when looking up a value (even when hash collisions happen),
+hash table growth does not recompute the hash values for strings already in the
+table, and each pair in the map is store in a single allocation (the string data
+is stored in the same allocation as the Value of a pair).
+
+StringMap also provides query methods that take byte ranges, so it only ever
+copies a string if a value is inserted into the table.
+
+StringMap iteration order, however, is not guaranteed to be deterministic, so
+any uses which require that should instead use a std::map.
+
+.. _dss_indexmap:
+
+llvm/ADT/IndexedMap.h
+^^^^^^^^^^^^^^^^^^^^^
+
+IndexedMap is a specialized container for mapping small dense integers (or
+values that can be mapped to small dense integers) to some other type. It is
+internally implemented as a vector with a mapping function that maps the keys
+to the dense integer range.
+
+This is useful for cases like virtual registers in the LLVM code generator: they
+have a dense mapping that is offset by a compile-time constant (the first
+virtual register ID).
+
+.. _dss_densemap:
+
+llvm/ADT/DenseMap.h
+^^^^^^^^^^^^^^^^^^^
+
+DenseMap is a simple quadratically probed hash table. It excels at supporting
+small keys and values: it uses a single allocation to hold all of the pairs
+that are currently inserted in the map. DenseMap is a great way to map
+pointers to pointers, or map other small types to each other.
+
+There are several aspects of DenseMap that you should be aware of, however.
+The iterators in a DenseMap are invalidated whenever an insertion occurs,
+unlike map. Also, because DenseMap allocates space for a large number of
+key/value pairs (it starts with 64 by default), it will waste a lot of space if
+your keys or values are large. Finally, you must implement a partial
+specialization of DenseMapInfo for the key that you want, if it isn't already
+supported. This is required to tell DenseMap about two special marker values
+(which can never be inserted into the map) that it needs internally.
+
+DenseMap's find_as() method supports lookup operations using an alternate key
+type. This is useful in cases where the normal key type is expensive to
+construct, but cheap to compare against. The DenseMapInfo is responsible for
+defining the appropriate comparison and hashing methods for each alternate key
+type used.
+
+.. _dss_valuemap:
+
+llvm/IR/ValueMap.h
+^^^^^^^^^^^^^^^^^^^
+
+ValueMap is a wrapper around a :ref:`DenseMap <dss_densemap>` mapping
+``Value*``\ s (or subclasses) to another type. When a Value is deleted or
+RAUW'ed, ValueMap will update itself so the new version of the key is mapped to
+the same value, just as if the key were a WeakVH. You can configure exactly how
+this happens, and what else happens on these two events, by passing a ``Config``
+parameter to the ValueMap template.
+
+.. _dss_intervalmap:
+
+llvm/ADT/IntervalMap.h
+^^^^^^^^^^^^^^^^^^^^^^
+
+IntervalMap is a compact map for small keys and values. It maps key intervals
+instead of single keys, and it will automatically coalesce adjacent intervals.
+When the map only contains a few intervals, they are stored in the map object
+itself to avoid allocations.
+
+The IntervalMap iterators are quite big, so they should not be passed around as
+STL iterators. The heavyweight iterators allow a smaller data structure.
+
+.. _dss_map:
+
+<map>
+^^^^^
+
+std::map has similar characteristics to :ref:`std::set <dss_set>`: it uses a
+single allocation per pair inserted into the map, it offers log(n) lookup with
+an extremely large constant factor, imposes a space penalty of 3 pointers per
+pair in the map, etc.
+
+std::map is most useful when your keys or values are very large, if you need to
+iterate over the collection in sorted order, or if you need stable iterators
+into the map (i.e. they don't get invalidated if an insertion or deletion of
+another element takes place).
+
+.. _dss_mapvector:
+
+llvm/ADT/MapVector.h
+^^^^^^^^^^^^^^^^^^^^
+
+``MapVector<KeyT,ValueT>`` provides a subset of the DenseMap interface. The
+main difference is that the iteration order is guaranteed to be the insertion
+order, making it an easy (but somewhat expensive) solution for non-deterministic
+iteration over maps of pointers.
+
+It is implemented by mapping from key to an index in a vector of key,value
+pairs. This provides fast lookup and iteration, but has two main drawbacks:
+the key is stored twice and removing elements takes linear time. If it is
+necessary to remove elements, it's best to remove them in bulk using
+``remove_if()``.
+
+.. _dss_inteqclasses:
+
+llvm/ADT/IntEqClasses.h
+^^^^^^^^^^^^^^^^^^^^^^^
+
+IntEqClasses provides a compact representation of equivalence classes of small
+integers. Initially, each integer in the range 0..n-1 has its own equivalence
+class. Classes can be joined by passing two class representatives to the
+join(a, b) method. Two integers are in the same class when findLeader() returns
+the same representative.
+
+Once all equivalence classes are formed, the map can be compressed so each
+integer 0..n-1 maps to an equivalence class number in the range 0..m-1, where m
+is the total number of equivalence classes. The map must be uncompressed before
+it can be edited again.
+
+.. _dss_immutablemap:
+
+llvm/ADT/ImmutableMap.h
+^^^^^^^^^^^^^^^^^^^^^^^
+
+ImmutableMap is an immutable (functional) map implementation based on an AVL
+tree. Adding or removing elements is done through a Factory object and results
+in the creation of a new ImmutableMap object. If an ImmutableMap already exists
+with the given key set, then the existing one is returned; equality is compared
+with a FoldingSetNodeID. The time and space complexity of add or remove
+operations is logarithmic in the size of the original map.
+
+.. _dss_othermap:
+
+Other Map-Like Container Options
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The STL provides several other options, such as std::multimap and the various
+"hash_map" like containers (whether from C++ TR1 or from the SGI library). We
+never use hash_set and unordered_set because they are generally very expensive
+(each insertion requires a malloc) and very non-portable.
+
+std::multimap is useful if you want to map a key to multiple values, but has all
+the drawbacks of std::map. A sorted vector or some other approach is almost
+always better.
+
+.. _ds_bit:
+
+Bit storage containers (BitVector, SparseBitVector)
+---------------------------------------------------
+
+Unlike the other containers, there are only two bit storage containers, and
+choosing when to use each is relatively straightforward.
+
+One additional option is ``std::vector<bool>``: we discourage its use for two
+reasons 1) the implementation in many common compilers (e.g. commonly
+available versions of GCC) is extremely inefficient and 2) the C++ standards
+committee is likely to deprecate this container and/or change it significantly
+somehow. In any case, please don't use it.
+
+.. _dss_bitvector:
+
+BitVector
+^^^^^^^^^
+
+The BitVector container provides a dynamic size set of bits for manipulation.
+It supports individual bit setting/testing, as well as set operations. The set
+operations take time O(size of bitvector), but operations are performed one word
+at a time, instead of one bit at a time. This makes the BitVector very fast for
+set operations compared to other containers. Use the BitVector when you expect
+the number of set bits to be high (i.e. a dense set).
+
+.. _dss_smallbitvector:
+
+SmallBitVector
+^^^^^^^^^^^^^^
+
+The SmallBitVector container provides the same interface as BitVector, but it is
+optimized for the case where only a small number of bits, less than 25 or so,
+are needed. It also transparently supports larger bit counts, but slightly less
+efficiently than a plain BitVector, so SmallBitVector should only be used when
+larger counts are rare.
+
+At this time, SmallBitVector does not support set operations (and, or, xor), and
+its operator[] does not provide an assignable lvalue.
+
+.. _dss_sparsebitvector:
+
+SparseBitVector
+^^^^^^^^^^^^^^^
+
+The SparseBitVector container is much like BitVector, with one major difference:
+Only the bits that are set, are stored. This makes the SparseBitVector much
+more space efficient than BitVector when the set is sparse, as well as making
+set operations O(number of set bits) instead of O(size of universe). The
+downside to the SparseBitVector is that setting and testing of random bits is
+O(N), and on large SparseBitVectors, this can be slower than BitVector. In our
+implementation, setting or testing bits in sorted order (either forwards or
+reverse) is O(1) worst case. Testing and setting bits within 128 bits (depends
+on size) of the current bit is also O(1). As a general statement,
+testing/setting bits in a SparseBitVector is O(distance away from last set bit).
+
+.. _debugging:
+
+Debugging
+=========
+
+A handful of `GDB pretty printers
+<https://sourceware.org/gdb/onlinedocs/gdb/Pretty-Printing.html>`__ are
+provided for some of the core LLVM libraries. To use them, execute the
+following (or add it to your ``~/.gdbinit``)::
+
+ source /path/to/llvm/src/utils/gdb-scripts/prettyprinters.py
+
+It also might be handy to enable the `print pretty
+<http://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_57.html>`__ option to
+avoid data structures being printed as a big block of text.
+
+.. _common:
+
+Helpful Hints for Common Operations
+===================================
+
+This section describes how to perform some very simple transformations of LLVM
+code. This is meant to give examples of common idioms used, showing the
+practical side of LLVM transformations.
+
+Because this is a "how-to" section, you should also read about the main classes
+that you will be working with. The :ref:`Core LLVM Class Hierarchy Reference
+<coreclasses>` contains details and descriptions of the main classes that you
+should know about.
+
+.. _inspection:
+
+Basic Inspection and Traversal Routines
+---------------------------------------
+
+The LLVM compiler infrastructure have many different data structures that may be
+traversed. Following the example of the C++ standard template library, the
+techniques used to traverse these various data structures are all basically the
+same. For a enumerable sequence of values, the ``XXXbegin()`` function (or
+method) returns an iterator to the start of the sequence, the ``XXXend()``
+function returns an iterator pointing to one past the last valid element of the
+sequence, and there is some ``XXXiterator`` data type that is common between the
+two operations.
+
+Because the pattern for iteration is common across many different aspects of the
+program representation, the standard template library algorithms may be used on
+them, and it is easier to remember how to iterate. First we show a few common
+examples of the data structures that need to be traversed. Other data
+structures are traversed in very similar ways.
+
+.. _iterate_function:
+
+Iterating over the ``BasicBlock`` in a ``Function``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+It's quite common to have a ``Function`` instance that you'd like to transform
+in some way; in particular, you'd like to manipulate its ``BasicBlock``\ s. To
+facilitate this, you'll need to iterate over all of the ``BasicBlock``\ s that
+constitute the ``Function``. The following is an example that prints the name
+of a ``BasicBlock`` and the number of ``Instruction``\ s it contains:
+
+.. code-block:: c++
+
+ Function &Func = ...
+ for (BasicBlock &BB : Func)
+ // Print out the name of the basic block if it has one, and then the
+ // number of instructions that it contains
+ errs() << "Basic block (name=" << BB.getName() << ") has "
+ << BB.size() << " instructions.\n";
+
+.. _iterate_basicblock:
+
+Iterating over the ``Instruction`` in a ``BasicBlock``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Just like when dealing with ``BasicBlock``\ s in ``Function``\ s, it's easy to
+iterate over the individual instructions that make up ``BasicBlock``\ s. Here's
+a code snippet that prints out each instruction in a ``BasicBlock``:
+
+.. code-block:: c++
+
+ BasicBlock& BB = ...
+ for (Instruction &I : BB)
+ // The next statement works since operator<<(ostream&,...)
+ // is overloaded for Instruction&
+ errs() << I << "\n";
+
+
+However, this isn't really the best way to print out the contents of a
+``BasicBlock``! Since the ostream operators are overloaded for virtually
+anything you'll care about, you could have just invoked the print routine on the
+basic block itself: ``errs() << BB << "\n";``.
+
+.. _iterate_insiter:
+
+Iterating over the ``Instruction`` in a ``Function``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you're finding that you commonly iterate over a ``Function``'s
+``BasicBlock``\ s and then that ``BasicBlock``'s ``Instruction``\ s,
+``InstIterator`` should be used instead. You'll need to include
+``llvm/IR/InstIterator.h`` (`doxygen
+<http://llvm.org/doxygen/InstIterator_8h.html>`__) and then instantiate
+``InstIterator``\ s explicitly in your code. Here's a small example that shows
+how to dump all instructions in a function to the standard error stream:
+
+.. code-block:: c++
+
+ #include "llvm/IR/InstIterator.h"
+
+ // F is a pointer to a Function instance
+ for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
+ errs() << *I << "\n";
+
+Easy, isn't it? You can also use ``InstIterator``\ s to fill a work list with
+its initial contents. For example, if you wanted to initialize a work list to
+contain all instructions in a ``Function`` F, all you would need to do is
+something like:
+
+.. code-block:: c++
+
+ std::set<Instruction*> worklist;
+ // or better yet, SmallPtrSet<Instruction*, 64> worklist;
+
+ for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
+ worklist.insert(&*I);
+
+The STL set ``worklist`` would now contain all instructions in the ``Function``
+pointed to by F.
+
+.. _iterate_convert:
+
+Turning an iterator into a class pointer (and vice-versa)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Sometimes, it'll be useful to grab a reference (or pointer) to a class instance
+when all you've got at hand is an iterator. Well, extracting a reference or a
+pointer from an iterator is very straight-forward. Assuming that ``i`` is a
+``BasicBlock::iterator`` and ``j`` is a ``BasicBlock::const_iterator``:
+
+.. code-block:: c++
+
+ Instruction& inst = *i; // Grab reference to instruction reference
+ Instruction* pinst = &*i; // Grab pointer to instruction reference
+ const Instruction& inst = *j;
+
+However, the iterators you'll be working with in the LLVM framework are special:
+they will automatically convert to a ptr-to-instance type whenever they need to.
+Instead of dereferencing the iterator and then taking the address of the result,
+you can simply assign the iterator to the proper pointer type and you get the
+dereference and address-of operation as a result of the assignment (behind the
+scenes, this is a result of overloading casting mechanisms). Thus the second
+line of the last example,
+
+.. code-block:: c++
+
+ Instruction *pinst = &*i;
+
+is semantically equivalent to
+
+.. code-block:: c++
+
+ Instruction *pinst = i;
+
+It's also possible to turn a class pointer into the corresponding iterator, and
+this is a constant time operation (very efficient). The following code snippet
+illustrates use of the conversion constructors provided by LLVM iterators. By
+using these, you can explicitly grab the iterator of something without actually
+obtaining it via iteration over some structure:
+
+.. code-block:: c++
+
+ void printNextInstruction(Instruction* inst) {
+ BasicBlock::iterator it(inst);
+ ++it; // After this line, it refers to the instruction after *inst
+ if (it != inst->getParent()->end()) errs() << *it << "\n";
+ }
+
+Unfortunately, these implicit conversions come at a cost; they prevent these
+iterators from conforming to standard iterator conventions, and thus from being
+usable with standard algorithms and containers. For example, they prevent the
+following code, where ``B`` is a ``BasicBlock``, from compiling:
+
+.. code-block:: c++
+
+ llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end());
+
+Because of this, these implicit conversions may be removed some day, and
+``operator*`` changed to return a pointer instead of a reference.
+
+.. _iterate_complex:
+
+Finding call sites: a slightly more complex example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Say that you're writing a FunctionPass and would like to count all the locations
+in the entire module (that is, across every ``Function``) where a certain
+function (i.e., some ``Function *``) is already in scope. As you'll learn
+later, you may want to use an ``InstVisitor`` to accomplish this in a much more
+straight-forward manner, but this example will allow us to explore how you'd do
+it if you didn't have ``InstVisitor`` around. In pseudo-code, this is what we
+want to do:
+
+.. code-block:: none
+
+ initialize callCounter to zero
+ for each Function f in the Module
+ for each BasicBlock b in f
+ for each Instruction i in b
+ if (i is a CallInst and calls the given function)
+ increment callCounter
+
+And the actual code is (remember, because we're writing a ``FunctionPass``, our
+``FunctionPass``-derived class simply has to override the ``runOnFunction``
+method):
+
+.. code-block:: c++
+
+ Function* targetFunc = ...;
+
+ class OurFunctionPass : public FunctionPass {
+ public:
+ OurFunctionPass(): callCounter(0) { }
+
+ virtual runOnFunction(Function& F) {
+ for (BasicBlock &B : F) {
+ for (Instruction &I: B) {
+ if (auto *CallInst = dyn_cast<CallInst>(&I)) {
+ // We know we've encountered a call instruction, so we
+ // need to determine if it's a call to the
+ // function pointed to by m_func or not.
+ if (CallInst->getCalledFunction() == targetFunc)
+ ++callCounter;
+ }
+ }
+ }
+ }
+
+ private:
+ unsigned callCounter;
+ };
+
+.. _calls_and_invokes:
+
+Treating calls and invokes the same way
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You may have noticed that the previous example was a bit oversimplified in that
+it did not deal with call sites generated by 'invoke' instructions. In this,
+and in other situations, you may find that you want to treat ``CallInst``\ s and
+``InvokeInst``\ s the same way, even though their most-specific common base
+class is ``Instruction``, which includes lots of less closely-related things.
+For these cases, LLVM provides a handy wrapper class called ``CallSite``
+(`doxygen <http://llvm.org/doxygen/classllvm_1_1CallSite.html>`__) It is
+essentially a wrapper around an ``Instruction`` pointer, with some methods that
+provide functionality common to ``CallInst``\ s and ``InvokeInst``\ s.
+
+This class has "value semantics": it should be passed by value, not by reference
+and it should not be dynamically allocated or deallocated using ``operator new``
+or ``operator delete``. It is efficiently copyable, assignable and
+constructable, with costs equivalents to that of a bare pointer. If you look at
+its definition, it has only a single pointer member.
+
+.. _iterate_chains:
+
+Iterating over def-use & use-def chains
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Frequently, we might have an instance of the ``Value`` class (`doxygen
+<http://llvm.org/doxygen/classllvm_1_1Value.html>`__) and we want to determine
+which ``User`` s use the ``Value``. The list of all ``User``\ s of a particular
+``Value`` is called a *def-use* chain. For example, let's say we have a
+``Function*`` named ``F`` to a particular function ``foo``. Finding all of the
+instructions that *use* ``foo`` is as simple as iterating over the *def-use*
+chain of ``F``:
+
+.. code-block:: c++
+
+ Function *F = ...;
+
+ for (User *U : F->users()) {
+ if (Instruction *Inst = dyn_cast<Instruction>(U)) {
+ errs() << "F is used in instruction:\n";
+ errs() << *Inst << "\n";
+ }
+
+Alternatively, it's common to have an instance of the ``User`` Class (`doxygen
+<http://llvm.org/doxygen/classllvm_1_1User.html>`__) and need to know what
+``Value``\ s are used by it. The list of all ``Value``\ s used by a ``User`` is
+known as a *use-def* chain. Instances of class ``Instruction`` are common
+``User`` s, so we might want to iterate over all of the values that a particular
+instruction uses (that is, the operands of the particular ``Instruction``):
+
+.. code-block:: c++
+
+ Instruction *pi = ...;
+
+ for (Use &U : pi->operands()) {
+ Value *v = U.get();
+ // ...
+ }
+
+Declaring objects as ``const`` is an important tool of enforcing mutation free
+algorithms (such as analyses, etc.). For this purpose above iterators come in
+constant flavors as ``Value::const_use_iterator`` and
+``Value::const_op_iterator``. They automatically arise when calling
+``use/op_begin()`` on ``const Value*``\ s or ``const User*``\ s respectively.
+Upon dereferencing, they return ``const Use*``\ s. Otherwise the above patterns
+remain unchanged.
+
+.. _iterate_preds:
+
+Iterating over predecessors & successors of blocks
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Iterating over the predecessors and successors of a block is quite easy with the
+routines defined in ``"llvm/IR/CFG.h"``. Just use code like this to
+iterate over all predecessors of BB:
+
+.. code-block:: c++
+
+ #include "llvm/IR/CFG.h"
+ BasicBlock *BB = ...;
+
+ for (BasicBlock *Pred : predecessors(BB)) {
+ // ...
+ }
+
+Similarly, to iterate over successors use ``successors``.
+
+.. _simplechanges:
+
+Making simple changes
+---------------------
+
+There are some primitive transformation operations present in the LLVM
+infrastructure that are worth knowing about. When performing transformations,
+it's fairly common to manipulate the contents of basic blocks. This section
+describes some of the common methods for doing so and gives example code.
+
+.. _schanges_creating:
+
+Creating and inserting new ``Instruction``\ s
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+*Instantiating Instructions*
+
+Creation of ``Instruction``\ s is straight-forward: simply call the constructor
+for the kind of instruction to instantiate and provide the necessary parameters.
+For example, an ``AllocaInst`` only *requires* a (const-ptr-to) ``Type``. Thus:
+
+.. code-block:: c++
+
+ auto *ai = new AllocaInst(Type::Int32Ty);
+
+will create an ``AllocaInst`` instance that represents the allocation of one
+integer in the current stack frame, at run time. Each ``Instruction`` subclass
+is likely to have varying default parameters which change the semantics of the
+instruction, so refer to the `doxygen documentation for the subclass of
+Instruction <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_ that
+you're interested in instantiating.
+
+*Naming values*
+
+It is very useful to name the values of instructions when you're able to, as
+this facilitates the debugging of your transformations. If you end up looking
+at generated LLVM machine code, you definitely want to have logical names
+associated with the results of instructions! By supplying a value for the
+``Name`` (default) parameter of the ``Instruction`` constructor, you associate a
+logical name with the result of the instruction's execution at run time. For
+example, say that I'm writing a transformation that dynamically allocates space
+for an integer on the stack, and that integer is going to be used as some kind
+of index by some other code. To accomplish this, I place an ``AllocaInst`` at
+the first point in the first ``BasicBlock`` of some ``Function``, and I'm
+intending to use it within the same ``Function``. I might do:
+
+.. code-block:: c++
+
+ auto *pa = new AllocaInst(Type::Int32Ty, 0, "indexLoc");
+
+where ``indexLoc`` is now the logical name of the instruction's execution value,
+which is a pointer to an integer on the run time stack.
+
+*Inserting instructions*
+
+There are essentially three ways to insert an ``Instruction`` into an existing
+sequence of instructions that form a ``BasicBlock``:
+
+* Insertion into an explicit instruction list
+
+ Given a ``BasicBlock* pb``, an ``Instruction* pi`` within that ``BasicBlock``,
+ and a newly-created instruction we wish to insert before ``*pi``, we do the
+ following:
+
+ .. code-block:: c++
+
+ BasicBlock *pb = ...;
+ Instruction *pi = ...;
+ auto *newInst = new Instruction(...);
+
+ pb->getInstList().insert(pi, newInst); // Inserts newInst before pi in pb
+
+ Appending to the end of a ``BasicBlock`` is so common that the ``Instruction``
+ class and ``Instruction``-derived classes provide constructors which take a
+ pointer to a ``BasicBlock`` to be appended to. For example code that looked
+ like:
+
+ .. code-block:: c++
+
+ BasicBlock *pb = ...;
+ auto *newInst = new Instruction(...);
+
+ pb->getInstList().push_back(newInst); // Appends newInst to pb
+
+ becomes:
+
+ .. code-block:: c++
+
+ BasicBlock *pb = ...;
+ auto *newInst = new Instruction(..., pb);
+
+ which is much cleaner, especially if you are creating long instruction
+ streams.
+
+* Insertion into an implicit instruction list
+
+ ``Instruction`` instances that are already in ``BasicBlock``\ s are implicitly
+ associated with an existing instruction list: the instruction list of the
+ enclosing basic block. Thus, we could have accomplished the same thing as the
+ above code without being given a ``BasicBlock`` by doing:
+
+ .. code-block:: c++
+
+ Instruction *pi = ...;
+ auto *newInst = new Instruction(...);
+
+ pi->getParent()->getInstList().insert(pi, newInst);
+
+ In fact, this sequence of steps occurs so frequently that the ``Instruction``
+ class and ``Instruction``-derived classes provide constructors which take (as
+ a default parameter) a pointer to an ``Instruction`` which the newly-created
+ ``Instruction`` should precede. That is, ``Instruction`` constructors are
+ capable of inserting the newly-created instance into the ``BasicBlock`` of a
+ provided instruction, immediately before that instruction. Using an
+ ``Instruction`` constructor with a ``insertBefore`` (default) parameter, the
+ above code becomes:
+
+ .. code-block:: c++
+
+ Instruction* pi = ...;
+ auto *newInst = new Instruction(..., pi);
+
+ which is much cleaner, especially if you're creating a lot of instructions and
+ adding them to ``BasicBlock``\ s.
+
+* Insertion using an instance of ``IRBuilder``
+
+ Inserting several ``Instruction``\ s can be quite laborious using the previous
+ methods. The ``IRBuilder`` is a convenience class that can be used to add
+ several instructions to the end of a ``BasicBlock`` or before a particular
+ ``Instruction``. It also supports constant folding and renaming named
+ registers (see ``IRBuilder``'s template arguments).
+
+ The example below demonstrates a very simple use of the ``IRBuilder`` where
+ three instructions are inserted before the instruction ``pi``. The first two
+ instructions are Call instructions and third instruction multiplies the return
+ value of the two calls.
+
+ .. code-block:: c++
+
+ Instruction *pi = ...;
+ IRBuilder<> Builder(pi);
+ CallInst* callOne = Builder.CreateCall(...);
+ CallInst* callTwo = Builder.CreateCall(...);
+ Value* result = Builder.CreateMul(callOne, callTwo);
+
+ The example below is similar to the above example except that the created
+ ``IRBuilder`` inserts instructions at the end of the ``BasicBlock`` ``pb``.
+
+ .. code-block:: c++
+
+ BasicBlock *pb = ...;
+ IRBuilder<> Builder(pb);
+ CallInst* callOne = Builder.CreateCall(...);
+ CallInst* callTwo = Builder.CreateCall(...);
+ Value* result = Builder.CreateMul(callOne, callTwo);
+
+ See :doc:`tutorial/LangImpl03` for a practical use of the ``IRBuilder``.
+
+
+.. _schanges_deleting:
+
+Deleting Instructions
+^^^^^^^^^^^^^^^^^^^^^
+
+Deleting an instruction from an existing sequence of instructions that form a
+BasicBlock_ is very straight-forward: just call the instruction's
+``eraseFromParent()`` method. For example:
+
+.. code-block:: c++
+
+ Instruction *I = .. ;
+ I->eraseFromParent();
+
+This unlinks the instruction from its containing basic block and deletes it. If
+you'd just like to unlink the instruction from its containing basic block but
+not delete it, you can use the ``removeFromParent()`` method.
+
+.. _schanges_replacing:
+
+Replacing an Instruction with another Value
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Replacing individual instructions
+"""""""""""""""""""""""""""""""""
+
+Including "`llvm/Transforms/Utils/BasicBlockUtils.h
+<http://llvm.org/doxygen/BasicBlockUtils_8h_source.html>`_" permits use of two
+very useful replace functions: ``ReplaceInstWithValue`` and
+``ReplaceInstWithInst``.
+
+.. _schanges_deleting_sub:
+
+Deleting Instructions
+"""""""""""""""""""""
+
+* ``ReplaceInstWithValue``
+
+ This function replaces all uses of a given instruction with a value, and then
+ removes the original instruction. The following example illustrates the
+ replacement of the result of a particular ``AllocaInst`` that allocates memory
+ for a single integer with a null pointer to an integer.
+
+ .. code-block:: c++
+
+ AllocaInst* instToReplace = ...;
+ BasicBlock::iterator ii(instToReplace);
+
+ ReplaceInstWithValue(instToReplace->getParent()->getInstList(), ii,
+ Constant::getNullValue(PointerType::getUnqual(Type::Int32Ty)));
+
+* ``ReplaceInstWithInst``
+
+ This function replaces a particular instruction with another instruction,
+ inserting the new instruction into the basic block at the location where the
+ old instruction was, and replacing any uses of the old instruction with the
+ new instruction. The following example illustrates the replacement of one
+ ``AllocaInst`` with another.
+
+ .. code-block:: c++
+
+ AllocaInst* instToReplace = ...;
+ BasicBlock::iterator ii(instToReplace);
+
+ ReplaceInstWithInst(instToReplace->getParent()->getInstList(), ii,
+ new AllocaInst(Type::Int32Ty, 0, "ptrToReplacedInt"));
+
+
+Replacing multiple uses of Users and Values
+"""""""""""""""""""""""""""""""""""""""""""
+
+You can use ``Value::replaceAllUsesWith`` and ``User::replaceUsesOfWith`` to
+change more than one use at a time. See the doxygen documentation for the
+`Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_ and `User Class
+<http://llvm.org/doxygen/classllvm_1_1User.html>`_, respectively, for more
+information.
+
+.. _schanges_deletingGV:
+
+Deleting GlobalVariables
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Deleting a global variable from a module is just as easy as deleting an
+Instruction. First, you must have a pointer to the global variable that you
+wish to delete. You use this pointer to erase it from its parent, the module.
+For example:
+
+.. code-block:: c++
+
+ GlobalVariable *GV = .. ;
+
+ GV->eraseFromParent();
+
+
+.. _threading:
+
+Threads and LLVM
+================
+
+This section describes the interaction of the LLVM APIs with multithreading,
+both on the part of client applications, and in the JIT, in the hosted
+application.
+
+Note that LLVM's support for multithreading is still relatively young. Up
+through version 2.5, the execution of threaded hosted applications was
+supported, but not threaded client access to the APIs. While this use case is
+now supported, clients *must* adhere to the guidelines specified below to ensure
+proper operation in multithreaded mode.
+
+Note that, on Unix-like platforms, LLVM requires the presence of GCC's atomic
+intrinsics in order to support threaded operation. If you need a
+multhreading-capable LLVM on a platform without a suitably modern system
+compiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and
+using the resultant compiler to build a copy of LLVM with multithreading
+support.
+
+.. _shutdown:
+
+Ending Execution with ``llvm_shutdown()``
+-----------------------------------------
+
+When you are done using the LLVM APIs, you should call ``llvm_shutdown()`` to
+deallocate memory used for internal structures.
+
+.. _managedstatic:
+
+Lazy Initialization with ``ManagedStatic``
+------------------------------------------
+
+``ManagedStatic`` is a utility class in LLVM used to implement static
+initialization of static resources, such as the global type tables. In a
+single-threaded environment, it implements a simple lazy initialization scheme.
+When LLVM is compiled with support for multi-threading, however, it uses
+double-checked locking to implement thread-safe lazy initialization.
+
+.. _llvmcontext:
+
+Achieving Isolation with ``LLVMContext``
+----------------------------------------
+
+``LLVMContext`` is an opaque class in the LLVM API which clients can use to
+operate multiple, isolated instances of LLVM concurrently within the same
+address space. For instance, in a hypothetical compile-server, the compilation
+of an individual translation unit is conceptually independent from all the
+others, and it would be desirable to be able to compile incoming translation
+units concurrently on independent server threads. Fortunately, ``LLVMContext``
+exists to enable just this kind of scenario!
+
+Conceptually, ``LLVMContext`` provides isolation. Every LLVM entity
+(``Module``\ s, ``Value``\ s, ``Type``\ s, ``Constant``\ s, etc.) in LLVM's
+in-memory IR belongs to an ``LLVMContext``. Entities in different contexts
+*cannot* interact with each other: ``Module``\ s in different contexts cannot be
+linked together, ``Function``\ s cannot be added to ``Module``\ s in different
+contexts, etc. What this means is that is safe to compile on multiple
+threads simultaneously, as long as no two threads operate on entities within the
+same context.
+
+In practice, very few places in the API require the explicit specification of a
+``LLVMContext``, other than the ``Type`` creation/lookup APIs. Because every
+``Type`` carries a reference to its owning context, most other entities can
+determine what context they belong to by looking at their own ``Type``. If you
+are adding new entities to LLVM IR, please try to maintain this interface
+design.
+
+.. _jitthreading:
+
+Threads and the JIT
+-------------------
+
+LLVM's "eager" JIT compiler is safe to use in threaded programs. Multiple
+threads can call ``ExecutionEngine::getPointerToFunction()`` or
+``ExecutionEngine::runFunction()`` concurrently, and multiple threads can run
+code output by the JIT concurrently. The user must still ensure that only one
+thread accesses IR in a given ``LLVMContext`` while another thread might be
+modifying it. One way to do that is to always hold the JIT lock while accessing
+IR outside the JIT (the JIT *modifies* the IR by adding ``CallbackVH``\ s).
+Another way is to only call ``getPointerToFunction()`` from the
+``LLVMContext``'s thread.
+
+When the JIT is configured to compile lazily (using
+``ExecutionEngine::DisableLazyCompilation(false)``), there is currently a `race
+condition <https://bugs.llvm.org/show_bug.cgi?id=5184>`_ in updating call sites
+after a function is lazily-jitted. It's still possible to use the lazy JIT in a
+threaded program if you ensure that only one thread at a time can call any
+particular lazy stub and that the JIT lock guards any IR access, but we suggest
+using only the eager JIT in threaded programs.
+
+.. _advanced:
+
+Advanced Topics
+===============
+
+This section describes some of the advanced or obscure API's that most clients
+do not need to be aware of. These API's tend manage the inner workings of the
+LLVM system, and only need to be accessed in unusual circumstances.
+
+.. _SymbolTable:
+
+The ``ValueSymbolTable`` class
+------------------------------
+
+The ``ValueSymbolTable`` (`doxygen
+<http://llvm.org/doxygen/classllvm_1_1ValueSymbolTable.html>`__) class provides
+a symbol table that the :ref:`Function <c_Function>` and Module_ classes use for
+naming value definitions. The symbol table can provide a name for any Value_.
+
+Note that the ``SymbolTable`` class should not be directly accessed by most
+clients. It should only be used when iteration over the symbol table names
+themselves are required, which is very special purpose. Note that not all LLVM
+Value_\ s have names, and those without names (i.e. they have an empty name) do
+not exist in the symbol table.
+
+Symbol tables support iteration over the values in the symbol table with
+``begin/end/iterator`` and supports querying to see if a specific name is in the
+symbol table (with ``lookup``). The ``ValueSymbolTable`` class exposes no
+public mutator methods, instead, simply call ``setName`` on a value, which will
+autoinsert it into the appropriate symbol table.
+
+.. _UserLayout:
+
+The ``User`` and owned ``Use`` classes' memory layout
+-----------------------------------------------------
+
+The ``User`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1User.html>`__)
+class provides a basis for expressing the ownership of ``User`` towards other
+`Value instance <http://llvm.org/doxygen/classllvm_1_1Value.html>`_\ s. The
+``Use`` (`doxygen <http://llvm.org/doxygen/classllvm_1_1Use.html>`__) helper
+class is employed to do the bookkeeping and to facilitate *O(1)* addition and
+removal.
+
+.. _Use2User:
+
+Interaction and relationship between ``User`` and ``Use`` objects
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A subclass of ``User`` can choose between incorporating its ``Use`` objects or
+refer to them out-of-line by means of a pointer. A mixed variant (some ``Use``
+s inline others hung off) is impractical and breaks the invariant that the
+``Use`` objects belonging to the same ``User`` form a contiguous array.
+
+We have 2 different layouts in the ``User`` (sub)classes:
+
+* Layout a)
+
+ The ``Use`` object(s) are inside (resp. at fixed offset) of the ``User``
+ object and there are a fixed number of them.
+
+* Layout b)
+
+ The ``Use`` object(s) are referenced by a pointer to an array from the
+ ``User`` object and there may be a variable number of them.
+
+As of v2.4 each layout still possesses a direct pointer to the start of the
+array of ``Use``\ s. Though not mandatory for layout a), we stick to this
+redundancy for the sake of simplicity. The ``User`` object also stores the
+number of ``Use`` objects it has. (Theoretically this information can also be
+calculated given the scheme presented below.)
+
+Special forms of allocation operators (``operator new``) enforce the following
+memory layouts:
+
+* Layout a) is modelled by prepending the ``User`` object by the ``Use[]``
+ array.
+
+ .. code-block:: none
+
+ ...---.---.---.---.-------...
+ | P | P | P | P | User
+ '''---'---'---'---'-------'''
+
+* Layout b) is modelled by pointing at the ``Use[]`` array.
+
+ .. code-block:: none
+
+ .-------...
+ | User
+ '-------'''
+ |
+ v
+ .---.---.---.---...
+ | P | P | P | P |
+ '---'---'---'---'''
+
+*(In the above figures* '``P``' *stands for the* ``Use**`` *that is stored in
+each* ``Use`` *object in the member* ``Use::Prev`` *)*
+
+.. _Waymarking:
+
+The waymarking algorithm
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Since the ``Use`` objects are deprived of the direct (back)pointer to their
+``User`` objects, there must be a fast and exact method to recover it. This is
+accomplished by the following scheme:
+
+A bit-encoding in the 2 LSBits (least significant bits) of the ``Use::Prev``
+allows to find the start of the ``User`` object:
+
+* ``00`` --- binary digit 0
+
+* ``01`` --- binary digit 1
+
+* ``10`` --- stop and calculate (``s``)
+
+* ``11`` --- full stop (``S``)
+
+Given a ``Use*``, all we have to do is to walk till we get a stop and we either
+have a ``User`` immediately behind or we have to walk to the next stop picking
+up digits and calculating the offset:
+
+.. code-block:: none
+
+ .---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.----------------
+ | 1 | s | 1 | 0 | 1 | 0 | s | 1 | 1 | 0 | s | 1 | 1 | s | 1 | S | User (or User*)
+ '---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'----------------
+ |+15 |+10 |+6 |+3 |+1
+ | | | | | __>
+ | | | | __________>
+ | | | ______________________>
+ | | ______________________________________>
+ | __________________________________________________________>
+
+Only the significant number of bits need to be stored between the stops, so that
+the *worst case is 20 memory accesses* when there are 1000 ``Use`` objects
+associated with a ``User``.
+
+.. _ReferenceImpl:
+
+Reference implementation
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following literate Haskell fragment demonstrates the concept:
+
+.. code-block:: haskell
+
+ > import Test.QuickCheck
+ >
+ > digits :: Int -> [Char] -> [Char]
+ > digits 0 acc = '0' : acc
+ > digits 1 acc = '1' : acc
+ > digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc
+ >
+ > dist :: Int -> [Char] -> [Char]
+ > dist 0 [] = ['S']
+ > dist 0 acc = acc
+ > dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r
+ > dist n acc = dist (n - 1) $ dist 1 acc
+ >
+ > takeLast n ss = reverse $ take n $ reverse ss
+ >
+ > test = takeLast 40 $ dist 20 []
+ >
+
+Printing <test> gives: ``"1s100000s11010s10100s1111s1010s110s11s1S"``
+
+The reverse algorithm computes the length of the string just by examining a
+certain prefix:
+
+.. code-block:: haskell
+
+ > pref :: [Char] -> Int
+ > pref "S" = 1
+ > pref ('s':'1':rest) = decode 2 1 rest
+ > pref (_:rest) = 1 + pref rest
+ >
+ > decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest
+ > decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest
+ > decode walk acc _ = walk + acc
+ >
+
+Now, as expected, printing <pref test> gives ``40``.
+
+We can *quickCheck* this with following property:
+
+.. code-block:: haskell
+
+ > testcase = dist 2000 []
+ > testcaseLength = length testcase
+ >
+ > identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr
+ > where arr = takeLast n testcase
+ >
+
+As expected <quickCheck identityProp> gives:
+
+::
+
+ *Main> quickCheck identityProp
+ OK, passed 100 tests.
+
+Let's be a bit more exhaustive:
+
+.. code-block:: haskell
+
+ >
+ > deepCheck p = check (defaultConfig { configMaxTest = 500 }) p
+ >
+
+And here is the result of <deepCheck identityProp>:
+
+::
+
+ *Main> deepCheck identityProp
+ OK, passed 500 tests.
+
+.. _Tagging:
+
+Tagging considerations
+^^^^^^^^^^^^^^^^^^^^^^
+
+To maintain the invariant that the 2 LSBits of each ``Use**`` in ``Use`` never
+change after being set up, setters of ``Use::Prev`` must re-tag the new
+``Use**`` on every modification. Accordingly getters must strip the tag bits.
+
+For layout b) instead of the ``User`` we find a pointer (``User*`` with LSBit
+set). Following this pointer brings us to the ``User``. A portable trick
+ensures that the first bytes of ``User`` (if interpreted as a pointer) never has
+the LSBit set. (Portability is relying on the fact that all known compilers
+place the ``vptr`` in the first word of the instances.)
+
+.. _polymorphism:
+
+Designing Type Hiercharies and Polymorphic Interfaces
+-----------------------------------------------------
+
+There are two different design patterns that tend to result in the use of
+virtual dispatch for methods in a type hierarchy in C++ programs. The first is
+a genuine type hierarchy where different types in the hierarchy model
+a specific subset of the functionality and semantics, and these types nest
+strictly within each other. Good examples of this can be seen in the ``Value``
+or ``Type`` type hierarchies.
+
+A second is the desire to dispatch dynamically across a collection of
+polymorphic interface implementations. This latter use case can be modeled with
+virtual dispatch and inheritance by defining an abstract interface base class
+which all implementations derive from and override. However, this
+implementation strategy forces an **"is-a"** relationship to exist that is not
+actually meaningful. There is often not some nested hierarchy of useful
+generalizations which code might interact with and move up and down. Instead,
+there is a singular interface which is dispatched across a range of
+implementations.
+
+The preferred implementation strategy for the second use case is that of
+generic programming (sometimes called "compile-time duck typing" or "static
+polymorphism"). For example, a template over some type parameter ``T`` can be
+instantiated across any particular implementation that conforms to the
+interface or *concept*. A good example here is the highly generic properties of
+any type which models a node in a directed graph. LLVM models these primarily
+through templates and generic programming. Such templates include the
+``LoopInfoBase`` and ``DominatorTreeBase``. When this type of polymorphism
+truly needs **dynamic** dispatch you can generalize it using a technique
+called *concept-based polymorphism*. This pattern emulates the interfaces and
+behaviors of templates using a very limited form of virtual dispatch for type
+erasure inside its implementation. You can find examples of this technique in
+the ``PassManager.h`` system, and there is a more detailed introduction to it
+by Sean Parent in several of his talks and papers:
+
+#. `Inheritance Is The Base Class of Evil
+ <http://channel9.msdn.com/Events/GoingNative/2013/Inheritance-Is-The-Base-Class-of-Evil>`_
+ - The GoingNative 2013 talk describing this technique, and probably the best
+ place to start.
+#. `Value Semantics and Concepts-based Polymorphism
+ <http://www.youtube.com/watch?v=_BpMYeUFXv8>`_ - The C++Now! 2012 talk
+ describing this technique in more detail.
+#. `Sean Parent's Papers and Presentations
+ <http://github.com/sean-parent/sean-parent.github.com/wiki/Papers-and-Presentations>`_
+ - A Github project full of links to slides, video, and sometimes code.
+
+When deciding between creating a type hierarchy (with either tagged or virtual
+dispatch) and using templates or concepts-based polymorphism, consider whether
+there is some refinement of an abstract base class which is a semantically
+meaningful type on an interface boundary. If anything more refined than the
+root abstract interface is meaningless to talk about as a partial extension of
+the semantic model, then your use case likely fits better with polymorphism and
+you should avoid using virtual dispatch. However, there may be some exigent
+circumstances that require one technique or the other to be used.
+
+If you do need to introduce a type hierarchy, we prefer to use explicitly
+closed type hierarchies with manual tagged dispatch and/or RTTI rather than the
+open inheritance model and virtual dispatch that is more common in C++ code.
+This is because LLVM rarely encourages library consumers to extend its core
+types, and leverages the closed and tag-dispatched nature of its hierarchies to
+generate significantly more efficient code. We have also found that a large
+amount of our usage of type hierarchies fits better with tag-based pattern
+matching rather than dynamic dispatch across a common interface. Within LLVM we
+have built custom helpers to facilitate this design. See this document's
+section on :ref:`isa and dyn_cast <isa>` and our :doc:`detailed document
+<HowToSetUpLLVMStyleRTTI>` which describes how you can implement this
+pattern for use with the LLVM helpers.
+
+.. _abi_breaking_checks:
+
+ABI Breaking Checks
+-------------------
+
+Checks and asserts that alter the LLVM C++ ABI are predicated on the
+preprocessor symbol `LLVM_ENABLE_ABI_BREAKING_CHECKS` -- LLVM
+libraries built with `LLVM_ENABLE_ABI_BREAKING_CHECKS` are not ABI
+compatible LLVM libraries built without it defined. By default,
+turning on assertions also turns on `LLVM_ENABLE_ABI_BREAKING_CHECKS`
+so a default +Asserts build is not ABI compatible with a
+default -Asserts build. Clients that want ABI compatibility
+between +Asserts and -Asserts builds should use the CMake or autoconf
+build systems to set `LLVM_ENABLE_ABI_BREAKING_CHECKS` independently
+of `LLVM_ENABLE_ASSERTIONS`.
+
+.. _coreclasses:
+
+The Core LLVM Class Hierarchy Reference
+=======================================
+
+``#include "llvm/IR/Type.h"``
+
+header source: `Type.h <http://llvm.org/doxygen/Type_8h_source.html>`_
+
+doxygen info: `Type Clases <http://llvm.org/doxygen/classllvm_1_1Type.html>`_
+
+The Core LLVM classes are the primary means of representing the program being
+inspected or transformed. The core LLVM classes are defined in header files in
+the ``include/llvm/IR`` directory, and implemented in the ``lib/IR``
+directory. It's worth noting that, for historical reasons, this library is
+called ``libLLVMCore.so``, not ``libLLVMIR.so`` as you might expect.
+
+.. _Type:
+
+The Type class and Derived Types
+--------------------------------
+
+``Type`` is a superclass of all type classes. Every ``Value`` has a ``Type``.
+``Type`` cannot be instantiated directly but only through its subclasses.
+Certain primitive types (``VoidType``, ``LabelType``, ``FloatType`` and
+``DoubleType``) have hidden subclasses. They are hidden because they offer no
+useful functionality beyond what the ``Type`` class offers except to distinguish
+themselves from other subclasses of ``Type``.
+
+All other types are subclasses of ``DerivedType``. Types can be named, but this
+is not a requirement. There exists exactly one instance of a given shape at any
+one time. This allows type equality to be performed with address equality of
+the Type Instance. That is, given two ``Type*`` values, the types are identical
+if the pointers are identical.
+
+.. _m_Type:
+
+Important Public Methods
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+* ``bool isIntegerTy() const``: Returns true for any integer type.
+
+* ``bool isFloatingPointTy()``: Return true if this is one of the five
+ floating point types.
+
+* ``bool isSized()``: Return true if the type has known size. Things
+ that don't have a size are abstract types, labels and void.
+
+.. _derivedtypes:
+
+Important Derived Types
+^^^^^^^^^^^^^^^^^^^^^^^
+
+``IntegerType``
+ Subclass of DerivedType that represents integer types of any bit width. Any
+ bit width between ``IntegerType::MIN_INT_BITS`` (1) and
+ ``IntegerType::MAX_INT_BITS`` (~8 million) can be represented.
+
+ * ``static const IntegerType* get(unsigned NumBits)``: get an integer
+ type of a specific bit width.
+
+ * ``unsigned getBitWidth() const``: Get the bit width of an integer type.
+
+``SequentialType``
+ This is subclassed by ArrayType and VectorType.
+
+ * ``const Type * getElementType() const``: Returns the type of each
+ of the elements in the sequential type.
+
+ * ``uint64_t getNumElements() const``: Returns the number of elements
+ in the sequential type.
+
+``ArrayType``
+ This is a subclass of SequentialType and defines the interface for array
+ types.
+
+``PointerType``
+ Subclass of Type for pointer types.
+
+``VectorType``
+ Subclass of SequentialType for vector types. A vector type is similar to an
+ ArrayType but is distinguished because it is a first class type whereas
+ ArrayType is not. Vector types are used for vector operations and are usually
+ small vectors of an integer or floating point type.
+
+``StructType``
+ Subclass of DerivedTypes for struct types.
+
+.. _FunctionType:
+
+``FunctionType``
+ Subclass of DerivedTypes for function types.
+
+ * ``bool isVarArg() const``: Returns true if it's a vararg function.
+
+ * ``const Type * getReturnType() const``: Returns the return type of the
+ function.
+
+ * ``const Type * getParamType (unsigned i)``: Returns the type of the ith
+ parameter.
+
+ * ``const unsigned getNumParams() const``: Returns the number of formal
+ parameters.
+
+.. _Module:
+
+The ``Module`` class
+--------------------
+
+``#include "llvm/IR/Module.h"``
+
+header source: `Module.h <http://llvm.org/doxygen/Module_8h_source.html>`_
+
+doxygen info: `Module Class <http://llvm.org/doxygen/classllvm_1_1Module.html>`_
+
+The ``Module`` class represents the top level structure present in LLVM
+programs. An LLVM module is effectively either a translation unit of the
+original program or a combination of several translation units merged by the
+linker. The ``Module`` class keeps track of a list of :ref:`Function
+<c_Function>`\ s, a list of GlobalVariable_\ s, and a SymbolTable_.
+Additionally, it contains a few helpful member functions that try to make common
+operations easy.
+
+.. _m_Module:
+
+Important Public Members of the ``Module`` class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* ``Module::Module(std::string name = "")``
+
+ Constructing a Module_ is easy. You can optionally provide a name for it
+ (probably based on the name of the translation unit).
+
+* | ``Module::iterator`` - Typedef for function list iterator
+ | ``Module::const_iterator`` - Typedef for const_iterator.
+ | ``begin()``, ``end()``, ``size()``, ``empty()``
+
+ These are forwarding methods that make it easy to access the contents of a
+ ``Module`` object's :ref:`Function <c_Function>` list.
+
+* ``Module::FunctionListType &getFunctionList()``
+
+ Returns the list of :ref:`Function <c_Function>`\ s. This is necessary to use
+ when you need to update the list or perform a complex action that doesn't have
+ a forwarding method.
+
+----------------
+
+* | ``Module::global_iterator`` - Typedef for global variable list iterator
+ | ``Module::const_global_iterator`` - Typedef for const_iterator.
+ | ``global_begin()``, ``global_end()``, ``global_size()``, ``global_empty()``
+
+ These are forwarding methods that make it easy to access the contents of a
+ ``Module`` object's GlobalVariable_ list.
+
+* ``Module::GlobalListType &getGlobalList()``
+
+ Returns the list of GlobalVariable_\ s. This is necessary to use when you
+ need to update the list or perform a complex action that doesn't have a
+ forwarding method.
+
+----------------
+
+* ``SymbolTable *getSymbolTable()``
+
+ Return a reference to the SymbolTable_ for this ``Module``.
+
+----------------
+
+* ``Function *getFunction(StringRef Name) const``
+
+ Look up the specified function in the ``Module`` SymbolTable_. If it does not
+ exist, return ``null``.
+
+* ``Function *getOrInsertFunction(const std::string &Name, const FunctionType
+ *T)``
+
+ Look up the specified function in the ``Module`` SymbolTable_. If it does not
+ exist, add an external declaration for the function and return it.
+
+* ``std::string getTypeName(const Type *Ty)``
+
+ If there is at least one entry in the SymbolTable_ for the specified Type_,
+ return it. Otherwise return the empty string.
+
+* ``bool addTypeName(const std::string &Name, const Type *Ty)``
+
+ Insert an entry in the SymbolTable_ mapping ``Name`` to ``Ty``. If there is
+ already an entry for this name, true is returned and the SymbolTable_ is not
+ modified.
+
+.. _Value:
+
+The ``Value`` class
+-------------------
+
+``#include "llvm/IR/Value.h"``
+
+header source: `Value.h <http://llvm.org/doxygen/Value_8h_source.html>`_
+
+doxygen info: `Value Class <http://llvm.org/doxygen/classllvm_1_1Value.html>`_
+
+The ``Value`` class is the most important class in the LLVM Source base. It
+represents a typed value that may be used (among other things) as an operand to
+an instruction. There are many different types of ``Value``\ s, such as
+Constant_\ s, Argument_\ s. Even Instruction_\ s and :ref:`Function
+<c_Function>`\ s are ``Value``\ s.
+
+A particular ``Value`` may be used many times in the LLVM representation for a
+program. For example, an incoming argument to a function (represented with an
+instance of the Argument_ class) is "used" by every instruction in the function
+that references the argument. To keep track of this relationship, the ``Value``
+class keeps a list of all of the ``User``\ s that is using it (the User_ class
+is a base class for all nodes in the LLVM graph that can refer to ``Value``\ s).
+This use list is how LLVM represents def-use information in the program, and is
+accessible through the ``use_*`` methods, shown below.
+
+Because LLVM is a typed representation, every LLVM ``Value`` is typed, and this
+Type_ is available through the ``getType()`` method. In addition, all LLVM
+values can be named. The "name" of the ``Value`` is a symbolic string printed
+in the LLVM code:
+
+.. code-block:: llvm
+
+ %foo = add i32 1, 2
+
+.. _nameWarning:
+
+The name of this instruction is "foo". **NOTE** that the name of any value may
+be missing (an empty string), so names should **ONLY** be used for debugging
+(making the source code easier to read, debugging printouts), they should not be
+used to keep track of values or map between them. For this purpose, use a
+``std::map`` of pointers to the ``Value`` itself instead.
+
+One important aspect of LLVM is that there is no distinction between an SSA
+variable and the operation that produces it. Because of this, any reference to
+the value produced by an instruction (or the value available as an incoming
+argument, for example) is represented as a direct pointer to the instance of the
+class that represents this value. Although this may take some getting used to,
+it simplifies the representation and makes it easier to manipulate.
+
+.. _m_Value:
+
+Important Public Members of the ``Value`` class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* | ``Value::use_iterator`` - Typedef for iterator over the use-list
+ | ``Value::const_use_iterator`` - Typedef for const_iterator over the
+ use-list
+ | ``unsigned use_size()`` - Returns the number of users of the value.
+ | ``bool use_empty()`` - Returns true if there are no users.
+ | ``use_iterator use_begin()`` - Get an iterator to the start of the
+ use-list.
+ | ``use_iterator use_end()`` - Get an iterator to the end of the use-list.
+ | ``User *use_back()`` - Returns the last element in the list.
+
+ These methods are the interface to access the def-use information in LLVM.
+ As with all other iterators in LLVM, the naming conventions follow the
+ conventions defined by the STL_.
+
+* ``Type *getType() const``
+ This method returns the Type of the Value.
+
+* | ``bool hasName() const``
+ | ``std::string getName() const``
+ | ``void setName(const std::string &Name)``
+
+ This family of methods is used to access and assign a name to a ``Value``, be
+ aware of the :ref:`precaution above <nameWarning>`.
+
+* ``void replaceAllUsesWith(Value *V)``
+
+ This method traverses the use list of a ``Value`` changing all User_\ s of the
+ current value to refer to "``V``" instead. For example, if you detect that an
+ instruction always produces a constant value (for example through constant
+ folding), you can replace all uses of the instruction with the constant like
+ this:
+
+ .. code-block:: c++
+
+ Inst->replaceAllUsesWith(ConstVal);
+
+.. _User:
+
+The ``User`` class
+------------------
+
+``#include "llvm/IR/User.h"``
+
+header source: `User.h <http://llvm.org/doxygen/User_8h_source.html>`_
+
+doxygen info: `User Class <http://llvm.org/doxygen/classllvm_1_1User.html>`_
+
+Superclass: Value_
+
+The ``User`` class is the common base class of all LLVM nodes that may refer to
+``Value``\ s. It exposes a list of "Operands" that are all of the ``Value``\ s
+that the User is referring to. The ``User`` class itself is a subclass of
+``Value``.
+
+The operands of a ``User`` point directly to the LLVM ``Value`` that it refers
+to. Because LLVM uses Static Single Assignment (SSA) form, there can only be
+one definition referred to, allowing this direct connection. This connection
+provides the use-def information in LLVM.
+
+.. _m_User:
+
+Important Public Members of the ``User`` class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``User`` class exposes the operand list in two ways: through an index access
+interface and through an iterator based interface.
+
+* | ``Value *getOperand(unsigned i)``
+ | ``unsigned getNumOperands()``
+
+ These two methods expose the operands of the ``User`` in a convenient form for
+ direct access.
+
+* | ``User::op_iterator`` - Typedef for iterator over the operand list
+ | ``op_iterator op_begin()`` - Get an iterator to the start of the operand
+ list.
+ | ``op_iterator op_end()`` - Get an iterator to the end of the operand list.
+
+ Together, these methods make up the iterator based interface to the operands
+ of a ``User``.
+
+
+.. _Instruction:
+
+The ``Instruction`` class
+-------------------------
+
+``#include "llvm/IR/Instruction.h"``
+
+header source: `Instruction.h
+<http://llvm.org/doxygen/Instruction_8h_source.html>`_
+
+doxygen info: `Instruction Class
+<http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_
+
+Superclasses: User_, Value_
+
+The ``Instruction`` class is the common base class for all LLVM instructions.
+It provides only a few methods, but is a very commonly used class. The primary
+data tracked by the ``Instruction`` class itself is the opcode (instruction
+type) and the parent BasicBlock_ the ``Instruction`` is embedded into. To
+represent a specific type of instruction, one of many subclasses of
+``Instruction`` are used.
+
+Because the ``Instruction`` class subclasses the User_ class, its operands can
+be accessed in the same way as for other ``User``\ s (with the
+``getOperand()``/``getNumOperands()`` and ``op_begin()``/``op_end()`` methods).
+An important file for the ``Instruction`` class is the ``llvm/Instruction.def``
+file. This file contains some meta-data about the various different types of
+instructions in LLVM. It describes the enum values that are used as opcodes
+(for example ``Instruction::Add`` and ``Instruction::ICmp``), as well as the
+concrete sub-classes of ``Instruction`` that implement the instruction (for
+example BinaryOperator_ and CmpInst_). Unfortunately, the use of macros in this
+file confuses doxygen, so these enum values don't show up correctly in the
+`doxygen output <http://llvm.org/doxygen/classllvm_1_1Instruction.html>`_.
+
+.. _s_Instruction:
+
+Important Subclasses of the ``Instruction`` class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. _BinaryOperator:
+
+* ``BinaryOperator``
+
+ This subclasses represents all two operand instructions whose operands must be
+ the same type, except for the comparison instructions.
+
+.. _CastInst:
+
+* ``CastInst``
+ This subclass is the parent of the 12 casting instructions. It provides
+ common operations on cast instructions.
+
+.. _CmpInst:
+
+* ``CmpInst``
+
+ This subclass represents the two comparison instructions,
+ `ICmpInst <LangRef.html#i_icmp>`_ (integer opreands), and
+ `FCmpInst <LangRef.html#i_fcmp>`_ (floating point operands).
+
+.. _m_Instruction:
+
+Important Public Members of the ``Instruction`` class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* ``BasicBlock *getParent()``
+
+ Returns the BasicBlock_ that this
+ ``Instruction`` is embedded into.
+
+* ``bool mayWriteToMemory()``
+
+ Returns true if the instruction writes to memory, i.e. it is a ``call``,
+ ``free``, ``invoke``, or ``store``.
+
+* ``unsigned getOpcode()``
+
+ Returns the opcode for the ``Instruction``.
+
+* ``Instruction *clone() const``
+
+ Returns another instance of the specified instruction, identical in all ways
+ to the original except that the instruction has no parent (i.e. it's not
+ embedded into a BasicBlock_), and it has no name.
+
+.. _Constant:
+
+The ``Constant`` class and subclasses
+-------------------------------------
+
+Constant represents a base class for different types of constants. It is
+subclassed by ConstantInt, ConstantArray, etc. for representing the various
+types of Constants. GlobalValue_ is also a subclass, which represents the
+address of a global variable or function.
+
+.. _s_Constant:
+
+Important Subclasses of Constant
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* ConstantInt : This subclass of Constant represents an integer constant of
+ any width.
+
+ * ``const APInt& getValue() const``: Returns the underlying
+ value of this constant, an APInt value.
+
+ * ``int64_t getSExtValue() const``: Converts the underlying APInt value to an
+ int64_t via sign extension. If the value (not the bit width) of the APInt
+ is too large to fit in an int64_t, an assertion will result. For this
+ reason, use of this method is discouraged.
+
+ * ``uint64_t getZExtValue() const``: Converts the underlying APInt value
+ to a uint64_t via zero extension. IF the value (not the bit width) of the
+ APInt is too large to fit in a uint64_t, an assertion will result. For this
+ reason, use of this method is discouraged.
+
+ * ``static ConstantInt* get(const APInt& Val)``: Returns the ConstantInt
+ object that represents the value provided by ``Val``. The type is implied
+ as the IntegerType that corresponds to the bit width of ``Val``.
+
+ * ``static ConstantInt* get(const Type *Ty, uint64_t Val)``: Returns the
+ ConstantInt object that represents the value provided by ``Val`` for integer
+ type ``Ty``.
+
+* ConstantFP : This class represents a floating point constant.
+
+ * ``double getValue() const``: Returns the underlying value of this constant.
+
+* ConstantArray : This represents a constant array.
+
+ * ``const std::vector<Use> &getValues() const``: Returns a vector of
+ component constants that makeup this array.
+
+* ConstantStruct : This represents a constant struct.
+
+ * ``const std::vector<Use> &getValues() const``: Returns a vector of
+ component constants that makeup this array.
+
+* GlobalValue : This represents either a global variable or a function. In
+ either case, the value is a constant fixed address (after linking).
+
+.. _GlobalValue:
+
+The ``GlobalValue`` class
+-------------------------
+
+``#include "llvm/IR/GlobalValue.h"``
+
+header source: `GlobalValue.h
+<http://llvm.org/doxygen/GlobalValue_8h_source.html>`_
+
+doxygen info: `GlobalValue Class
+<http://llvm.org/doxygen/classllvm_1_1GlobalValue.html>`_
+
+Superclasses: Constant_, User_, Value_
+
+Global values ( GlobalVariable_\ s or :ref:`Function <c_Function>`\ s) are the
+only LLVM values that are visible in the bodies of all :ref:`Function
+<c_Function>`\ s. Because they are visible at global scope, they are also
+subject to linking with other globals defined in different translation units.
+To control the linking process, ``GlobalValue``\ s know their linkage rules.
+Specifically, ``GlobalValue``\ s know whether they have internal or external
+linkage, as defined by the ``LinkageTypes`` enumeration.
+
+If a ``GlobalValue`` has internal linkage (equivalent to being ``static`` in C),
+it is not visible to code outside the current translation unit, and does not
+participate in linking. If it has external linkage, it is visible to external
+code, and does participate in linking. In addition to linkage information,
+``GlobalValue``\ s keep track of which Module_ they are currently part of.
+
+Because ``GlobalValue``\ s are memory objects, they are always referred to by
+their **address**. As such, the Type_ of a global is always a pointer to its
+contents. It is important to remember this when using the ``GetElementPtrInst``
+instruction because this pointer must be dereferenced first. For example, if
+you have a ``GlobalVariable`` (a subclass of ``GlobalValue)`` that is an array
+of 24 ints, type ``[24 x i32]``, then the ``GlobalVariable`` is a pointer to
+that array. Although the address of the first element of this array and the
+value of the ``GlobalVariable`` are the same, they have different types. The
+``GlobalVariable``'s type is ``[24 x i32]``. The first element's type is
+``i32.`` Because of this, accessing a global value requires you to dereference
+the pointer with ``GetElementPtrInst`` first, then its elements can be accessed.
+This is explained in the `LLVM Language Reference Manual
+<LangRef.html#globalvars>`_.
+
+.. _m_GlobalValue:
+
+Important Public Members of the ``GlobalValue`` class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* | ``bool hasInternalLinkage() const``
+ | ``bool hasExternalLinkage() const``
+ | ``void setInternalLinkage(bool HasInternalLinkage)``
+
+ These methods manipulate the linkage characteristics of the ``GlobalValue``.
+
+* ``Module *getParent()``
+
+ This returns the Module_ that the
+ GlobalValue is currently embedded into.
+
+.. _c_Function:
+
+The ``Function`` class
+----------------------
+
+``#include "llvm/IR/Function.h"``
+
+header source: `Function.h <http://llvm.org/doxygen/Function_8h_source.html>`_
+
+doxygen info: `Function Class
+<http://llvm.org/doxygen/classllvm_1_1Function.html>`_
+
+Superclasses: GlobalValue_, Constant_, User_, Value_
+
+The ``Function`` class represents a single procedure in LLVM. It is actually
+one of the more complex classes in the LLVM hierarchy because it must keep track
+of a large amount of data. The ``Function`` class keeps track of a list of
+BasicBlock_\ s, a list of formal Argument_\ s, and a SymbolTable_.
+
+The list of BasicBlock_\ s is the most commonly used part of ``Function``
+objects. The list imposes an implicit ordering of the blocks in the function,
+which indicate how the code will be laid out by the backend. Additionally, the
+first BasicBlock_ is the implicit entry node for the ``Function``. It is not
+legal in LLVM to explicitly branch to this initial block. There are no implicit
+exit nodes, and in fact there may be multiple exit nodes from a single
+``Function``. If the BasicBlock_ list is empty, this indicates that the
+``Function`` is actually a function declaration: the actual body of the function
+hasn't been linked in yet.
+
+In addition to a list of BasicBlock_\ s, the ``Function`` class also keeps track
+of the list of formal Argument_\ s that the function receives. This container
+manages the lifetime of the Argument_ nodes, just like the BasicBlock_ list does
+for the BasicBlock_\ s.
+
+The SymbolTable_ is a very rarely used LLVM feature that is only used when you
+have to look up a value by name. Aside from that, the SymbolTable_ is used
+internally to make sure that there are not conflicts between the names of
+Instruction_\ s, BasicBlock_\ s, or Argument_\ s in the function body.
+
+Note that ``Function`` is a GlobalValue_ and therefore also a Constant_. The
+value of the function is its address (after linking) which is guaranteed to be
+constant.
+
+.. _m_Function:
+
+Important Public Members of the ``Function``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* ``Function(const FunctionType *Ty, LinkageTypes Linkage,
+ const std::string &N = "", Module* Parent = 0)``
+
+ Constructor used when you need to create new ``Function``\ s to add the
+ program. The constructor must specify the type of the function to create and
+ what type of linkage the function should have. The FunctionType_ argument
+ specifies the formal arguments and return value for the function. The same
+ FunctionType_ value can be used to create multiple functions. The ``Parent``
+ argument specifies the Module in which the function is defined. If this
+ argument is provided, the function will automatically be inserted into that
+ module's list of functions.
+
+* ``bool isDeclaration()``
+
+ Return whether or not the ``Function`` has a body defined. If the function is
+ "external", it does not have a body, and thus must be resolved by linking with
+ a function defined in a different translation unit.
+
+* | ``Function::iterator`` - Typedef for basic block list iterator
+ | ``Function::const_iterator`` - Typedef for const_iterator.
+ | ``begin()``, ``end()``, ``size()``, ``empty()``
+
+ These are forwarding methods that make it easy to access the contents of a
+ ``Function`` object's BasicBlock_ list.
+
+* ``Function::BasicBlockListType &getBasicBlockList()``
+
+ Returns the list of BasicBlock_\ s. This is necessary to use when you need to
+ update the list or perform a complex action that doesn't have a forwarding
+ method.
+
+* | ``Function::arg_iterator`` - Typedef for the argument list iterator
+ | ``Function::const_arg_iterator`` - Typedef for const_iterator.
+ | ``arg_begin()``, ``arg_end()``, ``arg_size()``, ``arg_empty()``
+
+ These are forwarding methods that make it easy to access the contents of a
+ ``Function`` object's Argument_ list.
+
+* ``Function::ArgumentListType &getArgumentList()``
+
+ Returns the list of Argument_. This is necessary to use when you need to
+ update the list or perform a complex action that doesn't have a forwarding
+ method.
+
+* ``BasicBlock &getEntryBlock()``
+
+ Returns the entry ``BasicBlock`` for the function. Because the entry block
+ for the function is always the first block, this returns the first block of
+ the ``Function``.
+
+* | ``Type *getReturnType()``
+ | ``FunctionType *getFunctionType()``
+
+ This traverses the Type_ of the ``Function`` and returns the return type of
+ the function, or the FunctionType_ of the actual function.
+
+* ``SymbolTable *getSymbolTable()``
+
+ Return a pointer to the SymbolTable_ for this ``Function``.
+
+.. _GlobalVariable:
+
+The ``GlobalVariable`` class
+----------------------------
+
+``#include "llvm/IR/GlobalVariable.h"``
+
+header source: `GlobalVariable.h
+<http://llvm.org/doxygen/GlobalVariable_8h_source.html>`_
+
+doxygen info: `GlobalVariable Class
+<http://llvm.org/doxygen/classllvm_1_1GlobalVariable.html>`_
+
+Superclasses: GlobalValue_, Constant_, User_, Value_
+
+Global variables are represented with the (surprise surprise) ``GlobalVariable``
+class. Like functions, ``GlobalVariable``\ s are also subclasses of
+GlobalValue_, and as such are always referenced by their address (global values
+must live in memory, so their "name" refers to their constant address). See
+GlobalValue_ for more on this. Global variables may have an initial value
+(which must be a Constant_), and if they have an initializer, they may be marked
+as "constant" themselves (indicating that their contents never change at
+runtime).
+
+.. _m_GlobalVariable:
+
+Important Public Members of the ``GlobalVariable`` class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* ``GlobalVariable(const Type *Ty, bool isConstant, LinkageTypes &Linkage,
+ Constant *Initializer = 0, const std::string &Name = "", Module* Parent = 0)``
+
+ Create a new global variable of the specified type. If ``isConstant`` is true
+ then the global variable will be marked as unchanging for the program. The
+ Linkage parameter specifies the type of linkage (internal, external, weak,
+ linkonce, appending) for the variable. If the linkage is InternalLinkage,
+ WeakAnyLinkage, WeakODRLinkage, LinkOnceAnyLinkage or LinkOnceODRLinkage, then
+ the resultant global variable will have internal linkage. AppendingLinkage
+ concatenates together all instances (in different translation units) of the
+ variable into a single variable but is only applicable to arrays. See the
+ `LLVM Language Reference <LangRef.html#modulestructure>`_ for further details
+ on linkage types. Optionally an initializer, a name, and the module to put
+ the variable into may be specified for the global variable as well.
+
+* ``bool isConstant() const``
+
+ Returns true if this is a global variable that is known not to be modified at
+ runtime.
+
+* ``bool hasInitializer()``
+
+ Returns true if this ``GlobalVariable`` has an intializer.
+
+* ``Constant *getInitializer()``
+
+ Returns the initial value for a ``GlobalVariable``. It is not legal to call
+ this method if there is no initializer.
+
+.. _BasicBlock:
+
+The ``BasicBlock`` class
+------------------------
+
+``#include "llvm/IR/BasicBlock.h"``
+
+header source: `BasicBlock.h
+<http://llvm.org/doxygen/BasicBlock_8h_source.html>`_
+
+doxygen info: `BasicBlock Class
+<http://llvm.org/doxygen/classllvm_1_1BasicBlock.html>`_
+
+Superclass: Value_
+
+This class represents a single entry single exit section of the code, commonly
+known as a basic block by the compiler community. The ``BasicBlock`` class
+maintains a list of Instruction_\ s, which form the body of the block. Matching
+the language definition, the last element of this list of instructions is always
+a terminator instruction.
+
+In addition to tracking the list of instructions that make up the block, the
+``BasicBlock`` class also keeps track of the :ref:`Function <c_Function>` that
+it is embedded into.
+
+Note that ``BasicBlock``\ s themselves are Value_\ s, because they are
+referenced by instructions like branches and can go in the switch tables.
+``BasicBlock``\ s have type ``label``.
+
+.. _m_BasicBlock:
+
+Important Public Members of the ``BasicBlock`` class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* ``BasicBlock(const std::string &Name = "", Function *Parent = 0)``
+
+ The ``BasicBlock`` constructor is used to create new basic blocks for
+ insertion into a function. The constructor optionally takes a name for the
+ new block, and a :ref:`Function <c_Function>` to insert it into. If the
+ ``Parent`` parameter is specified, the new ``BasicBlock`` is automatically
+ inserted at the end of the specified :ref:`Function <c_Function>`, if not
+ specified, the BasicBlock must be manually inserted into the :ref:`Function
+ <c_Function>`.
+
+* | ``BasicBlock::iterator`` - Typedef for instruction list iterator
+ | ``BasicBlock::const_iterator`` - Typedef for const_iterator.
+ | ``begin()``, ``end()``, ``front()``, ``back()``,
+ ``size()``, ``empty()``
+ STL-style functions for accessing the instruction list.
+
+ These methods and typedefs are forwarding functions that have the same
+ semantics as the standard library methods of the same names. These methods
+ expose the underlying instruction list of a basic block in a way that is easy
+ to manipulate. To get the full complement of container operations (including
+ operations to update the list), you must use the ``getInstList()`` method.
+
+* ``BasicBlock::InstListType &getInstList()``
+
+ This method is used to get access to the underlying container that actually
+ holds the Instructions. This method must be used when there isn't a
+ forwarding function in the ``BasicBlock`` class for the operation that you
+ would like to perform. Because there are no forwarding functions for
+ "updating" operations, you need to use this if you want to update the contents
+ of a ``BasicBlock``.
+
+* ``Function *getParent()``
+
+ Returns a pointer to :ref:`Function <c_Function>` the block is embedded into,
+ or a null pointer if it is homeless.
+
+* ``Instruction *getTerminator()``
+
+ Returns a pointer to the terminator instruction that appears at the end of the
+ ``BasicBlock``. If there is no terminator instruction, or if the last
+ instruction in the block is not a terminator, then a null pointer is returned.
+
+.. _Argument:
+
+The ``Argument`` class
+----------------------
+
+This subclass of Value defines the interface for incoming formal arguments to a
+function. A Function maintains a list of its formal arguments. An argument has
+a pointer to the parent Function.
+
+
Added: www-releases/trunk/8.0.1/docs/_sources/Projects.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/Projects.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/Projects.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/Projects.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,257 @@
+========================
+Creating an LLVM Project
+========================
+
+.. contents::
+ :local:
+
+Overview
+========
+
+The LLVM build system is designed to facilitate the building of third party
+projects that use LLVM header files, libraries, and tools. In order to use
+these facilities, a ``Makefile`` from a project must do the following things:
+
+* Set ``make`` variables. There are several variables that a ``Makefile`` needs
+ to set to use the LLVM build system:
+
+ * ``PROJECT_NAME`` - The name by which your project is known.
+ * ``LLVM_SRC_ROOT`` - The root of the LLVM source tree.
+ * ``LLVM_OBJ_ROOT`` - The root of the LLVM object tree.
+ * ``PROJ_SRC_ROOT`` - The root of the project's source tree.
+ * ``PROJ_OBJ_ROOT`` - The root of the project's object tree.
+ * ``PROJ_INSTALL_ROOT`` - The root installation directory.
+ * ``LEVEL`` - The relative path from the current directory to the
+ project's root ``($PROJ_OBJ_ROOT)``.
+
+* Include ``Makefile.config`` from ``$(LLVM_OBJ_ROOT)``.
+
+* Include ``Makefile.rules`` from ``$(LLVM_SRC_ROOT)``.
+
+There are two ways that you can set all of these variables:
+
+* You can write your own ``Makefiles`` which hard-code these values.
+
+* You can use the pre-made LLVM sample project. This sample project includes
+ ``Makefiles``, a configure script that can be used to configure the location
+ of LLVM, and the ability to support multiple object directories from a single
+ source directory.
+
+If you want to devise your own build system, studying other projects and LLVM
+``Makefiles`` will probably provide enough information on how to write your own
+``Makefiles``.
+
+Source Tree Layout
+==================
+
+In order to use the LLVM build system, you will want to organize your source
+code so that it can benefit from the build system's features. Mainly, you want
+your source tree layout to look similar to the LLVM source tree layout.
+
+Underneath your top level directory, you should have the following directories:
+
+**lib**
+
+ This subdirectory should contain all of your library source code. For each
+ library that you build, you will have one directory in **lib** that will
+ contain that library's source code.
+
+ Libraries can be object files, archives, or dynamic libraries. The **lib**
+ directory is just a convenient place for libraries as it places them all in
+ a directory from which they can be linked later.
+
+**include**
+
+ This subdirectory should contain any header files that are global to your
+ project. By global, we mean that they are used by more than one library or
+ executable of your project.
+
+ By placing your header files in **include**, they will be found
+ automatically by the LLVM build system. For example, if you have a file
+ **include/jazz/note.h**, then your source files can include it simply with
+ **#include "jazz/note.h"**.
+
+**tools**
+
+ This subdirectory should contain all of your source code for executables.
+ For each program that you build, you will have one directory in **tools**
+ that will contain that program's source code.
+
+**test**
+
+ This subdirectory should contain tests that verify that your code works
+ correctly. Automated tests are especially useful.
+
+ Currently, the LLVM build system provides basic support for tests. The LLVM
+ system provides the following:
+
+* LLVM contains regression tests in ``llvm/test``. These tests are run by the
+ :doc:`Lit <CommandGuide/lit>` testing tool. This test procedure uses ``RUN``
+ lines in the actual test case to determine how to run the test. See the
+ :doc:`TestingGuide` for more details.
+
+* LLVM contains an optional package called ``llvm-test``, which provides
+ benchmarks and programs that are known to compile with the Clang front
+ end. You can use these programs to test your code, gather statistical
+ information, and compare it to the current LLVM performance statistics.
+
+ Currently, there is no way to hook your tests directly into the ``llvm/test``
+ testing harness. You will simply need to find a way to use the source
+ provided within that directory on your own.
+
+Typically, you will want to build your **lib** directory first followed by your
+**tools** directory.
+
+Writing LLVM Style Makefiles
+============================
+
+The LLVM build system provides a convenient way to build libraries and
+executables. Most of your project Makefiles will only need to define a few
+variables. Below is a list of the variables one can set and what they can
+do:
+
+Required Variables
+------------------
+
+``LEVEL``
+
+ This variable is the relative path from this ``Makefile`` to the top
+ directory of your project's source code. For example, if your source code
+ is in ``/tmp/src``, then the ``Makefile`` in ``/tmp/src/jump/high``
+ would set ``LEVEL`` to ``"../.."``.
+
+Variables for Building Subdirectories
+-------------------------------------
+
+``DIRS``
+
+ This is a space separated list of subdirectories that should be built. They
+ will be built, one at a time, in the order specified.
+
+``PARALLEL_DIRS``
+
+ This is a list of directories that can be built in parallel. These will be
+ built after the directories in DIRS have been built.
+
+``OPTIONAL_DIRS``
+
+ This is a list of directories that can be built if they exist, but will not
+ cause an error if they do not exist. They are built serially in the order
+ in which they are listed.
+
+Variables for Building Libraries
+--------------------------------
+
+``LIBRARYNAME``
+
+ This variable contains the base name of the library that will be built. For
+ example, to build a library named ``libsample.a``, ``LIBRARYNAME`` should
+ be set to ``sample``.
+
+``BUILD_ARCHIVE``
+
+ By default, a library is a ``.o`` file that is linked directly into a
+ program. To build an archive (also known as a static library), set the
+ ``BUILD_ARCHIVE`` variable.
+
+``SHARED_LIBRARY``
+
+ If ``SHARED_LIBRARY`` is defined in your Makefile, a shared (or dynamic)
+ library will be built.
+
+Variables for Building Programs
+-------------------------------
+
+``TOOLNAME``
+
+ This variable contains the name of the program that will be built. For
+ example, to build an executable named ``sample``, ``TOOLNAME`` should be set
+ to ``sample``.
+
+``USEDLIBS``
+
+ This variable holds a space separated list of libraries that should be
+ linked into the program. These libraries must be libraries that come from
+ your **lib** directory. The libraries must be specified without their
+ ``lib`` prefix. For example, to link ``libsample.a``, you would set
+ ``USEDLIBS`` to ``sample.a``.
+
+ Note that this works only for statically linked libraries.
+
+``LLVMLIBS``
+
+ This variable holds a space separated list of libraries that should be
+ linked into the program. These libraries must be LLVM libraries. The
+ libraries must be specified without their ``lib`` prefix. For example, to
+ link with a driver that performs an IR transformation you might set
+ ``LLVMLIBS`` to this minimal set of libraries ``LLVMSupport.a LLVMCore.a
+ LLVMBitReader.a LLVMAsmParser.a LLVMAnalysis.a LLVMTransformUtils.a
+ LLVMScalarOpts.a LLVMTarget.a``.
+
+ Note that this works only for statically linked libraries. LLVM is split
+ into a large number of static libraries, and the list of libraries you
+ require may be much longer than the list above. To see a full list of
+ libraries use: ``llvm-config --libs all``. Using ``LINK_COMPONENTS`` as
+ described below, obviates the need to set ``LLVMLIBS``.
+
+``LINK_COMPONENTS``
+
+ This variable holds a space separated list of components that the LLVM
+ ``Makefiles`` pass to the ``llvm-config`` tool to generate a link line for
+ the program. For example, to link with all LLVM libraries use
+ ``LINK_COMPONENTS = all``.
+
+``LIBS``
+
+ To link dynamic libraries, add ``-l<library base name>`` to the ``LIBS``
+ variable. The LLVM build system will look in the same places for dynamic
+ libraries as it does for static libraries.
+
+ For example, to link ``libsample.so``, you would have the following line in
+ your ``Makefile``:
+
+ .. code-block:: makefile
+
+ LIBS += -lsample
+
+Note that ``LIBS`` must occur in the Makefile after the inclusion of
+``Makefile.common``.
+
+Miscellaneous Variables
+-----------------------
+
+``CFLAGS`` & ``CPPFLAGS``
+
+ This variable can be used to add options to the C and C++ compiler,
+ respectively. It is typically used to add options that tell the compiler
+ the location of additional directories to search for header files.
+
+ It is highly suggested that you append to ``CFLAGS`` and ``CPPFLAGS`` as
+ opposed to overwriting them. The master ``Makefiles`` may already have
+ useful options in them that you may not want to overwrite.
+
+Placement of Object Code
+========================
+
+The final location of built libraries and executables will depend upon whether
+you do a ``Debug``, ``Release``, or ``Profile`` build.
+
+Libraries
+
+ All libraries (static and dynamic) will be stored in
+ ``PROJ_OBJ_ROOT/<type>/lib``, where *type* is ``Debug``, ``Release``, or
+ ``Profile`` for a debug, optimized, or profiled build, respectively.
+
+Executables
+
+ All executables will be stored in ``PROJ_OBJ_ROOT/<type>/bin``, where *type*
+ is ``Debug``, ``Release``, or ``Profile`` for a debug, optimized, or
+ profiled build, respectively.
+
+Further Help
+============
+
+If you have any questions or need any help creating an LLVM project, the LLVM
+team would be more than happy to help. You can always post your questions to
+the `LLVM Developers Mailing List
+<http://lists.llvm.org/pipermail/llvm-dev/>`_.
Added: www-releases/trunk/8.0.1/docs/_sources/Proposals/GitHubMove.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/Proposals/GitHubMove.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/Proposals/GitHubMove.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/Proposals/GitHubMove.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,868 @@
+==============================
+Moving LLVM Projects to GitHub
+==============================
+
+.. contents:: Table of Contents
+ :depth: 4
+ :local:
+
+Introduction
+============
+
+This is a proposal to move our current revision control system from our own
+hosted Subversion to GitHub. Below are the financial and technical arguments as
+to why we are proposing such a move and how people (and validation
+infrastructure) will continue to work with a Git-based LLVM.
+
+There will be a survey pointing at this document which we'll use to gauge the
+community's reaction and, if we collectively decide to move, the time-frame. Be
+sure to make your view count.
+
+Additionally, we will discuss this during a BoF at the next US LLVM Developer
+meeting (http://llvm.org/devmtg/2016-11/).
+
+What This Proposal is *Not* About
+=================================
+
+Changing the development policy.
+
+This proposal relates only to moving the hosting of our source-code repository
+from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing
+using GitHub's issue tracker, pull-requests, or code-review.
+
+Contributors will continue to earn commit access on demand under the Developer
+Policy, except that that a GitHub account will be required instead of SVN
+username/password-hash.
+
+Why Git, and Why GitHub?
+========================
+
+Why Move At All?
+----------------
+
+This discussion began because we currently host our own Subversion server
+and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and
+provides limited support, but there is only so much it can do.
+
+Volunteers are not sysadmins themselves, but compiler engineers that happen
+to know a thing or two about hosting servers. We also don't have 24/7 support,
+and we sometimes wake up to see that continuous integration is broken because
+the SVN server is either down or unresponsive.
+
+We should take advantage of one of the services out there (GitHub, GitLab,
+and BitBucket, among others) that offer better service (24/7 stability, disk
+space, Git server, code browsing, forking facilities, etc) for free.
+
+Why Git?
+--------
+
+Many new coders nowadays start with Git, and a lot of people have never used
+SVN, CVS, or anything else. Websites like GitHub have changed the landscape
+of open source contributions, reducing the cost of first contribution and
+fostering collaboration.
+
+Git is also the version control many LLVM developers use. Despite the
+sources being stored in a SVN server, these developers are already using Git
+through the Git-SVN integration.
+
+Git allows you to:
+
+* Commit, squash, merge, and fork locally without touching the remote server.
+* Maintain local branches, enabling multiple threads of development.
+* Collaborate on these branches (e.g. through your own fork of llvm on GitHub).
+* Inspect the repository history (blame, log, bisect) without Internet access.
+* Maintain remote forks and branches on Git hosting services and
+ integrate back to the main repository.
+
+In addition, because Git seems to be replacing many OSS projects' version
+control systems, there are many tools that are built over Git.
+Future tooling may support Git first (if not only).
+
+Why GitHub?
+-----------
+
+GitHub, like GitLab and BitBucket, provides free code hosting for open source
+projects. Any of these could replace the code-hosting infrastructure that we
+have today.
+
+These services also have a dedicated team to monitor, migrate, improve and
+distribute the contents of the repositories depending on region and load.
+
+GitHub has one important advantage over GitLab and
+BitBucket: it offers read-write **SVN** access to the repository
+(https://github.com/blog/626-announcing-svn-support).
+This would enable people to continue working post-migration as though our code
+were still canonically in an SVN repository.
+
+In addition, there are already multiple LLVM mirrors on GitHub, indicating that
+part of our community has already settled there.
+
+On Managing Revision Numbers with Git
+-------------------------------------
+
+The current SVN repository hosts all the LLVM sub-projects alongside each other.
+A single revision number (e.g. r123456) thus identifies a consistent version of
+all LLVM sub-projects.
+
+Git does not use sequential integer revision number but instead uses a hash to
+identify each commit. (Linus mentioned that the lack of such revision number
+is "the only real design mistake" in Git [TorvaldRevNum]_.)
+
+The loss of a sequential integer revision number has been a sticking point in
+past discussions about Git:
+
+- "The 'branch' I most care about is mainline, and losing the ability to say
+ 'fixed in r1234' (with some sort of monotonically increasing number) would
+ be a tragic loss." [LattnerRevNum]_
+- "I like those results sorted by time and the chronology should be obvious, but
+ timestamps are incredibly cumbersome and make it difficult to verify that a
+ given checkout matches a given set of results." [TrickRevNum]_
+- "There is still the major regression with unreadable version numbers.
+ Given the amount of Bugzilla traffic with 'Fixed in...', that's a
+ non-trivial issue." [JSonnRevNum]_
+- "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_.
+
+However, Git can emulate this increasing revision number:
+``git rev-list --count <commit-hash>``. This identifier is unique only
+within a single branch, but this means the tuple `(num, branch-name)` uniquely
+identifies a commit.
+
+We can thus use this revision number to ensure that e.g. `clang -v` reports a
+user-friendly revision number (e.g. `master-12345` or `4.0-5321`), addressing
+the objections raised above with respect to this aspect of Git.
+
+What About Branches and Merges?
+-------------------------------
+
+In contrast to SVN, Git makes branching easy. Git's commit history is
+represented as a DAG, a departure from SVN's linear history. However, we propose
+to mandate making merge commits illegal in our canonical Git repository.
+
+Unfortunately, GitHub does not support server side hooks to enforce such a
+policy. We must rely on the community to avoid pushing merge commits.
+
+GitHub offers a feature called `Status Checks`: a branch protected by
+`status checks` requires commits to be whitelisted before the push can happen.
+We could supply a pre-push hook on the client side that would run and check the
+history, before whitelisting the commit being pushed [statuschecks]_.
+However this solution would be somewhat fragile (how do you update a script
+installed on every developer machine?) and prevents SVN access to the
+repository.
+
+What About Commit Emails?
+-------------------------
+
+We will need a new bot to send emails for each commit. This proposal leaves the
+email format unchanged besides the commit URL.
+
+Straw Man Migration Plan
+========================
+
+Step #1 : Before The Move
+-------------------------
+
+1. Update docs to mention the move, so people are aware of what is going on.
+2. Set up a read-only version of the GitHub project, mirroring our current SVN
+ repository.
+3. Add the required bots to implement the commit emails, as well as the
+ umbrella repository update (if the multirepo is selected) or the read-only
+ Git views for the sub-projects (if the monorepo is selected).
+
+Step #2 : Git Move
+------------------
+
+4. Update the buildbots to pick up updates and commits from the GitHub
+ repository. Not all bots have to migrate at this point, but it'll help
+ provide infrastructure testing.
+5. Update Phabricator to pick up commits from the GitHub repository.
+6. LNT and llvmlab have to be updated: they rely on unique monotonically
+ increasing integer across branch [MatthewsRevNum]_.
+7. Instruct downstream integrators to pick up commits from the GitHub
+ repository.
+8. Review and prepare an update for the LLVM documentation.
+
+Until this point nothing has changed for developers, it will just
+boil down to a lot of work for buildbot and other infrastructure
+owners.
+
+The migration will pause here until all dependencies have cleared, and all
+problems have been solved.
+
+Step #3: Write Access Move
+--------------------------
+
+9. Collect developers' GitHub account information, and add them to the project.
+10. Switch the SVN repository to read-only and allow pushes to the GitHub repository.
+11. Update the documentation.
+12. Mirror Git to SVN.
+
+Step #4 : Post Move
+-------------------
+
+13. Archive the SVN repository.
+14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
+ point to GitHub instead.
+
+One or Multiple Repositories?
+=============================
+
+There are two major variants for how to structure our Git repository: The
+"multirepo" and the "monorepo".
+
+Multirepo Variant
+-----------------
+
+This variant recommends moving each LLVM sub-project to a separate Git
+repository. This mimics the existing official read-only Git repositories
+(e.g., http://llvm.org/git/compiler-rt.git), and creates new canonical
+repositories for each sub-project.
+
+This will allow the individual sub-projects to remain distinct: a
+developer interested only in compiler-rt can checkout only this repository,
+build it, and work in isolation of the other sub-projects.
+
+A key need is to be able to check out multiple projects (i.e. lldb+clang+llvm or
+clang+llvm+libcxx for example) at a specific revision.
+
+A tuple of revisions (one entry per repository) accurately describes the state
+across the sub-projects.
+For example, a given version of clang would be
+*<LLVM-12345, clang-5432, libcxx-123, etc.>*.
+
+Umbrella Repository
+^^^^^^^^^^^^^^^^^^^
+
+To make this more convenient, a separate *umbrella* repository will be
+provided. This repository will be used for the sole purpose of understanding
+the sequence in which commits were pushed to the different repositories and to
+provide a single revision number.
+
+This umbrella repository will be read-only and continuously updated
+to record the above tuple. The proposed form to record this is to use Git
+[submodules]_, possibly along with a set of scripts to help check out a
+specific revision of the LLVM distribution.
+
+A regular LLVM developer does not need to interact with the umbrella repository
+-- the individual repositories can be checked out independently -- but you would
+need to use the umbrella repository to bisect multiple sub-projects at the same
+time, or to check-out old revisions of LLVM with another sub-project at a
+consistent state.
+
+This umbrella repository will be updated automatically by a bot (running on
+notice from a webhook on every push, and periodically) on a per commit basis: a
+single commit in the umbrella repository would match a single commit in a
+sub-project.
+
+Living Downstream
+^^^^^^^^^^^^^^^^^
+
+Downstream SVN users can use the read/write SVN bridges with the following
+caveats:
+
+ * Be prepared for a one-time change to the upstream revision numbers.
+ * The upstream sub-project revision numbers will no longer be in sync.
+
+Downstream Git users can continue without any major changes, with the minor
+change of upstreaming using `git push` instead of `git svn dcommit`.
+
+Git users also have the option of adopting an umbrella repository downstream.
+The tooling for the upstream umbrella can easily be reused for downstream needs,
+incorporating extra sub-projects and branching in parallel with sub-project
+branches.
+
+Multirepo Preview
+^^^^^^^^^^^^^^^^^
+
+As a preview (disclaimer: this rough prototype, not polished and not
+representative of the final solution), you can look at the following:
+
+ * Repository: https://github.com/llvm-beanz/llvm-submodules
+ * Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/
+
+Concerns
+^^^^^^^^
+
+ * Because GitHub does not allow server-side hooks, and because there is no
+ "push timestamp" in Git, the umbrella repository sequence isn't totally
+ exact: commits from different repositories pushed around the same time can
+ appear in different orders. However, we don't expect it to be the common case
+ or to cause serious issues in practice.
+ * You can't have a single cross-projects commit that would update both LLVM and
+ other sub-projects (something that can be achieved now). It would be possible
+ to establish a protocol whereby users add a special token to their commit
+ messages that causes the umbrella repo's updater bot to group all of them
+ into a single revision.
+ * Another option is to group commits that were pushed closely enough together
+ in the umbrella repository. This has the advantage of allowing cross-project
+ commits, and is less sensitive to mis-ordering commits. However, this has the
+ potential to group unrelated commits together, especially if the bot goes
+ down and needs to catch up.
+ * This variant relies on heavier tooling. But the current prototype shows that
+ it is not out-of-reach.
+ * Submodules don't have a good reputation / are complicating the command line.
+ However, in the proposed setup, a regular developer will seldom interact with
+ submodules directly, and certainly never update them.
+ * Refactoring across projects is not friendly: taking some functions from clang
+ to make it part of a utility in libSupport wouldn't carry the history of the
+ code in the llvm repo, preventing recursively applying `git blame` for
+ instance. However, this is not very different than how most people are
+ Interacting with the repository today, by splitting such change in multiple
+ commits.
+
+Workflows
+^^^^^^^^^
+
+ * :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
+ * :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-multicheckout-nocommit>`.
+ * :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-multicheckout-multicommit>`.
+ * :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
+ * :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-multi-branching>`.
+ * :ref:`Bisecting <workflow-multi-bisecting>`.
+
+Monorepo Variant
+----------------
+
+This variant recommends moving all LLVM sub-projects to a single Git repository,
+similar to https://github.com/llvm-project/llvm-project.
+This would mimic an export of the current SVN repository, with each sub-project
+having its own top-level directory.
+Not all sub-projects are used for building toolchains. In practice, www/
+and test-suite/ will probably stay out of the monorepo.
+
+Putting all sub-projects in a single checkout makes cross-project refactoring
+naturally simple:
+
+ * New sub-projects can be trivially split out for better reuse and/or layering
+ (e.g., to allow libSupport and/or LIT to be used by runtimes without adding a
+ dependency on LLVM).
+ * Changing an API in LLVM and upgrading the sub-projects will always be done in
+ a single commit, designing away a common source of temporary build breakage.
+ * Moving code across sub-project (during refactoring for instance) in a single
+ commit enables accurate `git blame` when tracking code change history.
+ * Tooling based on `git grep` works natively across sub-projects, allowing to
+ easier find refactoring opportunities across projects (for example reusing a
+ datastructure initially in LLDB by moving it into libSupport).
+ * Having all the sources present encourages maintaining the other sub-projects
+ when changing API.
+
+Finally, the monorepo maintains the property of the existing SVN repository that
+the sub-projects move synchronously, and a single revision number (or commit
+hash) identifies the state of the development across all projects.
+
+.. _build_single_project:
+
+Building a single sub-project
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Nobody will be forced to build unnecessary projects. The exact structure
+is TBD, but making it trivial to configure builds for a single sub-project
+(or a subset of sub-projects) is a hard requirement.
+
+As an example, it could look like the following::
+
+ mkdir build && cd build
+ # Configure only LLVM (default)
+ cmake path/to/monorepo
+ # Configure LLVM and lld
+ cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld
+ # Configure LLVM and clang
+ cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=clang
+
+.. _git-svn-mirror:
+
+Read/write sub-project mirrors
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+With the Monorepo, the existing single-subproject mirrors (e.g.
+http://llvm.org/git/compiler-rt.git) with git-svn read-write access would
+continue to be maintained: developers would continue to be able to use the
+existing single-subproject git repositories as they do today, with *no changes
+to workflow*. Everything (git fetch, git svn dcommit, etc.) could continue to
+work identically to how it works today. The monorepo can be set-up such that the
+SVN revision number matches the SVN revision in the GitHub SVN-bridge.
+
+Living Downstream
+^^^^^^^^^^^^^^^^^
+
+Downstream SVN users can use the read/write SVN bridge. The SVN revision
+number can be preserved in the monorepo, minimizing the impact.
+
+Downstream Git users can continue without any major changes, by using the
+git-svn mirrors on top of the SVN bridge.
+
+Git users can also work upstream with monorepo even if their downstream
+fork has split repositories. They can apply patches in the appropriate
+subdirectories of the monorepo using, e.g., `git am --directory=...`, or
+plain `diff` and `patch`.
+
+Alternatively, Git users can migrate their own fork to the monorepo. As a
+demonstration, we've migrated the "CHERI" fork to the monorepo in two ways:
+
+ * Using a script that rewrites history (including merges) so that it looks
+ like the fork always lived in the monorepo [LebarCHERI]_. The upside of
+ this is when you check out an old revision, you get a copy of all llvm
+ sub-projects at a consistent revision. (For instance, if it's a clang
+ fork, when you check out an old revision you'll get a consistent version
+ of llvm proper.) The downside is that this changes the fork's commit
+ hashes.
+
+ * Merging the fork into the monorepo [AminiCHERI]_. This preserves the
+ fork's commit hashes, but when you check out an old commit you only get
+ the one sub-project.
+
+Monorepo Preview
+^^^^^^^^^^^^^^^^^
+
+As a preview (disclaimer: this rough prototype, not polished and not
+representative of the final solution), you can look at the following:
+
+ * Full Repository: https://github.com/joker-eph/llvm-project
+ * Single sub-project view with *SVN write access* to the full repo:
+ https://github.com/joker-eph/compiler-rt
+
+Concerns
+^^^^^^^^
+
+ * Using the monolithic repository may add overhead for those contributing to a
+ standalone sub-project, particularly on runtimes like libcxx and compiler-rt
+ that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs.
+ 1GB for the monorepo), and the commit rate of LLVM may cause more frequent
+ `git push` collisions when upstreaming. Affected contributors can continue to
+ use the SVN bridge or the single-subproject Git mirrors with git-svn for
+ read-write.
+ * Using the monolithic repository may add overhead for those *integrating* a
+ standalone sub-project, even if they aren't contributing to it, due to the
+ same disk space concern as the point above. The availability of the
+ sub-project Git mirror addresses this, even without SVN access.
+ * Preservation of the existing read/write SVN-based workflows relies on the
+ GitHub SVN bridge, which is an extra dependency. Maintaining this locks us
+ into GitHub and could restrict future workflow changes.
+
+Workflows
+^^^^^^^^^
+
+ * :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
+ * :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-monocheckout-nocommit>`.
+ * :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`.
+ * :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
+ * :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`.
+ * :ref:`Bisecting <workflow-mono-bisecting>`.
+
+Multi/Mono Hybrid Variant
+-------------------------
+
+This variant recommends moving only the LLVM sub-projects that are *rev-locked*
+to LLVM into a monorepo (clang, lld, lldb, ...), following the multirepo
+proposal for the rest. While neither variant recommends combining sub-projects
+like www/ and test-suite/ (which are completely standalone), this goes further
+and keeps sub-projects like libcxx and compiler-rt in their own distinct
+repositories.
+
+Concerns
+^^^^^^^^
+
+ * This has most disadvantages of multirepo and monorepo, without bringing many
+ of the advantages.
+ * Downstream have to upgrade to the monorepo structure, but only partially. So
+ they will keep the infrastructure to integrate the other separate
+ sub-projects.
+ * All projects that use LIT for testing are effectively rev-locked to LLVM.
+ Furthermore, some runtimes (like compiler-rt) are rev-locked with Clang.
+ It's not clear where to draw the lines.
+
+
+Workflow Before/After
+=====================
+
+This section goes through a few examples of workflows, intended to illustrate
+how end-users or developers would interact with the repository for
+various use-cases.
+
+.. _workflow-checkout-commit:
+
+Checkout/Clone a Single Project, without Commit Access
+------------------------------------------------------
+
+Except the URL, nothing changes. The possibilities today are::
+
+ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
+ # or with Git
+ git clone http://llvm.org/git/llvm.git
+
+After the move to GitHub, you would do either::
+
+ git clone https://github.com/llvm-project/llvm.git
+ # or using the GitHub svn native bridge
+ svn co https://github.com/llvm-project/llvm/trunk
+
+The above works for both the monorepo and the multirepo, as we'll maintain the
+existing read-only views of the individual sub-projects.
+
+Checkout/Clone a Single Project, with Commit Access
+---------------------------------------------------
+
+Currently
+^^^^^^^^^
+
+::
+
+ # direct SVN checkout
+ svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
+ # or using the read-only Git view, with git-svn
+ git clone http://llvm.org/git/llvm.git
+ cd llvm
+ git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
+ git config svn-remote.svn.fetch :refs/remotes/origin/master
+ git svn rebase -l # -l avoids fetching ahead of the git mirror.
+
+Commits are performed using `svn commit` or with the sequence `git commit` and
+`git svn dcommit`.
+
+.. _workflow-multicheckout-nocommit:
+
+Multirepo Variant
+^^^^^^^^^^^^^^^^^
+
+With the multirepo variant, nothing changes but the URL, and commits can be
+performed using `svn commit` or `git commit` and `git push`::
+
+ git clone https://github.com/llvm/llvm.git llvm
+ # or using the GitHub svn native bridge
+ svn co https://github.com/llvm/llvm/trunk/ llvm
+
+.. _workflow-monocheckout-nocommit:
+
+Monorepo Variant
+^^^^^^^^^^^^^^^^
+
+With the monorepo variant, there are a few options, depending on your
+constraints. First, you could just clone the full repository::
+
+ git clone https://github.com/llvm/llvm-projects.git llvm
+ # or using the GitHub svn native bridge
+ svn co https://github.com/llvm/llvm-projects/trunk/ llvm
+
+At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
+:ref:`doesn't imply you have to build all of them <build_single_project>`. You
+can still build only compiler-rt for instance. In this way it's not different
+from someone who would check out all the projects with SVN today.
+
+You can commit as normal using `git commit` and `git push` or `svn commit`, and
+read the history for a single project (`git log libcxx` for example).
+
+Secondly, there are a few options to avoid checking out all the sources.
+
+**Using the GitHub SVN bridge**
+
+The GitHub SVN native bridge allows to checkout a subdirectory directly:
+
+ svn co https://github.com/llvm/llvm-projects/trunk/compiler-rt compiler-rt âusername=...
+
+This checks out only compiler-rt and provides commit access using "svn commit",
+in the same way as it would do today.
+
+**Using a Subproject Git Nirror**
+
+You can use *git-svn* and one of the sub-project mirrors::
+
+ # Clone from the single read-only Git repo
+ git clone http://llvm.org/git/llvm.git
+ cd llvm
+ # Configure the SVN remote and initialize the svn metadata
+ $ git svn init https://github.com/joker-eph/llvm-project/trunk/llvm âusername=...
+ git config svn-remote.svn.fetch :refs/remotes/origin/master
+ git svn rebase -l
+
+In this case the repository contains only a single sub-project, and commits can
+be made using `git svn dcommit`, again exactly as we do today.
+
+**Using a Sparse Checkouts**
+
+You can hide the other directories using a Git sparse checkout::
+
+ git config core.sparseCheckout true
+ echo /compiler-rt > .git/info/sparse-checkout
+ git read-tree -mu HEAD
+
+The data for all sub-projects is still in your `.git` directory, but in your
+checkout, you only see `compiler-rt`.
+Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
+usual.
+
+Note that when you fetch you'll likely pull in changes to sub-projects you don't
+care about. If you are using spasre checkout, the files from other projects
+won't appear on your disk. The only effect is that your commit hash changes.
+
+You can check whether the changes in the last fetch are relevant to your commit
+by running::
+
+ git log origin/master@{1}..origin/master -- libcxx
+
+This command can be hidden in a script so that `git llvmpush` would perform all
+these steps, fail only if such a dependent change exists, and show immediately
+the change that prevented the push. An immediate repeat of the command would
+(almost) certainly result in a successful push.
+Note that today with SVN or git-svn, this step is not possible since the
+"rebase" implicitly happens while committing (unless a conflict occurs).
+
+Checkout/Clone Multiple Projects, with Commit Access
+----------------------------------------------------
+
+Let's look how to assemble llvm+clang+libcxx at a given revision.
+
+Currently
+^^^^^^^^^
+
+::
+
+ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
+ cd llvm/tools
+ svn co http://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
+ cd ../projects
+ svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION
+
+Or using git-svn::
+
+ git clone http://llvm.org/git/llvm.git
+ cd llvm/
+ git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
+ git config svn-remote.svn.fetch :refs/remotes/origin/master
+ git svn rebase -l
+ git checkout `git svn find-rev -B r258109`
+ cd tools
+ git clone http://llvm.org/git/clang.git
+ cd clang/
+ git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username>
+ git config svn-remote.svn.fetch :refs/remotes/origin/master
+ git svn rebase -l
+ git checkout `git svn find-rev -B r258109`
+ cd ../../projects/
+ git clone http://llvm.org/git/libcxx.git
+ cd libcxx
+ git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username>
+ git config svn-remote.svn.fetch :refs/remotes/origin/master
+ git svn rebase -l
+ git checkout `git svn find-rev -B r258109`
+
+Note that the list would be longer with more sub-projects.
+
+.. _workflow-multicheckout-multicommit:
+
+Multirepo Variant
+^^^^^^^^^^^^^^^^^
+
+With the multirepo variant, the umbrella repository will be used. This is
+where the mapping from a single revision number to the individual repositories
+revisions is stored.::
+
+ git clone https://github.com/llvm-beanz/llvm-submodules
+ cd llvm-submodules
+ git checkout $REVISION
+ git submodule init
+ git submodule update clang llvm libcxx
+ # the list of sub-project is optional, `git submodule update` would get them all.
+
+At this point the clang, llvm, and libcxx individual repositories are cloned
+and stored alongside each other. There are CMake flags to describe the directory
+structure; alternatively, you can just symlink `clang` to `llvm/tools/clang`,
+etc.
+
+Another option is to checkout repositories based on the commit timestamp::
+
+ git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master`
+
+.. _workflow-monocheckout-multicommit:
+
+Monorepo Variant
+^^^^^^^^^^^^^^^^
+
+The repository contains natively the source for every sub-projects at the right
+revision, which makes this straightforward::
+
+ git clone https://github.com/llvm/llvm-projects.git llvm-projects
+ cd llvm-projects
+ git checkout $REVISION
+
+As before, at this point clang, llvm, and libcxx are stored in directories
+alongside each other.
+
+.. _workflow-cross-repo-commit:
+
+Commit an API Change in LLVM and Update the Sub-projects
+--------------------------------------------------------
+
+Today this is possible, even though not common (at least not documented) for
+subversion users and for git-svn users. For example, few Git users try to update
+LLD or Clang in the same commit as they change an LLVM API.
+
+The multirepo variant does not address this: one would have to commit and push
+separately in every individual repository. It would be possible to establish a
+protocol whereby users add a special token to their commit messages that causes
+the umbrella repo's updater bot to group all of them into a single revision.
+
+The monorepo variant handles this natively.
+
+Branching/Stashing/Updating for Local Development or Experiments
+----------------------------------------------------------------
+
+Currently
+^^^^^^^^^
+
+SVN does not allow this use case, but developers that are currently using
+git-svn can do it. Let's look in practice what it means when dealing with
+multiple sub-projects.
+
+To update the repository to tip of trunk::
+
+ git pull
+ cd tools/clang
+ git pull
+ cd ../../projects/libcxx
+ git pull
+
+To create a new branch::
+
+ git checkout -b MyBranch
+ cd tools/clang
+ git checkout -b MyBranch
+ cd ../../projects/libcxx
+ git checkout -b MyBranch
+
+To switch branches::
+
+ git checkout AnotherBranch
+ cd tools/clang
+ git checkout AnotherBranch
+ cd ../../projects/libcxx
+ git checkout AnotherBranch
+
+.. _workflow-multi-branching:
+
+Multirepo Variant
+^^^^^^^^^^^^^^^^^
+
+The multirepo works the same as the current Git workflow: every command needs
+to be applied to each of the individual repositories.
+However, the umbrella repository makes this easy using `git submodule foreach`
+to replicate a command on all the individual repositories (or submodules
+in this case):
+
+To create a new branch::
+
+ git submodule foreach git checkout -b MyBranch
+
+To switch branches::
+
+ git submodule foreach git checkout AnotherBranch
+
+.. _workflow-mono-branching:
+
+Monorepo Variant
+^^^^^^^^^^^^^^^^
+
+Regular Git commands are sufficient, because everything is in a single
+repository:
+
+To update the repository to tip of trunk::
+
+ git pull
+
+To create a new branch::
+
+ git checkout -b MyBranch
+
+To switch branches::
+
+ git checkout AnotherBranch
+
+Bisecting
+---------
+
+Assuming a developer is looking for a bug in clang (or lld, or lldb, ...).
+
+Currently
+^^^^^^^^^
+
+SVN does not have builtin bisection support, but the single revision across
+sub-projects makes it possible to script around.
+
+Using the existing Git read-only view of the repositories, it is possible to use
+the native Git bisection script over the llvm repository, and use some scripting
+to synchronize the clang repository to match the llvm revision.
+
+.. _workflow-multi-bisecting:
+
+Multirepo Variant
+^^^^^^^^^^^^^^^^^
+
+With the multi-repositories variant, the cross-repository synchronization is
+achieved using the umbrella repository. This repository contains only
+submodules for the other sub-projects. The native Git bisection can be used on
+the umbrella repository directly. A subtlety is that the bisect script itself
+needs to make sure the submodules are updated accordingly.
+
+For example, to find which commit introduces a regression where clang-3.9
+crashes but not clang-3.8 passes, one should be able to simply do::
+
+ git bisect start release_39 release_38
+ git bisect run ./bisect_script.sh
+
+With the `bisect_script.sh` script being::
+
+ #!/bin/sh
+ cd $UMBRELLA_DIRECTORY
+ git submodule update llvm clang libcxx #....
+ cd $BUILD_DIR
+
+ ninja clang || exit 125 # an exit code of 125 asks "git bisect"
+ # to "skip" the current commit
+
+ ./bin/clang some_crash_test.cpp
+
+When the `git bisect run` command returns, the umbrella repository is set to
+the state where the regression is introduced. The commit diff in the umbrella
+indicate which submodule was updated, and the last commit in this sub-projects
+is the one that the bisect found.
+
+.. _workflow-mono-bisecting:
+
+Monorepo Variant
+^^^^^^^^^^^^^^^^
+
+Bisecting on the monorepo is straightforward, and very similar to the above,
+except that the bisection script does not need to include the
+`git submodule update` step.
+
+The same example, finding which commit introduces a regression where clang-3.9
+crashes but not clang-3.8 passes, will look like::
+
+ git bisect start release_39 release_38
+ git bisect run ./bisect_script.sh
+
+With the `bisect_script.sh` script being::
+
+ #!/bin/sh
+ cd $BUILD_DIR
+
+ ninja clang || exit 125 # an exit code of 125 asks "git bisect"
+ # to "skip" the current commit
+
+ ./bin/clang some_crash_test.cpp
+
+Also, since the monorepo handles commits update across multiple projects, you're
+less like to encounter a build failure where a commit change an API in LLVM and
+another later one "fixes" the build in clang.
+
+
+References
+==========
+
+.. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
+.. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
+.. [JSonnRevNum] Joerg Sonnenberg, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
+.. [TorvaldRevNum] Linus Torvald, http://git.661346.n2.nabble.com/Git-commit-generation-numbers-td6584414.html
+.. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
+.. [submodules] Git submodules, https://git-scm.com/book/en/v2/Git-Tools-Submodules)
+.. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/
+.. [LebarCHERI] Port *CHERI* to a single repository rewriting history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102787.html
+.. [AminiCHERI] Port *CHERI* to a single repository preserving history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102804.html
Added: www-releases/trunk/8.0.1/docs/_sources/Proposals/TestSuite.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/Proposals/TestSuite.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/Proposals/TestSuite.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/Proposals/TestSuite.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,321 @@
+=====================
+Test-Suite Extentions
+=====================
+
+.. contents::
+ :depth: 1
+ :local:
+
+Abstract
+========
+
+These are ideas for additional programs, benchmarks, applications and
+algorithms that could be added to the LLVM Test-Suite.
+The test-suite could be much larger than it is now, which would help us
+detecting compiler errors (crashes, miscompiles) during development.
+
+Most probably, the reason why the programs below have not been added to
+the test-suite yet is that nobody has found time to do it. But there
+might be other issues as well, such as
+
+ * Licensing (Support can still be added as external module,
+ like for the SPEC benchmarks)
+
+ * Language (in particular, there is no official LLVM frontend
+ for FORTRAN yet)
+
+ * Parallelism (currently, all programs in test-suite use
+ one thread only)
+
+Benchmarks
+==========
+
+SPEC CPU 2017
+-------------
+https://www.spec.org/cpu2017/
+
+The following have not been included yet because they contain Fortran
+code.
+
+In case of cactuBSSN only a small portion is Fortran. The hosts's
+Fortran compiler could be used for these parts.
+
+Note that CMake's Ninja generator has difficulties with Fortran. See the
+`CMake documentation <https://cmake.org/cmake/help/v3.13/generator/Ninja.html#fortran-support>`_
+for details.
+
+ * 503.bwaves_r/603.bwaves_s
+ * 507.cactuBSSN_r
+ * 521.wrf_r/621.wrf_s
+ * 527.cam4_r/627.cam4_s
+ * 628.pop2_s
+ * 548.exchange2_r/648.exchange2_s
+ * 549.fotonik3d_r/649.fotonik3d_s
+ * 554.roms_r/654.roms_s
+
+SPEC OMP2012
+------------
+https://www.spec.org/omp2012/
+
+ * 350.md
+ * 351.bwaves
+ * 352.nab
+ * 357.bt331
+ * 358.botsalgn
+ * 359.botsspar
+ * 360.ilbdc
+ * 362.fma3d
+ * 363.swim
+ * 367.imagick
+ * 370.mgrid331
+ * 371.applu331
+ * 372.smithwa
+ * 376.kdtree
+
+OpenCV
+------
+https://opencv.org/
+
+OpenMP 4.x SIMD Benchmarks
+--------------------------
+https://github.com/flwende/simd_benchmarks
+
+PWM-benchmarking
+----------------
+https://github.com/tbepler/PWM-benchmarking
+
+SLAMBench
+---------
+https://github.com/pamela-project/slambench
+
+FireHose
+--------
+http://firehose.sandia.gov/
+
+A Benchmark for the C/C++ Standard Library
+------------------------------------------
+https://github.com/hiraditya/std-benchmark
+
+OpenBenchmarking.org CPU / Processor Suite
+------------------------------------------
+https://openbenchmarking.org/suite/pts/cpu
+
+This is a subset of the
+`Phoronix Test Suite <https://github.com/phoronix-test-suite/phoronix-test-suite/>`_
+and is itself a collection of benchmark suites
+
+Parboil Benchmarks
+------------------
+http://impact.crhc.illinois.edu/parboil/parboil.aspx
+
+MachSuite
+---------
+https://breagen.github.io/MachSuite/
+
+Rodinia
+-------
+http://lava.cs.virginia.edu/Rodinia/download_links.htm
+
+Rodinia has already been partially included in
+MultiSource/Benchmarks/Rodinia. Benchmarks still missing are:
+
+ * streamcluster
+ * particlefilter
+ * nw
+ * nn
+ * myocyte
+ * mummergpu
+ * lud
+ * leukocyte
+ * lavaMD
+ * kmeans
+ * hotspot3D
+ * heartwall
+ * cfd
+ * bfs
+ * b+tree
+
+vecmathlib tests harness
+------------------------
+https://bitbucket.org/eschnett/vecmathlib/wiki/Home
+
+PARSEC
+------
+http://parsec.cs.princeton.edu/
+
+Graph500 reference implementations
+----------------------------------
+https://github.com/graph500/graph500/tree/v2-spec
+
+NAS Parallel Benchmarks
+-----------------------
+https://www.nas.nasa.gov/publications/npb.html
+
+The official benchmark is written in Fortran, but an unofficial
+C-translation is available as well:
+https://github.com/benchmark-subsetting/NPB3.0-omp-C
+
+DARPA HPCS SSCA#2 C/OpenMP reference implementation
+---------------------------------------------------
+http://www.highproductivity.org/SSCABmks.htm
+
+This web site does not exist any more, but there seems to be a copy of
+some of the benchmarks
+https://github.com/gtcasl/hpc-benchmarks/tree/master/SSCA2v2.2
+
+Kokkos
+------
+https://github.com/kokkos/kokkos-kernels/tree/master/perf_test
+https://github.com/kokkos/kokkos/tree/master/benchmarks
+
+PolyMage
+--------
+https://github.com/bondhugula/polymage-benchmarks
+
+PolyBench
+---------
+https://sourceforge.net/projects/polybench/
+
+A modified version of Polybench 3.2 is already presented in
+SingleSource/Benchmarks/Polybench. A newer version 4.2.1 is available.
+
+High Performance Geometric Multigrid
+------------------------------------
+https://crd.lbl.gov/departments/computer-science/PAR/research/hpgmg/
+
+RAJA Performance Suite
+----------------------
+https://github.com/LLNL/RAJAPerf
+
+CORAL-2 Benchmarks
+------------------
+https://asc.llnl.gov/coral-2-benchmarks/
+
+Many of its programs have already been integreated in
+MultiSource/Benchmarks/DOE-ProxyApps-C and
+MultiSource/Benchmarks/DOE-ProxyApps-C++.
+
+ * Nekbone
+ * QMCPack
+ * LAMMPS
+ * Kripke
+ * Quicksilver
+ * PENNANT
+ * Big Data Analytic Suite
+ * Deep Learning Suite
+ * Stream
+ * Stride
+ * ML/DL micro-benchmark
+ * Pynamic
+ * ACME
+ * VPIC
+ * Laghos
+ * Parallel Integer Sort
+ * Havoq
+
+NWChem
+------
+http://www.nwchem-sw.org/index.php/Benchmarks
+
+TVM
+----
+https://github.com/dmlc/tvm/tree/master/apps/benchmark
+
+HydroBench
+----------
+https://github.com/HydroBench/Hydro
+
+ParRes
+------
+https://github.com/ParRes/Kernels/tree/master/Cxx11
+
+Applications/Libraries
+======================
+
+GnuPG
+-----
+https://gnupg.org/
+
+Blitz++
+-------
+https://sourceforge.net/projects/blitz/
+
+FFmpeg
+------
+https://ffmpeg.org/
+
+FreePOOMA
+---------
+http://www.nongnu.org/freepooma/
+
+FTensors
+--------
+http://www.wlandry.net/Projects/FTensor
+
+rawspeed
+--------
+https://github.com/darktable-org/rawspeed
+
+Its test dataset is 756 MB in size, which is too large to be included
+into the test-suite repository.
+
+C++ Performance Benchmarks
+--------------------------
+https://gitlab.com/chriscox/CppPerformanceBenchmarks
+
+Generic Algorithms
+==================
+
+Image processing
+----------------
+
+Resampling
+``````````
+
+ * Bilinear
+ * Bicubic
+ * Lanczos
+
+Dither
+``````
+
+ * Threshold
+ * Random
+ * Halftone
+ * Bayer
+ * Floyd-Steinberg
+ * Jarvis
+ * Stucki
+ * Burkes
+ * Sierra
+ * Atkinson
+ * Gradient-based
+
+Feature detection
+`````````````````
+
+ * Harris
+ * Histogram of Oriented Gradients
+
+Color conversion
+````````````````
+
+ * RGB to grayscale
+ * HSL to RGB
+
+Graph
+-----
+
+Search Algorithms
+`````````````````
+
+ * Breadth-First-Search
+ * Depth-First-Search
+ * Dijkstra's algorithm
+ * A-Star
+
+Spanning Tree
+`````````````
+
+ * Kruskal's algorithm
+ * Prim's algorithm
Added: www-releases/trunk/8.0.1/docs/_sources/Proposals/VariableNames.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/Proposals/VariableNames.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/Proposals/VariableNames.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/Proposals/VariableNames.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,399 @@
+===================
+Variable Names Plan
+===================
+
+.. contents::
+ :local:
+
+This plan is *provisional*. It is not agreed upon. It is written with the
+intention of capturing the desires and concerns of the LLVM community, and
+forming them into a plan that can be agreed upon.
+The original author is somewhat naïve in the ways of LLVM so there will
+inevitably be some details that are flawed. You can help - you can edit this
+page (preferably with a Phabricator review for larger changes) or reply to the
+`Request For Comments thread
+<http://lists.llvm.org/pipermail/llvm-dev/2019-February/130083.html>`_.
+
+Too Long; Didn't Read
+=====================
+
+Improve the readability of LLVM code.
+
+Introduction
+============
+
+The current `variable naming rule
+<../CodingStandards.html#name-types-functions-variables-and-enumerators-properly>`_
+states:
+
+ Variable names should be nouns (as they represent state). The name should be
+ camel case, and start with an upper case letter (e.g. Leader or Boats).
+
+This rule is the same as that for type names. This is a problem because the
+type name cannot be reused for a variable name [*]_. LLVM developers tend to
+work around this by either prepending ``The`` to the type name::
+
+ Triple TheTriple;
+
+... or more commonly use an acronym, despite the coding standard stating "Avoid
+abbreviations unless they are well known"::
+
+ Triple T;
+
+The proliferation of acronyms leads to hard-to-read code such as `this
+<https://github.com/llvm/llvm-project/blob/0a8bc14ad7f3209fe702d18e250194cd90188596/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L7445>`_::
+
+ InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, IC,
+ &LVL, &CM);
+
+Many other coding guidelines [LLDB]_ [Google]_ [WebKit]_ [Qt]_ [Rust]_ [Swift]_
+[Python]_ require that variable names begin with a lower case letter in contrast
+to class names which begin with a capital letter. This convention means that the
+most readable variable name also requires the least thought::
+
+ Triple triple;
+
+There is some agreement that the current rule is broken [LattnerAgree]_
+[ArsenaultAgree]_ [RobinsonAgree]_ and that acronyms are an obstacle to reading
+new code [MalyutinDistinguish]_ [CarruthAcronym]_ [PicusAcronym]_. There are
+some opposing views [ParzyszekAcronym2]_ [RicciAcronyms]_.
+
+This work-in-progress proposal is to change the coding standard for variable
+names to require that they start with a lower case letter.
+
+.. [*] In `some cases
+ <https://github.com/llvm/llvm-project/blob/8b72080d4d7b13072f371712eed333f987b7a18e/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp#L2727>`_
+ the type name *is* reused as a variable name, but this shadows the type name
+ and confuses many debuggers [DenisovCamelBack]_.
+
+Variable Names Coding Standard Options
+======================================
+
+There are two main options for variable names that begin with a lower case
+letter: ``camelBack`` and ``lower_case``. (These are also known by other names
+but here we use the terminology from clang-tidy).
+
+``camelBack`` is consistent with [WebKit]_, [Qt]_ and [Swift]_ while
+``lower_case`` is consistent with [LLDB]_, [Google]_, [Rust]_ and [Python]_.
+
+``camelBack`` is already used for function names, which may be considered an
+advantage [LattnerFunction]_ or a disadvantage [CarruthFunction]_.
+
+Approval for ``camelBack`` was expressed by [DenisovCamelBack]_
+[LattnerFunction]_ [IvanovicDistinguish]_.
+Opposition to ``camelBack`` was expressed by [CarruthCamelBack]_
+[TurnerCamelBack]_.
+Approval for ``lower_case`` was expressed by [CarruthLower]_
+[CarruthCamelBack]_ [TurnerLLDB]_.
+Opposition to ``lower_case`` was expressed by [LattnerLower]_.
+
+Differentiating variable kinds
+------------------------------
+
+An additional requested change is to distinguish between different kinds of
+variables [RobinsonDistinguish]_ [RobinsonDistinguish2]_ [JonesDistinguish]_
+[IvanovicDistinguish]_ [CarruthDistinguish]_ [MalyutinDistinguish]_.
+
+Others oppose this idea [HähnleDistinguish]_ [GreeneDistinguish]_
+[HendersonPrefix]_.
+
+A possibility is for member variables to be prefixed with ``m_`` and for global
+variables to be prefixed with ``g_`` to distinguish them from local variables.
+This is consistent with [LLDB]_. The ``m_`` prefix is consistent with [WebKit]_.
+
+A variation is for member variables to be prefixed with ``m``
+[IvanovicDistinguish]_ [BeylsDistinguish]_. This is consistent with [Mozilla]_.
+
+Another option is for member variables to be suffixed with ``_`` which is
+consistent with [Google]_ and similar to [Python]_. Opposed by
+[ParzyszekDistinguish]_.
+
+Reducing the number of acronyms
+===============================
+
+While switching coding standard will make it easier to use non-acronym names for
+new code, it doesn't improve the existing large body of code that uses acronyms
+extensively to the detriment of its readability. Further, it is natural and
+generally encouraged that new code be written in the style of the surrounding
+code. Therefore it is likely that much newly written code will also use
+acronyms despite what the coding standard says, much as it is today.
+
+As well as changing the case of variable names, they could also be expanded to
+their non-acronym form e.g. ``Triple T`` â ``Triple triple``.
+
+There is support for expanding many acronyms [CarruthAcronym]_ [PicusAcronym]_
+but there is a preference that expanding acronyms be deferred
+[ParzyszekAcronym]_ [CarruthAcronym]_.
+
+The consensus within the community seems to be that at least some acronyms are
+valuable [ParzyszekAcronym]_ [LattnerAcronym]_. The most commonly cited acronym
+is ``TLI`` however that is used to refer to both ``TargetLowering`` and
+``TargetLibraryInfo`` [GreeneDistinguish]_.
+
+The following is a list of acronyms considered sufficiently useful that the
+benefit of using them outweighs the cost of learning them. Acronyms that are
+either not on the list or are used to refer to a different type should be
+expanded.
+
+============================ =============
+Class name Variable name
+============================ =============
+DeterministicFiniteAutomaton dfa
+DominatorTree dt
+LoopInfo li
+MachineFunction mf
+MachineInstr mi
+MachineRegisterInfo mri
+ScalarEvolution se
+TargetInstrInfo tii
+TargetLibraryInfo tli
+TargetRegisterInfo tri
+============================ =============
+
+In some cases renaming acronyms to the full type name will result in overly
+verbose code. Unlike most classes, a variable's scope is limited and therefore
+some of its purpose can implied from that scope, meaning that fewer words are
+necessary to give it a clear name. For example, in an optization pass the reader
+can assume that a variable's purpose relates to optimization and therefore an
+``OptimizationRemarkEmitter`` variable could be given the name ``remarkEmitter``
+or even ``remarker``.
+
+The following is a list of longer class names and the associated shorter
+variable name.
+
+========================= =============
+Class name Variable name
+========================= =============
+BasicBlock block
+ConstantExpr expr
+ExecutionEngine engine
+MachineOperand operand
+OptimizationRemarkEmitter remarker
+PreservedAnalyses analyses
+PreservedAnalysesChecker checker
+TargetLowering lowering
+TargetMachine machine
+========================= =============
+
+Transition Options
+==================
+
+There are three main options for transitioning:
+
+1. Keep the current coding standard
+2. Laissez faire
+3. Big bang
+
+Keep the current coding standard
+--------------------------------
+
+Proponents of keeping the current coding standard (i.e. not transitioning at
+all) question whether the cost of transition outweighs the benefit
+[EmersonConcern]_ [ReamesConcern]_ [BradburyConcern]_.
+The costs are that ``git blame`` will become less usable; and that merging the
+changes will be costly for downstream maintainers. See `Big bang`_ for potential
+mitigations.
+
+Laissez faire
+-------------
+
+The coding standard could allow both ``CamelCase`` and ``camelBack`` styles for
+variable names [LattnerTransition]_.
+
+A code review to implement this is at https://reviews.llvm.org/D57896.
+
+Advantages
+**********
+
+ * Very easy to implement initially.
+
+Disadvantages
+*************
+
+ * Leads to inconsistency [BradburyConcern]_ [AminiInconsistent]_.
+ * Inconsistency means it will be hard to know at a guess what name a variable
+ will have [DasInconsistent]_ [CarruthInconsistent]_.
+ * Some large-scale renaming may happen anyway, leading to its disadvantages
+ without any mitigations.
+
+Big bang
+--------
+
+With this approach, variables will be renamed by an automated script in a series
+of large commits.
+
+The principle advantage of this approach is that it minimises the cost of
+inconsistency [BradburyTransition]_ [RobinsonTransition]_.
+
+It goes against a policy of avoiding large-scale reformatting of existing code
+[GreeneDistinguish]_.
+
+It has been suggested that LLD would be a good starter project for the renaming
+[Ueyama]_.
+
+Keeping git blame usable
+************************
+
+``git blame`` (or ``git annotate``) permits quickly identifying the commit that
+changed a given line in a file. After renaming variables, many lines will show
+as being changed by that one commit, requiring a further invocation of ``git
+blame`` to identify prior, more interesting commits [GreeneGitBlame]_
+[RicciAcronyms]_.
+
+**Mitigation**: `git-hyper-blame
+<https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html>`_
+can ignore or "look through" a given set of commits.
+A ``.git-blame-ignore-revs`` file identifying the variable renaming commits
+could be added to the LLVM git repository root directory.
+It is being `investigated
+<https://public-inbox.org/git/20190324235020.49706-1-michael@platin.gs/>`_
+whether similar functionality could be added to ``git blame`` itself.
+
+Minimising cost of downstream merges
+************************************
+
+There are many forks of LLVM with downstream changes. Merging a large-scale
+renaming change could be difficult for the fork maintainers.
+
+**Mitigation**: A large-scale renaming would be automated. A fork maintainer can
+merge from the commit immediately before the renaming, then apply the renaming
+script to their own branch. They can then merge again from the renaming commit,
+resolving all conflicts by choosing their own version. This could be tested on
+the [SVE]_ fork.
+
+Provisional Plan
+================
+
+This is a provisional plan for the `Big bang`_ approach. It has not been agreed.
+
+#. Investigate improving ``git blame``. The extent to which it can be made to
+ "look through" commits may impact how big a change can be made.
+
+#. Write a script to expand acronyms.
+
+#. Experiment and perform dry runs of the various refactoring options.
+ Results can be published in forks of the LLVM Git repository.
+
+#. Consider the evidence and agree on the new policy.
+
+#. Agree & announce a date for the renaming of the starter project (LLD).
+
+#. Update the `policy page <../CodingStandards.html>`_. This will explain the
+ old and new rules and which projects each applies to.
+
+#. Refactor the starter project in two commits:
+
+ 1. Add or change the project's .clang-tidy to reflect the agreed rules.
+ (This is in a separate commit to enable the merging process described in
+ `Minimising cost of downstream merges`_).
+ Also update the project list on the policy page.
+ 2. Apply ``clang-tidy`` to the project's files, with only the
+ ``readability-identifier-naming`` rules enabled. ``clang-tidy`` will also
+ reformat the affected lines according to the rules in ``.clang-format``.
+ It is anticipated that this will be a good dog-fooding opportunity for
+ clang-tidy, and bugs should be fixed in the process, likely including:
+
+ * `readability-identifier-naming incorrectly fixes lambda capture
+ <https://bugs.llvm.org/show_bug.cgi?id=41119>`_.
+ * `readability-identifier-naming incorrectly fixes variables which
+ become keywords <https://bugs.llvm.org/show_bug.cgi?id=41120>`_.
+ * `readability-identifier-naming misses fixing member variables in
+ destructor <https://bugs.llvm.org/show_bug.cgi?id=41122>`_.
+
+#. Gather feedback and refine the process as appropriate.
+
+#. Apply the process to the following projects, with a suitable delay between
+ each (at least 4 weeks after the first change, at least 2 weeks subsequently)
+ to allow gathering further feedback.
+ This list should exclude projects that must adhere to an externally defined
+ standard e.g. libcxx.
+ The list is roughly in chronological order of renaming.
+ Some items may not make sense to rename individually - it is expected that
+ this list will change following experimentation:
+
+ * TableGen
+ * llvm/tools
+ * clang-tools-extra
+ * clang
+ * ARM backend
+ * AArch64 backend
+ * AMDGPU backend
+ * ARC backend
+ * AVR backend
+ * BPF backend
+ * Hexagon backend
+ * Lanai backend
+ * MIPS backend
+ * NVPTX backend
+ * PowerPC backend
+ * RISC-V backend
+ * Sparc backend
+ * SystemZ backend
+ * WebAssembly backend
+ * X86 backend
+ * XCore backend
+ * libLTO
+ * Debug Information
+ * Remainder of llvm
+ * compiler-rt
+ * libunwind
+ * openmp
+ * parallel-libs
+ * polly
+ * lldb
+
+#. Remove the old variable name rule from the policy page.
+
+#. Repeat many of the steps in the sequence, using a script to expand acronyms.
+
+References
+==========
+
+.. [LLDB] LLDB Coding Conventions https://llvm.org/svn/llvm-project/lldb/branches/release_39/www/lldb-coding-conventions.html
+.. [Google] Google C++ Style Guide https://google.github.io/styleguide/cppguide.html#Variable_Names
+.. [WebKit] WebKit Code Style Guidelines https://webkit.org/code-style-guidelines/#names
+.. [Qt] Qt Coding Style https://wiki.qt.io/Qt_Coding_Style#Declaring_variables
+.. [Rust] Rust naming conventions https://doc.rust-lang.org/1.0.0/style/style/naming/README.html
+.. [Swift] Swift API Design Guidelines https://swift.org/documentation/api-design-guidelines/#general-conventions
+.. [Python] Style Guide for Python Code https://www.python.org/dev/peps/pep-0008/#function-and-variable-names
+.. [Mozilla] Mozilla Coding style: Prefixes https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#Prefixes
+.. [SVE] LLVM with support for SVE https://github.com/ARM-software/LLVM-SVE
+.. [AminiInconsistent] Mehdi Amini, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130329.html
+.. [ArsenaultAgree] Matt Arsenault, http://lists.llvm.org/pipermail/llvm-dev/2019-February/129934.html
+.. [BeylsDistinguish] Kristof Beyls, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130292.html
+.. [BradburyConcern] Alex Bradbury, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130266.html
+.. [BradburyTransition] Alex Bradbury, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130388.html
+.. [CarruthAcronym] Chandler Carruth, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130313.html
+.. [CarruthCamelBack] Chandler Carruth, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130214.html
+.. [CarruthDistinguish] Chandler Carruth, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130310.html
+.. [CarruthFunction] Chandler Carruth, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130309.html
+.. [CarruthInconsistent] Chandler Carruth, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130312.html
+.. [CarruthLower] Chandler Carruth, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130430.html
+.. [DasInconsistent] Sanjoy Das, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130304.html
+.. [DenisovCamelBack] Alex Denisov, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130179.html
+.. [EmersonConcern] Amara Emerson, http://lists.llvm.org/pipermail/llvm-dev/2019-February/129894.html
+.. [GreeneDistinguish] David Greene, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130425.html
+.. [GreeneGitBlame] David Greene, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130228.html
+.. [HendersonPrefix] James Henderson, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130465.html
+.. [HähnleDistinguish] Nicolai Hähnle, http://lists.llvm.org/pipermail/llvm-dev/2019-February/129923.html
+.. [IvanovicDistinguish] Nemanja Ivanovic, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130249.html
+.. [JonesDistinguish] JD Jones, http://lists.llvm.org/pipermail/llvm-dev/2019-February/129926.html
+.. [LattnerAcronym] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130353.html
+.. [LattnerAgree] Chris Latter, http://lists.llvm.org/pipermail/llvm-dev/2019-February/129907.html
+.. [LattnerFunction] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130630.html
+.. [LattnerLower] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130629.html
+.. [LattnerTransition] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130355.html
+.. [MalyutinDistinguish] Danila Malyutin, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130320.html
+.. [ParzyszekAcronym] Krzysztof Parzyszek, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130306.html
+.. [ParzyszekAcronym2] Krzysztof Parzyszek, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130323.html
+.. [ParzyszekDistinguish] Krzysztof Parzyszek, http://lists.llvm.org/pipermail/llvm-dev/2019-February/129941.html
+.. [PicusAcronym] Diana Picus, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130318.html
+.. [ReamesConcern] Philip Reames, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130181.html
+.. [RicciAcronyms] Bruno Ricci, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130328.html
+.. [RobinsonAgree] Paul Robinson, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130111.html
+.. [RobinsonDistinguish] Paul Robinson, http://lists.llvm.org/pipermail/llvm-dev/2019-February/129920.html
+.. [RobinsonDistinguish2] Paul Robinson, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130229.html
+.. [RobinsonTransition] Paul Robinson, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130415.html
+.. [TurnerCamelBack] Zachary Turner, https://reviews.llvm.org/D57896#1402264
+.. [TurnerLLDB] Zachary Turner, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130213.html
+.. [Ueyama] Rui Ueyama, http://lists.llvm.org/pipermail/llvm-dev/2019-February/130435.html
Added: www-releases/trunk/8.0.1/docs/_sources/Proposals/VectorizationPlan.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/Proposals/VectorizationPlan.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/Proposals/VectorizationPlan.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/Proposals/VectorizationPlan.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,247 @@
+==================
+Vectorization Plan
+==================
+
+.. contents::
+ :local:
+
+Abstract
+========
+The vectorization transformation can be rather complicated, involving several
+potential alternatives, especially for outer-loops [1]_ but also possibly for
+innermost loops. These alternatives may have significant performance impact,
+both positive and negative. A cost model is therefore employed to identify the
+best alternative, including the alternative of avoiding any transformation
+altogether.
+
+The Vectorization Plan is an explicit model for describing vectorization
+candidates. It serves for both optimizing candidates including estimating their
+cost reliably, and for performing their final translation into IR. This
+facilitates dealing with multiple vectorization candidates.
+
+High-level Design
+=================
+
+Vectorization Workflow
+----------------------
+VPlan-based vectorization involves three major steps, taking a "scenario-based
+approach" to vectorization planning:
+
+1. Legal Step: check if a loop can be legally vectorized; encode constraints and
+ artifacts if so.
+2. Plan Step:
+
+ a. Build initial VPlans following the constraints and decisions taken by
+ Legal Step 1, and compute their cost.
+ b. Apply optimizations to the VPlans, possibly forking additional VPlans.
+ Prune sub-optimal VPlans having relatively high cost.
+3. Execute Step: materialize the best VPlan. Note that this is the only step
+ that modifies the IR.
+
+Design Guidelines
+-----------------
+In what follows, the term "input IR" refers to code that is fed into the
+vectorizer whereas the term "output IR" refers to code that is generated by the
+vectorizer. The output IR contains code that has been vectorized or "widened"
+according to a loop Vectorization Factor (VF), and/or loop unroll-and-jammed
+according to an Unroll Factor (UF).
+The design of VPlan follows several high-level guidelines:
+
+1. Analysis-like: building and manipulating VPlans must not modify the input IR.
+ In particular, if the best option is not to vectorize at all, the
+ vectorization process terminates before reaching Step 3, and compilation
+ should proceed as if VPlans had not been built.
+
+2. Align Cost & Execute: each VPlan must support both estimating the cost and
+ generating the output IR code, such that the cost estimation evaluates the
+ to-be-generated code reliably.
+
+3. Support vectorizing additional constructs:
+
+ a. Outer-loop vectorization. In particular, VPlan must be able to model the
+ control-flow of the output IR which may include multiple basic-blocks and
+ nested loops.
+ b. SLP vectorization.
+ c. Combinations of the above, including nested vectorization: vectorizing
+ both an inner loop and an outer-loop at the same time (each with its own
+ VF and UF), mixed vectorization: vectorizing a loop with SLP patterns
+ inside [4]_, (re)vectorizing input IR containing vector code.
+ d. Function vectorization [2]_.
+
+4. Support multiple candidates efficiently. In particular, similar candidates
+ related to a range of possible VF's and UF's must be represented efficiently.
+ Potential versioning needs to be supported efficiently.
+
+5. Support vectorizing idioms, such as interleaved groups of strided loads or
+ stores. This is achieved by modeling a sequence of output instructions using
+ a "Recipe", which is responsible for computing its cost and generating its
+ code.
+
+6. Encapsulate Single-Entry Single-Exit regions (SESE). During vectorization
+ such regions may need to be, for example, predicated and linearized, or
+ replicated VF*UF times to handle scalarized and predicated instructions.
+ Innerloops are also modelled as SESE regions.
+
+7. Support instruction-level analysis and transformation, as part of Planning
+ Step 2.b: During vectorization instructions may need to be traversed, moved,
+ replaced by other instructions or be created. For example, vector idiom
+ detection and formation involves searching for and optimizing instruction
+ patterns.
+
+Definitions
+===========
+The low-level design of VPlan comprises of the following classes.
+
+:LoopVectorizationPlanner:
+ A LoopVectorizationPlanner is designed to handle the vectorization of a loop
+ or a loop nest. It can construct, optimize and discard one or more VPlans,
+ each VPlan modelling a distinct way to vectorize the loop or the loop nest.
+ Once the best VPlan is determined, including the best VF and UF, this VPlan
+ drives the generation of output IR.
+
+:VPlan:
+ A model of a vectorized candidate for a given input IR loop or loop nest. This
+ candidate is represented using a Hierarchical CFG. VPlan supports estimating
+ the cost and driving the generation of the output IR code it represents.
+
+:Hierarchical CFG:
+ A control-flow graph whose nodes are basic-blocks or Hierarchical CFG's. The
+ Hierarchical CFG data structure is similar to the Tile Tree [5]_, where
+ cross-Tile edges are lifted to connect Tiles instead of the original
+ basic-blocks as in Sharir [6]_, promoting the Tile encapsulation. The terms
+ Region and Block are used rather than Tile [5]_ to avoid confusion with loop
+ tiling.
+
+:VPBlockBase:
+ The building block of the Hierarchical CFG. A pure-virtual base-class of
+ VPBasicBlock and VPRegionBlock, see below. VPBlockBase models the hierarchical
+ control-flow relations with other VPBlocks. Note that in contrast to the IR
+ BasicBlock, a VPBlockBase models its control-flow successors and predecessors
+ directly, rather than through a Terminator branch or through predecessor
+ branches that "use" the VPBlockBase.
+
+:VPBasicBlock:
+ VPBasicBlock is a subclass of VPBlockBase, and serves as the leaves of the
+ Hierarchical CFG. It represents a sequence of output IR instructions that will
+ appear consecutively in an output IR basic-block. The instructions of this
+ basic-block originate from one or more VPBasicBlocks. VPBasicBlock holds a
+ sequence of zero or more VPRecipes that model the cost and generation of the
+ output IR instructions.
+
+:VPRegionBlock:
+ VPRegionBlock is a subclass of VPBlockBase. It models a collection of
+ VPBasicBlocks and VPRegionBlocks which form a SESE subgraph of the output IR
+ CFG. A VPRegionBlock may indicate that its contents are to be replicated a
+ constant number of times when output IR is generated, effectively representing
+ a loop with constant trip-count that will be completely unrolled. This is used
+ to support scalarized and predicated instructions with a single model for
+ multiple candidate VF's and UF's.
+
+:VPRecipeBase:
+ A pure-virtual base class modeling a sequence of one or more output IR
+ instructions, possibly based on one or more input IR instructions. These
+ input IR instructions are referred to as "Ingredients" of the Recipe. A Recipe
+ may specify how its ingredients are to be transformed to produce the output IR
+ instructions; e.g., cloned once, replicated multiple times or widened
+ according to selected VF.
+
+:VPValue:
+ The base of VPlan's def-use relations class hierarchy. When instantiated, it
+ models a constant or a live-in Value in VPlan. It has users, which are of type
+ VPUser, but no operands.
+
+:VPUser:
+ A VPValue representing a general vertex in the def-use graph of VPlan. It has
+ operands which are of type VPValue. When instantiated, it represents a
+ live-out Instruction that exists outside VPlan. VPUser is similar in some
+ aspects to LLVM's User class.
+
+:VPInstruction:
+ A VPInstruction is both a VPRecipe and a VPUser. It models a single
+ VPlan-level instruction to be generated if the VPlan is executed, including
+ its opcode and possibly additional characteristics. It is the basis for
+ writing instruction-level analyses and optimizations in VPlan as creating,
+ replacing or moving VPInstructions record both def-use and scheduling
+ decisions. VPInstructions also extend LLVM IR's opcodes with idiomatic
+ operations that enrich the Vectorizer's semantics.
+
+:VPTransformState:
+ Stores information used for generating output IR, passed from
+ LoopVectorizationPlanner to its selected VPlan for execution, and used to pass
+ additional information down to VPBlocks and VPRecipes.
+
+The Planning Process and VPlan Roadmap
+======================================
+
+Transforming the Loop Vectorizer to use VPlan follows a staged approach. First,
+VPlan is used to record the final vectorization decisions, and to execute them:
+the Hierarchical CFG models the planned control-flow, and Recipes capture
+decisions taken inside basic-blocks. Next, VPlan will be used also as the basis
+for taking these decisions, effectively turning them into a series of
+VPlan-to-VPlan algorithms. Finally, VPlan will support the planning process
+itself including cost-based analyses for making these decisions, to fully
+support compositional and iterative decision making.
+
+Some decisions are local to an instruction in the loop, such as whether to widen
+it into a vector instruction or replicate it, keeping the generated instructions
+in place. Other decisions, however, involve moving instructions, replacing them
+with other instructions, and/or introducing new instructions. For example, a
+cast may sink past a later instruction and be widened to handle first-order
+recurrence; an interleave group of strided gathers or scatters may effectively
+move to one place where they are replaced with shuffles and a common wide vector
+load or store; new instructions may be introduced to compute masks, shuffle the
+elements of vectors, and pack scalar values into vectors or vice-versa.
+
+In order for VPlan to support making instruction-level decisions and analyses,
+it needs to model the relevant instructions along with their def/use relations.
+This too follows a staged approach: first, the new instructions that compute
+masks are modeled as VPInstructions, along with their induced def/use subgraph.
+This effectively models masks in VPlan, facilitating VPlan-based predication.
+Next, the logic embedded within each Recipe for generating its instructions at
+VPlan execution time, will instead take part in the planning process by modeling
+them as VPInstructions. Finally, only logic that applies to instructions as a
+group will remain in Recipes, such as interleave groups and potentially other
+idiom groups having synergistic cost.
+
+Related LLVM components
+-----------------------
+1. SLP Vectorizer: one can compare the VPlan model with LLVM's existing SLP
+ tree, where TSLP [3]_ adds Plan Step 2.b.
+
+2. RegionInfo: one can compare VPlan's H-CFG with the Region Analysis as used by
+ Polly [7]_.
+
+3. Loop Vectorizer: the Vectorization Plan aims to upgrade the infrastructure of
+ the Loop Vectorizer and extend it to handle outer loops [8]_, [9]_.
+
+References
+----------
+.. [1] "Outer-loop vectorization: revisited for short SIMD architectures", Dorit
+ Nuzman and Ayal Zaks, PACT 2008.
+
+.. [2] "Proposal for function vectorization and loop vectorization with function
+ calls", Xinmin Tian, [`cfe-dev
+ <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>`_].,
+ March 2, 2016.
+ See also `review <https://reviews.llvm.org/D22792>`_.
+
+.. [3] "Throttling Automatic Vectorization: When Less is More", Vasileios
+ Porpodas and Tim Jones, PACT 2015 and LLVM Developers' Meeting 2015.
+
+.. [4] "Exploiting mixed SIMD parallelism by reducing data reorganization
+ overhead", Hao Zhou and Jingling Xue, CGO 2016.
+
+.. [5] "Register Allocation via Hierarchical Graph Coloring", David Callahan and
+ Brian Koblenz, PLDI 1991
+
+.. [6] "Structural analysis: A new approach to flow analysis in optimizing
+ compilers", M. Sharir, Journal of Computer Languages, Jan. 1980
+
+.. [7] "Enabling Polyhedral Optimizations in LLVM", Tobias Grosser, Diploma
+ thesis, 2011.
+
+.. [8] "Introducing VPlan to the Loop Vectorizer", Gil Rapaport and Ayal Zaks,
+ European LLVM Developers' Meeting 2017.
+
+.. [9] "Extending LoopVectorizer: OpenMP4.5 SIMD and Outer Loop
+ Auto-Vectorization", Intel Vectorizer Team, LLVM Developers' Meeting 2016.
Added: www-releases/trunk/8.0.1/docs/_sources/ReleaseNotes.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/ReleaseNotes.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/ReleaseNotes.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/ReleaseNotes.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,336 @@
+========================
+LLVM 8.0.0 Release Notes
+========================
+
+.. contents::
+ :local:
+
+Introduction
+============
+
+This document contains the release notes for the LLVM Compiler Infrastructure,
+release 8.0.0. Here we describe the status of LLVM, including major improvements
+from the previous release, improvements in various subprojects of LLVM, and
+some of the current users of the code. All LLVM releases may be downloaded
+from the `LLVM releases web site <https://releases.llvm.org/>`_.
+
+For more information about LLVM, including information about the latest
+release, please check out the `main LLVM web site <https://llvm.org/>`_. If you
+have questions or comments, the `LLVM Developer's Mailing List
+<https://lists.llvm.org/mailman/listinfo/llvm-dev>`_ is a good place to send
+them.
+
+Minimum Required Compiler Version
+=================================
+As `discussed on the mailing list
+<https://lists.llvm.org/pipermail/llvm-dev/2019-January/129452.html>`_,
+building LLVM will soon require more recent toolchains as follows:
+
+============= ====
+Clang 3.5
+Apple Clang 6.0
+GCC 5.1
+Visual Studio 2017
+============= ====
+
+A new CMake check when configuring LLVM provides a soft-error if your
+toolchain will become unsupported soon. You can opt out of the soft-error by
+setting the ``LLVM_TEMPORARILY_ALLOW_OLD_TOOLCHAIN`` CMake variable to
+``ON``.
+
+
+Known Issues
+============
+
+These are issues that couldn't be fixed before the release. See the bug reports
+for the latest status.
+
+* `PR40547 <https://llvm.org/pr40547>`_ Clang gets miscompiled by trunk GCC.
+
+* `PR40761 <https://llvm.org/pr40761>`_ "asan-dynamic" doesn't work on FreeBSD.
+
+
+Non-comprehensive list of changes in this release
+=================================================
+
+* The **llvm-cov** tool can now export lcov trace files using the
+ `-format=lcov` option of the `export` command.
+
+* The ``add_llvm_loadable_module`` CMake macro has been removed. The
+ ``add_llvm_library`` macro with the ``MODULE`` argument now provides the same
+ functionality. See `Writing an LLVM Pass
+ <WritingAnLLVMPass.html#setting-up-the-build-environment>`_.
+
+* For MinGW, references to data variables that might need to be imported
+ from a dll are accessed via a stub, to allow the linker to convert it to
+ a dllimport if needed.
+
+* Added support for labels as offsets in ``.reloc`` directive.
+
+* Support for precise identification of X86 instructions with memory operands,
+ by using debug information. This supports profile-driven cache prefetching.
+ It is enabled with the ``-x86-discriminate-memops`` LLVM Flag.
+
+* Support for profile-driven software cache prefetching on X86. This is part of
+ a larger system, consisting of: an offline cache prefetches recommender,
+ AutoFDO tooling, and LLVM. In this system, a binary compiled with
+ ``-x86-discriminate-memops`` is run under the observation of the recommender.
+ The recommender identifies certain memory access instructions by their binary
+ file address, and recommends a prefetch of a specific type (NTA, T0, etc) be
+ performed at a specified fixed offset from such an instruction's memory
+ operand. Next, this information needs to be converted to the AutoFDO syntax
+ and the resulting profile may be passed back to the compiler with the LLVM
+ flag ``-prefetch-hints-file``, together with the exact same set of
+ compilation parameters used for the original binary. More information is
+ available in the `RFC
+ <https://lists.llvm.org/pipermail/llvm-dev/2018-November/127461.html>`_.
+
+* Windows support for libFuzzer (x86_64).
+
+Changes to the LLVM IR
+----------------------
+
+* Function attribute ``speculative_load_hardening`` has been introduced to
+ allow indicating that `Speculative Load Hardening
+ <SpeculativeLoadHardening.html>`_ must be enabled for the function body.
+
+
+Changes to the JIT APIs
+-----------------------
+
+The ORC (On Request Compilation) JIT APIs have been updated to support
+concurrent compilation. The existing (non-concurrent) ORC layer classes and
+related APIs are deprecated, have been renamed with a "Legacy" prefix (e.g.
+LegacyIRCompileLayer). The deprecated clasess will be removed in LLVM 9.
+
+An example JIT stack using the concurrent ORC APIs, called LLJIT, has been
+added (see include/llvm/ExecutionEngine/Orc/LLJIT.h). The lli tool has been
+updated to use LLJIT.
+
+MCJIT and ExecutionEngine continue to be supported, though ORC should be
+preferred for new projects.
+
+Changes to the C++ APIs
+-----------------------
+
+Three of the IR library methods related to debugging information for
+functions and methods have changed their prototypes:
+
+ DIBuilder::createMethod
+ DIBuilder::createFunction
+ DIBuilder::createTempFunctionFwdDecl
+
+In all cases, several individual parameters were removed, and replaced
+by a single 'SPFlags' (subprogram flags) parameter. The individual
+parameters are: 'isLocalToUnit'; 'isDefinition'; 'isOptimized'; and
+for 'createMethod', 'Virtuality'. The new 'SPFlags' parameter has a
+default value equivalent to passing 'false' for the three 'bool'
+parameters, and zero (non-virtual) to the 'Virtuality' parameter. For
+any old-style API call that passed 'true' or a non-zero virtuality to
+these methods, you will need to substitute the correct 'SPFlags' value.
+The helper method 'DISubprogram::toSPFlags()' might be useful in making
+this conversion.
+
+Changes to the AArch64 Target
+-----------------------------
+
+* Support for Speculative Load Hardening has been added.
+
+* Initial support for the Tiny code model, where code and its statically
+ defined symbols must live within 1MB of each other.
+
+* Added support for the ``.arch_extension`` assembler directive, just like
+ on ARM.
+
+
+Changes to the Hexagon Target
+-----------------------------
+
+* Added support for Hexagon/HVX V66 ISA.
+
+
+Changes to the MIPS Target
+--------------------------
+
+* Improved support of GlobalISel instruction selection framework.
+
+* Implemented emission of ``R_MIPS_JALR`` and ``R_MICROMIPS_JALR``
+ relocations. These relocations provide hints to a linker for optimization
+ of jumps to protected symbols.
+
+* ORC JIT has been supported for MIPS and MIPS64 architectures.
+
+* Assembler now suggests alternative MIPS instruction mnemonics when
+ an invalid one is specified.
+
+* Improved support for MIPS N32 ABI.
+
+* Added new instructions (``pll.ps``, ``plu.ps``, ``cvt.s.pu``,
+ ``cvt.s.pl``, ``cvt.ps``, ``sigrie``).
+
+* Numerous bug fixes and code cleanups.
+
+
+Changes to the PowerPC Target
+-----------------------------
+
+* Switched to non-PIC default
+
+* Deprecated Darwin support
+
+* Enabled Out-of-Order scheduling for P9
+
+* Better overload rules for compatible vector type parameter
+
+* Support constraint 'wi', modifier 'x' and VSX registers in inline asm
+
+* More ``__float128`` support
+
+* Added new builtins like vector int128 ``pack``/``unpack`` and
+ ``stxvw4x.be``/``stxvd2x.be``
+
+* Provided significant improvements to the automatic vectorizer
+
+* Code-gen improvements (especially for Power9)
+
+* Fixed some long-standing bugs in the back end
+
+* Added experimental prologue/epilogue improvements
+
+* Enabled builtins tests in compiler-rt
+
+* Add ``___fixunstfti``/``floattitf`` in compiler-rt to support conversion
+ between IBM double-double and unsigned int128
+
+* Disable randomized address space when running the sanitizers on Linux ppc64le
+
+* Completed support in LLD for ELFv2
+
+* Enabled llvm-exegesis latency mode for PPC
+
+
+Changes to the SystemZ Target
+-----------------------------
+
+* A number of bugs related to C/C++ language vector extension support were
+ fixed: the ``-mzvector`` option now actually enables the ``__vector`` and
+ ``__bool`` keywords, the ``vec_step`` intrinsic now works, and the
+ ``vec_insert_and_zero`` and ``vec_orc`` intrinsics now generate correct code.
+
+* The ``__float128`` keyword, which had been accidentally enabled in some
+ earlier releases, is now no longer supported. On SystemZ, the ``long double``
+ data type itself already uses the IEEE 128-bit floating-point format.
+
+* When the compiler inlines ``strcmp`` or ``memcmp``, the generated code no
+ longer returns ``INT_MIN`` as the negative result value under any
+ circumstances.
+
+* Various code-gen improvements, in particular related to improved
+ auto-vectorization, inlining, and instruction scheduling.
+
+
+Changes to the X86 Target
+-------------------------
+
+* Machine model for AMD bdver2 (Piledriver) CPU was added. It is used to support
+ instruction scheduling and other instruction cost heuristics.
+
+* New AVX512F gather and scatter intrinsics were added that take a <X x i1> mask
+ instead of a scalar integer. This removes the need for a bitcast in IR. The
+ new intrinsics are named like the old intrinsics with ``llvm.avx512.``
+ replaced with ``llvm.avx512.mask.``. The old intrinsics will be removed in a
+ future release.
+
+* Added ``cascadelake`` as a CPU name for -march. This is ``skylake-avx512``
+ with the addition of the ``avx512vnni`` instruction set.
+
+* ADCX instruction will no longer be emitted. This instruction is rarely better
+ than the legacy ADC instruction and just increased code size.
+
+
+Changes to the WebAssembly Target
+---------------------------------
+
+The WebAssembly target is no longer "experimental"! It's now built by default,
+rather than needing to be enabled with LLVM_EXPERIMENTAL_TARGETS_TO_BUILD.
+
+The object file format and core C ABI are now considered stable. That said,
+the object file format has an ABI versioning capability, and one anticipated
+use for it will be to add support for returning small structs as multiple
+return values, once the underlying WebAssembly platform itself supports it.
+Additionally, multithreading support is not yet included in the stable ABI.
+
+
+Changes to the Nios2 Target
+---------------------------
+
+* The Nios2 target was removed from this release.
+
+
+Changes to LLDB
+===============
+
+* Printed source code is now syntax highlighted in the terminal (only for C
+ languages).
+
+* The expression command now supports tab completing expressions.
+
+
+External Open Source Projects Using LLVM 8
+==========================================
+
+LDC - the LLVM-based D compiler
+-------------------------------
+
+`D <http://dlang.org>`_ is a language with C-like syntax and static typing. It
+pragmatically combines efficiency, control, and modeling power, with safety and
+programmer productivity. D supports powerful concepts like Compile-Time Function
+Execution (CTFE) and Template Meta-Programming, provides an innovative approach
+to concurrency and offers many classical paradigms.
+
+`LDC <http://wiki.dlang.org/LDC>`_ uses the frontend from the reference compiler
+combined with LLVM as backend to produce efficient native code. LDC targets
+x86/x86_64 systems like Linux, OS X, FreeBSD and Windows and also Linux on ARM
+and PowerPC (32/64 bit). Ports to other architectures like AArch64 and MIPS64
+are underway.
+
+Open Dylan Compiler
+-------------------
+
+`Dylan <https://opendylan.org/>`_ is a multi-paradigm functional
+and object-oriented programming language. It is dynamic while
+providing a programming model designed to support efficient machine
+code generation, including fine-grained control over dynamic and
+static behavior. Dylan also features a powerful macro facility for
+expressive metaprogramming.
+
+The Open Dylan compiler can use LLVM as one of its code-generating
+back-ends, including full support for debug info generation. (Open
+Dylan generates LLVM bitcode directly using a native Dylan IR and
+bitcode library.) Development of a Dylan debugger and interactive REPL
+making use of the LLDB libraries is in progress.
+
+Zig Programming Language
+------------------------
+
+`Zig <https://ziglang.org>`_ is a system programming language intended to be
+an alternative to C. It provides high level features such as generics, compile
+time function execution, and partial evaluation, while exposing low level LLVM
+IR features such as aliases and intrinsics. Zig uses Clang to provide automatic
+import of .h symbols, including inline functions and simple macros. Zig uses
+LLD combined with lazily building compiler-rt to provide out-of-the-box
+cross-compiling for all supported targets.
+
+
+Additional Information
+======================
+
+A wide variety of additional information is available on the `LLVM web page
+<https://llvm.org/>`_, in particular in the `documentation
+<https://llvm.org/docs/>`_ section. The web page also contains versions of the
+API documentation which is up-to-date with the Subversion version of the source
+code. You can access versions of these documents specific to this release by
+going into the ``llvm/docs/`` directory in the LLVM tree.
+
+If you have any questions or comments about LLVM, please feel free to contact
+us via the `mailing lists <https://llvm.org/docs/#mailing-lists>`_.
Added: www-releases/trunk/8.0.1/docs/_sources/ReleaseProcess.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/ReleaseProcess.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/ReleaseProcess.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/ReleaseProcess.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,231 @@
+=============================
+How To Validate a New Release
+=============================
+
+.. contents::
+ :local:
+ :depth: 1
+
+Introduction
+============
+
+This document contains information about testing the release candidates that
+will ultimately be the next LLVM release. For more information on how to
+manage the actual release, please refer to :doc:`HowToReleaseLLVM`.
+
+Overview of the Release Process
+-------------------------------
+
+Once the release process starts, the Release Manager will ask for volunteers,
+and it'll be the role of each volunteer to:
+
+* Test and benchmark the previous release
+
+* Test and benchmark each release candidate, comparing to the previous release
+ and candidates
+
+* Identify, reduce and report every regression found during tests and benchmarks
+
+* Make sure the critical bugs get fixed and merged to the next release candidate
+
+Not all bugs or regressions are show-stoppers and it's a bit of a grey area what
+should be fixed before the next candidate and what can wait until the next
+release.
+
+It'll depend on:
+
+* The severity of the bug, how many people it affects and if it's a regression
+ or a known bug. Known bugs are "unsupported features" and some bugs can be
+ disabled if they have been implemented recently.
+
+* The stage in the release. Less critical bugs should be considered to be
+ fixed between RC1 and RC2, but not so much at the end of it.
+
+* If it's a correctness or a performance regression. Performance regression
+ tends to be taken more lightly than correctness.
+
+.. _scripts:
+
+Scripts
+=======
+
+The scripts are in the ``utils/release`` directory.
+
+test-release.sh
+---------------
+
+This script will check-out, configure and compile LLVM+Clang (+ most add-ons,
+like ``compiler-rt``, ``libcxx``, ``libomp`` and ``clang-extra-tools``) in
+three stages, and will test the final stage.
+It'll have installed the final binaries on the Phase3/Releasei(+Asserts)
+directory, and that's the one you should use for the test-suite and other
+external tests.
+
+To run the script on a specific release candidate run::
+
+ ./test-release.sh \
+ -release 3.3 \
+ -rc 1 \
+ -no-64bit \
+ -test-asserts \
+ -no-compare-files
+
+Each system will require different options. For instance, x86_64 will
+obviously not need ``-no-64bit`` while 32-bit systems will, or the script will
+fail.
+
+The important flags to get right are:
+
+* On the pre-release, you should change ``-rc 1`` to ``-final``. On RC2,
+ change it to ``-rc 2`` and so on.
+
+* On non-release testing, you can use ``-final`` in conjunction with
+ ``-no-checkout``, but you'll have to create the ``final`` directory by hand
+ and link the correct source dir to ``final/llvm.src``.
+
+* For release candidates, you need ``-test-asserts``, or it won't create a
+ "Release+Asserts" directory, which is needed for release testing and
+ benchmarking. This will take twice as long.
+
+* On the final candidate you just need Release builds, and that's the binary
+ directory you'll have to pack.
+
+This script builds three phases of Clang+LLVM twice each (Release and
+Release+Asserts), so use screen or nohup to avoid headaches, since it'll take
+a long time.
+
+Use the ``--help`` option to see all the options and chose it according to
+your needs.
+
+
+findRegressions-nightly.py
+--------------------------
+
+TODO
+
+.. _test-suite:
+
+Test Suite
+==========
+
+.. contents::
+ :local:
+
+Follow the `LNT Quick Start Guide
+<http://llvm.org/docs/lnt/quickstart.html>`__ link on how to set-up the
+test-suite
+
+The binary location you'll have to use for testing is inside the
+``rcN/Phase3/Release+Asserts/llvmCore-REL-RC.install``.
+Link that directory to an easier location and run the test-suite.
+
+An example on the run command line, assuming you created a link from the correct
+install directory to ``~/devel/llvm/install``::
+
+ ./sandbox/bin/python sandbox/bin/lnt runtest \
+ nt \
+ -j4 \
+ --sandbox sandbox \
+ --test-suite ~/devel/llvm/test/test-suite \
+ --cc ~/devel/llvm/install/bin/clang \
+ --cxx ~/devel/llvm/install/bin/clang++
+
+It should have no new regressions, compared to the previous release or release
+candidate. You don't need to fix all the bugs in the test-suite, since they're
+not necessarily meant to pass on all architectures all the time. This is
+due to the nature of the result checking, which relies on direct comparison,
+and most of the time, the failures are related to bad output checking, rather
+than bad code generation.
+
+If the errors are in LLVM itself, please report every single regression found
+as blocker, and all the other bugs as important, but not necessarily blocking
+the release to proceed. They can be set as "known failures" and to be
+fix on a future date.
+
+.. _pre-release-process:
+
+Pre-Release Process
+===================
+
+.. contents::
+ :local:
+
+When the release process is announced on the mailing list, you should prepare
+for the testing, by applying the same testing you'll do on the release
+candidates, on the previous release.
+
+You should:
+
+* Download the previous release sources from
+ http://llvm.org/releases/download.html.
+
+* Run the test-release.sh script on ``final`` mode (change ``-rc 1`` to
+ ``-final``).
+
+* Once all three stages are done, it'll test the final stage.
+
+* Using the ``Phase3/Release+Asserts/llvmCore-MAJ.MIN-final.install`` base,
+ run the test-suite.
+
+If the final phase's ``make check-all`` failed, it's a good idea to also test
+the intermediate stages by going on the obj directory and running
+``make check-all`` to find if there's at least one stage that passes (helps
+when reducing the error for bug report purposes).
+
+.. _release-process:
+
+Release Process
+===============
+
+.. contents::
+ :local:
+
+When the Release Manager sends you the release candidate, download all sources,
+unzip on the same directory (there will be sym-links from the appropriate places
+to them), and run the release test as above.
+
+You should:
+
+* Download the current candidate sources from where the release manager points
+ you (ex. http://llvm.org/pre-releases/3.3/rc1/).
+
+* Repeat the steps above with ``-rc 1``, ``-rc 2`` etc modes and run the
+ test-suite the same way.
+
+* Compare the results, report all errors on Bugzilla and publish the binary blob
+ where the release manager can grab it.
+
+Once the release manages announces that the latest candidate is the good one,
+you have to pack the ``Release`` (no Asserts) install directory on ``Phase3``
+and that will be the official binary.
+
+* Rename (or link) ``clang+llvm-REL-ARCH-ENV`` to the .install directory
+
+* Tar that into the same name with ``.tar.gz`` extensioan from outside the
+ directory
+
+* Make it available for the release manager to download
+
+.. _bug-reporting:
+
+Bug Reporting Process
+=====================
+
+.. contents::
+ :local:
+
+If you found regressions or failures when comparing a release candidate with the
+previous release, follow the rules below:
+
+* Critical bugs on compilation should be fixed as soon as possible, possibly
+ before releasing the binary blobs.
+
+* Check-all tests should be fixed before the next release candidate, but can
+ wait until the test-suite run is finished.
+
+* Bugs in the test suite or unimportant check-all tests can be fixed in between
+ release candidates.
+
+* New features or recent big changes, when close to the release, should have
+ done in a way that it's easy to disable. If they misbehave, prefer disabling
+ them than releasing an unstable (but untested) binary package.
Added: www-releases/trunk/8.0.1/docs/_sources/Remarks.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/Remarks.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/Remarks.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/Remarks.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,305 @@
+=======
+Remarks
+=======
+
+.. contents::
+ :local:
+
+Introduction to the LLVM remark diagnostics
+===========================================
+
+LLVM is able to emit diagnostics from passes describing whether an optimization
+has been performed or missed for a particular reason, which should give more
+insight to users about what the compiler did during the compilation pipeline.
+
+There are three main remark types:
+
+``Passed``
+
+ Remarks that describe a successful optimization performed by the compiler.
+
+ :Example:
+
+ ::
+
+ foo inlined into bar with (cost=always): always inline attribute
+
+``Missed``
+
+ Remarks that describe an attempt to an optimization by the compiler that
+ could not be performed.
+
+ :Example:
+
+ ::
+
+ foo not inlined into bar because it should never be inlined
+ (cost=never): noinline function attribute
+
+``Analysis``
+
+ Remarks that describe the result of an analysis, that can bring more
+ information to the user regarding the generated code.
+
+ :Example:
+
+ ::
+
+ 16 stack bytes in function
+
+ ::
+
+ 10 instructions in function
+
+Enabling optimization remarks
+=============================
+
+There are two modes that are supported for enabling optimization remarks in
+LLVM: through remark diagnostics, or through serialized remarks.
+
+Remark diagnostics
+------------------
+
+Optimization remarks can be emitted as diagnostics. These diagnostics will be
+propagated to front-ends if desired, or emitted by tools like :doc:`llc
+<CommandGuide/llc>` or :doc:`opt <CommandGuide/opt>`.
+
+.. option:: -pass-remarks=<regex>
+
+ Enables optimization remarks from passes whose name match the given (POSIX)
+ regular expression.
+
+.. option:: -pass-remarks-missed=<regex>
+
+ Enables missed optimization remarks from passes whose name match the given
+ (POSIX) regular expression.
+
+.. option:: -pass-remarks-analysis=<regex>
+
+ Enables optimization analysis remarks from passes whose name match the given
+ (POSIX) regular expression.
+
+Serialized remarks
+------------------
+
+While diagnostics are useful during development, it is often more useful to
+refer to optimization remarks post-compilation, typically during performance
+analysis.
+
+For that, LLVM can serialize the remarks produced for each compilation unit to
+a file that can be consumed later.
+
+By default, the format of the serialized remarks is :ref:`YAML
+<yamlremarks>`, and it can be accompanied by a :ref:`section <remarkssection>`
+in the object files to easily retrieve it.
+
+:doc:`llc <CommandGuide/llc>` and :doc:`opt <CommandGuide/opt>` support the
+following options:
+
+
+``Basic options``
+
+ .. option:: -pass-remarks-output=<filename>
+
+ Enables the serialization of remarks to a file specified in <filename>.
+
+ By default, the output is serialized to :ref:`YAML <yamlremarks>`.
+
+ .. option:: -pass-remarks-format=<format>
+
+ Specifies the output format of the serialized remarks.
+
+ Supported formats:
+
+ * :ref:`yaml <yamlremarks>` (default)
+
+``Content configuration``
+
+ .. option:: -pass-remarks-filter=<regex>
+
+ Only passes whose name match the given (POSIX) regular expression will be
+ serialized to the final output.
+
+ .. option:: -pass-remarks-with-hotness
+
+ With PGO, include profile count in optimization remarks.
+
+ .. option:: -pass-remarks-hotness-threshold
+
+ The minimum profile count required for an optimization remark to be
+ emitted.
+
+Other tools that support remarks:
+
+:program:`llvm-lto`
+
+ .. option:: -lto-pass-remarks-output=<filename>
+ .. option:: -lto-pass-remarks-filter=<regex>
+ .. option:: -lto-pass-remarks-format=<format>
+ .. option:: -lto-pass-remarks-with-hotness
+ .. option:: -lto-pass-remarks-hotness-threshold
+
+:program:`gold-plugin` and :program:`lld`
+
+ .. option:: -opt-remarks-filename=<filename>
+ .. option:: -opt-remarks-filter=<regex>
+ .. option:: -opt-remarks-format=<format>
+ .. option:: -opt-remarks-with-hotness
+
+.. _yamlremarks:
+
+YAML remarks
+============
+
+A typical remark serialized to YAML looks like this:
+
+.. code-block:: yaml
+
+ --- !<TYPE>
+ Pass: <pass>
+ Name: <name>
+ DebugLoc: { File: <file>, Line: <line>, Column: <column> }
+ Function: <function>
+ Hotness: <hotness>
+ Args:
+ - <key>: <value>
+ DebugLoc: { File: <arg-file>, Line: <arg-line>, Column: <arg-column> }
+
+The following entries are mandatory:
+
+* ``<TYPE>``: can be ``Passed``, ``Missed``, ``Analysis``,
+ ``AnalysisFPCommute``, ``AnalysisAliasing``, ``Failure``.
+* ``<pass>``: the name of the pass that emitted this remark.
+* ``<name>``: the name of the remark coming from ``<pass>``.
+* ``<function>``: the mangled name of the function.
+
+If a ``DebugLoc`` entry is specified, the following fields are required:
+
+* ``<file>``
+* ``<line>``
+* ``<column>``
+
+If an ``arg`` entry is specified, the following fields are required:
+
+* ``<key>``
+* ``<value>``
+
+If a ``DebugLoc`` entry is specified within an ``arg`` entry, the following
+fields are required:
+
+* ``<arg-file>``
+* ``<arg-line>``
+* ``<arg-column>``
+
+opt-viewer
+==========
+
+The ``opt-viewer`` directory contains a collection of tools that visualize and
+summarize serialized remarks.
+
+.. _optviewerpy:
+
+opt-viewer.py
+-------------
+
+Output a HTML page which gives visual feedback on compiler interactions with
+your program.
+
+ :Examples:
+
+ ::
+
+ $ opt-viewer.py my_yaml_file.opt.yaml
+
+ ::
+
+ $ opt-viewer.py my_build_dir/
+
+
+opt-stats.py
+------------
+
+Output statistics about the optimization remarks in the input set.
+
+ :Example:
+
+ ::
+
+ $ opt-stats.py my_yaml_file.opt.yaml
+
+ Total number of remarks 3
+
+
+ Top 10 remarks by pass:
+ inline 33%
+ asm-printer 33%
+ prologepilog 33%
+
+ Top 10 remarks:
+ asm-printer/InstructionCount 33%
+ inline/NoDefinition 33%
+ prologepilog/StackSize 33%
+
+opt-diff.py
+-----------
+
+Produce a new YAML file which contains all of the changes in optimizations
+between two YAML files.
+
+Typically, this tool should be used to do diffs between:
+
+* new compiler + fixed source vs old compiler + fixed source
+* fixed compiler + new source vs fixed compiler + old source
+
+This diff file can be displayed using :ref:`opt-viewer.py <optviewerpy>`.
+
+ :Example:
+
+ ::
+
+ $ opt-diff.py my_opt_yaml1.opt.yaml my_opt_yaml2.opt.yaml -o my_opt_diff.opt.yaml
+ $ opt-viewer.py my_opt_diff.opt.yaml
+
+.. _remarkssection:
+
+Emitting remark diagnostics in the object file
+==============================================
+
+A section containing metadata on remark diagnostics will be emitted when
+-remarks-section is passed. The section contains:
+
+* a magic number: "REMARKS\\0"
+* the version number: a little-endian uint64_t
+* the total size of the string table (the size itself excluded):
+ little-endian uint64_t
+* a list of null-terminated strings
+* the absolute file path to the serialized remark diagnostics: a
+ null-terminated string.
+
+The section is named:
+
+* ``__LLVM,__remarks`` (MachO)
+* ``.remarks`` (ELF)
+
+C API
+=====
+
+LLVM provides a library that can be used to parse remarks through a shared
+library named ``libRemarks``.
+
+The typical usage through the C API is like the following:
+
+.. code-block:: c
+
+ LLVMRemarkParserRef Parser = LLVMRemarkParserCreateYAML(Buf, Size);
+ LLVMRemarkEntryRef Remark = NULL;
+ while ((Remark = LLVMRemarkParserGetNext(Parser))) {
+ // use Remark
+ }
+ bool HasError = LLVMRemarkParserHasError(Parser);
+ LLVMRemarkParserDispose(Parser);
+
+.. FIXME: add documentation for llvm-opt-report.
+.. FIXME: add documentation for Passes supporting optimization remarks
+.. FIXME: add documentation for IR Passes
+.. FIXME: add documentation for CodeGen Passes
Added: www-releases/trunk/8.0.1/docs/_sources/ReportingGuide.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/ReportingGuide.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/ReportingGuide.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/ReportingGuide.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,143 @@
+===============
+Reporting Guide
+===============
+
+.. note::
+
+ This document is currently a **DRAFT** document while it is being discussed
+ by the community.
+
+If you believe someone is violating the :doc:`code of conduct <CodeOfConduct>`
+you can always report it to the LLVM Foundation Code of Conduct Advisory
+Committee by emailing conduct at llvm.org. **All reports will be kept
+confidential.** This isn't a public list and only `members`_ of the advisory
+committee will receive the report.
+
+If you believe anyone is in **physical danger**, please notify appropriate law
+enforcement first. If you are unsure what law enforcement agency is
+appropriate, please include this in your report and we will attempt to notify
+them.
+
+If the violation occurs at an event such as a Developer Meeting and requires
+immediate attention, you can also reach out to any of the event organizers or
+staff. Event organizers and staff will be prepared to handle the incident and
+able to help. If you cannot find one of the organizers, the venue staff can
+locate one for you. We will also post detailed contact information for specific
+events as part of each events' information. In person reports will still be
+kept confidential exactly as above, but also feel free to (anonymously if
+needed) email conduct at llvm.org.
+
+.. note::
+ The LLVM community has long handled inappropriate behavior on its own, using
+ both private communication and public responses. Nothing in this document is
+ intended to discourage this self enforcement of community norms. Instead,
+ the mechanisms described here are intended to supplement any self
+ enforcement within the community. They provide avenues for handling severe
+ cases or cases where the reporting party does not wish to respond directly
+ for any reason.
+
+Filing a report
+===============
+
+Reports can be as formal or informal as needed for the situation at hand. If
+possible, please include as much information as you can. If you feel
+comfortable, please consider including:
+
+* Your contact info (so we can get in touch with you if we need to follow up).
+* Names (real, nicknames, or pseudonyms) of any individuals involved. If there
+ were other witnesses besides you, please try to include them as well.
+* When and where the incident occurred. Please be as specific as possible.
+* Your account of what occurred. If there is a publicly available record (e.g.
+ a mailing list archive or a public IRC logger) please include a link.
+* Any extra context you believe existed for the incident.
+* If you believe this incident is ongoing.
+* Any other information you believe we should have.
+
+What happens after you file a report?
+=====================================
+
+You will receive an email from the advisory committee acknowledging receipt
+within 24 hours (and we will aim to respond much quicker than that).
+
+The advisory committee will immediately meet to review the incident and try to
+determine:
+
+* What happened and who was involved.
+* Whether this event constitutes a code of conduct violation.
+* Whether this is an ongoing situation, or if there is a threat to anyone's
+ physical safety.
+
+If this is determined to be an ongoing incident or a threat to physical safety,
+the working groups' immediate priority will be to protect everyone involved.
+This means we may delay an "official" response until we believe that the
+situation has ended and that everyone is physically safe.
+
+The working group will try to contact other parties involved or witnessing the
+event to gain clarity on what happened and understand any different
+perspectives.
+
+Once the advisory committee has a complete account of the events they will make
+a decision as to how to respond. Responses may include:
+
+* Nothing, if we determine no violation occurred or it has already been
+ appropriately resolved.
+* Providing either moderation or mediation to ongoing interactions (where
+ appropriate, safe, and desired by both parties).
+* A private reprimand from the working group to the individuals involved.
+* An imposed vacation (i.e. asking someone to "take a week off" from a mailing
+ list or IRC).
+* A public reprimand.
+* A permanent or temporary ban from some or all LLVM spaces (mailing lists,
+ IRC, etc.)
+* Involvement of relevant law enforcement if appropriate.
+
+If the situation is not resolved within one week, we'll respond within one week
+to the original reporter with an update and explanation.
+
+Once we've determined our response, we will separately contact the original
+reporter and other individuals to let them know what actions (if any) we'll be
+taking. We will take into account feedback from the individuals involved on the
+appropriateness of our response, but we don't guarantee we'll act on it.
+
+After any incident, the advisory committee will make a report on the situation
+to the LLVM Foundation board. The board may choose to make a public statement
+about the incident. If that's the case, the identities of anyone involved will
+remain confidential unless instructed by those inviduals otherwise.
+
+Appealing
+=========
+
+Only permanent resolutions (such as bans) or requests for public actions may be
+appealed. To appeal a decision of the working group, contact the LLVM
+Foundation board at board at llvm.org with your appeal and the board will review
+the case.
+
+In general, it is **not** appropriate to appeal a particular decision on
+a public mailing list. Doing so would involve disclosure of information which
+whould be confidential. Disclosing this kind of information publicly may be
+considered a separate and (potentially) more serious violation of the Code of
+Conduct. This is not meant to limit discussion of the Code of Conduct, the
+advisory board itself, or the appropriateness of responses in general, but
+**please** refrain from mentioning specific facts about cases without the
+explicit permission of all parties involved.
+
+.. _members:
+
+Members of the Code of Conduct Advisory Committee
+=================================================
+
+The members serving on the advisory committee are listed here with contact
+information in case you are more comfortable talking directly to a specific
+member of the committee.
+
+.. note::
+
+ FIXME: When we form the initial advisory committee, the members names and private contact info need to be added here.
+
+
+
+(This text is based on the `Django Project`_ Code of Conduct, which is in turn
+based on wording from the `Speak Up! project`_.)
+
+.. _Django Project: https://www.djangoproject.com/conduct/
+.. _Speak Up! project: http://speakup.io/coc.html
Added: www-releases/trunk/8.0.1/docs/_sources/ScudoHardenedAllocator.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/ScudoHardenedAllocator.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/ScudoHardenedAllocator.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/ScudoHardenedAllocator.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,203 @@
+========================
+Scudo Hardened Allocator
+========================
+
+.. contents::
+ :local:
+ :depth: 1
+
+Introduction
+============
+
+The Scudo Hardened Allocator is a user-mode allocator based on LLVM Sanitizer's
+CombinedAllocator, which aims at providing additional mitigations against heap
+based vulnerabilities, while maintaining good performance.
+
+Currently, the allocator supports (was tested on) the following architectures:
+
+- i386 (& i686) (32-bit);
+- x86_64 (64-bit);
+- armhf (32-bit);
+- AArch64 (64-bit);
+- MIPS (32-bit & 64-bit).
+
+The name "Scudo" has been retained from the initial implementation (Escudo
+meaning Shield in Spanish and Portuguese).
+
+Design
+======
+
+Allocator
+---------
+Scudo can be considered a Frontend to the Sanitizers' common allocator (later
+referenced as the Backend). It is split between a Primary allocator, fast and
+efficient, that services smaller allocation sizes, and a Secondary allocator
+that services larger allocation sizes and is backed by the operating system
+memory mapping primitives.
+
+Scudo was designed with security in mind, but aims at striking a good balance
+between security and performance. It is highly tunable and configurable.
+
+Chunk Header
+------------
+Every chunk of heap memory will be preceded by a chunk header. This has two
+purposes, the first one being to store various information about the chunk,
+the second one being to detect potential heap overflows. In order to achieve
+this, the header will be checksummed, involving the pointer to the chunk itself
+and a global secret. Any corruption of the header will be detected when said
+header is accessed, and the process terminated.
+
+The following information is stored in the header:
+
+- the 16-bit checksum;
+- the class ID for that chunk, which is the "bucket" where the chunk resides
+ for Primary backed allocations, or 0 for Secondary backed allocations;
+- the size (Primary) or unused bytes amount (Secondary) for that chunk, which is
+ necessary for computing the size of the chunk;
+- the state of the chunk (available, allocated or quarantined);
+- the allocation type (malloc, new, new[] or memalign), to detect potential
+ mismatches in the allocation APIs used;
+- the offset of the chunk, which is the distance in bytes from the beginning of
+ the returned chunk to the beginning of the Backend allocation;
+
+This header fits within 8 bytes, on all platforms supported.
+
+The checksum is computed as a CRC32 (made faster with hardware support)
+of the global secret, the chunk pointer itself, and the 8 bytes of header with
+the checksum field zeroed out. It is not intended to be cryptographically
+strong.
+
+The header is atomically loaded and stored to prevent races. This is important
+as two consecutive chunks could belong to different threads. We also want to
+avoid any type of double fetches of information located in the header, and use
+local copies of the header for this purpose.
+
+Delayed Freelist
+-----------------
+A delayed freelist allows us to not return a chunk directly to the Backend, but
+to keep it aside for a while. Once a criterion is met, the delayed freelist is
+emptied, and the quarantined chunks are returned to the Backend. This helps
+mitigate use-after-free vulnerabilities by reducing the determinism of the
+allocation and deallocation patterns.
+
+This feature is using the Sanitizer's Quarantine as its base, and the amount of
+memory that it can hold is configurable by the user (see the Options section
+below).
+
+Randomness
+----------
+It is important for the allocator to not make use of fixed addresses. We use
+the dynamic base option for the SizeClassAllocator, allowing us to benefit
+from the randomness of the system memory mapping functions.
+
+Usage
+=====
+
+Library
+-------
+The allocator static library can be built from the LLVM build tree thanks to
+the ``scudo`` CMake rule. The associated tests can be exercised thanks to the
+``check-scudo`` CMake rule.
+
+Linking the static library to your project can require the use of the
+``whole-archive`` linker flag (or equivalent), depending on your linker.
+Additional flags might also be necessary.
+
+Your linked binary should now make use of the Scudo allocation and deallocation
+functions.
+
+You may also build Scudo like this:
+
+.. code::
+
+ cd $LLVM/projects/compiler-rt/lib
+ clang++ -fPIC -std=c++11 -msse4.2 -O2 -I. scudo/*.cpp \
+ $(\ls sanitizer_common/*.{cc,S} | grep -v "sanitizer_termination\|sanitizer_common_nolibc\|sancov_\|sanitizer_unwind\|sanitizer_symbol") \
+ -shared -o libscudo.so -pthread
+
+and then use it with existing binaries as follows:
+
+.. code::
+
+ LD_PRELOAD=`pwd`/libscudo.so ./a.out
+
+Clang
+-----
+With a recent version of Clang (post rL317337), the allocator can be linked with
+a binary at compilation using the ``-fsanitize=scudo`` command-line argument, if
+the target platform is supported. Currently, the only other Sanitizer Scudo is
+compatible with is UBSan (eg: ``-fsanitize=scudo,undefined``). Compiling with
+Scudo will also enforce PIE for the output binary.
+
+Options
+-------
+Several aspects of the allocator can be configured on a per process basis
+through the following ways:
+
+- at compile time, by defining ``SCUDO_DEFAULT_OPTIONS`` to the options string
+ you want set by default;
+
+- by defining a ``__scudo_default_options`` function in one's program that
+ returns the options string to be parsed. Said function must have the following
+ prototype: ``extern "C" const char* __scudo_default_options(void)``, with a
+ default visibility. This will override the compile time define;
+
+- through the environment variable SCUDO_OPTIONS, containing the options string
+ to be parsed. Options defined this way will override any definition made
+ through ``__scudo_default_options``.
+
+The options string follows a syntax similar to ASan, where distinct options
+can be assigned in the same string, separated by colons.
+
+For example, using the environment variable:
+
+.. code::
+
+ SCUDO_OPTIONS="DeleteSizeMismatch=1:QuarantineSizeKb=64" ./a.out
+
+Or using the function:
+
+.. code:: cpp
+
+ extern "C" const char *__scudo_default_options() {
+ return "DeleteSizeMismatch=1:QuarantineSizeKb=64";
+ }
+
+
+The following options are available:
+
++-----------------------------+----------------+----------------+------------------------------------------------+
+| Option | 64-bit default | 32-bit default | Description |
++-----------------------------+----------------+----------------+------------------------------------------------+
+| QuarantineSizeKb | 256 | 64 | The size (in Kb) of quarantine used to delay |
+| | | | the actual deallocation of chunks. Lower value |
+| | | | may reduce memory usage but decrease the |
+| | | | effectiveness of the mitigation; a negative |
+| | | | value will fallback to the defaults. Setting |
+| | | | *both* this and ThreadLocalQuarantineSizeKb to |
+| | | | zero will disable the quarantine entirely. |
++-----------------------------+----------------+----------------+------------------------------------------------+
+| QuarantineChunksUpToSize | 2048 | 512 | Size (in bytes) up to which chunks can be |
+| | | | quarantined. |
++-----------------------------+----------------+----------------+------------------------------------------------+
+| ThreadLocalQuarantineSizeKb | 1024 | 256 | The size (in Kb) of per-thread cache use to |
+| | | | offload the global quarantine. Lower value may |
+| | | | reduce memory usage but might increase |
+| | | | contention on the global quarantine. Setting |
+| | | | *both* this and QuarantineSizeKb to zero will |
+| | | | disable the quarantine entirely. |
++-----------------------------+----------------+----------------+------------------------------------------------+
+| DeallocationTypeMismatch | true | true | Whether or not we report errors on |
+| | | | malloc/delete, new/free, new/delete[], etc. |
++-----------------------------+----------------+----------------+------------------------------------------------+
+| DeleteSizeMismatch | true | true | Whether or not we report errors on mismatch |
+| | | | between sizes of new and delete. |
++-----------------------------+----------------+----------------+------------------------------------------------+
+| ZeroContents | false | false | Whether or not we zero chunk contents on |
+| | | | allocation and deallocation. |
++-----------------------------+----------------+----------------+------------------------------------------------+
+
+Allocator related common Sanitizer options can also be passed through Scudo
+options, such as ``allocator_may_return_null`` or ``abort_on_error``. A detailed
+list including those can be found here:
+https://github.com/google/sanitizers/wiki/SanitizerCommonFlags.
Added: www-releases/trunk/8.0.1/docs/_sources/SegmentedStacks.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/SegmentedStacks.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/SegmentedStacks.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/SegmentedStacks.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,77 @@
+========================
+Segmented Stacks in LLVM
+========================
+
+.. contents::
+ :local:
+
+Introduction
+============
+
+Segmented stack allows stack space to be allocated incrementally than as a
+monolithic chunk (of some worst case size) at thread initialization. This is
+done by allocating stack blocks (henceforth called *stacklets*) and linking them
+into a doubly linked list. The function prologue is responsible for checking if
+the current stacklet has enough space for the function to execute; and if not,
+call into the libgcc runtime to allocate more stack space. Segmented stacks are
+enabled with the ``"split-stack"`` attribute on LLVM functions.
+
+The runtime functionality is `already there in libgcc
+<http://gcc.gnu.org/wiki/SplitStacks>`_.
+
+Implementation Details
+======================
+
+.. _allocating stacklets:
+
+Allocating Stacklets
+--------------------
+
+As mentioned above, the function prologue checks if the current stacklet has
+enough space. The current approach is to use a slot in the TCB to store the
+current stack limit (minus the amount of space needed to allocate a new block) -
+this slot's offset is again dictated by ``libgcc``. The generated
+assembly looks like this on x86-64:
+
+.. code-block:: text
+
+ leaq -8(%rsp), %r10
+ cmpq %fs:112, %r10
+ jg .LBB0_2
+
+ # More stack space needs to be allocated
+ movabsq $8, %r10 # The amount of space needed
+ movabsq $0, %r11 # The total size of arguments passed on stack
+ callq __morestack
+ ret # The reason for this extra return is explained below
+ .LBB0_2:
+ # Usual prologue continues here
+
+The size of function arguments on the stack needs to be passed to
+``__morestack`` (this function is implemented in ``libgcc``) since that number
+of bytes has to be copied from the previous stacklet to the current one. This is
+so that SP (and FP) relative addressing of function arguments work as expected.
+
+The unusual ``ret`` is needed to have the function which made a call to
+``__morestack`` return correctly. ``__morestack``, instead of returning, calls
+into ``.LBB0_2``. This is possible since both, the size of the ``ret``
+instruction and the PC of call to ``__morestack`` are known. When the function
+body returns, control is transferred back to ``__morestack``. ``__morestack``
+then de-allocates the new stacklet, restores the correct SP value, and does a
+second return, which returns control to the correct caller.
+
+Variable Sized Allocas
+----------------------
+
+The section on `allocating stacklets`_ automatically assumes that every stack
+frame will be of fixed size. However, LLVM allows the use of the ``llvm.alloca``
+intrinsic to allocate dynamically sized blocks of memory on the stack. When
+faced with such a variable-sized alloca, code is generated to:
+
+* Check if the current stacklet has enough space. If yes, just bump the SP, like
+ in the normal case.
+* If not, generate a call to ``libgcc``, which allocates the memory from the
+ heap.
+
+The memory allocated from the heap is linked into a list in the current
+stacklet, and freed along with the same. This prevents a memory leak.
Added: www-releases/trunk/8.0.1/docs/_sources/SourceLevelDebugging.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/SourceLevelDebugging.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/SourceLevelDebugging.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/SourceLevelDebugging.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,1664 @@
+================================
+Source Level Debugging with LLVM
+================================
+
+.. contents::
+ :local:
+
+Introduction
+============
+
+This document is the central repository for all information pertaining to debug
+information in LLVM. It describes the :ref:`actual format that the LLVM debug
+information takes <format>`, which is useful for those interested in creating
+front-ends or dealing directly with the information. Further, this document
+provides specific examples of what debug information for C/C++ looks like.
+
+Philosophy behind LLVM debugging information
+--------------------------------------------
+
+The idea of the LLVM debugging information is to capture how the important
+pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
+Several design aspects have shaped the solution that appears here. The
+important ones are:
+
+* Debugging information should have very little impact on the rest of the
+ compiler. No transformations, analyses, or code generators should need to
+ be modified because of debugging information.
+
+* LLVM optimizations should interact in :ref:`well-defined and easily described
+ ways <intro_debugopt>` with the debugging information.
+
+* Because LLVM is designed to support arbitrary programming languages,
+ LLVM-to-LLVM tools should not need to know anything about the semantics of
+ the source-level-language.
+
+* Source-level languages are often **widely** different from one another.
+ LLVM should not put any restrictions of the flavor of the source-language,
+ and the debugging information should work with any language.
+
+* With code generator support, it should be possible to use an LLVM compiler
+ to compile a program to native machine code and standard debugging
+ formats. This allows compatibility with traditional machine-code level
+ debuggers, like GDB or DBX.
+
+The approach used by the LLVM implementation is to use a small set of
+:ref:`intrinsic functions <format_common_intrinsics>` to define a mapping
+between LLVM program objects and the source-level objects. The description of
+the source-level program is maintained in LLVM metadata in an
+:ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end
+currently uses working draft 7 of the `DWARF 3 standard
+<http://www.eagercon.com/dwarf/dwarf3std.htm>`_).
+
+When a program is being debugged, a debugger interacts with the user and turns
+the stored debug information into source-language specific information. As
+such, a debugger must be aware of the source-language, and is thus tied to a
+specific language or family of languages.
+
+Debug information consumers
+---------------------------
+
+The role of debug information is to provide meta information normally stripped
+away during the compilation process. This meta information provides an LLVM
+user a relationship between generated code and the original program source
+code.
+
+Currently, there are two backend consumers of debug info: DwarfDebug and
+CodeViewDebug. DwarfDebug produces DWARF suitable for use with GDB, LLDB, and
+other DWARF-based debuggers. :ref:`CodeViewDebug <codeview>` produces CodeView,
+the Microsoft debug info format, which is usable with Microsoft debuggers such
+as Visual Studio and WinDBG. LLVM's debug information format is mostly derived
+from and inspired by DWARF, but it is feasible to translate into other target
+debug info formats such as STABS.
+
+It would also be reasonable to use debug information to feed profiling tools
+for analysis of generated code, or, tools for reconstructing the original
+source from generated code.
+
+.. _intro_debugopt:
+
+Debug information and optimizations
+-----------------------------------
+
+An extremely high priority of LLVM debugging information is to make it interact
+well with optimizations and analysis. In particular, the LLVM debug
+information provides the following guarantees:
+
+* LLVM debug information **always provides information to accurately read
+ the source-level state of the program**, regardless of which LLVM
+ optimizations have been run, and without any modification to the
+ optimizations themselves. However, some optimizations may impact the
+ ability to modify the current state of the program with a debugger, such
+ as setting program variables, or calling functions that have been
+ deleted.
+
+* As desired, LLVM optimizations can be upgraded to be aware of debugging
+ information, allowing them to update the debugging information as they
+ perform aggressive optimizations. This means that, with effort, the LLVM
+ optimizers could optimize debug code just as well as non-debug code.
+
+* LLVM debug information does not prevent optimizations from
+ happening (for example inlining, basic block reordering/merging/cleanup,
+ tail duplication, etc).
+
+* LLVM debug information is automatically optimized along with the rest of
+ the program, using existing facilities. For example, duplicate
+ information is automatically merged by the linker, and unused information
+ is automatically removed.
+
+Basically, the debug information allows you to compile a program with
+"``-O0 -g``" and get full debug information, allowing you to arbitrarily modify
+the program as it executes from a debugger. Compiling a program with
+"``-O3 -g``" gives you full debug information that is always available and
+accurate for reading (e.g., you get accurate stack traces despite tail call
+elimination and inlining), but you might lose the ability to modify the program
+and call functions which were optimized out of the program, or inlined away
+completely.
+
+The :doc:`LLVM test-suite <TestSuiteMakefileGuide>` provides a framework to
+test the optimizer's handling of debugging information. It can be run like
+this:
+
+.. code-block:: bash
+
+ % cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level
+ % make TEST=dbgopt
+
+This will test impact of debugging information on optimization passes. If
+debugging information influences optimization passes then it will be reported
+as a failure. See :doc:`TestingGuide` for more information on LLVM test
+infrastructure and how to run various tests.
+
+.. _format:
+
+Debugging information format
+============================
+
+LLVM debugging information has been carefully designed to make it possible for
+the optimizer to optimize the program and debugging information without
+necessarily having to know anything about debugging information. In
+particular, the use of metadata avoids duplicated debugging information from
+the beginning, and the global dead code elimination pass automatically deletes
+debugging information for a function if it decides to delete the function.
+
+To do this, most of the debugging information (descriptors for types,
+variables, functions, source files, etc) is inserted by the language front-end
+in the form of LLVM metadata.
+
+Debug information is designed to be agnostic about the target debugger and
+debugging information representation (e.g. DWARF/Stabs/etc). It uses a generic
+pass to decode the information that represents variables, types, functions,
+namespaces, etc: this allows for arbitrary source-language semantics and
+type-systems to be used, as long as there is a module written for the target
+debugger to interpret the information.
+
+To provide basic functionality, the LLVM debugger does have to make some
+assumptions about the source-level language being debugged, though it keeps
+these to a minimum. The only common features that the LLVM debugger assumes
+exist are `source files <LangRef.html#difile>`_, and `program objects
+<LangRef.html#diglobalvariable>`_. These abstract objects are used by a
+debugger to form stack traces, show information about local variables, etc.
+
+This section of the documentation first describes the representation aspects
+common to any source-language. :ref:`ccxx_frontend` describes the data layout
+conventions used by the C and C++ front-ends.
+
+Debug information descriptors are `specialized metadata nodes
+<LangRef.html#specialized-metadata>`_, first-class subclasses of ``Metadata``.
+
+.. _format_common_intrinsics:
+
+Debugger intrinsic functions
+----------------------------
+
+LLVM uses several intrinsic functions (name prefixed with "``llvm.dbg``") to
+track source local variables through optimization and code generation.
+
+``llvm.dbg.addr``
+^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ void @llvm.dbg.addr(metadata, metadata, metadata)
+
+This intrinsic provides information about a local element (e.g., variable).
+The first argument is metadata holding the address of variable, typically a
+static alloca in the function entry block. The second argument is a
+`local variable <LangRef.html#dilocalvariable>`_ containing a description of
+the variable. The third argument is a `complex expression
+<LangRef.html#diexpression>`_. An `llvm.dbg.addr` intrinsic describes the
+*address* of a source variable.
+
+.. code-block:: text
+
+ %i.addr = alloca i32, align 4
+ call void @llvm.dbg.addr(metadata i32* %i.addr, metadata !1,
+ metadata !DIExpression()), !dbg !2
+ !1 = !DILocalVariable(name: "i", ...) ; int i
+ !2 = !DILocation(...)
+ ...
+ %buffer = alloca [256 x i8], align 8
+ ; The address of i is buffer+64.
+ call void @llvm.dbg.addr(metadata [256 x i8]* %buffer, metadata !3,
+ metadata !DIExpression(DW_OP_plus, 64)), !dbg !4
+ !3 = !DILocalVariable(name: "i", ...) ; int i
+ !4 = !DILocation(...)
+
+A frontend should generate exactly one call to ``llvm.dbg.addr`` at the point
+of declaration of a source variable. Optimization passes that fully promote the
+variable from memory to SSA values will replace this call with possibly
+multiple calls to `llvm.dbg.value`. Passes that delete stores are effectively
+partial promotion, and they will insert a mix of calls to ``llvm.dbg.value``
+and ``llvm.dbg.addr`` to track the source variable value when it is available.
+After optimization, there may be multiple calls to ``llvm.dbg.addr`` describing
+the program points where the variables lives in memory. All calls for the same
+concrete source variable must agree on the memory location.
+
+
+``llvm.dbg.declare``
+^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ void @llvm.dbg.declare(metadata, metadata, metadata)
+
+This intrinsic is identical to `llvm.dbg.addr`, except that there can only be
+one call to `llvm.dbg.declare` for a given concrete `local variable
+<LangRef.html#dilocalvariable>`_. It is not control-dependent, meaning that if
+a call to `llvm.dbg.declare` exists and has a valid location argument, that
+address is considered to be the true home of the variable across its entire
+lifetime. This makes it hard for optimizations to preserve accurate debug info
+in the presence of ``llvm.dbg.declare``, so we are transitioning away from it,
+and we plan to deprecate it in future LLVM releases.
+
+
+``llvm.dbg.value``
+^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+ void @llvm.dbg.value(metadata, metadata, metadata)
+
+This intrinsic provides information when a user source variable is set to a new
+value. The first argument is the new value (wrapped as metadata). The second
+argument is a `local variable <LangRef.html#dilocalvariable>`_ containing a
+description of the variable. The third argument is a `complex expression
+<LangRef.html#diexpression>`_.
+
+An `llvm.dbg.value` intrinsic describes the *value* of a source variable
+directly, not its address. Note that the value operand of this intrinsic may
+be indirect (i.e, a pointer to the source variable), provided that interpreting
+the complex expression derives the direct value.
+
+Object lifetimes and scoping
+============================
+
+In many languages, the local variables in functions can have their lifetimes or
+scopes limited to a subset of a function. In the C family of languages, for
+example, variables are only live (readable and writable) within the source
+block that they are defined in. In functional languages, values are only
+readable after they have been defined. Though this is a very obvious concept,
+it is non-trivial to model in LLVM, because it has no notion of scoping in this
+sense, and does not want to be tied to a language's scoping rules.
+
+In order to handle this, the LLVM debug format uses the metadata attached to
+llvm instructions to encode line number and scoping information. Consider the
+following C fragment, for example:
+
+.. code-block:: c
+
+ 1. void foo() {
+ 2. int X = 21;
+ 3. int Y = 22;
+ 4. {
+ 5. int Z = 23;
+ 6. Z = X;
+ 7. }
+ 8. X = Y;
+ 9. }
+
+.. FIXME: Update the following example to use llvm.dbg.addr once that is the
+ default in clang.
+
+Compiled to LLVM, this function would be represented like this:
+
+.. code-block:: text
+
+ ; Function Attrs: nounwind ssp uwtable
+ define void @foo() #0 !dbg !4 {
+ entry:
+ %X = alloca i32, align 4
+ %Y = alloca i32, align 4
+ %Z = alloca i32, align 4
+ call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
+ store i32 21, i32* %X, align 4, !dbg !14
+ call void @llvm.dbg.declare(metadata i32* %Y, metadata !15, metadata !13), !dbg !16
+ store i32 22, i32* %Y, align 4, !dbg !16
+ call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
+ store i32 23, i32* %Z, align 4, !dbg !19
+ %0 = load i32, i32* %X, align 4, !dbg !20
+ store i32 %0, i32* %Z, align 4, !dbg !21
+ %1 = load i32, i32* %Y, align 4, !dbg !22
+ store i32 %1, i32* %X, align 4, !dbg !23
+ ret void, !dbg !24
+ }
+
+ ; Function Attrs: nounwind readnone
+ declare void @llvm.dbg.declare(metadata, metadata, metadata) #1
+
+ attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
+ attributes #1 = { nounwind readnone }
+
+ !llvm.dbg.cu = !{!0}
+ !llvm.module.flags = !{!7, !8, !9}
+ !llvm.ident = !{!10}
+
+ !0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !2, subprograms: !3, globals: !2, imports: !2)
+ !1 = !DIFile(filename: "/dev/stdin", directory: "/Users/dexonsmith/data/llvm/debug-info")
+ !2 = !{}
+ !3 = !{!4}
+ !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: false, variables: !2)
+ !5 = !DISubroutineType(types: !6)
+ !6 = !{null}
+ !7 = !{i32 2, !"Dwarf Version", i32 2}
+ !8 = !{i32 2, !"Debug Info Version", i32 3}
+ !9 = !{i32 1, !"PIC Level", i32 2}
+ !10 = !{!"clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)"}
+ !11 = !DILocalVariable(name: "X", scope: !4, file: !1, line: 2, type: !12)
+ !12 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
+ !13 = !DIExpression()
+ !14 = !DILocation(line: 2, column: 9, scope: !4)
+ !15 = !DILocalVariable(name: "Y", scope: !4, file: !1, line: 3, type: !12)
+ !16 = !DILocation(line: 3, column: 9, scope: !4)
+ !17 = !DILocalVariable(name: "Z", scope: !18, file: !1, line: 5, type: !12)
+ !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
+ !19 = !DILocation(line: 5, column: 11, scope: !18)
+ !20 = !DILocation(line: 6, column: 11, scope: !18)
+ !21 = !DILocation(line: 6, column: 9, scope: !18)
+ !22 = !DILocation(line: 8, column: 9, scope: !4)
+ !23 = !DILocation(line: 8, column: 7, scope: !4)
+ !24 = !DILocation(line: 9, column: 3, scope: !4)
+
+
+This example illustrates a few important details about LLVM debugging
+information. In particular, it shows how the ``llvm.dbg.declare`` intrinsic and
+location information, which are attached to an instruction, are applied
+together to allow a debugger to analyze the relationship between statements,
+variable definitions, and the code used to implement the function.
+
+.. code-block:: llvm
+
+ call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14
+ ; [debug line = 2:7] [debug variable = X]
+
+The first intrinsic ``%llvm.dbg.declare`` encodes debugging information for the
+variable ``X``. The metadata ``!dbg !14`` attached to the intrinsic provides
+scope information for the variable ``X``.
+
+.. code-block:: text
+
+ !14 = !DILocation(line: 2, column: 9, scope: !4)
+ !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5,
+ isLocal: false, isDefinition: true, scopeLine: 1,
+ isOptimized: false, variables: !2)
+
+Here ``!14`` is metadata providing `location information
+<LangRef.html#dilocation>`_. In this example, scope is encoded by ``!4``, a
+`subprogram descriptor <LangRef.html#disubprogram>`_. This way the location
+information attached to the intrinsics indicates that the variable ``X`` is
+declared at line number 2 at a function level scope in function ``foo``.
+
+Now lets take another example.
+
+.. code-block:: llvm
+
+ call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19
+ ; [debug line = 5:9] [debug variable = Z]
+
+The third intrinsic ``%llvm.dbg.declare`` encodes debugging information for
+variable ``Z``. The metadata ``!dbg !19`` attached to the intrinsic provides
+scope information for the variable ``Z``.
+
+.. code-block:: text
+
+ !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5)
+ !19 = !DILocation(line: 5, column: 11, scope: !18)
+
+Here ``!19`` indicates that ``Z`` is declared at line number 5 and column
+number 11 inside of lexical scope ``!18``. The lexical scope itself resides
+inside of subprogram ``!4`` described above.
+
+The scope information attached with each instruction provides a straightforward
+way to find instructions covered by a scope.
+
+.. _ccxx_frontend:
+
+C/C++ front-end specific debug information
+==========================================
+
+The C and C++ front-ends represent information about the program in a format
+that is effectively identical to `DWARF 3.0
+<http://www.eagercon.com/dwarf/dwarf3std.htm>`_ in terms of information
+content. This allows code generators to trivially support native debuggers by
+generating standard dwarf information, and contains enough information for
+non-dwarf targets to translate it as needed.
+
+This section describes the forms used to represent C and C++ programs. Other
+languages could pattern themselves after this (which itself is tuned to
+representing programs in the same way that DWARF 3 does), or they could choose
+to provide completely different forms if they don't fit into the DWARF model.
+As support for debugging information gets added to the various LLVM
+source-language front-ends, the information used should be documented here.
+
+The following sections provide examples of a few C/C++ constructs and the debug
+information that would best describe those constructs. The canonical
+references are the ``DIDescriptor`` classes defined in
+``include/llvm/IR/DebugInfo.h`` and the implementations of the helper functions
+in ``lib/IR/DIBuilder.cpp``.
+
+C/C++ source file information
+-----------------------------
+
+``llvm::Instruction`` provides easy access to metadata attached with an
+instruction. One can extract line number information encoded in LLVM IR using
+``Instruction::getDebugLoc()`` and ``DILocation::getLine()``.
+
+.. code-block:: c++
+
+ if (DILocation *Loc = I->getDebugLoc()) { // Here I is an LLVM instruction
+ unsigned Line = Loc->getLine();
+ StringRef File = Loc->getFilename();
+ StringRef Dir = Loc->getDirectory();
+ bool ImplicitCode = Loc->isImplicitCode();
+ }
+
+When the flag ImplicitCode is true then it means that the Instruction has been
+added by the front-end but doesn't correspond to source code written by the user. For example
+
+.. code-block:: c++
+
+ if (MyBoolean) {
+ MyObject MO;
+ ...
+ }
+
+At the end of the scope the MyObject's destructor is called but it isn't written
+explicitly. This information is useful to avoid to have counters on brackets when
+making code coverage.
+
+C/C++ global variable information
+---------------------------------
+
+Given an integer global variable declared as follows:
+
+.. code-block:: c
+
+ _Alignas(8) int MyGlobal = 100;
+
+a C/C++ front-end would generate the following descriptors:
+
+.. code-block:: text
+
+ ;;
+ ;; Define the global itself.
+ ;;
+ @MyGlobal = global i32 100, align 8, !dbg !0
+
+ ;;
+ ;; List of debug info of globals
+ ;;
+ !llvm.dbg.cu = !{!1}
+
+ ;; Some unrelated metadata.
+ !llvm.module.flags = !{!6, !7}
+ !llvm.ident = !{!8}
+
+ ;; Define the global variable itself
+ !0 = distinct !DIGlobalVariable(name: "MyGlobal", scope: !1, file: !2, line: 1, type: !5, isLocal: false, isDefinition: true, align: 64)
+
+ ;; Define the compile unit.
+ !1 = distinct !DICompileUnit(language: DW_LANG_C99, file: !2,
+ producer: "clang version 4.0.0",
+ isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug,
+ enums: !3, globals: !4)
+
+ ;;
+ ;; Define the file
+ ;;
+ !2 = !DIFile(filename: "/dev/stdin",
+ directory: "/Users/dexonsmith/data/llvm/debug-info")
+
+ ;; An empty array.
+ !3 = !{}
+
+ ;; The Array of Global Variables
+ !4 = !{!0}
+
+ ;;
+ ;; Define the type
+ ;;
+ !5 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+
+ ;; Dwarf version to output.
+ !6 = !{i32 2, !"Dwarf Version", i32 4}
+
+ ;; Debug info schema version.
+ !7 = !{i32 2, !"Debug Info Version", i32 3}
+
+ ;; Compiler identification
+ !8 = !{!"clang version 4.0.0"}
+
+
+The align value in DIGlobalVariable description specifies variable alignment in
+case it was forced by C11 _Alignas(), C++11 alignas() keywords or compiler
+attribute __attribute__((aligned ())). In other case (when this field is missing)
+alignment is considered default. This is used when producing DWARF output
+for DW_AT_alignment value.
+
+C/C++ function information
+--------------------------
+
+Given a function declared as follows:
+
+.. code-block:: c
+
+ int main(int argc, char *argv[]) {
+ return 0;
+ }
+
+a C/C++ front-end would generate the following descriptors:
+
+.. code-block:: text
+
+ ;;
+ ;; Define the anchor for subprograms.
+ ;;
+ !4 = !DISubprogram(name: "main", scope: !1, file: !1, line: 1, type: !5,
+ isLocal: false, isDefinition: true, scopeLine: 1,
+ flags: DIFlagPrototyped, isOptimized: false,
+ variables: !2)
+
+ ;;
+ ;; Define the subprogram itself.
+ ;;
+ define i32 @main(i32 %argc, i8** %argv) !dbg !4 {
+ ...
+ }
+
+Debugging information format
+============================
+
+Debugging Information Extension for Objective C Properties
+----------------------------------------------------------
+
+Introduction
+^^^^^^^^^^^^
+
+Objective C provides a simpler way to declare and define accessor methods using
+declared properties. The language provides features to declare a property and
+to let compiler synthesize accessor methods.
+
+The debugger lets developer inspect Objective C interfaces and their instance
+variables and class variables. However, the debugger does not know anything
+about the properties defined in Objective C interfaces. The debugger consumes
+information generated by compiler in DWARF format. The format does not support
+encoding of Objective C properties. This proposal describes DWARF extensions to
+encode Objective C properties, which the debugger can use to let developers
+inspect Objective C properties.
+
+Proposal
+^^^^^^^^
+
+Objective C properties exist separately from class members. A property can be
+defined only by "setter" and "getter" selectors, and be calculated anew on each
+access. Or a property can just be a direct access to some declared ivar.
+Finally it can have an ivar "automatically synthesized" for it by the compiler,
+in which case the property can be referred to in user code directly using the
+standard C dereference syntax as well as through the property "dot" syntax, but
+there is no entry in the ``@interface`` declaration corresponding to this ivar.
+
+To facilitate debugging, these properties we will add a new DWARF TAG into the
+``DW_TAG_structure_type`` definition for the class to hold the description of a
+given property, and a set of DWARF attributes that provide said description.
+The property tag will also contain the name and declared type of the property.
+
+If there is a related ivar, there will also be a DWARF property attribute placed
+in the ``DW_TAG_member`` DIE for that ivar referring back to the property TAG
+for that property. And in the case where the compiler synthesizes the ivar
+directly, the compiler is expected to generate a ``DW_TAG_member`` for that
+ivar (with the ``DW_AT_artificial`` set to 1), whose name will be the name used
+to access this ivar directly in code, and with the property attribute pointing
+back to the property it is backing.
+
+The following examples will serve as illustration for our discussion:
+
+.. code-block:: objc
+
+ @interface I1 {
+ int n2;
+ }
+
+ @property int p1;
+ @property int p2;
+ @end
+
+ @implementation I1
+ @synthesize p1;
+ @synthesize p2 = n2;
+ @end
+
+This produces the following DWARF (this is a "pseudo dwarfdump" output):
+
+.. code-block:: none
+
+ 0x00000100: TAG_structure_type [7] *
+ AT_APPLE_runtime_class( 0x10 )
+ AT_name( "I1" )
+ AT_decl_file( "Objc_Property.m" )
+ AT_decl_line( 3 )
+
+ 0x00000110 TAG_APPLE_property
+ AT_name ( "p1" )
+ AT_type ( {0x00000150} ( int ) )
+
+ 0x00000120: TAG_APPLE_property
+ AT_name ( "p2" )
+ AT_type ( {0x00000150} ( int ) )
+
+ 0x00000130: TAG_member [8]
+ AT_name( "_p1" )
+ AT_APPLE_property ( {0x00000110} "p1" )
+ AT_type( {0x00000150} ( int ) )
+ AT_artificial ( 0x1 )
+
+ 0x00000140: TAG_member [8]
+ AT_name( "n2" )
+ AT_APPLE_property ( {0x00000120} "p2" )
+ AT_type( {0x00000150} ( int ) )
+
+ 0x00000150: AT_type( ( int ) )
+
+Note, the current convention is that the name of the ivar for an
+auto-synthesized property is the name of the property from which it derives
+with an underscore prepended, as is shown in the example. But we actually
+don't need to know this convention, since we are given the name of the ivar
+directly.
+
+Also, it is common practice in ObjC to have different property declarations in
+the @interface and @implementation - e.g. to provide a read-only property in
+the interface,and a read-write interface in the implementation. In that case,
+the compiler should emit whichever property declaration will be in force in the
+current translation unit.
+
+Developers can decorate a property with attributes which are encoded using
+``DW_AT_APPLE_property_attribute``.
+
+.. code-block:: objc
+
+ @property (readonly, nonatomic) int pr;
+
+.. code-block:: none
+
+ TAG_APPLE_property [8]
+ AT_name( "pr" )
+ AT_type ( {0x00000147} (int) )
+ AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)
+
+The setter and getter method names are attached to the property using
+``DW_AT_APPLE_property_setter`` and ``DW_AT_APPLE_property_getter`` attributes.
+
+.. code-block:: objc
+
+ @interface I1
+ @property (setter=myOwnP3Setter:) int p3;
+ -(void)myOwnP3Setter:(int)a;
+ @end
+
+ @implementation I1
+ @synthesize p3;
+ -(void)myOwnP3Setter:(int)a{ }
+ @end
+
+The DWARF for this would be:
+
+.. code-block:: none
+
+ 0x000003bd: TAG_structure_type [7] *
+ AT_APPLE_runtime_class( 0x10 )
+ AT_name( "I1" )
+ AT_decl_file( "Objc_Property.m" )
+ AT_decl_line( 3 )
+
+ 0x000003cd TAG_APPLE_property
+ AT_name ( "p3" )
+ AT_APPLE_property_setter ( "myOwnP3Setter:" )
+ AT_type( {0x00000147} ( int ) )
+
+ 0x000003f3: TAG_member [8]
+ AT_name( "_p3" )
+ AT_type ( {0x00000147} ( int ) )
+ AT_APPLE_property ( {0x000003cd} )
+ AT_artificial ( 0x1 )
+
+New DWARF Tags
+^^^^^^^^^^^^^^
+
++-----------------------+--------+
+| TAG | Value |
++=======================+========+
+| DW_TAG_APPLE_property | 0x4200 |
++-----------------------+--------+
+
+New DWARF Attributes
+^^^^^^^^^^^^^^^^^^^^
+
++--------------------------------+--------+-----------+
+| Attribute | Value | Classes |
++================================+========+===========+
+| DW_AT_APPLE_property | 0x3fed | Reference |
++--------------------------------+--------+-----------+
+| DW_AT_APPLE_property_getter | 0x3fe9 | String |
++--------------------------------+--------+-----------+
+| DW_AT_APPLE_property_setter | 0x3fea | String |
++--------------------------------+--------+-----------+
+| DW_AT_APPLE_property_attribute | 0x3feb | Constant |
++--------------------------------+--------+-----------+
+
+New DWARF Constants
+^^^^^^^^^^^^^^^^^^^
+
++--------------------------------------+-------+
+| Name | Value |
++======================================+=======+
+| DW_APPLE_PROPERTY_readonly | 0x01 |
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_getter | 0x02 |
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_assign | 0x04 |
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_readwrite | 0x08 |
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_retain | 0x10 |
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_copy | 0x20 |
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_nonatomic | 0x40 |
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_setter | 0x80 |
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_atomic | 0x100 |
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_weak | 0x200 |
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_strong | 0x400 |
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_unsafe_unretained | 0x800 |
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_nullability | 0x1000|
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_null_resettable | 0x2000|
++--------------------------------------+-------+
+| DW_APPLE_PROPERTY_class | 0x4000|
++--------------------------------------+-------+
+
+Name Accelerator Tables
+-----------------------
+
+Introduction
+^^^^^^^^^^^^
+
+The "``.debug_pubnames``" and "``.debug_pubtypes``" formats are not what a
+debugger needs. The "``pub``" in the section name indicates that the entries
+in the table are publicly visible names only. This means no static or hidden
+functions show up in the "``.debug_pubnames``". No static variables or private
+class variables are in the "``.debug_pubtypes``". Many compilers add different
+things to these tables, so we can't rely upon the contents between gcc, icc, or
+clang.
+
+The typical query given by users tends not to match up with the contents of
+these tables. For example, the DWARF spec states that "In the case of the name
+of a function member or static data member of a C++ structure, class or union,
+the name presented in the "``.debug_pubnames``" section is not the simple name
+given by the ``DW_AT_name attribute`` of the referenced debugging information
+entry, but rather the fully qualified name of the data or function member."
+So the only names in these tables for complex C++ entries is a fully
+qualified name. Debugger users tend not to enter their search strings as
+"``a::b::c(int,const Foo&) const``", but rather as "``c``", "``b::c``" , or
+"``a::b::c``". So the name entered in the name table must be demangled in
+order to chop it up appropriately and additional names must be manually entered
+into the table to make it effective as a name lookup table for debuggers to
+use.
+
+All debuggers currently ignore the "``.debug_pubnames``" table as a result of
+its inconsistent and useless public-only name content making it a waste of
+space in the object file. These tables, when they are written to disk, are not
+sorted in any way, leaving every debugger to do its own parsing and sorting.
+These tables also include an inlined copy of the string values in the table
+itself making the tables much larger than they need to be on disk, especially
+for large C++ programs.
+
+Can't we just fix the sections by adding all of the names we need to this
+table? No, because that is not what the tables are defined to contain and we
+won't know the difference between the old bad tables and the new good tables.
+At best we could make our own renamed sections that contain all of the data we
+need.
+
+These tables are also insufficient for what a debugger like LLDB needs. LLDB
+uses clang for its expression parsing where LLDB acts as a PCH. LLDB is then
+often asked to look for type "``foo``" or namespace "``bar``", or list items in
+namespace "``baz``". Namespaces are not included in the pubnames or pubtypes
+tables. Since clang asks a lot of questions when it is parsing an expression,
+we need to be very fast when looking up names, as it happens a lot. Having new
+accelerator tables that are optimized for very quick lookups will benefit this
+type of debugging experience greatly.
+
+We would like to generate name lookup tables that can be mapped into memory
+from disk, and used as is, with little or no up-front parsing. We would also
+be able to control the exact content of these different tables so they contain
+exactly what we need. The Name Accelerator Tables were designed to fix these
+issues. In order to solve these issues we need to:
+
+* Have a format that can be mapped into memory from disk and used as is
+* Lookups should be very fast
+* Extensible table format so these tables can be made by many producers
+* Contain all of the names needed for typical lookups out of the box
+* Strict rules for the contents of tables
+
+Table size is important and the accelerator table format should allow the reuse
+of strings from common string tables so the strings for the names are not
+duplicated. We also want to make sure the table is ready to be used as-is by
+simply mapping the table into memory with minimal header parsing.
+
+The name lookups need to be fast and optimized for the kinds of lookups that
+debuggers tend to do. Optimally we would like to touch as few parts of the
+mapped table as possible when doing a name lookup and be able to quickly find
+the name entry we are looking for, or discover there are no matches. In the
+case of debuggers we optimized for lookups that fail most of the time.
+
+Each table that is defined should have strict rules on exactly what is in the
+accelerator tables and documented so clients can rely on the content.
+
+Hash Tables
+^^^^^^^^^^^
+
+Standard Hash Tables
+""""""""""""""""""""
+
+Typical hash tables have a header, buckets, and each bucket points to the
+bucket contents:
+
+.. code-block:: none
+
+ .------------.
+ | HEADER |
+ |------------|
+ | BUCKETS |
+ |------------|
+ | DATA |
+ `------------'
+
+The BUCKETS are an array of offsets to DATA for each hash:
+
+.. code-block:: none
+
+ .------------.
+ | 0x00001000 | BUCKETS[0]
+ | 0x00002000 | BUCKETS[1]
+ | 0x00002200 | BUCKETS[2]
+ | 0x000034f0 | BUCKETS[3]
+ | | ...
+ | 0xXXXXXXXX | BUCKETS[n_buckets]
+ '------------'
+
+So for ``bucket[3]`` in the example above, we have an offset into the table
+0x000034f0 which points to a chain of entries for the bucket. Each bucket must
+contain a next pointer, full 32 bit hash value, the string itself, and the data
+for the current string value.
+
+.. code-block:: none
+
+ .------------.
+ 0x000034f0: | 0x00003500 | next pointer
+ | 0x12345678 | 32 bit hash
+ | "erase" | string value
+ | data[n] | HashData for this bucket
+ |------------|
+ 0x00003500: | 0x00003550 | next pointer
+ | 0x29273623 | 32 bit hash
+ | "dump" | string value
+ | data[n] | HashData for this bucket
+ |------------|
+ 0x00003550: | 0x00000000 | next pointer
+ | 0x82638293 | 32 bit hash
+ | "main" | string value
+ | data[n] | HashData for this bucket
+ `------------'
+
+The problem with this layout for debuggers is that we need to optimize for the
+negative lookup case where the symbol we're searching for is not present. So
+if we were to lookup "``printf``" in the table above, we would make a 32-bit
+hash for "``printf``", it might match ``bucket[3]``. We would need to go to
+the offset 0x000034f0 and start looking to see if our 32 bit hash matches. To
+do so, we need to read the next pointer, then read the hash, compare it, and
+skip to the next bucket. Each time we are skipping many bytes in memory and
+touching new pages just to do the compare on the full 32 bit hash. All of
+these accesses then tell us that we didn't have a match.
+
+Name Hash Tables
+""""""""""""""""
+
+To solve the issues mentioned above we have structured the hash tables a bit
+differently: a header, buckets, an array of all unique 32 bit hash values,
+followed by an array of hash value data offsets, one for each hash value, then
+the data for all hash values:
+
+.. code-block:: none
+
+ .-------------.
+ | HEADER |
+ |-------------|
+ | BUCKETS |
+ |-------------|
+ | HASHES |
+ |-------------|
+ | OFFSETS |
+ |-------------|
+ | DATA |
+ `-------------'
+
+The ``BUCKETS`` in the name tables are an index into the ``HASHES`` array. By
+making all of the full 32 bit hash values contiguous in memory, we allow
+ourselves to efficiently check for a match while touching as little memory as
+possible. Most often checking the 32 bit hash values is as far as the lookup
+goes. If it does match, it usually is a match with no collisions. So for a
+table with "``n_buckets``" buckets, and "``n_hashes``" unique 32 bit hash
+values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and
+``OFFSETS`` as:
+
+.. code-block:: none
+
+ .-------------------------.
+ | HEADER.magic | uint32_t
+ | HEADER.version | uint16_t
+ | HEADER.hash_function | uint16_t
+ | HEADER.bucket_count | uint32_t
+ | HEADER.hashes_count | uint32_t
+ | HEADER.header_data_len | uint32_t
+ | HEADER_DATA | HeaderData
+ |-------------------------|
+ | BUCKETS | uint32_t[n_buckets] // 32 bit hash indexes
+ |-------------------------|
+ | HASHES | uint32_t[n_hashes] // 32 bit hash values
+ |-------------------------|
+ | OFFSETS | uint32_t[n_hashes] // 32 bit offsets to hash value data
+ |-------------------------|
+ | ALL HASH DATA |
+ `-------------------------'
+
+So taking the exact same data from the standard hash example above we end up
+with:
+
+.. code-block:: none
+
+ .------------.
+ | HEADER |
+ |------------|
+ | 0 | BUCKETS[0]
+ | 2 | BUCKETS[1]
+ | 5 | BUCKETS[2]
+ | 6 | BUCKETS[3]
+ | | ...
+ | ... | BUCKETS[n_buckets]
+ |------------|
+ | 0x........ | HASHES[0]
+ | 0x........ | HASHES[1]
+ | 0x........ | HASHES[2]
+ | 0x........ | HASHES[3]
+ | 0x........ | HASHES[4]
+ | 0x........ | HASHES[5]
+ | 0x12345678 | HASHES[6] hash for BUCKETS[3]
+ | 0x29273623 | HASHES[7] hash for BUCKETS[3]
+ | 0x82638293 | HASHES[8] hash for BUCKETS[3]
+ | 0x........ | HASHES[9]
+ | 0x........ | HASHES[10]
+ | 0x........ | HASHES[11]
+ | 0x........ | HASHES[12]
+ | 0x........ | HASHES[13]
+ | 0x........ | HASHES[n_hashes]
+ |------------|
+ | 0x........ | OFFSETS[0]
+ | 0x........ | OFFSETS[1]
+ | 0x........ | OFFSETS[2]
+ | 0x........ | OFFSETS[3]
+ | 0x........ | OFFSETS[4]
+ | 0x........ | OFFSETS[5]
+ | 0x000034f0 | OFFSETS[6] offset for BUCKETS[3]
+ | 0x00003500 | OFFSETS[7] offset for BUCKETS[3]
+ | 0x00003550 | OFFSETS[8] offset for BUCKETS[3]
+ | 0x........ | OFFSETS[9]
+ | 0x........ | OFFSETS[10]
+ | 0x........ | OFFSETS[11]
+ | 0x........ | OFFSETS[12]
+ | 0x........ | OFFSETS[13]
+ | 0x........ | OFFSETS[n_hashes]
+ |------------|
+ | |
+ | |
+ | |
+ | |
+ | |
+ |------------|
+ 0x000034f0: | 0x00001203 | .debug_str ("erase")
+ | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
+ | 0x........ | HashData[0]
+ | 0x........ | HashData[1]
+ | 0x........ | HashData[2]
+ | 0x........ | HashData[3]
+ | 0x00000000 | String offset into .debug_str (terminate data for hash)
+ |------------|
+ 0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
+ | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
+ | 0x........ | HashData[0]
+ | 0x........ | HashData[1]
+ | 0x00001203 | String offset into .debug_str ("dump")
+ | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
+ | 0x........ | HashData[0]
+ | 0x........ | HashData[1]
+ | 0x........ | HashData[2]
+ | 0x00000000 | String offset into .debug_str (terminate data for hash)
+ |------------|
+ 0x00003550: | 0x00001203 | String offset into .debug_str ("main")
+ | 0x00000009 | A 32 bit array count - number of HashData with name "main"
+ | 0x........ | HashData[0]
+ | 0x........ | HashData[1]
+ | 0x........ | HashData[2]
+ | 0x........ | HashData[3]
+ | 0x........ | HashData[4]
+ | 0x........ | HashData[5]
+ | 0x........ | HashData[6]
+ | 0x........ | HashData[7]
+ | 0x........ | HashData[8]
+ | 0x00000000 | String offset into .debug_str (terminate data for hash)
+ `------------'
+
+So we still have all of the same data, we just organize it more efficiently for
+debugger lookup. If we repeat the same "``printf``" lookup from above, we
+would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit
+hash value and modulo it by ``n_buckets``. ``BUCKETS[3]`` contains "6" which
+is the index into the ``HASHES`` table. We would then compare any consecutive
+32 bit hashes values in the ``HASHES`` array as long as the hashes would be in
+``BUCKETS[3]``. We do this by verifying that each subsequent hash value modulo
+``n_buckets`` is still 3. In the case of a failed lookup we would access the
+memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes
+before we know that we have no match. We don't end up marching through
+multiple words of memory and we really keep the number of processor data cache
+lines being accessed as small as possible.
+
+The string hash that is used for these lookup tables is the Daniel J.
+Bernstein hash which is also used in the ELF ``GNU_HASH`` sections. It is a
+very good hash for all kinds of names in programs with very few hash
+collisions.
+
+Empty buckets are designated by using an invalid hash index of ``UINT32_MAX``.
+
+Details
+^^^^^^^
+
+These name hash tables are designed to be generic where specializations of the
+table get to define additional data that goes into the header ("``HeaderData``"),
+how the string value is stored ("``KeyType``") and the content of the data for each
+hash value.
+
+Header Layout
+"""""""""""""
+
+The header has a fixed part, and the specialized part. The exact format of the
+header is:
+
+.. code-block:: c
+
+ struct Header
+ {
+ uint32_t magic; // 'HASH' magic value to allow endian detection
+ uint16_t version; // Version number
+ uint16_t hash_function; // The hash function enumeration that was used
+ uint32_t bucket_count; // The number of buckets in this hash table
+ uint32_t hashes_count; // The total number of unique hash values and hash data offsets in this table
+ uint32_t header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
+ // Specifically the length of the following HeaderData field - this does not
+ // include the size of the preceding fields
+ HeaderData header_data; // Implementation specific header data
+ };
+
+The header starts with a 32 bit "``magic``" value which must be ``'HASH'``
+encoded as an ASCII integer. This allows the detection of the start of the
+hash table and also allows the table's byte order to be determined so the table
+can be correctly extracted. The "``magic``" value is followed by a 16 bit
+``version`` number which allows the table to be revised and modified in the
+future. The current version number is 1. ``hash_function`` is a ``uint16_t``
+enumeration that specifies which hash function was used to produce this table.
+The current values for the hash function enumerations include:
+
+.. code-block:: c
+
+ enum HashFunctionType
+ {
+ eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
+ };
+
+``bucket_count`` is a 32 bit unsigned integer that represents how many buckets
+are in the ``BUCKETS`` array. ``hashes_count`` is the number of unique 32 bit
+hash values that are in the ``HASHES`` array, and is the same number of offsets
+are contained in the ``OFFSETS`` array. ``header_data_len`` specifies the size
+in bytes of the ``HeaderData`` that is filled in by specialized versions of
+this table.
+
+Fixed Lookup
+""""""""""""
+
+The header is followed by the buckets, hashes, offsets, and hash value data.
+
+.. code-block:: c
+
+ struct FixedTable
+ {
+ uint32_t buckets[Header.bucket_count]; // An array of hash indexes into the "hashes[]" array below
+ uint32_t hashes [Header.hashes_count]; // Every unique 32 bit hash for the entire table is in this table
+ uint32_t offsets[Header.hashes_count]; // An offset that corresponds to each item in the "hashes[]" array above
+ };
+
+``buckets`` is an array of 32 bit indexes into the ``hashes`` array. The
+``hashes`` array contains all of the 32 bit hash values for all names in the
+hash table. Each hash in the ``hashes`` table has an offset in the ``offsets``
+array that points to the data for the hash value.
+
+This table setup makes it very easy to repurpose these tables to contain
+different data, while keeping the lookup mechanism the same for all tables.
+This layout also makes it possible to save the table to disk and map it in
+later and do very efficient name lookups with little or no parsing.
+
+DWARF lookup tables can be implemented in a variety of ways and can store a lot
+of information for each name. We want to make the DWARF tables extensible and
+able to store the data efficiently so we have used some of the DWARF features
+that enable efficient data storage to define exactly what kind of data we store
+for each name.
+
+The ``HeaderData`` contains a definition of the contents of each HashData chunk.
+We might want to store an offset to all of the debug information entries (DIEs)
+for each name. To keep things extensible, we create a list of items, or
+Atoms, that are contained in the data for each name. First comes the type of
+the data in each atom:
+
+.. code-block:: c
+
+ enum AtomType
+ {
+ eAtomTypeNULL = 0u,
+ eAtomTypeDIEOffset = 1u, // DIE offset, check form for encoding
+ eAtomTypeCUOffset = 2u, // DIE offset of the compiler unit header that contains the item in question
+ eAtomTypeTag = 3u, // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
+ eAtomTypeNameFlags = 4u, // Flags from enum NameFlags
+ eAtomTypeTypeFlags = 5u, // Flags from enum TypeFlags
+ };
+
+The enumeration values and their meanings are:
+
+.. code-block:: none
+
+ eAtomTypeNULL - a termination atom that specifies the end of the atom list
+ eAtomTypeDIEOffset - an offset into the .debug_info section for the DWARF DIE for this name
+ eAtomTypeCUOffset - an offset into the .debug_info section for the CU that contains the DIE
+ eAtomTypeDIETag - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
+ eAtomTypeNameFlags - Flags for functions and global variables (isFunction, isInlined, isExternal...)
+ eAtomTypeTypeFlags - Flags for types (isCXXClass, isObjCClass, ...)
+
+Then we allow each atom type to define the atom type and how the data for each
+atom type data is encoded:
+
+.. code-block:: c
+
+ struct Atom
+ {
+ uint16_t type; // AtomType enum value
+ uint16_t form; // DWARF DW_FORM_XXX defines
+ };
+
+The ``form`` type above is from the DWARF specification and defines the exact
+encoding of the data for the Atom type. See the DWARF specification for the
+``DW_FORM_`` definitions.
+
+.. code-block:: c
+
+ struct HeaderData
+ {
+ uint32_t die_offset_base;
+ uint32_t atom_count;
+ Atoms atoms[atom_count0];
+ };
+
+``HeaderData`` defines the base DIE offset that should be added to any atoms
+that are encoded using the ``DW_FORM_ref1``, ``DW_FORM_ref2``,
+``DW_FORM_ref4``, ``DW_FORM_ref8`` or ``DW_FORM_ref_udata``. It also defines
+what is contained in each ``HashData`` object -- ``Atom.form`` tells us how large
+each field will be in the ``HashData`` and the ``Atom.type`` tells us how this data
+should be interpreted.
+
+For the current implementations of the "``.apple_names``" (all functions +
+globals), the "``.apple_types``" (names of all types that are defined), and
+the "``.apple_namespaces``" (all namespaces), we currently set the ``Atom``
+array to be:
+
+.. code-block:: c
+
+ HeaderData.atom_count = 1;
+ HeaderData.atoms[0].type = eAtomTypeDIEOffset;
+ HeaderData.atoms[0].form = DW_FORM_data4;
+
+This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
+encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have
+multiple matching DIEs in a single file, which could come up with an inlined
+function for instance. Future tables could include more information about the
+DIE such as flags indicating if the DIE is a function, method, block,
+or inlined.
+
+The KeyType for the DWARF table is a 32 bit string table offset into the
+".debug_str" table. The ".debug_str" is the string table for the DWARF which
+may already contain copies of all of the strings. This helps make sure, with
+help from the compiler, that we reuse the strings between all of the DWARF
+sections and keeps the hash table size down. Another benefit to having the
+compiler generate all strings as DW_FORM_strp in the debug info, is that
+DWARF parsing can be made much faster.
+
+After a lookup is made, we get an offset into the hash data. The hash data
+needs to be able to deal with 32 bit hash collisions, so the chunk of data
+at the offset in the hash data consists of a triple:
+
+.. code-block:: c
+
+ uint32_t str_offset
+ uint32_t hash_data_count
+ HashData[hash_data_count]
+
+If "str_offset" is zero, then the bucket contents are done. 99.9% of the
+hash data chunks contain a single item (no 32 bit hash collision):
+
+.. code-block:: none
+
+ .------------.
+ | 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main")
+ | 0x00000004 | uint32_t HashData count
+ | 0x........ | uint32_t HashData[0] DIE offset
+ | 0x........ | uint32_t HashData[1] DIE offset
+ | 0x........ | uint32_t HashData[2] DIE offset
+ | 0x........ | uint32_t HashData[3] DIE offset
+ | 0x00000000 | uint32_t KeyType (end of hash chain)
+ `------------'
+
+If there are collisions, you will have multiple valid string offsets:
+
+.. code-block:: none
+
+ .------------.
+ | 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main")
+ | 0x00000004 | uint32_t HashData count
+ | 0x........ | uint32_t HashData[0] DIE offset
+ | 0x........ | uint32_t HashData[1] DIE offset
+ | 0x........ | uint32_t HashData[2] DIE offset
+ | 0x........ | uint32_t HashData[3] DIE offset
+ | 0x00002023 | uint32_t KeyType (.debug_str[0x0002023] => "print")
+ | 0x00000002 | uint32_t HashData count
+ | 0x........ | uint32_t HashData[0] DIE offset
+ | 0x........ | uint32_t HashData[1] DIE offset
+ | 0x00000000 | uint32_t KeyType (end of hash chain)
+ `------------'
+
+Current testing with real world C++ binaries has shown that there is around 1
+32 bit hash collision per 100,000 name entries.
+
+Contents
+^^^^^^^^
+
+As we said, we want to strictly define exactly what is included in the
+different tables. For DWARF, we have 3 tables: "``.apple_names``",
+"``.apple_types``", and "``.apple_namespaces``".
+
+"``.apple_names``" sections should contain an entry for each DWARF DIE whose
+``DW_TAG`` is a ``DW_TAG_label``, ``DW_TAG_inlined_subroutine``, or
+``DW_TAG_subprogram`` that has address attributes: ``DW_AT_low_pc``,
+``DW_AT_high_pc``, ``DW_AT_ranges`` or ``DW_AT_entry_pc``. It also contains
+``DW_TAG_variable`` DIEs that have a ``DW_OP_addr`` in the location (global and
+static variables). All global and static variables should be included,
+including those scoped within functions and classes. For example using the
+following code:
+
+.. code-block:: c
+
+ static int var = 0;
+
+ void f ()
+ {
+ static int var = 0;
+ }
+
+Both of the static ``var`` variables would be included in the table. All
+functions should emit both their full names and their basenames. For C or C++,
+the full name is the mangled name (if available) which is usually in the
+``DW_AT_MIPS_linkage_name`` attribute, and the ``DW_AT_name`` contains the
+function basename. If global or static variables have a mangled name in a
+``DW_AT_MIPS_linkage_name`` attribute, this should be emitted along with the
+simple name found in the ``DW_AT_name`` attribute.
+
+"``.apple_types``" sections should contain an entry for each DWARF DIE whose
+tag is one of:
+
+* DW_TAG_array_type
+* DW_TAG_class_type
+* DW_TAG_enumeration_type
+* DW_TAG_pointer_type
+* DW_TAG_reference_type
+* DW_TAG_string_type
+* DW_TAG_structure_type
+* DW_TAG_subroutine_type
+* DW_TAG_typedef
+* DW_TAG_union_type
+* DW_TAG_ptr_to_member_type
+* DW_TAG_set_type
+* DW_TAG_subrange_type
+* DW_TAG_base_type
+* DW_TAG_const_type
+* DW_TAG_file_type
+* DW_TAG_namelist
+* DW_TAG_packed_type
+* DW_TAG_volatile_type
+* DW_TAG_restrict_type
+* DW_TAG_atomic_type
+* DW_TAG_interface_type
+* DW_TAG_unspecified_type
+* DW_TAG_shared_type
+
+Only entries with a ``DW_AT_name`` attribute are included, and the entry must
+not be a forward declaration (``DW_AT_declaration`` attribute with a non-zero
+value). For example, using the following code:
+
+.. code-block:: c
+
+ int main ()
+ {
+ int *b = 0;
+ return *b;
+ }
+
+We get a few type DIEs:
+
+.. code-block:: none
+
+ 0x00000067: TAG_base_type [5]
+ AT_encoding( DW_ATE_signed )
+ AT_name( "int" )
+ AT_byte_size( 0x04 )
+
+ 0x0000006e: TAG_pointer_type [6]
+ AT_type( {0x00000067} ( int ) )
+ AT_byte_size( 0x08 )
+
+The DW_TAG_pointer_type is not included because it does not have a ``DW_AT_name``.
+
+"``.apple_namespaces``" section should contain all ``DW_TAG_namespace`` DIEs.
+If we run into a namespace that has no name this is an anonymous namespace, and
+the name should be output as "``(anonymous namespace)``" (without the quotes).
+Why? This matches the output of the ``abi::cxa_demangle()`` that is in the
+standard C++ library that demangles mangled names.
+
+
+Language Extensions and File Format Changes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Objective-C Extensions
+""""""""""""""""""""""
+
+"``.apple_objc``" section should contain all ``DW_TAG_subprogram`` DIEs for an
+Objective-C class. The name used in the hash table is the name of the
+Objective-C class itself. If the Objective-C class has a category, then an
+entry is made for both the class name without the category, and for the class
+name with the category. So if we have a DIE at offset 0x1234 with a name of
+method "``-[NSString(my_additions) stringWithSpecialString:]``", we would add
+an entry for "``NSString``" that points to DIE 0x1234, and an entry for
+"``NSString(my_additions)``" that points to 0x1234. This allows us to quickly
+track down all Objective-C methods for an Objective-C class when doing
+expressions. It is needed because of the dynamic nature of Objective-C where
+anyone can add methods to a class. The DWARF for Objective-C methods is also
+emitted differently from C++ classes where the methods are not usually
+contained in the class definition, they are scattered about across one or more
+compile units. Categories can also be defined in different shared libraries.
+So we need to be able to quickly find all of the methods and class functions
+given the Objective-C class name, or quickly find all methods and class
+functions for a class + category name. This table does not contain any
+selector names, it just maps Objective-C class names (or class names +
+category) to all of the methods and class functions. The selectors are added
+as function basenames in the "``.debug_names``" section.
+
+In the "``.apple_names``" section for Objective-C functions, the full name is
+the entire function name with the brackets ("``-[NSString
+stringWithCString:]``") and the basename is the selector only
+("``stringWithCString:``").
+
+Mach-O Changes
+""""""""""""""
+
+The sections names for the apple hash tables are for non-mach-o files. For
+mach-o files, the sections should be contained in the ``__DWARF`` segment with
+names as follows:
+
+* "``.apple_names``" -> "``__apple_names``"
+* "``.apple_types``" -> "``__apple_types``"
+* "``.apple_namespaces``" -> "``__apple_namespac``" (16 character limit)
+* "``.apple_objc``" -> "``__apple_objc``"
+
+.. _codeview:
+
+CodeView Debug Info Format
+==========================
+
+LLVM supports emitting CodeView, the Microsoft debug info format, and this
+section describes the design and implementation of that support.
+
+Format Background
+-----------------
+
+CodeView as a format is clearly oriented around C++ debugging, and in C++, the
+majority of debug information tends to be type information. Therefore, the
+overriding design constraint of CodeView is the separation of type information
+from other "symbol" information so that type information can be efficiently
+merged across translation units. Both type information and symbol information is
+generally stored as a sequence of records, where each record begins with a
+16-bit record size and a 16-bit record kind.
+
+Type information is usually stored in the ``.debug$T`` section of the object
+file. All other debug info, such as line info, string table, symbol info, and
+inlinee info, is stored in one or more ``.debug$S`` sections. There may only be
+one ``.debug$T`` section per object file, since all other debug info refers to
+it. If a PDB (enabled by the ``/Zi`` MSVC option) was used during compilation,
+the ``.debug$T`` section will contain only an ``LF_TYPESERVER2`` record pointing
+to the PDB. When using PDBs, symbol information appears to remain in the object
+file ``.debug$S`` sections.
+
+Type records are referred to by their index, which is the number of records in
+the stream before a given record plus ``0x1000``. Many common basic types, such
+as the basic integral types and unqualified pointers to them, are represented
+using type indices less than ``0x1000``. Such basic types are built in to
+CodeView consumers and do not require type records.
+
+Each type record may only contain type indices that are less than its own type
+index. This ensures that the graph of type stream references is acyclic. While
+the source-level type graph may contain cycles through pointer types (consider a
+linked list struct), these cycles are removed from the type stream by always
+referring to the forward declaration record of user-defined record types. Only
+"symbol" records in the ``.debug$S`` streams may refer to complete,
+non-forward-declaration type records.
+
+Working with CodeView
+---------------------
+
+These are instructions for some common tasks for developers working to improve
+LLVM's CodeView support. Most of them revolve around using the CodeView dumper
+embedded in ``llvm-readobj``.
+
+* Testing MSVC's output::
+
+ $ cl -c -Z7 foo.cpp # Use /Z7 to keep types in the object file
+ $ llvm-readobj -codeview foo.obj
+
+* Getting LLVM IR debug info out of Clang::
+
+ $ clang -g -gcodeview --target=x86_64-windows-msvc foo.cpp -S -emit-llvm
+
+ Use this to generate LLVM IR for LLVM test cases.
+
+* Generate and dump CodeView from LLVM IR metadata::
+
+ $ llc foo.ll -filetype=obj -o foo.obj
+ $ llvm-readobj -codeview foo.obj > foo.txt
+
+ Use this pattern in lit test cases and FileCheck the output of llvm-readobj
+
+Improving LLVM's CodeView support is a process of finding interesting type
+records, constructing a C++ test case that makes MSVC emit those records,
+dumping the records, understanding them, and then generating equivalent records
+in LLVM's backend.
+
+Testing Debug Info Preservation in Optimizations
+================================================
+
+The following paragraphs are an introduction to the debugify utility
+and examples of how to use it in regression tests to check debug info
+preservation after optimizations.
+
+The ``debugify`` utility
+------------------------
+
+The ``debugify`` synthetic debug info testing utility consists of two
+main parts. The ``debugify`` pass and the ``check-debugify`` one. They are
+meant to be used with ``opt`` for development purposes.
+
+The first applies synthetic debug information to every instruction of the module,
+while the latter checks that this DI is still available after an optimization
+has occurred, reporting any errors/warnings while doing so.
+
+The instructions are assigned sequentially increasing line locations,
+and are immediately used by debug value intrinsics when possible.
+
+For example, here is a module before:
+
+.. code-block:: llvm
+
+ define void @f(i32* %x) {
+ entry:
+ %x.addr = alloca i32*, align 8
+ store i32* %x, i32** %x.addr, align 8
+ %0 = load i32*, i32** %x.addr, align 8
+ store i32 10, i32* %0, align 4
+ ret void
+ }
+
+and after running ``opt -debugify`` on it we get:
+
+.. code-block:: text
+
+ define void @f(i32* %x) !dbg !6 {
+ entry:
+ %x.addr = alloca i32*, align 8, !dbg !12
+ call void @llvm.dbg.value(metadata i32** %x.addr, metadata !9, metadata !DIExpression()), !dbg !12
+ store i32* %x, i32** %x.addr, align 8, !dbg !13
+ %0 = load i32*, i32** %x.addr, align 8, !dbg !14
+ call void @llvm.dbg.value(metadata i32* %0, metadata !11, metadata !DIExpression()), !dbg !14
+ store i32 10, i32* %0, align 4, !dbg !15
+ ret void, !dbg !16
+ }
+
+ !llvm.dbg.cu = !{!0}
+ !llvm.debugify = !{!3, !4}
+ !llvm.module.flags = !{!5}
+
+ !0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
+ !1 = !DIFile(filename: "debugify-sample.ll", directory: "/")
+ !2 = !{}
+ !3 = !{i32 5}
+ !4 = !{i32 2}
+ !5 = !{i32 2, !"Debug Info Version", i32 3}
+ !6 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !1, line: 1, type: !7, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: true, unit: !0, retainedNodes: !8)
+ !7 = !DISubroutineType(types: !2)
+ !8 = !{!9, !11}
+ !9 = !DILocalVariable(name: "1", scope: !6, file: !1, line: 1, type: !10)
+ !10 = !DIBasicType(name: "ty64", size: 64, encoding: DW_ATE_unsigned)
+ !11 = !DILocalVariable(name: "2", scope: !6, file: !1, line: 3, type: !10)
+ !12 = !DILocation(line: 1, column: 1, scope: !6)
+ !13 = !DILocation(line: 2, column: 1, scope: !6)
+ !14 = !DILocation(line: 3, column: 1, scope: !6)
+ !15 = !DILocation(line: 4, column: 1, scope: !6)
+ !16 = !DILocation(line: 5, column: 1, scope: !6)
+
+The following is an example of the -check-debugify output:
+
+.. code-block:: none
+
+ $ opt -enable-debugify -loop-vectorize llvm/test/Transforms/LoopVectorize/i8-induction.ll -disable-output
+ ERROR: Instruction with empty DebugLoc in function f -- %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
+
+Errors/warnings can range from instructions with empty debug location to an
+instruction having a type that's incompatible with the source variable it describes,
+all the way to missing lines and missing debug value intrinsics.
+
+Fixing errors
+^^^^^^^^^^^^^
+
+Each of the errors above has a relevant API available to fix it.
+
+* In the case of missing debug location, ``Instruction::setDebugLoc`` or possibly
+ ``IRBuilder::setCurrentDebugLocation`` when using a Builder and the new location
+ should be reused.
+
+* When a debug value has incompatible type ``llvm::replaceAllDbgUsesWith`` can be used.
+ After a RAUW call an incompatible type error can occur because RAUW does not handle
+ widening and narrowing of variables while ``llvm::replaceAllDbgUsesWith`` does. It is
+ also capable of changing the DWARF expression used by the debugger to describe the variable.
+ It also prevents use-before-def by salvaging or deleting invalid debug values.
+
+* When a debug value is missing ``llvm::salvageDebugInfo`` can be used when no replacement
+ exists, or ``llvm::replaceAllDbgUsesWith`` when a replacement exists.
+
+Using ``debugify``
+------------------
+
+In order for ``check-debugify`` to work, the DI must be coming from
+``debugify``. Thus, modules with existing DI will be skipped.
+
+The most straightforward way to use ``debugify`` is as follows::
+
+ $ opt -debugify -pass-to-test -check-debugify sample.ll
+
+This will inject synthetic DI to ``sample.ll`` run the ``pass-to-test``
+and then check for missing DI.
+
+Some other ways to run debugify are avaliable:
+
+.. code-block:: bash
+
+ # Same as the above example.
+ $ opt -enable-debugify -pass-to-test sample.ll
+
+ # Suppresses verbose debugify output.
+ $ opt -enable-debugify -debugify-quiet -pass-to-test sample.ll
+
+ # Prepend -debugify before and append -check-debugify -strip after
+ # each pass on the pipeline (similar to -verify-each).
+ $ opt -debugify-each -O2 sample.ll
+
+``debugify`` can also be used to test a backend, e.g:
+
+.. code-block:: bash
+
+ $ opt -debugify < sample.ll | llc -o -
+
+``debugify`` in regression tests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``-debugify`` pass is especially helpful when it comes to testing that
+a given pass preserves DI while transforming the module. For this to work,
+the ``-debugify`` output must be stable enough to use in regression tests.
+Changes to this pass are not allowed to break existing tests.
+
+It allows us to test for DI loss in the same tests we check that the
+transformation is actually doing what it should.
+
+Here is an example from ``test/Transforms/InstCombine/cast-mul-select.ll``:
+
+.. code-block:: llvm
+
+ ; RUN: opt < %s -debugify -instcombine -S | FileCheck %s --check-prefix=DEBUGINFO
+
+ define i32 @mul(i32 %x, i32 %y) {
+ ; DBGINFO-LABEL: @mul(
+ ; DBGINFO-NEXT: [[C:%.*]] = mul i32 {{.*}}
+ ; DBGINFO-NEXT: call void @llvm.dbg.value(metadata i32 [[C]]
+ ; DBGINFO-NEXT: [[D:%.*]] = and i32 {{.*}}
+ ; DBGINFO-NEXT: call void @llvm.dbg.value(metadata i32 [[D]]
+
+ %A = trunc i32 %x to i8
+ %B = trunc i32 %y to i8
+ %C = mul i8 %A, %B
+ %D = zext i8 %C to i32
+ ret i32 %D
+ }
+
+Here we test that the two ``dbg.value`` instrinsics are preserved and
+are correctly pointing to the ``[[C]]`` and ``[[D]]`` variables.
+
+.. note::
+
+ Note, that when writing this kind of regression tests, it is important
+ to make them as robust as possible. That's why we should try to avoid
+ hardcoding line/variable numbers in check lines. If for example you test
+ for a ``DILocation`` to have a specific line number, and someone later adds
+ an instruction before the one we check the test will fail. In the cases this
+ can't be avoided (say, if a test wouldn't be precise enough), moving the
+ test to its own file is preferred.
Added: www-releases/trunk/8.0.1/docs/_sources/SpeculativeLoadHardening.md.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/SpeculativeLoadHardening.md.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/SpeculativeLoadHardening.md.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/SpeculativeLoadHardening.md.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,1098 @@
+# Speculative Load Hardening
+
+### A Spectre Variant #1 Mitigation Technique
+
+Author: Chandler Carruth - [chandlerc at google.com](mailto:chandlerc at google.com)
+
+## Problem Statement
+
+Recently, Google Project Zero and other researchers have found information leak
+vulnerabilities by exploiting speculative execution in modern CPUs. These
+exploits are currently broken down into three variants:
+* GPZ Variant #1 (a.k.a. Spectre Variant #1): Bounds check (or predicate) bypass
+* GPZ Variant #2 (a.k.a. Spectre Variant #2): Branch target injection
+* GPZ Variant #3 (a.k.a. Meltdown): Rogue data cache load
+
+For more details, see the Google Project Zero blog post and the Spectre research
+paper:
+* https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
+* https://spectreattack.com/spectre.pdf
+
+The core problem of GPZ Variant #1 is that speculative execution uses branch
+prediction to select the path of instructions speculatively executed. This path
+is speculatively executed with the available data, and may load from memory and
+leak the loaded values through various side channels that survive even when the
+speculative execution is unwound due to being incorrect. Mispredicted paths can
+cause code to be executed with data inputs that never occur in correct
+executions, making checks against malicious inputs ineffective and allowing
+attackers to use malicious data inputs to leak secret data. Here is an example,
+extracted and simplified from the Project Zero paper:
+```
+struct array {
+ unsigned long length;
+ unsigned char data[];
+};
+struct array *arr1 = ...; // small array
+struct array *arr2 = ...; // array of size 0x400
+unsigned long untrusted_offset_from_caller = ...;
+if (untrusted_offset_from_caller < arr1->length) {
+ unsigned char value = arr1->data[untrusted_offset_from_caller];
+ unsigned long index2 = ((value&1)*0x100)+0x200;
+ unsigned char value2 = arr2->data[index2];
+}
+```
+
+The key of the attack is to call this with `untrusted_offset_from_caller` that
+is far outside of the bounds when the branch predictor will predict that it
+will be in-bounds. In that case, the body of the `if` will be executed
+speculatively, and may read secret data into `value` and leak it via a
+cache-timing side channel when a dependent access is made to populate `value2`.
+
+## High Level Mitigation Approach
+
+While several approaches are being actively pursued to mitigate specific
+branches and/or loads inside especially risky software (most notably various OS
+kernels), these approaches require manual and/or static analysis aided auditing
+of code and explicit source changes to apply the mitigation. They are unlikely
+to scale well to large applications. We are proposing a comprehensive
+mitigation approach that would apply automatically across an entire program
+rather than through manual changes to the code. While this is likely to have a
+high performance cost, some applications may be in a good position to take this
+performance / security tradeoff.
+
+The specific technique we propose is to cause loads to be checked using
+branchless code to ensure that they are executing along a valid control flow
+path. Consider the following C-pseudo-code representing the core idea of a
+predicate guarding potentially invalid loads:
+```
+void leak(int data);
+void example(int* pointer1, int* pointer2) {
+ if (condition) {
+ // ... lots of code ...
+ leak(*pointer1);
+ } else {
+ // ... more code ...
+ leak(*pointer2);
+ }
+}
+```
+
+This would get transformed into something resembling the following:
+```
+uintptr_t all_ones_mask = std::numerical_limits<uintptr_t>::max();
+uintptr_t all_zeros_mask = 0;
+void leak(int data);
+void example(int* pointer1, int* pointer2) {
+ uintptr_t predicate_state = all_ones_mask;
+ if (condition) {
+ // Assuming ?: is implemented using branchless logic...
+ predicate_state = !condition ? all_zeros_mask : predicate_state;
+ // ... lots of code ...
+ //
+ // Harden the pointer so it can't be loaded
+ pointer1 &= predicate_state;
+ leak(*pointer1);
+ } else {
+ predicate_state = condition ? all_zeros_mask : predicate_state;
+ // ... more code ...
+ //
+ // Alternative: Harden the loaded value
+ int value2 = *pointer2 & predicate_state;
+ leak(value2);
+ }
+}
+```
+
+The result should be that if the `if (condition) {` branch is mis-predicted,
+there is a *data* dependency on the condition used to zero out any pointers
+prior to loading through them or to zero out all of the loaded bits. Even
+though this code pattern may still execute speculatively, *invalid* speculative
+executions are prevented from leaking secret data from memory (but note that
+this data might still be loaded in safe ways, and some regions of memory are
+required to not hold secrets, see below for detailed limitations). This
+approach only requires the underlying hardware have a way to implement a
+branchless and unpredicted conditional update of a register's value. All modern
+architectures have support for this, and in fact such support is necessary to
+correctly implement constant time cryptographic primitives.
+
+Crucial properties of this approach:
+* It is not preventing any particular side-channel from working. This is
+ important as there are an unknown number of potential side channels and we
+ expect to continue discovering more. Instead, it prevents the observation of
+ secret data in the first place.
+* It accumulates the predicate state, protecting even in the face of nested
+ *correctly* predicted control flows.
+* It passes this predicate state across function boundaries to provide
+ [interprocedural protection](#interprocedural-checking).
+* When hardening the address of a load, it uses a *destructive* or
+ *non-reversible* modification of the address to prevent an attacker from
+ reversing the check using attacker-controlled inputs.
+* It does not completely block speculative execution, and merely prevents
+ *mis*-speculated paths from leaking secrets from memory (and stalls
+ speculation until this can be determined).
+* It is completely general and makes no fundamental assumptions about the
+ underlying architecture other than the ability to do branchless conditional
+ data updates and a lack of value prediction.
+* It does not require programmers to identify all possible secret data using
+ static source code annotations or code vulnerable to a variant #1 style
+ attack.
+
+Limitations of this approach:
+* It requires re-compiling source code to insert hardening instruction
+ sequences. Only software compiled in this mode is protected.
+* The performance is heavily dependent on a particular architecture's
+ implementation strategy. We outline a potential x86 implementation below and
+ characterize its performance.
+* It does not defend against secret data already loaded from memory and
+ residing in registers or leaked through other side-channels in
+ non-speculative execution. Code dealing with this, e.g cryptographic
+ routines, already uses constant-time algorithms and code to prevent
+ side-channels. Such code should also scrub registers of secret data following
+ [these
+ guidelines](https://github.com/HACS-workshop/spectre-mitigations/blob/master/crypto_guidelines.md).
+* To achieve reasonable performance, many loads may not be checked, such as
+ those with compile-time fixed addresses. This primarily consists of accesses
+ at compile-time constant offsets of global and local variables. Code which
+ needs this protection and intentionally stores secret data must ensure the
+ memory regions used for secret data are necessarily dynamic mappings or heap
+ allocations. This is an area which can be tuned to provide more comprehensive
+ protection at the cost of performance.
+* [Hardened loads](#hardening-the-address-of-the-load) may still load data from
+ _valid_ addresses if not _attacker-controlled_ addresses. To prevent these
+ from reading secret data, the low 2gb of the address space and 2gb above and
+ below any executable pages should be protected.
+
+Credit:
+* The core idea of tracing misspeculation through data and marking pointers to
+ block misspeculated loads was developed as part of a HACS 2018 discussion
+ between Chandler Carruth, Paul Kocher, Thomas Pornin, and several other
+ individuals.
+* Core idea of masking out loaded bits was part of the original mitigation
+ suggested by Jann Horn when these attacks were reported.
+
+
+### Indirect Branches, Calls, and Returns
+
+It is possible to attack control flow other than conditional branches with
+variant #1 style mispredictions.
+* A prediction towards a hot call target of a virtual method can lead to it
+ being speculatively executed when an expected type is used (often called
+ "type confusion").
+* A hot case may be speculatively executed due to prediction instead of the
+ correct case for a switch statement implemented as a jump table.
+* A hot common return address may be predicted incorrectly when returning from
+ a function.
+
+These code patterns are also vulnerable to Spectre variant #2, and as such are
+best mitigated with a
+[retpoline](https://support.google.com/faqs/answer/7625886) on x86 platforms.
+When a mitigation technique like retpoline is used, speculation simply cannot
+proceed through an indirect control flow edge (or it cannot be mispredicted in
+the case of a filled RSB) and so it is also protected from variant #1 style
+attacks. However, some architectures, micro-architectures, or vendors do not
+employ the retpoline mitigation, and on future x86 hardware (both Intel and
+AMD) it is expected to become unnecessary due to hardware-based mitigation.
+
+When not using a retpoline, these edges will need independent protection from
+variant #1 style attacks. The analogous approach to that used for conditional
+control flow should work:
+```
+uintptr_t all_ones_mask = std::numerical_limits<uintptr_t>::max();
+uintptr_t all_zeros_mask = 0;
+void leak(int data);
+void example(int* pointer1, int* pointer2) {
+ uintptr_t predicate_state = all_ones_mask;
+ switch (condition) {
+ case 0:
+ // Assuming ?: is implemented using branchless logic...
+ predicate_state = (condition != 0) ? all_zeros_mask : predicate_state;
+ // ... lots of code ...
+ //
+ // Harden the pointer so it can't be loaded
+ pointer1 &= predicate_state;
+ leak(*pointer1);
+ break;
+
+ case 1:
+ predicate_state = (condition != 1) ? all_zeros_mask : predicate_state;
+ // ... more code ...
+ //
+ // Alternative: Harden the loaded value
+ int value2 = *pointer2 & predicate_state;
+ leak(value2);
+ break;
+
+ // ...
+ }
+}
+```
+
+The core idea remains the same: validate the control flow using data-flow and
+use that validation to check that loads cannot leak information along
+misspeculated paths. Typically this involves passing the desired target of such
+control flow across the edge and checking that it is correct afterwards. Note
+that while it is tempting to think that this mitigates variant #2 attacks, it
+does not. Those attacks go to arbitrary gadgets that don't include the checks.
+
+
+### Variant #1.1 and #1.2 attacks: "Bounds Check Bypass Store"
+
+Beyond the core variant #1 attack, there are techniques to extend this attack.
+The primary technique is known as "Bounds Check Bypass Store" and is discussed
+in this research paper: https://people.csail.mit.edu/vlk/spectre11.pdf
+
+We will analyze these two variants independently. First, variant #1.1 works by
+speculatively storing over the return address after a bounds check bypass. This
+speculative store then ends up being used by the CPU during speculative
+execution of the return, potentially directing speculative execution to
+arbitrary gadgets in the binary. Let's look at an example.
+```
+unsigned char local_buffer[4];
+unsigned char *untrusted_data_from_caller = ...;
+unsigned long untrusted_size_from_caller = ...;
+if (untrusted_size_from_caller < sizeof(local_buffer)) {
+ // Speculative execution enters here with a too-large size.
+ memcpy(local_buffer, untrusted_data_from_caller,
+ untrusted_size_from_caller);
+ // The stack has now been smashed, writing an attacker-controlled
+ // address over the return adress.
+ minor_processing(local_buffer);
+ return;
+ // Control will speculate to the attacker-written address.
+}
+```
+
+However, this can be mitigated by hardening the load of the return address just
+like any other load. This is sometimes complicated because x86 for example
+*implicitly* loads the return address off the stack. However, the
+implementation technique below is specifically designed to mitigate this
+implicit load by using the stack pointer to communicate misspeculation between
+functions. This additionally causes a misspeculation to have an invalid stack
+pointer and never be able to read the speculatively stored return address. See
+the detailed discussion below.
+
+For variant #1.2, the attacker speculatively stores into the vtable or jump
+table used to implement an indirect call or indirect jump. Because this is
+speculative, this will often be possible even when these are stored in
+read-only pages. For example:
+```
+class FancyObject : public BaseObject {
+public:
+ void DoSomething() override;
+};
+void f(unsigned long attacker_offset, unsigned long attacker_data) {
+ FancyObject object = getMyObject();
+ unsigned long *arr[4] = getFourDataPointers();
+ if (attacker_offset < 4) {
+ // We have bypassed the bounds check speculatively.
+ unsigned long *data = arr[attacker_offset];
+ // Now we have computed a pointer inside of `object`, the vptr.
+ *data = attacker_data;
+ // The vptr points to the virtual table and we speculatively clobber that.
+ g(object); // Hand the object to some other routine.
+ }
+}
+// In another file, we call a method on the object.
+void g(BaseObject &object) {
+ object.DoSomething();
+ // This speculatively calls the address stored over the vtable.
+}
+```
+
+Mitigating this requires hardening loads from these locations, or mitigating
+the indirect call or indirect jump. Any of these are sufficient to block the
+call or jump from using a speculatively stored value that has been read back.
+
+For both of these, using retpolines would be equally sufficient. One possible
+hybrid approach is to use retpolines for indirect call and jump, while relying
+on SLH to mitigate returns.
+
+Another approach that is sufficient for both of these is to harden all of the
+speculative stores. However, as most stores aren't interesting and don't
+inherently leak data, this is expected to be prohibitively expensive given the
+attack it is defending against.
+
+
+## Implementation Details
+
+There are a number of complex details impacting the implementation of this
+technique, both on a particular architecture and within a particular compiler.
+We discuss proposed implementation techniques for the x86 architecture and the
+LLVM compiler. These are primarily to serve as an example, as other
+implementation techniques are very possible.
+
+
+### x86 Implementation Details
+
+On the x86 platform we break down the implementation into three core
+components: accumulating the predicate state through the control flow graph,
+checking the loads, and checking control transfers between procedures.
+
+
+#### Accumulating Predicate State
+
+Consider baseline x86 instructions like the following, which test three
+conditions and if all pass, loads data from memory and potentially leaks it
+through some side channel:
+```
+# %bb.0: # %entry
+ pushq %rax
+ testl %edi, %edi
+ jne .LBB0_4
+# %bb.1: # %then1
+ testl %esi, %esi
+ jne .LBB0_4
+# %bb.2: # %then2
+ testl %edx, %edx
+ je .LBB0_3
+.LBB0_4: # %exit
+ popq %rax
+ retq
+.LBB0_3: # %danger
+ movl (%rcx), %edi
+ callq leak
+ popq %rax
+ retq
+```
+
+When we go to speculatively execute the load, we want to know whether any of
+the dynamically executed predicates have been misspeculated. To track that,
+along each conditional edge, we need to track the data which would allow that
+edge to be taken. On x86, this data is stored in the flags register used by the
+conditional jump instruction. Along both edges after this fork in control flow,
+the flags register remains alive and contains data that we can use to build up
+our accumulated predicate state. We accumulate it using the x86 conditional
+move instruction which also reads the flag registers where the state resides.
+These conditional move instructions are known to not be predicted on any x86
+processors, making them immune to misprediction that could reintroduce the
+vulnerability. When we insert the conditional moves, the code ends up looking
+like the following:
+```
+# %bb.0: # %entry
+ pushq %rax
+ xorl %eax, %eax # Zero out initial predicate state.
+ movq $-1, %r8 # Put all-ones mask into a register.
+ testl %edi, %edi
+ jne .LBB0_1
+# %bb.2: # %then1
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ testl %esi, %esi
+ jne .LBB0_1
+# %bb.3: # %then2
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ testl %edx, %edx
+ je .LBB0_4
+.LBB0_1:
+ cmoveq %r8, %rax # Conditionally update predicate state.
+ popq %rax
+ retq
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ ...
+```
+
+Here we create the "empty" or "correct execution" predicate state by zeroing
+`%rax`, and we create a constant "incorrect execution" predicate value by
+putting `-1` into `%r8`. Then, along each edge coming out of a conditional
+branch we do a conditional move that in a correct execution will be a no-op,
+but if misspeculated, will replace the `%rax` with the value of `%r8`.
+Misspeculating any one of the three predicates will cause `%rax` to hold the
+"incorrect execution" value from `%r8` as we preserve incoming values when
+execution is correct rather than overwriting it.
+
+We now have a value in `%rax` in each basic block that indicates if at some
+point previously a predicate was mispredicted. And we have arranged for that
+value to be particularly effective when used below to harden loads.
+
+
+##### Indirect Call, Branch, and Return Predicates
+
+There is no analogous flag to use when tracing indirect calls, branches, and
+returns. The predicate state must be accumulated through some other means.
+Fundamentally, this is the reverse of the problem posed in CFI: we need to
+check where we came from rather than where we are going. For function-local
+jump tables, this is easily arranged by testing the input to the jump table
+within each destination (not yet implemented, use retpolines):
+```
+ pushq %rax
+ xorl %eax, %eax # Zero out initial predicate state.
+ movq $-1, %r8 # Put all-ones mask into a register.
+ jmpq *.LJTI0_0(,%rdi,8) # Indirect jump through table.
+.LBB0_2: # %sw.bb
+ testq $0, %rdi # Validate index used for jump table.
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ ...
+ jmp _Z4leaki # TAILCALL
+
+.LBB0_3: # %sw.bb1
+ testq $1, %rdi # Validate index used for jump table.
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ ...
+ jmp _Z4leaki # TAILCALL
+
+.LBB0_5: # %sw.bb10
+ testq $2, %rdi # Validate index used for jump table.
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ ...
+ jmp _Z4leaki # TAILCALL
+ ...
+
+ .section .rodata,"a", at progbits
+ .p2align 3
+.LJTI0_0:
+ .quad .LBB0_2
+ .quad .LBB0_3
+ .quad .LBB0_5
+ ...
+```
+
+Returns have a simple mitigation technique on x86-64 (or other ABIs which have
+what is called a "red zone" region beyond the end of the stack). This region is
+guaranteed to be preserved across interrupts and context switches, making the
+return address used in returning to the current code remain on the stack and
+valid to read. We can emit code in the caller to verify that a return edge was
+not mispredicted:
+```
+ callq other_function
+return_addr:
+ testq -8(%rsp), return_addr # Validate return address.
+ cmovneq %r8, %rax # Update predicate state.
+```
+
+For an ABI without a "red zone" (and thus unable to read the return address
+from the stack), we can compute the expected return address prior to the call
+into a register preserved across the call and use that similarly to the above.
+
+Indirect calls (and returns in the absence of a red zone ABI) pose the most
+significant challenge to propagate. The simplest technique would be to define a
+new ABI such that the intended call target is passed into the called function
+and checked in the entry. Unfortunately, new ABIs are quite expensive to deploy
+in C and C++. While the target function could be passed in TLS, we would still
+require complex logic to handle a mixture of functions compiled with and
+without this extra logic (essentially, making the ABI backwards compatible).
+Currently, we suggest using retpolines here and will continue to investigate
+ways of mitigating this.
+
+
+##### Optimizations, Alternatives, and Tradeoffs
+
+Merely accumulating predicate state involves significant cost. There are
+several key optimizations we employ to minimize this and various alternatives
+that present different tradeoffs in the generated code.
+
+First, we work to reduce the number of instructions used to track the state:
+* Rather than inserting a `cmovCC` instruction along every conditional edge in
+ the original program, we track each set of condition flags we need to capture
+ prior to entering each basic block and reuse a common `cmovCC` sequence for
+ those.
+ * We could further reuse suffixes when there are multiple `cmovCC`
+ instructions required to capture the set of flags. Currently this is
+ believed to not be worth the cost as paired flags are relatively rare and
+ suffixes of them are exceedingly rare.
+* A common pattern in x86 is to have multiple conditional jump instructions
+ that use the same flags but handle different conditions. Naively, we could
+ consider each fallthrough between them an "edge" but this causes a much more
+ complex control flow graph. Instead, we accumulate the set of conditions
+ necessary for fallthrough and use a sequence of `cmovCC` instructions in a
+ single fallthrough edge to track it.
+
+Second, we trade register pressure for simpler `cmovCC` instructions by
+allocating a register for the "bad" state. We could read that value from memory
+as part of the conditional move instruction, however, this creates more
+micro-ops and requires the load-store unit to be involved. Currently, we place
+the value into a virtual register and allow the register allocator to decide
+when the register pressure is sufficient to make it worth spilling to memory
+and reloading.
+
+
+#### Hardening Loads
+
+Once we have the predicate accumulated into a special value for correct vs.
+misspeculated, we need to apply this to loads in a way that ensures they do not
+leak secret data. There are two primary techniques for this: we can either
+harden the loaded value to prevent observation, or we can harden the address
+itself to prevent the load from occuring. These have significantly different
+performance tradeoffs.
+
+
+##### Hardening loaded values
+
+The most appealing way to harden loads is to mask out all of the bits loaded.
+The key requirement is that for each bit loaded, along the misspeculated path
+that bit is always fixed at either 0 or 1 regardless of the value of the bit
+loaded. The most obvious implementation uses either an `and` instruction with
+an all-zero mask along misspeculated paths and an all-one mask along correct
+paths, or an `or` instruction with an all-one mask along misspeculated paths
+and an all-zero mask along correct paths. Other options become less appealing
+such as multiplying by zero, or multiple shift instructions. For reasons we
+elaborate on below, we end up suggesting you use `or` with an all-ones mask,
+making the x86 instruction sequence look like the following:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ movl (%rsi), %edi # Load potentially secret data from %rsi.
+ orl %eax, %edi
+```
+
+Other useful patterns may be to fold the load into the `or` instruction itself
+at the cost of a register-to-register copy.
+
+There are some challenges with deploying this approach:
+1. Many loads on x86 are folded into other instructions. Separating them would
+ add very significant and costly register pressure with prohibitive
+ performance cost.
+1. Loads may not target a general purpose register requiring extra instructions
+ to map the state value into the correct register class, and potentially more
+ expensive instructions to mask the value in some way.
+1. The flags registers on x86 are very likely to be live, and challenging to
+ preserve cheaply.
+1. There are many more values loaded than pointers & indices used for loads. As
+ a consequence, hardening the result of a load requires substantially more
+ instructions than hardening the address of the load (see below).
+
+Despite these challenges, hardening the result of the load critically allows
+the load to proceed and thus has dramatically less impact on the total
+speculative / out-of-order potential of the execution. There are also several
+interesting techniques to try and mitigate these challenges and make hardening
+the results of loads viable in at least some cases. However, we generally
+expect to fall back when unprofitable from hardening the loaded value to the
+next approach of hardening the address itself.
+
+
+###### Loads folded into data-invariant operations can be hardened after the operation
+
+The first key to making this feasible is to recognize that many operations on
+x86 are "data-invariant". That is, they have no (known) observable behavior
+differences due to the particular input data. These instructions are often used
+when implementing cryptographic primitives dealing with private key data
+because they are not believed to provide any side-channels. Similarly, we can
+defer hardening until after them as they will not in-and-of-themselves
+introduce a speculative execution side-channel. This results in code sequences
+that look like:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ addl (%rsi), %edi # Load and accumulate without leaking.
+ orl %eax, %edi
+```
+
+While an addition happens to the loaded (potentially secret) value, that
+doesn't leak any data and we then immediately harden it.
+
+
+###### Hardening of loaded values deferred down the data-invariant expression graph
+
+We can generalize the previous idea and sink the hardening down the expression
+graph across as many data-invariant operations as desirable. This can use very
+conservative rules for whether something is data-invariant. The primary goal
+should be to handle multiple loads with a single hardening instruction:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ addl (%rsi), %edi # Load and accumulate without leaking.
+ addl 4(%rsi), %edi # Continue without leaking.
+ addl 8(%rsi), %edi
+ orl %eax, %edi # Mask out bits from all three loads.
+```
+
+
+###### Preserving the flags while hardening loaded values on Haswell, Zen, and newer processors
+
+Sadly, there are no useful instructions on x86 that apply a mask to all 64 bits
+without touching the flag registers. However, we can harden loaded values that
+are narrower than a word (fewer than 32-bits on 32-bit systems and fewer than
+64-bits on 64-bit systems) by zero-extending the value to the full word size
+and then shifting right by at least the number of original bits using the BMI2
+`shrx` instruction:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ addl (%rsi), %edi # Load and accumulate 32 bits of data.
+ shrxq %rax, %rdi, %rdi # Shift out all 32 bits loaded.
+```
+
+Because on x86 the zero-extend is free, this can efficiently harden the loaded
+value.
+
+
+##### Hardening the address of the load
+
+When hardening the loaded value is inapplicable, most often because the
+instruction directly leaks information (like `cmp` or `jmpq`), we switch to
+hardening the _address_ of the load instead of the loaded value. This avoids
+increasing register pressure by unfolding the load or paying some other high
+cost.
+
+To understand how this works in practice, we need to examine the exact
+semantics of the x86 addressing modes which, in its fully general form, looks
+like `(%base,%index,scale)offset`. Here `%base` and `%index` are 64-bit
+registers that can potentially be any value, and may be attacker controlled,
+and `scale` and `offset` are fixed immediate values. `scale` must be `1`, `2`,
+`4`, or `8`, and `offset` can be any 32-bit sign extended value. The exact
+computation performed to find the address is then: `%base + (scale * %index) +
+offset` under 64-bit 2's complement modular arithmetic.
+
+One issue with this approach is that, after hardening, the `%base + (scale *
+%index)` subexpression will compute a value near zero (`-1 + (scale * -1)`) and
+then a large, positive `offset` will index into memory within the first two
+gigabytes of address space. While these offsets are not attacker controlled,
+the attacker could chose to attack a load which happens to have the desired
+offset and then successfully read memory in that region. This significantly
+raises the burden on the attacker and limits the scope of attack but does not
+eliminate it. To fully close the attack we must work with the operating system
+to preclude mapping memory in the low two gigabytes of address space.
+
+
+###### 64-bit load checking instructions
+
+We can use the following instruction sequences to check loads. We set up `%r8`
+in these examples to hold the special value of `-1` which will be `cmov`ed over
+`%rax` in misspeculated paths.
+
+Single register addressing mode:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ orq %rax, %rsi # Mask the pointer if misspeculating.
+ movl (%rsi), %edi
+```
+
+Two register addressing mode:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ orq %rax, %rsi # Mask the pointer if misspeculating.
+ orq %rax, %rcx # Mask the index if misspeculating.
+ movl (%rsi,%rcx), %edi
+```
+
+This will result in a negative address near zero or in `offset` wrapping the
+address space back to a small positive address. Small, negative addresses will
+fault in user-mode for most operating systems, but targets which need the high
+address space to be user accessible may need to adjust the exact sequence used
+above. Additionally, the low addresses will need to be marked unreadable by the
+OS to fully harden the load.
+
+
+###### RIP-relative addressing is even easier to break
+
+There is a common addressing mode idiom that is substantially harder to check:
+addressing relative to the instruction pointer. We cannot change the value of
+the instruction pointer register and so we have the harder problem of forcing
+`%base + scale * %index + offset` to be an invalid address, by *only* changing
+`%index`. The only advantage we have is that the attacker also cannot modify
+`%base`. If we use the fast instruction sequence above, but only apply it to
+the index, we will always access `%rip + (scale * -1) + offset`. If the
+attacker can find a load which with this address happens to point to secret
+data, then they can reach it. However, the loader and base libraries can also
+simply refuse to map the heap, data segments, or stack within 2gb of any of the
+text in the program, much like it can reserve the low 2gb of address space.
+
+
+###### The flag registers again make everything hard
+
+Unfortunately, the technique of using `orq`-instructions has a serious flaw on
+x86. The very thing that makes it easy to accumulate state, the flag registers
+containing predicates, causes serious problems here because they may be alive
+and used by the loading instruction or subsequent instructions. On x86, the
+`orq` instruction **sets** the flags and will override anything already there.
+This makes inserting them into the instruction stream very hazardous.
+Unfortunately, unlike when hardening the loaded value, we have no fallback here
+and so we must have a fully general approach available.
+
+The first thing we must do when generating these sequences is try to analyze
+the surrounding code to prove that the flags are not in fact alive or being
+used. Typically, it has been set by some other instruction which just happens
+to set the flags register (much like ours!) with no actual dependency. In those
+cases, it is safe to directly insert these instructions. Alternatively we may
+be able to move them earlier to avoid clobbering the used value.
+
+However, this may ultimately be impossible. In that case, we need to preserve
+the flags around these instructions:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ pushfq
+ orq %rax, %rcx # Mask the pointer if misspeculating.
+ orq %rax, %rdx # Mask the index if misspeculating.
+ popfq
+ movl (%rcx,%rdx), %edi
+```
+
+Using the `pushf` and `popf` instructions saves the flags register around our
+inserted code, but comes at a high cost. First, we must store the flags to the
+stack and reload them. Second, this causes the stack pointer to be adjusted
+dynamically, requiring a frame pointer be used for referring to temporaries
+spilled to the stack, etc.
+
+On newer x86 processors we can use the `lahf` and `sahf` instructions to save
+all of the flags besides the overflow flag in a register rather than on the
+stack. We can then use `seto` and `add` to save and restore the overflow flag
+in a register. Combined, this will save and restore flags in the same manner as
+above but using two registers rather than the stack. That is still very
+expensive if slightly less expensive than `pushf` and `popf` in most cases.
+
+
+###### A flag-less alternative on Haswell, Zen and newer processors
+
+Starting with the BMI2 x86 instruction set extensions available on Haswell and
+Zen processors, there is an instruction for shifting that does not set any
+flags: `shrx`. We can use this and the `lea` instruction to implement analogous
+code sequences to the above ones. However, these are still very marginally
+slower, as there are fewer ports able to dispatch shift instructions in most
+modern x86 processors than there are for `or` instructions.
+
+Fast, single register addressing mode:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ shrxq %rax, %rsi, %rsi # Shift away bits if misspeculating.
+ movl (%rsi), %edi
+```
+
+This will collapse the register to zero or one, and everything but the offset
+in the addressing mode to be less than or equal to 9. This means the full
+address can only be guaranteed to be less than `(1 << 31) + 9`. The OS may wish
+to protect an extra page of the low address space to account for this
+
+
+##### Optimizations
+
+A very large portion of the cost for this approach comes from checking loads in
+this way, so it is important to work to optimize this. However, beyond making
+the instruction sequences to *apply* the checks efficient (for example by
+avoiding `pushfq` and `popfq` sequences), the only significant optimization is
+to check fewer loads without introducing a vulnerability. We apply several
+techniques to accomplish that.
+
+
+###### Don't check loads from compile-time constant stack offsets
+
+We implement this optimization on x86 by skipping the checking of loads which
+use a fixed frame pointer offset.
+
+The result of this optimization is that patterns like reloading a spilled
+register or accessing a global field don't get checked. This is a very
+significant performance win.
+
+
+###### Don't check dependent loads
+
+A core part of why this mitigation strategy works is that it establishes a
+data-flow check on the loaded address. However, this means that if the address
+itself was already loaded using a checked load, there is no need to check a
+dependent load provided it is within the same basic block as the checked load,
+and therefore has no additional predicates guarding it. Consider code like the
+following:
+```
+ ...
+
+.LBB0_4: # %danger
+ movq (%rcx), %rdi
+ movl (%rdi), %edx
+```
+
+This will get transformed into:
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ orq %rax, %rcx # Mask the pointer if misspeculating.
+ movq (%rcx), %rdi # Hardened load.
+ movl (%rdi), %edx # Unhardened load due to dependent addr.
+```
+
+This doesn't check the load through `%rdi` as that pointer is dependent on a
+checked load already.
+
+
+###### Protect large, load-heavy blocks with a single lfence
+
+It may be worth using a single `lfence` instruction at the start of a block
+which begins with a (very) large number of loads that require independent
+protection *and* which require hardening the address of the load. However, this
+is unlikely to be profitable in practice. The latency hit of the hardening
+would need to exceed that of an `lfence` when *correctly* speculatively
+executed. But in that case, the `lfence` cost is a complete loss of speculative
+execution (at a minimum). So far, the evidence we have of the performance cost
+of using `lfence` indicates few if any hot code patterns where this trade off
+would make sense.
+
+
+###### Tempting optimizations that break the security model
+
+Several optimizations were considered which didn't pan out due to failure to
+uphold the security model. One in particular is worth discussing as many others
+will reduce to it.
+
+We wondered whether only the *first* load in a basic block could be checked. If
+the check works as intended, it forms an invalid pointer that doesn't even
+virtual-address translate in the hardware. It should fault very early on in its
+processing. Maybe that would stop things in time for the misspeculated path to
+fail to leak any secrets. This doesn't end up working because the processor is
+fundamentally out-of-order, even in its speculative domain. As a consequence,
+the attacker could cause the initial address computation itself to stall and
+allow an arbitrary number of unrelated loads (including attacked loads of
+secret data) to pass through.
+
+
+#### Interprocedural Checking
+
+Modern x86 processors may speculate into called functions and out of functions
+to their return address. As a consequence, we need a way to check loads that
+occur after a misspeculated predicate but where the load and the misspeculated
+predicate are in different functions. In essence, we need some interprocedural
+generalization of the predicate state tracking. A primary challenge to passing
+the predicate state between functions is that we would like to not require a
+change to the ABI or calling convention in order to make this mitigation more
+deployable, and further would like code mitigated in this way to be easily
+mixed with code not mitigated in this way and without completely losing the
+value of the mitigation.
+
+
+##### Embed the predicate state into the high bit(s) of the stack pointer
+
+We can use the same technique that allows hardening pointers to pass the
+predicate state into and out of functions. The stack pointer is trivially
+passed between functions and we can test for it having the high bits set to
+detect when it has been marked due to misspeculation. The callsite instruction
+sequence looks like (assuming a misspeculated state value of `-1`):
+```
+ ...
+
+.LBB0_4: # %danger
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ shlq $47, %rax
+ orq %rax, %rsp
+ callq other_function
+ movq %rsp, %rax
+ sarq 63, %rax # Sign extend the high bit to all bits.
+```
+
+This first puts the predicate state into the high bits of `%rsp` before calling
+the function and then reads it back out of high bits of `%rsp` afterward. When
+correctly executing (speculatively or not), these are all no-ops. When
+misspeculating, the stack pointer will end up negative. We arrange for it to
+remain a canonical address, but otherwise leave the low bits alone to allow
+stack adjustments to proceed normally without disrupting this. Within the
+called function, we can extract this predicate state and then reset it on
+return:
+```
+other_function:
+ # prolog
+ callq other_function
+ movq %rsp, %rax
+ sarq 63, %rax # Sign extend the high bit to all bits.
+ # ...
+
+.LBB0_N:
+ cmovneq %r8, %rax # Conditionally update predicate state.
+ shlq $47, %rax
+ orq %rax, %rsp
+ retq
+```
+
+This approach is effective when all code is mitigated in this fashion, and can
+even survive very limited reaches into unmitigated code (the state will
+round-trip in and back out of an unmitigated function, it just won't be
+updated). But it does have some limitations. There is a cost to merging the
+state into `%rsp` and it doesn't insulate mitigated code from misspeculation in
+an unmitigated caller.
+
+There is also an advantage to using this form of interprocedural mitigation: by
+forming these invalid stack pointer addresses we can prevent speculative
+returns from successfully reading speculatively written values to the actual
+stack. This works first by forming a data-dependency between computing the
+address of the return address on the stack and our predicate state. And even
+when satisfied, if a misprediction causes the state to be poisoned the
+resulting stack pointer will be invalid.
+
+
+##### Rewrite API of internal functions to directly propagate predicate state
+
+(Not yet implemented.)
+
+We have the option with internal functions to directly adjust their API to
+accept the predicate as an argument and return it. This is likely to be
+marginally cheaper than embedding into `%rsp` for entering functions.
+
+
+##### Use `lfence` to guard function transitions
+
+An `lfence` instruction can be used to prevent subsequent loads from
+speculatively executing until all prior mispredicted predicates have resolved.
+We can use this broader barrier to speculative loads executing between
+functions. We emit it in the entry block to handle calls, and prior to each
+return. This approach also has the advantage of providing the strongest degree
+of mitigation when mixed with unmitigated code by halting all misspeculation
+entering a function which is mitigated, regardless of what occured in the
+caller. However, such a mixture is inherently more risky. Whether this kind of
+mixture is a sufficient mitigation requires careful analysis.
+
+Unfortunately, experimental results indicate that the performance overhead of
+this approach is very high for certain patterns of code. A classic example is
+any form of recursive evaluation engine. The hot, rapid call and return
+sequences exhibit dramatic performance loss when mitigated with `lfence`. This
+component alone can regress performance by 2x or more, making it an unpleasant
+tradeoff even when only used in a mixture of code.
+
+
+##### Use an internal TLS location to pass predicate state
+
+We can define a special thread-local value to hold the predicate state between
+functions. This avoids direct ABI implications by using a side channel between
+callers and callees to communicate the predicate state. It also allows implicit
+zero-initialization of the state, which allows non-checked code to be the first
+code executed.
+
+However, this requires a load from TLS in the entry block, a store to TLS
+before every call and every ret, and a load from TLS after every call. As a
+consequence it is expected to be substantially more expensive even than using
+`%rsp` and potentially `lfence` within the function entry block.
+
+
+##### Define a new ABI and/or calling convention
+
+We could define a new ABI and/or calling convention to explicitly pass the
+predicate state in and out of functions. This may be interesting if none of the
+alternatives have adequate performance, but it makes deployment and adoption
+dramatically more complex, and potentially infeasible.
+
+
+## High-Level Alternative Mitigation Strategies
+
+There are completely different alternative approaches to mitigating variant 1
+attacks. [Most](https://lwn.net/Articles/743265/)
+[discussion](https://lwn.net/Articles/744287/) so far focuses on mitigating
+specific known attackable components in the Linux kernel (or other kernels) by
+manually rewriting the code to contain an instruction sequence that is not
+vulnerable. For x86 systems this is done by either injecting an `lfence`
+instruction along the code path which would leak data if executed speculatively
+or by rewriting memory accesses to have branch-less masking to a known safe
+region. On Intel systems, `lfence` [will prevent the speculative load of secret
+data](https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf).
+On AMD systems `lfence` is currently a no-op, but can be made
+dispatch-serializing by setting an MSR, and thus preclude misspeculation of the
+code path ([mitigation G-2 +
+V1-1](https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf)).
+
+However, this relies on finding and enumerating all possible points in code
+which could be attacked to leak information. While in some cases static
+analysis is effective at doing this at scale, in many cases it still relies on
+human judgement to evaluate whether code might be vulnerable. Especially for
+software systems which receive less detailed scrutiny but remain sensitive to
+these attacks, this seems like an impractical security model. We need an
+automatic and systematic mitigation strategy.
+
+
+### Automatic `lfence` on Conditional Edges
+
+A natural way to scale up the existing hand-coded mitigations is simply to
+inject an `lfence` instruction into both the target and fallthrough
+destinations of every conditional branch. This ensures that no predicate or
+bounds check can be bypassed speculatively. However, the performance overhead
+of this approach is, simply put, catastrophic. Yet it remains the only truly
+"secure by default" approach known prior to this effort and serves as the
+baseline for performance.
+
+One attempt to address the performance overhead of this and make it more
+realistic to deploy is [MSVC's /Qspectre
+switch](https://blogs.msdn.microsoft.com/vcblog/2018/01/15/spectre-mitigations-in-msvc/).
+Their technique is to use static analysis within the compiler to only insert
+`lfence` instructions into conditional edges at risk of attack. However,
+[initial](https://arstechnica.com/gadgets/2018/02/microsofts-compiler-level-spectre-fix-shows-how-hard-this-problem-will-be-to-solve/)
+[analysis](https://www.paulkocher.com/doc/MicrosoftCompilerSpectreMitigation.html)
+has shown that this approach is incomplete and only catches a small and limited
+subset of attackable patterns which happen to resemble very closely the initial
+proofs of concept. As such, while its performance is acceptable, it does not
+appear to be an adequate systematic mitigation.
+
+
+## Performance Overhead
+
+The performance overhead of this style of comprehensive mitigation is very
+high. However, it compares very favorably with previously recommended
+approaches such as the `lfence` instruction. Just as users can restrict the
+scope of `lfence` to control its performance impact, this mitigation technique
+could be restricted in scope as well.
+
+However, it is important to understand what it would cost to get a fully
+mitigated baseline. Here we assume targeting a Haswell (or newer) processor and
+using all of the tricks to improve performance (so leaves the low 2gb
+unprotected and +/- 2gb surrounding any PC in the program). We ran both
+Google's microbenchmark suite and a large highly-tuned server built using
+ThinLTO and PGO. All were built with `-march=haswell` to give access to BMI2
+instructions, and benchmarks were run on large Haswell servers. We collected
+data both with an `lfence`-based mitigation and load hardening as presented
+here. The summary is that mitigating with load hardening is 1.77x faster than
+mitigating with `lfence`, and the overhead of load hardening compared to a
+normal program is likely between a 10% overhead and a 50% overhead with most
+large applications seeing a 30% overhead or less.
+
+| Benchmark | `lfence` | Load Hardening | Mitigated Speedup |
+| -------------------------------------- | -------: | -------------: | ----------------: |
+| Google microbenchmark suite | -74.8% | -36.4% | **2.5x** |
+| Large server QPS (using ThinLTO & PGO) | -62% | -29% | **1.8x** |
+
+Below is a visualization of the microbenchmark suite results which helps show
+the distribution of results that is somewhat lost in the summary. The y-axis is
+a log-scale speedup ratio of load hardening relative to `lfence` (up -> faster
+-> better). Each box-and-whiskers represents one microbenchmark which may have
+many different metrics measured. The red line marks the median, the box marks
+the first and third quartiles, and the whiskers mark the min and max.
+
+![Microbenchmark result visualization](speculative_load_hardening_microbenchmarks.png)
+
+We don't yet have benchmark data on SPEC or the LLVM test suite, but we can
+work on getting that. Still, the above should give a pretty clear
+characterization of the performance, and specific benchmarks are unlikely to
+reveal especially interesting properties.
+
+
+### Future Work: Fine Grained Control and API-Integration
+
+The performance overhead of this technique is likely to be very significant and
+something users wish to control or reduce. There are interesting options here
+that impact the implementation strategy used.
+
+One particularly appealing option is to allow both opt-in and opt-out of this
+mitigation at reasonably fine granularity such as on a per-function basis,
+including intelligent handling of inlining decisions -- protected code can be
+prevented from inlining into unprotected code, and unprotected code will become
+protected when inlined into protected code. For systems where only a limited
+set of code is reachable by externally controlled inputs, it may be possible to
+limit the scope of mitigation through such mechanisms without compromising the
+application's overall security. The performance impact may also be focused in a
+few key functions that can be hand-mitigated in ways that have lower
+performance overhead while the remainder of the application receives automatic
+protection.
+
+For both limiting the scope of mitigation or manually mitigating hot functions,
+there needs to be some support for mixing mitigated and unmitigated code
+without completely defeating the mitigation. For the first use case, it would
+be particularly desirable that mitigated code remains safe when being called
+during misspeculation from unmitigated code.
+
+For the second use case, it may be important to connect the automatic
+mitigation technique to explicit mitigation APIs such as what is described in
+http://wg21.link/p0928 (or any other eventual API) so that there is a clean way
+to switch from automatic to manual mitigation without immediately exposing a
+hole. However, the design for how to do this is hard to come up with until the
+APIs are better established. We will revisit this as those APIs mature.
Added: www-releases/trunk/8.0.1/docs/_sources/SphinxQuickstartTemplate.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/SphinxQuickstartTemplate.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/SphinxQuickstartTemplate.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/SphinxQuickstartTemplate.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,160 @@
+==========================
+Sphinx Quickstart Template
+==========================
+
+Introduction and Quickstart
+===========================
+
+This document is meant to get you writing documentation as fast as possible
+even if you have no previous experience with Sphinx. The goal is to take
+someone in the state of "I want to write documentation and get it added to
+LLVM's docs" and turn that into useful documentation mailed to llvm-commits
+with as little nonsense as possible.
+
+You can find this document in ``docs/SphinxQuickstartTemplate.rst``. You
+should copy it, open the new file in your text editor, write your docs, and
+then send the new document to llvm-commits for review.
+
+Focus on *content*. It is easy to fix the Sphinx (reStructuredText) syntax
+later if necessary, although reStructuredText tries to imitate common
+plain-text conventions so it should be quite natural. A basic knowledge of
+reStructuredText syntax is useful when writing the document, so the last
+~half of this document (starting with `Example Section`_) gives examples
+which should cover 99% of use cases.
+
+Let me say that again: focus on *content*. But if you really need to verify
+Sphinx's output, see ``docs/README.txt`` for information.
+
+Once you have finished with the content, please send the ``.rst`` file to
+llvm-commits for review.
+
+Guidelines
+==========
+
+Try to answer the following questions in your first section:
+
+#. Why would I want to read this document?
+
+#. What should I know to be able to follow along with this document?
+
+#. What will I have learned by the end of this document?
+
+Common names for the first section are ``Introduction``, ``Overview``, or
+``Background``.
+
+If possible, make your document a "how to". Give it a name ``HowTo*.rst``
+like the other "how to" documents. This format is usually the easiest
+for another person to understand and also the most useful.
+
+You generally should not be writing documentation other than a "how to"
+unless there is already a "how to" about your topic. The reason for this
+is that without a "how to" document to read first, it is difficult for a
+person to understand a more advanced document.
+
+Focus on content (yes, I had to say it again).
+
+The rest of this document shows example reStructuredText markup constructs
+that are meant to be read by you in your text editor after you have copied
+this file into a new file for the documentation you are about to write.
+
+Example Section
+===============
+
+Your text can be *emphasized*, **bold**, or ``monospace``.
+
+Use blank lines to separate paragraphs.
+
+Headings (like ``Example Section`` just above) give your document its
+structure. Use the same kind of adornments (e.g. ``======`` vs. ``------``)
+as are used in this document. The adornment must be the same length as the
+text above it. For Vim users, variations of ``yypVr=`` might be handy.
+
+Example Subsection
+------------------
+
+Make a link `like this <http://llvm.org/>`_. There is also a more
+sophisticated syntax which `can be more readable`_ for longer links since
+it disrupts the flow less. You can put the ``.. _`link text`: <URL>`` block
+pretty much anywhere later in the document.
+
+.. _`can be more readable`: http://en.wikipedia.org/wiki/LLVM
+
+Lists can be made like this:
+
+#. A list starting with ``#.`` will be automatically numbered.
+
+#. This is a second list element.
+
+ #. Use indentation to create nested lists.
+
+You can also use unordered lists.
+
+* Stuff.
+
+ + Deeper stuff.
+
+* More stuff.
+
+Example Subsubsection
+^^^^^^^^^^^^^^^^^^^^^
+
+You can make blocks of code like this:
+
+.. code-block:: c++
+
+ int main() {
+ return 0;
+ }
+
+For a shell session, use a ``console`` code block (some existing docs use
+``bash``):
+
+.. code-block:: console
+
+ $ echo "Goodbye cruel world!"
+ $ rm -rf /
+
+If you need to show LLVM IR use the ``llvm`` code block.
+
+.. code-block:: llvm
+
+ define i32 @test1() {
+ entry:
+ ret i32 0
+ }
+
+Some other common code blocks you might need are ``c``, ``objc``, ``make``,
+and ``cmake``. If you need something beyond that, you can look at the `full
+list`_ of supported code blocks.
+
+.. _`full list`: http://pygments.org/docs/lexers/
+
+However, don't waste time fiddling with syntax highlighting when you could
+be adding meaningful content. When in doubt, show preformatted text
+without any syntax highlighting like this:
+
+::
+
+ .
+ +:.
+ ..:: ::
+ .++:+:: ::+:.:.
+ .:+ :
+ ::.::..:: .+.
+ ..:+ :: :
+ ......+:. ..
+ :++. .. :
+ .+:::+:: :
+ .. . .+ ::
+ +.: .::+.
+ ...+. .: .
+ .++:..
+ ...
+
+Hopefully you won't need to be this deep
+""""""""""""""""""""""""""""""""""""""""
+
+If you need to do fancier things than what has been shown in this document,
+you can mail the list or check Sphinx's `reStructuredText Primer`_.
+
+.. _`reStructuredText Primer`: http://sphinx.pocoo.org/rest.html
Added: www-releases/trunk/8.0.1/docs/_sources/StackMaps.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/StackMaps.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/StackMaps.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/StackMaps.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,517 @@
+===================================
+Stack maps and patch points in LLVM
+===================================
+
+.. contents::
+ :local:
+ :depth: 2
+
+Definitions
+===========
+
+In this document we refer to the "runtime" collectively as all
+components that serve as the LLVM client, including the LLVM IR
+generator, object code consumer, and code patcher.
+
+A stack map records the location of ``live values`` at a particular
+instruction address. These ``live values`` do not refer to all the
+LLVM values live across the stack map. Instead, they are only the
+values that the runtime requires to be live at this point. For
+example, they may be the values the runtime will need to resume
+program execution at that point independent of the compiled function
+containing the stack map.
+
+LLVM emits stack map data into the object code within a designated
+:ref:`stackmap-section`. This stack map data contains a record for
+each stack map. The record stores the stack map's instruction address
+and contains a entry for each mapped value. Each entry encodes a
+value's location as a register, stack offset, or constant.
+
+A patch point is an instruction address at which space is reserved for
+patching a new instruction sequence at run time. Patch points look
+much like calls to LLVM. They take arguments that follow a calling
+convention and may return a value. They also imply stack map
+generation, which allows the runtime to locate the patchpoint and
+find the location of ``live values`` at that point.
+
+Motivation
+==========
+
+This functionality is currently experimental but is potentially useful
+in a variety of settings, the most obvious being a runtime (JIT)
+compiler. Example applications of the patchpoint intrinsics are
+implementing an inline call cache for polymorphic method dispatch or
+optimizing the retrieval of properties in dynamically typed languages
+such as JavaScript.
+
+The intrinsics documented here are currently used by the JavaScript
+compiler within the open source WebKit project, see the `FTL JIT
+<https://trac.webkit.org/wiki/FTLJIT>`_, but they are designed to be
+used whenever stack maps or code patching are needed. Because the
+intrinsics have experimental status, compatibility across LLVM
+releases is not guaranteed.
+
+The stack map functionality described in this document is separate
+from the functionality described in
+:ref:`stack-map`. `GCFunctionMetadata` provides the location of
+pointers into a collected heap captured by the `GCRoot` intrinsic,
+which can also be considered a "stack map". Unlike the stack maps
+defined above, the `GCFunctionMetadata` stack map interface does not
+provide a way to associate live register values of arbitrary type with
+an instruction address, nor does it specify a format for the resulting
+stack map. The stack maps described here could potentially provide
+richer information to a garbage collecting runtime, but that usage
+will not be discussed in this document.
+
+Intrinsics
+==========
+
+The following two kinds of intrinsics can be used to implement stack
+maps and patch points: ``llvm.experimental.stackmap`` and
+``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
+stack map record, and they both allow some form of code patching. They
+can be used independently (i.e. ``llvm.experimental.patchpoint``
+implicitly generates a stack map without the need for an additional
+call to ``llvm.experimental.stackmap``). The choice of which to use
+depends on whether it is necessary to reserve space for code patching
+and whether any of the intrinsic arguments should be lowered according
+to calling conventions. ``llvm.experimental.stackmap`` does not
+reserve any space, nor does it expect any call arguments. If the
+runtime patches code at the stack map's address, it will destructively
+overwrite the program text. This is unlike
+``llvm.experimental.patchpoint``, which reserves space for in-place
+patching without overwriting surrounding code. The
+``llvm.experimental.patchpoint`` intrinsic also lowers a specified
+number of arguments according to its calling convention. This allows
+patched code to make in-place function calls without marshaling.
+
+Each instance of one of these intrinsics generates a stack map record
+in the :ref:`stackmap-section`. The record includes an ID, allowing
+the runtime to uniquely identify the stack map, and the offset within
+the code from the beginning of the enclosing function.
+
+'``llvm.experimental.stackmap``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare void
+ @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.stackmap``' intrinsic records the location of
+specified values in the stack map without generating any code.
+
+Operands:
+"""""""""
+
+The first operand is an ID to be encoded within the stack map. The
+second operand is the number of shadow bytes following the
+intrinsic. The variable number of operands that follow are the ``live
+values`` for which locations will be recorded in the stack map.
+
+To use this intrinsic as a bare-bones stack map, with no code patching
+support, the number of shadow bytes can be set to zero.
+
+Semantics:
+""""""""""
+
+The stack map intrinsic generates no code in place, unless nops are
+needed to cover its shadow (see below). However, its offset from
+function entry is stored in the stack map. This is the relative
+instruction address immediately following the instructions that
+precede the stack map.
+
+The stack map ID allows a runtime to locate the desired stack map
+record. LLVM passes this ID through directly to the stack map
+record without checking uniqueness.
+
+LLVM guarantees a shadow of instructions following the stack map's
+instruction offset during which neither the end of the basic block nor
+another call to ``llvm.experimental.stackmap`` or
+``llvm.experimental.patchpoint`` may occur. This allows the runtime to
+patch the code at this point in response to an event triggered from
+outside the code. The code for instructions following the stack map
+may be emitted in the stack map's shadow, and these instructions may
+be overwritten by destructive patching. Without shadow bytes, this
+destructive patching could overwrite program text or data outside the
+current function. We disallow overlapping stack map shadows so that
+the runtime does not need to consider this corner case.
+
+For example, a stack map with 8 byte shadow:
+
+.. code-block:: llvm
+
+ call void @runtime()
+ call void (i64, i32, ...)* @llvm.experimental.stackmap(i64 77, i32 8,
+ i64* %ptr)
+ %val = load i64* %ptr
+ %add = add i64 %val, 3
+ ret i64 %add
+
+May require one byte of nop-padding:
+
+.. code-block:: none
+
+ 0x00 callq _runtime
+ 0x05 nop <--- stack map address
+ 0x06 movq (%rdi), %rax
+ 0x07 addq $3, %rax
+ 0x0a popq %rdx
+ 0x0b ret <---- end of 8-byte shadow
+
+Now, if the runtime needs to invalidate the compiled code, it may
+patch 8 bytes of code at the stack map's address at follows:
+
+.. code-block:: none
+
+ 0x00 callq _runtime
+ 0x05 movl $0xffff, %rax <--- patched code at stack map address
+ 0x0a callq *%rax <---- end of 8-byte shadow
+
+This way, after the normal call to the runtime returns, the code will
+execute a patched call to a special entry point that can rebuild a
+stack frame from the values located by the stack map.
+
+'``llvm.experimental.patchpoint.*``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare void
+ @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
+ i8* <target>, i32 <numArgs>, ...)
+ declare i64
+ @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
+ i8* <target>, i32 <numArgs>, ...)
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.patchpoint.*``' intrinsics creates a function
+call to the specified ``<target>`` and records the location of specified
+values in the stack map.
+
+Operands:
+"""""""""
+
+The first operand is an ID, the second operand is the number of bytes
+reserved for the patchable region, the third operand is the target
+address of a function (optionally null), and the fourth operand
+specifies how many of the following variable operands are considered
+function call arguments. The remaining variable number of operands are
+the ``live values`` for which locations will be recorded in the stack
+map.
+
+Semantics:
+""""""""""
+
+The patch point intrinsic generates a stack map. It also emits a
+function call to the address specified by ``<target>`` if the address
+is not a constant null. The function call and its arguments are
+lowered according to the calling convention specified at the
+intrinsic's callsite. Variants of the intrinsic with non-void return
+type also return a value according to calling convention.
+
+On PowerPC, note that ``<target>`` must be the ABI function pointer for the
+intended target of the indirect call. Specifically, when compiling for the
+ELF V1 ABI, ``<target>`` is the function-descriptor address normally used as
+the C/C++ function-pointer representation.
+
+Requesting zero patch point arguments is valid. In this case, all
+variable operands are handled just like
+``llvm.experimental.stackmap.*``. The difference is that space will
+still be reserved for patching, a call will be emitted, and a return
+value is allowed.
+
+The location of the arguments are not normally recorded in the stack
+map because they are already fixed by the calling convention. The
+remaining ``live values`` will have their location recorded, which
+could be a register, stack location, or constant. A special calling
+convention has been introduced for use with stack maps, anyregcc,
+which forces the arguments to be loaded into registers but allows
+those register to be dynamically allocated. These argument registers
+will have their register locations recorded in the stack map in
+addition to the remaining ``live values``.
+
+The patch point also emits nops to cover at least ``<numBytes>`` of
+instruction encoding space. Hence, the client must ensure that
+``<numBytes>`` is enough to encode a call to the target address on the
+supported targets. If the call target is constant null, then there is
+no minimum requirement. A zero-byte null target patchpoint is
+valid.
+
+The runtime may patch the code emitted for the patch point, including
+the call sequence and nops. However, the runtime may not assume
+anything about the code LLVM emits within the reserved space. Partial
+patching is not allowed. The runtime must patch all reserved bytes,
+padding with nops if necessary.
+
+This example shows a patch point reserving 15 bytes, with one argument
+in $rdi, and a return value in $rax per native calling convention:
+
+.. code-block:: llvm
+
+ %target = inttoptr i64 -281474976710654 to i8*
+ %val = call i64 (i64, i32, ...)*
+ @llvm.experimental.patchpoint.i64(i64 78, i32 15,
+ i8* %target, i32 1, i64* %ptr)
+ %add = add i64 %val, 3
+ ret i64 %add
+
+May generate:
+
+.. code-block:: none
+
+ 0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
+ 0x0a callq *%r11
+ 0x0d nop
+ 0x0e nop <--- end of reserved 15-bytes
+ 0x0f addq $0x3, %rax
+ 0x10 movl %rax, 8(%rsp)
+
+Note that no stack map locations will be recorded. If the patched code
+sequence does not need arguments fixed to specific calling convention
+registers, then the ``anyregcc`` convention may be used:
+
+.. code-block:: none
+
+ %val = call anyregcc @llvm.experimental.patchpoint(i64 78, i32 15,
+ i8* %target, i32 1,
+ i64* %ptr)
+
+The stack map now indicates the location of the %ptr argument and
+return value:
+
+.. code-block:: none
+
+ Stack Map: ID=78, Loc0=%r9 Loc1=%r8
+
+The patch code sequence may now use the argument that happened to be
+allocated in %r8 and return a value allocated in %r9:
+
+.. code-block:: none
+
+ 0x00 movslq 4(%r8) %r9 <--- patched code at patch point address
+ 0x03 nop
+ ...
+ 0x0e nop <--- end of reserved 15-bytes
+ 0x0f addq $0x3, %r9
+ 0x10 movl %r9, 8(%rsp)
+
+.. _stackmap-format:
+
+Stack Map Format
+================
+
+The existence of a stack map or patch point intrinsic within an LLVM
+Module forces code emission to create a :ref:`stackmap-section`. The
+format of this section follows:
+
+.. code-block:: none
+
+ Header {
+ uint8 : Stack Map Version (current version is 3)
+ uint8 : Reserved (expected to be 0)
+ uint16 : Reserved (expected to be 0)
+ }
+ uint32 : NumFunctions
+ uint32 : NumConstants
+ uint32 : NumRecords
+ StkSizeRecord[NumFunctions] {
+ uint64 : Function Address
+ uint64 : Stack Size
+ uint64 : Record Count
+ }
+ Constants[NumConstants] {
+ uint64 : LargeConstant
+ }
+ StkMapRecord[NumRecords] {
+ uint64 : PatchPoint ID
+ uint32 : Instruction Offset
+ uint16 : Reserved (record flags)
+ uint16 : NumLocations
+ Location[NumLocations] {
+ uint8 : Register | Direct | Indirect | Constant | ConstantIndex
+ uint8 : Reserved (expected to be 0)
+ uint16 : Location Size
+ uint16 : Dwarf RegNum
+ uint16 : Reserved (expected to be 0)
+ int32 : Offset or SmallConstant
+ }
+ uint32 : Padding (only if required to align to 8 byte)
+ uint16 : Padding
+ uint16 : NumLiveOuts
+ LiveOuts[NumLiveOuts]
+ uint16 : Dwarf RegNum
+ uint8 : Reserved
+ uint8 : Size in Bytes
+ }
+ uint32 : Padding (only if required to align to 8 byte)
+ }
+
+The first byte of each location encodes a type that indicates how to
+interpret the ``RegNum`` and ``Offset`` fields as follows:
+
+======== ========== =================== ===========================
+Encoding Type Value Description
+-------- ---------- ------------------- ---------------------------
+0x1 Register Reg Value in a register
+0x2 Direct Reg + Offset Frame index value
+0x3 Indirect [Reg + Offset] Spilled value
+0x4 Constant Offset Small constant
+0x5 ConstIndex Constants[Offset] Large constant
+======== ========== =================== ===========================
+
+In the common case, a value is available in a register, and the
+``Offset`` field will be zero. Values spilled to the stack are encoded
+as ``Indirect`` locations. The runtime must load those values from a
+stack address, typically in the form ``[BP + Offset]``. If an
+``alloca`` value is passed directly to a stack map intrinsic, then
+LLVM may fold the frame index into the stack map as an optimization to
+avoid allocating a register or stack slot. These frame indices will be
+encoded as ``Direct`` locations in the form ``BP + Offset``. LLVM may
+also optimize constants by emitting them directly in the stack map,
+either in the ``Offset`` of a ``Constant`` location or in the constant
+pool, referred to by ``ConstantIndex`` locations.
+
+At each callsite, a "liveout" register list is also recorded. These
+are the registers that are live across the stackmap and therefore must
+be saved by the runtime. This is an important optimization when the
+patchpoint intrinsic is used with a calling convention that by default
+preserves most registers as callee-save.
+
+Each entry in the liveout register list contains a DWARF register
+number and size in bytes. The stackmap format deliberately omits
+specific subregister information. Instead the runtime must interpret
+this information conservatively. For example, if the stackmap reports
+one byte at ``%rax``, then the value may be in either ``%al`` or
+``%ah``. It doesn't matter in practice, because the runtime will
+simply save ``%rax``. However, if the stackmap reports 16 bytes at
+``%ymm0``, then the runtime can safely optimize by saving only
+``%xmm0``.
+
+The stack map format is a contract between an LLVM SVN revision and
+the runtime. It is currently experimental and may change in the short
+term, but minimizing the need to update the runtime is
+important. Consequently, the stack map design is motivated by
+simplicity and extensibility. Compactness of the representation is
+secondary because the runtime is expected to parse the data
+immediately after compiling a module and encode the information in its
+own format. Since the runtime controls the allocation of sections, it
+can reuse the same stack map space for multiple modules.
+
+Stackmap support is currently only implemented for 64-bit
+platforms. However, a 32-bit implementation should be able to use the
+same format with an insignificant amount of wasted space.
+
+.. _stackmap-section:
+
+Stack Map Section
+^^^^^^^^^^^^^^^^^
+
+A JIT compiler can easily access this section by providing its own
+memory manager via the LLVM C API
+``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
+manager, the JIT provides a callback:
+``LLVMMemoryManagerAllocateDataSectionCallback()``. When LLVM creates
+this section, it invokes the callback and passes the section name. The
+JIT can record the in-memory address of the section at this time and
+later parse it to recover the stack map data.
+
+For MachO (e.g. on Darwin), the stack map section name is
+"__llvm_stackmaps". The segment name is "__LLVM_STACKMAPS".
+
+For ELF (e.g. on Linux), the stack map section name is
+".llvm_stackmaps". The segment name is "__LLVM_STACKMAPS".
+
+Stack Map Usage
+===============
+
+The stack map support described in this document can be used to
+precisely determine the location of values at a specific position in
+the code. LLVM does not maintain any mapping between those values and
+any higher-level entity. The runtime must be able to interpret the
+stack map record given only the ID, offset, and the order of the
+locations, records, and functions, which LLVM preserves.
+
+Note that this is quite different from the goal of debug information,
+which is a best-effort attempt to track the location of named
+variables at every instruction.
+
+An important motivation for this design is to allow a runtime to
+commandeer a stack frame when execution reaches an instruction address
+associated with a stack map. The runtime must be able to rebuild a
+stack frame and resume program execution using the information
+provided by the stack map. For example, execution may resume in an
+interpreter or a recompiled version of the same function.
+
+This usage restricts LLVM optimization. Clearly, LLVM must not move
+stores across a stack map. However, loads must also be handled
+conservatively. If the load may trigger an exception, hoisting it
+above a stack map could be invalid. For example, the runtime may
+determine that a load is safe to execute without a type check given
+the current state of the type system. If the type system changes while
+some activation of the load's function exists on the stack, the load
+becomes unsafe. The runtime can prevent subsequent execution of that
+load by immediately patching any stack map location that lies between
+the current call site and the load (typically, the runtime would
+simply patch all stack map locations to invalidate the function). If
+the compiler had hoisted the load above the stack map, then the
+program could crash before the runtime could take back control.
+
+To enforce these semantics, stackmap and patchpoint intrinsics are
+considered to potentially read and write all memory. This may limit
+optimization more than some clients desire. This limitation may be
+avoided by marking the call site as "readonly". In the future we may
+also allow meta-data to be added to the intrinsic call to express
+aliasing, thereby allowing optimizations to hoist certain loads above
+stack maps.
+
+Direct Stack Map Entries
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+As shown in :ref:`stackmap-section`, a Direct stack map location
+records the address of frame index. This address is itself the value
+that the runtime requested. This differs from Indirect locations,
+which refer to a stack locations from which the requested values must
+be loaded. Direct locations can communicate the address if an alloca,
+while Indirect locations handle register spills.
+
+For example:
+
+.. code-block:: none
+
+ entry:
+ %a = alloca i64...
+ llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, i64* %a)
+
+The runtime can determine this alloca's relative location on the
+stack immediately after compilation, or at any time thereafter. This
+differs from Register and Indirect locations, because the runtime can
+only read the values in those locations when execution reaches the
+instruction address of the stack map.
+
+This functionality requires LLVM to treat entry-block allocas
+specially when they are directly consumed by an intrinsics. (This is
+the same requirement imposed by the llvm.gcroot intrinsic.) LLVM
+transformations must not substitute the alloca with any intervening
+value. This can be verified by the runtime simply by checking that the
+stack map's location is a Direct location type.
+
+
+Supported Architectures
+=======================
+
+Support for StackMap generation and the related intrinsics requires
+some code for each backend. Today, only a subset of LLVM's backends
+are supported. The currently supported architectures are X86_64,
+PowerPC, and Aarch64.
Added: www-releases/trunk/8.0.1/docs/_sources/StackSafetyAnalysis.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/StackSafetyAnalysis.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/StackSafetyAnalysis.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/StackSafetyAnalysis.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,56 @@
+==================================
+Stack Safety Analysis
+==================================
+
+
+Introduction
+============
+
+The Stack Safety Analysis determines if stack allocated variables can be
+considered 'safe' from memory access bugs.
+
+The primary purpose of the analysis is to be used by sanitizers to avoid
+unnecessary instrumentation of 'safe' variables. SafeStack is going to be the
+first user.
+
+'safe' variables can be defined as variables that can not be used out-of-scope
+(e.g. use-after-return) or accessed out of bounds. In the future it can be
+extended to track other variable properties. E.g. we plan to extend
+implementation with a check to make sure that variable is always initialized
+before every read to optimize use-of-uninitialized-memory checks.
+
+How it works
+============
+
+The analysis is implemented in two stages:
+
+The intra-procedural, or 'local', stage performs a depth-first search inside
+functions to collect all uses of each alloca, including loads/stores and uses as
+arguments functions. After this stage we know which parts of the alloca are used
+by functions itself but we don't know what happens after it is passed as
+an argument to another function.
+
+The inter-procedural, or 'global', stage, resolves what happens to allocas after
+they are passed as function arguments. This stage performs a depth-first search
+on function calls inside a single module and propagates allocas usage through
+functions calls.
+
+When used with ThinLTO, the global stage performs a whole program analysis over
+the Module Summary Index.
+
+Testing
+=======
+
+The analysis is covered with lit tests.
+
+We expect that users can tolerate false classification of variables as
+'unsafe' when in-fact it's 'safe'. This may lead to inefficient code. However, we
+can't accept false 'safe' classification which may cause sanitizers to miss actual
+bugs in instrumented code. To avoid that we want additional validation tool.
+
+AddressSanitizer may help with this validation. We can instrument all variables
+as usual but additionally store stack-safe information in the
+``ASanStackVariableDescription``. Then if AddressSanitizer detects a bug on
+a 'safe' variable we can produce an additional report to let the user know that
+probably Stack Safety Analysis failed and we should check for a bug in the
+compiler.
Added: www-releases/trunk/8.0.1/docs/_sources/Statepoints.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/Statepoints.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/Statepoints.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/Statepoints.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,983 @@
+=====================================
+Garbage Collection Safepoints in LLVM
+=====================================
+
+.. contents::
+ :local:
+ :depth: 2
+
+Status
+=======
+
+This document describes a set of extensions to LLVM to support garbage
+collection. By now, these mechanisms are well proven with commercial java
+implementation with a fully relocating collector having shipped using them.
+There are a couple places where bugs might still linger; these are called out
+below.
+
+They are still listed as "experimental" to indicate that no forward or backward
+compatibility guarantees are offered across versions. If your use case is such
+that you need some form of forward compatibility guarantee, please raise the
+issue on the llvm-dev mailing list.
+
+LLVM still supports an alternate mechanism for conservative garbage collection
+support using the ``gcroot`` intrinsic. The ``gcroot`` mechanism is mostly of
+historical interest at this point with one exception - its implementation of
+shadow stacks has been used successfully by a number of language frontends and
+is still supported.
+
+Overview & Core Concepts
+========================
+
+To collect dead objects, garbage collectors must be able to identify
+any references to objects contained within executing code, and,
+depending on the collector, potentially update them. The collector
+does not need this information at all points in code - that would make
+the problem much harder - but only at well-defined points in the
+execution known as 'safepoints' For most collectors, it is sufficient
+to track at least one copy of each unique pointer value. However, for
+a collector which wishes to relocate objects directly reachable from
+running code, a higher standard is required.
+
+One additional challenge is that the compiler may compute intermediate
+results ("derived pointers") which point outside of the allocation or
+even into the middle of another allocation. The eventual use of this
+intermediate value must yield an address within the bounds of the
+allocation, but such "exterior derived pointers" may be visible to the
+collector. Given this, a garbage collector can not safely rely on the
+runtime value of an address to indicate the object it is associated
+with. If the garbage collector wishes to move any object, the
+compiler must provide a mapping, for each pointer, to an indication of
+its allocation.
+
+To simplify the interaction between a collector and the compiled code,
+most garbage collectors are organized in terms of three abstractions:
+load barriers, store barriers, and safepoints.
+
+#. A load barrier is a bit of code executed immediately after the
+ machine load instruction, but before any use of the value loaded.
+ Depending on the collector, such a barrier may be needed for all
+ loads, merely loads of a particular type (in the original source
+ language), or none at all.
+
+#. Analogously, a store barrier is a code fragment that runs
+ immediately before the machine store instruction, but after the
+ computation of the value stored. The most common use of a store
+ barrier is to update a 'card table' in a generational garbage
+ collector.
+
+#. A safepoint is a location at which pointers visible to the compiled
+ code (i.e. currently in registers or on the stack) are allowed to
+ change. After the safepoint completes, the actual pointer value
+ may differ, but the 'object' (as seen by the source language)
+ pointed to will not.
+
+ Note that the term 'safepoint' is somewhat overloaded. It refers to
+ both the location at which the machine state is parsable and the
+ coordination protocol involved in bring application threads to a
+ point at which the collector can safely use that information. The
+ term "statepoint" as used in this document refers exclusively to the
+ former.
+
+This document focuses on the last item - compiler support for
+safepoints in generated code. We will assume that an outside
+mechanism has decided where to place safepoints. From our
+perspective, all safepoints will be function calls. To support
+relocation of objects directly reachable from values in compiled code,
+the collector must be able to:
+
+#. identify every copy of a pointer (including copies introduced by
+ the compiler itself) at the safepoint,
+#. identify which object each pointer relates to, and
+#. potentially update each of those copies.
+
+This document describes the mechanism by which an LLVM based compiler
+can provide this information to a language runtime/collector, and
+ensure that all pointers can be read and updated if desired.
+
+Abstract Machine Model
+^^^^^^^^^^^^^^^^^^^^^^^
+
+At a high level, LLVM has been extended to support compiling to an abstract
+machine which extends the actual target with a non-integral pointer type
+suitable for representing a garbage collected reference to an object. In
+particular, such non-integral pointer type have no defined mapping to an
+integer representation. This semantic quirk allows the runtime to pick a
+integer mapping for each point in the program allowing relocations of objects
+without visible effects.
+
+This high level abstract machine model is used for most of the optimizer. As
+a result, transform passes do not need to be extended to look through explicit
+relocation sequence. Before starting code generation, we switch
+representations to an explicit form. The exact location chosen for lowering
+is an implementation detail.
+
+Note that most of the value of the abstract machine model comes for collectors
+which need to model potentially relocatable objects. For a compiler which
+supports only a non-relocating collector, you may wish to consider starting
+with the fully explicit form.
+
+Warning: There is one currently known semantic hole in the definition of
+non-integral pointers which has not been addressed upstream. To work around
+this, you need to disable speculation of loads unless the memory type
+(non-integral pointer vs anything else) is known to unchanged. That is, it is
+not safe to speculate a load if doing causes a non-integral pointer value to
+be loaded as any other type or vice versa. In practice, this restriction is
+well isolated to isSafeToSpeculate in ValueTracking.cpp.
+
+Explicit Representation
+^^^^^^^^^^^^^^^^^^^^^^^
+
+A frontend could directly generate this low level explicit form, but
+doing so may inhibit optimization. Instead, it is recommended that
+compilers with relocating collectors target the abstract machine model just
+described.
+
+The heart of the explicit approach is to construct (or rewrite) the IR in a
+manner where the possible updates performed by the garbage collector are
+explicitly visible in the IR. Doing so requires that we:
+
+#. create a new SSA value for each potentially relocated pointer, and
+ ensure that no uses of the original (non relocated) value is
+ reachable after the safepoint,
+#. specify the relocation in a way which is opaque to the compiler to
+ ensure that the optimizer can not introduce new uses of an
+ unrelocated value after a statepoint. This prevents the optimizer
+ from performing unsound optimizations.
+#. recording a mapping of live pointers (and the allocation they're
+ associated with) for each statepoint.
+
+At the most abstract level, inserting a safepoint can be thought of as
+replacing a call instruction with a call to a multiple return value
+function which both calls the original target of the call, returns
+its result, and returns updated values for any live pointers to
+garbage collected objects.
+
+ Note that the task of identifying all live pointers to garbage
+ collected values, transforming the IR to expose a pointer giving the
+ base object for every such live pointer, and inserting all the
+ intrinsics correctly is explicitly out of scope for this document.
+ The recommended approach is to use the :ref:`utility passes
+ <statepoint-utilities>` described below.
+
+This abstract function call is concretely represented by a sequence of
+intrinsic calls known collectively as a "statepoint relocation sequence".
+
+Let's consider a simple call in LLVM IR:
+
+.. code-block:: llvm
+
+ define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
+ gc "statepoint-example" {
+ call void ()* @foo()
+ ret i8 addrspace(1)* %obj
+ }
+
+Depending on our language we may need to allow a safepoint during the execution
+of ``foo``. If so, we need to let the collector update local values in the
+current frame. If we don't, we'll be accessing a potential invalid reference
+once we eventually return from the call.
+
+In this example, we need to relocate the SSA value ``%obj``. Since we can't
+actually change the value in the SSA value ``%obj``, we need to introduce a new
+SSA value ``%obj.relocated`` which represents the potentially changed value of
+``%obj`` after the safepoint and update any following uses appropriately. The
+resulting relocation sequence is:
+
+.. code-block:: llvm
+
+ define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
+ gc "statepoint-example" {
+ %0 = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
+ %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %0, i32 7, i32 7)
+ ret i8 addrspace(1)* %obj.relocated
+ }
+
+Ideally, this sequence would have been represented as a M argument, N
+return value function (where M is the number of values being
+relocated + the original call arguments and N is the original return
+value + each relocated value), but LLVM does not easily support such a
+representation.
+
+Instead, the statepoint intrinsic marks the actual site of the
+safepoint or statepoint. The statepoint returns a token value (which
+exists only at compile time). To get back the original return value
+of the call, we use the ``gc.result`` intrinsic. To get the relocation
+of each pointer in turn, we use the ``gc.relocate`` intrinsic with the
+appropriate index. Note that both the ``gc.relocate`` and ``gc.result`` are
+tied to the statepoint. The combination forms a "statepoint relocation
+sequence" and represents the entirety of a parseable call or 'statepoint'.
+
+When lowered, this example would generate the following x86 assembly:
+
+.. code-block:: gas
+
+ .globl test1
+ .align 16, 0x90
+ pushq %rax
+ callq foo
+ .Ltmp1:
+ movq (%rsp), %rax # This load is redundant (oops!)
+ popq %rdx
+ retq
+
+Each of the potentially relocated values has been spilled to the
+stack, and a record of that location has been recorded to the
+:ref:`Stack Map section <stackmap-section>`. If the garbage collector
+needs to update any of these pointers during the call, it knows
+exactly what to change.
+
+The relevant parts of the StackMap section for our example are:
+
+.. code-block:: gas
+
+ # This describes the call site
+ # Stack Maps: callsite 2882400000
+ .quad 2882400000
+ .long .Ltmp1-test1
+ .short 0
+ # .. 8 entries skipped ..
+ # This entry describes the spill slot which is directly addressable
+ # off RSP with offset 0. Given the value was spilled with a pushq,
+ # that makes sense.
+ # Stack Maps: Loc 8: Direct RSP [encoding: .byte 2, .byte 8, .short 7, .int 0]
+ .byte 2
+ .byte 8
+ .short 7
+ .long 0
+
+This example was taken from the tests for the :ref:`RewriteStatepointsForGC`
+utility pass. As such, its full StackMap can be easily examined with the
+following command.
+
+.. code-block:: bash
+
+ opt -rewrite-statepoints-for-gc test/Transforms/RewriteStatepointsForGC/basics.ll -S | llc -debug-only=stackmaps
+
+Simplifications for Non-Relocating GCs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some of the complexity in the previous example is unnecessary for a
+non-relocating collector. While a non-relocating collector still needs the
+information about which location contain live references, it doesn't need to
+represent explicit relocations. As such, the previously described explicit
+lowering can be simplified to remove all of the ``gc.relocate`` intrinsic
+calls and leave uses in terms of the original reference value.
+
+Here's the explicit lowering for the previous example for a non-relocating
+collector:
+
+.. code-block:: llvm
+
+ define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
+ gc "statepoint-example" {
+ call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
+ ret i8 addrspace(1)* %obj
+ }
+
+Recording On Stack Regions
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In addition to the explicit relocation form previously described, the
+statepoint infrastructure also allows the listing of allocas within the gc
+pointer list. Allocas can be listed with or without additional explicit gc
+pointer values and relocations.
+
+An alloca in the gc region of the statepoint operand list will cause the
+address of the stack region to be listed in the stackmap for the statepoint.
+
+This mechanism can be used to describe explicit spill slots if desired. It
+then becomes the generator's responsibility to ensure that values are
+spill/filled to/from the alloca as needed on either side of the safepoint.
+Note that there is no way to indicate a corresponding base pointer for such
+an explicitly specified spill slot, so usage is restricted to values for
+which the associated collector can derive the object base from the pointer
+itself.
+
+This mechanism can be used to describe on stack objects containing
+references provided that the collector can map from the location on the
+stack to a heap map describing the internal layout of the references the
+collector needs to process.
+
+WARNING: At the moment, this alternate form is not well exercised. It is
+recommended to use this with caution and expect to have to fix a few bugs.
+In particular, the RewriteStatepointsForGC utility pass does not do
+anything for allocas today.
+
+Base & Derived Pointers
+^^^^^^^^^^^^^^^^^^^^^^^
+
+A "base pointer" is one which points to the starting address of an allocation
+(object). A "derived pointer" is one which is offset from a base pointer by
+some amount. When relocating objects, a garbage collector needs to be able
+to relocate each derived pointer associated with an allocation to the same
+offset from the new address.
+
+"Interior derived pointers" remain within the bounds of the allocation
+they're associated with. As a result, the base object can be found at
+runtime provided the bounds of allocations are known to the runtime system.
+
+"Exterior derived pointers" are outside the bounds of the associated object;
+they may even fall within *another* allocations address range. As a result,
+there is no way for a garbage collector to determine which allocation they
+are associated with at runtime and compiler support is needed.
+
+The ``gc.relocate`` intrinsic supports an explicit operand for describing the
+allocation associated with a derived pointer. This operand is frequently
+referred to as the base operand, but does not strictly speaking have to be
+a base pointer, but it does need to lie within the bounds of the associated
+allocation. Some collectors may require that the operand be an actual base
+pointer rather than merely an internal derived pointer. Note that during
+lowering both the base and derived pointer operands are required to be live
+over the associated call safepoint even if the base is otherwise unused
+afterwards.
+
+If we extend our previous example to include a pointless derived pointer,
+we get:
+
+.. code-block:: llvm
+
+ define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
+ gc "statepoint-example" {
+ %gep = getelementptr i8, i8 addrspace(1)* %obj, i64 20000
+ %token = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj, i8 addrspace(1)* %gep)
+ %obj.relocated = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %token, i32 7, i32 7)
+ %gep.relocated = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %token, i32 7, i32 8)
+ %p = getelementptr i8, i8 addrspace(1)* %gep, i64 -20000
+ ret i8 addrspace(1)* %p
+ }
+
+Note that in this example %p and %obj.relocate are the same address and we
+could replace one with the other, potentially removing the derived pointer
+from the live set at the safepoint entirely.
+
+.. _gc_transition_args:
+
+GC Transitions
+^^^^^^^^^^^^^^^^^^
+
+As a practical consideration, many garbage-collected systems allow code that is
+collector-aware ("managed code") to call code that is not collector-aware
+("unmanaged code"). It is common that such calls must also be safepoints, since
+it is desirable to allow the collector to run during the execution of
+unmanaged code. Furthermore, it is common that coordinating the transition from
+managed to unmanaged code requires extra code generation at the call site to
+inform the collector of the transition. In order to support these needs, a
+statepoint may be marked as a GC transition, and data that is necessary to
+perform the transition (if any) may be provided as additional arguments to the
+statepoint.
+
+ Note that although in many cases statepoints may be inferred to be GC
+ transitions based on the function symbols involved (e.g. a call from a
+ function with GC strategy "foo" to a function with GC strategy "bar"),
+ indirect calls that are also GC transitions must also be supported. This
+ requirement is the driving force behind the decision to require that GC
+ transitions are explicitly marked.
+
+Let's revisit the sample given above, this time treating the call to ``@foo``
+as a GC transition. Depending on our target, the transition code may need to
+access some extra state in order to inform the collector of the transition.
+Let's assume a hypothetical GC--somewhat unimaginatively named "hypothetical-gc"
+--that requires that a TLS variable must be written to before and after a call
+to unmanaged code. The resulting relocation sequence is:
+
+.. code-block:: llvm
+
+ @flag = thread_local global i32 0, align 4
+
+ define i8 addrspace(1)* @test1(i8 addrspace(1) *%obj)
+ gc "hypothetical-gc" {
+
+ %0 = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 1, i32* @Flag, i32 0, i8 addrspace(1)* %obj)
+ %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %0, i32 7, i32 7)
+ ret i8 addrspace(1)* %obj.relocated
+ }
+
+During lowering, this will result in a instruction selection DAG that looks
+something like:
+
+::
+
+ CALLSEQ_START
+ ...
+ GC_TRANSITION_START (lowered i32 *@Flag), SRCVALUE i32* Flag
+ STATEPOINT
+ GC_TRANSITION_END (lowered i32 *@Flag), SRCVALUE i32 *Flag
+ ...
+ CALLSEQ_END
+
+In order to generate the necessary transition code, the backend for each target
+supported by "hypothetical-gc" must be modified to lower ``GC_TRANSITION_START``
+and ``GC_TRANSITION_END`` nodes appropriately when the "hypothetical-gc"
+strategy is in use for a particular function. Assuming that such lowering has
+been added for X86, the generated assembly would be:
+
+.. code-block:: gas
+
+ .globl test1
+ .align 16, 0x90
+ pushq %rax
+ movl $1, %fs:Flag at TPOFF
+ callq foo
+ movl $0, %fs:Flag at TPOFF
+ .Ltmp1:
+ movq (%rsp), %rax # This load is redundant (oops!)
+ popq %rdx
+ retq
+
+Note that the design as presented above is not fully implemented: in particular,
+strategy-specific lowering is not present, and all GC transitions are emitted as
+as single no-op before and after the call instruction. These no-ops are often
+removed by the backend during dead machine instruction elimination.
+
+
+Intrinsics
+===========
+
+'llvm.experimental.gc.statepoint' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare token
+ @llvm.experimental.gc.statepoint(i64 <id>, i32 <num patch bytes>,
+ func_type <target>,
+ i64 <#call args>, i64 <flags>,
+ ... (call parameters),
+ i64 <# transition args>, ... (transition parameters),
+ i64 <# deopt args>, ... (deopt parameters),
+ ... (gc parameters))
+
+Overview:
+"""""""""
+
+The statepoint intrinsic represents a call which is parse-able by the
+runtime.
+
+Operands:
+"""""""""
+
+The 'id' operand is a constant integer that is reported as the ID
+field in the generated stackmap. LLVM does not interpret this
+parameter in any way and its meaning is up to the statepoint user to
+decide. Note that LLVM is free to duplicate code containing
+statepoint calls, and this may transform IR that had a unique 'id' per
+lexical call to statepoint to IR that does not.
+
+If 'num patch bytes' is non-zero then the call instruction
+corresponding to the statepoint is not emitted and LLVM emits 'num
+patch bytes' bytes of nops in its place. LLVM will emit code to
+prepare the function arguments and retrieve the function return value
+in accordance to the calling convention; the former before the nop
+sequence and the latter after the nop sequence. It is expected that
+the user will patch over the 'num patch bytes' bytes of nops with a
+calling sequence specific to their runtime before executing the
+generated machine code. There are no guarantees with respect to the
+alignment of the nop sequence. Unlike :doc:`StackMaps` statepoints do
+not have a concept of shadow bytes. Note that semantically the
+statepoint still represents a call or invoke to 'target', and the nop
+sequence after patching is expected to represent an operation
+equivalent to a call or invoke to 'target'.
+
+The 'target' operand is the function actually being called. The
+target can be specified as either a symbolic LLVM function, or as an
+arbitrary Value of appropriate function type. Note that the function
+type must match the signature of the callee and the types of the 'call
+parameters' arguments.
+
+The '#call args' operand is the number of arguments to the actual
+call. It must exactly match the number of arguments passed in the
+'call parameters' variable length section.
+
+The 'flags' operand is used to specify extra information about the
+statepoint. This is currently only used to mark certain statepoints
+as GC transitions. This operand is a 64-bit integer with the following
+layout, where bit 0 is the least significant bit:
+
+ +-------+---------------------------------------------------+
+ | Bit # | Usage |
+ +=======+===================================================+
+ | 0 | Set if the statepoint is a GC transition, cleared |
+ | | otherwise. |
+ +-------+---------------------------------------------------+
+ | 1-63 | Reserved for future use; must be cleared. |
+ +-------+---------------------------------------------------+
+
+The 'call parameters' arguments are simply the arguments which need to
+be passed to the call target. They will be lowered according to the
+specified calling convention and otherwise handled like a normal call
+instruction. The number of arguments must exactly match what is
+specified in '# call args'. The types must match the signature of
+'target'.
+
+The 'transition parameters' arguments contain an arbitrary list of
+Values which need to be passed to GC transition code. They will be
+lowered and passed as operands to the appropriate GC_TRANSITION nodes
+in the selection DAG. It is assumed that these arguments must be
+available before and after (but not necessarily during) the execution
+of the callee. The '# transition args' field indicates how many operands
+are to be interpreted as 'transition parameters'.
+
+The 'deopt parameters' arguments contain an arbitrary list of Values
+which is meaningful to the runtime. The runtime may read any of these
+values, but is assumed not to modify them. If the garbage collector
+might need to modify one of these values, it must also be listed in
+the 'gc pointer' argument list. The '# deopt args' field indicates
+how many operands are to be interpreted as 'deopt parameters'.
+
+The 'gc parameters' arguments contain every pointer to a garbage
+collector object which potentially needs to be updated by the garbage
+collector. Note that the argument list must explicitly contain a base
+pointer for every derived pointer listed. The order of arguments is
+unimportant. Unlike the other variable length parameter sets, this
+list is not length prefixed.
+
+Semantics:
+""""""""""
+
+A statepoint is assumed to read and write all memory. As a result,
+memory operations can not be reordered past a statepoint. It is
+illegal to mark a statepoint as being either 'readonly' or 'readnone'.
+
+Note that legal IR can not perform any memory operation on a 'gc
+pointer' argument of the statepoint in a location statically reachable
+from the statepoint. Instead, the explicitly relocated value (from a
+``gc.relocate``) must be used.
+
+'llvm.experimental.gc.result' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare type*
+ @llvm.experimental.gc.result(token %statepoint_token)
+
+Overview:
+"""""""""
+
+``gc.result`` extracts the result of the original call instruction
+which was replaced by the ``gc.statepoint``. The ``gc.result``
+intrinsic is actually a family of three intrinsics due to an
+implementation limitation. Other than the type of the return value,
+the semantics are the same.
+
+Operands:
+"""""""""
+
+The first and only argument is the ``gc.statepoint`` which starts
+the safepoint sequence of which this ``gc.result`` is a part.
+Despite the typing of this as a generic token, *only* the value defined
+by a ``gc.statepoint`` is legal here.
+
+Semantics:
+""""""""""
+
+The ``gc.result`` represents the return value of the call target of
+the ``statepoint``. The type of the ``gc.result`` must exactly match
+the type of the target. If the call target returns void, there will
+be no ``gc.result``.
+
+A ``gc.result`` is modeled as a 'readnone' pure function. It has no
+side effects since it is just a projection of the return value of the
+previous call represented by the ``gc.statepoint``.
+
+'llvm.experimental.gc.relocate' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+ declare <pointer type>
+ @llvm.experimental.gc.relocate(token %statepoint_token,
+ i32 %base_offset,
+ i32 %pointer_offset)
+
+Overview:
+"""""""""
+
+A ``gc.relocate`` returns the potentially relocated value of a pointer
+at the safepoint.
+
+Operands:
+"""""""""
+
+The first argument is the ``gc.statepoint`` which starts the
+safepoint sequence of which this ``gc.relocation`` is a part.
+Despite the typing of this as a generic token, *only* the value defined
+by a ``gc.statepoint`` is legal here.
+
+The second argument is an index into the statepoints list of arguments
+which specifies the allocation for the pointer being relocated.
+This index must land within the 'gc parameter' section of the
+statepoint's argument list. The associated value must be within the
+object with which the pointer being relocated is associated. The optimizer
+is free to change *which* interior derived pointer is reported, provided that
+it does not replace an actual base pointer with another interior derived
+pointer. Collectors are allowed to rely on the base pointer operand
+remaining an actual base pointer if so constructed.
+
+The third argument is an index into the statepoint's list of arguments
+which specify the (potentially) derived pointer being relocated. It
+is legal for this index to be the same as the second argument
+if-and-only-if a base pointer is being relocated. This index must land
+within the 'gc parameter' section of the statepoint's argument list.
+
+Semantics:
+""""""""""
+
+The return value of ``gc.relocate`` is the potentially relocated value
+of the pointer specified by its arguments. It is unspecified how the
+value of the returned pointer relates to the argument to the
+``gc.statepoint`` other than that a) it points to the same source
+language object with the same offset, and b) the 'based-on'
+relationship of the newly relocated pointers is a projection of the
+unrelocated pointers. In particular, the integer value of the pointer
+returned is unspecified.
+
+A ``gc.relocate`` is modeled as a ``readnone`` pure function. It has no
+side effects since it is just a way to extract information about work
+done during the actual call modeled by the ``gc.statepoint``.
+
+.. _statepoint-stackmap-format:
+
+Stack Map Format
+================
+
+Locations for each pointer value which may need read and/or updated by
+the runtime or collector are provided in a separate section of the
+generated object file as specified in the PatchPoint documentation.
+This special section is encoded per the
+:ref:`Stack Map format <stackmap-format>`.
+
+The general expectation is that a JIT compiler will parse and discard this
+format; it is not particularly memory efficient. If you need an alternate
+format (e.g. for an ahead of time compiler), see discussion under
+:ref: `open work items <OpenWork>` below.
+
+Each statepoint generates the following Locations:
+
+* Constant which describes the calling convention of the call target. This
+ constant is a valid :ref:`calling convention identifier <callingconv>` for
+ the version of LLVM used to generate the stackmap. No additional compatibility
+ guarantees are made for this constant over what LLVM provides elsewhere w.r.t.
+ these identifiers.
+* Constant which describes the flags passed to the statepoint intrinsic
+* Constant which describes number of following deopt *Locations* (not
+ operands)
+* Variable number of Locations, one for each deopt parameter listed in
+ the IR statepoint (same number as described by previous Constant). At
+ the moment, only deopt parameters with a bitwidth of 64 bits or less
+ are supported. Values of a type larger than 64 bits can be specified
+ and reported only if a) the value is constant at the call site, and b)
+ the constant can be represented with less than 64 bits (assuming zero
+ extension to the original bitwidth).
+* Variable number of relocation records, each of which consists of
+ exactly two Locations. Relocation records are described in detail
+ below.
+
+Each relocation record provides sufficient information for a collector to
+relocate one or more derived pointers. Each record consists of a pair of
+Locations. The second element in the record represents the pointer (or
+pointers) which need updated. The first element in the record provides a
+pointer to the base of the object with which the pointer(s) being relocated is
+associated. This information is required for handling generalized derived
+pointers since a pointer may be outside the bounds of the original allocation,
+but still needs to be relocated with the allocation. Additionally:
+
+* It is guaranteed that the base pointer must also appear explicitly as a
+ relocation pair if used after the statepoint.
+* There may be fewer relocation records then gc parameters in the IR
+ statepoint. Each *unique* pair will occur at least once; duplicates
+ are possible.
+* The Locations within each record may either be of pointer size or a
+ multiple of pointer size. In the later case, the record must be
+ interpreted as describing a sequence of pointers and their corresponding
+ base pointers. If the Location is of size N x sizeof(pointer), then
+ there will be N records of one pointer each contained within the Location.
+ Both Locations in a pair can be assumed to be of the same size.
+
+Note that the Locations used in each section may describe the same
+physical location. e.g. A stack slot may appear as a deopt location,
+a gc base pointer, and a gc derived pointer.
+
+The LiveOut section of the StkMapRecord will be empty for a statepoint
+record.
+
+Safepoint Semantics & Verification
+==================================
+
+The fundamental correctness property for the compiled code's
+correctness w.r.t. the garbage collector is a dynamic one. It must be
+the case that there is no dynamic trace such that a operation
+involving a potentially relocated pointer is observably-after a
+safepoint which could relocate it. 'observably-after' is this usage
+means that an outside observer could observe this sequence of events
+in a way which precludes the operation being performed before the
+safepoint.
+
+To understand why this 'observable-after' property is required,
+consider a null comparison performed on the original copy of a
+relocated pointer. Assuming that control flow follows the safepoint,
+there is no way to observe externally whether the null comparison is
+performed before or after the safepoint. (Remember, the original
+Value is unmodified by the safepoint.) The compiler is free to make
+either scheduling choice.
+
+The actual correctness property implemented is slightly stronger than
+this. We require that there be no *static path* on which a
+potentially relocated pointer is 'observably-after' it may have been
+relocated. This is slightly stronger than is strictly necessary (and
+thus may disallow some otherwise valid programs), but greatly
+simplifies reasoning about correctness of the compiled code.
+
+By construction, this property will be upheld by the optimizer if
+correctly established in the source IR. This is a key invariant of
+the design.
+
+The existing IR Verifier pass has been extended to check most of the
+local restrictions on the intrinsics mentioned in their respective
+documentation. The current implementation in LLVM does not check the
+key relocation invariant, but this is ongoing work on developing such
+a verifier. Please ask on llvm-dev if you're interested in
+experimenting with the current version.
+
+.. _statepoint-utilities:
+
+Utility Passes for Safepoint Insertion
+======================================
+
+.. _RewriteStatepointsForGC:
+
+RewriteStatepointsForGC
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The pass RewriteStatepointsForGC transforms a function's IR to lower from the
+abstract machine model described above to the explicit statepoint model of
+relocations. To do this, it replaces all calls or invokes of functions which
+might contain a safepoint poll with a ``gc.statepoint`` and associated full
+relocation sequence, including all required ``gc.relocates``.
+
+Note that by default, this pass only runs for the "statepoint-example" or
+"core-clr" gc strategies. You will need to add your custom strategy to this
+whitelist or use one of the predefined ones.
+
+As an example, given this code:
+
+.. code-block:: llvm
+
+ define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
+ gc "statepoint-example" {
+ call void @foo()
+ ret i8 addrspace(1)* %obj
+ }
+
+The pass would produce this IR:
+
+.. code-block:: llvm
+
+ define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
+ gc "statepoint-example" {
+ %0 = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
+ %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %0, i32 12, i32 12)
+ ret i8 addrspace(1)* %obj.relocated
+ }
+
+In the above examples, the addrspace(1) marker on the pointers is the mechanism
+that the ``statepoint-example`` GC strategy uses to distinguish references from
+non references. The pass assumes that all addrspace(1) pointers are non-integral
+pointer types. Address space 1 is not globally reserved for this purpose.
+
+This pass can be used an utility function by a language frontend that doesn't
+want to manually reason about liveness, base pointers, or relocation when
+constructing IR. As currently implemented, RewriteStatepointsForGC must be
+run after SSA construction (i.e. mem2ref).
+
+RewriteStatepointsForGC will ensure that appropriate base pointers are listed
+for every relocation created. It will do so by duplicating code as needed to
+propagate the base pointer associated with each pointer being relocated to
+the appropriate safepoints. The implementation assumes that the following
+IR constructs produce base pointers: loads from the heap, addresses of global
+variables, function arguments, function return values. Constant pointers (such
+as null) are also assumed to be base pointers. In practice, this constraint
+can be relaxed to producing interior derived pointers provided the target
+collector can find the associated allocation from an arbitrary interior
+derived pointer.
+
+By default RewriteStatepointsForGC passes in ``0xABCDEF00`` as the statepoint
+ID and ``0`` as the number of patchable bytes to the newly constructed
+``gc.statepoint``. These values can be configured on a per-callsite
+basis using the attributes ``"statepoint-id"`` and
+``"statepoint-num-patch-bytes"``. If a call site is marked with a
+``"statepoint-id"`` function attribute and its value is a positive
+integer (represented as a string), then that value is used as the ID
+of the newly constructed ``gc.statepoint``. If a call site is marked
+with a ``"statepoint-num-patch-bytes"`` function attribute and its
+value is a positive integer, then that value is used as the 'num patch
+bytes' parameter of the newly constructed ``gc.statepoint``. The
+``"statepoint-id"`` and ``"statepoint-num-patch-bytes"`` attributes
+are not propagated to the ``gc.statepoint`` call or invoke if they
+could be successfully parsed.
+
+In practice, RewriteStatepointsForGC should be run much later in the pass
+pipeline, after most optimization is already done. This helps to improve
+the quality of the generated code when compiled with garbage collection support.
+
+.. _PlaceSafepoints:
+
+PlaceSafepoints
+^^^^^^^^^^^^^^^^
+
+The pass PlaceSafepoints inserts safepoint polls sufficient to ensure running
+code checks for a safepoint request on a timely manner. This pass is expected
+to be run before RewriteStatepointsForGC and thus does not produce full
+relocation sequences.
+
+As an example, given input IR of the following:
+
+.. code-block:: llvm
+
+ define void @test() gc "statepoint-example" {
+ call void @foo()
+ ret void
+ }
+
+ declare void @do_safepoint()
+ define void @gc.safepoint_poll() {
+ call void @do_safepoint()
+ ret void
+ }
+
+
+This pass would produce the following IR:
+
+.. code-block:: llvm
+
+ define void @test() gc "statepoint-example" {
+ call void @do_safepoint()
+ call void @foo()
+ ret void
+ }
+
+In this case, we've added an (unconditional) entry safepoint poll. Note that
+despite appearances, the entry poll is not necessarily redundant. We'd have to
+know that ``foo`` and ``test`` were not mutually recursive for the poll to be
+redundant. In practice, you'd probably want to your poll definition to contain
+a conditional branch of some form.
+
+At the moment, PlaceSafepoints can insert safepoint polls at method entry and
+loop backedges locations. Extending this to work with return polls would be
+straight forward if desired.
+
+PlaceSafepoints includes a number of optimizations to avoid placing safepoint
+polls at particular sites unless needed to ensure timely execution of a poll
+under normal conditions. PlaceSafepoints does not attempt to ensure timely
+execution of a poll under worst case conditions such as heavy system paging.
+
+The implementation of a safepoint poll action is specified by looking up a
+function of the name ``gc.safepoint_poll`` in the containing Module. The body
+of this function is inserted at each poll site desired. While calls or invokes
+inside this method are transformed to a ``gc.statepoints``, recursive poll
+insertion is not performed.
+
+This pass is useful for any language frontend which only has to support
+garbage collection semantics at safepoints. If you need other abstract
+frame information at safepoints (e.g. for deoptimization or introspection),
+you can insert safepoint polls in the frontend. If you have the later case,
+please ask on llvm-dev for suggestions. There's been a good amount of work
+done on making such a scheme work well in practice which is not yet documented
+here.
+
+
+Supported Architectures
+=======================
+
+Support for statepoint generation requires some code for each backend.
+Today, only X86_64 is supported.
+
+.. _OpenWork:
+
+Limitations and Half Baked Ideas
+================================
+
+Mixing References and Raw Pointers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Support for languages which allow unmanaged pointers to garbage collected
+objects (i.e. pass a pointer to an object to a C routine) in the abstract
+machine model. At the moment, the best idea on how to approach this
+involves an intrinsic or opaque function which hides the connection between
+the reference value and the raw pointer. The problem is that having a
+ptrtoint or inttoptr cast (which is common for such use cases) breaks the
+rules used for inferring base pointers for arbitrary references when
+lowering out of the abstract model to the explicit physical model. Note
+that a frontend which lowers directly to the physical model doesn't have
+any problems here.
+
+Objects on the Stack
+^^^^^^^^^^^^^^^^^^^^
+
+As noted above, the explicit lowering supports objects allocated on the
+stack provided the collector can find a heap map given the stack address.
+
+The missing pieces are a) integration with rewriting (RS4GC) from the
+abstract machine model and b) support for optionally decomposing on stack
+objects so as not to require heap maps for them. The later is required
+for ease of integration with some collectors.
+
+Lowering Quality and Representation Overhead
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The current statepoint lowering is known to be somewhat poor. In the very
+long term, we'd like to integrate statepoints with the register allocator;
+in the near term this is unlikely to happen. We've found the quality of
+lowering to be relatively unimportant as hot-statepoints are almost always
+inliner bugs.
+
+Concerns have been raised that the statepoint representation results in a
+large amount of IR being produced for some examples and that this
+contributes to higher than expected memory usage and compile times. There's
+no immediate plans to make changes due to this, but alternate models may be
+explored in the future.
+
+Relocations Along Exceptional Edges
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Relocations along exceptional paths are currently broken in ToT. In
+particular, there is current no way to represent a rethrow on a path which
+also has relocations. See `this llvm-dev discussion
+<https://groups.google.com/forum/#!topic/llvm-dev/AE417XjgxvI>`_ for more
+detail.
+
+Support for alternate stackmap formats
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For some use cases, it is
+desirable to directly encode a final memory efficient stackmap format for
+use by the runtime. This is particularly relevant for ahead of time
+compilers which wish to directly link object files without the need for
+post processing of each individual object file. While not implemented
+today for statepoints, there is precedent for a GCStrategy to be able to
+select a customer GCMetataPrinter for this purpose. Patches to enable
+this functionality upstream are welcome.
+
+Bugs and Enhancements
+=====================
+
+Currently known bugs and enhancements under consideration can be
+tracked by performing a `bugzilla search
+<https://bugs.llvm.org/buglist.cgi?cmdtype=runnamed&namedcmd=Statepoint%20Bugs&list_id=64342>`_
+for [Statepoint] in the summary field. When filing new bugs, please
+use this tag so that interested parties see the newly filed bug. As
+with most LLVM features, design discussions take place on `llvm-dev
+<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_, and patches
+should be sent to `llvm-commits
+<http://lists.llvm.org/mailman/listinfo/llvm-commits>`_ for review.
+
Added: www-releases/trunk/8.0.1/docs/_sources/SupportLibrary.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/SupportLibrary.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/SupportLibrary.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/SupportLibrary.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,246 @@
+===============
+Support Library
+===============
+
+Abstract
+========
+
+This document provides some details on LLVM's Support Library, located in the
+source at ``lib/Support`` and ``include/llvm/Support``. The library's purpose
+is to shield LLVM from the differences between operating systems for the few
+services LLVM needs from the operating system. Much of LLVM is written using
+portability features of standard C++. However, in a few areas, system dependent
+facilities are needed and the Support Library is the wrapper around those
+system calls.
+
+By centralizing LLVM's use of operating system interfaces, we make it possible
+for the LLVM tool chain and runtime libraries to be more easily ported to new
+platforms since (theoretically) only ``lib/Support`` needs to be ported. This
+library also unclutters the rest of LLVM from #ifdef use and special cases for
+specific operating systems. Such uses are replaced with simple calls to the
+interfaces provided in ``include/llvm/Support``.
+
+Note that the Support Library is not intended to be a complete operating system
+wrapper (such as the Adaptive Communications Environment (ACE) or Apache
+Portable Runtime (APR)), but only provides the functionality necessary to
+support LLVM.
+
+The Support Library was originally referred to as the System Library, written
+by Reid Spencer who formulated the design based on similar work originating
+from the eXtensible Programming System (XPS). Several people helped with the
+effort; especially, Jeff Cohen and Henrik Bach on the Win32 port.
+
+Keeping LLVM Portable
+=====================
+
+In order to keep LLVM portable, LLVM developers should adhere to a set of
+portability rules associated with the Support Library. Adherence to these rules
+should help the Support Library achieve its goal of shielding LLVM from the
+variations in operating system interfaces and doing so efficiently. The
+following sections define the rules needed to fulfill this objective.
+
+Don't Include System Headers
+----------------------------
+
+Except in ``lib/Support``, no LLVM source code should directly ``#include`` a
+system header. Care has been taken to remove all such ``#includes`` from LLVM
+while ``lib/Support`` was being developed. Specifically this means that header
+files like "``unistd.h``", "``windows.h``", "``stdio.h``", and "``string.h``"
+are forbidden to be included by LLVM source code outside the implementation of
+``lib/Support``.
+
+To obtain system-dependent functionality, existing interfaces to the system
+found in ``include/llvm/Support`` should be used. If an appropriate interface is
+not available, it should be added to ``include/llvm/Support`` and implemented in
+``lib/Support`` for all supported platforms.
+
+Don't Expose System Headers
+---------------------------
+
+The Support Library must shield LLVM from **all** system headers. To obtain
+system level functionality, LLVM source must
+``#include "llvm/Support/Thing.h"`` and nothing else. This means that
+``Thing.h`` cannot expose any system header files. This protects LLVM from
+accidentally using system specific functionality and only allows it via
+the ``lib/Support`` interface.
+
+Use Standard C Headers
+----------------------
+
+The **standard** C headers (the ones beginning with "c") are allowed to be
+exposed through the ``lib/Support`` interface. These headers and the things they
+declare are considered to be platform agnostic. LLVM source files may include
+them directly or obtain their inclusion through ``lib/Support`` interfaces.
+
+Use Standard C++ Headers
+------------------------
+
+The **standard** C++ headers from the standard C++ library and standard
+template library may be exposed through the ``lib/Support`` interface. These
+headers and the things they declare are considered to be platform agnostic.
+LLVM source files may include them or obtain their inclusion through
+``lib/Support`` interfaces.
+
+High Level Interface
+--------------------
+
+The entry points specified in the interface of ``lib/Support`` must be aimed at
+completing some reasonably high level task needed by LLVM. We do not want to
+simply wrap each operating system call. It would be preferable to wrap several
+operating system calls that are always used in conjunction with one another by
+LLVM.
+
+For example, consider what is needed to execute a program, wait for it to
+complete, and return its result code. On Unix, this involves the following
+operating system calls: ``getenv``, ``fork``, ``execve``, and ``wait``. The
+correct thing for ``lib/Support`` to provide is a function, say
+``ExecuteProgramAndWait``, that implements the functionality completely. what
+we don't want is wrappers for the operating system calls involved.
+
+There must **not** be a one-to-one relationship between operating system
+calls and the Support library's interface. Any such interface function will be
+suspicious.
+
+No Unused Functionality
+-----------------------
+
+There must be no functionality specified in the interface of ``lib/Support``
+that isn't actually used by LLVM. We're not writing a general purpose operating
+system wrapper here, just enough to satisfy LLVM's needs. And, LLVM doesn't
+need much. This design goal aims to keep the ``lib/Support`` interface small and
+understandable which should foster its actual use and adoption.
+
+No Duplicate Implementations
+----------------------------
+
+The implementation of a function for a given platform must be written exactly
+once. This implies that it must be possible to apply a function's
+implementation to multiple operating systems if those operating systems can
+share the same implementation. This rule applies to the set of operating
+systems supported for a given class of operating system (e.g. Unix, Win32).
+
+No Virtual Methods
+------------------
+
+The Support Library interfaces can be called quite frequently by LLVM. In order
+to make those calls as efficient as possible, we discourage the use of virtual
+methods. There is no need to use inheritance for implementation differences, it
+just adds complexity. The ``#include`` mechanism works just fine.
+
+No Exposed Functions
+--------------------
+
+Any functions defined by system libraries (i.e. not defined by ``lib/Support``)
+must not be exposed through the ``lib/Support`` interface, even if the header
+file for that function is not exposed. This prevents inadvertent use of system
+specific functionality.
+
+For example, the ``stat`` system call is notorious for having variations in the
+data it provides. ``lib/Support`` must not declare ``stat`` nor allow it to be
+declared. Instead it should provide its own interface to discovering
+information about files and directories. Those interfaces may be implemented in
+terms of ``stat`` but that is strictly an implementation detail. The interface
+provided by the Support Library must be implemented on all platforms (even
+those without ``stat``).
+
+No Exposed Data
+---------------
+
+Any data defined by system libraries (i.e. not defined by ``lib/Support``) must
+not be exposed through the ``lib/Support`` interface, even if the header file
+for that function is not exposed. As with functions, this prevents inadvertent
+use of data that might not exist on all platforms.
+
+Minimize Soft Errors
+--------------------
+
+Operating system interfaces will generally provide error results for every
+little thing that could go wrong. In almost all cases, you can divide these
+error results into two groups: normal/good/soft and abnormal/bad/hard. That is,
+some of the errors are simply information like "file not found", "insufficient
+privileges", etc. while other errors are much harder like "out of space", "bad
+disk sector", or "system call interrupted". We'll call the first group "*soft*"
+errors and the second group "*hard*" errors.
+
+``lib/Support`` must always attempt to minimize soft errors. This is a design
+requirement because the minimization of soft errors can affect the granularity
+and the nature of the interface. In general, if you find that you're wanting to
+throw soft errors, you must review the granularity of the interface because it
+is likely you're trying to implement something that is too low level. The rule
+of thumb is to provide interface functions that **can't** fail, except when
+faced with hard errors.
+
+For a trivial example, suppose we wanted to add an "``OpenFileForWriting``"
+function. For many operating systems, if the file doesn't exist, attempting to
+open the file will produce an error. However, ``lib/Support`` should not simply
+throw that error if it occurs because its a soft error. The problem is that the
+interface function, ``OpenFileForWriting`` is too low level. It should be
+``OpenOrCreateFileForWriting``. In the case of the soft "doesn't exist" error,
+this function would just create it and then open it for writing.
+
+This design principle needs to be maintained in ``lib/Support`` because it
+avoids the propagation of soft error handling throughout the rest of LLVM.
+Hard errors will generally just cause a termination for an LLVM tool so don't
+be bashful about throwing them.
+
+Rules of thumb:
+
+#. Don't throw soft errors, only hard errors.
+
+#. If you're tempted to throw a soft error, re-think the interface.
+
+#. Handle internally the most common normal/good/soft error conditions
+ so the rest of LLVM doesn't have to.
+
+No throw Specifications
+-----------------------
+
+None of the ``lib/Support`` interface functions may be declared with C++
+``throw()`` specifications on them. This requirement makes sure that the
+compiler does not insert additional exception handling code into the interface
+functions. This is a performance consideration: ``lib/Support`` functions are
+at the bottom of many call chains and as such can be frequently called. We
+need them to be as efficient as possible. However, no routines in the system
+library should actually throw exceptions.
+
+Code Organization
+-----------------
+
+Implementations of the Support Library interface are separated by their general
+class of operating system. Currently only Unix and Win32 classes are defined
+but more could be added for other operating system classifications. To
+distinguish which implementation to compile, the code in ``lib/Support`` uses
+the ``LLVM_ON_UNIX`` and ``_WIN32`` ``#defines``. Each source file in
+``lib/Support``, after implementing the generic (operating system independent)
+functionality needs to include the correct implementation using a set of
+``#if defined(LLVM_ON_XYZ)`` directives. For example, if we had
+``lib/Support/Path.cpp``, we'd expect to see in that file:
+
+.. code-block:: c++
+
+ #if defined(LLVM_ON_UNIX)
+ #include "Unix/Path.inc"
+ #endif
+ #if defined(_WIN32)
+ #include "Windows/Path.inc"
+ #endif
+
+The implementation in ``lib/Support/Unix/Path.inc`` should handle all Unix
+variants. The implementation in ``lib/Support/Windows/Path.inc`` should handle
+all Windows variants. What this does is quickly inc the basic class
+of operating system that will provide the implementation. The specific details
+for a given platform must still be determined through the use of ``#ifdef``.
+
+Consistent Semantics
+--------------------
+
+The implementation of a ``lib/Support`` interface can vary drastically between
+platforms. That's okay as long as the end result of the interface function is
+the same. For example, a function to create a directory is pretty straight
+forward on all operating system. System V IPC on the other hand isn't even
+supported on all platforms. Instead of "supporting" System V IPC,
+``lib/Support`` should provide an interface to the basic concept of
+inter-process communications. The implementations might use System V IPC if
+that was available or named pipes, or whatever gets the job done effectively
+for a given operating system. In all cases, the interface and the
+implementation must be semantically consistent.
Added: www-releases/trunk/8.0.1/docs/_sources/SystemLibrary.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/SystemLibrary.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/SystemLibrary.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/SystemLibrary.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,9 @@
+==============
+System Library
+==============
+
+Moved
+=====
+
+The System Library has been renamed to Support Library with documentation
+available at :doc:`SupportLibrary`. Please, change your links to that page.
Added: www-releases/trunk/8.0.1/docs/_sources/TableGen/BackEnds.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/TableGen/BackEnds.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/TableGen/BackEnds.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/TableGen/BackEnds.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,571 @@
+=================
+TableGen BackEnds
+=================
+
+.. contents::
+ :local:
+
+Introduction
+============
+
+TableGen backends are at the core of TableGen's functionality. The source files
+provide the semantics to a generated (in memory) structure, but it's up to the
+backend to print this out in a way that is meaningful to the user (normally a
+C program including a file or a textual list of warnings, options and error
+messages).
+
+TableGen is used by both LLVM and Clang with very different goals. LLVM uses it
+as a way to automate the generation of massive amounts of information regarding
+instructions, schedules, cores and architecture features. Some backends generate
+output that is consumed by more than one source file, so they need to be created
+in a way that is easy to use pre-processor tricks. Some backends can also print
+C code structures, so that they can be directly included as-is.
+
+Clang, on the other hand, uses it mainly for diagnostic messages (errors,
+warnings, tips) and attributes, so more on the textual end of the scale.
+
+LLVM BackEnds
+=============
+
+.. warning::
+ This document is raw. Each section below needs three sub-sections: description
+ of its purpose with a list of users, output generated from generic input, and
+ finally why it needed a new backend (in case there's something similar).
+
+Overall, each backend will take the same TableGen file type and transform into
+similar output for different targets/uses. There is an implicit contract between
+the TableGen files, the back-ends and their users.
+
+For instance, a global contract is that each back-end produces macro-guarded
+sections. Based on whether the file is included by a header or a source file,
+or even in which context of each file the include is being used, you have
+todefine a macro just before including it, to get the right output:
+
+.. code-block:: c++
+
+ #define GET_REGINFO_TARGET_DESC
+ #include "ARMGenRegisterInfo.inc"
+
+And just part of the generated file would be included. This is useful if
+you need the same information in multiple formats (instantiation, initialization,
+getter/setter functions, etc) from the same source TableGen file without having
+to re-compile the TableGen file multiple times.
+
+Sometimes, multiple macros might be defined before the same include file to
+output multiple blocks:
+
+.. code-block:: c++
+
+ #define GET_REGISTER_MATCHER
+ #define GET_SUBTARGET_FEATURE_NAME
+ #define GET_MATCHER_IMPLEMENTATION
+ #include "ARMGenAsmMatcher.inc"
+
+The macros will be undef'd automatically as they're used, in the include file.
+
+On all LLVM back-ends, the ``llvm-tblgen`` binary will be executed on the root
+TableGen file ``<Target>.td``, which should include all others. This guarantees
+that all information needed is accessible, and that no duplication is needed
+in the TableGen files.
+
+CodeEmitter
+-----------
+
+**Purpose**: CodeEmitterGen uses the descriptions of instructions and their fields to
+construct an automated code emitter: a function that, given a MachineInstr,
+returns the (currently, 32-bit unsigned) value of the instruction.
+
+**Output**: C++ code, implementing the target's CodeEmitter
+class by overriding the virtual functions as ``<Target>CodeEmitter::function()``.
+
+**Usage**: Used to include directly at the end of ``<Target>MCCodeEmitter.cpp``.
+
+RegisterInfo
+------------
+
+**Purpose**: This tablegen backend is responsible for emitting a description of a target
+register file for a code generator. It uses instances of the Register,
+RegisterAliases, and RegisterClass classes to gather this information.
+
+**Output**: C++ code with enums and structures representing the register mappings,
+properties, masks, etc.
+
+**Usage**: Both on ``<Target>BaseRegisterInfo`` and ``<Target>MCTargetDesc`` (headers
+and source files) with macros defining in which they are for declaration vs.
+initialization issues.
+
+InstrInfo
+---------
+
+**Purpose**: This tablegen backend is responsible for emitting a description of the target
+instruction set for the code generator. (what are the differences from CodeEmitter?)
+
+**Output**: C++ code with enums and structures representing the instruction mappings,
+properties, masks, etc.
+
+**Usage**: Both on ``<Target>BaseInstrInfo`` and ``<Target>MCTargetDesc`` (headers
+and source files) with macros defining in which they are for declaration vs.
+initialization issues.
+
+AsmWriter
+---------
+
+**Purpose**: Emits an assembly printer for the current target.
+
+**Output**: Implementation of ``<Target>InstPrinter::printInstruction()``, among
+other things.
+
+**Usage**: Included directly into ``InstPrinter/<Target>InstPrinter.cpp``.
+
+AsmMatcher
+----------
+
+**Purpose**: Emits a target specifier matcher for
+converting parsed assembly operands in the MCInst structures. It also
+emits a matcher for custom operand parsing. Extensive documentation is
+written on the ``AsmMatcherEmitter.cpp`` file.
+
+**Output**: Assembler parsers' matcher functions, declarations, etc.
+
+**Usage**: Used in back-ends' ``AsmParser/<Target>AsmParser.cpp`` for
+building the AsmParser class.
+
+Disassembler
+------------
+
+**Purpose**: Contains disassembler table emitters for various
+architectures. Extensive documentation is written on the
+``DisassemblerEmitter.cpp`` file.
+
+**Output**: Decoding tables, static decoding functions, etc.
+
+**Usage**: Directly included in ``Disassembler/<Target>Disassembler.cpp``
+to cater for all default decodings, after all hand-made ones.
+
+PseudoLowering
+--------------
+
+**Purpose**: Generate pseudo instruction lowering.
+
+**Output**: Implements ``<Target>AsmPrinter::emitPseudoExpansionLowering()``.
+
+**Usage**: Included directly into ``<Target>AsmPrinter.cpp``.
+
+CallingConv
+-----------
+
+**Purpose**: Responsible for emitting descriptions of the calling
+conventions supported by this target.
+
+**Output**: Implement static functions to deal with calling conventions
+chained by matching styles, returning false on no match.
+
+**Usage**: Used in ISelLowering and FastIsel as function pointers to
+implementation returned by a CC selection function.
+
+DAGISel
+-------
+
+**Purpose**: Generate a DAG instruction selector.
+
+**Output**: Creates huge functions for automating DAG selection.
+
+**Usage**: Included in ``<Target>ISelDAGToDAG.cpp`` inside the target's
+implementation of ``SelectionDAGISel``.
+
+DFAPacketizer
+-------------
+
+**Purpose**: This class parses the Schedule.td file and produces an API that
+can be used to reason about whether an instruction can be added to a packet
+on a VLIW architecture. The class internally generates a deterministic finite
+automaton (DFA) that models all possible mappings of machine instructions
+to functional units as instructions are added to a packet.
+
+**Output**: Scheduling tables for GPU back-ends (Hexagon, AMD).
+
+**Usage**: Included directly on ``<Target>InstrInfo.cpp``.
+
+FastISel
+--------
+
+**Purpose**: This tablegen backend emits code for use by the "fast"
+instruction selection algorithm. See the comments at the top of
+lib/CodeGen/SelectionDAG/FastISel.cpp for background. This file
+scans through the target's tablegen instruction-info files
+and extracts instructions with obvious-looking patterns, and it emits
+code to look up these instructions by type and operator.
+
+**Output**: Generates ``Predicate`` and ``FastEmit`` methods.
+
+**Usage**: Implements private methods of the targets' implementation
+of ``FastISel`` class.
+
+Subtarget
+---------
+
+**Purpose**: Generate subtarget enumerations.
+
+**Output**: Enums, globals, local tables for sub-target information.
+
+**Usage**: Populates ``<Target>Subtarget`` and
+``MCTargetDesc/<Target>MCTargetDesc`` files (both headers and source).
+
+Intrinsic
+---------
+
+**Purpose**: Generate (target) intrinsic information.
+
+OptParserDefs
+-------------
+
+**Purpose**: Print enum values for a class.
+
+SearchableTables
+----------------
+
+**Purpose**: Generate custom searchable tables.
+
+**Output**: Enums, global tables and lookup helper functions.
+
+**Usage**: This backend allows generating free-form, target-specific tables
+from TableGen records. The ARM and AArch64 targets use this backend to generate
+tables of system registers; the AMDGPU target uses it to generate meta-data
+about complex image and memory buffer instructions.
+
+More documentation is available in ``include/llvm/TableGen/SearchableTable.td``,
+which also contains the definitions of TableGen classes which must be
+instantiated in order to define the enums and tables emitted by this backend.
+
+CTags
+-----
+
+**Purpose**: This tablegen backend emits an index of definitions in ctags(1)
+format. A helper script, utils/TableGen/tdtags, provides an easier-to-use
+interface; run 'tdtags -H' for documentation.
+
+X86EVEX2VEX
+-----------
+
+**Purpose**: This X86 specific tablegen backend emits tables that map EVEX
+encoded instructions to their VEX encoded identical instruction.
+
+Clang BackEnds
+==============
+
+ClangAttrClasses
+----------------
+
+**Purpose**: Creates Attrs.inc, which contains semantic attribute class
+declarations for any attribute in ``Attr.td`` that has not set ``ASTNode = 0``.
+This file is included as part of ``Attr.h``.
+
+ClangAttrParserStringSwitches
+-----------------------------
+
+**Purpose**: Creates AttrParserStringSwitches.inc, which contains
+StringSwitch::Case statements for parser-related string switches. Each switch
+is given its own macro (such as ``CLANG_ATTR_ARG_CONTEXT_LIST``, or
+``CLANG_ATTR_IDENTIFIER_ARG_LIST``), which is expected to be defined before
+including AttrParserStringSwitches.inc, and undefined after.
+
+ClangAttrImpl
+-------------
+
+**Purpose**: Creates AttrImpl.inc, which contains semantic attribute class
+definitions for any attribute in ``Attr.td`` that has not set ``ASTNode = 0``.
+This file is included as part of ``AttrImpl.cpp``.
+
+ClangAttrList
+-------------
+
+**Purpose**: Creates AttrList.inc, which is used when a list of semantic
+attribute identifiers is required. For instance, ``AttrKinds.h`` includes this
+file to generate the list of ``attr::Kind`` enumeration values. This list is
+separated out into multiple categories: attributes, inheritable attributes, and
+inheritable parameter attributes. This categorization happens automatically
+based on information in ``Attr.td`` and is used to implement the ``classof``
+functionality required for ``dyn_cast`` and similar APIs.
+
+ClangAttrPCHRead
+----------------
+
+**Purpose**: Creates AttrPCHRead.inc, which is used to deserialize attributes
+in the ``ASTReader::ReadAttributes`` function.
+
+ClangAttrPCHWrite
+-----------------
+
+**Purpose**: Creates AttrPCHWrite.inc, which is used to serialize attributes in
+the ``ASTWriter::WriteAttributes`` function.
+
+ClangAttrSpellings
+---------------------
+
+**Purpose**: Creates AttrSpellings.inc, which is used to implement the
+``__has_attribute`` feature test macro.
+
+ClangAttrSpellingListIndex
+--------------------------
+
+**Purpose**: Creates AttrSpellingListIndex.inc, which is used to map parsed
+attribute spellings (including which syntax or scope was used) to an attribute
+spelling list index. These spelling list index values are internal
+implementation details exposed via
+``AttributeList::getAttributeSpellingListIndex``.
+
+ClangAttrVisitor
+-------------------
+
+**Purpose**: Creates AttrVisitor.inc, which is used when implementing
+recursive AST visitors.
+
+ClangAttrTemplateInstantiate
+----------------------------
+
+**Purpose**: Creates AttrTemplateInstantiate.inc, which implements the
+``instantiateTemplateAttribute`` function, used when instantiating a template
+that requires an attribute to be cloned.
+
+ClangAttrParsedAttrList
+-----------------------
+
+**Purpose**: Creates AttrParsedAttrList.inc, which is used to generate the
+``AttributeList::Kind`` parsed attribute enumeration.
+
+ClangAttrParsedAttrImpl
+-----------------------
+
+**Purpose**: Creates AttrParsedAttrImpl.inc, which is used by
+``AttributeList.cpp`` to implement several functions on the ``AttributeList``
+class. This functionality is implemented via the ``AttrInfoMap ParsedAttrInfo``
+array, which contains one element per parsed attribute object.
+
+ClangAttrParsedAttrKinds
+------------------------
+
+**Purpose**: Creates AttrParsedAttrKinds.inc, which is used to implement the
+``AttributeList::getKind`` function, mapping a string (and syntax) to a parsed
+attribute ``AttributeList::Kind`` enumeration.
+
+ClangAttrDump
+-------------
+
+**Purpose**: Creates AttrDump.inc, which dumps information about an attribute.
+It is used to implement ``ASTDumper::dumpAttr``.
+
+ClangDiagsDefs
+--------------
+
+Generate Clang diagnostics definitions.
+
+ClangDiagGroups
+---------------
+
+Generate Clang diagnostic groups.
+
+ClangDiagsIndexName
+-------------------
+
+Generate Clang diagnostic name index.
+
+ClangCommentNodes
+-----------------
+
+Generate Clang AST comment nodes.
+
+ClangDeclNodes
+--------------
+
+Generate Clang AST declaration nodes.
+
+ClangStmtNodes
+--------------
+
+Generate Clang AST statement nodes.
+
+ClangSACheckers
+---------------
+
+Generate Clang Static Analyzer checkers.
+
+ClangCommentHTMLTags
+--------------------
+
+Generate efficient matchers for HTML tag names that are used in documentation comments.
+
+ClangCommentHTMLTagsProperties
+------------------------------
+
+Generate efficient matchers for HTML tag properties.
+
+ClangCommentHTMLNamedCharacterReferences
+----------------------------------------
+
+Generate function to translate named character references to UTF-8 sequences.
+
+ClangCommentCommandInfo
+-----------------------
+
+Generate command properties for commands that are used in documentation comments.
+
+ClangCommentCommandList
+-----------------------
+
+Generate list of commands that are used in documentation comments.
+
+ArmNeon
+-------
+
+Generate arm_neon.h for clang.
+
+ArmNeonSema
+-----------
+
+Generate ARM NEON sema support for clang.
+
+ArmNeonTest
+-----------
+
+Generate ARM NEON tests for clang.
+
+AttrDocs
+--------
+
+**Purpose**: Creates ``AttributeReference.rst`` from ``AttrDocs.td``, and is
+used for documenting user-facing attributes.
+
+General BackEnds
+================
+
+JSON
+----
+
+**Purpose**: Output all the values in every ``def``, as a JSON data
+structure that can be easily parsed by a variety of languages. Useful
+for writing custom backends without having to modify TableGen itself,
+or for performing auxiliary analysis on the same TableGen data passed
+to a built-in backend.
+
+**Output**:
+
+The root of the output file is a JSON object (i.e. dictionary),
+containing the following fixed keys:
+
+* ``!tablegen_json_version``: a numeric version field that will
+ increase if an incompatible change is ever made to the structure of
+ this data. The format described here corresponds to version 1.
+
+* ``!instanceof``: a dictionary whose keys are the class names defined
+ in the TableGen input. For each key, the corresponding value is an
+ array of strings giving the names of ``def`` records that derive
+ from that class. So ``root["!instanceof"]["Instruction"]``, for
+ example, would list the names of all the records deriving from the
+ class ``Instruction``.
+
+For each ``def`` record, the root object also has a key for the record
+name. The corresponding value is a subsidiary object containing the
+following fixed keys:
+
+* ``!superclasses``: an array of strings giving the names of all the
+ classes that this record derives from.
+
+* ``!fields``: an array of strings giving the names of all the variables
+ in this record that were defined with the ``field`` keyword.
+
+* ``!name``: a string giving the name of the record. This is always
+ identical to the key in the JSON root object corresponding to this
+ record's dictionary. (If the record is anonymous, the name is
+ arbitrary.)
+
+* ``!anonymous``: a boolean indicating whether the record's name was
+ specified by the TableGen input (if it is ``false``), or invented by
+ TableGen itself (if ``true``).
+
+For each variable defined in a record, the ``def`` object for that
+record also has a key for the variable name. The corresponding value
+is a translation into JSON of the variable's value, using the
+conventions described below.
+
+Some TableGen data types are translated directly into the
+corresponding JSON type:
+
+* A completely undefined value (e.g. for a variable declared without
+ initializer in some superclass of this record, and never initialized
+ by the record itself or any other superclass) is emitted as the JSON
+ ``null`` value.
+
+* ``int`` and ``bit`` values are emitted as numbers. Note that
+ TableGen ``int`` values are capable of holding integers too large to
+ be exactly representable in IEEE double precision. The integer
+ literal in the JSON output will show the full exact integer value.
+ So if you need to retrieve large integers with full precision, you
+ should use a JSON reader capable of translating such literals back
+ into 64-bit integers without losing precision, such as Python's
+ standard ``json`` module.
+
+* ``string`` and ``code`` values are emitted as JSON strings.
+
+* ``list<T>`` values, for any element type ``T``, are emitted as JSON
+ arrays. Each element of the array is represented in turn using these
+ same conventions.
+
+* ``bits`` values are also emitted as arrays. A ``bits`` array is
+ ordered from least-significant bit to most-significant. So the
+ element with index ``i`` corresponds to the bit described as
+ ``x{i}`` in TableGen source. However, note that this means that
+ scripting languages are likely to *display* the array in the
+ opposite order from the way it appears in the TableGen source or in
+ the diagnostic ``-print-records`` output.
+
+All other TableGen value types are emitted as a JSON object,
+containing two standard fields: ``kind`` is a discriminator describing
+which kind of value the object represents, and ``printable`` is a
+string giving the same representation of the value that would appear
+in ``-print-records``.
+
+* A reference to a ``def`` object has ``kind=="def"``, and has an
+ extra field ``def`` giving the name of the object referred to.
+
+* A reference to another variable in the same record has
+ ``kind=="var"``, and has an extra field ``var`` giving the name of
+ the variable referred to.
+
+* A reference to a specific bit of a ``bits``-typed variable in the
+ same record has ``kind=="varbit"``, and has two extra fields:
+ ``var`` gives the name of the variable referred to, and ``index``
+ gives the index of the bit.
+
+* A value of type ``dag`` has ``kind=="dag"``, and has two extra
+ fields. ``operator`` gives the initial value after the opening
+ parenthesis of the dag initializer; ``args`` is an array giving the
+ following arguments. The elements of ``args`` are arrays of length
+ 2, giving the value of each argument followed by its colon-suffixed
+ name (if any). For example, in the JSON representation of the dag
+ value ``(Op 22, "hello":$foo)`` (assuming that ``Op`` is the name of
+ a record defined elsewhere with a ``def`` statement):
+
+ * ``operator`` will be an object in which ``kind=="def"`` and
+ ``def=="Op"``
+
+ * ``args`` will be the array ``[[22, null], ["hello", "foo"]]``.
+
+* If any other kind of value or complicated expression appears in the
+ output, it will have ``kind=="complex"``, and no additional fields.
+ These values are not expected to be needed by backends. The standard
+ ``printable`` field can be used to extract a representation of them
+ in TableGen source syntax if necessary.
+
+How to write a back-end
+=======================
+
+TODO.
+
+Until we get a step-by-step HowTo for writing TableGen backends, you can at
+least grab the boilerplate (build system, new files, etc.) from Clang's
+r173931.
+
+TODO: How they work, how to write one. This section should not contain details
+about any particular backend, except maybe ``-print-enums`` as an example. This
+should highlight the APIs in ``TableGen/Record.h``.
+
Added: www-releases/trunk/8.0.1/docs/_sources/TableGen/Deficiencies.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/TableGen/Deficiencies.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/TableGen/Deficiencies.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/TableGen/Deficiencies.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,31 @@
+=====================
+TableGen Deficiencies
+=====================
+
+.. contents::
+ :local:
+
+Introduction
+============
+
+Despite being very generic, TableGen has some deficiencies that have been
+pointed out numerous times. The common theme is that, while TableGen allows
+you to build Domain-Specific-Languages, the final languages that you create
+lack the power of other DSLs, which in turn increase considerably the size
+and complexity of TableGen files.
+
+At the same time, TableGen allows you to create virtually any meaning of
+the basic concepts via custom-made back-ends, which can pervert the original
+design and make it very hard for newcomers to understand it.
+
+There are some in favour of extending the semantics even more, but making sure
+back-ends adhere to strict rules. Others suggesting we should move to more
+powerful DSLs designed with specific purposes, or even re-using existing
+DSLs.
+
+Known Problems
+==============
+
+TODO: Add here frequently asked questions about why TableGen doesn't do
+what you want, how it might, and how we could extend/restrict it to
+be more use friendly.
Added: www-releases/trunk/8.0.1/docs/_sources/TableGen/LangIntro.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/TableGen/LangIntro.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/TableGen/LangIntro.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/TableGen/LangIntro.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,695 @@
+==============================
+TableGen Language Introduction
+==============================
+
+.. contents::
+ :local:
+
+.. warning::
+ This document is extremely rough. If you find something lacking, please
+ fix it, file a documentation bug, or ask about it on llvm-dev.
+
+Introduction
+============
+
+This document is not meant to be a normative spec about the TableGen language
+in and of itself (i.e. how to understand a given construct in terms of how
+it affects the final set of records represented by the TableGen file). For
+the formal language specification, see :doc:`LangRef`.
+
+TableGen syntax
+===============
+
+TableGen doesn't care about the meaning of data (that is up to the backend to
+define), but it does care about syntax, and it enforces a simple type system.
+This section describes the syntax and the constructs allowed in a TableGen file.
+
+TableGen primitives
+-------------------
+
+TableGen comments
+^^^^^^^^^^^^^^^^^
+
+TableGen supports C++ style "``//``" comments, which run to the end of the
+line, and it also supports **nestable** "``/* */``" comments.
+
+.. _TableGen type:
+
+The TableGen type system
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+TableGen files are strongly typed, in a simple (but complete) type-system.
+These types are used to perform automatic conversions, check for errors, and to
+help interface designers constrain the input that they allow. Every `value
+definition`_ is required to have an associated type.
+
+TableGen supports a mixture of very low-level types (such as ``bit``) and very
+high-level types (such as ``dag``). This flexibility is what allows it to
+describe a wide range of information conveniently and compactly. The TableGen
+types are:
+
+``bit``
+ A 'bit' is a boolean value that can hold either 0 or 1.
+
+``int``
+ The 'int' type represents a simple 32-bit integer value, such as 5.
+
+``string``
+ The 'string' type represents an ordered sequence of characters of arbitrary
+ length.
+
+``code``
+ The `code` type represents a code fragment, which can be single/multi-line
+ string literal.
+
+``bits<n>``
+ A 'bits' type is an arbitrary, but fixed, size integer that is broken up
+ into individual bits. This type is useful because it can handle some bits
+ being defined while others are undefined.
+
+``list<ty>``
+ This type represents a list whose elements are some other type. The
+ contained type is arbitrary: it can even be another list type.
+
+Class type
+ Specifying a class name in a type context means that the defined value must
+ be a subclass of the specified class. This is useful in conjunction with
+ the ``list`` type, for example, to constrain the elements of the list to a
+ common base class (e.g., a ``list<Register>`` can only contain definitions
+ derived from the "``Register``" class).
+
+``dag``
+ This type represents a nestable directed graph of elements.
+
+To date, these types have been sufficient for describing things that TableGen
+has been used for, but it is straight-forward to extend this list if needed.
+
+.. _TableGen expressions:
+
+TableGen values and expressions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+TableGen allows for a pretty reasonable number of different expression forms
+when building up values. These forms allow the TableGen file to be written in a
+natural syntax and flavor for the application. The current expression forms
+supported include:
+
+``?``
+ uninitialized field
+
+``0b1001011``
+ binary integer value.
+ Note that this is sized by the number of bits given and will not be
+ silently extended/truncated.
+
+``7``
+ decimal integer value
+
+``0x7F``
+ hexadecimal integer value
+
+``"foo"``
+ a single-line string value, can be assigned to ``string`` or ``code`` variable.
+
+``[{ ... }]``
+ usually called a "code fragment", but is just a multiline string literal
+
+``[ X, Y, Z ]<type>``
+ list value. <type> is the type of the list element and is usually optional.
+ In rare cases, TableGen is unable to deduce the element type in which case
+ the user must specify it explicitly.
+
+``{ a, b, 0b10 }``
+ initializer for a "bits<4>" value.
+ 1-bit from "a", 1-bit from "b", 2-bits from 0b10.
+
+``value``
+ value reference
+
+``value{17}``
+ access to one bit of a value
+
+``value{15-17}``
+ access to an ordered sequence of bits of a value, in particular ``value{15-17}``
+ produces an order that is the reverse of ``value{17-15}``.
+
+``DEF``
+ reference to a record definition
+
+``CLASS<val list>``
+ reference to a new anonymous definition of CLASS with the specified template
+ arguments.
+
+``X.Y``
+ reference to the subfield of a value
+
+``list[4-7,17,2-3]``
+ A slice of the 'list' list, including elements 4,5,6,7,17,2, and 3 from it.
+ Elements may be included multiple times.
+
+``foreach <var> = [ <list> ] in { <body> }``
+
+``foreach <var> = [ <list> ] in <def>``
+ Replicate <body> or <def>, replacing instances of <var> with each value
+ in <list>. <var> is scoped at the level of the ``foreach`` loop and must
+ not conflict with any other object introduced in <body> or <def>. Only
+ ``def``\s and ``defm``\s are expanded within <body>.
+
+``foreach <var> = 0-15 in ...``
+
+``foreach <var> = {0-15,32-47} in ...``
+ Loop over ranges of integers. The braces are required for multiple ranges.
+
+``(DEF a, b)``
+ a dag value. The first element is required to be a record definition, the
+ remaining elements in the list may be arbitrary other values, including
+ nested ```dag``' values.
+
+``!con(a, b, ...)``
+ Concatenate two or more DAG nodes. Their operations must equal.
+
+ Example: !con((op a1:$name1, a2:$name2), (op b1:$name3)) results in
+ the DAG node (op a1:$name1, a2:$name2, b1:$name3).
+
+``!dag(op, children, names)``
+ Generate a DAG node programmatically. 'children' and 'names' must be lists
+ of equal length or unset ('?'). 'names' must be a 'list<string>'.
+
+ Due to limitations of the type system, 'children' must be a list of items
+ of a common type. In practice, this means that they should either have the
+ same type or be records with a common superclass. Mixing dag and non-dag
+ items is not possible. However, '?' can be used.
+
+ Example: !dag(op, [a1, a2, ?], ["name1", "name2", "name3"]) results in
+ (op a1:$name1, a2:$name2, ?:$name3).
+
+``!listconcat(a, b, ...)``
+ A list value that is the result of concatenating the 'a' and 'b' lists.
+ The lists must have the same element type.
+ More than two arguments are accepted with the result being the concatenation
+ of all the lists given.
+
+``!strconcat(a, b, ...)``
+ A string value that is the result of concatenating the 'a' and 'b' strings.
+ More than two arguments are accepted with the result being the concatenation
+ of all the strings given.
+
+``str1#str2``
+ "#" (paste) is a shorthand for !strconcat. It may concatenate things that
+ are not quoted strings, in which case an implicit !cast<string> is done on
+ the operand of the paste.
+
+``!cast<type>(a)``
+ If 'a' is a string, a record of type *type* obtained by looking up the
+ string 'a' in the list of all records defined by the time that all template
+ arguments in 'a' are fully resolved.
+
+ For example, if !cast<type>(a) appears in a multiclass definition, or in a
+ class instantiated inside of a multiclass definition, and 'a' does not
+ reference any template arguments of the multiclass, then a record of name
+ 'a' must be instantiated earlier in the source file. If 'a' does reference
+ a template argument, then the lookup is delayed until defm statements
+ instantiating the multiclass (or later, if the defm occurs in another
+ multiclass and template arguments of the inner multiclass that are
+ referenced by 'a' are substituted by values that themselves contain
+ references to template arguments of the outer multiclass).
+
+ If the type of 'a' does not match *type*, TableGen aborts with an error.
+
+ Otherwise, perform a normal type cast e.g. between an int and a bit, or
+ between record types. This allows casting a record to a subclass, though if
+ the types do not match, constant folding will be inhibited. !cast<string>
+ is a special case in that the argument can be an int or a record. In the
+ latter case, the record's name is returned.
+
+``!isa<type>(a)``
+ Returns an integer: 1 if 'a' is dynamically of the given type, 0 otherwise.
+
+``!subst(a, b, c)``
+ If 'a' and 'b' are of string type or are symbol references, substitute 'b'
+ for 'a' in 'c.' This operation is analogous to $(subst) in GNU make.
+
+``!foreach(a, b, c)``
+ For each member of dag or list 'b' apply operator 'c'. 'a' is the name
+ of a variable that will be substituted by members of 'b' in 'c'.
+ This operation is analogous to $(foreach) in GNU make.
+
+``!foldl(start, lst, a, b, expr)``
+ Perform a left-fold over 'lst' with the given starting value. 'a' and 'b'
+ are variable names which will be substituted in 'expr'. If you think of
+ expr as a function f(a,b), the fold will compute
+ 'f(...f(f(start, lst[0]), lst[1]), ...), lst[n-1])' for a list of length n.
+ As usual, 'a' will be of the type of 'start', and 'b' will be of the type
+ of elements of 'lst'. These types need not be the same, but 'expr' must be
+ of the same type as 'start'.
+
+``!head(a)``
+ The first element of list 'a.'
+
+``!tail(a)``
+ The 2nd-N elements of list 'a.'
+
+``!empty(a)``
+ An integer {0,1} indicating whether list 'a' is empty.
+
+``!size(a)``
+ An integer indicating the number of elements in list 'a'.
+
+``!if(a,b,c)``
+ 'b' if the result of 'int' or 'bit' operator 'a' is nonzero, 'c' otherwise.
+
+``!eq(a,b)``
+ 'bit 1' if string a is equal to string b, 0 otherwise. This only operates
+ on string, int and bit objects. Use !cast<string> to compare other types of
+ objects.
+
+``!ne(a,b)``
+ The negation of ``!eq(a,b)``.
+
+``!le(a,b), !lt(a,b), !ge(a,b), !gt(a,b)``
+ (Signed) comparison of integer values that returns bit 1 or 0 depending on
+ the result of the comparison.
+
+``!shl(a,b)`` ``!srl(a,b)`` ``!sra(a,b)``
+ The usual shift operators. Operations are on 64-bit integers, the result
+ is undefined for shift counts outside [0, 63].
+
+``!add(a,b,...)`` ``!and(a,b,...)`` ``!or(a,b,...)``
+ The usual arithmetic and binary operators.
+
+Note that all of the values have rules specifying how they convert to values
+for different types. These rules allow you to assign a value like "``7``"
+to a "``bits<4>``" value, for example.
+
+Classes and definitions
+-----------------------
+
+As mentioned in the :doc:`introduction <index>`, classes and definitions (collectively known as
+'records') in TableGen are the main high-level unit of information that TableGen
+collects. Records are defined with a ``def`` or ``class`` keyword, the record
+name, and an optional list of "`template arguments`_". If the record has
+superclasses, they are specified as a comma separated list that starts with a
+colon character ("``:``"). If `value definitions`_ or `let expressions`_ are
+needed for the class, they are enclosed in curly braces ("``{}``"); otherwise,
+the record ends with a semicolon.
+
+Here is a simple TableGen file:
+
+.. code-block:: text
+
+ class C { bit V = 1; }
+ def X : C;
+ def Y : C {
+ string Greeting = "hello";
+ }
+
+This example defines two definitions, ``X`` and ``Y``, both of which derive from
+the ``C`` class. Because of this, they both get the ``V`` bit value. The ``Y``
+definition also gets the Greeting member as well.
+
+In general, classes are useful for collecting together the commonality between a
+group of records and isolating it in a single place. Also, classes permit the
+specification of default values for their subclasses, allowing the subclasses to
+override them as they wish.
+
+.. _value definition:
+.. _value definitions:
+
+Value definitions
+^^^^^^^^^^^^^^^^^
+
+Value definitions define named entries in records. A value must be defined
+before it can be referred to as the operand for another value definition or
+before the value is reset with a `let expression`_. A value is defined by
+specifying a `TableGen type`_ and a name. If an initial value is available, it
+may be specified after the type with an equal sign. Value definitions require
+terminating semicolons.
+
+.. _let expression:
+.. _let expressions:
+.. _"let" expressions within a record:
+
+'let' expressions
+^^^^^^^^^^^^^^^^^
+
+A record-level let expression is used to change the value of a value definition
+in a record. This is primarily useful when a superclass defines a value that a
+derived class or definition wants to override. Let expressions consist of the
+'``let``' keyword followed by a value name, an equal sign ("``=``"), and a new
+value. For example, a new class could be added to the example above, redefining
+the ``V`` field for all of its subclasses:
+
+.. code-block:: text
+
+ class D : C { let V = 0; }
+ def Z : D;
+
+In this case, the ``Z`` definition will have a zero value for its ``V`` value,
+despite the fact that it derives (indirectly) from the ``C`` class, because the
+``D`` class overrode its value.
+
+References between variables in a record are substituted late, which gives
+``let`` expressions unusual power. Consider this admittedly silly example:
+
+.. code-block:: text
+
+ class A<int x> {
+ int Y = x;
+ int Yplus1 = !add(Y, 1);
+ int xplus1 = !add(x, 1);
+ }
+ def Z : A<5> {
+ let Y = 10;
+ }
+
+The value of ``Z.xplus1`` will be 6, but the value of ``Z.Yplus1`` is 11. Use
+this power wisely.
+
+.. _template arguments:
+
+Class template arguments
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+TableGen permits the definition of parameterized classes as well as normal
+concrete classes. Parameterized TableGen classes specify a list of variable
+bindings (which may optionally have defaults) that are bound when used. Here is
+a simple example:
+
+.. code-block:: text
+
+ class FPFormat<bits<3> val> {
+ bits<3> Value = val;
+ }
+ def NotFP : FPFormat<0>;
+ def ZeroArgFP : FPFormat<1>;
+ def OneArgFP : FPFormat<2>;
+ def OneArgFPRW : FPFormat<3>;
+ def TwoArgFP : FPFormat<4>;
+ def CompareFP : FPFormat<5>;
+ def CondMovFP : FPFormat<6>;
+ def SpecialFP : FPFormat<7>;
+
+In this case, template arguments are used as a space efficient way to specify a
+list of "enumeration values", each with a "``Value``" field set to the specified
+integer.
+
+The more esoteric forms of `TableGen expressions`_ are useful in conjunction
+with template arguments. As an example:
+
+.. code-block:: text
+
+ class ModRefVal<bits<2> val> {
+ bits<2> Value = val;
+ }
+
+ def None : ModRefVal<0>;
+ def Mod : ModRefVal<1>;
+ def Ref : ModRefVal<2>;
+ def ModRef : ModRefVal<3>;
+
+ class Value<ModRefVal MR> {
+ // Decode some information into a more convenient format, while providing
+ // a nice interface to the user of the "Value" class.
+ bit isMod = MR.Value{0};
+ bit isRef = MR.Value{1};
+
+ // other stuff...
+ }
+
+ // Example uses
+ def bork : Value<Mod>;
+ def zork : Value<Ref>;
+ def hork : Value<ModRef>;
+
+This is obviously a contrived example, but it shows how template arguments can
+be used to decouple the interface provided to the user of the class from the
+actual internal data representation expected by the class. In this case,
+running ``llvm-tblgen`` on the example prints the following definitions:
+
+.. code-block:: text
+
+ def bork { // Value
+ bit isMod = 1;
+ bit isRef = 0;
+ }
+ def hork { // Value
+ bit isMod = 1;
+ bit isRef = 1;
+ }
+ def zork { // Value
+ bit isMod = 0;
+ bit isRef = 1;
+ }
+
+This shows that TableGen was able to dig into the argument and extract a piece
+of information that was requested by the designer of the "Value" class. For
+more realistic examples, please see existing users of TableGen, such as the X86
+backend.
+
+Multiclass definitions and instances
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+While classes with template arguments are a good way to factor commonality
+between two instances of a definition, multiclasses allow a convenient notation
+for defining multiple definitions at once (instances of implicitly constructed
+classes). For example, consider an 3-address instruction set whose instructions
+come in two forms: "``reg = reg op reg``" and "``reg = reg op imm``"
+(e.g. SPARC). In this case, you'd like to specify in one place that this
+commonality exists, then in a separate place indicate what all the ops are.
+
+Here is an example TableGen fragment that shows this idea:
+
+.. code-block:: text
+
+ def ops;
+ def GPR;
+ def Imm;
+ class inst<int opc, string asmstr, dag operandlist>;
+
+ multiclass ri_inst<int opc, string asmstr> {
+ def _rr : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
+ (ops GPR:$dst, GPR:$src1, GPR:$src2)>;
+ def _ri : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
+ (ops GPR:$dst, GPR:$src1, Imm:$src2)>;
+ }
+
+ // Instantiations of the ri_inst multiclass.
+ defm ADD : ri_inst<0b111, "add">;
+ defm SUB : ri_inst<0b101, "sub">;
+ defm MUL : ri_inst<0b100, "mul">;
+ ...
+
+The name of the resultant definitions has the multidef fragment names appended
+to them, so this defines ``ADD_rr``, ``ADD_ri``, ``SUB_rr``, etc. A defm may
+inherit from multiple multiclasses, instantiating definitions from each
+multiclass. Using a multiclass this way is exactly equivalent to instantiating
+the classes multiple times yourself, e.g. by writing:
+
+.. code-block:: text
+
+ def ops;
+ def GPR;
+ def Imm;
+ class inst<int opc, string asmstr, dag operandlist>;
+
+ class rrinst<int opc, string asmstr>
+ : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
+ (ops GPR:$dst, GPR:$src1, GPR:$src2)>;
+
+ class riinst<int opc, string asmstr>
+ : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
+ (ops GPR:$dst, GPR:$src1, Imm:$src2)>;
+
+ // Instantiations of the ri_inst multiclass.
+ def ADD_rr : rrinst<0b111, "add">;
+ def ADD_ri : riinst<0b111, "add">;
+ def SUB_rr : rrinst<0b101, "sub">;
+ def SUB_ri : riinst<0b101, "sub">;
+ def MUL_rr : rrinst<0b100, "mul">;
+ def MUL_ri : riinst<0b100, "mul">;
+ ...
+
+A ``defm`` can also be used inside a multiclass providing several levels of
+multiclass instantiations.
+
+.. code-block:: text
+
+ class Instruction<bits<4> opc, string Name> {
+ bits<4> opcode = opc;
+ string name = Name;
+ }
+
+ multiclass basic_r<bits<4> opc> {
+ def rr : Instruction<opc, "rr">;
+ def rm : Instruction<opc, "rm">;
+ }
+
+ multiclass basic_s<bits<4> opc> {
+ defm SS : basic_r<opc>;
+ defm SD : basic_r<opc>;
+ def X : Instruction<opc, "x">;
+ }
+
+ multiclass basic_p<bits<4> opc> {
+ defm PS : basic_r<opc>;
+ defm PD : basic_r<opc>;
+ def Y : Instruction<opc, "y">;
+ }
+
+ defm ADD : basic_s<0xf>, basic_p<0xf>;
+ ...
+
+ // Results
+ def ADDPDrm { ...
+ def ADDPDrr { ...
+ def ADDPSrm { ...
+ def ADDPSrr { ...
+ def ADDSDrm { ...
+ def ADDSDrr { ...
+ def ADDY { ...
+ def ADDX { ...
+
+``defm`` declarations can inherit from classes too, the rule to follow is that
+the class list must start after the last multiclass, and there must be at least
+one multiclass before them.
+
+.. code-block:: text
+
+ class XD { bits<4> Prefix = 11; }
+ class XS { bits<4> Prefix = 12; }
+
+ class I<bits<4> op> {
+ bits<4> opcode = op;
+ }
+
+ multiclass R {
+ def rr : I<4>;
+ def rm : I<2>;
+ }
+
+ multiclass Y {
+ defm SS : R, XD;
+ defm SD : R, XS;
+ }
+
+ defm Instr : Y;
+
+ // Results
+ def InstrSDrm {
+ bits<4> opcode = { 0, 0, 1, 0 };
+ bits<4> Prefix = { 1, 1, 0, 0 };
+ }
+ ...
+ def InstrSSrr {
+ bits<4> opcode = { 0, 1, 0, 0 };
+ bits<4> Prefix = { 1, 0, 1, 1 };
+ }
+
+File scope entities
+-------------------
+
+File inclusion
+^^^^^^^^^^^^^^
+
+TableGen supports the '``include``' token, which textually substitutes the
+specified file in place of the include directive. The filename should be
+specified as a double quoted string immediately after the '``include``' keyword.
+Example:
+
+.. code-block:: text
+
+ include "foo.td"
+
+'let' expressions
+^^^^^^^^^^^^^^^^^
+
+"Let" expressions at file scope are similar to `"let" expressions within a
+record`_, except they can specify a value binding for multiple records at a
+time, and may be useful in certain other cases. File-scope let expressions are
+really just another way that TableGen allows the end-user to factor out
+commonality from the records.
+
+File-scope "let" expressions take a comma-separated list of bindings to apply,
+and one or more records to bind the values in. Here are some examples:
+
+.. code-block:: text
+
+ let isTerminator = 1, isReturn = 1, isBarrier = 1, hasCtrlDep = 1 in
+ def RET : I<0xC3, RawFrm, (outs), (ins), "ret", [(X86retflag 0)]>;
+
+ let isCall = 1 in
+ // All calls clobber the non-callee saved registers...
+ let Defs = [EAX, ECX, EDX, FP0, FP1, FP2, FP3, FP4, FP5, FP6, ST0,
+ MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
+ XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, EFLAGS] in {
+ def CALLpcrel32 : Ii32<0xE8, RawFrm, (outs), (ins i32imm:$dst,variable_ops),
+ "call\t${dst:call}", []>;
+ def CALL32r : I<0xFF, MRM2r, (outs), (ins GR32:$dst, variable_ops),
+ "call\t{*}$dst", [(X86call GR32:$dst)]>;
+ def CALL32m : I<0xFF, MRM2m, (outs), (ins i32mem:$dst, variable_ops),
+ "call\t{*}$dst", []>;
+ }
+
+File-scope "let" expressions are often useful when a couple of definitions need
+to be added to several records, and the records do not otherwise need to be
+opened, as in the case with the ``CALL*`` instructions above.
+
+It's also possible to use "let" expressions inside multiclasses, providing more
+ways to factor out commonality from the records, specially if using several
+levels of multiclass instantiations. This also avoids the need of using "let"
+expressions within subsequent records inside a multiclass.
+
+.. code-block:: text
+
+ multiclass basic_r<bits<4> opc> {
+ let Predicates = [HasSSE2] in {
+ def rr : Instruction<opc, "rr">;
+ def rm : Instruction<opc, "rm">;
+ }
+ let Predicates = [HasSSE3] in
+ def rx : Instruction<opc, "rx">;
+ }
+
+ multiclass basic_ss<bits<4> opc> {
+ let IsDouble = 0 in
+ defm SS : basic_r<opc>;
+
+ let IsDouble = 1 in
+ defm SD : basic_r<opc>;
+ }
+
+ defm ADD : basic_ss<0xf>;
+
+Looping
+^^^^^^^
+
+TableGen supports the '``foreach``' block, which textually replicates the loop
+body, substituting iterator values for iterator references in the body.
+Example:
+
+.. code-block:: text
+
+ foreach i = [0, 1, 2, 3] in {
+ def R#i : Register<...>;
+ def F#i : Register<...>;
+ }
+
+This will create objects ``R0``, ``R1``, ``R2`` and ``R3``. ``foreach`` blocks
+may be nested. If there is only one item in the body the braces may be
+elided:
+
+.. code-block:: text
+
+ foreach i = [0, 1, 2, 3] in
+ def R#i : Register<...>;
+
+Code Generator backend info
+===========================
+
+Expressions used by code generator to describe instructions and isel patterns:
+
+``(implicit a)``
+ an implicitly defined physical register. This tells the dag instruction
+ selection emitter the input pattern's extra definitions matches implicit
+ physical register definitions.
+
Added: www-releases/trunk/8.0.1/docs/_sources/TableGen/LangRef.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/TableGen/LangRef.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/TableGen/LangRef.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/TableGen/LangRef.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,497 @@
+===========================
+TableGen Language Reference
+===========================
+
+.. contents::
+ :local:
+
+.. warning::
+ This document is extremely rough. If you find something lacking, please
+ fix it, file a documentation bug, or ask about it on llvm-dev.
+
+Introduction
+============
+
+This document is meant to be a normative spec about the TableGen language
+in and of itself (i.e. how to understand a given construct in terms of how
+it affects the final set of records represented by the TableGen file). If
+you are unsure if this document is really what you are looking for, please
+read the :doc:`introduction to TableGen <index>` first.
+
+Notation
+========
+
+The lexical and syntax notation used here is intended to imitate
+`Python's`_. In particular, for lexical definitions, the productions
+operate at the character level and there is no implied whitespace between
+elements. The syntax definitions operate at the token level, so there is
+implied whitespace between tokens.
+
+.. _`Python's`: http://docs.python.org/py3k/reference/introduction.html#notation
+
+Lexical Analysis
+================
+
+TableGen supports BCPL (``// ...``) and nestable C-style (``/* ... */``)
+comments. TableGen also provides simple `Preprocessing Support`_.
+
+The following is a listing of the basic punctuation tokens::
+
+ - + [ ] { } ( ) < > : ; . = ? #
+
+Numeric literals take one of the following forms:
+
+.. TableGen actually will lex some pretty strange sequences an interpret
+ them as numbers. What is shown here is an attempt to approximate what it
+ "should" accept.
+
+.. productionlist::
+ TokInteger: `DecimalInteger` | `HexInteger` | `BinInteger`
+ DecimalInteger: ["+" | "-"] ("0"..."9")+
+ HexInteger: "0x" ("0"..."9" | "a"..."f" | "A"..."F")+
+ BinInteger: "0b" ("0" | "1")+
+
+One aspect to note is that the :token:`DecimalInteger` token *includes* the
+``+`` or ``-``, as opposed to having ``+`` and ``-`` be unary operators as
+most languages do.
+
+Also note that :token:`BinInteger` creates a value of type ``bits<n>``
+(where ``n`` is the number of bits). This will implicitly convert to
+integers when needed.
+
+TableGen has identifier-like tokens:
+
+.. productionlist::
+ ualpha: "a"..."z" | "A"..."Z" | "_"
+ TokIdentifier: ("0"..."9")* `ualpha` (`ualpha` | "0"..."9")*
+ TokVarName: "$" `ualpha` (`ualpha` | "0"..."9")*
+
+Note that unlike most languages, TableGen allows :token:`TokIdentifier` to
+begin with a number. In case of ambiguity, a token will be interpreted as a
+numeric literal rather than an identifier.
+
+TableGen also has two string-like literals:
+
+.. productionlist::
+ TokString: '"' <non-'"' characters and C-like escapes> '"'
+ TokCodeFragment: "[{" <shortest text not containing "}]"> "}]"
+
+:token:`TokCodeFragment` is essentially a multiline string literal
+delimited by ``[{`` and ``}]``.
+
+.. note::
+ The current implementation accepts the following C-like escapes::
+
+ \\ \' \" \t \n
+
+TableGen also has the following keywords::
+
+ bit bits class code dag
+ def foreach defm field in
+ int let list multiclass string
+
+TableGen also has "bang operators" which have a
+wide variety of meanings:
+
+.. productionlist::
+ BangOperator: one of
+ :!eq !if !head !tail !con
+ :!add !shl !sra !srl !and
+ :!or !empty !subst !foreach !strconcat
+ :!cast !listconcat !size !foldl
+ :!isa !dag !le !lt !ge
+ :!gt !ne
+
+
+Syntax
+======
+
+TableGen has an ``include`` mechanism. It does not play a role in the
+syntax per se, since it is lexically replaced with the contents of the
+included file.
+
+.. productionlist::
+ IncludeDirective: "include" `TokString`
+
+TableGen's top-level production consists of "objects".
+
+.. productionlist::
+ TableGenFile: `Object`*
+ Object: `Class` | `Def` | `Defm` | `Defset` | `Let` | `MultiClass` |
+ `Foreach`
+
+``class``\es
+------------
+
+.. productionlist::
+ Class: "class" `TokIdentifier` [`TemplateArgList`] `ObjectBody`
+ TemplateArgList: "<" `Declaration` ("," `Declaration`)* ">"
+
+A ``class`` declaration creates a record which other records can inherit
+from. A class can be parametrized by a list of "template arguments", whose
+values can be used in the class body.
+
+A given class can only be defined once. A ``class`` declaration is
+considered to define the class if any of the following is true:
+
+.. break ObjectBody into its consituents so that they are present here?
+
+#. The :token:`TemplateArgList` is present.
+#. The :token:`Body` in the :token:`ObjectBody` is present and is not empty.
+#. The :token:`BaseClassList` in the :token:`ObjectBody` is present.
+
+You can declare an empty class by giving and empty :token:`TemplateArgList`
+and an empty :token:`ObjectBody`. This can serve as a restricted form of
+forward declaration: note that records deriving from the forward-declared
+class will inherit no fields from it since the record expansion is done
+when the record is parsed.
+
+Every class has an implicit template argument called ``NAME``, which is set
+to the name of the instantiating ``def`` or ``defm``. The result is undefined
+if the class is instantiated by an anonymous record.
+
+Declarations
+------------
+
+.. Omitting mention of arcane "field" prefix to discourage its use.
+
+The declaration syntax is pretty much what you would expect as a C++
+programmer.
+
+.. productionlist::
+ Declaration: `Type` `TokIdentifier` ["=" `Value`]
+
+It assigns the value to the identifier.
+
+Types
+-----
+
+.. productionlist::
+ Type: "string" | "code" | "bit" | "int" | "dag"
+ :| "bits" "<" `TokInteger` ">"
+ :| "list" "<" `Type` ">"
+ :| `ClassID`
+ ClassID: `TokIdentifier`
+
+Both ``string`` and ``code`` correspond to the string type; the difference
+is purely to indicate programmer intention.
+
+The :token:`ClassID` must identify a class that has been previously
+declared or defined.
+
+Values
+------
+
+.. productionlist::
+ Value: `SimpleValue` `ValueSuffix`*
+ ValueSuffix: "{" `RangeList` "}"
+ :| "[" `RangeList` "]"
+ :| "." `TokIdentifier`
+ RangeList: `RangePiece` ("," `RangePiece`)*
+ RangePiece: `TokInteger`
+ :| `TokInteger` "-" `TokInteger`
+ :| `TokInteger` `TokInteger`
+
+The peculiar last form of :token:`RangePiece` is due to the fact that the
+"``-``" is included in the :token:`TokInteger`, hence ``1-5`` gets lexed as
+two consecutive :token:`TokInteger`'s, with values ``1`` and ``-5``,
+instead of "1", "-", and "5".
+The :token:`RangeList` can be thought of as specifying "list slice" in some
+contexts.
+
+
+:token:`SimpleValue` has a number of forms:
+
+
+.. productionlist::
+ SimpleValue: `TokIdentifier`
+
+The value will be the variable referenced by the identifier. It can be one
+of:
+
+.. The code for this is exceptionally abstruse. These examples are a
+ best-effort attempt.
+
+* name of a ``def``, such as the use of ``Bar`` in::
+
+ def Bar : SomeClass {
+ int X = 5;
+ }
+
+ def Foo {
+ SomeClass Baz = Bar;
+ }
+
+* value local to a ``def``, such as the use of ``Bar`` in::
+
+ def Foo {
+ int Bar = 5;
+ int Baz = Bar;
+ }
+
+ Values defined in superclasses can be accessed the same way.
+
+* a template arg of a ``class``, such as the use of ``Bar`` in::
+
+ class Foo<int Bar> {
+ int Baz = Bar;
+ }
+
+* value local to a ``class``, such as the use of ``Bar`` in::
+
+ class Foo {
+ int Bar = 5;
+ int Baz = Bar;
+ }
+
+* a template arg to a ``multiclass``, such as the use of ``Bar`` in::
+
+ multiclass Foo<int Bar> {
+ def : SomeClass<Bar>;
+ }
+
+* the iteration variable of a ``foreach``, such as the use of ``i`` in::
+
+ foreach i = 0-5 in
+ def Foo#i;
+
+* a variable defined by ``defset``
+
+* the implicit template argument ``NAME`` in a ``class`` or ``multiclass``
+
+.. productionlist::
+ SimpleValue: `TokInteger`
+
+This represents the numeric value of the integer.
+
+.. productionlist::
+ SimpleValue: `TokString`+
+
+Multiple adjacent string literals are concatenated like in C/C++. The value
+is the concatenation of the strings.
+
+.. productionlist::
+ SimpleValue: `TokCodeFragment`
+
+The value is the string value of the code fragment.
+
+.. productionlist::
+ SimpleValue: "?"
+
+``?`` represents an "unset" initializer.
+
+.. productionlist::
+ SimpleValue: "{" `ValueList` "}"
+ ValueList: [`ValueListNE`]
+ ValueListNE: `Value` ("," `Value`)*
+
+This represents a sequence of bits, as would be used to initialize a
+``bits<n>`` field (where ``n`` is the number of bits).
+
+.. productionlist::
+ SimpleValue: `ClassID` "<" `ValueListNE` ">"
+
+This generates a new anonymous record definition (as would be created by an
+unnamed ``def`` inheriting from the given class with the given template
+arguments) and the value is the value of that record definition.
+
+.. productionlist::
+ SimpleValue: "[" `ValueList` "]" ["<" `Type` ">"]
+
+A list initializer. The optional :token:`Type` can be used to indicate a
+specific element type, otherwise the element type will be deduced from the
+given values.
+
+.. The initial `DagArg` of the dag must start with an identifier or
+ !cast, but this is more of an implementation detail and so for now just
+ leave it out.
+
+.. productionlist::
+ SimpleValue: "(" `DagArg` [`DagArgList`] ")"
+ DagArgList: `DagArg` ("," `DagArg`)*
+ DagArg: `Value` [":" `TokVarName`] | `TokVarName`
+
+The initial :token:`DagArg` is called the "operator" of the dag.
+
+.. productionlist::
+ SimpleValue: `BangOperator` ["<" `Type` ">"] "(" `ValueListNE` ")"
+
+Bodies
+------
+
+.. productionlist::
+ ObjectBody: `BaseClassList` `Body`
+ BaseClassList: [":" `BaseClassListNE`]
+ BaseClassListNE: `SubClassRef` ("," `SubClassRef`)*
+ SubClassRef: (`ClassID` | `MultiClassID`) ["<" `ValueList` ">"]
+ DefmID: `TokIdentifier`
+
+The version with the :token:`MultiClassID` is only valid in the
+:token:`BaseClassList` of a ``defm``.
+The :token:`MultiClassID` should be the name of a ``multiclass``.
+
+.. put this somewhere else
+
+It is after parsing the base class list that the "let stack" is applied.
+
+.. productionlist::
+ Body: ";" | "{" BodyList "}"
+ BodyList: BodyItem*
+ BodyItem: `Declaration` ";"
+ :| "let" `TokIdentifier` [ "{" `RangeList` "}" ] "=" `Value` ";"
+
+The ``let`` form allows overriding the value of an inherited field.
+
+``def``
+-------
+
+.. productionlist::
+ Def: "def" [`Value`] `ObjectBody`
+
+Defines a record whose name is given by the optional :token:`Value`. The value
+is parsed in a special mode where global identifiers (records and variables
+defined by ``defset``) are not recognized, and all unrecognized identifiers
+are interpreted as strings.
+
+If no name is given, the record is anonymous. The final name of anonymous
+records is undefined, but globally unique.
+
+Special handling occurs if this ``def`` appears inside a ``multiclass`` or
+a ``foreach``.
+
+When a non-anonymous record is defined in a multiclass and the given name
+does not contain a reference to the implicit template argument ``NAME``, such
+a reference will automatically be prepended. That is, the following are
+equivalent inside a multiclass::
+
+ def Foo;
+ def NAME#Foo;
+
+``defm``
+--------
+
+.. productionlist::
+ Defm: "defm" [`Value`] ":" `BaseClassListNE` ";"
+
+The :token:`BaseClassList` is a list of at least one ``multiclass`` and any
+number of ``class``'s. The ``multiclass``'s must occur before any ``class``'s.
+
+Instantiates all records defined in all given ``multiclass``'s and adds the
+given ``class``'s as superclasses.
+
+The name is parsed in the same special mode used by ``def``. If the name is
+missing, a globally unique string is used instead (but instantiated records
+are not considered to be anonymous, unless they were originally defined by an
+anonymous ``def``) That is, the following have different semantics::
+
+ defm : SomeMultiClass<...>; // some globally unique name
+ defm "" : SomeMultiClass<...>; // empty name string
+
+When it occurs inside a multiclass, the second variant is equivalent to
+``defm NAME : ...``. More generally, when ``defm`` occurs in a multiclass and
+its name does not contain a reference to the implicit template argument
+``NAME``, such a reference will automatically be prepended. That is, the
+following are equivalent inside a multiclass::
+
+ defm Foo : SomeMultiClass<...>;
+ defm NAME#Foo : SomeMultiClass<...>;
+
+``defset``
+----------
+.. productionlist::
+ Defset: "defset" `Type` `TokIdentifier` "=" "{" `Object`* "}"
+
+All records defined inside the braces via ``def`` and ``defm`` are collected
+in a globally accessible list of the given name (in addition to being added
+to the global collection of records as usual). Anonymous records created inside
+initializier expressions using the ``Class<args...>`` syntax are never collected
+in a defset.
+
+The given type must be ``list<A>``, where ``A`` is some class. It is an error
+to define a record (via ``def`` or ``defm``) inside the braces which doesn't
+derive from ``A``.
+
+``foreach``
+-----------
+
+.. productionlist::
+ Foreach: "foreach" `ForeachDeclaration` "in" "{" `Object`* "}"
+ :| "foreach" `ForeachDeclaration` "in" `Object`
+ ForeachDeclaration: ID "=" ( "{" `RangeList` "}" | `RangePiece` | `Value` )
+
+The value assigned to the variable in the declaration is iterated over and
+the object or object list is reevaluated with the variable set at each
+iterated value.
+
+Note that the productions involving RangeList and RangePiece have precedence
+over the more generic value parsing based on the first token.
+
+Top-Level ``let``
+-----------------
+
+.. productionlist::
+ Let: "let" `LetList` "in" "{" `Object`* "}"
+ :| "let" `LetList` "in" `Object`
+ LetList: `LetItem` ("," `LetItem`)*
+ LetItem: `TokIdentifier` [`RangeList`] "=" `Value`
+
+This is effectively equivalent to ``let`` inside the body of a record
+except that it applies to multiple records at a time. The bindings are
+applied at the end of parsing the base classes of a record.
+
+``multiclass``
+--------------
+
+.. productionlist::
+ MultiClass: "multiclass" `TokIdentifier` [`TemplateArgList`]
+ : [":" `BaseMultiClassList`] "{" `MultiClassObject`+ "}"
+ BaseMultiClassList: `MultiClassID` ("," `MultiClassID`)*
+ MultiClassID: `TokIdentifier`
+ MultiClassObject: `Def` | `Defm` | `Let` | `Foreach`
+
+Preprocessing Support
+=====================
+
+TableGen's embedded preprocessor is only intended for conditional compilation.
+It supports the following directives:
+
+.. productionlist::
+ LineBegin: ^
+ LineEnd: "\n" | "\r" | EOF
+ WhiteSpace: " " | "\t"
+ CStyleComment: "/*" (.* - "*/") "*/"
+ BCPLComment: "//" (.* - `LineEnd`) `LineEnd`
+ WhiteSpaceOrCStyleComment: `WhiteSpace` | `CStyleComment`
+ WhiteSpaceOrAnyComment: `WhiteSpace` | `CStyleComment` | `BCPLComment`
+ MacroName: `ualpha` (`ualpha` | "0"..."9")*
+ PrepDefine: `LineBegin` (`WhiteSpaceOrCStyleComment`)*
+ : "#define" (`WhiteSpace`)+ `MacroName`
+ : (`WhiteSpaceOrAnyComment`)* `LineEnd`
+ PrepIfdef: `LineBegin` (`WhiteSpaceOrCStyleComment`)*
+ : "#ifdef" (`WhiteSpace`)+ `MacroName`
+ : (`WhiteSpaceOrAnyComment`)* `LineEnd`
+ PrepElse: `LineBegin` (`WhiteSpaceOrCStyleComment`)*
+ : "#else" (`WhiteSpaceOrAnyComment`)* `LineEnd`
+ PrepEndif: `LineBegin` (`WhiteSpaceOrCStyleComment`)*
+ : "#endif" (`WhiteSpaceOrAnyComment`)* `LineEnd`
+ PrepRegContentException: `PredIfdef` | `PredElse` | `PredEndif` | EOF
+ PrepRegion: .* - `PrepRegContentException`
+ :| `PrepIfDef`
+ : (`PrepRegion`)*
+ : [`PrepElse`]
+ : (`PrepRegion`)*
+ : `PrepEndif`
+
+:token:`PrepRegion` may occur anywhere in a TD file, as long as it matches
+the grammar specification.
+
+:token:`PrepDefine` allows defining a :token:`MacroName` so that any following
+:token:`PrepIfdef` - :token:`PrepElse` preprocessing region part and
+:token:`PrepIfdef` - :token:`PrepEndif` preprocessing region
+are enabled for TableGen tokens parsing.
+
+A preprocessing region, starting (i.e. having its :token:`PrepIfdef`) in a file,
+must end (i.e. have its :token:`PrepEndif`) in the same file.
+
+A :token:`MacroName` may be defined externally by using ``{ -D<NAME> }``
+option of TableGen.
Added: www-releases/trunk/8.0.1/docs/_sources/TableGen/index.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/TableGen/index.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/TableGen/index.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/TableGen/index.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,304 @@
+========
+TableGen
+========
+
+.. contents::
+ :local:
+
+.. toctree::
+ :hidden:
+
+ BackEnds
+ LangRef
+ LangIntro
+ Deficiencies
+
+Introduction
+============
+
+TableGen's purpose is to help a human develop and maintain records of
+domain-specific information. Because there may be a large number of these
+records, it is specifically designed to allow writing flexible descriptions and
+for common features of these records to be factored out. This reduces the
+amount of duplication in the description, reduces the chance of error, and makes
+it easier to structure domain specific information.
+
+The core part of TableGen parses a file, instantiates the declarations, and
+hands the result off to a domain-specific `backend`_ for processing.
+
+The current major users of TableGen are :doc:`../CodeGenerator`
+and the
+`Clang diagnostics and attributes <http://clang.llvm.org/docs/UsersManual.html#controlling-errors-and-warnings>`_.
+
+Note that if you work on TableGen much, and use emacs or vim, that you can find
+an emacs "TableGen mode" and a vim language file in the ``llvm/utils/emacs`` and
+``llvm/utils/vim`` directories of your LLVM distribution, respectively.
+
+.. _intro:
+
+
+The TableGen program
+====================
+
+TableGen files are interpreted by the TableGen program: `llvm-tblgen` available
+on your build directory under `bin`. It is not installed in the system (or where
+your sysroot is set to), since it has no use beyond LLVM's build process.
+
+Running TableGen
+----------------
+
+TableGen runs just like any other LLVM tool. The first (optional) argument
+specifies the file to read. If a filename is not specified, ``llvm-tblgen``
+reads from standard input.
+
+To be useful, one of the `backends`_ must be used. These backends are
+selectable on the command line (type '``llvm-tblgen -help``' for a list). For
+example, to get a list of all of the definitions that subclass a particular type
+(which can be useful for building up an enum list of these records), use the
+``-print-enums`` option:
+
+.. code-block:: bash
+
+ $ llvm-tblgen X86.td -print-enums -class=Register
+ AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
+ ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
+ MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
+ R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
+ R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
+ RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
+ XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
+ XMM6, XMM7, XMM8, XMM9,
+
+ $ llvm-tblgen X86.td -print-enums -class=Instruction
+ ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
+ ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
+ ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
+ ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
+ ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...
+
+The default backend prints out all of the records. There is also a general
+backend which outputs all the records as a JSON data structure, enabled using
+the `-dump-json` option.
+
+If you plan to use TableGen, you will most likely have to write a `backend`_
+that extracts the information specific to what you need and formats it in the
+appropriate way. You can do this by extending TableGen itself in C++, or by
+writing a script in any language that can consume the JSON output.
+
+Example
+-------
+
+With no other arguments, `llvm-tblgen` parses the specified file and prints out all
+of the classes, then all of the definitions. This is a good way to see what the
+various definitions expand to fully. Running this on the ``X86.td`` file prints
+this (at the time of this writing):
+
+.. code-block:: text
+
+ ...
+ def ADD32rr { // Instruction X86Inst I
+ string Namespace = "X86";
+ dag OutOperandList = (outs GR32:$dst);
+ dag InOperandList = (ins GR32:$src1, GR32:$src2);
+ string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
+ list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
+ list<Register> Uses = [];
+ list<Register> Defs = [EFLAGS];
+ list<Predicate> Predicates = [];
+ int CodeSize = 3;
+ int AddedComplexity = 0;
+ bit isReturn = 0;
+ bit isBranch = 0;
+ bit isIndirectBranch = 0;
+ bit isBarrier = 0;
+ bit isCall = 0;
+ bit canFoldAsLoad = 0;
+ bit mayLoad = 0;
+ bit mayStore = 0;
+ bit isImplicitDef = 0;
+ bit isConvertibleToThreeAddress = 1;
+ bit isCommutable = 1;
+ bit isTerminator = 0;
+ bit isReMaterializable = 0;
+ bit isPredicable = 0;
+ bit hasDelaySlot = 0;
+ bit usesCustomInserter = 0;
+ bit hasCtrlDep = 0;
+ bit isNotDuplicable = 0;
+ bit hasSideEffects = 0;
+ InstrItinClass Itinerary = NoItinerary;
+ string Constraints = "";
+ string DisableEncoding = "";
+ bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
+ Format Form = MRMDestReg;
+ bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
+ ImmType ImmT = NoImm;
+ bits<3> ImmTypeBits = { 0, 0, 0 };
+ bit hasOpSizePrefix = 0;
+ bit hasAdSizePrefix = 0;
+ bits<4> Prefix = { 0, 0, 0, 0 };
+ bit hasREX_WPrefix = 0;
+ FPFormat FPForm = ?;
+ bits<3> FPFormBits = { 0, 0, 0 };
+ }
+ ...
+
+This definition corresponds to the 32-bit register-register ``add`` instruction
+of the x86 architecture. ``def ADD32rr`` defines a record named
+``ADD32rr``, and the comment at the end of the line indicates the superclasses
+of the definition. The body of the record contains all of the data that
+TableGen assembled for the record, indicating that the instruction is part of
+the "X86" namespace, the pattern indicating how the instruction is selected by
+the code generator, that it is a two-address instruction, has a particular
+encoding, etc. The contents and semantics of the information in the record are
+specific to the needs of the X86 backend, and are only shown as an example.
+
+As you can see, a lot of information is needed for every instruction supported
+by the code generator, and specifying it all manually would be unmaintainable,
+prone to bugs, and tiring to do in the first place. Because we are using
+TableGen, all of the information was derived from the following definition:
+
+.. code-block:: text
+
+ let Defs = [EFLAGS],
+ isCommutable = 1, // X = ADD Y,Z --> X = ADD Z,Y
+ isConvertibleToThreeAddress = 1 in // Can transform into LEA.
+ def ADD32rr : I<0x01, MRMDestReg, (outs GR32:$dst),
+ (ins GR32:$src1, GR32:$src2),
+ "add{l}\t{$src2, $dst|$dst, $src2}",
+ [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;
+
+This definition makes use of the custom class ``I`` (extended from the custom
+class ``X86Inst``), which is defined in the X86-specific TableGen file, to
+factor out the common features that instructions of its class share. A key
+feature of TableGen is that it allows the end-user to define the abstractions
+they prefer to use when describing their information.
+
+Syntax
+======
+
+TableGen has a syntax that is loosely based on C++ templates, with built-in
+types and specification. In addition, TableGen's syntax introduces some
+automation concepts like multiclass, foreach, let, etc.
+
+Basic concepts
+--------------
+
+TableGen files consist of two key parts: 'classes' and 'definitions', both of
+which are considered 'records'.
+
+**TableGen records** have a unique name, a list of values, and a list of
+superclasses. The list of values is the main data that TableGen builds for each
+record; it is this that holds the domain specific information for the
+application. The interpretation of this data is left to a specific `backend`_,
+but the structure and format rules are taken care of and are fixed by
+TableGen.
+
+**TableGen definitions** are the concrete form of 'records'. These generally do
+not have any undefined values, and are marked with the '``def``' keyword.
+
+.. code-block:: text
+
+ def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
+ "Enable ARMv8 FP">;
+
+In this example, FeatureFPARMv8 is ``SubtargetFeature`` record initialised
+with some values. The names of the classes are defined via the
+keyword `class` either on the same file or some other included. Most target
+TableGen files include the generic ones in ``include/llvm/Target``.
+
+**TableGen classes** are abstract records that are used to build and describe
+other records. These classes allow the end-user to build abstractions for
+either the domain they are targeting (such as "Register", "RegisterClass", and
+"Instruction" in the LLVM code generator) or for the implementor to help factor
+out common properties of records (such as "FPInst", which is used to represent
+floating point instructions in the X86 backend). TableGen keeps track of all of
+the classes that are used to build up a definition, so the backend can find all
+definitions of a particular class, such as "Instruction".
+
+.. code-block:: text
+
+ class ProcNoItin<string Name, list<SubtargetFeature> Features>
+ : Processor<Name, NoItineraries, Features>;
+
+Here, the class ProcNoItin, receiving parameters `Name` of type `string` and
+a list of target features is specializing the class Processor by passing the
+arguments down as well as hard-coding NoItineraries.
+
+**TableGen multiclasses** are groups of abstract records that are instantiated
+all at once. Each instantiation can result in multiple TableGen definitions.
+If a multiclass inherits from another multiclass, the definitions in the
+sub-multiclass become part of the current multiclass, as if they were declared
+in the current multiclass.
+
+.. code-block:: text
+
+ multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
+ dag address, ValueType sty> {
+ def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
+ (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
+ Base, Offset, Extend)>;
+
+ def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
+ (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
+ Base, Offset, Extend)>;
+ }
+
+ defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
+ !foreach(decls.pattern, address,
+ !subst(SHIFT, imm_eq0, decls.pattern)),
+ i8>;
+
+
+
+See the :doc:`TableGen Language Introduction <LangIntro>` for more generic
+information on the usage of the language, and the
+:doc:`TableGen Language Reference <LangRef>` for more in-depth description
+of the formal language specification.
+
+.. _backend:
+.. _backends:
+
+TableGen backends
+=================
+
+TableGen files have no real meaning without a back-end. The default operation
+of running ``llvm-tblgen`` is to print the information in a textual format, but
+that's only useful for debugging of the TableGen files themselves. The power
+in TableGen is, however, to interpret the source files into an internal
+representation that can be generated into anything you want.
+
+Current usage of TableGen is to create huge include files with tables that you
+can either include directly (if the output is in the language you're coding),
+or be used in pre-processing via macros surrounding the include of the file.
+
+Direct output can be used if the back-end already prints a table in C format
+or if the output is just a list of strings (for error and warning messages).
+Pre-processed output should be used if the same information needs to be used
+in different contexts (like Instruction names), so your back-end should print
+a meta-information list that can be shaped into different compile-time formats.
+
+See the `TableGen BackEnds <BackEnds.html>`_ for more information.
+
+TableGen Deficiencies
+=====================
+
+Despite being very generic, TableGen has some deficiencies that have been
+pointed out numerous times. The common theme is that, while TableGen allows
+you to build Domain-Specific-Languages, the final languages that you create
+lack the power of other DSLs, which in turn increase considerably the size
+and complexity of TableGen files.
+
+At the same time, TableGen allows you to create virtually any meaning of
+the basic concepts via custom-made back-ends, which can pervert the original
+design and make it very hard for newcomers to understand the evil TableGen
+file.
+
+There are some in favour of extending the semantics even more, but making sure
+back-ends adhere to strict rules. Others are suggesting we should move to less,
+more powerful DSLs designed with specific purposes, or even re-using existing
+DSLs.
+
+Either way, this is a discussion that will likely span across several years,
+if not decades. You can read more in the `TableGen Deficiencies <Deficiencies.html>`_
+document.
Added: www-releases/trunk/8.0.1/docs/_sources/TableGenFundamentals.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/TableGenFundamentals.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/TableGenFundamentals.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/TableGenFundamentals.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,10 @@
+=====================
+TableGen Fundamentals
+=====================
+
+Moved
+=====
+
+The TableGen fundamentals documentation has moved to a directory on its own
+and is now available at :doc:`TableGen/index`. Please, change your links to
+that page.
Added: www-releases/trunk/8.0.1/docs/_sources/TestSuiteGuide.md.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/TestSuiteGuide.md.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/TestSuiteGuide.md.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/TestSuiteGuide.md.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,403 @@
+test-suite Guide
+================
+
+Quickstart
+----------
+
+1. The lit test runner is required to run the tests. You can either use one
+ from an LLVM build:
+
+ ```bash
+ % <path to llvm build>/bin/llvm-lit --version
+ lit 0.8.0dev
+ ```
+
+ An alternative is installing it as a python package in a python virtual
+ environment:
+
+ ```bash
+ % mkdir venv
+ % virtualenv venv
+ % . venv/bin/activate
+ % pip install svn+http://llvm.org/svn/llvm-project/llvm/trunk/utils/lit
+ % lit --version
+ lit 0.8.0dev
+ ```
+
+2. Check out the `test-suite` module with:
+
+ ```bash
+ % svn co http://llvm.org/svn/llvm-project/test-suite/trunk test-suite
+ ```
+
+3. Create a build directory and use CMake to configure the suite. Use the
+ `CMAKE_C_COMPILER` option to specify the compiler to test. Use a cache file
+ to choose a typical build configuration:
+
+ ```bash
+ % mkdir test-suite-build
+ % cd test-suite-build
+ % cmake -DCMAKE_C_COMPILER=<path to llvm build>/bin/clang \
+ -C../test-suite/cmake/caches/O3.cmake \
+ ../test-suite
+ ```
+
+4. Build the benchmarks:
+
+ ```text
+ % make
+ Scanning dependencies of target timeit-target
+ [ 0%] Building C object tools/CMakeFiles/timeit-target.dir/timeit.c.o
+ [ 0%] Linking C executable timeit-target
+ ...
+ ```
+
+5. Run the tests with lit:
+
+ ```text
+ % llvm-lit -v -j 1 -o results.json .
+ -- Testing: 474 tests, 1 threads --
+ PASS: test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test (1 of 474)
+ ********** TEST 'test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test' RESULTS **********
+ compile_time: 0.2192
+ exec_time: 0.0462
+ hash: "59620e187c6ac38b36382685ccd2b63b"
+ size: 83348
+ **********
+ PASS: test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test (2 of 474)
+ ...
+ ```
+
+6. Show and compare result files (optional):
+
+ ```bash
+ # Make sure pandas is installed. Prepend `sudo` if necessary.
+ % pip install pandas
+ # Show a single result file:
+ % test-suite/utils/compare.py results.json
+ # Compare two result files:
+ % test-suite/utils/compare.py results_a.json results_b.json
+ ```
+
+
+Structure
+---------
+
+The test-suite contains benchmark and test programs. The programs come with
+reference outputs so that their correctness can be checked. The suite comes
+with tools to collect metrics such as benchmark runtime, compilation time and
+code size.
+
+The test-suite is divided into several directories:
+
+- `SingleSource/`
+
+ Contains test programs that are only a single source file in size. A
+ subdirectory may contain several programs.
+
+- `MultiSource/`
+
+ Contains subdirectories which entire programs with multiple source files.
+ Large benchmarks and whole applications go here.
+
+- `MicroBenchmarks/`
+
+ Programs using the [google-benchmark](https://github.com/google/benchmark)
+ library. The programs define functions that are run multiple times until the
+ measurement results are statistically significant.
+
+- `External/`
+
+ Contains descriptions and test data for code that cannot be directly
+ distributed with the test-suite. The most prominent members of this
+ directory are the SPEC CPU benchmark suites.
+ See [External Suites](#external-suites).
+
+- `Bitcode/`
+
+ These tests are mostly written in LLVM bitcode.
+
+- `CTMark/`
+
+ Contains symbolic links to other benchmarks forming a representative sample
+ for compilation performance measurements.
+
+### Benchmarks
+
+Every program can work as a correctness test. Some programs are unsuitable for
+performance measurements. Setting the `TEST_SUITE_BENCHMARKING_ONLY` CMake
+option to `ON` will disable them.
+
+
+Configuration
+-------------
+
+The test-suite has configuration options to customize building and running the
+benchmarks. CMake can print a list of them:
+
+```bash
+% cd test-suite-build
+# Print basic options:
+% cmake -LH
+# Print all options:
+% cmake -LAH
+```
+
+### Common Configuration Options
+
+- `CMAKE_C_FLAGS`
+
+ Specify extra flags to be passed to C compiler invocations. The flags are
+ also passed to the C++ compiler and linker invocations. See
+ [https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html)
+
+- `CMAKE_C_COMPILER`
+
+ Select the C compiler executable to be used. Note that the C++ compiler is
+ inferred automatically i.e. when specifying `path/to/clang` CMake will
+ automatically use `path/to/clang++` as the C++ compiler. See
+ [https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_COMPILER.html](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_COMPILER.html)
+
+- `CMAKE_BUILD_TYPE`
+
+ Select a build type like `OPTIMIZE` or `DEBUG` selecting a set of predefined
+ compiler flags. These flags are applied regardless of the `CMAKE_C_FLAGS`
+ option and may be changed by modifying `CMAKE_C_FLAGS_OPTIMIZE` etc. See
+ [https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html]](https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html)
+
+- `TEST_SUITE_RUN_UNDER`
+
+ Prefix test invocations with the given tool. This is typically used to run
+ cross-compiled tests within a simulator tool.
+
+- `TEST_SUITE_BENCHMARKING_ONLY`
+
+ Disable tests that are unsuitable for performance measurements. The disabled
+ tests either run for a very short time or are dominated by I/O performance
+ making them unsuitable as compiler performance tests.
+
+- `TEST_SUITE_SUBDIRS`
+
+ Semicolon-separated list of directories to include. This can be used to only
+ build parts of the test-suite or to include external suites. This option
+ does not work reliably with deeper subdirectories as it skips intermediate
+ `CMakeLists.txt` files which may be required.
+
+- `TEST_SUITE_COLLECT_STATS`
+
+ Collect internal LLVM statistics. Appends `-save-stats=obj` when invocing the
+ compiler and makes the lit runner collect and merge the statistic files.
+
+- `TEST_SUITE_RUN_BENCHMARKS`
+
+ If this is set to `OFF` then lit will not actually run the tests but just
+ collect build statistics like compile time and code size.
+
+- `TEST_SUITE_USE_PERF`
+
+ Use the `perf` tool for time measurement instead of the `timeit` tool that
+ comes with the test-suite. The `perf` is usually available on linux systems.
+
+- `TEST_SUITE_SPEC2000_ROOT`, `TEST_SUITE_SPEC2006_ROOT`, `TEST_SUITE_SPEC2017_ROOT`, ...
+
+ Specify installation directories of external benchmark suites. You can find
+ more information about expected versions or usage in the README files in the
+ `External` directory (such as `External/SPEC/README`)
+
+### Common CMake Flags
+
+- `-GNinja`
+
+ Generate build files for the ninja build tool.
+
+- `-Ctest-suite/cmake/caches/<cachefile.cmake>`
+
+ Use a CMake cache. The test-suite comes with several CMake caches which
+ predefine common or tricky build configurations.
+
+
+Displaying and Analyzing Results
+--------------------------------
+
+The `compare.py` script displays and compares result files. A result file is
+produced when invoking lit with the `-o filename.json` flag.
+
+Example usage:
+
+- Basic Usage:
+
+ ```text
+ % test-suite/utils/compare.py baseline.json
+ Warning: 'test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test' has No metrics!
+ Tests: 508
+ Metric: exec_time
+
+ Program baseline
+
+ INT2006/456.hmmer/456.hmmer 1222.90
+ INT2006/464.h264ref/464.h264ref 928.70
+ ...
+ baseline
+ count 506.000000
+ mean 20.563098
+ std 111.423325
+ min 0.003400
+ 25% 0.011200
+ 50% 0.339450
+ 75% 4.067200
+ max 1222.896800
+ ```
+
+- Show compile_time or text segment size metrics:
+
+ ```bash
+ % test-suite/utils/compare.py -m compile_time baseline.json
+ % test-suite/utils/compare.py -m size.__text baseline.json
+ ```
+
+- Compare two result files and filter short running tests:
+
+ ```bash
+ % test-suite/utils/compare.py --filter-short baseline.json experiment.json
+ ...
+ Program baseline experiment diff
+
+ SingleSour.../Benchmarks/Linpack/linpack-pc 5.16 4.30 -16.5%
+ MultiSourc...erolling-dbl/LoopRerolling-dbl 7.01 7.86 12.2%
+ SingleSour...UnitTests/Vectorizer/gcc-loops 3.89 3.54 -9.0%
+ ...
+ ```
+
+- Merge multiple baseline and experiment result files by taking the minimum
+ runtime each:
+
+ ```bash
+ % test-suite/utils/compare.py base0.json base1.json base2.json vs exp0.json exp1.json exp2.json
+ ```
+
+### Continuous Tracking with LNT
+
+LNT is a set of client and server tools for continuously monitoring
+performance. You can find more information at
+[http://llvm.org/docs/lnt](http://llvm.org/docs/lnt). The official LNT instance
+of the LLVM project is hosted at [http://lnt.llvm.org](http://lnt.llvm.org).
+
+
+External Suites
+---------------
+
+External suites such as SPEC can be enabled by either
+
+- placing (or linking) them into the `test-suite/test-suite-externals/xxx` directory (example: `test-suite/test-suite-externals/speccpu2000`)
+- using a configuration option such as `-D TEST_SUITE_SPEC2000_ROOT=path/to/speccpu2000`
+
+You can find further information in the respective README files such as
+`test-suite/External/SPEC/README`.
+
+For the SPEC benchmarks you can switch between the `test`, `train` and
+`ref` input datasets via the `TEST_SUITE_RUN_TYPE` configuration option.
+The `train` dataset is used by default.
+
+
+Custom Suites
+-------------
+
+You can build custom suites using the test-suite infrastructure. A custom suite
+has a `CMakeLists.txt` file at the top directory. The `CMakeLists.txt` will be
+picked up automatically if placed into a subdirectory of the test-suite or when
+setting the `TEST_SUITE_SUBDIRS` variable:
+
+```bash
+% cmake -DTEST_SUITE_SUBDIRS=path/to/my/benchmark-suite ../test-suite
+```
+
+
+Profile Guided Optimization
+---------------------------
+
+Profile guided optimization requires to compile and run twice. First the
+benchmark should be compiled with profile generation instrumentation enabled
+and setup for training data. The lit runner will merge the profile files
+using `llvm-profdata` so they can be used by the second compilation run.
+
+Example:
+```bash
+# Profile generation run:
+% cmake -DTEST_SUITE_PROFILE_GENERATE=ON \
+ -DTEST_SUITE_RUN_TYPE=train \
+ ../test-suite
+% make
+% llvm-lit .
+# Use the profile data for compilation and actual benchmark run:
+% cmake -DTEST_SUITE_PROFILE_GENERATE=OFF \
+ -DTEST_SUITE_PROFILE_USE=ON \
+ -DTEST_SUITE_RUN_TYPE=ref \
+ .
+% make
+% llvm-lit -o result.json .
+```
+
+The `TEST_SUITE_RUN_TYPE` setting only affects the SPEC benchmark suites.
+
+
+Cross Compilation and External Devices
+--------------------------------------
+
+### Compilation
+
+CMake allows to cross compile to a different target via toolchain files. More
+information can be found here:
+
+- [http://llvm.org/docs/lnt/tests.html#cross-compiling](http://llvm.org/docs/lnt/tests.html#cross-compiling)
+
+- [https://cmake.org/cmake/help/latest/manual/cmake-toolchains.7.html](https://cmake.org/cmake/help/latest/manual/cmake-toolchains.7.html)
+
+Cross compilation from macOS to iOS is possible with the
+`test-suite/cmake/caches/target-target-*-iphoneos-internal.cmake` CMake cache
+files; this requires an internal iOS SDK.
+
+### Running
+
+There are two ways to run the tests in a cross compilation setting:
+
+- Via SSH connection to an external device: The `TEST_SUITE_REMOTE_HOST` option
+ should be set to the SSH hostname. The executables and data files need to be
+ transferred to the device after compilation. This is typically done via the
+ `rsync` make target. After this, the lit runner can be used on the host
+ machine. It will prefix the benchmark and verification command lines with an
+ `ssh` command.
+
+ Example:
+
+ ```bash
+ % cmake -G Ninja -D CMAKE_C_COMPILER=path/to/clang \
+ -C ../test-suite/cmake/caches/target-arm64-iphoneos-internal.cmake \
+ -D TEST_SUITE_REMOTE_HOST=mydevice \
+ ../test-suite
+ % ninja
+ % ninja rsync
+ % llvm-lit -j1 -o result.json .
+ ```
+
+- You can specify a simulator for the target machine with the
+ `TEST_SUITE_RUN_UNDER` setting. The lit runner will prefix all benchmark
+ invocations with it.
+
+
+Running the test-suite via LNT
+------------------------------
+
+The LNT tool can run the test-suite. Use this when submitting test results to
+an LNT instance. See
+[http://llvm.org/docs/lnt/tests.html#llvm-cmake-test-suite](http://llvm.org/docs/lnt/tests.html#llvm-cmake-test-suite)
+for details.
+
+Running the test-suite via Makefiles (deprecated)
+-------------------------------------------------
+
+**Note**: The test-suite comes with a set of Makefiles that are considered
+deprecated. They do not support newer testing modes like `Bitcode` or
+`Microbenchmarks` and are harder to use.
+
+Old documentation is available in the
+[test-suite Makefile Guide](TestSuiteMakefileGuide).
Added: www-releases/trunk/8.0.1/docs/_sources/TestSuiteMakefileGuide.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/TestSuiteMakefileGuide.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/TestSuiteMakefileGuide.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/TestSuiteMakefileGuide.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,198 @@
+======================================
+test-suite Makefile Guide (deprecated)
+======================================
+
+.. contents::
+ :local:
+
+Overview
+========
+
+First, all tests are executed within the LLVM object directory tree.
+They *are not* executed inside of the LLVM source tree. This is because
+the test suite creates temporary files during execution.
+
+To run the test suite, you need to use the following steps:
+
+#. ``cd`` into the ``llvm/projects`` directory in your source tree.
+#. Check out the ``test-suite`` module with:
+
+ .. code-block:: bash
+
+ % svn co http://llvm.org/svn/llvm-project/test-suite/trunk test-suite
+
+ This will get the test suite into ``llvm/projects/test-suite``.
+
+#. Configure and build ``llvm``.
+
+#. Configure and build ``llvm-gcc``.
+
+#. Install ``llvm-gcc`` somewhere.
+
+#. *Re-configure* ``llvm`` from the top level of each build tree (LLVM
+ object directory tree) in which you want to run the test suite, just
+ as you do before building LLVM.
+
+ During the *re-configuration*, you must either: (1) have ``llvm-gcc``
+ you just built in your path, or (2) specify the directory where your
+ just-built ``llvm-gcc`` is installed using
+ ``--with-llvmgccdir=$LLVM_GCC_DIR``.
+
+ You must also tell the configure machinery that the test suite is
+ available so it can be configured for your build tree:
+
+ .. code-block:: bash
+
+ % cd $LLVM_OBJ_ROOT ; $LLVM_SRC_ROOT/configure [--with-llvmgccdir=$LLVM_GCC_DIR]
+
+ [Remember that ``$LLVM_GCC_DIR`` is the directory where you
+ *installed* llvm-gcc, not its src or obj directory.]
+
+#. You can now run the test suite from your build tree as follows:
+
+ .. code-block:: bash
+
+ % cd $LLVM_OBJ_ROOT/projects/test-suite
+ % make
+
+Note that the second and third steps only need to be done once. After
+you have the suite checked out and configured, you don't need to do it
+again (unless the test code or configure script changes).
+
+Configuring External Tests
+==========================
+
+In order to run the External tests in the ``test-suite`` module, you
+must specify *--with-externals*. This must be done during the
+*re-configuration* step (see above), and the ``llvm`` re-configuration
+must recognize the previously-built ``llvm-gcc``. If any of these is
+missing or neglected, the External tests won't work.
+
+* *--with-externals*
+
+* *--with-externals=<directory>*
+
+This tells LLVM where to find any external tests. They are expected to
+be in specifically named subdirectories of <``directory``>. If
+``directory`` is left unspecified, ``configure`` uses the default value
+``/home/vadve/shared/benchmarks/speccpu2000/benchspec``. Subdirectory
+names known to LLVM include:
+
+* spec95
+
+* speccpu2000
+
+* speccpu2006
+
+* povray31
+
+Others are added from time to time, and can be determined from
+``configure``.
+
+Running Different Tests
+=======================
+
+In addition to the regular "whole program" tests, the ``test-suite``
+module also provides a mechanism for compiling the programs in different
+ways. If the variable TEST is defined on the ``gmake`` command line, the
+test system will include a Makefile named
+``TEST.<value of TEST variable>.Makefile``. This Makefile can modify
+build rules to yield different results.
+
+For example, the LLVM nightly tester uses ``TEST.nightly.Makefile`` to
+create the nightly test reports. To run the nightly tests, run
+``gmake TEST=nightly``.
+
+There are several TEST Makefiles available in the tree. Some of them are
+designed for internal LLVM research and will not work outside of the
+LLVM research group. They may still be valuable, however, as a guide to
+writing your own TEST Makefile for any optimization or analysis passes
+that you develop with LLVM.
+
+Generating Test Output
+======================
+
+There are a number of ways to run the tests and generate output. The
+most simple one is simply running ``gmake`` with no arguments. This will
+compile and run all programs in the tree using a number of different
+methods and compare results. Any failures are reported in the output,
+but are likely drowned in the other output. Passes are not reported
+explicitly.
+
+Somewhat better is running ``gmake TEST=sometest test``, which runs the
+specified test and usually adds per-program summaries to the output
+(depending on which sometest you use). For example, the ``nightly`` test
+explicitly outputs TEST-PASS or TEST-FAIL for every test after each
+program. Though these lines are still drowned in the output, it's easy
+to grep the output logs in the Output directories.
+
+Even better are the ``report`` and ``report.format`` targets (where
+``format`` is one of ``html``, ``csv``, ``text`` or ``graphs``). The
+exact contents of the report are dependent on which ``TEST`` you are
+running, but the text results are always shown at the end of the run and
+the results are always stored in the ``report.<type>.format`` file (when
+running with ``TEST=<type>``). The ``report`` also generate a file
+called ``report.<type>.raw.out`` containing the output of the entire
+test run.
+
+Writing Custom Tests for the test-suite
+=======================================
+
+Assuming you can run the test suite, (e.g.
+"``gmake TEST=nightly report``" should work), it is really easy to run
+optimizations or code generator components against every program in the
+tree, collecting statistics or running custom checks for correctness. At
+base, this is how the nightly tester works, it's just one example of a
+general framework.
+
+Lets say that you have an LLVM optimization pass, and you want to see
+how many times it triggers. First thing you should do is add an LLVM
+`statistic <ProgrammersManual.html#Statistic>`_ to your pass, which will
+tally counts of things you care about.
+
+Following this, you can set up a test and a report that collects these
+and formats them for easy viewing. This consists of two files, a
+"``test-suite/TEST.XXX.Makefile``" fragment (where XXX is the name of
+your test) and a "``test-suite/TEST.XXX.report``" file that indicates
+how to format the output into a table. There are many example reports of
+various levels of sophistication included with the test suite, and the
+framework is very general.
+
+If you are interested in testing an optimization pass, check out the
+"libcalls" test as an example. It can be run like this:
+
+.. code-block:: bash
+
+ % cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level
+ % make TEST=libcalls report
+
+This will do a bunch of stuff, then eventually print a table like this:
+
+::
+
+ Name | total | #exit |
+ ...
+ FreeBench/analyzer/analyzer | 51 | 6 |
+ FreeBench/fourinarow/fourinarow | 1 | 1 |
+ FreeBench/neural/neural | 19 | 9 |
+ FreeBench/pifft/pifft | 5 | 3 |
+ MallocBench/cfrac/cfrac | 1 | * |
+ MallocBench/espresso/espresso | 52 | 12 |
+ MallocBench/gs/gs | 4 | * |
+ Prolangs-C/TimberWolfMC/timberwolfmc | 302 | * |
+ Prolangs-C/agrep/agrep | 33 | 12 |
+ Prolangs-C/allroots/allroots | * | * |
+ Prolangs-C/assembler/assembler | 47 | * |
+ Prolangs-C/bison/mybison | 74 | * |
+ ...
+
+This basically is grepping the -stats output and displaying it in a
+table. You can also use the "TEST=libcalls report.html" target to get
+the table in HTML form, similarly for report.csv and report.tex.
+
+The source for this is in ``test-suite/TEST.libcalls.*``. The format is
+pretty simple: the Makefile indicates how to run the test (in this case,
+"``opt -simplify-libcalls -stats``"), and the report contains one line
+for each column of the output. The first value is the header for the
+column and the second is the regex to grep the output of the command
+for. There are lots of example reports that can do fancy stuff.
Added: www-releases/trunk/8.0.1/docs/_sources/TestingGuide.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/TestingGuide.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/TestingGuide.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/TestingGuide.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,585 @@
+=================================
+LLVM Testing Infrastructure Guide
+=================================
+
+.. contents::
+ :local:
+
+.. toctree::
+ :hidden:
+
+ TestSuiteGuide
+ TestSuiteMakefileGuide
+
+Overview
+========
+
+This document is the reference manual for the LLVM testing
+infrastructure. It documents the structure of the LLVM testing
+infrastructure, the tools needed to use it, and how to add and run
+tests.
+
+Requirements
+============
+
+In order to use the LLVM testing infrastructure, you will need all of the
+software required to build LLVM, as well as `Python <http://python.org>`_ 2.7 or
+later.
+
+LLVM Testing Infrastructure Organization
+========================================
+
+The LLVM testing infrastructure contains two major categories of tests:
+regression tests and whole programs. The regression tests are contained
+inside the LLVM repository itself under ``llvm/test`` and are expected
+to always pass -- they should be run before every commit.
+
+The whole programs tests are referred to as the "LLVM test suite" (or
+"test-suite") and are in the ``test-suite`` module in subversion. For
+historical reasons, these tests are also referred to as the "nightly
+tests" in places, which is less ambiguous than "test-suite" and remains
+in use although we run them much more often than nightly.
+
+Regression tests
+----------------
+
+The regression tests are small pieces of code that test a specific
+feature of LLVM or trigger a specific bug in LLVM. The language they are
+written in depends on the part of LLVM being tested. These tests are driven by
+the :doc:`Lit <CommandGuide/lit>` testing tool (which is part of LLVM), and
+are located in the ``llvm/test`` directory.
+
+Typically when a bug is found in LLVM, a regression test containing just
+enough code to reproduce the problem should be written and placed
+somewhere underneath this directory. For example, it can be a small
+piece of LLVM IR distilled from an actual application or benchmark.
+
+``test-suite``
+--------------
+
+The test suite contains whole programs, which are pieces of code which
+can be compiled and linked into a stand-alone program that can be
+executed. These programs are generally written in high level languages
+such as C or C++.
+
+These programs are compiled using a user specified compiler and set of
+flags, and then executed to capture the program output and timing
+information. The output of these programs is compared to a reference
+output to ensure that the program is being compiled correctly.
+
+In addition to compiling and executing programs, whole program tests
+serve as a way of benchmarking LLVM performance, both in terms of the
+efficiency of the programs generated as well as the speed with which
+LLVM compiles, optimizes, and generates code.
+
+The test-suite is located in the ``test-suite`` Subversion module.
+
+See the :doc:`TestSuiteGuide` for details.
+
+Debugging Information tests
+---------------------------
+
+The test suite contains tests to check quality of debugging information.
+The test are written in C based languages or in LLVM assembly language.
+
+These tests are compiled and run under a debugger. The debugger output
+is checked to validate of debugging information. See README.txt in the
+test suite for more information . This test suite is located in the
+``debuginfo-tests`` Subversion module.
+
+Quick start
+===========
+
+The tests are located in two separate Subversion modules. The
+regressions tests are in the main "llvm" module under the directory
+``llvm/test`` (so you get these tests for free with the main LLVM tree).
+Use ``make check-all`` to run the regression tests after building LLVM.
+
+The ``test-suite`` module contains more comprehensive tests including whole C
+and C++ programs. See the :doc:`TestSuiteGuide` for details.
+
+Regression tests
+----------------
+
+To run all of the LLVM regression tests use the check-llvm target:
+
+.. code-block:: bash
+
+ % make check-llvm
+
+If you have `Clang <http://clang.llvm.org/>`_ checked out and built, you
+can run the LLVM and Clang tests simultaneously using:
+
+.. code-block:: bash
+
+ % make check-all
+
+To run the tests with Valgrind (Memcheck by default), use the ``LIT_ARGS`` make
+variable to pass the required options to lit. For example, you can use:
+
+.. code-block:: bash
+
+ % make check LIT_ARGS="-v --vg --vg-leak"
+
+to enable testing with valgrind and with leak checking enabled.
+
+To run individual tests or subsets of tests, you can use the ``llvm-lit``
+script which is built as part of LLVM. For example, to run the
+``Integer/BitPacked.ll`` test by itself you can run:
+
+.. code-block:: bash
+
+ % llvm-lit ~/llvm/test/Integer/BitPacked.ll
+
+or to run all of the ARM CodeGen tests:
+
+.. code-block:: bash
+
+ % llvm-lit ~/llvm/test/CodeGen/ARM
+
+For more information on using the :program:`lit` tool, see ``llvm-lit --help``
+or the :doc:`lit man page <CommandGuide/lit>`.
+
+Debugging Information tests
+---------------------------
+
+To run debugging information tests simply checkout the tests inside
+clang/test directory.
+
+.. code-block:: bash
+
+ % cd clang/test
+ % svn co http://llvm.org/svn/llvm-project/debuginfo-tests/trunk debuginfo-tests
+
+These tests are already set up to run as part of clang regression tests.
+
+Regression test structure
+=========================
+
+The LLVM regression tests are driven by :program:`lit` and are located in the
+``llvm/test`` directory.
+
+This directory contains a large array of small tests that exercise
+various features of LLVM and to ensure that regressions do not occur.
+The directory is broken into several sub-directories, each focused on a
+particular area of LLVM.
+
+Writing new regression tests
+----------------------------
+
+The regression test structure is very simple, but does require some
+information to be set. This information is gathered via ``configure``
+and is written to a file, ``test/lit.site.cfg`` in the build directory.
+The ``llvm/test`` Makefile does this work for you.
+
+In order for the regression tests to work, each directory of tests must
+have a ``lit.local.cfg`` file. :program:`lit` looks for this file to determine
+how to run the tests. This file is just Python code and thus is very
+flexible, but we've standardized it for the LLVM regression tests. If
+you're adding a directory of tests, just copy ``lit.local.cfg`` from
+another directory to get running. The standard ``lit.local.cfg`` simply
+specifies which files to look in for tests. Any directory that contains
+only directories does not need the ``lit.local.cfg`` file. Read the :doc:`Lit
+documentation <CommandGuide/lit>` for more information.
+
+Each test file must contain lines starting with "RUN:" that tell :program:`lit`
+how to run it. If there are no RUN lines, :program:`lit` will issue an error
+while running a test.
+
+RUN lines are specified in the comments of the test program using the
+keyword ``RUN`` followed by a colon, and lastly the command (pipeline)
+to execute. Together, these lines form the "script" that :program:`lit`
+executes to run the test case. The syntax of the RUN lines is similar to a
+shell's syntax for pipelines including I/O redirection and variable
+substitution. However, even though these lines may *look* like a shell
+script, they are not. RUN lines are interpreted by :program:`lit`.
+Consequently, the syntax differs from shell in a few ways. You can specify
+as many RUN lines as needed.
+
+:program:`lit` performs substitution on each RUN line to replace LLVM tool names
+with the full paths to the executable built for each tool (in
+``$(LLVM_OBJ_ROOT)/$(BuildMode)/bin)``. This ensures that :program:`lit` does
+not invoke any stray LLVM tools in the user's path during testing.
+
+Each RUN line is executed on its own, distinct from other lines unless
+its last character is ``\``. This continuation character causes the RUN
+line to be concatenated with the next one. In this way you can build up
+long pipelines of commands without making huge line lengths. The lines
+ending in ``\`` are concatenated until a RUN line that doesn't end in
+``\`` is found. This concatenated set of RUN lines then constitutes one
+execution. :program:`lit` will substitute variables and arrange for the pipeline
+to be executed. If any process in the pipeline fails, the entire line (and
+test case) fails too.
+
+Below is an example of legal RUN lines in a ``.ll`` file:
+
+.. code-block:: llvm
+
+ ; RUN: llvm-as < %s | llvm-dis > %t1
+ ; RUN: llvm-dis < %s.bc-13 > %t2
+ ; RUN: diff %t1 %t2
+
+As with a Unix shell, the RUN lines permit pipelines and I/O
+redirection to be used.
+
+There are some quoting rules that you must pay attention to when writing
+your RUN lines. In general nothing needs to be quoted. :program:`lit` won't
+strip off any quote characters so they will get passed to the invoked program.
+To avoid this use curly braces to tell :program:`lit` that it should treat
+everything enclosed as one value.
+
+In general, you should strive to keep your RUN lines as simple as possible,
+using them only to run tools that generate textual output you can then examine.
+The recommended way to examine output to figure out if the test passes is using
+the :doc:`FileCheck tool <CommandGuide/FileCheck>`. *[The usage of grep in RUN
+lines is deprecated - please do not send or commit patches that use it.]*
+
+Put related tests into a single file rather than having a separate file per
+test. Check if there are files already covering your feature and consider
+adding your code there instead of creating a new file.
+
+Extra files
+-----------
+
+If your test requires extra files besides the file containing the ``RUN:``
+lines, the idiomatic place to put them is in a subdirectory ``Inputs``.
+You can then refer to the extra files as ``%S/Inputs/foo.bar``.
+
+For example, consider ``test/Linker/ident.ll``. The directory structure is
+as follows::
+
+ test/
+ Linker/
+ ident.ll
+ Inputs/
+ ident.a.ll
+ ident.b.ll
+
+For convenience, these are the contents:
+
+.. code-block:: llvm
+
+ ;;;;; ident.ll:
+
+ ; RUN: llvm-link %S/Inputs/ident.a.ll %S/Inputs/ident.b.ll -S | FileCheck %s
+
+ ; Verify that multiple input llvm.ident metadata are linked together.
+
+ ; CHECK-DAG: !llvm.ident = !{!0, !1, !2}
+ ; CHECK-DAG: "Compiler V1"
+ ; CHECK-DAG: "Compiler V2"
+ ; CHECK-DAG: "Compiler V3"
+
+ ;;;;; Inputs/ident.a.ll:
+
+ !llvm.ident = !{!0, !1}
+ !0 = metadata !{metadata !"Compiler V1"}
+ !1 = metadata !{metadata !"Compiler V2"}
+
+ ;;;;; Inputs/ident.b.ll:
+
+ !llvm.ident = !{!0}
+ !0 = metadata !{metadata !"Compiler V3"}
+
+For symmetry reasons, ``ident.ll`` is just a dummy file that doesn't
+actually participate in the test besides holding the ``RUN:`` lines.
+
+.. note::
+
+ Some existing tests use ``RUN: true`` in extra files instead of just
+ putting the extra files in an ``Inputs/`` directory. This pattern is
+ deprecated.
+
+Fragile tests
+-------------
+
+It is easy to write a fragile test that would fail spuriously if the tool being
+tested outputs a full path to the input file. For example, :program:`opt` by
+default outputs a ``ModuleID``:
+
+.. code-block:: console
+
+ $ cat example.ll
+ define i32 @main() nounwind {
+ ret i32 0
+ }
+
+ $ opt -S /path/to/example.ll
+ ; ModuleID = '/path/to/example.ll'
+
+ define i32 @main() nounwind {
+ ret i32 0
+ }
+
+``ModuleID`` can unexpectedly match against ``CHECK`` lines. For example:
+
+.. code-block:: llvm
+
+ ; RUN: opt -S %s | FileCheck
+
+ define i32 @main() nounwind {
+ ; CHECK-NOT: load
+ ret i32 0
+ }
+
+This test will fail if placed into a ``download`` directory.
+
+To make your tests robust, always use ``opt ... < %s`` in the RUN line.
+:program:`opt` does not output a ``ModuleID`` when input comes from stdin.
+
+Platform-Specific Tests
+-----------------------
+
+Whenever adding tests that require the knowledge of a specific platform,
+either related to code generated, specific output or back-end features,
+you must make sure to isolate the features, so that buildbots that
+run on different architectures (and don't even compile all back-ends),
+don't fail.
+
+The first problem is to check for target-specific output, for example sizes
+of structures, paths and architecture names, for example:
+
+* Tests containing Windows paths will fail on Linux and vice-versa.
+* Tests that check for ``x86_64`` somewhere in the text will fail anywhere else.
+* Tests where the debug information calculates the size of types and structures.
+
+Also, if the test rely on any behaviour that is coded in any back-end, it must
+go in its own directory. So, for instance, code generator tests for ARM go
+into ``test/CodeGen/ARM`` and so on. Those directories contain a special
+``lit`` configuration file that ensure all tests in that directory will
+only run if a specific back-end is compiled and available.
+
+For instance, on ``test/CodeGen/ARM``, the ``lit.local.cfg`` is:
+
+.. code-block:: python
+
+ config.suffixes = ['.ll', '.c', '.cpp', '.test']
+ if not 'ARM' in config.root.targets:
+ config.unsupported = True
+
+Other platform-specific tests are those that depend on a specific feature
+of a specific sub-architecture, for example only to Intel chips that support ``AVX2``.
+
+For instance, ``test/CodeGen/X86/psubus.ll`` tests three sub-architecture
+variants:
+
+.. code-block:: llvm
+
+ ; RUN: llc -mcpu=core2 < %s | FileCheck %s -check-prefix=SSE2
+ ; RUN: llc -mcpu=corei7-avx < %s | FileCheck %s -check-prefix=AVX1
+ ; RUN: llc -mcpu=core-avx2 < %s | FileCheck %s -check-prefix=AVX2
+
+And the checks are different:
+
+.. code-block:: llvm
+
+ ; SSE2: @test1
+ ; SSE2: psubusw LCPI0_0(%rip), %xmm0
+ ; AVX1: @test1
+ ; AVX1: vpsubusw LCPI0_0(%rip), %xmm0, %xmm0
+ ; AVX2: @test1
+ ; AVX2: vpsubusw LCPI0_0(%rip), %xmm0, %xmm0
+
+So, if you're testing for a behaviour that you know is platform-specific or
+depends on special features of sub-architectures, you must add the specific
+triple, test with the specific FileCheck and put it into the specific
+directory that will filter out all other architectures.
+
+
+Constraining test execution
+---------------------------
+
+Some tests can be run only in specific configurations, such as
+with debug builds or on particular platforms. Use ``REQUIRES``
+and ``UNSUPPORTED`` to control when the test is enabled.
+
+Some tests are expected to fail. For example, there may be a known bug
+that the test detect. Use ``XFAIL`` to mark a test as an expected failure.
+An ``XFAIL`` test will be successful if its execution fails, and
+will be a failure if its execution succeeds.
+
+.. code-block:: llvm
+
+ ; This test will be only enabled in the build with asserts.
+ ; REQUIRES: asserts
+ ; This test is disabled on Linux.
+ ; UNSUPPORTED: -linux-
+ ; This test is expected to fail on PowerPC.
+ ; XFAIL: powerpc
+
+``REQUIRES`` and ``UNSUPPORTED`` and ``XFAIL`` all accept a comma-separated
+list of boolean expressions. The values in each expression may be:
+
+- Features added to ``config.available_features`` by
+ configuration files such as ``lit.cfg``.
+- Substrings of the target triple (``UNSUPPORTED`` and ``XFAIL`` only).
+
+| ``REQUIRES`` enables the test if all expressions are true.
+| ``UNSUPPORTED`` disables the test if any expression is true.
+| ``XFAIL`` expects the test to fail if any expression is true.
+
+As a special case, ``XFAIL: *`` is expected to fail everywhere.
+
+.. code-block:: llvm
+
+ ; This test is disabled on Windows,
+ ; and is disabled on Linux, except for Android Linux.
+ ; UNSUPPORTED: windows, linux && !android
+ ; This test is expected to fail on both PowerPC and ARM.
+ ; XFAIL: powerpc || arm
+
+
+Substitutions
+-------------
+
+Besides replacing LLVM tool names the following substitutions are performed in
+RUN lines:
+
+``%%``
+ Replaced by a single ``%``. This allows escaping other substitutions.
+
+``%s``
+ File path to the test case's source. This is suitable for passing on the
+ command line as the input to an LLVM tool.
+
+ Example: ``/home/user/llvm/test/MC/ELF/foo_test.s``
+
+``%S``
+ Directory path to the test case's source.
+
+ Example: ``/home/user/llvm/test/MC/ELF``
+
+``%t``
+ File path to a temporary file name that could be used for this test case.
+ The file name won't conflict with other test cases. You can append to it
+ if you need multiple temporaries. This is useful as the destination of
+ some redirected output.
+
+ Example: ``/home/user/llvm.build/test/MC/ELF/Output/foo_test.s.tmp``
+
+``%T``
+ Directory of ``%t``. Deprecated. Shouldn't be used, because it can be easily
+ misused and cause race conditions between tests.
+
+ Use ``rm -rf %t && mkdir %t`` instead if a temporary directory is necessary.
+
+ Example: ``/home/user/llvm.build/test/MC/ELF/Output``
+
+``%{pathsep}``
+
+ Expands to the path separator, i.e. ``:`` (or ``;`` on Windows).
+
+``%/s, %/S, %/t, %/T:``
+
+ Act like the corresponding substitution above but replace any ``\``
+ character with a ``/``. This is useful to normalize path separators.
+
+ Example: ``%s: C:\Desktop Files/foo_test.s.tmp``
+
+ Example: ``%/s: C:/Desktop Files/foo_test.s.tmp``
+
+``%:s, %:S, %:t, %:T:``
+
+ Act like the corresponding substitution above but remove colons at
+ the beginning of Windows paths. This is useful to allow concatenation
+ of absolute paths on Windows to produce a legal path.
+
+ Example: ``%s: C:\Desktop Files\foo_test.s.tmp``
+
+ Example: ``%:s: C\Desktop Files\foo_test.s.tmp``
+
+
+**LLVM-specific substitutions:**
+
+``%shlibext``
+ The suffix for the host platforms shared library files. This includes the
+ period as the first character.
+
+ Example: ``.so`` (Linux), ``.dylib`` (OS X), ``.dll`` (Windows)
+
+``%exeext``
+ The suffix for the host platforms executable files. This includes the
+ period as the first character.
+
+ Example: ``.exe`` (Windows), empty on Linux.
+
+``%(line)``, ``%(line+<number>)``, ``%(line-<number>)``
+ The number of the line where this substitution is used, with an optional
+ integer offset. This can be used in tests with multiple RUN lines, which
+ reference test file's line numbers.
+
+
+**Clang-specific substitutions:**
+
+``%clang``
+ Invokes the Clang driver.
+
+``%clang_cpp``
+ Invokes the Clang driver for C++.
+
+``%clang_cl``
+ Invokes the CL-compatible Clang driver.
+
+``%clangxx``
+ Invokes the G++-compatible Clang driver.
+
+``%clang_cc1``
+ Invokes the Clang frontend.
+
+``%itanium_abi_triple``, ``%ms_abi_triple``
+ These substitutions can be used to get the current target triple adjusted to
+ the desired ABI. For example, if the test suite is running with the
+ ``i686-pc-win32`` target, ``%itanium_abi_triple`` will expand to
+ ``i686-pc-mingw32``. This allows a test to run with a specific ABI without
+ constraining it to a specific triple.
+
+To add more substituations, look at ``test/lit.cfg`` or ``lit.local.cfg``.
+
+
+Options
+-------
+
+The llvm lit configuration allows to customize some things with user options:
+
+``llc``, ``opt``, ...
+ Substitute the respective llvm tool name with a custom command line. This
+ allows to specify custom paths and default arguments for these tools.
+ Example:
+
+ % llvm-lit "-Dllc=llc -verify-machineinstrs"
+
+``run_long_tests``
+ Enable the execution of long running tests.
+
+``llvm_site_config``
+ Load the specified lit configuration instead of the default one.
+
+
+Other Features
+--------------
+
+To make RUN line writing easier, there are several helper programs. These
+helpers are in the PATH when running tests, so you can just call them using
+their name. For example:
+
+``not``
+ This program runs its arguments and then inverts the result code from it.
+ Zero result codes become 1. Non-zero result codes become 0.
+
+To make the output more useful, :program:`lit` will scan
+the lines of the test case for ones that contain a pattern that matches
+``PR[0-9]+``. This is the syntax for specifying a PR (Problem Report) number
+that is related to the test case. The number after "PR" specifies the
+LLVM bugzilla number. When a PR number is specified, it will be used in
+the pass/fail reporting. This is useful to quickly get some context when
+a test fails.
+
+Finally, any line that contains "END." will cause the special
+interpretation of lines to terminate. This is generally done right after
+the last RUN: line. This has two side effects:
+
+(a) it prevents special interpretation of lines that are part of the test
+ program, not the instructions to the test case, and
+
+(b) it speeds things up for really big test cases by avoiding
+ interpretation of the remainder of the file.
Added: www-releases/trunk/8.0.1/docs/_sources/TransformMetadata.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/TransformMetadata.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/TransformMetadata.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/TransformMetadata.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,441 @@
+.. _transformation-metadata:
+
+============================
+Code Transformation Metadata
+============================
+
+.. contents::
+ :local:
+
+Overview
+========
+
+LLVM transformation passes can be controlled by attaching metadata to
+the code to transform. By default, transformation passes use heuristics
+to determine whether or not to perform transformations, and when doing
+so, other details of how the transformations are applied (e.g., which
+vectorization factor to select).
+Unless the optimizer is otherwise directed, transformations are applied
+conservatively. This conservatism generally allows the optimizer to
+avoid unprofitable transformations, but in practice, this results in the
+optimizer not applying transformations that would be highly profitable.
+
+Frontends can give additional hints to LLVM passes on which
+transformations they should apply. This can be additional knowledge that
+cannot be derived from the emitted IR, or directives passed from the
+user/programmer. OpenMP pragmas are an example of the latter.
+
+If any such metadata is dropped from the program, the code's semantics
+must not change.
+
+Metadata on Loops
+=================
+
+Attributes can be attached to loops as described in :ref:`llvm.loop`.
+Attributes can describe properties of the loop, disable transformations,
+force specific transformations and set transformation options.
+
+Because metadata nodes are immutable (with the exception of
+``MDNode::replaceOperandWith`` which is dangerous to use on uniqued
+metadata), in order to add or remove a loop attributes, a new ``MDNode``
+must be created and assigned as the new ``llvm.loop`` metadata. Any
+connection between the old ``MDNode`` and the loop is lost. The
+``llvm.loop`` node is also used as LoopID (``Loop::getLoopID()``), i.e.
+the loop effectively gets a new identifier. For instance,
+``llvm.mem.parallel_loop_access`` references the LoopID. Therefore, if
+the parallel access property is to be preserved after adding/removing
+loop attributes, any ``llvm.mem.parallel_loop_access`` reference must be
+updated to the new LoopID.
+
+Transformation Metadata Structure
+=================================
+
+Some attributes describe code transformations (unrolling, vectorizing,
+loop distribution, etc.). They can either be a hint to the optimizer
+that a transformation might be beneficial, instruction to use a specific
+option, , or convey a specific request from the user (such as
+``#pragma clang loop`` or ``#pragma omp simd``).
+
+If a transformation is forced but cannot be carried-out for any reason,
+an optimization-missed warning must be emitted. Semantic information
+such as a transformation being safe (e.g.
+``llvm.mem.parallel_loop_access``) can be unused by the optimizer
+without generating a warning.
+
+Unless explicitly disabled, any optimization pass may heuristically
+determine whether a transformation is beneficial and apply it. If
+metadata for another transformation was specified, applying a different
+transformation before it might be inadvertent due to being applied on a
+different loop or the loop not existing anymore. To avoid having to
+explicitly disable an unknown number of passes, the attribute
+``llvm.loop.disable_nonforced`` disables all optional, high-level,
+restructuring transformations.
+
+The following example avoids the loop being altered before being
+vectorized, for instance being unrolled.
+
+.. code-block:: llvm
+
+ br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !0
+ ...
+ !0 = distinct !{!0, !1, !2}
+ !1 = !{!"llvm.loop.vectorize.enable", i1 true}
+ !2 = !{!"llvm.loop.disable_nonforced"}
+
+After a transformation is applied, follow-up attributes are set on the
+transformed and/or new loop(s). This allows additional attributes
+including followup-transformations to be specified. Specifying multiple
+transformations in the same metadata node is possible for compatibility
+reasons, but their execution order is undefined. For instance, when
+``llvm.loop.vectorize.enable`` and ``llvm.loop.unroll.enable`` are
+specified at the same time, unrolling may occur either before or after
+vectorization.
+
+As an example, the following instructs a loop to be vectorized and only
+then unrolled.
+
+.. code-block:: llvm
+
+ !0 = distinct !{!0, !1, !2, !3}
+ !1 = !{!"llvm.loop.vectorize.enable", i1 true}
+ !2 = !{!"llvm.loop.disable_nonforced"}
+ !3 = !{!"llvm.loop.vectorize.followup_vectorized", !{"llvm.loop.unroll.enable"}}
+
+If, and only if, no followup is specified, the pass may add attributes itself.
+For instance, the vectorizer adds a ``llvm.loop.isvectorized`` attribute and
+all attributes from the original loop excluding its loop vectorizer
+attributes. To avoid this, an empty followup attribute can be used, e.g.
+
+.. code-block:: llvm
+
+ !3 = !{!"llvm.loop.vectorize.followup_vectorized"}
+
+The followup attributes of a transformation that cannot be applied will
+never be added to a loop and are therefore effectively ignored. This means
+that any followup-transformation in such attributes requires that its
+prior transformations are applied before the followup-transformation.
+The user should receive a warning about the first transformation in the
+transformation chain that could not be applied if it a forced
+transformation. All following transformations are skipped.
+
+Pass-Specific Transformation Metadata
+=====================================
+
+Transformation options are specific to each transformation. In the
+following, we present the model for each LLVM loop optimization pass and
+the metadata to influence them.
+
+Loop Vectorization and Interleaving
+-----------------------------------
+
+Loop vectorization and interleaving is interpreted as a single
+transformation. It is interpreted as forced if
+``!{"llvm.loop.vectorize.enable", i1 true}`` is set.
+
+Assuming the pre-vectorization loop is
+
+.. code-block:: c
+
+ for (int i = 0; i < n; i+=1) // original loop
+ Stmt(i);
+
+then the code after vectorization will be approximately (assuming an
+SIMD width of 4):
+
+.. code-block:: c
+
+ int i = 0;
+ if (rtc) {
+ for (; i + 3 < n; i+=4) // vectorized/interleaved loop
+ Stmt(i:i+3);
+ }
+ for (; i < n; i+=1) // epilogue loop
+ Stmt(i);
+
+where ``rtc`` is a generated runtime check.
+
+``llvm.loop.vectorize.followup_vectorized`` will set the attributes for
+the vectorized loop. If not specified, ``llvm.loop.isvectorized`` is
+combined with the original loop's attributes to avoid it being
+vectorized multiple times.
+
+``llvm.loop.vectorize.followup_epilogue`` will set the attributes for
+the remainder loop. If not specified, it will have the original loop's
+attributes combined with ``llvm.loop.isvectorized`` and
+``llvm.loop.unroll.runtime.disable`` (unless the original loop already
+has unroll metadata).
+
+The attributes specified by ``llvm.loop.vectorize.followup_all`` are
+added to both loops.
+
+When using a follow-up attribute, it replaces any automatically deduced
+attributes for the generated loop in question. Therefore it is
+recommended to add ``llvm.loop.isvectorized`` to
+``llvm.loop.vectorize.followup_all`` which avoids that the loop
+vectorizer tries to optimize the loops again.
+
+Loop Unrolling
+--------------
+
+Unrolling is interpreted as forced any ``!{!"llvm.loop.unroll.enable"}``
+metadata or option (``llvm.loop.unroll.count``, ``llvm.loop.unroll.full``)
+is present. Unrolling can be full unrolling, partial unrolling of a loop
+with constant trip count or runtime unrolling of a loop with a trip
+count unknown at compile-time.
+
+If the loop has been unrolled fully, there is no followup-loop. For
+partial/runtime unrolling, the original loop of
+
+.. code-block:: c
+
+ for (int i = 0; i < n; i+=1) // original loop
+ Stmt(i);
+
+is transformed into (using an unroll factor of 4):
+
+.. code-block:: c
+
+ int i = 0;
+ for (; i + 3 < n; i+=4) // unrolled loop
+ Stmt(i);
+ Stmt(i+1);
+ Stmt(i+2);
+ Stmt(i+3);
+ }
+ for (; i < n; i+=1) // remainder loop
+ Stmt(i);
+
+``llvm.loop.unroll.followup_unrolled`` will set the loop attributes of
+the unrolled loop. If not specified, the attributes of the original loop
+without the ``llvm.loop.unroll.*`` attributes are copied and
+``llvm.loop.unroll.disable`` added to it.
+
+``llvm.loop.unroll.followup_remainder`` defines the attributes of the
+remainder loop. If not specified the remainder loop will have no
+attributes. The remainder loop might not be present due to being fully
+unrolled in which case this attribute has no effect.
+
+Attributes defined in ``llvm.loop.unroll.followup_all`` are added to the
+unrolled and remainder loops.
+
+To avoid that the partially unrolled loop is unrolled again, it is
+recommended to add ``llvm.loop.unroll.disable`` to
+``llvm.loop.unroll.followup_all``. If no follow-up attribute specified
+for a generated loop, it is added automatically.
+
+Unroll-And-Jam
+--------------
+
+Unroll-and-jam uses the following transformation model (here with an
+unroll factor if 2). Currently, it does not support a fallback version
+when the transformation is unsafe.
+
+.. code-block:: c
+
+ for (int i = 0; i < n; i+=1) { // original outer loop
+ Fore(i);
+ for (int j = 0; j < m; j+=1) // original inner loop
+ SubLoop(i, j);
+ Aft(i);
+ }
+
+.. code-block:: c
+
+ int i = 0;
+ for (; i + 1 < n; i+=2) { // unrolled outer loop
+ Fore(i);
+ Fore(i+1);
+ for (int j = 0; j < m; j+=1) { // unrolled inner loop
+ SubLoop(i, j);
+ SubLoop(i+1, j);
+ }
+ Aft(i);
+ Aft(i+1);
+ }
+ for (; i < n; i+=1) { // remainder outer loop
+ Fore(i);
+ for (int j = 0; j < m; j+=1) // remainder inner loop
+ SubLoop(i, j);
+ Aft(i);
+ }
+
+``llvm.loop.unroll_and_jam.followup_outer`` will set the loop attributes
+of the unrolled outer loop. If not specified, the attributes of the
+original outer loop without the ``llvm.loop.unroll.*`` attributes are
+copied and ``llvm.loop.unroll.disable`` added to it.
+
+``llvm.loop.unroll_and_jam.followup_inner`` will set the loop attributes
+of the unrolled inner loop. If not specified, the attributes of the
+original inner loop are used unchanged.
+
+``llvm.loop.unroll_and_jam.followup_remainder_outer`` sets the loop
+attributes of the outer remainder loop. If not specified it will not
+have any attributes. The remainder loop might not be present due to
+being fully unrolled.
+
+``llvm.loop.unroll_and_jam.followup_remainder_inner`` sets the loop
+attributes of the inner remainder loop. If not specified it will have
+the attributes of the original inner loop. It the outer remainder loop
+is unrolled, the inner remainder loop might be present multiple times.
+
+Attributes defined in ``llvm.loop.unroll_and_jam.followup_all`` are
+added to all of the aforementioned output loops.
+
+To avoid that the unrolled loop is unrolled again, it is
+recommended to add ``llvm.loop.unroll.disable`` to
+``llvm.loop.unroll_and_jam.followup_all``. It suppresses unroll-and-jam
+as well as an additional inner loop unrolling. If no follow-up
+attribute specified for a generated loop, it is added automatically.
+
+Loop Distribution
+-----------------
+
+The LoopDistribution pass tries to separate vectorizable parts of a loop
+from the non-vectorizable part (which otherwise would make the entire
+loop non-vectorizable). Conceptually, it transforms a loop such as
+
+.. code-block:: c
+
+ for (int i = 1; i < n; i+=1) { // original loop
+ A[i] = i;
+ B[i] = 2 + B[i];
+ C[i] = 3 + C[i - 1];
+ }
+
+into the following code:
+
+.. code-block:: c
+
+ if (rtc) {
+ for (int i = 1; i < n; i+=1) // coincident loop
+ A[i] = i;
+ for (int i = 1; i < n; i+=1) // coincident loop
+ B[i] = 2 + B[i];
+ for (int i = 1; i < n; i+=1) // sequential loop
+ C[i] = 3 + C[i - 1];
+ } else {
+ for (int i = 1; i < n; i+=1) { // fallback loop
+ A[i] = i;
+ B[i] = 2 + B[i];
+ C[i] = 3 + C[i - 1];
+ }
+ }
+
+where ``rtc`` is a generated runtime check.
+
+``llvm.loop.distribute.followup_coincident`` sets the loop attributes of
+all loops without loop-carried dependencies (i.e. vectorizable loops).
+There might be more than one such loops. If not defined, the loops will
+inherit the original loop's attributes.
+
+``llvm.loop.distribute.followup_sequential`` sets the loop attributes of the
+loop with potentially unsafe dependencies. There should be at most one
+such loop. If not defined, the loop will inherit the original loop's
+attributes.
+
+``llvm.loop.distribute.followup_fallback`` defines the loop attributes
+for the fallback loop, which is a copy of the original loop for when
+loop versioning is required. If undefined, the fallback loop inherits
+all attributes from the original loop.
+
+Attributes defined in ``llvm.loop.distribute.followup_all`` are added to
+all of the aforementioned output loops.
+
+It is recommended to add ``llvm.loop.disable_nonforced`` to
+``llvm.loop.distribute.followup_fallback``. This avoids that the
+fallback version (which is likely never executed) is further optimzed
+which would increase the code size.
+
+Versioning LICM
+---------------
+
+The pass hoists code out of loops that are only loop-invariant when
+dynamic conditions apply. For instance, it transforms the loop
+
+.. code-block:: c
+
+ for (int i = 0; i < n; i+=1) // original loop
+ A[i] = B[0];
+
+into:
+
+.. code-block:: c
+
+ if (rtc) {
+ auto b = B[0];
+ for (int i = 0; i < n; i+=1) // versioned loop
+ A[i] = b;
+ } else {
+ for (int i = 0; i < n; i+=1) // unversioned loop
+ A[i] = B[0];
+ }
+
+The runtime condition (``rtc``) checks that the array ``A`` and the
+element `B[0]` do not alias.
+
+Currently, this transformation does not support followup-attributes.
+
+Loop Interchange
+----------------
+
+Currently, the ``LoopInterchange`` pass does not use any metadata.
+
+Ambiguous Transformation Order
+==============================
+
+If there multiple transformations defined, the order in which they are
+executed depends on the order in LLVM's pass pipeline, which is subject
+to change. The default optimization pipeline (anything higher than
+``-O0``) has the following order.
+
+When using the legacy pass manager:
+
+ - LoopInterchange (if enabled)
+ - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
+ - VersioningLICM (if enabled)
+ - LoopDistribute
+ - LoopVectorizer
+ - LoopUnrollAndJam (if enabled)
+ - LoopUnroll (partial and runtime unrolling)
+
+When using the legacy pass manager with LTO:
+
+ - LoopInterchange (if enabled)
+ - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
+ - LoopVectorizer
+ - LoopUnroll (partial and runtime unrolling)
+
+When using the new pass manager:
+
+ - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
+ - LoopDistribute
+ - LoopVectorizer
+ - LoopUnrollAndJam (if enabled)
+ - LoopUnroll (partial and runtime unrolling)
+
+Leftover Transformations
+========================
+
+Forced transformations that have not been applied after the last
+transformation pass should be reported to the user. The transformation
+passes themselves cannot be responsible for this reporting because they
+might not be in the pipeline, there might be multiple passes able to
+apply a transformation (e.g. ``LoopInterchange`` and Polly) or a
+transformation attribute may be 'hidden' inside another passes' followup
+attribute.
+
+The pass ``-transform-warning`` (``WarnMissedTransformationsPass``)
+emits such warnings. It should be placed after the last transformation
+pass.
+
+The current pass pipeline has a fixed order in which transformations
+passes are executed. A transformation can be in the followup of a pass
+that is executed later and thus leftover. For instance, a loop nest
+cannot be distributed and then interchanged with the current pass
+pipeline. The loop distribution will execute, but there is no loop
+interchange pass following such that any loop interchange metadata will
+be ignored. The ``-transform-warning`` should emit a warning in this
+case.
+
+Future versions of LLVM may fix this by executing transformations using
+a dynamic ordering.
Added: www-releases/trunk/8.0.1/docs/_sources/TypeMetadata.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/TypeMetadata.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/TypeMetadata.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/TypeMetadata.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,226 @@
+=============
+Type Metadata
+=============
+
+Type metadata is a mechanism that allows IR modules to co-operatively build
+pointer sets corresponding to addresses within a given set of globals. LLVM's
+`control flow integrity`_ implementation uses this metadata to efficiently
+check (at each call site) that a given address corresponds to either a
+valid vtable or function pointer for a given class or function type, and its
+whole-program devirtualization pass uses the metadata to identify potential
+callees for a given virtual call.
+
+To use the mechanism, a client creates metadata nodes with two elements:
+
+1. a byte offset into the global (generally zero for functions)
+2. a metadata object representing an identifier for the type
+
+These metadata nodes are associated with globals by using global object
+metadata attachments with the ``!type`` metadata kind.
+
+Each type identifier must exclusively identify either global variables
+or functions.
+
+.. admonition:: Limitation
+
+ The current implementation only supports attaching metadata to functions on
+ the x86-32 and x86-64 architectures.
+
+An intrinsic, :ref:`llvm.type.test <type.test>`, is used to test whether a
+given pointer is associated with a type identifier.
+
+.. _control flow integrity: http://clang.llvm.org/docs/ControlFlowIntegrity.html
+
+Representing Type Information using Type Metadata
+=================================================
+
+This section describes how Clang represents C++ type information associated with
+virtual tables using type metadata.
+
+Consider the following inheritance hierarchy:
+
+.. code-block:: c++
+
+ struct A {
+ virtual void f();
+ };
+
+ struct B : A {
+ virtual void f();
+ virtual void g();
+ };
+
+ struct C {
+ virtual void h();
+ };
+
+ struct D : A, C {
+ virtual void f();
+ virtual void h();
+ };
+
+The virtual table objects for A, B, C and D look like this (under the Itanium ABI):
+
+.. csv-table:: Virtual Table Layout for A, B, C, D
+ :header: Class, 0, 1, 2, 3, 4, 5, 6
+
+ A, A::offset-to-top, &A::rtti, &A::f
+ B, B::offset-to-top, &B::rtti, &B::f, &B::g
+ C, C::offset-to-top, &C::rtti, &C::h
+ D, D::offset-to-top, &D::rtti, &D::f, &D::h, D::offset-to-top, &D::rtti, thunk for &D::h
+
+When an object of type A is constructed, the address of ``&A::f`` in A's
+virtual table object is stored in the object's vtable pointer. In ABI parlance
+this address is known as an `address point`_. Similarly, when an object of type
+B is constructed, the address of ``&B::f`` is stored in the vtable pointer. In
+this way, the vtable in B's virtual table object is compatible with A's vtable.
+
+D is a little more complicated, due to the use of multiple inheritance. Its
+virtual table object contains two vtables, one compatible with A's vtable and
+the other compatible with C's vtable. Objects of type D contain two virtual
+pointers, one belonging to the A subobject and containing the address of
+the vtable compatible with A's vtable, and the other belonging to the C
+subobject and containing the address of the vtable compatible with C's vtable.
+
+The full set of compatibility information for the above class hierarchy is
+shown below. The following table shows the name of a class, the offset of an
+address point within that class's vtable and the name of one of the classes
+with which that address point is compatible.
+
+.. csv-table:: Type Offsets for A, B, C, D
+ :header: VTable for, Offset, Compatible Class
+
+ A, 16, A
+ B, 16, A
+ , , B
+ C, 16, C
+ D, 16, A
+ , , D
+ , 48, C
+
+The next step is to encode this compatibility information into the IR. The way
+this is done is to create type metadata named after each of the compatible
+classes, with which we associate each of the compatible address points in
+each vtable. For example, these type metadata entries encode the compatibility
+information for the above hierarchy:
+
+::
+
+ @_ZTV1A = constant [...], !type !0
+ @_ZTV1B = constant [...], !type !0, !type !1
+ @_ZTV1C = constant [...], !type !2
+ @_ZTV1D = constant [...], !type !0, !type !3, !type !4
+
+ !0 = !{i64 16, !"_ZTS1A"}
+ !1 = !{i64 16, !"_ZTS1B"}
+ !2 = !{i64 16, !"_ZTS1C"}
+ !3 = !{i64 16, !"_ZTS1D"}
+ !4 = !{i64 48, !"_ZTS1C"}
+
+With this type metadata, we can now use the ``llvm.type.test`` intrinsic to
+test whether a given pointer is compatible with a type identifier. Working
+backwards, if ``llvm.type.test`` returns true for a particular pointer,
+we can also statically determine the identities of the virtual functions
+that a particular virtual call may call. For example, if a program assumes
+a pointer to be a member of ``!"_ZST1A"``, we know that the address can
+be only be one of ``_ZTV1A+16``, ``_ZTV1B+16`` or ``_ZTV1D+16`` (i.e. the
+address points of the vtables of A, B and D respectively). If we then load
+an address from that pointer, we know that the address can only be one of
+``&A::f``, ``&B::f`` or ``&D::f``.
+
+.. _address point: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#vtable-general
+
+Testing Addresses For Type Membership
+=====================================
+
+If a program tests an address using ``llvm.type.test``, this will cause
+a link-time optimization pass, ``LowerTypeTests``, to replace calls to this
+intrinsic with efficient code to perform type member tests. At a high level,
+the pass will lay out referenced globals in a consecutive memory region in
+the object file, construct bit vectors that map onto that memory region,
+and generate code at each of the ``llvm.type.test`` call sites to test
+pointers against those bit vectors. Because of the layout manipulation, the
+globals' definitions must be available at LTO time. For more information,
+see the `control flow integrity design document`_.
+
+A type identifier that identifies functions is transformed into a jump table,
+which is a block of code consisting of one branch instruction for each
+of the functions associated with the type identifier that branches to the
+target function. The pass will redirect any taken function addresses to the
+corresponding jump table entry. In the object file's symbol table, the jump
+table entries take the identities of the original functions, so that addresses
+taken outside the module will pass any verification done inside the module.
+
+Jump tables may call external functions, so their definitions need not
+be available at LTO time. Note that if an externally defined function is
+associated with a type identifier, there is no guarantee that its identity
+within the module will be the same as its identity outside of the module,
+as the former will be the jump table entry if a jump table is necessary.
+
+The `GlobalLayoutBuilder`_ class is responsible for laying out the globals
+efficiently to minimize the sizes of the underlying bitsets.
+
+.. _control flow integrity design document: http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html
+
+:Example:
+
+::
+
+ target datalayout = "e-p:32:32"
+
+ @a = internal global i32 0, !type !0
+ @b = internal global i32 0, !type !0, !type !1
+ @c = internal global i32 0, !type !1
+ @d = internal global [2 x i32] [i32 0, i32 0], !type !2
+
+ define void @e() !type !3 {
+ ret void
+ }
+
+ define void @f() {
+ ret void
+ }
+
+ declare void @g() !type !3
+
+ !0 = !{i32 0, !"typeid1"}
+ !1 = !{i32 0, !"typeid2"}
+ !2 = !{i32 4, !"typeid2"}
+ !3 = !{i32 0, !"typeid3"}
+
+ declare i1 @llvm.type.test(i8* %ptr, metadata %typeid) nounwind readnone
+
+ define i1 @foo(i32* %p) {
+ %pi8 = bitcast i32* %p to i8*
+ %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid1")
+ ret i1 %x
+ }
+
+ define i1 @bar(i32* %p) {
+ %pi8 = bitcast i32* %p to i8*
+ %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid2")
+ ret i1 %x
+ }
+
+ define i1 @baz(void ()* %p) {
+ %pi8 = bitcast void ()* %p to i8*
+ %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid3")
+ ret i1 %x
+ }
+
+ define void @main() {
+ %a1 = call i1 @foo(i32* @a) ; returns 1
+ %b1 = call i1 @foo(i32* @b) ; returns 1
+ %c1 = call i1 @foo(i32* @c) ; returns 0
+ %a2 = call i1 @bar(i32* @a) ; returns 0
+ %b2 = call i1 @bar(i32* @b) ; returns 1
+ %c2 = call i1 @bar(i32* @c) ; returns 1
+ %d02 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 0)) ; returns 0
+ %d12 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 1)) ; returns 1
+ %e = call i1 @baz(void ()* @e) ; returns 1
+ %f = call i1 @baz(void ()* @f) ; returns 0
+ %g = call i1 @baz(void ()* @g) ; returns 1
+ ret void
+ }
+
+.. _GlobalLayoutBuilder: https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/Transforms/IPO/LowerTypeTests.h
Added: www-releases/trunk/8.0.1/docs/_sources/Vectorizers.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/Vectorizers.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/Vectorizers.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/Vectorizers.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,435 @@
+==========================
+Auto-Vectorization in LLVM
+==========================
+
+.. contents::
+ :local:
+
+LLVM has two vectorizers: The :ref:`Loop Vectorizer <loop-vectorizer>`,
+which operates on Loops, and the :ref:`SLP Vectorizer
+<slp-vectorizer>`. These vectorizers
+focus on different optimization opportunities and use different techniques.
+The SLP vectorizer merges multiple scalars that are found in the code into
+vectors while the Loop Vectorizer widens instructions in loops
+to operate on multiple consecutive iterations.
+
+Both the Loop Vectorizer and the SLP Vectorizer are enabled by default.
+
+.. _loop-vectorizer:
+
+The Loop Vectorizer
+===================
+
+Usage
+-----
+
+The Loop Vectorizer is enabled by default, but it can be disabled
+through clang using the command line flag:
+
+.. code-block:: console
+
+ $ clang ... -fno-vectorize file.c
+
+Command line flags
+^^^^^^^^^^^^^^^^^^
+
+The loop vectorizer uses a cost model to decide on the optimal vectorization factor
+and unroll factor. However, users of the vectorizer can force the vectorizer to use
+specific values. Both 'clang' and 'opt' support the flags below.
+
+Users can control the vectorization SIMD width using the command line flag "-force-vector-width".
+
+.. code-block:: console
+
+ $ clang -mllvm -force-vector-width=8 ...
+ $ opt -loop-vectorize -force-vector-width=8 ...
+
+Users can control the unroll factor using the command line flag "-force-vector-interleave"
+
+.. code-block:: console
+
+ $ clang -mllvm -force-vector-interleave=2 ...
+ $ opt -loop-vectorize -force-vector-interleave=2 ...
+
+Pragma loop hint directives
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``#pragma clang loop`` directive allows loop vectorization hints to be
+specified for the subsequent for, while, do-while, or c++11 range-based for
+loop. The directive allows vectorization and interleaving to be enabled or
+disabled. Vector width as well as interleave count can also be manually
+specified. The following example explicitly enables vectorization and
+interleaving:
+
+.. code-block:: c++
+
+ #pragma clang loop vectorize(enable) interleave(enable)
+ while(...) {
+ ...
+ }
+
+The following example implicitly enables vectorization and interleaving by
+specifying a vector width and interleaving count:
+
+.. code-block:: c++
+
+ #pragma clang loop vectorize_width(2) interleave_count(2)
+ for(...) {
+ ...
+ }
+
+See the Clang
+`language extensions
+<http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations>`_
+for details.
+
+Diagnostics
+-----------
+
+Many loops cannot be vectorized including loops with complicated control flow,
+unvectorizable types, and unvectorizable calls. The loop vectorizer generates
+optimization remarks which can be queried using command line options to identify
+and diagnose loops that are skipped by the loop-vectorizer.
+
+Optimization remarks are enabled using:
+
+``-Rpass=loop-vectorize`` identifies loops that were successfully vectorized.
+
+``-Rpass-missed=loop-vectorize`` identifies loops that failed vectorization and
+indicates if vectorization was specified.
+
+``-Rpass-analysis=loop-vectorize`` identifies the statements that caused
+vectorization to fail. If in addition ``-fsave-optimization-record`` is
+provided, multiple causes of vectorization failure may be listed (this behavior
+might change in the future).
+
+Consider the following loop:
+
+.. code-block:: c++
+
+ #pragma clang loop vectorize(enable)
+ for (int i = 0; i < Length; i++) {
+ switch(A[i]) {
+ case 0: A[i] = i*2; break;
+ case 1: A[i] = i; break;
+ default: A[i] = 0;
+ }
+ }
+
+The command line ``-Rpass-missed=loop-vectorized`` prints the remark:
+
+.. code-block:: console
+
+ no_switch.cpp:4:5: remark: loop not vectorized: vectorization is explicitly enabled [-Rpass-missed=loop-vectorize]
+
+And the command line ``-Rpass-analysis=loop-vectorize`` indicates that the
+switch statement cannot be vectorized.
+
+.. code-block:: console
+
+ no_switch.cpp:4:5: remark: loop not vectorized: loop contains a switch statement [-Rpass-analysis=loop-vectorize]
+ switch(A[i]) {
+ ^
+
+To ensure line and column numbers are produced include the command line options
+``-gline-tables-only`` and ``-gcolumn-info``. See the Clang `user manual
+<http://clang.llvm.org/docs/UsersManual.html#options-to-emit-optimization-reports>`_
+for details
+
+Features
+--------
+
+The LLVM Loop Vectorizer has a number of features that allow it to vectorize
+complex loops.
+
+Loops with unknown trip count
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Loop Vectorizer supports loops with an unknown trip count.
+In the loop below, the iteration ``start`` and ``finish`` points are unknown,
+and the Loop Vectorizer has a mechanism to vectorize loops that do not start
+at zero. In this example, 'n' may not be a multiple of the vector width, and
+the vectorizer has to execute the last few iterations as scalar code. Keeping
+a scalar copy of the loop increases the code size.
+
+.. code-block:: c++
+
+ void bar(float *A, float* B, float K, int start, int end) {
+ for (int i = start; i < end; ++i)
+ A[i] *= B[i] + K;
+ }
+
+Runtime Checks of Pointers
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In the example below, if the pointers A and B point to consecutive addresses,
+then it is illegal to vectorize the code because some elements of A will be
+written before they are read from array B.
+
+Some programmers use the 'restrict' keyword to notify the compiler that the
+pointers are disjointed, but in our example, the Loop Vectorizer has no way of
+knowing that the pointers A and B are unique. The Loop Vectorizer handles this
+loop by placing code that checks, at runtime, if the arrays A and B point to
+disjointed memory locations. If arrays A and B overlap, then the scalar version
+of the loop is executed.
+
+.. code-block:: c++
+
+ void bar(float *A, float* B, float K, int n) {
+ for (int i = 0; i < n; ++i)
+ A[i] *= B[i] + K;
+ }
+
+
+Reductions
+^^^^^^^^^^
+
+In this example the ``sum`` variable is used by consecutive iterations of
+the loop. Normally, this would prevent vectorization, but the vectorizer can
+detect that 'sum' is a reduction variable. The variable 'sum' becomes a vector
+of integers, and at the end of the loop the elements of the array are added
+together to create the correct result. We support a number of different
+reduction operations, such as addition, multiplication, XOR, AND and OR.
+
+.. code-block:: c++
+
+ int foo(int *A, int *B, int n) {
+ unsigned sum = 0;
+ for (int i = 0; i < n; ++i)
+ sum += A[i] + 5;
+ return sum;
+ }
+
+We support floating point reduction operations when `-ffast-math` is used.
+
+Inductions
+^^^^^^^^^^
+
+In this example the value of the induction variable ``i`` is saved into an
+array. The Loop Vectorizer knows to vectorize induction variables.
+
+.. code-block:: c++
+
+ void bar(float *A, float* B, float K, int n) {
+ for (int i = 0; i < n; ++i)
+ A[i] = i;
+ }
+
+If Conversion
+^^^^^^^^^^^^^
+
+The Loop Vectorizer is able to "flatten" the IF statement in the code and
+generate a single stream of instructions. The Loop Vectorizer supports any
+control flow in the innermost loop. The innermost loop may contain complex
+nesting of IFs, ELSEs and even GOTOs.
+
+.. code-block:: c++
+
+ int foo(int *A, int *B, int n) {
+ unsigned sum = 0;
+ for (int i = 0; i < n; ++i)
+ if (A[i] > B[i])
+ sum += A[i] + 5;
+ return sum;
+ }
+
+Pointer Induction Variables
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This example uses the "accumulate" function of the standard c++ library. This
+loop uses C++ iterators, which are pointers, and not integer indices.
+The Loop Vectorizer detects pointer induction variables and can vectorize
+this loop. This feature is important because many C++ programs use iterators.
+
+.. code-block:: c++
+
+ int baz(int *A, int n) {
+ return std::accumulate(A, A + n, 0);
+ }
+
+Reverse Iterators
+^^^^^^^^^^^^^^^^^
+
+The Loop Vectorizer can vectorize loops that count backwards.
+
+.. code-block:: c++
+
+ int foo(int *A, int *B, int n) {
+ for (int i = n; i > 0; --i)
+ A[i] +=1;
+ }
+
+Scatter / Gather
+^^^^^^^^^^^^^^^^
+
+The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions
+that scatter/gathers memory.
+
+.. code-block:: c++
+
+ int foo(int * A, int * B, int n) {
+ for (intptr_t i = 0; i < n; ++i)
+ A[i] += B[i * 4];
+ }
+
+In many situations the cost model will inform LLVM that this is not beneficial
+and LLVM will only vectorize such code if forced with "-mllvm -force-vector-width=#".
+
+Vectorization of Mixed Types
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Loop Vectorizer can vectorize programs with mixed types. The Vectorizer
+cost model can estimate the cost of the type conversion and decide if
+vectorization is profitable.
+
+.. code-block:: c++
+
+ int foo(int *A, char *B, int n, int k) {
+ for (int i = 0; i < n; ++i)
+ A[i] += 4 * B[i];
+ }
+
+Global Structures Alias Analysis
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Access to global structures can also be vectorized, with alias analysis being
+used to make sure accesses don't alias. Run-time checks can also be added on
+pointer access to structure members.
+
+Many variations are supported, but some that rely on undefined behaviour being
+ignored (as other compilers do) are still being left un-vectorized.
+
+.. code-block:: c++
+
+ struct { int A[100], K, B[100]; } Foo;
+
+ int foo() {
+ for (int i = 0; i < 100; ++i)
+ Foo.A[i] = Foo.B[i] + 100;
+ }
+
+Vectorization of function calls
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Loop Vectorizer can vectorize intrinsic math functions.
+See the table below for a list of these functions.
+
++-----+-----+---------+
+| pow | exp | exp2 |
++-----+-----+---------+
+| sin | cos | sqrt |
++-----+-----+---------+
+| log |log2 | log10 |
++-----+-----+---------+
+|fabs |floor| ceil |
++-----+-----+---------+
+|fma |trunc|nearbyint|
++-----+-----+---------+
+| | | fmuladd |
++-----+-----+---------+
+
+Note that the optimizer may not be able to vectorize math library functions
+that correspond to these intrinsics if the library calls access external state
+such as "errno". To allow better optimization of C/C++ math library functions,
+use "-fno-math-errno".
+
+The loop vectorizer knows about special instructions on the target and will
+vectorize a loop containing a function call that maps to the instructions. For
+example, the loop below will be vectorized on Intel x86 if the SSE4.1 roundps
+instruction is available.
+
+.. code-block:: c++
+
+ void foo(float *f) {
+ for (int i = 0; i != 1024; ++i)
+ f[i] = floorf(f[i]);
+ }
+
+Partial unrolling during vectorization
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Modern processors feature multiple execution units, and only programs that contain a
+high degree of parallelism can fully utilize the entire width of the machine.
+The Loop Vectorizer increases the instruction level parallelism (ILP) by
+performing partial-unrolling of loops.
+
+In the example below the entire array is accumulated into the variable 'sum'.
+This is inefficient because only a single execution port can be used by the processor.
+By unrolling the code the Loop Vectorizer allows two or more execution ports
+to be used simultaneously.
+
+.. code-block:: c++
+
+ int foo(int *A, int *B, int n) {
+ unsigned sum = 0;
+ for (int i = 0; i < n; ++i)
+ sum += A[i];
+ return sum;
+ }
+
+The Loop Vectorizer uses a cost model to decide when it is profitable to unroll loops.
+The decision to unroll the loop depends on the register pressure and the generated code size.
+
+Performance
+-----------
+
+This section shows the execution time of Clang on a simple benchmark:
+`gcc-loops <https://github.com/llvm/llvm-test-suite/tree/master/SingleSource/UnitTests/Vectorizer>`_.
+This benchmarks is a collection of loops from the GCC autovectorization
+`page <http://gcc.gnu.org/projects/tree-ssa/vectorization.html>`_ by Dorit Nuzman.
+
+The chart below compares GCC-4.7, ICC-13, and Clang-SVN with and without loop vectorization at -O3, tuned for "corei7-avx", running on a Sandybridge iMac.
+The Y-axis shows the time in msec. Lower is better. The last column shows the geomean of all the kernels.
+
+.. image:: gcc-loops.png
+
+And Linpack-pc with the same configuration. Result is Mflops, higher is better.
+
+.. image:: linpack-pc.png
+
+Ongoing Development Directions
+------------------------------
+
+.. toctree::
+ :hidden:
+
+ Proposals/VectorizationPlan
+
+:doc:`Proposals/VectorizationPlan`
+ Modeling the process and upgrading the infrastructure of LLVM's Loop Vectorizer.
+
+.. _slp-vectorizer:
+
+The SLP Vectorizer
+==================
+
+Details
+-------
+
+The goal of SLP vectorization (a.k.a. superword-level parallelism) is
+to combine similar independent instructions
+into vector instructions. Memory accesses, arithmetic operations, comparison
+operations, PHI-nodes, can all be vectorized using this technique.
+
+For example, the following function performs very similar operations on its
+inputs (a1, b1) and (a2, b2). The basic-block vectorizer may combine these
+into vector operations.
+
+.. code-block:: c++
+
+ void foo(int a1, int a2, int b1, int b2, int *A) {
+ A[0] = a1*(a1 + b1)/b1 + 50*b1/a1;
+ A[1] = a2*(a2 + b2)/b2 + 50*b2/a2;
+ }
+
+The SLP-vectorizer processes the code bottom-up, across basic blocks, in search of scalars to combine.
+
+Usage
+------
+
+The SLP Vectorizer is enabled by default, but it can be disabled
+through clang using the command line flag:
+
+.. code-block:: console
+
+ $ clang -fno-slp-vectorize file.c
Added: www-releases/trunk/8.0.1/docs/_sources/WritingAnLLVMBackend.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/WritingAnLLVMBackend.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/WritingAnLLVMBackend.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/WritingAnLLVMBackend.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,1979 @@
+=======================
+Writing an LLVM Backend
+=======================
+
+.. toctree::
+ :hidden:
+
+ HowToUseInstrMappings
+
+.. contents::
+ :local:
+
+Introduction
+============
+
+This document describes techniques for writing compiler backends that convert
+the LLVM Intermediate Representation (IR) to code for a specified machine or
+other languages. Code intended for a specific machine can take the form of
+either assembly code or binary code (usable for a JIT compiler).
+
+The backend of LLVM features a target-independent code generator that may
+create output for several types of target CPUs --- including X86, PowerPC,
+ARM, and SPARC. The backend may also be used to generate code targeted at SPUs
+of the Cell processor or GPUs to support the execution of compute kernels.
+
+The document focuses on existing examples found in subdirectories of
+``llvm/lib/Target`` in a downloaded LLVM release. In particular, this document
+focuses on the example of creating a static compiler (one that emits text
+assembly) for a SPARC target, because SPARC has fairly standard
+characteristics, such as a RISC instruction set and straightforward calling
+conventions.
+
+Audience
+--------
+
+The audience for this document is anyone who needs to write an LLVM backend to
+generate code for a specific hardware or software target.
+
+Prerequisite Reading
+--------------------
+
+These essential documents must be read before reading this document:
+
+* `LLVM Language Reference Manual <LangRef.html>`_ --- a reference manual for
+ the LLVM assembly language.
+
+* :doc:`CodeGenerator` --- a guide to the components (classes and code
+ generation algorithms) for translating the LLVM internal representation into
+ machine code for a specified target. Pay particular attention to the
+ descriptions of code generation stages: Instruction Selection, Scheduling and
+ Formation, SSA-based Optimization, Register Allocation, Prolog/Epilog Code
+ Insertion, Late Machine Code Optimizations, and Code Emission.
+
+* :doc:`TableGen/index` --- a document that describes the TableGen
+ (``tblgen``) application that manages domain-specific information to support
+ LLVM code generation. TableGen processes input from a target description
+ file (``.td`` suffix) and generates C++ code that can be used for code
+ generation.
+
+* :doc:`WritingAnLLVMPass` --- The assembly printer is a ``FunctionPass``, as
+ are several ``SelectionDAG`` processing steps.
+
+To follow the SPARC examples in this document, have a copy of `The SPARC
+Architecture Manual, Version 8 <http://www.sparc.org/standards/V8.pdf>`_ for
+reference. For details about the ARM instruction set, refer to the `ARM
+Architecture Reference Manual <http://infocenter.arm.com/>`_. For more about
+the GNU Assembler format (``GAS``), see `Using As
+<http://sourceware.org/binutils/docs/as/index.html>`_, especially for the
+assembly printer. "Using As" contains a list of target machine dependent
+features.
+
+Basic Steps
+-----------
+
+To write a compiler backend for LLVM that converts the LLVM IR to code for a
+specified target (machine or other language), follow these steps:
+
+* Create a subclass of the ``TargetMachine`` class that describes
+ characteristics of your target machine. Copy existing examples of specific
+ ``TargetMachine`` class and header files; for example, start with
+ ``SparcTargetMachine.cpp`` and ``SparcTargetMachine.h``, but change the file
+ names for your target. Similarly, change code that references "``Sparc``" to
+ reference your target.
+
+* Describe the register set of the target. Use TableGen to generate code for
+ register definition, register aliases, and register classes from a
+ target-specific ``RegisterInfo.td`` input file. You should also write
+ additional code for a subclass of the ``TargetRegisterInfo`` class that
+ represents the class register file data used for register allocation and also
+ describes the interactions between registers.
+
+* Describe the instruction set of the target. Use TableGen to generate code
+ for target-specific instructions from target-specific versions of
+ ``TargetInstrFormats.td`` and ``TargetInstrInfo.td``. You should write
+ additional code for a subclass of the ``TargetInstrInfo`` class to represent
+ machine instructions supported by the target machine.
+
+* Describe the selection and conversion of the LLVM IR from a Directed Acyclic
+ Graph (DAG) representation of instructions to native target-specific
+ instructions. Use TableGen to generate code that matches patterns and
+ selects instructions based on additional information in a target-specific
+ version of ``TargetInstrInfo.td``. Write code for ``XXXISelDAGToDAG.cpp``,
+ where ``XXX`` identifies the specific target, to perform pattern matching and
+ DAG-to-DAG instruction selection. Also write code in ``XXXISelLowering.cpp``
+ to replace or remove operations and data types that are not supported
+ natively in a SelectionDAG.
+
+* Write code for an assembly printer that converts LLVM IR to a GAS format for
+ your target machine. You should add assembly strings to the instructions
+ defined in your target-specific version of ``TargetInstrInfo.td``. You
+ should also write code for a subclass of ``AsmPrinter`` that performs the
+ LLVM-to-assembly conversion and a trivial subclass of ``TargetAsmInfo``.
+
+* Optionally, add support for subtargets (i.e., variants with different
+ capabilities). You should also write code for a subclass of the
+ ``TargetSubtarget`` class, which allows you to use the ``-mcpu=`` and
+ ``-mattr=`` command-line options.
+
+* Optionally, add JIT support and create a machine code emitter (subclass of
+ ``TargetJITInfo``) that is used to emit binary code directly into memory.
+
+In the ``.cpp`` and ``.h``. files, initially stub up these methods and then
+implement them later. Initially, you may not know which private members that
+the class will need and which components will need to be subclassed.
+
+Preliminaries
+-------------
+
+To actually create your compiler backend, you need to create and modify a few
+files. The absolute minimum is discussed here. But to actually use the LLVM
+target-independent code generator, you must perform the steps described in the
+:doc:`LLVM Target-Independent Code Generator <CodeGenerator>` document.
+
+First, you should create a subdirectory under ``lib/Target`` to hold all the
+files related to your target. If your target is called "Dummy", create the
+directory ``lib/Target/Dummy``.
+
+In this new directory, create a ``CMakeLists.txt``. It is easiest to copy a
+``CMakeLists.txt`` of another target and modify it. It should at least contain
+the ``LLVM_TARGET_DEFINITIONS`` variable. The library can be named ``LLVMDummy``
+(for example, see the MIPS target). Alternatively, you can split the library
+into ``LLVMDummyCodeGen`` and ``LLVMDummyAsmPrinter``, the latter of which
+should be implemented in a subdirectory below ``lib/Target/Dummy`` (for example,
+see the PowerPC target).
+
+Note that these two naming schemes are hardcoded into ``llvm-config``. Using
+any other naming scheme will confuse ``llvm-config`` and produce a lot of
+(seemingly unrelated) linker errors when linking ``llc``.
+
+To make your target actually do something, you need to implement a subclass of
+``TargetMachine``. This implementation should typically be in the file
+``lib/Target/DummyTargetMachine.cpp``, but any file in the ``lib/Target``
+directory will be built and should work. To use LLVM's target independent code
+generator, you should do what all current machine backends do: create a
+subclass of ``LLVMTargetMachine``. (To create a target from scratch, create a
+subclass of ``TargetMachine``.)
+
+To get LLVM to actually build and link your target, you need to run ``cmake``
+with ``-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=Dummy``. This will build your
+target without needing to add it to the list of all the targets.
+
+Once your target is stable, you can add it to the ``LLVM_ALL_TARGETS`` variable
+located in the main ``CMakeLists.txt``.
+
+Target Machine
+==============
+
+``LLVMTargetMachine`` is designed as a base class for targets implemented with
+the LLVM target-independent code generator. The ``LLVMTargetMachine`` class
+should be specialized by a concrete target class that implements the various
+virtual methods. ``LLVMTargetMachine`` is defined as a subclass of
+``TargetMachine`` in ``include/llvm/Target/TargetMachine.h``. The
+``TargetMachine`` class implementation (``TargetMachine.cpp``) also processes
+numerous command-line options.
+
+To create a concrete target-specific subclass of ``LLVMTargetMachine``, start
+by copying an existing ``TargetMachine`` class and header. You should name the
+files that you create to reflect your specific target. For instance, for the
+SPARC target, name the files ``SparcTargetMachine.h`` and
+``SparcTargetMachine.cpp``.
+
+For a target machine ``XXX``, the implementation of ``XXXTargetMachine`` must
+have access methods to obtain objects that represent target components. These
+methods are named ``get*Info``, and are intended to obtain the instruction set
+(``getInstrInfo``), register set (``getRegisterInfo``), stack frame layout
+(``getFrameInfo``), and similar information. ``XXXTargetMachine`` must also
+implement the ``getDataLayout`` method to access an object with target-specific
+data characteristics, such as data type size and alignment requirements.
+
+For instance, for the SPARC target, the header file ``SparcTargetMachine.h``
+declares prototypes for several ``get*Info`` and ``getDataLayout`` methods that
+simply return a class member.
+
+.. code-block:: c++
+
+ namespace llvm {
+
+ class Module;
+
+ class SparcTargetMachine : public LLVMTargetMachine {
+ const DataLayout DataLayout; // Calculates type size & alignment
+ SparcSubtarget Subtarget;
+ SparcInstrInfo InstrInfo;
+ TargetFrameInfo FrameInfo;
+
+ protected:
+ virtual const TargetAsmInfo *createTargetAsmInfo() const;
+
+ public:
+ SparcTargetMachine(const Module &M, const std::string &FS);
+
+ virtual const SparcInstrInfo *getInstrInfo() const {return &InstrInfo; }
+ virtual const TargetFrameInfo *getFrameInfo() const {return &FrameInfo; }
+ virtual const TargetSubtarget *getSubtargetImpl() const{return &Subtarget; }
+ virtual const TargetRegisterInfo *getRegisterInfo() const {
+ return &InstrInfo.getRegisterInfo();
+ }
+ virtual const DataLayout *getDataLayout() const { return &DataLayout; }
+ static unsigned getModuleMatchQuality(const Module &M);
+
+ // Pass Pipeline Configuration
+ virtual bool addInstSelector(PassManagerBase &PM, bool Fast);
+ virtual bool addPreEmitPass(PassManagerBase &PM, bool Fast);
+ };
+
+ } // end namespace llvm
+
+* ``getInstrInfo()``
+* ``getRegisterInfo()``
+* ``getFrameInfo()``
+* ``getDataLayout()``
+* ``getSubtargetImpl()``
+
+For some targets, you also need to support the following methods:
+
+* ``getTargetLowering()``
+* ``getJITInfo()``
+
+Some architectures, such as GPUs, do not support jumping to an arbitrary
+program location and implement branching using masked execution and loop using
+special instructions around the loop body. In order to avoid CFG modifications
+that introduce irreducible control flow not handled by such hardware, a target
+must call `setRequiresStructuredCFG(true)` when being initialized.
+
+In addition, the ``XXXTargetMachine`` constructor should specify a
+``TargetDescription`` string that determines the data layout for the target
+machine, including characteristics such as pointer size, alignment, and
+endianness. For example, the constructor for ``SparcTargetMachine`` contains
+the following:
+
+.. code-block:: c++
+
+ SparcTargetMachine::SparcTargetMachine(const Module &M, const std::string &FS)
+ : DataLayout("E-p:32:32-f128:128:128"),
+ Subtarget(M, FS), InstrInfo(Subtarget),
+ FrameInfo(TargetFrameInfo::StackGrowsDown, 8, 0) {
+ }
+
+Hyphens separate portions of the ``TargetDescription`` string.
+
+* An upper-case "``E``" in the string indicates a big-endian target data model.
+ A lower-case "``e``" indicates little-endian.
+
+* "``p:``" is followed by pointer information: size, ABI alignment, and
+ preferred alignment. If only two figures follow "``p:``", then the first
+ value is pointer size, and the second value is both ABI and preferred
+ alignment.
+
+* Then a letter for numeric type alignment: "``i``", "``f``", "``v``", or
+ "``a``" (corresponding to integer, floating point, vector, or aggregate).
+ "``i``", "``v``", or "``a``" are followed by ABI alignment and preferred
+ alignment. "``f``" is followed by three values: the first indicates the size
+ of a long double, then ABI alignment, and then ABI preferred alignment.
+
+Target Registration
+===================
+
+You must also register your target with the ``TargetRegistry``, which is what
+other LLVM tools use to be able to lookup and use your target at runtime. The
+``TargetRegistry`` can be used directly, but for most targets there are helper
+templates which should take care of the work for you.
+
+All targets should declare a global ``Target`` object which is used to
+represent the target during registration. Then, in the target's ``TargetInfo``
+library, the target should define that object and use the ``RegisterTarget``
+template to register the target. For example, the Sparc registration code
+looks like this:
+
+.. code-block:: c++
+
+ Target llvm::getTheSparcTarget();
+
+ extern "C" void LLVMInitializeSparcTargetInfo() {
+ RegisterTarget<Triple::sparc, /*HasJIT=*/false>
+ X(getTheSparcTarget(), "sparc", "Sparc");
+ }
+
+This allows the ``TargetRegistry`` to look up the target by name or by target
+triple. In addition, most targets will also register additional features which
+are available in separate libraries. These registration steps are separate,
+because some clients may wish to only link in some parts of the target --- the
+JIT code generator does not require the use of the assembler printer, for
+example. Here is an example of registering the Sparc assembly printer:
+
+.. code-block:: c++
+
+ extern "C" void LLVMInitializeSparcAsmPrinter() {
+ RegisterAsmPrinter<SparcAsmPrinter> X(getTheSparcTarget());
+ }
+
+For more information, see "`llvm/Target/TargetRegistry.h
+</doxygen/TargetRegistry_8h-source.html>`_".
+
+Register Set and Register Classes
+=================================
+
+You should describe a concrete target-specific class that represents the
+register file of a target machine. This class is called ``XXXRegisterInfo``
+(where ``XXX`` identifies the target) and represents the class register file
+data that is used for register allocation. It also describes the interactions
+between registers.
+
+You also need to define register classes to categorize related registers. A
+register class should be added for groups of registers that are all treated the
+same way for some instruction. Typical examples are register classes for
+integer, floating-point, or vector registers. A register allocator allows an
+instruction to use any register in a specified register class to perform the
+instruction in a similar manner. Register classes allocate virtual registers
+to instructions from these sets, and register classes let the
+target-independent register allocator automatically choose the actual
+registers.
+
+Much of the code for registers, including register definition, register
+aliases, and register classes, is generated by TableGen from
+``XXXRegisterInfo.td`` input files and placed in ``XXXGenRegisterInfo.h.inc``
+and ``XXXGenRegisterInfo.inc`` output files. Some of the code in the
+implementation of ``XXXRegisterInfo`` requires hand-coding.
+
+Defining a Register
+-------------------
+
+The ``XXXRegisterInfo.td`` file typically starts with register definitions for
+a target machine. The ``Register`` class (specified in ``Target.td``) is used
+to define an object for each register. The specified string ``n`` becomes the
+``Name`` of the register. The basic ``Register`` object does not have any
+subregisters and does not specify any aliases.
+
+.. code-block:: text
+
+ class Register<string n> {
+ string Namespace = "";
+ string AsmName = n;
+ string Name = n;
+ int SpillSize = 0;
+ int SpillAlignment = 0;
+ list<Register> Aliases = [];
+ list<Register> SubRegs = [];
+ list<int> DwarfNumbers = [];
+ }
+
+For example, in the ``X86RegisterInfo.td`` file, there are register definitions
+that utilize the ``Register`` class, such as:
+
+.. code-block:: text
+
+ def AL : Register<"AL">, DwarfRegNum<[0, 0, 0]>;
+
+This defines the register ``AL`` and assigns it values (with ``DwarfRegNum``)
+that are used by ``gcc``, ``gdb``, or a debug information writer to identify a
+register. For register ``AL``, ``DwarfRegNum`` takes an array of 3 values
+representing 3 different modes: the first element is for X86-64, the second for
+exception handling (EH) on X86-32, and the third is generic. -1 is a special
+Dwarf number that indicates the gcc number is undefined, and -2 indicates the
+register number is invalid for this mode.
+
+From the previously described line in the ``X86RegisterInfo.td`` file, TableGen
+generates this code in the ``X86GenRegisterInfo.inc`` file:
+
+.. code-block:: c++
+
+ static const unsigned GR8[] = { X86::AL, ... };
+
+ const unsigned AL_AliasSet[] = { X86::AX, X86::EAX, X86::RAX, 0 };
+
+ const TargetRegisterDesc RegisterDescriptors[] = {
+ ...
+ { "AL", "AL", AL_AliasSet, Empty_SubRegsSet, Empty_SubRegsSet, AL_SuperRegsSet }, ...
+
+From the register info file, TableGen generates a ``TargetRegisterDesc`` object
+for each register. ``TargetRegisterDesc`` is defined in
+``include/llvm/Target/TargetRegisterInfo.h`` with the following fields:
+
+.. code-block:: c++
+
+ struct TargetRegisterDesc {
+ const char *AsmName; // Assembly language name for the register
+ const char *Name; // Printable name for the reg (for debugging)
+ const unsigned *AliasSet; // Register Alias Set
+ const unsigned *SubRegs; // Sub-register set
+ const unsigned *ImmSubRegs; // Immediate sub-register set
+ const unsigned *SuperRegs; // Super-register set
+ };
+
+TableGen uses the entire target description file (``.td``) to determine text
+names for the register (in the ``AsmName`` and ``Name`` fields of
+``TargetRegisterDesc``) and the relationships of other registers to the defined
+register (in the other ``TargetRegisterDesc`` fields). In this example, other
+definitions establish the registers "``AX``", "``EAX``", and "``RAX``" as
+aliases for one another, so TableGen generates a null-terminated array
+(``AL_AliasSet``) for this register alias set.
+
+The ``Register`` class is commonly used as a base class for more complex
+classes. In ``Target.td``, the ``Register`` class is the base for the
+``RegisterWithSubRegs`` class that is used to define registers that need to
+specify subregisters in the ``SubRegs`` list, as shown here:
+
+.. code-block:: text
+
+ class RegisterWithSubRegs<string n, list<Register> subregs> : Register<n> {
+ let SubRegs = subregs;
+ }
+
+In ``SparcRegisterInfo.td``, additional register classes are defined for SPARC:
+a ``Register`` subclass, ``SparcReg``, and further subclasses: ``Ri``, ``Rf``,
+and ``Rd``. SPARC registers are identified by 5-bit ID numbers, which is a
+feature common to these subclasses. Note the use of "``let``" expressions to
+override values that are initially defined in a superclass (such as ``SubRegs``
+field in the ``Rd`` class).
+
+.. code-block:: text
+
+ class SparcReg<string n> : Register<n> {
+ field bits<5> Num;
+ let Namespace = "SP";
+ }
+ // Ri - 32-bit integer registers
+ class Ri<bits<5> num, string n> :
+ SparcReg<n> {
+ let Num = num;
+ }
+ // Rf - 32-bit floating-point registers
+ class Rf<bits<5> num, string n> :
+ SparcReg<n> {
+ let Num = num;
+ }
+ // Rd - Slots in the FP register file for 64-bit floating-point values.
+ class Rd<bits<5> num, string n, list<Register> subregs> : SparcReg<n> {
+ let Num = num;
+ let SubRegs = subregs;
+ }
+
+In the ``SparcRegisterInfo.td`` file, there are register definitions that
+utilize these subclasses of ``Register``, such as:
+
+.. code-block:: text
+
+ def G0 : Ri< 0, "G0">, DwarfRegNum<[0]>;
+ def G1 : Ri< 1, "G1">, DwarfRegNum<[1]>;
+ ...
+ def F0 : Rf< 0, "F0">, DwarfRegNum<[32]>;
+ def F1 : Rf< 1, "F1">, DwarfRegNum<[33]>;
+ ...
+ def D0 : Rd< 0, "F0", [F0, F1]>, DwarfRegNum<[32]>;
+ def D1 : Rd< 2, "F2", [F2, F3]>, DwarfRegNum<[34]>;
+
+The last two registers shown above (``D0`` and ``D1``) are double-precision
+floating-point registers that are aliases for pairs of single-precision
+floating-point sub-registers. In addition to aliases, the sub-register and
+super-register relationships of the defined register are in fields of a
+register's ``TargetRegisterDesc``.
+
+Defining a Register Class
+-------------------------
+
+The ``RegisterClass`` class (specified in ``Target.td``) is used to define an
+object that represents a group of related registers and also defines the
+default allocation order of the registers. A target description file
+``XXXRegisterInfo.td`` that uses ``Target.td`` can construct register classes
+using the following class:
+
+.. code-block:: text
+
+ class RegisterClass<string namespace,
+ list<ValueType> regTypes, int alignment, dag regList> {
+ string Namespace = namespace;
+ list<ValueType> RegTypes = regTypes;
+ int Size = 0; // spill size, in bits; zero lets tblgen pick the size
+ int Alignment = alignment;
+
+ // CopyCost is the cost of copying a value between two registers
+ // default value 1 means a single instruction
+ // A negative value means copying is extremely expensive or impossible
+ int CopyCost = 1;
+ dag MemberList = regList;
+
+ // for register classes that are subregisters of this class
+ list<RegisterClass> SubRegClassList = [];
+
+ code MethodProtos = [{}]; // to insert arbitrary code
+ code MethodBodies = [{}];
+ }
+
+To define a ``RegisterClass``, use the following 4 arguments:
+
+* The first argument of the definition is the name of the namespace.
+
+* The second argument is a list of ``ValueType`` register type values that are
+ defined in ``include/llvm/CodeGen/ValueTypes.td``. Defined values include
+ integer types (such as ``i16``, ``i32``, and ``i1`` for Boolean),
+ floating-point types (``f32``, ``f64``), and vector types (for example,
+ ``v8i16`` for an ``8 x i16`` vector). All registers in a ``RegisterClass``
+ must have the same ``ValueType``, but some registers may store vector data in
+ different configurations. For example a register that can process a 128-bit
+ vector may be able to handle 16 8-bit integer elements, 8 16-bit integers, 4
+ 32-bit integers, and so on.
+
+* The third argument of the ``RegisterClass`` definition specifies the
+ alignment required of the registers when they are stored or loaded to
+ memory.
+
+* The final argument, ``regList``, specifies which registers are in this class.
+ If an alternative allocation order method is not specified, then ``regList``
+ also defines the order of allocation used by the register allocator. Besides
+ simply listing registers with ``(add R0, R1, ...)``, more advanced set
+ operators are available. See ``include/llvm/Target/Target.td`` for more
+ information.
+
+In ``SparcRegisterInfo.td``, three ``RegisterClass`` objects are defined:
+``FPRegs``, ``DFPRegs``, and ``IntRegs``. For all three register classes, the
+first argument defines the namespace with the string "``SP``". ``FPRegs``
+defines a group of 32 single-precision floating-point registers (``F0`` to
+``F31``); ``DFPRegs`` defines a group of 16 double-precision registers
+(``D0-D15``).
+
+.. code-block:: text
+
+ // F0, F1, F2, ..., F31
+ def FPRegs : RegisterClass<"SP", [f32], 32, (sequence "F%u", 0, 31)>;
+
+ def DFPRegs : RegisterClass<"SP", [f64], 64,
+ (add D0, D1, D2, D3, D4, D5, D6, D7, D8,
+ D9, D10, D11, D12, D13, D14, D15)>;
+
+ def IntRegs : RegisterClass<"SP", [i32], 32,
+ (add L0, L1, L2, L3, L4, L5, L6, L7,
+ I0, I1, I2, I3, I4, I5,
+ O0, O1, O2, O3, O4, O5, O7,
+ G1,
+ // Non-allocatable regs:
+ G2, G3, G4,
+ O6, // stack ptr
+ I6, // frame ptr
+ I7, // return address
+ G0, // constant zero
+ G5, G6, G7 // reserved for kernel
+ )>;
+
+Using ``SparcRegisterInfo.td`` with TableGen generates several output files
+that are intended for inclusion in other source code that you write.
+``SparcRegisterInfo.td`` generates ``SparcGenRegisterInfo.h.inc``, which should
+be included in the header file for the implementation of the SPARC register
+implementation that you write (``SparcRegisterInfo.h``). In
+``SparcGenRegisterInfo.h.inc`` a new structure is defined called
+``SparcGenRegisterInfo`` that uses ``TargetRegisterInfo`` as its base. It also
+specifies types, based upon the defined register classes: ``DFPRegsClass``,
+``FPRegsClass``, and ``IntRegsClass``.
+
+``SparcRegisterInfo.td`` also generates ``SparcGenRegisterInfo.inc``, which is
+included at the bottom of ``SparcRegisterInfo.cpp``, the SPARC register
+implementation. The code below shows only the generated integer registers and
+associated register classes. The order of registers in ``IntRegs`` reflects
+the order in the definition of ``IntRegs`` in the target description file.
+
+.. code-block:: c++
+
+ // IntRegs Register Class...
+ static const unsigned IntRegs[] = {
+ SP::L0, SP::L1, SP::L2, SP::L3, SP::L4, SP::L5,
+ SP::L6, SP::L7, SP::I0, SP::I1, SP::I2, SP::I3,
+ SP::I4, SP::I5, SP::O0, SP::O1, SP::O2, SP::O3,
+ SP::O4, SP::O5, SP::O7, SP::G1, SP::G2, SP::G3,
+ SP::G4, SP::O6, SP::I6, SP::I7, SP::G0, SP::G5,
+ SP::G6, SP::G7,
+ };
+
+ // IntRegsVTs Register Class Value Types...
+ static const MVT::ValueType IntRegsVTs[] = {
+ MVT::i32, MVT::Other
+ };
+
+ namespace SP { // Register class instances
+ DFPRegsClass DFPRegsRegClass;
+ FPRegsClass FPRegsRegClass;
+ IntRegsClass IntRegsRegClass;
+ ...
+ // IntRegs Sub-register Classes...
+ static const TargetRegisterClass* const IntRegsSubRegClasses [] = {
+ NULL
+ };
+ ...
+ // IntRegs Super-register Classes..
+ static const TargetRegisterClass* const IntRegsSuperRegClasses [] = {
+ NULL
+ };
+ ...
+ // IntRegs Register Class sub-classes...
+ static const TargetRegisterClass* const IntRegsSubclasses [] = {
+ NULL
+ };
+ ...
+ // IntRegs Register Class super-classes...
+ static const TargetRegisterClass* const IntRegsSuperclasses [] = {
+ NULL
+ };
+
+ IntRegsClass::IntRegsClass() : TargetRegisterClass(IntRegsRegClassID,
+ IntRegsVTs, IntRegsSubclasses, IntRegsSuperclasses, IntRegsSubRegClasses,
+ IntRegsSuperRegClasses, 4, 4, 1, IntRegs, IntRegs + 32) {}
+ }
+
+The register allocators will avoid using reserved registers, and callee saved
+registers are not used until all the volatile registers have been used. That
+is usually good enough, but in some cases it may be necessary to provide custom
+allocation orders.
+
+Implement a subclass of ``TargetRegisterInfo``
+----------------------------------------------
+
+The final step is to hand code portions of ``XXXRegisterInfo``, which
+implements the interface described in ``TargetRegisterInfo.h`` (see
+:ref:`TargetRegisterInfo`). These functions return ``0``, ``NULL``, or
+``false``, unless overridden. Here is a list of functions that are overridden
+for the SPARC implementation in ``SparcRegisterInfo.cpp``:
+
+* ``getCalleeSavedRegs`` --- Returns a list of callee-saved registers in the
+ order of the desired callee-save stack frame offset.
+
+* ``getReservedRegs`` --- Returns a bitset indexed by physical register
+ numbers, indicating if a particular register is unavailable.
+
+* ``hasFP`` --- Return a Boolean indicating if a function should have a
+ dedicated frame pointer register.
+
+* ``eliminateCallFramePseudoInstr`` --- If call frame setup or destroy pseudo
+ instructions are used, this can be called to eliminate them.
+
+* ``eliminateFrameIndex`` --- Eliminate abstract frame indices from
+ instructions that may use them.
+
+* ``emitPrologue`` --- Insert prologue code into the function.
+
+* ``emitEpilogue`` --- Insert epilogue code into the function.
+
+.. _instruction-set:
+
+Instruction Set
+===============
+
+During the early stages of code generation, the LLVM IR code is converted to a
+``SelectionDAG`` with nodes that are instances of the ``SDNode`` class
+containing target instructions. An ``SDNode`` has an opcode, operands, type
+requirements, and operation properties. For example, is an operation
+commutative, does an operation load from memory. The various operation node
+types are described in the ``include/llvm/CodeGen/SelectionDAGNodes.h`` file
+(values of the ``NodeType`` enum in the ``ISD`` namespace).
+
+TableGen uses the following target description (``.td``) input files to
+generate much of the code for instruction definition:
+
+* ``Target.td`` --- Where the ``Instruction``, ``Operand``, ``InstrInfo``, and
+ other fundamental classes are defined.
+
+* ``TargetSelectionDAG.td`` --- Used by ``SelectionDAG`` instruction selection
+ generators, contains ``SDTC*`` classes (selection DAG type constraint),
+ definitions of ``SelectionDAG`` nodes (such as ``imm``, ``cond``, ``bb``,
+ ``add``, ``fadd``, ``sub``), and pattern support (``Pattern``, ``Pat``,
+ ``PatFrag``, ``PatLeaf``, ``ComplexPattern``.
+
+* ``XXXInstrFormats.td`` --- Patterns for definitions of target-specific
+ instructions.
+
+* ``XXXInstrInfo.td`` --- Target-specific definitions of instruction templates,
+ condition codes, and instructions of an instruction set. For architecture
+ modifications, a different file name may be used. For example, for Pentium
+ with SSE instruction, this file is ``X86InstrSSE.td``, and for Pentium with
+ MMX, this file is ``X86InstrMMX.td``.
+
+There is also a target-specific ``XXX.td`` file, where ``XXX`` is the name of
+the target. The ``XXX.td`` file includes the other ``.td`` input files, but
+its contents are only directly important for subtargets.
+
+You should describe a concrete target-specific class ``XXXInstrInfo`` that
+represents machine instructions supported by a target machine.
+``XXXInstrInfo`` contains an array of ``XXXInstrDescriptor`` objects, each of
+which describes one instruction. An instruction descriptor defines:
+
+* Opcode mnemonic
+* Number of operands
+* List of implicit register definitions and uses
+* Target-independent properties (such as memory access, is commutable)
+* Target-specific flags
+
+The Instruction class (defined in ``Target.td``) is mostly used as a base for
+more complex instruction classes.
+
+.. code-block:: text
+
+ class Instruction {
+ string Namespace = "";
+ dag OutOperandList; // A dag containing the MI def operand list.
+ dag InOperandList; // A dag containing the MI use operand list.
+ string AsmString = ""; // The .s format to print the instruction with.
+ list<dag> Pattern; // Set to the DAG pattern for this instruction.
+ list<Register> Uses = [];
+ list<Register> Defs = [];
+ list<Predicate> Predicates = []; // predicates turned into isel match code
+ ... remainder not shown for space ...
+ }
+
+A ``SelectionDAG`` node (``SDNode``) should contain an object representing a
+target-specific instruction that is defined in ``XXXInstrInfo.td``. The
+instruction objects should represent instructions from the architecture manual
+of the target machine (such as the SPARC Architecture Manual for the SPARC
+target).
+
+A single instruction from the architecture manual is often modeled as multiple
+target instructions, depending upon its operands. For example, a manual might
+describe an add instruction that takes a register or an immediate operand. An
+LLVM target could model this with two instructions named ``ADDri`` and
+``ADDrr``.
+
+You should define a class for each instruction category and define each opcode
+as a subclass of the category with appropriate parameters such as the fixed
+binary encoding of opcodes and extended opcodes. You should map the register
+bits to the bits of the instruction in which they are encoded (for the JIT).
+Also you should specify how the instruction should be printed when the
+automatic assembly printer is used.
+
+As is described in the SPARC Architecture Manual, Version 8, there are three
+major 32-bit formats for instructions. Format 1 is only for the ``CALL``
+instruction. Format 2 is for branch on condition codes and ``SETHI`` (set high
+bits of a register) instructions. Format 3 is for other instructions.
+
+Each of these formats has corresponding classes in ``SparcInstrFormat.td``.
+``InstSP`` is a base class for other instruction classes. Additional base
+classes are specified for more precise formats: for example in
+``SparcInstrFormat.td``, ``F2_1`` is for ``SETHI``, and ``F2_2`` is for
+branches. There are three other base classes: ``F3_1`` for register/register
+operations, ``F3_2`` for register/immediate operations, and ``F3_3`` for
+floating-point operations. ``SparcInstrInfo.td`` also adds the base class
+``Pseudo`` for synthetic SPARC instructions.
+
+``SparcInstrInfo.td`` largely consists of operand and instruction definitions
+for the SPARC target. In ``SparcInstrInfo.td``, the following target
+description file entry, ``LDrr``, defines the Load Integer instruction for a
+Word (the ``LD`` SPARC opcode) from a memory address to a register. The first
+parameter, the value 3 (``11``\ :sub:`2`), is the operation value for this
+category of operation. The second parameter (``000000``\ :sub:`2`) is the
+specific operation value for ``LD``/Load Word. The third parameter is the
+output destination, which is a register operand and defined in the ``Register``
+target description file (``IntRegs``).
+
+.. code-block:: text
+
+ def LDrr : F3_1 <3, 0b000000, (outs IntRegs:$dst), (ins MEMrr:$addr),
+ "ld [$addr], $dst",
+ [(set i32:$dst, (load ADDRrr:$addr))]>;
+
+The fourth parameter is the input source, which uses the address operand
+``MEMrr`` that is defined earlier in ``SparcInstrInfo.td``:
+
+.. code-block:: text
+
+ def MEMrr : Operand<i32> {
+ let PrintMethod = "printMemOperand";
+ let MIOperandInfo = (ops IntRegs, IntRegs);
+ }
+
+The fifth parameter is a string that is used by the assembly printer and can be
+left as an empty string until the assembly printer interface is implemented.
+The sixth and final parameter is the pattern used to match the instruction
+during the SelectionDAG Select Phase described in :doc:`CodeGenerator`.
+This parameter is detailed in the next section, :ref:`instruction-selector`.
+
+Instruction class definitions are not overloaded for different operand types,
+so separate versions of instructions are needed for register, memory, or
+immediate value operands. For example, to perform a Load Integer instruction
+for a Word from an immediate operand to a register, the following instruction
+class is defined:
+
+.. code-block:: text
+
+ def LDri : F3_2 <3, 0b000000, (outs IntRegs:$dst), (ins MEMri:$addr),
+ "ld [$addr], $dst",
+ [(set i32:$dst, (load ADDRri:$addr))]>;
+
+Writing these definitions for so many similar instructions can involve a lot of
+cut and paste. In ``.td`` files, the ``multiclass`` directive enables the
+creation of templates to define several instruction classes at once (using the
+``defm`` directive). For example in ``SparcInstrInfo.td``, the ``multiclass``
+pattern ``F3_12`` is defined to create 2 instruction classes each time
+``F3_12`` is invoked:
+
+.. code-block:: text
+
+ multiclass F3_12 <string OpcStr, bits<6> Op3Val, SDNode OpNode> {
+ def rr : F3_1 <2, Op3Val,
+ (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c),
+ !strconcat(OpcStr, " $b, $c, $dst"),
+ [(set i32:$dst, (OpNode i32:$b, i32:$c))]>;
+ def ri : F3_2 <2, Op3Val,
+ (outs IntRegs:$dst), (ins IntRegs:$b, i32imm:$c),
+ !strconcat(OpcStr, " $b, $c, $dst"),
+ [(set i32:$dst, (OpNode i32:$b, simm13:$c))]>;
+ }
+
+So when the ``defm`` directive is used for the ``XOR`` and ``ADD``
+instructions, as seen below, it creates four instruction objects: ``XORrr``,
+``XORri``, ``ADDrr``, and ``ADDri``.
+
+.. code-block:: text
+
+ defm XOR : F3_12<"xor", 0b000011, xor>;
+ defm ADD : F3_12<"add", 0b000000, add>;
+
+``SparcInstrInfo.td`` also includes definitions for condition codes that are
+referenced by branch instructions. The following definitions in
+``SparcInstrInfo.td`` indicate the bit location of the SPARC condition code.
+For example, the 10\ :sup:`th` bit represents the "greater than" condition for
+integers, and the 22\ :sup:`nd` bit represents the "greater than" condition for
+floats.
+
+.. code-block:: text
+
+ def ICC_NE : ICC_VAL< 9>; // Not Equal
+ def ICC_E : ICC_VAL< 1>; // Equal
+ def ICC_G : ICC_VAL<10>; // Greater
+ ...
+ def FCC_U : FCC_VAL<23>; // Unordered
+ def FCC_G : FCC_VAL<22>; // Greater
+ def FCC_UG : FCC_VAL<21>; // Unordered or Greater
+ ...
+
+(Note that ``Sparc.h`` also defines enums that correspond to the same SPARC
+condition codes. Care must be taken to ensure the values in ``Sparc.h``
+correspond to the values in ``SparcInstrInfo.td``. I.e., ``SPCC::ICC_NE = 9``,
+``SPCC::FCC_U = 23`` and so on.)
+
+Instruction Operand Mapping
+---------------------------
+
+The code generator backend maps instruction operands to fields in the
+instruction. Operands are assigned to unbound fields in the instruction in the
+order they are defined. Fields are bound when they are assigned a value. For
+example, the Sparc target defines the ``XNORrr`` instruction as a ``F3_1``
+format instruction having three operands.
+
+.. code-block:: text
+
+ def XNORrr : F3_1<2, 0b000111,
+ (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c),
+ "xnor $b, $c, $dst",
+ [(set i32:$dst, (not (xor i32:$b, i32:$c)))]>;
+
+The instruction templates in ``SparcInstrFormats.td`` show the base class for
+``F3_1`` is ``InstSP``.
+
+.. code-block:: text
+
+ class InstSP<dag outs, dag ins, string asmstr, list<dag> pattern> : Instruction {
+ field bits<32> Inst;
+ let Namespace = "SP";
+ bits<2> op;
+ let Inst{31-30} = op;
+ dag OutOperandList = outs;
+ dag InOperandList = ins;
+ let AsmString = asmstr;
+ let Pattern = pattern;
+ }
+
+``InstSP`` leaves the ``op`` field unbound.
+
+.. code-block:: text
+
+ class F3<dag outs, dag ins, string asmstr, list<dag> pattern>
+ : InstSP<outs, ins, asmstr, pattern> {
+ bits<5> rd;
+ bits<6> op3;
+ bits<5> rs1;
+ let op{1} = 1; // Op = 2 or 3
+ let Inst{29-25} = rd;
+ let Inst{24-19} = op3;
+ let Inst{18-14} = rs1;
+ }
+
+``F3`` binds the ``op`` field and defines the ``rd``, ``op3``, and ``rs1``
+fields. ``F3`` format instructions will bind the operands ``rd``, ``op3``, and
+``rs1`` fields.
+
+.. code-block:: text
+
+ class F3_1<bits<2> opVal, bits<6> op3val, dag outs, dag ins,
+ string asmstr, list<dag> pattern> : F3<outs, ins, asmstr, pattern> {
+ bits<8> asi = 0; // asi not currently used
+ bits<5> rs2;
+ let op = opVal;
+ let op3 = op3val;
+ let Inst{13} = 0; // i field = 0
+ let Inst{12-5} = asi; // address space identifier
+ let Inst{4-0} = rs2;
+ }
+
+``F3_1`` binds the ``op3`` field and defines the ``rs2`` fields. ``F3_1``
+format instructions will bind the operands to the ``rd``, ``rs1``, and ``rs2``
+fields. This results in the ``XNORrr`` instruction binding ``$dst``, ``$b``,
+and ``$c`` operands to the ``rd``, ``rs1``, and ``rs2`` fields respectively.
+
+Instruction Operand Name Mapping
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+TableGen will also generate a function called getNamedOperandIdx() which
+can be used to look up an operand's index in a MachineInstr based on its
+TableGen name. Setting the UseNamedOperandTable bit in an instruction's
+TableGen definition will add all of its operands to an enumeration in the
+llvm::XXX:OpName namespace and also add an entry for it into the OperandMap
+table, which can be queried using getNamedOperandIdx()
+
+.. code-block:: text
+
+ int DstIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::dst); // => 0
+ int BIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::b); // => 1
+ int CIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::c); // => 2
+ int DIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::d); // => -1
+
+ ...
+
+The entries in the OpName enum are taken verbatim from the TableGen definitions,
+so operands with lowercase names will have lower case entries in the enum.
+
+To include the getNamedOperandIdx() function in your backend, you will need
+to define a few preprocessor macros in XXXInstrInfo.cpp and XXXInstrInfo.h.
+For example:
+
+XXXInstrInfo.cpp:
+
+.. code-block:: c++
+
+ #define GET_INSTRINFO_NAMED_OPS // For getNamedOperandIdx() function
+ #include "XXXGenInstrInfo.inc"
+
+XXXInstrInfo.h:
+
+.. code-block:: c++
+
+ #define GET_INSTRINFO_OPERAND_ENUM // For OpName enum
+ #include "XXXGenInstrInfo.inc"
+
+ namespace XXX {
+ int16_t getNamedOperandIdx(uint16_t Opcode, uint16_t NamedIndex);
+ } // End namespace XXX
+
+Instruction Operand Types
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+TableGen will also generate an enumeration consisting of all named Operand
+types defined in the backend, in the llvm::XXX::OpTypes namespace.
+Some common immediate Operand types (for instance i8, i32, i64, f32, f64)
+are defined for all targets in ``include/llvm/Target/Target.td``, and are
+available in each Target's OpTypes enum. Also, only named Operand types appear
+in the enumeration: anonymous types are ignored.
+For example, the X86 backend defines ``brtarget`` and ``brtarget8``, both
+instances of the TableGen ``Operand`` class, which represent branch target
+operands:
+
+.. code-block:: text
+
+ def brtarget : Operand<OtherVT>;
+ def brtarget8 : Operand<OtherVT>;
+
+This results in:
+
+.. code-block:: c++
+
+ namespace X86 {
+ namespace OpTypes {
+ enum OperandType {
+ ...
+ brtarget,
+ brtarget8,
+ ...
+ i32imm,
+ i64imm,
+ ...
+ OPERAND_TYPE_LIST_END
+ } // End namespace OpTypes
+ } // End namespace X86
+
+In typical TableGen fashion, to use the enum, you will need to define a
+preprocessor macro:
+
+.. code-block:: c++
+
+ #define GET_INSTRINFO_OPERAND_TYPES_ENUM // For OpTypes enum
+ #include "XXXGenInstrInfo.inc"
+
+
+Instruction Scheduling
+----------------------
+
+Instruction itineraries can be queried using MCDesc::getSchedClass(). The
+value can be named by an enumeration in llvm::XXX::Sched namespace generated
+by TableGen in XXXGenInstrInfo.inc. The name of the schedule classes are
+the same as provided in XXXSchedule.td plus a default NoItinerary class.
+
+The schedule models are generated by TableGen by the SubtargetEmitter,
+using the ``CodeGenSchedModels`` class. This is distinct from the itinerary
+method of specifying machine resource use. The tool ``utils/schedcover.py``
+can be used to determine which instructions have been covered by the
+schedule model description and which haven't. The first step is to use the
+instructions below to create an output file. Then run ``schedcover.py`` on the
+output file:
+
+.. code-block:: shell
+
+ $ <src>/utils/schedcover.py <build>/lib/Target/AArch64/tblGenSubtarget.with
+ instruction, default, CortexA53Model, CortexA57Model, CycloneModel, ExynosM1Model, FalkorModel, KryoModel, ThunderX2T99Model, ThunderXT8XModel
+ ABSv16i8, WriteV, , , CyWriteV3, M1WriteNMISC1, FalkorWr_2VXVY_2cyc, KryoWrite_2cyc_XY_XY_150ln, ,
+ ABSv1i64, WriteV, , , CyWriteV3, M1WriteNMISC1, FalkorWr_1VXVY_2cyc, KryoWrite_2cyc_XY_noRSV_67ln, ,
+ ...
+
+To capture the debug output from generating a schedule model, change to the
+appropriate target directory and use the following command:
+command with the ``subtarget-emitter`` debug option:
+
+.. code-block:: shell
+
+ $ <build>/bin/llvm-tblgen -debug-only=subtarget-emitter -gen-subtarget \
+ -I <src>/lib/Target/<target> -I <src>/include \
+ -I <src>/lib/Target <src>/lib/Target/<target>/<target>.td \
+ -o <build>/lib/Target/<target>/<target>GenSubtargetInfo.inc.tmp \
+ > tblGenSubtarget.dbg 2>&1
+
+Where ``<build>`` is the build directory, ``src`` is the source directory,
+and ``<target>`` is the name of the target.
+To double check that the above command is what is needed, one can capture the
+exact TableGen command from a build by using:
+
+.. code-block:: shell
+
+ $ VERBOSE=1 make ...
+
+and search for ``llvm-tblgen`` commands in the output.
+
+
+Instruction Relation Mapping
+----------------------------
+
+This TableGen feature is used to relate instructions with each other. It is
+particularly useful when you have multiple instruction formats and need to
+switch between them after instruction selection. This entire feature is driven
+by relation models which can be defined in ``XXXInstrInfo.td`` files
+according to the target-specific instruction set. Relation models are defined
+using ``InstrMapping`` class as a base. TableGen parses all the models
+and generates instruction relation maps using the specified information.
+Relation maps are emitted as tables in the ``XXXGenInstrInfo.inc`` file
+along with the functions to query them. For the detailed information on how to
+use this feature, please refer to :doc:`HowToUseInstrMappings`.
+
+Implement a subclass of ``TargetInstrInfo``
+-------------------------------------------
+
+The final step is to hand code portions of ``XXXInstrInfo``, which implements
+the interface described in ``TargetInstrInfo.h`` (see :ref:`TargetInstrInfo`).
+These functions return ``0`` or a Boolean or they assert, unless overridden.
+Here's a list of functions that are overridden for the SPARC implementation in
+``SparcInstrInfo.cpp``:
+
+* ``isLoadFromStackSlot`` --- If the specified machine instruction is a direct
+ load from a stack slot, return the register number of the destination and the
+ ``FrameIndex`` of the stack slot.
+
+* ``isStoreToStackSlot`` --- If the specified machine instruction is a direct
+ store to a stack slot, return the register number of the destination and the
+ ``FrameIndex`` of the stack slot.
+
+* ``copyPhysReg`` --- Copy values between a pair of physical registers.
+
+* ``storeRegToStackSlot`` --- Store a register value to a stack slot.
+
+* ``loadRegFromStackSlot`` --- Load a register value from a stack slot.
+
+* ``storeRegToAddr`` --- Store a register value to memory.
+
+* ``loadRegFromAddr`` --- Load a register value from memory.
+
+* ``foldMemoryOperand`` --- Attempt to combine instructions of any load or
+ store instruction for the specified operand(s).
+
+Branch Folding and If Conversion
+--------------------------------
+
+Performance can be improved by combining instructions or by eliminating
+instructions that are never reached. The ``AnalyzeBranch`` method in
+``XXXInstrInfo`` may be implemented to examine conditional instructions and
+remove unnecessary instructions. ``AnalyzeBranch`` looks at the end of a
+machine basic block (MBB) for opportunities for improvement, such as branch
+folding and if conversion. The ``BranchFolder`` and ``IfConverter`` machine
+function passes (see the source files ``BranchFolding.cpp`` and
+``IfConversion.cpp`` in the ``lib/CodeGen`` directory) call ``AnalyzeBranch``
+to improve the control flow graph that represents the instructions.
+
+Several implementations of ``AnalyzeBranch`` (for ARM, Alpha, and X86) can be
+examined as models for your own ``AnalyzeBranch`` implementation. Since SPARC
+does not implement a useful ``AnalyzeBranch``, the ARM target implementation is
+shown below.
+
+``AnalyzeBranch`` returns a Boolean value and takes four parameters:
+
+* ``MachineBasicBlock &MBB`` --- The incoming block to be examined.
+
+* ``MachineBasicBlock *&TBB`` --- A destination block that is returned. For a
+ conditional branch that evaluates to true, ``TBB`` is the destination.
+
+* ``MachineBasicBlock *&FBB`` --- For a conditional branch that evaluates to
+ false, ``FBB`` is returned as the destination.
+
+* ``std::vector<MachineOperand> &Cond`` --- List of operands to evaluate a
+ condition for a conditional branch.
+
+In the simplest case, if a block ends without a branch, then it falls through
+to the successor block. No destination blocks are specified for either ``TBB``
+or ``FBB``, so both parameters return ``NULL``. The start of the
+``AnalyzeBranch`` (see code below for the ARM target) shows the function
+parameters and the code for the simplest case.
+
+.. code-block:: c++
+
+ bool ARMInstrInfo::AnalyzeBranch(MachineBasicBlock &MBB,
+ MachineBasicBlock *&TBB,
+ MachineBasicBlock *&FBB,
+ std::vector<MachineOperand> &Cond) const
+ {
+ MachineBasicBlock::iterator I = MBB.end();
+ if (I == MBB.begin() || !isUnpredicatedTerminator(--I))
+ return false;
+
+If a block ends with a single unconditional branch instruction, then
+``AnalyzeBranch`` (shown below) should return the destination of that branch in
+the ``TBB`` parameter.
+
+.. code-block:: c++
+
+ if (LastOpc == ARM::B || LastOpc == ARM::tB) {
+ TBB = LastInst->getOperand(0).getMBB();
+ return false;
+ }
+
+If a block ends with two unconditional branches, then the second branch is
+never reached. In that situation, as shown below, remove the last branch
+instruction and return the penultimate branch in the ``TBB`` parameter.
+
+.. code-block:: c++
+
+ if ((SecondLastOpc == ARM::B || SecondLastOpc == ARM::tB) &&
+ (LastOpc == ARM::B || LastOpc == ARM::tB)) {
+ TBB = SecondLastInst->getOperand(0).getMBB();
+ I = LastInst;
+ I->eraseFromParent();
+ return false;
+ }
+
+A block may end with a single conditional branch instruction that falls through
+to successor block if the condition evaluates to false. In that case,
+``AnalyzeBranch`` (shown below) should return the destination of that
+conditional branch in the ``TBB`` parameter and a list of operands in the
+``Cond`` parameter to evaluate the condition.
+
+.. code-block:: c++
+
+ if (LastOpc == ARM::Bcc || LastOpc == ARM::tBcc) {
+ // Block ends with fall-through condbranch.
+ TBB = LastInst->getOperand(0).getMBB();
+ Cond.push_back(LastInst->getOperand(1));
+ Cond.push_back(LastInst->getOperand(2));
+ return false;
+ }
+
+If a block ends with both a conditional branch and an ensuing unconditional
+branch, then ``AnalyzeBranch`` (shown below) should return the conditional
+branch destination (assuming it corresponds to a conditional evaluation of
+"``true``") in the ``TBB`` parameter and the unconditional branch destination
+in the ``FBB`` (corresponding to a conditional evaluation of "``false``"). A
+list of operands to evaluate the condition should be returned in the ``Cond``
+parameter.
+
+.. code-block:: c++
+
+ unsigned SecondLastOpc = SecondLastInst->getOpcode();
+
+ if ((SecondLastOpc == ARM::Bcc && LastOpc == ARM::B) ||
+ (SecondLastOpc == ARM::tBcc && LastOpc == ARM::tB)) {
+ TBB = SecondLastInst->getOperand(0).getMBB();
+ Cond.push_back(SecondLastInst->getOperand(1));
+ Cond.push_back(SecondLastInst->getOperand(2));
+ FBB = LastInst->getOperand(0).getMBB();
+ return false;
+ }
+
+For the last two cases (ending with a single conditional branch or ending with
+one conditional and one unconditional branch), the operands returned in the
+``Cond`` parameter can be passed to methods of other instructions to create new
+branches or perform other operations. An implementation of ``AnalyzeBranch``
+requires the helper methods ``RemoveBranch`` and ``InsertBranch`` to manage
+subsequent operations.
+
+``AnalyzeBranch`` should return false indicating success in most circumstances.
+``AnalyzeBranch`` should only return true when the method is stumped about what
+to do, for example, if a block has three terminating branches.
+``AnalyzeBranch`` may return true if it encounters a terminator it cannot
+handle, such as an indirect branch.
+
+.. _instruction-selector:
+
+Instruction Selector
+====================
+
+LLVM uses a ``SelectionDAG`` to represent LLVM IR instructions, and nodes of
+the ``SelectionDAG`` ideally represent native target instructions. During code
+generation, instruction selection passes are performed to convert non-native
+DAG instructions into native target-specific instructions. The pass described
+in ``XXXISelDAGToDAG.cpp`` is used to match patterns and perform DAG-to-DAG
+instruction selection. Optionally, a pass may be defined (in
+``XXXBranchSelector.cpp``) to perform similar DAG-to-DAG operations for branch
+instructions. Later, the code in ``XXXISelLowering.cpp`` replaces or removes
+operations and data types not supported natively (legalizes) in a
+``SelectionDAG``.
+
+TableGen generates code for instruction selection using the following target
+description input files:
+
+* ``XXXInstrInfo.td`` --- Contains definitions of instructions in a
+ target-specific instruction set, generates ``XXXGenDAGISel.inc``, which is
+ included in ``XXXISelDAGToDAG.cpp``.
+
+* ``XXXCallingConv.td`` --- Contains the calling and return value conventions
+ for the target architecture, and it generates ``XXXGenCallingConv.inc``,
+ which is included in ``XXXISelLowering.cpp``.
+
+The implementation of an instruction selection pass must include a header that
+declares the ``FunctionPass`` class or a subclass of ``FunctionPass``. In
+``XXXTargetMachine.cpp``, a Pass Manager (PM) should add each instruction
+selection pass into the queue of passes to run.
+
+The LLVM static compiler (``llc``) is an excellent tool for visualizing the
+contents of DAGs. To display the ``SelectionDAG`` before or after specific
+processing phases, use the command line options for ``llc``, described at
+:ref:`SelectionDAG-Process`.
+
+To describe instruction selector behavior, you should add patterns for lowering
+LLVM code into a ``SelectionDAG`` as the last parameter of the instruction
+definitions in ``XXXInstrInfo.td``. For example, in ``SparcInstrInfo.td``,
+this entry defines a register store operation, and the last parameter describes
+a pattern with the store DAG operator.
+
+.. code-block:: text
+
+ def STrr : F3_1< 3, 0b000100, (outs), (ins MEMrr:$addr, IntRegs:$src),
+ "st $src, [$addr]", [(store i32:$src, ADDRrr:$addr)]>;
+
+``ADDRrr`` is a memory mode that is also defined in ``SparcInstrInfo.td``:
+
+.. code-block:: text
+
+ def ADDRrr : ComplexPattern<i32, 2, "SelectADDRrr", [], []>;
+
+The definition of ``ADDRrr`` refers to ``SelectADDRrr``, which is a function
+defined in an implementation of the Instructor Selector (such as
+``SparcISelDAGToDAG.cpp``).
+
+In ``lib/Target/TargetSelectionDAG.td``, the DAG operator for store is defined
+below:
+
+.. code-block:: text
+
+ def store : PatFrag<(ops node:$val, node:$ptr),
+ (st node:$val, node:$ptr), [{
+ if (StoreSDNode *ST = dyn_cast<StoreSDNode>(N))
+ return !ST->isTruncatingStore() &&
+ ST->getAddressingMode() == ISD::UNINDEXED;
+ return false;
+ }]>;
+
+``XXXInstrInfo.td`` also generates (in ``XXXGenDAGISel.inc``) the
+``SelectCode`` method that is used to call the appropriate processing method
+for an instruction. In this example, ``SelectCode`` calls ``Select_ISD_STORE``
+for the ``ISD::STORE`` opcode.
+
+.. code-block:: c++
+
+ SDNode *SelectCode(SDValue N) {
+ ...
+ MVT::ValueType NVT = N.getNode()->getValueType(0);
+ switch (N.getOpcode()) {
+ case ISD::STORE: {
+ switch (NVT) {
+ default:
+ return Select_ISD_STORE(N);
+ break;
+ }
+ break;
+ }
+ ...
+
+The pattern for ``STrr`` is matched, so elsewhere in ``XXXGenDAGISel.inc``,
+code for ``STrr`` is created for ``Select_ISD_STORE``. The ``Emit_22`` method
+is also generated in ``XXXGenDAGISel.inc`` to complete the processing of this
+instruction.
+
+.. code-block:: c++
+
+ SDNode *Select_ISD_STORE(const SDValue &N) {
+ SDValue Chain = N.getOperand(0);
+ if (Predicate_store(N.getNode())) {
+ SDValue N1 = N.getOperand(1);
+ SDValue N2 = N.getOperand(2);
+ SDValue CPTmp0;
+ SDValue CPTmp1;
+
+ // Pattern: (st:void i32:i32:$src,
+ // ADDRrr:i32:$addr)<<P:Predicate_store>>
+ // Emits: (STrr:void ADDRrr:i32:$addr, IntRegs:i32:$src)
+ // Pattern complexity = 13 cost = 1 size = 0
+ if (SelectADDRrr(N, N2, CPTmp0, CPTmp1) &&
+ N1.getNode()->getValueType(0) == MVT::i32 &&
+ N2.getNode()->getValueType(0) == MVT::i32) {
+ return Emit_22(N, SP::STrr, CPTmp0, CPTmp1);
+ }
+ ...
+
+The SelectionDAG Legalize Phase
+-------------------------------
+
+The Legalize phase converts a DAG to use types and operations that are natively
+supported by the target. For natively unsupported types and operations, you
+need to add code to the target-specific ``XXXTargetLowering`` implementation to
+convert unsupported types and operations to supported ones.
+
+In the constructor for the ``XXXTargetLowering`` class, first use the
+``addRegisterClass`` method to specify which types are supported and which
+register classes are associated with them. The code for the register classes
+are generated by TableGen from ``XXXRegisterInfo.td`` and placed in
+``XXXGenRegisterInfo.h.inc``. For example, the implementation of the
+constructor for the SparcTargetLowering class (in ``SparcISelLowering.cpp``)
+starts with the following code:
+
+.. code-block:: c++
+
+ addRegisterClass(MVT::i32, SP::IntRegsRegisterClass);
+ addRegisterClass(MVT::f32, SP::FPRegsRegisterClass);
+ addRegisterClass(MVT::f64, SP::DFPRegsRegisterClass);
+
+You should examine the node types in the ``ISD`` namespace
+(``include/llvm/CodeGen/SelectionDAGNodes.h``) and determine which operations
+the target natively supports. For operations that do **not** have native
+support, add a callback to the constructor for the ``XXXTargetLowering`` class,
+so the instruction selection process knows what to do. The ``TargetLowering``
+class callback methods (declared in ``llvm/Target/TargetLowering.h``) are:
+
+* ``setOperationAction`` --- General operation.
+* ``setLoadExtAction`` --- Load with extension.
+* ``setTruncStoreAction`` --- Truncating store.
+* ``setIndexedLoadAction`` --- Indexed load.
+* ``setIndexedStoreAction`` --- Indexed store.
+* ``setConvertAction`` --- Type conversion.
+* ``setCondCodeAction`` --- Support for a given condition code.
+
+Note: on older releases, ``setLoadXAction`` is used instead of
+``setLoadExtAction``. Also, on older releases, ``setCondCodeAction`` may not
+be supported. Examine your release to see what methods are specifically
+supported.
+
+These callbacks are used to determine that an operation does or does not work
+with a specified type (or types). And in all cases, the third parameter is a
+``LegalAction`` type enum value: ``Promote``, ``Expand``, ``Custom``, or
+``Legal``. ``SparcISelLowering.cpp`` contains examples of all four
+``LegalAction`` values.
+
+Promote
+^^^^^^^
+
+For an operation without native support for a given type, the specified type
+may be promoted to a larger type that is supported. For example, SPARC does
+not support a sign-extending load for Boolean values (``i1`` type), so in
+``SparcISelLowering.cpp`` the third parameter below, ``Promote``, changes
+``i1`` type values to a large type before loading.
+
+.. code-block:: c++
+
+ setLoadExtAction(ISD::SEXTLOAD, MVT::i1, Promote);
+
+Expand
+^^^^^^
+
+For a type without native support, a value may need to be broken down further,
+rather than promoted. For an operation without native support, a combination
+of other operations may be used to similar effect. In SPARC, the
+floating-point sine and cosine trig operations are supported by expansion to
+other operations, as indicated by the third parameter, ``Expand``, to
+``setOperationAction``:
+
+.. code-block:: c++
+
+ setOperationAction(ISD::FSIN, MVT::f32, Expand);
+ setOperationAction(ISD::FCOS, MVT::f32, Expand);
+
+Custom
+^^^^^^
+
+For some operations, simple type promotion or operation expansion may be
+insufficient. In some cases, a special intrinsic function must be implemented.
+
+For example, a constant value may require special treatment, or an operation
+may require spilling and restoring registers in the stack and working with
+register allocators.
+
+As seen in ``SparcISelLowering.cpp`` code below, to perform a type conversion
+from a floating point value to a signed integer, first the
+``setOperationAction`` should be called with ``Custom`` as the third parameter:
+
+.. code-block:: c++
+
+ setOperationAction(ISD::FP_TO_SINT, MVT::i32, Custom);
+
+In the ``LowerOperation`` method, for each ``Custom`` operation, a case
+statement should be added to indicate what function to call. In the following
+code, an ``FP_TO_SINT`` opcode will call the ``LowerFP_TO_SINT`` method:
+
+.. code-block:: c++
+
+ SDValue SparcTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) {
+ switch (Op.getOpcode()) {
+ case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG);
+ ...
+ }
+ }
+
+Finally, the ``LowerFP_TO_SINT`` method is implemented, using an FP register to
+convert the floating-point value to an integer.
+
+.. code-block:: c++
+
+ static SDValue LowerFP_TO_SINT(SDValue Op, SelectionDAG &DAG) {
+ assert(Op.getValueType() == MVT::i32);
+ Op = DAG.getNode(SPISD::FTOI, MVT::f32, Op.getOperand(0));
+ return DAG.getNode(ISD::BITCAST, MVT::i32, Op);
+ }
+
+Legal
+^^^^^
+
+The ``Legal`` ``LegalizeAction`` enum value simply indicates that an operation
+**is** natively supported. ``Legal`` represents the default condition, so it
+is rarely used. In ``SparcISelLowering.cpp``, the action for ``CTPOP`` (an
+operation to count the bits set in an integer) is natively supported only for
+SPARC v9. The following code enables the ``Expand`` conversion technique for
+non-v9 SPARC implementations.
+
+.. code-block:: c++
+
+ setOperationAction(ISD::CTPOP, MVT::i32, Expand);
+ ...
+ if (TM.getSubtarget<SparcSubtarget>().isV9())
+ setOperationAction(ISD::CTPOP, MVT::i32, Legal);
+
+Calling Conventions
+-------------------
+
+To support target-specific calling conventions, ``XXXGenCallingConv.td`` uses
+interfaces (such as ``CCIfType`` and ``CCAssignToReg``) that are defined in
+``lib/Target/TargetCallingConv.td``. TableGen can take the target descriptor
+file ``XXXGenCallingConv.td`` and generate the header file
+``XXXGenCallingConv.inc``, which is typically included in
+``XXXISelLowering.cpp``. You can use the interfaces in
+``TargetCallingConv.td`` to specify:
+
+* The order of parameter allocation.
+
+* Where parameters and return values are placed (that is, on the stack or in
+ registers).
+
+* Which registers may be used.
+
+* Whether the caller or callee unwinds the stack.
+
+The following example demonstrates the use of the ``CCIfType`` and
+``CCAssignToReg`` interfaces. If the ``CCIfType`` predicate is true (that is,
+if the current argument is of type ``f32`` or ``f64``), then the action is
+performed. In this case, the ``CCAssignToReg`` action assigns the argument
+value to the first available register: either ``R0`` or ``R1``.
+
+.. code-block:: text
+
+ CCIfType<[f32,f64], CCAssignToReg<[R0, R1]>>
+
+``SparcCallingConv.td`` contains definitions for a target-specific return-value
+calling convention (``RetCC_Sparc32``) and a basic 32-bit C calling convention
+(``CC_Sparc32``). The definition of ``RetCC_Sparc32`` (shown below) indicates
+which registers are used for specified scalar return types. A single-precision
+float is returned to register ``F0``, and a double-precision float goes to
+register ``D0``. A 32-bit integer is returned in register ``I0`` or ``I1``.
+
+.. code-block:: text
+
+ def RetCC_Sparc32 : CallingConv<[
+ CCIfType<[i32], CCAssignToReg<[I0, I1]>>,
+ CCIfType<[f32], CCAssignToReg<[F0]>>,
+ CCIfType<[f64], CCAssignToReg<[D0]>>
+ ]>;
+
+The definition of ``CC_Sparc32`` in ``SparcCallingConv.td`` introduces
+``CCAssignToStack``, which assigns the value to a stack slot with the specified
+size and alignment. In the example below, the first parameter, 4, indicates
+the size of the slot, and the second parameter, also 4, indicates the stack
+alignment along 4-byte units. (Special cases: if size is zero, then the ABI
+size is used; if alignment is zero, then the ABI alignment is used.)
+
+.. code-block:: text
+
+ def CC_Sparc32 : CallingConv<[
+ // All arguments get passed in integer registers if there is space.
+ CCIfType<[i32, f32, f64], CCAssignToReg<[I0, I1, I2, I3, I4, I5]>>,
+ CCAssignToStack<4, 4>
+ ]>;
+
+``CCDelegateTo`` is another commonly used interface, which tries to find a
+specified sub-calling convention, and, if a match is found, it is invoked. In
+the following example (in ``X86CallingConv.td``), the definition of
+``RetCC_X86_32_C`` ends with ``CCDelegateTo``. After the current value is
+assigned to the register ``ST0`` or ``ST1``, the ``RetCC_X86Common`` is
+invoked.
+
+.. code-block:: text
+
+ def RetCC_X86_32_C : CallingConv<[
+ CCIfType<[f32], CCAssignToReg<[ST0, ST1]>>,
+ CCIfType<[f64], CCAssignToReg<[ST0, ST1]>>,
+ CCDelegateTo<RetCC_X86Common>
+ ]>;
+
+``CCIfCC`` is an interface that attempts to match the given name to the current
+calling convention. If the name identifies the current calling convention,
+then a specified action is invoked. In the following example (in
+``X86CallingConv.td``), if the ``Fast`` calling convention is in use, then
+``RetCC_X86_32_Fast`` is invoked. If the ``SSECall`` calling convention is in
+use, then ``RetCC_X86_32_SSE`` is invoked.
+
+.. code-block:: text
+
+ def RetCC_X86_32 : CallingConv<[
+ CCIfCC<"CallingConv::Fast", CCDelegateTo<RetCC_X86_32_Fast>>,
+ CCIfCC<"CallingConv::X86_SSECall", CCDelegateTo<RetCC_X86_32_SSE>>,
+ CCDelegateTo<RetCC_X86_32_C>
+ ]>;
+
+Other calling convention interfaces include:
+
+* ``CCIf <predicate, action>`` --- If the predicate matches, apply the action.
+
+* ``CCIfInReg <action>`` --- If the argument is marked with the "``inreg``"
+ attribute, then apply the action.
+
+* ``CCIfNest <action>`` --- If the argument is marked with the "``nest``"
+ attribute, then apply the action.
+
+* ``CCIfNotVarArg <action>`` --- If the current function does not take a
+ variable number of arguments, apply the action.
+
+* ``CCAssignToRegWithShadow <registerList, shadowList>`` --- similar to
+ ``CCAssignToReg``, but with a shadow list of registers.
+
+* ``CCPassByVal <size, align>`` --- Assign value to a stack slot with the
+ minimum specified size and alignment.
+
+* ``CCPromoteToType <type>`` --- Promote the current value to the specified
+ type.
+
+* ``CallingConv <[actions]>`` --- Define each calling convention that is
+ supported.
+
+Assembly Printer
+================
+
+During the code emission stage, the code generator may utilize an LLVM pass to
+produce assembly output. To do this, you want to implement the code for a
+printer that converts LLVM IR to a GAS-format assembly language for your target
+machine, using the following steps:
+
+* Define all the assembly strings for your target, adding them to the
+ instructions defined in the ``XXXInstrInfo.td`` file. (See
+ :ref:`instruction-set`.) TableGen will produce an output file
+ (``XXXGenAsmWriter.inc``) with an implementation of the ``printInstruction``
+ method for the ``XXXAsmPrinter`` class.
+
+* Write ``XXXTargetAsmInfo.h``, which contains the bare-bones declaration of
+ the ``XXXTargetAsmInfo`` class (a subclass of ``TargetAsmInfo``).
+
+* Write ``XXXTargetAsmInfo.cpp``, which contains target-specific values for
+ ``TargetAsmInfo`` properties and sometimes new implementations for methods.
+
+* Write ``XXXAsmPrinter.cpp``, which implements the ``AsmPrinter`` class that
+ performs the LLVM-to-assembly conversion.
+
+The code in ``XXXTargetAsmInfo.h`` is usually a trivial declaration of the
+``XXXTargetAsmInfo`` class for use in ``XXXTargetAsmInfo.cpp``. Similarly,
+``XXXTargetAsmInfo.cpp`` usually has a few declarations of ``XXXTargetAsmInfo``
+replacement values that override the default values in ``TargetAsmInfo.cpp``.
+For example in ``SparcTargetAsmInfo.cpp``:
+
+.. code-block:: c++
+
+ SparcTargetAsmInfo::SparcTargetAsmInfo(const SparcTargetMachine &TM) {
+ Data16bitsDirective = "\t.half\t";
+ Data32bitsDirective = "\t.word\t";
+ Data64bitsDirective = 0; // .xword is only supported by V9.
+ ZeroDirective = "\t.skip\t";
+ CommentString = "!";
+ ConstantPoolSection = "\t.section \".rodata\",#alloc\n";
+ }
+
+The X86 assembly printer implementation (``X86TargetAsmInfo``) is an example
+where the target specific ``TargetAsmInfo`` class uses an overridden methods:
+``ExpandInlineAsm``.
+
+A target-specific implementation of ``AsmPrinter`` is written in
+``XXXAsmPrinter.cpp``, which implements the ``AsmPrinter`` class that converts
+the LLVM to printable assembly. The implementation must include the following
+headers that have declarations for the ``AsmPrinter`` and
+``MachineFunctionPass`` classes. The ``MachineFunctionPass`` is a subclass of
+``FunctionPass``.
+
+.. code-block:: c++
+
+ #include "llvm/CodeGen/AsmPrinter.h"
+ #include "llvm/CodeGen/MachineFunctionPass.h"
+
+As a ``FunctionPass``, ``AsmPrinter`` first calls ``doInitialization`` to set
+up the ``AsmPrinter``. In ``SparcAsmPrinter``, a ``Mangler`` object is
+instantiated to process variable names.
+
+In ``XXXAsmPrinter.cpp``, the ``runOnMachineFunction`` method (declared in
+``MachineFunctionPass``) must be implemented for ``XXXAsmPrinter``. In
+``MachineFunctionPass``, the ``runOnFunction`` method invokes
+``runOnMachineFunction``. Target-specific implementations of
+``runOnMachineFunction`` differ, but generally do the following to process each
+machine function:
+
+* Call ``SetupMachineFunction`` to perform initialization.
+
+* Call ``EmitConstantPool`` to print out (to the output stream) constants which
+ have been spilled to memory.
+
+* Call ``EmitJumpTableInfo`` to print out jump tables used by the current
+ function.
+
+* Print out the label for the current function.
+
+* Print out the code for the function, including basic block labels and the
+ assembly for the instruction (using ``printInstruction``)
+
+The ``XXXAsmPrinter`` implementation must also include the code generated by
+TableGen that is output in the ``XXXGenAsmWriter.inc`` file. The code in
+``XXXGenAsmWriter.inc`` contains an implementation of the ``printInstruction``
+method that may call these methods:
+
+* ``printOperand``
+* ``printMemOperand``
+* ``printCCOperand`` (for conditional statements)
+* ``printDataDirective``
+* ``printDeclare``
+* ``printImplicitDef``
+* ``printInlineAsm``
+
+The implementations of ``printDeclare``, ``printImplicitDef``,
+``printInlineAsm``, and ``printLabel`` in ``AsmPrinter.cpp`` are generally
+adequate for printing assembly and do not need to be overridden.
+
+The ``printOperand`` method is implemented with a long ``switch``/``case``
+statement for the type of operand: register, immediate, basic block, external
+symbol, global address, constant pool index, or jump table index. For an
+instruction with a memory address operand, the ``printMemOperand`` method
+should be implemented to generate the proper output. Similarly,
+``printCCOperand`` should be used to print a conditional operand.
+
+``doFinalization`` should be overridden in ``XXXAsmPrinter``, and it should be
+called to shut down the assembly printer. During ``doFinalization``, global
+variables and constants are printed to output.
+
+Subtarget Support
+=================
+
+Subtarget support is used to inform the code generation process of instruction
+set variations for a given chip set. For example, the LLVM SPARC
+implementation provided covers three major versions of the SPARC microprocessor
+architecture: Version 8 (V8, which is a 32-bit architecture), Version 9 (V9, a
+64-bit architecture), and the UltraSPARC architecture. V8 has 16
+double-precision floating-point registers that are also usable as either 32
+single-precision or 8 quad-precision registers. V8 is also purely big-endian.
+V9 has 32 double-precision floating-point registers that are also usable as 16
+quad-precision registers, but cannot be used as single-precision registers.
+The UltraSPARC architecture combines V9 with UltraSPARC Visual Instruction Set
+extensions.
+
+If subtarget support is needed, you should implement a target-specific
+``XXXSubtarget`` class for your architecture. This class should process the
+command-line options ``-mcpu=`` and ``-mattr=``.
+
+TableGen uses definitions in the ``Target.td`` and ``Sparc.td`` files to
+generate code in ``SparcGenSubtarget.inc``. In ``Target.td``, shown below, the
+``SubtargetFeature`` interface is defined. The first 4 string parameters of
+the ``SubtargetFeature`` interface are a feature name, an attribute set by the
+feature, the value of the attribute, and a description of the feature. (The
+fifth parameter is a list of features whose presence is implied, and its
+default value is an empty array.)
+
+.. code-block:: text
+
+ class SubtargetFeature<string n, string a, string v, string d,
+ list<SubtargetFeature> i = []> {
+ string Name = n;
+ string Attribute = a;
+ string Value = v;
+ string Desc = d;
+ list<SubtargetFeature> Implies = i;
+ }
+
+In the ``Sparc.td`` file, the ``SubtargetFeature`` is used to define the
+following features.
+
+.. code-block:: text
+
+ def FeatureV9 : SubtargetFeature<"v9", "IsV9", "true",
+ "Enable SPARC-V9 instructions">;
+ def FeatureV8Deprecated : SubtargetFeature<"deprecated-v8",
+ "V8DeprecatedInsts", "true",
+ "Enable deprecated V8 instructions in V9 mode">;
+ def FeatureVIS : SubtargetFeature<"vis", "IsVIS", "true",
+ "Enable UltraSPARC Visual Instruction Set extensions">;
+
+Elsewhere in ``Sparc.td``, the ``Proc`` class is defined and then is used to
+define particular SPARC processor subtypes that may have the previously
+described features.
+
+.. code-block:: text
+
+ class Proc<string Name, list<SubtargetFeature> Features>
+ : Processor<Name, NoItineraries, Features>;
+
+ def : Proc<"generic", []>;
+ def : Proc<"v8", []>;
+ def : Proc<"supersparc", []>;
+ def : Proc<"sparclite", []>;
+ def : Proc<"f934", []>;
+ def : Proc<"hypersparc", []>;
+ def : Proc<"sparclite86x", []>;
+ def : Proc<"sparclet", []>;
+ def : Proc<"tsc701", []>;
+ def : Proc<"v9", [FeatureV9]>;
+ def : Proc<"ultrasparc", [FeatureV9, FeatureV8Deprecated]>;
+ def : Proc<"ultrasparc3", [FeatureV9, FeatureV8Deprecated]>;
+ def : Proc<"ultrasparc3-vis", [FeatureV9, FeatureV8Deprecated, FeatureVIS]>;
+
+From ``Target.td`` and ``Sparc.td`` files, the resulting
+``SparcGenSubtarget.inc`` specifies enum values to identify the features,
+arrays of constants to represent the CPU features and CPU subtypes, and the
+``ParseSubtargetFeatures`` method that parses the features string that sets
+specified subtarget options. The generated ``SparcGenSubtarget.inc`` file
+should be included in the ``SparcSubtarget.cpp``. The target-specific
+implementation of the ``XXXSubtarget`` method should follow this pseudocode:
+
+.. code-block:: c++
+
+ XXXSubtarget::XXXSubtarget(const Module &M, const std::string &FS) {
+ // Set the default features
+ // Determine default and user specified characteristics of the CPU
+ // Call ParseSubtargetFeatures(FS, CPU) to parse the features string
+ // Perform any additional operations
+ }
+
+JIT Support
+===========
+
+The implementation of a target machine optionally includes a Just-In-Time (JIT)
+code generator that emits machine code and auxiliary structures as binary
+output that can be written directly to memory. To do this, implement JIT code
+generation by performing the following steps:
+
+* Write an ``XXXCodeEmitter.cpp`` file that contains a machine function pass
+ that transforms target-machine instructions into relocatable machine
+ code.
+
+* Write an ``XXXJITInfo.cpp`` file that implements the JIT interfaces for
+ target-specific code-generation activities, such as emitting machine code and
+ stubs.
+
+* Modify ``XXXTargetMachine`` so that it provides a ``TargetJITInfo`` object
+ through its ``getJITInfo`` method.
+
+There are several different approaches to writing the JIT support code. For
+instance, TableGen and target descriptor files may be used for creating a JIT
+code generator, but are not mandatory. For the Alpha and PowerPC target
+machines, TableGen is used to generate ``XXXGenCodeEmitter.inc``, which
+contains the binary coding of machine instructions and the
+``getBinaryCodeForInstr`` method to access those codes. Other JIT
+implementations do not.
+
+Both ``XXXJITInfo.cpp`` and ``XXXCodeEmitter.cpp`` must include the
+``llvm/CodeGen/MachineCodeEmitter.h`` header file that defines the
+``MachineCodeEmitter`` class containing code for several callback functions
+that write data (in bytes, words, strings, etc.) to the output stream.
+
+Machine Code Emitter
+--------------------
+
+In ``XXXCodeEmitter.cpp``, a target-specific of the ``Emitter`` class is
+implemented as a function pass (subclass of ``MachineFunctionPass``). The
+target-specific implementation of ``runOnMachineFunction`` (invoked by
+``runOnFunction`` in ``MachineFunctionPass``) iterates through the
+``MachineBasicBlock`` calls ``emitInstruction`` to process each instruction and
+emit binary code. ``emitInstruction`` is largely implemented with case
+statements on the instruction types defined in ``XXXInstrInfo.h``. For
+example, in ``X86CodeEmitter.cpp``, the ``emitInstruction`` method is built
+around the following ``switch``/``case`` statements:
+
+.. code-block:: c++
+
+ switch (Desc->TSFlags & X86::FormMask) {
+ case X86II::Pseudo: // for not yet implemented instructions
+ ... // or pseudo-instructions
+ break;
+ case X86II::RawFrm: // for instructions with a fixed opcode value
+ ...
+ break;
+ case X86II::AddRegFrm: // for instructions that have one register operand
+ ... // added to their opcode
+ break;
+ case X86II::MRMDestReg:// for instructions that use the Mod/RM byte
+ ... // to specify a destination (register)
+ break;
+ case X86II::MRMDestMem:// for instructions that use the Mod/RM byte
+ ... // to specify a destination (memory)
+ break;
+ case X86II::MRMSrcReg: // for instructions that use the Mod/RM byte
+ ... // to specify a source (register)
+ break;
+ case X86II::MRMSrcMem: // for instructions that use the Mod/RM byte
+ ... // to specify a source (memory)
+ break;
+ case X86II::MRM0r: case X86II::MRM1r: // for instructions that operate on
+ case X86II::MRM2r: case X86II::MRM3r: // a REGISTER r/m operand and
+ case X86II::MRM4r: case X86II::MRM5r: // use the Mod/RM byte and a field
+ case X86II::MRM6r: case X86II::MRM7r: // to hold extended opcode data
+ ...
+ break;
+ case X86II::MRM0m: case X86II::MRM1m: // for instructions that operate on
+ case X86II::MRM2m: case X86II::MRM3m: // a MEMORY r/m operand and
+ case X86II::MRM4m: case X86II::MRM5m: // use the Mod/RM byte and a field
+ case X86II::MRM6m: case X86II::MRM7m: // to hold extended opcode data
+ ...
+ break;
+ case X86II::MRMInitReg: // for instructions whose source and
+ ... // destination are the same register
+ break;
+ }
+
+The implementations of these case statements often first emit the opcode and
+then get the operand(s). Then depending upon the operand, helper methods may
+be called to process the operand(s). For example, in ``X86CodeEmitter.cpp``,
+for the ``X86II::AddRegFrm`` case, the first data emitted (by ``emitByte``) is
+the opcode added to the register operand. Then an object representing the
+machine operand, ``MO1``, is extracted. The helper methods such as
+``isImmediate``, ``isGlobalAddress``, ``isExternalSymbol``,
+``isConstantPoolIndex``, and ``isJumpTableIndex`` determine the operand type.
+(``X86CodeEmitter.cpp`` also has private methods such as ``emitConstant``,
+``emitGlobalAddress``, ``emitExternalSymbolAddress``, ``emitConstPoolAddress``,
+and ``emitJumpTableAddress`` that emit the data into the output stream.)
+
+.. code-block:: c++
+
+ case X86II::AddRegFrm:
+ MCE.emitByte(BaseOpcode + getX86RegNum(MI.getOperand(CurOp++).getReg()));
+
+ if (CurOp != NumOps) {
+ const MachineOperand &MO1 = MI.getOperand(CurOp++);
+ unsigned Size = X86InstrInfo::sizeOfImm(Desc);
+ if (MO1.isImmediate())
+ emitConstant(MO1.getImm(), Size);
+ else {
+ unsigned rt = Is64BitMode ? X86::reloc_pcrel_word
+ : (IsPIC ? X86::reloc_picrel_word : X86::reloc_absolute_word);
+ if (Opcode == X86::MOV64ri)
+ rt = X86::reloc_absolute_dword; // FIXME: add X86II flag?
+ if (MO1.isGlobalAddress()) {
+ bool NeedStub = isa<Function>(MO1.getGlobal());
+ bool isLazy = gvNeedsLazyPtr(MO1.getGlobal());
+ emitGlobalAddress(MO1.getGlobal(), rt, MO1.getOffset(), 0,
+ NeedStub, isLazy);
+ } else if (MO1.isExternalSymbol())
+ emitExternalSymbolAddress(MO1.getSymbolName(), rt);
+ else if (MO1.isConstantPoolIndex())
+ emitConstPoolAddress(MO1.getIndex(), rt);
+ else if (MO1.isJumpTableIndex())
+ emitJumpTableAddress(MO1.getIndex(), rt);
+ }
+ }
+ break;
+
+In the previous example, ``XXXCodeEmitter.cpp`` uses the variable ``rt``, which
+is a ``RelocationType`` enum that may be used to relocate addresses (for
+example, a global address with a PIC base offset). The ``RelocationType`` enum
+for that target is defined in the short target-specific ``XXXRelocations.h``
+file. The ``RelocationType`` is used by the ``relocate`` method defined in
+``XXXJITInfo.cpp`` to rewrite addresses for referenced global symbols.
+
+For example, ``X86Relocations.h`` specifies the following relocation types for
+the X86 addresses. In all four cases, the relocated value is added to the
+value already in memory. For ``reloc_pcrel_word`` and ``reloc_picrel_word``,
+there is an additional initial adjustment.
+
+.. code-block:: c++
+
+ enum RelocationType {
+ reloc_pcrel_word = 0, // add reloc value after adjusting for the PC loc
+ reloc_picrel_word = 1, // add reloc value after adjusting for the PIC base
+ reloc_absolute_word = 2, // absolute relocation; no additional adjustment
+ reloc_absolute_dword = 3 // absolute relocation; no additional adjustment
+ };
+
+Target JIT Info
+---------------
+
+``XXXJITInfo.cpp`` implements the JIT interfaces for target-specific
+code-generation activities, such as emitting machine code and stubs. At
+minimum, a target-specific version of ``XXXJITInfo`` implements the following:
+
+* ``getLazyResolverFunction`` --- Initializes the JIT, gives the target a
+ function that is used for compilation.
+
+* ``emitFunctionStub`` --- Returns a native function with a specified address
+ for a callback function.
+
+* ``relocate`` --- Changes the addresses of referenced globals, based on
+ relocation types.
+
+* Callback function that are wrappers to a function stub that is used when the
+ real target is not initially known.
+
+``getLazyResolverFunction`` is generally trivial to implement. It makes the
+incoming parameter as the global ``JITCompilerFunction`` and returns the
+callback function that will be used a function wrapper. For the Alpha target
+(in ``AlphaJITInfo.cpp``), the ``getLazyResolverFunction`` implementation is
+simply:
+
+.. code-block:: c++
+
+ TargetJITInfo::LazyResolverFn AlphaJITInfo::getLazyResolverFunction(
+ JITCompilerFn F) {
+ JITCompilerFunction = F;
+ return AlphaCompilationCallback;
+ }
+
+For the X86 target, the ``getLazyResolverFunction`` implementation is a little
+more complicated, because it returns a different callback function for
+processors with SSE instructions and XMM registers.
+
+The callback function initially saves and later restores the callee register
+values, incoming arguments, and frame and return address. The callback
+function needs low-level access to the registers or stack, so it is typically
+implemented with assembler.
+
Added: www-releases/trunk/8.0.1/docs/_sources/WritingAnLLVMPass.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/WritingAnLLVMPass.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/WritingAnLLVMPass.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/WritingAnLLVMPass.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,1434 @@
+====================
+Writing an LLVM Pass
+====================
+
+.. contents::
+ :local:
+
+Introduction --- What is a pass?
+================================
+
+The LLVM Pass Framework is an important part of the LLVM system, because LLVM
+passes are where most of the interesting parts of the compiler exist. Passes
+perform the transformations and optimizations that make up the compiler, they
+build the analysis results that are used by these transformations, and they
+are, above all, a structuring technique for compiler code.
+
+All LLVM passes are subclasses of the `Pass
+<http://llvm.org/doxygen/classllvm_1_1Pass.html>`_ class, which implement
+functionality by overriding virtual methods inherited from ``Pass``. Depending
+on how your pass works, you should inherit from the :ref:`ModulePass
+<writing-an-llvm-pass-ModulePass>` , :ref:`CallGraphSCCPass
+<writing-an-llvm-pass-CallGraphSCCPass>`, :ref:`FunctionPass
+<writing-an-llvm-pass-FunctionPass>` , or :ref:`LoopPass
+<writing-an-llvm-pass-LoopPass>`, or :ref:`RegionPass
+<writing-an-llvm-pass-RegionPass>`, or :ref:`BasicBlockPass
+<writing-an-llvm-pass-BasicBlockPass>` classes, which gives the system more
+information about what your pass does, and how it can be combined with other
+passes. One of the main features of the LLVM Pass Framework is that it
+schedules passes to run in an efficient way based on the constraints that your
+pass meets (which are indicated by which class they derive from).
+
+We start by showing you how to construct a pass, everything from setting up the
+code, to compiling, loading, and executing it. After the basics are down, more
+advanced features are discussed.
+
+Quick Start --- Writing hello world
+===================================
+
+Here we describe how to write the "hello world" of passes. The "Hello" pass is
+designed to simply print out the name of non-external functions that exist in
+the program being compiled. It does not modify the program at all, it just
+inspects it. The source code and files for this pass are available in the LLVM
+source tree in the ``lib/Transforms/Hello`` directory.
+
+.. _writing-an-llvm-pass-makefile:
+
+Setting up the build environment
+--------------------------------
+
+First, configure and build LLVM. Next, you need to create a new directory
+somewhere in the LLVM source base. For this example, we'll assume that you
+made ``lib/Transforms/Hello``. Finally, you must set up a build script
+that will compile the source code for the new pass. To do this,
+copy the following into ``CMakeLists.txt``:
+
+.. code-block:: cmake
+
+ add_llvm_library( LLVMHello MODULE
+ Hello.cpp
+
+ PLUGIN_TOOL
+ opt
+ )
+
+and the following line into ``lib/Transforms/CMakeLists.txt``:
+
+.. code-block:: cmake
+
+ add_subdirectory(Hello)
+
+(Note that there is already a directory named ``Hello`` with a sample "Hello"
+pass; you may play with it -- in which case you don't need to modify any
+``CMakeLists.txt`` files -- or, if you want to create everything from scratch,
+use another name.)
+
+This build script specifies that ``Hello.cpp`` file in the current directory
+is to be compiled and linked into a shared object ``$(LEVEL)/lib/LLVMHello.so`` that
+can be dynamically loaded by the :program:`opt` tool via its :option:`-load`
+option. If your operating system uses a suffix other than ``.so`` (such as
+Windows or Mac OS X), the appropriate extension will be used.
+
+Now that we have the build scripts set up, we just need to write the code for
+the pass itself.
+
+.. _writing-an-llvm-pass-basiccode:
+
+Basic code required
+-------------------
+
+Now that we have a way to compile our new pass, we just have to write it.
+Start out with:
+
+.. code-block:: c++
+
+ #include "llvm/Pass.h"
+ #include "llvm/IR/Function.h"
+ #include "llvm/Support/raw_ostream.h"
+
+Which are needed because we are writing a `Pass
+<http://llvm.org/doxygen/classllvm_1_1Pass.html>`_, we are operating on
+`Function <http://llvm.org/doxygen/classllvm_1_1Function.html>`_\ s, and we will
+be doing some printing.
+
+Next we have:
+
+.. code-block:: c++
+
+ using namespace llvm;
+
+... which is required because the functions from the include files live in the
+llvm namespace.
+
+Next we have:
+
+.. code-block:: c++
+
+ namespace {
+
+... which starts out an anonymous namespace. Anonymous namespaces are to C++
+what the "``static``" keyword is to C (at global scope). It makes the things
+declared inside of the anonymous namespace visible only to the current file.
+If you're not familiar with them, consult a decent C++ book for more
+information.
+
+Next, we declare our pass itself:
+
+.. code-block:: c++
+
+ struct Hello : public FunctionPass {
+
+This declares a "``Hello``" class that is a subclass of :ref:`FunctionPass
+<writing-an-llvm-pass-FunctionPass>`. The different builtin pass subclasses
+are described in detail :ref:`later <writing-an-llvm-pass-pass-classes>`, but
+for now, know that ``FunctionPass`` operates on a function at a time.
+
+.. code-block:: c++
+
+ static char ID;
+ Hello() : FunctionPass(ID) {}
+
+This declares pass identifier used by LLVM to identify pass. This allows LLVM
+to avoid using expensive C++ runtime information.
+
+.. code-block:: c++
+
+ bool runOnFunction(Function &F) override {
+ errs() << "Hello: ";
+ errs().write_escaped(F.getName()) << '\n';
+ return false;
+ }
+ }; // end of struct Hello
+ } // end of anonymous namespace
+
+We declare a :ref:`runOnFunction <writing-an-llvm-pass-runOnFunction>` method,
+which overrides an abstract virtual method inherited from :ref:`FunctionPass
+<writing-an-llvm-pass-FunctionPass>`. This is where we are supposed to do our
+thing, so we just print out our message with the name of each function.
+
+.. code-block:: c++
+
+ char Hello::ID = 0;
+
+We initialize pass ID here. LLVM uses ID's address to identify a pass, so
+initialization value is not important.
+
+.. code-block:: c++
+
+ static RegisterPass<Hello> X("hello", "Hello World Pass",
+ false /* Only looks at CFG */,
+ false /* Analysis Pass */);
+
+Lastly, we :ref:`register our class <writing-an-llvm-pass-registration>`
+``Hello``, giving it a command line argument "``hello``", and a name "Hello
+World Pass". The last two arguments describe its behavior: if a pass walks CFG
+without modifying it then the third argument is set to ``true``; if a pass is
+an analysis pass, for example dominator tree pass, then ``true`` is supplied as
+the fourth argument.
+
+As a whole, the ``.cpp`` file looks like:
+
+.. code-block:: c++
+
+ #include "llvm/Pass.h"
+ #include "llvm/IR/Function.h"
+ #include "llvm/Support/raw_ostream.h"
+
+ using namespace llvm;
+
+ namespace {
+ struct Hello : public FunctionPass {
+ static char ID;
+ Hello() : FunctionPass(ID) {}
+
+ bool runOnFunction(Function &F) override {
+ errs() << "Hello: ";
+ errs().write_escaped(F.getName()) << '\n';
+ return false;
+ }
+ }; // end of struct Hello
+ } // end of anonymous namespace
+
+ char Hello::ID = 0;
+ static RegisterPass<Hello> X("hello", "Hello World Pass",
+ false /* Only looks at CFG */,
+ false /* Analysis Pass */);
+
+Now that it's all together, compile the file with a simple "``gmake``" command
+from the top level of your build directory and you should get a new file
+"``lib/LLVMHello.so``". Note that everything in this file is
+contained in an anonymous namespace --- this reflects the fact that passes
+are self contained units that do not need external interfaces (although they
+can have them) to be useful.
+
+Running a pass with ``opt``
+---------------------------
+
+Now that you have a brand new shiny shared object file, we can use the
+:program:`opt` command to run an LLVM program through your pass. Because you
+registered your pass with ``RegisterPass``, you will be able to use the
+:program:`opt` tool to access it, once loaded.
+
+To test it, follow the example at the end of the :doc:`GettingStarted` to
+compile "Hello World" to LLVM. We can now run the bitcode file (hello.bc) for
+the program through our transformation like this (or course, any bitcode file
+will work):
+
+.. code-block:: console
+
+ $ opt -load lib/LLVMHello.so -hello < hello.bc > /dev/null
+ Hello: __main
+ Hello: puts
+ Hello: main
+
+The :option:`-load` option specifies that :program:`opt` should load your pass
+as a shared object, which makes "``-hello``" a valid command line argument
+(which is one reason you need to :ref:`register your pass
+<writing-an-llvm-pass-registration>`). Because the Hello pass does not modify
+the program in any interesting way, we just throw away the result of
+:program:`opt` (sending it to ``/dev/null``).
+
+To see what happened to the other string you registered, try running
+:program:`opt` with the :option:`-help` option:
+
+.. code-block:: console
+
+ $ opt -load lib/LLVMHello.so -help
+ OVERVIEW: llvm .bc -> .bc modular optimizer and analysis printer
+
+ USAGE: opt [subcommand] [options] <input bitcode file>
+
+ OPTIONS:
+ Optimizations available:
+ ...
+ -guard-widening - Widen guards
+ -gvn - Global Value Numbering
+ -gvn-hoist - Early GVN Hoisting of Expressions
+ -hello - Hello World Pass
+ -indvars - Induction Variable Simplification
+ -inferattrs - Infer set function attributes
+ ...
+
+The pass name gets added as the information string for your pass, giving some
+documentation to users of :program:`opt`. Now that you have a working pass,
+you would go ahead and make it do the cool transformations you want. Once you
+get it all working and tested, it may become useful to find out how fast your
+pass is. The :ref:`PassManager <writing-an-llvm-pass-passmanager>` provides a
+nice command line option (:option:`--time-passes`) that allows you to get
+information about the execution time of your pass along with the other passes
+you queue up. For example:
+
+.. code-block:: console
+
+ $ opt -load lib/LLVMHello.so -hello -time-passes < hello.bc > /dev/null
+ Hello: __main
+ Hello: puts
+ Hello: main
+ ===-------------------------------------------------------------------------===
+ ... Pass execution timing report ...
+ ===-------------------------------------------------------------------------===
+ Total Execution Time: 0.0007 seconds (0.0005 wall clock)
+
+ ---User Time--- --User+System-- ---Wall Time--- --- Name ---
+ 0.0004 ( 55.3%) 0.0004 ( 55.3%) 0.0004 ( 75.7%) Bitcode Writer
+ 0.0003 ( 44.7%) 0.0003 ( 44.7%) 0.0001 ( 13.6%) Hello World Pass
+ 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 10.7%) Module Verifier
+ 0.0007 (100.0%) 0.0007 (100.0%) 0.0005 (100.0%) Total
+
+As you can see, our implementation above is pretty fast. The additional
+passes listed are automatically inserted by the :program:`opt` tool to verify
+that the LLVM emitted by your pass is still valid and well formed LLVM, which
+hasn't been broken somehow.
+
+Now that you have seen the basics of the mechanics behind passes, we can talk
+about some more details of how they work and how to use them.
+
+.. _writing-an-llvm-pass-pass-classes:
+
+Pass classes and requirements
+=============================
+
+One of the first things that you should do when designing a new pass is to
+decide what class you should subclass for your pass. The :ref:`Hello World
+<writing-an-llvm-pass-basiccode>` example uses the :ref:`FunctionPass
+<writing-an-llvm-pass-FunctionPass>` class for its implementation, but we did
+not discuss why or when this should occur. Here we talk about the classes
+available, from the most general to the most specific.
+
+When choosing a superclass for your ``Pass``, you should choose the **most
+specific** class possible, while still being able to meet the requirements
+listed. This gives the LLVM Pass Infrastructure information necessary to
+optimize how passes are run, so that the resultant compiler isn't unnecessarily
+slow.
+
+The ``ImmutablePass`` class
+---------------------------
+
+The most plain and boring type of pass is the "`ImmutablePass
+<http://llvm.org/doxygen/classllvm_1_1ImmutablePass.html>`_" class. This pass
+type is used for passes that do not have to be run, do not change state, and
+never need to be updated. This is not a normal type of transformation or
+analysis, but can provide information about the current compiler configuration.
+
+Although this pass class is very infrequently used, it is important for
+providing information about the current target machine being compiled for, and
+other static information that can affect the various transformations.
+
+``ImmutablePass``\ es never invalidate other transformations, are never
+invalidated, and are never "run".
+
+.. _writing-an-llvm-pass-ModulePass:
+
+The ``ModulePass`` class
+------------------------
+
+The `ModulePass <http://llvm.org/doxygen/classllvm_1_1ModulePass.html>`_ class
+is the most general of all superclasses that you can use. Deriving from
+``ModulePass`` indicates that your pass uses the entire program as a unit,
+referring to function bodies in no predictable order, or adding and removing
+functions. Because nothing is known about the behavior of ``ModulePass``
+subclasses, no optimization can be done for their execution.
+
+A module pass can use function level passes (e.g. dominators) using the
+``getAnalysis`` interface ``getAnalysis<DominatorTree>(llvm::Function *)`` to
+provide the function to retrieve analysis result for, if the function pass does
+not require any module or immutable passes. Note that this can only be done
+for functions for which the analysis ran, e.g. in the case of dominators you
+should only ask for the ``DominatorTree`` for function definitions, not
+declarations.
+
+To write a correct ``ModulePass`` subclass, derive from ``ModulePass`` and
+overload the ``runOnModule`` method with the following signature:
+
+The ``runOnModule`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool runOnModule(Module &M) = 0;
+
+The ``runOnModule`` method performs the interesting work of the pass. It
+should return ``true`` if the module was modified by the transformation and
+``false`` otherwise.
+
+.. _writing-an-llvm-pass-CallGraphSCCPass:
+
+The ``CallGraphSCCPass`` class
+------------------------------
+
+The `CallGraphSCCPass
+<http://llvm.org/doxygen/classllvm_1_1CallGraphSCCPass.html>`_ is used by
+passes that need to traverse the program bottom-up on the call graph (callees
+before callers). Deriving from ``CallGraphSCCPass`` provides some mechanics
+for building and traversing the ``CallGraph``, but also allows the system to
+optimize execution of ``CallGraphSCCPass``\ es. If your pass meets the
+requirements outlined below, and doesn't meet the requirements of a
+:ref:`FunctionPass <writing-an-llvm-pass-FunctionPass>` or :ref:`BasicBlockPass
+<writing-an-llvm-pass-BasicBlockPass>`, you should derive from
+``CallGraphSCCPass``.
+
+``TODO``: explain briefly what SCC, Tarjan's algo, and B-U mean.
+
+To be explicit, CallGraphSCCPass subclasses are:
+
+#. ... *not allowed* to inspect or modify any ``Function``\ s other than those
+ in the current SCC and the direct callers and direct callees of the SCC.
+#. ... *required* to preserve the current ``CallGraph`` object, updating it to
+ reflect any changes made to the program.
+#. ... *not allowed* to add or remove SCC's from the current Module, though
+ they may change the contents of an SCC.
+#. ... *allowed* to add or remove global variables from the current Module.
+#. ... *allowed* to maintain state across invocations of :ref:`runOnSCC
+ <writing-an-llvm-pass-runOnSCC>` (including global data).
+
+Implementing a ``CallGraphSCCPass`` is slightly tricky in some cases because it
+has to handle SCCs with more than one node in it. All of the virtual methods
+described below should return ``true`` if they modified the program, or
+``false`` if they didn't.
+
+The ``doInitialization(CallGraph &)`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool doInitialization(CallGraph &CG);
+
+The ``doInitialization`` method is allowed to do most of the things that
+``CallGraphSCCPass``\ es are not allowed to do. They can add and remove
+functions, get pointers to functions, etc. The ``doInitialization`` method is
+designed to do simple initialization type of stuff that does not depend on the
+SCCs being processed. The ``doInitialization`` method call is not scheduled to
+overlap with any other pass executions (thus it should be very fast).
+
+.. _writing-an-llvm-pass-runOnSCC:
+
+The ``runOnSCC`` method
+^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool runOnSCC(CallGraphSCC &SCC) = 0;
+
+The ``runOnSCC`` method performs the interesting work of the pass, and should
+return ``true`` if the module was modified by the transformation, ``false``
+otherwise.
+
+The ``doFinalization(CallGraph &)`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool doFinalization(CallGraph &CG);
+
+The ``doFinalization`` method is an infrequently used method that is called
+when the pass framework has finished calling :ref:`runOnSCC
+<writing-an-llvm-pass-runOnSCC>` for every SCC in the program being compiled.
+
+.. _writing-an-llvm-pass-FunctionPass:
+
+The ``FunctionPass`` class
+--------------------------
+
+In contrast to ``ModulePass`` subclasses, `FunctionPass
+<http://llvm.org/doxygen/classllvm_1_1Pass.html>`_ subclasses do have a
+predictable, local behavior that can be expected by the system. All
+``FunctionPass`` execute on each function in the program independent of all of
+the other functions in the program. ``FunctionPass``\ es do not require that
+they are executed in a particular order, and ``FunctionPass``\ es do not modify
+external functions.
+
+To be explicit, ``FunctionPass`` subclasses are not allowed to:
+
+#. Inspect or modify a ``Function`` other than the one currently being processed.
+#. Add or remove ``Function``\ s from the current ``Module``.
+#. Add or remove global variables from the current ``Module``.
+#. Maintain state across invocations of :ref:`runOnFunction
+ <writing-an-llvm-pass-runOnFunction>` (including global data).
+
+Implementing a ``FunctionPass`` is usually straightforward (See the :ref:`Hello
+World <writing-an-llvm-pass-basiccode>` pass for example).
+``FunctionPass``\ es may overload three virtual methods to do their work. All
+of these methods should return ``true`` if they modified the program, or
+``false`` if they didn't.
+
+.. _writing-an-llvm-pass-doInitialization-mod:
+
+The ``doInitialization(Module &)`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool doInitialization(Module &M);
+
+The ``doInitialization`` method is allowed to do most of the things that
+``FunctionPass``\ es are not allowed to do. They can add and remove functions,
+get pointers to functions, etc. The ``doInitialization`` method is designed to
+do simple initialization type of stuff that does not depend on the functions
+being processed. The ``doInitialization`` method call is not scheduled to
+overlap with any other pass executions (thus it should be very fast).
+
+A good example of how this method should be used is the `LowerAllocations
+<http://llvm.org/doxygen/LowerAllocations_8cpp-source.html>`_ pass. This pass
+converts ``malloc`` and ``free`` instructions into platform dependent
+``malloc()`` and ``free()`` function calls. It uses the ``doInitialization``
+method to get a reference to the ``malloc`` and ``free`` functions that it
+needs, adding prototypes to the module if necessary.
+
+.. _writing-an-llvm-pass-runOnFunction:
+
+The ``runOnFunction`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool runOnFunction(Function &F) = 0;
+
+The ``runOnFunction`` method must be implemented by your subclass to do the
+transformation or analysis work of your pass. As usual, a ``true`` value
+should be returned if the function is modified.
+
+.. _writing-an-llvm-pass-doFinalization-mod:
+
+The ``doFinalization(Module &)`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool doFinalization(Module &M);
+
+The ``doFinalization`` method is an infrequently used method that is called
+when the pass framework has finished calling :ref:`runOnFunction
+<writing-an-llvm-pass-runOnFunction>` for every function in the program being
+compiled.
+
+.. _writing-an-llvm-pass-LoopPass:
+
+The ``LoopPass`` class
+----------------------
+
+All ``LoopPass`` execute on each loop in the function independent of all of the
+other loops in the function. ``LoopPass`` processes loops in loop nest order
+such that outer most loop is processed last.
+
+``LoopPass`` subclasses are allowed to update loop nest using ``LPPassManager``
+interface. Implementing a loop pass is usually straightforward.
+``LoopPass``\ es may overload three virtual methods to do their work. All
+these methods should return ``true`` if they modified the program, or ``false``
+if they didn't.
+
+A ``LoopPass`` subclass which is intended to run as part of the main loop pass
+pipeline needs to preserve all of the same *function* analyses that the other
+loop passes in its pipeline require. To make that easier,
+a ``getLoopAnalysisUsage`` function is provided by ``LoopUtils.h``. It can be
+called within the subclass's ``getAnalysisUsage`` override to get consistent
+and correct behavior. Analogously, ``INITIALIZE_PASS_DEPENDENCY(LoopPass)``
+will initialize this set of function analyses.
+
+The ``doInitialization(Loop *, LPPassManager &)`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool doInitialization(Loop *, LPPassManager &LPM);
+
+The ``doInitialization`` method is designed to do simple initialization type of
+stuff that does not depend on the functions being processed. The
+``doInitialization`` method call is not scheduled to overlap with any other
+pass executions (thus it should be very fast). ``LPPassManager`` interface
+should be used to access ``Function`` or ``Module`` level analysis information.
+
+.. _writing-an-llvm-pass-runOnLoop:
+
+The ``runOnLoop`` method
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool runOnLoop(Loop *, LPPassManager &LPM) = 0;
+
+The ``runOnLoop`` method must be implemented by your subclass to do the
+transformation or analysis work of your pass. As usual, a ``true`` value
+should be returned if the function is modified. ``LPPassManager`` interface
+should be used to update loop nest.
+
+The ``doFinalization()`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool doFinalization();
+
+The ``doFinalization`` method is an infrequently used method that is called
+when the pass framework has finished calling :ref:`runOnLoop
+<writing-an-llvm-pass-runOnLoop>` for every loop in the program being compiled.
+
+.. _writing-an-llvm-pass-RegionPass:
+
+The ``RegionPass`` class
+------------------------
+
+``RegionPass`` is similar to :ref:`LoopPass <writing-an-llvm-pass-LoopPass>`,
+but executes on each single entry single exit region in the function.
+``RegionPass`` processes regions in nested order such that the outer most
+region is processed last.
+
+``RegionPass`` subclasses are allowed to update the region tree by using the
+``RGPassManager`` interface. You may overload three virtual methods of
+``RegionPass`` to implement your own region pass. All these methods should
+return ``true`` if they modified the program, or ``false`` if they did not.
+
+The ``doInitialization(Region *, RGPassManager &)`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool doInitialization(Region *, RGPassManager &RGM);
+
+The ``doInitialization`` method is designed to do simple initialization type of
+stuff that does not depend on the functions being processed. The
+``doInitialization`` method call is not scheduled to overlap with any other
+pass executions (thus it should be very fast). ``RPPassManager`` interface
+should be used to access ``Function`` or ``Module`` level analysis information.
+
+.. _writing-an-llvm-pass-runOnRegion:
+
+The ``runOnRegion`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool runOnRegion(Region *, RGPassManager &RGM) = 0;
+
+The ``runOnRegion`` method must be implemented by your subclass to do the
+transformation or analysis work of your pass. As usual, a true value should be
+returned if the region is modified. ``RGPassManager`` interface should be used to
+update region tree.
+
+The ``doFinalization()`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool doFinalization();
+
+The ``doFinalization`` method is an infrequently used method that is called
+when the pass framework has finished calling :ref:`runOnRegion
+<writing-an-llvm-pass-runOnRegion>` for every region in the program being
+compiled.
+
+.. _writing-an-llvm-pass-BasicBlockPass:
+
+The ``BasicBlockPass`` class
+----------------------------
+
+``BasicBlockPass``\ es are just like :ref:`FunctionPass's
+<writing-an-llvm-pass-FunctionPass>` , except that they must limit their scope
+of inspection and modification to a single basic block at a time. As such,
+they are **not** allowed to do any of the following:
+
+#. Modify or inspect any basic blocks outside of the current one.
+#. Maintain state across invocations of :ref:`runOnBasicBlock
+ <writing-an-llvm-pass-runOnBasicBlock>`.
+#. Modify the control flow graph (by altering terminator instructions)
+#. Any of the things forbidden for :ref:`FunctionPasses
+ <writing-an-llvm-pass-FunctionPass>`.
+
+``BasicBlockPass``\ es are useful for traditional local and "peephole"
+optimizations. They may override the same :ref:`doInitialization(Module &)
+<writing-an-llvm-pass-doInitialization-mod>` and :ref:`doFinalization(Module &)
+<writing-an-llvm-pass-doFinalization-mod>` methods that :ref:`FunctionPass's
+<writing-an-llvm-pass-FunctionPass>` have, but also have the following virtual
+methods that may also be implemented:
+
+The ``doInitialization(Function &)`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool doInitialization(Function &F);
+
+The ``doInitialization`` method is allowed to do most of the things that
+``BasicBlockPass``\ es are not allowed to do, but that ``FunctionPass``\ es
+can. The ``doInitialization`` method is designed to do simple initialization
+that does not depend on the ``BasicBlock``\ s being processed. The
+``doInitialization`` method call is not scheduled to overlap with any other
+pass executions (thus it should be very fast).
+
+.. _writing-an-llvm-pass-runOnBasicBlock:
+
+The ``runOnBasicBlock`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool runOnBasicBlock(BasicBlock &BB) = 0;
+
+Override this function to do the work of the ``BasicBlockPass``. This function
+is not allowed to inspect or modify basic blocks other than the parameter, and
+are not allowed to modify the CFG. A ``true`` value must be returned if the
+basic block is modified.
+
+The ``doFinalization(Function &)`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool doFinalization(Function &F);
+
+The ``doFinalization`` method is an infrequently used method that is called
+when the pass framework has finished calling :ref:`runOnBasicBlock
+<writing-an-llvm-pass-runOnBasicBlock>` for every ``BasicBlock`` in the program
+being compiled. This can be used to perform per-function finalization.
+
+The ``MachineFunctionPass`` class
+---------------------------------
+
+A ``MachineFunctionPass`` is a part of the LLVM code generator that executes on
+the machine-dependent representation of each LLVM function in the program.
+
+Code generator passes are registered and initialized specially by
+``TargetMachine::addPassesToEmitFile`` and similar routines, so they cannot
+generally be run from the :program:`opt` or :program:`bugpoint` commands.
+
+A ``MachineFunctionPass`` is also a ``FunctionPass``, so all the restrictions
+that apply to a ``FunctionPass`` also apply to it. ``MachineFunctionPass``\ es
+also have additional restrictions. In particular, ``MachineFunctionPass``\ es
+are not allowed to do any of the following:
+
+#. Modify or create any LLVM IR ``Instruction``\ s, ``BasicBlock``\ s,
+ ``Argument``\ s, ``Function``\ s, ``GlobalVariable``\ s,
+ ``GlobalAlias``\ es, or ``Module``\ s.
+#. Modify a ``MachineFunction`` other than the one currently being processed.
+#. Maintain state across invocations of :ref:`runOnMachineFunction
+ <writing-an-llvm-pass-runOnMachineFunction>` (including global data).
+
+.. _writing-an-llvm-pass-runOnMachineFunction:
+
+The ``runOnMachineFunction(MachineFunction &MF)`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual bool runOnMachineFunction(MachineFunction &MF) = 0;
+
+``runOnMachineFunction`` can be considered the main entry point of a
+``MachineFunctionPass``; that is, you should override this method to do the
+work of your ``MachineFunctionPass``.
+
+The ``runOnMachineFunction`` method is called on every ``MachineFunction`` in a
+``Module``, so that the ``MachineFunctionPass`` may perform optimizations on
+the machine-dependent representation of the function. If you want to get at
+the LLVM ``Function`` for the ``MachineFunction`` you're working on, use
+``MachineFunction``'s ``getFunction()`` accessor method --- but remember, you
+may not modify the LLVM ``Function`` or its contents from a
+``MachineFunctionPass``.
+
+.. _writing-an-llvm-pass-registration:
+
+Pass registration
+-----------------
+
+In the :ref:`Hello World <writing-an-llvm-pass-basiccode>` example pass we
+illustrated how pass registration works, and discussed some of the reasons that
+it is used and what it does. Here we discuss how and why passes are
+registered.
+
+As we saw above, passes are registered with the ``RegisterPass`` template. The
+template parameter is the name of the pass that is to be used on the command
+line to specify that the pass should be added to a program (for example, with
+:program:`opt` or :program:`bugpoint`). The first argument is the name of the
+pass, which is to be used for the :option:`-help` output of programs, as well
+as for debug output generated by the `--debug-pass` option.
+
+If you want your pass to be easily dumpable, you should implement the virtual
+print method:
+
+The ``print`` method
+^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual void print(llvm::raw_ostream &O, const Module *M) const;
+
+The ``print`` method must be implemented by "analyses" in order to print a
+human readable version of the analysis results. This is useful for debugging
+an analysis itself, as well as for other people to figure out how an analysis
+works. Use the opt ``-analyze`` argument to invoke this method.
+
+The ``llvm::raw_ostream`` parameter specifies the stream to write the results
+on, and the ``Module`` parameter gives a pointer to the top level module of the
+program that has been analyzed. Note however that this pointer may be ``NULL``
+in certain circumstances (such as calling the ``Pass::dump()`` from a
+debugger), so it should only be used to enhance debug output, it should not be
+depended on.
+
+.. _writing-an-llvm-pass-interaction:
+
+Specifying interactions between passes
+--------------------------------------
+
+One of the main responsibilities of the ``PassManager`` is to make sure that
+passes interact with each other correctly. Because ``PassManager`` tries to
+:ref:`optimize the execution of passes <writing-an-llvm-pass-passmanager>` it
+must know how the passes interact with each other and what dependencies exist
+between the various passes. To track this, each pass can declare the set of
+passes that are required to be executed before the current pass, and the passes
+which are invalidated by the current pass.
+
+Typically this functionality is used to require that analysis results are
+computed before your pass is run. Running arbitrary transformation passes can
+invalidate the computed analysis results, which is what the invalidation set
+specifies. If a pass does not implement the :ref:`getAnalysisUsage
+<writing-an-llvm-pass-getAnalysisUsage>` method, it defaults to not having any
+prerequisite passes, and invalidating **all** other passes.
+
+.. _writing-an-llvm-pass-getAnalysisUsage:
+
+The ``getAnalysisUsage`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual void getAnalysisUsage(AnalysisUsage &Info) const;
+
+By implementing the ``getAnalysisUsage`` method, the required and invalidated
+sets may be specified for your transformation. The implementation should fill
+in the `AnalysisUsage
+<http://llvm.org/doxygen/classllvm_1_1AnalysisUsage.html>`_ object with
+information about which passes are required and not invalidated. To do this, a
+pass may call any of the following methods on the ``AnalysisUsage`` object:
+
+The ``AnalysisUsage::addRequired<>`` and ``AnalysisUsage::addRequiredTransitive<>`` methods
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If your pass requires a previous pass to be executed (an analysis for example),
+it can use one of these methods to arrange for it to be run before your pass.
+LLVM has many different types of analyses and passes that can be required,
+spanning the range from ``DominatorSet`` to ``BreakCriticalEdges``. Requiring
+``BreakCriticalEdges``, for example, guarantees that there will be no critical
+edges in the CFG when your pass has been run.
+
+Some analyses chain to other analyses to do their job. For example, an
+`AliasAnalysis <AliasAnalysis>` implementation is required to :ref:`chain
+<aliasanalysis-chaining>` to other alias analysis passes. In cases where
+analyses chain, the ``addRequiredTransitive`` method should be used instead of
+the ``addRequired`` method. This informs the ``PassManager`` that the
+transitively required pass should be alive as long as the requiring pass is.
+
+The ``AnalysisUsage::addPreserved<>`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+One of the jobs of the ``PassManager`` is to optimize how and when analyses are
+run. In particular, it attempts to avoid recomputing data unless it needs to.
+For this reason, passes are allowed to declare that they preserve (i.e., they
+don't invalidate) an existing analysis if it's available. For example, a
+simple constant folding pass would not modify the CFG, so it can't possibly
+affect the results of dominator analysis. By default, all passes are assumed
+to invalidate all others.
+
+The ``AnalysisUsage`` class provides several methods which are useful in
+certain circumstances that are related to ``addPreserved``. In particular, the
+``setPreservesAll`` method can be called to indicate that the pass does not
+modify the LLVM program at all (which is true for analyses), and the
+``setPreservesCFG`` method can be used by transformations that change
+instructions in the program but do not modify the CFG or terminator
+instructions (note that this property is implicitly set for
+:ref:`BasicBlockPass <writing-an-llvm-pass-BasicBlockPass>`\ es).
+
+``addPreserved`` is particularly useful for transformations like
+``BreakCriticalEdges``. This pass knows how to update a small set of loop and
+dominator related analyses if they exist, so it can preserve them, despite the
+fact that it hacks on the CFG.
+
+Example implementations of ``getAnalysisUsage``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ // This example modifies the program, but does not modify the CFG
+ void LICM::getAnalysisUsage(AnalysisUsage &AU) const {
+ AU.setPreservesCFG();
+ AU.addRequired<LoopInfoWrapperPass>();
+ }
+
+.. _writing-an-llvm-pass-getAnalysis:
+
+The ``getAnalysis<>`` and ``getAnalysisIfAvailable<>`` methods
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``Pass::getAnalysis<>`` method is automatically inherited by your class,
+providing you with access to the passes that you declared that you required
+with the :ref:`getAnalysisUsage <writing-an-llvm-pass-getAnalysisUsage>`
+method. It takes a single template argument that specifies which pass class
+you want, and returns a reference to that pass. For example:
+
+.. code-block:: c++
+
+ bool LICM::runOnFunction(Function &F) {
+ LoopInfo &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
+ //...
+ }
+
+This method call returns a reference to the pass desired. You may get a
+runtime assertion failure if you attempt to get an analysis that you did not
+declare as required in your :ref:`getAnalysisUsage
+<writing-an-llvm-pass-getAnalysisUsage>` implementation. This method can be
+called by your ``run*`` method implementation, or by any other local method
+invoked by your ``run*`` method.
+
+A module level pass can use function level analysis info using this interface.
+For example:
+
+.. code-block:: c++
+
+ bool ModuleLevelPass::runOnModule(Module &M) {
+ //...
+ DominatorTree &DT = getAnalysis<DominatorTree>(Func);
+ //...
+ }
+
+In above example, ``runOnFunction`` for ``DominatorTree`` is called by pass
+manager before returning a reference to the desired pass.
+
+If your pass is capable of updating analyses if they exist (e.g.,
+``BreakCriticalEdges``, as described above), you can use the
+``getAnalysisIfAvailable`` method, which returns a pointer to the analysis if
+it is active. For example:
+
+.. code-block:: c++
+
+ if (DominatorSet *DS = getAnalysisIfAvailable<DominatorSet>()) {
+ // A DominatorSet is active. This code will update it.
+ }
+
+Implementing Analysis Groups
+----------------------------
+
+Now that we understand the basics of how passes are defined, how they are used,
+and how they are required from other passes, it's time to get a little bit
+fancier. All of the pass relationships that we have seen so far are very
+simple: one pass depends on one other specific pass to be run before it can
+run. For many applications, this is great, for others, more flexibility is
+required.
+
+In particular, some analyses are defined such that there is a single simple
+interface to the analysis results, but multiple ways of calculating them.
+Consider alias analysis for example. The most trivial alias analysis returns
+"may alias" for any alias query. The most sophisticated analysis a
+flow-sensitive, context-sensitive interprocedural analysis that can take a
+significant amount of time to execute (and obviously, there is a lot of room
+between these two extremes for other implementations). To cleanly support
+situations like this, the LLVM Pass Infrastructure supports the notion of
+Analysis Groups.
+
+Analysis Group Concepts
+^^^^^^^^^^^^^^^^^^^^^^^
+
+An Analysis Group is a single simple interface that may be implemented by
+multiple different passes. Analysis Groups can be given human readable names
+just like passes, but unlike passes, they need not derive from the ``Pass``
+class. An analysis group may have one or more implementations, one of which is
+the "default" implementation.
+
+Analysis groups are used by client passes just like other passes are: the
+``AnalysisUsage::addRequired()`` and ``Pass::getAnalysis()`` methods. In order
+to resolve this requirement, the :ref:`PassManager
+<writing-an-llvm-pass-passmanager>` scans the available passes to see if any
+implementations of the analysis group are available. If none is available, the
+default implementation is created for the pass to use. All standard rules for
+:ref:`interaction between passes <writing-an-llvm-pass-interaction>` still
+apply.
+
+Although :ref:`Pass Registration <writing-an-llvm-pass-registration>` is
+optional for normal passes, all analysis group implementations must be
+registered, and must use the :ref:`INITIALIZE_AG_PASS
+<writing-an-llvm-pass-RegisterAnalysisGroup>` template to join the
+implementation pool. Also, a default implementation of the interface **must**
+be registered with :ref:`RegisterAnalysisGroup
+<writing-an-llvm-pass-RegisterAnalysisGroup>`.
+
+As a concrete example of an Analysis Group in action, consider the
+`AliasAnalysis <http://llvm.org/doxygen/classllvm_1_1AliasAnalysis.html>`_
+analysis group. The default implementation of the alias analysis interface
+(the `basicaa <http://llvm.org/doxygen/structBasicAliasAnalysis.html>`_ pass)
+just does a few simple checks that don't require significant analysis to
+compute (such as: two different globals can never alias each other, etc).
+Passes that use the `AliasAnalysis
+<http://llvm.org/doxygen/classllvm_1_1AliasAnalysis.html>`_ interface (for
+example the `gvn <http://llvm.org/doxygen/classllvm_1_1GVN.html>`_ pass), do not
+care which implementation of alias analysis is actually provided, they just use
+the designated interface.
+
+From the user's perspective, commands work just like normal. Issuing the
+command ``opt -gvn ...`` will cause the ``basicaa`` class to be instantiated
+and added to the pass sequence. Issuing the command ``opt -somefancyaa -gvn
+...`` will cause the ``gvn`` pass to use the ``somefancyaa`` alias analysis
+(which doesn't actually exist, it's just a hypothetical example) instead.
+
+.. _writing-an-llvm-pass-RegisterAnalysisGroup:
+
+Using ``RegisterAnalysisGroup``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``RegisterAnalysisGroup`` template is used to register the analysis group
+itself, while the ``INITIALIZE_AG_PASS`` is used to add pass implementations to
+the analysis group. First, an analysis group should be registered, with a
+human readable name provided for it. Unlike registration of passes, there is
+no command line argument to be specified for the Analysis Group Interface
+itself, because it is "abstract":
+
+.. code-block:: c++
+
+ static RegisterAnalysisGroup<AliasAnalysis> A("Alias Analysis");
+
+Once the analysis is registered, passes can declare that they are valid
+implementations of the interface by using the following code:
+
+.. code-block:: c++
+
+ namespace {
+ // Declare that we implement the AliasAnalysis interface
+ INITIALIZE_AG_PASS(FancyAA, AliasAnalysis , "somefancyaa",
+ "A more complex alias analysis implementation",
+ false, // Is CFG Only?
+ true, // Is Analysis?
+ false); // Is default Analysis Group implementation?
+ }
+
+This just shows a class ``FancyAA`` that uses the ``INITIALIZE_AG_PASS`` macro
+both to register and to "join" the `AliasAnalysis
+<http://llvm.org/doxygen/classllvm_1_1AliasAnalysis.html>`_ analysis group.
+Every implementation of an analysis group should join using this macro.
+
+.. code-block:: c++
+
+ namespace {
+ // Declare that we implement the AliasAnalysis interface
+ INITIALIZE_AG_PASS(BasicAA, AliasAnalysis, "basicaa",
+ "Basic Alias Analysis (default AA impl)",
+ false, // Is CFG Only?
+ true, // Is Analysis?
+ true); // Is default Analysis Group implementation?
+ }
+
+Here we show how the default implementation is specified (using the final
+argument to the ``INITIALIZE_AG_PASS`` template). There must be exactly one
+default implementation available at all times for an Analysis Group to be used.
+Only default implementation can derive from ``ImmutablePass``. Here we declare
+that the `BasicAliasAnalysis
+<http://llvm.org/doxygen/structBasicAliasAnalysis.html>`_ pass is the default
+implementation for the interface.
+
+Pass Statistics
+===============
+
+The `Statistic <http://llvm.org/doxygen/Statistic_8h_source.html>`_ class is
+designed to be an easy way to expose various success metrics from passes.
+These statistics are printed at the end of a run, when the :option:`-stats`
+command line option is enabled on the command line. See the :ref:`Statistics
+section <Statistic>` in the Programmer's Manual for details.
+
+.. _writing-an-llvm-pass-passmanager:
+
+What PassManager does
+---------------------
+
+The `PassManager <http://llvm.org/doxygen/PassManager_8h_source.html>`_ `class
+<http://llvm.org/doxygen/classllvm_1_1PassManager.html>`_ takes a list of
+passes, ensures their :ref:`prerequisites <writing-an-llvm-pass-interaction>`
+are set up correctly, and then schedules passes to run efficiently. All of the
+LLVM tools that run passes use the PassManager for execution of these passes.
+
+The PassManager does two main things to try to reduce the execution time of a
+series of passes:
+
+#. **Share analysis results.** The ``PassManager`` attempts to avoid
+ recomputing analysis results as much as possible. This means keeping track
+ of which analyses are available already, which analyses get invalidated, and
+ which analyses are needed to be run for a pass. An important part of work
+ is that the ``PassManager`` tracks the exact lifetime of all analysis
+ results, allowing it to :ref:`free memory
+ <writing-an-llvm-pass-releaseMemory>` allocated to holding analysis results
+ as soon as they are no longer needed.
+
+#. **Pipeline the execution of passes on the program.** The ``PassManager``
+ attempts to get better cache and memory usage behavior out of a series of
+ passes by pipelining the passes together. This means that, given a series
+ of consecutive :ref:`FunctionPass <writing-an-llvm-pass-FunctionPass>`, it
+ will execute all of the :ref:`FunctionPass
+ <writing-an-llvm-pass-FunctionPass>` on the first function, then all of the
+ :ref:`FunctionPasses <writing-an-llvm-pass-FunctionPass>` on the second
+ function, etc... until the entire program has been run through the passes.
+
+ This improves the cache behavior of the compiler, because it is only
+ touching the LLVM program representation for a single function at a time,
+ instead of traversing the entire program. It reduces the memory consumption
+ of compiler, because, for example, only one `DominatorSet
+ <http://llvm.org/doxygen/classllvm_1_1DominatorSet.html>`_ needs to be
+ calculated at a time. This also makes it possible to implement some
+ :ref:`interesting enhancements <writing-an-llvm-pass-SMP>` in the future.
+
+The effectiveness of the ``PassManager`` is influenced directly by how much
+information it has about the behaviors of the passes it is scheduling. For
+example, the "preserved" set is intentionally conservative in the face of an
+unimplemented :ref:`getAnalysisUsage <writing-an-llvm-pass-getAnalysisUsage>`
+method. Not implementing when it should be implemented will have the effect of
+not allowing any analysis results to live across the execution of your pass.
+
+The ``PassManager`` class exposes a ``--debug-pass`` command line options that
+is useful for debugging pass execution, seeing how things work, and diagnosing
+when you should be preserving more analyses than you currently are. (To get
+information about all of the variants of the ``--debug-pass`` option, just type
+"``opt -help-hidden``").
+
+By using the --debug-pass=Structure option, for example, we can see how our
+:ref:`Hello World <writing-an-llvm-pass-basiccode>` pass interacts with other
+passes. Lets try it out with the gvn and licm passes:
+
+.. code-block:: console
+
+ $ opt -load lib/LLVMHello.so -gvn -licm --debug-pass=Structure < hello.bc > /dev/null
+ ModulePass Manager
+ FunctionPass Manager
+ Dominator Tree Construction
+ Basic Alias Analysis (stateless AA impl)
+ Function Alias Analysis Results
+ Memory Dependence Analysis
+ Global Value Numbering
+ Natural Loop Information
+ Canonicalize natural loops
+ Loop-Closed SSA Form Pass
+ Basic Alias Analysis (stateless AA impl)
+ Function Alias Analysis Results
+ Scalar Evolution Analysis
+ Loop Pass Manager
+ Loop Invariant Code Motion
+ Module Verifier
+ Bitcode Writer
+
+This output shows us when passes are constructed.
+Here we see that GVN uses dominator tree information to do its job. The LICM pass
+uses natural loop information, which uses dominator tree as well.
+
+After the LICM pass, the module verifier runs (which is automatically added by
+the :program:`opt` tool), which uses the dominator tree to check that the
+resultant LLVM code is well formed. Note that the dominator tree is computed
+once, and shared by three passes.
+
+Lets see how this changes when we run the :ref:`Hello World
+<writing-an-llvm-pass-basiccode>` pass in between the two passes:
+
+.. code-block:: console
+
+ $ opt -load lib/LLVMHello.so -gvn -hello -licm --debug-pass=Structure < hello.bc > /dev/null
+ ModulePass Manager
+ FunctionPass Manager
+ Dominator Tree Construction
+ Basic Alias Analysis (stateless AA impl)
+ Function Alias Analysis Results
+ Memory Dependence Analysis
+ Global Value Numbering
+ Hello World Pass
+ Dominator Tree Construction
+ Natural Loop Information
+ Canonicalize natural loops
+ Loop-Closed SSA Form Pass
+ Basic Alias Analysis (stateless AA impl)
+ Function Alias Analysis Results
+ Scalar Evolution Analysis
+ Loop Pass Manager
+ Loop Invariant Code Motion
+ Module Verifier
+ Bitcode Writer
+ Hello: __main
+ Hello: puts
+ Hello: main
+
+Here we see that the :ref:`Hello World <writing-an-llvm-pass-basiccode>` pass
+has killed the Dominator Tree pass, even though it doesn't modify the code at
+all! To fix this, we need to add the following :ref:`getAnalysisUsage
+<writing-an-llvm-pass-getAnalysisUsage>` method to our pass:
+
+.. code-block:: c++
+
+ // We don't modify the program, so we preserve all analyses
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ AU.setPreservesAll();
+ }
+
+Now when we run our pass, we get this output:
+
+.. code-block:: console
+
+ $ opt -load lib/LLVMHello.so -gvn -hello -licm --debug-pass=Structure < hello.bc > /dev/null
+ Pass Arguments: -gvn -hello -licm
+ ModulePass Manager
+ FunctionPass Manager
+ Dominator Tree Construction
+ Basic Alias Analysis (stateless AA impl)
+ Function Alias Analysis Results
+ Memory Dependence Analysis
+ Global Value Numbering
+ Hello World Pass
+ Natural Loop Information
+ Canonicalize natural loops
+ Loop-Closed SSA Form Pass
+ Basic Alias Analysis (stateless AA impl)
+ Function Alias Analysis Results
+ Scalar Evolution Analysis
+ Loop Pass Manager
+ Loop Invariant Code Motion
+ Module Verifier
+ Bitcode Writer
+ Hello: __main
+ Hello: puts
+ Hello: main
+
+Which shows that we don't accidentally invalidate dominator information
+anymore, and therefore do not have to compute it twice.
+
+.. _writing-an-llvm-pass-releaseMemory:
+
+The ``releaseMemory`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: c++
+
+ virtual void releaseMemory();
+
+The ``PassManager`` automatically determines when to compute analysis results,
+and how long to keep them around for. Because the lifetime of the pass object
+itself is effectively the entire duration of the compilation process, we need
+some way to free analysis results when they are no longer useful. The
+``releaseMemory`` virtual method is the way to do this.
+
+If you are writing an analysis or any other pass that retains a significant
+amount of state (for use by another pass which "requires" your pass and uses
+the :ref:`getAnalysis <writing-an-llvm-pass-getAnalysis>` method) you should
+implement ``releaseMemory`` to, well, release the memory allocated to maintain
+this internal state. This method is called after the ``run*`` method for the
+class, before the next call of ``run*`` in your pass.
+
+Registering dynamically loaded passes
+=====================================
+
+*Size matters* when constructing production quality tools using LLVM, both for
+the purposes of distribution, and for regulating the resident code size when
+running on the target system. Therefore, it becomes desirable to selectively
+use some passes, while omitting others and maintain the flexibility to change
+configurations later on. You want to be able to do all this, and, provide
+feedback to the user. This is where pass registration comes into play.
+
+The fundamental mechanisms for pass registration are the
+``MachinePassRegistry`` class and subclasses of ``MachinePassRegistryNode``.
+
+An instance of ``MachinePassRegistry`` is used to maintain a list of
+``MachinePassRegistryNode`` objects. This instance maintains the list and
+communicates additions and deletions to the command line interface.
+
+An instance of ``MachinePassRegistryNode`` subclass is used to maintain
+information provided about a particular pass. This information includes the
+command line name, the command help string and the address of the function used
+to create an instance of the pass. A global static constructor of one of these
+instances *registers* with a corresponding ``MachinePassRegistry``, the static
+destructor *unregisters*. Thus a pass that is statically linked in the tool
+will be registered at start up. A dynamically loaded pass will register on
+load and unregister at unload.
+
+Using existing registries
+-------------------------
+
+There are predefined registries to track instruction scheduling
+(``RegisterScheduler``) and register allocation (``RegisterRegAlloc``) machine
+passes. Here we will describe how to *register* a register allocator machine
+pass.
+
+Implement your register allocator machine pass. In your register allocator
+``.cpp`` file add the following include:
+
+.. code-block:: c++
+
+ #include "llvm/CodeGen/RegAllocRegistry.h"
+
+Also in your register allocator ``.cpp`` file, define a creator function in the
+form:
+
+.. code-block:: c++
+
+ FunctionPass *createMyRegisterAllocator() {
+ return new MyRegisterAllocator();
+ }
+
+Note that the signature of this function should match the type of
+``RegisterRegAlloc::FunctionPassCtor``. In the same file add the "installing"
+declaration, in the form:
+
+.. code-block:: c++
+
+ static RegisterRegAlloc myRegAlloc("myregalloc",
+ "my register allocator help string",
+ createMyRegisterAllocator);
+
+Note the two spaces prior to the help string produces a tidy result on the
+:option:`-help` query.
+
+.. code-block:: console
+
+ $ llc -help
+ ...
+ -regalloc - Register allocator to use (default=linearscan)
+ =linearscan - linear scan register allocator
+ =local - local register allocator
+ =simple - simple register allocator
+ =myregalloc - my register allocator help string
+ ...
+
+And that's it. The user is now free to use ``-regalloc=myregalloc`` as an
+option. Registering instruction schedulers is similar except use the
+``RegisterScheduler`` class. Note that the
+``RegisterScheduler::FunctionPassCtor`` is significantly different from
+``RegisterRegAlloc::FunctionPassCtor``.
+
+To force the load/linking of your register allocator into the
+:program:`llc`/:program:`lli` tools, add your creator function's global
+declaration to ``Passes.h`` and add a "pseudo" call line to
+``llvm/Codegen/LinkAllCodegenComponents.h``.
+
+Creating new registries
+-----------------------
+
+The easiest way to get started is to clone one of the existing registries; we
+recommend ``llvm/CodeGen/RegAllocRegistry.h``. The key things to modify are
+the class name and the ``FunctionPassCtor`` type.
+
+Then you need to declare the registry. Example: if your pass registry is
+``RegisterMyPasses`` then define:
+
+.. code-block:: c++
+
+ MachinePassRegistry RegisterMyPasses::Registry;
+
+And finally, declare the command line option for your passes. Example:
+
+.. code-block:: c++
+
+ cl::opt<RegisterMyPasses::FunctionPassCtor, false,
+ RegisterPassParser<RegisterMyPasses> >
+ MyPassOpt("mypass",
+ cl::init(&createDefaultMyPass),
+ cl::desc("my pass option help"));
+
+Here the command option is "``mypass``", with ``createDefaultMyPass`` as the
+default creator.
+
+Using GDB with dynamically loaded passes
+----------------------------------------
+
+Unfortunately, using GDB with dynamically loaded passes is not as easy as it
+should be. First of all, you can't set a breakpoint in a shared object that
+has not been loaded yet, and second of all there are problems with inlined
+functions in shared objects. Here are some suggestions to debugging your pass
+with GDB.
+
+For sake of discussion, I'm going to assume that you are debugging a
+transformation invoked by :program:`opt`, although nothing described here
+depends on that.
+
+Setting a breakpoint in your pass
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+First thing you do is start gdb on the opt process:
+
+.. code-block:: console
+
+ $ gdb opt
+ GNU gdb 5.0
+ Copyright 2000 Free Software Foundation, Inc.
+ GDB is free software, covered by the GNU General Public License, and you are
+ welcome to change it and/or distribute copies of it under certain conditions.
+ Type "show copying" to see the conditions.
+ There is absolutely no warranty for GDB. Type "show warranty" for details.
+ This GDB was configured as "sparc-sun-solaris2.6"...
+ (gdb)
+
+Note that :program:`opt` has a lot of debugging information in it, so it takes
+time to load. Be patient. Since we cannot set a breakpoint in our pass yet
+(the shared object isn't loaded until runtime), we must execute the process,
+and have it stop before it invokes our pass, but after it has loaded the shared
+object. The most foolproof way of doing this is to set a breakpoint in
+``PassManager::run`` and then run the process with the arguments you want:
+
+.. code-block:: console
+
+ $ (gdb) break llvm::PassManager::run
+ Breakpoint 1 at 0x2413bc: file Pass.cpp, line 70.
+ (gdb) run test.bc -load $(LLVMTOP)/llvm/Debug+Asserts/lib/[libname].so -[passoption]
+ Starting program: opt test.bc -load $(LLVMTOP)/llvm/Debug+Asserts/lib/[libname].so -[passoption]
+ Breakpoint 1, PassManager::run (this=0xffbef174, M=@0x70b298) at Pass.cpp:70
+ 70 bool PassManager::run(Module &M) { return PM->run(M); }
+ (gdb)
+
+Once the :program:`opt` stops in the ``PassManager::run`` method you are now
+free to set breakpoints in your pass so that you can trace through execution or
+do other standard debugging stuff.
+
+Miscellaneous Problems
+^^^^^^^^^^^^^^^^^^^^^^
+
+Once you have the basics down, there are a couple of problems that GDB has,
+some with solutions, some without.
+
+* Inline functions have bogus stack information. In general, GDB does a pretty
+ good job getting stack traces and stepping through inline functions. When a
+ pass is dynamically loaded however, it somehow completely loses this
+ capability. The only solution I know of is to de-inline a function (move it
+ from the body of a class to a ``.cpp`` file).
+
+* Restarting the program breaks breakpoints. After following the information
+ above, you have succeeded in getting some breakpoints planted in your pass.
+ Next thing you know, you restart the program (i.e., you type "``run``" again),
+ and you start getting errors about breakpoints being unsettable. The only
+ way I have found to "fix" this problem is to delete the breakpoints that are
+ already set in your pass, run the program, and re-set the breakpoints once
+ execution stops in ``PassManager::run``.
+
+Hopefully these tips will help with common case debugging situations. If you'd
+like to contribute some tips of your own, just contact `Chris
+<mailto:sabre at nondot.org>`_.
+
+Future extensions planned
+-------------------------
+
+Although the LLVM Pass Infrastructure is very capable as it stands, and does
+some nifty stuff, there are things we'd like to add in the future. Here is
+where we are going:
+
+.. _writing-an-llvm-pass-SMP:
+
+Multithreaded LLVM
+^^^^^^^^^^^^^^^^^^
+
+Multiple CPU machines are becoming more common and compilation can never be
+fast enough: obviously we should allow for a multithreaded compiler. Because
+of the semantics defined for passes above (specifically they cannot maintain
+state across invocations of their ``run*`` methods), a nice clean way to
+implement a multithreaded compiler would be for the ``PassManager`` class to
+create multiple instances of each pass object, and allow the separate instances
+to be hacking on different parts of the program at the same time.
+
+This implementation would prevent each of the passes from having to implement
+multithreaded constructs, requiring only the LLVM core to have locking in a few
+places (for global resources). Although this is a simple extension, we simply
+haven't had time (or multiprocessor machines, thus a reason) to implement this.
+Despite that, we have kept the LLVM passes SMP ready, and you should too.
+
Added: www-releases/trunk/8.0.1/docs/_sources/XRay.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/XRay.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/XRay.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/XRay.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,339 @@
+====================
+XRay Instrumentation
+====================
+
+:Version: 1 as of 2016-11-08
+
+.. contents::
+ :local:
+
+
+Introduction
+============
+
+XRay is a function call tracing system which combines compiler-inserted
+instrumentation points and a runtime library that can dynamically enable and
+disable the instrumentation.
+
+More high level information about XRay can be found in the `XRay whitepaper`_.
+
+This document describes how to use XRay as implemented in LLVM.
+
+XRay in LLVM
+============
+
+XRay consists of three main parts:
+
+- Compiler-inserted instrumentation points.
+- A runtime library for enabling/disabling tracing at runtime.
+- A suite of tools for analysing the traces.
+
+ **NOTE:** As of July 25, 2018 , XRay is only available for the following
+ architectures running Linux: x86_64, arm7 (no thumb), aarch64, powerpc64le,
+ mips, mipsel, mips64, mips64el, NetBSD: x86_64, FreeBSD: x86_64 and
+ OpenBSD: x86_64.
+
+The compiler-inserted instrumentation points come in the form of nop-sleds in
+the final generated binary, and an ELF section named ``xray_instr_map`` which
+contains entries pointing to these instrumentation points. The runtime library
+relies on being able to access the entries of the ``xray_instr_map``, and
+overwrite the instrumentation points at runtime.
+
+Using XRay
+==========
+
+You can use XRay in a couple of ways:
+
+- Instrumenting your C/C++/Objective-C/Objective-C++ application.
+- Generating LLVM IR with the correct function attributes.
+
+The rest of this section covers these main ways and later on how to customise
+what XRay does in an XRay-instrumented binary.
+
+Instrumenting your C/C++/Objective-C Application
+------------------------------------------------
+
+The easiest way of getting XRay instrumentation for your application is by
+enabling the ``-fxray-instrument`` flag in your clang invocation.
+
+For example:
+
+::
+
+ clang -fxray-instrument ...
+
+By default, functions that have at least 200 instructions will get XRay
+instrumentation points. You can tweak that number through the
+``-fxray-instruction-threshold=`` flag:
+
+::
+
+ clang -fxray-instrument -fxray-instruction-threshold=1 ...
+
+You can also specifically instrument functions in your binary to either always
+or never be instrumented using source-level attributes. You can do it using the
+GCC-style attributes or C++11-style attributes.
+
+.. code-block:: c++
+
+ [[clang::xray_always_instrument]] void always_instrumented();
+
+ [[clang::xray_never_instrument]] void never_instrumented();
+
+ void alt_always_instrumented() __attribute__((xray_always_instrument));
+
+ void alt_never_instrumented() __attribute__((xray_never_instrument));
+
+When linking a binary, you can either manually link in the `XRay Runtime
+Library`_ or use ``clang`` to link it in automatically with the
+``-fxray-instrument`` flag. Alternatively, you can statically link-in the XRay
+runtime library from compiler-rt -- those archive files will take the name of
+`libclang_rt.xray-{arch}` where `{arch}` is the mnemonic supported by clang
+(x86_64, arm7, etc.).
+
+LLVM Function Attribute
+-----------------------
+
+If you're using LLVM IR directly, you can add the ``function-instrument``
+string attribute to your functions, to get the similar effect that the
+C/C++/Objective-C source-level attributes would get:
+
+.. code-block:: llvm
+
+ define i32 @always_instrument() uwtable "function-instrument"="xray-always" {
+ ; ...
+ }
+
+ define i32 @never_instrument() uwtable "function-instrument"="xray-never" {
+ ; ...
+ }
+
+You can also set the ``xray-instruction-threshold`` attribute and provide a
+numeric string value for how many instructions should be in the function before
+it gets instrumented.
+
+.. code-block:: llvm
+
+ define i32 @maybe_instrument() uwtable "xray-instruction-threshold"="2" {
+ ; ...
+ }
+
+Special Case File
+-----------------
+
+Attributes can be imbued through the use of special case files instead of
+adding them to the original source files. You can use this to mark certain
+functions and classes to be never, always, or instrumented with first-argument
+logging from a file. The file's format is described below:
+
+.. code-block:: bash
+
+ # Comments are supported
+ [always]
+ fun:always_instrument
+ fun:log_arg1=arg1 # Log the first argument for the function
+
+ [never]
+ fun:never_instrument
+
+These files can be provided through the ``-fxray-attr-list=`` flag to clang.
+You may have multiple files loaded through multiple instances of the flag.
+
+XRay Runtime Library
+--------------------
+
+The XRay Runtime Library is part of the compiler-rt project, which implements
+the runtime components that perform the patching and unpatching of inserted
+instrumentation points. When you use ``clang`` to link your binaries and the
+``-fxray-instrument`` flag, it will automatically link in the XRay runtime.
+
+The default implementation of the XRay runtime will enable XRay instrumentation
+before ``main`` starts, which works for applications that have a short
+lifetime. This implementation also records all function entry and exit events
+which may result in a lot of records in the resulting trace.
+
+Also by default the filename of the XRay trace is ``xray-log.XXXXXX`` where the
+``XXXXXX`` part is randomly generated.
+
+These options can be controlled through the ``XRAY_OPTIONS`` environment
+variable, where we list down the options and their defaults below.
+
++-------------------+-----------------+---------------+------------------------+
+| Option | Type | Default | Description |
++===================+=================+===============+========================+
+| patch_premain | ``bool`` | ``false`` | Whether to patch |
+| | | | instrumentation points |
+| | | | before main. |
++-------------------+-----------------+---------------+------------------------+
+| xray_mode | ``const char*`` | ``""`` | Default mode to |
+| | | | install and initialize |
+| | | | before ``main``. |
++-------------------+-----------------+---------------+------------------------+
+| xray_logfile_base | ``const char*`` | ``xray-log.`` | Filename base for the |
+| | | | XRay logfile. |
++-------------------+-----------------+---------------+------------------------+
+| verbosity | ``int`` | ``0`` | Runtime verbosity |
+| | | | level. |
++-------------------+-----------------+---------------+------------------------+
+
+
+If you choose to not use the default logging implementation that comes with the
+XRay runtime and/or control when/how the XRay instrumentation runs, you may use
+the XRay APIs directly for doing so. To do this, you'll need to include the
+``xray_log_interface.h`` from the compiler-rt ``xray`` directory. The important API
+functions we list below:
+
+- ``__xray_log_register_mode(...)``: Register a logging implementation against
+ a string Mode identifier. The implementation is an instance of
+ ``XRayLogImpl`` defined in ``xray/xray_log_interface.h``.
+- ``__xray_log_select_mode(...)``: Select the mode to install, associated with
+ a string Mode identifier. Only implementations registered with
+ ``__xray_log_register_mode(...)`` can be chosen with this function.
+- ``__xray_log_init_mode(...)``: This function allows for initializing and
+ re-initializing an installed logging implementation. See
+ ``xray/xray_log_interface.h`` for details, part of the XRay compiler-rt
+ installation.
+
+Once a logging implementation has been initialized, it can be "stopped" by
+finalizing the implementation through the ``__xray_log_finalize()`` function.
+The finalization routine is the opposite of the initialization. When finalized,
+an implementation's data can be cleared out through the
+``__xray_log_flushLog()`` function. For implementations that support in-memory
+processing, these should register an iterator function to provide access to the
+data via the ``__xray_log_set_buffer_iterator(...)`` which allows code calling
+the ``__xray_log_process_buffers(...)`` function to deal with the data in
+memory.
+
+All of this is better explained in the ``xray/xray_log_interface.h`` header.
+
+Basic Mode
+----------
+
+XRay supports a basic logging mode which will trace the application's
+execution, and periodically append to a single log. This mode can be
+installed/enabled by setting ``xray_mode=xray-basic`` in the ``XRAY_OPTIONS``
+environment variable. Combined with ``patch_premain=true`` this can allow for
+tracing applications from start to end.
+
+Like all the other modes installed through ``__xray_log_select_mode(...)``, the
+implementation can be configured through the ``__xray_log_init_mode(...)``
+function, providing the mode string and the flag options. Basic-mode specific
+defaults can be provided in the ``XRAY_BASIC_OPTIONS`` environment variable.
+
+Flight Data Recorder Mode
+-------------------------
+
+XRay supports a logging mode which allows the application to only capture a
+fixed amount of memory's worth of events. Flight Data Recorder (FDR) mode works
+very much like a plane's "black box" which keeps recording data to memory in a
+fixed-size circular queue of buffers, and have the data available
+programmatically until the buffers are finalized and flushed. To use FDR mode
+on your application, you may set the ``xray_mode`` variable to ``xray-fdr`` in
+the ``XRAY_OPTIONS`` environment variable. Additional options to the FDR mode
+implementation can be provided in the ``XRAY_FDR_OPTIONS`` environment
+variable. Programmatic configuration can be done by calling
+``__xray_log_init_mode("xray-fdr", <configuration string>)`` once it has been
+selected/installed.
+
+When the buffers are flushed to disk, the result is a binary trace format
+described by `XRay FDR format <XRayFDRFormat.html>`_
+
+When FDR mode is on, it will keep writing and recycling memory buffers until
+the logging implementation is finalized -- at which point it can be flushed and
+re-initialised later. To do this programmatically, we follow the workflow
+provided below:
+
+.. code-block:: c++
+
+ // Patch the sleds, if we haven't yet.
+ auto patch_status = __xray_patch();
+
+ // Maybe handle the patch_status errors.
+
+ // When we want to flush the log, we need to finalize it first, to give
+ // threads a chance to return buffers to the queue.
+ auto finalize_status = __xray_log_finalize();
+ if (finalize_status != XRAY_LOG_FINALIZED) {
+ // maybe retry, or bail out.
+ }
+
+ // At this point, we are sure that the log is finalized, so we may try
+ // flushing the log.
+ auto flush_status = __xray_log_flushLog();
+ if (flush_status != XRAY_LOG_FLUSHED) {
+ // maybe retry, or bail out.
+ }
+
+The default settings for the FDR mode implementation will create logs named
+similarly to the basic log implementation, but will have a different log
+format. All the trace analysis tools (and the trace reading library) will
+support all versions of the FDR mode format as we add more functionality and
+record types in the future.
+
+ **NOTE:** We do not promise perpetual support for when we update the log
+ versions we support going forward. Deprecation of the formats will be
+ announced and discussed on the developers mailing list.
+
+Trace Analysis Tools
+--------------------
+
+We currently have the beginnings of a trace analysis tool in LLVM, which can be
+found in the ``tools/llvm-xray`` directory. The ``llvm-xray`` tool currently
+supports the following subcommands:
+
+- ``extract``: Extract the instrumentation map from a binary, and return it as
+ YAML.
+- ``account``: Performs basic function call accounting statistics with various
+ options for sorting, and output formats (supports CSV, YAML, and
+ console-friendly TEXT).
+- ``convert``: Converts an XRay log file from one format to another. We can
+ convert from binary XRay traces (both basic and FDR mode) to YAML,
+ `flame-graph <https://github.com/brendangregg/FlameGraph>`_ friendly text
+ formats, as well as `Chrome Trace Viewer (catapult)
+ <https://github.com/catapult-project/catapult>` formats.
+- ``graph``: Generates a DOT graph of the function call relationships between
+ functions found in an XRay trace.
+- ``stack``: Reconstructs function call stacks from a timeline of function
+ calls in an XRay trace.
+
+These subcommands use various library components found as part of the XRay
+libraries, distributed with the LLVM distribution. These are:
+
+- ``llvm/XRay/Trace.h`` : A trace reading library for conveniently loading
+ an XRay trace of supported forms, into a convenient in-memory representation.
+ All the analysis tools that deal with traces use this implementation.
+- ``llvm/XRay/Graph.h`` : A semi-generic graph type used by the graph
+ subcommand to conveniently represent a function call graph with statistics
+ associated with edges and vertices.
+- ``llvm/XRay/InstrumentationMap.h``: A convenient tool for analyzing the
+ instrumentation map in XRay-instrumented object files and binaries. The
+ ``extract`` and ``stack`` subcommands uses this particular library.
+
+Future Work
+===========
+
+There are a number of ongoing efforts for expanding the toolset building around
+the XRay instrumentation system.
+
+Trace Analysis Tools
+--------------------
+
+- Work is in progress to integrate with or develop tools to visualize findings
+ from an XRay trace. Particularly, the ``stack`` tool is being expanded to
+ output formats that allow graphing and exploring the duration of time in each
+ call stack.
+- With a large instrumented binary, the size of generated XRay traces can
+ quickly become unwieldy. We are working on integrating pruning techniques and
+ heuristics for the analysis tools to sift through the traces and surface only
+ relevant information.
+
+More Platforms
+--------------
+
+We're looking forward to contributions to port XRay to more architectures and
+operating systems.
+
+.. References...
+
+.. _`XRay whitepaper`: http://research.google.com/pubs/pub45287.html
+
Added: www-releases/trunk/8.0.1/docs/_sources/XRayExample.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/XRayExample.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/XRayExample.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/XRayExample.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,349 @@
+===================
+Debugging with XRay
+===================
+
+This document shows an example of how you would go about analyzing applications
+built with XRay instrumentation. Here we will attempt to debug ``llc``
+compiling some sample LLVM IR generated by Clang.
+
+.. contents::
+ :local:
+
+Building with XRay
+------------------
+
+To debug an application with XRay instrumentation, we need to build it with a
+Clang that supports the ``-fxray-instrument`` option. See `XRay <XRay.html>`_
+for more technical details of how XRay works for background information.
+
+In our example, we need to add ``-fxray-instrument`` to the list of flags
+passed to Clang when building a binary. Note that we need to link with Clang as
+well to get the XRay runtime linked in appropriately. For building ``llc`` with
+XRay, we do something similar below for our LLVM build:
+
+::
+
+ $ mkdir -p llvm-build && cd llvm-build
+ # Assume that the LLVM sources are at ../llvm
+ $ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \
+ -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument" -DCMAKE_CXX_FLAGS="-fxray-instrument" \
+ # Once this finishes, we should build llc
+ $ ninja llc
+
+
+To verify that we have an XRay instrumented binary, we can use ``objdump`` to
+look for the ``xray_instr_map`` section.
+
+::
+
+ $ objdump -h -j xray_instr_map ./bin/llc
+ ./bin/llc: file format elf64-x86-64
+
+ Sections:
+ Idx Name Size VMA LMA File off Algn
+ 14 xray_instr_map 00002fc0 00000000041516c6 00000000041516c6 03d516c6 2**0
+ CONTENTS, ALLOC, LOAD, READONLY, DATA
+
+Getting Traces
+--------------
+
+By default, XRay does not write out the trace files or patch the application
+before main starts. If we run ``llc`` it should work like a normally built
+binary. If we want to get a full trace of the application's operations (of the
+functions we do end up instrumenting with XRay) then we need to enable XRay
+at application start. To do this, XRay checks the ``XRAY_OPTIONS`` environment
+variable.
+
+::
+
+ # The following doesn't create an XRay trace by default.
+ $ ./bin/llc input.ll
+
+ # We need to set the XRAY_OPTIONS to enable some features.
+ $ XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic verbosity=1" ./bin/llc input.ll
+ ==69819==XRay: Log file in 'xray-log.llc.m35qPB'
+
+At this point we now have an XRay trace we can start analysing.
+
+The ``llvm-xray`` Tool
+----------------------
+
+Having a trace then allows us to do basic accounting of the functions that were
+instrumented, and how much time we're spending in parts of the code. To make
+sense of this data, we use the ``llvm-xray`` tool which has a few subcommands
+to help us understand our trace.
+
+One of the things we can do is to get an accounting of the functions that have
+been instrumented. We can see an example accounting with ``llvm-xray account``:
+
+::
+
+ $ llvm-xray account xray-log.llc.m35qPB -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc
+ Functions with latencies: 29
+ funcid count [ min, med, 90p, 99p, max] sum function
+ 187 360 [ 0.000000, 0.000001, 0.000014, 0.000032, 0.000075] 0.001596 LLLexer.cpp:446:0: llvm::LLLexer::LexIdentifier()
+ 85 130 [ 0.000000, 0.000000, 0.000018, 0.000023, 0.000156] 0.000799 X86ISelDAGToDAG.cpp:1984:0: (anonymous namespace)::X86DAGToDAGISel::Select(llvm::SDNode*)
+ 138 130 [ 0.000000, 0.000000, 0.000017, 0.000155, 0.000155] 0.000774 SelectionDAGISel.cpp:2963:0: llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int)
+ 188 103 [ 0.000000, 0.000000, 0.000003, 0.000123, 0.000214] 0.000737 LLParser.cpp:2692:0: llvm::LLParser::ParseValID(llvm::ValID&, llvm::LLParser::PerFunctionState*)
+ 88 1 [ 0.000562, 0.000562, 0.000562, 0.000562, 0.000562] 0.000562 X86ISelLowering.cpp:83:0: llvm::X86TargetLowering::X86TargetLowering(llvm::X86TargetMachine const&, llvm::X86Subtarget const&)
+ 125 102 [ 0.000001, 0.000003, 0.000010, 0.000017, 0.000049] 0.000471 Verifier.cpp:3714:0: (anonymous namespace)::Verifier::visitInstruction(llvm::Instruction&)
+ 90 8 [ 0.000023, 0.000035, 0.000106, 0.000106, 0.000106] 0.000342 X86ISelLowering.cpp:3363:0: llvm::X86TargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const
+ 124 32 [ 0.000003, 0.000007, 0.000016, 0.000041, 0.000041] 0.000310 Verifier.cpp:1967:0: (anonymous namespace)::Verifier::visitFunction(llvm::Function const&)
+ 123 1 [ 0.000302, 0.000302, 0.000302, 0.000302, 0.000302] 0.000302 LLVMContextImpl.cpp:54:0: llvm::LLVMContextImpl::~LLVMContextImpl()
+ 139 46 [ 0.000000, 0.000002, 0.000006, 0.000008, 0.000019] 0.000138 TargetLowering.cpp:506:0: llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt&, llvm::APInt&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const
+
+This shows us that for our input file, ``llc`` spent the most cumulative time
+in the lexer (a total of 1 millisecond). If we wanted for example to work with
+this data in a spreadsheet, we can output the results as CSV using the
+``-format=csv`` option to the command for further analysis.
+
+If we want to get a textual representation of the raw trace we can use the
+``llvm-xray convert`` tool to get YAML output. The first few lines of that
+output for an example trace would look like the following:
+
+::
+
+ $ llvm-xray convert -f yaml -symbolize -instr_map=./bin/llc xray-log.llc.m35qPB
+ ---
+ header:
+ version: 1
+ type: 0
+ constant-tsc: true
+ nonstop-tsc: true
+ cycle-frequency: 2601000000
+ records:
+ - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426023268520 }
+ - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426023523052 }
+ - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426029925386 }
+ - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426030031128 }
+ - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426046951388 }
+ - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047282020 }
+ - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426047857332 }
+ - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047984152 }
+ - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048036584 }
+ - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048042292 }
+ - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048055056 }
+ - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048067316 }
+
+Controlling Fidelity
+--------------------
+
+So far in our examples, we haven't been getting full coverage of the functions
+we have in the binary. To get that, we need to modify the compiler flags so
+that we can instrument more (if not all) the functions we have in the binary.
+We have two options for doing that, and we explore both of these below.
+
+Instruction Threshold
+`````````````````````
+
+The first "blunt" way of doing this is by setting the minimum threshold for
+function bodies to 1. We can do that with the
+``-fxray-instruction-threshold=N`` flag when building our binary. We rebuild
+``llc`` with this option and observe the results:
+
+::
+
+ $ rm CMakeCache.txt
+ $ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \
+ -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument -fxray-instruction-threshold=1" \
+ -DCMAKE_CXX_FLAGS="-fxray-instrument -fxray-instruction-threshold=1"
+ $ ninja llc
+ $ XRAY_OPTIONS="patch_premain=true" ./bin/llc input.ll
+ ==69819==XRay: Log file in 'xray-log.llc.5rqxkU'
+
+ $ llvm-xray account xray-log.llc.5rqxkU -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc
+ Functions with latencies: 36652
+ funcid count [ min, med, 90p, 99p, max] sum function
+ 75 1 [ 0.672368, 0.672368, 0.672368, 0.672368, 0.672368] 0.672368 llc.cpp:271:0: main
+ 78 1 [ 0.626455, 0.626455, 0.626455, 0.626455, 0.626455] 0.626455 llc.cpp:381:0: compileModule(char**, llvm::LLVMContext&)
+ 139617 1 [ 0.472618, 0.472618, 0.472618, 0.472618, 0.472618] 0.472618 LegacyPassManager.cpp:1723:0: llvm::legacy::PassManager::run(llvm::Module&)
+ 139610 1 [ 0.472618, 0.472618, 0.472618, 0.472618, 0.472618] 0.472618 LegacyPassManager.cpp:1681:0: llvm::legacy::PassManagerImpl::run(llvm::Module&)
+ 139612 1 [ 0.470948, 0.470948, 0.470948, 0.470948, 0.470948] 0.470948 LegacyPassManager.cpp:1564:0: (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&)
+ 139607 2 [ 0.147345, 0.315994, 0.315994, 0.315994, 0.315994] 0.463340 LegacyPassManager.cpp:1530:0: llvm::FPPassManager::runOnModule(llvm::Module&)
+ 139605 21 [ 0.000002, 0.000002, 0.102593, 0.213336, 0.213336] 0.463331 LegacyPassManager.cpp:1491:0: llvm::FPPassManager::runOnFunction(llvm::Function&)
+ 139563 26096 [ 0.000002, 0.000002, 0.000037, 0.000063, 0.000215] 0.225708 LegacyPassManager.cpp:1083:0: llvm::PMDataManager::findAnalysisPass(void const*, bool)
+ 108055 188 [ 0.000002, 0.000120, 0.001375, 0.004523, 0.062624] 0.159279 MachineFunctionPass.cpp:38:0: llvm::MachineFunctionPass::runOnFunction(llvm::Function&)
+ 62635 22 [ 0.000041, 0.000046, 0.000050, 0.126744, 0.126744] 0.127715 X86TargetMachine.cpp:242:0: llvm::X86TargetMachine::getSubtargetImpl(llvm::Function const&) const
+
+
+Instrumentation Attributes
+``````````````````````````
+
+The other way is to use configuration files for selecting which functions
+should always be instrumented by the compiler. This gives us a way of ensuring
+that certain functions are either always or never instrumented by not having to
+add the attribute to the source.
+
+To use this feature, you can define one file for the functions to always
+instrument, and another for functions to never instrument. The format of these
+files are exactly the same as the SanitizerLists files that control similar
+things for the sanitizer implementations. For example:
+
+::
+
+ # xray-attr-list.txt
+ # always instrument functions that match the following filters:
+ [always]
+ fun:main
+
+ # never instrument functions that match the following filters:
+ [never]
+ fun:__cxx_*
+
+Given the file above we can re-build by providing it to the
+``-fxray-attr-list=`` flag to clang. You can have multiple files, each defining
+different sets of attribute sets, to be combined into a single list by clang.
+
+The XRay stack tool
+-------------------
+
+Given a trace, and optionally an instrumentation map, the ``llvm-xray stack``
+command can be used to analyze a call stack graph constructed from the function
+call timeline.
+
+The way to use the command is to output the top stacks by call count and time spent.
+
+::
+
+ $ llvm-xray stack xray-log.llc.5rqxkU -instr_map ./bin/llc
+
+ Unique Stacks: 3069
+ Top 10 Stacks by leaf sum:
+
+ Sum: 9633790
+ lvl function count sum
+ #0 main 1 58421550
+ #1 compileModule(char**, llvm::LLVMContext&) 1 51440360
+ #2 llvm::legacy::PassManagerImpl::run(llvm::Module&) 1 40535375
+ #3 llvm::FPPassManager::runOnModule(llvm::Module&) 2 39337525
+ #4 llvm::FPPassManager::runOnFunction(llvm::Function&) 6 39331465
+ #5 llvm::PMDataManager::verifyPreservedAnalysis(llvm::Pass*) 399 16628590
+ #6 llvm::PMTopLevelManager::findAnalysisPass(void const*) 4584 15155600
+ #7 llvm::PMDataManager::findAnalysisPass(void const*, bool) 32088 9633790
+
+ ..etc..
+
+In the default mode, identical stacks on different threads are independently
+aggregated. In a multithreaded program, you may end up having identical call
+stacks fill your list of top calls.
+
+To address this, you may specify the ``-aggregate-threads`` or
+``-per-thread-stacks`` flags. ``-per-thread-stacks`` treats the thread id as an
+implicit root in each call stack tree, while ``-aggregate-threads`` combines
+identical stacks from all threads.
+
+Flame Graph Generation
+----------------------
+
+The ``llvm-xray stack`` tool may also be used to generate flamegraphs for
+visualizing your instrumented invocations. The tool does not generate the graphs
+themselves, but instead generates a format that can be used with Brendan Gregg's
+FlameGraph tool, currently available on `github
+<https://github.com/brendangregg/FlameGraph>`_.
+
+To generate output for a flamegraph, a few more options are necessary.
+
+- ``-all-stacks`` - Emits all of the stacks.
+- ``-stack-format`` - Choose the flamegraph output format 'flame'.
+- ``-aggregation-type`` - Choose the metric to graph.
+
+You may pipe the command output directly to the flamegraph tool to obtain an
+svg file.
+
+::
+
+ $llvm-xray stack xray-log.llc.5rqxkU -instr_map ./bin/llc -stack-format=flame -aggregation-type=time -all-stacks | \
+ /path/to/FlameGraph/flamegraph.pl > flamegraph.svg
+
+If you open the svg in a browser, mouse events allow exploring the call stacks.
+
+Chrome Trace Viewer Visualization
+---------------------------------
+
+We can also generate a trace which can be loaded by the Chrome Trace Viewer
+from the same generated trace:
+
+::
+
+ $ llvm-xray convert -symbolize -instr_map=./bin/llc \
+ -output_format=trace_event xray-log.llc.5rqxkU \
+ | gzip > llc-trace.txt.gz
+
+From a Chrome browser, navigating to ``chrome:///tracing`` allows us to load
+the ``sample-trace.txt.gz`` file to visualize the execution trace.
+
+Further Exploration
+-------------------
+
+The ``llvm-xray`` tool has a few other subcommands that are in various stages
+of being developed. One interesting subcommand that can highlight a few
+interesting things is the ``graph`` subcommand. Given for example the following
+toy program that we build with XRay instrumentation, we can see how the
+generated graph may be a helpful indicator of where time is being spent for the
+application.
+
+.. code-block:: c++
+
+ // sample.cc
+ #include <iostream>
+ #include <thread>
+
+ [[clang::xray_always_instrument]] void f() {
+ std::cerr << '.';
+ }
+
+ [[clang::xray_always_instrument]] void g() {
+ for (int i = 0; i < 1 << 10; ++i) {
+ std::cerr << '-';
+ }
+ }
+
+ int main(int argc, char* argv[]) {
+ std::thread t1([] {
+ for (int i = 0; i < 1 << 10; ++i)
+ f();
+ });
+ std::thread t2([] {
+ g();
+ });
+ t1.join();
+ t2.join();
+ std::cerr << '\n';
+ }
+
+We then build the above with XRay instrumentation:
+
+::
+
+ $ clang++ -o sample -O3 sample.cc -std=c++11 -fxray-instrument -fxray-instruction-threshold=1
+ $ XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic" ./sample
+
+We can then explore the graph rendering of the trace generated by this sample
+application. We assume you have the graphviz toosl available in your system,
+including both ``unflatten`` and ``dot``. If you prefer rendering or exploring
+the graph using another tool, then that should be feasible as well. ``llvm-xray
+graph`` will create DOT format graphs which should be usable in most graph
+rendering applications. One example invocation of the ``llvm-xray graph``
+command should yield some interesting insights to the workings of C++
+applications:
+
+::
+
+ $ llvm-xray graph xray-log.sample.* -m sample -color-edges=sum -edge-label=sum \
+ | unflatten -f -l10 | dot -Tsvg -o sample.svg
+
+
+Next Steps
+----------
+
+If you have some interesting analyses you'd like to implement as part of the
+llvm-xray tool, please feel free to propose them on the llvm-dev@ mailing list.
+The following are some ideas to inspire you in getting involved and potentially
+making things better.
+
+ - Implement a query/filtering library that allows for finding patterns in the
+ XRay traces.
+ - Collecting function call stacks and how often they're encountered in the
+ XRay trace.
+
+
Added: www-releases/trunk/8.0.1/docs/_sources/XRayFDRFormat.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/XRayFDRFormat.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/XRayFDRFormat.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/XRayFDRFormat.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,401 @@
+======================================
+XRay Flight Data Recorder Trace Format
+======================================
+
+:Version: 1 as of 2017-07-20
+
+.. contents::
+ :local:
+
+
+Introduction
+============
+
+When gathering XRay traces in Flight Data Recorder mode, each thread of an
+application will claim buffers to fill with trace data, which at some point
+is finalized and flushed.
+
+A goal of the profiler is to minimize overhead, the flushed data directly
+corresponds to the buffer.
+
+This document describes the format of a trace file.
+
+
+General
+=======
+
+Each trace file corresponds to a sequence of events in a particular thread.
+
+The file has a header followed by a sequence of discriminated record types.
+
+The endianness of byte fields matches the endianess of the platform which
+produced the trace file.
+
+
+Header Section
+==============
+
+A trace file begins with a 32 byte header.
+
++-------------------+-----------------+----------------------------------------+
+| Field | Size (bytes) | Description |
++===================+=================+========================================+
+| version | ``2`` | Anticipates versioned readers. This |
+| | | document describes the format when |
+| | | version == 1 |
++-------------------+-----------------+----------------------------------------+
+| type | ``2`` | An enumeration encoding the type of |
+| | | trace. Flight Data Recorder mode |
+| | | traces have type == 1 |
++-------------------+-----------------+----------------------------------------+
+| bitfield | ``4`` | Holds parameters that are not aligned |
+| | | to bytes. Further described below. |
++-------------------+-----------------+----------------------------------------+
+| cycle_frequency | ``8`` | The frequency in hertz of the CPU |
+| | | oscillator used to measure duration of |
+| | | events in ticks. |
++-------------------+-----------------+----------------------------------------+
+| buffer_size | ``8`` | The size in bytes of the data portion |
+| | | of the trace following the header. |
++-------------------+-----------------+----------------------------------------+
+| reserved | ``8`` | Reserved for future use. |
++-------------------+-----------------+----------------------------------------+
+
+The bitfield parameter of the file header is composed of the following fields.
+
++-------------------+----------------+-----------------------------------------+
+| Field | Size (bits) | Description |
++===================+================+=========================================+
+| constant_tsc | ``1`` | Whether the platform's timestamp |
+| | | counter used to record ticks between |
+| | | events ticks at a constant frequency |
+| | | despite CPU frequency changes. |
+| | | 0 == non-constant. 1 == constant. |
++-------------------+----------------+-----------------------------------------+
+| nonstop_tsc | ``1`` | Whether the tsc continues to count |
+| | | despite whether the CPU is in a low |
+| | | power state. 0 == stop. 1 == non-stop. |
++-------------------+----------------+-----------------------------------------+
+| reserved | ``30`` | Not meaningful. |
++-------------------+----------------+-----------------------------------------+
+
+
+Data Section
+============
+
+Following the header in a trace is a data section with size matching the
+buffer_size field in the header.
+
+The data section is a stream of elements of different types.
+
+There are a few categories of data in the sequence.
+
+- ``Function Records``: Function Records contain the timing of entry into and
+ exit from function execution. Function Records have 8 bytes each.
+
+- ``Metadata Records``: Metadata records serve many purposes. Mostly, they
+ capture information that may be too costly to record for each function, but
+ that is required to contextualize the fine-grained timings. They also are used
+ as markers for user-defined Event Data payloads. Metadata records have 16
+ bytes each.
+
+- ``Event Data``: Free form data may be associated with events that are traced
+ by the binary and encode data defined by a handler function. Event data is
+ always preceded with a marker record which indicates how large it is.
+
+- ``Function Arguments``: The arguments to some functions are included in the
+ trace. These are either pointer addresses or primitives that are read and
+ logged independently of their types in a high level language. To the tracer,
+ they are all numbers. Function Records that have attached arguments will
+ indicate their presence on the function entry record. We only support logging
+ contiguous function argument sequences starting with argument zero, which will
+ be the "this" pointer for member function invocations. For example, we don't
+ support logging the first and third argument.
+
+A reader of the memory format must maintain a state machine. The format makes no
+attempt to pad for alignment, and it is not seekable.
+
+
+Function Records
+----------------
+
+Function Records have an 8 byte layout. This layout encodes information to
+reconstruct a call stack of instrumented function and their durations.
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bits) | Description |
++===============+==============+===============================================+
+| discriminant | ``1`` | Indicates whether a reader should read a |
+| | | Function or Metadata record. Set to ``0`` for |
+| | | Function records. |
++---------------+--------------+-----------------------------------------------+
+| action | ``3`` | Specifies whether the function is being |
+| | | entered, exited, or is a non-standard entry |
+| | | or exit produced by optimizations. |
++---------------+--------------+-----------------------------------------------+
+| function_id | ``28`` | A numeric ID for the function. Resolved to a |
+| | | name via the xray instrumentation map. The |
+| | | instrumentation map is built by xray at |
+| | | compile time into an object file and pairs |
+| | | the function ids to addresses. It is used for |
+| | | patching and as a lookup into the binary's |
+| | | symbols to obtain names. |
++---------------+--------------+-----------------------------------------------+
+| tsc_delta | ``32`` | The number of ticks of the timestamp counter |
+| | | since a previous record recorded a delta or |
+| | | other TSC resetting event. |
++---------------+--------------+-----------------------------------------------+
+
+On little-endian machines, the bitfields are ordered from least significant bit
+bit to most significant bit. A reader can read an 8 bit value and apply the mask
+``0x01`` for the discriminant. Similarly, they can read 32 bits and unsigned
+shift right by ``0x04`` to obtain the function_id field.
+
+On big-endian machine, the bitfields are written in order from most significant
+bit to least significant bit. A reader would read an 8 bit value and unsigned
+shift right by 7 bits for the discriminant. The function_id field could be
+obtained by reading a 32 bit value and applying the mask ``0x0FFFFFFF``.
+
+Function action types are as follows.
+
++---------------+--------------+-----------------------------------------------+
+| Type | Number | Description |
++===============+==============+===============================================+
+| Entry | ``0`` | Typical function entry. |
++---------------+--------------+-----------------------------------------------+
+| Exit | ``1`` | Typical function exit. |
++---------------+--------------+-----------------------------------------------+
+| Tail_Exit | ``2`` | An exit from a function due to tail call |
+| | | optimization. |
++---------------+--------------+-----------------------------------------------+
+| Entry_Args | ``3`` | A function entry that records arguments. |
++---------------+--------------+-----------------------------------------------+
+
+Entry_Args records do not contain the arguments themselves. Instead, metadata
+records for each of the logged args follow the function record in the stream.
+
+
+Metadata Records
+----------------
+
+Interspersed throughout the buffer are 16 byte Metadata records. For typically
+instrumented binaries, they will be sparser than Function records, and they
+provide a fuller picture of the binary execution state.
+
+Metadata record layout is partially record dependent, but they share a common
+structure.
+
+The same bit field rules described for function records apply to the first byte
+of MetadataRecords. Within this byte, little endian machines use lsb to msb
+ordering and big endian machines use msb to lsb ordering.
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size | Description |
++===============+==============+===============================================+
+| discriminant | ``1 bit`` | Indicates whether a reader should read a |
+| | | Function or Metadata record. Set to ``1`` for |
+| | | Metadata records. |
++---------------+--------------+-----------------------------------------------+
+| record_kind | ``7 bits`` | The type of Metadata record. |
++---------------+--------------+-----------------------------------------------+
+| data | ``15 bytes`` | A data field used differently for each record |
+| | | type. |
++---------------+--------------+-----------------------------------------------+
+
+Here is a table of the enumerated record kinds.
+
++--------+---------------------------+
+| Number | Type |
++========+===========================+
+| 0 | NewBuffer |
++--------+---------------------------+
+| 1 | EndOfBuffer |
++--------+---------------------------+
+| 2 | NewCPUId |
++--------+---------------------------+
+| 3 | TSCWrap |
++--------+---------------------------+
+| 4 | WallTimeMarker |
++--------+---------------------------+
+| 5 | CustomEventMarker |
++--------+---------------------------+
+| 6 | CallArgument |
++--------+---------------------------+
+
+
+NewBuffer Records
+-----------------
+
+Each buffer begins with a NewBuffer record immediately after the header.
+It records the thread ID of the thread that the trace belongs to.
+
+Its data segment is as follows.
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bytes) | Description |
++===============+==============+===============================================+
+| thread_Id | ``2`` | Thread ID for buffer. |
++---------------+--------------+-----------------------------------------------+
+| reserved | ``13`` | Unused. |
++---------------+--------------+-----------------------------------------------+
+
+
+WallClockTime Records
+---------------------
+
+Following the NewBuffer record, each buffer records an absolute time as a frame
+of reference for the durations recorded by timestamp counter deltas.
+
+Its data segment is as follows.
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bytes) | Description |
++===============+==============+===============================================+
+| seconds | ``8`` | Seconds on absolute timescale. The starting |
+| | | point is unspecified and depends on the |
+| | | implementation and platform configured by the |
+| | | tracer. |
++---------------+--------------+-----------------------------------------------+
+| microseconds | ``4`` | The microsecond component of the time. |
++---------------+--------------+-----------------------------------------------+
+| reserved | ``3`` | Unused. |
++---------------+--------------+-----------------------------------------------+
+
+
+NewCpuId Records
+----------------
+
+Each function entry invokes a routine to determine what CPU is executing.
+Typically, this is done with readtscp, which reads the timestamp counter at the
+same time.
+
+If the tracing detects that the execution has switched CPUs or if this is the
+first instrumented entry point, the tracer will output a NewCpuId record.
+
+Its data segment is as follows.
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bytes) | Description |
++===============+==============+===============================================+
+| cpu_id | ``2`` | CPU Id. |
++---------------+--------------+-----------------------------------------------+
+| absolute_tsc | ``8`` | The absolute value of the timestamp counter. |
++---------------+--------------+-----------------------------------------------+
+| reserved | ``5`` | Unused. |
++---------------+--------------+-----------------------------------------------+
+
+
+TSCWrap Records
+---------------
+
+Since each function record uses a 32 bit value to represent the number of ticks
+of the timestamp counter since the last reference, it is possible for this value
+to overflow, particularly for sparsely instrumented binaries.
+
+When this delta would not fit into a 32 bit representation, a reference absolute
+timestamp counter record is written in the form of a TSCWrap record.
+
+Its data segment is as follows.
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bytes) | Description |
++===============+==============+===============================================+
+| absolute_tsc | ``8`` | Timestamp counter value. |
++---------------+--------------+-----------------------------------------------+
+| reserved | ``7`` | Unused. |
++---------------+--------------+-----------------------------------------------+
+
+
+CallArgument Records
+--------------------
+
+Immediately following an Entry_Args type function record, there may be one or
+more CallArgument records that contain the traced function's parameter values.
+
+The order of the CallArgument Record sequency corresponds one to one with the
+order of the function parameters.
+
+CallArgument data segment:
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bytes) | Description |
++===============+==============+===============================================+
+| argument | ``8`` | Numeric argument (may be pointer address). |
++---------------+--------------+-----------------------------------------------+
+| reserved | ``7`` | Unused. |
++---------------+--------------+-----------------------------------------------+
+
+
+CustomEventMarker Records
+-------------------------
+
+XRay provides the feature of logging custom events. This may be leveraged to
+record tracing info for RPCs or similarly trace data that is application
+specific.
+
+Custom Events themselves are an unstructured (application defined) segment of
+memory with arbitrary size within the buffer. They are preceded by
+CustomEventMarkers to indicate their presence and size.
+
+CustomEventMarker data segment:
+
++---------------+--------------+-----------------------------------------------+
+| Field | Size (bytes) | Description |
++===============+==============+===============================================+
+| event_size | ``4`` | Size of preceded event. |
++---------------+--------------+-----------------------------------------------+
+| absolute_tsc | ``8`` | A timestamp counter of the event. |
++---------------+--------------+-----------------------------------------------+
+| reserved | ``3`` | Unused. |
++---------------+--------------+-----------------------------------------------+
+
+
+EndOfBuffer Records
+-------------------
+
+An EndOfBuffer record type indicates that there is no more trace data in this
+buffer. The reader is expected to seek past the remaining buffer_size expressed
+before the start of buffer and look for either another header or EOF.
+
+
+Format Grammar and Invariants
+=============================
+
+Not all sequences of Metadata records and Function records are valid data. A
+sequence should be parsed as a state machine. The expectations for a valid
+format can be expressed as a context free grammar.
+
+This is an attempt to explain the format with statements in EBNF format.
+
+- Format := Header ThreadBuffer* EOF
+
+- ThreadBuffer := NewBuffer WallClockTime NewCPUId BodySequence* End
+
+- BodySequence := NewCPUId | TSCWrap | Function | CustomEvent
+
+- Function := (Function_Entry_Args CallArgument*) | Function_Other_Type
+
+- CustomEvent := CustomEventMarker CustomEventUnstructuredMemory
+
+- End := EndOfBuffer RemainingBufferSizeToSkip
+
+
+Function Record Order
+---------------------
+
+There are a few clarifications that may help understand what is expected of
+Function records.
+
+- Functions with an Exit are expected to have a corresponding Entry or
+ Entry_Args function record precede them in the trace.
+
+- Tail_Exit Function records record the Function ID of the function whose return
+ address the program counter will take. In other words, the final function that
+ would be popped off of the call stack if tail call optimization was not used.
+
+- Not all functions marked for instrumentation are necessarily in the trace. The
+ tracer uses heuristics to preserve the trace for non-trivial functions.
+
+- Not every entry must have a traced Exit or Tail Exit. The buffer may run out
+ of space or the program may request for the tracer to finalize toreturn the
+ buffer before an instrumented function exits.
Added: www-releases/trunk/8.0.1/docs/_sources/YamlIO.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/YamlIO.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/YamlIO.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/YamlIO.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,1035 @@
+=====================
+YAML I/O
+=====================
+
+.. contents::
+ :local:
+
+Introduction to YAML
+====================
+
+YAML is a human readable data serialization language. The full YAML language
+spec can be read at `yaml.org
+<http://www.yaml.org/spec/1.2/spec.html#Introduction>`_. The simplest form of
+yaml is just "scalars", "mappings", and "sequences". A scalar is any number
+or string. The pound/hash symbol (#) begins a comment line. A mapping is
+a set of key-value pairs where the key ends with a colon. For example:
+
+.. code-block:: yaml
+
+ # a mapping
+ name: Tom
+ hat-size: 7
+
+A sequence is a list of items where each item starts with a leading dash ('-').
+For example:
+
+.. code-block:: yaml
+
+ # a sequence
+ - x86
+ - x86_64
+ - PowerPC
+
+You can combine mappings and sequences by indenting. For example a sequence
+of mappings in which one of the mapping values is itself a sequence:
+
+.. code-block:: yaml
+
+ # a sequence of mappings with one key's value being a sequence
+ - name: Tom
+ cpus:
+ - x86
+ - x86_64
+ - name: Bob
+ cpus:
+ - x86
+ - name: Dan
+ cpus:
+ - PowerPC
+ - x86
+
+Sometime sequences are known to be short and the one entry per line is too
+verbose, so YAML offers an alternate syntax for sequences called a "Flow
+Sequence" in which you put comma separated sequence elements into square
+brackets. The above example could then be simplified to :
+
+
+.. code-block:: yaml
+
+ # a sequence of mappings with one key's value being a flow sequence
+ - name: Tom
+ cpus: [ x86, x86_64 ]
+ - name: Bob
+ cpus: [ x86 ]
+ - name: Dan
+ cpus: [ PowerPC, x86 ]
+
+
+Introduction to YAML I/O
+========================
+
+The use of indenting makes the YAML easy for a human to read and understand,
+but having a program read and write YAML involves a lot of tedious details.
+The YAML I/O library structures and simplifies reading and writing YAML
+documents.
+
+YAML I/O assumes you have some "native" data structures which you want to be
+able to dump as YAML and recreate from YAML. The first step is to try
+writing example YAML for your data structures. You may find after looking at
+possible YAML representations that a direct mapping of your data structures
+to YAML is not very readable. Often the fields are not in the order that
+a human would find readable. Or the same information is replicated in multiple
+locations, making it hard for a human to write such YAML correctly.
+
+In relational database theory there is a design step called normalization in
+which you reorganize fields and tables. The same considerations need to
+go into the design of your YAML encoding. But, you may not want to change
+your existing native data structures. Therefore, when writing out YAML
+there may be a normalization step, and when reading YAML there would be a
+corresponding denormalization step.
+
+YAML I/O uses a non-invasive, traits based design. YAML I/O defines some
+abstract base templates. You specialize those templates on your data types.
+For instance, if you have an enumerated type FooBar you could specialize
+ScalarEnumerationTraits on that type and define the enumeration() method:
+
+.. code-block:: c++
+
+ using llvm::yaml::ScalarEnumerationTraits;
+ using llvm::yaml::IO;
+
+ template <>
+ struct ScalarEnumerationTraits<FooBar> {
+ static void enumeration(IO &io, FooBar &value) {
+ ...
+ }
+ };
+
+
+As with all YAML I/O template specializations, the ScalarEnumerationTraits is used for
+both reading and writing YAML. That is, the mapping between in-memory enum
+values and the YAML string representation is only in one place.
+This assures that the code for writing and parsing of YAML stays in sync.
+
+To specify a YAML mappings, you define a specialization on
+llvm::yaml::MappingTraits.
+If your native data structure happens to be a struct that is already normalized,
+then the specialization is simple. For example:
+
+.. code-block:: c++
+
+ using llvm::yaml::MappingTraits;
+ using llvm::yaml::IO;
+
+ template <>
+ struct MappingTraits<Person> {
+ static void mapping(IO &io, Person &info) {
+ io.mapRequired("name", info.name);
+ io.mapOptional("hat-size", info.hatSize);
+ }
+ };
+
+
+A YAML sequence is automatically inferred if you data type has begin()/end()
+iterators and a push_back() method. Therefore any of the STL containers
+(such as std::vector<>) will automatically translate to YAML sequences.
+
+Once you have defined specializations for your data types, you can
+programmatically use YAML I/O to write a YAML document:
+
+.. code-block:: c++
+
+ using llvm::yaml::Output;
+
+ Person tom;
+ tom.name = "Tom";
+ tom.hatSize = 8;
+ Person dan;
+ dan.name = "Dan";
+ dan.hatSize = 7;
+ std::vector<Person> persons;
+ persons.push_back(tom);
+ persons.push_back(dan);
+
+ Output yout(llvm::outs());
+ yout << persons;
+
+This would write the following:
+
+.. code-block:: yaml
+
+ - name: Tom
+ hat-size: 8
+ - name: Dan
+ hat-size: 7
+
+And you can also read such YAML documents with the following code:
+
+.. code-block:: c++
+
+ using llvm::yaml::Input;
+
+ typedef std::vector<Person> PersonList;
+ std::vector<PersonList> docs;
+
+ Input yin(document.getBuffer());
+ yin >> docs;
+
+ if ( yin.error() )
+ return;
+
+ // Process read document
+ for ( PersonList &pl : docs ) {
+ for ( Person &person : pl ) {
+ cout << "name=" << person.name;
+ }
+ }
+
+One other feature of YAML is the ability to define multiple documents in a
+single file. That is why reading YAML produces a vector of your document type.
+
+
+
+Error Handling
+==============
+
+When parsing a YAML document, if the input does not match your schema (as
+expressed in your XxxTraits<> specializations). YAML I/O
+will print out an error message and your Input object's error() method will
+return true. For instance the following document:
+
+.. code-block:: yaml
+
+ - name: Tom
+ shoe-size: 12
+ - name: Dan
+ hat-size: 7
+
+Has a key (shoe-size) that is not defined in the schema. YAML I/O will
+automatically generate this error:
+
+.. code-block:: yaml
+
+ YAML:2:2: error: unknown key 'shoe-size'
+ shoe-size: 12
+ ^~~~~~~~~
+
+Similar errors are produced for other input not conforming to the schema.
+
+
+Scalars
+=======
+
+YAML scalars are just strings (i.e. not a sequence or mapping). The YAML I/O
+library provides support for translating between YAML scalars and specific
+C++ types.
+
+
+Built-in types
+--------------
+The following types have built-in support in YAML I/O:
+
+* bool
+* float
+* double
+* StringRef
+* std::string
+* int64_t
+* int32_t
+* int16_t
+* int8_t
+* uint64_t
+* uint32_t
+* uint16_t
+* uint8_t
+
+That is, you can use those types in fields of MappingTraits or as element type
+in sequence. When reading, YAML I/O will validate that the string found
+is convertible to that type and error out if not.
+
+
+Unique types
+------------
+Given that YAML I/O is trait based, the selection of how to convert your data
+to YAML is based on the type of your data. But in C++ type matching, typedefs
+do not generate unique type names. That means if you have two typedefs of
+unsigned int, to YAML I/O both types look exactly like unsigned int. To
+facilitate make unique type names, YAML I/O provides a macro which is used
+like a typedef on built-in types, but expands to create a class with conversion
+operators to and from the base type. For example:
+
+.. code-block:: c++
+
+ LLVM_YAML_STRONG_TYPEDEF(uint32_t, MyFooFlags)
+ LLVM_YAML_STRONG_TYPEDEF(uint32_t, MyBarFlags)
+
+This generates two classes MyFooFlags and MyBarFlags which you can use in your
+native data structures instead of uint32_t. They are implicitly
+converted to and from uint32_t. The point of creating these unique types
+is that you can now specify traits on them to get different YAML conversions.
+
+Hex types
+---------
+An example use of a unique type is that YAML I/O provides fixed sized unsigned
+integers that are written with YAML I/O as hexadecimal instead of the decimal
+format used by the built-in integer types:
+
+* Hex64
+* Hex32
+* Hex16
+* Hex8
+
+You can use llvm::yaml::Hex32 instead of uint32_t and the only different will
+be that when YAML I/O writes out that type it will be formatted in hexadecimal.
+
+
+ScalarEnumerationTraits
+-----------------------
+YAML I/O supports translating between in-memory enumerations and a set of string
+values in YAML documents. This is done by specializing ScalarEnumerationTraits<>
+on your enumeration type and define a enumeration() method.
+For instance, suppose you had an enumeration of CPUs and a struct with it as
+a field:
+
+.. code-block:: c++
+
+ enum CPUs {
+ cpu_x86_64 = 5,
+ cpu_x86 = 7,
+ cpu_PowerPC = 8
+ };
+
+ struct Info {
+ CPUs cpu;
+ uint32_t flags;
+ };
+
+To support reading and writing of this enumeration, you can define a
+ScalarEnumerationTraits specialization on CPUs, which can then be used
+as a field type:
+
+.. code-block:: c++
+
+ using llvm::yaml::ScalarEnumerationTraits;
+ using llvm::yaml::MappingTraits;
+ using llvm::yaml::IO;
+
+ template <>
+ struct ScalarEnumerationTraits<CPUs> {
+ static void enumeration(IO &io, CPUs &value) {
+ io.enumCase(value, "x86_64", cpu_x86_64);
+ io.enumCase(value, "x86", cpu_x86);
+ io.enumCase(value, "PowerPC", cpu_PowerPC);
+ }
+ };
+
+ template <>
+ struct MappingTraits<Info> {
+ static void mapping(IO &io, Info &info) {
+ io.mapRequired("cpu", info.cpu);
+ io.mapOptional("flags", info.flags, 0);
+ }
+ };
+
+When reading YAML, if the string found does not match any of the strings
+specified by enumCase() methods, an error is automatically generated.
+When writing YAML, if the value being written does not match any of the values
+specified by the enumCase() methods, a runtime assertion is triggered.
+
+
+BitValue
+--------
+Another common data structure in C++ is a field where each bit has a unique
+meaning. This is often used in a "flags" field. YAML I/O has support for
+converting such fields to a flow sequence. For instance suppose you
+had the following bit flags defined:
+
+.. code-block:: c++
+
+ enum {
+ flagsPointy = 1
+ flagsHollow = 2
+ flagsFlat = 4
+ flagsRound = 8
+ };
+
+ LLVM_YAML_STRONG_TYPEDEF(uint32_t, MyFlags)
+
+To support reading and writing of MyFlags, you specialize ScalarBitSetTraits<>
+on MyFlags and provide the bit values and their names.
+
+.. code-block:: c++
+
+ using llvm::yaml::ScalarBitSetTraits;
+ using llvm::yaml::MappingTraits;
+ using llvm::yaml::IO;
+
+ template <>
+ struct ScalarBitSetTraits<MyFlags> {
+ static void bitset(IO &io, MyFlags &value) {
+ io.bitSetCase(value, "hollow", flagHollow);
+ io.bitSetCase(value, "flat", flagFlat);
+ io.bitSetCase(value, "round", flagRound);
+ io.bitSetCase(value, "pointy", flagPointy);
+ }
+ };
+
+ struct Info {
+ StringRef name;
+ MyFlags flags;
+ };
+
+ template <>
+ struct MappingTraits<Info> {
+ static void mapping(IO &io, Info& info) {
+ io.mapRequired("name", info.name);
+ io.mapRequired("flags", info.flags);
+ }
+ };
+
+With the above, YAML I/O (when writing) will test mask each value in the
+bitset trait against the flags field, and each that matches will
+cause the corresponding string to be added to the flow sequence. The opposite
+is done when reading and any unknown string values will result in a error. With
+the above schema, a same valid YAML document is:
+
+.. code-block:: yaml
+
+ name: Tom
+ flags: [ pointy, flat ]
+
+Sometimes a "flags" field might contains an enumeration part
+defined by a bit-mask.
+
+.. code-block:: c++
+
+ enum {
+ flagsFeatureA = 1,
+ flagsFeatureB = 2,
+ flagsFeatureC = 4,
+
+ flagsCPUMask = 24,
+
+ flagsCPU1 = 8,
+ flagsCPU2 = 16
+ };
+
+To support reading and writing such fields, you need to use the maskedBitSet()
+method and provide the bit values, their names and the enumeration mask.
+
+.. code-block:: c++
+
+ template <>
+ struct ScalarBitSetTraits<MyFlags> {
+ static void bitset(IO &io, MyFlags &value) {
+ io.bitSetCase(value, "featureA", flagsFeatureA);
+ io.bitSetCase(value, "featureB", flagsFeatureB);
+ io.bitSetCase(value, "featureC", flagsFeatureC);
+ io.maskedBitSetCase(value, "CPU1", flagsCPU1, flagsCPUMask);
+ io.maskedBitSetCase(value, "CPU2", flagsCPU2, flagsCPUMask);
+ }
+ };
+
+YAML I/O (when writing) will apply the enumeration mask to the flags field,
+and compare the result and values from the bitset. As in case of a regular
+bitset, each that matches will cause the corresponding string to be added
+to the flow sequence.
+
+Custom Scalar
+-------------
+Sometimes for readability a scalar needs to be formatted in a custom way. For
+instance your internal data structure may use a integer for time (seconds since
+some epoch), but in YAML it would be much nicer to express that integer in
+some time format (e.g. 4-May-2012 10:30pm). YAML I/O has a way to support
+custom formatting and parsing of scalar types by specializing ScalarTraits<> on
+your data type. When writing, YAML I/O will provide the native type and
+your specialization must create a temporary llvm::StringRef. When reading,
+YAML I/O will provide an llvm::StringRef of scalar and your specialization
+must convert that to your native data type. An outline of a custom scalar type
+looks like:
+
+.. code-block:: c++
+
+ using llvm::yaml::ScalarTraits;
+ using llvm::yaml::IO;
+
+ template <>
+ struct ScalarTraits<MyCustomType> {
+ static void output(const MyCustomType &value, void*,
+ llvm::raw_ostream &out) {
+ out << value; // do custom formatting here
+ }
+ static StringRef input(StringRef scalar, void*, MyCustomType &value) {
+ // do custom parsing here. Return the empty string on success,
+ // or an error message on failure.
+ return StringRef();
+ }
+ // Determine if this scalar needs quotes.
+ static QuotingType mustQuote(StringRef) { return QuotingType::Single; }
+ };
+
+Block Scalars
+-------------
+
+YAML block scalars are string literals that are represented in YAML using the
+literal block notation, just like the example shown below:
+
+.. code-block:: yaml
+
+ text: |
+ First line
+ Second line
+
+The YAML I/O library provides support for translating between YAML block scalars
+and specific C++ types by allowing you to specialize BlockScalarTraits<> on
+your data type. The library doesn't provide any built-in support for block
+scalar I/O for types like std::string and llvm::StringRef as they are already
+supported by YAML I/O and use the ordinary scalar notation by default.
+
+BlockScalarTraits specializations are very similar to the
+ScalarTraits specialization - YAML I/O will provide the native type and your
+specialization must create a temporary llvm::StringRef when writing, and
+it will also provide an llvm::StringRef that has the value of that block scalar
+and your specialization must convert that to your native data type when reading.
+An example of a custom type with an appropriate specialization of
+BlockScalarTraits is shown below:
+
+.. code-block:: c++
+
+ using llvm::yaml::BlockScalarTraits;
+ using llvm::yaml::IO;
+
+ struct MyStringType {
+ std::string Str;
+ };
+
+ template <>
+ struct BlockScalarTraits<MyStringType> {
+ static void output(const MyStringType &Value, void *Ctxt,
+ llvm::raw_ostream &OS) {
+ OS << Value.Str;
+ }
+
+ static StringRef input(StringRef Scalar, void *Ctxt,
+ MyStringType &Value) {
+ Value.Str = Scalar.str();
+ return StringRef();
+ }
+ };
+
+
+
+Mappings
+========
+
+To be translated to or from a YAML mapping for your type T you must specialize
+llvm::yaml::MappingTraits on T and implement the "void mapping(IO &io, T&)"
+method. If your native data structures use pointers to a class everywhere,
+you can specialize on the class pointer. Examples:
+
+.. code-block:: c++
+
+ using llvm::yaml::MappingTraits;
+ using llvm::yaml::IO;
+
+ // Example of struct Foo which is used by value
+ template <>
+ struct MappingTraits<Foo> {
+ static void mapping(IO &io, Foo &foo) {
+ io.mapOptional("size", foo.size);
+ ...
+ }
+ };
+
+ // Example of struct Bar which is natively always a pointer
+ template <>
+ struct MappingTraits<Bar*> {
+ static void mapping(IO &io, Bar *&bar) {
+ io.mapOptional("size", bar->size);
+ ...
+ }
+ };
+
+
+No Normalization
+----------------
+
+The mapping() method is responsible, if needed, for normalizing and
+denormalizing. In a simple case where the native data structure requires no
+normalization, the mapping method just uses mapOptional() or mapRequired() to
+bind the struct's fields to YAML key names. For example:
+
+.. code-block:: c++
+
+ using llvm::yaml::MappingTraits;
+ using llvm::yaml::IO;
+
+ template <>
+ struct MappingTraits<Person> {
+ static void mapping(IO &io, Person &info) {
+ io.mapRequired("name", info.name);
+ io.mapOptional("hat-size", info.hatSize);
+ }
+ };
+
+
+Normalization
+----------------
+
+When [de]normalization is required, the mapping() method needs a way to access
+normalized values as fields. To help with this, there is
+a template MappingNormalization<> which you can then use to automatically
+do the normalization and denormalization. The template is used to create
+a local variable in your mapping() method which contains the normalized keys.
+
+Suppose you have native data type
+Polar which specifies a position in polar coordinates (distance, angle):
+
+.. code-block:: c++
+
+ struct Polar {
+ float distance;
+ float angle;
+ };
+
+but you've decided the normalized YAML for should be in x,y coordinates. That
+is, you want the yaml to look like:
+
+.. code-block:: yaml
+
+ x: 10.3
+ y: -4.7
+
+You can support this by defining a MappingTraits that normalizes the polar
+coordinates to x,y coordinates when writing YAML and denormalizes x,y
+coordinates into polar when reading YAML.
+
+.. code-block:: c++
+
+ using llvm::yaml::MappingTraits;
+ using llvm::yaml::IO;
+
+ template <>
+ struct MappingTraits<Polar> {
+
+ class NormalizedPolar {
+ public:
+ NormalizedPolar(IO &io)
+ : x(0.0), y(0.0) {
+ }
+ NormalizedPolar(IO &, Polar &polar)
+ : x(polar.distance * cos(polar.angle)),
+ y(polar.distance * sin(polar.angle)) {
+ }
+ Polar denormalize(IO &) {
+ return Polar(sqrt(x*x+y*y), arctan(x,y));
+ }
+
+ float x;
+ float y;
+ };
+
+ static void mapping(IO &io, Polar &polar) {
+ MappingNormalization<NormalizedPolar, Polar> keys(io, polar);
+
+ io.mapRequired("x", keys->x);
+ io.mapRequired("y", keys->y);
+ }
+ };
+
+When writing YAML, the local variable "keys" will be a stack allocated
+instance of NormalizedPolar, constructed from the supplied polar object which
+initializes it x and y fields. The mapRequired() methods then write out the x
+and y values as key/value pairs.
+
+When reading YAML, the local variable "keys" will be a stack allocated instance
+of NormalizedPolar, constructed by the empty constructor. The mapRequired
+methods will find the matching key in the YAML document and fill in the x and y
+fields of the NormalizedPolar object keys. At the end of the mapping() method
+when the local keys variable goes out of scope, the denormalize() method will
+automatically be called to convert the read values back to polar coordinates,
+and then assigned back to the second parameter to mapping().
+
+In some cases, the normalized class may be a subclass of the native type and
+could be returned by the denormalize() method, except that the temporary
+normalized instance is stack allocated. In these cases, the utility template
+MappingNormalizationHeap<> can be used instead. It just like
+MappingNormalization<> except that it heap allocates the normalized object
+when reading YAML. It never destroys the normalized object. The denormalize()
+method can this return "this".
+
+
+Default values
+--------------
+Within a mapping() method, calls to io.mapRequired() mean that that key is
+required to exist when parsing YAML documents, otherwise YAML I/O will issue an
+error.
+
+On the other hand, keys registered with io.mapOptional() are allowed to not
+exist in the YAML document being read. So what value is put in the field
+for those optional keys?
+There are two steps to how those optional fields are filled in. First, the
+second parameter to the mapping() method is a reference to a native class. That
+native class must have a default constructor. Whatever value the default
+constructor initially sets for an optional field will be that field's value.
+Second, the mapOptional() method has an optional third parameter. If provided
+it is the value that mapOptional() should set that field to if the YAML document
+does not have that key.
+
+There is one important difference between those two ways (default constructor
+and third parameter to mapOptional). When YAML I/O generates a YAML document,
+if the mapOptional() third parameter is used, if the actual value being written
+is the same as (using ==) the default value, then that key/value is not written.
+
+
+Order of Keys
+--------------
+
+When writing out a YAML document, the keys are written in the order that the
+calls to mapRequired()/mapOptional() are made in the mapping() method. This
+gives you a chance to write the fields in an order that a human reader of
+the YAML document would find natural. This may be different that the order
+of the fields in the native class.
+
+When reading in a YAML document, the keys in the document can be in any order,
+but they are processed in the order that the calls to mapRequired()/mapOptional()
+are made in the mapping() method. That enables some interesting
+functionality. For instance, if the first field bound is the cpu and the second
+field bound is flags, and the flags are cpu specific, you can programmatically
+switch how the flags are converted to and from YAML based on the cpu.
+This works for both reading and writing. For example:
+
+.. code-block:: c++
+
+ using llvm::yaml::MappingTraits;
+ using llvm::yaml::IO;
+
+ struct Info {
+ CPUs cpu;
+ uint32_t flags;
+ };
+
+ template <>
+ struct MappingTraits<Info> {
+ static void mapping(IO &io, Info &info) {
+ io.mapRequired("cpu", info.cpu);
+ // flags must come after cpu for this to work when reading yaml
+ if ( info.cpu == cpu_x86_64 )
+ io.mapRequired("flags", *(My86_64Flags*)info.flags);
+ else
+ io.mapRequired("flags", *(My86Flags*)info.flags);
+ }
+ };
+
+
+Tags
+----
+
+The YAML syntax supports tags as a way to specify the type of a node before
+it is parsed. This allows dynamic types of nodes. But the YAML I/O model uses
+static typing, so there are limits to how you can use tags with the YAML I/O
+model. Recently, we added support to YAML I/O for checking/setting the optional
+tag on a map. Using this functionality it is even possbile to support different
+mappings, as long as they are convertible.
+
+To check a tag, inside your mapping() method you can use io.mapTag() to specify
+what the tag should be. This will also add that tag when writing yaml.
+
+Validation
+----------
+
+Sometimes in a yaml map, each key/value pair is valid, but the combination is
+not. This is similar to something having no syntax errors, but still having
+semantic errors. To support semantic level checking, YAML I/O allows
+an optional ``validate()`` method in a MappingTraits template specialization.
+
+When parsing yaml, the ``validate()`` method is call *after* all key/values in
+the map have been processed. Any error message returned by the ``validate()``
+method during input will be printed just a like a syntax error would be printed.
+When writing yaml, the ``validate()`` method is called *before* the yaml
+key/values are written. Any error during output will trigger an ``assert()``
+because it is a programming error to have invalid struct values.
+
+
+.. code-block:: c++
+
+ using llvm::yaml::MappingTraits;
+ using llvm::yaml::IO;
+
+ struct Stuff {
+ ...
+ };
+
+ template <>
+ struct MappingTraits<Stuff> {
+ static void mapping(IO &io, Stuff &stuff) {
+ ...
+ }
+ static StringRef validate(IO &io, Stuff &stuff) {
+ // Look at all fields in 'stuff' and if there
+ // are any bad values return a string describing
+ // the error. Otherwise return an empty string.
+ return StringRef();
+ }
+ };
+
+Flow Mapping
+------------
+A YAML "flow mapping" is a mapping that uses the inline notation
+(e.g { x: 1, y: 0 } ) when written to YAML. To specify that a type should be
+written in YAML using flow mapping, your MappingTraits specialization should
+add "static const bool flow = true;". For instance:
+
+.. code-block:: c++
+
+ using llvm::yaml::MappingTraits;
+ using llvm::yaml::IO;
+
+ struct Stuff {
+ ...
+ };
+
+ template <>
+ struct MappingTraits<Stuff> {
+ static void mapping(IO &io, Stuff &stuff) {
+ ...
+ }
+
+ static const bool flow = true;
+ }
+
+Flow mappings are subject to line wrapping according to the Output object
+configuration.
+
+Sequence
+========
+
+To be translated to or from a YAML sequence for your type T you must specialize
+llvm::yaml::SequenceTraits on T and implement two methods:
+``size_t size(IO &io, T&)`` and
+``T::value_type& element(IO &io, T&, size_t indx)``. For example:
+
+.. code-block:: c++
+
+ template <>
+ struct SequenceTraits<MySeq> {
+ static size_t size(IO &io, MySeq &list) { ... }
+ static MySeqEl &element(IO &io, MySeq &list, size_t index) { ... }
+ };
+
+The size() method returns how many elements are currently in your sequence.
+The element() method returns a reference to the i'th element in the sequence.
+When parsing YAML, the element() method may be called with an index one bigger
+than the current size. Your element() method should allocate space for one
+more element (using default constructor if element is a C++ object) and returns
+a reference to that new allocated space.
+
+
+Flow Sequence
+-------------
+A YAML "flow sequence" is a sequence that when written to YAML it uses the
+inline notation (e.g [ foo, bar ] ). To specify that a sequence type should
+be written in YAML as a flow sequence, your SequenceTraits specialization should
+add "static const bool flow = true;". For instance:
+
+.. code-block:: c++
+
+ template <>
+ struct SequenceTraits<MyList> {
+ static size_t size(IO &io, MyList &list) { ... }
+ static MyListEl &element(IO &io, MyList &list, size_t index) { ... }
+
+ // The existence of this member causes YAML I/O to use a flow sequence
+ static const bool flow = true;
+ };
+
+With the above, if you used MyList as the data type in your native data
+structures, then when converted to YAML, a flow sequence of integers
+will be used (e.g. [ 10, -3, 4 ]).
+
+Flow sequences are subject to line wrapping according to the Output object
+configuration.
+
+Utility Macros
+--------------
+Since a common source of sequences is std::vector<>, YAML I/O provides macros:
+LLVM_YAML_IS_SEQUENCE_VECTOR() and LLVM_YAML_IS_FLOW_SEQUENCE_VECTOR() which
+can be used to easily specify SequenceTraits<> on a std::vector type. YAML
+I/O does not partial specialize SequenceTraits on std::vector<> because that
+would force all vectors to be sequences. An example use of the macros:
+
+.. code-block:: c++
+
+ std::vector<MyType1>;
+ std::vector<MyType2>;
+ LLVM_YAML_IS_SEQUENCE_VECTOR(MyType1)
+ LLVM_YAML_IS_FLOW_SEQUENCE_VECTOR(MyType2)
+
+
+
+Document List
+=============
+
+YAML allows you to define multiple "documents" in a single YAML file. Each
+new document starts with a left aligned "---" token. The end of all documents
+is denoted with a left aligned "..." token. Many users of YAML will never
+have need for multiple documents. The top level node in their YAML schema
+will be a mapping or sequence. For those cases, the following is not needed.
+But for cases where you do want multiple documents, you can specify a
+trait for you document list type. The trait has the same methods as
+SequenceTraits but is named DocumentListTraits. For example:
+
+.. code-block:: c++
+
+ template <>
+ struct DocumentListTraits<MyDocList> {
+ static size_t size(IO &io, MyDocList &list) { ... }
+ static MyDocType element(IO &io, MyDocList &list, size_t index) { ... }
+ };
+
+
+User Context Data
+=================
+When an llvm::yaml::Input or llvm::yaml::Output object is created their
+constructors take an optional "context" parameter. This is a pointer to
+whatever state information you might need.
+
+For instance, in a previous example we showed how the conversion type for a
+flags field could be determined at runtime based on the value of another field
+in the mapping. But what if an inner mapping needs to know some field value
+of an outer mapping? That is where the "context" parameter comes in. You
+can set values in the context in the outer map's mapping() method and
+retrieve those values in the inner map's mapping() method.
+
+The context value is just a void*. All your traits which use the context
+and operate on your native data types, need to agree what the context value
+actually is. It could be a pointer to an object or struct which your various
+traits use to shared context sensitive information.
+
+
+Output
+======
+
+The llvm::yaml::Output class is used to generate a YAML document from your
+in-memory data structures, using traits defined on your data types.
+To instantiate an Output object you need an llvm::raw_ostream, an optional
+context pointer and an optional wrapping column:
+
+.. code-block:: c++
+
+ class Output : public IO {
+ public:
+ Output(llvm::raw_ostream &, void *context = NULL, int WrapColumn = 70);
+
+Once you have an Output object, you can use the C++ stream operator on it
+to write your native data as YAML. One thing to recall is that a YAML file
+can contain multiple "documents". If the top level data structure you are
+streaming as YAML is a mapping, scalar, or sequence, then Output assumes you
+are generating one document and wraps the mapping output
+with "``---``" and trailing "``...``".
+
+The WrapColumn parameter will cause the flow mappings and sequences to
+line-wrap when they go over the supplied column. Pass 0 to completely
+suppress the wrapping.
+
+.. code-block:: c++
+
+ using llvm::yaml::Output;
+
+ void dumpMyMapDoc(const MyMapType &info) {
+ Output yout(llvm::outs());
+ yout << info;
+ }
+
+The above could produce output like:
+
+.. code-block:: yaml
+
+ ---
+ name: Tom
+ hat-size: 7
+ ...
+
+On the other hand, if the top level data structure you are streaming as YAML
+has a DocumentListTraits specialization, then Output walks through each element
+of your DocumentList and generates a "---" before the start of each element
+and ends with a "...".
+
+.. code-block:: c++
+
+ using llvm::yaml::Output;
+
+ void dumpMyMapDoc(const MyDocListType &docList) {
+ Output yout(llvm::outs());
+ yout << docList;
+ }
+
+The above could produce output like:
+
+.. code-block:: yaml
+
+ ---
+ name: Tom
+ hat-size: 7
+ ---
+ name: Tom
+ shoe-size: 11
+ ...
+
+Input
+=====
+
+The llvm::yaml::Input class is used to parse YAML document(s) into your native
+data structures. To instantiate an Input
+object you need a StringRef to the entire YAML file, and optionally a context
+pointer:
+
+.. code-block:: c++
+
+ class Input : public IO {
+ public:
+ Input(StringRef inputContent, void *context=NULL);
+
+Once you have an Input object, you can use the C++ stream operator to read
+the document(s). If you expect there might be multiple YAML documents in
+one file, you'll need to specialize DocumentListTraits on a list of your
+document type and stream in that document list type. Otherwise you can
+just stream in the document type. Also, you can check if there was
+any syntax errors in the YAML be calling the error() method on the Input
+object. For example:
+
+.. code-block:: c++
+
+ // Reading a single document
+ using llvm::yaml::Input;
+
+ Input yin(mb.getBuffer());
+
+ // Parse the YAML file
+ MyDocType theDoc;
+ yin >> theDoc;
+
+ // Check for error
+ if ( yin.error() )
+ return;
+
+
+.. code-block:: c++
+
+ // Reading multiple documents in one file
+ using llvm::yaml::Input;
+
+ LLVM_YAML_IS_DOCUMENT_LIST_VECTOR(MyDocType)
+
+ Input yin(mb.getBuffer());
+
+ // Parse the YAML file
+ std::vector<MyDocType> theDocList;
+ yin >> theDocList;
+
+ // Check for error
+ if ( yin.error() )
+ return;
+
+
Added: www-releases/trunk/8.0.1/docs/_sources/index.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/index.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/index.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/index.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,591 @@
+Overview
+========
+
+The LLVM compiler infrastructure supports a wide range of projects, from
+industrial strength compilers to specialized JIT applications to small
+research projects.
+
+Similarly, documentation is broken down into several high-level groupings
+targeted at different audiences:
+
+LLVM Design & Overview
+======================
+
+Several introductory papers and presentations.
+
+.. toctree::
+ :hidden:
+
+ LangRef
+
+:doc:`LangRef`
+ Defines the LLVM intermediate representation.
+
+`Introduction to the LLVM Compiler`__
+ Presentation providing a users introduction to LLVM.
+
+ .. __: http://llvm.org/pubs/2008-10-04-ACAT-LLVM-Intro.html
+
+`Intro to LLVM`__
+ Book chapter providing a compiler hacker's introduction to LLVM.
+
+ .. __: http://www.aosabook.org/en/llvm.html
+
+
+`LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation`__
+ Design overview.
+
+ .. __: http://llvm.org/pubs/2004-01-30-CGO-LLVM.html
+
+`LLVM: An Infrastructure for Multi-Stage Optimization`__
+ More details (quite old now).
+
+ .. __: http://llvm.org/pubs/2002-12-LattnerMSThesis.html
+
+`Publications mentioning LLVM <http://llvm.org/pubs>`_
+ ..
+
+User Guides
+===========
+
+For those new to the LLVM system.
+
+NOTE: If you are a user who is only interested in using LLVM-based
+compilers, you should look into `Clang <http://clang.llvm.org>`_ or
+`DragonEgg <http://dragonegg.llvm.org>`_ instead. The documentation here is
+intended for users who have a need to work with the intermediate LLVM
+representation.
+
+.. toctree::
+ :hidden:
+
+ CMake
+ CMakePrimer
+ AdvancedBuilds
+ HowToBuildOnARM
+ HowToBuildWithPGO
+ HowToCrossCompileBuiltinsOnArm
+ HowToCrossCompileLLVM
+ CommandGuide/index
+ GettingStarted
+ GettingStartedVS
+ FAQ
+ Lexicon
+ HowToAddABuilder
+ yaml2obj
+ HowToSubmitABug
+ SphinxQuickstartTemplate
+ MarkdownQuickstartTemplate
+ Phabricator
+ TestingGuide
+ tutorial/index
+ ReleaseNotes
+ Passes
+ YamlIO
+ GetElementPtr
+ Frontend/PerformanceTips
+ MCJITDesignAndImplementation
+ CodeOfConduct
+ CompileCudaWithLLVM
+ ReportingGuide
+ Benchmarking
+ Docker
+
+:doc:`GettingStarted`
+ Discusses how to get up and running quickly with the LLVM infrastructure.
+ Everything from unpacking and compilation of the distribution to execution
+ of some tools.
+
+:doc:`CMake`
+ An addendum to the main Getting Started guide for those using the `CMake
+ build system <http://www.cmake.org>`_.
+
+:doc:`HowToBuildOnARM`
+ Notes on building and testing LLVM/Clang on ARM.
+
+:doc:`HowToBuildWithPGO`
+ Notes on building LLVM/Clang with PGO.
+
+:doc:`HowToCrossCompileBuiltinsOnArm`
+ Notes on cross-building and testing the compiler-rt builtins for Arm.
+
+:doc:`HowToCrossCompileLLVM`
+ Notes on cross-building and testing LLVM/Clang.
+
+:doc:`GettingStartedVS`
+ An addendum to the main Getting Started guide for those using Visual Studio
+ on Windows.
+
+:doc:`tutorial/index`
+ Tutorials about using LLVM. Includes a tutorial about making a custom
+ language with LLVM.
+
+:doc:`LLVM Command Guide <CommandGuide/index>`
+ A reference manual for the LLVM command line utilities ("man" pages for LLVM
+ tools).
+
+:doc:`Passes`
+ A list of optimizations and analyses implemented in LLVM.
+
+:doc:`FAQ`
+ A list of common questions and problems and their solutions.
+
+:doc:`Release notes for the current release <ReleaseNotes>`
+ This describes new features, known bugs, and other limitations.
+
+:doc:`HowToSubmitABug`
+ Instructions for properly submitting information about any bugs you run into
+ in the LLVM system.
+
+:doc:`SphinxQuickstartTemplate`
+ A template + tutorial for writing new Sphinx documentation. It is meant
+ to be read in source form.
+
+:doc:`LLVM Testing Infrastructure Guide <TestingGuide>`
+ A reference manual for using the LLVM testing infrastructure.
+
+:doc:`TestSuiteGuide`
+ Describes how to compile and run the test-suite benchmarks.
+
+`How to build the C, C++, ObjC, and ObjC++ front end`__
+ Instructions for building the clang front-end from source.
+
+ .. __: http://clang.llvm.org/get_started.html
+
+:doc:`Lexicon`
+ Definition of acronyms, terms and concepts used in LLVM.
+
+:doc:`HowToAddABuilder`
+ Instructions for adding new builder to LLVM buildbot master.
+
+:doc:`YamlIO`
+ A reference guide for using LLVM's YAML I/O library.
+
+:doc:`GetElementPtr`
+ Answers to some very frequent questions about LLVM's most frequently
+ misunderstood instruction.
+
+:doc:`Frontend/PerformanceTips`
+ A collection of tips for frontend authors on how to generate IR
+ which LLVM is able to effectively optimize.
+
+:doc:`Docker`
+ A reference for using Dockerfiles provided with LLVM.
+
+
+Programming Documentation
+=========================
+
+For developers of applications which use LLVM as a library.
+
+.. toctree::
+ :hidden:
+
+ Atomics
+ CodingStandards
+ CommandLine
+ CompilerWriterInfo
+ ExtendingLLVM
+ HowToSetUpLLVMStyleRTTI
+ ProgrammersManual
+ Extensions
+ LibFuzzer
+ FuzzingLLVM
+ ScudoHardenedAllocator
+ OptBisect
+
+:doc:`LLVM Language Reference Manual <LangRef>`
+ Defines the LLVM intermediate representation and the assembly form of the
+ different nodes.
+
+:doc:`Atomics`
+ Information about LLVM's concurrency model.
+
+:doc:`ProgrammersManual`
+ Introduction to the general layout of the LLVM sourcebase, important classes
+ and APIs, and some tips & tricks.
+
+:doc:`Extensions`
+ LLVM-specific extensions to tools and formats LLVM seeks compatibility with.
+
+:doc:`CommandLine`
+ Provides information on using the command line parsing library.
+
+:doc:`CodingStandards`
+ Details the LLVM coding standards and provides useful information on writing
+ efficient C++ code.
+
+:doc:`HowToSetUpLLVMStyleRTTI`
+ How to make ``isa<>``, ``dyn_cast<>``, etc. available for clients of your
+ class hierarchy.
+
+:doc:`ExtendingLLVM`
+ Look here to see how to add instructions and intrinsics to LLVM.
+
+`Doxygen generated documentation <http://llvm.org/doxygen/>`_
+ (`classes <http://llvm.org/doxygen/inherits.html>`_)
+
+`Documentation for Go bindings <http://godoc.org/llvm.org/llvm/bindings/go/llvm>`_
+
+`Github Source Repository Browser <http://github.com/llvm/llvm-project//>`_
+ ..
+
+:doc:`CompilerWriterInfo`
+ A list of helpful links for compiler writers.
+
+:doc:`LibFuzzer`
+ A library for writing in-process guided fuzzers.
+
+:doc:`FuzzingLLVM`
+ Information on writing and using Fuzzers to find bugs in LLVM.
+
+:doc:`ScudoHardenedAllocator`
+ A library that implements a security-hardened `malloc()`.
+
+:doc:`OptBisect`
+ A command line option for debugging optimization-induced failures.
+
+.. _index-subsystem-docs:
+
+Subsystem Documentation
+=======================
+
+For API clients and LLVM developers.
+
+.. toctree::
+ :hidden:
+
+ AliasAnalysis
+ MemorySSA
+ BitCodeFormat
+ BlockFrequencyTerminology
+ BranchWeightMetadata
+ Bugpoint
+ CodeGenerator
+ ExceptionHandling
+ LinkTimeOptimization
+ SegmentedStacks
+ TableGenFundamentals
+ TableGen/index
+ DebuggingJITedCode
+ GoldPlugin
+ MarkedUpDisassembly
+ SystemLibrary
+ SupportLibrary
+ SourceLevelDebugging
+ Vectorizers
+ WritingAnLLVMBackend
+ GarbageCollection
+ WritingAnLLVMPass
+ HowToUseAttributes
+ NVPTXUsage
+ AMDGPUUsage
+ StackMaps
+ InAlloca
+ BigEndianNEON
+ CoverageMappingFormat
+ Statepoints
+ MergeFunctions
+ TypeMetadata
+ TransformMetadata
+ FaultMaps
+ MIRLangRef
+ Coroutines
+ GlobalISel
+ XRay
+ XRayExample
+ XRayFDRFormat
+ PDB/index
+ CFIVerify
+ SpeculativeLoadHardening
+ StackSafetyAnalysis
+
+:doc:`WritingAnLLVMPass`
+ Information on how to write LLVM transformations and analyses.
+
+:doc:`WritingAnLLVMBackend`
+ Information on how to write LLVM backends for machine targets.
+
+:doc:`CodeGenerator`
+ The design and implementation of the LLVM code generator. Useful if you are
+ working on retargetting LLVM to a new architecture, designing a new codegen
+ pass, or enhancing existing components.
+
+:doc:`Machine IR (MIR) Format Reference Manual <MIRLangRef>`
+ A reference manual for the MIR serialization format, which is used to test
+ LLVM's code generation passes.
+
+:doc:`TableGen <TableGen/index>`
+ Describes the TableGen tool, which is used heavily by the LLVM code
+ generator.
+
+:doc:`AliasAnalysis`
+ Information on how to write a new alias analysis implementation or how to
+ use existing analyses.
+
+:doc:`MemorySSA`
+ Information about the MemorySSA utility in LLVM, as well as how to use it.
+
+:doc:`GarbageCollection`
+ The interfaces source-language compilers should use for compiling GC'd
+ programs.
+
+:doc:`Source Level Debugging with LLVM <SourceLevelDebugging>`
+ This document describes the design and philosophy behind the LLVM
+ source-level debugger.
+
+:doc:`Vectorizers`
+ This document describes the current status of vectorization in LLVM.
+
+:doc:`ExceptionHandling`
+ This document describes the design and implementation of exception handling
+ in LLVM.
+
+:doc:`Bugpoint`
+ Automatic bug finder and test-case reducer description and usage
+ information.
+
+:doc:`BitCodeFormat`
+ This describes the file format and encoding used for LLVM "bc" files.
+
+:doc:`Support Library <SupportLibrary>`
+ This document describes the LLVM Support Library (``lib/Support``) and
+ how to keep LLVM source code portable
+
+:doc:`LinkTimeOptimization`
+ This document describes the interface between LLVM intermodular optimizer
+ and the linker and its design
+
+:doc:`GoldPlugin`
+ How to build your programs with link-time optimization on Linux.
+
+:doc:`DebuggingJITedCode`
+ How to debug JITed code with GDB.
+
+:doc:`MCJITDesignAndImplementation`
+ Describes the inner workings of MCJIT execution engine.
+
+:doc:`BranchWeightMetadata`
+ Provides information about Branch Prediction Information.
+
+:doc:`BlockFrequencyTerminology`
+ Provides information about terminology used in the ``BlockFrequencyInfo``
+ analysis pass.
+
+:doc:`SegmentedStacks`
+ This document describes segmented stacks and how they are used in LLVM.
+
+:doc:`MarkedUpDisassembly`
+ This document describes the optional rich disassembly output syntax.
+
+:doc:`HowToUseAttributes`
+ Answers some questions about the new Attributes infrastructure.
+
+:doc:`NVPTXUsage`
+ This document describes using the NVPTX backend to compile GPU kernels.
+
+:doc:`AMDGPUUsage`
+ This document describes using the AMDGPU backend to compile GPU kernels.
+
+:doc:`StackMaps`
+ LLVM support for mapping instruction addresses to the location of
+ values and allowing code to be patched.
+
+:doc:`BigEndianNEON`
+ LLVM's support for generating NEON instructions on big endian ARM targets is
+ somewhat nonintuitive. This document explains the implementation and rationale.
+
+:doc:`CoverageMappingFormat`
+ This describes the format and encoding used for LLVMâs code coverage mapping.
+
+:doc:`Statepoints`
+ This describes a set of experimental extensions for garbage
+ collection support.
+
+:doc:`MergeFunctions`
+ Describes functions merging optimization.
+
+:doc:`InAlloca`
+ Description of the ``inalloca`` argument attribute.
+
+:doc:`FaultMaps`
+ LLVM support for folding control flow into faulting machine instructions.
+
+:doc:`CompileCudaWithLLVM`
+ LLVM support for CUDA.
+
+:doc:`Coroutines`
+ LLVM support for coroutines.
+
+:doc:`GlobalISel`
+ This describes the prototype instruction selection replacement, GlobalISel.
+
+:doc:`XRay`
+ High-level documentation of how to use XRay in LLVM.
+
+:doc:`XRayExample`
+ An example of how to debug an application with XRay.
+
+:doc:`The Microsoft PDB File Format <PDB/index>`
+ A detailed description of the Microsoft PDB (Program Database) file format.
+
+:doc:`CFIVerify`
+ A description of the verification tool for Control Flow Integrity.
+
+:doc:`SpeculativeLoadHardening`
+ A description of the Speculative Load Hardening mitigation for Spectre v1.
+
+:doc:`StackSafetyAnalysis`
+ This document describes the design of the stack safety analysis of local
+ variables.
+
+Development Process Documentation
+=================================
+
+Information about LLVM's development process.
+
+.. toctree::
+ :hidden:
+
+ Contributing
+ DeveloperPolicy
+ Projects
+ LLVMBuild
+ HowToReleaseLLVM
+ Packaging
+ ReleaseProcess
+ Phabricator
+ BugLifeCycle
+
+:doc:`Contributing`
+ An overview on how to contribute to LLVM.
+
+:doc:`DeveloperPolicy`
+ The LLVM project's policy towards developers and their contributions.
+
+:doc:`Projects`
+ How-to guide and templates for new projects that *use* the LLVM
+ infrastructure. The templates (directory organization, Makefiles, and test
+ tree) allow the project code to be located outside (or inside) the ``llvm/``
+ tree, while using LLVM header files and libraries.
+
+:doc:`LLVMBuild`
+ Describes the LLVMBuild organization and files used by LLVM to specify
+ component descriptions.
+
+:doc:`HowToReleaseLLVM`
+ This is a guide to preparing LLVM releases. Most developers can ignore it.
+
+:doc:`ReleaseProcess`
+ This is a guide to validate a new release, during the release process. Most developers can ignore it.
+
+:doc:`Packaging`
+ Advice on packaging LLVM into a distribution.
+
+:doc:`Phabricator`
+ Describes how to use the Phabricator code review tool hosted on
+ http://reviews.llvm.org/ and its command line interface, Arcanist.
+
+:doc:`BugLifeCycle`
+ Describes how bugs are reported, triaged and closed.
+
+Community
+=========
+
+LLVM has a thriving community of friendly and helpful developers.
+The two primary communication mechanisms in the LLVM community are mailing
+lists and IRC.
+
+Mailing Lists
+-------------
+
+If you can't find what you need in these docs, try consulting the mailing
+lists.
+
+`Developer's List (llvm-dev)`__
+ This list is for people who want to be included in technical discussions of
+ LLVM. People post to this list when they have questions about writing code
+ for or using the LLVM tools. It is relatively low volume.
+
+ .. __: http://lists.llvm.org/mailman/listinfo/llvm-dev
+
+`Commits Archive (llvm-commits)`__
+ This list contains all commit messages that are made when LLVM developers
+ commit code changes to the repository. It also serves as a forum for
+ patch review (i.e. send patches here). It is useful for those who want to
+ stay on the bleeding edge of LLVM development. This list is very high
+ volume.
+
+ .. __: http://lists.llvm.org/pipermail/llvm-commits/
+
+`Bugs & Patches Archive (llvm-bugs)`__
+ This list gets emailed every time a bug is opened and closed. It is
+ higher volume than the LLVM-dev list.
+
+ .. __: http://lists.llvm.org/pipermail/llvm-bugs/
+
+`Test Results Archive (llvm-testresults)`__
+ A message is automatically sent to this list by every active nightly tester
+ when it completes. As such, this list gets email several times each day,
+ making it a high volume list.
+
+ .. __: http://lists.llvm.org/pipermail/llvm-testresults/
+
+`LLVM Announcements List (llvm-announce)`__
+ This is a low volume list that provides important announcements regarding
+ LLVM. It gets email about once a month.
+
+ .. __: http://lists.llvm.org/mailman/listinfo/llvm-announce
+
+IRC
+---
+
+Users and developers of the LLVM project (including subprojects such as Clang)
+can be found in #llvm on `irc.oftc.net <irc://irc.oftc.net/llvm>`_.
+
+This channel has several bots.
+
+* Buildbot reporters
+
+ * llvmbb - Bot for the main LLVM buildbot master.
+ http://lab.llvm.org:8011/console
+ * bb-chapuni - An individually run buildbot master. http://bb.pgr.jp/console
+ * smooshlab - Apple's internal buildbot master.
+
+* robot - Bugzilla linker. %bug <number>
+
+* clang-bot - A `geordi <http://www.eelis.net/geordi/>`_ instance running
+ near-trunk clang instead of gcc.
+
+Community wide proposals
+------------------------
+
+Proposals for massive changes in how the community behaves and how the work flow
+can be better.
+
+.. toctree::
+ :hidden:
+
+ CodeOfConduct
+ Proposals/GitHubMove
+ Proposals/TestSuite
+ Proposals/VectorizationPlan
+
+:doc:`CodeOfConduct`
+ Proposal to adopt a code of conduct on the LLVM social spaces (lists, events,
+ IRC, etc).
+
+:doc:`Proposals/GitHubMove`
+ Proposal to move from SVN/Git to GitHub.
+
+:doc:`Proposals/TestSuite`
+ Proposals for additional benchmarks/programs for llvm's test-suite.
+
+:doc:`Proposals/VectorizationPlan`
+ Proposal to model the process and upgrade the infrastructure of LLVM's Loop Vectorizer.
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`search`
Added: www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT1.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT1.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT1.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT1.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,323 @@
+=======================================================
+Building a JIT: Starting out with KaleidoscopeJIT
+=======================================================
+
+.. contents::
+ :local:
+
+Chapter 1 Introduction
+======================
+
+**Warning: This tutorial is currently being updated to account for ORC API
+changes. Only Chapters 1 and 2 are up-to-date.**
+
+**Example code from Chapters 3 to 5 will compile and run, but has not been
+updated**
+
+Welcome to Chapter 1 of the "Building an ORC-based JIT in LLVM" tutorial. This
+tutorial runs through the implementation of a JIT compiler using LLVM's
+On-Request-Compilation (ORC) APIs. It begins with a simplified version of the
+KaleidoscopeJIT class used in the
+`Implementing a language with LLVM <LangImpl01.html>`_ tutorials and then
+introduces new features like concurrent compilation, optimization, lazy
+compilation and remote execution.
+
+The goal of this tutorial is to introduce you to LLVM's ORC JIT APIs, show how
+these APIs interact with other parts of LLVM, and to teach you how to recombine
+them to build a custom JIT that is suited to your use-case.
+
+The structure of the tutorial is:
+
+- Chapter #1: Investigate the simple KaleidoscopeJIT class. This will
+ introduce some of the basic concepts of the ORC JIT APIs, including the
+ idea of an ORC *Layer*.
+
+- `Chapter #2 <BuildingAJIT2.html>`_: Extend the basic KaleidoscopeJIT by adding
+ a new layer that will optimize IR and generated code.
+
+- `Chapter #3 <BuildingAJIT3.html>`_: Further extend the JIT by adding a
+ Compile-On-Demand layer to lazily compile IR.
+
+- `Chapter #4 <BuildingAJIT4.html>`_: Improve the laziness of our JIT by
+ replacing the Compile-On-Demand layer with a custom layer that uses the ORC
+ Compile Callbacks API directly to defer IR-generation until functions are
+ called.
+
+- `Chapter #5 <BuildingAJIT5.html>`_: Add process isolation by JITing code into
+ a remote process with reduced privileges using the JIT Remote APIs.
+
+To provide input for our JIT we will use a lightly modified version of the
+Kaleidoscope REPL from `Chapter 7 <LangImpl07.html>`_ of the "Implementing a
+language in LLVM tutorial".
+
+Finally, a word on API generations: ORC is the 3rd generation of LLVM JIT API.
+It was preceded by MCJIT, and before that by the (now deleted) legacy JIT.
+These tutorials don't assume any experience with these earlier APIs, but
+readers acquainted with them will see many familiar elements. Where appropriate
+we will make this connection with the earlier APIs explicit to help people who
+are transitioning from them to ORC.
+
+JIT API Basics
+==============
+
+The purpose of a JIT compiler is to compile code "on-the-fly" as it is needed,
+rather than compiling whole programs to disk ahead of time as a traditional
+compiler does. To support that aim our initial, bare-bones JIT API will have
+just two functions:
+
+1. ``Error addModule(std::unique_ptr<Module> M)``: Make the given IR module
+ available for execution.
+2. ``Expected<JITEvaluatedSymbol> lookup()``: Search for pointers to
+ symbols (functions or variables) that have been added to the JIT.
+
+A basic use-case for this API, executing the 'main' function from a module,
+will look like:
+
+.. code-block:: c++
+
+ JIT J;
+ J.addModule(buildModule());
+ auto *Main = (int(*)(int, char*[]))J.lookup("main").getAddress();
+ int Result = Main();
+
+The APIs that we build in these tutorials will all be variations on this simple
+theme. Behind this API we will refine the implementation of the JIT to add
+support for concurrent compilation, optimization and lazy compilation.
+Eventually we will extend the API itself to allow higher-level program
+representations (e.g. ASTs) to be added to the JIT.
+
+KaleidoscopeJIT
+===============
+
+In the previous section we described our API, now we examine a simple
+implementation of it: The KaleidoscopeJIT class [1]_ that was used in the
+`Implementing a language with LLVM <LangImpl01.html>`_ tutorials. We will use
+the REPL code from `Chapter 7 <LangImpl07.html>`_ of that tutorial to supply the
+input for our JIT: Each time the user enters an expression the REPL will add a
+new IR module containing the code for that expression to the JIT. If the
+expression is a top-level expression like '1+1' or 'sin(x)', the REPL will also
+use the lookup method of our JIT class find and execute the code for the
+expression. In later chapters of this tutorial we will modify the REPL to enable
+new interactions with our JIT class, but for now we will take this setup for
+granted and focus our attention on the implementation of our JIT itself.
+
+Our KaleidoscopeJIT class is defined in the KaleidoscopeJIT.h header. After the
+usual include guards and #includes [2]_, we get to the definition of our class:
+
+.. code-block:: c++
+
+ #ifndef LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H
+ #define LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H
+
+ #include "llvm/ADT/StringRef.h"
+ #include "llvm/ExecutionEngine/JITSymbol.h"
+ #include "llvm/ExecutionEngine/Orc/CompileUtils.h"
+ #include "llvm/ExecutionEngine/Orc/Core.h"
+ #include "llvm/ExecutionEngine/Orc/ExecutionUtils.h"
+ #include "llvm/ExecutionEngine/Orc/IRCompileLayer.h"
+ #include "llvm/ExecutionEngine/Orc/JITTargetMachineBuilder.h"
+ #include "llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h"
+ #include "llvm/ExecutionEngine/SectionMemoryManager.h"
+ #include "llvm/IR/DataLayout.h"
+ #include "llvm/IR/LLVMContext.h"
+ #include <memory>
+
+ namespace llvm {
+ namespace orc {
+
+ class KaleidoscopeJIT {
+ private:
+ ExecutionSession ES;
+ RTDyldObjectLinkingLayer ObjectLayer;
+ IRCompileLayer CompileLayer;
+
+ DataLayout DL;
+ MangleAndInterner Mangle;
+ ThreadSafeContext Ctx;
+
+ public:
+ KaleidoscopeJIT(JITTargetMachineBuilder JTMB, DataLayout DL)
+ : ObjectLayer(ES,
+ []() { return llvm::make_unique<SectionMemoryManager>(); }),
+ CompileLayer(ES, ObjectLayer, ConcurrentIRCompiler(std::move(JTMB))),
+ DL(std::move(DL)), Mangle(ES, this->DL),
+ Ctx(llvm::make_unique<LLVMContext>()) {
+ ES.getMainJITDylib().setGenerator(
+ cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(DL)));
+ }
+
+Our class begins with six member variables: An ExecutionSession member, ``ES``,
+which provides context for our running JIT'd code (including the string pool,
+global mutex, and error reporting facilities); An RTDyldObjectLinkingLayer,
+``ObjectLayer``, that can be used to add object files to our JIT (though we will
+not use it directly); An IRCompileLayer, ``CompileLayer``, that can be used to
+add LLVM Modules to our JIT (and which builds on the ObjectLayer), A DataLayout
+and MangleAndInterner, ``DL`` and ``Mangle``, that will be used for symbol mangling
+(more on that later); and finally an LLVMContext that clients will use when
+building IR files for the JIT.
+
+Next up we have our class constructor, which takes a `JITTargetMachineBuilder``
+that will be used by our IRCompiler, and a ``DataLayout`` that we will use to
+initialize our DL member. The constructor begins by initializing our
+ObjectLayer. The ObjectLayer requires a reference to the ExecutionSession, and
+a function object that will build a JIT memory manager for each module that is
+added (a JIT memory manager manages memory allocations, memory permissions, and
+registration of exception handlers for JIT'd code). For this we use a lambda
+that returns a SectionMemoryManager, an off-the-shelf utility that provides all
+the basic memory management functionality required for this chapter. Next we
+initialize our CompileLayer. The CompileLayer needs three things: (1) A
+reference to the ExecutionSession, (2) A reference to our object layer, and (3)
+a compiler instance to use to perform the actual compilation from IR to object
+files. We use the off-the-shelf ConcurrentIRCompiler utility as our compiler,
+which we construct using this constructor's JITTargetMachineBuilder argument.
+The ConcurrentIRCompiler utility will use the JITTargetMachineBuilder to build
+llvm TargetMachines (which are not thread safe) as needed for compiles. After
+this, we initialize our supporting members: ``DL``, ``Mangler`` and ``Ctx`` with
+the input DataLayout, the ExecutionSession and DL member, and a new default
+constucted LLVMContext respectively. Now that our members have been initialized,
+so the one thing that remains to do is to tweak the configuration of the
+*JITDylib* that we will store our code in. We want to modify this dylib to
+contain not only the symbols that we add to it, but also the symbols from our
+REPL process as well. We do this by attaching a
+``DynamicLibrarySearchGenerator`` instance using the
+``DynamicLibrarySearchGenerator::GetForCurrentProcess`` method.
+
+
+.. code-block:: c++
+
+ static Expected<std::unique_ptr<KaleidoscopeJIT>> Create() {
+ auto JTMB = JITTargetMachineBuilder::detectHost();
+
+ if (!JTMB)
+ return JTMB.takeError();
+
+ auto DL = JTMB->getDefaultDataLayoutForTarget();
+ if (!DL)
+ return DL.takeError();
+
+ return llvm::make_unique<KaleidoscopeJIT>(std::move(*JTMB), std::move(*DL));
+ }
+
+ const DataLayout &getDataLayout() const { return DL; }
+
+ LLVMContext &getContext() { return *Ctx.getContext(); }
+
+Next we have a named constructor, ``Create``, which will build a KaleidoscopeJIT
+instance that is configured to generate code for our host process. It does this
+by first generating a JITTargetMachineBuilder instance using that clases's
+detectHost method and then using that instance to generate a datalayout for
+the target process. Each of these operations can fail, so each returns its
+result wrapped in an Expected value [3]_ that we must check for error before
+continuing. If both operations succeed we can unwrap their results (using the
+dereference operator) and pass them into KaleidoscopeJIT's constructor on the
+last line of the function.
+
+Following the named constructor we have the ``getDataLayout()`` and
+``getContext()`` methods. These are used to make data structures created and
+managed by the JIT (especially the LLVMContext) available to the REPL code that
+will build our IR modules.
+
+.. code-block:: c++
+
+ void addModule(std::unique_ptr<Module> M) {
+ cantFail(CompileLayer.add(ES.getMainJITDylib(),
+ ThreadSafeModule(std::move(M), Ctx)));
+ }
+
+ Expected<JITEvaluatedSymbol> lookup(StringRef Name) {
+ return ES.lookup({&ES.getMainJITDylib()}, Mangle(Name.str()));
+ }
+
+Now we come to the first of our JIT API methods: addModule. This method is
+responsible for adding IR to the JIT and making it available for execution. In
+this initial implementation of our JIT we will make our modules "available for
+execution" by adding them to the CompileLayer, which will it turn store the
+Module in the main JITDylib. This process will create new symbol table entries
+in the JITDylib for each definition in the module, and will defer compilation of
+the module until any of its definitions is looked up. Note that this is not lazy
+compilation: just referencing a definition, even if it is never used, will be
+enough to trigger compilation. In later chapters we will teach our JIT to defer
+compilation of functions until they're actually called. To add our Module we
+must first wrap it in a ThreadSafeModule instance, which manages the lifetime of
+the Module's LLVMContext (our Ctx member) in a thread-friendly way. In our
+example, all modules will share the Ctx member, which will exist for the
+duration of the JIT. Once we switch to concurrent compilation in later chapters
+we will use a new context per module.
+
+Our last method is ``lookup``, which allows us to look up addresses for
+function and variable definitions added to the JIT based on their symbol names.
+As noted above, lookup will implicitly trigger compilation for any symbol
+that has not already been compiled. Our lookup method calls through to
+`ExecutionSession::lookup`, passing in a list of dylibs to search (in our case
+just the main dylib), and the symbol name to search for, with a twist: We have
+to *mangle* the name of the symbol we're searching for first. The ORC JIT
+components use mangled symbols internally the same way a static compiler and
+linker would, rather than using plain IR symbol names. This allows JIT'd code
+to interoperate easily with precompiled code in the application or shared
+libraries. The kind of mangling will depend on the DataLayout, which in turn
+depends on the target platform. To allow us to remain portable and search based
+on the un-mangled name, we just re-produce this mangling ourselves using our
+``Mangle`` member function object.
+
+This brings us to the end of Chapter 1 of Building a JIT. You now have a basic
+but fully functioning JIT stack that you can use to take LLVM IR and make it
+executable within the context of your JIT process. In the next chapter we'll
+look at how to extend this JIT to produce better quality code, and in the
+process take a deeper look at the ORC layer concept.
+
+`Next: Extending the KaleidoscopeJIT <BuildingAJIT2.html>`_
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example. To build this
+example, use:
+
+.. code-block:: bash
+
+ # Compile
+ clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
+ # Run
+ ./toy
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter1/KaleidoscopeJIT.h
+ :language: c++
+
+.. [1] Actually we use a cut-down version of KaleidoscopeJIT that makes a
+ simplifying assumption: symbols cannot be re-defined. This will make it
+ impossible to re-define symbols in the REPL, but will make our symbol
+ lookup logic simpler. Re-introducing support for symbol redefinition is
+ left as an exercise for the reader. (The KaleidoscopeJIT.h used in the
+ original tutorials will be a helpful reference).
+
+.. [2] +-----------------------------+-----------------------------------------------+
+ | File | Reason for inclusion |
+ +=============================+===============================================+
+ | JITSymbol.h | Defines the lookup result type |
+ | | JITEvaluatedSymbol |
+ +-----------------------------+-----------------------------------------------+
+ | CompileUtils.h | Provides the SimpleCompiler class. |
+ +-----------------------------+-----------------------------------------------+
+ | Core.h | Core utilities such as ExecutionSession and |
+ | | JITDylib. |
+ +-----------------------------+-----------------------------------------------+
+ | ExecutionUtils.h | Provides the DynamicLibrarySearchGenerator |
+ | | class. |
+ +-----------------------------+-----------------------------------------------+
+ | IRCompileLayer.h | Provides the IRCompileLayer class. |
+ +-----------------------------+-----------------------------------------------+
+ | JITTargetMachineBuilder.h | Provides the JITTargetMachineBuilder class. |
+ +-----------------------------+-----------------------------------------------+
+ | RTDyldObjectLinkingLayer.h | Provides the RTDyldObjectLinkingLayer class. |
+ +-----------------------------+-----------------------------------------------+
+ | SectionMemoryManager.h | Provides the SectionMemoryManager class. |
+ +-----------------------------+-----------------------------------------------+
+ | DataLayout.h | Provides the DataLayout class. |
+ +-----------------------------+-----------------------------------------------+
+ | LLVMContext.h | Provides the LLVMContext class. |
+ +-----------------------------+-----------------------------------------------+
+
+.. [3] See the ErrorHandling section in the LLVM Programmer's Manual
+ (http://llvm.org/docs/ProgrammersManual.html#error-handling)
\ No newline at end of file
Added: www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT2.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT2.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT2.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT2.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,277 @@
+=====================================================================
+Building a JIT: Adding Optimizations -- An introduction to ORC Layers
+=====================================================================
+
+.. contents::
+ :local:
+
+**This tutorial is under active development. It is incomplete and details may
+change frequently.** Nonetheless we invite you to try it out as it stands, and
+we welcome any feedback.
+
+Chapter 2 Introduction
+======================
+
+**Warning: This tutorial is currently being updated to account for ORC API
+changes. Only Chapters 1 and 2 are up-to-date.**
+
+**Example code from Chapters 3 to 5 will compile and run, but has not been
+updated**
+
+Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In
+`Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT
+class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce
+executable code in memory. KaleidoscopeJIT was able to do this with relatively
+little code by composing two off-the-shelf *ORC layers*: IRCompileLayer and
+ObjectLinkingLayer, to do much of the heavy lifting.
+
+In this layer we'll learn more about the ORC layer concept by using a new layer,
+IRTransformLayer, to add IR optimization support to KaleidoscopeJIT.
+
+Optimizing Modules using the IRTransformLayer
+=============================================
+
+In `Chapter 4 <LangImpl04.html>`_ of the "Implementing a language with LLVM"
+tutorial series the llvm *FunctionPassManager* is introduced as a means for
+optimizing LLVM IR. Interested readers may read that chapter for details, but
+in short: to optimize a Module we create an llvm::FunctionPassManager
+instance, configure it with a set of optimizations, then run the PassManager on
+a Module to mutate it into a (hopefully) more optimized but semantically
+equivalent form. In the original tutorial series the FunctionPassManager was
+created outside the KaleidoscopeJIT and modules were optimized before being
+added to it. In this Chapter we will make optimization a phase of our JIT
+instead. For now this will provide us a motivation to learn more about ORC
+layers, but in the long term making optimization part of our JIT will yield an
+important benefit: When we begin lazily compiling code (i.e. deferring
+compilation of each function until the first time it's run) having
+optimization managed by our JIT will allow us to optimize lazily too, rather
+than having to do all our optimization up-front.
+
+To add optimization support to our JIT we will take the KaleidoscopeJIT from
+Chapter 1 and compose an ORC *IRTransformLayer* on top. We will look at how the
+IRTransformLayer works in more detail below, but the interface is simple: the
+constructor for this layer takes a reference to the execution session and the
+layer below (as all layers do) plus an *IR optimization function* that it will
+apply to each Module that is added via addModule:
+
+.. code-block:: c++
+
+ class KaleidoscopeJIT {
+ private:
+ ExecutionSession ES;
+ RTDyldObjectLinkingLayer ObjectLayer;
+ IRCompileLayer CompileLayer;
+ IRTransformLayer TransformLayer;
+
+ DataLayout DL;
+ MangleAndInterner Mangle;
+ ThreadSafeContext Ctx;
+
+ public:
+
+ KaleidoscopeJIT(JITTargetMachineBuilder JTMB, DataLayout DL)
+ : ObjectLayer(ES,
+ []() { return llvm::make_unique<SectionMemoryManager>(); }),
+ CompileLayer(ES, ObjectLayer, ConcurrentIRCompiler(std::move(JTMB))),
+ TransformLayer(ES, CompileLayer, optimizeModule),
+ DL(std::move(DL)), Mangle(ES, this->DL),
+ Ctx(llvm::make_unique<LLVMContext>()) {
+ ES.getMainJITDylib().setGenerator(
+ cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(DL)));
+ }
+
+Our extended KaleidoscopeJIT class starts out the same as it did in Chapter 1,
+but after the CompileLayer we introduce a new member, TransformLayer, which sits
+on top of our CompileLayer. We initialize our OptimizeLayer with a reference to
+the ExecutionSession and output layer (standard practice for layers), along with
+a *transform function*. For our transform function we supply our classes
+optimizeModule static method.
+
+.. code-block:: c++
+
+ // ...
+ return cantFail(OptimizeLayer.addModule(std::move(M),
+ std::move(Resolver)));
+ // ...
+
+Next we need to update our addModule method to replace the call to
+``CompileLayer::add`` with a call to ``OptimizeLayer::add`` instead.
+
+.. code-block:: c++
+
+ static Expected<ThreadSafeModule>
+ optimizeModule(ThreadSafeModule M, const MaterializationResponsibility &R) {
+ // Create a function pass manager.
+ auto FPM = llvm::make_unique<legacy::FunctionPassManager>(M.get());
+
+ // Add some optimizations.
+ FPM->add(createInstructionCombiningPass());
+ FPM->add(createReassociatePass());
+ FPM->add(createGVNPass());
+ FPM->add(createCFGSimplificationPass());
+ FPM->doInitialization();
+
+ // Run the optimizations over all functions in the module being added to
+ // the JIT.
+ for (auto &F : *M)
+ FPM->run(F);
+
+ return M;
+ }
+
+At the bottom of our JIT we add a private method to do the actual optimization:
+*optimizeModule*. This function takes the module to be transformed as input (as
+a ThreadSafeModule) along with a reference to a reference to a new class:
+``MaterializationResponsibility``. The MaterializationResponsibility argument
+can be used to query JIT state for the module being transformed, such as the set
+of definitions in the module that JIT'd code is actively trying to call/access.
+For now we will ignore this argument and use a standard optimization
+pipeline. To do this we set up a FunctionPassManager, add some passes to it, run
+it over every function in the module, and then return the mutated module. The
+specific optimizations are the same ones used in `Chapter 4 <LangImpl04.html>`_
+of the "Implementing a language with LLVM" tutorial series. Readers may visit
+that chapter for a more in-depth discussion of these, and of IR optimization in
+general.
+
+And that's it in terms of changes to KaleidoscopeJIT: When a module is added via
+addModule the OptimizeLayer will call our optimizeModule function before passing
+the transformed module on to the CompileLayer below. Of course, we could have
+called optimizeModule directly in our addModule function and not gone to the
+bother of using the IRTransformLayer, but doing so gives us another opportunity
+to see how layers compose. It also provides a neat entry point to the *layer*
+concept itself, because IRTransformLayer is one of the simplest layers that
+can be implemented.
+
+.. code-block:: c++
+
+ // From IRTransformLayer.h:
+ class IRTransformLayer : public IRLayer {
+ public:
+ using TransformFunction = std::function<Expected<ThreadSafeModule>(
+ ThreadSafeModule, const MaterializationResponsibility &R)>;
+
+ IRTransformLayer(ExecutionSession &ES, IRLayer &BaseLayer,
+ TransformFunction Transform = identityTransform);
+
+ void setTransform(TransformFunction Transform) {
+ this->Transform = std::move(Transform);
+ }
+
+ static ThreadSafeModule
+ identityTransform(ThreadSafeModule TSM,
+ const MaterializationResponsibility &R) {
+ return TSM;
+ }
+
+ void emit(MaterializationResponsibility R, ThreadSafeModule TSM) override;
+
+ private:
+ IRLayer &BaseLayer;
+ TransformFunction Transform;
+ };
+
+ // From IRTransfomrLayer.cpp:
+
+ IRTransformLayer::IRTransformLayer(ExecutionSession &ES,
+ IRLayer &BaseLayer,
+ TransformFunction Transform)
+ : IRLayer(ES), BaseLayer(BaseLayer), Transform(std::move(Transform)) {}
+
+ void IRTransformLayer::emit(MaterializationResponsibility R,
+ ThreadSafeModule TSM) {
+ assert(TSM.getModule() && "Module must not be null");
+
+ if (auto TransformedTSM = Transform(std::move(TSM), R))
+ BaseLayer.emit(std::move(R), std::move(*TransformedTSM));
+ else {
+ R.failMaterialization();
+ getExecutionSession().reportError(TransformedTSM.takeError());
+ }
+ }
+
+This is the whole definition of IRTransformLayer, from
+``llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h`` and
+``llvm/lib/ExecutionEngine/Orc/IRTransformLayer.cpp``. This class is concerned
+with two very simple jobs: (1) Running every IR Module that is emitted via this
+layer through the transform function object, and (2) implementing the ORC
+``IRLayer`` interface (which itself conforms to the general ORC Layer concept,
+more on that below). Most of the class is straightforward: a typedef for the
+transform function, a constructor to initialize the members, a setter for the
+transform function value, and a default no-op transform. The most important
+method is ``emit`` as this is half of our IRLayer interface. The emit method
+applies our transform to each module that it is called on and, if the transform
+succeeds, passes the transformed module to the base layer. If the transform
+fails, our emit function calls
+``MaterializationResponsibility::failMaterialization`` (this JIT clients who
+may be waiting on other threads know that the code they were waiting for has
+failed to compile) and logs the error with the execution session before bailing
+out.
+
+The other half of the IRLayer interface we inherit unmodified from the IRLayer
+class:
+
+.. code-block:: c++
+
+ Error IRLayer::add(JITDylib &JD, ThreadSafeModule TSM, VModuleKey K) {
+ return JD.define(llvm::make_unique<BasicIRLayerMaterializationUnit>(
+ *this, std::move(K), std::move(TSM)));
+ }
+
+This code, from ``llvm/lib/ExecutionEngine/Orc/Layer.cpp``, adds a
+ThreadSafeModule to a given JITDylib by wrapping it up in a
+``MaterializationUnit`` (in this case a ``BasicIRLayerMaterializationUnit``).
+Most layers that derived from IRLayer can rely on this default implementation
+of the ``add`` method.
+
+These two operations, ``add`` and ``emit``, together constitute the layer
+concept: A layer is a way to wrap a portion of a compiler pipeline (in this case
+the "opt" phase of an LLVM compiler) whose API is is opaque to ORC in an
+interface that allows ORC to invoke it when needed. The add method takes an
+module in some input program representation (in this case an LLVM IR module) and
+stores it in the target JITDylib, arranging for it to be passed back to the
+Layer's emit method when any symbol defined by that module is requested. Layers
+can compose neatly by calling the 'emit' method of a base layer to complete
+their work. For example, in this tutorial our IRTransformLayer calls through to
+our IRCompileLayer to compile the transformed IR, and our IRCompileLayer in turn
+calls our ObjectLayer to link the object file produced by our compiler.
+
+
+So far we have learned how to optimize and compile our LLVM IR, but we have not
+focused on when compilation happens. Our current REPL is eager: Each function
+definition is optimized and compiled as soon as it is referenced by any other
+code, regardless of whether it is ever called at runtime. In the next chapter we
+will introduce fully lazy compilation, in which functions are not compiled until
+they are first called at run-time. At this point the trade-offs get much more
+interesting: the lazier we are, the quicker we can start executing the first
+function, but the more often we will have to pause to compile newly encountered
+functions. If we only code-gen lazily, but optimize eagerly, we will have a
+longer startup time (as everything is optimized) but relatively short pauses as
+each function just passes through code-gen. If we both optimize and code-gen
+lazily we can start executing the first function more quickly, but we will have
+longer pauses as each function has to be both optimized and code-gen'd when it
+is first executed. Things become even more interesting if we consider
+interproceedural optimizations like inlining, which must be performed eagerly.
+These are complex trade-offs, and there is no one-size-fits all solution to
+them, but by providing composable layers we leave the decisions to the person
+implementing the JIT, and make it easy for them to experiment with different
+configurations.
+
+`Next: Adding Per-function Lazy Compilation <BuildingAJIT3.html>`_
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example with an
+IRTransformLayer added to enable optimization. To build this example, use:
+
+.. code-block:: bash
+
+ # Compile
+ clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
+ # Run
+ ./toy
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h
+ :language: c++
Added: www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT3.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT3.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT3.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT3.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,191 @@
+=============================================
+Building a JIT: Per-function Lazy Compilation
+=============================================
+
+.. contents::
+ :local:
+
+**This tutorial is under active development. It is incomplete and details may
+change frequently.** Nonetheless we invite you to try it out as it stands, and
+we welcome any feedback.
+
+Chapter 3 Introduction
+======================
+
+**Warning: This text is currently out of date due to ORC API updates.**
+
+**The example code has been updated and can be used. The text will be updated
+once the API churn dies down.**
+
+Welcome to Chapter 3 of the "Building an ORC-based JIT in LLVM" tutorial. This
+chapter discusses lazy JITing and shows you how to enable it by adding an ORC
+CompileOnDemand layer the JIT from `Chapter 2 <BuildingAJIT2.html>`_.
+
+Lazy Compilation
+================
+
+When we add a module to the KaleidoscopeJIT class from Chapter 2 it is
+immediately optimized, compiled and linked for us by the IRTransformLayer,
+IRCompileLayer and RTDyldObjectLinkingLayer respectively. This scheme, where all the
+work to make a Module executable is done up front, is simple to understand and
+its performance characteristics are easy to reason about. However, it will lead
+to very high startup times if the amount of code to be compiled is large, and
+may also do a lot of unnecessary compilation if only a few compiled functions
+are ever called at runtime. A truly "just-in-time" compiler should allow us to
+defer the compilation of any given function until the moment that function is
+first called, improving launch times and eliminating redundant work. In fact,
+the ORC APIs provide us with a layer to lazily compile LLVM IR:
+*CompileOnDemandLayer*.
+
+The CompileOnDemandLayer class conforms to the layer interface described in
+Chapter 2, but its addModule method behaves quite differently from the layers
+we have seen so far: rather than doing any work up front, it just scans the
+Modules being added and arranges for each function in them to be compiled the
+first time it is called. To do this, the CompileOnDemandLayer creates two small
+utilities for each function that it scans: a *stub* and a *compile
+callback*. The stub is a pair of a function pointer (which will be pointed at
+the function's implementation once the function has been compiled) and an
+indirect jump through the pointer. By fixing the address of the indirect jump
+for the lifetime of the program we can give the function a permanent "effective
+address", one that can be safely used for indirection and function pointer
+comparison even if the function's implementation is never compiled, or if it is
+compiled more than once (due to, for example, recompiling the function at a
+higher optimization level) and changes address. The second utility, the compile
+callback, represents a re-entry point from the program into the compiler that
+will trigger compilation and then execution of a function. By initializing the
+function's stub to point at the function's compile callback, we enable lazy
+compilation: The first attempted call to the function will follow the function
+pointer and trigger the compile callback instead. The compile callback will
+compile the function, update the function pointer for the stub, then execute
+the function. On all subsequent calls to the function, the function pointer
+will point at the already-compiled function, so there is no further overhead
+from the compiler. We will look at this process in more detail in the next
+chapter of this tutorial, but for now we'll trust the CompileOnDemandLayer to
+set all the stubs and callbacks up for us. All we need to do is to add the
+CompileOnDemandLayer to the top of our stack and we'll get the benefits of
+lazy compilation. We just need a few changes to the source:
+
+.. code-block:: c++
+
+ ...
+ #include "llvm/ExecutionEngine/SectionMemoryManager.h"
+ #include "llvm/ExecutionEngine/Orc/CompileOnDemandLayer.h"
+ #include "llvm/ExecutionEngine/Orc/CompileUtils.h"
+ ...
+
+ ...
+ class KaleidoscopeJIT {
+ private:
+ std::unique_ptr<TargetMachine> TM;
+ const DataLayout DL;
+ RTDyldObjectLinkingLayer ObjectLayer;
+ IRCompileLayer<decltype(ObjectLayer), SimpleCompiler> CompileLayer;
+
+ using OptimizeFunction =
+ std::function<std::shared_ptr<Module>(std::shared_ptr<Module>)>;
+
+ IRTransformLayer<decltype(CompileLayer), OptimizeFunction> OptimizeLayer;
+
+ std::unique_ptr<JITCompileCallbackManager> CompileCallbackManager;
+ CompileOnDemandLayer<decltype(OptimizeLayer)> CODLayer;
+
+ public:
+ using ModuleHandle = decltype(CODLayer)::ModuleHandleT;
+
+First we need to include the CompileOnDemandLayer.h header, then add two new
+members: a std::unique_ptr<JITCompileCallbackManager> and a CompileOnDemandLayer,
+to our class. The CompileCallbackManager member is used by the CompileOnDemandLayer
+to create the compile callback needed for each function.
+
+.. code-block:: c++
+
+ KaleidoscopeJIT()
+ : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()),
+ ObjectLayer([]() { return std::make_shared<SectionMemoryManager>(); }),
+ CompileLayer(ObjectLayer, SimpleCompiler(*TM)),
+ OptimizeLayer(CompileLayer,
+ [this](std::shared_ptr<Module> M) {
+ return optimizeModule(std::move(M));
+ }),
+ CompileCallbackManager(
+ orc::createLocalCompileCallbackManager(TM->getTargetTriple(), 0)),
+ CODLayer(OptimizeLayer,
+ [this](Function &F) { return std::set<Function*>({&F}); },
+ *CompileCallbackManager,
+ orc::createLocalIndirectStubsManagerBuilder(
+ TM->getTargetTriple())) {
+ llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
+ }
+
+Next we have to update our constructor to initialize the new members. To create
+an appropriate compile callback manager we use the
+createLocalCompileCallbackManager function, which takes a TargetMachine and a
+JITTargetAddress to call if it receives a request to compile an unknown
+function. In our simple JIT this situation is unlikely to come up, so we'll
+cheat and just pass '0' here. In a production quality JIT you could give the
+address of a function that throws an exception in order to unwind the JIT'd
+code's stack.
+
+Now we can construct our CompileOnDemandLayer. Following the pattern from
+previous layers we start by passing a reference to the next layer down in our
+stack -- the OptimizeLayer. Next we need to supply a 'partitioning function':
+when a not-yet-compiled function is called, the CompileOnDemandLayer will call
+this function to ask us what we would like to compile. At a minimum we need to
+compile the function being called (given by the argument to the partitioning
+function), but we could also request that the CompileOnDemandLayer compile other
+functions that are unconditionally called (or highly likely to be called) from
+the function being called. For KaleidoscopeJIT we'll keep it simple and just
+request compilation of the function that was called. Next we pass a reference to
+our CompileCallbackManager. Finally, we need to supply an "indirect stubs
+manager builder": a utility function that constructs IndirectStubManagers, which
+are in turn used to build the stubs for the functions in each module. The
+CompileOnDemandLayer will call the indirect stub manager builder once for each
+call to addModule, and use the resulting indirect stubs manager to create
+stubs for all functions in all modules in the set. If/when the module set is
+removed from the JIT the indirect stubs manager will be deleted, freeing any
+memory allocated to the stubs. We supply this function by using the
+createLocalIndirectStubsManagerBuilder utility.
+
+.. code-block:: c++
+
+ // ...
+ if (auto Sym = CODLayer.findSymbol(Name, false))
+ // ...
+ return cantFail(CODLayer.addModule(std::move(Ms),
+ std::move(Resolver)));
+ // ...
+
+ // ...
+ return CODLayer.findSymbol(MangledNameStream.str(), true);
+ // ...
+
+ // ...
+ CODLayer.removeModule(H);
+ // ...
+
+Finally, we need to replace the references to OptimizeLayer in our addModule,
+findSymbol, and removeModule methods. With that, we're up and running.
+
+**To be done:**
+
+** Chapter conclusion.**
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example with a CompileOnDemand
+layer added to enable lazy function-at-a-time compilation. To build this example, use:
+
+.. code-block:: bash
+
+ # Compile
+ clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
+ # Run
+ ./toy
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter3/KaleidoscopeJIT.h
+ :language: c++
+
+`Next: Extreme Laziness -- Using Compile Callbacks to JIT directly from ASTs <BuildingAJIT4.html>`_
Added: www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT4.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT4.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT4.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT4.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,48 @@
+===========================================================================
+Building a JIT: Extreme Laziness - Using Compile Callbacks to JIT from ASTs
+===========================================================================
+
+.. contents::
+ :local:
+
+**This tutorial is under active development. It is incomplete and details may
+change frequently.** Nonetheless we invite you to try it out as it stands, and
+we welcome any feedback.
+
+Chapter 4 Introduction
+======================
+
+Welcome to Chapter 4 of the "Building an ORC-based JIT in LLVM" tutorial. This
+chapter introduces the Compile Callbacks and Indirect Stubs APIs and shows how
+they can be used to replace the CompileOnDemand layer from
+`Chapter 3 <BuildingAJIT3.html>`_ with a custom lazy-JITing scheme that JITs
+directly from Kaleidoscope ASTs.
+
+**To be done:**
+
+**(1) Describe the drawbacks of JITing from IR (have to compile to IR first,
+which reduces the benefits of laziness).**
+
+**(2) Describe CompileCallbackManagers and IndirectStubManagers in detail.**
+
+**(3) Run through the implementation of addFunctionAST.**
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example that JITs lazily from
+Kaleidoscope ASTS. To build this example, use:
+
+.. code-block:: bash
+
+ # Compile
+ clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
+ # Run
+ ./toy
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter4/KaleidoscopeJIT.h
+ :language: c++
+
+`Next: Remote-JITing -- Process-isolation and laziness-at-a-distance <BuildingAJIT5.html>`_
Added: www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT5.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT5.rst.txt?rev=368037&view=auto
==============================================================================
--- www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT5.rst.txt (added)
+++ www-releases/trunk/8.0.1/docs/_sources/tutorial/BuildingAJIT5.rst.txt Tue Aug 6 06:51:02 2019
@@ -0,0 +1,57 @@
+=============================================================================
+Building a JIT: Remote-JITing -- Process Isolation and Laziness at a Distance
+=============================================================================
+
+.. contents::
+ :local:
+
+**This tutorial is under active development. It is incomplete and details may
+change frequently.** Nonetheless we invite you to try it out as it stands, and
+we welcome any feedback.
+
+Chapter 5 Introduction
+======================
+
+Welcome to Chapter 5 of the "Building an ORC-based JIT in LLVM" tutorial. This
+chapter introduces the ORC RemoteJIT Client/Server APIs and shows how to use
+them to build a JIT stack that will execute its code via a communications
+channel with a different process. This can be a separate process on the same
+machine, a process on a different machine, or even a process on a different
+platform/architecture. The code builds on top of the lazy-AST-compiling JIT
+stack from `Chapter 4 <BuildingAJIT3.html>`_.
+
+**To be done -- this is going to be a long one:**
+
+**(1) Introduce channels, RPC, RemoteJIT Client and Server APIs**
+
+**(2) Describe the client code in greater detail. Discuss modifications of the
+KaleidoscopeJIT class, and the REPL itself.**
+
+**(3) Describe the server code.**
+
+**(4) Describe how to run the demo.**
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example that JITs lazily from
+Kaleidoscope ASTS. To build this example, use:
+
+.. code-block:: bash
+
+ # Compile
+ clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
+ clang++ -g Server/server.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy-server
+ # Run
+ ./toy-server &
+ ./toy
+
+Here is the code for the modified KaleidoscopeJIT:
+
+.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter5/KaleidoscopeJIT.h
+ :language: c++
+
+And the code for the JIT server:
+
+.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter5/Server/server.cpp
+ :language: c++
More information about the llvm-commits
mailing list