[www-releases] r321287 - Add 5.0.1 docs

Thu Dec 21 10:09:58 PST 2017

Added: www-releases/trunk/5.0.1/docs/TestingGuide.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/TestingGuide.html?rev=321287&view=auto
==============================================================================

--- www-releases/trunk/5.0.1/docs/TestingGuide.html (added)
+++ www-releases/trunk/5.0.1/docs/TestingGuide.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,664 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>LLVM Testing Infrastructure Guide — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="LLVM test-suite Guide" href="TestSuiteMakefileGuide.html" />
+    <link rel="prev" title="Code Reviews with Phabricator" href="Phabricator.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="TestSuiteMakefileGuide.html" title="LLVM test-suite Guide"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="Phabricator.html" title="Code Reviews with Phabricator"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="llvm-testing-infrastructure-guide">
+<h1>LLVM Testing Infrastructure Guide<a class="headerlink" href="#llvm-testing-infrastructure-guide" title="Permalink to this headline">Â¶</a></h1>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#overview" id="id5">Overview</a></li>
+<li><a class="reference internal" href="#requirements" id="id6">Requirements</a></li>
+<li><a class="reference internal" href="#llvm-testing-infrastructure-organization" id="id7">LLVM testing infrastructure organization</a><ul>
+<li><a class="reference internal" href="#regression-tests" id="id8">Regression tests</a></li>
+<li><a class="reference internal" href="#test-suite" id="id9"><code class="docutils literal"><span class="pre">test-suite</span></code></a></li>
+<li><a class="reference internal" href="#debugging-information-tests" id="id10">Debugging Information tests</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#quick-start" id="id11">Quick start</a><ul>
+<li><a class="reference internal" href="#id1" id="id12">Regression tests</a></li>
+<li><a class="reference internal" href="#id2" id="id13">Debugging Information tests</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#regression-test-structure" id="id14">Regression test structure</a><ul>
+<li><a class="reference internal" href="#writing-new-regression-tests" id="id15">Writing new regression tests</a></li>
+<li><a class="reference internal" href="#extra-files" id="id16">Extra files</a></li>
+<li><a class="reference internal" href="#fragile-tests" id="id17">Fragile tests</a></li>
+<li><a class="reference internal" href="#platform-specific-tests" id="id18">Platform-Specific Tests</a></li>
+<li><a class="reference internal" href="#constraining-test-execution" id="id19">Constraining test execution</a></li>
+<li><a class="reference internal" href="#substitutions" id="id20">Substitutions</a></li>
+<li><a class="reference internal" href="#options" id="id21">Options</a></li>
+<li><a class="reference internal" href="#other-features" id="id22">Other Features</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#test-suite-overview" id="id23"><code class="docutils literal"><span class="pre">test-suite</span></code> Overview</a><ul>
+<li><a class="reference internal" href="#test-suite-quickstart" id="id24"><code class="docutils literal"><span class="pre">test-suite</span></code> Quickstart</a></li>
+<li><a class="reference internal" href="#test-suite-makefiles" id="id25"><code class="docutils literal"><span class="pre">test-suite</span></code> Makefiles</a></li>
+</ul>
+</li>
+</ul>
+</div>
+<div class="toctree-wrapper compound">
+</div>
+<div class="section" id="overview">
+<h2><a class="toc-backref" href="#id5">Overview</a><a class="headerlink" href="#overview" title="Permalink to this headline">Â¶</a></h2>
+<p>This document is the reference manual for the LLVM testing
+infrastructure. It documents the structure of the LLVM testing
+infrastructure, the tools needed to use it, and how to add and run
+tests.</p>
+</div>
+<div class="section" id="requirements">
+<h2><a class="toc-backref" href="#id6">Requirements</a><a class="headerlink" href="#requirements" title="Permalink to this headline">Â¶</a></h2>
+<p>In order to use the LLVM testing infrastructure, you will need all of the
+software required to build LLVM, as well as <a class="reference external" href="http://python.org">Python</a> 2.7 or
+later.</p>
+<p>If you intend to run the <a class="reference internal" href="#test-suite-overview"><span class="std std-ref">test-suite</span></a>, you will also
+need a development version of zlib (zlib1g-dev is known to work on several Linux
+distributions).</p>
+</div>
+<div class="section" id="llvm-testing-infrastructure-organization">
+<h2><a class="toc-backref" href="#id7">LLVM testing infrastructure organization</a><a class="headerlink" href="#llvm-testing-infrastructure-organization" title="Permalink to this headline">Â¶</a></h2>
+<p>The LLVM testing infrastructure contains two major categories of tests:
+regression tests and whole programs. The regression tests are contained
+inside the LLVM repository itself under <code class="docutils literal"><span class="pre">llvm/test</span></code> and are expected
+to always pass â they should be run before every commit.</p>
+<p>The whole programs tests are referred to as the âLLVM test suiteâ (or
+âtest-suiteâ) and are in the <code class="docutils literal"><span class="pre">test-suite</span></code> module in subversion. For
+historical reasons, these tests are also referred to as the ânightly
+testsâ in places, which is less ambiguous than âtest-suiteâ and remains
+in use although we run them much more often than nightly.</p>
+<div class="section" id="regression-tests">
+<h3><a class="toc-backref" href="#id8">Regression tests</a><a class="headerlink" href="#regression-tests" title="Permalink to this headline">Â¶</a></h3>
+<p>The regression tests are small pieces of code that test a specific
+feature of LLVM or trigger a specific bug in LLVM. The language they are
+written in depends on the part of LLVM being tested. These tests are driven by
+the <a class="reference internal" href="CommandGuide/lit.html"><span class="doc">Lit</span></a> testing tool (which is part of LLVM), and
+are located in the <code class="docutils literal"><span class="pre">llvm/test</span></code> directory.</p>
+<p>Typically when a bug is found in LLVM, a regression test containing just
+enough code to reproduce the problem should be written and placed
+somewhere underneath this directory. For example, it can be a small
+piece of LLVM IR distilled from an actual application or benchmark.</p>
+</div>
+<div class="section" id="test-suite">
+<h3><a class="toc-backref" href="#id9"><code class="docutils literal"><span class="pre">test-suite</span></code></a><a class="headerlink" href="#test-suite" title="Permalink to this headline">Â¶</a></h3>
+<p>The test suite contains whole programs, which are pieces of code which
+can be compiled and linked into a stand-alone program that can be
+executed. These programs are generally written in high level languages
+such as C or C++.</p>
+<p>These programs are compiled using a user specified compiler and set of
+flags, and then executed to capture the program output and timing
+information. The output of these programs is compared to a reference
+output to ensure that the program is being compiled correctly.</p>
+<p>In addition to compiling and executing programs, whole program tests
+serve as a way of benchmarking LLVM performance, both in terms of the
+efficiency of the programs generated as well as the speed with which
+LLVM compiles, optimizes, and generates code.</p>
+<p>The test-suite is located in the <code class="docutils literal"><span class="pre">test-suite</span></code> Subversion module.</p>
+</div>
+<div class="section" id="debugging-information-tests">
+<h3><a class="toc-backref" href="#id10">Debugging Information tests</a><a class="headerlink" href="#debugging-information-tests" title="Permalink to this headline">Â¶</a></h3>
+<p>The test suite contains tests to check quality of debugging information.
+The test are written in C based languages or in LLVM assembly language.</p>
+<p>These tests are compiled and run under a debugger. The debugger output
+is checked to validate of debugging information. See README.txt in the
+test suite for more information . This test suite is located in the
+<code class="docutils literal"><span class="pre">debuginfo-tests</span></code> Subversion module.</p>
+</div>
+</div>
+<div class="section" id="quick-start">
+<h2><a class="toc-backref" href="#id11">Quick start</a><a class="headerlink" href="#quick-start" title="Permalink to this headline">Â¶</a></h2>
+<p>The tests are located in two separate Subversion modules. The
+regressions tests are in the main âllvmâ module under the directory
+<code class="docutils literal"><span class="pre">llvm/test</span></code> (so you get these tests for free with the main LLVM tree).
+Use <code class="docutils literal"><span class="pre">make</span> <span class="pre">check-all</span></code> to run the regression tests after building LLVM.</p>
+<p>The more comprehensive test suite that includes whole programs in C and C++
+is in the <code class="docutils literal"><span class="pre">test-suite</span></code> module. See <a class="reference internal" href="#test-suite-quickstart"><span class="std std-ref">test-suite Quickstart</span></a> for more information on running these tests.</p>
+<div class="section" id="id1">
+<h3><a class="toc-backref" href="#id12">Regression tests</a><a class="headerlink" href="#id1" title="Permalink to this headline">Â¶</a></h3>
+<p>To run all of the LLVM regression tests use the check-llvm target:</p>
+<div class="highlight-bash"><div class="highlight"><pre><span></span>% make check-llvm
+</pre></div>
+</div>
+<p>If you have <a class="reference external" href="http://clang.llvm.org/">Clang</a> checked out and built, you
+can run the LLVM and Clang tests simultaneously using:</p>
+<div class="highlight-bash"><div class="highlight"><pre><span></span>% make check-all
+</pre></div>
+</div>
+<p>To run the tests with Valgrind (Memcheck by default), use the <code class="docutils literal"><span class="pre">LIT_ARGS</span></code> make
+variable to pass the required options to lit. For example, you can use:</p>
+<div class="highlight-bash"><div class="highlight"><pre><span></span>% make check <span class="nv">LIT_ARGS</span><span class="o">=</span><span class="s2">"-v --vg --vg-leak"</span>
+</pre></div>
+</div>
+<p>to enable testing with valgrind and with leak checking enabled.</p>
+<p>To run individual tests or subsets of tests, you can use the <code class="docutils literal"><span class="pre">llvm-lit</span></code>
+script which is built as part of LLVM. For example, to run the
+<code class="docutils literal"><span class="pre">Integer/BitPacked.ll</span></code> test by itself you can run:</p>
+<div class="highlight-bash"><div class="highlight"><pre><span></span>% llvm-lit ~/llvm/test/Integer/BitPacked.ll
+</pre></div>
+</div>
+<p>or to run all of the ARM CodeGen tests:</p>
+<div class="highlight-bash"><div class="highlight"><pre><span></span>% llvm-lit ~/llvm/test/CodeGen/ARM
+</pre></div>
+</div>
+<p>For more information on using the <strong class="program">lit</strong> tool, see <code class="docutils literal"><span class="pre">llvm-lit</span> <span class="pre">--help</span></code>
+or the <a class="reference internal" href="CommandGuide/lit.html"><span class="doc">lit man page</span></a>.</p>
+</div>
+<div class="section" id="id2">
+<h3><a class="toc-backref" href="#id13">Debugging Information tests</a><a class="headerlink" href="#id2" title="Permalink to this headline">Â¶</a></h3>
+<p>To run debugging information tests simply checkout the tests inside
+clang/test directory.</p>
+<div class="highlight-bash"><div class="highlight"><pre><span></span>% <span class="nb">cd</span> clang/test
+% svn co http://llvm.org/svn/llvm-project/debuginfo-tests/trunk debuginfo-tests
+</pre></div>
+</div>
+<p>These tests are already set up to run as part of clang regression tests.</p>
+</div>
+</div>
+<div class="section" id="regression-test-structure">
+<h2><a class="toc-backref" href="#id14">Regression test structure</a><a class="headerlink" href="#regression-test-structure" title="Permalink to this headline">Â¶</a></h2>
+<p>The LLVM regression tests are driven by <strong class="program">lit</strong> and are located in the
+<code class="docutils literal"><span class="pre">llvm/test</span></code> directory.</p>
+<p>This directory contains a large array of small tests that exercise
+various features of LLVM and to ensure that regressions do not occur.
+The directory is broken into several sub-directories, each focused on a
+particular area of LLVM.</p>
+<div class="section" id="writing-new-regression-tests">
+<h3><a class="toc-backref" href="#id15">Writing new regression tests</a><a class="headerlink" href="#writing-new-regression-tests" title="Permalink to this headline">Â¶</a></h3>
+<p>The regression test structure is very simple, but does require some
+information to be set. This information is gathered via <code class="docutils literal"><span class="pre">configure</span></code>
+and is written to a file, <code class="docutils literal"><span class="pre">test/lit.site.cfg</span></code> in the build directory.
+The <code class="docutils literal"><span class="pre">llvm/test</span></code> Makefile does this work for you.</p>
+<p>In order for the regression tests to work, each directory of tests must
+have a <code class="docutils literal"><span class="pre">lit.local.cfg</span></code> file. <strong class="program">lit</strong> looks for this file to determine
+how to run the tests. This file is just Python code and thus is very
+flexible, but weâve standardized it for the LLVM regression tests. If
+youâre adding a directory of tests, just copy <code class="docutils literal"><span class="pre">lit.local.cfg</span></code> from
+another directory to get running. The standard <code class="docutils literal"><span class="pre">lit.local.cfg</span></code> simply
+specifies which files to look in for tests. Any directory that contains
+only directories does not need the <code class="docutils literal"><span class="pre">lit.local.cfg</span></code> file. Read the <a class="reference internal" href="CommandGuide/lit.html"><span class="doc">Lit
+documentation</span></a> for more information.</p>
+<p>Each test file must contain lines starting with âRUN:â that tell <strong class="program">lit</strong>
+how to run it. If there are no RUN lines, <strong class="program">lit</strong> will issue an error
+while running a test.</p>
+<p>RUN lines are specified in the comments of the test program using the
+keyword <code class="docutils literal"><span class="pre">RUN</span></code> followed by a colon, and lastly the command (pipeline)
+to execute. Together, these lines form the âscriptâ that <strong class="program">lit</strong>
+executes to run the test case. The syntax of the RUN lines is similar to a
+shellâs syntax for pipelines including I/O redirection and variable
+substitution. However, even though these lines may <em>look</em> like a shell
+script, they are not. RUN lines are interpreted by <strong class="program">lit</strong>.
+Consequently, the syntax differs from shell in a few ways. You can specify
+as many RUN lines as needed.</p>
+<p><strong class="program">lit</strong> performs substitution on each RUN line to replace LLVM tool names
+with the full paths to the executable built for each tool (in
+<code class="docutils literal"><span class="pre">$(LLVM_OBJ_ROOT)/$(BuildMode)/bin)</span></code>. This ensures that <strong class="program">lit</strong> does
+not invoke any stray LLVM tools in the userâs path during testing.</p>
+<p>Each RUN line is executed on its own, distinct from other lines unless
+its last character is <code class="docutils literal"><span class="pre">\</span></code>. This continuation character causes the RUN
+line to be concatenated with the next one. In this way you can build up
+long pipelines of commands without making huge line lengths. The lines
+ending in <code class="docutils literal"><span class="pre">\</span></code> are concatenated until a RUN line that doesnât end in
+<code class="docutils literal"><span class="pre">\</span></code> is found. This concatenated set of RUN lines then constitutes one
+execution. <strong class="program">lit</strong> will substitute variables and arrange for the pipeline
+to be executed. If any process in the pipeline fails, the entire line (and
+test case) fails too.</p>
+<p>Below is an example of legal RUN lines in a <code class="docutils literal"><span class="pre">.ll</span></code> file:</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">; RUN: llvm-as < %s | llvm-dis > %t1</span>
+<span class="c">; RUN: llvm-dis < %s.bc-13 > %t2</span>
+<span class="c">; RUN: diff %t1 %t2</span>
+</pre></div>
+</div>
+<p>As with a Unix shell, the RUN lines permit pipelines and I/O
+redirection to be used.</p>
+<p>There are some quoting rules that you must pay attention to when writing
+your RUN lines. In general nothing needs to be quoted. <strong class="program">lit</strong> wonât
+strip off any quote characters so they will get passed to the invoked program.
+To avoid this use curly braces to tell <strong class="program">lit</strong> that it should treat
+everything enclosed as one value.</p>
+<p>In general, you should strive to keep your RUN lines as simple as possible,
+using them only to run tools that generate textual output you can then examine.
+The recommended way to examine output to figure out if the test passes is using
+the <a class="reference internal" href="CommandGuide/FileCheck.html"><span class="doc">FileCheck tool</span></a>. <em>[The usage of grep in RUN
+lines is deprecated - please do not send or commit patches that use it.]</em></p>
+<p>Put related tests into a single file rather than having a separate file per
+test. Check if there are files already covering your feature and consider
+adding your code there instead of creating a new file.</p>
+</div>
+<div class="section" id="extra-files">
+<h3><a class="toc-backref" href="#id16">Extra files</a><a class="headerlink" href="#extra-files" title="Permalink to this headline">Â¶</a></h3>
+<p>If your test requires extra files besides the file containing the <code class="docutils literal"><span class="pre">RUN:</span></code>
+lines, the idiomatic place to put them is in a subdirectory <code class="docutils literal"><span class="pre">Inputs</span></code>.
+You can then refer to the extra files as <code class="docutils literal"><span class="pre">%S/Inputs/foo.bar</span></code>.</p>
+<p>For example, consider <code class="docutils literal"><span class="pre">test/Linker/ident.ll</span></code>. The directory structure is
+as follows:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">test</span><span class="o">/</span>
+  <span class="n">Linker</span><span class="o">/</span>
+    <span class="n">ident</span><span class="o">.</span><span class="n">ll</span>
+    <span class="n">Inputs</span><span class="o">/</span>
+      <span class="n">ident</span><span class="o">.</span><span class="n">a</span><span class="o">.</span><span class="n">ll</span>
+      <span class="n">ident</span><span class="o">.</span><span class="n">b</span><span class="o">.</span><span class="n">ll</span>
+</pre></div>
+</div>
+<p>For convenience, these are the contents:</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">;;;;; ident.ll:</span>
+
+<span class="c">; RUN: llvm-link %S/Inputs/ident.a.ll %S/Inputs/ident.b.ll -S | FileCheck %s</span>
+
+<span class="c">; Verify that multiple input llvm.ident metadata are linked together.</span>
+
+<span class="c">; CHECK-DAG: !llvm.ident = !{!0, !1, !2}</span>
+<span class="c">; CHECK-DAG: "Compiler V1"</span>
+<span class="c">; CHECK-DAG: "Compiler V2"</span>
+<span class="c">; CHECK-DAG: "Compiler V3"</span>
+
+<span class="c">;;;;; Inputs/ident.a.ll:</span>
+
+<span class="nv">!llvm.ident</span> <span class="p">=</span> <span class="p">!{</span><span class="nv nv-Anonymous">!0</span><span class="p">,</span> <span class="nv nv-Anonymous">!1</span><span class="p">}</span>
+<span class="nv nv-Anonymous">!0</span> <span class="p">=</span> <span class="kt">metadata</span> <span class="p">!{</span><span class="kt">metadata</span> <span class="nv">!"Compiler V1"</span><span class="p">}</span>
+<span class="nv nv-Anonymous">!1</span> <span class="p">=</span> <span class="kt">metadata</span> <span class="p">!{</span><span class="kt">metadata</span> <span class="nv">!"Compiler V2"</span><span class="p">}</span>
+
+<span class="c">;;;;; Inputs/ident.b.ll:</span>
+
+<span class="nv">!llvm.ident</span> <span class="p">=</span> <span class="p">!{</span><span class="nv nv-Anonymous">!0</span><span class="p">}</span>
+<span class="nv nv-Anonymous">!0</span> <span class="p">=</span> <span class="kt">metadata</span> <span class="p">!{</span><span class="kt">metadata</span> <span class="nv">!"Compiler V3"</span><span class="p">}</span>
+</pre></div>
+</div>
+<p>For symmetry reasons, <code class="docutils literal"><span class="pre">ident.ll</span></code> is just a dummy file that doesnât
+actually participate in the test besides holding the <code class="docutils literal"><span class="pre">RUN:</span></code> lines.</p>
+<div class="admonition note">
+<p class="first admonition-title">Note</p>
+<p class="last">Some existing tests use <code class="docutils literal"><span class="pre">RUN:</span> <span class="pre">true</span></code> in extra files instead of just
+putting the extra files in an <code class="docutils literal"><span class="pre">Inputs/</span></code> directory. This pattern is
+deprecated.</p>
+</div>
+</div>
+<div class="section" id="fragile-tests">
+<h3><a class="toc-backref" href="#id17">Fragile tests</a><a class="headerlink" href="#fragile-tests" title="Permalink to this headline">Â¶</a></h3>
+<p>It is easy to write a fragile test that would fail spuriously if the tool being
+tested outputs a full path to the input file.  For example, <strong class="program">opt</strong> by
+default outputs a <code class="docutils literal"><span class="pre">ModuleID</span></code>:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> cat example.ll
+<span class="go">define i32 @main() nounwind {</span>
+<span class="go">    ret i32 0</span>
+<span class="go">}</span>
+
+<span class="gp">$</span> opt -S /path/to/example.ll
+<span class="go">; ModuleID = '/path/to/example.ll'</span>
+
+<span class="go">define i32 @main() nounwind {</span>
+<span class="go">    ret i32 0</span>
+<span class="go">}</span>
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">ModuleID</span></code> can unexpectedly match against <code class="docutils literal"><span class="pre">CHECK</span></code> lines.  For example:</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">; RUN: opt -S %s | FileCheck</span>
+
+<span class="k">define</span> <span class="k">i32</span> <span class="vg">@main</span><span class="p">()</span> <span class="k">nounwind</span> <span class="p">{</span>
+    <span class="c">; CHECK-NOT: load</span>
+    <span class="k">ret</span> <span class="k">i32</span> <span class="m">0</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>This test will fail if placed into a <code class="docutils literal"><span class="pre">download</span></code> directory.</p>
+<p>To make your tests robust, always use <code class="docutils literal"><span class="pre">opt</span> <span class="pre">...</span> <span class="pre"><</span> <span class="pre">%s</span></code> in the RUN line.
+<strong class="program">opt</strong> does not output a <code class="docutils literal"><span class="pre">ModuleID</span></code> when input comes from stdin.</p>
+</div>
+<div class="section" id="platform-specific-tests">
+<h3><a class="toc-backref" href="#id18">Platform-Specific Tests</a><a class="headerlink" href="#platform-specific-tests" title="Permalink to this headline">Â¶</a></h3>
+<p>Whenever adding tests that require the knowledge of a specific platform,
+either related to code generated, specific output or back-end features,
+you must make sure to isolate the features, so that buildbots that
+run on different architectures (and donât even compile all back-ends),
+donât fail.</p>
+<p>The first problem is to check for target-specific output, for example sizes
+of structures, paths and architecture names, for example:</p>
+<ul class="simple">
+<li>Tests containing Windows paths will fail on Linux and vice-versa.</li>
+<li>Tests that check for <code class="docutils literal"><span class="pre">x86_64</span></code> somewhere in the text will fail anywhere else.</li>
+<li>Tests where the debug information calculates the size of types and structures.</li>
+</ul>
+<p>Also, if the test rely on any behaviour that is coded in any back-end, it must
+go in its own directory. So, for instance, code generator tests for ARM go
+into <code class="docutils literal"><span class="pre">test/CodeGen/ARM</span></code> and so on. Those directories contain a special
+<code class="docutils literal"><span class="pre">lit</span></code> configuration file that ensure all tests in that directory will
+only run if a specific back-end is compiled and available.</p>
+<p>For instance, on <code class="docutils literal"><span class="pre">test/CodeGen/ARM</span></code>, the <code class="docutils literal"><span class="pre">lit.local.cfg</span></code> is:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">config</span><span class="o">.</span><span class="n">suffixes</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'.ll'</span><span class="p">,</span> <span class="s1">'.c'</span><span class="p">,</span> <span class="s1">'.cpp'</span><span class="p">,</span> <span class="s1">'.test'</span><span class="p">]</span>
+<span class="k">if</span> <span class="ow">not</span> <span class="s1">'ARM'</span> <span class="ow">in</span> <span class="n">config</span><span class="o">.</span><span class="n">root</span><span class="o">.</span><span class="n">targets</span><span class="p">:</span>
+  <span class="n">config</span><span class="o">.</span><span class="n">unsupported</span> <span class="o">=</span> <span class="bp">True</span>
+</pre></div>
+</div>
+<p>Other platform-specific tests are those that depend on a specific feature
+of a specific sub-architecture, for example only to Intel chips that support <code class="docutils literal"><span class="pre">AVX2</span></code>.</p>
+<p>For instance, <code class="docutils literal"><span class="pre">test/CodeGen/X86/psubus.ll</span></code> tests three sub-architecture
+variants:</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">; RUN: llc -mcpu=core2 < %s | FileCheck %s -check-prefix=SSE2</span>
+<span class="c">; RUN: llc -mcpu=corei7-avx < %s | FileCheck %s -check-prefix=AVX1</span>
+<span class="c">; RUN: llc -mcpu=core-avx2 < %s | FileCheck %s -check-prefix=AVX2</span>
+</pre></div>
+</div>
+<p>And the checks are different:</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">; SSE2: @test1</span>
+<span class="c">; SSE2: psubusw LCPI0_0(%rip), %xmm0</span>
+<span class="c">; AVX1: @test1</span>
+<span class="c">; AVX1: vpsubusw LCPI0_0(%rip), %xmm0, %xmm0</span>
+<span class="c">; AVX2: @test1</span>
+<span class="c">; AVX2: vpsubusw LCPI0_0(%rip), %xmm0, %xmm0</span>
+</pre></div>
+</div>
+<p>So, if youâre testing for a behaviour that you know is platform-specific or
+depends on special features of sub-architectures, you must add the specific
+triple, test with the specific FileCheck and put it into the specific
+directory that will filter out all other architectures.</p>
+</div>
+<div class="section" id="constraining-test-execution">
+<h3><a class="toc-backref" href="#id19">Constraining test execution</a><a class="headerlink" href="#constraining-test-execution" title="Permalink to this headline">Â¶</a></h3>
+<p>Some tests can be run only in specific configurations, such as
+with debug builds or on particular platforms. Use <code class="docutils literal"><span class="pre">REQUIRES</span></code>
+and <code class="docutils literal"><span class="pre">UNSUPPORTED</span></code> to control when the test is enabled.</p>
+<p>Some tests are expected to fail. For example, there may be a known bug
+that the test detect. Use <code class="docutils literal"><span class="pre">XFAIL</span></code> to mark a test as an expected failure.
+An <code class="docutils literal"><span class="pre">XFAIL</span></code> test will be successful if its execution fails, and
+will be a failure if its execution succeeds.</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">; This test will be only enabled in the build with asserts.</span>
+<span class="c">; REQUIRES: asserts</span>
+<span class="c">; This test is disabled on Linux.</span>
+<span class="c">; UNSUPPORTED: -linux-</span>
+<span class="c">; This test is expected to fail on PowerPC.</span>
+<span class="c">; XFAIL: powerpc</span>
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">REQUIRES</span></code> and <code class="docutils literal"><span class="pre">UNSUPPORTED</span></code> and <code class="docutils literal"><span class="pre">XFAIL</span></code> all accept a comma-separated
+list of boolean expressions. The values in each expression may be:</p>
+<ul class="simple">
+<li>Features added to <code class="docutils literal"><span class="pre">config.available_features</span></code> by
+configuration files such as <code class="docutils literal"><span class="pre">lit.cfg</span></code>.</li>
+<li>Substrings of the target triple (<code class="docutils literal"><span class="pre">UNSUPPORTED</span></code> and <code class="docutils literal"><span class="pre">XFAIL</span></code> only).</li>
+</ul>
+<div class="line-block">
+<div class="line"><code class="docutils literal"><span class="pre">REQUIRES</span></code> enables the test if all expressions are true.</div>
+<div class="line"><code class="docutils literal"><span class="pre">UNSUPPORTED</span></code> disables the test if any expression is true.</div>
+<div class="line"><code class="docutils literal"><span class="pre">XFAIL</span></code> expects the test to fail if any expression is true.</div>
+</div>
+<p>As a special case, <code class="docutils literal"><span class="pre">XFAIL:</span> <span class="pre">*</span></code> is expected to fail everywhere.</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">; This test is disabled on Windows,</span>
+<span class="c">; and is disabled on Linux, except for Android Linux.</span>
+<span class="c">; UNSUPPORTED: windows, linux && !android</span>
+<span class="c">; This test is expected to fail on both PowerPC and ARM.</span>
+<span class="c">; XFAIL: powerpc || arm</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="substitutions">
+<h3><a class="toc-backref" href="#id20">Substitutions</a><a class="headerlink" href="#substitutions" title="Permalink to this headline">Â¶</a></h3>
+<p>Besides replacing LLVM tool names the following substitutions are performed in
+RUN lines:</p>
+<dl class="docutils">
+<dt><code class="docutils literal"><span class="pre">%%</span></code></dt>
+<dd>Replaced by a single <code class="docutils literal"><span class="pre">%</span></code>. This allows escaping other substitutions.</dd>
+<dt><code class="docutils literal"><span class="pre">%s</span></code></dt>
+<dd><p class="first">File path to the test caseâs source. This is suitable for passing on the
+command line as the input to an LLVM tool.</p>
+<p class="last">Example: <code class="docutils literal"><span class="pre">/home/user/llvm/test/MC/ELF/foo_test.s</span></code></p>
+</dd>
+<dt><code class="docutils literal"><span class="pre">%S</span></code></dt>
+<dd><p class="first">Directory path to the test caseâs source.</p>
+<p class="last">Example: <code class="docutils literal"><span class="pre">/home/user/llvm/test/MC/ELF</span></code></p>
+</dd>
+<dt><code class="docutils literal"><span class="pre">%t</span></code></dt>
+<dd><p class="first">File path to a temporary file name that could be used for this test case.
+The file name wonât conflict with other test cases. You can append to it
+if you need multiple temporaries. This is useful as the destination of
+some redirected output.</p>
+<p class="last">Example: <code class="docutils literal"><span class="pre">/home/user/llvm.build/test/MC/ELF/Output/foo_test.s.tmp</span></code></p>
+</dd>
+<dt><code class="docutils literal"><span class="pre">%T</span></code></dt>
+<dd><p class="first">Directory of <code class="docutils literal"><span class="pre">%t</span></code>.</p>
+<p class="last">Example: <code class="docutils literal"><span class="pre">/home/user/llvm.build/test/MC/ELF/Output</span></code></p>
+</dd>
+</dl>
+<p><code class="docutils literal"><span class="pre">%{pathsep}</span></code></p>
+<blockquote>
+<div>Expands to the path separator, i.e. <code class="docutils literal"><span class="pre">:</span></code> (or <code class="docutils literal"><span class="pre">;</span></code> on Windows).</div></blockquote>
+<p><code class="docutils literal"><span class="pre">%/s,</span> <span class="pre">%/S,</span> <span class="pre">%/t,</span> <span class="pre">%/T:</span></code></p>
+<blockquote>
+<div><p>Act like the corresponding substitution above but replace any <code class="docutils literal"><span class="pre">\</span></code>
+character with a <code class="docutils literal"><span class="pre">/</span></code>. This is useful to normalize path separators.</p>
+<blockquote>
+<div><p>Example: <code class="docutils literal"><span class="pre">%s:</span>  <span class="pre">C:\Desktop</span> <span class="pre">Files/foo_test.s.tmp</span></code></p>
+<p>Example: <code class="docutils literal"><span class="pre">%/s:</span> <span class="pre">C:/Desktop</span> <span class="pre">Files/foo_test.s.tmp</span></code></p>
+</div></blockquote>
+</div></blockquote>
+<p><code class="docutils literal"><span class="pre">%:s,</span> <span class="pre">%:S,</span> <span class="pre">%:t,</span> <span class="pre">%:T:</span></code></p>
+<blockquote>
+<div><p>Act like the corresponding substitution above but remove colons at
+the beginning of Windows paths. This is useful to allow concatenation
+of absolute paths on Windows to produce a legal path.</p>
+<blockquote>
+<div><p>Example: <code class="docutils literal"><span class="pre">%s:</span>  <span class="pre">C:\Desktop</span> <span class="pre">Files\foo_test.s.tmp</span></code></p>
+<p>Example: <code class="docutils literal"><span class="pre">%:s:</span> <span class="pre">C\Desktop</span> <span class="pre">Files\foo_test.s.tmp</span></code></p>
+</div></blockquote>
+</div></blockquote>
+<p><strong>LLVM-specific substitutions:</strong></p>
+<dl class="docutils">
+<dt><code class="docutils literal"><span class="pre">%shlibext</span></code></dt>
+<dd><p class="first">The suffix for the host platforms shared library files. This includes the
+period as the first character.</p>
+<p class="last">Example: <code class="docutils literal"><span class="pre">.so</span></code> (Linux), <code class="docutils literal"><span class="pre">.dylib</span></code> (OS X), <code class="docutils literal"><span class="pre">.dll</span></code> (Windows)</p>
+</dd>
+<dt><code class="docutils literal"><span class="pre">%exeext</span></code></dt>
+<dd><p class="first">The suffix for the host platforms executable files. This includes the
+period as the first character.</p>
+<p class="last">Example: <code class="docutils literal"><span class="pre">.exe</span></code> (Windows), empty on Linux.</p>
+</dd>
+<dt><code class="docutils literal"><span class="pre">%(line)</span></code>, <code class="docutils literal"><span class="pre">%(line+<number>)</span></code>, <code class="docutils literal"><span class="pre">%(line-<number>)</span></code></dt>
+<dd>The number of the line where this substitution is used, with an optional
+integer offset. This can be used in tests with multiple RUN lines, which
+reference test fileâs line numbers.</dd>
+</dl>
+<p><strong>Clang-specific substitutions:</strong></p>
+<dl class="docutils">
+<dt><code class="docutils literal"><span class="pre">%clang</span></code></dt>
+<dd>Invokes the Clang driver.</dd>
+<dt><code class="docutils literal"><span class="pre">%clang_cpp</span></code></dt>
+<dd>Invokes the Clang driver for C++.</dd>
+<dt><code class="docutils literal"><span class="pre">%clang_cl</span></code></dt>
+<dd>Invokes the CL-compatible Clang driver.</dd>
+<dt><code class="docutils literal"><span class="pre">%clangxx</span></code></dt>
+<dd>Invokes the G++-compatible Clang driver.</dd>
+<dt><code class="docutils literal"><span class="pre">%clang_cc1</span></code></dt>
+<dd>Invokes the Clang frontend.</dd>
+<dt><code class="docutils literal"><span class="pre">%itanium_abi_triple</span></code>, <code class="docutils literal"><span class="pre">%ms_abi_triple</span></code></dt>
+<dd>These substitutions can be used to get the current target triple adjusted to
+the desired ABI. For example, if the test suite is running with the
+<code class="docutils literal"><span class="pre">i686-pc-win32</span></code> target, <code class="docutils literal"><span class="pre">%itanium_abi_triple</span></code> will expand to
+<code class="docutils literal"><span class="pre">i686-pc-mingw32</span></code>. This allows a test to run with a specific ABI without
+constraining it to a specific triple.</dd>
+</dl>
+<p>To add more substituations, look at <code class="docutils literal"><span class="pre">test/lit.cfg</span></code> or <code class="docutils literal"><span class="pre">lit.local.cfg</span></code>.</p>
+</div>
+<div class="section" id="options">
+<h3><a class="toc-backref" href="#id21">Options</a><a class="headerlink" href="#options" title="Permalink to this headline">Â¶</a></h3>
+<p>The llvm lit configuration allows to customize some things with user options:</p>
+<dl class="docutils">
+<dt><code class="docutils literal"><span class="pre">llc</span></code>, <code class="docutils literal"><span class="pre">opt</span></code>, â¦</dt>
+<dd><p class="first">Substitute the respective llvm tool name with a custom command line. This
+allows to specify custom paths and default arguments for these tools.
+Example:</p>
+<p class="last">% llvm-lit â-Dllc=llc -verify-machineinstrsâ</p>
+</dd>
+<dt><code class="docutils literal"><span class="pre">run_long_tests</span></code></dt>
+<dd>Enable the execution of long running tests.</dd>
+<dt><code class="docutils literal"><span class="pre">llvm_site_config</span></code></dt>
+<dd>Load the specified lit configuration instead of the default one.</dd>
+</dl>
+</div>
+<div class="section" id="other-features">
+<h3><a class="toc-backref" href="#id22">Other Features</a><a class="headerlink" href="#other-features" title="Permalink to this headline">Â¶</a></h3>
+<p>To make RUN line writing easier, there are several helper programs. These
+helpers are in the PATH when running tests, so you can just call them using
+their name. For example:</p>
+<dl class="docutils">
+<dt><code class="docutils literal"><span class="pre">not</span></code></dt>
+<dd>This program runs its arguments and then inverts the result code from it.
+Zero result codes become 1. Non-zero result codes become 0.</dd>
+</dl>
+<p>To make the output more useful, <strong class="program">lit</strong> will scan
+the lines of the test case for ones that contain a pattern that matches
+<code class="docutils literal"><span class="pre">PR[0-9]+</span></code>. This is the syntax for specifying a PR (Problem Report) number
+that is related to the test case. The number after âPRâ specifies the
+LLVM bugzilla number. When a PR number is specified, it will be used in
+the pass/fail reporting. This is useful to quickly get some context when
+a test fails.</p>
+<p>Finally, any line that contains âEND.â will cause the special
+interpretation of lines to terminate. This is generally done right after
+the last RUN: line. This has two side effects:</p>
+<ol class="loweralpha simple">
+<li>it prevents special interpretation of lines that are part of the test
+program, not the instructions to the test case, and</li>
+<li>it speeds things up for really big test cases by avoiding
+interpretation of the remainder of the file.</li>
+</ol>
+</div>
+</div>
+<div class="section" id="test-suite-overview">
+<span id="id3"></span><h2><a class="toc-backref" href="#id23"><code class="docutils literal"><span class="pre">test-suite</span></code> Overview</a><a class="headerlink" href="#test-suite-overview" title="Permalink to this headline">Â¶</a></h2>
+<p>The <code class="docutils literal"><span class="pre">test-suite</span></code> module contains a number of programs that can be
+compiled and executed. The <code class="docutils literal"><span class="pre">test-suite</span></code> includes reference outputs for
+all of the programs, so that the output of the executed program can be
+checked for correctness.</p>
+<p><code class="docutils literal"><span class="pre">test-suite</span></code> tests are divided into three types of tests: MultiSource,
+SingleSource, and External.</p>
+<ul>
+<li><p class="first"><code class="docutils literal"><span class="pre">test-suite/SingleSource</span></code></p>
+<p>The SingleSource directory contains test programs that are only a
+single source file in size. These are usually small benchmark
+programs or small programs that calculate a particular value. Several
+such programs are grouped together in each directory.</p>
+</li>
+<li><p class="first"><code class="docutils literal"><span class="pre">test-suite/MultiSource</span></code></p>
+<p>The MultiSource directory contains subdirectories which contain
+entire programs with multiple source files. Large benchmarks and
+whole applications go here.</p>
+</li>
+<li><p class="first"><code class="docutils literal"><span class="pre">test-suite/External</span></code></p>
+<p>The External directory contains Makefiles for building code that is
+external to (i.e., not distributed with) LLVM. The most prominent
+members of this directory are the SPEC 95 and SPEC 2000 benchmark
+suites. The <code class="docutils literal"><span class="pre">External</span></code> directory does not contain these actual
+tests, but only the Makefiles that know how to properly compile these
+programs from somewhere else. When using <code class="docutils literal"><span class="pre">LNT</span></code>, use the
+<code class="docutils literal"><span class="pre">--test-externals</span></code> option to include these tests in the results.</p>
+</li>
+</ul>
+<div class="section" id="test-suite-quickstart">
+<span id="id4"></span><h3><a class="toc-backref" href="#id24"><code class="docutils literal"><span class="pre">test-suite</span></code> Quickstart</a><a class="headerlink" href="#test-suite-quickstart" title="Permalink to this headline">Â¶</a></h3>
+<p>The modern way of running the <code class="docutils literal"><span class="pre">test-suite</span></code> is focused on testing and
+benchmarking complete compilers using the
+<a class="reference external" href="http://llvm.org/docs/lnt">LNT</a> testing infrastructure.</p>
+<p>For more information on using LNT to execute the <code class="docutils literal"><span class="pre">test-suite</span></code>, please
+see the <a class="reference external" href="http://llvm.org/docs/lnt/quickstart.html">LNT Quickstart</a>
+documentation.</p>
+</div>
+<div class="section" id="test-suite-makefiles">
+<h3><a class="toc-backref" href="#id25"><code class="docutils literal"><span class="pre">test-suite</span></code> Makefiles</a><a class="headerlink" href="#test-suite-makefiles" title="Permalink to this headline">Â¶</a></h3>
+<p>Historically, the <code class="docutils literal"><span class="pre">test-suite</span></code> was executed using a complicated setup
+of Makefiles. The LNT based approach above is recommended for most
+users, but there are some testing scenarios which are not supported by
+the LNT approach. In addition, LNT currently uses the Makefile setup
+under the covers and so developers who are interested in how LNT works
+under the hood may want to understand the Makefile based setup.</p>
+<p>For more information on the <code class="docutils literal"><span class="pre">test-suite</span></code> Makefile setup, please see
+the <a class="reference internal" href="TestSuiteMakefileGuide.html"><span class="doc">Test Suite Makefile Guide</span></a>.</p>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="TestSuiteMakefileGuide.html" title="LLVM test-suite Guide"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="Phabricator.html" title="Code Reviews with Phabricator"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/TypeMetadata.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/TypeMetadata.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/TypeMetadata.html (added)
+++ www-releases/trunk/5.0.1/docs/TypeMetadata.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,386 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>Type Metadata — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="FaultMaps and implicit checks" href="FaultMaps.html" />
+    <link rel="prev" title="MergeFunctions pass, how it works" href="MergeFunctions.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="FaultMaps.html" title="FaultMaps and implicit checks"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="MergeFunctions.html" title="MergeFunctions pass, how it works"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="type-metadata">
+<h1>Type Metadata<a class="headerlink" href="#type-metadata" title="Permalink to this headline">Â¶</a></h1>
+<p>Type metadata is a mechanism that allows IR modules to co-operatively build
+pointer sets corresponding to addresses within a given set of globals. LLVMâs
+<a class="reference external" href="http://clang.llvm.org/docs/ControlFlowIntegrity.html">control flow integrity</a> implementation uses this metadata to efficiently
+check (at each call site) that a given address corresponds to either a
+valid vtable or function pointer for a given class or function type, and its
+whole-program devirtualization pass uses the metadata to identify potential
+callees for a given virtual call.</p>
+<p>To use the mechanism, a client creates metadata nodes with two elements:</p>
+<ol class="arabic simple">
+<li>a byte offset into the global (generally zero for functions)</li>
+<li>a metadata object representing an identifier for the type</li>
+</ol>
+<p>These metadata nodes are associated with globals by using global object
+metadata attachments with the <code class="docutils literal"><span class="pre">!type</span></code> metadata kind.</p>
+<p>Each type identifier must exclusively identify either global variables
+or functions.</p>
+<div class="admonition-limitation admonition">
+<p class="first admonition-title">Limitation</p>
+<p class="last">The current implementation only supports attaching metadata to functions on
+the x86-32 and x86-64 architectures.</p>
+</div>
+<p>An intrinsic, <a class="reference internal" href="LangRef.html#type-test"><span class="std std-ref">llvm.type.test</span></a>, is used to test whether a
+given pointer is associated with a type identifier.</p>
+<div class="section" id="representing-type-information-using-type-metadata">
+<h2>Representing Type Information using Type Metadata<a class="headerlink" href="#representing-type-information-using-type-metadata" title="Permalink to this headline">Â¶</a></h2>
+<p>This section describes how Clang represents C++ type information associated with
+virtual tables using type metadata.</p>
+<p>Consider the following inheritance hierarchy:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">struct</span> <span class="n">A</span> <span class="p">{</span>
+  <span class="k">virtual</span> <span class="kt">void</span> <span class="n">f</span><span class="p">();</span>
+<span class="p">};</span>
+
+<span class="k">struct</span> <span class="nl">B</span> <span class="p">:</span> <span class="n">A</span> <span class="p">{</span>
+  <span class="k">virtual</span> <span class="kt">void</span> <span class="n">f</span><span class="p">();</span>
+  <span class="k">virtual</span> <span class="kt">void</span> <span class="nf">g</span><span class="p">();</span>
+<span class="p">};</span>
+
+<span class="k">struct</span> <span class="n">C</span> <span class="p">{</span>
+  <span class="k">virtual</span> <span class="kt">void</span> <span class="n">h</span><span class="p">();</span>
+<span class="p">};</span>
+
+<span class="k">struct</span> <span class="nl">D</span> <span class="p">:</span> <span class="n">A</span><span class="p">,</span> <span class="n">C</span> <span class="p">{</span>
+  <span class="k">virtual</span> <span class="kt">void</span> <span class="n">f</span><span class="p">();</span>
+  <span class="k">virtual</span> <span class="kt">void</span> <span class="nf">h</span><span class="p">();</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>The virtual table objects for A, B, C and D look like this (under the Itanium ABI):</p>
+<table border="1" class="docutils" id="id1">
+<caption><span class="caption-text">Virtual Table Layout for A, B, C, D</span><a class="headerlink" href="#id1" title="Permalink to this table">Â¶</a></caption>
+<colgroup>
+<col width="13%" />
+<col width="13%" />
+<col width="13%" />
+<col width="13%" />
+<col width="13%" />
+<col width="13%" />
+<col width="13%" />
+<col width="13%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Class</th>
+<th class="head">0</th>
+<th class="head">1</th>
+<th class="head">2</th>
+<th class="head">3</th>
+<th class="head">4</th>
+<th class="head">5</th>
+<th class="head">6</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td>A</td>
+<td>A::offset-to-top</td>
+<td>&A::rtti</td>
+<td>&A::f</td>
+<td> </td>
+<td> </td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="row-odd"><td>B</td>
+<td>B::offset-to-top</td>
+<td>&B::rtti</td>
+<td>&B::f</td>
+<td>&B::g</td>
+<td> </td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="row-even"><td>C</td>
+<td>C::offset-to-top</td>
+<td>&C::rtti</td>
+<td>&C::h</td>
+<td> </td>
+<td> </td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="row-odd"><td>D</td>
+<td>D::offset-to-top</td>
+<td>&D::rtti</td>
+<td>&D::f</td>
+<td>&D::h</td>
+<td>D::offset-to-top</td>
+<td>&D::rtti</td>
+<td>thunk for &D::h</td>
+</tr>
+</tbody>
+</table>
+<p>When an object of type A is constructed, the address of <code class="docutils literal"><span class="pre">&A::f</span></code> in Aâs
+virtual table object is stored in the objectâs vtable pointer.  In ABI parlance
+this address is known as an <a class="reference external" href="https://mentorembedded.github.io/cxx-abi/abi.html#vtable-general">address point</a>. Similarly, when an object of type
+B is constructed, the address of <code class="docutils literal"><span class="pre">&B::f</span></code> is stored in the vtable pointer. In
+this way, the vtable in Bâs virtual table object is compatible with Aâs vtable.</p>
+<p>D is a little more complicated, due to the use of multiple inheritance. Its
+virtual table object contains two vtables, one compatible with Aâs vtable and
+the other compatible with Câs vtable. Objects of type D contain two virtual
+pointers, one belonging to the A subobject and containing the address of
+the vtable compatible with Aâs vtable, and the other belonging to the C
+subobject and containing the address of the vtable compatible with Câs vtable.</p>
+<p>The full set of compatibility information for the above class hierarchy is
+shown below. The following table shows the name of a class, the offset of an
+address point within that classâs vtable and the name of one of the classes
+with which that address point is compatible.</p>
+<table border="1" class="docutils" id="id2">
+<caption><span class="caption-text">Type Offsets for A, B, C, D</span><a class="headerlink" href="#id2" title="Permalink to this table">Â¶</a></caption>
+<colgroup>
+<col width="33%" />
+<col width="33%" />
+<col width="33%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">VTable for</th>
+<th class="head">Offset</th>
+<th class="head">Compatible Class</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td>A</td>
+<td>16</td>
+<td>A</td>
+</tr>
+<tr class="row-odd"><td>B</td>
+<td>16</td>
+<td>A</td>
+</tr>
+<tr class="row-even"><td> </td>
+<td> </td>
+<td>B</td>
+</tr>
+<tr class="row-odd"><td>C</td>
+<td>16</td>
+<td>C</td>
+</tr>
+<tr class="row-even"><td>D</td>
+<td>16</td>
+<td>A</td>
+</tr>
+<tr class="row-odd"><td> </td>
+<td> </td>
+<td>D</td>
+</tr>
+<tr class="row-even"><td> </td>
+<td>48</td>
+<td>C</td>
+</tr>
+</tbody>
+</table>
+<p>The next step is to encode this compatibility information into the IR. The way
+this is done is to create type metadata named after each of the compatible
+classes, with which we associate each of the compatible address points in
+each vtable. For example, these type metadata entries encode the compatibility
+information for the above hierarchy:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>@_ZTV1A = constant [...], !type !0
+ at _ZTV1B = constant [...], !type !0, !type !1
+ at _ZTV1C = constant [...], !type !2
+ at _ZTV1D = constant [...], !type !0, !type !3, !type !4
+
+!0 = !{i64 16, !"_ZTS1A"}
+!1 = !{i64 16, !"_ZTS1B"}
+!2 = !{i64 16, !"_ZTS1C"}
+!3 = !{i64 16, !"_ZTS1D"}
+!4 = !{i64 48, !"_ZTS1C"}
+</pre></div>
+</div>
+<p>With this type metadata, we can now use the <code class="docutils literal"><span class="pre">llvm.type.test</span></code> intrinsic to
+test whether a given pointer is compatible with a type identifier. Working
+backwards, if <code class="docutils literal"><span class="pre">llvm.type.test</span></code> returns true for a particular pointer,
+we can also statically determine the identities of the virtual functions
+that a particular virtual call may call. For example, if a program assumes
+a pointer to be a member of <code class="docutils literal"><span class="pre">!"_ZST1A"</span></code>, we know that the address can
+be only be one of <code class="docutils literal"><span class="pre">_ZTV1A+16</span></code>, <code class="docutils literal"><span class="pre">_ZTV1B+16</span></code> or <code class="docutils literal"><span class="pre">_ZTV1D+16</span></code> (i.e. the
+address points of the vtables of A, B and D respectively). If we then load
+an address from that pointer, we know that the address can only be one of
+<code class="docutils literal"><span class="pre">&A::f</span></code>, <code class="docutils literal"><span class="pre">&B::f</span></code> or <code class="docutils literal"><span class="pre">&D::f</span></code>.</p>
+</div>
+<div class="section" id="testing-addresses-for-type-membership">
+<h2>Testing Addresses For Type Membership<a class="headerlink" href="#testing-addresses-for-type-membership" title="Permalink to this headline">Â¶</a></h2>
+<p>If a program tests an address using <code class="docutils literal"><span class="pre">llvm.type.test</span></code>, this will cause
+a link-time optimization pass, <code class="docutils literal"><span class="pre">LowerTypeTests</span></code>, to replace calls to this
+intrinsic with efficient code to perform type member tests. At a high level,
+the pass will lay out referenced globals in a consecutive memory region in
+the object file, construct bit vectors that map onto that memory region,
+and generate code at each of the <code class="docutils literal"><span class="pre">llvm.type.test</span></code> call sites to test
+pointers against those bit vectors. Because of the layout manipulation, the
+globalsâ definitions must be available at LTO time. For more information,
+see the <a class="reference external" href="http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html">control flow integrity design document</a>.</p>
+<p>A type identifier that identifies functions is transformed into a jump table,
+which is a block of code consisting of one branch instruction for each
+of the functions associated with the type identifier that branches to the
+target function. The pass will redirect any taken function addresses to the
+corresponding jump table entry. In the object fileâs symbol table, the jump
+table entries take the identities of the original functions, so that addresses
+taken outside the module will pass any verification done inside the module.</p>
+<p>Jump tables may call external functions, so their definitions need not
+be available at LTO time. Note that if an externally defined function is
+associated with a type identifier, there is no guarantee that its identity
+within the module will be the same as its identity outside of the module,
+as the former will be the jump table entry if a jump table is necessary.</p>
+<p>The <a class="reference external" href="http://llvm.org/klaus/llvm/blob/master/include/llvm/Transforms/IPO/LowerTypeTests.h">GlobalLayoutBuilder</a> class is responsible for laying out the globals
+efficiently to minimize the sizes of the underlying bitsets.</p>
+<table class="docutils field-list" frame="void" rules="none">
+<col class="field-name" />
+<col class="field-body" />
+<tbody valign="top">
+<tr class="field-odd field"><th class="field-name">Example:</th><td class="field-body"></td>
+</tr>
+</tbody>
+</table>
+<div class="highlight-default"><div class="highlight"><pre><span></span>target datalayout = "e-p:32:32"
+
+ at a = internal global i32 0, !type !0
+ at b = internal global i32 0, !type !0, !type !1
+ at c = internal global i32 0, !type !1
+ at d = internal global [2 x i32] [i32 0, i32 0], !type !2
+
+define void @e() !type !3 {
+  ret void
+}
+
+define void @f() {
+  ret void
+}
+
+declare void @g() !type !3
+
+!0 = !{i32 0, !"typeid1"}
+!1 = !{i32 0, !"typeid2"}
+!2 = !{i32 4, !"typeid2"}
+!3 = !{i32 0, !"typeid3"}
+
+declare i1 @llvm.type.test(i8* %ptr, metadata %typeid) nounwind readnone
+
+define i1 @foo(i32* %p) {
+  %pi8 = bitcast i32* %p to i8*
+  %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid1")
+  ret i1 %x
+}
+
+define i1 @bar(i32* %p) {
+  %pi8 = bitcast i32* %p to i8*
+  %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid2")
+  ret i1 %x
+}
+
+define i1 @baz(void ()* %p) {
+  %pi8 = bitcast void ()* %p to i8*
+  %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid3")
+  ret i1 %x
+}
+
+define void @main() {
+  %a1 = call i1 @foo(i32* @a) ; returns 1
+  %b1 = call i1 @foo(i32* @b) ; returns 1
+  %c1 = call i1 @foo(i32* @c) ; returns 0
+  %a2 = call i1 @bar(i32* @a) ; returns 0
+  %b2 = call i1 @bar(i32* @b) ; returns 1
+  %c2 = call i1 @bar(i32* @c) ; returns 1
+  %d02 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 0)) ; returns 0
+  %d12 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 1)) ; returns 1
+  %e = call i1 @baz(void ()* @e) ; returns 1
+  %f = call i1 @baz(void ()* @f) ; returns 0
+  %g = call i1 @baz(void ()* @g) ; returns 1
+  ret void
+}
+</pre></div>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="FaultMaps.html" title="FaultMaps and implicit checks"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="MergeFunctions.html" title="MergeFunctions pass, how it works"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/Vectorizers.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/Vectorizers.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/Vectorizers.html (added)
+++ www-releases/trunk/5.0.1/docs/Vectorizers.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,502 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>Auto-Vectorization in LLVM — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="Vectorization Plan" href="Proposals/VectorizationPlan.html" />
+    <link rel="prev" title="Source Level Debugging with LLVM" href="SourceLevelDebugging.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="Proposals/VectorizationPlan.html" title="Vectorization Plan"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="SourceLevelDebugging.html" title="Source Level Debugging with LLVM"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="auto-vectorization-in-llvm">
+<h1>Auto-Vectorization in LLVM<a class="headerlink" href="#auto-vectorization-in-llvm" title="Permalink to this headline">Â¶</a></h1>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#the-loop-vectorizer" id="id2">The Loop Vectorizer</a><ul>
+<li><a class="reference internal" href="#usage" id="id3">Usage</a><ul>
+<li><a class="reference internal" href="#command-line-flags" id="id4">Command line flags</a></li>
+<li><a class="reference internal" href="#pragma-loop-hint-directives" id="id5">Pragma loop hint directives</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#diagnostics" id="id6">Diagnostics</a></li>
+<li><a class="reference internal" href="#features" id="id7">Features</a><ul>
+<li><a class="reference internal" href="#loops-with-unknown-trip-count" id="id8">Loops with unknown trip count</a></li>
+<li><a class="reference internal" href="#runtime-checks-of-pointers" id="id9">Runtime Checks of Pointers</a></li>
+<li><a class="reference internal" href="#reductions" id="id10">Reductions</a></li>
+<li><a class="reference internal" href="#inductions" id="id11">Inductions</a></li>
+<li><a class="reference internal" href="#if-conversion" id="id12">If Conversion</a></li>
+<li><a class="reference internal" href="#pointer-induction-variables" id="id13">Pointer Induction Variables</a></li>
+<li><a class="reference internal" href="#reverse-iterators" id="id14">Reverse Iterators</a></li>
+<li><a class="reference internal" href="#scatter-gather" id="id15">Scatter / Gather</a></li>
+<li><a class="reference internal" href="#vectorization-of-mixed-types" id="id16">Vectorization of Mixed Types</a></li>
+<li><a class="reference internal" href="#global-structures-alias-analysis" id="id17">Global Structures Alias Analysis</a></li>
+<li><a class="reference internal" href="#vectorization-of-function-calls" id="id18">Vectorization of function calls</a></li>
+<li><a class="reference internal" href="#partial-unrolling-during-vectorization" id="id19">Partial unrolling during vectorization</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#performance" id="id20">Performance</a></li>
+<li><a class="reference internal" href="#ongoing-development-directions" id="id21">Ongoing Development Directions</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-slp-vectorizer" id="id22">The SLP Vectorizer</a><ul>
+<li><a class="reference internal" href="#details" id="id23">Details</a></li>
+<li><a class="reference internal" href="#id1" id="id24">Usage</a></li>
+</ul>
+</li>
+</ul>
+</div>
+<p>LLVM has two vectorizers: The <a class="reference internal" href="#loop-vectorizer"><span class="std std-ref">Loop Vectorizer</span></a>,
+which operates on Loops, and the <a class="reference internal" href="#slp-vectorizer"><span class="std std-ref">SLP Vectorizer</span></a>. These vectorizers
+focus on different optimization opportunities and use different techniques.
+The SLP vectorizer merges multiple scalars that are found in the code into
+vectors while the Loop Vectorizer widens instructions in loops
+to operate on multiple consecutive iterations.</p>
+<p>Both the Loop Vectorizer and the SLP Vectorizer are enabled by default.</p>
+<div class="section" id="the-loop-vectorizer">
+<span id="loop-vectorizer"></span><h2><a class="toc-backref" href="#id2">The Loop Vectorizer</a><a class="headerlink" href="#the-loop-vectorizer" title="Permalink to this headline">Â¶</a></h2>
+<div class="section" id="usage">
+<h3><a class="toc-backref" href="#id3">Usage</a><a class="headerlink" href="#usage" title="Permalink to this headline">Â¶</a></h3>
+<p>The Loop Vectorizer is enabled by default, but it can be disabled
+through clang using the command line flag:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> clang ... -fno-vectorize  file.c
+</pre></div>
+</div>
+<div class="section" id="command-line-flags">
+<h4><a class="toc-backref" href="#id4">Command line flags</a><a class="headerlink" href="#command-line-flags" title="Permalink to this headline">Â¶</a></h4>
+<p>The loop vectorizer uses a cost model to decide on the optimal vectorization factor
+and unroll factor. However, users of the vectorizer can force the vectorizer to use
+specific values. Both âclangâ and âoptâ support the flags below.</p>
+<p>Users can control the vectorization SIMD width using the command line flag â-force-vector-widthâ.</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> clang  -mllvm -force-vector-width<span class="o">=</span><span class="m">8</span> ...
+<span class="gp">$</span> opt -loop-vectorize -force-vector-width<span class="o">=</span><span class="m">8</span> ...
+</pre></div>
+</div>
+<p>Users can control the unroll factor using the command line flag â-force-vector-interleaveâ</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> clang  -mllvm -force-vector-interleave<span class="o">=</span><span class="m">2</span> ...
+<span class="gp">$</span> opt -loop-vectorize -force-vector-interleave<span class="o">=</span><span class="m">2</span> ...
+</pre></div>
+</div>
+</div>
+<div class="section" id="pragma-loop-hint-directives">
+<h4><a class="toc-backref" href="#id5">Pragma loop hint directives</a><a class="headerlink" href="#pragma-loop-hint-directives" title="Permalink to this headline">Â¶</a></h4>
+<p>The <code class="docutils literal"><span class="pre">#pragma</span> <span class="pre">clang</span> <span class="pre">loop</span></code> directive allows loop vectorization hints to be
+specified for the subsequent for, while, do-while, or c++11 range-based for
+loop. The directive allows vectorization and interleaving to be enabled or
+disabled. Vector width as well as interleave count can also be manually
+specified. The following example explicitly enables vectorization and
+interleaving:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#pragma clang loop vectorize(enable) interleave(enable)</span>
+<span class="k">while</span><span class="p">(...)</span> <span class="p">{</span>
+  <span class="p">...</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The following example implicitly enables vectorization and interleaving by
+specifying a vector width and interleaving count:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#pragma clang loop vectorize_width(2) interleave_count(2)</span>
+<span class="k">for</span><span class="p">(...)</span> <span class="p">{</span>
+  <span class="p">...</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>See the Clang
+<a class="reference external" href="http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations">language extensions</a>
+for details.</p>
+</div>
+</div>
+<div class="section" id="diagnostics">
+<h3><a class="toc-backref" href="#id6">Diagnostics</a><a class="headerlink" href="#diagnostics" title="Permalink to this headline">Â¶</a></h3>
+<p>Many loops cannot be vectorized including loops with complicated control flow,
+unvectorizable types, and unvectorizable calls. The loop vectorizer generates
+optimization remarks which can be queried using command line options to identify
+and diagnose loops that are skipped by the loop-vectorizer.</p>
+<p>Optimization remarks are enabled using:</p>
+<p><code class="docutils literal"><span class="pre">-Rpass=loop-vectorize</span></code> identifies loops that were successfully vectorized.</p>
+<p><code class="docutils literal"><span class="pre">-Rpass-missed=loop-vectorize</span></code> identifies loops that failed vectorization and
+indicates if vectorization was specified.</p>
+<p><code class="docutils literal"><span class="pre">-Rpass-analysis=loop-vectorize</span></code> identifies the statements that caused
+vectorization to fail. If in addition <code class="docutils literal"><span class="pre">-fsave-optimization-record</span></code> is
+provided, multiple causes of vectorization failure may be listed (this behavior
+might change in the future).</p>
+<p>Consider the following loop:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#pragma clang loop vectorize(enable)</span>
+<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">Length</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">switch</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="p">{</span>
+  <span class="k">case</span> <span class="mi">0</span><span class="o">:</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span><span class="o">*</span><span class="mi">2</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span>
+  <span class="k">case</span> <span class="mi">1</span><span class="o">:</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>   <span class="k">break</span><span class="p">;</span>
+  <span class="k">default</span><span class="o">:</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+  <span class="p">}</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The command line <code class="docutils literal"><span class="pre">-Rpass-missed=loop-vectorized</span></code> prints the remark:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="go">no_switch.cpp:4:5: remark: loop not vectorized: vectorization is explicitly enabled [-Rpass-missed=loop-vectorize]</span>
+</pre></div>
+</div>
+<p>And the command line <code class="docutils literal"><span class="pre">-Rpass-analysis=loop-vectorize</span></code> indicates that the
+switch statement cannot be vectorized.</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="go">no_switch.cpp:4:5: remark: loop not vectorized: loop contains a switch statement [-Rpass-analysis=loop-vectorize]</span>
+<span class="go">  switch(A[i]) {</span>
+<span class="go">  ^</span>
+</pre></div>
+</div>
+<p>To ensure line and column numbers are produced include the command line options
+<code class="docutils literal"><span class="pre">-gline-tables-only</span></code> and <code class="docutils literal"><span class="pre">-gcolumn-info</span></code>. See the Clang <a class="reference external" href="http://clang.llvm.org/docs/UsersManual.html#options-to-emit-optimization-reports">user manual</a>
+for details</p>
+</div>
+<div class="section" id="features">
+<h3><a class="toc-backref" href="#id7">Features</a><a class="headerlink" href="#features" title="Permalink to this headline">Â¶</a></h3>
+<p>The LLVM Loop Vectorizer has a number of features that allow it to vectorize
+complex loops.</p>
+<div class="section" id="loops-with-unknown-trip-count">
+<h4><a class="toc-backref" href="#id8">Loops with unknown trip count</a><a class="headerlink" href="#loops-with-unknown-trip-count" title="Permalink to this headline">Â¶</a></h4>
+<p>The Loop Vectorizer supports loops with an unknown trip count.
+In the loop below, the iteration <code class="docutils literal"><span class="pre">start</span></code> and <code class="docutils literal"><span class="pre">finish</span></code> points are unknown,
+and the Loop Vectorizer has a mechanism to vectorize loops that do not start
+at zero. In this example, ânâ may not be a multiple of the vector width, and
+the vectorizer has to execute the last few iterations as scalar code. Keeping
+a scalar copy of the loop increases the code size.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">void</span> <span class="nf">bar</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">float</span><span class="o">*</span> <span class="n">B</span><span class="p">,</span> <span class="kt">float</span> <span class="n">K</span><span class="p">,</span> <span class="kt">int</span> <span class="n">start</span><span class="p">,</span> <span class="kt">int</span> <span class="n">end</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">start</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">end</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">*=</span> <span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">K</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="runtime-checks-of-pointers">
+<h4><a class="toc-backref" href="#id9">Runtime Checks of Pointers</a><a class="headerlink" href="#runtime-checks-of-pointers" title="Permalink to this headline">Â¶</a></h4>
+<p>In the example below, if the pointers A and B point to consecutive addresses,
+then it is illegal to vectorize the code because some elements of A will be
+written before they are read from array B.</p>
+<p>Some programmers use the ârestrictâ keyword to notify the compiler that the
+pointers are disjointed, but in our example, the Loop Vectorizer has no way of
+knowing that the pointers A and B are unique. The Loop Vectorizer handles this
+loop by placing code that checks, at runtime, if the arrays A and B point to
+disjointed memory locations. If arrays A and B overlap, then the scalar version
+of the loop is executed.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">void</span> <span class="nf">bar</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">float</span><span class="o">*</span> <span class="n">B</span><span class="p">,</span> <span class="kt">float</span> <span class="n">K</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">*=</span> <span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">K</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="reductions">
+<h4><a class="toc-backref" href="#id10">Reductions</a><a class="headerlink" href="#reductions" title="Permalink to this headline">Â¶</a></h4>
+<p>In this example the <code class="docutils literal"><span class="pre">sum</span></code> variable is used by consecutive iterations of
+the loop. Normally, this would prevent vectorization, but the vectorizer can
+detect that âsumâ is a reduction variable. The variable âsumâ becomes a vector
+of integers, and at the end of the loop the elements of the array are added
+together to create the correct result. We support a number of different
+reduction operations, such as addition, multiplication, XOR, AND and OR.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">B</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="kt">unsigned</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">sum</span> <span class="o">+=</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="mi">5</span><span class="p">;</span>
+  <span class="k">return</span> <span class="n">sum</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>We support floating point reduction operations when <cite>-ffast-math</cite> is used.</p>
+</div>
+<div class="section" id="inductions">
+<h4><a class="toc-backref" href="#id11">Inductions</a><a class="headerlink" href="#inductions" title="Permalink to this headline">Â¶</a></h4>
+<p>In this example the value of the induction variable <code class="docutils literal"><span class="pre">i</span></code> is saved into an
+array. The Loop Vectorizer knows to vectorize induction variables.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">void</span> <span class="nf">bar</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">float</span><span class="o">*</span> <span class="n">B</span><span class="p">,</span> <span class="kt">float</span> <span class="n">K</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="if-conversion">
+<h4><a class="toc-backref" href="#id12">If Conversion</a><a class="headerlink" href="#if-conversion" title="Permalink to this headline">Â¶</a></h4>
+<p>The Loop Vectorizer is able to âflattenâ the IF statement in the code and
+generate a single stream of instructions. The Loop Vectorizer supports any
+control flow in the innermost loop. The innermost loop may contain complex
+nesting of IFs, ELSEs and even GOTOs.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">B</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="kt">unsigned</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="k">if</span> <span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">></span> <span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
+      <span class="n">sum</span> <span class="o">+=</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="mi">5</span><span class="p">;</span>
+  <span class="k">return</span> <span class="n">sum</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="pointer-induction-variables">
+<h4><a class="toc-backref" href="#id13">Pointer Induction Variables</a><a class="headerlink" href="#pointer-induction-variables" title="Permalink to this headline">Â¶</a></h4>
+<p>This example uses the âaccumulateâ function of the standard c++ library. This
+loop uses C++ iterators, which are pointers, and not integer indices.
+The Loop Vectorizer detects pointer induction variables and can vectorize
+this loop. This feature is important because many C++ programs use iterators.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">baz</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">accumulate</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">A</span> <span class="o">+</span> <span class="n">n</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="reverse-iterators">
+<h4><a class="toc-backref" href="#id14">Reverse Iterators</a><a class="headerlink" href="#reverse-iterators" title="Permalink to this headline">Â¶</a></h4>
+<p>The Loop Vectorizer can vectorize loops that count backwards.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">B</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span> <span class="o">></span> <span class="mi">0</span><span class="p">;</span> <span class="o">--</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+=</span><span class="mi">1</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="scatter-gather">
+<h4><a class="toc-backref" href="#id15">Scatter / Gather</a><a class="headerlink" href="#scatter-gather" title="Permalink to this headline">Â¶</a></h4>
+<p>The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions
+that scatter/gathers memory.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span> <span class="n">A</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span> <span class="n">B</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">intptr_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+      <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+=</span> <span class="n">B</span><span class="p">[</span><span class="n">i</span> <span class="o">*</span> <span class="mi">4</span><span class="p">];</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>In many situations the cost model will inform LLVM that this is not beneficial
+and LLVM will only vectorize such code if forced with â-mllvm -force-vector-width=#â.</p>
+</div>
+<div class="section" id="vectorization-of-mixed-types">
+<h4><a class="toc-backref" href="#id16">Vectorization of Mixed Types</a><a class="headerlink" href="#vectorization-of-mixed-types" title="Permalink to this headline">Â¶</a></h4>
+<p>The Loop Vectorizer can vectorize programs with mixed types. The Vectorizer
+cost model can estimate the cost of the type conversion and decide if
+vectorization is profitable.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">B</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">,</span> <span class="kt">int</span> <span class="n">k</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">4</span> <span class="o">*</span> <span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="global-structures-alias-analysis">
+<h4><a class="toc-backref" href="#id17">Global Structures Alias Analysis</a><a class="headerlink" href="#global-structures-alias-analysis" title="Permalink to this headline">Â¶</a></h4>
+<p>Access to global structures can also be vectorized, with alias analysis being
+used to make sure accesses donât alias. Run-time checks can also be added on
+pointer access to structure members.</p>
+<p>Many variations are supported, but some that rely on undefined behaviour being
+ignored (as other compilers do) are still being left un-vectorized.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">struct</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">A</span><span class="p">[</span><span class="mi">100</span><span class="p">],</span> <span class="n">K</span><span class="p">,</span> <span class="n">B</span><span class="p">[</span><span class="mi">100</span><span class="p">];</span> <span class="p">}</span> <span class="n">Foo</span><span class="p">;</span>
+
+<span class="kt">int</span> <span class="nf">foo</span><span class="p">()</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">100</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">Foo</span><span class="p">.</span><span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">Foo</span><span class="p">.</span><span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="mi">100</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="vectorization-of-function-calls">
+<h4><a class="toc-backref" href="#id18">Vectorization of function calls</a><a class="headerlink" href="#vectorization-of-function-calls" title="Permalink to this headline">Â¶</a></h4>
+<p>The Loop Vectorize can vectorize intrinsic math functions.
+See the table below for a list of these functions.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="26%" />
+<col width="26%" />
+<col width="47%" />
+</colgroup>
+<tbody valign="top">
+<tr class="row-odd"><td>pow</td>
+<td>exp</td>
+<td>exp2</td>
+</tr>
+<tr class="row-even"><td>sin</td>
+<td>cos</td>
+<td>sqrt</td>
+</tr>
+<tr class="row-odd"><td>log</td>
+<td>log2</td>
+<td>log10</td>
+</tr>
+<tr class="row-even"><td>fabs</td>
+<td>floor</td>
+<td>ceil</td>
+</tr>
+<tr class="row-odd"><td>fma</td>
+<td>trunc</td>
+<td>nearbyint</td>
+</tr>
+<tr class="row-even"><td> </td>
+<td> </td>
+<td>fmuladd</td>
+</tr>
+</tbody>
+</table>
+<p>The loop vectorizer knows about special instructions on the target and will
+vectorize a loop containing a function call that maps to the instructions. For
+example, the loop below will be vectorized on Intel x86 if the SSE4.1 roundps
+instruction is available.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">void</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">f</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">!=</span> <span class="mi">1024</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">f</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">floorf</span><span class="p">(</span><span class="n">f</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="partial-unrolling-during-vectorization">
+<h4><a class="toc-backref" href="#id19">Partial unrolling during vectorization</a><a class="headerlink" href="#partial-unrolling-during-vectorization" title="Permalink to this headline">Â¶</a></h4>
+<p>Modern processors feature multiple execution units, and only programs that contain a
+high degree of parallelism can fully utilize the entire width of the machine.
+The Loop Vectorizer increases the instruction level parallelism (ILP) by
+performing partial-unrolling of loops.</p>
+<p>In the example below the entire array is accumulated into the variable âsumâ.
+This is inefficient because only a single execution port can be used by the processor.
+By unrolling the code the Loop Vectorizer allows two or more execution ports
+to be used simultaneously.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">B</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="kt">unsigned</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+      <span class="n">sum</span> <span class="o">+=</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
+  <span class="k">return</span> <span class="n">sum</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The Loop Vectorizer uses a cost model to decide when it is profitable to unroll loops.
+The decision to unroll the loop depends on the register pressure and the generated code size.</p>
+</div>
+</div>
+<div class="section" id="performance">
+<h3><a class="toc-backref" href="#id20">Performance</a><a class="headerlink" href="#performance" title="Permalink to this headline">Â¶</a></h3>
+<p>This section shows the execution time of Clang on a simple benchmark:
+<a class="reference external" href="http://llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/UnitTests/Vectorizer/">gcc-loops</a>.
+This benchmarks is a collection of loops from the GCC autovectorization
+<a class="reference external" href="http://gcc.gnu.org/projects/tree-ssa/vectorization.html">page</a> by Dorit Nuzman.</p>
+<p>The chart below compares GCC-4.7, ICC-13, and Clang-SVN with and without loop vectorization at -O3, tuned for âcorei7-avxâ, running on a Sandybridge iMac.
+The Y-axis shows the time in msec. Lower is better. The last column shows the geomean of all the kernels.</p>
+<img alt="_images/gcc-loops.png" src="_images/gcc-loops.png" />
+<p>And Linpack-pc with the same configuration. Result is Mflops, higher is better.</p>
+<img alt="_images/linpack-pc.png" src="_images/linpack-pc.png" />
+</div>
+<div class="section" id="ongoing-development-directions">
+<h3><a class="toc-backref" href="#id21">Ongoing Development Directions</a><a class="headerlink" href="#ongoing-development-directions" title="Permalink to this headline">Â¶</a></h3>
+<div class="toctree-wrapper compound">
+</div>
+<dl class="docutils">
+<dt><a class="reference internal" href="Proposals/VectorizationPlan.html"><span class="doc">Vectorization Plan</span></a></dt>
+<dd>Modeling the process and upgrading the infrastructure of LLVMâs Loop Vectorizer.</dd>
+</dl>
+</div>
+</div>
+<div class="section" id="the-slp-vectorizer">
+<span id="slp-vectorizer"></span><h2><a class="toc-backref" href="#id22">The SLP Vectorizer</a><a class="headerlink" href="#the-slp-vectorizer" title="Permalink to this headline">Â¶</a></h2>
+<div class="section" id="details">
+<h3><a class="toc-backref" href="#id23">Details</a><a class="headerlink" href="#details" title="Permalink to this headline">Â¶</a></h3>
+<p>The goal of SLP vectorization (a.k.a. superword-level parallelism) is
+to combine similar independent instructions
+into vector instructions. Memory accesses, arithmetic operations, comparison
+operations, PHI-nodes, can all be vectorized using this technique.</p>
+<p>For example, the following function performs very similar operations on its
+inputs (a1, b1) and (a2, b2). The basic-block vectorizer may combine these
+into vector operations.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">void</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">int</span> <span class="n">a2</span><span class="p">,</span> <span class="kt">int</span> <span class="n">b1</span><span class="p">,</span> <span class="kt">int</span> <span class="n">b2</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">A</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">a1</span><span class="o">*</span><span class="p">(</span><span class="n">a1</span> <span class="o">+</span> <span class="n">b1</span><span class="p">)</span><span class="o">/</span><span class="n">b1</span> <span class="o">+</span> <span class="mi">50</span><span class="o">*</span><span class="n">b1</span><span class="o">/</span><span class="n">a1</span><span class="p">;</span>
+  <span class="n">A</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">a2</span><span class="o">*</span><span class="p">(</span><span class="n">a2</span> <span class="o">+</span> <span class="n">b2</span><span class="p">)</span><span class="o">/</span><span class="n">b2</span> <span class="o">+</span> <span class="mi">50</span><span class="o">*</span><span class="n">b2</span><span class="o">/</span><span class="n">a2</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The SLP-vectorizer processes the code bottom-up, across basic blocks, in search of scalars to combine.</p>
+</div>
+<div class="section" id="id1">
+<h3><a class="toc-backref" href="#id24">Usage</a><a class="headerlink" href="#id1" title="Permalink to this headline">Â¶</a></h3>
+<p>The SLP Vectorizer is enabled by default, but it can be disabled
+through clang using the command line flag:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> clang -fno-slp-vectorize file.c
+</pre></div>
+</div>
+<p>LLVM has a second basic block vectorization phase
+which is more compile-time intensive (The BB vectorizer). This optimization
+can be enabled through clang using the command line flag:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> clang -fslp-vectorize-aggressive file.c
+</pre></div>
+</div>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="Proposals/VectorizationPlan.html" title="Vectorization Plan"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="SourceLevelDebugging.html" title="Source Level Debugging with LLVM"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/WritingAnLLVMBackend.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/WritingAnLLVMBackend.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/WritingAnLLVMBackend.html (added)
+++ www-releases/trunk/5.0.1/docs/WritingAnLLVMBackend.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,1821 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>Writing an LLVM Backend — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="How To Use Instruction Mappings" href="HowToUseInstrMappings.html" />
+    <link rel="prev" title="Vectorization Plan" href="Proposals/VectorizationPlan.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="HowToUseInstrMappings.html" title="How To Use Instruction Mappings"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="Proposals/VectorizationPlan.html" title="Vectorization Plan"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="writing-an-llvm-backend">
+<h1>Writing an LLVM Backend<a class="headerlink" href="#writing-an-llvm-backend" title="Permalink to this headline">Â¶</a></h1>
+<div class="toctree-wrapper compound">
+</div>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#introduction" id="id3">Introduction</a><ul>
+<li><a class="reference internal" href="#audience" id="id4">Audience</a></li>
+<li><a class="reference internal" href="#prerequisite-reading" id="id5">Prerequisite Reading</a></li>
+<li><a class="reference internal" href="#basic-steps" id="id6">Basic Steps</a></li>
+<li><a class="reference internal" href="#preliminaries" id="id7">Preliminaries</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#target-machine" id="id8">Target Machine</a></li>
+<li><a class="reference internal" href="#target-registration" id="id9">Target Registration</a></li>
+<li><a class="reference internal" href="#register-set-and-register-classes" id="id10">Register Set and Register Classes</a><ul>
+<li><a class="reference internal" href="#defining-a-register" id="id11">Defining a Register</a></li>
+<li><a class="reference internal" href="#defining-a-register-class" id="id12">Defining a Register Class</a></li>
+<li><a class="reference internal" href="#implement-a-subclass-of-targetregisterinfo" id="id13">Implement a subclass of <code class="docutils literal"><span class="pre">TargetRegisterInfo</span></code></a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#instruction-set" id="id14">Instruction Set</a><ul>
+<li><a class="reference internal" href="#instruction-operand-mapping" id="id15">Instruction Operand Mapping</a><ul>
+<li><a class="reference internal" href="#instruction-operand-name-mapping" id="id16">Instruction Operand Name Mapping</a></li>
+<li><a class="reference internal" href="#instruction-operand-types" id="id17">Instruction Operand Types</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#instruction-scheduling" id="id18">Instruction Scheduling</a></li>
+<li><a class="reference internal" href="#instruction-relation-mapping" id="id19">Instruction Relation Mapping</a></li>
+<li><a class="reference internal" href="#implement-a-subclass-of-targetinstrinfo" id="id20">Implement a subclass of <code class="docutils literal"><span class="pre">TargetInstrInfo</span></code></a></li>
+<li><a class="reference internal" href="#branch-folding-and-if-conversion" id="id21">Branch Folding and If Conversion</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#instruction-selector" id="id22">Instruction Selector</a><ul>
+<li><a class="reference internal" href="#the-selectiondag-legalize-phase" id="id23">The SelectionDAG Legalize Phase</a><ul>
+<li><a class="reference internal" href="#promote" id="id24">Promote</a></li>
+<li><a class="reference internal" href="#expand" id="id25">Expand</a></li>
+<li><a class="reference internal" href="#custom" id="id26">Custom</a></li>
+<li><a class="reference internal" href="#legal" id="id27">Legal</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#calling-conventions" id="id28">Calling Conventions</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#assembly-printer" id="id29">Assembly Printer</a></li>
+<li><a class="reference internal" href="#subtarget-support" id="id30">Subtarget Support</a></li>
+<li><a class="reference internal" href="#jit-support" id="id31">JIT Support</a><ul>
+<li><a class="reference internal" href="#machine-code-emitter" id="id32">Machine Code Emitter</a></li>
+<li><a class="reference internal" href="#target-jit-info" id="id33">Target JIT Info</a></li>
+</ul>
+</li>
+</ul>
+</div>
+<div class="section" id="introduction">
+<h2><a class="toc-backref" href="#id3">Introduction</a><a class="headerlink" href="#introduction" title="Permalink to this headline">Â¶</a></h2>
+<p>This document describes techniques for writing compiler backends that convert
+the LLVM Intermediate Representation (IR) to code for a specified machine or
+other languages.  Code intended for a specific machine can take the form of
+either assembly code or binary code (usable for a JIT compiler).</p>
+<p>The backend of LLVM features a target-independent code generator that may
+create output for several types of target CPUs â including X86, PowerPC,
+ARM, and SPARC.  The backend may also be used to generate code targeted at SPUs
+of the Cell processor or GPUs to support the execution of compute kernels.</p>
+<p>The document focuses on existing examples found in subdirectories of
+<code class="docutils literal"><span class="pre">llvm/lib/Target</span></code> in a downloaded LLVM release.  In particular, this document
+focuses on the example of creating a static compiler (one that emits text
+assembly) for a SPARC target, because SPARC has fairly standard
+characteristics, such as a RISC instruction set and straightforward calling
+conventions.</p>
+<div class="section" id="audience">
+<h3><a class="toc-backref" href="#id4">Audience</a><a class="headerlink" href="#audience" title="Permalink to this headline">Â¶</a></h3>
+<p>The audience for this document is anyone who needs to write an LLVM backend to
+generate code for a specific hardware or software target.</p>
+</div>
+<div class="section" id="prerequisite-reading">
+<h3><a class="toc-backref" href="#id5">Prerequisite Reading</a><a class="headerlink" href="#prerequisite-reading" title="Permalink to this headline">Â¶</a></h3>
+<p>These essential documents must be read before reading this document:</p>
+<ul class="simple">
+<li><a class="reference external" href="LangRef.html">LLVM Language Reference Manual</a> â a reference manual for
+the LLVM assembly language.</li>
+<li><a class="reference internal" href="CodeGenerator.html"><span class="doc">The LLVM Target-Independent Code Generator</span></a> â a guide to the components (classes and code
+generation algorithms) for translating the LLVM internal representation into
+machine code for a specified target.  Pay particular attention to the
+descriptions of code generation stages: Instruction Selection, Scheduling and
+Formation, SSA-based Optimization, Register Allocation, Prolog/Epilog Code
+Insertion, Late Machine Code Optimizations, and Code Emission.</li>
+<li><a class="reference internal" href="TableGen/index.html"><span class="doc">TableGen</span></a> â a document that describes the TableGen
+(<code class="docutils literal"><span class="pre">tblgen</span></code>) application that manages domain-specific information to support
+LLVM code generation.  TableGen processes input from a target description
+file (<code class="docutils literal"><span class="pre">.td</span></code> suffix) and generates C++ code that can be used for code
+generation.</li>
+<li><a class="reference internal" href="WritingAnLLVMPass.html"><span class="doc">Writing an LLVM Pass</span></a> â The assembly printer is a <code class="docutils literal"><span class="pre">FunctionPass</span></code>, as
+are several <code class="docutils literal"><span class="pre">SelectionDAG</span></code> processing steps.</li>
+</ul>
+<p>To follow the SPARC examples in this document, have a copy of <a class="reference external" href="http://www.sparc.org/standards/V8.pdf">The SPARC
+Architecture Manual, Version 8</a> for
+reference.  For details about the ARM instruction set, refer to the <a class="reference external" href="http://infocenter.arm.com/">ARM
+Architecture Reference Manual</a>.  For more about
+the GNU Assembler format (<code class="docutils literal"><span class="pre">GAS</span></code>), see <a class="reference external" href="http://sourceware.org/binutils/docs/as/index.html">Using As</a>, especially for the
+assembly printer.  âUsing Asâ contains a list of target machine dependent
+features.</p>
+</div>
+<div class="section" id="basic-steps">
+<h3><a class="toc-backref" href="#id6">Basic Steps</a><a class="headerlink" href="#basic-steps" title="Permalink to this headline">Â¶</a></h3>
+<p>To write a compiler backend for LLVM that converts the LLVM IR to code for a
+specified target (machine or other language), follow these steps:</p>
+<ul class="simple">
+<li>Create a subclass of the <code class="docutils literal"><span class="pre">TargetMachine</span></code> class that describes
+characteristics of your target machine.  Copy existing examples of specific
+<code class="docutils literal"><span class="pre">TargetMachine</span></code> class and header files; for example, start with
+<code class="docutils literal"><span class="pre">SparcTargetMachine.cpp</span></code> and <code class="docutils literal"><span class="pre">SparcTargetMachine.h</span></code>, but change the file
+names for your target.  Similarly, change code that references â<code class="docutils literal"><span class="pre">Sparc</span></code>â to
+reference your target.</li>
+<li>Describe the register set of the target.  Use TableGen to generate code for
+register definition, register aliases, and register classes from a
+target-specific <code class="docutils literal"><span class="pre">RegisterInfo.td</span></code> input file.  You should also write
+additional code for a subclass of the <code class="docutils literal"><span class="pre">TargetRegisterInfo</span></code> class that
+represents the class register file data used for register allocation and also
+describes the interactions between registers.</li>
+<li>Describe the instruction set of the target.  Use TableGen to generate code
+for target-specific instructions from target-specific versions of
+<code class="docutils literal"><span class="pre">TargetInstrFormats.td</span></code> and <code class="docutils literal"><span class="pre">TargetInstrInfo.td</span></code>.  You should write
+additional code for a subclass of the <code class="docutils literal"><span class="pre">TargetInstrInfo</span></code> class to represent
+machine instructions supported by the target machine.</li>
+<li>Describe the selection and conversion of the LLVM IR from a Directed Acyclic
+Graph (DAG) representation of instructions to native target-specific
+instructions.  Use TableGen to generate code that matches patterns and
+selects instructions based on additional information in a target-specific
+version of <code class="docutils literal"><span class="pre">TargetInstrInfo.td</span></code>.  Write code for <code class="docutils literal"><span class="pre">XXXISelDAGToDAG.cpp</span></code>,
+where <code class="docutils literal"><span class="pre">XXX</span></code> identifies the specific target, to perform pattern matching and
+DAG-to-DAG instruction selection.  Also write code in <code class="docutils literal"><span class="pre">XXXISelLowering.cpp</span></code>
+to replace or remove operations and data types that are not supported
+natively in a SelectionDAG.</li>
+<li>Write code for an assembly printer that converts LLVM IR to a GAS format for
+your target machine.  You should add assembly strings to the instructions
+defined in your target-specific version of <code class="docutils literal"><span class="pre">TargetInstrInfo.td</span></code>.  You
+should also write code for a subclass of <code class="docutils literal"><span class="pre">AsmPrinter</span></code> that performs the
+LLVM-to-assembly conversion and a trivial subclass of <code class="docutils literal"><span class="pre">TargetAsmInfo</span></code>.</li>
+<li>Optionally, add support for subtargets (i.e., variants with different
+capabilities).  You should also write code for a subclass of the
+<code class="docutils literal"><span class="pre">TargetSubtarget</span></code> class, which allows you to use the <code class="docutils literal"><span class="pre">-mcpu=</span></code> and
+<code class="docutils literal"><span class="pre">-mattr=</span></code> command-line options.</li>
+<li>Optionally, add JIT support and create a machine code emitter (subclass of
+<code class="docutils literal"><span class="pre">TargetJITInfo</span></code>) that is used to emit binary code directly into memory.</li>
+</ul>
+<p>In the <code class="docutils literal"><span class="pre">.cpp</span></code> and <code class="docutils literal"><span class="pre">.h</span></code>. files, initially stub up these methods and then
+implement them later.  Initially, you may not know which private members that
+the class will need and which components will need to be subclassed.</p>
+</div>
+<div class="section" id="preliminaries">
+<h3><a class="toc-backref" href="#id7">Preliminaries</a><a class="headerlink" href="#preliminaries" title="Permalink to this headline">Â¶</a></h3>
+<p>To actually create your compiler backend, you need to create and modify a few
+files.  The absolute minimum is discussed here.  But to actually use the LLVM
+target-independent code generator, you must perform the steps described in the
+<a class="reference internal" href="CodeGenerator.html"><span class="doc">LLVM Target-Independent Code Generator</span></a> document.</p>
+<p>First, you should create a subdirectory under <code class="docutils literal"><span class="pre">lib/Target</span></code> to hold all the
+files related to your target.  If your target is called âDummyâ, create the
+directory <code class="docutils literal"><span class="pre">lib/Target/Dummy</span></code>.</p>
+<p>In this new directory, create a <code class="docutils literal"><span class="pre">CMakeLists.txt</span></code>.  It is easiest to copy a
+<code class="docutils literal"><span class="pre">CMakeLists.txt</span></code> of another target and modify it.  It should at least contain
+the <code class="docutils literal"><span class="pre">LLVM_TARGET_DEFINITIONS</span></code> variable. The library can be named <code class="docutils literal"><span class="pre">LLVMDummy</span></code>
+(for example, see the MIPS target).  Alternatively, you can split the library
+into <code class="docutils literal"><span class="pre">LLVMDummyCodeGen</span></code> and <code class="docutils literal"><span class="pre">LLVMDummyAsmPrinter</span></code>, the latter of which
+should be implemented in a subdirectory below <code class="docutils literal"><span class="pre">lib/Target/Dummy</span></code> (for example,
+see the PowerPC target).</p>
+<p>Note that these two naming schemes are hardcoded into <code class="docutils literal"><span class="pre">llvm-config</span></code>.  Using
+any other naming scheme will confuse <code class="docutils literal"><span class="pre">llvm-config</span></code> and produce a lot of
+(seemingly unrelated) linker errors when linking <code class="docutils literal"><span class="pre">llc</span></code>.</p>
+<p>To make your target actually do something, you need to implement a subclass of
+<code class="docutils literal"><span class="pre">TargetMachine</span></code>.  This implementation should typically be in the file
+<code class="docutils literal"><span class="pre">lib/Target/DummyTargetMachine.cpp</span></code>, but any file in the <code class="docutils literal"><span class="pre">lib/Target</span></code>
+directory will be built and should work.  To use LLVMâs target independent code
+generator, you should do what all current machine backends do: create a
+subclass of <code class="docutils literal"><span class="pre">LLVMTargetMachine</span></code>.  (To create a target from scratch, create a
+subclass of <code class="docutils literal"><span class="pre">TargetMachine</span></code>.)</p>
+<p>To get LLVM to actually build and link your target, you need to run <code class="docutils literal"><span class="pre">cmake</span></code>
+with <code class="docutils literal"><span class="pre">-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=Dummy</span></code>. This will build your
+target without needing to add it to the list of all the targets.</p>
+<p>Once your target is stable, you can add it to the <code class="docutils literal"><span class="pre">LLVM_ALL_TARGETS</span></code> variable
+located in the main <code class="docutils literal"><span class="pre">CMakeLists.txt</span></code>.</p>
+</div>
+</div>
+<div class="section" id="target-machine">
+<h2><a class="toc-backref" href="#id8">Target Machine</a><a class="headerlink" href="#target-machine" title="Permalink to this headline">Â¶</a></h2>
+<p><code class="docutils literal"><span class="pre">LLVMTargetMachine</span></code> is designed as a base class for targets implemented with
+the LLVM target-independent code generator.  The <code class="docutils literal"><span class="pre">LLVMTargetMachine</span></code> class
+should be specialized by a concrete target class that implements the various
+virtual methods.  <code class="docutils literal"><span class="pre">LLVMTargetMachine</span></code> is defined as a subclass of
+<code class="docutils literal"><span class="pre">TargetMachine</span></code> in <code class="docutils literal"><span class="pre">include/llvm/Target/TargetMachine.h</span></code>.  The
+<code class="docutils literal"><span class="pre">TargetMachine</span></code> class implementation (<code class="docutils literal"><span class="pre">TargetMachine.cpp</span></code>) also processes
+numerous command-line options.</p>
+<p>To create a concrete target-specific subclass of <code class="docutils literal"><span class="pre">LLVMTargetMachine</span></code>, start
+by copying an existing <code class="docutils literal"><span class="pre">TargetMachine</span></code> class and header.  You should name the
+files that you create to reflect your specific target.  For instance, for the
+SPARC target, name the files <code class="docutils literal"><span class="pre">SparcTargetMachine.h</span></code> and
+<code class="docutils literal"><span class="pre">SparcTargetMachine.cpp</span></code>.</p>
+<p>For a target machine <code class="docutils literal"><span class="pre">XXX</span></code>, the implementation of <code class="docutils literal"><span class="pre">XXXTargetMachine</span></code> must
+have access methods to obtain objects that represent target components.  These
+methods are named <code class="docutils literal"><span class="pre">get*Info</span></code>, and are intended to obtain the instruction set
+(<code class="docutils literal"><span class="pre">getInstrInfo</span></code>), register set (<code class="docutils literal"><span class="pre">getRegisterInfo</span></code>), stack frame layout
+(<code class="docutils literal"><span class="pre">getFrameInfo</span></code>), and similar information.  <code class="docutils literal"><span class="pre">XXXTargetMachine</span></code> must also
+implement the <code class="docutils literal"><span class="pre">getDataLayout</span></code> method to access an object with target-specific
+data characteristics, such as data type size and alignment requirements.</p>
+<p>For instance, for the SPARC target, the header file <code class="docutils literal"><span class="pre">SparcTargetMachine.h</span></code>
+declares prototypes for several <code class="docutils literal"><span class="pre">get*Info</span></code> and <code class="docutils literal"><span class="pre">getDataLayout</span></code> methods that
+simply return a class member.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">namespace</span> <span class="n">llvm</span> <span class="p">{</span>
+
+<span class="k">class</span> <span class="nc">Module</span><span class="p">;</span>
+
+<span class="k">class</span> <span class="nc">SparcTargetMachine</span> <span class="o">:</span> <span class="k">public</span> <span class="n">LLVMTargetMachine</span> <span class="p">{</span>
+  <span class="k">const</span> <span class="n">DataLayout</span> <span class="n">DataLayout</span><span class="p">;</span>       <span class="c1">// Calculates type size & alignment</span>
+  <span class="n">SparcSubtarget</span> <span class="n">Subtarget</span><span class="p">;</span>
+  <span class="n">SparcInstrInfo</span> <span class="n">InstrInfo</span><span class="p">;</span>
+  <span class="n">TargetFrameInfo</span> <span class="n">FrameInfo</span><span class="p">;</span>
+
+<span class="k">protected</span><span class="o">:</span>
+  <span class="k">virtual</span> <span class="k">const</span> <span class="n">TargetAsmInfo</span> <span class="o">*</span><span class="n">createTargetAsmInfo</span><span class="p">()</span> <span class="k">const</span><span class="p">;</span>
+
+<span class="k">public</span><span class="o">:</span>
+  <span class="n">SparcTargetMachine</span><span class="p">(</span><span class="k">const</span> <span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">,</span> <span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="o">&</span><span class="n">FS</span><span class="p">);</span>
+
+  <span class="k">virtual</span> <span class="k">const</span> <span class="n">SparcInstrInfo</span> <span class="o">*</span><span class="nf">getInstrInfo</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span><span class="k">return</span> <span class="o">&</span><span class="n">InstrInfo</span><span class="p">;</span> <span class="p">}</span>
+  <span class="k">virtual</span> <span class="k">const</span> <span class="n">TargetFrameInfo</span> <span class="o">*</span><span class="nf">getFrameInfo</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span><span class="k">return</span> <span class="o">&</span><span class="n">FrameInfo</span><span class="p">;</span> <span class="p">}</span>
+  <span class="k">virtual</span> <span class="k">const</span> <span class="n">TargetSubtarget</span> <span class="o">*</span><span class="nf">getSubtargetImpl</span><span class="p">()</span> <span class="k">const</span><span class="p">{</span><span class="k">return</span> <span class="o">&</span><span class="n">Subtarget</span><span class="p">;</span> <span class="p">}</span>
+  <span class="k">virtual</span> <span class="k">const</span> <span class="n">TargetRegisterInfo</span> <span class="o">*</span><span class="nf">getRegisterInfo</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span>
+    <span class="k">return</span> <span class="o">&</span><span class="n">InstrInfo</span><span class="p">.</span><span class="n">getRegisterInfo</span><span class="p">();</span>
+  <span class="p">}</span>
+  <span class="k">virtual</span> <span class="k">const</span> <span class="n">DataLayout</span> <span class="o">*</span><span class="nf">getDataLayout</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span> <span class="k">return</span> <span class="o">&</span><span class="n">DataLayout</span><span class="p">;</span> <span class="p">}</span>
+  <span class="k">static</span> <span class="kt">unsigned</span> <span class="nf">getModuleMatchQuality</span><span class="p">(</span><span class="k">const</span> <span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">);</span>
+
+  <span class="c1">// Pass Pipeline Configuration</span>
+  <span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">addInstSelector</span><span class="p">(</span><span class="n">PassManagerBase</span> <span class="o">&</span><span class="n">PM</span><span class="p">,</span> <span class="kt">bool</span> <span class="n">Fast</span><span class="p">);</span>
+  <span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">addPreEmitPass</span><span class="p">(</span><span class="n">PassManagerBase</span> <span class="o">&</span><span class="n">PM</span><span class="p">,</span> <span class="kt">bool</span> <span class="n">Fast</span><span class="p">);</span>
+<span class="p">};</span>
+
+<span class="p">}</span> <span class="c1">// end namespace llvm</span>
+</pre></div>
+</div>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">getInstrInfo()</span></code></li>
+<li><code class="docutils literal"><span class="pre">getRegisterInfo()</span></code></li>
+<li><code class="docutils literal"><span class="pre">getFrameInfo()</span></code></li>
+<li><code class="docutils literal"><span class="pre">getDataLayout()</span></code></li>
+<li><code class="docutils literal"><span class="pre">getSubtargetImpl()</span></code></li>
+</ul>
+<p>For some targets, you also need to support the following methods:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">getTargetLowering()</span></code></li>
+<li><code class="docutils literal"><span class="pre">getJITInfo()</span></code></li>
+</ul>
+<p>Some architectures, such as GPUs, do not support jumping to an arbitrary
+program location and implement branching using masked execution and loop using
+special instructions around the loop body. In order to avoid CFG modifications
+that introduce irreducible control flow not handled by such hardware, a target
+must call <cite>setRequiresStructuredCFG(true)</cite> when being initialized.</p>
+<p>In addition, the <code class="docutils literal"><span class="pre">XXXTargetMachine</span></code> constructor should specify a
+<code class="docutils literal"><span class="pre">TargetDescription</span></code> string that determines the data layout for the target
+machine, including characteristics such as pointer size, alignment, and
+endianness.  For example, the constructor for <code class="docutils literal"><span class="pre">SparcTargetMachine</span></code> contains
+the following:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">SparcTargetMachine</span><span class="o">::</span><span class="n">SparcTargetMachine</span><span class="p">(</span><span class="k">const</span> <span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">,</span> <span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="o">&</span><span class="n">FS</span><span class="p">)</span>
+  <span class="o">:</span> <span class="n">DataLayout</span><span class="p">(</span><span class="s">"E-p:32:32-f128:128:128"</span><span class="p">),</span>
+    <span class="n">Subtarget</span><span class="p">(</span><span class="n">M</span><span class="p">,</span> <span class="n">FS</span><span class="p">),</span> <span class="n">InstrInfo</span><span class="p">(</span><span class="n">Subtarget</span><span class="p">),</span>
+    <span class="n">FrameInfo</span><span class="p">(</span><span class="n">TargetFrameInfo</span><span class="o">::</span><span class="n">StackGrowsDown</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>Hyphens separate portions of the <code class="docutils literal"><span class="pre">TargetDescription</span></code> string.</p>
+<ul class="simple">
+<li>An upper-case â<code class="docutils literal"><span class="pre">E</span></code>â in the string indicates a big-endian target data model.
+A lower-case â<code class="docutils literal"><span class="pre">e</span></code>â indicates little-endian.</li>
+<li>â<code class="docutils literal"><span class="pre">p:</span></code>â is followed by pointer information: size, ABI alignment, and
+preferred alignment.  If only two figures follow â<code class="docutils literal"><span class="pre">p:</span></code>â, then the first
+value is pointer size, and the second value is both ABI and preferred
+alignment.</li>
+<li>Then a letter for numeric type alignment: â<code class="docutils literal"><span class="pre">i</span></code>â, â<code class="docutils literal"><span class="pre">f</span></code>â, â<code class="docutils literal"><span class="pre">v</span></code>â, or
+â<code class="docutils literal"><span class="pre">a</span></code>â (corresponding to integer, floating point, vector, or aggregate).
+â<code class="docutils literal"><span class="pre">i</span></code>â, â<code class="docutils literal"><span class="pre">v</span></code>â, or â<code class="docutils literal"><span class="pre">a</span></code>â are followed by ABI alignment and preferred
+alignment. â<code class="docutils literal"><span class="pre">f</span></code>â is followed by three values: the first indicates the size
+of a long double, then ABI alignment, and then ABI preferred alignment.</li>
+</ul>
+</div>
+<div class="section" id="target-registration">
+<h2><a class="toc-backref" href="#id9">Target Registration</a><a class="headerlink" href="#target-registration" title="Permalink to this headline">Â¶</a></h2>
+<p>You must also register your target with the <code class="docutils literal"><span class="pre">TargetRegistry</span></code>, which is what
+other LLVM tools use to be able to lookup and use your target at runtime.  The
+<code class="docutils literal"><span class="pre">TargetRegistry</span></code> can be used directly, but for most targets there are helper
+templates which should take care of the work for you.</p>
+<p>All targets should declare a global <code class="docutils literal"><span class="pre">Target</span></code> object which is used to
+represent the target during registration.  Then, in the targetâs <code class="docutils literal"><span class="pre">TargetInfo</span></code>
+library, the target should define that object and use the <code class="docutils literal"><span class="pre">RegisterTarget</span></code>
+template to register the target.  For example, the Sparc registration code
+looks like this:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">Target</span> <span class="n">llvm</span><span class="o">::</span><span class="n">getTheSparcTarget</span><span class="p">();</span>
+
+<span class="k">extern</span> <span class="s">"C"</span> <span class="kt">void</span> <span class="n">LLVMInitializeSparcTargetInfo</span><span class="p">()</span> <span class="p">{</span>
+  <span class="n">RegisterTarget</span><span class="o"><</span><span class="n">Triple</span><span class="o">::</span><span class="n">sparc</span><span class="p">,</span> <span class="cm">/*HasJIT=*/</span><span class="nb">false</span><span class="o">></span>
+    <span class="n">X</span><span class="p">(</span><span class="n">getTheSparcTarget</span><span class="p">(),</span> <span class="s">"sparc"</span><span class="p">,</span> <span class="s">"Sparc"</span><span class="p">);</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>This allows the <code class="docutils literal"><span class="pre">TargetRegistry</span></code> to look up the target by name or by target
+triple.  In addition, most targets will also register additional features which
+are available in separate libraries.  These registration steps are separate,
+because some clients may wish to only link in some parts of the target â the
+JIT code generator does not require the use of the assembler printer, for
+example.  Here is an example of registering the Sparc assembly printer:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">extern</span> <span class="s">"C"</span> <span class="kt">void</span> <span class="n">LLVMInitializeSparcAsmPrinter</span><span class="p">()</span> <span class="p">{</span>
+  <span class="n">RegisterAsmPrinter</span><span class="o"><</span><span class="n">SparcAsmPrinter</span><span class="o">></span> <span class="n">X</span><span class="p">(</span><span class="n">getTheSparcTarget</span><span class="p">());</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>For more information, see â<a class="reference external" href="/doxygen/TargetRegistry_8h-source.html">llvm/Target/TargetRegistry.h</a>â.</p>
+</div>
+<div class="section" id="register-set-and-register-classes">
+<h2><a class="toc-backref" href="#id10">Register Set and Register Classes</a><a class="headerlink" href="#register-set-and-register-classes" title="Permalink to this headline">Â¶</a></h2>
+<p>You should describe a concrete target-specific class that represents the
+register file of a target machine.  This class is called <code class="docutils literal"><span class="pre">XXXRegisterInfo</span></code>
+(where <code class="docutils literal"><span class="pre">XXX</span></code> identifies the target) and represents the class register file
+data that is used for register allocation.  It also describes the interactions
+between registers.</p>
+<p>You also need to define register classes to categorize related registers.  A
+register class should be added for groups of registers that are all treated the
+same way for some instruction.  Typical examples are register classes for
+integer, floating-point, or vector registers.  A register allocator allows an
+instruction to use any register in a specified register class to perform the
+instruction in a similar manner.  Register classes allocate virtual registers
+to instructions from these sets, and register classes let the
+target-independent register allocator automatically choose the actual
+registers.</p>
+<p>Much of the code for registers, including register definition, register
+aliases, and register classes, is generated by TableGen from
+<code class="docutils literal"><span class="pre">XXXRegisterInfo.td</span></code> input files and placed in <code class="docutils literal"><span class="pre">XXXGenRegisterInfo.h.inc</span></code>
+and <code class="docutils literal"><span class="pre">XXXGenRegisterInfo.inc</span></code> output files.  Some of the code in the
+implementation of <code class="docutils literal"><span class="pre">XXXRegisterInfo</span></code> requires hand-coding.</p>
+<div class="section" id="defining-a-register">
+<h3><a class="toc-backref" href="#id11">Defining a Register</a><a class="headerlink" href="#defining-a-register" title="Permalink to this headline">Â¶</a></h3>
+<p>The <code class="docutils literal"><span class="pre">XXXRegisterInfo.td</span></code> file typically starts with register definitions for
+a target machine.  The <code class="docutils literal"><span class="pre">Register</span></code> class (specified in <code class="docutils literal"><span class="pre">Target.td</span></code>) is used
+to define an object for each register.  The specified string <code class="docutils literal"><span class="pre">n</span></code> becomes the
+<code class="docutils literal"><span class="pre">Name</span></code> of the register.  The basic <code class="docutils literal"><span class="pre">Register</span></code> object does not have any
+subregisters and does not specify any aliases.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class Register<string n> {
+  string Namespace = "";
+  string AsmName = n;
+  string Name = n;
+  int SpillSize = 0;
+  int SpillAlignment = 0;
+  list<Register> Aliases = [];
+  list<Register> SubRegs = [];
+  list<int> DwarfNumbers = [];
+}
+</pre></div>
+</div>
+<p>For example, in the <code class="docutils literal"><span class="pre">X86RegisterInfo.td</span></code> file, there are register definitions
+that utilize the <code class="docutils literal"><span class="pre">Register</span></code> class, such as:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def AL : Register<"AL">, DwarfRegNum<[0, 0, 0]>;
+</pre></div>
+</div>
+<p>This defines the register <code class="docutils literal"><span class="pre">AL</span></code> and assigns it values (with <code class="docutils literal"><span class="pre">DwarfRegNum</span></code>)
+that are used by <code class="docutils literal"><span class="pre">gcc</span></code>, <code class="docutils literal"><span class="pre">gdb</span></code>, or a debug information writer to identify a
+register.  For register <code class="docutils literal"><span class="pre">AL</span></code>, <code class="docutils literal"><span class="pre">DwarfRegNum</span></code> takes an array of 3 values
+representing 3 different modes: the first element is for X86-64, the second for
+exception handling (EH) on X86-32, and the third is generic. -1 is a special
+Dwarf number that indicates the gcc number is undefined, and -2 indicates the
+register number is invalid for this mode.</p>
+<p>From the previously described line in the <code class="docutils literal"><span class="pre">X86RegisterInfo.td</span></code> file, TableGen
+generates this code in the <code class="docutils literal"><span class="pre">X86GenRegisterInfo.inc</span></code> file:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">static</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="n">GR8</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="n">X86</span><span class="o">::</span><span class="n">AL</span><span class="p">,</span> <span class="p">...</span> <span class="p">};</span>
+
+<span class="k">const</span> <span class="kt">unsigned</span> <span class="n">AL_AliasSet</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="n">X86</span><span class="o">::</span><span class="n">AX</span><span class="p">,</span> <span class="n">X86</span><span class="o">::</span><span class="n">EAX</span><span class="p">,</span> <span class="n">X86</span><span class="o">::</span><span class="n">RAX</span><span class="p">,</span> <span class="mi">0</span> <span class="p">};</span>
+
+<span class="k">const</span> <span class="n">TargetRegisterDesc</span> <span class="n">RegisterDescriptors</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+  <span class="p">...</span>
+<span class="p">{</span> <span class="s">"AL"</span><span class="p">,</span> <span class="s">"AL"</span><span class="p">,</span> <span class="n">AL_AliasSet</span><span class="p">,</span> <span class="n">Empty_SubRegsSet</span><span class="p">,</span> <span class="n">Empty_SubRegsSet</span><span class="p">,</span> <span class="n">AL_SuperRegsSet</span> <span class="p">},</span> <span class="p">...</span>
+</pre></div>
+</div>
+<p>From the register info file, TableGen generates a <code class="docutils literal"><span class="pre">TargetRegisterDesc</span></code> object
+for each register.  <code class="docutils literal"><span class="pre">TargetRegisterDesc</span></code> is defined in
+<code class="docutils literal"><span class="pre">include/llvm/Target/TargetRegisterInfo.h</span></code> with the following fields:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">struct</span> <span class="n">TargetRegisterDesc</span> <span class="p">{</span>
+  <span class="k">const</span> <span class="kt">char</span>     <span class="o">*</span><span class="n">AsmName</span><span class="p">;</span>      <span class="c1">// Assembly language name for the register</span>
+  <span class="k">const</span> <span class="kt">char</span>     <span class="o">*</span><span class="n">Name</span><span class="p">;</span>         <span class="c1">// Printable name for the reg (for debugging)</span>
+  <span class="k">const</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">AliasSet</span><span class="p">;</span>     <span class="c1">// Register Alias Set</span>
+  <span class="k">const</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">SubRegs</span><span class="p">;</span>      <span class="c1">// Sub-register set</span>
+  <span class="k">const</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">ImmSubRegs</span><span class="p">;</span>   <span class="c1">// Immediate sub-register set</span>
+  <span class="k">const</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">SuperRegs</span><span class="p">;</span>    <span class="c1">// Super-register set</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>TableGen uses the entire target description file (<code class="docutils literal"><span class="pre">.td</span></code>) to determine text
+names for the register (in the <code class="docutils literal"><span class="pre">AsmName</span></code> and <code class="docutils literal"><span class="pre">Name</span></code> fields of
+<code class="docutils literal"><span class="pre">TargetRegisterDesc</span></code>) and the relationships of other registers to the defined
+register (in the other <code class="docutils literal"><span class="pre">TargetRegisterDesc</span></code> fields).  In this example, other
+definitions establish the registers â<code class="docutils literal"><span class="pre">AX</span></code>â, â<code class="docutils literal"><span class="pre">EAX</span></code>â, and â<code class="docutils literal"><span class="pre">RAX</span></code>â as
+aliases for one another, so TableGen generates a null-terminated array
+(<code class="docutils literal"><span class="pre">AL_AliasSet</span></code>) for this register alias set.</p>
+<p>The <code class="docutils literal"><span class="pre">Register</span></code> class is commonly used as a base class for more complex
+classes.  In <code class="docutils literal"><span class="pre">Target.td</span></code>, the <code class="docutils literal"><span class="pre">Register</span></code> class is the base for the
+<code class="docutils literal"><span class="pre">RegisterWithSubRegs</span></code> class that is used to define registers that need to
+specify subregisters in the <code class="docutils literal"><span class="pre">SubRegs</span></code> list, as shown here:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class RegisterWithSubRegs<string n, list<Register> subregs> : Register<n> {
+  let SubRegs = subregs;
+}
+</pre></div>
+</div>
+<p>In <code class="docutils literal"><span class="pre">SparcRegisterInfo.td</span></code>, additional register classes are defined for SPARC:
+a <code class="docutils literal"><span class="pre">Register</span></code> subclass, <code class="docutils literal"><span class="pre">SparcReg</span></code>, and further subclasses: <code class="docutils literal"><span class="pre">Ri</span></code>, <code class="docutils literal"><span class="pre">Rf</span></code>,
+and <code class="docutils literal"><span class="pre">Rd</span></code>.  SPARC registers are identified by 5-bit ID numbers, which is a
+feature common to these subclasses.  Note the use of â<code class="docutils literal"><span class="pre">let</span></code>â expressions to
+override values that are initially defined in a superclass (such as <code class="docutils literal"><span class="pre">SubRegs</span></code>
+field in the <code class="docutils literal"><span class="pre">Rd</span></code> class).</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class SparcReg<string n> : Register<n> {
+  field bits<5> Num;
+  let Namespace = "SP";
+}
+// Ri - 32-bit integer registers
+class Ri<bits<5> num, string n> :
+SparcReg<n> {
+  let Num = num;
+}
+// Rf - 32-bit floating-point registers
+class Rf<bits<5> num, string n> :
+SparcReg<n> {
+  let Num = num;
+}
+// Rd - Slots in the FP register file for 64-bit floating-point values.
+class Rd<bits<5> num, string n, list<Register> subregs> : SparcReg<n> {
+  let Num = num;
+  let SubRegs = subregs;
+}
+</pre></div>
+</div>
+<p>In the <code class="docutils literal"><span class="pre">SparcRegisterInfo.td</span></code> file, there are register definitions that
+utilize these subclasses of <code class="docutils literal"><span class="pre">Register</span></code>, such as:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def G0 : Ri< 0, "G0">, DwarfRegNum<[0]>;
+def G1 : Ri< 1, "G1">, DwarfRegNum<[1]>;
+...
+def F0 : Rf< 0, "F0">, DwarfRegNum<[32]>;
+def F1 : Rf< 1, "F1">, DwarfRegNum<[33]>;
+...
+def D0 : Rd< 0, "F0", [F0, F1]>, DwarfRegNum<[32]>;
+def D1 : Rd< 2, "F2", [F2, F3]>, DwarfRegNum<[34]>;
+</pre></div>
+</div>
+<p>The last two registers shown above (<code class="docutils literal"><span class="pre">D0</span></code> and <code class="docutils literal"><span class="pre">D1</span></code>) are double-precision
+floating-point registers that are aliases for pairs of single-precision
+floating-point sub-registers.  In addition to aliases, the sub-register and
+super-register relationships of the defined register are in fields of a
+registerâs <code class="docutils literal"><span class="pre">TargetRegisterDesc</span></code>.</p>
+</div>
+<div class="section" id="defining-a-register-class">
+<h3><a class="toc-backref" href="#id12">Defining a Register Class</a><a class="headerlink" href="#defining-a-register-class" title="Permalink to this headline">Â¶</a></h3>
+<p>The <code class="docutils literal"><span class="pre">RegisterClass</span></code> class (specified in <code class="docutils literal"><span class="pre">Target.td</span></code>) is used to define an
+object that represents a group of related registers and also defines the
+default allocation order of the registers.  A target description file
+<code class="docutils literal"><span class="pre">XXXRegisterInfo.td</span></code> that uses <code class="docutils literal"><span class="pre">Target.td</span></code> can construct register classes
+using the following class:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class RegisterClass<string namespace,
+list<ValueType> regTypes, int alignment, dag regList> {
+  string Namespace = namespace;
+  list<ValueType> RegTypes = regTypes;
+  int Size = 0;  // spill size, in bits; zero lets tblgen pick the size
+  int Alignment = alignment;
+
+  // CopyCost is the cost of copying a value between two registers
+  // default value 1 means a single instruction
+  // A negative value means copying is extremely expensive or impossible
+  int CopyCost = 1;
+  dag MemberList = regList;
+
+  // for register classes that are subregisters of this class
+  list<RegisterClass> SubRegClassList = [];
+
+  code MethodProtos = [{}];  // to insert arbitrary code
+  code MethodBodies = [{}];
+}
+</pre></div>
+</div>
+<p>To define a <code class="docutils literal"><span class="pre">RegisterClass</span></code>, use the following 4 arguments:</p>
+<ul class="simple">
+<li>The first argument of the definition is the name of the namespace.</li>
+<li>The second argument is a list of <code class="docutils literal"><span class="pre">ValueType</span></code> register type values that are
+defined in <code class="docutils literal"><span class="pre">include/llvm/CodeGen/ValueTypes.td</span></code>.  Defined values include
+integer types (such as <code class="docutils literal"><span class="pre">i16</span></code>, <code class="docutils literal"><span class="pre">i32</span></code>, and <code class="docutils literal"><span class="pre">i1</span></code> for Boolean),
+floating-point types (<code class="docutils literal"><span class="pre">f32</span></code>, <code class="docutils literal"><span class="pre">f64</span></code>), and vector types (for example,
+<code class="docutils literal"><span class="pre">v8i16</span></code> for an <code class="docutils literal"><span class="pre">8</span> <span class="pre">x</span> <span class="pre">i16</span></code> vector).  All registers in a <code class="docutils literal"><span class="pre">RegisterClass</span></code>
+must have the same <code class="docutils literal"><span class="pre">ValueType</span></code>, but some registers may store vector data in
+different configurations.  For example a register that can process a 128-bit
+vector may be able to handle 16 8-bit integer elements, 8 16-bit integers, 4
+32-bit integers, and so on.</li>
+<li>The third argument of the <code class="docutils literal"><span class="pre">RegisterClass</span></code> definition specifies the
+alignment required of the registers when they are stored or loaded to
+memory.</li>
+<li>The final argument, <code class="docutils literal"><span class="pre">regList</span></code>, specifies which registers are in this class.
+If an alternative allocation order method is not specified, then <code class="docutils literal"><span class="pre">regList</span></code>
+also defines the order of allocation used by the register allocator.  Besides
+simply listing registers with <code class="docutils literal"><span class="pre">(add</span> <span class="pre">R0,</span> <span class="pre">R1,</span> <span class="pre">...)</span></code>, more advanced set
+operators are available.  See <code class="docutils literal"><span class="pre">include/llvm/Target/Target.td</span></code> for more
+information.</li>
+</ul>
+<p>In <code class="docutils literal"><span class="pre">SparcRegisterInfo.td</span></code>, three <code class="docutils literal"><span class="pre">RegisterClass</span></code> objects are defined:
+<code class="docutils literal"><span class="pre">FPRegs</span></code>, <code class="docutils literal"><span class="pre">DFPRegs</span></code>, and <code class="docutils literal"><span class="pre">IntRegs</span></code>.  For all three register classes, the
+first argument defines the namespace with the string â<code class="docutils literal"><span class="pre">SP</span></code>â.  <code class="docutils literal"><span class="pre">FPRegs</span></code>
+defines a group of 32 single-precision floating-point registers (<code class="docutils literal"><span class="pre">F0</span></code> to
+<code class="docutils literal"><span class="pre">F31</span></code>); <code class="docutils literal"><span class="pre">DFPRegs</span></code> defines a group of 16 double-precision registers
+(<code class="docutils literal"><span class="pre">D0-D15</span></code>).</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>// F0, F1, F2, ..., F31
+def FPRegs : RegisterClass<"SP", [f32], 32, (sequence "F%u", 0, 31)>;
+
+def DFPRegs : RegisterClass<"SP", [f64], 64,
+                            (add D0, D1, D2, D3, D4, D5, D6, D7, D8,
+                                 D9, D10, D11, D12, D13, D14, D15)>;
+
+def IntRegs : RegisterClass<"SP", [i32], 32,
+    (add L0, L1, L2, L3, L4, L5, L6, L7,
+         I0, I1, I2, I3, I4, I5,
+         O0, O1, O2, O3, O4, O5, O7,
+         G1,
+         // Non-allocatable regs:
+         G2, G3, G4,
+         O6,        // stack ptr
+         I6,        // frame ptr
+         I7,        // return address
+         G0,        // constant zero
+         G5, G6, G7 // reserved for kernel
+    )>;
+</pre></div>
+</div>
+<p>Using <code class="docutils literal"><span class="pre">SparcRegisterInfo.td</span></code> with TableGen generates several output files
+that are intended for inclusion in other source code that you write.
+<code class="docutils literal"><span class="pre">SparcRegisterInfo.td</span></code> generates <code class="docutils literal"><span class="pre">SparcGenRegisterInfo.h.inc</span></code>, which should
+be included in the header file for the implementation of the SPARC register
+implementation that you write (<code class="docutils literal"><span class="pre">SparcRegisterInfo.h</span></code>).  In
+<code class="docutils literal"><span class="pre">SparcGenRegisterInfo.h.inc</span></code> a new structure is defined called
+<code class="docutils literal"><span class="pre">SparcGenRegisterInfo</span></code> that uses <code class="docutils literal"><span class="pre">TargetRegisterInfo</span></code> as its base.  It also
+specifies types, based upon the defined register classes: <code class="docutils literal"><span class="pre">DFPRegsClass</span></code>,
+<code class="docutils literal"><span class="pre">FPRegsClass</span></code>, and <code class="docutils literal"><span class="pre">IntRegsClass</span></code>.</p>
+<p><code class="docutils literal"><span class="pre">SparcRegisterInfo.td</span></code> also generates <code class="docutils literal"><span class="pre">SparcGenRegisterInfo.inc</span></code>, which is
+included at the bottom of <code class="docutils literal"><span class="pre">SparcRegisterInfo.cpp</span></code>, the SPARC register
+implementation.  The code below shows only the generated integer registers and
+associated register classes.  The order of registers in <code class="docutils literal"><span class="pre">IntRegs</span></code> reflects
+the order in the definition of <code class="docutils literal"><span class="pre">IntRegs</span></code> in the target description file.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// IntRegs Register Class...</span>
+<span class="k">static</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="n">IntRegs</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+  <span class="n">SP</span><span class="o">::</span><span class="n">L0</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">L1</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">L2</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">L3</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">L4</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">L5</span><span class="p">,</span>
+  <span class="n">SP</span><span class="o">::</span><span class="n">L6</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">L7</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I0</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I1</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I2</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I3</span><span class="p">,</span>
+  <span class="n">SP</span><span class="o">::</span><span class="n">I4</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I5</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O0</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O1</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O2</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O3</span><span class="p">,</span>
+  <span class="n">SP</span><span class="o">::</span><span class="n">O4</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O5</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O7</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">G1</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">G2</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">G3</span><span class="p">,</span>
+  <span class="n">SP</span><span class="o">::</span><span class="n">G4</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O6</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I6</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I7</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">G0</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">G5</span><span class="p">,</span>
+  <span class="n">SP</span><span class="o">::</span><span class="n">G6</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">G7</span><span class="p">,</span>
+<span class="p">};</span>
+
+<span class="c1">// IntRegsVTs Register Class Value Types...</span>
+<span class="k">static</span> <span class="k">const</span> <span class="n">MVT</span><span class="o">::</span><span class="n">ValueType</span> <span class="n">IntRegsVTs</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+  <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">Other</span>
+<span class="p">};</span>
+
+<span class="k">namespace</span> <span class="n">SP</span> <span class="p">{</span>   <span class="c1">// Register class instances</span>
+  <span class="n">DFPRegsClass</span>    <span class="n">DFPRegsRegClass</span><span class="p">;</span>
+  <span class="n">FPRegsClass</span>     <span class="n">FPRegsRegClass</span><span class="p">;</span>
+  <span class="n">IntRegsClass</span>    <span class="n">IntRegsRegClass</span><span class="p">;</span>
+<span class="p">...</span>
+  <span class="c1">// IntRegs Sub-register Classes...</span>
+  <span class="k">static</span> <span class="k">const</span> <span class="n">TargetRegisterClass</span><span class="o">*</span> <span class="k">const</span> <span class="n">IntRegsSubRegClasses</span> <span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+    <span class="nb">NULL</span>
+  <span class="p">};</span>
+<span class="p">...</span>
+  <span class="c1">// IntRegs Super-register Classes..</span>
+  <span class="k">static</span> <span class="k">const</span> <span class="n">TargetRegisterClass</span><span class="o">*</span> <span class="k">const</span> <span class="n">IntRegsSuperRegClasses</span> <span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+    <span class="nb">NULL</span>
+  <span class="p">};</span>
+<span class="p">...</span>
+  <span class="c1">// IntRegs Register Class sub-classes...</span>
+  <span class="k">static</span> <span class="k">const</span> <span class="n">TargetRegisterClass</span><span class="o">*</span> <span class="k">const</span> <span class="n">IntRegsSubclasses</span> <span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+    <span class="nb">NULL</span>
+  <span class="p">};</span>
+<span class="p">...</span>
+  <span class="c1">// IntRegs Register Class super-classes...</span>
+  <span class="k">static</span> <span class="k">const</span> <span class="n">TargetRegisterClass</span><span class="o">*</span> <span class="k">const</span> <span class="n">IntRegsSuperclasses</span> <span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+    <span class="nb">NULL</span>
+  <span class="p">};</span>
+
+  <span class="n">IntRegsClass</span><span class="o">::</span><span class="n">IntRegsClass</span><span class="p">()</span> <span class="o">:</span> <span class="n">TargetRegisterClass</span><span class="p">(</span><span class="n">IntRegsRegClassID</span><span class="p">,</span>
+    <span class="n">IntRegsVTs</span><span class="p">,</span> <span class="n">IntRegsSubclasses</span><span class="p">,</span> <span class="n">IntRegsSuperclasses</span><span class="p">,</span> <span class="n">IntRegsSubRegClasses</span><span class="p">,</span>
+    <span class="n">IntRegsSuperRegClasses</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">IntRegs</span><span class="p">,</span> <span class="n">IntRegs</span> <span class="o">+</span> <span class="mi">32</span><span class="p">)</span> <span class="p">{}</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The register allocators will avoid using reserved registers, and callee saved
+registers are not used until all the volatile registers have been used.  That
+is usually good enough, but in some cases it may be necessary to provide custom
+allocation orders.</p>
+</div>
+<div class="section" id="implement-a-subclass-of-targetregisterinfo">
+<h3><a class="toc-backref" href="#id13">Implement a subclass of <code class="docutils literal"><span class="pre">TargetRegisterInfo</span></code></a><a class="headerlink" href="#implement-a-subclass-of-targetregisterinfo" title="Permalink to this headline">Â¶</a></h3>
+<p>The final step is to hand code portions of <code class="docutils literal"><span class="pre">XXXRegisterInfo</span></code>, which
+implements the interface described in <code class="docutils literal"><span class="pre">TargetRegisterInfo.h</span></code> (see
+<a class="reference internal" href="CodeGenerator.html#targetregisterinfo"><span class="std std-ref">The TargetRegisterInfo class</span></a>).  These functions return <code class="docutils literal"><span class="pre">0</span></code>, <code class="docutils literal"><span class="pre">NULL</span></code>, or
+<code class="docutils literal"><span class="pre">false</span></code>, unless overridden.  Here is a list of functions that are overridden
+for the SPARC implementation in <code class="docutils literal"><span class="pre">SparcRegisterInfo.cpp</span></code>:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">getCalleeSavedRegs</span></code> â Returns a list of callee-saved registers in the
+order of the desired callee-save stack frame offset.</li>
+<li><code class="docutils literal"><span class="pre">getReservedRegs</span></code> â Returns a bitset indexed by physical register
+numbers, indicating if a particular register is unavailable.</li>
+<li><code class="docutils literal"><span class="pre">hasFP</span></code> â Return a Boolean indicating if a function should have a
+dedicated frame pointer register.</li>
+<li><code class="docutils literal"><span class="pre">eliminateCallFramePseudoInstr</span></code> â If call frame setup or destroy pseudo
+instructions are used, this can be called to eliminate them.</li>
+<li><code class="docutils literal"><span class="pre">eliminateFrameIndex</span></code> â Eliminate abstract frame indices from
+instructions that may use them.</li>
+<li><code class="docutils literal"><span class="pre">emitPrologue</span></code> â Insert prologue code into the function.</li>
+<li><code class="docutils literal"><span class="pre">emitEpilogue</span></code> â Insert epilogue code into the function.</li>
+</ul>
+</div>
+</div>
+<div class="section" id="instruction-set">
+<span id="id1"></span><h2><a class="toc-backref" href="#id14">Instruction Set</a><a class="headerlink" href="#instruction-set" title="Permalink to this headline">Â¶</a></h2>
+<p>During the early stages of code generation, the LLVM IR code is converted to a
+<code class="docutils literal"><span class="pre">SelectionDAG</span></code> with nodes that are instances of the <code class="docutils literal"><span class="pre">SDNode</span></code> class
+containing target instructions.  An <code class="docutils literal"><span class="pre">SDNode</span></code> has an opcode, operands, type
+requirements, and operation properties.  For example, is an operation
+commutative, does an operation load from memory.  The various operation node
+types are described in the <code class="docutils literal"><span class="pre">include/llvm/CodeGen/SelectionDAGNodes.h</span></code> file
+(values of the <code class="docutils literal"><span class="pre">NodeType</span></code> enum in the <code class="docutils literal"><span class="pre">ISD</span></code> namespace).</p>
+<p>TableGen uses the following target description (<code class="docutils literal"><span class="pre">.td</span></code>) input files to
+generate much of the code for instruction definition:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">Target.td</span></code> â Where the <code class="docutils literal"><span class="pre">Instruction</span></code>, <code class="docutils literal"><span class="pre">Operand</span></code>, <code class="docutils literal"><span class="pre">InstrInfo</span></code>, and
+other fundamental classes are defined.</li>
+<li><code class="docutils literal"><span class="pre">TargetSelectionDAG.td</span></code> â Used by <code class="docutils literal"><span class="pre">SelectionDAG</span></code> instruction selection
+generators, contains <code class="docutils literal"><span class="pre">SDTC*</span></code> classes (selection DAG type constraint),
+definitions of <code class="docutils literal"><span class="pre">SelectionDAG</span></code> nodes (such as <code class="docutils literal"><span class="pre">imm</span></code>, <code class="docutils literal"><span class="pre">cond</span></code>, <code class="docutils literal"><span class="pre">bb</span></code>,
+<code class="docutils literal"><span class="pre">add</span></code>, <code class="docutils literal"><span class="pre">fadd</span></code>, <code class="docutils literal"><span class="pre">sub</span></code>), and pattern support (<code class="docutils literal"><span class="pre">Pattern</span></code>, <code class="docutils literal"><span class="pre">Pat</span></code>,
+<code class="docutils literal"><span class="pre">PatFrag</span></code>, <code class="docutils literal"><span class="pre">PatLeaf</span></code>, <code class="docutils literal"><span class="pre">ComplexPattern</span></code>.</li>
+<li><code class="docutils literal"><span class="pre">XXXInstrFormats.td</span></code> â Patterns for definitions of target-specific
+instructions.</li>
+<li><code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code> â Target-specific definitions of instruction templates,
+condition codes, and instructions of an instruction set.  For architecture
+modifications, a different file name may be used.  For example, for Pentium
+with SSE instruction, this file is <code class="docutils literal"><span class="pre">X86InstrSSE.td</span></code>, and for Pentium with
+MMX, this file is <code class="docutils literal"><span class="pre">X86InstrMMX.td</span></code>.</li>
+</ul>
+<p>There is also a target-specific <code class="docutils literal"><span class="pre">XXX.td</span></code> file, where <code class="docutils literal"><span class="pre">XXX</span></code> is the name of
+the target.  The <code class="docutils literal"><span class="pre">XXX.td</span></code> file includes the other <code class="docutils literal"><span class="pre">.td</span></code> input files, but
+its contents are only directly important for subtargets.</p>
+<p>You should describe a concrete target-specific class <code class="docutils literal"><span class="pre">XXXInstrInfo</span></code> that
+represents machine instructions supported by a target machine.
+<code class="docutils literal"><span class="pre">XXXInstrInfo</span></code> contains an array of <code class="docutils literal"><span class="pre">XXXInstrDescriptor</span></code> objects, each of
+which describes one instruction.  An instruction descriptor defines:</p>
+<ul class="simple">
+<li>Opcode mnemonic</li>
+<li>Number of operands</li>
+<li>List of implicit register definitions and uses</li>
+<li>Target-independent properties (such as memory access, is commutable)</li>
+<li>Target-specific flags</li>
+</ul>
+<p>The Instruction class (defined in <code class="docutils literal"><span class="pre">Target.td</span></code>) is mostly used as a base for
+more complex instruction classes.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class Instruction {
+  string Namespace = "";
+  dag OutOperandList;    // A dag containing the MI def operand list.
+  dag InOperandList;     // A dag containing the MI use operand list.
+  string AsmString = ""; // The .s format to print the instruction with.
+  list<dag> Pattern;     // Set to the DAG pattern for this instruction.
+  list<Register> Uses = [];
+  list<Register> Defs = [];
+  list<Predicate> Predicates = [];  // predicates turned into isel match code
+  ... remainder not shown for space ...
+}
+</pre></div>
+</div>
+<p>A <code class="docutils literal"><span class="pre">SelectionDAG</span></code> node (<code class="docutils literal"><span class="pre">SDNode</span></code>) should contain an object representing a
+target-specific instruction that is defined in <code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code>.  The
+instruction objects should represent instructions from the architecture manual
+of the target machine (such as the SPARC Architecture Manual for the SPARC
+target).</p>
+<p>A single instruction from the architecture manual is often modeled as multiple
+target instructions, depending upon its operands.  For example, a manual might
+describe an add instruction that takes a register or an immediate operand.  An
+LLVM target could model this with two instructions named <code class="docutils literal"><span class="pre">ADDri</span></code> and
+<code class="docutils literal"><span class="pre">ADDrr</span></code>.</p>
+<p>You should define a class for each instruction category and define each opcode
+as a subclass of the category with appropriate parameters such as the fixed
+binary encoding of opcodes and extended opcodes.  You should map the register
+bits to the bits of the instruction in which they are encoded (for the JIT).
+Also you should specify how the instruction should be printed when the
+automatic assembly printer is used.</p>
+<p>As is described in the SPARC Architecture Manual, Version 8, there are three
+major 32-bit formats for instructions.  Format 1 is only for the <code class="docutils literal"><span class="pre">CALL</span></code>
+instruction.  Format 2 is for branch on condition codes and <code class="docutils literal"><span class="pre">SETHI</span></code> (set high
+bits of a register) instructions.  Format 3 is for other instructions.</p>
+<p>Each of these formats has corresponding classes in <code class="docutils literal"><span class="pre">SparcInstrFormat.td</span></code>.
+<code class="docutils literal"><span class="pre">InstSP</span></code> is a base class for other instruction classes.  Additional base
+classes are specified for more precise formats: for example in
+<code class="docutils literal"><span class="pre">SparcInstrFormat.td</span></code>, <code class="docutils literal"><span class="pre">F2_1</span></code> is for <code class="docutils literal"><span class="pre">SETHI</span></code>, and <code class="docutils literal"><span class="pre">F2_2</span></code> is for
+branches.  There are three other base classes: <code class="docutils literal"><span class="pre">F3_1</span></code> for register/register
+operations, <code class="docutils literal"><span class="pre">F3_2</span></code> for register/immediate operations, and <code class="docutils literal"><span class="pre">F3_3</span></code> for
+floating-point operations.  <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code> also adds the base class
+<code class="docutils literal"><span class="pre">Pseudo</span></code> for synthetic SPARC instructions.</p>
+<p><code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code> largely consists of operand and instruction definitions
+for the SPARC target.  In <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code>, the following target
+description file entry, <code class="docutils literal"><span class="pre">LDrr</span></code>, defines the Load Integer instruction for a
+Word (the <code class="docutils literal"><span class="pre">LD</span></code> SPARC opcode) from a memory address to a register.  The first
+parameter, the value 3 (<code class="docutils literal"><span class="pre">11</span></code><sub>2</sub>), is the operation value for this
+category of operation.  The second parameter (<code class="docutils literal"><span class="pre">000000</span></code><sub>2</sub>) is the
+specific operation value for <code class="docutils literal"><span class="pre">LD</span></code>/Load Word.  The third parameter is the
+output destination, which is a register operand and defined in the <code class="docutils literal"><span class="pre">Register</span></code>
+target description file (<code class="docutils literal"><span class="pre">IntRegs</span></code>).</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def LDrr : F3_1 <3, 0b000000, (outs IntRegs:$dst), (ins MEMrr:$addr),
+                 "ld [$addr], $dst",
+                 [(set i32:$dst, (load ADDRrr:$addr))]>;
+</pre></div>
+</div>
+<p>The fourth parameter is the input source, which uses the address operand
+<code class="docutils literal"><span class="pre">MEMrr</span></code> that is defined earlier in <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code>:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def MEMrr : Operand<i32> {
+  let PrintMethod = "printMemOperand";
+  let MIOperandInfo = (ops IntRegs, IntRegs);
+}
+</pre></div>
+</div>
+<p>The fifth parameter is a string that is used by the assembly printer and can be
+left as an empty string until the assembly printer interface is implemented.
+The sixth and final parameter is the pattern used to match the instruction
+during the SelectionDAG Select Phase described in <a class="reference internal" href="CodeGenerator.html"><span class="doc">The LLVM Target-Independent Code Generator</span></a>.
+This parameter is detailed in the next section, <a class="reference internal" href="#instruction-selector"><span class="std std-ref">Instruction Selector</span></a>.</p>
+<p>Instruction class definitions are not overloaded for different operand types,
+so separate versions of instructions are needed for register, memory, or
+immediate value operands.  For example, to perform a Load Integer instruction
+for a Word from an immediate operand to a register, the following instruction
+class is defined:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def LDri : F3_2 <3, 0b000000, (outs IntRegs:$dst), (ins MEMri:$addr),
+                 "ld [$addr], $dst",
+                 [(set i32:$dst, (load ADDRri:$addr))]>;
+</pre></div>
+</div>
+<p>Writing these definitions for so many similar instructions can involve a lot of
+cut and paste.  In <code class="docutils literal"><span class="pre">.td</span></code> files, the <code class="docutils literal"><span class="pre">multiclass</span></code> directive enables the
+creation of templates to define several instruction classes at once (using the
+<code class="docutils literal"><span class="pre">defm</span></code> directive).  For example in <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code>, the <code class="docutils literal"><span class="pre">multiclass</span></code>
+pattern <code class="docutils literal"><span class="pre">F3_12</span></code> is defined to create 2 instruction classes each time
+<code class="docutils literal"><span class="pre">F3_12</span></code> is invoked:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>multiclass F3_12 <string OpcStr, bits<6> Op3Val, SDNode OpNode> {
+  def rr  : F3_1 <2, Op3Val,
+                 (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c),
+                 !strconcat(OpcStr, " $b, $c, $dst"),
+                 [(set i32:$dst, (OpNode i32:$b, i32:$c))]>;
+  def ri  : F3_2 <2, Op3Val,
+                 (outs IntRegs:$dst), (ins IntRegs:$b, i32imm:$c),
+                 !strconcat(OpcStr, " $b, $c, $dst"),
+                 [(set i32:$dst, (OpNode i32:$b, simm13:$c))]>;
+}
+</pre></div>
+</div>
+<p>So when the <code class="docutils literal"><span class="pre">defm</span></code> directive is used for the <code class="docutils literal"><span class="pre">XOR</span></code> and <code class="docutils literal"><span class="pre">ADD</span></code>
+instructions, as seen below, it creates four instruction objects: <code class="docutils literal"><span class="pre">XORrr</span></code>,
+<code class="docutils literal"><span class="pre">XORri</span></code>, <code class="docutils literal"><span class="pre">ADDrr</span></code>, and <code class="docutils literal"><span class="pre">ADDri</span></code>.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>defm XOR   : F3_12<"xor", 0b000011, xor>;
+defm ADD   : F3_12<"add", 0b000000, add>;
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code> also includes definitions for condition codes that are
+referenced by branch instructions.  The following definitions in
+<code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code> indicate the bit location of the SPARC condition code.
+For example, the 10<sup>th</sup> bit represents the âgreater thanâ condition for
+integers, and the 22<sup>nd</sup> bit represents the âgreater thanâ condition for
+floats.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def ICC_NE  : ICC_VAL< 9>;  // Not Equal
+def ICC_E   : ICC_VAL< 1>;  // Equal
+def ICC_G   : ICC_VAL<10>;  // Greater
+...
+def FCC_U   : FCC_VAL<23>;  // Unordered
+def FCC_G   : FCC_VAL<22>;  // Greater
+def FCC_UG  : FCC_VAL<21>;  // Unordered or Greater
+...
+</pre></div>
+</div>
+<p>(Note that <code class="docutils literal"><span class="pre">Sparc.h</span></code> also defines enums that correspond to the same SPARC
+condition codes.  Care must be taken to ensure the values in <code class="docutils literal"><span class="pre">Sparc.h</span></code>
+correspond to the values in <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code>.  I.e., <code class="docutils literal"><span class="pre">SPCC::ICC_NE</span> <span class="pre">=</span> <span class="pre">9</span></code>,
+<code class="docutils literal"><span class="pre">SPCC::FCC_U</span> <span class="pre">=</span> <span class="pre">23</span></code> and so on.)</p>
+<div class="section" id="instruction-operand-mapping">
+<h3><a class="toc-backref" href="#id15">Instruction Operand Mapping</a><a class="headerlink" href="#instruction-operand-mapping" title="Permalink to this headline">Â¶</a></h3>
+<p>The code generator backend maps instruction operands to fields in the
+instruction.  Operands are assigned to unbound fields in the instruction in the
+order they are defined.  Fields are bound when they are assigned a value.  For
+example, the Sparc target defines the <code class="docutils literal"><span class="pre">XNORrr</span></code> instruction as a <code class="docutils literal"><span class="pre">F3_1</span></code>
+format instruction having three operands.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def XNORrr  : F3_1<2, 0b000111,
+                   (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c),
+                   "xnor $b, $c, $dst",
+                   [(set i32:$dst, (not (xor i32:$b, i32:$c)))]>;
+</pre></div>
+</div>
+<p>The instruction templates in <code class="docutils literal"><span class="pre">SparcInstrFormats.td</span></code> show the base class for
+<code class="docutils literal"><span class="pre">F3_1</span></code> is <code class="docutils literal"><span class="pre">InstSP</span></code>.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class InstSP<dag outs, dag ins, string asmstr, list<dag> pattern> : Instruction {
+  field bits<32> Inst;
+  let Namespace = "SP";
+  bits<2> op;
+  let Inst{31-30} = op;
+  dag OutOperandList = outs;
+  dag InOperandList = ins;
+  let AsmString   = asmstr;
+  let Pattern = pattern;
+}
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">InstSP</span></code> leaves the <code class="docutils literal"><span class="pre">op</span></code> field unbound.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class F3<dag outs, dag ins, string asmstr, list<dag> pattern>
+    : InstSP<outs, ins, asmstr, pattern> {
+  bits<5> rd;
+  bits<6> op3;
+  bits<5> rs1;
+  let op{1} = 1;   // Op = 2 or 3
+  let Inst{29-25} = rd;
+  let Inst{24-19} = op3;
+  let Inst{18-14} = rs1;
+}
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">F3</span></code> binds the <code class="docutils literal"><span class="pre">op</span></code> field and defines the <code class="docutils literal"><span class="pre">rd</span></code>, <code class="docutils literal"><span class="pre">op3</span></code>, and <code class="docutils literal"><span class="pre">rs1</span></code>
+fields.  <code class="docutils literal"><span class="pre">F3</span></code> format instructions will bind the operands <code class="docutils literal"><span class="pre">rd</span></code>, <code class="docutils literal"><span class="pre">op3</span></code>, and
+<code class="docutils literal"><span class="pre">rs1</span></code> fields.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class F3_1<bits<2> opVal, bits<6> op3val, dag outs, dag ins,
+           string asmstr, list<dag> pattern> : F3<outs, ins, asmstr, pattern> {
+  bits<8> asi = 0; // asi not currently used
+  bits<5> rs2;
+  let op         = opVal;
+  let op3        = op3val;
+  let Inst{13}   = 0;     // i field = 0
+  let Inst{12-5} = asi;   // address space identifier
+  let Inst{4-0}  = rs2;
+}
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">F3_1</span></code> binds the <code class="docutils literal"><span class="pre">op3</span></code> field and defines the <code class="docutils literal"><span class="pre">rs2</span></code> fields.  <code class="docutils literal"><span class="pre">F3_1</span></code>
+format instructions will bind the operands to the <code class="docutils literal"><span class="pre">rd</span></code>, <code class="docutils literal"><span class="pre">rs1</span></code>, and <code class="docutils literal"><span class="pre">rs2</span></code>
+fields.  This results in the <code class="docutils literal"><span class="pre">XNORrr</span></code> instruction binding <code class="docutils literal"><span class="pre">$dst</span></code>, <code class="docutils literal"><span class="pre">$b</span></code>,
+and <code class="docutils literal"><span class="pre">$c</span></code> operands to the <code class="docutils literal"><span class="pre">rd</span></code>, <code class="docutils literal"><span class="pre">rs1</span></code>, and <code class="docutils literal"><span class="pre">rs2</span></code> fields respectively.</p>
+<div class="section" id="instruction-operand-name-mapping">
+<h4><a class="toc-backref" href="#id16">Instruction Operand Name Mapping</a><a class="headerlink" href="#instruction-operand-name-mapping" title="Permalink to this headline">Â¶</a></h4>
+<p>TableGen will also generate a function called getNamedOperandIdx() which
+can be used to look up an operandâs index in a MachineInstr based on its
+TableGen name.  Setting the UseNamedOperandTable bit in an instructionâs
+TableGen definition will add all of its operands to an enumeration in the
+llvm::XXX:OpName namespace and also add an entry for it into the OperandMap
+table, which can be queried using getNamedOperandIdx()</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>int DstIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::dst); // => 0
+int BIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::b);     // => 1
+int CIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::c);     // => 2
+int DIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::d);     // => -1
+
+...
+</pre></div>
+</div>
+<p>The entries in the OpName enum are taken verbatim from the TableGen definitions,
+so operands with lowercase names will have lower case entries in the enum.</p>
+<p>To include the getNamedOperandIdx() function in your backend, you will need
+to define a few preprocessor macros in XXXInstrInfo.cpp and XXXInstrInfo.h.
+For example:</p>
+<p>XXXInstrInfo.cpp:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#define GET_INSTRINFO_NAMED_OPS </span><span class="c1">// For getNamedOperandIdx() function</span>
+<span class="cp">#include</span> <span class="cpf">"XXXGenInstrInfo.inc"</span><span class="cp"></span>
+</pre></div>
+</div>
+<p>XXXInstrInfo.h:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#define GET_INSTRINFO_OPERAND_ENUM </span><span class="c1">// For OpName enum</span>
+<span class="cp">#include</span> <span class="cpf">"XXXGenInstrInfo.inc"</span><span class="cp"></span>
+
+<span class="k">namespace</span> <span class="n">XXX</span> <span class="p">{</span>
+  <span class="kt">int16_t</span> <span class="n">getNamedOperandIdx</span><span class="p">(</span><span class="kt">uint16_t</span> <span class="n">Opcode</span><span class="p">,</span> <span class="kt">uint16_t</span> <span class="n">NamedIndex</span><span class="p">);</span>
+<span class="p">}</span> <span class="c1">// End namespace XXX</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="instruction-operand-types">
+<h4><a class="toc-backref" href="#id17">Instruction Operand Types</a><a class="headerlink" href="#instruction-operand-types" title="Permalink to this headline">Â¶</a></h4>
+<p>TableGen will also generate an enumeration consisting of all named Operand
+types defined in the backend, in the llvm::XXX::OpTypes namespace.
+Some common immediate Operand types (for instance i8, i32, i64, f32, f64)
+are defined for all targets in <code class="docutils literal"><span class="pre">include/llvm/Target/Target.td</span></code>, and are
+available in each Targetâs OpTypes enum.  Also, only named Operand types appear
+in the enumeration: anonymous types are ignored.
+For example, the X86 backend defines <code class="docutils literal"><span class="pre">brtarget</span></code> and <code class="docutils literal"><span class="pre">brtarget8</span></code>, both
+instances of the TableGen <code class="docutils literal"><span class="pre">Operand</span></code> class, which represent branch target
+operands:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def brtarget : Operand<OtherVT>;
+def brtarget8 : Operand<OtherVT>;
+</pre></div>
+</div>
+<p>This results in:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">namespace</span> <span class="n">X86</span> <span class="p">{</span>
+<span class="k">namespace</span> <span class="n">OpTypes</span> <span class="p">{</span>
+<span class="k">enum</span> <span class="n">OperandType</span> <span class="p">{</span>
+  <span class="p">...</span>
+  <span class="n">brtarget</span><span class="p">,</span>
+  <span class="n">brtarget8</span><span class="p">,</span>
+  <span class="p">...</span>
+  <span class="n">i32imm</span><span class="p">,</span>
+  <span class="n">i64imm</span><span class="p">,</span>
+  <span class="p">...</span>
+  <span class="n">OPERAND_TYPE_LIST_END</span>
+<span class="p">}</span> <span class="c1">// End namespace OpTypes</span>
+<span class="p">}</span> <span class="c1">// End namespace X86</span>
+</pre></div>
+</div>
+<p>In typical TableGen fashion, to use the enum, you will need to define a
+preprocessor macro:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#define GET_INSTRINFO_OPERAND_TYPES_ENUM </span><span class="c1">// For OpTypes enum</span>
+<span class="cp">#include</span> <span class="cpf">"XXXGenInstrInfo.inc"</span><span class="cp"></span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="instruction-scheduling">
+<h3><a class="toc-backref" href="#id18">Instruction Scheduling</a><a class="headerlink" href="#instruction-scheduling" title="Permalink to this headline">Â¶</a></h3>
+<p>Instruction itineraries can be queried using MCDesc::getSchedClass(). The
+value can be named by an enumemation in llvm::XXX::Sched namespace generated
+by TableGen in XXXGenInstrInfo.inc. The name of the schedule classes are
+the same as provided in XXXSchedule.td plus a default NoItinerary class.</p>
+</div>
+<div class="section" id="instruction-relation-mapping">
+<h3><a class="toc-backref" href="#id19">Instruction Relation Mapping</a><a class="headerlink" href="#instruction-relation-mapping" title="Permalink to this headline">Â¶</a></h3>
+<p>This TableGen feature is used to relate instructions with each other.  It is
+particularly useful when you have multiple instruction formats and need to
+switch between them after instruction selection.  This entire feature is driven
+by relation models which can be defined in <code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code> files
+according to the target-specific instruction set.  Relation models are defined
+using <code class="docutils literal"><span class="pre">InstrMapping</span></code> class as a base.  TableGen parses all the models
+and generates instruction relation maps using the specified information.
+Relation maps are emitted as tables in the <code class="docutils literal"><span class="pre">XXXGenInstrInfo.inc</span></code> file
+along with the functions to query them.  For the detailed information on how to
+use this feature, please refer to <a class="reference internal" href="HowToUseInstrMappings.html"><span class="doc">How To Use Instruction Mappings</span></a>.</p>
+</div>
+<div class="section" id="implement-a-subclass-of-targetinstrinfo">
+<h3><a class="toc-backref" href="#id20">Implement a subclass of <code class="docutils literal"><span class="pre">TargetInstrInfo</span></code></a><a class="headerlink" href="#implement-a-subclass-of-targetinstrinfo" title="Permalink to this headline">Â¶</a></h3>
+<p>The final step is to hand code portions of <code class="docutils literal"><span class="pre">XXXInstrInfo</span></code>, which implements
+the interface described in <code class="docutils literal"><span class="pre">TargetInstrInfo.h</span></code> (see <a class="reference internal" href="CodeGenerator.html#targetinstrinfo"><span class="std std-ref">The TargetInstrInfo class</span></a>).
+These functions return <code class="docutils literal"><span class="pre">0</span></code> or a Boolean or they assert, unless overridden.
+Hereâs a list of functions that are overridden for the SPARC implementation in
+<code class="docutils literal"><span class="pre">SparcInstrInfo.cpp</span></code>:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">isLoadFromStackSlot</span></code> â If the specified machine instruction is a direct
+load from a stack slot, return the register number of the destination and the
+<code class="docutils literal"><span class="pre">FrameIndex</span></code> of the stack slot.</li>
+<li><code class="docutils literal"><span class="pre">isStoreToStackSlot</span></code> â If the specified machine instruction is a direct
+store to a stack slot, return the register number of the destination and the
+<code class="docutils literal"><span class="pre">FrameIndex</span></code> of the stack slot.</li>
+<li><code class="docutils literal"><span class="pre">copyPhysReg</span></code> â Copy values between a pair of physical registers.</li>
+<li><code class="docutils literal"><span class="pre">storeRegToStackSlot</span></code> â Store a register value to a stack slot.</li>
+<li><code class="docutils literal"><span class="pre">loadRegFromStackSlot</span></code> â Load a register value from a stack slot.</li>
+<li><code class="docutils literal"><span class="pre">storeRegToAddr</span></code> â Store a register value to memory.</li>
+<li><code class="docutils literal"><span class="pre">loadRegFromAddr</span></code> â Load a register value from memory.</li>
+<li><code class="docutils literal"><span class="pre">foldMemoryOperand</span></code> â Attempt to combine instructions of any load or
+store instruction for the specified operand(s).</li>
+</ul>
+</div>
+<div class="section" id="branch-folding-and-if-conversion">
+<h3><a class="toc-backref" href="#id21">Branch Folding and If Conversion</a><a class="headerlink" href="#branch-folding-and-if-conversion" title="Permalink to this headline">Â¶</a></h3>
+<p>Performance can be improved by combining instructions or by eliminating
+instructions that are never reached.  The <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> method in
+<code class="docutils literal"><span class="pre">XXXInstrInfo</span></code> may be implemented to examine conditional instructions and
+remove unnecessary instructions.  <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> looks at the end of a
+machine basic block (MBB) for opportunities for improvement, such as branch
+folding and if conversion.  The <code class="docutils literal"><span class="pre">BranchFolder</span></code> and <code class="docutils literal"><span class="pre">IfConverter</span></code> machine
+function passes (see the source files <code class="docutils literal"><span class="pre">BranchFolding.cpp</span></code> and
+<code class="docutils literal"><span class="pre">IfConversion.cpp</span></code> in the <code class="docutils literal"><span class="pre">lib/CodeGen</span></code> directory) call <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code>
+to improve the control flow graph that represents the instructions.</p>
+<p>Several implementations of <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> (for ARM, Alpha, and X86) can be
+examined as models for your own <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> implementation.  Since SPARC
+does not implement a useful <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code>, the ARM target implementation is
+shown below.</p>
+<p><code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> returns a Boolean value and takes four parameters:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">MachineBasicBlock</span> <span class="pre">&MBB</span></code> â The incoming block to be examined.</li>
+<li><code class="docutils literal"><span class="pre">MachineBasicBlock</span> <span class="pre">*&TBB</span></code> â A destination block that is returned.  For a
+conditional branch that evaluates to true, <code class="docutils literal"><span class="pre">TBB</span></code> is the destination.</li>
+<li><code class="docutils literal"><span class="pre">MachineBasicBlock</span> <span class="pre">*&FBB</span></code> â For a conditional branch that evaluates to
+false, <code class="docutils literal"><span class="pre">FBB</span></code> is returned as the destination.</li>
+<li><code class="docutils literal"><span class="pre">std::vector<MachineOperand></span> <span class="pre">&Cond</span></code> â List of operands to evaluate a
+condition for a conditional branch.</li>
+</ul>
+<p>In the simplest case, if a block ends without a branch, then it falls through
+to the successor block.  No destination blocks are specified for either <code class="docutils literal"><span class="pre">TBB</span></code>
+or <code class="docutils literal"><span class="pre">FBB</span></code>, so both parameters return <code class="docutils literal"><span class="pre">NULL</span></code>.  The start of the
+<code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> (see code below for the ARM target) shows the function
+parameters and the code for the simplest case.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">bool</span> <span class="n">ARMInstrInfo</span><span class="o">::</span><span class="n">AnalyzeBranch</span><span class="p">(</span><span class="n">MachineBasicBlock</span> <span class="o">&</span><span class="n">MBB</span><span class="p">,</span>
+                                 <span class="n">MachineBasicBlock</span> <span class="o">*&</span><span class="n">TBB</span><span class="p">,</span>
+                                 <span class="n">MachineBasicBlock</span> <span class="o">*&</span><span class="n">FBB</span><span class="p">,</span>
+                                 <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">MachineOperand</span><span class="o">></span> <span class="o">&</span><span class="n">Cond</span><span class="p">)</span> <span class="k">const</span>
+<span class="p">{</span>
+  <span class="n">MachineBasicBlock</span><span class="o">::</span><span class="n">iterator</span> <span class="n">I</span> <span class="o">=</span> <span class="n">MBB</span><span class="p">.</span><span class="n">end</span><span class="p">();</span>
+  <span class="k">if</span> <span class="p">(</span><span class="n">I</span> <span class="o">==</span> <span class="n">MBB</span><span class="p">.</span><span class="n">begin</span><span class="p">()</span> <span class="o">||</span> <span class="o">!</span><span class="n">isUnpredicatedTerminator</span><span class="p">(</span><span class="o">--</span><span class="n">I</span><span class="p">))</span>
+    <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>If a block ends with a single unconditional branch instruction, then
+<code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> (shown below) should return the destination of that branch in
+the <code class="docutils literal"><span class="pre">TBB</span></code> parameter.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">if</span> <span class="p">(</span><span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">B</span> <span class="o">||</span> <span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">tB</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">TBB</span> <span class="o">=</span> <span class="n">LastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">getMBB</span><span class="p">();</span>
+  <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>If a block ends with two unconditional branches, then the second branch is
+never reached.  In that situation, as shown below, remove the last branch
+instruction and return the penultimate branch in the <code class="docutils literal"><span class="pre">TBB</span></code> parameter.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">if</span> <span class="p">((</span><span class="n">SecondLastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">B</span> <span class="o">||</span> <span class="n">SecondLastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">tB</span><span class="p">)</span> <span class="o">&&</span>
+    <span class="p">(</span><span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">B</span> <span class="o">||</span> <span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">tB</span><span class="p">))</span> <span class="p">{</span>
+  <span class="n">TBB</span> <span class="o">=</span> <span class="n">SecondLastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">getMBB</span><span class="p">();</span>
+  <span class="n">I</span> <span class="o">=</span> <span class="n">LastInst</span><span class="p">;</span>
+  <span class="n">I</span><span class="o">-></span><span class="n">eraseFromParent</span><span class="p">();</span>
+  <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>A block may end with a single conditional branch instruction that falls through
+to successor block if the condition evaluates to false.  In that case,
+<code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> (shown below) should return the destination of that
+conditional branch in the <code class="docutils literal"><span class="pre">TBB</span></code> parameter and a list of operands in the
+<code class="docutils literal"><span class="pre">Cond</span></code> parameter to evaluate the condition.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">if</span> <span class="p">(</span><span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">Bcc</span> <span class="o">||</span> <span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">tBcc</span><span class="p">)</span> <span class="p">{</span>
+  <span class="c1">// Block ends with fall-through condbranch.</span>
+  <span class="n">TBB</span> <span class="o">=</span> <span class="n">LastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">getMBB</span><span class="p">();</span>
+  <span class="n">Cond</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">LastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">1</span><span class="p">));</span>
+  <span class="n">Cond</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">LastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">2</span><span class="p">));</span>
+  <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>If a block ends with both a conditional branch and an ensuing unconditional
+branch, then <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> (shown below) should return the conditional
+branch destination (assuming it corresponds to a conditional evaluation of
+â<code class="docutils literal"><span class="pre">true</span></code>â) in the <code class="docutils literal"><span class="pre">TBB</span></code> parameter and the unconditional branch destination
+in the <code class="docutils literal"><span class="pre">FBB</span></code> (corresponding to a conditional evaluation of â<code class="docutils literal"><span class="pre">false</span></code>â).  A
+list of operands to evaluate the condition should be returned in the <code class="docutils literal"><span class="pre">Cond</span></code>
+parameter.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">unsigned</span> <span class="n">SecondLastOpc</span> <span class="o">=</span> <span class="n">SecondLastInst</span><span class="o">-></span><span class="n">getOpcode</span><span class="p">();</span>
+
+<span class="k">if</span> <span class="p">((</span><span class="n">SecondLastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">Bcc</span> <span class="o">&&</span> <span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">B</span><span class="p">)</span> <span class="o">||</span>
+    <span class="p">(</span><span class="n">SecondLastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">tBcc</span> <span class="o">&&</span> <span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">tB</span><span class="p">))</span> <span class="p">{</span>
+  <span class="n">TBB</span> <span class="o">=</span>  <span class="n">SecondLastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">getMBB</span><span class="p">();</span>
+  <span class="n">Cond</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">SecondLastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">1</span><span class="p">));</span>
+  <span class="n">Cond</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">SecondLastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">2</span><span class="p">));</span>
+  <span class="n">FBB</span> <span class="o">=</span> <span class="n">LastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">getMBB</span><span class="p">();</span>
+  <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>For the last two cases (ending with a single conditional branch or ending with
+one conditional and one unconditional branch), the operands returned in the
+<code class="docutils literal"><span class="pre">Cond</span></code> parameter can be passed to methods of other instructions to create new
+branches or perform other operations.  An implementation of <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code>
+requires the helper methods <code class="docutils literal"><span class="pre">RemoveBranch</span></code> and <code class="docutils literal"><span class="pre">InsertBranch</span></code> to manage
+subsequent operations.</p>
+<p><code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> should return false indicating success in most circumstances.
+<code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> should only return true when the method is stumped about what
+to do, for example, if a block has three terminating branches.
+<code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> may return true if it encounters a terminator it cannot
+handle, such as an indirect branch.</p>
+</div>
+</div>
+<div class="section" id="instruction-selector">
+<span id="id2"></span><h2><a class="toc-backref" href="#id22">Instruction Selector</a><a class="headerlink" href="#instruction-selector" title="Permalink to this headline">Â¶</a></h2>
+<p>LLVM uses a <code class="docutils literal"><span class="pre">SelectionDAG</span></code> to represent LLVM IR instructions, and nodes of
+the <code class="docutils literal"><span class="pre">SelectionDAG</span></code> ideally represent native target instructions.  During code
+generation, instruction selection passes are performed to convert non-native
+DAG instructions into native target-specific instructions.  The pass described
+in <code class="docutils literal"><span class="pre">XXXISelDAGToDAG.cpp</span></code> is used to match patterns and perform DAG-to-DAG
+instruction selection.  Optionally, a pass may be defined (in
+<code class="docutils literal"><span class="pre">XXXBranchSelector.cpp</span></code>) to perform similar DAG-to-DAG operations for branch
+instructions.  Later, the code in <code class="docutils literal"><span class="pre">XXXISelLowering.cpp</span></code> replaces or removes
+operations and data types not supported natively (legalizes) in a
+<code class="docutils literal"><span class="pre">SelectionDAG</span></code>.</p>
+<p>TableGen generates code for instruction selection using the following target
+description input files:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code> â Contains definitions of instructions in a
+target-specific instruction set, generates <code class="docutils literal"><span class="pre">XXXGenDAGISel.inc</span></code>, which is
+included in <code class="docutils literal"><span class="pre">XXXISelDAGToDAG.cpp</span></code>.</li>
+<li><code class="docutils literal"><span class="pre">XXXCallingConv.td</span></code> â Contains the calling and return value conventions
+for the target architecture, and it generates <code class="docutils literal"><span class="pre">XXXGenCallingConv.inc</span></code>,
+which is included in <code class="docutils literal"><span class="pre">XXXISelLowering.cpp</span></code>.</li>
+</ul>
+<p>The implementation of an instruction selection pass must include a header that
+declares the <code class="docutils literal"><span class="pre">FunctionPass</span></code> class or a subclass of <code class="docutils literal"><span class="pre">FunctionPass</span></code>.  In
+<code class="docutils literal"><span class="pre">XXXTargetMachine.cpp</span></code>, a Pass Manager (PM) should add each instruction
+selection pass into the queue of passes to run.</p>
+<p>The LLVM static compiler (<code class="docutils literal"><span class="pre">llc</span></code>) is an excellent tool for visualizing the
+contents of DAGs.  To display the <code class="docutils literal"><span class="pre">SelectionDAG</span></code> before or after specific
+processing phases, use the command line options for <code class="docutils literal"><span class="pre">llc</span></code>, described at
+<a class="reference internal" href="CodeGenerator.html#selectiondag-process"><span class="std std-ref">SelectionDAG Instruction Selection Process</span></a>.</p>
+<p>To describe instruction selector behavior, you should add patterns for lowering
+LLVM code into a <code class="docutils literal"><span class="pre">SelectionDAG</span></code> as the last parameter of the instruction
+definitions in <code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code>.  For example, in <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code>,
+this entry defines a register store operation, and the last parameter describes
+a pattern with the store DAG operator.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def STrr  : F3_1< 3, 0b000100, (outs), (ins MEMrr:$addr, IntRegs:$src),
+                 "st $src, [$addr]", [(store i32:$src, ADDRrr:$addr)]>;
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">ADDRrr</span></code> is a memory mode that is also defined in <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code>:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def ADDRrr : ComplexPattern<i32, 2, "SelectADDRrr", [], []>;
+</pre></div>
+</div>
+<p>The definition of <code class="docutils literal"><span class="pre">ADDRrr</span></code> refers to <code class="docutils literal"><span class="pre">SelectADDRrr</span></code>, which is a function
+defined in an implementation of the Instructor Selector (such as
+<code class="docutils literal"><span class="pre">SparcISelDAGToDAG.cpp</span></code>).</p>
+<p>In <code class="docutils literal"><span class="pre">lib/Target/TargetSelectionDAG.td</span></code>, the DAG operator for store is defined
+below:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def store : PatFrag<(ops node:$val, node:$ptr),
+                    (st node:$val, node:$ptr), [{
+  if (StoreSDNode *ST = dyn_cast<StoreSDNode>(N))
+    return !ST->isTruncatingStore() &&
+           ST->getAddressingMode() == ISD::UNINDEXED;
+  return false;
+}]>;
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code> also generates (in <code class="docutils literal"><span class="pre">XXXGenDAGISel.inc</span></code>) the
+<code class="docutils literal"><span class="pre">SelectCode</span></code> method that is used to call the appropriate processing method
+for an instruction.  In this example, <code class="docutils literal"><span class="pre">SelectCode</span></code> calls <code class="docutils literal"><span class="pre">Select_ISD_STORE</span></code>
+for the <code class="docutils literal"><span class="pre">ISD::STORE</span></code> opcode.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">SDNode</span> <span class="o">*</span><span class="nf">SelectCode</span><span class="p">(</span><span class="n">SDValue</span> <span class="n">N</span><span class="p">)</span> <span class="p">{</span>
+  <span class="p">...</span>
+  <span class="n">MVT</span><span class="o">::</span><span class="n">ValueType</span> <span class="n">NVT</span> <span class="o">=</span> <span class="n">N</span><span class="p">.</span><span class="n">getNode</span><span class="p">()</span><span class="o">-></span><span class="n">getValueType</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
+  <span class="k">switch</span> <span class="p">(</span><span class="n">N</span><span class="p">.</span><span class="n">getOpcode</span><span class="p">())</span> <span class="p">{</span>
+  <span class="k">case</span> <span class="n">ISD</span><span class="o">::</span><span class="nl">STORE</span><span class="p">:</span> <span class="p">{</span>
+    <span class="k">switch</span> <span class="p">(</span><span class="n">NVT</span><span class="p">)</span> <span class="p">{</span>
+    <span class="k">default</span><span class="o">:</span>
+      <span class="k">return</span> <span class="n">Select_ISD_STORE</span><span class="p">(</span><span class="n">N</span><span class="p">);</span>
+      <span class="k">break</span><span class="p">;</span>
+    <span class="p">}</span>
+    <span class="k">break</span><span class="p">;</span>
+  <span class="p">}</span>
+  <span class="p">...</span>
+</pre></div>
+</div>
+<p>The pattern for <code class="docutils literal"><span class="pre">STrr</span></code> is matched, so elsewhere in <code class="docutils literal"><span class="pre">XXXGenDAGISel.inc</span></code>,
+code for <code class="docutils literal"><span class="pre">STrr</span></code> is created for <code class="docutils literal"><span class="pre">Select_ISD_STORE</span></code>.  The <code class="docutils literal"><span class="pre">Emit_22</span></code> method
+is also generated in <code class="docutils literal"><span class="pre">XXXGenDAGISel.inc</span></code> to complete the processing of this
+instruction.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">SDNode</span> <span class="o">*</span><span class="nf">Select_ISD_STORE</span><span class="p">(</span><span class="k">const</span> <span class="n">SDValue</span> <span class="o">&</span><span class="n">N</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">SDValue</span> <span class="n">Chain</span> <span class="o">=</span> <span class="n">N</span><span class="p">.</span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
+  <span class="k">if</span> <span class="p">(</span><span class="n">Predicate_store</span><span class="p">(</span><span class="n">N</span><span class="p">.</span><span class="n">getNode</span><span class="p">()))</span> <span class="p">{</span>
+    <span class="n">SDValue</span> <span class="n">N1</span> <span class="o">=</span> <span class="n">N</span><span class="p">.</span><span class="n">getOperand</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
+    <span class="n">SDValue</span> <span class="n">N2</span> <span class="o">=</span> <span class="n">N</span><span class="p">.</span><span class="n">getOperand</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
+    <span class="n">SDValue</span> <span class="n">CPTmp0</span><span class="p">;</span>
+    <span class="n">SDValue</span> <span class="n">CPTmp1</span><span class="p">;</span>
+
+    <span class="c1">// Pattern: (st:void i32:i32:$src,</span>
+    <span class="c1">//           ADDRrr:i32:$addr)<<P:Predicate_store>></span>
+    <span class="c1">// Emits: (STrr:void ADDRrr:i32:$addr, IntRegs:i32:$src)</span>
+    <span class="c1">// Pattern complexity = 13  cost = 1  size = 0</span>
+    <span class="k">if</span> <span class="p">(</span><span class="n">SelectADDRrr</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">N2</span><span class="p">,</span> <span class="n">CPTmp0</span><span class="p">,</span> <span class="n">CPTmp1</span><span class="p">)</span> <span class="o">&&</span>
+        <span class="n">N1</span><span class="p">.</span><span class="n">getNode</span><span class="p">()</span><span class="o">-></span><span class="n">getValueType</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span> <span class="o">&&</span>
+        <span class="n">N2</span><span class="p">.</span><span class="n">getNode</span><span class="p">()</span><span class="o">-></span><span class="n">getValueType</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">)</span> <span class="p">{</span>
+      <span class="k">return</span> <span class="n">Emit_22</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">STrr</span><span class="p">,</span> <span class="n">CPTmp0</span><span class="p">,</span> <span class="n">CPTmp1</span><span class="p">);</span>
+    <span class="p">}</span>
+<span class="p">...</span>
+</pre></div>
+</div>
+<div class="section" id="the-selectiondag-legalize-phase">
+<h3><a class="toc-backref" href="#id23">The SelectionDAG Legalize Phase</a><a class="headerlink" href="#the-selectiondag-legalize-phase" title="Permalink to this headline">Â¶</a></h3>
+<p>The Legalize phase converts a DAG to use types and operations that are natively
+supported by the target.  For natively unsupported types and operations, you
+need to add code to the target-specific <code class="docutils literal"><span class="pre">XXXTargetLowering</span></code> implementation to
+convert unsupported types and operations to supported ones.</p>
+<p>In the constructor for the <code class="docutils literal"><span class="pre">XXXTargetLowering</span></code> class, first use the
+<code class="docutils literal"><span class="pre">addRegisterClass</span></code> method to specify which types are supported and which
+register classes are associated with them.  The code for the register classes
+are generated by TableGen from <code class="docutils literal"><span class="pre">XXXRegisterInfo.td</span></code> and placed in
+<code class="docutils literal"><span class="pre">XXXGenRegisterInfo.h.inc</span></code>.  For example, the implementation of the
+constructor for the SparcTargetLowering class (in <code class="docutils literal"><span class="pre">SparcISelLowering.cpp</span></code>)
+starts with the following code:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">addRegisterClass</span><span class="p">(</span><span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">IntRegsRegisterClass</span><span class="p">);</span>
+<span class="n">addRegisterClass</span><span class="p">(</span><span class="n">MVT</span><span class="o">::</span><span class="n">f32</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">FPRegsRegisterClass</span><span class="p">);</span>
+<span class="n">addRegisterClass</span><span class="p">(</span><span class="n">MVT</span><span class="o">::</span><span class="n">f64</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">DFPRegsRegisterClass</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>You should examine the node types in the <code class="docutils literal"><span class="pre">ISD</span></code> namespace
+(<code class="docutils literal"><span class="pre">include/llvm/CodeGen/SelectionDAGNodes.h</span></code>) and determine which operations
+the target natively supports.  For operations that do <strong>not</strong> have native
+support, add a callback to the constructor for the <code class="docutils literal"><span class="pre">XXXTargetLowering</span></code> class,
+so the instruction selection process knows what to do.  The <code class="docutils literal"><span class="pre">TargetLowering</span></code>
+class callback methods (declared in <code class="docutils literal"><span class="pre">llvm/Target/TargetLowering.h</span></code>) are:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">setOperationAction</span></code> â General operation.</li>
+<li><code class="docutils literal"><span class="pre">setLoadExtAction</span></code> â Load with extension.</li>
+<li><code class="docutils literal"><span class="pre">setTruncStoreAction</span></code> â Truncating store.</li>
+<li><code class="docutils literal"><span class="pre">setIndexedLoadAction</span></code> â Indexed load.</li>
+<li><code class="docutils literal"><span class="pre">setIndexedStoreAction</span></code> â Indexed store.</li>
+<li><code class="docutils literal"><span class="pre">setConvertAction</span></code> â Type conversion.</li>
+<li><code class="docutils literal"><span class="pre">setCondCodeAction</span></code> â Support for a given condition code.</li>
+</ul>
+<p>Note: on older releases, <code class="docutils literal"><span class="pre">setLoadXAction</span></code> is used instead of
+<code class="docutils literal"><span class="pre">setLoadExtAction</span></code>.  Also, on older releases, <code class="docutils literal"><span class="pre">setCondCodeAction</span></code> may not
+be supported.  Examine your release to see what methods are specifically
+supported.</p>
+<p>These callbacks are used to determine that an operation does or does not work
+with a specified type (or types).  And in all cases, the third parameter is a
+<code class="docutils literal"><span class="pre">LegalAction</span></code> type enum value: <code class="docutils literal"><span class="pre">Promote</span></code>, <code class="docutils literal"><span class="pre">Expand</span></code>, <code class="docutils literal"><span class="pre">Custom</span></code>, or
+<code class="docutils literal"><span class="pre">Legal</span></code>.  <code class="docutils literal"><span class="pre">SparcISelLowering.cpp</span></code> contains examples of all four
+<code class="docutils literal"><span class="pre">LegalAction</span></code> values.</p>
+<div class="section" id="promote">
+<h4><a class="toc-backref" href="#id24">Promote</a><a class="headerlink" href="#promote" title="Permalink to this headline">Â¶</a></h4>
+<p>For an operation without native support for a given type, the specified type
+may be promoted to a larger type that is supported.  For example, SPARC does
+not support a sign-extending load for Boolean values (<code class="docutils literal"><span class="pre">i1</span></code> type), so in
+<code class="docutils literal"><span class="pre">SparcISelLowering.cpp</span></code> the third parameter below, <code class="docutils literal"><span class="pre">Promote</span></code>, changes
+<code class="docutils literal"><span class="pre">i1</span></code> type values to a large type before loading.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">setLoadExtAction</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">SEXTLOAD</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i1</span><span class="p">,</span> <span class="n">Promote</span><span class="p">);</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="expand">
+<h4><a class="toc-backref" href="#id25">Expand</a><a class="headerlink" href="#expand" title="Permalink to this headline">Â¶</a></h4>
+<p>For a type without native support, a value may need to be broken down further,
+rather than promoted.  For an operation without native support, a combination
+of other operations may be used to similar effect.  In SPARC, the
+floating-point sine and cosine trig operations are supported by expansion to
+other operations, as indicated by the third parameter, <code class="docutils literal"><span class="pre">Expand</span></code>, to
+<code class="docutils literal"><span class="pre">setOperationAction</span></code>:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">setOperationAction</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">FSIN</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">f32</span><span class="p">,</span> <span class="n">Expand</span><span class="p">);</span>
+<span class="n">setOperationAction</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">FCOS</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">f32</span><span class="p">,</span> <span class="n">Expand</span><span class="p">);</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="custom">
+<h4><a class="toc-backref" href="#id26">Custom</a><a class="headerlink" href="#custom" title="Permalink to this headline">Â¶</a></h4>
+<p>For some operations, simple type promotion or operation expansion may be
+insufficient.  In some cases, a special intrinsic function must be implemented.</p>
+<p>For example, a constant value may require special treatment, or an operation
+may require spilling and restoring registers in the stack and working with
+register allocators.</p>
+<p>As seen in <code class="docutils literal"><span class="pre">SparcISelLowering.cpp</span></code> code below, to perform a type conversion
+from a floating point value to a signed integer, first the
+<code class="docutils literal"><span class="pre">setOperationAction</span></code> should be called with <code class="docutils literal"><span class="pre">Custom</span></code> as the third parameter:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">setOperationAction</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">FP_TO_SINT</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">,</span> <span class="n">Custom</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>In the <code class="docutils literal"><span class="pre">LowerOperation</span></code> method, for each <code class="docutils literal"><span class="pre">Custom</span></code> operation, a case
+statement should be added to indicate what function to call.  In the following
+code, an <code class="docutils literal"><span class="pre">FP_TO_SINT</span></code> opcode will call the <code class="docutils literal"><span class="pre">LowerFP_TO_SINT</span></code> method:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">SDValue</span> <span class="n">SparcTargetLowering</span><span class="o">::</span><span class="n">LowerOperation</span><span class="p">(</span><span class="n">SDValue</span> <span class="n">Op</span><span class="p">,</span> <span class="n">SelectionDAG</span> <span class="o">&</span><span class="n">DAG</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">switch</span> <span class="p">(</span><span class="n">Op</span><span class="p">.</span><span class="n">getOpcode</span><span class="p">())</span> <span class="p">{</span>
+  <span class="k">case</span> <span class="n">ISD</span><span class="o">::</span><span class="nl">FP_TO_SINT</span><span class="p">:</span> <span class="k">return</span> <span class="n">LowerFP_TO_SINT</span><span class="p">(</span><span class="n">Op</span><span class="p">,</span> <span class="n">DAG</span><span class="p">);</span>
+  <span class="p">...</span>
+  <span class="p">}</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>Finally, the <code class="docutils literal"><span class="pre">LowerFP_TO_SINT</span></code> method is implemented, using an FP register to
+convert the floating-point value to an integer.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">static</span> <span class="n">SDValue</span> <span class="nf">LowerFP_TO_SINT</span><span class="p">(</span><span class="n">SDValue</span> <span class="n">Op</span><span class="p">,</span> <span class="n">SelectionDAG</span> <span class="o">&</span><span class="n">DAG</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">assert</span><span class="p">(</span><span class="n">Op</span><span class="p">.</span><span class="n">getValueType</span><span class="p">()</span> <span class="o">==</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">);</span>
+  <span class="n">Op</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">.</span><span class="n">getNode</span><span class="p">(</span><span class="n">SPISD</span><span class="o">::</span><span class="n">FTOI</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">f32</span><span class="p">,</span> <span class="n">Op</span><span class="p">.</span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">));</span>
+  <span class="k">return</span> <span class="n">DAG</span><span class="p">.</span><span class="n">getNode</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">BITCAST</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">,</span> <span class="n">Op</span><span class="p">);</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="legal">
+<h4><a class="toc-backref" href="#id27">Legal</a><a class="headerlink" href="#legal" title="Permalink to this headline">Â¶</a></h4>
+<p>The <code class="docutils literal"><span class="pre">Legal</span></code> <code class="docutils literal"><span class="pre">LegalizeAction</span></code> enum value simply indicates that an operation
+<strong>is</strong> natively supported.  <code class="docutils literal"><span class="pre">Legal</span></code> represents the default condition, so it
+is rarely used.  In <code class="docutils literal"><span class="pre">SparcISelLowering.cpp</span></code>, the action for <code class="docutils literal"><span class="pre">CTPOP</span></code> (an
+operation to count the bits set in an integer) is natively supported only for
+SPARC v9.  The following code enables the <code class="docutils literal"><span class="pre">Expand</span></code> conversion technique for
+non-v9 SPARC implementations.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">setOperationAction</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">CTPOP</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">,</span> <span class="n">Expand</span><span class="p">);</span>
+<span class="p">...</span>
+<span class="k">if</span> <span class="p">(</span><span class="n">TM</span><span class="p">.</span><span class="n">getSubtarget</span><span class="o"><</span><span class="n">SparcSubtarget</span><span class="o">></span><span class="p">().</span><span class="n">isV9</span><span class="p">())</span>
+  <span class="n">setOperationAction</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">CTPOP</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">,</span> <span class="n">Legal</span><span class="p">);</span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="calling-conventions">
+<h3><a class="toc-backref" href="#id28">Calling Conventions</a><a class="headerlink" href="#calling-conventions" title="Permalink to this headline">Â¶</a></h3>
+<p>To support target-specific calling conventions, <code class="docutils literal"><span class="pre">XXXGenCallingConv.td</span></code> uses
+interfaces (such as <code class="docutils literal"><span class="pre">CCIfType</span></code> and <code class="docutils literal"><span class="pre">CCAssignToReg</span></code>) that are defined in
+<code class="docutils literal"><span class="pre">lib/Target/TargetCallingConv.td</span></code>.  TableGen can take the target descriptor
+file <code class="docutils literal"><span class="pre">XXXGenCallingConv.td</span></code> and generate the header file
+<code class="docutils literal"><span class="pre">XXXGenCallingConv.inc</span></code>, which is typically included in
+<code class="docutils literal"><span class="pre">XXXISelLowering.cpp</span></code>.  You can use the interfaces in
+<code class="docutils literal"><span class="pre">TargetCallingConv.td</span></code> to specify:</p>
+<ul class="simple">
+<li>The order of parameter allocation.</li>
+<li>Where parameters and return values are placed (that is, on the stack or in
+registers).</li>
+<li>Which registers may be used.</li>
+<li>Whether the caller or callee unwinds the stack.</li>
+</ul>
+<p>The following example demonstrates the use of the <code class="docutils literal"><span class="pre">CCIfType</span></code> and
+<code class="docutils literal"><span class="pre">CCAssignToReg</span></code> interfaces.  If the <code class="docutils literal"><span class="pre">CCIfType</span></code> predicate is true (that is,
+if the current argument is of type <code class="docutils literal"><span class="pre">f32</span></code> or <code class="docutils literal"><span class="pre">f64</span></code>), then the action is
+performed.  In this case, the <code class="docutils literal"><span class="pre">CCAssignToReg</span></code> action assigns the argument
+value to the first available register: either <code class="docutils literal"><span class="pre">R0</span></code> or <code class="docutils literal"><span class="pre">R1</span></code>.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>CCIfType<[f32,f64], CCAssignToReg<[R0, R1]>>
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">SparcCallingConv.td</span></code> contains definitions for a target-specific return-value
+calling convention (<code class="docutils literal"><span class="pre">RetCC_Sparc32</span></code>) and a basic 32-bit C calling convention
+(<code class="docutils literal"><span class="pre">CC_Sparc32</span></code>).  The definition of <code class="docutils literal"><span class="pre">RetCC_Sparc32</span></code> (shown below) indicates
+which registers are used for specified scalar return types.  A single-precision
+float is returned to register <code class="docutils literal"><span class="pre">F0</span></code>, and a double-precision float goes to
+register <code class="docutils literal"><span class="pre">D0</span></code>.  A 32-bit integer is returned in register <code class="docutils literal"><span class="pre">I0</span></code> or <code class="docutils literal"><span class="pre">I1</span></code>.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def RetCC_Sparc32 : CallingConv<[
+  CCIfType<[i32], CCAssignToReg<[I0, I1]>>,
+  CCIfType<[f32], CCAssignToReg<[F0]>>,
+  CCIfType<[f64], CCAssignToReg<[D0]>>
+]>;
+</pre></div>
+</div>
+<p>The definition of <code class="docutils literal"><span class="pre">CC_Sparc32</span></code> in <code class="docutils literal"><span class="pre">SparcCallingConv.td</span></code> introduces
+<code class="docutils literal"><span class="pre">CCAssignToStack</span></code>, which assigns the value to a stack slot with the specified
+size and alignment.  In the example below, the first parameter, 4, indicates
+the size of the slot, and the second parameter, also 4, indicates the stack
+alignment along 4-byte units.  (Special cases: if size is zero, then the ABI
+size is used; if alignment is zero, then the ABI alignment is used.)</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def CC_Sparc32 : CallingConv<[
+  // All arguments get passed in integer registers if there is space.
+  CCIfType<[i32, f32, f64], CCAssignToReg<[I0, I1, I2, I3, I4, I5]>>,
+  CCAssignToStack<4, 4>
+]>;
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">CCDelegateTo</span></code> is another commonly used interface, which tries to find a
+specified sub-calling convention, and, if a match is found, it is invoked.  In
+the following example (in <code class="docutils literal"><span class="pre">X86CallingConv.td</span></code>), the definition of
+<code class="docutils literal"><span class="pre">RetCC_X86_32_C</span></code> ends with <code class="docutils literal"><span class="pre">CCDelegateTo</span></code>.  After the current value is
+assigned to the register <code class="docutils literal"><span class="pre">ST0</span></code> or <code class="docutils literal"><span class="pre">ST1</span></code>, the <code class="docutils literal"><span class="pre">RetCC_X86Common</span></code> is
+invoked.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def RetCC_X86_32_C : CallingConv<[
+  CCIfType<[f32], CCAssignToReg<[ST0, ST1]>>,
+  CCIfType<[f64], CCAssignToReg<[ST0, ST1]>>,
+  CCDelegateTo<RetCC_X86Common>
+]>;
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">CCIfCC</span></code> is an interface that attempts to match the given name to the current
+calling convention.  If the name identifies the current calling convention,
+then a specified action is invoked.  In the following example (in
+<code class="docutils literal"><span class="pre">X86CallingConv.td</span></code>), if the <code class="docutils literal"><span class="pre">Fast</span></code> calling convention is in use, then
+<code class="docutils literal"><span class="pre">RetCC_X86_32_Fast</span></code> is invoked.  If the <code class="docutils literal"><span class="pre">SSECall</span></code> calling convention is in
+use, then <code class="docutils literal"><span class="pre">RetCC_X86_32_SSE</span></code> is invoked.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def RetCC_X86_32 : CallingConv<[
+  CCIfCC<"CallingConv::Fast", CCDelegateTo<RetCC_X86_32_Fast>>,
+  CCIfCC<"CallingConv::X86_SSECall", CCDelegateTo<RetCC_X86_32_SSE>>,
+  CCDelegateTo<RetCC_X86_32_C>
+]>;
+</pre></div>
+</div>
+<p>Other calling convention interfaces include:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">CCIf</span> <span class="pre"><predicate,</span> <span class="pre">action></span></code> â If the predicate matches, apply the action.</li>
+<li><code class="docutils literal"><span class="pre">CCIfInReg</span> <span class="pre"><action></span></code> â If the argument is marked with the â<code class="docutils literal"><span class="pre">inreg</span></code>â
+attribute, then apply the action.</li>
+<li><code class="docutils literal"><span class="pre">CCIfNest</span> <span class="pre"><action></span></code> â If the argument is marked with the â<code class="docutils literal"><span class="pre">nest</span></code>â
+attribute, then apply the action.</li>
+<li><code class="docutils literal"><span class="pre">CCIfNotVarArg</span> <span class="pre"><action></span></code> â If the current function does not take a
+variable number of arguments, apply the action.</li>
+<li><code class="docutils literal"><span class="pre">CCAssignToRegWithShadow</span> <span class="pre"><registerList,</span> <span class="pre">shadowList></span></code> â similar to
+<code class="docutils literal"><span class="pre">CCAssignToReg</span></code>, but with a shadow list of registers.</li>
+<li><code class="docutils literal"><span class="pre">CCPassByVal</span> <span class="pre"><size,</span> <span class="pre">align></span></code> â Assign value to a stack slot with the
+minimum specified size and alignment.</li>
+<li><code class="docutils literal"><span class="pre">CCPromoteToType</span> <span class="pre"><type></span></code> â Promote the current value to the specified
+type.</li>
+<li><code class="docutils literal"><span class="pre">CallingConv</span> <span class="pre"><[actions]></span></code> â Define each calling convention that is
+supported.</li>
+</ul>
+</div>
+</div>
+<div class="section" id="assembly-printer">
+<h2><a class="toc-backref" href="#id29">Assembly Printer</a><a class="headerlink" href="#assembly-printer" title="Permalink to this headline">Â¶</a></h2>
+<p>During the code emission stage, the code generator may utilize an LLVM pass to
+produce assembly output.  To do this, you want to implement the code for a
+printer that converts LLVM IR to a GAS-format assembly language for your target
+machine, using the following steps:</p>
+<ul class="simple">
+<li>Define all the assembly strings for your target, adding them to the
+instructions defined in the <code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code> file.  (See
+<a class="reference internal" href="#instruction-set"><span class="std std-ref">Instruction Set</span></a>.)  TableGen will produce an output file
+(<code class="docutils literal"><span class="pre">XXXGenAsmWriter.inc</span></code>) with an implementation of the <code class="docutils literal"><span class="pre">printInstruction</span></code>
+method for the <code class="docutils literal"><span class="pre">XXXAsmPrinter</span></code> class.</li>
+<li>Write <code class="docutils literal"><span class="pre">XXXTargetAsmInfo.h</span></code>, which contains the bare-bones declaration of
+the <code class="docutils literal"><span class="pre">XXXTargetAsmInfo</span></code> class (a subclass of <code class="docutils literal"><span class="pre">TargetAsmInfo</span></code>).</li>
+<li>Write <code class="docutils literal"><span class="pre">XXXTargetAsmInfo.cpp</span></code>, which contains target-specific values for
+<code class="docutils literal"><span class="pre">TargetAsmInfo</span></code> properties and sometimes new implementations for methods.</li>
+<li>Write <code class="docutils literal"><span class="pre">XXXAsmPrinter.cpp</span></code>, which implements the <code class="docutils literal"><span class="pre">AsmPrinter</span></code> class that
+performs the LLVM-to-assembly conversion.</li>
+</ul>
+<p>The code in <code class="docutils literal"><span class="pre">XXXTargetAsmInfo.h</span></code> is usually a trivial declaration of the
+<code class="docutils literal"><span class="pre">XXXTargetAsmInfo</span></code> class for use in <code class="docutils literal"><span class="pre">XXXTargetAsmInfo.cpp</span></code>.  Similarly,
+<code class="docutils literal"><span class="pre">XXXTargetAsmInfo.cpp</span></code> usually has a few declarations of <code class="docutils literal"><span class="pre">XXXTargetAsmInfo</span></code>
+replacement values that override the default values in <code class="docutils literal"><span class="pre">TargetAsmInfo.cpp</span></code>.
+For example in <code class="docutils literal"><span class="pre">SparcTargetAsmInfo.cpp</span></code>:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">SparcTargetAsmInfo</span><span class="o">::</span><span class="n">SparcTargetAsmInfo</span><span class="p">(</span><span class="k">const</span> <span class="n">SparcTargetMachine</span> <span class="o">&</span><span class="n">TM</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">Data16bitsDirective</span> <span class="o">=</span> <span class="s">"</span><span class="se">\t</span><span class="s">.half</span><span class="se">\t</span><span class="s">"</span><span class="p">;</span>
+  <span class="n">Data32bitsDirective</span> <span class="o">=</span> <span class="s">"</span><span class="se">\t</span><span class="s">.word</span><span class="se">\t</span><span class="s">"</span><span class="p">;</span>
+  <span class="n">Data64bitsDirective</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>  <span class="c1">// .xword is only supported by V9.</span>
+  <span class="n">ZeroDirective</span> <span class="o">=</span> <span class="s">"</span><span class="se">\t</span><span class="s">.skip</span><span class="se">\t</span><span class="s">"</span><span class="p">;</span>
+  <span class="n">CommentString</span> <span class="o">=</span> <span class="s">"!"</span><span class="p">;</span>
+  <span class="n">ConstantPoolSection</span> <span class="o">=</span> <span class="s">"</span><span class="se">\t</span><span class="s">.section </span><span class="se">\"</span><span class="s">.rodata</span><span class="se">\"</span><span class="s">,#alloc</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The X86 assembly printer implementation (<code class="docutils literal"><span class="pre">X86TargetAsmInfo</span></code>) is an example
+where the target specific <code class="docutils literal"><span class="pre">TargetAsmInfo</span></code> class uses an overridden methods:
+<code class="docutils literal"><span class="pre">ExpandInlineAsm</span></code>.</p>
+<p>A target-specific implementation of <code class="docutils literal"><span class="pre">AsmPrinter</span></code> is written in
+<code class="docutils literal"><span class="pre">XXXAsmPrinter.cpp</span></code>, which implements the <code class="docutils literal"><span class="pre">AsmPrinter</span></code> class that converts
+the LLVM to printable assembly.  The implementation must include the following
+headers that have declarations for the <code class="docutils literal"><span class="pre">AsmPrinter</span></code> and
+<code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> classes.  The <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> is a subclass of
+<code class="docutils literal"><span class="pre">FunctionPass</span></code>.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf">"llvm/CodeGen/AsmPrinter.h"</span><span class="cp"></span>
+<span class="cp">#include</span> <span class="cpf">"llvm/CodeGen/MachineFunctionPass.h"</span><span class="cp"></span>
+</pre></div>
+</div>
+<p>As a <code class="docutils literal"><span class="pre">FunctionPass</span></code>, <code class="docutils literal"><span class="pre">AsmPrinter</span></code> first calls <code class="docutils literal"><span class="pre">doInitialization</span></code> to set
+up the <code class="docutils literal"><span class="pre">AsmPrinter</span></code>.  In <code class="docutils literal"><span class="pre">SparcAsmPrinter</span></code>, a <code class="docutils literal"><span class="pre">Mangler</span></code> object is
+instantiated to process variable names.</p>
+<p>In <code class="docutils literal"><span class="pre">XXXAsmPrinter.cpp</span></code>, the <code class="docutils literal"><span class="pre">runOnMachineFunction</span></code> method (declared in
+<code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>) must be implemented for <code class="docutils literal"><span class="pre">XXXAsmPrinter</span></code>.  In
+<code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>, the <code class="docutils literal"><span class="pre">runOnFunction</span></code> method invokes
+<code class="docutils literal"><span class="pre">runOnMachineFunction</span></code>.  Target-specific implementations of
+<code class="docutils literal"><span class="pre">runOnMachineFunction</span></code> differ, but generally do the following to process each
+machine function:</p>
+<ul class="simple">
+<li>Call <code class="docutils literal"><span class="pre">SetupMachineFunction</span></code> to perform initialization.</li>
+<li>Call <code class="docutils literal"><span class="pre">EmitConstantPool</span></code> to print out (to the output stream) constants which
+have been spilled to memory.</li>
+<li>Call <code class="docutils literal"><span class="pre">EmitJumpTableInfo</span></code> to print out jump tables used by the current
+function.</li>
+<li>Print out the label for the current function.</li>
+<li>Print out the code for the function, including basic block labels and the
+assembly for the instruction (using <code class="docutils literal"><span class="pre">printInstruction</span></code>)</li>
+</ul>
+<p>The <code class="docutils literal"><span class="pre">XXXAsmPrinter</span></code> implementation must also include the code generated by
+TableGen that is output in the <code class="docutils literal"><span class="pre">XXXGenAsmWriter.inc</span></code> file.  The code in
+<code class="docutils literal"><span class="pre">XXXGenAsmWriter.inc</span></code> contains an implementation of the <code class="docutils literal"><span class="pre">printInstruction</span></code>
+method that may call these methods:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">printOperand</span></code></li>
+<li><code class="docutils literal"><span class="pre">printMemOperand</span></code></li>
+<li><code class="docutils literal"><span class="pre">printCCOperand</span></code> (for conditional statements)</li>
+<li><code class="docutils literal"><span class="pre">printDataDirective</span></code></li>
+<li><code class="docutils literal"><span class="pre">printDeclare</span></code></li>
+<li><code class="docutils literal"><span class="pre">printImplicitDef</span></code></li>
+<li><code class="docutils literal"><span class="pre">printInlineAsm</span></code></li>
+</ul>
+<p>The implementations of <code class="docutils literal"><span class="pre">printDeclare</span></code>, <code class="docutils literal"><span class="pre">printImplicitDef</span></code>,
+<code class="docutils literal"><span class="pre">printInlineAsm</span></code>, and <code class="docutils literal"><span class="pre">printLabel</span></code> in <code class="docutils literal"><span class="pre">AsmPrinter.cpp</span></code> are generally
+adequate for printing assembly and do not need to be overridden.</p>
+<p>The <code class="docutils literal"><span class="pre">printOperand</span></code> method is implemented with a long <code class="docutils literal"><span class="pre">switch</span></code>/<code class="docutils literal"><span class="pre">case</span></code>
+statement for the type of operand: register, immediate, basic block, external
+symbol, global address, constant pool index, or jump table index.  For an
+instruction with a memory address operand, the <code class="docutils literal"><span class="pre">printMemOperand</span></code> method
+should be implemented to generate the proper output.  Similarly,
+<code class="docutils literal"><span class="pre">printCCOperand</span></code> should be used to print a conditional operand.</p>
+<p><code class="docutils literal"><span class="pre">doFinalization</span></code> should be overridden in <code class="docutils literal"><span class="pre">XXXAsmPrinter</span></code>, and it should be
+called to shut down the assembly printer.  During <code class="docutils literal"><span class="pre">doFinalization</span></code>, global
+variables and constants are printed to output.</p>
+</div>
+<div class="section" id="subtarget-support">
+<h2><a class="toc-backref" href="#id30">Subtarget Support</a><a class="headerlink" href="#subtarget-support" title="Permalink to this headline">Â¶</a></h2>
+<p>Subtarget support is used to inform the code generation process of instruction
+set variations for a given chip set.  For example, the LLVM SPARC
+implementation provided covers three major versions of the SPARC microprocessor
+architecture: Version 8 (V8, which is a 32-bit architecture), Version 9 (V9, a
+64-bit architecture), and the UltraSPARC architecture.  V8 has 16
+double-precision floating-point registers that are also usable as either 32
+single-precision or 8 quad-precision registers.  V8 is also purely big-endian.
+V9 has 32 double-precision floating-point registers that are also usable as 16
+quad-precision registers, but cannot be used as single-precision registers.
+The UltraSPARC architecture combines V9 with UltraSPARC Visual Instruction Set
+extensions.</p>
+<p>If subtarget support is needed, you should implement a target-specific
+<code class="docutils literal"><span class="pre">XXXSubtarget</span></code> class for your architecture.  This class should process the
+command-line options <code class="docutils literal"><span class="pre">-mcpu=</span></code> and <code class="docutils literal"><span class="pre">-mattr=</span></code>.</p>
+<p>TableGen uses definitions in the <code class="docutils literal"><span class="pre">Target.td</span></code> and <code class="docutils literal"><span class="pre">Sparc.td</span></code> files to
+generate code in <code class="docutils literal"><span class="pre">SparcGenSubtarget.inc</span></code>.  In <code class="docutils literal"><span class="pre">Target.td</span></code>, shown below, the
+<code class="docutils literal"><span class="pre">SubtargetFeature</span></code> interface is defined.  The first 4 string parameters of
+the <code class="docutils literal"><span class="pre">SubtargetFeature</span></code> interface are a feature name, an attribute set by the
+feature, the value of the attribute, and a description of the feature.  (The
+fifth parameter is a list of features whose presence is implied, and its
+default value is an empty array.)</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class SubtargetFeature<string n, string a, string v, string d,
+                       list<SubtargetFeature> i = []> {
+  string Name = n;
+  string Attribute = a;
+  string Value = v;
+  string Desc = d;
+  list<SubtargetFeature> Implies = i;
+}
+</pre></div>
+</div>
+<p>In the <code class="docutils literal"><span class="pre">Sparc.td</span></code> file, the <code class="docutils literal"><span class="pre">SubtargetFeature</span></code> is used to define the
+following features.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def FeatureV9 : SubtargetFeature<"v9", "IsV9", "true",
+                     "Enable SPARC-V9 instructions">;
+def FeatureV8Deprecated : SubtargetFeature<"deprecated-v8",
+                     "V8DeprecatedInsts", "true",
+                     "Enable deprecated V8 instructions in V9 mode">;
+def FeatureVIS : SubtargetFeature<"vis", "IsVIS", "true",
+                     "Enable UltraSPARC Visual Instruction Set extensions">;
+</pre></div>
+</div>
+<p>Elsewhere in <code class="docutils literal"><span class="pre">Sparc.td</span></code>, the <code class="docutils literal"><span class="pre">Proc</span></code> class is defined and then is used to
+define particular SPARC processor subtypes that may have the previously
+described features.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class Proc<string Name, list<SubtargetFeature> Features>
+  : Processor<Name, NoItineraries, Features>;
+
+def : Proc<"generic",         []>;
+def : Proc<"v8",              []>;
+def : Proc<"supersparc",      []>;
+def : Proc<"sparclite",       []>;
+def : Proc<"f934",            []>;
+def : Proc<"hypersparc",      []>;
+def : Proc<"sparclite86x",    []>;
+def : Proc<"sparclet",        []>;
+def : Proc<"tsc701",          []>;
+def : Proc<"v9",              [FeatureV9]>;
+def : Proc<"ultrasparc",      [FeatureV9, FeatureV8Deprecated]>;
+def : Proc<"ultrasparc3",     [FeatureV9, FeatureV8Deprecated]>;
+def : Proc<"ultrasparc3-vis", [FeatureV9, FeatureV8Deprecated, FeatureVIS]>;
+</pre></div>
+</div>
+<p>From <code class="docutils literal"><span class="pre">Target.td</span></code> and <code class="docutils literal"><span class="pre">Sparc.td</span></code> files, the resulting
+<code class="docutils literal"><span class="pre">SparcGenSubtarget.inc</span></code> specifies enum values to identify the features,
+arrays of constants to represent the CPU features and CPU subtypes, and the
+<code class="docutils literal"><span class="pre">ParseSubtargetFeatures</span></code> method that parses the features string that sets
+specified subtarget options.  The generated <code class="docutils literal"><span class="pre">SparcGenSubtarget.inc</span></code> file
+should be included in the <code class="docutils literal"><span class="pre">SparcSubtarget.cpp</span></code>.  The target-specific
+implementation of the <code class="docutils literal"><span class="pre">XXXSubtarget</span></code> method should follow this pseudocode:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">XXXSubtarget</span><span class="o">::</span><span class="n">XXXSubtarget</span><span class="p">(</span><span class="k">const</span> <span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">,</span> <span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="o">&</span><span class="n">FS</span><span class="p">)</span> <span class="p">{</span>
+  <span class="c1">// Set the default features</span>
+  <span class="c1">// Determine default and user specified characteristics of the CPU</span>
+  <span class="c1">// Call ParseSubtargetFeatures(FS, CPU) to parse the features string</span>
+  <span class="c1">// Perform any additional operations</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="jit-support">
+<h2><a class="toc-backref" href="#id31">JIT Support</a><a class="headerlink" href="#jit-support" title="Permalink to this headline">Â¶</a></h2>
+<p>The implementation of a target machine optionally includes a Just-In-Time (JIT)
+code generator that emits machine code and auxiliary structures as binary
+output that can be written directly to memory.  To do this, implement JIT code
+generation by performing the following steps:</p>
+<ul class="simple">
+<li>Write an <code class="docutils literal"><span class="pre">XXXCodeEmitter.cpp</span></code> file that contains a machine function pass
+that transforms target-machine instructions into relocatable machine
+code.</li>
+<li>Write an <code class="docutils literal"><span class="pre">XXXJITInfo.cpp</span></code> file that implements the JIT interfaces for
+target-specific code-generation activities, such as emitting machine code and
+stubs.</li>
+<li>Modify <code class="docutils literal"><span class="pre">XXXTargetMachine</span></code> so that it provides a <code class="docutils literal"><span class="pre">TargetJITInfo</span></code> object
+through its <code class="docutils literal"><span class="pre">getJITInfo</span></code> method.</li>
+</ul>
+<p>There are several different approaches to writing the JIT support code.  For
+instance, TableGen and target descriptor files may be used for creating a JIT
+code generator, but are not mandatory.  For the Alpha and PowerPC target
+machines, TableGen is used to generate <code class="docutils literal"><span class="pre">XXXGenCodeEmitter.inc</span></code>, which
+contains the binary coding of machine instructions and the
+<code class="docutils literal"><span class="pre">getBinaryCodeForInstr</span></code> method to access those codes.  Other JIT
+implementations do not.</p>
+<p>Both <code class="docutils literal"><span class="pre">XXXJITInfo.cpp</span></code> and <code class="docutils literal"><span class="pre">XXXCodeEmitter.cpp</span></code> must include the
+<code class="docutils literal"><span class="pre">llvm/CodeGen/MachineCodeEmitter.h</span></code> header file that defines the
+<code class="docutils literal"><span class="pre">MachineCodeEmitter</span></code> class containing code for several callback functions
+that write data (in bytes, words, strings, etc.) to the output stream.</p>
+<div class="section" id="machine-code-emitter">
+<h3><a class="toc-backref" href="#id32">Machine Code Emitter</a><a class="headerlink" href="#machine-code-emitter" title="Permalink to this headline">Â¶</a></h3>
+<p>In <code class="docutils literal"><span class="pre">XXXCodeEmitter.cpp</span></code>, a target-specific of the <code class="docutils literal"><span class="pre">Emitter</span></code> class is
+implemented as a function pass (subclass of <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>).  The
+target-specific implementation of <code class="docutils literal"><span class="pre">runOnMachineFunction</span></code> (invoked by
+<code class="docutils literal"><span class="pre">runOnFunction</span></code> in <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>) iterates through the
+<code class="docutils literal"><span class="pre">MachineBasicBlock</span></code> calls <code class="docutils literal"><span class="pre">emitInstruction</span></code> to process each instruction and
+emit binary code.  <code class="docutils literal"><span class="pre">emitInstruction</span></code> is largely implemented with case
+statements on the instruction types defined in <code class="docutils literal"><span class="pre">XXXInstrInfo.h</span></code>.  For
+example, in <code class="docutils literal"><span class="pre">X86CodeEmitter.cpp</span></code>, the <code class="docutils literal"><span class="pre">emitInstruction</span></code> method is built
+around the following <code class="docutils literal"><span class="pre">switch</span></code>/<code class="docutils literal"><span class="pre">case</span></code> statements:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">switch</span> <span class="p">(</span><span class="n">Desc</span><span class="o">-></span><span class="n">TSFlags</span> <span class="o">&</span> <span class="n">X86</span><span class="o">::</span><span class="n">FormMask</span><span class="p">)</span> <span class="p">{</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">Pseudo</span><span class="p">:</span>  <span class="c1">// for not yet implemented instructions</span>
+   <span class="p">...</span>               <span class="c1">// or pseudo-instructions</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">RawFrm</span><span class="p">:</span>  <span class="c1">// for instructions with a fixed opcode value</span>
+   <span class="p">...</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">AddRegFrm</span><span class="p">:</span> <span class="c1">// for instructions that have one register operand</span>
+   <span class="p">...</span>                 <span class="c1">// added to their opcode</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRMDestReg</span><span class="p">:</span><span class="c1">// for instructions that use the Mod/RM byte</span>
+   <span class="p">...</span>                 <span class="c1">// to specify a destination (register)</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRMDestMem</span><span class="p">:</span><span class="c1">// for instructions that use the Mod/RM byte</span>
+   <span class="p">...</span>                 <span class="c1">// to specify a destination (memory)</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRMSrcReg</span><span class="p">:</span> <span class="c1">// for instructions that use the Mod/RM byte</span>
+   <span class="p">...</span>                 <span class="c1">// to specify a source (register)</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRMSrcMem</span><span class="p">:</span> <span class="c1">// for instructions that use the Mod/RM byte</span>
+   <span class="p">...</span>                 <span class="c1">// to specify a source (memory)</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM0r</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM1r</span><span class="p">:</span>  <span class="c1">// for instructions that operate on</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM2r</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM3r</span><span class="p">:</span>  <span class="c1">// a REGISTER r/m operand and</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM4r</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM5r</span><span class="p">:</span>  <span class="c1">// use the Mod/RM byte and a field</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM6r</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM7r</span><span class="p">:</span>  <span class="c1">// to hold extended opcode data</span>
+   <span class="p">...</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM0m</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM1m</span><span class="p">:</span>  <span class="c1">// for instructions that operate on</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM2m</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM3m</span><span class="p">:</span>  <span class="c1">// a MEMORY r/m operand and</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM4m</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM5m</span><span class="p">:</span>  <span class="c1">// use the Mod/RM byte and a field</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM6m</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM7m</span><span class="p">:</span>  <span class="c1">// to hold extended opcode data</span>
+   <span class="p">...</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRMInitReg</span><span class="p">:</span> <span class="c1">// for instructions whose source and</span>
+   <span class="p">...</span>                  <span class="c1">// destination are the same register</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The implementations of these case statements often first emit the opcode and
+then get the operand(s).  Then depending upon the operand, helper methods may
+be called to process the operand(s).  For example, in <code class="docutils literal"><span class="pre">X86CodeEmitter.cpp</span></code>,
+for the <code class="docutils literal"><span class="pre">X86II::AddRegFrm</span></code> case, the first data emitted (by <code class="docutils literal"><span class="pre">emitByte</span></code>) is
+the opcode added to the register operand.  Then an object representing the
+machine operand, <code class="docutils literal"><span class="pre">MO1</span></code>, is extracted.  The helper methods such as
+<code class="docutils literal"><span class="pre">isImmediate</span></code>, <code class="docutils literal"><span class="pre">isGlobalAddress</span></code>, <code class="docutils literal"><span class="pre">isExternalSymbol</span></code>,
+<code class="docutils literal"><span class="pre">isConstantPoolIndex</span></code>, and <code class="docutils literal"><span class="pre">isJumpTableIndex</span></code> determine the operand type.
+(<code class="docutils literal"><span class="pre">X86CodeEmitter.cpp</span></code> also has private methods such as <code class="docutils literal"><span class="pre">emitConstant</span></code>,
+<code class="docutils literal"><span class="pre">emitGlobalAddress</span></code>, <code class="docutils literal"><span class="pre">emitExternalSymbolAddress</span></code>, <code class="docutils literal"><span class="pre">emitConstPoolAddress</span></code>,
+and <code class="docutils literal"><span class="pre">emitJumpTableAddress</span></code> that emit the data into the output stream.)</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">AddRegFrm</span><span class="p">:</span>
+  <span class="n">MCE</span><span class="p">.</span><span class="n">emitByte</span><span class="p">(</span><span class="n">BaseOpcode</span> <span class="o">+</span> <span class="n">getX86RegNum</span><span class="p">(</span><span class="n">MI</span><span class="p">.</span><span class="n">getOperand</span><span class="p">(</span><span class="n">CurOp</span><span class="o">++</span><span class="p">).</span><span class="n">getReg</span><span class="p">()));</span>
+
+  <span class="k">if</span> <span class="p">(</span><span class="n">CurOp</span> <span class="o">!=</span> <span class="n">NumOps</span><span class="p">)</span> <span class="p">{</span>
+    <span class="k">const</span> <span class="n">MachineOperand</span> <span class="o">&</span><span class="n">MO1</span> <span class="o">=</span> <span class="n">MI</span><span class="p">.</span><span class="n">getOperand</span><span class="p">(</span><span class="n">CurOp</span><span class="o">++</span><span class="p">);</span>
+    <span class="kt">unsigned</span> <span class="n">Size</span> <span class="o">=</span> <span class="n">X86InstrInfo</span><span class="o">::</span><span class="n">sizeOfImm</span><span class="p">(</span><span class="n">Desc</span><span class="p">);</span>
+    <span class="k">if</span> <span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">isImmediate</span><span class="p">())</span>
+      <span class="n">emitConstant</span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getImm</span><span class="p">(),</span> <span class="n">Size</span><span class="p">);</span>
+    <span class="k">else</span> <span class="p">{</span>
+      <span class="kt">unsigned</span> <span class="n">rt</span> <span class="o">=</span> <span class="n">Is64BitMode</span> <span class="o">?</span> <span class="n">X86</span><span class="o">::</span><span class="nl">reloc_pcrel_word</span>
+        <span class="p">:</span> <span class="p">(</span><span class="n">IsPIC</span> <span class="o">?</span> <span class="n">X86</span><span class="o">::</span><span class="nl">reloc_picrel_word</span> <span class="p">:</span> <span class="n">X86</span><span class="o">::</span><span class="n">reloc_absolute_word</span><span class="p">);</span>
+      <span class="k">if</span> <span class="p">(</span><span class="n">Opcode</span> <span class="o">==</span> <span class="n">X86</span><span class="o">::</span><span class="n">MOV64ri</span><span class="p">)</span>
+        <span class="n">rt</span> <span class="o">=</span> <span class="n">X86</span><span class="o">::</span><span class="n">reloc_absolute_dword</span><span class="p">;</span>  <span class="c1">// FIXME: add X86II flag?</span>
+      <span class="k">if</span> <span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">isGlobalAddress</span><span class="p">())</span> <span class="p">{</span>
+        <span class="kt">bool</span> <span class="n">NeedStub</span> <span class="o">=</span> <span class="n">isa</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getGlobal</span><span class="p">());</span>
+        <span class="kt">bool</span> <span class="n">isLazy</span> <span class="o">=</span> <span class="n">gvNeedsLazyPtr</span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getGlobal</span><span class="p">());</span>
+        <span class="n">emitGlobalAddress</span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getGlobal</span><span class="p">(),</span> <span class="n">rt</span><span class="p">,</span> <span class="n">MO1</span><span class="p">.</span><span class="n">getOffset</span><span class="p">(),</span> <span class="mi">0</span><span class="p">,</span>
+                          <span class="n">NeedStub</span><span class="p">,</span> <span class="n">isLazy</span><span class="p">);</span>
+      <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">isExternalSymbol</span><span class="p">())</span>
+        <span class="n">emitExternalSymbolAddress</span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getSymbolName</span><span class="p">(),</span> <span class="n">rt</span><span class="p">);</span>
+      <span class="k">else</span> <span class="nf">if</span> <span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">isConstantPoolIndex</span><span class="p">())</span>
+        <span class="n">emitConstPoolAddress</span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getIndex</span><span class="p">(),</span> <span class="n">rt</span><span class="p">);</span>
+      <span class="k">else</span> <span class="nf">if</span> <span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">isJumpTableIndex</span><span class="p">())</span>
+        <span class="n">emitJumpTableAddress</span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getIndex</span><span class="p">(),</span> <span class="n">rt</span><span class="p">);</span>
+    <span class="p">}</span>
+  <span class="p">}</span>
+  <span class="k">break</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>In the previous example, <code class="docutils literal"><span class="pre">XXXCodeEmitter.cpp</span></code> uses the variable <code class="docutils literal"><span class="pre">rt</span></code>, which
+is a <code class="docutils literal"><span class="pre">RelocationType</span></code> enum that may be used to relocate addresses (for
+example, a global address with a PIC base offset).  The <code class="docutils literal"><span class="pre">RelocationType</span></code> enum
+for that target is defined in the short target-specific <code class="docutils literal"><span class="pre">XXXRelocations.h</span></code>
+file.  The <code class="docutils literal"><span class="pre">RelocationType</span></code> is used by the <code class="docutils literal"><span class="pre">relocate</span></code> method defined in
+<code class="docutils literal"><span class="pre">XXXJITInfo.cpp</span></code> to rewrite addresses for referenced global symbols.</p>
+<p>For example, <code class="docutils literal"><span class="pre">X86Relocations.h</span></code> specifies the following relocation types for
+the X86 addresses.  In all four cases, the relocated value is added to the
+value already in memory.  For <code class="docutils literal"><span class="pre">reloc_pcrel_word</span></code> and <code class="docutils literal"><span class="pre">reloc_picrel_word</span></code>,
+there is an additional initial adjustment.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">enum</span> <span class="n">RelocationType</span> <span class="p">{</span>
+  <span class="n">reloc_pcrel_word</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>    <span class="c1">// add reloc value after adjusting for the PC loc</span>
+  <span class="n">reloc_picrel_word</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>   <span class="c1">// add reloc value after adjusting for the PIC base</span>
+  <span class="n">reloc_absolute_word</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span> <span class="c1">// absolute relocation; no additional adjustment</span>
+  <span class="n">reloc_absolute_dword</span> <span class="o">=</span> <span class="mi">3</span> <span class="c1">// absolute relocation; no additional adjustment</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="target-jit-info">
+<h3><a class="toc-backref" href="#id33">Target JIT Info</a><a class="headerlink" href="#target-jit-info" title="Permalink to this headline">Â¶</a></h3>
+<p><code class="docutils literal"><span class="pre">XXXJITInfo.cpp</span></code> implements the JIT interfaces for target-specific
+code-generation activities, such as emitting machine code and stubs.  At
+minimum, a target-specific version of <code class="docutils literal"><span class="pre">XXXJITInfo</span></code> implements the following:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">getLazyResolverFunction</span></code> â Initializes the JIT, gives the target a
+function that is used for compilation.</li>
+<li><code class="docutils literal"><span class="pre">emitFunctionStub</span></code> â Returns a native function with a specified address
+for a callback function.</li>
+<li><code class="docutils literal"><span class="pre">relocate</span></code> â Changes the addresses of referenced globals, based on
+relocation types.</li>
+<li>Callback function that are wrappers to a function stub that is used when the
+real target is not initially known.</li>
+</ul>
+<p><code class="docutils literal"><span class="pre">getLazyResolverFunction</span></code> is generally trivial to implement.  It makes the
+incoming parameter as the global <code class="docutils literal"><span class="pre">JITCompilerFunction</span></code> and returns the
+callback function that will be used a function wrapper.  For the Alpha target
+(in <code class="docutils literal"><span class="pre">AlphaJITInfo.cpp</span></code>), the <code class="docutils literal"><span class="pre">getLazyResolverFunction</span></code> implementation is
+simply:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">TargetJITInfo</span><span class="o">::</span><span class="n">LazyResolverFn</span> <span class="n">AlphaJITInfo</span><span class="o">::</span><span class="n">getLazyResolverFunction</span><span class="p">(</span>
+                                            <span class="n">JITCompilerFn</span> <span class="n">F</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">JITCompilerFunction</span> <span class="o">=</span> <span class="n">F</span><span class="p">;</span>
+  <span class="k">return</span> <span class="n">AlphaCompilationCallback</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>For the X86 target, the <code class="docutils literal"><span class="pre">getLazyResolverFunction</span></code> implementation is a little
+more complicated, because it returns a different callback function for
+processors with SSE instructions and XMM registers.</p>
+<p>The callback function initially saves and later restores the callee register
+values, incoming arguments, and frame and return address.  The callback
+function needs low-level access to the registers or stack, so it is typically
+implemented with assembler.</p>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="HowToUseInstrMappings.html" title="How To Use Instruction Mappings"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="Proposals/VectorizationPlan.html" title="Vectorization Plan"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/WritingAnLLVMPass.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/WritingAnLLVMPass.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/WritingAnLLVMPass.html (added)
+++ www-releases/trunk/5.0.1/docs/WritingAnLLVMPass.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,1326 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>Writing an LLVM Pass — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="How To Use Attributes" href="HowToUseAttributes.html" />
+    <link rel="prev" title="Garbage Collection with LLVM" href="GarbageCollection.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="HowToUseAttributes.html" title="How To Use Attributes"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="GarbageCollection.html" title="Garbage Collection with LLVM"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="writing-an-llvm-pass">
+<h1>Writing an LLVM Pass<a class="headerlink" href="#writing-an-llvm-pass" title="Permalink to this headline">Â¶</a></h1>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#introduction-what-is-a-pass" id="id5">Introduction â What is a pass?</a></li>
+<li><a class="reference internal" href="#quick-start-writing-hello-world" id="id6">Quick Start â Writing hello world</a><ul>
+<li><a class="reference internal" href="#setting-up-the-build-environment" id="id7">Setting up the build environment</a></li>
+<li><a class="reference internal" href="#basic-code-required" id="id8">Basic code required</a></li>
+<li><a class="reference internal" href="#running-a-pass-with-opt" id="id9">Running a pass with <code class="docutils literal"><span class="pre">opt</span></code></a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#pass-classes-and-requirements" id="id10">Pass classes and requirements</a><ul>
+<li><a class="reference internal" href="#the-immutablepass-class" id="id11">The <code class="docutils literal"><span class="pre">ImmutablePass</span></code> class</a></li>
+<li><a class="reference internal" href="#the-modulepass-class" id="id12">The <code class="docutils literal"><span class="pre">ModulePass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-runonmodule-method" id="id13">The <code class="docutils literal"><span class="pre">runOnModule</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-callgraphsccpass-class" id="id14">The <code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-doinitialization-callgraph-method" id="id15">The <code class="docutils literal"><span class="pre">doInitialization(CallGraph</span> <span class="pre">&)</span></code> method</a></li>
+<li><a class="reference internal" href="#the-runonscc-method" id="id16">The <code class="docutils literal"><span class="pre">runOnSCC</span></code> method</a></li>
+<li><a class="reference internal" href="#the-dofinalization-callgraph-method" id="id17">The <code class="docutils literal"><span class="pre">doFinalization(CallGraph</span> <span class="pre">&)</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-functionpass-class" id="id18">The <code class="docutils literal"><span class="pre">FunctionPass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-doinitialization-module-method" id="id19">The <code class="docutils literal"><span class="pre">doInitialization(Module</span> <span class="pre">&)</span></code> method</a></li>
+<li><a class="reference internal" href="#the-runonfunction-method" id="id20">The <code class="docutils literal"><span class="pre">runOnFunction</span></code> method</a></li>
+<li><a class="reference internal" href="#the-dofinalization-module-method" id="id21">The <code class="docutils literal"><span class="pre">doFinalization(Module</span> <span class="pre">&)</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-looppass-class" id="id22">The <code class="docutils literal"><span class="pre">LoopPass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-doinitialization-loop-lppassmanager-method" id="id23">The <code class="docutils literal"><span class="pre">doInitialization(Loop</span> <span class="pre">*,</span> <span class="pre">LPPassManager</span> <span class="pre">&)</span></code> method</a></li>
+<li><a class="reference internal" href="#the-runonloop-method" id="id24">The <code class="docutils literal"><span class="pre">runOnLoop</span></code> method</a></li>
+<li><a class="reference internal" href="#the-dofinalization-method" id="id25">The <code class="docutils literal"><span class="pre">doFinalization()</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-regionpass-class" id="id26">The <code class="docutils literal"><span class="pre">RegionPass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-doinitialization-region-rgpassmanager-method" id="id27">The <code class="docutils literal"><span class="pre">doInitialization(Region</span> <span class="pre">*,</span> <span class="pre">RGPassManager</span> <span class="pre">&)</span></code> method</a></li>
+<li><a class="reference internal" href="#the-runonregion-method" id="id28">The <code class="docutils literal"><span class="pre">runOnRegion</span></code> method</a></li>
+<li><a class="reference internal" href="#id2" id="id29">The <code class="docutils literal"><span class="pre">doFinalization()</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-basicblockpass-class" id="id30">The <code class="docutils literal"><span class="pre">BasicBlockPass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-doinitialization-function-method" id="id31">The <code class="docutils literal"><span class="pre">doInitialization(Function</span> <span class="pre">&)</span></code> method</a></li>
+<li><a class="reference internal" href="#the-runonbasicblock-method" id="id32">The <code class="docutils literal"><span class="pre">runOnBasicBlock</span></code> method</a></li>
+<li><a class="reference internal" href="#the-dofinalization-function-method" id="id33">The <code class="docutils literal"><span class="pre">doFinalization(Function</span> <span class="pre">&)</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-machinefunctionpass-class" id="id34">The <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-runonmachinefunction-machinefunction-mf-method" id="id35">The <code class="docutils literal"><span class="pre">runOnMachineFunction(MachineFunction</span> <span class="pre">&MF)</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#pass-registration" id="id36">Pass registration</a><ul>
+<li><a class="reference internal" href="#the-print-method" id="id37">The <code class="docutils literal"><span class="pre">print</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#specifying-interactions-between-passes" id="id38">Specifying interactions between passes</a><ul>
+<li><a class="reference internal" href="#the-getanalysisusage-method" id="id39">The <code class="docutils literal"><span class="pre">getAnalysisUsage</span></code> method</a></li>
+<li><a class="reference internal" href="#the-analysisusage-addrequired-and-analysisusage-addrequiredtransitive-methods" id="id40">The <code class="docutils literal"><span class="pre">AnalysisUsage::addRequired<></span></code> and <code class="docutils literal"><span class="pre">AnalysisUsage::addRequiredTransitive<></span></code> methods</a></li>
+<li><a class="reference internal" href="#the-analysisusage-addpreserved-method" id="id41">The <code class="docutils literal"><span class="pre">AnalysisUsage::addPreserved<></span></code> method</a></li>
+<li><a class="reference internal" href="#example-implementations-of-getanalysisusage" id="id42">Example implementations of <code class="docutils literal"><span class="pre">getAnalysisUsage</span></code></a></li>
+<li><a class="reference internal" href="#the-getanalysis-and-getanalysisifavailable-methods" id="id43">The <code class="docutils literal"><span class="pre">getAnalysis<></span></code> and <code class="docutils literal"><span class="pre">getAnalysisIfAvailable<></span></code> methods</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#implementing-analysis-groups" id="id44">Implementing Analysis Groups</a><ul>
+<li><a class="reference internal" href="#analysis-group-concepts" id="id45">Analysis Group Concepts</a></li>
+<li><a class="reference internal" href="#using-registeranalysisgroup" id="id46">Using <code class="docutils literal"><span class="pre">RegisterAnalysisGroup</span></code></a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a class="reference internal" href="#pass-statistics" id="id47">Pass Statistics</a><ul>
+<li><a class="reference internal" href="#what-passmanager-does" id="id48">What PassManager does</a><ul>
+<li><a class="reference internal" href="#the-releasememory-method" id="id49">The <code class="docutils literal"><span class="pre">releaseMemory</span></code> method</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a class="reference internal" href="#registering-dynamically-loaded-passes" id="id50">Registering dynamically loaded passes</a><ul>
+<li><a class="reference internal" href="#using-existing-registries" id="id51">Using existing registries</a></li>
+<li><a class="reference internal" href="#creating-new-registries" id="id52">Creating new registries</a></li>
+<li><a class="reference internal" href="#using-gdb-with-dynamically-loaded-passes" id="id53">Using GDB with dynamically loaded passes</a><ul>
+<li><a class="reference internal" href="#setting-a-breakpoint-in-your-pass" id="id54">Setting a breakpoint in your pass</a></li>
+<li><a class="reference internal" href="#miscellaneous-problems" id="id55">Miscellaneous Problems</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#future-extensions-planned" id="id56">Future extensions planned</a><ul>
+<li><a class="reference internal" href="#multithreaded-llvm" id="id57">Multithreaded LLVM</a></li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+</div>
+<div class="section" id="introduction-what-is-a-pass">
+<h2><a class="toc-backref" href="#id5">Introduction â What is a pass?</a><a class="headerlink" href="#introduction-what-is-a-pass" title="Permalink to this headline">Â¶</a></h2>
+<p>The LLVM Pass Framework is an important part of the LLVM system, because LLVM
+passes are where most of the interesting parts of the compiler exist.  Passes
+perform the transformations and optimizations that make up the compiler, they
+build the analysis results that are used by these transformations, and they
+are, above all, a structuring technique for compiler code.</p>
+<p>All LLVM passes are subclasses of the <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1Pass.html">Pass</a> class, which implement
+functionality by overriding virtual methods inherited from <code class="docutils literal"><span class="pre">Pass</span></code>.  Depending
+on how your pass works, you should inherit from the <a class="reference internal" href="#writing-an-llvm-pass-modulepass"><span class="std std-ref">ModulePass</span></a> , <a class="reference internal" href="#writing-an-llvm-pass-callgraphsccpass"><span class="std std-ref">CallGraphSCCPass</span></a>, <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a> , or <a class="reference internal" href="#writing-an-llvm-pass-looppass"><span class="std std-ref">LoopPass</span></a>, or <a class="reference internal" href="#writing-an-llvm-pass-regionpass"><span class="std std-ref">RegionPass</span></a>, or <a class="reference internal" href="#writing-an-llvm-pass-basicblockpass"><span class="std std-ref">BasicBlockPass</span></a> classes, which gives the system more
+information about what your pass does, and how it can be combined with other
+passes.  One of the main features of the LLVM Pass Framework is that it
+schedules passes to run in an efficient way based on the constraints that your
+pass meets (which are indicated by which class they derive from).</p>
+<p>We start by showing you how to construct a pass, everything from setting up the
+code, to compiling, loading, and executing it.  After the basics are down, more
+advanced features are discussed.</p>
+</div>
+<div class="section" id="quick-start-writing-hello-world">
+<h2><a class="toc-backref" href="#id6">Quick Start â Writing hello world</a><a class="headerlink" href="#quick-start-writing-hello-world" title="Permalink to this headline">Â¶</a></h2>
+<p>Here we describe how to write the âhello worldâ of passes.  The âHelloâ pass is
+designed to simply print out the name of non-external functions that exist in
+the program being compiled.  It does not modify the program at all, it just
+inspects it.  The source code and files for this pass are available in the LLVM
+source tree in the <code class="docutils literal"><span class="pre">lib/Transforms/Hello</span></code> directory.</p>
+<div class="section" id="setting-up-the-build-environment">
+<span id="writing-an-llvm-pass-makefile"></span><h3><a class="toc-backref" href="#id7">Setting up the build environment</a><a class="headerlink" href="#setting-up-the-build-environment" title="Permalink to this headline">Â¶</a></h3>
+<p>First, configure and build LLVM.  Next, you need to create a new directory
+somewhere in the LLVM source base.  For this example, weâll assume that you
+made <code class="docutils literal"><span class="pre">lib/Transforms/Hello</span></code>.  Finally, you must set up a build script
+that will compile the source code for the new pass.  To do this,
+copy the following into <code class="docutils literal"><span class="pre">CMakeLists.txt</span></code>:</p>
+<div class="highlight-cmake"><div class="highlight"><pre><span></span><span class="nb">add_llvm_loadable_module</span><span class="p">(</span> <span class="s">LLVMHello</span>
+  <span class="s">Hello.cpp</span>
+
+  <span class="s">PLUGIN_TOOL</span>
+  <span class="s">opt</span>
+  <span class="p">)</span>
+</pre></div>
+</div>
+<p>and the following line into <code class="docutils literal"><span class="pre">lib/Transforms/CMakeLists.txt</span></code>:</p>
+<div class="highlight-cmake"><div class="highlight"><pre><span></span><span class="nb">add_subdirectory</span><span class="p">(</span><span class="s">Hello</span><span class="p">)</span>
+</pre></div>
+</div>
+<p>(Note that there is already a directory named <code class="docutils literal"><span class="pre">Hello</span></code> with a sample âHelloâ
+pass; you may play with it â in which case you donât need to modify any
+<code class="docutils literal"><span class="pre">CMakeLists.txt</span></code> files â or, if you want to create everything from scratch,
+use another name.)</p>
+<p>This build script specifies that <code class="docutils literal"><span class="pre">Hello.cpp</span></code> file in the current directory
+is to be compiled and linked into a shared object <code class="docutils literal"><span class="pre">$(LEVEL)/lib/LLVMHello.so</span></code> that
+can be dynamically loaded by the <strong class="program">opt</strong> tool via its <a class="reference internal" href="CommandGuide/opt.html#cmdoption-load"><code class="xref std std-option docutils literal"><span class="pre">-load</span></code></a>
+option. If your operating system uses a suffix other than <code class="docutils literal"><span class="pre">.so</span></code> (such as
+Windows or Mac OS X), the appropriate extension will be used.</p>
+<p>Now that we have the build scripts set up, we just need to write the code for
+the pass itself.</p>
+</div>
+<div class="section" id="basic-code-required">
+<span id="writing-an-llvm-pass-basiccode"></span><h3><a class="toc-backref" href="#id8">Basic code required</a><a class="headerlink" href="#basic-code-required" title="Permalink to this headline">Â¶</a></h3>
+<p>Now that we have a way to compile our new pass, we just have to write it.
+Start out with:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf">"llvm/Pass.h"</span><span class="cp"></span>
+<span class="cp">#include</span> <span class="cpf">"llvm/IR/Function.h"</span><span class="cp"></span>
+<span class="cp">#include</span> <span class="cpf">"llvm/Support/raw_ostream.h"</span><span class="cp"></span>
+</pre></div>
+</div>
+<p>Which are needed because we are writing a <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1Pass.html">Pass</a>, we are operating on
+<a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1Function.html">Function</a>s, and we will
+be doing some printing.</p>
+<p>Next we have:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="k">namespace</span> <span class="n">llvm</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>â¦ which is required because the functions from the include files live in the
+llvm namespace.</p>
+<p>Next we have:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">namespace</span> <span class="p">{</span>
+</pre></div>
+</div>
+<p>â¦ which starts out an anonymous namespace.  Anonymous namespaces are to C++
+what the â<code class="docutils literal"><span class="pre">static</span></code>â keyword is to C (at global scope).  It makes the things
+declared inside of the anonymous namespace visible only to the current file.
+If youâre not familiar with them, consult a decent C++ book for more
+information.</p>
+<p>Next, we declare our pass itself:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">struct</span> <span class="nl">Hello</span> <span class="p">:</span> <span class="k">public</span> <span class="n">FunctionPass</span> <span class="p">{</span>
+</pre></div>
+</div>
+<p>This declares a â<code class="docutils literal"><span class="pre">Hello</span></code>â class that is a subclass of <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a>.  The different builtin pass subclasses
+are described in detail <a class="reference internal" href="#writing-an-llvm-pass-pass-classes"><span class="std std-ref">later</span></a>, but
+for now, know that <code class="docutils literal"><span class="pre">FunctionPass</span></code> operates on a function at a time.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">static</span> <span class="kt">char</span> <span class="n">ID</span><span class="p">;</span>
+<span class="n">Hello</span><span class="p">()</span> <span class="o">:</span> <span class="n">FunctionPass</span><span class="p">(</span><span class="n">ID</span><span class="p">)</span> <span class="p">{}</span>
+</pre></div>
+</div>
+<p>This declares pass identifier used by LLVM to identify pass.  This allows LLVM
+to avoid using expensive C++ runtime information.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span>  <span class="kt">bool</span> <span class="nf">runOnFunction</span><span class="p">(</span><span class="n">Function</span> <span class="o">&</span><span class="n">F</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
+    <span class="n">errs</span><span class="p">()</span> <span class="o"><<</span> <span class="s">"Hello: "</span><span class="p">;</span>
+    <span class="n">errs</span><span class="p">().</span><span class="n">write_escaped</span><span class="p">(</span><span class="n">F</span><span class="p">.</span><span class="n">getName</span><span class="p">())</span> <span class="o"><<</span> <span class="sc">'\n'</span><span class="p">;</span>
+    <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+  <span class="p">}</span>
+<span class="p">};</span> <span class="c1">// end of struct Hello</span>
+<span class="p">}</span>  <span class="c1">// end of anonymous namespace</span>
+</pre></div>
+</div>
+<p>We declare a <a class="reference internal" href="#writing-an-llvm-pass-runonfunction"><span class="std std-ref">runOnFunction</span></a> method,
+which overrides an abstract virtual method inherited from <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a>.  This is where we are supposed to do our
+thing, so we just print out our message with the name of each function.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">char</span> <span class="n">Hello</span><span class="o">::</span><span class="n">ID</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>We initialize pass ID here.  LLVM uses IDâs address to identify a pass, so
+initialization value is not important.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">static</span> <span class="n">RegisterPass</span><span class="o"><</span><span class="n">Hello</span><span class="o">></span> <span class="n">X</span><span class="p">(</span><span class="s">"hello"</span><span class="p">,</span> <span class="s">"Hello World Pass"</span><span class="p">,</span>
+                             <span class="nb">false</span> <span class="cm">/* Only looks at CFG */</span><span class="p">,</span>
+                             <span class="nb">false</span> <span class="cm">/* Analysis Pass */</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>Lastly, we <a class="reference internal" href="#writing-an-llvm-pass-registration"><span class="std std-ref">register our class</span></a>
+<code class="docutils literal"><span class="pre">Hello</span></code>, giving it a command line argument â<code class="docutils literal"><span class="pre">hello</span></code>â, and a name âHello
+World Passâ.  The last two arguments describe its behavior: if a pass walks CFG
+without modifying it then the third argument is set to <code class="docutils literal"><span class="pre">true</span></code>; if a pass is
+an analysis pass, for example dominator tree pass, then <code class="docutils literal"><span class="pre">true</span></code> is supplied as
+the fourth argument.</p>
+<p>As a whole, the <code class="docutils literal"><span class="pre">.cpp</span></code> file looks like:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf">"llvm/Pass.h"</span><span class="cp"></span>
+<span class="cp">#include</span> <span class="cpf">"llvm/IR/Function.h"</span><span class="cp"></span>
+<span class="cp">#include</span> <span class="cpf">"llvm/Support/raw_ostream.h"</span><span class="cp"></span>
+
+<span class="k">using</span> <span class="k">namespace</span> <span class="n">llvm</span><span class="p">;</span>
+
+<span class="k">namespace</span> <span class="p">{</span>
+<span class="k">struct</span> <span class="nl">Hello</span> <span class="p">:</span> <span class="k">public</span> <span class="n">FunctionPass</span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">char</span> <span class="n">ID</span><span class="p">;</span>
+  <span class="n">Hello</span><span class="p">()</span> <span class="o">:</span> <span class="n">FunctionPass</span><span class="p">(</span><span class="n">ID</span><span class="p">)</span> <span class="p">{}</span>
+
+  <span class="kt">bool</span> <span class="n">runOnFunction</span><span class="p">(</span><span class="n">Function</span> <span class="o">&</span><span class="n">F</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
+    <span class="n">errs</span><span class="p">()</span> <span class="o"><<</span> <span class="s">"Hello: "</span><span class="p">;</span>
+    <span class="n">errs</span><span class="p">().</span><span class="n">write_escaped</span><span class="p">(</span><span class="n">F</span><span class="p">.</span><span class="n">getName</span><span class="p">())</span> <span class="o"><<</span> <span class="sc">'\n'</span><span class="p">;</span>
+    <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+  <span class="p">}</span>
+<span class="p">};</span> <span class="c1">// end of struct Hello</span>
+<span class="p">}</span>  <span class="c1">// end of anonymous namespace</span>
+
+<span class="kt">char</span> <span class="n">Hello</span><span class="o">::</span><span class="n">ID</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+<span class="k">static</span> <span class="n">RegisterPass</span><span class="o"><</span><span class="n">Hello</span><span class="o">></span> <span class="n">X</span><span class="p">(</span><span class="s">"hello"</span><span class="p">,</span> <span class="s">"Hello World Pass"</span><span class="p">,</span>
+                             <span class="nb">false</span> <span class="cm">/* Only looks at CFG */</span><span class="p">,</span>
+                             <span class="nb">false</span> <span class="cm">/* Analysis Pass */</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>Now that itâs all together, compile the file with a simple â<code class="docutils literal"><span class="pre">gmake</span></code>â command
+from the top level of your build directory and you should get a new file
+â<code class="docutils literal"><span class="pre">lib/LLVMHello.so</span></code>â.  Note that everything in this file is
+contained in an anonymous namespace â this reflects the fact that passes
+are self contained units that do not need external interfaces (although they
+can have them) to be useful.</p>
+</div>
+<div class="section" id="running-a-pass-with-opt">
+<h3><a class="toc-backref" href="#id9">Running a pass with <code class="docutils literal"><span class="pre">opt</span></code></a><a class="headerlink" href="#running-a-pass-with-opt" title="Permalink to this headline">Â¶</a></h3>
+<p>Now that you have a brand new shiny shared object file, we can use the
+<strong class="program">opt</strong> command to run an LLVM program through your pass.  Because you
+registered your pass with <code class="docutils literal"><span class="pre">RegisterPass</span></code>, you will be able to use the
+<strong class="program">opt</strong> tool to access it, once loaded.</p>
+<p>To test it, follow the example at the end of the <a class="reference internal" href="GettingStarted.html"><span class="doc">Getting Started with the LLVM System</span></a> to
+compile âHello Worldâ to LLVM.  We can now run the bitcode file (hello.bc) for
+the program through our transformation like this (or course, any bitcode file
+will work):</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> opt -load lib/LLVMHello.so -hello < hello.bc > /dev/null
+<span class="go">Hello: __main</span>
+<span class="go">Hello: puts</span>
+<span class="go">Hello: main</span>
+</pre></div>
+</div>
+<p>The <a class="reference internal" href="CommandGuide/opt.html#cmdoption-load"><code class="xref std std-option docutils literal"><span class="pre">-load</span></code></a> option specifies that <strong class="program">opt</strong> should load your pass
+as a shared object, which makes â<code class="docutils literal"><span class="pre">-hello</span></code>â a valid command line argument
+(which is one reason you need to <a class="reference internal" href="#writing-an-llvm-pass-registration"><span class="std std-ref">register your pass</span></a>).  Because the Hello pass does not modify
+the program in any interesting way, we just throw away the result of
+<strong class="program">opt</strong> (sending it to <code class="docutils literal"><span class="pre">/dev/null</span></code>).</p>
+<p>To see what happened to the other string you registered, try running
+<strong class="program">opt</strong> with the <a class="reference internal" href="CommandGuide/opt.html#cmdoption-help"><code class="xref std std-option docutils literal"><span class="pre">-help</span></code></a> option:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> opt -load lib/LLVMHello.so -help
+<span class="go">OVERVIEW: llvm .bc -> .bc modular optimizer and analysis printer</span>
+
+<span class="go">USAGE: opt [subcommand] [options] <input bitcode file></span>
+
+<span class="go">OPTIONS:</span>
+<span class="go">  Optimizations available:</span>
+<span class="go">...</span>
+<span class="go">    -guard-widening           - Widen guards</span>
+<span class="go">    -gvn                      - Global Value Numbering</span>
+<span class="go">    -gvn-hoist                - Early GVN Hoisting of Expressions</span>
+<span class="go">    -hello                    - Hello World Pass</span>
+<span class="go">    -indvars                  - Induction Variable Simplification</span>
+<span class="go">    -inferattrs               - Infer set function attributes</span>
+<span class="go">...</span>
+</pre></div>
+</div>
+<p>The pass name gets added as the information string for your pass, giving some
+documentation to users of <strong class="program">opt</strong>.  Now that you have a working pass,
+you would go ahead and make it do the cool transformations you want.  Once you
+get it all working and tested, it may become useful to find out how fast your
+pass is.  The <a class="reference internal" href="#writing-an-llvm-pass-passmanager"><span class="std std-ref">PassManager</span></a> provides a
+nice command line option (<a class="reference internal" href="CommandGuide/llc.html#cmdoption-time-passes"><code class="xref std std-option docutils literal"><span class="pre">--time-passes</span></code></a>) that allows you to get
+information about the execution time of your pass along with the other passes
+you queue up.  For example:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> opt -load lib/LLVMHello.so -hello -time-passes < hello.bc > /dev/null
+<span class="go">Hello: __main</span>
+<span class="go">Hello: puts</span>
+<span class="go">Hello: main</span>
+<span class="go">===-------------------------------------------------------------------------===</span>
+<span class="go">                      ... Pass execution timing report ...</span>
+<span class="go">===-------------------------------------------------------------------------===</span>
+<span class="go">  Total Execution Time: 0.0007 seconds (0.0005 wall clock)</span>
+
+<span class="go">   ---User Time---   --User+System--   ---Wall Time---  --- Name ---</span>
+<span class="go">   0.0004 ( 55.3%)   0.0004 ( 55.3%)   0.0004 ( 75.7%)  Bitcode Writer</span>
+<span class="go">   0.0003 ( 44.7%)   0.0003 ( 44.7%)   0.0001 ( 13.6%)  Hello World Pass</span>
+<span class="go">   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 ( 10.7%)  Module Verifier</span>
+<span class="go">   0.0007 (100.0%)   0.0007 (100.0%)   0.0005 (100.0%)  Total</span>
+</pre></div>
+</div>
+<p>As you can see, our implementation above is pretty fast.  The additional
+passes listed are automatically inserted by the <strong class="program">opt</strong> tool to verify
+that the LLVM emitted by your pass is still valid and well formed LLVM, which
+hasnât been broken somehow.</p>
+<p>Now that you have seen the basics of the mechanics behind passes, we can talk
+about some more details of how they work and how to use them.</p>
+</div>
+</div>
+<div class="section" id="pass-classes-and-requirements">
+<span id="writing-an-llvm-pass-pass-classes"></span><h2><a class="toc-backref" href="#id10">Pass classes and requirements</a><a class="headerlink" href="#pass-classes-and-requirements" title="Permalink to this headline">Â¶</a></h2>
+<p>One of the first things that you should do when designing a new pass is to
+decide what class you should subclass for your pass.  The <a class="reference internal" href="#writing-an-llvm-pass-basiccode"><span class="std std-ref">Hello World</span></a> example uses the <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a> class for its implementation, but we did
+not discuss why or when this should occur.  Here we talk about the classes
+available, from the most general to the most specific.</p>
+<p>When choosing a superclass for your <code class="docutils literal"><span class="pre">Pass</span></code>, you should choose the <strong>most
+specific</strong> class possible, while still being able to meet the requirements
+listed.  This gives the LLVM Pass Infrastructure information necessary to
+optimize how passes are run, so that the resultant compiler isnât unnecessarily
+slow.</p>
+<div class="section" id="the-immutablepass-class">
+<h3><a class="toc-backref" href="#id11">The <code class="docutils literal"><span class="pre">ImmutablePass</span></code> class</a><a class="headerlink" href="#the-immutablepass-class" title="Permalink to this headline">Â¶</a></h3>
+<p>The most plain and boring type of pass is the â<a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1ImmutablePass.html">ImmutablePass</a>â class.  This pass
+type is used for passes that do not have to be run, do not change state, and
+never need to be updated.  This is not a normal type of transformation or
+analysis, but can provide information about the current compiler configuration.</p>
+<p>Although this pass class is very infrequently used, it is important for
+providing information about the current target machine being compiled for, and
+other static information that can affect the various transformations.</p>
+<p><code class="docutils literal"><span class="pre">ImmutablePass</span></code>es never invalidate other transformations, are never
+invalidated, and are never ârunâ.</p>
+</div>
+<div class="section" id="the-modulepass-class">
+<span id="writing-an-llvm-pass-modulepass"></span><h3><a class="toc-backref" href="#id12">The <code class="docutils literal"><span class="pre">ModulePass</span></code> class</a><a class="headerlink" href="#the-modulepass-class" title="Permalink to this headline">Â¶</a></h3>
+<p>The <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1ModulePass.html">ModulePass</a> class
+is the most general of all superclasses that you can use.  Deriving from
+<code class="docutils literal"><span class="pre">ModulePass</span></code> indicates that your pass uses the entire program as a unit,
+referring to function bodies in no predictable order, or adding and removing
+functions.  Because nothing is known about the behavior of <code class="docutils literal"><span class="pre">ModulePass</span></code>
+subclasses, no optimization can be done for their execution.</p>
+<p>A module pass can use function level passes (e.g. dominators) using the
+<code class="docutils literal"><span class="pre">getAnalysis</span></code> interface <code class="docutils literal"><span class="pre">getAnalysis<DominatorTree>(llvm::Function</span> <span class="pre">*)</span></code> to
+provide the function to retrieve analysis result for, if the function pass does
+not require any module or immutable passes.  Note that this can only be done
+for functions for which the analysis ran, e.g. in the case of dominators you
+should only ask for the <code class="docutils literal"><span class="pre">DominatorTree</span></code> for function definitions, not
+declarations.</p>
+<p>To write a correct <code class="docutils literal"><span class="pre">ModulePass</span></code> subclass, derive from <code class="docutils literal"><span class="pre">ModulePass</span></code> and
+overload the <code class="docutils literal"><span class="pre">runOnModule</span></code> method with the following signature:</p>
+<div class="section" id="the-runonmodule-method">
+<h4><a class="toc-backref" href="#id13">The <code class="docutils literal"><span class="pre">runOnModule</span></code> method</a><a class="headerlink" href="#the-runonmodule-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnModule</span><span class="p">(</span><span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">runOnModule</span></code> method performs the interesting work of the pass.  It
+should return <code class="docutils literal"><span class="pre">true</span></code> if the module was modified by the transformation and
+<code class="docutils literal"><span class="pre">false</span></code> otherwise.</p>
+</div>
+</div>
+<div class="section" id="the-callgraphsccpass-class">
+<span id="writing-an-llvm-pass-callgraphsccpass"></span><h3><a class="toc-backref" href="#id14">The <code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code> class</a><a class="headerlink" href="#the-callgraphsccpass-class" title="Permalink to this headline">Â¶</a></h3>
+<p>The <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1CallGraphSCCPass.html">CallGraphSCCPass</a> is used by
+passes that need to traverse the program bottom-up on the call graph (callees
+before callers).  Deriving from <code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code> provides some mechanics
+for building and traversing the <code class="docutils literal"><span class="pre">CallGraph</span></code>, but also allows the system to
+optimize execution of <code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code>es.  If your pass meets the
+requirements outlined below, and doesnât meet the requirements of a
+<a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a> or <a class="reference internal" href="#writing-an-llvm-pass-basicblockpass"><span class="std std-ref">BasicBlockPass</span></a>, you should derive from
+<code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code>.</p>
+<p><code class="docutils literal"><span class="pre">TODO</span></code>: explain briefly what SCC, Tarjanâs algo, and B-U mean.</p>
+<p>To be explicit, CallGraphSCCPass subclasses are:</p>
+<ol class="arabic simple">
+<li>â¦ <em>not allowed</em> to inspect or modify any <code class="docutils literal"><span class="pre">Function</span></code>s other than those
+in the current SCC and the direct callers and direct callees of the SCC.</li>
+<li>â¦ <em>required</em> to preserve the current <code class="docutils literal"><span class="pre">CallGraph</span></code> object, updating it to
+reflect any changes made to the program.</li>
+<li>â¦ <em>not allowed</em> to add or remove SCCâs from the current Module, though
+they may change the contents of an SCC.</li>
+<li>â¦ <em>allowed</em> to add or remove global variables from the current Module.</li>
+<li>â¦ <em>allowed</em> to maintain state across invocations of <a class="reference internal" href="#writing-an-llvm-pass-runonscc"><span class="std std-ref">runOnSCC</span></a> (including global data).</li>
+</ol>
+<p>Implementing a <code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code> is slightly tricky in some cases because it
+has to handle SCCs with more than one node in it.  All of the virtual methods
+described below should return <code class="docutils literal"><span class="pre">true</span></code> if they modified the program, or
+<code class="docutils literal"><span class="pre">false</span></code> if they didnât.</p>
+<div class="section" id="the-doinitialization-callgraph-method">
+<h4><a class="toc-backref" href="#id15">The <code class="docutils literal"><span class="pre">doInitialization(CallGraph</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-doinitialization-callgraph-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doInitialization</span><span class="p">(</span><span class="n">CallGraph</span> <span class="o">&</span><span class="n">CG</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is allowed to do most of the things that
+<code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code>es are not allowed to do.  They can add and remove
+functions, get pointers to functions, etc.  The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is
+designed to do simple initialization type of stuff that does not depend on the
+SCCs being processed.  The <code class="docutils literal"><span class="pre">doInitialization</span></code> method call is not scheduled to
+overlap with any other pass executions (thus it should be very fast).</p>
+</div>
+<div class="section" id="the-runonscc-method">
+<span id="writing-an-llvm-pass-runonscc"></span><h4><a class="toc-backref" href="#id16">The <code class="docutils literal"><span class="pre">runOnSCC</span></code> method</a><a class="headerlink" href="#the-runonscc-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnSCC</span><span class="p">(</span><span class="n">CallGraphSCC</span> <span class="o">&</span><span class="n">SCC</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">runOnSCC</span></code> method performs the interesting work of the pass, and should
+return <code class="docutils literal"><span class="pre">true</span></code> if the module was modified by the transformation, <code class="docutils literal"><span class="pre">false</span></code>
+otherwise.</p>
+</div>
+<div class="section" id="the-dofinalization-callgraph-method">
+<h4><a class="toc-backref" href="#id17">The <code class="docutils literal"><span class="pre">doFinalization(CallGraph</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-dofinalization-callgraph-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doFinalization</span><span class="p">(</span><span class="n">CallGraph</span> <span class="o">&</span><span class="n">CG</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doFinalization</span></code> method is an infrequently used method that is called
+when the pass framework has finished calling <a class="reference internal" href="#writing-an-llvm-pass-runonscc"><span class="std std-ref">runOnSCC</span></a> for every SCC in the program being compiled.</p>
+</div>
+</div>
+<div class="section" id="the-functionpass-class">
+<span id="writing-an-llvm-pass-functionpass"></span><h3><a class="toc-backref" href="#id18">The <code class="docutils literal"><span class="pre">FunctionPass</span></code> class</a><a class="headerlink" href="#the-functionpass-class" title="Permalink to this headline">Â¶</a></h3>
+<p>In contrast to <code class="docutils literal"><span class="pre">ModulePass</span></code> subclasses, <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1Pass.html">FunctionPass</a> subclasses do have a
+predictable, local behavior that can be expected by the system.  All
+<code class="docutils literal"><span class="pre">FunctionPass</span></code> execute on each function in the program independent of all of
+the other functions in the program.  <code class="docutils literal"><span class="pre">FunctionPass</span></code>es do not require that
+they are executed in a particular order, and <code class="docutils literal"><span class="pre">FunctionPass</span></code>es do not modify
+external functions.</p>
+<p>To be explicit, <code class="docutils literal"><span class="pre">FunctionPass</span></code> subclasses are not allowed to:</p>
+<ol class="arabic simple">
+<li>Inspect or modify a <code class="docutils literal"><span class="pre">Function</span></code> other than the one currently being processed.</li>
+<li>Add or remove <code class="docutils literal"><span class="pre">Function</span></code>s from the current <code class="docutils literal"><span class="pre">Module</span></code>.</li>
+<li>Add or remove global variables from the current <code class="docutils literal"><span class="pre">Module</span></code>.</li>
+<li>Maintain state across invocations of <a class="reference internal" href="#writing-an-llvm-pass-runonfunction"><span class="std std-ref">runOnFunction</span></a> (including global data).</li>
+</ol>
+<p>Implementing a <code class="docutils literal"><span class="pre">FunctionPass</span></code> is usually straightforward (See the <a class="reference internal" href="#writing-an-llvm-pass-basiccode"><span class="std std-ref">Hello
+World</span></a> pass for example).
+<code class="docutils literal"><span class="pre">FunctionPass</span></code>es may overload three virtual methods to do their work.  All
+of these methods should return <code class="docutils literal"><span class="pre">true</span></code> if they modified the program, or
+<code class="docutils literal"><span class="pre">false</span></code> if they didnât.</p>
+<div class="section" id="the-doinitialization-module-method">
+<span id="writing-an-llvm-pass-doinitialization-mod"></span><h4><a class="toc-backref" href="#id19">The <code class="docutils literal"><span class="pre">doInitialization(Module</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-doinitialization-module-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doInitialization</span><span class="p">(</span><span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is allowed to do most of the things that
+<code class="docutils literal"><span class="pre">FunctionPass</span></code>es are not allowed to do.  They can add and remove functions,
+get pointers to functions, etc.  The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is designed to
+do simple initialization type of stuff that does not depend on the functions
+being processed.  The <code class="docutils literal"><span class="pre">doInitialization</span></code> method call is not scheduled to
+overlap with any other pass executions (thus it should be very fast).</p>
+<p>A good example of how this method should be used is the <a class="reference external" href="http://llvm.org/doxygen/LowerAllocations_8cpp-source.html">LowerAllocations</a> pass.  This pass
+converts <code class="docutils literal"><span class="pre">malloc</span></code> and <code class="docutils literal"><span class="pre">free</span></code> instructions into platform dependent
+<code class="docutils literal"><span class="pre">malloc()</span></code> and <code class="docutils literal"><span class="pre">free()</span></code> function calls.  It uses the <code class="docutils literal"><span class="pre">doInitialization</span></code>
+method to get a reference to the <code class="docutils literal"><span class="pre">malloc</span></code> and <code class="docutils literal"><span class="pre">free</span></code> functions that it
+needs, adding prototypes to the module if necessary.</p>
+</div>
+<div class="section" id="the-runonfunction-method">
+<span id="writing-an-llvm-pass-runonfunction"></span><h4><a class="toc-backref" href="#id20">The <code class="docutils literal"><span class="pre">runOnFunction</span></code> method</a><a class="headerlink" href="#the-runonfunction-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnFunction</span><span class="p">(</span><span class="n">Function</span> <span class="o">&</span><span class="n">F</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">runOnFunction</span></code> method must be implemented by your subclass to do the
+transformation or analysis work of your pass.  As usual, a <code class="docutils literal"><span class="pre">true</span></code> value
+should be returned if the function is modified.</p>
+</div>
+<div class="section" id="the-dofinalization-module-method">
+<span id="writing-an-llvm-pass-dofinalization-mod"></span><h4><a class="toc-backref" href="#id21">The <code class="docutils literal"><span class="pre">doFinalization(Module</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-dofinalization-module-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doFinalization</span><span class="p">(</span><span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doFinalization</span></code> method is an infrequently used method that is called
+when the pass framework has finished calling <a class="reference internal" href="#writing-an-llvm-pass-runonfunction"><span class="std std-ref">runOnFunction</span></a> for every function in the program being
+compiled.</p>
+</div>
+</div>
+<div class="section" id="the-looppass-class">
+<span id="writing-an-llvm-pass-looppass"></span><h3><a class="toc-backref" href="#id22">The <code class="docutils literal"><span class="pre">LoopPass</span></code> class</a><a class="headerlink" href="#the-looppass-class" title="Permalink to this headline">Â¶</a></h3>
+<p>All <code class="docutils literal"><span class="pre">LoopPass</span></code> execute on each loop in the function independent of all of the
+other loops in the function.  <code class="docutils literal"><span class="pre">LoopPass</span></code> processes loops in loop nest order
+such that outer most loop is processed last.</p>
+<p><code class="docutils literal"><span class="pre">LoopPass</span></code> subclasses are allowed to update loop nest using <code class="docutils literal"><span class="pre">LPPassManager</span></code>
+interface.  Implementing a loop pass is usually straightforward.
+<code class="docutils literal"><span class="pre">LoopPass</span></code>es may overload three virtual methods to do their work.  All
+these methods should return <code class="docutils literal"><span class="pre">true</span></code> if they modified the program, or <code class="docutils literal"><span class="pre">false</span></code>
+if they didnât.</p>
+<p>A <code class="docutils literal"><span class="pre">LoopPass</span></code> subclass which is intended to run as part of the main loop pass
+pipeline needs to preserve all of the same <em>function</em> analyses that the other
+loop passes in its pipeline require. To make that easier,
+a <code class="docutils literal"><span class="pre">getLoopAnalysisUsage</span></code> function is provided by <code class="docutils literal"><span class="pre">LoopUtils.h</span></code>. It can be
+called within the subclassâs <code class="docutils literal"><span class="pre">getAnalysisUsage</span></code> override to get consistent
+and correct behavior. Analogously, <code class="docutils literal"><span class="pre">INITIALIZE_PASS_DEPENDENCY(LoopPass)</span></code>
+will initialize this set of function analyses.</p>
+<div class="section" id="the-doinitialization-loop-lppassmanager-method">
+<h4><a class="toc-backref" href="#id23">The <code class="docutils literal"><span class="pre">doInitialization(Loop</span> <span class="pre">*,</span> <span class="pre">LPPassManager</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-doinitialization-loop-lppassmanager-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doInitialization</span><span class="p">(</span><span class="n">Loop</span> <span class="o">*</span><span class="p">,</span> <span class="n">LPPassManager</span> <span class="o">&</span><span class="n">LPM</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is designed to do simple initialization type of
+stuff that does not depend on the functions being processed.  The
+<code class="docutils literal"><span class="pre">doInitialization</span></code> method call is not scheduled to overlap with any other
+pass executions (thus it should be very fast).  <code class="docutils literal"><span class="pre">LPPassManager</span></code> interface
+should be used to access <code class="docutils literal"><span class="pre">Function</span></code> or <code class="docutils literal"><span class="pre">Module</span></code> level analysis information.</p>
+</div>
+<div class="section" id="the-runonloop-method">
+<span id="writing-an-llvm-pass-runonloop"></span><h4><a class="toc-backref" href="#id24">The <code class="docutils literal"><span class="pre">runOnLoop</span></code> method</a><a class="headerlink" href="#the-runonloop-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnLoop</span><span class="p">(</span><span class="n">Loop</span> <span class="o">*</span><span class="p">,</span> <span class="n">LPPassManager</span> <span class="o">&</span><span class="n">LPM</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">runOnLoop</span></code> method must be implemented by your subclass to do the
+transformation or analysis work of your pass.  As usual, a <code class="docutils literal"><span class="pre">true</span></code> value
+should be returned if the function is modified.  <code class="docutils literal"><span class="pre">LPPassManager</span></code> interface
+should be used to update loop nest.</p>
+</div>
+<div class="section" id="the-dofinalization-method">
+<h4><a class="toc-backref" href="#id25">The <code class="docutils literal"><span class="pre">doFinalization()</span></code> method</a><a class="headerlink" href="#the-dofinalization-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doFinalization</span><span class="p">();</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doFinalization</span></code> method is an infrequently used method that is called
+when the pass framework has finished calling <a class="reference internal" href="#writing-an-llvm-pass-runonloop"><span class="std std-ref">runOnLoop</span></a> for every loop in the program being compiled.</p>
+</div>
+</div>
+<div class="section" id="the-regionpass-class">
+<span id="writing-an-llvm-pass-regionpass"></span><h3><a class="toc-backref" href="#id26">The <code class="docutils literal"><span class="pre">RegionPass</span></code> class</a><a class="headerlink" href="#the-regionpass-class" title="Permalink to this headline">Â¶</a></h3>
+<p><code class="docutils literal"><span class="pre">RegionPass</span></code> is similar to <a class="reference internal" href="#writing-an-llvm-pass-looppass"><span class="std std-ref">LoopPass</span></a>,
+but executes on each single entry single exit region in the function.
+<code class="docutils literal"><span class="pre">RegionPass</span></code> processes regions in nested order such that the outer most
+region is processed last.</p>
+<p><code class="docutils literal"><span class="pre">RegionPass</span></code> subclasses are allowed to update the region tree by using the
+<code class="docutils literal"><span class="pre">RGPassManager</span></code> interface.  You may overload three virtual methods of
+<code class="docutils literal"><span class="pre">RegionPass</span></code> to implement your own region pass.  All these methods should
+return <code class="docutils literal"><span class="pre">true</span></code> if they modified the program, or <code class="docutils literal"><span class="pre">false</span></code> if they did not.</p>
+<div class="section" id="the-doinitialization-region-rgpassmanager-method">
+<h4><a class="toc-backref" href="#id27">The <code class="docutils literal"><span class="pre">doInitialization(Region</span> <span class="pre">*,</span> <span class="pre">RGPassManager</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-doinitialization-region-rgpassmanager-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doInitialization</span><span class="p">(</span><span class="n">Region</span> <span class="o">*</span><span class="p">,</span> <span class="n">RGPassManager</span> <span class="o">&</span><span class="n">RGM</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is designed to do simple initialization type of
+stuff that does not depend on the functions being processed.  The
+<code class="docutils literal"><span class="pre">doInitialization</span></code> method call is not scheduled to overlap with any other
+pass executions (thus it should be very fast).  <code class="docutils literal"><span class="pre">RPPassManager</span></code> interface
+should be used to access <code class="docutils literal"><span class="pre">Function</span></code> or <code class="docutils literal"><span class="pre">Module</span></code> level analysis information.</p>
+</div>
+<div class="section" id="the-runonregion-method">
+<span id="writing-an-llvm-pass-runonregion"></span><h4><a class="toc-backref" href="#id28">The <code class="docutils literal"><span class="pre">runOnRegion</span></code> method</a><a class="headerlink" href="#the-runonregion-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnRegion</span><span class="p">(</span><span class="n">Region</span> <span class="o">*</span><span class="p">,</span> <span class="n">RGPassManager</span> <span class="o">&</span><span class="n">RGM</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">runOnRegion</span></code> method must be implemented by your subclass to do the
+transformation or analysis work of your pass.  As usual, a true value should be
+returned if the region is modified.  <code class="docutils literal"><span class="pre">RGPassManager</span></code> interface should be used to
+update region tree.</p>
+</div>
+<div class="section" id="id2">
+<h4><a class="toc-backref" href="#id29">The <code class="docutils literal"><span class="pre">doFinalization()</span></code> method</a><a class="headerlink" href="#id2" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doFinalization</span><span class="p">();</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doFinalization</span></code> method is an infrequently used method that is called
+when the pass framework has finished calling <a class="reference internal" href="#writing-an-llvm-pass-runonregion"><span class="std std-ref">runOnRegion</span></a> for every region in the program being
+compiled.</p>
+</div>
+</div>
+<div class="section" id="the-basicblockpass-class">
+<span id="writing-an-llvm-pass-basicblockpass"></span><h3><a class="toc-backref" href="#id30">The <code class="docutils literal"><span class="pre">BasicBlockPass</span></code> class</a><a class="headerlink" href="#the-basicblockpass-class" title="Permalink to this headline">Â¶</a></h3>
+<p><code class="docutils literal"><span class="pre">BasicBlockPass</span></code>es are just like <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPassâs</span></a> , except that they must limit their scope
+of inspection and modification to a single basic block at a time.  As such,
+they are <strong>not</strong> allowed to do any of the following:</p>
+<ol class="arabic simple">
+<li>Modify or inspect any basic blocks outside of the current one.</li>
+<li>Maintain state across invocations of <a class="reference internal" href="#writing-an-llvm-pass-runonbasicblock"><span class="std std-ref">runOnBasicBlock</span></a>.</li>
+<li>Modify the control flow graph (by altering terminator instructions)</li>
+<li>Any of the things forbidden for <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPasses</span></a>.</li>
+</ol>
+<p><code class="docutils literal"><span class="pre">BasicBlockPass</span></code>es are useful for traditional local and âpeepholeâ
+optimizations.  They may override the same <a class="reference internal" href="#writing-an-llvm-pass-doinitialization-mod"><span class="std std-ref">doInitialization(Module &)</span></a> and <a class="reference internal" href="#writing-an-llvm-pass-dofinalization-mod"><span class="std std-ref">doFinalization(Module &)</span></a> methods that <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPassâs</span></a> have, but also have the following virtual
+methods that may also be implemented:</p>
+<div class="section" id="the-doinitialization-function-method">
+<h4><a class="toc-backref" href="#id31">The <code class="docutils literal"><span class="pre">doInitialization(Function</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-doinitialization-function-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doInitialization</span><span class="p">(</span><span class="n">Function</span> <span class="o">&</span><span class="n">F</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is allowed to do most of the things that
+<code class="docutils literal"><span class="pre">BasicBlockPass</span></code>es are not allowed to do, but that <code class="docutils literal"><span class="pre">FunctionPass</span></code>es
+can.  The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is designed to do simple initialization
+that does not depend on the <code class="docutils literal"><span class="pre">BasicBlock</span></code>s being processed.  The
+<code class="docutils literal"><span class="pre">doInitialization</span></code> method call is not scheduled to overlap with any other
+pass executions (thus it should be very fast).</p>
+</div>
+<div class="section" id="the-runonbasicblock-method">
+<span id="writing-an-llvm-pass-runonbasicblock"></span><h4><a class="toc-backref" href="#id32">The <code class="docutils literal"><span class="pre">runOnBasicBlock</span></code> method</a><a class="headerlink" href="#the-runonbasicblock-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnBasicBlock</span><span class="p">(</span><span class="n">BasicBlock</span> <span class="o">&</span><span class="n">BB</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>Override this function to do the work of the <code class="docutils literal"><span class="pre">BasicBlockPass</span></code>.  This function
+is not allowed to inspect or modify basic blocks other than the parameter, and
+are not allowed to modify the CFG.  A <code class="docutils literal"><span class="pre">true</span></code> value must be returned if the
+basic block is modified.</p>
+</div>
+<div class="section" id="the-dofinalization-function-method">
+<h4><a class="toc-backref" href="#id33">The <code class="docutils literal"><span class="pre">doFinalization(Function</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-dofinalization-function-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doFinalization</span><span class="p">(</span><span class="n">Function</span> <span class="o">&</span><span class="n">F</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doFinalization</span></code> method is an infrequently used method that is called
+when the pass framework has finished calling <a class="reference internal" href="#writing-an-llvm-pass-runonbasicblock"><span class="std std-ref">runOnBasicBlock</span></a> for every <code class="docutils literal"><span class="pre">BasicBlock</span></code> in the program
+being compiled.  This can be used to perform per-function finalization.</p>
+</div>
+</div>
+<div class="section" id="the-machinefunctionpass-class">
+<h3><a class="toc-backref" href="#id34">The <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> class</a><a class="headerlink" href="#the-machinefunctionpass-class" title="Permalink to this headline">Â¶</a></h3>
+<p>A <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> is a part of the LLVM code generator that executes on
+the machine-dependent representation of each LLVM function in the program.</p>
+<p>Code generator passes are registered and initialized specially by
+<code class="docutils literal"><span class="pre">TargetMachine::addPassesToEmitFile</span></code> and similar routines, so they cannot
+generally be run from the <strong class="program">opt</strong> or <strong class="program">bugpoint</strong> commands.</p>
+<p>A <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> is also a <code class="docutils literal"><span class="pre">FunctionPass</span></code>, so all the restrictions
+that apply to a <code class="docutils literal"><span class="pre">FunctionPass</span></code> also apply to it.  <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>es
+also have additional restrictions.  In particular, <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>es
+are not allowed to do any of the following:</p>
+<ol class="arabic simple">
+<li>Modify or create any LLVM IR <code class="docutils literal"><span class="pre">Instruction</span></code>s, <code class="docutils literal"><span class="pre">BasicBlock</span></code>s,
+<code class="docutils literal"><span class="pre">Argument</span></code>s, <code class="docutils literal"><span class="pre">Function</span></code>s, <code class="docutils literal"><span class="pre">GlobalVariable</span></code>s,
+<code class="docutils literal"><span class="pre">GlobalAlias</span></code>es, or <code class="docutils literal"><span class="pre">Module</span></code>s.</li>
+<li>Modify a <code class="docutils literal"><span class="pre">MachineFunction</span></code> other than the one currently being processed.</li>
+<li>Maintain state across invocations of <a class="reference internal" href="#writing-an-llvm-pass-runonmachinefunction"><span class="std std-ref">runOnMachineFunction</span></a> (including global data).</li>
+</ol>
+<div class="section" id="the-runonmachinefunction-machinefunction-mf-method">
+<span id="writing-an-llvm-pass-runonmachinefunction"></span><h4><a class="toc-backref" href="#id35">The <code class="docutils literal"><span class="pre">runOnMachineFunction(MachineFunction</span> <span class="pre">&MF)</span></code> method</a><a class="headerlink" href="#the-runonmachinefunction-machinefunction-mf-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnMachineFunction</span><span class="p">(</span><span class="n">MachineFunction</span> <span class="o">&</span><span class="n">MF</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">runOnMachineFunction</span></code> can be considered the main entry point of a
+<code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>; that is, you should override this method to do the
+work of your <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>.</p>
+<p>The <code class="docutils literal"><span class="pre">runOnMachineFunction</span></code> method is called on every <code class="docutils literal"><span class="pre">MachineFunction</span></code> in a
+<code class="docutils literal"><span class="pre">Module</span></code>, so that the <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> may perform optimizations on
+the machine-dependent representation of the function.  If you want to get at
+the LLVM <code class="docutils literal"><span class="pre">Function</span></code> for the <code class="docutils literal"><span class="pre">MachineFunction</span></code> youâre working on, use
+<code class="docutils literal"><span class="pre">MachineFunction</span></code>âs <code class="docutils literal"><span class="pre">getFunction()</span></code> accessor method â but remember, you
+may not modify the LLVM <code class="docutils literal"><span class="pre">Function</span></code> or its contents from a
+<code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>.</p>
+</div>
+</div>
+<div class="section" id="pass-registration">
+<span id="writing-an-llvm-pass-registration"></span><h3><a class="toc-backref" href="#id36">Pass registration</a><a class="headerlink" href="#pass-registration" title="Permalink to this headline">Â¶</a></h3>
+<p>In the <a class="reference internal" href="#writing-an-llvm-pass-basiccode"><span class="std std-ref">Hello World</span></a> example pass we
+illustrated how pass registration works, and discussed some of the reasons that
+it is used and what it does.  Here we discuss how and why passes are
+registered.</p>
+<p>As we saw above, passes are registered with the <code class="docutils literal"><span class="pre">RegisterPass</span></code> template.  The
+template parameter is the name of the pass that is to be used on the command
+line to specify that the pass should be added to a program (for example, with
+<strong class="program">opt</strong> or <strong class="program">bugpoint</strong>).  The first argument is the name of the
+pass, which is to be used for the <a class="reference internal" href="CommandGuide/opt.html#cmdoption-help"><code class="xref std std-option docutils literal"><span class="pre">-help</span></code></a> output of programs, as well
+as for debug output generated by the <cite>âdebug-pass</cite> option.</p>
+<p>If you want your pass to be easily dumpable, you should implement the virtual
+print method:</p>
+<div class="section" id="the-print-method">
+<h4><a class="toc-backref" href="#id37">The <code class="docutils literal"><span class="pre">print</span></code> method</a><a class="headerlink" href="#the-print-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">void</span> <span class="nf">print</span><span class="p">(</span><span class="n">llvm</span><span class="o">::</span><span class="n">raw_ostream</span> <span class="o">&</span><span class="n">O</span><span class="p">,</span> <span class="k">const</span> <span class="n">Module</span> <span class="o">*</span><span class="n">M</span><span class="p">)</span> <span class="k">const</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">print</span></code> method must be implemented by âanalysesâ in order to print a
+human readable version of the analysis results.  This is useful for debugging
+an analysis itself, as well as for other people to figure out how an analysis
+works.  Use the opt <code class="docutils literal"><span class="pre">-analyze</span></code> argument to invoke this method.</p>
+<p>The <code class="docutils literal"><span class="pre">llvm::raw_ostream</span></code> parameter specifies the stream to write the results
+on, and the <code class="docutils literal"><span class="pre">Module</span></code> parameter gives a pointer to the top level module of the
+program that has been analyzed.  Note however that this pointer may be <code class="docutils literal"><span class="pre">NULL</span></code>
+in certain circumstances (such as calling the <code class="docutils literal"><span class="pre">Pass::dump()</span></code> from a
+debugger), so it should only be used to enhance debug output, it should not be
+depended on.</p>
+</div>
+</div>
+<div class="section" id="specifying-interactions-between-passes">
+<span id="writing-an-llvm-pass-interaction"></span><h3><a class="toc-backref" href="#id38">Specifying interactions between passes</a><a class="headerlink" href="#specifying-interactions-between-passes" title="Permalink to this headline">Â¶</a></h3>
+<p>One of the main responsibilities of the <code class="docutils literal"><span class="pre">PassManager</span></code> is to make sure that
+passes interact with each other correctly.  Because <code class="docutils literal"><span class="pre">PassManager</span></code> tries to
+<a class="reference internal" href="#writing-an-llvm-pass-passmanager"><span class="std std-ref">optimize the execution of passes</span></a> it
+must know how the passes interact with each other and what dependencies exist
+between the various passes.  To track this, each pass can declare the set of
+passes that are required to be executed before the current pass, and the passes
+which are invalidated by the current pass.</p>
+<p>Typically this functionality is used to require that analysis results are
+computed before your pass is run.  Running arbitrary transformation passes can
+invalidate the computed analysis results, which is what the invalidation set
+specifies.  If a pass does not implement the <a class="reference internal" href="#writing-an-llvm-pass-getanalysisusage"><span class="std std-ref">getAnalysisUsage</span></a> method, it defaults to not having any
+prerequisite passes, and invalidating <strong>all</strong> other passes.</p>
+<div class="section" id="the-getanalysisusage-method">
+<span id="writing-an-llvm-pass-getanalysisusage"></span><h4><a class="toc-backref" href="#id39">The <code class="docutils literal"><span class="pre">getAnalysisUsage</span></code> method</a><a class="headerlink" href="#the-getanalysisusage-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">void</span> <span class="nf">getAnalysisUsage</span><span class="p">(</span><span class="n">AnalysisUsage</span> <span class="o">&</span><span class="n">Info</span><span class="p">)</span> <span class="k">const</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>By implementing the <code class="docutils literal"><span class="pre">getAnalysisUsage</span></code> method, the required and invalidated
+sets may be specified for your transformation.  The implementation should fill
+in the <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1AnalysisUsage.html">AnalysisUsage</a> object with
+information about which passes are required and not invalidated.  To do this, a
+pass may call any of the following methods on the <code class="docutils literal"><span class="pre">AnalysisUsage</span></code> object:</p>
+</div>
+<div class="section" id="the-analysisusage-addrequired-and-analysisusage-addrequiredtransitive-methods">
+<h4><a class="toc-backref" href="#id40">The <code class="docutils literal"><span class="pre">AnalysisUsage::addRequired<></span></code> and <code class="docutils literal"><span class="pre">AnalysisUsage::addRequiredTransitive<></span></code> methods</a><a class="headerlink" href="#the-analysisusage-addrequired-and-analysisusage-addrequiredtransitive-methods" title="Permalink to this headline">Â¶</a></h4>
+<p>If your pass requires a previous pass to be executed (an analysis for example),
+it can use one of these methods to arrange for it to be run before your pass.
+LLVM has many different types of analyses and passes that can be required,
+spanning the range from <code class="docutils literal"><span class="pre">DominatorSet</span></code> to <code class="docutils literal"><span class="pre">BreakCriticalEdges</span></code>.  Requiring
+<code class="docutils literal"><span class="pre">BreakCriticalEdges</span></code>, for example, guarantees that there will be no critical
+edges in the CFG when your pass has been run.</p>
+<p>Some analyses chain to other analyses to do their job.  For example, an
+<cite>AliasAnalysis <AliasAnalysis></cite> implementation is required to <a class="reference internal" href="AliasAnalysis.html#aliasanalysis-chaining"><span class="std std-ref">chain</span></a> to other alias analysis passes.  In cases where
+analyses chain, the <code class="docutils literal"><span class="pre">addRequiredTransitive</span></code> method should be used instead of
+the <code class="docutils literal"><span class="pre">addRequired</span></code> method.  This informs the <code class="docutils literal"><span class="pre">PassManager</span></code> that the
+transitively required pass should be alive as long as the requiring pass is.</p>
+</div>
+<div class="section" id="the-analysisusage-addpreserved-method">
+<h4><a class="toc-backref" href="#id41">The <code class="docutils literal"><span class="pre">AnalysisUsage::addPreserved<></span></code> method</a><a class="headerlink" href="#the-analysisusage-addpreserved-method" title="Permalink to this headline">Â¶</a></h4>
+<p>One of the jobs of the <code class="docutils literal"><span class="pre">PassManager</span></code> is to optimize how and when analyses are
+run.  In particular, it attempts to avoid recomputing data unless it needs to.
+For this reason, passes are allowed to declare that they preserve (i.e., they
+donât invalidate) an existing analysis if itâs available.  For example, a
+simple constant folding pass would not modify the CFG, so it canât possibly
+affect the results of dominator analysis.  By default, all passes are assumed
+to invalidate all others.</p>
+<p>The <code class="docutils literal"><span class="pre">AnalysisUsage</span></code> class provides several methods which are useful in
+certain circumstances that are related to <code class="docutils literal"><span class="pre">addPreserved</span></code>.  In particular, the
+<code class="docutils literal"><span class="pre">setPreservesAll</span></code> method can be called to indicate that the pass does not
+modify the LLVM program at all (which is true for analyses), and the
+<code class="docutils literal"><span class="pre">setPreservesCFG</span></code> method can be used by transformations that change
+instructions in the program but do not modify the CFG or terminator
+instructions (note that this property is implicitly set for
+<a class="reference internal" href="#writing-an-llvm-pass-basicblockpass"><span class="std std-ref">BasicBlockPass</span></a>es).</p>
+<p><code class="docutils literal"><span class="pre">addPreserved</span></code> is particularly useful for transformations like
+<code class="docutils literal"><span class="pre">BreakCriticalEdges</span></code>.  This pass knows how to update a small set of loop and
+dominator related analyses if they exist, so it can preserve them, despite the
+fact that it hacks on the CFG.</p>
+</div>
+<div class="section" id="example-implementations-of-getanalysisusage">
+<h4><a class="toc-backref" href="#id42">Example implementations of <code class="docutils literal"><span class="pre">getAnalysisUsage</span></code></a><a class="headerlink" href="#example-implementations-of-getanalysisusage" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// This example modifies the program, but does not modify the CFG</span>
+<span class="kt">void</span> <span class="n">LICM</span><span class="o">::</span><span class="n">getAnalysisUsage</span><span class="p">(</span><span class="n">AnalysisUsage</span> <span class="o">&</span><span class="n">AU</span><span class="p">)</span> <span class="k">const</span> <span class="p">{</span>
+  <span class="n">AU</span><span class="p">.</span><span class="n">setPreservesCFG</span><span class="p">();</span>
+  <span class="n">AU</span><span class="p">.</span><span class="n">addRequired</span><span class="o"><</span><span class="n">LoopInfoWrapperPass</span><span class="o">></span><span class="p">();</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="the-getanalysis-and-getanalysisifavailable-methods">
+<span id="writing-an-llvm-pass-getanalysis"></span><h4><a class="toc-backref" href="#id43">The <code class="docutils literal"><span class="pre">getAnalysis<></span></code> and <code class="docutils literal"><span class="pre">getAnalysisIfAvailable<></span></code> methods</a><a class="headerlink" href="#the-getanalysis-and-getanalysisifavailable-methods" title="Permalink to this headline">Â¶</a></h4>
+<p>The <code class="docutils literal"><span class="pre">Pass::getAnalysis<></span></code> method is automatically inherited by your class,
+providing you with access to the passes that you declared that you required
+with the <a class="reference internal" href="#writing-an-llvm-pass-getanalysisusage"><span class="std std-ref">getAnalysisUsage</span></a>
+method.  It takes a single template argument that specifies which pass class
+you want, and returns a reference to that pass.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">bool</span> <span class="n">LICM</span><span class="o">::</span><span class="n">runOnFunction</span><span class="p">(</span><span class="n">Function</span> <span class="o">&</span><span class="n">F</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">LoopInfo</span> <span class="o">&</span><span class="n">LI</span> <span class="o">=</span> <span class="n">getAnalysis</span><span class="o"><</span><span class="n">LoopInfoWrapperPass</span><span class="o">></span><span class="p">().</span><span class="n">getLoopInfo</span><span class="p">();</span>
+  <span class="c1">//...</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>This method call returns a reference to the pass desired.  You may get a
+runtime assertion failure if you attempt to get an analysis that you did not
+declare as required in your <a class="reference internal" href="#writing-an-llvm-pass-getanalysisusage"><span class="std std-ref">getAnalysisUsage</span></a> implementation.  This method can be
+called by your <code class="docutils literal"><span class="pre">run*</span></code> method implementation, or by any other local method
+invoked by your <code class="docutils literal"><span class="pre">run*</span></code> method.</p>
+<p>A module level pass can use function level analysis info using this interface.
+For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">bool</span> <span class="n">ModuleLevelPass</span><span class="o">::</span><span class="n">runOnModule</span><span class="p">(</span><span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">)</span> <span class="p">{</span>
+  <span class="c1">//...</span>
+  <span class="n">DominatorTree</span> <span class="o">&</span><span class="n">DT</span> <span class="o">=</span> <span class="n">getAnalysis</span><span class="o"><</span><span class="n">DominatorTree</span><span class="o">></span><span class="p">(</span><span class="n">Func</span><span class="p">);</span>
+  <span class="c1">//...</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>In above example, <code class="docutils literal"><span class="pre">runOnFunction</span></code> for <code class="docutils literal"><span class="pre">DominatorTree</span></code> is called by pass
+manager before returning a reference to the desired pass.</p>
+<p>If your pass is capable of updating analyses if they exist (e.g.,
+<code class="docutils literal"><span class="pre">BreakCriticalEdges</span></code>, as described above), you can use the
+<code class="docutils literal"><span class="pre">getAnalysisIfAvailable</span></code> method, which returns a pointer to the analysis if
+it is active.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">if</span> <span class="p">(</span><span class="n">DominatorSet</span> <span class="o">*</span><span class="n">DS</span> <span class="o">=</span> <span class="n">getAnalysisIfAvailable</span><span class="o"><</span><span class="n">DominatorSet</span><span class="o">></span><span class="p">())</span> <span class="p">{</span>
+  <span class="c1">// A DominatorSet is active.  This code will update it.</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="implementing-analysis-groups">
+<h3><a class="toc-backref" href="#id44">Implementing Analysis Groups</a><a class="headerlink" href="#implementing-analysis-groups" title="Permalink to this headline">Â¶</a></h3>
+<p>Now that we understand the basics of how passes are defined, how they are used,
+and how they are required from other passes, itâs time to get a little bit
+fancier.  All of the pass relationships that we have seen so far are very
+simple: one pass depends on one other specific pass to be run before it can
+run.  For many applications, this is great, for others, more flexibility is
+required.</p>
+<p>In particular, some analyses are defined such that there is a single simple
+interface to the analysis results, but multiple ways of calculating them.
+Consider alias analysis for example.  The most trivial alias analysis returns
+âmay aliasâ for any alias query.  The most sophisticated analysis a
+flow-sensitive, context-sensitive interprocedural analysis that can take a
+significant amount of time to execute (and obviously, there is a lot of room
+between these two extremes for other implementations).  To cleanly support
+situations like this, the LLVM Pass Infrastructure supports the notion of
+Analysis Groups.</p>
+<div class="section" id="analysis-group-concepts">
+<h4><a class="toc-backref" href="#id45">Analysis Group Concepts</a><a class="headerlink" href="#analysis-group-concepts" title="Permalink to this headline">Â¶</a></h4>
+<p>An Analysis Group is a single simple interface that may be implemented by
+multiple different passes.  Analysis Groups can be given human readable names
+just like passes, but unlike passes, they need not derive from the <code class="docutils literal"><span class="pre">Pass</span></code>
+class.  An analysis group may have one or more implementations, one of which is
+the âdefaultâ implementation.</p>
+<p>Analysis groups are used by client passes just like other passes are: the
+<code class="docutils literal"><span class="pre">AnalysisUsage::addRequired()</span></code> and <code class="docutils literal"><span class="pre">Pass::getAnalysis()</span></code> methods.  In order
+to resolve this requirement, the <a class="reference internal" href="#writing-an-llvm-pass-passmanager"><span class="std std-ref">PassManager</span></a> scans the available passes to see if any
+implementations of the analysis group are available.  If none is available, the
+default implementation is created for the pass to use.  All standard rules for
+<a class="reference internal" href="#writing-an-llvm-pass-interaction"><span class="std std-ref">interaction between passes</span></a> still
+apply.</p>
+<p>Although <a class="reference internal" href="#writing-an-llvm-pass-registration"><span class="std std-ref">Pass Registration</span></a> is
+optional for normal passes, all analysis group implementations must be
+registered, and must use the <a class="reference internal" href="#writing-an-llvm-pass-registeranalysisgroup"><span class="std std-ref">INITIALIZE_AG_PASS</span></a> template to join the
+implementation pool.  Also, a default implementation of the interface <strong>must</strong>
+be registered with <a class="reference internal" href="#writing-an-llvm-pass-registeranalysisgroup"><span class="std std-ref">RegisterAnalysisGroup</span></a>.</p>
+<p>As a concrete example of an Analysis Group in action, consider the
+<a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1AliasAnalysis.html">AliasAnalysis</a>
+analysis group.  The default implementation of the alias analysis interface
+(the <a class="reference external" href="http://llvm.org/doxygen/structBasicAliasAnalysis.html">basicaa</a> pass)
+just does a few simple checks that donât require significant analysis to
+compute (such as: two different globals can never alias each other, etc).
+Passes that use the <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1AliasAnalysis.html">AliasAnalysis</a> interface (for
+example the <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1GVN.html">gvn</a> pass), do not
+care which implementation of alias analysis is actually provided, they just use
+the designated interface.</p>
+<p>From the userâs perspective, commands work just like normal.  Issuing the
+command <code class="docutils literal"><span class="pre">opt</span> <span class="pre">-gvn</span> <span class="pre">...</span></code> will cause the <code class="docutils literal"><span class="pre">basicaa</span></code> class to be instantiated
+and added to the pass sequence.  Issuing the command <code class="docutils literal"><span class="pre">opt</span> <span class="pre">-somefancyaa</span> <span class="pre">-gvn</span>
+<span class="pre">...</span></code> will cause the <code class="docutils literal"><span class="pre">gvn</span></code> pass to use the <code class="docutils literal"><span class="pre">somefancyaa</span></code> alias analysis
+(which doesnât actually exist, itâs just a hypothetical example) instead.</p>
+</div>
+<div class="section" id="using-registeranalysisgroup">
+<span id="writing-an-llvm-pass-registeranalysisgroup"></span><h4><a class="toc-backref" href="#id46">Using <code class="docutils literal"><span class="pre">RegisterAnalysisGroup</span></code></a><a class="headerlink" href="#using-registeranalysisgroup" title="Permalink to this headline">Â¶</a></h4>
+<p>The <code class="docutils literal"><span class="pre">RegisterAnalysisGroup</span></code> template is used to register the analysis group
+itself, while the <code class="docutils literal"><span class="pre">INITIALIZE_AG_PASS</span></code> is used to add pass implementations to
+the analysis group.  First, an analysis group should be registered, with a
+human readable name provided for it.  Unlike registration of passes, there is
+no command line argument to be specified for the Analysis Group Interface
+itself, because it is âabstractâ:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">static</span> <span class="n">RegisterAnalysisGroup</span><span class="o"><</span><span class="n">AliasAnalysis</span><span class="o">></span> <span class="n">A</span><span class="p">(</span><span class="s">"Alias Analysis"</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>Once the analysis is registered, passes can declare that they are valid
+implementations of the interface by using the following code:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">namespace</span> <span class="p">{</span>
+  <span class="c1">// Declare that we implement the AliasAnalysis interface</span>
+  <span class="n">INITIALIZE_AG_PASS</span><span class="p">(</span><span class="n">FancyAA</span><span class="p">,</span> <span class="n">AliasAnalysis</span> <span class="p">,</span> <span class="s">"somefancyaa"</span><span class="p">,</span>
+      <span class="s">"A more complex alias analysis implementation"</span><span class="p">,</span>
+      <span class="nb">false</span><span class="p">,</span>  <span class="c1">// Is CFG Only?</span>
+      <span class="nb">true</span><span class="p">,</span>   <span class="c1">// Is Analysis?</span>
+      <span class="nb">false</span><span class="p">);</span> <span class="c1">// Is default Analysis Group implementation?</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>This just shows a class <code class="docutils literal"><span class="pre">FancyAA</span></code> that uses the <code class="docutils literal"><span class="pre">INITIALIZE_AG_PASS</span></code> macro
+both to register and to âjoinâ the <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1AliasAnalysis.html">AliasAnalysis</a> analysis group.
+Every implementation of an analysis group should join using this macro.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">namespace</span> <span class="p">{</span>
+  <span class="c1">// Declare that we implement the AliasAnalysis interface</span>
+  <span class="n">INITIALIZE_AG_PASS</span><span class="p">(</span><span class="n">BasicAA</span><span class="p">,</span> <span class="n">AliasAnalysis</span><span class="p">,</span> <span class="s">"basicaa"</span><span class="p">,</span>
+      <span class="s">"Basic Alias Analysis (default AA impl)"</span><span class="p">,</span>
+      <span class="nb">false</span><span class="p">,</span> <span class="c1">// Is CFG Only?</span>
+      <span class="nb">true</span><span class="p">,</span>  <span class="c1">// Is Analysis?</span>
+      <span class="nb">true</span><span class="p">);</span> <span class="c1">// Is default Analysis Group implementation?</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>Here we show how the default implementation is specified (using the final
+argument to the <code class="docutils literal"><span class="pre">INITIALIZE_AG_PASS</span></code> template).  There must be exactly one
+default implementation available at all times for an Analysis Group to be used.
+Only default implementation can derive from <code class="docutils literal"><span class="pre">ImmutablePass</span></code>.  Here we declare
+that the <a class="reference external" href="http://llvm.org/doxygen/structBasicAliasAnalysis.html">BasicAliasAnalysis</a> pass is the default
+implementation for the interface.</p>
+</div>
+</div>
+</div>
+<div class="section" id="pass-statistics">
+<h2><a class="toc-backref" href="#id47">Pass Statistics</a><a class="headerlink" href="#pass-statistics" title="Permalink to this headline">Â¶</a></h2>
+<p>The <a class="reference external" href="http://llvm.org/doxygen/Statistic_8h-source.html">Statistic</a> class is
+designed to be an easy way to expose various success metrics from passes.
+These statistics are printed at the end of a run, when the <a class="reference internal" href="CommandGuide/opt.html#cmdoption-stats"><code class="xref std std-option docutils literal"><span class="pre">-stats</span></code></a>
+command line option is enabled on the command line.  See the <a class="reference internal" href="ProgrammersManual.html#statistic"><span class="std std-ref">Statistics
+section</span></a> in the Programmerâs Manual for details.</p>
+<div class="section" id="what-passmanager-does">
+<span id="writing-an-llvm-pass-passmanager"></span><h3><a class="toc-backref" href="#id48">What PassManager does</a><a class="headerlink" href="#what-passmanager-does" title="Permalink to this headline">Â¶</a></h3>
+<p>The <a class="reference external" href="http://llvm.org/doxygen/PassManager_8h-source.html">PassManager</a> <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1PassManager.html">class</a> takes a list of
+passes, ensures their <a class="reference internal" href="#writing-an-llvm-pass-interaction"><span class="std std-ref">prerequisites</span></a>
+are set up correctly, and then schedules passes to run efficiently.  All of the
+LLVM tools that run passes use the PassManager for execution of these passes.</p>
+<p>The PassManager does two main things to try to reduce the execution time of a
+series of passes:</p>
+<ol class="arabic">
+<li><p class="first"><strong>Share analysis results.</strong>  The <code class="docutils literal"><span class="pre">PassManager</span></code> attempts to avoid
+recomputing analysis results as much as possible.  This means keeping track
+of which analyses are available already, which analyses get invalidated, and
+which analyses are needed to be run for a pass.  An important part of work
+is that the <code class="docutils literal"><span class="pre">PassManager</span></code> tracks the exact lifetime of all analysis
+results, allowing it to <a class="reference internal" href="#writing-an-llvm-pass-releasememory"><span class="std std-ref">free memory</span></a> allocated to holding analysis results
+as soon as they are no longer needed.</p>
+</li>
+<li><p class="first"><strong>Pipeline the execution of passes on the program.</strong>  The <code class="docutils literal"><span class="pre">PassManager</span></code>
+attempts to get better cache and memory usage behavior out of a series of
+passes by pipelining the passes together.  This means that, given a series
+of consecutive <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a>, it
+will execute all of the <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a> on the first function, then all of the
+<a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPasses</span></a> on the second
+function, etcâ¦ until the entire program has been run through the passes.</p>
+<p>This improves the cache behavior of the compiler, because it is only
+touching the LLVM program representation for a single function at a time,
+instead of traversing the entire program.  It reduces the memory consumption
+of compiler, because, for example, only one <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1DominatorSet.html">DominatorSet</a> needs to be
+calculated at a time.  This also makes it possible to implement some
+<a class="reference internal" href="#writing-an-llvm-pass-smp"><span class="std std-ref">interesting enhancements</span></a> in the future.</p>
+</li>
+</ol>
+<p>The effectiveness of the <code class="docutils literal"><span class="pre">PassManager</span></code> is influenced directly by how much
+information it has about the behaviors of the passes it is scheduling.  For
+example, the âpreservedâ set is intentionally conservative in the face of an
+unimplemented <a class="reference internal" href="#writing-an-llvm-pass-getanalysisusage"><span class="std std-ref">getAnalysisUsage</span></a>
+method.  Not implementing when it should be implemented will have the effect of
+not allowing any analysis results to live across the execution of your pass.</p>
+<p>The <code class="docutils literal"><span class="pre">PassManager</span></code> class exposes a <code class="docutils literal"><span class="pre">--debug-pass</span></code> command line options that
+is useful for debugging pass execution, seeing how things work, and diagnosing
+when you should be preserving more analyses than you currently are.  (To get
+information about all of the variants of the <code class="docutils literal"><span class="pre">--debug-pass</span></code> option, just type
+â<code class="docutils literal"><span class="pre">opt</span> <span class="pre">-help-hidden</span></code>â).</p>
+<p>By using the âdebug-pass=Structure option, for example, we can see how our
+<a class="reference internal" href="#writing-an-llvm-pass-basiccode"><span class="std std-ref">Hello World</span></a> pass interacts with other
+passes.  Lets try it out with the gvn and licm passes:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> opt -load lib/LLVMHello.so -gvn -licm --debug-pass<span class="o">=</span>Structure < hello.bc > /dev/null
+<span class="go">ModulePass Manager</span>
+<span class="go">  FunctionPass Manager</span>
+<span class="go">    Dominator Tree Construction</span>
+<span class="go">    Basic Alias Analysis (stateless AA impl)</span>
+<span class="go">    Function Alias Analysis Results</span>
+<span class="go">    Memory Dependence Analysis</span>
+<span class="go">    Global Value Numbering</span>
+<span class="go">    Natural Loop Information</span>
+<span class="go">    Canonicalize natural loops</span>
+<span class="go">    Loop-Closed SSA Form Pass</span>
+<span class="go">    Basic Alias Analysis (stateless AA impl)</span>
+<span class="go">    Function Alias Analysis Results</span>
+<span class="go">    Scalar Evolution Analysis</span>
+<span class="go">    Loop Pass Manager</span>
+<span class="go">      Loop Invariant Code Motion</span>
+<span class="go">    Module Verifier</span>
+<span class="go">  Bitcode Writer</span>
+</pre></div>
+</div>
+<p>This output shows us when passes are constructed.
+Here we see that GVN uses dominator tree information to do its job.  The LICM pass
+uses natural loop information, which uses dominator tree as well.</p>
+<p>After the LICM pass, the module verifier runs (which is automatically added by
+the <strong class="program">opt</strong> tool), which uses the dominator tree to check that the
+resultant LLVM code is well formed. Note that the dominator tree is computed
+once, and shared by three passes.</p>
+<p>Lets see how this changes when we run the <a class="reference internal" href="#writing-an-llvm-pass-basiccode"><span class="std std-ref">Hello World</span></a> pass in between the two passes:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> opt -load lib/LLVMHello.so -gvn -hello -licm --debug-pass<span class="o">=</span>Structure < hello.bc > /dev/null
+<span class="go">ModulePass Manager</span>
+<span class="go">  FunctionPass Manager</span>
+<span class="go">    Dominator Tree Construction</span>
+<span class="go">    Basic Alias Analysis (stateless AA impl)</span>
+<span class="go">    Function Alias Analysis Results</span>
+<span class="go">    Memory Dependence Analysis</span>
+<span class="go">    Global Value Numbering</span>
+<span class="go">    Hello World Pass</span>
+<span class="go">    Dominator Tree Construction</span>
+<span class="go">    Natural Loop Information</span>
+<span class="go">    Canonicalize natural loops</span>
+<span class="go">    Loop-Closed SSA Form Pass</span>
+<span class="go">    Basic Alias Analysis (stateless AA impl)</span>
+<span class="go">    Function Alias Analysis Results</span>
+<span class="go">    Scalar Evolution Analysis</span>
+<span class="go">    Loop Pass Manager</span>
+<span class="go">      Loop Invariant Code Motion</span>
+<span class="go">    Module Verifier</span>
+<span class="go">  Bitcode Writer</span>
+<span class="go">Hello: __main</span>
+<span class="go">Hello: puts</span>
+<span class="go">Hello: main</span>
+</pre></div>
+</div>
+<p>Here we see that the <a class="reference internal" href="#writing-an-llvm-pass-basiccode"><span class="std std-ref">Hello World</span></a> pass
+has killed the Dominator Tree pass, even though it doesnât modify the code at
+all!  To fix this, we need to add the following <a class="reference internal" href="#writing-an-llvm-pass-getanalysisusage"><span class="std std-ref">getAnalysisUsage</span></a> method to our pass:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// We don't modify the program, so we preserve all analyses</span>
+<span class="kt">void</span> <span class="nf">getAnalysisUsage</span><span class="p">(</span><span class="n">AnalysisUsage</span> <span class="o">&</span><span class="n">AU</span><span class="p">)</span> <span class="k">const</span> <span class="k">override</span> <span class="p">{</span>
+  <span class="n">AU</span><span class="p">.</span><span class="n">setPreservesAll</span><span class="p">();</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>Now when we run our pass, we get this output:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> opt -load lib/LLVMHello.so -gvn -hello -licm --debug-pass<span class="o">=</span>Structure < hello.bc > /dev/null
+<span class="go">Pass Arguments:  -gvn -hello -licm</span>
+<span class="go">ModulePass Manager</span>
+<span class="go">  FunctionPass Manager</span>
+<span class="go">    Dominator Tree Construction</span>
+<span class="go">    Basic Alias Analysis (stateless AA impl)</span>
+<span class="go">    Function Alias Analysis Results</span>
+<span class="go">    Memory Dependence Analysis</span>
+<span class="go">    Global Value Numbering</span>
+<span class="go">    Hello World Pass</span>
+<span class="go">    Natural Loop Information</span>
+<span class="go">    Canonicalize natural loops</span>
+<span class="go">    Loop-Closed SSA Form Pass</span>
+<span class="go">    Basic Alias Analysis (stateless AA impl)</span>
+<span class="go">    Function Alias Analysis Results</span>
+<span class="go">    Scalar Evolution Analysis</span>
+<span class="go">    Loop Pass Manager</span>
+<span class="go">      Loop Invariant Code Motion</span>
+<span class="go">    Module Verifier</span>
+<span class="go">  Bitcode Writer</span>
+<span class="go">Hello: __main</span>
+<span class="go">Hello: puts</span>
+<span class="go">Hello: main</span>
+</pre></div>
+</div>
+<p>Which shows that we donât accidentally invalidate dominator information
+anymore, and therefore do not have to compute it twice.</p>
+<div class="section" id="the-releasememory-method">
+<span id="writing-an-llvm-pass-releasememory"></span><h4><a class="toc-backref" href="#id49">The <code class="docutils literal"><span class="pre">releaseMemory</span></code> method</a><a class="headerlink" href="#the-releasememory-method" title="Permalink to this headline">Â¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">void</span> <span class="nf">releaseMemory</span><span class="p">();</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">PassManager</span></code> automatically determines when to compute analysis results,
+and how long to keep them around for.  Because the lifetime of the pass object
+itself is effectively the entire duration of the compilation process, we need
+some way to free analysis results when they are no longer useful.  The
+<code class="docutils literal"><span class="pre">releaseMemory</span></code> virtual method is the way to do this.</p>
+<p>If you are writing an analysis or any other pass that retains a significant
+amount of state (for use by another pass which ârequiresâ your pass and uses
+the <a class="reference internal" href="#writing-an-llvm-pass-getanalysis"><span class="std std-ref">getAnalysis</span></a> method) you should
+implement <code class="docutils literal"><span class="pre">releaseMemory</span></code> to, well, release the memory allocated to maintain
+this internal state.  This method is called after the <code class="docutils literal"><span class="pre">run*</span></code> method for the
+class, before the next call of <code class="docutils literal"><span class="pre">run*</span></code> in your pass.</p>
+</div>
+</div>
+</div>
+<div class="section" id="registering-dynamically-loaded-passes">
+<h2><a class="toc-backref" href="#id50">Registering dynamically loaded passes</a><a class="headerlink" href="#registering-dynamically-loaded-passes" title="Permalink to this headline">Â¶</a></h2>
+<p><em>Size matters</em> when constructing production quality tools using LLVM, both for
+the purposes of distribution, and for regulating the resident code size when
+running on the target system.  Therefore, it becomes desirable to selectively
+use some passes, while omitting others and maintain the flexibility to change
+configurations later on.  You want to be able to do all this, and, provide
+feedback to the user.  This is where pass registration comes into play.</p>
+<p>The fundamental mechanisms for pass registration are the
+<code class="docutils literal"><span class="pre">MachinePassRegistry</span></code> class and subclasses of <code class="docutils literal"><span class="pre">MachinePassRegistryNode</span></code>.</p>
+<p>An instance of <code class="docutils literal"><span class="pre">MachinePassRegistry</span></code> is used to maintain a list of
+<code class="docutils literal"><span class="pre">MachinePassRegistryNode</span></code> objects.  This instance maintains the list and
+communicates additions and deletions to the command line interface.</p>
+<p>An instance of <code class="docutils literal"><span class="pre">MachinePassRegistryNode</span></code> subclass is used to maintain
+information provided about a particular pass.  This information includes the
+command line name, the command help string and the address of the function used
+to create an instance of the pass.  A global static constructor of one of these
+instances <em>registers</em> with a corresponding <code class="docutils literal"><span class="pre">MachinePassRegistry</span></code>, the static
+destructor <em>unregisters</em>.  Thus a pass that is statically linked in the tool
+will be registered at start up.  A dynamically loaded pass will register on
+load and unregister at unload.</p>
+<div class="section" id="using-existing-registries">
+<h3><a class="toc-backref" href="#id51">Using existing registries</a><a class="headerlink" href="#using-existing-registries" title="Permalink to this headline">Â¶</a></h3>
+<p>There are predefined registries to track instruction scheduling
+(<code class="docutils literal"><span class="pre">RegisterScheduler</span></code>) and register allocation (<code class="docutils literal"><span class="pre">RegisterRegAlloc</span></code>) machine
+passes.  Here we will describe how to <em>register</em> a register allocator machine
+pass.</p>
+<p>Implement your register allocator machine pass.  In your register allocator
+<code class="docutils literal"><span class="pre">.cpp</span></code> file add the following include:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf">"llvm/CodeGen/RegAllocRegistry.h"</span><span class="cp"></span>
+</pre></div>
+</div>
+<p>Also in your register allocator <code class="docutils literal"><span class="pre">.cpp</span></code> file, define a creator function in the
+form:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">FunctionPass</span> <span class="o">*</span><span class="nf">createMyRegisterAllocator</span><span class="p">()</span> <span class="p">{</span>
+  <span class="k">return</span> <span class="k">new</span> <span class="n">MyRegisterAllocator</span><span class="p">();</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>Note that the signature of this function should match the type of
+<code class="docutils literal"><span class="pre">RegisterRegAlloc::FunctionPassCtor</span></code>.  In the same file add the âinstallingâ
+declaration, in the form:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">static</span> <span class="n">RegisterRegAlloc</span> <span class="nf">myRegAlloc</span><span class="p">(</span><span class="s">"myregalloc"</span><span class="p">,</span>
+                                   <span class="s">"my register allocator help string"</span><span class="p">,</span>
+                                   <span class="n">createMyRegisterAllocator</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>Note the two spaces prior to the help string produces a tidy result on the
+<a class="reference internal" href="CommandGuide/opt.html#cmdoption-help"><code class="xref std std-option docutils literal"><span class="pre">-help</span></code></a> query.</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> llc -help
+<span class="go">  ...</span>
+<span class="go">  -regalloc                    - Register allocator to use (default=linearscan)</span>
+<span class="go">    =linearscan                -   linear scan register allocator</span>
+<span class="go">    =local                     -   local register allocator</span>
+<span class="go">    =simple                    -   simple register allocator</span>
+<span class="go">    =myregalloc                -   my register allocator help string</span>
+<span class="go">  ...</span>
+</pre></div>
+</div>
+<p>And thatâs it.  The user is now free to use <code class="docutils literal"><span class="pre">-regalloc=myregalloc</span></code> as an
+option.  Registering instruction schedulers is similar except use the
+<code class="docutils literal"><span class="pre">RegisterScheduler</span></code> class.  Note that the
+<code class="docutils literal"><span class="pre">RegisterScheduler::FunctionPassCtor</span></code> is significantly different from
+<code class="docutils literal"><span class="pre">RegisterRegAlloc::FunctionPassCtor</span></code>.</p>
+<p>To force the load/linking of your register allocator into the
+<strong class="program">llc</strong>/<strong class="program">lli</strong> tools, add your creator functionâs global
+declaration to <code class="docutils literal"><span class="pre">Passes.h</span></code> and add a âpseudoâ call line to
+<code class="docutils literal"><span class="pre">llvm/Codegen/LinkAllCodegenComponents.h</span></code>.</p>
+</div>
+<div class="section" id="creating-new-registries">
+<h3><a class="toc-backref" href="#id52">Creating new registries</a><a class="headerlink" href="#creating-new-registries" title="Permalink to this headline">Â¶</a></h3>
+<p>The easiest way to get started is to clone one of the existing registries; we
+recommend <code class="docutils literal"><span class="pre">llvm/CodeGen/RegAllocRegistry.h</span></code>.  The key things to modify are
+the class name and the <code class="docutils literal"><span class="pre">FunctionPassCtor</span></code> type.</p>
+<p>Then you need to declare the registry.  Example: if your pass registry is
+<code class="docutils literal"><span class="pre">RegisterMyPasses</span></code> then define:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">MachinePassRegistry</span> <span class="n">RegisterMyPasses</span><span class="o">::</span><span class="n">Registry</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>And finally, declare the command line option for your passes.  Example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">cl</span><span class="o">::</span><span class="n">opt</span><span class="o"><</span><span class="n">RegisterMyPasses</span><span class="o">::</span><span class="n">FunctionPassCtor</span><span class="p">,</span> <span class="nb">false</span><span class="p">,</span>
+        <span class="n">RegisterPassParser</span><span class="o"><</span><span class="n">RegisterMyPasses</span><span class="o">></span> <span class="o">></span>
+<span class="n">MyPassOpt</span><span class="p">(</span><span class="s">"mypass"</span><span class="p">,</span>
+          <span class="n">cl</span><span class="o">::</span><span class="n">init</span><span class="p">(</span><span class="o">&</span><span class="n">createDefaultMyPass</span><span class="p">),</span>
+          <span class="n">cl</span><span class="o">::</span><span class="n">desc</span><span class="p">(</span><span class="s">"my pass option help"</span><span class="p">));</span>
+</pre></div>
+</div>
+<p>Here the command option is â<code class="docutils literal"><span class="pre">mypass</span></code>â, with <code class="docutils literal"><span class="pre">createDefaultMyPass</span></code> as the
+default creator.</p>
+</div>
+<div class="section" id="using-gdb-with-dynamically-loaded-passes">
+<h3><a class="toc-backref" href="#id53">Using GDB with dynamically loaded passes</a><a class="headerlink" href="#using-gdb-with-dynamically-loaded-passes" title="Permalink to this headline">Â¶</a></h3>
+<p>Unfortunately, using GDB with dynamically loaded passes is not as easy as it
+should be.  First of all, you canât set a breakpoint in a shared object that
+has not been loaded yet, and second of all there are problems with inlined
+functions in shared objects.  Here are some suggestions to debugging your pass
+with GDB.</p>
+<p>For sake of discussion, Iâm going to assume that you are debugging a
+transformation invoked by <strong class="program">opt</strong>, although nothing described here
+depends on that.</p>
+<div class="section" id="setting-a-breakpoint-in-your-pass">
+<h4><a class="toc-backref" href="#id54">Setting a breakpoint in your pass</a><a class="headerlink" href="#setting-a-breakpoint-in-your-pass" title="Permalink to this headline">Â¶</a></h4>
+<p>First thing you do is start gdb on the opt process:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> gdb opt
+<span class="go">GNU gdb 5.0</span>
+<span class="go">Copyright 2000 Free Software Foundation, Inc.</span>
+<span class="go">GDB is free software, covered by the GNU General Public License, and you are</span>
+<span class="go">welcome to change it and/or distribute copies of it under certain conditions.</span>
+<span class="go">Type "show copying" to see the conditions.</span>
+<span class="go">There is absolutely no warranty for GDB.  Type "show warranty" for details.</span>
+<span class="go">This GDB was configured as "sparc-sun-solaris2.6"...</span>
+<span class="go">(gdb)</span>
+</pre></div>
+</div>
+<p>Note that <strong class="program">opt</strong> has a lot of debugging information in it, so it takes
+time to load.  Be patient.  Since we cannot set a breakpoint in our pass yet
+(the shared object isnât loaded until runtime), we must execute the process,
+and have it stop before it invokes our pass, but after it has loaded the shared
+object.  The most foolproof way of doing this is to set a breakpoint in
+<code class="docutils literal"><span class="pre">PassManager::run</span></code> and then run the process with the arguments you want:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> <span class="o">(</span>gdb<span class="o">)</span> <span class="nb">break</span> llvm::PassManager::run
+<span class="go">Breakpoint 1 at 0x2413bc: file Pass.cpp, line 70.</span>
+<span class="go">(gdb) run test.bc -load $(LLVMTOP)/llvm/Debug+Asserts/lib/[libname].so -[passoption]</span>
+<span class="go">Starting program: opt test.bc -load $(LLVMTOP)/llvm/Debug+Asserts/lib/[libname].so -[passoption]</span>
+<span class="go">Breakpoint 1, PassManager::run (this=0xffbef174, M=@0x70b298) at Pass.cpp:70</span>
+<span class="go">70      bool PassManager::run(Module &M) { return PM->run(M); }</span>
+<span class="go">(gdb)</span>
+</pre></div>
+</div>
+<p>Once the <strong class="program">opt</strong> stops in the <code class="docutils literal"><span class="pre">PassManager::run</span></code> method you are now
+free to set breakpoints in your pass so that you can trace through execution or
+do other standard debugging stuff.</p>
+</div>
+<div class="section" id="miscellaneous-problems">
+<h4><a class="toc-backref" href="#id55">Miscellaneous Problems</a><a class="headerlink" href="#miscellaneous-problems" title="Permalink to this headline">Â¶</a></h4>
+<p>Once you have the basics down, there are a couple of problems that GDB has,
+some with solutions, some without.</p>
+<ul class="simple">
+<li>Inline functions have bogus stack information.  In general, GDB does a pretty
+good job getting stack traces and stepping through inline functions.  When a
+pass is dynamically loaded however, it somehow completely loses this
+capability.  The only solution I know of is to de-inline a function (move it
+from the body of a class to a <code class="docutils literal"><span class="pre">.cpp</span></code> file).</li>
+<li>Restarting the program breaks breakpoints.  After following the information
+above, you have succeeded in getting some breakpoints planted in your pass.
+Next thing you know, you restart the program (i.e., you type â<code class="docutils literal"><span class="pre">run</span></code>â again),
+and you start getting errors about breakpoints being unsettable.  The only
+way I have found to âfixâ this problem is to delete the breakpoints that are
+already set in your pass, run the program, and re-set the breakpoints once
+execution stops in <code class="docutils literal"><span class="pre">PassManager::run</span></code>.</li>
+</ul>
+<p>Hopefully these tips will help with common case debugging situations.  If youâd
+like to contribute some tips of your own, just contact <a class="reference external" href="mailto:sabre%40nondot.org">Chris</a>.</p>
+</div>
+</div>
+<div class="section" id="future-extensions-planned">
+<h3><a class="toc-backref" href="#id56">Future extensions planned</a><a class="headerlink" href="#future-extensions-planned" title="Permalink to this headline">Â¶</a></h3>
+<p>Although the LLVM Pass Infrastructure is very capable as it stands, and does
+some nifty stuff, there are things weâd like to add in the future.  Here is
+where we are going:</p>
+<div class="section" id="multithreaded-llvm">
+<span id="writing-an-llvm-pass-smp"></span><h4><a class="toc-backref" href="#id57">Multithreaded LLVM</a><a class="headerlink" href="#multithreaded-llvm" title="Permalink to this headline">Â¶</a></h4>
+<p>Multiple CPU machines are becoming more common and compilation can never be
+fast enough: obviously we should allow for a multithreaded compiler.  Because
+of the semantics defined for passes above (specifically they cannot maintain
+state across invocations of their <code class="docutils literal"><span class="pre">run*</span></code> methods), a nice clean way to
+implement a multithreaded compiler would be for the <code class="docutils literal"><span class="pre">PassManager</span></code> class to
+create multiple instances of each pass object, and allow the separate instances
+to be hacking on different parts of the program at the same time.</p>
+<p>This implementation would prevent each of the passes from having to implement
+multithreaded constructs, requiring only the LLVM core to have locking in a few
+places (for global resources).  Although this is a simple extension, we simply
+havenât had time (or multiprocessor machines, thus a reason) to implement this.
+Despite that, we have kept the LLVM passes SMP ready, and you should too.</p>
+</div>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="HowToUseAttributes.html" title="How To Use Attributes"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="GarbageCollection.html" title="Garbage Collection with LLVM"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/XRay.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/XRay.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/XRay.html (added)
+++ www-releases/trunk/5.0.1/docs/XRay.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,408 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>XRay Instrumentation — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="Debugging with XRay" href="XRayExample.html" />
+    <link rel="prev" title="Global Instruction Selection" href="GlobalISel.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="XRayExample.html" title="Debugging with XRay"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="GlobalISel.html" title="Global Instruction Selection"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="xray-instrumentation">
+<h1>XRay Instrumentation<a class="headerlink" href="#xray-instrumentation" title="Permalink to this headline">Â¶</a></h1>
+<table class="docutils field-list" frame="void" rules="none">
+<col class="field-name" />
+<col class="field-body" />
+<tbody valign="top">
+<tr class="field-odd field"><th class="field-name">Version:</th><td class="field-body">1 as of 2016-11-08</td>
+</tr>
+</tbody>
+</table>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#introduction" id="id1">Introduction</a></li>
+<li><a class="reference internal" href="#xray-in-llvm" id="id2">XRay in LLVM</a></li>
+<li><a class="reference internal" href="#using-xray" id="id3">Using XRay</a><ul>
+<li><a class="reference internal" href="#instrumenting-your-c-c-objective-c-application" id="id4">Instrumenting your C/C++/Objective-C Application</a></li>
+<li><a class="reference internal" href="#llvm-function-attribute" id="id5">LLVM Function Attribute</a></li>
+<li><a class="reference internal" href="#xray-runtime-library" id="id6">XRay Runtime Library</a></li>
+<li><a class="reference internal" href="#flight-data-recorder-mode" id="id7">Flight Data Recorder Mode</a></li>
+<li><a class="reference internal" href="#trace-analysis-tools" id="id8">Trace Analysis Tools</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#future-work" id="id9">Future Work</a><ul>
+<li><a class="reference internal" href="#trace-analysis" id="id10">Trace Analysis</a></li>
+<li><a class="reference internal" href="#more-platforms" id="id11">More Platforms</a></li>
+</ul>
+</li>
+</ul>
+</div>
+<div class="section" id="introduction">
+<h2><a class="toc-backref" href="#id1">Introduction</a><a class="headerlink" href="#introduction" title="Permalink to this headline">Â¶</a></h2>
+<p>XRay is a function call tracing system which combines compiler-inserted
+instrumentation points and a runtime library that can dynamically enable and
+disable the instrumentation.</p>
+<p>More high level information about XRay can be found in the <a class="reference external" href="http://research.google.com/pubs/pub45287.html">XRay whitepaper</a>.</p>
+<p>This document describes how to use XRay as implemented in LLVM.</p>
+</div>
+<div class="section" id="xray-in-llvm">
+<h2><a class="toc-backref" href="#id2">XRay in LLVM</a><a class="headerlink" href="#xray-in-llvm" title="Permalink to this headline">Â¶</a></h2>
+<p>XRay consists of three main parts:</p>
+<ul>
+<li><p class="first">Compiler-inserted instrumentation points.</p>
+</li>
+<li><p class="first">A runtime library for enabling/disabling tracing at runtime.</p>
+</li>
+<li><p class="first">A suite of tools for analysing the traces.</p>
+<p><strong>NOTE:</strong> As of February 27, 2017 , XRay is only available for the following
+architectures running Linux: x86_64, arm7 (no thumb), aarch64, powerpc64le,
+mips, mipsel, mips64, mips64el.</p>
+</li>
+</ul>
+<p>The compiler-inserted instrumentation points come in the form of nop-sleds in
+the final generated binary, and an ELF section named <code class="docutils literal"><span class="pre">xray_instr_map</span></code> which
+contains entries pointing to these instrumentation points. The runtime library
+relies on being able to access the entries of the <code class="docutils literal"><span class="pre">xray_instr_map</span></code>, and
+overwrite the instrumentation points at runtime.</p>
+</div>
+<div class="section" id="using-xray">
+<h2><a class="toc-backref" href="#id3">Using XRay</a><a class="headerlink" href="#using-xray" title="Permalink to this headline">Â¶</a></h2>
+<p>You can use XRay in a couple of ways:</p>
+<ul class="simple">
+<li>Instrumenting your C/C++/Objective-C/Objective-C++ application.</li>
+<li>Generating LLVM IR with the correct function attributes.</li>
+</ul>
+<p>The rest of this section covers these main ways and later on how to customise
+what XRay does in an XRay-instrumented binary.</p>
+<div class="section" id="instrumenting-your-c-c-objective-c-application">
+<h3><a class="toc-backref" href="#id4">Instrumenting your C/C++/Objective-C Application</a><a class="headerlink" href="#instrumenting-your-c-c-objective-c-application" title="Permalink to this headline">Â¶</a></h3>
+<p>The easiest way of getting XRay instrumentation for your application is by
+enabling the <code class="docutils literal"><span class="pre">-fxray-instrument</span></code> flag in your clang invocation.</p>
+<p>For example:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">clang</span> <span class="o">-</span><span class="n">fxray</span><span class="o">-</span><span class="n">instrument</span> <span class="o">..</span>
+</pre></div>
+</div>
+<p>By default, functions that have at least 200 instructions will get XRay
+instrumentation points. You can tweak that number through the
+<code class="docutils literal"><span class="pre">-fxray-instruction-threshold=</span></code> flag:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">clang</span> <span class="o">-</span><span class="n">fxray</span><span class="o">-</span><span class="n">instrument</span> <span class="o">-</span><span class="n">fxray</span><span class="o">-</span><span class="n">instruction</span><span class="o">-</span><span class="n">threshold</span><span class="o">=</span><span class="mi">1</span> <span class="o">..</span>
+</pre></div>
+</div>
+<p>You can also specifically instrument functions in your binary to either always
+or never be instrumented using source-level attributes. You can do it using the
+GCC-style attributes or C++11-style attributes.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="p">[[</span><span class="n">clang</span><span class="o">::</span><span class="n">xray_always_intrument</span><span class="p">]]</span> <span class="kt">void</span> <span class="n">always_instrumented</span><span class="p">();</span>
+
+<span class="p">[[</span><span class="n">clang</span><span class="o">::</span><span class="n">xray_never_instrument</span><span class="p">]]</span> <span class="kt">void</span> <span class="n">never_instrumented</span><span class="p">();</span>
+
+<span class="kt">void</span> <span class="nf">alt_always_instrumented</span><span class="p">()</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">xray_always_intrument</span><span class="p">));</span>
+
+<span class="kt">void</span> <span class="nf">alt_never_instrumented</span><span class="p">()</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">xray_never_instrument</span><span class="p">));</span>
+</pre></div>
+</div>
+<p>When linking a binary, you can either manually link in the <a class="reference internal" href="#xray-runtime-library">XRay Runtime
+Library</a> or use <code class="docutils literal"><span class="pre">clang</span></code> to link it in automatically with the
+<code class="docutils literal"><span class="pre">-fxray-instrument</span></code> flag. Alternatively, you can statically link-in the XRay
+runtime library from compiler-rt â those archive files will take the name of
+<cite>libclang_rt.xray-{arch}</cite> where <cite>{arch}</cite> is the mnemonic supported by clang
+(x86_64, arm7, etc.).</p>
+</div>
+<div class="section" id="llvm-function-attribute">
+<h3><a class="toc-backref" href="#id5">LLVM Function Attribute</a><a class="headerlink" href="#llvm-function-attribute" title="Permalink to this headline">Â¶</a></h3>
+<p>If youâre using LLVM IR directly, you can add the <code class="docutils literal"><span class="pre">function-instrument</span></code>
+string attribute to your functions, to get the similar effect that the
+C/C++/Objective-C source-level attributes would get:</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="k">define</span> <span class="k">i32</span> <span class="vg">@always_instrument</span><span class="p">()</span> <span class="k">uwtable</span> <span class="s">"function-instrument"</span><span class="p">=</span><span class="s">"xray-always"</span> <span class="p">{</span>
+  <span class="c">; ...</span>
+<span class="p">}</span>
+
+<span class="k">define</span> <span class="k">i32</span> <span class="vg">@never_instrument</span><span class="p">()</span> <span class="k">uwtable</span> <span class="s">"function-instrument"</span><span class="p">=</span><span class="s">"xray-never"</span> <span class="p">{</span>
+  <span class="c">; ...</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>You can also set the <code class="docutils literal"><span class="pre">xray-instruction-threshold</span></code> attribute and provide a
+numeric string value for how many instructions should be in the function before
+it gets instrumented.</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="k">define</span> <span class="k">i32</span> <span class="vg">@maybe_instrument</span><span class="p">()</span> <span class="k">uwtable</span> <span class="s">"xray-instruction-threshold"</span><span class="p">=</span><span class="s">"2"</span> <span class="p">{</span>
+  <span class="c">; ...</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="xray-runtime-library">
+<h3><a class="toc-backref" href="#id6">XRay Runtime Library</a><a class="headerlink" href="#xray-runtime-library" title="Permalink to this headline">Â¶</a></h3>
+<p>The XRay Runtime Library is part of the compiler-rt project, which implements
+the runtime components that perform the patching and unpatching of inserted
+instrumentation points. When you use <code class="docutils literal"><span class="pre">clang</span></code> to link your binaries and the
+<code class="docutils literal"><span class="pre">-fxray-instrument</span></code> flag, it will automatically link in the XRay runtime.</p>
+<p>The default implementation of the XRay runtime will enable XRay instrumentation
+before <code class="docutils literal"><span class="pre">main</span></code> starts, which works for applications that have a short
+lifetime. This implementation also records all function entry and exit events
+which may result in a lot of records in the resulting trace.</p>
+<p>Also by default the filename of the XRay trace is <code class="docutils literal"><span class="pre">xray-log.XXXXXX</span></code> where the
+<code class="docutils literal"><span class="pre">XXXXXX</span></code> part is randomly generated.</p>
+<p>These options can be controlled through the <code class="docutils literal"><span class="pre">XRAY_OPTIONS</span></code> environment
+variable, where we list down the options and their defaults below.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="25%" />
+<col width="23%" />
+<col width="20%" />
+<col width="32%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Option</th>
+<th class="head">Type</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td>patch_premain</td>
+<td><code class="docutils literal"><span class="pre">bool</span></code></td>
+<td><code class="docutils literal"><span class="pre">false</span></code></td>
+<td>Whether to patch
+instrumentation points
+before main.</td>
+</tr>
+<tr class="row-odd"><td>xray_naive_log</td>
+<td><code class="docutils literal"><span class="pre">bool</span></code></td>
+<td><code class="docutils literal"><span class="pre">true</span></code></td>
+<td>Whether to install
+the naive log
+implementation.</td>
+</tr>
+<tr class="row-even"><td>xray_logfile_base</td>
+<td><code class="docutils literal"><span class="pre">const</span> <span class="pre">char*</span></code></td>
+<td><code class="docutils literal"><span class="pre">xray-log.</span></code></td>
+<td>Filename base for the
+XRay logfile.</td>
+</tr>
+<tr class="row-odd"><td>xray_fdr_log</td>
+<td><code class="docutils literal"><span class="pre">bool</span></code></td>
+<td><code class="docutils literal"><span class="pre">false</span></code></td>
+<td>Whether to install the
+Flight Data Recorder
+(FDR) mode.</td>
+</tr>
+</tbody>
+</table>
+<p>If you choose to not use the default logging implementation that comes with the
+XRay runtime and/or control when/how the XRay instrumentation runs, you may use
+the XRay APIs directly for doing so. To do this, youâll need to include the
+<code class="docutils literal"><span class="pre">xray_interface.h</span></code> from the compiler-rt <code class="docutils literal"><span class="pre">xray</span></code> directory. The important API
+functions we list below:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">__xray_set_handler(void</span> <span class="pre">(*entry)(int32_t,</span> <span class="pre">XRayEntryType))</span></code>: Install your
+own logging handler for when an event is encountered. See
+<code class="docutils literal"><span class="pre">xray/xray_interface.h</span></code> for more details.</li>
+<li><code class="docutils literal"><span class="pre">__xray_remove_handler()</span></code>: Removes whatever the installed handler is.</li>
+<li><code class="docutils literal"><span class="pre">__xray_patch()</span></code>: Patch all the instrumentation points defined in the
+binary.</li>
+<li><code class="docutils literal"><span class="pre">__xray_unpatch()</span></code>: Unpatch the instrumentation points defined in the
+binary.</li>
+</ul>
+<p>There are some requirements on the logging handler to be installed for the
+thread-safety of operations to be performed by the XRay runtime library:</p>
+<ul class="simple">
+<li>The function should be thread-safe, as multiple threads may be invoking the
+function at the same time. If the logging function needs to do
+synchronisation, it must do so internally as XRay does not provide any
+synchronisation guarantees outside from the atomicity of updates to the
+pointer.</li>
+<li>The pointer provided to <code class="docutils literal"><span class="pre">__xray_set_handler(...)</span></code> must be live even after
+calls to <code class="docutils literal"><span class="pre">__xray_remove_handler()</span></code> and <code class="docutils literal"><span class="pre">__xray_unpatch()</span></code> have succeeded.
+XRay cannot guarantee that all threads that have ever gotten a copy of the
+pointer will not invoke the function.</li>
+</ul>
+</div>
+<div class="section" id="flight-data-recorder-mode">
+<h3><a class="toc-backref" href="#id7">Flight Data Recorder Mode</a><a class="headerlink" href="#flight-data-recorder-mode" title="Permalink to this headline">Â¶</a></h3>
+<p>XRay supports a logging mode which allows the application to only capture a
+fixed amount of memoryâs worth of events. Flight Data Recorder (FDR) mode works
+very much like a planeâs âblack boxâ which keeps recording data to memory in a
+fixed-size circular queue of buffers, and have the data available
+programmatically until the buffers are finalized and flushed. To use FDR mode
+on your application, you may set the <code class="docutils literal"><span class="pre">xray_fdr_log</span></code> option to <code class="docutils literal"><span class="pre">true</span></code> in the
+<code class="docutils literal"><span class="pre">XRAY_OPTIONS</span></code> environment variable (while also optionally setting the
+<code class="docutils literal"><span class="pre">xray_naive_log</span></code> to <code class="docutils literal"><span class="pre">false</span></code>).</p>
+<p>When FDR mode is on, it will keep writing and recycling memory buffers until
+the logging implementation is finalized â at which point it can be flushed and
+re-initialised later. To do this programmatically, we follow the workflow
+provided below:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// Patch the sleds, if we haven't yet.</span>
+<span class="k">auto</span> <span class="n">patch_status</span> <span class="o">=</span> <span class="n">__xray_patch</span><span class="p">();</span>
+
+<span class="c1">// Maybe handle the patch_status errors.</span>
+
+<span class="c1">// When we want to flush the log, we need to finalize it first, to give</span>
+<span class="c1">// threads a chance to return buffers to the queue.</span>
+<span class="k">auto</span> <span class="n">finalize_status</span> <span class="o">=</span> <span class="n">__xray_log_finalize</span><span class="p">();</span>
+<span class="k">if</span> <span class="p">(</span><span class="n">finalize_status</span> <span class="o">!=</span> <span class="n">XRAY_LOG_FINALIZED</span><span class="p">)</span> <span class="p">{</span>
+  <span class="c1">// maybe retry, or bail out.</span>
+<span class="p">}</span>
+
+<span class="c1">// At this point, we are sure that the log is finalized, so we may try</span>
+<span class="c1">// flushing the log.</span>
+<span class="k">auto</span> <span class="n">flush_status</span> <span class="o">=</span> <span class="n">__xray_log_flushLog</span><span class="p">();</span>
+<span class="k">if</span> <span class="p">(</span><span class="n">flush_status</span> <span class="o">!=</span> <span class="n">XRAY_LOG_FLUSHED</span><span class="p">)</span> <span class="p">{</span>
+  <span class="c1">// maybe retry, or bail out.</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The default settings for the FDR mode implementation will create logs named
+similarly to the naive log implementation, but will have a different log
+format. All the trace analysis tools (and the trace reading library) will
+support all versions of the FDR mode format as we add more functionality and
+record types in the future.</p>
+<blockquote>
+<div><strong>NOTE:</strong> We do not however promise perpetual support for when we update the
+log versions we support going forward. Deprecation of the formats will be
+announced and discussed on the developers mailing list.</div></blockquote>
+<p>XRay allows for replacing the default FDR mode logging implementation using the
+following API:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">__xray_set_log_impl(...)</span></code>: This function takes a struct of type
+<code class="docutils literal"><span class="pre">XRayLogImpl</span></code>, which is defined in <code class="docutils literal"><span class="pre">xray/xray_log_interface.h</span></code>, part of
+the XRay compiler-rt installation.</li>
+<li><code class="docutils literal"><span class="pre">__xray_log_init(...)</span></code>: This function allows for initializing and
+re-initializing an installed logging implementation. See
+<code class="docutils literal"><span class="pre">xray/xray_log_interface.h</span></code> for details, part of the XRay compiler-rt
+installation.</li>
+</ul>
+</div>
+<div class="section" id="trace-analysis-tools">
+<h3><a class="toc-backref" href="#id8">Trace Analysis Tools</a><a class="headerlink" href="#trace-analysis-tools" title="Permalink to this headline">Â¶</a></h3>
+<p>We currently have the beginnings of a trace analysis tool in LLVM, which can be
+found in the <code class="docutils literal"><span class="pre">tools/llvm-xray</span></code> directory. The <code class="docutils literal"><span class="pre">llvm-xray</span></code> tool currently
+supports the following subcommands:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">extract</span></code>: Extract the instrumentation map from a binary, and return it as
+YAML.</li>
+<li><code class="docutils literal"><span class="pre">account</span></code>: Performs basic function call accounting statistics with various
+options for sorting, and output formats (supports CSV, YAML, and
+console-friendly TEXT).</li>
+<li><code class="docutils literal"><span class="pre">convert</span></code>: Converts an XRay log file from one format to another. Currently
+only converts to YAML.</li>
+<li><code class="docutils literal"><span class="pre">graph</span></code>: Generates a DOT graph of the function call relationships between
+functions found in an XRay trace.</li>
+</ul>
+<p>These subcommands use various library components found as part of the XRay
+libraries, distributed with the LLVM distribution. These are:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">llvm/XRay/Trace.h</span></code> : A trace reading library for conveniently loading
+an XRay trace of supported forms, into a convenient in-memory representation.
+All the analysis tools that deal with traces use this implementation.</li>
+<li><code class="docutils literal"><span class="pre">llvm/XRay/Graph.h</span></code> : A semi-generic graph type used by the graph
+subcommand to conveniently represent a function call graph with statistics
+associated with edges and vertices.</li>
+<li><code class="docutils literal"><span class="pre">llvm/XRay/InstrumentationMap.h</span></code>: A convenient tool for analyzing the
+instrumentation map in XRay-instrumented object files and binaries. The
+<code class="docutils literal"><span class="pre">extract</span></code> subcommand uses this particular library.</li>
+</ul>
+</div>
+</div>
+<div class="section" id="future-work">
+<h2><a class="toc-backref" href="#id9">Future Work</a><a class="headerlink" href="#future-work" title="Permalink to this headline">Â¶</a></h2>
+<p>There are a number of ongoing efforts for expanding the toolset building around
+the XRay instrumentation system.</p>
+<div class="section" id="trace-analysis">
+<h3><a class="toc-backref" href="#id10">Trace Analysis</a><a class="headerlink" href="#trace-analysis" title="Permalink to this headline">Â¶</a></h3>
+<p>We have more subcommands and modes that weâre thinking of developing, in the
+following forms:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">stack</span></code>: Reconstruct the function call stacks in a timeline.</li>
+</ul>
+</div>
+<div class="section" id="more-platforms">
+<h3><a class="toc-backref" href="#id11">More Platforms</a><a class="headerlink" href="#more-platforms" title="Permalink to this headline">Â¶</a></h3>
+<p>Weâre looking forward to contributions to port XRay to more architectures and
+operating systems.</p>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="XRayExample.html" title="Debugging with XRay"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="GlobalISel.html" title="Global Instruction Selection"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/XRayExample.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/XRayExample.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/XRayExample.html (added)
+++ www-releases/trunk/5.0.1/docs/XRayExample.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,348 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>Debugging with XRay — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="The PDB File Format" href="PDB/index.html" />
+    <link rel="prev" title="XRay Instrumentation" href="XRay.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="PDB/index.html" title="The PDB File Format"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="XRay.html" title="XRay Instrumentation"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="debugging-with-xray">
+<h1>Debugging with XRay<a class="headerlink" href="#debugging-with-xray" title="Permalink to this headline">Â¶</a></h1>
+<p>This document shows an example of how you would go about analyzing applications
+built with XRay instrumentation. Here we will attempt to debug <code class="docutils literal"><span class="pre">llc</span></code>
+compiling some sample LLVM IR generated by Clang.</p>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#building-with-xray" id="id1">Building with XRay</a></li>
+<li><a class="reference internal" href="#getting-traces" id="id2">Getting Traces</a></li>
+<li><a class="reference internal" href="#the-llvm-xray-tool" id="id3">The <code class="docutils literal"><span class="pre">llvm-xray</span></code> Tool</a></li>
+<li><a class="reference internal" href="#controlling-fidelity" id="id4">Controlling Fidelity</a><ul>
+<li><a class="reference internal" href="#instruction-threshold" id="id5">Instruction Threshold</a></li>
+<li><a class="reference internal" href="#instrumentation-attributes" id="id6">Instrumentation Attributes</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#further-exploration" id="id7">Further Exploration</a></li>
+<li><a class="reference internal" href="#next-steps" id="id8">Next Steps</a></li>
+</ul>
+</div>
+<div class="section" id="building-with-xray">
+<h2><a class="toc-backref" href="#id1">Building with XRay</a><a class="headerlink" href="#building-with-xray" title="Permalink to this headline">Â¶</a></h2>
+<p>To debug an application with XRay instrumentation, we need to build it with a
+Clang that supports the <code class="docutils literal"><span class="pre">-fxray-instrument</span></code> option. See <a class="reference external" href="XRay.html">XRay</a>
+for more technical details of how XRay works for background information.</p>
+<p>In our example, we need to add <code class="docutils literal"><span class="pre">-fxray-instrument</span></code> to the list of flags
+passed to Clang when building a binary. Note that we need to link with Clang as
+well to get the XRay runtime linked in appropriately. For building <code class="docutils literal"><span class="pre">llc</span></code> with
+XRay, we do something similar below for our LLVM build:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ mkdir -p llvm-build && cd llvm-build
+# Assume that the LLVM sources are at ../llvm
+$ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \
+    -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument" -DCMAKE_CXX_FLAGS="-fxray-instrument" \
+# Once this finishes, we should build llc
+$ ninja llc
+</pre></div>
+</div>
+<p>To verify that we have an XRay instrumented binary, we can use <code class="docutils literal"><span class="pre">objdump</span></code> to
+look for the <code class="docutils literal"><span class="pre">xray_instr_map</span></code> section.</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ objdump -h -j xray_instr_map ./bin/llc
+./bin/llc:     file format elf64-x86-64
+
+Sections:
+Idx Name          Size      VMA               LMA               File off  Algn
+ 14 xray_instr_map 00002fc0  00000000041516c6  00000000041516c6  03d516c6  2**0
+                  CONTENTS, ALLOC, LOAD, READONLY, DATA
+</pre></div>
+</div>
+</div>
+<div class="section" id="getting-traces">
+<h2><a class="toc-backref" href="#id2">Getting Traces</a><a class="headerlink" href="#getting-traces" title="Permalink to this headline">Â¶</a></h2>
+<p>By default, XRay does not write out the trace files or patch the application
+before main starts. If we just run <code class="docutils literal"><span class="pre">llc</span></code> it should just work like a normally
+built binary. However, if we want to get a full trace of the applicationâs
+operations (of the functions we do end up instrumenting with XRay) then we need
+to enable XRay at application start. To do this, XRay checks the
+<code class="docutils literal"><span class="pre">XRAY_OPTIONS</span></code> environment variable.</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span># The following doesn't create an XRay trace by default.
+$ ./bin/llc input.ll
+
+# We need to set the XRAY_OPTIONS to enable some features.
+$ XRAY_OPTIONS="patch_premain=true" ./bin/llc input.ll
+==69819==XRay: Log file in 'xray-log.llc.m35qPB'
+</pre></div>
+</div>
+<p>At this point we now have an XRay trace we can start analysing.</p>
+</div>
+<div class="section" id="the-llvm-xray-tool">
+<h2><a class="toc-backref" href="#id3">The <code class="docutils literal"><span class="pre">llvm-xray</span></code> Tool</a><a class="headerlink" href="#the-llvm-xray-tool" title="Permalink to this headline">Â¶</a></h2>
+<p>Having a trace then allows us to do basic accounting of the functions that were
+instrumented, and how much time weâre spending in parts of the code. To make
+sense of this data, we use the <code class="docutils literal"><span class="pre">llvm-xray</span></code> tool which has a few subcommands
+to help us understand our trace.</p>
+<p>One of the simplest things we can do is to get an accounting of the functions
+that have been instrumented. We can see an example accounting with <code class="docutils literal"><span class="pre">llvm-xray</span>
+<span class="pre">account</span></code>:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ llvm-xray account xray-log.llc.m35qPB -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc
+Functions with latencies: 29
+   funcid      count [      min,       med,       90p,       99p,       max]       sum  function
+      187        360 [ 0.000000,  0.000001,  0.000014,  0.000032,  0.000075]  0.001596  LLLexer.cpp:446:0: llvm::LLLexer::LexIdentifier()
+       85        130 [ 0.000000,  0.000000,  0.000018,  0.000023,  0.000156]  0.000799  X86ISelDAGToDAG.cpp:1984:0: (anonymous namespace)::X86DAGToDAGISel::Select(llvm::SDNode*)
+      138        130 [ 0.000000,  0.000000,  0.000017,  0.000155,  0.000155]  0.000774  SelectionDAGISel.cpp:2963:0: llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int)
+      188        103 [ 0.000000,  0.000000,  0.000003,  0.000123,  0.000214]  0.000737  LLParser.cpp:2692:0: llvm::LLParser::ParseValID(llvm::ValID&, llvm::LLParser::PerFunctionState*)
+       88          1 [ 0.000562,  0.000562,  0.000562,  0.000562,  0.000562]  0.000562  X86ISelLowering.cpp:83:0: llvm::X86TargetLowering::X86TargetLowering(llvm::X86TargetMachine const&, llvm::X86Subtarget const&)
+      125        102 [ 0.000001,  0.000003,  0.000010,  0.000017,  0.000049]  0.000471  Verifier.cpp:3714:0: (anonymous namespace)::Verifier::visitInstruction(llvm::Instruction&)
+       90          8 [ 0.000023,  0.000035,  0.000106,  0.000106,  0.000106]  0.000342  X86ISelLowering.cpp:3363:0: llvm::X86TargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const
+      124         32 [ 0.000003,  0.000007,  0.000016,  0.000041,  0.000041]  0.000310  Verifier.cpp:1967:0: (anonymous namespace)::Verifier::visitFunction(llvm::Function const&)
+      123          1 [ 0.000302,  0.000302,  0.000302,  0.000302,  0.000302]  0.000302  LLVMContextImpl.cpp:54:0: llvm::LLVMContextImpl::~LLVMContextImpl()
+      139         46 [ 0.000000,  0.000002,  0.000006,  0.000008,  0.000019]  0.000138  TargetLowering.cpp:506:0: llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt&, llvm::APInt&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const
+</pre></div>
+</div>
+<p>This shows us that for our input file, <code class="docutils literal"><span class="pre">llc</span></code> spent the most cumulative time
+in the lexer (a total of 1 millisecond). If we wanted for example to work with
+this data in a spreadsheet, we can output the results as CSV using the
+<code class="docutils literal"><span class="pre">-format=csv</span></code> option to the command for further analysis.</p>
+<p>If we want to get a textual representation of the raw trace we can use the
+<code class="docutils literal"><span class="pre">llvm-xray</span> <span class="pre">convert</span></code> tool to get YAML output. The first few lines of that
+ouput for an example trace would look like the following:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ llvm-xray convert -f yaml -symbolize -instr_map=./bin/llc xray-log.llc.m35qPB
+---
+header:
+  version:         1
+  type:            0
+  constant-tsc:    true
+  nonstop-tsc:     true
+  cycle-frequency: 2601000000
+records:
+  - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426023268520 }
+  - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426023523052 }
+  - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426029925386 }
+  - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426030031128 }
+  - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426046951388 }
+  - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047282020 }
+  - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426047857332 }
+  - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047984152 }
+  - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048036584 }
+  - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048042292 }
+  - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048055056 }
+  - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048067316 }
+</pre></div>
+</div>
+</div>
+<div class="section" id="controlling-fidelity">
+<h2><a class="toc-backref" href="#id4">Controlling Fidelity</a><a class="headerlink" href="#controlling-fidelity" title="Permalink to this headline">Â¶</a></h2>
+<p>So far in our examples, we havenât been getting full coverage of the functions
+we have in the binary. To get that, we need to modify the compiler flags so
+that we can instrument more (if not all) the functions we have in the binary.
+We have two options for doing that, and we explore both of these below.</p>
+<div class="section" id="instruction-threshold">
+<h3><a class="toc-backref" href="#id5">Instruction Threshold</a><a class="headerlink" href="#instruction-threshold" title="Permalink to this headline">Â¶</a></h3>
+<p>The first âbluntâ way of doing this is by setting the minimum threshold for
+function bodies to 1. We can do that with the
+<code class="docutils literal"><span class="pre">-fxray-instruction-threshold=N</span></code> flag when building our binary. We rebuild
+<code class="docutils literal"><span class="pre">llc</span></code> with this option and observe the results:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ rm CMakeCache.txt
+$ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \
+    -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument -fxray-instruction-threshold=1" \
+    -DCMAKE_CXX_FLAGS="-fxray-instrument -fxray-instruction-threshold=1"
+$ ninja llc
+$ XRAY_OPTIONS="patch_premain=true" ./bin/llc input.ll
+==69819==XRay: Log file in 'xray-log.llc.5rqxkU'
+
+$ llvm-xray account xray-log.llc.5rqxkU -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc
+Functions with latencies: 36652
+ funcid      count [      min,       med,       90p,       99p,       max]       sum  function
+     75          1 [ 0.672368,  0.672368,  0.672368,  0.672368,  0.672368]  0.672368  llc.cpp:271:0: main
+     78          1 [ 0.626455,  0.626455,  0.626455,  0.626455,  0.626455]  0.626455  llc.cpp:381:0: compileModule(char**, llvm::LLVMContext&)
+ 139617          1 [ 0.472618,  0.472618,  0.472618,  0.472618,  0.472618]  0.472618  LegacyPassManager.cpp:1723:0: llvm::legacy::PassManager::run(llvm::Module&)
+ 139610          1 [ 0.472618,  0.472618,  0.472618,  0.472618,  0.472618]  0.472618  LegacyPassManager.cpp:1681:0: llvm::legacy::PassManagerImpl::run(llvm::Module&)
+ 139612          1 [ 0.470948,  0.470948,  0.470948,  0.470948,  0.470948]  0.470948  LegacyPassManager.cpp:1564:0: (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&)
+ 139607          2 [ 0.147345,  0.315994,  0.315994,  0.315994,  0.315994]  0.463340  LegacyPassManager.cpp:1530:0: llvm::FPPassManager::runOnModule(llvm::Module&)
+ 139605         21 [ 0.000002,  0.000002,  0.102593,  0.213336,  0.213336]  0.463331  LegacyPassManager.cpp:1491:0: llvm::FPPassManager::runOnFunction(llvm::Function&)
+ 139563      26096 [ 0.000002,  0.000002,  0.000037,  0.000063,  0.000215]  0.225708  LegacyPassManager.cpp:1083:0: llvm::PMDataManager::findAnalysisPass(void const*, bool)
+ 108055        188 [ 0.000002,  0.000120,  0.001375,  0.004523,  0.062624]  0.159279  MachineFunctionPass.cpp:38:0: llvm::MachineFunctionPass::runOnFunction(llvm::Function&)
+  62635         22 [ 0.000041,  0.000046,  0.000050,  0.126744,  0.126744]  0.127715  X86TargetMachine.cpp:242:0: llvm::X86TargetMachine::getSubtargetImpl(llvm::Function const&) const
+</pre></div>
+</div>
+</div>
+<div class="section" id="instrumentation-attributes">
+<h3><a class="toc-backref" href="#id6">Instrumentation Attributes</a><a class="headerlink" href="#instrumentation-attributes" title="Permalink to this headline">Â¶</a></h3>
+<p>The other way is to use configuration files for selecting which functions
+should always be instrumented by the compiler. This gives us a way of ensuring
+that certain functions are either always or never instrumented by not having to
+add the attribute to the source.</p>
+<p>To use this feature, you can define one file for the functions to always
+instrument, and another for functions to never instrument. The format of these
+files are exactly the same as the SanitizerLists files that control similar
+things for the sanitizer implementations. For example, we can have two
+different files like below:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span><span class="c1"># always-instrument.txt</span>
+<span class="c1"># always instrument functions that match the following filters:</span>
+<span class="n">fun</span><span class="p">:</span><span class="n">main</span>
+
+<span class="c1"># never-instrument.txt</span>
+<span class="c1"># never instrument functions that match the following filters:</span>
+<span class="n">fun</span><span class="p">:</span><span class="n">__cxx_</span><span class="o">*</span>
+</pre></div>
+</div>
+<p>Given the above two files we can re-build by providing those two files as
+arguments to clang as <code class="docutils literal"><span class="pre">-fxray-always-instrument=always-instrument.txt</span></code> or
+<code class="docutils literal"><span class="pre">-fxray-never-instrument=never-instrument.txt</span></code>.</p>
+</div>
+</div>
+<div class="section" id="further-exploration">
+<h2><a class="toc-backref" href="#id7">Further Exploration</a><a class="headerlink" href="#further-exploration" title="Permalink to this headline">Â¶</a></h2>
+<p>The <code class="docutils literal"><span class="pre">llvm-xray</span></code> tool has a few other subcommands that are in various stages
+of being developed. One interesting subcommand that can highlight a few
+interesting things is the <code class="docutils literal"><span class="pre">graph</span></code> subcommand. Given for example the following
+toy program that we build with XRay instrumentation, we can see how the
+generated graph may be a helpful indicator of where time is being spent for the
+application.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// sample.cc</span>
+<span class="cp">#include</span> <span class="cpf"><iostream></span><span class="cp"></span>
+<span class="cp">#include</span> <span class="cpf"><thread></span><span class="cp"></span>
+
+<span class="p">[[</span><span class="n">clang</span><span class="o">::</span><span class="n">xray_always_intrument</span><span class="p">]]</span> <span class="kt">void</span> <span class="n">f</span><span class="p">()</span> <span class="p">{</span>
+  <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o"><<</span> <span class="sc">'.'</span><span class="p">;</span>
+<span class="p">}</span>
+
+<span class="p">[[</span><span class="n">clang</span><span class="o">::</span><span class="n">xray_always_intrument</span><span class="p">]]</span> <span class="kt">void</span> <span class="n">g</span><span class="p">()</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">1</span> <span class="o"><<</span> <span class="mi">10</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o"><<</span> <span class="sc">'-'</span><span class="p">;</span>
+  <span class="p">}</span>
+<span class="p">}</span>
+
+<span class="kt">int</span> <span class="n">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">argv</span><span class="p">[])</span> <span class="p">{</span>
+  <span class="n">std</span><span class="o">::</span><span class="kr">thread</span> <span class="n">t1</span><span class="p">([]</span> <span class="p">{</span>
+    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">1</span> <span class="o"><<</span> <span class="mi">10</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+      <span class="n">f</span><span class="p">();</span>
+  <span class="p">});</span>
+  <span class="n">std</span><span class="o">::</span><span class="kr">thread</span> <span class="n">t2</span><span class="p">([]</span> <span class="p">{</span>
+    <span class="n">g</span><span class="p">();</span>
+  <span class="p">});</span>
+  <span class="n">t1</span><span class="p">.</span><span class="n">join</span><span class="p">();</span>
+  <span class="n">t2</span><span class="p">.</span><span class="n">join</span><span class="p">();</span>
+  <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o"><<</span> <span class="sc">'\n'</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>We then build the above with XRay instrumentation:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ clang++ -o sample -O3 sample.cc -std=c++11 -fxray-instrument -fxray-instruction-threshold=1
+$ XRAY_OPTIONS="patch_premain=true" ./sample
+</pre></div>
+</div>
+<p>We can then explore the graph rendering of the trace generated by this sample
+application. We assume you have the graphviz toosl available in your system,
+including both <code class="docutils literal"><span class="pre">unflatten</span></code> and <code class="docutils literal"><span class="pre">dot</span></code>. If you prefer rendering or exploring
+the graph using another tool, then that should be feasible as well. <code class="docutils literal"><span class="pre">llvm-xray</span>
+<span class="pre">graph</span></code> will create DOT format graphs which should be usable in most graph
+rendering applications. One example invocation of the <code class="docutils literal"><span class="pre">llvm-xray</span> <span class="pre">graph</span></code>
+command should yield some interesting insights to the workings of C++
+applications:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ llvm-xray graph xray-log.sample.* -m sample -color-edges=sum -edge-label=sum \
+    | unflatten -f -l10 | dot -Tsvg -o sample.svg
+</pre></div>
+</div>
+</div>
+<div class="section" id="next-steps">
+<h2><a class="toc-backref" href="#id8">Next Steps</a><a class="headerlink" href="#next-steps" title="Permalink to this headline">Â¶</a></h2>
+<p>If you have some interesting analyses youâd like to implement as part of the
+llvm-xray tool, please feel free to propose them on the llvm-dev@ mailing list.
+The following are some ideas to inspire you in getting involved and potentially
+making things better.</p>
+<blockquote>
+<div><ul class="simple">
+<li>Implement a query/filtering library that allows for finding patterns in the
+XRay traces.</li>
+<li>A conversion from the XRay trace onto something that can be visualised
+better by other tools (like the Chrome trace viewer for example).</li>
+<li>Collecting function call stacks and how often theyâre encountered in the
+XRay trace.</li>
+</ul>
+</div></blockquote>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="PDB/index.html" title="The PDB File Format"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="XRay.html" title="XRay Instrumentation"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/YamlIO.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/YamlIO.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/YamlIO.html (added)
+++ www-releases/trunk/5.0.1/docs/YamlIO.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,1029 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>YAML I/O — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="The Often Misunderstood GEP Instruction" href="GetElementPtr.html" />
+    <link rel="prev" title="LLVMâs Analysis and Transform Passes" href="Passes.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="GetElementPtr.html" title="The Often Misunderstood GEP Instruction"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="Passes.html" title="LLVMâs Analysis and Transform Passes"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="yaml-i-o">
+<h1>YAML I/O<a class="headerlink" href="#yaml-i-o" title="Permalink to this headline">Â¶</a></h1>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#introduction-to-yaml" id="id1">Introduction to YAML</a></li>
+<li><a class="reference internal" href="#introduction-to-yaml-i-o" id="id2">Introduction to YAML I/O</a></li>
+<li><a class="reference internal" href="#error-handling" id="id3">Error Handling</a></li>
+<li><a class="reference internal" href="#scalars" id="id4">Scalars</a><ul>
+<li><a class="reference internal" href="#built-in-types" id="id5">Built-in types</a></li>
+<li><a class="reference internal" href="#unique-types" id="id6">Unique types</a></li>
+<li><a class="reference internal" href="#hex-types" id="id7">Hex types</a></li>
+<li><a class="reference internal" href="#scalarenumerationtraits" id="id8">ScalarEnumerationTraits</a></li>
+<li><a class="reference internal" href="#bitvalue" id="id9">BitValue</a></li>
+<li><a class="reference internal" href="#custom-scalar" id="id10">Custom Scalar</a></li>
+<li><a class="reference internal" href="#block-scalars" id="id11">Block Scalars</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#mappings" id="id12">Mappings</a><ul>
+<li><a class="reference internal" href="#no-normalization" id="id13">No Normalization</a></li>
+<li><a class="reference internal" href="#normalization" id="id14">Normalization</a></li>
+<li><a class="reference internal" href="#default-values" id="id15">Default values</a></li>
+<li><a class="reference internal" href="#order-of-keys" id="id16">Order of Keys</a></li>
+<li><a class="reference internal" href="#tags" id="id17">Tags</a></li>
+<li><a class="reference internal" href="#validation" id="id18">Validation</a></li>
+<li><a class="reference internal" href="#flow-mapping" id="id19">Flow Mapping</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#sequence" id="id20">Sequence</a><ul>
+<li><a class="reference internal" href="#flow-sequence" id="id21">Flow Sequence</a></li>
+<li><a class="reference internal" href="#utility-macros" id="id22">Utility Macros</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#document-list" id="id23">Document List</a></li>
+<li><a class="reference internal" href="#user-context-data" id="id24">User Context Data</a></li>
+<li><a class="reference internal" href="#output" id="id25">Output</a></li>
+<li><a class="reference internal" href="#input" id="id26">Input</a></li>
+</ul>
+</div>
+<div class="section" id="introduction-to-yaml">
+<h2><a class="toc-backref" href="#id1">Introduction to YAML</a><a class="headerlink" href="#introduction-to-yaml" title="Permalink to this headline">Â¶</a></h2>
+<p>YAML is a human readable data serialization language.  The full YAML language
+spec can be read at <a class="reference external" href="http://www.yaml.org/spec/1.2/spec.html#Introduction">yaml.org</a>.  The simplest form of
+yaml is just âscalarsâ, âmappingsâ, and âsequencesâ.  A scalar is any number
+or string.  The pound/hash symbol (#) begins a comment line.   A mapping is
+a set of key-value pairs where the key ends with a colon.  For example:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="c1"># a mapping</span>
+<span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+<span class="l l-Scalar l-Scalar-Plain">hat-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">7</span>
+</pre></div>
+</div>
+<p>A sequence is a list of items where each item starts with a leading dash (â-â).
+For example:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="c1"># a sequence</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">x86</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">x86_64</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">PowerPC</span>
+</pre></div>
+</div>
+<p>You can combine mappings and sequences by indenting.  For example a sequence
+of mappings in which one of the mapping values is itself a sequence:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="c1"># a sequence of mappings with one key's value being a sequence</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+  <span class="l l-Scalar l-Scalar-Plain">cpus</span><span class="p p-Indicator">:</span>
+   <span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">x86</span>
+   <span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">x86_64</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Bob</span>
+  <span class="l l-Scalar l-Scalar-Plain">cpus</span><span class="p p-Indicator">:</span>
+   <span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">x86</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Dan</span>
+  <span class="l l-Scalar l-Scalar-Plain">cpus</span><span class="p p-Indicator">:</span>
+   <span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">PowerPC</span>
+   <span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">x86</span>
+</pre></div>
+</div>
+<p>Sometime sequences are known to be short and the one entry per line is too
+verbose, so YAML offers an alternate syntax for sequences called a âFlow
+Sequenceâ in which you put comma separated sequence elements into square
+brackets.  The above example could then be simplified to :</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="c1"># a sequence of mappings with one key's value being a flow sequence</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+  <span class="l l-Scalar l-Scalar-Plain">cpus</span><span class="p p-Indicator">:</span>      <span class="p p-Indicator">[</span> <span class="nv">x86</span><span class="p p-Indicator">,</span> <span class="nv">x86_64</span> <span class="p p-Indicator">]</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Bob</span>
+  <span class="l l-Scalar l-Scalar-Plain">cpus</span><span class="p p-Indicator">:</span>      <span class="p p-Indicator">[</span> <span class="nv">x86</span> <span class="p p-Indicator">]</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Dan</span>
+  <span class="l l-Scalar l-Scalar-Plain">cpus</span><span class="p p-Indicator">:</span>      <span class="p p-Indicator">[</span> <span class="nv">PowerPC</span><span class="p p-Indicator">,</span> <span class="nv">x86</span> <span class="p p-Indicator">]</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="introduction-to-yaml-i-o">
+<h2><a class="toc-backref" href="#id2">Introduction to YAML I/O</a><a class="headerlink" href="#introduction-to-yaml-i-o" title="Permalink to this headline">Â¶</a></h2>
+<p>The use of indenting makes the YAML easy for a human to read and understand,
+but having a program read and write YAML involves a lot of tedious details.
+The YAML I/O library structures and simplifies reading and writing YAML
+documents.</p>
+<p>YAML I/O assumes you have some ânativeâ data structures which you want to be
+able to dump as YAML and recreate from YAML.  The first step is to try
+writing example YAML for your data structures. You may find after looking at
+possible YAML representations that a direct mapping of your data structures
+to YAML is not very readable.  Often the fields are not in the order that
+a human would find readable.  Or the same information is replicated in multiple
+locations, making it hard for a human to write such YAML correctly.</p>
+<p>In relational database theory there is a design step called normalization in
+which you reorganize fields and tables.  The same considerations need to
+go into the design of your YAML encoding.  But, you may not want to change
+your existing native data structures.  Therefore, when writing out YAML
+there may be a normalization step, and when reading YAML there would be a
+corresponding denormalization step.</p>
+<p>YAML I/O uses a non-invasive, traits based design.  YAML I/O defines some
+abstract base templates.  You specialize those templates on your data types.
+For instance, if you have an enumerated type FooBar you could specialize
+ScalarEnumerationTraits on that type and define the enumeration() method:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">ScalarEnumerationTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">ScalarEnumerationTraits</span><span class="o"><</span><span class="n">FooBar</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">enumeration</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">FooBar</span> <span class="o">&</span><span class="n">value</span><span class="p">)</span> <span class="p">{</span>
+  <span class="p">...</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>As with all YAML I/O template specializations, the ScalarEnumerationTraits is used for
+both reading and writing YAML. That is, the mapping between in-memory enum
+values and the YAML string representation is only in one place.
+This assures that the code for writing and parsing of YAML stays in sync.</p>
+<p>To specify a YAML mappings, you define a specialization on
+llvm::yaml::MappingTraits.
+If your native data structure happens to be a struct that is already normalized,
+then the specialization is simple.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Person</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Person</span> <span class="o">&</span><span class="n">info</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span>         <span class="n">info</span><span class="p">.</span><span class="n">name</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapOptional</span><span class="p">(</span><span class="s">"hat-size"</span><span class="p">,</span>     <span class="n">info</span><span class="p">.</span><span class="n">hatSize</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>A YAML sequence is automatically inferred if you data type has begin()/end()
+iterators and a push_back() method.  Therefore any of the STL containers
+(such as std::vector<>) will automatically translate to YAML sequences.</p>
+<p>Once you have defined specializations for your data types, you can
+programmatically use YAML I/O to write a YAML document:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">Output</span><span class="p">;</span>
+
+<span class="n">Person</span> <span class="n">tom</span><span class="p">;</span>
+<span class="n">tom</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="s">"Tom"</span><span class="p">;</span>
+<span class="n">tom</span><span class="p">.</span><span class="n">hatSize</span> <span class="o">=</span> <span class="mi">8</span><span class="p">;</span>
+<span class="n">Person</span> <span class="n">dan</span><span class="p">;</span>
+<span class="n">dan</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="s">"Dan"</span><span class="p">;</span>
+<span class="n">dan</span><span class="p">.</span><span class="n">hatSize</span> <span class="o">=</span> <span class="mi">7</span><span class="p">;</span>
+<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">Person</span><span class="o">></span> <span class="n">persons</span><span class="p">;</span>
+<span class="n">persons</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">tom</span><span class="p">);</span>
+<span class="n">persons</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">dan</span><span class="p">);</span>
+
+<span class="n">Output</span> <span class="nf">yout</span><span class="p">(</span><span class="n">llvm</span><span class="o">::</span><span class="n">outs</span><span class="p">());</span>
+<span class="n">yout</span> <span class="o"><<</span> <span class="n">persons</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>This would write the following:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+  <span class="l l-Scalar l-Scalar-Plain">hat-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">8</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Dan</span>
+  <span class="l l-Scalar l-Scalar-Plain">hat-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">7</span>
+</pre></div>
+</div>
+<p>And you can also read such YAML documents with the following code:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">Input</span><span class="p">;</span>
+
+<span class="k">typedef</span> <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">Person</span><span class="o">></span> <span class="n">PersonList</span><span class="p">;</span>
+<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">PersonList</span><span class="o">></span> <span class="n">docs</span><span class="p">;</span>
+
+<span class="n">Input</span> <span class="nf">yin</span><span class="p">(</span><span class="n">document</span><span class="p">.</span><span class="n">getBuffer</span><span class="p">());</span>
+<span class="n">yin</span> <span class="o">>></span> <span class="n">docs</span><span class="p">;</span>
+
+<span class="k">if</span> <span class="p">(</span> <span class="n">yin</span><span class="p">.</span><span class="n">error</span><span class="p">()</span> <span class="p">)</span>
+  <span class="k">return</span><span class="p">;</span>
+
+<span class="c1">// Process read document</span>
+<span class="k">for</span> <span class="p">(</span> <span class="n">PersonList</span> <span class="o">&</span><span class="nl">pl</span> <span class="p">:</span> <span class="n">docs</span> <span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span> <span class="n">Person</span> <span class="o">&</span><span class="nl">person</span> <span class="p">:</span> <span class="n">pl</span> <span class="p">)</span> <span class="p">{</span>
+    <span class="n">cout</span> <span class="o"><<</span> <span class="s">"name="</span> <span class="o"><<</span> <span class="n">person</span><span class="p">.</span><span class="n">name</span><span class="p">;</span>
+  <span class="p">}</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>One other feature of YAML is the ability to define multiple documents in a
+single file.  That is why reading YAML produces a vector of your document type.</p>
+</div>
+<div class="section" id="error-handling">
+<h2><a class="toc-backref" href="#id3">Error Handling</a><a class="headerlink" href="#error-handling" title="Permalink to this headline">Â¶</a></h2>
+<p>When parsing a YAML document, if the input does not match your schema (as
+expressed in your XxxTraits<> specializations).  YAML I/O
+will print out an error message and your Input objectâs error() method will
+return true. For instance the following document:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+  <span class="l l-Scalar l-Scalar-Plain">shoe-size</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">12</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Dan</span>
+  <span class="l l-Scalar l-Scalar-Plain">hat-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">7</span>
+</pre></div>
+</div>
+<p>Has a key (shoe-size) that is not defined in the schema.  YAML I/O will
+automatically generate this error:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="l l-Scalar l-Scalar-Plain">YAML:2:2</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">error</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">unknown key 'shoe-size'</span>
+  <span class="l l-Scalar l-Scalar-Plain">shoe-size</span><span class="p p-Indicator">:</span>       <span class="l l-Scalar l-Scalar-Plain">12</span>
+  <span class="l l-Scalar l-Scalar-Plain">^~~~~~~~~</span>
+</pre></div>
+</div>
+<p>Similar errors are produced for other input not conforming to the schema.</p>
+</div>
+<div class="section" id="scalars">
+<h2><a class="toc-backref" href="#id4">Scalars</a><a class="headerlink" href="#scalars" title="Permalink to this headline">Â¶</a></h2>
+<p>YAML scalars are just strings (i.e. not a sequence or mapping).  The YAML I/O
+library provides support for translating between YAML scalars and specific
+C++ types.</p>
+<div class="section" id="built-in-types">
+<h3><a class="toc-backref" href="#id5">Built-in types</a><a class="headerlink" href="#built-in-types" title="Permalink to this headline">Â¶</a></h3>
+<p>The following types have built-in support in YAML I/O:</p>
+<ul class="simple">
+<li>bool</li>
+<li>float</li>
+<li>double</li>
+<li>StringRef</li>
+<li>std::string</li>
+<li>int64_t</li>
+<li>int32_t</li>
+<li>int16_t</li>
+<li>int8_t</li>
+<li>uint64_t</li>
+<li>uint32_t</li>
+<li>uint16_t</li>
+<li>uint8_t</li>
+</ul>
+<p>That is, you can use those types in fields of MappingTraits or as element type
+in sequence.  When reading, YAML I/O will validate that the string found
+is convertible to that type and error out if not.</p>
+</div>
+<div class="section" id="unique-types">
+<h3><a class="toc-backref" href="#id6">Unique types</a><a class="headerlink" href="#unique-types" title="Permalink to this headline">Â¶</a></h3>
+<p>Given that YAML I/O is trait based, the selection of how to convert your data
+to YAML is based on the type of your data.  But in C++ type matching, typedefs
+do not generate unique type names.  That means if you have two typedefs of
+unsigned int, to YAML I/O both types look exactly like unsigned int.  To
+facilitate make unique type names, YAML I/O provides a macro which is used
+like a typedef on built-in types, but expands to create a class with conversion
+operators to and from the base type.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">LLVM_YAML_STRONG_TYPEDEF</span><span class="p">(</span><span class="kt">uint32_t</span><span class="p">,</span> <span class="n">MyFooFlags</span><span class="p">)</span>
+<span class="n">LLVM_YAML_STRONG_TYPEDEF</span><span class="p">(</span><span class="kt">uint32_t</span><span class="p">,</span> <span class="n">MyBarFlags</span><span class="p">)</span>
+</pre></div>
+</div>
+<p>This generates two classes MyFooFlags and MyBarFlags which you can use in your
+native data structures instead of uint32_t. They are implicitly
+converted to and from uint32_t.  The point of creating these unique types
+is that you can now specify traits on them to get different YAML conversions.</p>
+</div>
+<div class="section" id="hex-types">
+<h3><a class="toc-backref" href="#id7">Hex types</a><a class="headerlink" href="#hex-types" title="Permalink to this headline">Â¶</a></h3>
+<p>An example use of a unique type is that YAML I/O provides fixed sized unsigned
+integers that are written with YAML I/O as hexadecimal instead of the decimal
+format used by the built-in integer types:</p>
+<ul class="simple">
+<li>Hex64</li>
+<li>Hex32</li>
+<li>Hex16</li>
+<li>Hex8</li>
+</ul>
+<p>You can use llvm::yaml::Hex32 instead of uint32_t and the only different will
+be that when YAML I/O writes out that type it will be formatted in hexadecimal.</p>
+</div>
+<div class="section" id="scalarenumerationtraits">
+<h3><a class="toc-backref" href="#id8">ScalarEnumerationTraits</a><a class="headerlink" href="#scalarenumerationtraits" title="Permalink to this headline">Â¶</a></h3>
+<p>YAML I/O supports translating between in-memory enumerations and a set of string
+values in YAML documents. This is done by specializing ScalarEnumerationTraits<>
+on your enumeration type and define a enumeration() method.
+For instance, suppose you had an enumeration of CPUs and a struct with it as
+a field:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">enum</span> <span class="n">CPUs</span> <span class="p">{</span>
+  <span class="n">cpu_x86_64</span>  <span class="o">=</span> <span class="mi">5</span><span class="p">,</span>
+  <span class="n">cpu_x86</span>     <span class="o">=</span> <span class="mi">7</span><span class="p">,</span>
+  <span class="n">cpu_PowerPC</span> <span class="o">=</span> <span class="mi">8</span>
+<span class="p">};</span>
+
+<span class="k">struct</span> <span class="n">Info</span> <span class="p">{</span>
+  <span class="n">CPUs</span>      <span class="n">cpu</span><span class="p">;</span>
+  <span class="kt">uint32_t</span>  <span class="n">flags</span><span class="p">;</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>To support reading and writing of this enumeration, you can define a
+ScalarEnumerationTraits specialization on CPUs, which can then be used
+as a field type:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">ScalarEnumerationTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">ScalarEnumerationTraits</span><span class="o"><</span><span class="n">CPUs</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">enumeration</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">CPUs</span> <span class="o">&</span><span class="n">value</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">enumCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"x86_64"</span><span class="p">,</span>  <span class="n">cpu_x86_64</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">enumCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"x86"</span><span class="p">,</span>     <span class="n">cpu_x86</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">enumCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"PowerPC"</span><span class="p">,</span> <span class="n">cpu_PowerPC</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Info</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Info</span> <span class="o">&</span><span class="n">info</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"cpu"</span><span class="p">,</span>       <span class="n">info</span><span class="p">.</span><span class="n">cpu</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapOptional</span><span class="p">(</span><span class="s">"flags"</span><span class="p">,</span>     <span class="n">info</span><span class="p">.</span><span class="n">flags</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>When reading YAML, if the string found does not match any of the strings
+specified by enumCase() methods, an error is automatically generated.
+When writing YAML, if the value being written does not match any of the values
+specified by the enumCase() methods, a runtime assertion is triggered.</p>
+</div>
+<div class="section" id="bitvalue">
+<h3><a class="toc-backref" href="#id9">BitValue</a><a class="headerlink" href="#bitvalue" title="Permalink to this headline">Â¶</a></h3>
+<p>Another common data structure in C++ is a field where each bit has a unique
+meaning.  This is often used in a âflagsâ field.  YAML I/O has support for
+converting such fields to a flow sequence.   For instance suppose you
+had the following bit flags defined:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">enum</span> <span class="p">{</span>
+  <span class="n">flagsPointy</span> <span class="o">=</span> <span class="mi">1</span>
+  <span class="n">flagsHollow</span> <span class="o">=</span> <span class="mi">2</span>
+  <span class="n">flagsFlat</span>   <span class="o">=</span> <span class="mi">4</span>
+  <span class="n">flagsRound</span>  <span class="o">=</span> <span class="mi">8</span>
+<span class="p">};</span>
+
+<span class="n">LLVM_YAML_STRONG_TYPEDEF</span><span class="p">(</span><span class="kt">uint32_t</span><span class="p">,</span> <span class="n">MyFlags</span><span class="p">)</span>
+</pre></div>
+</div>
+<p>To support reading and writing of MyFlags, you specialize ScalarBitSetTraits<>
+on MyFlags and provide the bit values and their names.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">ScalarBitSetTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">ScalarBitSetTraits</span><span class="o"><</span><span class="n">MyFlags</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">bitset</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MyFlags</span> <span class="o">&</span><span class="n">value</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"hollow"</span><span class="p">,</span>  <span class="n">flagHollow</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"flat"</span><span class="p">,</span>    <span class="n">flagFlat</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"round"</span><span class="p">,</span>   <span class="n">flagRound</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"pointy"</span><span class="p">,</span>  <span class="n">flagPointy</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+
+<span class="k">struct</span> <span class="n">Info</span> <span class="p">{</span>
+  <span class="n">StringRef</span>   <span class="n">name</span><span class="p">;</span>
+  <span class="n">MyFlags</span>     <span class="n">flags</span><span class="p">;</span>
+<span class="p">};</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Info</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Info</span><span class="o">&</span> <span class="n">info</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span>  <span class="n">info</span><span class="p">.</span><span class="n">name</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"flags"</span><span class="p">,</span> <span class="n">info</span><span class="p">.</span><span class="n">flags</span><span class="p">);</span>
+   <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>With the above, YAML I/O (when writing) will test mask each value in the
+bitset trait against the flags field, and each that matches will
+cause the corresponding string to be added to the flow sequence.  The opposite
+is done when reading and any unknown string values will result in a error. With
+the above schema, a same valid YAML document is:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>    <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+<span class="l l-Scalar l-Scalar-Plain">flags</span><span class="p p-Indicator">:</span>   <span class="p p-Indicator">[</span> <span class="nv">pointy</span><span class="p p-Indicator">,</span> <span class="nv">flat</span> <span class="p p-Indicator">]</span>
+</pre></div>
+</div>
+<p>Sometimes a âflagsâ field might contains an enumeration part
+defined by a bit-mask.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">enum</span> <span class="p">{</span>
+  <span class="n">flagsFeatureA</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>
+  <span class="n">flagsFeatureB</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span>
+  <span class="n">flagsFeatureC</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span>
+
+  <span class="n">flagsCPUMask</span> <span class="o">=</span> <span class="mi">24</span><span class="p">,</span>
+
+  <span class="n">flagsCPU1</span> <span class="o">=</span> <span class="mi">8</span><span class="p">,</span>
+  <span class="n">flagsCPU2</span> <span class="o">=</span> <span class="mi">16</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>To support reading and writing such fields, you need to use the maskedBitSet()
+method and provide the bit values, their names and the enumeration mask.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">ScalarBitSetTraits</span><span class="o"><</span><span class="n">MyFlags</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">bitset</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MyFlags</span> <span class="o">&</span><span class="n">value</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"featureA"</span><span class="p">,</span>  <span class="n">flagsFeatureA</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"featureB"</span><span class="p">,</span>  <span class="n">flagsFeatureB</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"featureC"</span><span class="p">,</span>  <span class="n">flagsFeatureC</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">maskedBitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"CPU1"</span><span class="p">,</span>  <span class="n">flagsCPU1</span><span class="p">,</span> <span class="n">flagsCPUMask</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">maskedBitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"CPU2"</span><span class="p">,</span>  <span class="n">flagsCPU2</span><span class="p">,</span> <span class="n">flagsCPUMask</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>YAML I/O (when writing) will apply the enumeration mask to the flags field,
+and compare the result and values from the bitset. As in case of a regular
+bitset, each that matches will cause the corresponding string to be added
+to the flow sequence.</p>
+</div>
+<div class="section" id="custom-scalar">
+<h3><a class="toc-backref" href="#id10">Custom Scalar</a><a class="headerlink" href="#custom-scalar" title="Permalink to this headline">Â¶</a></h3>
+<p>Sometimes for readability a scalar needs to be formatted in a custom way. For
+instance your internal data structure may use a integer for time (seconds since
+some epoch), but in YAML it would be much nicer to express that integer in
+some time format (e.g. 4-May-2012 10:30pm).  YAML I/O has a way to support
+custom formatting and parsing of scalar types by specializing ScalarTraits<> on
+your data type.  When writing, YAML I/O will provide the native type and
+your specialization must create a temporary llvm::StringRef.  When reading,
+YAML I/O will provide an llvm::StringRef of scalar and your specialization
+must convert that to your native data type.  An outline of a custom scalar type
+looks like:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">ScalarTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">ScalarTraits</span><span class="o"><</span><span class="n">MyCustomType</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">output</span><span class="p">(</span><span class="k">const</span> <span class="n">MyCustomType</span> <span class="o">&</span><span class="n">value</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span><span class="p">,</span>
+                     <span class="n">llvm</span><span class="o">::</span><span class="n">raw_ostream</span> <span class="o">&</span><span class="n">out</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">out</span> <span class="o"><<</span> <span class="n">value</span><span class="p">;</span>  <span class="c1">// do custom formatting here</span>
+  <span class="p">}</span>
+  <span class="k">static</span> <span class="n">StringRef</span> <span class="n">input</span><span class="p">(</span><span class="n">StringRef</span> <span class="n">scalar</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span><span class="p">,</span> <span class="n">MyCustomType</span> <span class="o">&</span><span class="n">value</span><span class="p">)</span> <span class="p">{</span>
+    <span class="c1">// do custom parsing here.  Return the empty string on success,</span>
+    <span class="c1">// or an error message on failure.</span>
+    <span class="k">return</span> <span class="n">StringRef</span><span class="p">();</span>
+  <span class="p">}</span>
+  <span class="c1">// Determine if this scalar needs quotes.</span>
+  <span class="k">static</span> <span class="kt">bool</span> <span class="n">mustQuote</span><span class="p">(</span><span class="n">StringRef</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">true</span><span class="p">;</span> <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="block-scalars">
+<h3><a class="toc-backref" href="#id11">Block Scalars</a><a class="headerlink" href="#block-scalars" title="Permalink to this headline">Â¶</a></h3>
+<p>YAML block scalars are string literals that are represented in YAML using the
+literal block notation, just like the example shown below:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="l l-Scalar l-Scalar-Plain">text</span><span class="p p-Indicator">:</span> <span class="p p-Indicator">|</span>
+  <span class="no">First line</span>
+  <span class="no">Second line</span>
+</pre></div>
+</div>
+<p>The YAML I/O library provides support for translating between YAML block scalars
+and specific C++ types by allowing you to specialize BlockScalarTraits<> on
+your data type. The library doesnât provide any built-in support for block
+scalar I/O for types like std::string and llvm::StringRef as they are already
+supported by YAML I/O and use the ordinary scalar notation by default.</p>
+<p>BlockScalarTraits specializations are very similar to the
+ScalarTraits specialization - YAML I/O will provide the native type and your
+specialization must create a temporary llvm::StringRef when writing, and
+it will also provide an llvm::StringRef that has the value of that block scalar
+and your specialization must convert that to your native data type when reading.
+An example of a custom type with an appropriate specialization of
+BlockScalarTraits is shown below:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">BlockScalarTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">struct</span> <span class="n">MyStringType</span> <span class="p">{</span>
+  <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">Str</span><span class="p">;</span>
+<span class="p">};</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">BlockScalarTraits</span><span class="o"><</span><span class="n">MyStringType</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">output</span><span class="p">(</span><span class="k">const</span> <span class="n">MyStringType</span> <span class="o">&</span><span class="n">Value</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">Ctxt</span><span class="p">,</span>
+                     <span class="n">llvm</span><span class="o">::</span><span class="n">raw_ostream</span> <span class="o">&</span><span class="n">OS</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">OS</span> <span class="o"><<</span> <span class="n">Value</span><span class="p">.</span><span class="n">Str</span><span class="p">;</span>
+  <span class="p">}</span>
+
+  <span class="k">static</span> <span class="n">StringRef</span> <span class="n">input</span><span class="p">(</span><span class="n">StringRef</span> <span class="n">Scalar</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">Ctxt</span><span class="p">,</span>
+                         <span class="n">MyStringType</span> <span class="o">&</span><span class="n">Value</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">Value</span><span class="p">.</span><span class="n">Str</span> <span class="o">=</span> <span class="n">Scalar</span><span class="p">.</span><span class="n">str</span><span class="p">();</span>
+    <span class="k">return</span> <span class="nf">StringRef</span><span class="p">();</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="mappings">
+<h2><a class="toc-backref" href="#id12">Mappings</a><a class="headerlink" href="#mappings" title="Permalink to this headline">Â¶</a></h2>
+<p>To be translated to or from a YAML mapping for your type T you must specialize
+llvm::yaml::MappingTraits on T and implement the âvoid mapping(IO &io, T&)â
+method. If your native data structures use pointers to a class everywhere,
+you can specialize on the class pointer.  Examples:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="c1">// Example of struct Foo which is used by value</span>
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Foo</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Foo</span> <span class="o">&</span><span class="n">foo</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapOptional</span><span class="p">(</span><span class="s">"size"</span><span class="p">,</span>      <span class="n">foo</span><span class="p">.</span><span class="n">size</span><span class="p">);</span>
+  <span class="p">...</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+
+<span class="c1">// Example of struct Bar which is natively always a pointer</span>
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Bar</span><span class="o">*></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Bar</span> <span class="o">*&</span><span class="n">bar</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapOptional</span><span class="p">(</span><span class="s">"size"</span><span class="p">,</span>    <span class="n">bar</span><span class="o">-></span><span class="n">size</span><span class="p">);</span>
+  <span class="p">...</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<div class="section" id="no-normalization">
+<h3><a class="toc-backref" href="#id13">No Normalization</a><a class="headerlink" href="#no-normalization" title="Permalink to this headline">Â¶</a></h3>
+<p>The mapping() method is responsible, if needed, for normalizing and
+denormalizing. In a simple case where the native data structure requires no
+normalization, the mapping method just uses mapOptional() or mapRequired() to
+bind the structâs fields to YAML key names.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Person</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Person</span> <span class="o">&</span><span class="n">info</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span>         <span class="n">info</span><span class="p">.</span><span class="n">name</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapOptional</span><span class="p">(</span><span class="s">"hat-size"</span><span class="p">,</span>     <span class="n">info</span><span class="p">.</span><span class="n">hatSize</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="normalization">
+<h3><a class="toc-backref" href="#id14">Normalization</a><a class="headerlink" href="#normalization" title="Permalink to this headline">Â¶</a></h3>
+<p>When [de]normalization is required, the mapping() method needs a way to access
+normalized values as fields. To help with this, there is
+a template MappingNormalization<> which you can then use to automatically
+do the normalization and denormalization.  The template is used to create
+a local variable in your mapping() method which contains the normalized keys.</p>
+<p>Suppose you have native data type
+Polar which specifies a position in polar coordinates (distance, angle):</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">struct</span> <span class="n">Polar</span> <span class="p">{</span>
+  <span class="kt">float</span> <span class="n">distance</span><span class="p">;</span>
+  <span class="kt">float</span> <span class="n">angle</span><span class="p">;</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>but youâve decided the normalized YAML for should be in x,y coordinates. That
+is, you want the yaml to look like:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="l l-Scalar l-Scalar-Plain">x</span><span class="p p-Indicator">:</span>   <span class="l l-Scalar l-Scalar-Plain">10.3</span>
+<span class="l l-Scalar l-Scalar-Plain">y</span><span class="p p-Indicator">:</span>   <span class="l l-Scalar l-Scalar-Plain">-4.7</span>
+</pre></div>
+</div>
+<p>You can support this by defining a MappingTraits that normalizes the polar
+coordinates to x,y coordinates when writing YAML and denormalizes x,y
+coordinates into polar when reading YAML.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Polar</span><span class="o">></span> <span class="p">{</span>
+
+  <span class="k">class</span> <span class="nc">NormalizedPolar</span> <span class="p">{</span>
+  <span class="k">public</span><span class="o">:</span>
+    <span class="n">NormalizedPolar</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">)</span>
+      <span class="o">:</span> <span class="n">x</span><span class="p">(</span><span class="mf">0.0</span><span class="p">),</span> <span class="n">y</span><span class="p">(</span><span class="mf">0.0</span><span class="p">)</span> <span class="p">{</span>
+    <span class="p">}</span>
+    <span class="n">NormalizedPolar</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="p">,</span> <span class="n">Polar</span> <span class="o">&</span><span class="n">polar</span><span class="p">)</span>
+      <span class="o">:</span> <span class="n">x</span><span class="p">(</span><span class="n">polar</span><span class="p">.</span><span class="n">distance</span> <span class="o">*</span> <span class="n">cos</span><span class="p">(</span><span class="n">polar</span><span class="p">.</span><span class="n">angle</span><span class="p">)),</span>
+        <span class="n">y</span><span class="p">(</span><span class="n">polar</span><span class="p">.</span><span class="n">distance</span> <span class="o">*</span> <span class="n">sin</span><span class="p">(</span><span class="n">polar</span><span class="p">.</span><span class="n">angle</span><span class="p">))</span> <span class="p">{</span>
+    <span class="p">}</span>
+    <span class="n">Polar</span> <span class="n">denormalize</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="p">)</span> <span class="p">{</span>
+      <span class="k">return</span> <span class="n">Polar</span><span class="p">(</span><span class="n">sqrt</span><span class="p">(</span><span class="n">x</span><span class="o">*</span><span class="n">x</span><span class="o">+</span><span class="n">y</span><span class="o">*</span><span class="n">y</span><span class="p">),</span> <span class="n">arctan</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">));</span>
+    <span class="p">}</span>
+
+    <span class="kt">float</span>        <span class="n">x</span><span class="p">;</span>
+    <span class="kt">float</span>        <span class="n">y</span><span class="p">;</span>
+  <span class="p">};</span>
+
+  <span class="k">static</span> <span class="kt">void</span> <span class="nf">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Polar</span> <span class="o">&</span><span class="n">polar</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">MappingNormalization</span><span class="o"><</span><span class="n">NormalizedPolar</span><span class="p">,</span> <span class="n">Polar</span><span class="o">></span> <span class="n">keys</span><span class="p">(</span><span class="n">io</span><span class="p">,</span> <span class="n">polar</span><span class="p">);</span>
+
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span>    <span class="n">keys</span><span class="o">-></span><span class="n">x</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"y"</span><span class="p">,</span>    <span class="n">keys</span><span class="o">-></span><span class="n">y</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>When writing YAML, the local variable âkeysâ will be a stack allocated
+instance of NormalizedPolar, constructed from the supplied polar object which
+initializes it x and y fields.  The mapRequired() methods then write out the x
+and y values as key/value pairs.</p>
+<p>When reading YAML, the local variable âkeysâ will be a stack allocated instance
+of NormalizedPolar, constructed by the empty constructor.  The mapRequired
+methods will find the matching key in the YAML document and fill in the x and y
+fields of the NormalizedPolar object keys. At the end of the mapping() method
+when the local keys variable goes out of scope, the denormalize() method will
+automatically be called to convert the read values back to polar coordinates,
+and then assigned back to the second parameter to mapping().</p>
+<p>In some cases, the normalized class may be a subclass of the native type and
+could be returned by the denormalize() method, except that the temporary
+normalized instance is stack allocated.  In these cases, the utility template
+MappingNormalizationHeap<> can be used instead.  It just like
+MappingNormalization<> except that it heap allocates the normalized object
+when reading YAML.  It never destroys the normalized object.  The denormalize()
+method can this return âthisâ.</p>
+</div>
+<div class="section" id="default-values">
+<h3><a class="toc-backref" href="#id15">Default values</a><a class="headerlink" href="#default-values" title="Permalink to this headline">Â¶</a></h3>
+<p>Within a mapping() method, calls to io.mapRequired() mean that that key is
+required to exist when parsing YAML documents, otherwise YAML I/O will issue an
+error.</p>
+<p>On the other hand, keys registered with io.mapOptional() are allowed to not
+exist in the YAML document being read.  So what value is put in the field
+for those optional keys?
+There are two steps to how those optional fields are filled in. First, the
+second parameter to the mapping() method is a reference to a native class.  That
+native class must have a default constructor.  Whatever value the default
+constructor initially sets for an optional field will be that fieldâs value.
+Second, the mapOptional() method has an optional third parameter.  If provided
+it is the value that mapOptional() should set that field to if the YAML document
+does not have that key.</p>
+<p>There is one important difference between those two ways (default constructor
+and third parameter to mapOptional). When YAML I/O generates a YAML document,
+if the mapOptional() third parameter is used, if the actual value being written
+is the same as (using ==) the default value, then that key/value is not written.</p>
+</div>
+<div class="section" id="order-of-keys">
+<h3><a class="toc-backref" href="#id16">Order of Keys</a><a class="headerlink" href="#order-of-keys" title="Permalink to this headline">Â¶</a></h3>
+<p>When writing out a YAML document, the keys are written in the order that the
+calls to mapRequired()/mapOptional() are made in the mapping() method. This
+gives you a chance to write the fields in an order that a human reader of
+the YAML document would find natural.  This may be different that the order
+of the fields in the native class.</p>
+<p>When reading in a YAML document, the keys in the document can be in any order,
+but they are processed in the order that the calls to mapRequired()/mapOptional()
+are made in the mapping() method.  That enables some interesting
+functionality.  For instance, if the first field bound is the cpu and the second
+field bound is flags, and the flags are cpu specific, you can programmatically
+switch how the flags are converted to and from YAML based on the cpu.
+This works for both reading and writing. For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">struct</span> <span class="n">Info</span> <span class="p">{</span>
+  <span class="n">CPUs</span>        <span class="n">cpu</span><span class="p">;</span>
+  <span class="kt">uint32_t</span>    <span class="n">flags</span><span class="p">;</span>
+<span class="p">};</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Info</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Info</span> <span class="o">&</span><span class="n">info</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"cpu"</span><span class="p">,</span>       <span class="n">info</span><span class="p">.</span><span class="n">cpu</span><span class="p">);</span>
+    <span class="c1">// flags must come after cpu for this to work when reading yaml</span>
+    <span class="k">if</span> <span class="p">(</span> <span class="n">info</span><span class="p">.</span><span class="n">cpu</span> <span class="o">==</span> <span class="n">cpu_x86_64</span> <span class="p">)</span>
+      <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"flags"</span><span class="p">,</span>  <span class="o">*</span><span class="p">(</span><span class="n">My86_64Flags</span><span class="o">*</span><span class="p">)</span><span class="n">info</span><span class="p">.</span><span class="n">flags</span><span class="p">);</span>
+    <span class="k">else</span>
+      <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"flags"</span><span class="p">,</span>  <span class="o">*</span><span class="p">(</span><span class="n">My86Flags</span><span class="o">*</span><span class="p">)</span><span class="n">info</span><span class="p">.</span><span class="n">flags</span><span class="p">);</span>
+ <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="tags">
+<h3><a class="toc-backref" href="#id17">Tags</a><a class="headerlink" href="#tags" title="Permalink to this headline">Â¶</a></h3>
+<p>The YAML syntax supports tags as a way to specify the type of a node before
+it is parsed. This allows dynamic types of nodes.  But the YAML I/O model uses
+static typing, so there are limits to how you can use tags with the YAML I/O
+model. Recently, we added support to YAML I/O for checking/setting the optional
+tag on a map. Using this functionality it is even possbile to support different
+mappings, as long as they are convertible.</p>
+<p>To check a tag, inside your mapping() method you can use io.mapTag() to specify
+what the tag should be.  This will also add that tag when writing yaml.</p>
+</div>
+<div class="section" id="validation">
+<h3><a class="toc-backref" href="#id18">Validation</a><a class="headerlink" href="#validation" title="Permalink to this headline">Â¶</a></h3>
+<p>Sometimes in a yaml map, each key/value pair is valid, but the combination is
+not.  This is similar to something having no syntax errors, but still having
+semantic errors.  To support semantic level checking, YAML I/O allows
+an optional <code class="docutils literal"><span class="pre">validate()</span></code> method in a MappingTraits template specialization.</p>
+<p>When parsing yaml, the <code class="docutils literal"><span class="pre">validate()</span></code> method is call <em>after</em> all key/values in
+the map have been processed. Any error message returned by the <code class="docutils literal"><span class="pre">validate()</span></code>
+method during input will be printed just a like a syntax error would be printed.
+When writing yaml, the <code class="docutils literal"><span class="pre">validate()</span></code> method is called <em>before</em> the yaml
+key/values  are written.  Any error during output will trigger an <code class="docutils literal"><span class="pre">assert()</span></code>
+because it is a programming error to have invalid struct values.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">struct</span> <span class="n">Stuff</span> <span class="p">{</span>
+  <span class="p">...</span>
+<span class="p">};</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Stuff</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Stuff</span> <span class="o">&</span><span class="n">stuff</span><span class="p">)</span> <span class="p">{</span>
+  <span class="p">...</span>
+  <span class="p">}</span>
+  <span class="k">static</span> <span class="n">StringRef</span> <span class="n">validate</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Stuff</span> <span class="o">&</span><span class="n">stuff</span><span class="p">)</span> <span class="p">{</span>
+    <span class="c1">// Look at all fields in 'stuff' and if there</span>
+    <span class="c1">// are any bad values return a string describing</span>
+    <span class="c1">// the error.  Otherwise return an empty string.</span>
+    <span class="k">return</span> <span class="n">StringRef</span><span class="p">();</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="flow-mapping">
+<h3><a class="toc-backref" href="#id19">Flow Mapping</a><a class="headerlink" href="#flow-mapping" title="Permalink to this headline">Â¶</a></h3>
+<p>A YAML âflow mappingâ is a mapping that uses the inline notation
+(e.g { x: 1, y: 0 } ) when written to YAML. To specify that a type should be
+written in YAML using flow mapping, your MappingTraits specialization should
+add âstatic const bool flow = true;â. For instance:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">struct</span> <span class="n">Stuff</span> <span class="p">{</span>
+  <span class="p">...</span>
+<span class="p">};</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Stuff</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Stuff</span> <span class="o">&</span><span class="n">stuff</span><span class="p">)</span> <span class="p">{</span>
+    <span class="p">...</span>
+  <span class="p">}</span>
+
+  <span class="k">static</span> <span class="k">const</span> <span class="kt">bool</span> <span class="n">flow</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>Flow mappings are subject to line wrapping according to the Output object
+configuration.</p>
+</div>
+</div>
+<div class="section" id="sequence">
+<h2><a class="toc-backref" href="#id20">Sequence</a><a class="headerlink" href="#sequence" title="Permalink to this headline">Â¶</a></h2>
+<p>To be translated to or from a YAML sequence for your type T you must specialize
+llvm::yaml::SequenceTraits on T and implement two methods:
+<code class="docutils literal"><span class="pre">size_t</span> <span class="pre">size(IO</span> <span class="pre">&io,</span> <span class="pre">T&)</span></code> and
+<code class="docutils literal"><span class="pre">T::value_type&</span> <span class="pre">element(IO</span> <span class="pre">&io,</span> <span class="pre">T&,</span> <span class="pre">size_t</span> <span class="pre">indx)</span></code>.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">SequenceTraits</span><span class="o"><</span><span class="n">MySeq</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">size_t</span> <span class="n">size</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MySeq</span> <span class="o">&</span><span class="n">list</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
+  <span class="k">static</span> <span class="n">MySeqEl</span> <span class="o">&</span><span class="n">element</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MySeq</span> <span class="o">&</span><span class="n">list</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">index</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>The size() method returns how many elements are currently in your sequence.
+The element() method returns a reference to the iâth element in the sequence.
+When parsing YAML, the element() method may be called with an index one bigger
+than the current size.  Your element() method should allocate space for one
+more element (using default constructor if element is a C++ object) and returns
+a reference to that new allocated space.</p>
+<div class="section" id="flow-sequence">
+<h3><a class="toc-backref" href="#id21">Flow Sequence</a><a class="headerlink" href="#flow-sequence" title="Permalink to this headline">Â¶</a></h3>
+<p>A YAML âflow sequenceâ is a sequence that when written to YAML it uses the
+inline notation (e.g [ foo, bar ] ).  To specify that a sequence type should
+be written in YAML as a flow sequence, your SequenceTraits specialization should
+add âstatic const bool flow = true;â.  For instance:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">SequenceTraits</span><span class="o"><</span><span class="n">MyList</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">size_t</span> <span class="n">size</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MyList</span> <span class="o">&</span><span class="n">list</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
+  <span class="k">static</span> <span class="n">MyListEl</span> <span class="o">&</span><span class="n">element</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MyList</span> <span class="o">&</span><span class="n">list</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">index</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
+
+  <span class="c1">// The existence of this member causes YAML I/O to use a flow sequence</span>
+  <span class="k">static</span> <span class="k">const</span> <span class="kt">bool</span> <span class="n">flow</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>With the above, if you used MyList as the data type in your native data
+structures, then when converted to YAML, a flow sequence of integers
+will be used (e.g. [ 10, -3, 4 ]).</p>
+<p>Flow sequences are subject to line wrapping according to the Output object
+configuration.</p>
+</div>
+<div class="section" id="utility-macros">
+<h3><a class="toc-backref" href="#id22">Utility Macros</a><a class="headerlink" href="#utility-macros" title="Permalink to this headline">Â¶</a></h3>
+<p>Since a common source of sequences is std::vector<>, YAML I/O provides macros:
+LLVM_YAML_IS_SEQUENCE_VECTOR() and LLVM_YAML_IS_FLOW_SEQUENCE_VECTOR() which
+can be used to easily specify SequenceTraits<> on a std::vector type.  YAML
+I/O does not partial specialize SequenceTraits on std::vector<> because that
+would force all vectors to be sequences.  An example use of the macros:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">MyType1</span><span class="o">></span><span class="p">;</span>
+<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">MyType2</span><span class="o">></span><span class="p">;</span>
+<span class="n">LLVM_YAML_IS_SEQUENCE_VECTOR</span><span class="p">(</span><span class="n">MyType1</span><span class="p">)</span>
+<span class="n">LLVM_YAML_IS_FLOW_SEQUENCE_VECTOR</span><span class="p">(</span><span class="n">MyType2</span><span class="p">)</span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="document-list">
+<h2><a class="toc-backref" href="#id23">Document List</a><a class="headerlink" href="#document-list" title="Permalink to this headline">Â¶</a></h2>
+<p>YAML allows you to define multiple âdocumentsâ in a single YAML file.  Each
+new document starts with a left aligned âââ token.  The end of all documents
+is denoted with a left aligned ââ¦â token.  Many users of YAML will never
+have need for multiple documents.  The top level node in their YAML schema
+will be a mapping or sequence. For those cases, the following is not needed.
+But for cases where you do want multiple documents, you can specify a
+trait for you document list type.  The trait has the same methods as
+SequenceTraits but is named DocumentListTraits.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">DocumentListTraits</span><span class="o"><</span><span class="n">MyDocList</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">size_t</span> <span class="n">size</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MyDocList</span> <span class="o">&</span><span class="n">list</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
+  <span class="k">static</span> <span class="n">MyDocType</span> <span class="n">element</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MyDocList</span> <span class="o">&</span><span class="n">list</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">index</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="user-context-data">
+<h2><a class="toc-backref" href="#id24">User Context Data</a><a class="headerlink" href="#user-context-data" title="Permalink to this headline">Â¶</a></h2>
+<p>When an llvm::yaml::Input or llvm::yaml::Output object is created their
+constructors take an optional âcontextâ parameter.  This is a pointer to
+whatever state information you might need.</p>
+<p>For instance, in a previous example we showed how the conversion type for a
+flags field could be determined at runtime based on the value of another field
+in the mapping. But what if an inner mapping needs to know some field value
+of an outer mapping?  That is where the âcontextâ parameter comes in. You
+can set values in the context in the outer mapâs mapping() method and
+retrieve those values in the inner mapâs mapping() method.</p>
+<p>The context value is just a void*.  All your traits which use the context
+and operate on your native data types, need to agree what the context value
+actually is.  It could be a pointer to an object or struct which your various
+traits use to shared context sensitive information.</p>
+</div>
+<div class="section" id="output">
+<h2><a class="toc-backref" href="#id25">Output</a><a class="headerlink" href="#output" title="Permalink to this headline">Â¶</a></h2>
+<p>The llvm::yaml::Output class is used to generate a YAML document from your
+in-memory data structures, using traits defined on your data types.
+To instantiate an Output object you need an llvm::raw_ostream, an optional
+context pointer and an optional wrapping column:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Output</span> <span class="o">:</span> <span class="k">public</span> <span class="n">IO</span> <span class="p">{</span>
+<span class="k">public</span><span class="o">:</span>
+  <span class="n">Output</span><span class="p">(</span><span class="n">llvm</span><span class="o">::</span><span class="n">raw_ostream</span> <span class="o">&</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">context</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">,</span> <span class="kt">int</span> <span class="n">WrapColumn</span> <span class="o">=</span> <span class="mi">70</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>Once you have an Output object, you can use the C++ stream operator on it
+to write your native data as YAML. One thing to recall is that a YAML file
+can contain multiple âdocumentsâ.  If the top level data structure you are
+streaming as YAML is a mapping, scalar, or sequence, then Output assumes you
+are generating one document and wraps the mapping output
+with  â<code class="docutils literal"><span class="pre">---</span></code>â and trailing â<code class="docutils literal"><span class="pre">...</span></code>â.</p>
+<p>The WrapColumn parameter will cause the flow mappings and sequences to
+line-wrap when they go over the supplied column. Pass 0 to completely
+suppress the wrapping.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">Output</span><span class="p">;</span>
+
+<span class="kt">void</span> <span class="nf">dumpMyMapDoc</span><span class="p">(</span><span class="k">const</span> <span class="n">MyMapType</span> <span class="o">&</span><span class="n">info</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">Output</span> <span class="n">yout</span><span class="p">(</span><span class="n">llvm</span><span class="o">::</span><span class="n">outs</span><span class="p">());</span>
+  <span class="n">yout</span> <span class="o"><<</span> <span class="n">info</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The above could produce output like:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="nn">---</span>
+<span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+<span class="l l-Scalar l-Scalar-Plain">hat-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">7</span>
+<span class="nn">...</span>
+</pre></div>
+</div>
+<p>On the other hand, if the top level data structure you are streaming as YAML
+has a DocumentListTraits specialization, then Output walks through each element
+of your DocumentList and generates a âââ before the start of each element
+and ends with a ââ¦â.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">Output</span><span class="p">;</span>
+
+<span class="kt">void</span> <span class="nf">dumpMyMapDoc</span><span class="p">(</span><span class="k">const</span> <span class="n">MyDocListType</span> <span class="o">&</span><span class="n">docList</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">Output</span> <span class="n">yout</span><span class="p">(</span><span class="n">llvm</span><span class="o">::</span><span class="n">outs</span><span class="p">());</span>
+  <span class="n">yout</span> <span class="o"><<</span> <span class="n">docList</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The above could produce output like:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="nn">---</span>
+<span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+<span class="l l-Scalar l-Scalar-Plain">hat-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">7</span>
+<span class="nn">---</span>
+<span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+<span class="l l-Scalar l-Scalar-Plain">shoe-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">11</span>
+<span class="nn">...</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="input">
+<h2><a class="toc-backref" href="#id26">Input</a><a class="headerlink" href="#input" title="Permalink to this headline">Â¶</a></h2>
+<p>The llvm::yaml::Input class is used to parse YAML document(s) into your native
+data structures. To instantiate an Input
+object you need a StringRef to the entire YAML file, and optionally a context
+pointer:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Input</span> <span class="o">:</span> <span class="k">public</span> <span class="n">IO</span> <span class="p">{</span>
+<span class="k">public</span><span class="o">:</span>
+  <span class="n">Input</span><span class="p">(</span><span class="n">StringRef</span> <span class="n">inputContent</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">context</span><span class="o">=</span><span class="nb">NULL</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>Once you have an Input object, you can use the C++ stream operator to read
+the document(s).  If you expect there might be multiple YAML documents in
+one file, youâll need to specialize DocumentListTraits on a list of your
+document type and stream in that document list type.  Otherwise you can
+just stream in the document type.  Also, you can check if there was
+any syntax errors in the YAML be calling the error() method on the Input
+object.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// Reading a single document</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">Input</span><span class="p">;</span>
+
+<span class="n">Input</span> <span class="nf">yin</span><span class="p">(</span><span class="n">mb</span><span class="p">.</span><span class="n">getBuffer</span><span class="p">());</span>
+
+<span class="c1">// Parse the YAML file</span>
+<span class="n">MyDocType</span> <span class="n">theDoc</span><span class="p">;</span>
+<span class="n">yin</span> <span class="o">>></span> <span class="n">theDoc</span><span class="p">;</span>
+
+<span class="c1">// Check for error</span>
+<span class="k">if</span> <span class="p">(</span> <span class="n">yin</span><span class="p">.</span><span class="n">error</span><span class="p">()</span> <span class="p">)</span>
+  <span class="k">return</span><span class="p">;</span>
+</pre></div>
+</div>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// Reading multiple documents in one file</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">Input</span><span class="p">;</span>
+
+<span class="n">LLVM_YAML_IS_DOCUMENT_LIST_VECTOR</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">MyDocType</span><span class="o">></span><span class="p">)</span>
+
+<span class="n">Input</span> <span class="n">yin</span><span class="p">(</span><span class="n">mb</span><span class="p">.</span><span class="n">getBuffer</span><span class="p">());</span>
+
+<span class="c1">// Parse the YAML file</span>
+<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">MyDocType</span><span class="o">></span> <span class="n">theDocList</span><span class="p">;</span>
+<span class="n">yin</span> <span class="o">>></span> <span class="n">theDocList</span><span class="p">;</span>
+
+<span class="c1">// Check for error</span>
+<span class="k">if</span> <span class="p">(</span> <span class="n">yin</span><span class="p">.</span><span class="n">error</span><span class="p">()</span> <span class="p">)</span>
+  <span class="k">return</span><span class="p">;</span>
+</pre></div>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="GetElementPtr.html" title="The Often Misunderstood GEP Instruction"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="Passes.html" title="LLVMâs Analysis and Transform Passes"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/_images/ARM-BE-bitcastfail.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/ARM-BE-bitcastfail.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/ARM-BE-bitcastfail.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/ARM-BE-bitcastsuccess.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/ARM-BE-bitcastsuccess.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/ARM-BE-bitcastsuccess.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/ARM-BE-ld1.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/ARM-BE-ld1.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/ARM-BE-ld1.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/ARM-BE-ldr.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/ARM-BE-ldr.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/ARM-BE-ldr.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/LangImpl05-cfg.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/LangImpl05-cfg.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/LangImpl05-cfg.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/MCJIT-creation.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/MCJIT-creation.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/MCJIT-creation.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/MCJIT-dyld-load.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/MCJIT-dyld-load.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/MCJIT-dyld-load.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/MCJIT-engine-builder.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/MCJIT-engine-builder.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/MCJIT-engine-builder.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/MCJIT-load-object.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/MCJIT-load-object.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/MCJIT-load-object.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/MCJIT-load.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/MCJIT-load.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/MCJIT-load.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/MCJIT-resolve-relocations.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/MCJIT-resolve-relocations.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/MCJIT-resolve-relocations.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/gcc-loops.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/gcc-loops.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/gcc-loops.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/linpack-pc.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/linpack-pc.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/linpack-pc.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_sources/AMDGPUUsage.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_sources/AMDGPUUsage.rst.txt?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/_sources/AMDGPUUsage.rst.txt (added)
+++ www-releases/trunk/5.0.1/docs/_sources/AMDGPUUsage.rst.txt Thu Dec 21 10:09:53 2017
@@ -0,0 +1,3761 @@
+=============================
+User Guide for AMDGPU Backend
+=============================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+The AMDGPU backend provides ISA code generation for AMD GPUs, starting with the
+R600 family up until the current GCN families. It lives in the
+``lib/Target/AMDGPU`` directory.
+
+LLVM
+====
+
+.. _amdgpu-target-triples:
+
+Target Triples
+--------------
+
+Use the ``clang -target <Architecture>-<Vendor>-<OS>-<Environment>`` option to
+specify the target triple:
+
+  .. table:: AMDGPU Target Triples
+     :name: amdgpu-target-triples-table
+
+     ============ ======== ========= ===========
+     Architecture Vendor   OS        Environment
+     ============ ======== ========= ===========
+     r600         amd      <empty>   <empty>
+     amdgcn       amd      <empty>   <empty>
+     amdgcn       amd      amdhsa    <empty>
+     amdgcn       amd      amdhsa    opencl
+     amdgcn       amd      amdhsa    amdgizcl
+     amdgcn       amd      amdhsa    amdgiz
+     amdgcn       amd      amdhsa    hcc
+     ============ ======== ========= ===========
+
+``r600-amd--``
+  Supports AMD GPUs HD2XXX-HD6XXX for graphics and compute shaders executed on
+  the MESA runtime.
+
+``amdgcn-amd--``
+  Supports AMD GPUs GCN 6 onwards for graphics and compute shaders executed on
+  the MESA runtime.
+
+``amdgcn-amd-amdhsa-``
+  Supports AMD GCN GPUs GFX6 onwards for compute kernels executed on HSA [HSA]_
+  compatible runtimes such as AMD's ROCm [AMD-ROCm]_.
+
+``amdgcn-amd-amdhsa-opencl``
+  Supports AMD GCN GPUs GFX6 onwards for OpenCL compute kernels executed on HSA
+  [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. See
+  :ref:`amdgpu-opencl`.
+
+``amdgcn-amd-amdhsa-amdgizcl``
+  Same as ``amdgcn-amd-amdhsa-opencl`` except a different address space mapping
+  is used (see :ref:`amdgpu-address-spaces`).
+
+``amdgcn-amd-amdhsa-amdgiz``
+  Same as ``amdgcn-amd-amdhsa-`` except a different address space mapping is
+  used (see :ref:`amdgpu-address-spaces`).
+
+``amdgcn-amd-amdhsa-hcc``
+  Supports AMD GCN GPUs GFX6 onwards for AMD HC language compute kernels
+  executed on HSA [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. See
+  :ref:`amdgpu-hcc`.
+
+.. _amdgpu-processors:
+
+Processors
+----------
+
+Use the ``clang -mcpu <Processor>`` option to specify the AMD GPU processor. The
+names from both the *Processor* and *Alternative Processor* can be used.
+
+  .. table:: AMDGPU Processors
+     :name: amdgpu-processors-table
+
+     ========== =========== ============ ===== ======= ==================
+     Processor  Alternative Target       dGPU/ Runtime Example
+                Processor   Triple       APU   Support Products
+                            Architecture
+     ========== =========== ============ ===== ======= ==================
+     **R600** [AMD-R6xx]_
+     --------------------------------------------------------------------
+     r600                   r600         dGPU
+     r630                   r600         dGPU
+     rs880                  r600         dGPU
+     rv670                  r600         dGPU
+     **R700** [AMD-R7xx]_
+     --------------------------------------------------------------------
+     rv710                  r600         dGPU
+     rv730                  r600         dGPU
+     rv770                  r600         dGPU
+     **Evergreen** [AMD-Evergreen]_
+     --------------------------------------------------------------------
+     cedar                  r600         dGPU
+     redwood                r600         dGPU
+     sumo                   r600         dGPU
+     juniper                r600         dGPU
+     cypress                r600         dGPU
+     **Northern Islands** [AMD-Cayman-Trinity]_
+     --------------------------------------------------------------------
+     barts                  r600         dGPU
+     turks                  r600         dGPU
+     caicos                 r600         dGPU
+     cayman                 r600         dGPU
+     **GCN GFX6 (Southern Islands (SI))** [AMD-Souther-Islands]_
+     --------------------------------------------------------------------
+     gfx600     - SI        amdgcn       dGPU
+                - tahiti
+     gfx601     - pitcairn  amdgcn       dGPU
+                - verde
+                - oland
+                - hainan
+     **GCN GFX7 (Sea Islands (CI))** [AMD-Sea-Islands]_
+     --------------------------------------------------------------------
+     gfx700     - bonaire   amdgcn       dGPU          - Radeon HD 7790
+                                                       - Radeon HD 8770
+                                                       - R7 260
+                                                       - R7 260X
+     \          - kaveri    amdgcn       APU           - A6-7000
+                                                       - A6 Pro-7050B
+                                                       - A8-7100
+                                                       - A8 Pro-7150B
+                                                       - A10-7300
+                                                       - A10 Pro-7350B
+                                                       - FX-7500
+                                                       - A8-7200P
+                                                       - A10-7400P
+                                                       - FX-7600P
+     gfx701     - hawaii    amdgcn       dGPU  ROCm    - FirePro W8100
+                                                       - FirePro W9100
+                                                       - FirePro S9150
+                                                       - FirePro S9170
+     gfx702                              dGPU  ROCm    - Radeon R9 290
+                                                       - Radeon R9 290x
+                                                       - Radeon R390
+                                                       - Radeon R390x
+     gfx703     - kabini    amdgcn       APU           - E1-2100
+                - mullins                              - E1-2200
+                                                       - E1-2500
+                                                       - E2-3000
+                                                       - E2-3800
+                                                       - A4-5000
+                                                       - A4-5100
+                                                       - A6-5200
+                                                       - A4 Pro-3340B
+     **GCN GFX8 (Volcanic Islands (VI))** [AMD-Volcanic-Islands]_
+     --------------------------------------------------------------------
+     gfx800     - iceland   amdgcn       dGPU          - FirePro S7150
+                                                       - FirePro S7100
+                                                       - FirePro W7100
+                                                       - Radeon R285
+                                                       - Radeon R9 380
+                                                       - Radeon R9 385
+                                                       - Mobile FirePro
+                                                         M7170
+     gfx801     - carrizo   amdgcn       APU           - A6-8500P
+                                                       - Pro A6-8500B
+                                                       - A8-8600P
+                                                       - Pro A8-8600B
+                                                       - FX-8800P
+                                                       - Pro A12-8800B
+     \                      amdgcn       APU   ROCm    - A10-8700P
+                                                       - Pro A10-8700B
+                                                       - A10-8780P
+     \                      amdgcn       APU           - A10-9600P
+                                                       - A10-9630P
+                                                       - A12-9700P
+                                                       - A12-9730P
+                                                       - FX-9800P
+                                                       - FX-9830P
+     \                      amdgcn       APU           - E2-9010
+                                                       - A6-9210
+                                                       - A9-9410
+     gfx802     - tonga     amdgcn       dGPU  ROCm    Same as gfx800
+     gfx803     - fiji      amdgcn       dGPU  ROCm    - Radeon R9 Nano
+                                                       - Radeon R9 Fury
+                                                       - Radeon R9 FuryX
+                                                       - Radeon Pro Duo
+                                                       - FirePro S9300x2
+     \          - polaris10 amdgcn       dGPU  ROCm    - Radeon RX 470
+                                                       - Radeon RX 480
+     \          - polaris11 amdgcn       dGPU  ROCm    - Radeon RX 460
+     gfx804                 amdgcn       dGPU          Same as gfx803
+     gfx810     - stoney    amdgcn       APU
+     **GCN GFX9**
+     --------------------------------------------------------------------
+     gfx900                 amdgcn       dGPU          - Radeon Vega Frontier Edition
+     gfx901                 amdgcn       dGPU  ROCm    Same as gfx900
+                                                       except XNACK is
+                                                       enabled
+     gfx902                 amdgcn       APU           *TBA*
+
+                                                       .. TODO
+                                                          Add product
+                                                          names.
+     gfx903                 amdgcn       APU           Same as gfx902
+                                                       except XNACK is
+                                                       enabled
+     ========== =========== ============ ===== ======= ==================
+
+.. _amdgpu-address-spaces:
+
+Address Spaces
+--------------
+
+The AMDGPU backend uses the following address space mappings.
+
+The memory space names used in the table, aside from the region memory space, is
+from the OpenCL standard.
+
+LLVM Address Space number is used throughout LLVM (for example, in LLVM IR).
+
+  .. table:: Address Space Mapping
+     :name: amdgpu-address-space-mapping-table
+
+     ================== ================= ================= ================= =================
+     LLVM Address Space Memory Space
+     ------------------ -----------------------------------------------------------------------
+     \                  Current Default   amdgiz/amdgizcl   hcc               Future Default
+     ================== ================= ================= ================= =================
+     0                  Private (Scratch) Generic (Flat)    Generic (Flat)    Generic (Flat)
+     1                  Global            Global            Global            Global
+     2                  Constant          Constant          Constant          Region (GDS)
+     3                  Local (group/LDS) Local (group/LDS) Local (group/LDS) Local (group/LDS)
+     4                  Generic (Flat)    Region (GDS)      Region (GDS)      Constant
+     5                  Region (GDS)      Private (Scratch) Private (Scratch) Private (Scratch)
+     ================== ================= ================= ================= =================
+
+Current Default
+  This is the current default address space mapping used for all languages
+  except hcc. This will shortly be deprecated.
+
+amdgiz/amdgizcl
+  This is the current address space mapping used when ``amdgiz`` or ``amdgizcl``
+  is specified as the target triple environment value.
+
+hcc
+  This is the current address space mapping used when ``hcc`` is specified as
+  the target triple environment value.This will shortly be deprecated.
+
+Future Default
+  This will shortly be the only address space mapping for all languages using
+  AMDGPU backend.
+
+.. _amdgpu-memory-scopes:
+
+Memory Scopes
+-------------
+
+This section provides LLVM memory synchronization scopes supported by the AMDGPU
+backend memory model when the target triple OS is ``amdhsa`` (see
+:ref:`amdgpu-amdhsa-memory-model` and :ref:`amdgpu-target-triples`).
+
+The memory model supported is based on the HSA memory model [HSA]_ which is
+based in turn on HRF-indirect with scope inclusion [HRF]_. The happens-before
+relation is transitive over the synchonizes-with relation independent of scope,
+and synchonizes-with allows the memory scope instances to be inclusive (see
+table :ref:`amdgpu-amdhsa-llvm-sync-scopes-amdhsa-table`).
+
+This is different to the OpenCL [OpenCL]_ memory model which does not have scope
+inclusion and requires the memory scopes to exactly match. However, this
+is conservatively correct for OpenCL.
+
+  .. table:: AMDHSA LLVM Sync Scopes for AMDHSA
+     :name: amdgpu-amdhsa-llvm-sync-scopes-amdhsa-table
+
+     ================ ==========================================================
+     LLVM Sync Scope  Description
+     ================ ==========================================================
+     *none*           The default: ``system``.
+
+                      Synchronizes with, and participates in modification and
+                      seq_cst total orderings with, other operations (except
+                      image operations) for all address spaces (except private,
+                      or generic that accesses private) provided the other
+                      operation's sync scope is:
+
+                      - ``system``.
+                      - ``agent`` and executed by a thread on the same agent.
+                      - ``workgroup`` and executed by a thread in the same
+                        workgroup.
+                      - ``wavefront`` and executed by a thread in the same
+                        wavefront.
+
+     ``agent``        Synchronizes with, and participates in modification and
+                      seq_cst total orderings with, other operations (except
+                      image operations) for all address spaces (except private,
+                      or generic that accesses private) provided the other
+                      operation's sync scope is:
+
+                      - ``system`` or ``agent`` and executed by a thread on the
+                        same agent.
+                      - ``workgroup`` and executed by a thread in the same
+                        workgroup.
+                      - ``wavefront`` and executed by a thread in the same
+                        wavefront.
+
+     ``workgroup``    Synchronizes with, and participates in modification and
+                      seq_cst total orderings with, other operations (except
+                      image operations) for all address spaces (except private,
+                      or generic that accesses private) provided the other
+                      operation's sync scope is:
+
+                      - ``system``, ``agent`` or ``workgroup`` and executed by a
+                        thread in the same workgroup.
+                      - ``wavefront`` and executed by a thread in the same
+                        wavefront.
+
+     ``wavefront``    Synchronizes with, and participates in modification and
+                      seq_cst total orderings with, other operations (except
+                      image operations) for all address spaces (except private,
+                      or generic that accesses private) provided the other
+                      operation's sync scope is:
+
+                      - ``system``, ``agent``, ``workgroup`` or ``wavefront``
+                        and executed by a thread in the same wavefront.
+
+     ``singlethread`` Only synchronizes with, and participates in modification
+                      and seq_cst total orderings with, other operations (except
+                      image operations) running in the same thread for all
+                      address spaces (for example, in signal handlers).
+     ================ ==========================================================
+
+AMDGPU Intrinsics
+-----------------
+
+The AMDGPU backend implements the following intrinsics.
+
+*This section is WIP.*
+
+.. TODO
+   List AMDGPU intrinsics
+
+Code Object
+===========
+
+The AMDGPU backend generates a standard ELF [ELF]_ relocatable code object that
+can be linked by ``lld`` to produce a standard ELF shared code object which can
+be loaded and executed on an AMDGPU target.
+
+Header
+------
+
+The AMDGPU backend uses the following ELF header:
+
+  .. table:: AMDGPU ELF Header
+     :name: amdgpu-elf-header-table
+
+     ========================== =========================
+     Field                      Value
+     ========================== =========================
+     ``e_ident[EI_CLASS]``      ``ELFCLASS64``
+     ``e_ident[EI_DATA]``       ``ELFDATA2LSB``
+     ``e_ident[EI_OSABI]``      ``ELFOSABI_AMDGPU_HSA``
+     ``e_ident[EI_ABIVERSION]`` ``ELFABIVERSION_AMDGPU_HSA``
+     ``e_type``                 ``ET_REL`` or ``ET_DYN``
+     ``e_machine``              ``EM_AMDGPU``
+     ``e_entry``                0
+     ``e_flags``                0
+     ========================== =========================
+
+..
+
+  .. table:: AMDGPU ELF Header Enumeration Values
+     :name: amdgpu-elf-header-enumeration-values-table
+
+     ============================ =====
+     Name                         Value
+     ============================ =====
+     ``EM_AMDGPU``                224
+     ``ELFOSABI_AMDGPU_HSA``      64
+     ``ELFABIVERSION_AMDGPU_HSA`` 1
+     ============================ =====
+
+``e_ident[EI_CLASS]``
+  The ELF class is always ``ELFCLASS64``. The AMDGPU backend only supports 64 bit
+  applications.
+
+``e_ident[EI_DATA]``
+  All AMDGPU targets use ELFDATA2LSB for little-endian byte ordering.
+
+``e_ident[EI_OSABI]``
+  The AMD GPU architecture specific OS ABI of ``ELFOSABI_AMDGPU_HSA`` is used to
+  specify that the code object conforms to the AMD HSA runtime ABI [HSA]_.
+
+``e_ident[EI_ABIVERSION]``
+  The AMD GPU architecture specific OS ABI version of
+  ``ELFABIVERSION_AMDGPU_HSA`` is used to specify the version of AMD HSA runtime
+  ABI to which the code object conforms.
+
+``e_type``
+  Can be one of the following values:
+
+
+  ``ET_REL``
+    The type produced by the AMD GPU backend compiler as it is relocatable code
+    object.
+
+  ``ET_DYN``
+    The type produced by the linker as it is a shared code object.
+
+  The AMD HSA runtime loader requires a ``ET_DYN`` code object.
+
+``e_machine``
+  The value ``EM_AMDGPU`` is used for the machine for all members of the AMD GPU
+  architecture family. The specific member is specified in the
+  ``NT_AMD_AMDGPU_ISA`` entry in the ``.note`` section (see
+  :ref:`amdgpu-note-records`).
+
+``e_entry``
+  The entry point is 0 as the entry points for individual kernels must be
+  selected in order to invoke them through AQL packets.
+
+``e_flags``
+  The value is 0 as no flags are used.
+
+Sections
+--------
+
+An AMDGPU target ELF code object has the standard ELF sections which include:
+
+  .. table:: AMDGPU ELF Sections
+     :name: amdgpu-elf-sections-table
+
+     ================== ================ =================================
+     Name               Type             Attributes
+     ================== ================ =================================
+     ``.bss``           ``SHT_NOBITS``   ``SHF_ALLOC`` + ``SHF_WRITE``
+     ``.data``          ``SHT_PROGBITS`` ``SHF_ALLOC`` + ``SHF_WRITE``
+     ``.debug_``\ *\**  ``SHT_PROGBITS`` *none*
+     ``.dynamic``       ``SHT_DYNAMIC``  ``SHF_ALLOC``
+     ``.dynstr``        ``SHT_PROGBITS`` ``SHF_ALLOC``
+     ``.dynsym``        ``SHT_PROGBITS`` ``SHF_ALLOC``
+     ``.got``           ``SHT_PROGBITS`` ``SHF_ALLOC`` + ``SHF_WRITE``
+     ``.hash``          ``SHT_HASH``     ``SHF_ALLOC``
+     ``.note``          ``SHT_NOTE``     *none*
+     ``.rela``\ *name*  ``SHT_RELA``     *none*
+     ``.rela.dyn``      ``SHT_RELA``     *none*
+     ``.rodata``        ``SHT_PROGBITS`` ``SHF_ALLOC``
+     ``.shstrtab``      ``SHT_STRTAB``   *none*
+     ``.strtab``        ``SHT_STRTAB``   *none*
+     ``.symtab``        ``SHT_SYMTAB``   *none*
+     ``.text``          ``SHT_PROGBITS`` ``SHF_ALLOC`` + ``SHF_EXECINSTR``
+     ================== ================ =================================
+
+These sections have their standard meanings (see [ELF]_) and are only generated
+if needed.
+
+``.debug``\ *\**
+  The standard DWARF sections. See :ref:`amdgpu-dwarf` for information on the
+  DWARF produced by the AMDGPU backend.
+
+``.dynamic``, ``.dynstr``, ``.dynstr``, ``.hash``
+  The standard sections used by a dynamic loader.
+
+``.note``
+  See :ref:`amdgpu-note-records` for the note records supported by the AMDGPU
+  backend.
+
+``.rela``\ *name*, ``.rela.dyn``
+  For relocatable code objects, *name* is the name of the section that the
+  relocation records apply. For example, ``.rela.text`` is the section name for
+  relocation records associated with the ``.text`` section.
+
+  For linked shared code objects, ``.rela.dyn`` contains all the relocation
+  records from each of the relocatable code object's ``.rela``\ *name* sections.
+
+  See :ref:`amdgpu-relocation-records` for the relocation records supported by
+  the AMDGPU backend.
+
+``.text``
+  The executable machine code for the kernels and functions they call. Generated
+  as position independent code. See :ref:`amdgpu-code-conventions` for
+  information on conventions used in the isa generation.
+
+.. _amdgpu-note-records:
+
+Note Records
+------------
+
+As required by ``ELFCLASS64``, minimal zero byte padding must be generated after
+the ``name`` field to ensure the ``desc`` field is 4 byte aligned. In addition,
+minimal zero byte padding must be generated to ensure the ``desc`` field size is
+a multiple of 4 bytes. The ``sh_addralign`` field of the ``.note`` section must
+be at least 4 to indicate at least 8 byte alignment.
+
+The AMDGPU backend code object uses the following ELF note records in the
+``.note`` section. The *Description* column specifies the layout of the note
+recordâs ``desc`` field. All fields are consecutive bytes. Note records with
+variable size strings have a corresponding ``*_size`` field that specifies the
+number of bytes, including the terminating null character, in the string. The
+string(s) come immediately after the preceding fields.
+
+Additional note records can be present.
+
+  .. table:: AMDGPU ELF Note Records
+     :name: amdgpu-elf-note-records-table
+
+     ===== ========================== ==========================================
+     Name  Type                       Description
+     ===== ========================== ==========================================
+     "AMD" ``NT_AMD_AMDGPU_METADATA`` <metadata null terminated string>
+     "AMD" ``NT_AMD_AMDGPU_ISA``      <isa name null terminated string>
+     ===== ========================== ==========================================
+
+..
+
+  .. table:: AMDGPU ELF Note Record Enumeration Values
+     :name: amdgpu-elf-note-record-enumeration-values-table
+
+     ============================= =====
+     Name                          Value
+     ============================= =====
+     *reserved*                    0-9
+     ``NT_AMD_AMDGPU_METADATA``    10
+     ``NT_AMD_AMDGPU_ISA``         11
+     ============================= =====
+
+``NT_AMD_AMDGPU_ISA``
+  Specifies the instruction set architecture used by the machine code contained
+  in the code object.
+
+  This note record is required for code objects containing machine code for
+  processors matching the ``amdgcn`` architecture in table
+  :ref:`amdgpu-processors`.
+
+  The null terminated string has the following syntax:
+
+    *architecture*\ ``-``\ *vendor*\ ``-``\ *os*\ ``-``\ *environment*\ ``-``\ *processor*
+
+  where:
+
+    *architecture*
+      The architecture from table :ref:`amdgpu-target-triples-table`.
+
+      This is always ``amdgcn`` when the target triple OS is ``amdhsa`` (see
+      :ref:`amdgpu-target-triples`).
+
+    *vendor*
+      The vendor from table :ref:`amdgpu-target-triples-table`.
+
+      For the AMDGPU backend this is always ``amd``.
+
+    *os*
+      The OS from table :ref:`amdgpu-target-triples-table`.
+
+    *environment*
+      An environment from table :ref:`amdgpu-target-triples-table`, or blank if
+      the environment has no affect on the execution of the code object.
+
+      For the AMDGPU backend this is currently always blank.
+    *processor*
+      The processor from table :ref:`amdgpu-processors-table`.
+
+  For example:
+
+    ``amdgcn-amd-amdhsa--gfx901``
+
+``NT_AMD_AMDGPU_METADATA``
+  Specifies extensible metadata associated with the code object. See
+  :ref:`amdgpu-code-object-metadata` for the syntax of the code object metadata
+  string.
+
+  This note record is required and must contain the minimum information
+  necessary to support the ROCM kernel queries. For example, the segment sizes
+  needed in a dispatch packet. In addition, a high level language runtime may
+  require other information to be included. For example, the AMD OpenCL runtime
+  records kernel argument information.
+
+  .. TODO
+     Is the string null terminated? It probably should not if YAML allows it to
+     contain null characters, otherwise it should be.
+
+.. _amdgpu-code-object-metadata:
+
+Code Object Metadata
+--------------------
+
+The code object metadata is specified by the ``NT_AMD_AMDHSA_METADATA`` note
+record (see :ref:`amdgpu-note-records`).
+
+The metadata is specified as a YAML formatted string (see [YAML]_ and
+:doc:`YamlIO`).
+
+The metadata is represented as a single YAML document comprised of the mapping
+defined in table :ref:`amdgpu-amdhsa-code-object-metadata-mapping-table` and
+referenced tables.
+
+For boolean values, the string values of ``false`` and ``true`` are used for
+false and true respectively.
+
+Additional information can be added to the mappings. To avoid conflicts, any
+non-AMD key names should be prefixed by "*vendor-name*.".
+
+  .. table:: AMDHSA Code Object Metadata Mapping
+     :name: amdgpu-amdhsa-code-object-metadata-mapping-table
+
+     ========== ============== ========= =======================================
+     String Key Value Type     Required? Description
+     ========== ============== ========= =======================================
+     "Version"  sequence of    Required  - The first integer is the major
+                2 integers                 version. Currently 1.
+                                         - The second integer is the minor
+                                           version. Currently 0.
+     "Printf"   sequence of              Each string is encoded information
+                strings                  about a printf function call. The
+                                         encoded information is organized as
+                                         fields separated by colon (':'):
+
+                                         ``ID:N:S[0]:S[1]:...:S[N-1]:FormatString``
+
+                                         where:
+
+                                         ``ID``
+                                           A 32 bit integer as a unique id for
+                                           each printf function call
+
+                                         ``N``
+                                           A 32 bit integer equal to the number
+                                           of arguments of printf function call
+                                           minus 1
+
+                                         ``S[i]`` (where i = 0, 1, ... , N-1)
+                                           32 bit integers for the size in bytes
+                                           of the i-th FormatString argument of
+                                           the printf function call
+
+                                         FormatString
+                                           The format string passed to the
+                                           printf function call.
+     "Kernels"  sequence of    Required  Sequence of the mappings for each
+                mapping                  kernel in the code object. See
+                                         :ref:`amdgpu-amdhsa-code-object-kernel-metadata-mapping-table`
+                                         for the definition of the mapping.
+     ========== ============== ========= =======================================
+
+..
+
+  .. table:: AMDHSA Code Object Kernel Metadata Mapping
+     :name: amdgpu-amdhsa-code-object-kernel-metadata-mapping-table
+
+     ================= ============== ========= ================================
+     String Key        Value Type     Required? Description
+     ================= ============== ========= ================================
+     "Name"            string         Required  Source name of the kernel.
+     "SymbolName"      string         Required  Name of the kernel
+                                                descriptor ELF symbol.
+     "Language"        string                   Source language of the kernel.
+                                                Values include:
+
+                                                - "OpenCL C"
+                                                - "OpenCL C++"
+                                                - "HCC"
+                                                - "OpenMP"
+
+     "LanguageVersion" sequence of              - The first integer is the major
+                       2 integers                 version.
+                                                - The second integer is the
+                                                  minor version.
+     "Attrs"           mapping                  Mapping of kernel attributes.
+                                                See
+                                                :ref:`amdgpu-amdhsa-code-object-kernel-attribute-metadata-mapping-table`
+                                                for the mapping definition.
+     "Arguments"       sequence of              Sequence of mappings of the
+                       mapping                  kernel arguments. See
+                                                :ref:`amdgpu-amdhsa-code-object-kernel-argument-metadata-mapping-table`
+                                                for the definition of the mapping.
+     "CodeProps"       mapping                  Mapping of properties related to
+                                                the kernel code. See
+                                                :ref:`amdgpu-amdhsa-code-object-kernel-code-properties-metadata-mapping-table`
+                                                for the mapping definition.
+     "DebugProps"      mapping                  Mapping of properties related to
+                                                the kernel debugging. See
+                                                :ref:`amdgpu-amdhsa-code-object-kernel-debug-properties-metadata-mapping-table`
+                                                for the mapping definition.
+     ================= ============== ========= ================================
+
+..
+
+  .. table:: AMDHSA Code Object Kernel Attribute Metadata Mapping
+     :name: amdgpu-amdhsa-code-object-kernel-attribute-metadata-mapping-table
+
+     =================== ============== ========= ==============================
+     String Key          Value Type     Required? Description
+     =================== ============== ========= ==============================
+     "ReqdWorkGroupSize" sequence of              The dispatch work-group size
+                         3 integers               X, Y, Z must correspond to the
+                                                  specified values.
+
+                                                  Corresponds to the OpenCL
+                                                  ``reqd_work_group_size``
+                                                  attribute.
+     "WorkGroupSizeHint" sequence of              The dispatch work-group size
+                         3 integers               X, Y, Z is likely to be the
+                                                  specified values.
+
+                                                  Corresponds to the OpenCL
+                                                  ``work_group_size_hint``
+                                                  attribute.
+     "VecTypeHint"       string                   The name of a scalar or vector
+                                                  type.
+
+                                                  Corresponds to the OpenCL
+                                                  ``vec_type_hint`` attribute.
+     =================== ============== ========= ==============================
+
+..
+
+  .. table:: AMDHSA Code Object Kernel Argument Metadata Mapping
+     :name: amdgpu-amdhsa-code-object-kernel-argument-metadata-mapping-table
+
+     ================= ============== ========= ================================
+     String Key        Value Type     Required? Description
+     ================= ============== ========= ================================
+     "Name"            string                   Kernel argument name.
+     "TypeName"        string                   Kernel argument type name.
+     "Size"            integer        Required  Kernel argument size in bytes.
+     "Align"           integer        Required  Kernel argument alignment in
+                                                bytes. Must be a power of two.
+     "ValueKind"       string         Required  Kernel argument kind that
+                                                specifies how to set up the
+                                                corresponding argument.
+                                                Values include:
+
+                                                "ByValue"
+                                                  The argument is copied
+                                                  directly into the kernarg.
+
+                                                "GlobalBuffer"
+                                                  A global address space pointer
+                                                  to the buffer data is passed
+                                                  in the kernarg.
+
+                                                "DynamicSharedPointer"
+                                                  A group address space pointer
+                                                  to dynamically allocated LDS
+                                                  is passed in the kernarg.
+
+                                                "Sampler"
+                                                  A global address space
+                                                  pointer to a S# is passed in
+                                                  the kernarg.
+
+                                                "Image"
+                                                  A global address space
+                                                  pointer to a T# is passed in
+                                                  the kernarg.
+
+                                                "Pipe"
+                                                  A global address space pointer
+                                                  to an OpenCL pipe is passed in
+                                                  the kernarg.
+
+                                                "Queue"
+                                                  A global address space pointer
+                                                  to an OpenCL device enqueue
+                                                  queue is passed in the
+                                                  kernarg.
+
+                                                "HiddenGlobalOffsetX"
+                                                  The OpenCL grid dispatch
+                                                  global offset for the X
+                                                  dimension is passed in the
+                                                  kernarg.
+
+                                                "HiddenGlobalOffsetY"
+                                                  The OpenCL grid dispatch
+                                                  global offset for the Y
+                                                  dimension is passed in the
+                                                  kernarg.
+
+                                                "HiddenGlobalOffsetZ"
+                                                  The OpenCL grid dispatch
+                                                  global offset for the Z
+                                                  dimension is passed in the
+                                                  kernarg.
+
+                                                "HiddenNone"
+                                                  An argument that is not used
+                                                  by the kernel. Space needs to
+                                                  be left for it, but it does
+                                                  not need to be set up.
+
+                                                "HiddenPrintfBuffer"
+                                                  A global address space pointer
+                                                  to the runtime printf buffer
+                                                  is passed in kernarg.
+
+                                                "HiddenDefaultQueue"
+                                                  A global address space pointer
+                                                  to the OpenCL device enqueue
+                                                  queue that should be used by
+                                                  the kernel by default is
+                                                  passed in the kernarg.
+
+                                                "HiddenCompletionAction"
+                                                  *TBD*
+
+                                                  .. TODO
+                                                     Add description.
+
+     "ValueType"       string         Required  Kernel argument value type. Only
+                                                present if "ValueKind" is
+                                                "ByValue". For vector data
+                                                types, the value is for the
+                                                element type. Values include:
+
+                                                - "Struct"
+                                                - "I8"
+                                                - "U8"
+                                                - "I16"
+                                                - "U16"
+                                                - "F16"
+                                                - "I32"
+                                                - "U32"
+                                                - "F32"
+                                                - "I64"
+                                                - "U64"
+                                                - "F64"
+
+                                                .. TODO
+                                                   How can it be determined if a
+                                                   vector type, and what size
+                                                   vector?
+     "PointeeAlign"    integer                  Alignment in bytes of pointee
+                                                type for pointer type kernel
+                                                argument. Must be a power
+                                                of 2. Only present if
+                                                "ValueKind" is
+                                                "DynamicSharedPointer".
+     "AddrSpaceQual"   string                   Kernel argument address space
+                                                qualifier. Only present if
+                                                "ValueKind" is "GlobalBuffer" or
+                                                "DynamicSharedPointer". Values
+                                                are:
+
+                                                - "Private"
+                                                - "Global"
+                                                - "Constant"
+                                                - "Local"
+                                                - "Generic"
+                                                - "Region"
+
+                                                .. TODO
+                                                   Is GlobalBuffer only Global
+                                                   or Constant? Is
+                                                   DynamicSharedPointer always
+                                                   Local? Can HCC allow Generic?
+                                                   How can Private or Region
+                                                   ever happen?
+     "AccQual"         string                   Kernel argument access
+                                                qualifier. Only present if
+                                                "ValueKind" is "Image" or
+                                                "Pipe". Values
+                                                are:
+
+                                                - "ReadOnly"
+                                                - "WriteOnly"
+                                                - "ReadWrite"
+
+                                                .. TODO
+                                                   Does this apply to
+                                                   GlobalBuffer?
+     "ActualAcc"       string                   The actual memory accesses
+                                                performed by the kernel on the
+                                                kernel argument. Only present if
+                                                "ValueKind" is "GlobalBuffer",
+                                                "Image", or "Pipe". This may be
+                                                more restrictive than indicated
+                                                by "AccQual" to reflect what the
+                                                kernel actual does. If not
+                                                present then the runtime must
+                                                assume what is implied by
+                                                "AccQual" and "IsConst". Values
+                                                are:
+
+                                                - "ReadOnly"
+                                                - "WriteOnly"
+                                                - "ReadWrite"
+
+     "IsConst"         boolean                  Indicates if the kernel argument
+                                                is const qualified. Only present
+                                                if "ValueKind" is
+                                                "GlobalBuffer".
+
+     "IsRestrict"      boolean                  Indicates if the kernel argument
+                                                is restrict qualified. Only
+                                                present if "ValueKind" is
+                                                "GlobalBuffer".
+
+     "IsVolatile"      boolean                  Indicates if the kernel argument
+                                                is volatile qualified. Only
+                                                present if "ValueKind" is
+                                                "GlobalBuffer".
+
+     "IsPipe"          boolean                  Indicates if the kernel argument
+                                                is pipe qualified. Only present
+                                                if "ValueKind" is "Pipe".
+
+                                                .. TODO
+                                                   Can GlobalBuffer be pipe
+                                                   qualified?
+     ================= ============== ========= ================================
+
+..
+
+  .. table:: AMDHSA Code Object Kernel Code Properties Metadata Mapping
+     :name: amdgpu-amdhsa-code-object-kernel-code-properties-metadata-mapping-table
+
+     ============================ ============== ========= =====================
+     String Key                   Value Type     Required? Description
+     ============================ ============== ========= =====================
+     "KernargSegmentSize"         integer        Required  The size in bytes of
+                                                           the kernarg segment
+                                                           that holds the values
+                                                           of the arguments to
+                                                           the kernel.
+     "GroupSegmentFixedSize"      integer        Required  The amount of group
+                                                           segment memory
+                                                           required by a
+                                                           work-group in
+                                                           bytes. This does not
+                                                           include any
+                                                           dynamically allocated
+                                                           group segment memory
+                                                           that may be added
+                                                           when the kernel is
+                                                           dispatched.
+     "PrivateSegmentFixedSize"    integer        Required  The amount of fixed
+                                                           private address space
+                                                           memory required for a
+                                                           work-item in
+                                                           bytes. If
+                                                           IsDynamicCallstack
+                                                           is 1 then additional
+                                                           space must be added
+                                                           to this value for the
+                                                           call stack.
+     "KernargSegmentAlign"        integer        Required  The maximum byte
+                                                           alignment of
+                                                           arguments in the
+                                                           kernarg segment. Must
+                                                           be a power of 2.
+     "WavefrontSize"              integer        Required  Wavefront size. Must
+                                                           be a power of 2.
+     "NumSGPRs"                   integer                  Number of scalar
+                                                           registers used by a
+                                                           wavefront for
+                                                           GFX6-GFX9. This
+                                                           includes the special
+                                                           SGPRs for VCC, Flat
+                                                           Scratch (GFX7-GFX9)
+                                                           and XNACK (for
+                                                           GFX8-GFX9). It does
+                                                           not include the 16
+                                                           SGPR added if a trap
+                                                           handler is
+                                                           enabled. It is not
+                                                           rounded up to the
+                                                           allocation
+                                                           granularity.
+     "NumVGPRs"                   integer                  Number of vector
+                                                           registers used by
+                                                           each work-item for
+                                                           GFX6-GFX9
+     "MaxFlatWorkgroupSize"       integer                  Maximum flat
+                                                           work-group size
+                                                           supported by the
+                                                           kernel in work-items.
+     "IsDynamicCallStack"         boolean                  Indicates if the
+                                                           generated machine
+                                                           code is using a
+                                                           dynamically sized
+                                                           call stack.
+     "IsXNACKEnabled"             boolean                  Indicates if the
+                                                           generated machine
+                                                           code is capable of
+                                                           supporting XNACK.
+     ============================ ============== ========= =====================
+
+..
+
+  .. table:: AMDHSA Code Object Kernel Debug Properties Metadata Mapping
+     :name: amdgpu-amdhsa-code-object-kernel-debug-properties-metadata-mapping-table
+
+     =================================== ============== ========= ==============
+     String Key                          Value Type     Required? Description
+     =================================== ============== ========= ==============
+     "DebuggerABIVersion"                string
+     "ReservedNumVGPRs"                  integer
+     "ReservedFirstVGPR"                 integer
+     "PrivateSegmentBufferSGPR"          integer
+     "WavefrontPrivateSegmentOffsetSGPR" integer
+     =================================== ============== ========= ==============
+
+.. TODO
+   Plan to remove the debug properties metadata.   
+
+.. _amdgpu-symbols:
+
+Symbols
+-------
+
+Symbols include the following:
+
+  .. table:: AMDGPU ELF Symbols
+     :name: amdgpu-elf-symbols-table
+
+     ===================== ============== ============= ==================
+     Name                  Type           Section       Description
+     ===================== ============== ============= ==================
+     *link-name*           ``STT_OBJECT`` - ``.data``   Global variable
+                                          - ``.rodata``
+                                          - ``.bss``
+     *link-name*\ ``@kd``  ``STT_OBJECT`` - ``.rodata`` Kernel descriptor
+     *link-name*           ``STT_FUNC``   - ``.text``   Kernel entry point
+     ===================== ============== ============= ==================
+
+Global variable
+  Global variables both used and defined by the compilation unit.
+
+  If the symbol is defined in the compilation unit then it is allocated in the
+  appropriate section according to if it has initialized data or is readonly.
+
+  If the symbol is external then its section is ``STN_UNDEF`` and the loader
+  will resolve relocations using the definition provided by another code object
+  or explicitly defined by the runtime.
+
+  All global symbols, whether defined in the compilation unit or external, are
+  accessed by the machine code indirectly through a GOT table entry. This
+  allows them to be preemptable. The GOT table is only supported when the target
+  triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`).
+
+  .. TODO
+     Add description of linked shared object symbols. Seems undefined symbols
+     are marked as STT_NOTYPE.
+
+Kernel descriptor
+  Every HSA kernel has an associated kernel descriptor. It is the address of the
+  kernel descriptor that is used in the AQL dispatch packet used to invoke the
+  kernel, not the kernel entry point. The layout of the HSA kernel descriptor is
+  defined in :ref:`amdgpu-amdhsa-kernel-descriptor`.
+
+Kernel entry point
+  Every HSA kernel also has a symbol for its machine code entry point.
+
+.. _amdgpu-relocation-records:
+
+Relocation Records
+------------------
+
+AMDGPU backend generates ``Elf64_Rela`` relocation records. Supported
+relocatable fields are:
+
+``word32``
+  This specifies a 32-bit field occupying 4 bytes with arbitrary byte
+  alignment. These values use the same byte order as other word values in the
+  AMD GPU architecture.
+
+``word64``
+  This specifies a 64-bit field occupying 8 bytes with arbitrary byte
+  alignment. These values use the same byte order as other word values in the
+  AMD GPU architecture.
+
+Following notations are used for specifying relocation calculations:
+
+**A**
+  Represents the addend used to compute the value of the relocatable field.
+
+**G**
+  Represents the offset into the global offset table at which the relocation
+  entryâs symbol will reside during execution.
+
+**GOT**
+  Represents the address of the global offset table.
+
+**P**
+  Represents the place (section offset for ``et_rel`` or address for ``et_dyn``)
+  of the storage unit being relocated (computed using ``r_offset``).
+
+**S**
+  Represents the value of the symbol whose index resides in the relocation
+  entry.
+
+The following relocation types are supported:
+
+  .. table:: AMDGPU ELF Relocation Records
+     :name: amdgpu-elf-relocation-records-table
+
+     ==========================  =====  ==========  ==============================
+     Relocation Type             Value  Field       Calculation
+     ==========================  =====  ==========  ==============================
+     ``R_AMDGPU_NONE``           0      *none*      *none*
+     ``R_AMDGPU_ABS32_LO``       1      ``word32``  (S + A) & 0xFFFFFFFF
+     ``R_AMDGPU_ABS32_HI``       2      ``word32``  (S + A) >> 32
+     ``R_AMDGPU_ABS64``          3      ``word64``  S + A
+     ``R_AMDGPU_REL32``          4      ``word32``  S + A - P
+     ``R_AMDGPU_REL64``          5      ``word64``  S + A - P
+     ``R_AMDGPU_ABS32``          6      ``word32``  S + A
+     ``R_AMDGPU_GOTPCREL``       7      ``word32``  G + GOT + A - P
+     ``R_AMDGPU_GOTPCREL32_LO``  8      ``word32``  (G + GOT + A - P) & 0xFFFFFFFF
+     ``R_AMDGPU_GOTPCREL32_HI``  9      ``word32``  (G + GOT + A - P) >> 32
+     ``R_AMDGPU_REL32_LO``       10     ``word32``  (S + A - P) & 0xFFFFFFFF
+     ``R_AMDGPU_REL32_HI``       11     ``word32``  (S + A - P) >> 32
+     ==========================  =====  ==========  ==============================
+
+.. _amdgpu-dwarf:
+
+DWARF
+-----
+
+Standard DWARF [DWARF]_ Version 2 sections can be generated. These contain
+information that maps the code object executable code and data to the source
+language constructs. It can be used by tools such as debuggers and profilers.
+
+Address Space Mapping
+~~~~~~~~~~~~~~~~~~~~~
+
+The following address space mapping is used:
+
+  .. table:: AMDGPU DWARF Address Space Mapping
+     :name: amdgpu-dwarf-address-space-mapping-table
+
+     =================== =================
+     DWARF Address Space Memory Space
+     =================== =================
+     1                   Private (Scratch)
+     2                   Local (group/LDS)
+     *omitted*           Global
+     *omitted*           Constant
+     *omitted*           Generic (Flat)
+     *not supported*     Region (GDS)
+     =================== =================
+
+See :ref:`amdgpu-address-spaces` for infomration on the memory space terminology
+used in the table.
+
+An ``address_class`` attribute is generated on pointer type DIEs to specify the
+DWARF address space of the value of the pointer when it is in the *private* or
+*local* address space. Otherwise the attribute is omitted.
+
+An ``XDEREF`` operation is generated in location list expressions for variables
+that are allocated in the *private* and *local* address space. Otherwise no
+``XDREF`` is omitted.
+
+Register Mapping
+~~~~~~~~~~~~~~~~
+
+*This section is WIP.*
+
+.. TODO
+   Define DWARF register enumeration.
+
+   If want to present a wavefront state then should expose vector registers as
+   64 wide (rather than per work-item view that LLVM uses). Either as separate
+   registers, or a 64x4 byte single register. In either case use a new LANE op
+   (akin to XDREF) to select the current lane usage in a location
+   expression. This would also allow scalar register spilling to vector register
+   lanes to be expressed (currently no debug information is being generated for
+   spilling). If choose a wide single register approach then use LANE in
+   conjunction with PIECE operation to select the dword part of the register for
+   the current lane. If the separate register approach then use LANE to select
+   the register.
+
+Source Text
+~~~~~~~~~~~
+
+*This section is WIP.*
+
+.. TODO
+   DWARF extension to include runtime generated source text.
+
+.. _amdgpu-code-conventions:
+
+Code Conventions
+================
+
+AMDHSA
+------
+
+This section provides code conventions used when the target triple OS is
+``amdhsa`` (see :ref:`amdgpu-target-triples`).
+
+Kernel Dispatch
+~~~~~~~~~~~~~~~
+
+The HSA architected queuing language (AQL) defines a user space memory interface
+that can be used to control the dispatch of kernels, in an agent independent
+way. An agent can have zero or more AQL queues created for it using the ROCm
+runtime, in which AQL packets (all of which are 64 bytes) can be placed. See the
+*HSA Platform System Architecture Specification* [HSA]_ for the AQL queue
+mechanics and packet layouts.
+
+The packet processor of a kernel agent is responsible for detecting and
+dispatching HSA kernels from the AQL queues associated with it. For AMD GPUs the
+packet processor is implemented by the hardware command processor (CP),
+asynchronous dispatch controller (ADC) and shader processor input controller
+(SPI).
+
+The ROCm runtime can be used to allocate an AQL queue object. It uses the kernel
+mode driver to initialize and register the AQL queue with CP.
+
+To dispatch a kernel the following actions are performed. This can occur in the
+CPU host program, or from an HSA kernel executing on a GPU.
+
+1. A pointer to an AQL queue for the kernel agent on which the kernel is to be
+   executed is obtained.
+2. A pointer to the kernel descriptor (see
+   :ref:`amdgpu-amdhsa-kernel-descriptor`) of the kernel to execute is
+   obtained. It must be for a kernel that is contained in a code object that that
+   was loaded by the ROCm runtime on the kernel agent with which the AQL queue is
+   associated.
+3. Space is allocated for the kernel arguments using the ROCm runtime allocator
+   for a memory region with the kernarg property for the kernel agent that will
+   execute the kernel. It must be at least 16 byte aligned.
+4. Kernel argument values are assigned to the kernel argument memory
+   allocation. The layout is defined in the *HSA Programmerâs Language Reference*
+   [HSA]_. For AMDGPU the kernel execution directly accesses the kernel argument
+   memory in the same way constant memory is accessed. (Note that the HSA
+   specification allows an implementation to copy the kernel argument contents to
+   another location that is accessed by the kernel.)
+5. An AQL kernel dispatch packet is created on the AQL queue. The ROCm runtime
+   api uses 64 bit atomic operations to reserve space in the AQL queue for the
+   packet. The packet must be set up, and the final write must use an atomic
+   store release to set the packet kind to ensure the packet contents are
+   visible to the kernel agent. AQL defines a doorbell signal mechanism to
+   notify the kernel agent that the AQL queue has been updated. These rules, and
+   the layout of the AQL queue and kernel dispatch packet is defined in the *HSA
+   System Architecture Specification* [HSA]_.
+6. A kernel dispatch packet includes information about the actual dispatch,
+   such as grid and work-group size, together with information from the code
+   object about the kernel, such as segment sizes. The ROCm runtime queries on
+   the kernel symbol can be used to obtain the code object values which are
+   recorded in the :ref:`amdgpu-code-object-metadata`.
+7. CP executes micro-code and is responsible for detecting and setting up the
+   GPU to execute the wavefronts of a kernel dispatch.
+8. CP ensures that when the a wavefront starts executing the kernel machine
+   code, the scalar general purpose registers (SGPR) and vector general purpose
+   registers (VGPR) are set up as required by the machine code. The required
+   setup is defined in the :ref:`amdgpu-amdhsa-kernel-descriptor`. The initial
+   register state is defined in
+   :ref:`amdgpu-amdhsa-initial-kernel-execution-state`.
+9. The prolog of the kernel machine code (see
+   :ref:`amdgpu-amdhsa-kernel-prolog`) sets up the machine state as necessary
+   before continuing executing the machine code that corresponds to the kernel.
+10. When the kernel dispatch has completed execution, CP signals the completion
+    signal specified in the kernel dispatch packet if not 0.
+
+.. _amdgpu-amdhsa-memory-spaces:
+
+Memory Spaces
+~~~~~~~~~~~~~
+
+The memory space properties are:
+
+  .. table:: AMDHSA Memory Spaces
+     :name: amdgpu-amdhsa-memory-spaces-table
+
+     ================= =========== ======== ======= ==================
+     Memory Space Name HSA Segment Hardware Address NULL Value
+                       Name        Name     Size
+     ================= =========== ======== ======= ==================
+     Private           private     scratch  32      0x00000000
+     Local             group       LDS      32      0xFFFFFFFF
+     Global            global      global   64      0x0000000000000000
+     Constant          constant    *same as 64      0x0000000000000000
+                                   global*
+     Generic           flat        flat     64      0x0000000000000000
+     Region            N/A         GDS      32      *not implemented
+                                                    for AMDHSA*
+     ================= =========== ======== ======= ==================
+
+The global and constant memory spaces both use global virtual addresses, which
+are the same virtual address space used by the CPU. However, some virtual
+addresses may only be accessible to the CPU, some only accessible by the GPU,
+and some by both.
+
+Using the constant memory space indicates that the data will not change during
+the execution of the kernel. This allows scalar read instructions to be
+used. The vector and scalar L1 caches are invalidated of volatile data before
+each kernel dispatch execution to allow constant memory to change values between
+kernel dispatches.
+
+The local memory space uses the hardware Local Data Store (LDS) which is
+automatically allocated when the hardware creates work-groups of wavefronts, and
+freed when all the wavefronts of a work-group have terminated. The data store
+(DS) instructions can be used to access it.
+
+The private memory space uses the hardware scratch memory support. If the kernel
+uses scratch, then the hardware allocates memory that is accessed using
+wavefront lane dword (4 byte) interleaving. The mapping used from private
+address to physical address is:
+
+  ``wavefront-scratch-base +
+  (private-address * wavefront-size * 4) +
+  (wavefront-lane-id * 4)``
+
+There are different ways that the wavefront scratch base address is determined
+by a wavefront (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). This
+memory can be accessed in an interleaved manner using buffer instruction with
+the scratch buffer descriptor and per wave scratch offset, by the scratch
+instructions, or by flat instructions. If each lane of a wavefront accesses the
+same private address, the interleaving results in adjacent dwords being accessed
+and hence requires fewer cache lines to be fetched. Multi-dword access is not
+supported except by flat and scratch instructions in GFX9.
+
+The generic address space uses the hardware flat address support available in
+GFX7-GFX9. This uses two fixed ranges of virtual addresses (the private and
+local appertures), that are outside the range of addressible global memory, to
+map from a flat address to a private or local address.
+
+FLAT instructions can take a flat address and access global, private (scratch)
+and group (LDS) memory depending in if the address is within one of the
+apperture ranges. Flat access to scratch requires hardware aperture setup and
+setup in the kernel prologue (see :ref:`amdgpu-amdhsa-flat-scratch`). Flat
+access to LDS requires hardware aperture setup and M0 (GFX7-GFX8) register setup
+(see :ref:`amdgpu-amdhsa-m0`).
+
+To convert between a segment address and a flat address the base address of the
+appertures address can be used. For GFX7-GFX8 these are available in the
+:ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with
+Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For
+GFX9 the appature base addresses are directly available as inline constant
+registers ``SRC_SHARED_BASE/LIMIT`` and ``SRC_PRIVATE_BASE/LIMIT``. In 64 bit
+address mode the apperture sizes are 2^32 bytes and the base is aligned to 2^32
+which makes it easier to convert from flat to segment or segment to flat.
+
+HSA Image and Samplers
+~~~~~~~~~~~~~~~~~~~~~~
+
+Image and sample handles created by the ROCm runtime are 64 bit addresses of a
+hardware 32 byte V# and 48 byte S# object respectively. In order to support the
+HSA ``query_sampler`` operations two extra dwords are used to store the HSA BRIG
+enumeration values for the queries that are not trivially deducible from the S#
+representation.
+
+HSA Signals
+~~~~~~~~~~~
+
+Signal handles created by the ROCm runtime are 64 bit addresses of a structure
+allocated in memory accessible from both the CPU and GPU. The structure is
+defined by the ROCm runtime and subject to change between releases (see
+[AMD-ROCm-github]_).
+
+.. _amdgpu-amdhsa-hsa-aql-queue:
+
+HSA AQL Queue
+~~~~~~~~~~~~~
+
+The AQL queue structure is defined by the ROCm runtime and subject to change
+between releases (see [AMD-ROCm-github]_). For some processors it contains
+fields needed to implement certain language features such as the flat address
+aperture bases. It also contains fields used by CP such as managing the
+allocation of scratch memory.
+
+.. _amdgpu-amdhsa-kernel-descriptor:
+
+Kernel Descriptor
+~~~~~~~~~~~~~~~~~
+
+A kernel descriptor consists of the information needed by CP to initiate the
+execution of a kernel, including the entry point address of the machine code
+that implements the kernel.
+
+Kernel Descriptor for GFX6-GFX9
++++++++++++++++++++++++++++++++
+
+CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
+
+  .. table:: Kernel Descriptor for GFX6-GFX9
+     :name: amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table
+
+     ======= ======= =============================== ===========================
+     Bits    Size    Field Name                      Description
+     ======= ======= =============================== ===========================
+     31:0    4 bytes group_segment_fixed_size        The amount of fixed local
+                                                     address space memory
+                                                     required for a work-group
+                                                     in bytes. This does not
+                                                     include any dynamically
+                                                     allocated local address
+                                                     space memory that may be
+                                                     added when the kernel is
+                                                     dispatched.
+     63:32   4 bytes private_segment_fixed_size      The amount of fixed
+                                                     private address space
+                                                     memory required for a
+                                                     work-item in bytes. If
+                                                     is_dynamic_callstack is 1
+                                                     then additional space must
+                                                     be added to this value for
+                                                     the call stack.
+     95:64   4 bytes max_flat_workgroup_size         Maximum flat work-group
+                                                     size supported by the
+                                                     kernel in work-items.
+     96      1 bit   is_dynamic_call_stack           Indicates if the generated
+                                                     machine code is using a
+                                                     dynamically sized call
+                                                     stack.
+     97      1 bit   is_xnack_enabled                Indicates if the generated
+                                                     machine code is capable of
+                                                     suppoting XNACK.
+     127:98  30 bits                                 Reserved. Must be 0.
+     191:128 8 bytes kernel_code_entry_byte_offset   Byte offset (possibly
+                                                     negative) from base
+                                                     address of kernel
+                                                     descriptor to kernel's
+                                                     entry point instruction
+                                                     which must be 256 byte
+                                                     aligned.
+     383:192 24                                      Reserved. Must be 0.
+             bytes
+     415:384 4 bytes compute_pgm_rsrc1               Compute Shader (CS)
+                                                     program settings used by
+                                                     CP to set up
+                                                     ``COMPUTE_PGM_RSRC1``
+                                                     configuration
+                                                     register. See
+                                                     :ref:`amdgpu-amdhsa-compute_pgm_rsrc1_t-gfx6-gfx9-table`.
+     447:416 4 bytes compute_pgm_rsrc2               Compute Shader (CS)
+                                                     program settings used by
+                                                     CP to set up
+                                                     ``COMPUTE_PGM_RSRC2``
+                                                     configuration
+                                                     register. See
+                                                     :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+     448     1 bit   enable_sgpr_private_segment     Enable the setup of the
+                     _buffer                         SGPR user data registers
+                                                     (see
+                                                     :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
+
+                                                     The total number of SGPR
+                                                     user data registers
+                                                     requested must not exceed
+                                                     16 and match value in
+                                                     ``compute_pgm_rsrc2.user_sgpr.user_sgpr_count``.
+                                                     Any requests beyond 16
+                                                     will be ignored.
+     449     1 bit   enable_sgpr_dispatch_ptr        *see above*
+     450     1 bit   enable_sgpr_queue_ptr           *see above*
+     451     1 bit   enable_sgpr_kernarg_segment_ptr *see above*
+     452     1 bit   enable_sgpr_dispatch_id         *see above*
+     453     1 bit   enable_sgpr_flat_scratch_init   *see above*
+     454     1 bit   enable_sgpr_private_segment     *see above*
+                     _size
+     455     1 bit   enable_sgpr_grid_workgroup      Not implemented in CP and
+                     _count_X                        should always be 0.
+     456     1 bit   enable_sgpr_grid_workgroup      Not implemented in CP and
+                     _count_Y                        should always be 0.
+     457     1 bit   enable_sgpr_grid_workgroup      Not implemented in CP and
+                     _count_Z                        should always be 0.
+     463:458 6 bits                                  Reserved. Must be 0.
+     511:464 4                                       Reserved. Must be 0.
+             bytes
+     512     **Total size 64 bytes.**
+     ======= ===================================================================
+
+..
+
+  .. table:: compute_pgm_rsrc1 for GFX6-GFX9
+     :name: amdgpu-amdhsa-compute_pgm_rsrc1_t-gfx6-gfx9-table
+
+     ======= ======= =============================== ===========================================================================
+     Bits    Size    Field Name                      Description
+     ======= ======= =============================== ===========================================================================
+     5:0     6 bits  granulated_workitem_vgpr_count  Number of vector registers
+                                                     used by each work-item,
+                                                     granularity is device
+                                                     specific:
+
+                                                     GFX6-9
+                                                       roundup((max-vgpg + 1)
+                                                       / 4) - 1
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.VGPRS``.
+     9:6     4 bits  granulated_wavefront_sgpr_count Number of scalar registers
+                                                     used by a wavefront,
+                                                     granularity is device
+                                                     specific:
+
+                                                     GFX6-8
+                                                       roundup((max-sgpg + 1)
+                                                       / 8) - 1
+                                                     GFX9
+                                                       roundup((max-sgpg + 1)
+                                                       / 16) - 1
+
+                                                     Includes the special SGPRs
+                                                     for VCC, Flat Scratch (for
+                                                     GFX7 onwards) and XNACK
+                                                     (for GFX8 onwards). It does
+                                                     not include the 16 SGPR
+                                                     added if a trap handler is
+                                                     enabled.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.SGPRS``.
+     11:10   2 bits  priority                        Must be 0.
+
+                                                     Start executing wavefront
+                                                     at the specified priority.
+
+                                                     CP is responsible for
+                                                     filling in
+                                                     ``COMPUTE_PGM_RSRC1.PRIORITY``.
+     13:12   2 bits  float_mode_round_32             Wavefront starts execution
+                                                     with specified rounding
+                                                     mode for single (32
+                                                     bit) floating point
+                                                     precision floating point
+                                                     operations.
+
+                                                     Floating point rounding
+                                                     mode values are defined in
+                                                     :ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.FLOAT_MODE``.
+     15:14   2 bits  float_mode_round_16_64          Wavefront starts execution
+                                                     with specified rounding
+                                                     denorm mode for half/double (16
+                                                     and 64 bit) floating point
+                                                     precision floating point
+                                                     operations.
+
+                                                     Floating point rounding
+                                                     mode values are defined in
+                                                     :ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.FLOAT_MODE``.
+     17:16   2 bits  float_mode_denorm_32            Wavefront starts execution
+                                                     with specified denorm mode
+                                                     for single (32
+                                                     bit)  floating point
+                                                     precision floating point
+                                                     operations.
+
+                                                     Floating point denorm mode
+                                                     values are defined in
+                                                     :ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.FLOAT_MODE``.
+     19:18   2 bits  float_mode_denorm_16_64         Wavefront starts execution
+                                                     with specified denorm mode
+                                                     for half/double (16
+                                                     and 64 bit) floating point
+                                                     precision floating point
+                                                     operations.
+
+                                                     Floating point denorm mode
+                                                     values are defined in
+                                                     :ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.FLOAT_MODE``.
+     20      1 bit   priv                            Must be 0.
+
+                                                     Start executing wavefront
+                                                     in privilege trap handler
+                                                     mode.
+
+                                                     CP is responsible for
+                                                     filling in
+                                                     ``COMPUTE_PGM_RSRC1.PRIV``.
+     21      1 bit   enable_dx10_clamp               Wavefront starts execution
+                                                     with DX10 clamp mode
+                                                     enabled. Used by the vector
+                                                     ALU to force DX-10 style
+                                                     treatment of NaN's (when
+                                                     set, clamp NaN to zero,
+                                                     otherwise pass NaN
+                                                     through).
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.DX10_CLAMP``.
+     22      1 bit   debug_mode                      Must be 0.
+
+                                                     Start executing wavefront
+                                                     in single step mode.
+
+                                                     CP is responsible for
+                                                     filling in
+                                                     ``COMPUTE_PGM_RSRC1.DEBUG_MODE``.
+     23      1 bit   enable_ieee_mode                Wavefront starts execution
+                                                     with IEEE mode
+                                                     enabled. Floating point
+                                                     opcodes that support
+                                                     exception flag gathering
+                                                     will quiet and propagate
+                                                     signaling-NaN inputs per
+                                                     IEEE 754-2008. Min_dx10 and
+                                                     max_dx10 become IEEE
+                                                     754-2008 compliant due to
+                                                     signaling-NaN propagation
+                                                     and quieting.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.IEEE_MODE``.
+     24      1 bit   bulky                           Must be 0.
+
+                                                     Only one work-group allowed
+                                                     to execute on a compute
+                                                     unit.
+
+                                                     CP is responsible for
+                                                     filling in
+                                                     ``COMPUTE_PGM_RSRC1.BULKY``.
+     25      1 bit   cdbg_user                       Must be 0.
+
+                                                     Flag that can be used to
+                                                     control debugging code.
+
+                                                     CP is responsible for
+                                                     filling in
+                                                     ``COMPUTE_PGM_RSRC1.CDBG_USER``.
+     31:26   6 bits                                  Reserved. Must be 0.
+     32      **Total size 4 bytes**
+     ======= ===================================================================================================================
+
+..
+
+  .. table:: compute_pgm_rsrc2 for GFX6-GFX9
+     :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table
+
+     ======= ======= =============================== ===========================================================================
+     Bits    Size    Field Name                      Description
+     ======= ======= =============================== ===========================================================================
+     0       1 bit   enable_sgpr_private_segment     Enable the setup of the
+                     _wave_offset                    SGPR wave scratch offset
+                                                     system register (see
+                                                     :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.SCRATCH_EN``.
+     5:1     5 bits  user_sgpr_count                 The total number of SGPR
+                                                     user data registers
+                                                     requested. This number must
+                                                     match the number of user
+                                                     data registers enabled.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.USER_SGPR``.
+     6       1 bit   enable_trap_handler             Set to 1 if code contains a
+                                                     TRAP instruction which
+                                                     requires a trap handler to
+                                                     be enabled.
+
+                                                     CP sets
+                                                     ``COMPUTE_PGM_RSRC2.TRAP_PRESENT``
+                                                     if the runtime has
+                                                     installed a trap handler
+                                                     regardless of the setting
+                                                     of this field.
+     7       1 bit   enable_sgpr_workgroup_id_x      Enable the setup of the
+                                                     system SGPR register for
+                                                     the work-group id in the X
+                                                     dimension (see
+                                                     :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.TGID_X_EN``.
+     8       1 bit   enable_sgpr_workgroup_id_y      Enable the setup of the
+                                                     system SGPR register for
+                                                     the work-group id in the Y
+                                                     dimension (see
+                                                     :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.TGID_Y_EN``.
+     9       1 bit   enable_sgpr_workgroup_id_z      Enable the setup of the
+                                                     system SGPR register for
+                                                     the work-group id in the Z
+                                                     dimension (see
+                                                     :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.TGID_Z_EN``.
+     10      1 bit   enable_sgpr_workgroup_info      Enable the setup of the
+                                                     system SGPR register for
+                                                     work-group information (see
+                                                     :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.TGID_SIZE_EN``.
+     12:11   2 bits  enable_vgpr_workitem_id         Enable the setup of the
+                                                     VGPR system registers used
+                                                     for the work-item ID.
+                                                     :ref:`amdgpu-amdhsa-system-vgpr-work-item-id-enumeration-values-table`
+                                                     defines the values.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.TIDIG_CMP_CNT``.
+     13      1 bit   enable_exception_address_watch  Must be 0.
+
+                                                     Wavefront starts execution
+                                                     with address watch
+                                                     exceptions enabled which
+                                                     are generated when L1 has
+                                                     witnessed a thread access
+                                                     an *address of
+                                                     interest*.
+
+                                                     CP is responsible for
+                                                     filling in the address
+                                                     watch bit in
+                                                     ``COMPUTE_PGM_RSRC2.EXCP_EN_MSB``
+                                                     according to what the
+                                                     runtime requests.
+     14      1 bit   enable_exception_memory         Must be 0.
+
+                                                     Wavefront starts execution
+                                                     with memory violation
+                                                     exceptions exceptions
+                                                     enabled which are generated
+                                                     when a memory violation has
+                                                     occurred for this wave from
+                                                     L1 or LDS
+                                                     (write-to-read-only-memory,
+                                                     mis-aligned atomic, LDS
+                                                     address out of range,
+                                                     illegal address, etc.).
+
+                                                     CP sets the memory
+                                                     violation bit in
+                                                     ``COMPUTE_PGM_RSRC2.EXCP_EN_MSB``
+                                                     according to what the
+                                                     runtime requests.
+     23:15   9 bits  granulated_lds_size             Must be 0.
+
+                                                     CP uses the rounded value
+                                                     from the dispatch packet,
+                                                     not this value, as the
+                                                     dispatch may contain
+                                                     dynamically allocated group
+                                                     segment memory. CP writes
+                                                     directly to
+                                                     ``COMPUTE_PGM_RSRC2.LDS_SIZE``.
+
+                                                     Amount of group segment
+                                                     (LDS) to allocate for each
+                                                     work-group. Granularity is
+                                                     device specific:
+
+                                                     GFX6:
+                                                       roundup(lds-size / (64 * 4))
+                                                     GFX7-GFX9:
+                                                       roundup(lds-size / (128 * 4))
+
+     24      1 bit   enable_exception_ieee_754_fp    Wavefront starts execution
+                     _invalid_operation              with specified exceptions
+                                                     enabled.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.EXCP_EN``
+                                                     (set from bits 0..6).
+
+                                                     IEEE 754 FP Invalid
+                                                     Operation
+     25      1 bit   enable_exception_fp_denormal    FP Denormal one or more
+                     _source                         input operands is a
+                                                     denormal number
+     26      1 bit   enable_exception_ieee_754_fp    IEEE 754 FP Division by
+                     _division_by_zero               Zero
+     27      1 bit   enable_exception_ieee_754_fp    IEEE 754 FP FP Overflow
+                     _overflow
+     28      1 bit   enable_exception_ieee_754_fp    IEEE 754 FP Underflow
+                     _underflow
+     29      1 bit   enable_exception_ieee_754_fp    IEEE 754 FP Inexact
+                     _inexact
+     30      1 bit   enable_exception_int_divide_by  Integer Division by Zero
+                     _zero                           (rcp_iflag_f32 instruction
+                                                     only)
+     31      1 bit                                   Reserved. Must be 0.
+     32      **Total size 4 bytes.**
+     ======= ===================================================================================================================
+
+..
+
+  .. table:: Floating Point Rounding Mode Enumeration Values
+     :name: amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table
+
+     ===================================== ===== ===============================
+     Enumeration Name                      Value Description
+     ===================================== ===== ===============================
+     AMD_FLOAT_ROUND_MODE_NEAR_EVEN        0     Round Ties To Even
+     AMD_FLOAT_ROUND_MODE_PLUS_INFINITY    1     Round Toward +infinity
+     AMD_FLOAT_ROUND_MODE_MINUS_INFINITY   2     Round Toward -infinity
+     AMD_FLOAT_ROUND_MODE_ZERO             3     Round Toward 0
+     ===================================== ===== ===============================
+
+..
+
+  .. table:: Floating Point Denorm Mode Enumeration Values
+     :name: amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table
+
+     ===================================== ===== ===============================
+     Enumeration Name                      Value Description
+     ===================================== ===== ===============================
+     AMD_FLOAT_DENORM_MODE_FLUSH_SRC_DST   0     Flush Source and Destination
+                                                 Denorms
+     AMD_FLOAT_DENORM_MODE_FLUSH_DST       1     Flush Output Denorms
+     AMD_FLOAT_DENORM_MODE_FLUSH_SRC       2     Flush Source Denorms
+     AMD_FLOAT_DENORM_MODE_FLUSH_NONE      3     No Flush
+     ===================================== ===== ===============================
+
+..
+
+  .. table:: System VGPR Work-Item ID Enumeration Values
+     :name: amdgpu-amdhsa-system-vgpr-work-item-id-enumeration-values-table
+
+     ===================================== ===== ===============================
+     Enumeration Name                      Value Description
+     ===================================== ===== ===============================
+     AMD_SYSTEM_VGPR_WORKITEM_ID_X         0     Set work-item X dimension ID.
+     AMD_SYSTEM_VGPR_WORKITEM_ID_X_Y       1     Set work-item X and Y
+                                                 dimensions ID.
+     AMD_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z     2     Set work-item X, Y and Z
+                                                 dimensions ID.
+     AMD_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED 3     Undefined.
+     ===================================== ===== ===============================
+
+.. _amdgpu-amdhsa-initial-kernel-execution-state:
+
+Initial Kernel Execution State
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This section defines the register state that will be set up by the packet
+processor prior to the start of execution of every wavefront. This is limited by
+the constraints of the hardware controllers of CP/ADC/SPI.
+
+The order of the SGPR registers is defined, but the compiler can specify which
+ones are actually setup in the kernel descriptor using the ``enable_sgpr_*`` bit
+fields (see :ref:`amdgpu-amdhsa-kernel-descriptor`). The register numbers used
+for enabled registers are dense starting at SGPR0: the first enabled register is
+SGPR0, the next enabled register is SGPR1 etc.; disabled registers do not have
+an SGPR number.
+
+The initial SGPRs comprise up to 16 User SRGPs that are set by CP and apply to
+all waves of the grid. It is possible to specify more than 16 User SGPRs using
+the ``enable_sgpr_*`` bit fields, in which case only the first 16 are actually
+initialized. These are then immediately followed by the System SGPRs that are
+set up by ADC/SPI and can have different values for each wave of the grid
+dispatch.
+
+SGPR register initial state is defined in
+:ref:`amdgpu-amdhsa-sgpr-register-set-up-order-table`.
+
+  .. table:: SGPR Register Set Up Order
+     :name: amdgpu-amdhsa-sgpr-register-set-up-order-table
+
+     ========== ========================== ====== ==============================
+     SGPR Order Name                       Number Description
+                (kernel descriptor enable  of
+                field)                     SGPRs
+     ========== ========================== ====== ==============================
+     First      Private Segment Buffer     4      V# that can be used, together
+                (enable_sgpr_private              with Scratch Wave Offset as an
+                _segment_buffer)                  offset, to access the private
+                                                  memory space using a segment
+                                                  address.
+
+                                                  CP uses the value provided by
+                                                  the runtime.
+     then       Dispatch Ptr               2      64 bit address of AQL dispatch
+                (enable_sgpr_dispatch_ptr)        packet for kernel dispatch
+                                                  actually executing.
+     then       Queue Ptr                  2      64 bit address of amd_queue_t
+                (enable_sgpr_queue_ptr)           object for AQL queue on which
+                                                  the dispatch packet was
+                                                  queued.
+     then       Kernarg Segment Ptr        2      64 bit address of Kernarg
+                (enable_sgpr_kernarg              segment. This is directly
+                _segment_ptr)                     copied from the
+                                                  kernarg_address in the kernel
+                                                  dispatch packet.
+
+                                                  Having CP load it once avoids
+                                                  loading it at the beginning of
+                                                  every wavefront.
+     then       Dispatch Id                2      64 bit Dispatch ID of the
+                (enable_sgpr_dispatch_id)         dispatch packet being
+                                                  executed.
+     then       Flat Scratch Init          2      This is 2 SGPRs:
+                (enable_sgpr_flat_scratch
+                _init)                            GFX6
+                                                    Not supported.
+                                                  GFX7-GFX8
+                                                    The first SGPR is a 32 bit
+                                                    byte offset from
+                                                    ``SH_HIDDEN_PRIVATE_BASE_VIMID``
+                                                    to per SPI base of memory
+                                                    for scratch for the queue
+                                                    executing the kernel
+                                                    dispatch. CP obtains this
+                                                    from the runtime.
+
+                                                    This is the same offset used
+                                                    in computing the Scratch
+                                                    Segment Buffer base
+                                                    address. The value of
+                                                    Scratch Wave Offset must be
+                                                    added by the kernel machine
+                                                    code and moved to SGPRn-4
+                                                    for use as the FLAT SCRATCH
+                                                    BASE in flat memory
+                                                    instructions.
+
+                                                    The second SGPR is 32 bit
+                                                    byte size of a single
+                                                    work-itemâs scratch memory
+                                                    usage. This is directly
+                                                    loaded from the kernel
+                                                    dispatch packet Private
+                                                    Segment Byte Size and
+                                                    rounded up to a multiple of
+                                                    DWORD.
+
+                                                    The kernel code must move to
+                                                    SGPRn-3 for use as the FLAT
+                                                    SCRATCH SIZE in flat memory
+                                                    instructions. Having CP load
+                                                    it once avoids loading it at
+                                                    the beginning of every
+                                                    wavefront.
+                                                  GFX9
+                                                    This is the 64 bit base
+                                                    address of the per SPI
+                                                    scratch backing memory
+                                                    managed by SPI for the queue
+                                                    executing the kernel
+                                                    dispatch. CP obtains this
+                                                    from the runtime (and
+                                                    divides it if there are
+                                                    multiple Shader Arrays each
+                                                    with its own SPI). The value
+                                                    of Scratch Wave Offset must
+                                                    be added by the kernel
+                                                    machine code and moved to
+                                                    SGPRn-4 and SGPRn-3 for use
+                                                    as the FLAT SCRATCH BASE in
+                                                    flat memory instructions.
+     then       Private Segment Size       1      The 32 bit byte size of a
+                (enable_sgpr_private              single work-itemâs scratch
+                _segment_size)                    memory allocation. This is the
+                                                  value from the kernel dispatch
+                                                  packet Private Segment Byte
+                                                  Size rounded up by CP to a
+                                                  multiple of DWORD.
+
+                                                  Having CP load it once avoids
+                                                  loading it at the beginning of
+                                                  every wavefront.
+
+                                                  This is not used for
+                                                  GFX7-GFX8 since it is the same
+                                                  value as the second SGPR of
+                                                  Flat Scratch Init. However, it
+                                                  may be needed for GFX9 which
+                                                  changes the meaning of the
+                                                  Flat Scratch Init value.
+     then       Grid Work-Group Count X    1      32 bit count of the number of
+                (enable_sgpr_grid                 work-groups in the X dimension
+                _workgroup_count_X)               for the grid being
+                                                  executed. Computed from the
+                                                  fields in the kernel dispatch
+                                                  packet as ((grid_size.x +
+                                                  workgroup_size.x - 1) /
+                                                  workgroup_size.x).
+     then       Grid Work-Group Count Y    1      32 bit count of the number of
+                (enable_sgpr_grid                 work-groups in the Y dimension
+                _workgroup_count_Y &&             for the grid being
+                less than 16 previous             executed. Computed from the
+                SGPRs)                            fields in the kernel dispatch
+                                                  packet as ((grid_size.y +
+                                                  workgroup_size.y - 1) /
+                                                  workgroupSize.y).
+
+                                                  Only initialized if <16
+                                                  previous SGPRs initialized.
+     then       Grid Work-Group Count Z    1      32 bit count of the number of
+                (enable_sgpr_grid                 work-groups in the Z dimension
+                _workgroup_count_Z &&             for the grid being
+                less than 16 previous             executed. Computed from the
+                SGPRs)                            fields in the kernel dispatch
+                                                  packet as ((grid_size.z +
+                                                  workgroup_size.z - 1) /
+                                                  workgroupSize.z).
+
+                                                  Only initialized if <16
+                                                  previous SGPRs initialized.
+     then       Work-Group Id X            1      32 bit work-group id in X
+                (enable_sgpr_workgroup_id         dimension of grid for
+                _X)                               wavefront.
+     then       Work-Group Id Y            1      32 bit work-group id in Y
+                (enable_sgpr_workgroup_id         dimension of grid for
+                _Y)                               wavefront.
+     then       Work-Group Id Z            1      32 bit work-group id in Z
+                (enable_sgpr_workgroup_id         dimension of grid for
+                _Z)                               wavefront.
+     then       Work-Group Info            1      {first_wave, 14âb0000,
+                (enable_sgpr_workgroup            ordered_append_term[10:0],
+                _info)                            threadgroup_size_in_waves[5:0]}
+     then       Scratch Wave Offset        1      32 bit byte offset from base
+                (enable_sgpr_private              of scratch base of queue
+                _segment_wave_offset)             executing the kernel
+                                                  dispatch. Must be used as an
+                                                  offset with Private
+                                                  segment address when using
+                                                  Scratch Segment Buffer. It
+                                                  must be used to set up FLAT
+                                                  SCRATCH for flat addressing
+                                                  (see
+                                                  :ref:`amdgpu-amdhsa-flat-scratch`).
+     ========== ========================== ====== ==============================
+
+The order of the VGPR registers is defined, but the compiler can specify which
+ones are actually setup in the kernel descriptor using the ``enable_vgpr*`` bit
+fields (see :ref:`amdgpu-amdhsa-kernel-descriptor`). The register numbers used
+for enabled registers are dense starting at VGPR0: the first enabled register is
+VGPR0, the next enabled register is VGPR1 etc.; disabled registers do not have a
+VGPR number.
+
+VGPR register initial state is defined in
+:ref:`amdgpu-amdhsa-vgpr-register-set-up-order-table`.
+
+  .. table:: VGPR Register Set Up Order
+     :name: amdgpu-amdhsa-vgpr-register-set-up-order-table
+
+     ========== ========================== ====== ==============================
+     VGPR Order Name                       Number Description
+                (kernel descriptor enable  of
+                field)                     VGPRs
+     ========== ========================== ====== ==============================
+     First      Work-Item Id X             1      32 bit work item id in X
+                (Always initialized)              dimension of work-group for
+                                                  wavefront lane.
+     then       Work-Item Id Y             1      32 bit work item id in Y
+                (enable_vgpr_workitem_id          dimension of work-group for
+                > 0)                              wavefront lane.
+     then       Work-Item Id Z             1      32 bit work item id in Z
+                (enable_vgpr_workitem_id          dimension of work-group for
+                > 1)                              wavefront lane.
+     ========== ========================== ====== ==============================
+
+The setting of registers is is done by GPU CP/ADC/SPI hardware as follows:
+
+1. SGPRs before the Work-Group Ids are set by CP using the 16 User Data
+   registers.
+2. Work-group Id registers X, Y, Z are set by ADC which supports any
+   combination including none.
+3. Scratch Wave Offset is set by SPI in a per wave basis which is why its value
+   cannot included with the flat scratch init value which is per queue.
+4. The VGPRs are set by SPI which only supports specifying either (X), (X, Y)
+   or (X, Y, Z).
+
+Flat Scratch register pair are adjacent SGRRs so they can be moved as a 64 bit
+value to the hardware required SGPRn-3 and SGPRn-4 respectively.
+
+The global segment can be accessed either using buffer instructions (GFX6 which
+has V# 64 bit address support), flat instructions (GFX7-9), or global
+instructions (GFX9).
+
+If buffer operations are used then the compiler can generate a V# with the
+following properties:
+
+* base address of 0
+* no swizzle
+* ATC: 1 if IOMMU present (such as APU)
+* ptr64: 1
+* MTYPE set to support memory coherence that matches the runtime (such as CC for
+  APU and NC for dGPU).
+
+.. _amdgpu-amdhsa-kernel-prolog:
+
+Kernel Prolog
+~~~~~~~~~~~~~
+
+.. _amdgpu-amdhsa-m0:
+
+M0
+++
+
+GFX6-GFX8
+  The M0 register must be initialized with a value at least the total LDS size
+  if the kernel may access LDS via DS or flat operations. Total LDS size is
+  available in dispatch packet. For M0, it is also possible to use maximum
+  possible value of LDS for given target (0x7FFF for GFX6 and 0xFFFF for
+  GFX7-GFX8).
+GFX9
+  The M0 register is not used for range checking LDS accesses and so does not
+  need to be initialized in the prolog.
+
+.. _amdgpu-amdhsa-flat-scratch:
+
+Flat Scratch
+++++++++++++
+
+If the kernel may use flat operations to access scratch memory, the prolog code
+must set up FLAT_SCRATCH register pair (FLAT_SCRATCH_LO/FLAT_SCRATCH_HI which
+are in SGPRn-4/SGPRn-3). Initialization uses Flat Scratch Init and Scratch Wave
+Offset SGPR registers (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`):
+
+GFX6
+  Flat scratch is not supported.
+
+GFX7-8
+  1. The low word of Flat Scratch Init is 32 bit byte offset from
+     ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory
+     being managed by SPI for the queue executing the kernel dispatch. This is
+     the same value used in the Scratch Segment Buffer V# base address. The
+     prolog must add the value of Scratch Wave Offset to get the wave's byte
+     scratch backing memory offset from ``SH_HIDDEN_PRIVATE_BASE_VIMID``. Since
+     FLAT_SCRATCH_LO is in units of 256 bytes, the offset must be right shifted
+     by 8 before moving into FLAT_SCRATCH_LO.
+  2. The second word of Flat Scratch Init is 32 bit byte size of a single
+     work-items scratch memory usage. This is directly loaded from the kernel
+     dispatch packet Private Segment Byte Size and rounded up to a multiple of
+     DWORD. Having CP load it once avoids loading it at the beginning of every
+     wavefront. The prolog must move it to FLAT_SCRATCH_LO for use as FLAT SCRATCH
+     SIZE.
+GFX9
+  The Flat Scratch Init is the 64 bit address of the base of scratch backing
+  memory being managed by SPI for the queue executing the kernel dispatch. The
+  prolog must add the value of Scratch Wave Offset and moved to the FLAT_SCRATCH
+  pair for use as the flat scratch base in flat memory instructions.
+
+.. _amdgpu-amdhsa-memory-model:
+
+Memory Model
+~~~~~~~~~~~~
+
+This section describes the mapping of LLVM memory model onto AMDGPU machine code
+(see :ref:`memmodel`). *The implementation is WIP.*
+
+.. TODO
+   Update when implementation complete.
+
+   Support more relaxed OpenCL memory model to be controlled by environment
+   component of target triple.
+
+The AMDGPU backend supports the memory synchronization scopes specified in
+:ref:`amdgpu-memory-scopes`.
+
+The code sequences used to implement the memory model are defined in table
+:ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
+
+The sequences specify the order of instructions that a single thread must
+execute. The ``s_waitcnt`` and ``buffer_wbinvl1_vol`` are defined with respect
+to other memory instructions executed by the same thread. This allows them to be
+moved earlier or later which can allow them to be combined with other instances
+of the same instruction, or hoisted/sunk out of loops to improve
+performance. Only the instructions related to the memory model are given;
+additional ``s_waitcnt`` instructions are required to ensure registers are
+defined before being used. These may be able to be combined with the memory
+model ``s_waitcnt`` instructions as described above.
+
+The AMDGPU memory model supports both the HSA [HSA]_ memory model, and the
+OpenCL [OpenCL]_ memory model. The HSA memory model uses a single happens-before
+relation for all address spaces (see :ref:`amdgpu-address-spaces`). The OpenCL
+memory model which has separate happens-before relations for the global and
+local address spaces, and only a fence specifying both global and local address
+space joins the relationships. Since the LLVM ``memfence`` instruction does not
+allow an address space to be specified the OpenCL fence has to convervatively
+assume both local and global address space was specified. However, optimizations
+can often be done to eliminate the additional ``s_waitcnt``instructions when
+there are no intervening corresponding ``ds/flat_load/store/atomic`` memory
+instructions. The code sequences in the table indicate what can be omitted for
+the OpenCL memory. The target triple environment is used to determine if the
+source language is OpenCL (see :ref:`amdgpu-opencl`).
+
+``ds/flat_load/store/atomic`` instructions to local memory are termed LDS
+operations.
+
+``buffer/global/flat_load/store/atomic`` instructions to global memory are
+termed vector memory operations.
+
+For GFX6-GFX9:
+
+* Each agent has multiple compute units (CU).
+* Each CU has multiple SIMDs that execute wavefronts.
+* The wavefronts for a single work-group are executed in the same CU but may be
+  executed by different SIMDs.
+* Each CU has a single LDS memory shared by the wavefronts of the work-groups
+  executing on it.
+* All LDS operations of a CU are performed as wavefront wide operations in a
+  global order and involve no caching. Completion is reported to a wavefront in
+  execution order.
+* The LDS memory has multiple request queues shared by the SIMDs of a
+  CU. Therefore, the LDS operations performed by different waves of a work-group
+  can be reordered relative to each other, which can result in reordering the
+  visibility of vector memory operations with respect to LDS operations of other
+  wavefronts in the same work-group. A ``s_waitcnt lgkmcnt(0)`` is required to
+  ensure synchronization between LDS operations and vector memory operations
+  between waves of a work-group, but not between operations performed by the
+  same wavefront.
+* The vector memory operations are performed as wavefront wide operations and
+  completion is reported to a wavefront in execution order. The exception is
+  that for GFX7-9 ``flat_load/store/atomic`` instructions can report out of
+  vector memory order if they access LDS memory, and out of LDS operation order
+  if they access global memory.
+* The vector memory operations access a vector L1 cache shared by all wavefronts
+  on a CU. Therefore, no special action is required for coherence between
+  wavefronts in the same work-group. A ``buffer_wbinvl1_vol`` is required for
+  coherence between waves executing in different work-groups as they may be
+  executing on different CUs.
+* The scalar memory operations access a scalar L1 cache shared by all wavefronts
+  on a group of CUs. The scalar and vector L1 caches are not coherent. However,
+  scalar operations are used in a restricted way so do not impact the memory
+  model. See :ref:`amdgpu-amdhsa-memory-spaces`.
+* The vector and scalar memory operations use an L2 cache shared by all CUs on
+  the same agent.
+* The L2 cache has independent channels to service disjoint ranges of virtual
+  addresses.
+* Each CU has a separate request queue per channel. Therefore, the vector and
+  scalar memory operations performed by waves executing in different work-groups
+  (which may be executing on different CUs) of an agent can be reordered
+  relative to each other. A ``s_waitcnt vmcnt(0)`` is required to ensure
+  synchronization between vector memory operations of different CUs. It ensures a
+  previous vector memory operation has completed before executing a subsequent
+  vector memory or LDS operation and so can be used to meet the requirements of
+  acquire and release.
+* The L2 cache can be kept coherent with other agents on some targets, or ranges
+  of virtual addresses can be set up to bypass it to ensure system coherence.
+
+Private address space uses ``buffer_load/store`` using the scratch V# (GFX6-8),
+or ``scratch_load/store`` (GFX9). Since only a single thread is accessing the
+memory, atomic memory orderings are not meaningful and all accesses are treated
+as non-atomic.
+
+Constant address space uses ``buffer/global_load`` instructions (or equivalent
+scalar memory instructions). Since the constant address space contents do not
+change during the execution of a kernel dispatch it is not legal to perform
+stores, and atomic memory orderings are not meaningful and all access are
+treated as non-atomic.
+
+A memory synchronization scope wider than work-group is not meaningful for the
+group (LDS) address space and is treated as work-group.
+
+The memory model does not support the region address space which is treated as
+non-atomic.
+
+Acquire memory ordering is not meaningful on store atomic instructions and is
+treated as non-atomic.
+
+Release memory ordering is not meaningful on load atomic instructions and is
+treated a non-atomic.
+
+Acquire-release memory ordering is not meaningful on load or store atomic
+instructions and is treated as acquire and release respectively.
+
+AMDGPU backend only uses scalar memory operations to access memory that is
+proven to not change during the execution of the kernel dispatch. This includes
+constant address space and global address space for program scope const
+variables. Therefore the kernel machine code does not have to maintain the
+scalar L1 cache to ensure it is coherent with the vector L1 cache. The scalar
+and vector L1 caches are invalidated between kernel dispatches by CP since
+constant address space data may change between kernel dispatch executions. See
+:ref:`amdgpu-amdhsa-memory-spaces`.
+
+The one execption is if scalar writes are used to spill SGPR registers. In this
+case the AMDGPU backend ensures the memory location used to spill is never
+accessed by vector memory operations at the same time. If scalar writes are used
+then a ``s_dcache_wb`` is inserted before the ``s_endpgm`` and before a function
+return since the locations may be used for vector memory instructions by a
+future wave that uses the same scratch area, or a function call that creates a
+frame at the same address, respectively. There is no need for a ``s_dcache_inv``
+as all scalar writes are write-before-read in the same thread.
+
+Scratch backing memory (which is used for the private address space) is accessed
+with MTYPE NC_NV (non-coherenent non-volatile). Since the private address space
+is only accessed by a single thread, and is always write-before-read,
+there is never a need to invalidate these entries from the L1 cache. Hence all
+cache invalidates are done as ``*_vol`` to only invalidate the volatile cache
+lines.
+
+On dGPU the kernarg backing memory is accessed as UC (uncached) to avoid needing
+to invalidate the L2 cache. This also causes it to be treated as non-volatile
+and so is not invalidated by ``*_vol``. On APU it is accessed as CC (cache
+coherent) and so the L2 cache will coherent with the CPU and other agents.
+
+  .. table:: AMDHSA Memory Model Code Sequences GFX6-GFX9
+     :name: amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table
+
+     ============ ============ ============== ========== =======================
+     LLVM Instr   LLVM Memory  LLVM Memory    AMDGPU     AMDGPU Machine Code
+                  Ordering     Sync Scope     Address
+                                              Space
+     ============ ============ ============== ========== =======================
+     **Non-Atomic**
+     ---------------------------------------------------------------------------
+     load         *none*       *none*         - global   non-volatile
+                                              - generic    1. buffer/global/flat_load
+                                                         volatile
+                                                           1. buffer/global/flat_load
+                                                              glc=1
+     load         *none*       *none*         - local    1. ds_load
+     store        *none*       *none*         - global   1. buffer/global/flat_store
+                                              - generic
+     store        *none*       *none*         - local    1. ds_store
+     **Unordered Atomic**
+     ---------------------------------------------------------------------------
+     load atomic  unordered    *any*          *any*      *Same as non-atomic*.
+     store atomic unordered    *any*          *any*      *Same as non-atomic*.
+     atomicrmw    unordered    *any*          *any*      *Same as monotonic
+                                                         atomic*.
+     **Monotonic Atomic**
+     ---------------------------------------------------------------------------
+     load atomic  monotonic    - singlethread - global   1. buffer/global/flat_load
+                               - wavefront    - generic
+                               - workgroup
+     load atomic  monotonic    - singlethread - local    1. ds_load
+                               - wavefront
+                               - workgroup
+     load atomic  monotonic    - agent        - global   1. buffer/global/flat_load
+                               - system       - generic     glc=1
+     store atomic monotonic    - singlethread - global   1. buffer/global/flat_store
+                               - wavefront    - generic
+                               - workgroup
+                               - agent
+                               - system
+     store atomic monotonic    - singlethread - local    1. ds_store
+                               - wavefront
+                               - workgroup
+     atomicrmw    monotonic    - singlethread - global   1. buffer/global/flat_atomic
+                               - wavefront    - generic
+                               - workgroup
+                               - agent
+                               - system
+     atomicrmw    monotonic    - singlethread - local    1. ds_atomic
+                               - wavefront
+                               - workgroup
+     **Acquire Atomic**
+     ---------------------------------------------------------------------------
+     load atomic  acquire      - singlethread - global   1. buffer/global/ds/flat_load
+                               - wavefront    - local
+                                              - generic
+     load atomic  acquire      - workgroup    - global   1. buffer/global_load
+     load atomic  acquire      - workgroup    - local    1. ds/flat_load
+                                              - generic  2. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the load
+                                                             atomic value being
+                                                             acquired.
+
+     load atomic  acquire      - agent        - global   1. buffer/global_load
+                               - system                     glc=1
+                                                         2. s_waitcnt vmcnt(0)
+
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the load
+                                                             has completed
+                                                             before invalidating
+                                                             the cache.
+
+                                                         3. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale global data.
+
+     load atomic  acquire      - agent        - generic  1. flat_load glc=1
+                               - system                  2. s_waitcnt vmcnt(0) &
+                                                            lgkmcnt(0)
+
+                                                           - If OpenCL omit
+                                                             lgkmcnt(0).
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the flat_load
+                                                             has completed
+                                                             before invalidating
+                                                             the cache.
+
+                                                         3. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     atomicrmw    acquire      - singlethread - global   1. buffer/global/ds/flat_atomic
+                               - wavefront    - local
+                                              - generic
+     atomicrmw    acquire      - workgroup    - global   1. buffer/global_atomic
+     atomicrmw    acquire      - workgroup    - local    1. ds/flat_atomic
+                                              - generic  2. waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the
+                                                             atomicrmw value
+                                                             being acquired.
+
+     atomicrmw    acquire      - agent        - global   1. buffer/global_atomic
+                               - system                  2. s_waitcnt vmcnt(0)
+
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             cache.
+
+                                                         3. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     atomicrmw    acquire      - agent        - generic  1. flat_atomic
+                               - system                  2. s_waitcnt vmcnt(0) &
+                                                            lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             cache.
+
+                                                         3. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     fence        acquire      - singlethread *none*     *none*
+                               - wavefront
+     fence        acquire      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             waitcnt. However,
+                                                             since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate. If
+                                                             fence had an
+                                                             address space then
+                                                             set to address
+                                                             space of OpenCL
+                                                             fence flag, or to
+                                                             generic if both
+                                                             local and global
+                                                             flags are
+                                                             specified.
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the
+                                                             value read by the
+                                                             fence-paired-atomic.
+
+     fence        acquire      - agent        *none*     1. s_waitcnt vmcnt(0) &
+                               - system                     lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                             However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             group/generic load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures that the
+                                                             fence-paired atomic
+                                                             has completed
+                                                             before invalidating
+                                                             the
+                                                             cache. Therefore
+                                                             any following
+                                                             locations read must
+                                                             be no older than
+                                                             the value read by
+                                                             the
+                                                             fence-paired-atomic.
+
+                                                         2. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     **Release Atomic**
+     ---------------------------------------------------------------------------
+     store atomic release      - singlethread - global   1. buffer/global/ds/flat_store
+                               - wavefront    - local
+                                              - generic
+     store atomic release      - workgroup    - global   1. s_waitcnt lgkmcnt(0)
+                                              - generic
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             store.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             store that is being
+                                                             released.
+
+                                                         2. buffer/global/flat_store
+     store atomic release      - workgroup    - local    1. ds_store
+     store atomic release      - agent        - global   1. s_waitcnt vmcnt(0) &
+                               - system       - generic     lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             store.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global have
+                                                             completed before
+                                                             performing the
+                                                             store that is being
+                                                             released.
+
+                                                         2. buffer/global/ds/flat_store
+     atomicrmw    release      - singlethread - global   1. buffer/global/ds/flat_atomic
+                               - wavefront    - local
+                                              - generic
+     atomicrmw    release      - workgroup    - global   1. s_waitcnt lgkmcnt(0)
+                                              - generic
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. buffer/global/flat_atomic
+     atomicrmw    release      - workgroup    - local    1. ds_atomic
+     atomicrmw    release      - agent        - global   1. s_waitcnt vmcnt(0) &
+                               - system       - generic     lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global and local
+                                                             have completed
+                                                             before performing
+                                                             the atomicrmw that
+                                                             is being released.
+
+                                                         2. buffer/global/ds/flat_atomic*
+     fence        release      - singlethread *none*     *none*
+                               - wavefront
+     fence        release      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             waitcnt. However,
+                                                             since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             any following store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             following
+                                                             fence-paired-atomic.
+
+     fence        release      - agent        *none*     1. s_waitcnt vmcnt(0) &
+                               - system                     lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                             However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             any following store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global have
+                                                             completed before
+                                                             performing the
+                                                             following
+                                                             fence-paired-atomic.
+
+     **Acquire-Release Atomic**
+     ---------------------------------------------------------------------------
+     atomicrmw    acq_rel      - singlethread - global   1. buffer/global/ds/flat_atomic
+                               - wavefront    - local
+                                              - generic
+     atomicrmw    acq_rel      - workgroup    - global   1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. buffer/global_atomic
+     atomicrmw    acq_rel      - workgroup    - local    1. ds_atomic
+                                                         2. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the load
+                                                             atomic value being
+                                                             acquired.
+
+     atomicrmw    acq_rel      - workgroup    - generic  1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. flat_atomic
+                                                         3. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the load
+                                                             atomic value being
+                                                             acquired.
+     atomicrmw    acq_rel      - agent        - global   1. s_waitcnt vmcnt(0) &
+                               - system                     lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. buffer/global_atomic
+                                                         3. s_waitcnt vmcnt(0)
+
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             cache.
+
+                                                         4. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     atomicrmw    acq_rel      - agent        - generic  1. s_waitcnt vmcnt(0) &
+                               - system                     lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. flat_atomic
+                                                         3. s_waitcnt vmcnt(0) &
+                                                            lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             cache.
+
+                                                         4. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     fence        acq_rel      - singlethread *none*     *none*
+                               - wavefront
+     fence        acq_rel      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             waitcnt. However,
+                                                             since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing any
+                                                             following global
+                                                             memory operations.
+                                                           - Ensures that the
+                                                             preceding
+                                                             local/generic load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic)
+                                                             has completed
+                                                             before following
+                                                             global memory
+                                                             operations. This
+                                                             satisfies the
+                                                             requirements of
+                                                             acquire.
+                                                           - Ensures that all
+                                                             previous memory
+                                                             operations have
+                                                             completed before a
+                                                             following
+                                                             local/generic store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                             This satisfies the
+                                                             requirements of
+                                                             release.
+
+     fence        acq_rel      - agent        *none*     1. s_waitcnt vmcnt(0) &
+                               - system                     lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                             However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures that the
+                                                             preceding
+                                                             global/local/generic
+                                                             load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic)
+                                                             has completed
+                                                             before invalidating
+                                                             the cache. This
+                                                             satisfies the
+                                                             requirements of
+                                                             acquire.
+                                                           - Ensures that all
+                                                             previous memory
+                                                             operations have
+                                                             completed before a
+                                                             following
+                                                             global/local/generic
+                                                             store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                             This satisfies the
+                                                             requirements of
+                                                             release.
+
+                                                         2. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data. This
+                                                             satisfies the
+                                                             requirements of
+                                                             acquire.
+
+     **Sequential Consistent Atomic**
+     ---------------------------------------------------------------------------
+     load atomic  seq_cst      - singlethread - global   *Same as corresponding
+                               - wavefront    - local    load atomic acquire*.
+                               - workgroup    - generic
+     load atomic  seq_cst      - agent        - global   1. s_waitcnt vmcnt(0)
+                               - system       - local
+                                              - generic    - Must happen after
+                                                             preceding
+                                                             global/generic load
+                                                             atomic/store
+                                                             atomic/atomicrmw
+                                                             with memory
+                                                             ordering of seq_cst
+                                                             and with equal or
+                                                             wider sync scope.
+                                                             (Note that seq_cst
+                                                             fences have their
+                                                             own s_waitcnt
+                                                             vmcnt(0) and so do
+                                                             not need to be
+                                                             considered.)
+                                                           - Ensures any
+                                                             preceding
+                                                             sequential
+                                                             consistent global
+                                                             memory instructions
+                                                             have completed
+                                                             before executing
+                                                             this sequentially
+                                                             consistent
+                                                             instruction. This
+                                                             prevents reordering
+                                                             a seq_cst store
+                                                             followed by a
+                                                             seq_cst load (Note
+                                                             that seq_cst is
+                                                             stronger than
+                                                             acquire/release as
+                                                             the reordering of
+                                                             load acquire
+                                                             followed by a store
+                                                             release is
+                                                             prevented by the
+                                                             waitcnt vmcnt(0) of
+                                                             the release, but
+                                                             there is nothing
+                                                             preventing a store
+                                                             release followed by
+                                                             load acquire from
+                                                             competing out of
+                                                             order.)
+
+                                                         2. *Following
+                                                            instructions same as
+                                                            corresponding load
+                                                            atomic acquire*.
+
+     store atomic seq_cst      - singlethread - global   *Same as corresponding
+                               - wavefront    - local    store atomic release*.
+                               - workgroup    - generic
+     store atomic seq_cst      - agent        - global   *Same as corresponding
+                               - system       - generic  store atomic release*.
+     atomicrmw    seq_cst      - singlethread - global   *Same as corresponding
+                               - wavefront    - local    atomicrmw acq_rel*.
+                               - workgroup    - generic
+     atomicrmw    seq_cst      - agent        - global   *Same as corresponding
+                               - system       - generic  atomicrmw acq_rel*.
+     fence        seq_cst      - singlethread *none*     *Same as corresponding
+                               - wavefront               fence acq_rel*.
+                               - workgroup
+                               - agent
+                               - system
+     ============ ============ ============== ========== =======================
+
+The memory order also adds the single thread optimization constrains defined in
+table
+:ref:`amdgpu-amdhsa-memory-model-single-thread-optimization-constraints-gfx6-gfx9-table`.
+
+  .. table:: AMDHSA Memory Model Single Thread Optimization Constraints GFX6-GFX9
+     :name: amdgpu-amdhsa-memory-model-single-thread-optimization-constraints-gfx6-gfx9-table
+
+     ============ ==============================================================
+     LLVM Memory  Optimization Constraints
+     Ordering
+     ============ ==============================================================
+     unordered    *none*
+     monotonic    *none*
+     acquire      - If a load atomic/atomicrmw then no following load/load
+                    atomic/store/ store atomic/atomicrmw/fence instruction can
+                    be moved before the acquire.
+                  - If a fence then same as load atomic, plus no preceding
+                    associated fence-paired-atomic can be moved after the fence.
+     release      - If a store atomic/atomicrmw then no preceding load/load
+                    atomic/store/ store atomic/atomicrmw/fence instruction can
+                    be moved after the release.
+                  - If a fence then same as store atomic, plus no following
+                    associated fence-paired-atomic can be moved before the
+                    fence.
+     acq_rel      Same constraints as both acquire and release.
+     seq_cst      - If a load atomic then same constraints as acquire, plus no
+                    preceding sequentially consistent load atomic/store
+                    atomic/atomicrmw/fence instruction can be moved after the
+                    seq_cst.
+                  - If a store atomic then the same constraints as release, plus
+                    no following sequentially consistent load atomic/store
+                    atomic/atomicrmw/fence instruction can be moved before the
+                    seq_cst.
+                  - If an atomicrmw/fence then same constraints as acq_rel.
+     ============ ==============================================================
+
+Trap Handler ABI
+~~~~~~~~~~~~~~~~
+
+For code objects generated by AMDGPU backend for HSA [HSA]_ compatible runtimes
+(such as ROCm [AMD-ROCm]_), the runtime installs a trap handler that supports
+the ``s_trap`` instruction with the following usage:
+
+  .. table:: AMDGPU Trap Handler for AMDHSA OS
+     :name: amdgpu-trap-handler-for-amdhsa-os-table
+
+     =================== =============== =============== =======================
+     Usage               Code Sequence   Trap Handler    Description
+                                         Inputs
+     =================== =============== =============== =======================
+     reserved            ``s_trap 0x00``                 Reserved by hardware.
+     ``debugtrap(arg)``  ``s_trap 0x01`` ``SGPR0-1``:    Reserved for HSA
+                                           ``queue_ptr`` ``debugtrap``
+                                         ``VGPR0``:      intrinsic (not
+                                           ``arg``       implemented).
+     ``llvm.trap``       ``s_trap 0x02`` ``SGPR0-1``:    Causes dispatch to be
+                                           ``queue_ptr`` terminated and its
+                                                         associated queue put
+                                                         into the error state.
+     ``llvm.debugtrap``  ``s_trap 0x03`` ``SGPR0-1``:    If debugger not
+                                           ``queue_ptr`` installed handled
+                                                         same as ``llvm.trap``.
+     debugger breakpoint ``s_trap 0x07``                 Reserved for  debugger
+                                                         breakpoints.
+     debugger            ``s_trap 0x08``                 Reserved for debugger.
+     debugger            ``s_trap 0xfe``                 Reserved for debugger.
+     debugger            ``s_trap 0xff``                 Reserved for debugger.
+     =================== =============== =============== =======================
+
+Non-AMDHSA
+----------
+
+Trap Handler ABI
+~~~~~~~~~~~~~~~~
+
+For code objects generated by AMDGPU backend for non-amdhsa OS, the runtime does
+not install a trap handler. The ``llvm.trap`` and ``llvm.debugtrap``
+instructions are handled as follows:
+
+  .. table:: AMDGPU Trap Handler for Non-AMDHSA OS
+     :name: amdgpu-trap-handler-for-non-amdhsa-os-table
+
+     =============== =============== ===========================================
+     Usage           Code Sequence   Description
+     =============== =============== ===========================================
+     llvm.trap       s_endpgm        Causes wavefront to be terminated.
+     llvm.debugtrap  *none*          Compiler warning given that there is no
+                                     trap handler installed.
+     =============== =============== ===========================================
+
+Source Languages
+================
+
+.. _amdgpu-opencl:
+
+OpenCL
+------
+
+When generating code for the OpenCL language the target triple environment
+should be ``opencl`` or ``amdgizcl`` (see :ref:`amdgpu-target-triples`).
+
+When the language is OpenCL the following differences occur:
+
+1. The OpenCL memory model is used (see :ref:`amdgpu-amdhsa-memory-model`).
+2. The AMDGPU backend adds additional arguments to the kernel.
+3. Additional metadata is generated (:ref:`amdgpu-code-object-metadata`).
+
+.. TODO
+   Specify what affect this has. Hidden arguments added. Additional metadata
+   generated.
+
+.. _amdgpu-hcc:
+
+HCC
+---
+
+When generating code for the OpenCL language the target triple environment
+should be ``hcc`` (see :ref:`amdgpu-target-triples`).
+
+When the language is OpenCL the following differences occur:
+
+1. The HSA memory model is used (see :ref:`amdgpu-amdhsa-memory-model`).
+
+.. TODO
+   Specify what affect this has.
+
+Assembler
+---------
+
+AMDGPU backend has LLVM-MC based assembler which is currently in development.
+It supports AMDGCN GFX6-GFX8.
+
+This section describes general syntax for instructions and operands. For more
+information about instructions, their semantics and supported combinations of
+operands, refer to one of instruction set architecture manuals
+[AMD-Souther-Islands]_ [AMD-Sea-Islands]_ [AMD-Volcanic-Islands]_.
+
+An instruction has the following syntax (register operands are normally
+comma-separated while extra operands are space-separated):
+
+*<opcode> <register_operand0>, ... <extra_operand0> ...*
+
+Operands
+~~~~~~~~
+
+The following syntax for register operands is supported:
+
+* SGPR registers: s0, ... or s[0], ...
+* VGPR registers: v0, ... or v[0], ...
+* TTMP registers: ttmp0, ... or ttmp[0], ...
+* Special registers: exec (exec_lo, exec_hi), vcc (vcc_lo, vcc_hi), flat_scratch (flat_scratch_lo, flat_scratch_hi)
+* Special trap registers: tba (tba_lo, tba_hi), tma (tma_lo, tma_hi)
+* Register pairs, quads, etc: s[2:3], v[10:11], ttmp[5:6], s[4:7], v[12:15], ttmp[4:7], s[8:15], ...
+* Register lists: [s0, s1], [ttmp0, ttmp1, ttmp2, ttmp3]
+* Register index expressions: v[2*2], s[1-1:2-1]
+* 'off' indicates that an operand is not enabled
+
+The following extra operands are supported:
+
+* offset, offset0, offset1
+* idxen, offen bits
+* glc, slc, tfe bits
+* waitcnt: integer or combination of counter values
+* VOP3 modifiers:
+
+  - abs (\| \|), neg (\-)
+
+* DPP modifiers:
+
+  - row_shl, row_shr, row_ror, row_rol
+  - row_mirror, row_half_mirror, row_bcast
+  - wave_shl, wave_shr, wave_ror, wave_rol, quad_perm
+  - row_mask, bank_mask, bound_ctrl
+
+* SDWA modifiers:
+
+  - dst_sel, src0_sel, src1_sel (BYTE_N, WORD_M, DWORD)
+  - dst_unused (UNUSED_PAD, UNUSED_SEXT, UNUSED_PRESERVE)
+  - abs, neg, sext
+
+Instruction Examples
+~~~~~~~~~~~~~~~~~~~~
+
+DS
+~~
+
+.. code-block:: nasm
+
+  ds_add_u32 v2, v4 offset:16
+  ds_write_src2_b64 v2 offset0:4 offset1:8
+  ds_cmpst_f32 v2, v4, v6
+  ds_min_rtn_f64 v[8:9], v2, v[4:5]
+
+
+For full list of supported instructions, refer to "LDS/GDS instructions" in ISA Manual.
+
+FLAT
+++++
+
+.. code-block:: nasm
+
+  flat_load_dword v1, v[3:4]
+  flat_store_dwordx3 v[3:4], v[5:7]
+  flat_atomic_swap v1, v[3:4], v5 glc
+  flat_atomic_cmpswap v1, v[3:4], v[5:6] glc slc
+  flat_atomic_fmax_x2 v[1:2], v[3:4], v[5:6] glc
+
+For full list of supported instructions, refer to "FLAT instructions" in ISA Manual.
+
+MUBUF
++++++
+
+.. code-block:: nasm
+
+  buffer_load_dword v1, off, s[4:7], s1
+  buffer_store_dwordx4 v[1:4], v2, ttmp[4:7], s1 offen offset:4 glc tfe
+  buffer_store_format_xy v[1:2], off, s[4:7], s1
+  buffer_wbinvl1
+  buffer_atomic_inc v1, v2, s[8:11], s4 idxen offset:4 slc
+
+For full list of supported instructions, refer to "MUBUF Instructions" in ISA Manual.
+
+SMRD/SMEM
++++++++++
+
+.. code-block:: nasm
+
+  s_load_dword s1, s[2:3], 0xfc
+  s_load_dwordx8 s[8:15], s[2:3], s4
+  s_load_dwordx16 s[88:103], s[2:3], s4
+  s_dcache_inv_vol
+  s_memtime s[4:5]
+
+For full list of supported instructions, refer to "Scalar Memory Operations" in ISA Manual.
+
+SOP1
+++++
+
+.. code-block:: nasm
+
+  s_mov_b32 s1, s2
+  s_mov_b64 s[0:1], 0x80000000
+  s_cmov_b32 s1, 200
+  s_wqm_b64 s[2:3], s[4:5]
+  s_bcnt0_i32_b64 s1, s[2:3]
+  s_swappc_b64 s[2:3], s[4:5]
+  s_cbranch_join s[4:5]
+
+For full list of supported instructions, refer to "SOP1 Instructions" in ISA Manual.
+
+SOP2
+++++
+
+.. code-block:: nasm
+
+  s_add_u32 s1, s2, s3
+  s_and_b64 s[2:3], s[4:5], s[6:7]
+  s_cselect_b32 s1, s2, s3
+  s_andn2_b32 s2, s4, s6
+  s_lshr_b64 s[2:3], s[4:5], s6
+  s_ashr_i32 s2, s4, s6
+  s_bfm_b64 s[2:3], s4, s6
+  s_bfe_i64 s[2:3], s[4:5], s6
+  s_cbranch_g_fork s[4:5], s[6:7]
+
+For full list of supported instructions, refer to "SOP2 Instructions" in ISA Manual.
+
+SOPC
+++++
+
+.. code-block:: nasm
+
+  s_cmp_eq_i32 s1, s2
+  s_bitcmp1_b32 s1, s2
+  s_bitcmp0_b64 s[2:3], s4
+  s_setvskip s3, s5
+
+For full list of supported instructions, refer to "SOPC Instructions" in ISA Manual.
+
+SOPP
+++++
+
+.. code-block:: nasm
+
+  s_barrier
+  s_nop 2
+  s_endpgm
+  s_waitcnt 0 ; Wait for all counters to be 0
+  s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0) ; Equivalent to above
+  s_waitcnt vmcnt(1) ; Wait for vmcnt counter to be 1.
+  s_sethalt 9
+  s_sleep 10
+  s_sendmsg 0x1
+  s_sendmsg sendmsg(MSG_INTERRUPT)
+  s_trap 1
+
+For full list of supported instructions, refer to "SOPP Instructions" in ISA Manual.
+
+Unless otherwise mentioned, little verification is performed on the operands
+of SOPP Instructions, so it is up to the programmer to be familiar with the
+range or acceptable values.
+
+VALU
+++++
+
+For vector ALU instruction opcodes (VOP1, VOP2, VOP3, VOPC, VOP_DPP, VOP_SDWA),
+the assembler will automatically use optimal encoding based on its operands.
+To force specific encoding, one can add a suffix to the opcode of the instruction:
+
+* _e32 for 32-bit VOP1/VOP2/VOPC
+* _e64 for 64-bit VOP3
+* _dpp for VOP_DPP
+* _sdwa for VOP_SDWA
+
+VOP1/VOP2/VOP3/VOPC examples:
+
+.. code-block:: nasm
+
+  v_mov_b32 v1, v2
+  v_mov_b32_e32 v1, v2
+  v_nop
+  v_cvt_f64_i32_e32 v[1:2], v2
+  v_floor_f32_e32 v1, v2
+  v_bfrev_b32_e32 v1, v2
+  v_add_f32_e32 v1, v2, v3
+  v_mul_i32_i24_e64 v1, v2, 3
+  v_mul_i32_i24_e32 v1, -3, v3
+  v_mul_i32_i24_e32 v1, -100, v3
+  v_addc_u32 v1, s[0:1], v2, v3, s[2:3]
+  v_max_f16_e32 v1, v2, v3
+
+VOP_DPP examples:
+
+.. code-block:: nasm
+
+  v_mov_b32 v0, v0 quad_perm:[0,2,1,1]
+  v_sin_f32 v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
+  v_mov_b32 v0, v0 wave_shl:1
+  v_mov_b32 v0, v0 row_mirror
+  v_mov_b32 v0, v0 row_bcast:31
+  v_mov_b32 v0, v0 quad_perm:[1,3,0,1] row_mask:0xa bank_mask:0x1 bound_ctrl:0
+  v_add_f32 v0, v0, |v0| row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
+  v_max_f16 v1, v2, v3 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
+
+VOP_SDWA examples:
+
+.. code-block:: nasm
+
+  v_mov_b32 v1, v2 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:DWORD
+  v_min_u32 v200, v200, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
+  v_sin_f32 v0, v0 dst_unused:UNUSED_PAD src0_sel:WORD_1
+  v_fract_f32 v0, |v0| dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
+  v_cmpx_le_u32 vcc, v1, v2 src0_sel:BYTE_2 src1_sel:WORD_0
+
+For full list of supported instructions, refer to "Vector ALU instructions".
+
+HSA Code Object Directives
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+AMDGPU ABI defines auxiliary data in output code object. In assembly source,
+one can specify them with assembler directives.
+
+.hsa_code_object_version major, minor
++++++++++++++++++++++++++++++++++++++
+
+*major* and *minor* are integers that specify the version of the HSA code
+object that will be generated by the assembler.
+
+.hsa_code_object_isa [major, minor, stepping, vendor, arch]
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+
+*major*, *minor*, and *stepping* are all integers that describe the instruction
+set architecture (ISA) version of the assembly program.
+
+*vendor* and *arch* are quoted strings.  *vendor* should always be equal to
+"AMD" and *arch* should always be equal to "AMDGPU".
+
+By default, the assembler will derive the ISA version, *vendor*, and *arch*
+from the value of the -mcpu option that is passed to the assembler.
+
+.amdgpu_hsa_kernel (name)
++++++++++++++++++++++++++
+
+This directives specifies that the symbol with given name is a kernel entry point
+(label) and the object should contain corresponding symbol of type STT_AMDGPU_HSA_KERNEL.
+
+.amd_kernel_code_t
+++++++++++++++++++
+
+This directive marks the beginning of a list of key / value pairs that are used
+to specify the amd_kernel_code_t object that will be emitted by the assembler.
+The list must be terminated by the *.end_amd_kernel_code_t* directive.  For
+any amd_kernel_code_t values that are unspecified a default value will be
+used.  The default value for all keys is 0, with the following exceptions:
+
+- *kernel_code_version_major* defaults to 1.
+- *machine_kind* defaults to 1.
+- *machine_version_major*, *machine_version_minor*, and
+  *machine_version_stepping* are derived from the value of the -mcpu option
+  that is passed to the assembler.
+- *kernel_code_entry_byte_offset* defaults to 256.
+- *wavefront_size* defaults to 6.
+- *kernarg_segment_alignment*, *group_segment_alignment*, and
+  *private_segment_alignment* default to 4.  Note that alignments are specified
+  as a power of two, so a value of **n** means an alignment of 2^ **n**.
+
+The *.amd_kernel_code_t* directive must be placed immediately after the
+function label and before any instructions.
+
+For a full list of amd_kernel_code_t keys, refer to AMDGPU ABI document,
+comments in lib/Target/AMDGPU/AmdKernelCodeT.h and test/CodeGen/AMDGPU/hsa.s.
+
+Here is an example of a minimal amd_kernel_code_t specification:
+
+.. code-block:: none
+
+   .hsa_code_object_version 1,0
+   .hsa_code_object_isa
+
+   .hsatext
+   .globl  hello_world
+   .p2align 8
+   .amdgpu_hsa_kernel hello_world
+
+   hello_world:
+
+      .amd_kernel_code_t
+         enable_sgpr_kernarg_segment_ptr = 1
+         is_ptr64 = 1
+         compute_pgm_rsrc1_vgprs = 0
+         compute_pgm_rsrc1_sgprs = 0
+         compute_pgm_rsrc2_user_sgpr = 2
+         kernarg_segment_byte_size = 8
+         wavefront_sgpr_count = 2
+         workitem_vgpr_count = 3
+     .end_amd_kernel_code_t
+
+     s_load_dwordx2 s[0:1], s[0:1] 0x0
+     v_mov_b32 v0, 3.14159
+     s_waitcnt lgkmcnt(0)
+     v_mov_b32 v1, s0
+     v_mov_b32 v2, s1
+     flat_store_dword v[1:2], v0
+     s_endpgm
+   .Lfunc_end0:
+        .size   hello_world, .Lfunc_end0-hello_world
+
+Additional Documentation
+========================
+
+.. [AMD-R6xx] `AMD R6xx shader ISA <http://developer.amd.com/wordpress/media/2012/10/R600_Instruction_Set_Architecture.pdf>`__
+.. [AMD-R7xx] `AMD R7xx shader ISA <http://developer.amd.com/wordpress/media/2012/10/R700-Family_Instruction_Set_Architecture.pdf>`__
+.. [AMD-Evergreen] `AMD Evergreen shader ISA <http://developer.amd.com/wordpress/media/2012/10/AMD_Evergreen-Family_Instruction_Set_Architecture.pdf>`__
+.. [AMD-Cayman-Trinity] `AMD Cayman/Trinity shader ISA <http://developer.amd.com/wordpress/media/2012/10/AMD_HD_6900_Series_Instruction_Set_Architecture.pdf>`__
+.. [AMD-Souther-Islands] `AMD Southern Islands Series ISA <http://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf>`__
+.. [AMD-Sea-Islands] `AMD Sea Islands Series ISA <http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture.pdf>`_
+.. [AMD-Volcanic-Islands] `AMD GCN3 Instruction Set Architecture <http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf>`__
+.. [AMD-OpenCL_Programming-Guide]  `AMD Accelerated Parallel Processing OpenCL Programming Guide <http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf>`_
+.. [AMD-APP-SDK] `AMD Accelerated Parallel Processing APP SDK Documentation <http://developer.amd.com/tools/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/documentation/>`__
+.. [AMD-ROCm] `ROCm: Open Platform for Development, Discovery and Education Around GPU Computing <http://gpuopen.com/compute-product/rocm/>`__
+.. [AMD-ROCm-github] `ROCm github <http://github.com/RadeonOpenCompute>`__
+.. [HSA] `Heterogeneous System Architecture (HSA) Foundation <http://www.hsafoundation.com/>`__
+.. [ELF] `Executable and Linkable Format (ELF) <http://www.sco.com/developers/gabi/>`__
+.. [DWARF] `DWARF Debugging Information Format <http://dwarfstd.org/>`__
+.. [YAML] `YAML Ainât Markup Language (YAMLâ¢) Version 1.2 <http://www.yaml.org/spec/1.2/spec.html>`__
+.. [OpenCL] `The OpenCL Specification Version 2.0 <http://www.khronos.org/registry/cl/specs/opencl-2.0.pdf>`__
+.. [HRF] `Heterogeneous-race-free Memory Models <http://benedictgaster.org/wp-content/uploads/2014/01/asplos269-FINAL.pdf>`__
+.. [AMD-AMDGPU-Compute-Application-Binary-Interface] `AMDGPU Compute Application Binary Interface <https://github.com/RadeonOpenCompute/ROCm-ComputeABI-Doc/blob/master/AMDGPU-ABI.md>`__

Added: www-releases/trunk/5.0.1/docs/_sources/AdvancedBuilds.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_sources/AdvancedBuilds.rst.txt?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/_sources/AdvancedBuilds.rst.txt (added)
+++ www-releases/trunk/5.0.1/docs/_sources/AdvancedBuilds.rst.txt Thu Dec 21 10:09:53 2017
@@ -0,0 +1,174 @@
+=============================
+Advanced Build Configurations
+=============================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+`CMake <http://www.cmake.org/>`_ is a cross-platform build-generator tool. CMake
+does not build the project, it generates the files needed by your build tool
+(GNU make, Visual Studio, etc.) for building LLVM.
+
+If **you are a new contributor**, please start with the :doc:`GettingStarted` or
+:doc:`CMake` pages. This page is intended for users doing more complex builds.
+
+Many of the examples below are written assuming specific CMake Generators.
+Unless otherwise explicitly called out these commands should work with any CMake
+generator.
+
+Bootstrap Builds
+================
+
+The Clang CMake build system supports bootstrap (aka multi-stage) builds. At a
+high level a multi-stage build is a chain of builds that pass data from one
+stage into the next. The most common and simple version of this is a traditional
+bootstrap build.
+
+In a simple two-stage bootstrap build, we build clang using the system compiler,
+then use that just-built clang to build clang again. In CMake this simplest form
+of a bootstrap build can be configured with a single option,
+CLANG_ENABLE_BOOTSTRAP.
+
+.. code-block:: console
+
+  $ cmake -G Ninja -DCLANG_ENABLE_BOOTSTRAP=On <path to source>
+  $ ninja stage2
+
+This command itself isn't terribly useful because it assumes default
+configurations for each stage. The next series of examples utilize CMake cache
+scripts to provide more complex options.
+
+The clang build system refers to builds as stages. A stage1 build is a standard
+build using the compiler installed on the host, and a stage2 build is built
+using the stage1 compiler. This nomenclature holds up to more stages too. In
+general a stage*n* build is built using the output from stage*n-1*.
+
+Apple Clang Builds (A More Complex Bootstrap)
+=============================================
+
+Apple's Clang builds are a slightly more complicated example of the simple
+bootstrapping scenario. Apple Clang is built using a 2-stage build.
+
+The stage1 compiler is a host-only compiler with some options set. The stage1
+compiler is a balance of optimization vs build time because it is a throwaway.
+The stage2 compiler is the fully optimized compiler intended to ship to users.
+
+Setting up these compilers requires a lot of options. To simplify the
+configuration the Apple Clang build settings are contained in CMake Cache files.
+You can build an Apple Clang compiler using the following commands:
+
+.. code-block:: console
+
+  $ cmake -G Ninja -C <path to clang>/cmake/caches/Apple-stage1.cmake <path to source>
+  $ ninja stage2-distribution
+
+This CMake invocation configures the stage1 host compiler, and sets
+CLANG_BOOTSTRAP_CMAKE_ARGS to pass the Apple-stage2.cmake cache script to the
+stage2 configuration step.
+
+When you build the stage2-distribution target it builds the minimal stage1
+compiler and required tools, then configures and builds the stage2 compiler
+based on the settings in Apple-stage2.cmake.
+
+This pattern of using cache scripts to set complex settings, and specifically to
+make later stage builds include cache scripts is common in our more advanced
+build configurations.
+
+Multi-stage PGO
+===============
+
+Profile-Guided Optimizations (PGO) is a really great way to optimize the code
+clang generates. Our multi-stage PGO builds are a workflow for generating PGO
+profiles that can be used to optimize clang.
+
+At a high level, the way PGO works is that you build an instrumented compiler,
+then you run the instrumented compiler against sample source files. While the
+instrumented compiler runs it will output a bunch of files containing
+performance counters (.profraw files). After generating all the profraw files
+you use llvm-profdata to merge the files into a single profdata file that you
+can feed into the LLVM_PROFDATA_FILE option.
+
+Our PGO.cmake cache script automates that whole process. You can use it by
+running:
+
+.. code-block:: console
+
+  $ cmake -G Ninja -C <path_to_clang>/cmake/caches/PGO.cmake <source dir>
+  $ ninja stage2-instrumented-generate-profdata
+
+If you let that run for a few hours or so, it will place a profdata file in your
+build directory. This takes a really long time because it builds clang twice,
+and you *must* have compiler-rt in your build tree.
+
+This process uses any source files under the perf-training directory as training
+data as long as the source files are marked up with LIT-style RUN lines.
+
+After it finishes you can use âfind . -name clang.profdataâ to find it, but it
+should be at a path something like:
+
+.. code-block:: console
+
+  <build dir>/tools/clang/stage2-instrumented-bins/utils/perf-training/clang.profdata
+
+You can feed that file into the LLVM_PROFDATA_FILE option when you build your
+optimized compiler.
+
+The PGO came cache has a slightly different stage naming scheme than other
+multi-stage builds. It generates three stages; stage1, stage2-instrumented, and
+stage2. Both of the stage2 builds are built using the stage1 compiler.
+
+The PGO came cache generates the following additional targets:
+
+**stage2-instrumented**
+  Builds a stage1 x86 compiler, runtime, and required tools (llvm-config,
+  llvm-profdata) then uses that compiler to build an instrumented stage2 compiler.
+
+**stage2-instrumented-generate-profdata**
+  Depends on "stage2-instrumented" and will use the instrumented compiler to
+  generate profdata based on the training files in <clang>/utils/perf-training
+
+**stage2**
+  Depends of "stage2-instrumented-generate-profdata" and will use the stage1
+  compiler with the stage2 profdata to build a PGO-optimized compiler.
+
+**stage2-check-llvm**
+  Depends on stage2 and runs check-llvm using the stage2 compiler.
+
+**stage2-check-clang**
+  Depends on stage2 and runs check-clang using the stage2 compiler.
+
+**stage2-check-all**
+  Depends on stage2 and runs check-all using the stage2 compiler.
+
+**stage2-test-suite**
+  Depends on stage2 and runs the test-suite using the stage3 compiler (requires
+  in-tree test-suite).
+
+3-Stage Non-Determinism
+=======================
+
+In the ancient lore of compilers non-determinism is like the multi-headed hydra.
+Whenever it's head pops up, terror and chaos ensue.
+
+Historically one of the tests to verify that a compiler was deterministic would
+be a three stage build. The idea of a three stage build is you take your sources
+and build a compiler (stage1), then use that compiler to rebuild the sources
+(stage2), then you use that compiler to rebuild the sources a third time
+(stage3) with an identical configuration to the stage2 build. At the end of
+this, you have a stage2 and stage3 compiler that should be bit-for-bit
+identical.
+
+You can perform one of these 3-stage builds with LLVM & clang using the
+following commands:
+
+.. code-block:: console
+
+  $ cmake -G Ninja -C <path_to_clang>/cmake/caches/3-stage.cmake <source dir>
+  $ ninja stage3
+
+After the build you can compare the stage2 & stage3 compilers. We have a bot
+setup `here <http://lab.llvm.org:8011/builders/clang-3stage-ubuntu>`_ that runs
+this build and compare configuration.

Added: www-releases/trunk/5.0.1/docs/_sources/AliasAnalysis.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_sources/AliasAnalysis.rst.txt?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/_sources/AliasAnalysis.rst.txt (added)
+++ www-releases/trunk/5.0.1/docs/_sources/AliasAnalysis.rst.txt Thu Dec 21 10:09:53 2017
@@ -0,0 +1,718 @@
+==================================
+LLVM Alias Analysis Infrastructure
+==================================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+Alias Analysis (aka Pointer Analysis) is a class of techniques which attempt to
+determine whether or not two pointers ever can point to the same object in
+memory.  There are many different algorithms for alias analysis and many
+different ways of classifying them: flow-sensitive vs. flow-insensitive,
+context-sensitive vs. context-insensitive, field-sensitive
+vs. field-insensitive, unification-based vs. subset-based, etc.  Traditionally,
+alias analyses respond to a query with a `Must, May, or No`_ alias response,
+indicating that two pointers always point to the same object, might point to the
+same object, or are known to never point to the same object.
+
+The LLVM `AliasAnalysis
+<http://llvm.org/doxygen/classllvm_1_1AliasAnalysis.html>`__ class is the
+primary interface used by clients and implementations of alias analyses in the
+LLVM system.  This class is the common interface between clients of alias
+analysis information and the implementations providing it, and is designed to
+support a wide range of implementations and clients (but currently all clients
+are assumed to be flow-insensitive).  In addition to simple alias analysis
+information, this class exposes Mod/Ref information from those implementations
+which can provide it, allowing for powerful analyses and transformations to work
+well together.
+
+This document contains information necessary to successfully implement this
+interface, use it, and to test both sides.  It also explains some of the finer
+points about what exactly results mean.  
+
+``AliasAnalysis`` Class Overview
+================================
+
+The `AliasAnalysis <http://llvm.org/doxygen/classllvm_1_1AliasAnalysis.html>`__
+class defines the interface that the various alias analysis implementations
+should support.  This class exports two important enums: ``AliasResult`` and
+``ModRefResult`` which represent the result of an alias query or a mod/ref
+query, respectively.
+
+The ``AliasAnalysis`` interface exposes information about memory, represented in
+several different ways.  In particular, memory objects are represented as a
+starting address and size, and function calls are represented as the actual
+``call`` or ``invoke`` instructions that performs the call.  The
+``AliasAnalysis`` interface also exposes some helper methods which allow you to
+get mod/ref information for arbitrary instructions.
+
+All ``AliasAnalysis`` interfaces require that in queries involving multiple
+values, values which are not :ref:`constants <constants>` are all
+defined within the same function.
+
+Representation of Pointers
+--------------------------
+
+Most importantly, the ``AliasAnalysis`` class provides several methods which are
+used to query whether or not two memory objects alias, whether function calls
+can modify or read a memory object, etc.  For all of these queries, memory
+objects are represented as a pair of their starting address (a symbolic LLVM
+``Value*``) and a static size.
+
+Representing memory objects as a starting address and a size is critically
+important for correct Alias Analyses.  For example, consider this (silly, but
+possible) C code:
+
+.. code-block:: c++
+
+  int i;
+  char C[2];
+  char A[10]; 
+  /* ... */
+  for (i = 0; i != 10; ++i) {
+    C[0] = A[i];          /* One byte store */
+    C[1] = A[9-i];        /* One byte store */
+  }
+
+In this case, the ``basicaa`` pass will disambiguate the stores to ``C[0]`` and
+``C[1]`` because they are accesses to two distinct locations one byte apart, and
+the accesses are each one byte.  In this case, the Loop Invariant Code Motion
+(LICM) pass can use store motion to remove the stores from the loop.  In
+constrast, the following code:
+
+.. code-block:: c++
+
+  int i;
+  char C[2];
+  char A[10]; 
+  /* ... */
+  for (i = 0; i != 10; ++i) {
+    ((short*)C)[0] = A[i];  /* Two byte store! */
+    C[1] = A[9-i];          /* One byte store */
+  }
+
+In this case, the two stores to C do alias each other, because the access to the
+``&C[0]`` element is a two byte access.  If size information wasn't available in
+the query, even the first case would have to conservatively assume that the
+accesses alias.
+
+.. _alias:
+
+The ``alias`` method
+--------------------
+  
+The ``alias`` method is the primary interface used to determine whether or not
+two memory objects alias each other.  It takes two memory objects as input and
+returns MustAlias, PartialAlias, MayAlias, or NoAlias as appropriate.
+
+Like all ``AliasAnalysis`` interfaces, the ``alias`` method requires that either
+the two pointer values be defined within the same function, or at least one of
+the values is a :ref:`constant <constants>`.
+
+.. _Must, May, or No:
+
+Must, May, and No Alias Responses
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``NoAlias`` response may be used when there is never an immediate dependence
+between any memory reference *based* on one pointer and any memory reference
+*based* the other. The most obvious example is when the two pointers point to
+non-overlapping memory ranges. Another is when the two pointers are only ever
+used for reading memory. Another is when the memory is freed and reallocated
+between accesses through one pointer and accesses through the other --- in this
+case, there is a dependence, but it's mediated by the free and reallocation.
+
+As an exception to this is with the :ref:`noalias <noalias>` keyword;
+the "irrelevant" dependencies are ignored.
+
+The ``MayAlias`` response is used whenever the two pointers might refer to the
+same object.
+
+The ``PartialAlias`` response is used when the two memory objects are known to
+be overlapping in some way, regardless whether they start at the same address
+or not.
+
+The ``MustAlias`` response may only be returned if the two memory objects are
+guaranteed to always start at exactly the same location. A ``MustAlias``
+response does not imply that the pointers compare equal.
+
+The ``getModRefInfo`` methods
+-----------------------------
+
+The ``getModRefInfo`` methods return information about whether the execution of
+an instruction can read or modify a memory location.  Mod/Ref information is
+always conservative: if an instruction **might** read or write a location,
+``ModRef`` is returned.
+
+The ``AliasAnalysis`` class also provides a ``getModRefInfo`` method for testing
+dependencies between function calls.  This method takes two call sites (``CS1``
+& ``CS2``), returns ``NoModRef`` if neither call writes to memory read or
+written by the other, ``Ref`` if ``CS1`` reads memory written by ``CS2``,
+``Mod`` if ``CS1`` writes to memory read or written by ``CS2``, or ``ModRef`` if
+``CS1`` might read or write memory written to by ``CS2``.  Note that this
+relation is not commutative.
+
+Other useful ``AliasAnalysis`` methods
+--------------------------------------
+
+Several other tidbits of information are often collected by various alias
+analysis implementations and can be put to good use by various clients.
+
+The ``pointsToConstantMemory`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``pointsToConstantMemory`` method returns true if and only if the analysis
+can prove that the pointer only points to unchanging memory locations
+(functions, constant global variables, and the null pointer).  This information
+can be used to refine mod/ref information: it is impossible for an unchanging
+memory location to be modified.
+
+.. _never access memory or only read memory:
+
+The ``doesNotAccessMemory`` and  ``onlyReadsMemory`` methods
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These methods are used to provide very simple mod/ref information for function
+calls.  The ``doesNotAccessMemory`` method returns true for a function if the
+analysis can prove that the function never reads or writes to memory, or if the
+function only reads from constant memory.  Functions with this property are
+side-effect free and only depend on their input arguments, allowing them to be
+eliminated if they form common subexpressions or be hoisted out of loops.  Many
+common functions behave this way (e.g., ``sin`` and ``cos``) but many others do
+not (e.g., ``acos``, which modifies the ``errno`` variable).
+
+The ``onlyReadsMemory`` method returns true for a function if analysis can prove
+that (at most) the function only reads from non-volatile memory.  Functions with
+this property are side-effect free, only depending on their input arguments and
+the state of memory when they are called.  This property allows calls to these
+functions to be eliminated and moved around, as long as there is no store
+instruction that changes the contents of memory.  Note that all functions that
+satisfy the ``doesNotAccessMemory`` method also satisfy ``onlyReadsMemory``.
+
+Writing a new ``AliasAnalysis`` Implementation
+==============================================
+
+Writing a new alias analysis implementation for LLVM is quite straight-forward.
+There are already several implementations that you can use for examples, and the
+following information should help fill in any details.  For a examples, take a
+look at the `various alias analysis implementations`_ included with LLVM.
+
+Different Pass styles
+---------------------
+
+The first step to determining what type of :doc:`LLVM pass <WritingAnLLVMPass>`
+you need to use for your Alias Analysis.  As is the case with most other
+analyses and transformations, the answer should be fairly obvious from what type
+of problem you are trying to solve:
+
+#. If you require interprocedural analysis, it should be a ``Pass``.
+#. If you are a function-local analysis, subclass ``FunctionPass``.
+#. If you don't need to look at the program at all, subclass ``ImmutablePass``.
+
+In addition to the pass that you subclass, you should also inherit from the
+``AliasAnalysis`` interface, of course, and use the ``RegisterAnalysisGroup``
+template to register as an implementation of ``AliasAnalysis``.
+
+Required initialization calls
+-----------------------------
+
+Your subclass of ``AliasAnalysis`` is required to invoke two methods on the
+``AliasAnalysis`` base class: ``getAnalysisUsage`` and
+``InitializeAliasAnalysis``.  In particular, your implementation of
+``getAnalysisUsage`` should explicitly call into the
+``AliasAnalysis::getAnalysisUsage`` method in addition to doing any declaring
+any pass dependencies your pass has.  Thus you should have something like this:
+
+.. code-block:: c++
+
+  void getAnalysisUsage(AnalysisUsage &AU) const {
+    AliasAnalysis::getAnalysisUsage(AU);
+    // declare your dependencies here.
+  }
+
+Additionally, your must invoke the ``InitializeAliasAnalysis`` method from your
+analysis run method (``run`` for a ``Pass``, ``runOnFunction`` for a
+``FunctionPass``, or ``InitializePass`` for an ``ImmutablePass``).  For example
+(as part of a ``Pass``):
+
+.. code-block:: c++
+
+  bool run(Module &M) {
+    InitializeAliasAnalysis(this);
+    // Perform analysis here...
+    return false;
+  }
+
+Required methods to override
+----------------------------
+
+You must override the ``getAdjustedAnalysisPointer`` method on all subclasses
+of ``AliasAnalysis``. An example implementation of this method would look like:
+
+.. code-block:: c++
+
+  void *getAdjustedAnalysisPointer(const void* ID) override {
+    if (ID == &AliasAnalysis::ID)
+      return (AliasAnalysis*)this;
+    return this;
+  }
+
+Interfaces which may be specified
+---------------------------------
+
+All of the `AliasAnalysis
+<http://llvm.org/doxygen/classllvm_1_1AliasAnalysis.html>`__ virtual methods
+default to providing :ref:`chaining <aliasanalysis-chaining>` to another alias
+analysis implementation, which ends up returning conservatively correct
+information (returning "May" Alias and "Mod/Ref" for alias and mod/ref queries
+respectively).  Depending on the capabilities of the analysis you are
+implementing, you just override the interfaces you can improve.
+
+.. _aliasanalysis-chaining:
+
+``AliasAnalysis`` chaining behavior
+-----------------------------------
+
+With only one special exception (the :ref:`-no-aa <aliasanalysis-no-aa>` pass)
+every alias analysis pass chains to another alias analysis implementation (for
+example, the user can specify "``-basicaa -ds-aa -licm``" to get the maximum
+benefit from both alias analyses).  The alias analysis class automatically
+takes care of most of this for methods that you don't override.  For methods
+that you do override, in code paths that return a conservative MayAlias or
+Mod/Ref result, simply return whatever the superclass computes.  For example:
+
+.. code-block:: c++
+
+  AliasResult alias(const Value *V1, unsigned V1Size,
+                    const Value *V2, unsigned V2Size) {
+    if (...)
+      return NoAlias;
+    ...
+
+    // Couldn't determine a must or no-alias result.
+    return AliasAnalysis::alias(V1, V1Size, V2, V2Size);
+  }
+
+In addition to analysis queries, you must make sure to unconditionally pass LLVM
+`update notification`_ methods to the superclass as well if you override them,
+which allows all alias analyses in a change to be updated.
+
+.. _update notification:
+
+Updating analysis results for transformations
+---------------------------------------------
+
+Alias analysis information is initially computed for a static snapshot of the
+program, but clients will use this information to make transformations to the
+code.  All but the most trivial forms of alias analysis will need to have their
+analysis results updated to reflect the changes made by these transformations.
+
+The ``AliasAnalysis`` interface exposes four methods which are used to
+communicate program changes from the clients to the analysis implementations.
+Various alias analysis implementations should use these methods to ensure that
+their internal data structures are kept up-to-date as the program changes (for
+example, when an instruction is deleted), and clients of alias analysis must be
+sure to call these interfaces appropriately.
+
+The ``deleteValue`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``deleteValue`` method is called by transformations when they remove an
+instruction or any other value from the program (including values that do not
+use pointers).  Typically alias analyses keep data structures that have entries
+for each value in the program.  When this method is called, they should remove
+any entries for the specified value, if they exist.
+
+The ``copyValue`` method
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``copyValue`` method is used when a new value is introduced into the
+program.  There is no way to introduce a value into the program that did not
+exist before (this doesn't make sense for a safe compiler transformation), so
+this is the only way to introduce a new value.  This method indicates that the
+new value has exactly the same properties as the value being copied.
+
+The ``replaceWithNewValue`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This method is a simple helper method that is provided to make clients easier to
+use.  It is implemented by copying the old analysis information to the new
+value, then deleting the old value.  This method cannot be overridden by alias
+analysis implementations.
+
+The ``addEscapingUse`` method
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``addEscapingUse`` method is used when the uses of a pointer value have
+changed in ways that may invalidate precomputed analysis information.
+Implementations may either use this callback to provide conservative responses
+for points whose uses have change since analysis time, or may recompute some or
+all of their internal state to continue providing accurate responses.
+
+In general, any new use of a pointer value is considered an escaping use, and
+must be reported through this callback, *except* for the uses below:
+
+* A ``bitcast`` or ``getelementptr`` of the pointer
+* A ``store`` through the pointer (but not a ``store`` *of* the pointer)
+* A ``load`` through the pointer
+
+Efficiency Issues
+-----------------
+
+From the LLVM perspective, the only thing you need to do to provide an efficient
+alias analysis is to make sure that alias analysis **queries** are serviced
+quickly.  The actual calculation of the alias analysis results (the "run"
+method) is only performed once, but many (perhaps duplicate) queries may be
+performed.  Because of this, try to move as much computation to the run method
+as possible (within reason).
+
+Limitations
+-----------
+
+The AliasAnalysis infrastructure has several limitations which make writing a
+new ``AliasAnalysis`` implementation difficult.
+
+There is no way to override the default alias analysis. It would be very useful
+to be able to do something like "``opt -my-aa -O2``" and have it use ``-my-aa``
+for all passes which need AliasAnalysis, but there is currently no support for
+that, short of changing the source code and recompiling. Similarly, there is
+also no way of setting a chain of analyses as the default.
+
+There is no way for transform passes to declare that they preserve
+``AliasAnalysis`` implementations. The ``AliasAnalysis`` interface includes
+``deleteValue`` and ``copyValue`` methods which are intended to allow a pass to
+keep an AliasAnalysis consistent, however there's no way for a pass to declare
+in its ``getAnalysisUsage`` that it does so. Some passes attempt to use
+``AU.addPreserved<AliasAnalysis>``, however this doesn't actually have any
+effect.
+
+``AliasAnalysisCounter`` (``-count-aa``) are implemented as ``ModulePass``
+classes, so if your alias analysis uses ``FunctionPass``, it won't be able to
+use these utilities. If you try to use them, the pass manager will silently
+route alias analysis queries directly to ``BasicAliasAnalysis`` instead.
+
+Similarly, the ``opt -p`` option introduces ``ModulePass`` passes between each
+pass, which prevents the use of ``FunctionPass`` alias analysis passes.
+
+The ``AliasAnalysis`` API does have functions for notifying implementations when
+values are deleted or copied, however these aren't sufficient. There are many
+other ways that LLVM IR can be modified which could be relevant to
+``AliasAnalysis`` implementations which can not be expressed.
+
+The ``AliasAnalysisDebugger`` utility seems to suggest that ``AliasAnalysis``
+implementations can expect that they will be informed of any relevant ``Value``
+before it appears in an alias query. However, popular clients such as ``GVN``
+don't support this, and are known to trigger errors when run with the
+``AliasAnalysisDebugger``.
+
+Due to several of the above limitations, the most obvious use for the
+``AliasAnalysisCounter`` utility, collecting stats on all alias queries in a
+compilation, doesn't work, even if the ``AliasAnalysis`` implementations don't
+use ``FunctionPass``.  There's no way to set a default, much less a default
+sequence, and there's no way to preserve it.
+
+The ``AliasSetTracker`` class (which is used by ``LICM``) makes a
+non-deterministic number of alias queries. This can cause stats collected by
+``AliasAnalysisCounter`` to have fluctuations among identical runs, for
+example. Another consequence is that debugging techniques involving pausing
+execution after a predetermined number of queries can be unreliable.
+
+Many alias queries can be reformulated in terms of other alias queries. When
+multiple ``AliasAnalysis`` queries are chained together, it would make sense to
+start those queries from the beginning of the chain, with care taken to avoid
+infinite looping, however currently an implementation which wants to do this can
+only start such queries from itself.
+
+Using alias analysis results
+============================
+
+There are several different ways to use alias analysis results.  In order of
+preference, these are:
+
+Using the ``MemoryDependenceAnalysis`` Pass
+-------------------------------------------
+
+The ``memdep`` pass uses alias analysis to provide high-level dependence
+information about memory-using instructions.  This will tell you which store
+feeds into a load, for example.  It uses caching and other techniques to be
+efficient, and is used by Dead Store Elimination, GVN, and memcpy optimizations.
+
+.. _AliasSetTracker:
+
+Using the ``AliasSetTracker`` class
+-----------------------------------
+
+Many transformations need information about alias **sets** that are active in
+some scope, rather than information about pairwise aliasing.  The
+`AliasSetTracker <http://llvm.org/doxygen/classllvm_1_1AliasSetTracker.html>`__
+class is used to efficiently build these Alias Sets from the pairwise alias
+analysis information provided by the ``AliasAnalysis`` interface.
+
+First you initialize the AliasSetTracker by using the "``add``" methods to add
+information about various potentially aliasing instructions in the scope you are
+interested in.  Once all of the alias sets are completed, your pass should
+simply iterate through the constructed alias sets, using the ``AliasSetTracker``
+``begin()``/``end()`` methods.
+
+The ``AliasSet``\s formed by the ``AliasSetTracker`` are guaranteed to be
+disjoint, calculate mod/ref information and volatility for the set, and keep
+track of whether or not all of the pointers in the set are Must aliases.  The
+AliasSetTracker also makes sure that sets are properly folded due to call
+instructions, and can provide a list of pointers in each set.
+
+As an example user of this, the `Loop Invariant Code Motion
+<doxygen/structLICM.html>`_ pass uses ``AliasSetTracker``\s to calculate alias
+sets for each loop nest.  If an ``AliasSet`` in a loop is not modified, then all
+load instructions from that set may be hoisted out of the loop.  If any alias
+sets are stored to **and** are must alias sets, then the stores may be sunk
+to outside of the loop, promoting the memory location to a register for the
+duration of the loop nest.  Both of these transformations only apply if the
+pointer argument is loop-invariant.
+
+The AliasSetTracker implementation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The AliasSetTracker class is implemented to be as efficient as possible.  It
+uses the union-find algorithm to efficiently merge AliasSets when a pointer is
+inserted into the AliasSetTracker that aliases multiple sets.  The primary data
+structure is a hash table mapping pointers to the AliasSet they are in.
+
+The AliasSetTracker class must maintain a list of all of the LLVM ``Value*``\s
+that are in each AliasSet.  Since the hash table already has entries for each
+LLVM ``Value*`` of interest, the AliasesSets thread the linked list through
+these hash-table nodes to avoid having to allocate memory unnecessarily, and to
+make merging alias sets extremely efficient (the linked list merge is constant
+time).
+
+You shouldn't need to understand these details if you are just a client of the
+AliasSetTracker, but if you look at the code, hopefully this brief description
+will help make sense of why things are designed the way they are.
+
+Using the ``AliasAnalysis`` interface directly
+----------------------------------------------
+
+If neither of these utility class are what your pass needs, you should use the
+interfaces exposed by the ``AliasAnalysis`` class directly.  Try to use the
+higher-level methods when possible (e.g., use mod/ref information instead of the
+`alias`_ method directly if possible) to get the best precision and efficiency.
+
+Existing alias analysis implementations and clients
+===================================================
+
+If you're going to be working with the LLVM alias analysis infrastructure, you
+should know what clients and implementations of alias analysis are available.
+In particular, if you are implementing an alias analysis, you should be aware of
+the `the clients`_ that are useful for monitoring and evaluating different
+implementations.
+
+.. _various alias analysis implementations:
+
+Available ``AliasAnalysis`` implementations
+-------------------------------------------
+
+This section lists the various implementations of the ``AliasAnalysis``
+interface.  With the exception of the :ref:`-no-aa <aliasanalysis-no-aa>`
+implementation, all of these :ref:`chain <aliasanalysis-chaining>` to other
+alias analysis implementations.
+
+.. _aliasanalysis-no-aa:
+
+The ``-no-aa`` pass
+^^^^^^^^^^^^^^^^^^^
+
+The ``-no-aa`` pass is just like what it sounds: an alias analysis that never
+returns any useful information.  This pass can be useful if you think that alias
+analysis is doing something wrong and are trying to narrow down a problem.
+
+The ``-basicaa`` pass
+^^^^^^^^^^^^^^^^^^^^^
+
+The ``-basicaa`` pass is an aggressive local analysis that *knows* many
+important facts:
+
+* Distinct globals, stack allocations, and heap allocations can never alias.
+* Globals, stack allocations, and heap allocations never alias the null pointer.
+* Different fields of a structure do not alias.
+* Indexes into arrays with statically differing subscripts cannot alias.
+* Many common standard C library functions `never access memory or only read
+  memory`_.
+* Pointers that obviously point to constant globals "``pointToConstantMemory``".
+* Function calls can not modify or references stack allocations if they never
+  escape from the function that allocates them (a common case for automatic
+  arrays).
+
+The ``-globalsmodref-aa`` pass
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This pass implements a simple context-sensitive mod/ref and alias analysis for
+internal global variables that don't "have their address taken".  If a global
+does not have its address taken, the pass knows that no pointers alias the
+global.  This pass also keeps track of functions that it knows never access
+memory or never read memory.  This allows certain optimizations (e.g. GVN) to
+eliminate call instructions entirely.
+
+The real power of this pass is that it provides context-sensitive mod/ref
+information for call instructions.  This allows the optimizer to know that calls
+to a function do not clobber or read the value of the global, allowing loads and
+stores to be eliminated.
+
+.. note::
+
+  This pass is somewhat limited in its scope (only support non-address taken
+  globals), but is very quick analysis.
+
+The ``-steens-aa`` pass
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``-steens-aa`` pass implements a variation on the well-known "Steensgaard's
+algorithm" for interprocedural alias analysis.  Steensgaard's algorithm is a
+unification-based, flow-insensitive, context-insensitive, and field-insensitive
+alias analysis that is also very scalable (effectively linear time).
+
+The LLVM ``-steens-aa`` pass implements a "speculatively field-**sensitive**"
+version of Steensgaard's algorithm using the Data Structure Analysis framework.
+This gives it substantially more precision than the standard algorithm while
+maintaining excellent analysis scalability.
+
+.. note::
+
+  ``-steens-aa`` is available in the optional "poolalloc" module. It is not part
+  of the LLVM core.
+
+The ``-ds-aa`` pass
+^^^^^^^^^^^^^^^^^^^
+
+The ``-ds-aa`` pass implements the full Data Structure Analysis algorithm.  Data
+Structure Analysis is a modular unification-based, flow-insensitive,
+context-**sensitive**, and speculatively field-**sensitive** alias
+analysis that is also quite scalable, usually at ``O(n * log(n))``.
+
+This algorithm is capable of responding to a full variety of alias analysis
+queries, and can provide context-sensitive mod/ref information as well.  The
+only major facility not implemented so far is support for must-alias
+information.
+
+.. note::
+
+  ``-ds-aa`` is available in the optional "poolalloc" module. It is not part of
+  the LLVM core.
+
+The ``-scev-aa`` pass
+^^^^^^^^^^^^^^^^^^^^^
+
+The ``-scev-aa`` pass implements AliasAnalysis queries by translating them into
+ScalarEvolution queries. This gives it a more complete understanding of
+``getelementptr`` instructions and loop induction variables than other alias
+analyses have.
+
+Alias analysis driven transformations
+-------------------------------------
+
+LLVM includes several alias-analysis driven transformations which can be used
+with any of the implementations above.
+
+The ``-adce`` pass
+^^^^^^^^^^^^^^^^^^
+
+The ``-adce`` pass, which implements Aggressive Dead Code Elimination uses the
+``AliasAnalysis`` interface to delete calls to functions that do not have
+side-effects and are not used.
+
+The ``-licm`` pass
+^^^^^^^^^^^^^^^^^^
+
+The ``-licm`` pass implements various Loop Invariant Code Motion related
+transformations.  It uses the ``AliasAnalysis`` interface for several different
+transformations:
+
+* It uses mod/ref information to hoist or sink load instructions out of loops if
+  there are no instructions in the loop that modifies the memory loaded.
+
+* It uses mod/ref information to hoist function calls out of loops that do not
+  write to memory and are loop-invariant.
+
+* It uses alias information to promote memory objects that are loaded and stored
+  to in loops to live in a register instead.  It can do this if there are no may
+  aliases to the loaded/stored memory location.
+
+The ``-argpromotion`` pass
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``-argpromotion`` pass promotes by-reference arguments to be passed in
+by-value instead.  In particular, if pointer arguments are only loaded from it
+passes in the value loaded instead of the address to the function.  This pass
+uses alias information to make sure that the value loaded from the argument
+pointer is not modified between the entry of the function and any load of the
+pointer.
+
+The ``-gvn``, ``-memcpyopt``, and ``-dse`` passes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These passes use AliasAnalysis information to reason about loads and stores.
+
+.. _the clients:
+
+Clients for debugging and evaluation of implementations
+-------------------------------------------------------
+
+These passes are useful for evaluating the various alias analysis
+implementations.  You can use them with commands like:
+
+.. code-block:: bash
+
+  % opt -ds-aa -aa-eval foo.bc -disable-output -stats
+
+The ``-print-alias-sets`` pass
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``-print-alias-sets`` pass is exposed as part of the ``opt`` tool to print
+out the Alias Sets formed by the `AliasSetTracker`_ class.  This is useful if
+you're using the ``AliasSetTracker`` class.  To use it, use something like:
+
+.. code-block:: bash
+
+  % opt -ds-aa -print-alias-sets -disable-output
+
+The ``-count-aa`` pass
+^^^^^^^^^^^^^^^^^^^^^^
+
+The ``-count-aa`` pass is useful to see how many queries a particular pass is
+making and what responses are returned by the alias analysis.  As an example:
+
+.. code-block:: bash
+
+  % opt -basicaa -count-aa -ds-aa -count-aa -licm
+
+will print out how many queries (and what responses are returned) by the
+``-licm`` pass (of the ``-ds-aa`` pass) and how many queries are made of the
+``-basicaa`` pass by the ``-ds-aa`` pass.  This can be useful when debugging a
+transformation or an alias analysis implementation.
+
+The ``-aa-eval`` pass
+^^^^^^^^^^^^^^^^^^^^^
+
+The ``-aa-eval`` pass simply iterates through all pairs of pointers in a
+function and asks an alias analysis whether or not the pointers alias.  This
+gives an indication of the precision of the alias analysis.  Statistics are
+printed indicating the percent of no/may/must aliases found (a more precise
+algorithm will have a lower number of may aliases).
+
+Memory Dependence Analysis
+==========================
+
+.. note::
+
+  We are currently in the process of migrating things from
+  ``MemoryDependenceAnalysis`` to :doc:`MemorySSA`. Please try to use
+  that instead.
+
+If you're just looking to be a client of alias analysis information, consider
+using the Memory Dependence Analysis interface instead.  MemDep is a lazy,
+caching layer on top of alias analysis that is able to answer the question of
+what preceding memory operations a given instruction depends on, either at an
+intra- or inter-block level.  Because of its laziness and caching policy, using
+MemDep can be a significant performance win over accessing alias analysis
+directly.

Added: www-releases/trunk/5.0.1/docs/_sources/Atomics.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_sources/Atomics.rst.txt?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/_sources/Atomics.rst.txt (added)
+++ www-releases/trunk/5.0.1/docs/_sources/Atomics.rst.txt Thu Dec 21 10:09:53 2017
@@ -0,0 +1,605 @@
+==============================================
+LLVM Atomic Instructions and Concurrency Guide
+==============================================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+LLVM supports instructions which are well-defined in the presence of threads and
+asynchronous signals.
+
+The atomic instructions are designed specifically to provide readable IR and
+optimized code generation for the following:
+
+* The C++11 ``<atomic>`` header.  (`C++11 draft available here
+  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
+  <http://www.open-std.org/jtc1/sc22/wg14/>`_.)
+
+* Proper semantics for Java-style memory, for both ``volatile`` and regular
+  shared variables. (`Java Specification
+  <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_)
+
+* gcc-compatible ``__sync_*`` builtins. (`Description
+  <https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>`_)
+
+* Other scenarios with atomic semantics, including ``static`` variables with
+  non-trivial constructors in C++.
+
+Atomic and volatile in the IR are orthogonal; "volatile" is the C/C++ volatile,
+which ensures that every volatile load and store happens and is performed in the
+stated order.  A couple examples: if a SequentiallyConsistent store is
+immediately followed by another SequentiallyConsistent store to the same
+address, the first store can be erased. This transformation is not allowed for a
+pair of volatile stores. On the other hand, a non-volatile non-atomic load can
+be moved across a volatile load freely, but not an Acquire load.
+
+This document is intended to provide a guide to anyone either writing a frontend
+for LLVM or working on optimization passes for LLVM with a guide for how to deal
+with instructions with special semantics in the presence of concurrency.  This
+is not intended to be a precise guide to the semantics; the details can get
+extremely complicated and unreadable, and are not usually necessary.
+
+.. _Optimization outside atomic:
+
+Optimization outside atomic
+===========================
+
+The basic ``'load'`` and ``'store'`` allow a variety of optimizations, but can
+lead to undefined results in a concurrent environment; see `NotAtomic`_. This
+section specifically goes into the one optimizer restriction which applies in
+concurrent environments, which gets a bit more of an extended description
+because any optimization dealing with stores needs to be aware of it.
+
+From the optimizer's point of view, the rule is that if there are not any
+instructions with atomic ordering involved, concurrency does not matter, with
+one exception: if a variable might be visible to another thread or signal
+handler, a store cannot be inserted along a path where it might not execute
+otherwise.  Take the following example:
+
+.. code-block:: c
+
+ /* C code, for readability; run through clang -O2 -S -emit-llvm to get
+     equivalent IR */
+  int x;
+  void f(int* a) {
+    for (int i = 0; i < 100; i++) {
+      if (a[i])
+        x += 1;
+    }
+  }
+
+The following is equivalent in non-concurrent situations:
+
+.. code-block:: c
+
+  int x;
+  void f(int* a) {
+    int xtemp = x;
+    for (int i = 0; i < 100; i++) {
+      if (a[i])
+        xtemp += 1;
+    }
+    x = xtemp;
+  }
+
+However, LLVM is not allowed to transform the former to the latter: it could
+indirectly introduce undefined behavior if another thread can access ``x`` at
+the same time. (This example is particularly of interest because before the
+concurrency model was implemented, LLVM would perform this transformation.)
+
+Note that speculative loads are allowed; a load which is part of a race returns
+``undef``, but does not have undefined behavior.
+
+Atomic instructions
+===================
+
+For cases where simple loads and stores are not sufficient, LLVM provides
+various atomic instructions. The exact guarantees provided depend on the
+ordering; see `Atomic orderings`_.
+
+``load atomic`` and ``store atomic`` provide the same basic functionality as
+non-atomic loads and stores, but provide additional guarantees in situations
+where threads and signals are involved.
+
+``cmpxchg`` and ``atomicrmw`` are essentially like an atomic load followed by an
+atomic store (where the store is conditional for ``cmpxchg``), but no other
+memory operation can happen on any thread between the load and store.
+
+A ``fence`` provides Acquire and/or Release ordering which is not part of
+another operation; it is normally used along with Monotonic memory operations.
+A Monotonic load followed by an Acquire fence is roughly equivalent to an
+Acquire load, and a Monotonic store following a Release fence is roughly
+equivalent to a Release store. SequentiallyConsistent fences behave as both
+an Acquire and a Release fence, and offer some additional complicated
+guarantees, see the C++11 standard for details.
+
+Frontends generating atomic instructions generally need to be aware of the
+target to some degree; atomic instructions are guaranteed to be lock-free, and
+therefore an instruction which is wider than the target natively supports can be
+impossible to generate.
+
+.. _Atomic orderings:
+
+Atomic orderings
+================
+
+In order to achieve a balance between performance and necessary guarantees,
+there are six levels of atomicity. They are listed in order of strength; each
+level includes all the guarantees of the previous level except for
+Acquire/Release. (See also `LangRef Ordering <LangRef.html#ordering>`_.)
+
+.. _NotAtomic:
+
+NotAtomic
+---------
+
+NotAtomic is the obvious, a load or store which is not atomic. (This isn't
+really a level of atomicity, but is listed here for comparison.) This is
+essentially a regular load or store. If there is a race on a given memory
+location, loads from that location return undef.
+
+Relevant standard
+  This is intended to match shared variables in C/C++, and to be used in any
+  other context where memory access is necessary, and a race is impossible. (The
+  precise definition is in `LangRef Memory Model <LangRef.html#memmodel>`_.)
+
+Notes for frontends
+  The rule is essentially that all memory accessed with basic loads and stores
+  by multiple threads should be protected by a lock or other synchronization;
+  otherwise, you are likely to run into undefined behavior. If your frontend is
+  for a "safe" language like Java, use Unordered to load and store any shared
+  variable.  Note that NotAtomic volatile loads and stores are not properly
+  atomic; do not try to use them as a substitute. (Per the C/C++ standards,
+  volatile does provide some limited guarantees around asynchronous signals, but
+  atomics are generally a better solution.)
+
+Notes for optimizers
+  Introducing loads to shared variables along a codepath where they would not
+  otherwise exist is allowed; introducing stores to shared variables is not. See
+  `Optimization outside atomic`_.
+
+Notes for code generation
+  The one interesting restriction here is that it is not allowed to write to
+  bytes outside of the bytes relevant to a store.  This is mostly relevant to
+  unaligned stores: it is not allowed in general to convert an unaligned store
+  into two aligned stores of the same width as the unaligned store. Backends are
+  also expected to generate an i8 store as an i8 store, and not an instruction
+  which writes to surrounding bytes.  (If you are writing a backend for an
+  architecture which cannot satisfy these restrictions and cares about
+  concurrency, please send an email to llvm-dev.)
+
+Unordered
+---------
+
+Unordered is the lowest level of atomicity. It essentially guarantees that races
+produce somewhat sane results instead of having undefined behavior.  It also
+guarantees the operation to be lock-free, so it does not depend on the data
+being part of a special atomic structure or depend on a separate per-process
+global lock.  Note that code generation will fail for unsupported atomic
+operations; if you need such an operation, use explicit locking.
+
+Relevant standard
+  This is intended to match the Java memory model for shared variables.
+
+Notes for frontends
+  This cannot be used for synchronization, but is useful for Java and other
+  "safe" languages which need to guarantee that the generated code never
+  exhibits undefined behavior. Note that this guarantee is cheap on common
+  platforms for loads of a native width, but can be expensive or unavailable for
+  wider loads, like a 64-bit store on ARM. (A frontend for Java or other "safe"
+  languages would normally split a 64-bit store on ARM into two 32-bit unordered
+  stores.)
+
+Notes for optimizers
+  In terms of the optimizer, this prohibits any transformation that transforms a
+  single load into multiple loads, transforms a store into multiple stores,
+  narrows a store, or stores a value which would not be stored otherwise.  Some
+  examples of unsafe optimizations are narrowing an assignment into a bitfield,
+  rematerializing a load, and turning loads and stores into a memcpy
+  call. Reordering unordered operations is safe, though, and optimizers should
+  take advantage of that because unordered operations are common in languages
+  that need them.
+
+Notes for code generation
+  These operations are required to be atomic in the sense that if you use
+  unordered loads and unordered stores, a load cannot see a value which was
+  never stored.  A normal load or store instruction is usually sufficient, but
+  note that an unordered load or store cannot be split into multiple
+  instructions (or an instruction which does multiple memory operations, like
+  ``LDRD`` on ARM without LPAE, or not naturally-aligned ``LDRD`` on LPAE ARM).
+
+Monotonic
+---------
+
+Monotonic is the weakest level of atomicity that can be used in synchronization
+primitives, although it does not provide any general synchronization. It
+essentially guarantees that if you take all the operations affecting a specific
+address, a consistent ordering exists.
+
+Relevant standard
+  This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
+  standards for the exact definition.
+
+Notes for frontends
+  If you are writing a frontend which uses this directly, use with caution.  The
+  guarantees in terms of synchronization are very weak, so make sure these are
+  only used in a pattern which you know is correct.  Generally, these would
+  either be used for atomic operations which do not protect other memory (like
+  an atomic counter), or along with a ``fence``.
+
+Notes for optimizers
+  In terms of the optimizer, this can be treated as a read+write on the relevant
+  memory location (and alias analysis will take advantage of that). In addition,
+  it is legal to reorder non-atomic and Unordered loads around Monotonic
+  loads. CSE/DSE and a few other optimizations are allowed, but Monotonic
+  operations are unlikely to be used in ways which would make those
+  optimizations useful.
+
+Notes for code generation
+  Code generation is essentially the same as that for unordered for loads and
+  stores.  No fences are required.  ``cmpxchg`` and ``atomicrmw`` are required
+  to appear as a single operation.
+
+Acquire
+-------
+
+Acquire provides a barrier of the sort necessary to acquire a lock to access
+other memory with normal loads and stores.
+
+Relevant standard
+  This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
+  used for C++11/C11 ``memory_order_consume``.
+
+Notes for frontends
+  If you are writing a frontend which uses this directly, use with caution.
+  Acquire only provides a semantic guarantee when paired with a Release
+  operation.
+
+Notes for optimizers
+  Optimizers not aware of atomics can treat this like a nothrow call.  It is
+  also possible to move stores from before an Acquire load or read-modify-write
+  operation to after it, and move non-Acquire loads from before an Acquire
+  operation to after it.
+
+Notes for code generation
+  Architectures with weak memory ordering (essentially everything relevant today
+  except x86 and SPARC) require some sort of fence to maintain the Acquire
+  semantics.  The precise fences required varies widely by architecture, but for
+  a simple implementation, most architectures provide a barrier which is strong
+  enough for everything (``dmb`` on ARM, ``sync`` on PowerPC, etc.).  Putting
+  such a fence after the equivalent Monotonic operation is sufficient to
+  maintain Acquire semantics for a memory operation.
+
+Release
+-------
+
+Release is similar to Acquire, but with a barrier of the sort necessary to
+release a lock.
+
+Relevant standard
+  This corresponds to the C++11/C11 ``memory_order_release``.
+
+Notes for frontends
+  If you are writing a frontend which uses this directly, use with caution.
+  Release only provides a semantic guarantee when paired with a Acquire
+  operation.
+
+Notes for optimizers
+  Optimizers not aware of atomics can treat this like a nothrow call.  It is
+  also possible to move loads from after a Release store or read-modify-write
+  operation to before it, and move non-Release stores from after an Release
+  operation to before it.
+
+Notes for code generation
+  See the section on Acquire; a fence before the relevant operation is usually
+  sufficient for Release. Note that a store-store fence is not sufficient to
+  implement Release semantics; store-store fences are generally not exposed to
+  IR because they are extremely difficult to use correctly.
+
+AcquireRelease
+--------------
+
+AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
+barrier (for fences and operations which both read and write memory).
+
+Relevant standard
+  This corresponds to the C++11/C11 ``memory_order_acq_rel``.
+
+Notes for frontends
+  If you are writing a frontend which uses this directly, use with caution.
+  Acquire only provides a semantic guarantee when paired with a Release
+  operation, and vice versa.
+
+Notes for optimizers
+  In general, optimizers should treat this like a nothrow call; the possible
+  optimizations are usually not interesting.
+
+Notes for code generation
+  This operation has Acquire and Release semantics; see the sections on Acquire
+  and Release.
+
+SequentiallyConsistent
+----------------------
+
+SequentiallyConsistent (``seq_cst`` in IR) provides Acquire semantics for loads
+and Release semantics for stores. Additionally, it guarantees that a total
+ordering exists between all SequentiallyConsistent operations.
+
+Relevant standard
+  This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
+  the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.
+
+Notes for frontends
+  If a frontend is exposing atomic operations, these are much easier to reason
+  about for the programmer than other kinds of operations, and using them is
+  generally a practical performance tradeoff.
+
+Notes for optimizers
+  Optimizers not aware of atomics can treat this like a nothrow call.  For
+  SequentiallyConsistent loads and stores, the same reorderings are allowed as
+  for Acquire loads and Release stores, except that SequentiallyConsistent
+  operations may not be reordered.
+
+Notes for code generation
+  SequentiallyConsistent loads minimally require the same barriers as Acquire
+  operations and SequentiallyConsistent stores require Release
+  barriers. Additionally, the code generator must enforce ordering between
+  SequentiallyConsistent stores followed by SequentiallyConsistent loads. This
+  is usually done by emitting either a full fence before the loads or a full
+  fence after the stores; which is preferred varies by architecture.
+
+Atomics and IR optimization
+===========================
+
+Predicates for optimizer writers to query:
+
+* ``isSimple()``: A load or store which is not volatile or atomic.  This is
+  what, for example, memcpyopt would check for operations it might transform.
+
+* ``isUnordered()``: A load or store which is not volatile and at most
+  Unordered. This would be checked, for example, by LICM before hoisting an
+  operation.
+
+* ``mayReadFromMemory()``/``mayWriteToMemory()``: Existing predicate, but note
+  that they return true for any operation which is volatile or at least
+  Monotonic.
+
+* ``isStrongerThan`` / ``isAtLeastOrStrongerThan``: These are predicates on
+  orderings. They can be useful for passes that are aware of atomics, for
+  example to do DSE across a single atomic access, but not across a
+  release-acquire pair (see MemoryDependencyAnalysis for an example of this)
+
+* Alias analysis: Note that AA will return ModRef for anything Acquire or
+  Release, and for the address accessed by any Monotonic operation.
+
+To support optimizing around atomic operations, make sure you are using the
+right predicates; everything should work if that is done.  If your pass should
+optimize some atomic operations (Unordered operations in particular), make sure
+it doesn't replace an atomic load or store with a non-atomic operation.
+
+Some examples of how optimizations interact with various kinds of atomic
+operations:
+
+* ``memcpyopt``: An atomic operation cannot be optimized into part of a
+  memcpy/memset, including unordered loads/stores.  It can pull operations
+  across some atomic operations.
+
+* LICM: Unordered loads/stores can be moved out of a loop.  It just treats
+  monotonic operations like a read+write to a memory location, and anything
+  stricter than that like a nothrow call.
+
+* DSE: Unordered stores can be DSE'ed like normal stores.  Monotonic stores can
+  be DSE'ed in some cases, but it's tricky to reason about, and not especially
+  important. It is possible in some case for DSE to operate across a stronger
+  atomic operation, but it is fairly tricky. DSE delegates this reasoning to
+  MemoryDependencyAnalysis (which is also used by other passes like GVN).
+
+* Folding a load: Any atomic load from a constant global can be constant-folded,
+  because it cannot be observed.  Similar reasoning allows sroa with
+  atomic loads and stores.
+
+Atomics and Codegen
+===================
+
+Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
+On architectures which use barrier instructions for all atomic ordering (like
+ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass if
+``setInsertFencesForAtomic()`` was used.
+
+The MachineMemOperand for all atomic operations is currently marked as volatile;
+this is not correct in the IR sense of volatile, but CodeGen handles anything
+marked volatile very conservatively.  This should get fixed at some point.
+
+One very important property of the atomic operations is that if your backend
+supports any inline lock-free atomic operations of a given size, you should
+support *ALL* operations of that size in a lock-free manner.
+
+When the target implements atomic ``cmpxchg`` or LL/SC instructions (as most do)
+this is trivial: all the other operations can be implemented on top of those
+primitives. However, on many older CPUs (e.g. ARMv5, SparcV8, Intel 80386) there
+are atomic load and store instructions, but no ``cmpxchg`` or LL/SC. As it is
+invalid to implement ``atomic load`` using the native instruction, but
+``cmpxchg`` using a library call to a function that uses a mutex, ``atomic
+load`` must *also* expand to a library call on such architectures, so that it
+can remain atomic with regards to a simultaneous ``cmpxchg``, by using the same
+mutex.
+
+AtomicExpandPass can help with that: it will expand all atomic operations to the
+proper ``__atomic_*`` libcalls for any size above the maximum set by
+``setMaxAtomicSizeInBitsSupported`` (which defaults to 0).
+
+On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
+generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
+fences generate an ``MFENCE``, other fences do not cause any code to be
+generated.  ``cmpxchg`` uses the ``LOCK CMPXCHG`` instruction.  ``atomicrmw xchg``
+uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``, and all
+other ``atomicrmw`` operations generate a loop with ``LOCK CMPXCHG``.  Depending
+on the users of the result, some ``atomicrmw`` operations can be translated into
+operations like ``LOCK AND``, but that does not work in general.
+
+On ARM (before v8), MIPS, and many other RISC architectures, Acquire, Release,
+and SequentiallyConsistent semantics require barrier instructions for every such
+operation. Loads and stores generate normal instructions.  ``cmpxchg`` and
+``atomicrmw`` can be represented using a loop with LL/SC-style instructions
+which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
+on ARM, etc.).
+
+It is often easiest for backends to use AtomicExpandPass to lower some of the
+atomic constructs. Here are some lowerings it can do:
+
+* cmpxchg -> loop with load-linked/store-conditional
+  by overriding ``shouldExpandAtomicCmpXchgInIR()``, ``emitLoadLinked()``,
+  ``emitStoreConditional()``
+* large loads/stores -> ll-sc/cmpxchg
+  by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()``
+* strong atomic accesses -> monotonic accesses + fences by overriding
+  ``shouldInsertFencesForAtomic()``, ``emitLeadingFence()``, and
+  ``emitTrailingFence()``
+* atomic rmw -> loop with cmpxchg or load-linked/store-conditional
+  by overriding ``expandAtomicRMWInIR()``
+* expansion to __atomic_* libcalls for unsupported sizes.
+
+For an example of all of these, look at the ARM backend.
+
+Libcalls: __atomic_*
+====================
+
+There are two kinds of atomic library calls that are generated by LLVM. Please
+note that both sets of library functions somewhat confusingly share the names of
+builtin functions defined by clang. Despite this, the library functions are
+not directly related to the builtins: it is *not* the case that ``__atomic_*``
+builtins lower to ``__atomic_*`` library calls and ``__sync_*`` builtins lower
+to ``__sync_*`` library calls.
+
+The first set of library functions are named ``__atomic_*``. This set has been
+"standardized" by GCC, and is described below. (See also `GCC's documentation
+<https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary>`_)
+
+LLVM's AtomicExpandPass will translate atomic operations on data sizes above
+``MaxAtomicSizeInBitsSupported`` into calls to these functions.
+
+There are four generic functions, which can be called with data of any size or
+alignment::
+
+   void __atomic_load(size_t size, void *ptr, void *ret, int ordering)
+   void __atomic_store(size_t size, void *ptr, void *val, int ordering)
+   void __atomic_exchange(size_t size, void *ptr, void *val, void *ret, int ordering)
+   bool __atomic_compare_exchange(size_t size, void *ptr, void *expected, void *desired, int success_order, int failure_order)
+
+There are also size-specialized versions of the above functions, which can only
+be used with *naturally-aligned* pointers of the appropriate size. In the
+signatures below, "N" is one of 1, 2, 4, 8, and 16, and "iN" is the appropriate
+integer type of that size; if no such integer type exists, the specialization
+cannot be used::
+
+   iN __atomic_load_N(iN *ptr, iN val, int ordering)
+   void __atomic_store_N(iN *ptr, iN val, int ordering)
+   iN __atomic_exchange_N(iN *ptr, iN val, int ordering)
+   bool __atomic_compare_exchange_N(iN *ptr, iN *expected, iN desired, int success_order, int failure_order)
+
+Finally there are some read-modify-write functions, which are only available in
+the size-specific variants (any other sizes use a ``__atomic_compare_exchange``
+loop)::
+
+   iN __atomic_fetch_add_N(iN *ptr, iN val, int ordering)
+   iN __atomic_fetch_sub_N(iN *ptr, iN val, int ordering)
+   iN __atomic_fetch_and_N(iN *ptr, iN val, int ordering)
+   iN __atomic_fetch_or_N(iN *ptr, iN val, int ordering)
+   iN __atomic_fetch_xor_N(iN *ptr, iN val, int ordering)
+   iN __atomic_fetch_nand_N(iN *ptr, iN val, int ordering)
+
+This set of library functions have some interesting implementation requirements
+to take note of:
+
+- They support all sizes and alignments -- including those which cannot be
+  implemented natively on any existing hardware. Therefore, they will certainly
+  use mutexes in for some sizes/alignments.
+
+- As a consequence, they cannot be shipped in a statically linked
+  compiler-support library, as they have state which must be shared amongst all
+  DSOs loaded in the program. They must be provided in a shared library used by
+  all objects.
+
+- The set of atomic sizes supported lock-free must be a superset of the sizes
+  any compiler can emit. That is: if a new compiler introduces support for
+  inline-lock-free atomics of size N, the ``__atomic_*`` functions must also have a
+  lock-free implementation for size N. This is a requirement so that code
+  produced by an old compiler (which will have called the ``__atomic_*`` function)
+  interoperates with code produced by the new compiler (which will use native
+  the atomic instruction).
+
+Note that it's possible to write an entirely target-independent implementation
+of these library functions by using the compiler atomic builtins themselves to
+implement the operations on naturally-aligned pointers of supported sizes, and a
+generic mutex implementation otherwise.
+
+Libcalls: __sync_*
+==================
+
+Some targets or OS/target combinations can support lock-free atomics, but for
+various reasons, it is not practical to emit the instructions inline.
+
+There's two typical examples of this.
+
+Some CPUs support multiple instruction sets which can be swiched back and forth
+on function-call boundaries. For example, MIPS supports the MIPS16 ISA, which
+has a smaller instruction encoding than the usual MIPS32 ISA. ARM, similarly,
+has the Thumb ISA. In MIPS16 and earlier versions of Thumb, the atomic
+instructions are not encodable. However, those instructions are available via a
+function call to a function with the longer encoding.
+
+Additionally, a few OS/target pairs provide kernel-supported lock-free
+atomics. ARM/Linux is an example of this: the kernel `provides
+<https://www.kernel.org/doc/Documentation/arm/kernel_user_helpers.txt>`_ a
+function which on older CPUs contains a "magically-restartable" atomic sequence
+(which looks atomic so long as there's only one CPU), and contains actual atomic
+instructions on newer multicore models. This sort of functionality can typically
+be provided on any architecture, if all CPUs which are missing atomic
+compare-and-swap support are uniprocessor (no SMP). This is almost always the
+case. The only common architecture without that property is SPARC -- SPARCV8 SMP
+systems were common, yet it doesn't support any sort of compare-and-swap
+operation.
+
+In either of these cases, the Target in LLVM can claim support for atomics of an
+appropriate size, and then implement some subset of the operations via libcalls
+to a ``__sync_*`` function. Such functions *must* not use locks in their
+implementation, because unlike the ``__atomic_*`` routines used by
+AtomicExpandPass, these may be mixed-and-matched with native instructions by the
+target lowering.
+
+Further, these routines do not need to be shared, as they are stateless. So,
+there is no issue with having multiple copies included in one binary. Thus,
+typically these routines are implemented by the statically-linked compiler
+runtime support library.
+
+LLVM will emit a call to an appropriate ``__sync_*`` routine if the target
+ISelLowering code has set the corresponding ``ATOMIC_CMPXCHG``, ``ATOMIC_SWAP``,
+or ``ATOMIC_LOAD_*`` operation to "Expand", and if it has opted-into the
+availability of those library functions via a call to ``initSyncLibcalls()``.
+
+The full set of functions that may be called by LLVM is (for ``N`` being 1, 2,
+4, 8, or 16)::
+
+  iN __sync_val_compare_and_swap_N(iN *ptr, iN expected, iN desired)
+  iN __sync_lock_test_and_set_N(iN *ptr, iN val)
+  iN __sync_fetch_and_add_N(iN *ptr, iN val)
+  iN __sync_fetch_and_sub_N(iN *ptr, iN val)
+  iN __sync_fetch_and_and_N(iN *ptr, iN val)
+  iN __sync_fetch_and_or_N(iN *ptr, iN val)
+  iN __sync_fetch_and_xor_N(iN *ptr, iN val)
+  iN __sync_fetch_and_nand_N(iN *ptr, iN val)
+  iN __sync_fetch_and_max_N(iN *ptr, iN val)
+  iN __sync_fetch_and_umax_N(iN *ptr, iN val)
+  iN __sync_fetch_and_min_N(iN *ptr, iN val)
+  iN __sync_fetch_and_umin_N(iN *ptr, iN val)
+
+This list doesn't include any function for atomic load or store; all known
+architectures support atomic loads and stores directly (possibly by emitting a
+fence on either side of a normal load or store.)
+
+There's also, somewhat separately, the possibility to lower ``ATOMIC_FENCE`` to
+``__sync_synchronize()``. This may happen or not happen independent of all the
+above, controlled purely by ``setOperationAction(ISD::ATOMIC_FENCE, ...)``.

Added: www-releases/trunk/5.0.1/docs/_sources/Benchmarking.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_sources/Benchmarking.rst.txt?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/_sources/Benchmarking.rst.txt (added)
+++ www-releases/trunk/5.0.1/docs/_sources/Benchmarking.rst.txt Thu Dec 21 10:09:53 2017
@@ -0,0 +1,87 @@
+==================================
+Benchmarking tips
+==================================
+
+
+Introduction
+============
+
+For benchmarking a patch we want to reduce all possible sources of
+noise as much as possible. How to do that is very OS dependent.
+
+Note that low noise is required, but not sufficient. It does not
+exclude measurement bias. See
+https://www.cis.upenn.edu/~cis501/papers/producing-wrong-data.pdf for
+example.
+
+General
+================================
+
+* Use a high resolution timer, e.g. perf under linux.
+
+* Run the benchmark multiple times to be able to recognize noise.
+
+* Disable as many processes or services as possible on the target system.
+
+* Disable frequency scaling, turbo boost and address space
+  randomization (see OS specific section).
+
+* Static link if the OS supports it. That avoids any variation that
+  might be introduced by loading dynamic libraries. This can be done
+  by passing ``-DLLVM_BUILD_STATIC=ON`` to cmake.
+
+* Try to avoid storage. On some systems you can use tmpfs. Putting the
+  program, inputs and outputs on tmpfs avoids touching a real storage
+  system, which can have a pretty big variability.
+
+  To mount it (on linux and freebsd at least)::
+
+    mount -t tmpfs -o size=<XX>g none dir_to_mount
+
+Linux
+=====
+
+* Disable address space randomization::
+
+    echo 0 > /proc/sys/kernel/randomize_va_space
+
+* Set scaling_governor to performance::
+
+   for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
+   do
+     echo performance > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
+   done
+
+* Use https://github.com/lpechacek/cpuset to reserve cpus for just the
+  program you are benchmarking. If using perf, leave at least 2 cores
+  so that perf runs in one and your program in another::
+
+    cset shield -c N1,N2 -k on
+
+  This will move all threads out of N1 and N2. The ``-k on`` means
+  that even kernel threads are moved out.
+
+* Disable the SMT pair of the cpus you will use for the benchmark. The
+  pair of cpu N can be found in
+  ``/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`` and
+  disabled with::
+
+    echo 0 > /sys/devices/system/cpu/cpuX/online
+
+
+* Run the program with::
+
+    cset shield --exec -- perf stat -r 10 <cmd>
+
+  This will run the command after ``--`` in the isolated cpus. The
+  particular perf command runs the ``<cmd>`` 10 times and reports
+  statistics.
+
+With these in place you can expect perf variations of less than 0.1%.
+
+Linux Intel
+-----------
+
+* Disable turbo mode::
+
+    echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

Added: www-releases/trunk/5.0.1/docs/_sources/BigEndianNEON.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_sources/BigEndianNEON.rst.txt?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/_sources/BigEndianNEON.rst.txt (added)
+++ www-releases/trunk/5.0.1/docs/_sources/BigEndianNEON.rst.txt Thu Dec 21 10:09:53 2017
@@ -0,0 +1,205 @@
+==============================================
+Using ARM NEON instructions in big endian mode
+==============================================
+
+.. contents::
+    :local:
+
+Introduction
+============
+
+Generating code for big endian ARM processors is for the most part straightforward. NEON loads and stores however have some interesting properties that make code generation decisions less obvious in big endian mode.
+
+The aim of this document is to explain the problem with NEON loads and stores, and the solution that has been implemented in LLVM.
+
+In this document the term "vector" refers to what the ARM ABI calls a "short vector", which is a sequence of items that can fit in a NEON register. This sequence can be 64 or 128 bits in length, and can constitute 8, 16, 32 or 64 bit items. This document refers to A64 instructions throughout, but is almost applicable to the A32/ARMv7 instruction sets also. The ABI format for passing vectors in A32 is sligtly different to A64. Apart from that, the same concepts apply.
+
+Example: C-level intrinsics -> assembly
+---------------------------------------
+
+It may be helpful first to illustrate how C-level ARM NEON intrinsics are lowered to instructions.
+
+This trivial C function takes a vector of four ints and sets the zero'th lane to the value "42"::
+
+    #include <arm_neon.h>
+    int32x4_t f(int32x4_t p) {
+        return vsetq_lane_s32(42, p, 0);
+    }
+
+arm_neon.h intrinsics generate "generic" IR where possible (that is, normal IR instructions not ``llvm.arm.neon.*`` intrinsic calls). The above generates::
+
+    define <4 x i32> @f(<4 x i32> %p) {
+      %vset_lane = insertelement <4 x i32> %p, i32 42, i32 0
+      ret <4 x i32> %vset_lane
+    }
+
+Which then becomes the following trivial assembly::
+
+    f:                                      // @f
+            movz	w8, #0x2a
+            ins 	v0.s[0], w8
+            ret
+
+Problem
+=======
+
+The main problem is how vectors are represented in memory and in registers.
+
+First, a recap. The "endianness" of an item affects its representation in memory only. In a register, a number is just a sequence of bits - 64 bits in the case of AArch64 general purpose registers. Memory, however, is a sequence of addressable units of 8 bits in size. Any number greater than 8 bits must therefore be split up into 8-bit chunks, and endianness describes the order in which these chunks are laid out in memory.
+
+A "little endian" layout has the least significant byte first (lowest in memory address). A "big endian" layout has the *most* significant byte first. This means that when loading an item from big endian memory, the lowest 8-bits in memory must go in the most significant 8-bits, and so forth.
+
+``LDR`` and ``LD1``
+===================
+
+.. figure:: ARM-BE-ldr.png
+    :align: right
+    
+    Big endian vector load using ``LDR``.
+
+
+A vector is a consecutive sequence of items that are operated on simultaneously. To load a 64-bit vector, 64 bits need to be read from memory. In little endian mode, we can do this by just performing a 64-bit load - ``LDR q0, [foo]``. However if we try this in big endian mode, because of the byte swapping the lane indices end up being swapped! The zero'th item as laid out in memory becomes the n'th lane in the vector.
+
+.. figure:: ARM-BE-ld1.png
+    :align: right
+
+    Big endian vector load using ``LD1``. Note that the lanes retain the correct ordering.
+
+
+Because of this, the instruction ``LD1`` performs a vector load but performs byte swapping not on the entire 64 bits, but on the individual items within the vector. This means that the register content is the same as it would have been on a little endian system.
+
+It may seem that ``LD1`` should suffice to peform vector loads on a big endian machine. However there are pros and cons to the two approaches that make it less than simple which register format to pick.
+
+There are two options:
+
+    1. The content of a vector register is the same *as if* it had been loaded with an ``LDR`` instruction.
+    2. The content of a vector register is the same *as if* it had been loaded with an ``LD1`` instruction.
+
+Because ``LD1 == LDR + REV`` and similarly ``LDR == LD1 + REV`` (on a big endian system), we can simulate either type of load with the other type of load plus a ``REV`` instruction. So we're not deciding which instructions to use, but which format to use (which will then influence which instruction is best to use).
+
+.. The 'clearer' container is required to make the following section header come after the floated
+   images above.
+.. container:: clearer
+
+    Note that throughout this section we only mention loads. Stores have exactly the same problems as their associated loads, so have been skipped for brevity.
+ 
+
+Considerations
+==============
+
+LLVM IR Lane ordering
+---------------------
+
+LLVM IR has first class vector types. In LLVM IR, the zero'th element of a vector resides at the lowest memory address. The optimizer relies on this property in certain areas, for example when concatenating vectors together. The intention is for arrays and vectors to have identical memory layouts - ``[4 x i8]`` and ``<4 x i8>`` should be represented the same in memory. Without this property there would be many special cases that the optimizer would have to cleverly handle.
+
+Use of ``LDR`` would break this lane ordering property. This doesn't preclude the use of ``LDR``, but we would have to do one of two things:
+
+   1. Insert a ``REV`` instruction to reverse the lane order after every ``LDR``.
+   2. Disable all optimizations that rely on lane layout, and for every access to an individual lane (``insertelement``/``extractelement``/``shufflevector``) reverse the lane index.
+
+AAPCS
+-----
+
+The ARM procedure call standard (AAPCS) defines the ABI for passing vectors between functions in registers. It states:
+
+    When a short vector is transferred between registers and memory it is treated as an opaque object. That is a short vector is stored in memory as if it were stored with a single ``STR`` of the entire register; a short vector is loaded from memory using the corresponding ``LDR`` instruction. On a little-endian system this means that element 0 will always contain the lowest addressed element of a short vector; on a big-endian system element 0 will contain the highest-addressed element of a short vector.
+
+    -- Procedure Call Standard for the ARM 64-bit Architecture (AArch64), 4.1.2 Short Vectors
+
+The use of ``LDR`` and ``STR`` as the ABI defines has at least one advantage over ``LD1`` and ``ST1``. ``LDR`` and ``STR`` are oblivious to the size of the individual lanes of a vector. ``LD1`` and ``ST1`` are not - the lane size is encoded within them. This is important across an ABI boundary, because it would become necessary to know the lane width the callee expects. Consider the following code:
+
+.. code-block:: c
+
+    <callee.c>
+    void callee(uint32x2_t v) {
+      ...
+    }
+
+    <caller.c>
+    extern void callee(uint32x2_t);
+    void caller() {
+      callee(...);
+    }
+
+If ``callee`` changed its signature to ``uint16x4_t``, which is equivalent in register content, if we passed as ``LD1`` we'd break this code until ``caller`` was updated and recompiled.
+
+There is an argument that if the signatures of the two functions are different then the behaviour should be undefined. But there may be functions that are agnostic to the lane layout of the vector, and treating the vector as an opaque value (just loading it and storing it) would be impossible without a common format across ABI boundaries.
+
+So to preserve ABI compatibility, we need to use the ``LDR`` lane layout across function calls.
+
+Alignment
+---------
+
+In strict alignment mode, ``LDR qX`` requires its address to be 128-bit aligned, whereas ``LD1`` only requires it to be as aligned as the lane size. If we canonicalised on using ``LDR``, we'd still need to use ``LD1`` in some places to avoid alignment faults (the result of the ``LD1`` would then need to be reversed with ``REV``).
+
+Most operating systems however do not run with alignment faults enabled, so this is often not an issue.
+
+Summary
+-------
+
+The following table summarises the instructions that are required to be emitted for each property mentioned above for each of the two solutions.
+
++-------------------------------+-------------------------------+---------------------+
+|                               | ``LDR`` layout                | ``LD1`` layout      |
++===============================+===============================+=====================+
+| Lane ordering                 |   ``LDR + REV``               |    ``LD1``          |
++-------------------------------+-------------------------------+---------------------+
+| AAPCS                         |   ``LDR``                     |    ``LD1 + REV``    |
++-------------------------------+-------------------------------+---------------------+
+| Alignment for strict mode     |   ``LDR`` / ``LD1 + REV``     |    ``LD1``          |
++-------------------------------+-------------------------------+---------------------+
+
+Neither approach is perfect, and choosing one boils down to choosing the lesser of two evils. The issue with lane ordering, it was decided, would have to change target-agnostic compiler passes and would result in a strange IR in which lane indices were reversed. It was decided that this was worse than the changes that would have to be made to support ``LD1``, so ``LD1`` was chosen as the canonical vector load instruction (and by inference, ``ST1`` for vector stores).
+
+Implementation
+==============
+
+There are 3 parts to the implementation:
+
+    1. Predicate ``LDR`` and ``STR`` instructions so that they are never allowed to be selected to generate vector loads and stores. The exception is one-lane vectors [1]_ - these by definition cannot have lane ordering problems so are fine to use ``LDR``/``STR``. 
+
+    2. Create code generation patterns for bitconverts that create ``REV`` instructions.
+
+    3. Make sure appropriate bitconverts are created so that vector values get passed over call boundaries as 1-element vectors (which is the same as if they were loaded with ``LDR``).
+
+Bitconverts
+-----------
+
+.. image:: ARM-BE-bitcastfail.png
+    :align: right
+
+The main problem with the ``LD1`` solution is dealing with bitconverts (or bitcasts, or reinterpret casts). These are pseudo instructions that only change the compiler's interpretation of data, not the underlying data itself. A requirement is that if data is loaded and then saved again (called a "round trip"), the memory contents should be the same after the store as before the load. If a vector is loaded and is then bitconverted to a different vector type before storing, the round trip will currently be broken.
+
+Take for example this code sequence::
+
+    %0 = load <4 x i32> %x
+    %1 = bitcast <4 x i32> %0 to <2 x i64>
+         store <2 x i64> %1, <2 x i64>* %y
+
+This would produce a code sequence such as that in the figure on the right. The mismatched ``LD1`` and ``ST1`` cause the stored data to differ from the loaded data.
+
+.. container:: clearer
+
+    When we see a bitcast from type ``X`` to type ``Y``, what we need to do is to change the in-register representation of the data to be *as if* it had just been loaded by a ``LD1`` of type ``Y``.
+
+.. image:: ARM-BE-bitcastsuccess.png
+    :align: right
+
+Conceptually this is simple - we can insert a ``REV`` undoing the ``LD1`` of type ``X`` (converting the in-register representation to the same as if it had been loaded by ``LDR``) and then insert another ``REV`` to change the representation to be as if it had been loaded by an ``LD1`` of type ``Y``.
+
+For the previous example, this would be::
+
+    LD1   v0.4s, [x]
+
+    REV64 v0.4s, v0.4s                  // There is no REV128 instruction, so it must be synthesizedcd 
+    EXT   v0.16b, v0.16b, v0.16b, #8    // with a REV64 then an EXT to swap the two 64-bit elements.
+
+    REV64 v0.2d, v0.2d
+    EXT   v0.16b, v0.16b, v0.16b, #8
+
+    ST1   v0.2d, [y]
+
+It turns out that these ``REV`` pairs can, in almost all cases, be squashed together into a single ``REV``. For the example above, a ``REV128 4s`` + ``REV128 2d`` is actually a ``REV64 4s``, as shown in the figure on the right.
+
+.. [1] One lane vectors may seem useless as a concept but they serve to distinguish between values held in general purpose registers and values held in NEON/VFP registers. For example, an ``i64`` would live in an ``x`` register, but ``<1 x i64>`` would live in a ``d`` register.
+

Added: www-releases/trunk/5.0.1/docs/_sources/BitCodeFormat.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_sources/BitCodeFormat.rst.txt?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/_sources/BitCodeFormat.rst.txt (added)
+++ www-releases/trunk/5.0.1/docs/_sources/BitCodeFormat.rst.txt Thu Dec 21 10:09:53 2017
@@ -0,0 +1,1336 @@
+.. role:: raw-html(raw)
+   :format: html
+
+========================
+LLVM Bitcode File Format
+========================
+
+.. contents::
+   :local:
+
+Abstract
+========
+
+This document describes the LLVM bitstream file format and the encoding of the
+LLVM IR into it.
+
+Overview
+========
+
+What is commonly known as the LLVM bitcode file format (also, sometimes
+anachronistically known as bytecode) is actually two things: a `bitstream
+container format`_ and an `encoding of LLVM IR`_ into the container format.
+
+The bitstream format is an abstract encoding of structured data, very similar to
+XML in some ways.  Like XML, bitstream files contain tags, and nested
+structures, and you can parse the file without having to understand the tags.
+Unlike XML, the bitstream format is a binary encoding, and unlike XML it
+provides a mechanism for the file to self-describe "abbreviations", which are
+effectively size optimizations for the content.
+
+LLVM IR files may be optionally embedded into a `wrapper`_ structure, or in a
+`native object file`_. Both of these mechanisms make it easy to embed extra
+data along with LLVM IR files.
+
+This document first describes the LLVM bitstream format, describes the wrapper
+format, then describes the record structure used by LLVM IR files.
+
+.. _bitstream container format:
+
+Bitstream Format
+================
+
+The bitstream format is literally a stream of bits, with a very simple
+structure.  This structure consists of the following concepts:
+
+* A "`magic number`_" that identifies the contents of the stream.
+
+* Encoding `primitives`_ like variable bit-rate integers.
+
+* `Blocks`_, which define nested content.
+
+* `Data Records`_, which describe entities within the file.
+
+* Abbreviations, which specify compression optimizations for the file.
+
+Note that the :doc:`llvm-bcanalyzer <CommandGuide/llvm-bcanalyzer>` tool can be
+used to dump and inspect arbitrary bitstreams, which is very useful for
+understanding the encoding.
+
+.. _magic number:
+
+Magic Numbers
+-------------
+
+The first two bytes of a bitcode file are 'BC' (``0x42``, ``0x43``).  The second
+two bytes are an application-specific magic number.  Generic bitcode tools can
+look at only the first two bytes to verify the file is bitcode, while
+application-specific programs will want to look at all four.
+
+.. _primitives:
+
+Primitives
+----------
+
+A bitstream literally consists of a stream of bits, which are read in order
+starting with the least significant bit of each byte.  The stream is made up of
+a number of primitive values that encode a stream of unsigned integer values.
+These integers are encoded in two ways: either as `Fixed Width Integers`_ or as
+`Variable Width Integers`_.
+
+.. _Fixed Width Integers:
+.. _fixed-width value:
+
+Fixed Width Integers
+^^^^^^^^^^^^^^^^^^^^
+
+Fixed-width integer values have their low bits emitted directly to the file.
+For example, a 3-bit integer value encodes 1 as 001.  Fixed width integers are
+used when there are a well-known number of options for a field.  For example,
+boolean values are usually encoded with a 1-bit wide integer.
+
+.. _Variable Width Integers:
+.. _Variable Width Integer:
+.. _variable-width value:
+
+Variable Width Integers
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Variable-width integer (VBR) values encode values of arbitrary size, optimizing
+for the case where the values are small.  Given a 4-bit VBR field, any 3-bit
+value (0 through 7) is encoded directly, with the high bit set to zero.  Values
+larger than N-1 bits emit their bits in a series of N-1 bit chunks, where all
+but the last set the high bit.
+
+For example, the value 27 (0x1B) is encoded as 1011 0011 when emitted as a vbr4
+value.  The first set of four bits indicates the value 3 (011) with a
+continuation piece (indicated by a high bit of 1).  The next word indicates a
+value of 24 (011 << 3) with no continuation.  The sum (3+24) yields the value
+27.
+
+.. _char6-encoded value:
+
+6-bit characters
+^^^^^^^^^^^^^^^^
+
+6-bit characters encode common characters into a fixed 6-bit field.  They
+represent the following characters with the following 6-bit values:
+
+::
+
+  'a' .. 'z' ---  0 .. 25
+  'A' .. 'Z' --- 26 .. 51
+  '0' .. '9' --- 52 .. 61
+         '.' --- 62
+         '_' --- 63
+
+This encoding is only suitable for encoding characters and strings that consist
+only of the above characters.  It is completely incapable of encoding characters
+not in the set.
+
+Word Alignment
+^^^^^^^^^^^^^^
+
+Occasionally, it is useful to emit zero bits until the bitstream is a multiple
+of 32 bits.  This ensures that the bit position in the stream can be represented
+as a multiple of 32-bit words.
+
+Abbreviation IDs
+----------------
+
+A bitstream is a sequential series of `Blocks`_ and `Data Records`_.  Both of
+these start with an abbreviation ID encoded as a fixed-bitwidth field.  The
+width is specified by the current block, as described below.  The value of the
+abbreviation ID specifies either a builtin ID (which have special meanings,
+defined below) or one of the abbreviation IDs defined for the current block by
+the stream itself.
+
+The set of builtin abbrev IDs is:
+
+* 0 - `END_BLOCK`_ --- This abbrev ID marks the end of the current block.
+
+* 1 - `ENTER_SUBBLOCK`_ --- This abbrev ID marks the beginning of a new
+  block.
+
+* 2 - `DEFINE_ABBREV`_ --- This defines a new abbreviation.
+
+* 3 - `UNABBREV_RECORD`_ --- This ID specifies the definition of an
+  unabbreviated record.
+
+Abbreviation IDs 4 and above are defined by the stream itself, and specify an
+`abbreviated record encoding`_.
+
+.. _Blocks:
+
+Blocks
+------
+
+Blocks in a bitstream denote nested regions of the stream, and are identified by
+a content-specific id number (for example, LLVM IR uses an ID of 12 to represent
+function bodies).  Block IDs 0-7 are reserved for `standard blocks`_ whose
+meaning is defined by Bitcode; block IDs 8 and greater are application
+specific. Nested blocks capture the hierarchical structure of the data encoded
+in it, and various properties are associated with blocks as the file is parsed.
+Block definitions allow the reader to efficiently skip blocks in constant time
+if the reader wants a summary of blocks, or if it wants to efficiently skip data
+it does not understand.  The LLVM IR reader uses this mechanism to skip function
+bodies, lazily reading them on demand.
+
+When reading and encoding the stream, several properties are maintained for the
+block.  In particular, each block maintains:
+
+#. A current abbrev id width.  This value starts at 2 at the beginning of the
+   stream, and is set every time a block record is entered.  The block entry
+   specifies the abbrev id width for the body of the block.
+
+#. A set of abbreviations.  Abbreviations may be defined within a block, in
+   which case they are only defined in that block (neither subblocks nor
+   enclosing blocks see the abbreviation).  Abbreviations can also be defined
+   inside a `BLOCKINFO`_ block, in which case they are defined in all blocks
+   that match the ID that the ``BLOCKINFO`` block is describing.
+
+As sub blocks are entered, these properties are saved and the new sub-block has
+its own set of abbreviations, and its own abbrev id width.  When a sub-block is
+popped, the saved values are restored.
+
+.. _ENTER_SUBBLOCK:
+
+ENTER_SUBBLOCK Encoding
+^^^^^^^^^^^^^^^^^^^^^^^
+
+:raw-html:`<tt>`
+[ENTER_SUBBLOCK, blockid\ :sub:`vbr8`, newabbrevlen\ :sub:`vbr4`, <align32bits>, blocklen_32]
+:raw-html:`</tt>`
+
+The ``ENTER_SUBBLOCK`` abbreviation ID specifies the start of a new block
+record.  The ``blockid`` value is encoded as an 8-bit VBR identifier, and
+indicates the type of block being entered, which can be a `standard block`_ or
+an application-specific block.  The ``newabbrevlen`` value is a 4-bit VBR, which
+specifies the abbrev id width for the sub-block.  The ``blocklen`` value is a
+32-bit aligned value that specifies the size of the subblock in 32-bit
+words. This value allows the reader to skip over the entire block in one jump.
+
+.. _END_BLOCK:
+
+END_BLOCK Encoding
+^^^^^^^^^^^^^^^^^^
+
+``[END_BLOCK, <align32bits>]``
+
+The ``END_BLOCK`` abbreviation ID specifies the end of the current block record.
+Its end is aligned to 32-bits to ensure that the size of the block is an even
+multiple of 32-bits.
+
+.. _Data Records:
+
+Data Records
+------------
+
+Data records consist of a record code and a number of (up to) 64-bit integer
+values.  The interpretation of the code and values is application specific and
+may vary between different block types.  Records can be encoded either using an
+unabbrev record, or with an abbreviation.  In the LLVM IR format, for example,
+there is a record which encodes the target triple of a module.  The code is
+``MODULE_CODE_TRIPLE``, and the values of the record are the ASCII codes for the
+characters in the string.
+
+.. _UNABBREV_RECORD:
+
+UNABBREV_RECORD Encoding
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+:raw-html:`<tt>`
+[UNABBREV_RECORD, code\ :sub:`vbr6`, numops\ :sub:`vbr6`, op0\ :sub:`vbr6`, op1\ :sub:`vbr6`, ...]
+:raw-html:`</tt>`
+
+An ``UNABBREV_RECORD`` provides a default fallback encoding, which is both
+completely general and extremely inefficient.  It can describe an arbitrary
+record by emitting the code and operands as VBRs.
+
+For example, emitting an LLVM IR target triple as an unabbreviated record
+requires emitting the ``UNABBREV_RECORD`` abbrevid, a vbr6 for the
+``MODULE_CODE_TRIPLE`` code, a vbr6 for the length of the string, which is equal
+to the number of operands, and a vbr6 for each character.  Because there are no
+letters with values less than 32, each letter would need to be emitted as at
+least a two-part VBR, which means that each letter would require at least 12
+bits.  This is not an efficient encoding, but it is fully general.
+
+.. _abbreviated record encoding:
+
+Abbreviated Record Encoding
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[<abbrevid>, fields...]``
+
+An abbreviated record is a abbreviation id followed by a set of fields that are
+encoded according to the `abbreviation definition`_.  This allows records to be
+encoded significantly more densely than records encoded with the
+`UNABBREV_RECORD`_ type, and allows the abbreviation types to be specified in
+the stream itself, which allows the files to be completely self describing.  The
+actual encoding of abbreviations is defined below.
+
+The record code, which is the first field of an abbreviated record, may be
+encoded in the abbreviation definition (as a literal operand) or supplied in the
+abbreviated record (as a Fixed or VBR operand value).
+
+.. _abbreviation definition:
+
+Abbreviations
+-------------
+
+Abbreviations are an important form of compression for bitstreams.  The idea is
+to specify a dense encoding for a class of records once, then use that encoding
+to emit many records.  It takes space to emit the encoding into the file, but
+the space is recouped (hopefully plus some) when the records that use it are
+emitted.
+
+Abbreviations can be determined dynamically per client, per file. Because the
+abbreviations are stored in the bitstream itself, different streams of the same
+format can contain different sets of abbreviations according to the needs of the
+specific stream.  As a concrete example, LLVM IR files usually emit an
+abbreviation for binary operators.  If a specific LLVM module contained no or
+few binary operators, the abbreviation does not need to be emitted.
+
+.. _DEFINE_ABBREV:
+
+DEFINE_ABBREV Encoding
+^^^^^^^^^^^^^^^^^^^^^^
+
+:raw-html:`<tt>`
+[DEFINE_ABBREV, numabbrevops\ :sub:`vbr5`, abbrevop0, abbrevop1, ...]
+:raw-html:`</tt>`
+
+A ``DEFINE_ABBREV`` record adds an abbreviation to the list of currently defined
+abbreviations in the scope of this block.  This definition only exists inside
+this immediate block --- it is not visible in subblocks or enclosing blocks.
+Abbreviations are implicitly assigned IDs sequentially starting from 4 (the
+first application-defined abbreviation ID).  Any abbreviations defined in a
+``BLOCKINFO`` record for the particular block type receive IDs first, in order,
+followed by any abbreviations defined within the block itself.  Abbreviated data
+records reference this ID to indicate what abbreviation they are invoking.
+
+An abbreviation definition consists of the ``DEFINE_ABBREV`` abbrevid followed
+by a VBR that specifies the number of abbrev operands, then the abbrev operands
+themselves.  Abbreviation operands come in three forms.  They all start with a
+single bit that indicates whether the abbrev operand is a literal operand (when
+the bit is 1) or an encoding operand (when the bit is 0).
+
+#. Literal operands --- :raw-html:`<tt>` [1\ :sub:`1`, litvalue\
+   :sub:`vbr8`] :raw-html:`</tt>` --- Literal operands specify that the value in
+   the result is always a single specific value.  This specific value is emitted
+   as a vbr8 after the bit indicating that it is a literal operand.
+
+#. Encoding info without data --- :raw-html:`<tt>` [0\ :sub:`1`, encoding\
+   :sub:`3`] :raw-html:`</tt>` --- Operand encodings that do not have extra data
+   are just emitted as their code.
+
+#. Encoding info with data --- :raw-html:`<tt>` [0\ :sub:`1`, encoding\
+   :sub:`3`, value\ :sub:`vbr5`] :raw-html:`</tt>` --- Operand encodings that do
+   have extra data are emitted as their code, followed by the extra data.
+
+The possible operand encodings are:
+
+* Fixed (code 1): The field should be emitted as a `fixed-width value`_, whose
+  width is specified by the operand's extra data.
+
+* VBR (code 2): The field should be emitted as a `variable-width value`_, whose
+  width is specified by the operand's extra data.
+
+* Array (code 3): This field is an array of values.  The array operand has no
+  extra data, but expects another operand to follow it, indicating the element
+  type of the array.  When reading an array in an abbreviated record, the first
+  integer is a vbr6 that indicates the array length, followed by the encoded
+  elements of the array.  An array may only occur as the last operand of an
+  abbreviation (except for the one final operand that gives the array's
+  type).
+
+* Char6 (code 4): This field should be emitted as a `char6-encoded value`_.
+  This operand type takes no extra data. Char6 encoding is normally used as an
+  array element type.
+
+* Blob (code 5): This field is emitted as a vbr6, followed by padding to a
+  32-bit boundary (for alignment) and an array of 8-bit objects.  The array of
+  bytes is further followed by tail padding to ensure that its total length is a
+  multiple of 4 bytes.  This makes it very efficient for the reader to decode
+  the data without having to make a copy of it: it can use a pointer to the data
+  in the mapped in file and poke directly at it.  A blob may only occur as the
+  last operand of an abbreviation.
+
+For example, target triples in LLVM modules are encoded as a record of the form
+``[TRIPLE, 'a', 'b', 'c', 'd']``.  Consider if the bitstream emitted the
+following abbrev entry:
+
+::
+
+  [0, Fixed, 4]
+  [0, Array]
+  [0, Char6]
+
+When emitting a record with this abbreviation, the above entry would be emitted
+as:
+
+:raw-html:`<tt><blockquote>`
+[4\ :sub:`abbrevwidth`, 2\ :sub:`4`, 4\ :sub:`vbr6`, 0\ :sub:`6`, 1\ :sub:`6`, 2\ :sub:`6`, 3\ :sub:`6`]
+:raw-html:`</blockquote></tt>`
+
+These values are:
+
+#. The first value, 4, is the abbreviation ID for this abbreviation.
+
+#. The second value, 2, is the record code for ``TRIPLE`` records within LLVM IR
+   file ``MODULE_BLOCK`` blocks.
+
+#. The third value, 4, is the length of the array.
+
+#. The rest of the values are the char6 encoded values for ``"abcd"``.
+
+With this abbreviation, the triple is emitted with only 37 bits (assuming a
+abbrev id width of 3).  Without the abbreviation, significantly more space would
+be required to emit the target triple.  Also, because the ``TRIPLE`` value is
+not emitted as a literal in the abbreviation, the abbreviation can also be used
+for any other string value.
+
+.. _standard blocks:
+.. _standard block:
+
+Standard Blocks
+---------------
+
+In addition to the basic block structure and record encodings, the bitstream
+also defines specific built-in block types.  These block types specify how the
+stream is to be decoded or other metadata.  In the future, new standard blocks
+may be added.  Block IDs 0-7 are reserved for standard blocks.
+
+.. _BLOCKINFO:
+
+#0 - BLOCKINFO Block
+^^^^^^^^^^^^^^^^^^^^
+
+The ``BLOCKINFO`` block allows the description of metadata for other blocks.
+The currently specified records are:
+
+::
+
+  [SETBID (#1), blockid]
+  [DEFINE_ABBREV, ...]
+  [BLOCKNAME, ...name...]
+  [SETRECORDNAME, RecordID, ...name...]
+
+The ``SETBID`` record (code 1) indicates which block ID is being described.
+``SETBID`` records can occur multiple times throughout the block to change which
+block ID is being described.  There must be a ``SETBID`` record prior to any
+other records.
+
+Standard ``DEFINE_ABBREV`` records can occur inside ``BLOCKINFO`` blocks, but
+unlike their occurrence in normal blocks, the abbreviation is defined for blocks
+matching the block ID we are describing, *not* the ``BLOCKINFO`` block
+itself.  The abbreviations defined in ``BLOCKINFO`` blocks receive abbreviation
+IDs as described in `DEFINE_ABBREV`_.
+
+The ``BLOCKNAME`` record (code 2) can optionally occur in this block.  The
+elements of the record are the bytes of the string name of the block.
+llvm-bcanalyzer can use this to dump out bitcode files symbolically.
+
+The ``SETRECORDNAME`` record (code 3) can also optionally occur in this block.
+The first operand value is a record ID number, and the rest of the elements of
+the record are the bytes for the string name of the record.  llvm-bcanalyzer can
+use this to dump out bitcode files symbolically.
+
+Note that although the data in ``BLOCKINFO`` blocks is described as "metadata,"
+the abbreviations they contain are essential for parsing records from the
+corresponding blocks.  It is not safe to skip them.
+
+.. _wrapper:
+
+Bitcode Wrapper Format
+======================
+
+Bitcode files for LLVM IR may optionally be wrapped in a simple wrapper
+structure.  This structure contains a simple header that indicates the offset
+and size of the embedded BC file.  This allows additional information to be
+stored alongside the BC file.  The structure of this file header is:
+
+:raw-html:`<tt><blockquote>`
+[Magic\ :sub:`32`, Version\ :sub:`32`, Offset\ :sub:`32`, Size\ :sub:`32`, CPUType\ :sub:`32`]
+:raw-html:`</blockquote></tt>`
+
+Each of the fields are 32-bit fields stored in little endian form (as with the
+rest of the bitcode file fields).  The Magic number is always ``0x0B17C0DE`` and
+the version is currently always ``0``.  The Offset field is the offset in bytes
+to the start of the bitcode stream in the file, and the Size field is the size
+in bytes of the stream. CPUType is a target-specific value that can be used to
+encode the CPU of the target.
+
+.. _native object file:
+
+Native Object File Wrapper Format
+=================================
+
+Bitcode files for LLVM IR may also be wrapped in a native object file
+(i.e. ELF, COFF, Mach-O).  The bitcode must be stored in a section of the object
+file named ``__LLVM,__bitcode`` for MachO and ``.llvmbc`` for the other object
+formats.  This wrapper format is useful for accommodating LTO in compilation
+pipelines where intermediate objects must be native object files which contain
+metadata in other sections.
+
+Not all tools support this format.
+
+.. _encoding of LLVM IR:
+
+LLVM IR Encoding
+================
+
+LLVM IR is encoded into a bitstream by defining blocks and records.  It uses
+blocks for things like constant pools, functions, symbol tables, etc.  It uses
+records for things like instructions, global variable descriptors, type
+descriptions, etc.  This document does not describe the set of abbreviations
+that the writer uses, as these are fully self-described in the file, and the
+reader is not allowed to build in any knowledge of this.
+
+Basics
+------
+
+LLVM IR Magic Number
+^^^^^^^^^^^^^^^^^^^^
+
+The magic number for LLVM IR files is:
+
+:raw-html:`<tt><blockquote>`
+[0x0\ :sub:`4`, 0xC\ :sub:`4`, 0xE\ :sub:`4`, 0xD\ :sub:`4`]
+:raw-html:`</blockquote></tt>`
+
+When combined with the bitcode magic number and viewed as bytes, this is
+``"BC 0xC0DE"``.
+
+.. _Signed VBRs:
+
+Signed VBRs
+^^^^^^^^^^^
+
+`Variable Width Integer`_ encoding is an efficient way to encode arbitrary sized
+unsigned values, but is an extremely inefficient for encoding signed values, as
+signed values are otherwise treated as maximally large unsigned values.
+
+As such, signed VBR values of a specific width are emitted as follows:
+
+* Positive values are emitted as VBRs of the specified width, but with their
+  value shifted left by one.
+
+* Negative values are emitted as VBRs of the specified width, but the negated
+  value is shifted left by one, and the low bit is set.
+
+With this encoding, small positive and small negative values can both be emitted
+efficiently. Signed VBR encoding is used in ``CST_CODE_INTEGER`` and
+``CST_CODE_WIDE_INTEGER`` records within ``CONSTANTS_BLOCK`` blocks.
+It is also used for phi instruction operands in `MODULE_CODE_VERSION`_ 1.
+
+LLVM IR Blocks
+^^^^^^^^^^^^^^
+
+LLVM IR is defined with the following blocks:
+
+* 8 --- `MODULE_BLOCK`_ --- This is the top-level block that contains the entire
+  module, and describes a variety of per-module information.
+
+* 9 --- `PARAMATTR_BLOCK`_ --- This enumerates the parameter attributes.
+
+* 10 --- `PARAMATTR_GROUP_BLOCK`_ --- This describes the attribute group table.
+
+* 11 --- `CONSTANTS_BLOCK`_ --- This describes constants for a module or
+  function.
+
+* 12 --- `FUNCTION_BLOCK`_ --- This describes a function body.
+
+* 14 --- `VALUE_SYMTAB_BLOCK`_ --- This describes a value symbol table.
+
+* 15 --- `METADATA_BLOCK`_ --- This describes metadata items.
+
+* 16 --- `METADATA_ATTACHMENT`_ --- This contains records associating metadata
+  with function instruction values.
+
+* 17 --- `TYPE_BLOCK`_ --- This describes all of the types in the module.
+
+* 23 --- `STRTAB_BLOCK`_ --- The bitcode file's string table.
+
+.. _MODULE_BLOCK:
+
+MODULE_BLOCK Contents
+---------------------
+
+The ``MODULE_BLOCK`` block (id 8) is the top-level block for LLVM bitcode files,
+and each bitcode file must contain exactly one. In addition to records
+(described below) containing information about the module, a ``MODULE_BLOCK``
+block may contain the following sub-blocks:
+
+* `BLOCKINFO`_
+* `PARAMATTR_BLOCK`_
+* `PARAMATTR_GROUP_BLOCK`_
+* `TYPE_BLOCK`_
+* `VALUE_SYMTAB_BLOCK`_
+* `CONSTANTS_BLOCK`_
+* `FUNCTION_BLOCK`_
+* `METADATA_BLOCK`_
+
+.. _MODULE_CODE_VERSION:
+
+MODULE_CODE_VERSION Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[VERSION, version#]``
+
+The ``VERSION`` record (code 1) contains a single value indicating the format
+version. Versions 0, 1 and 2 are supported at this time. The difference between
+version 0 and 1 is in the encoding of instruction operands in
+each `FUNCTION_BLOCK`_.
+
+In version 0, each value defined by an instruction is assigned an ID
+unique to the function. Function-level value IDs are assigned starting from
+``NumModuleValues`` since they share the same namespace as module-level
+values. The value enumerator resets after each function. When a value is
+an operand of an instruction, the value ID is used to represent the operand.
+For large functions or large modules, these operand values can be large.
+
+The encoding in version 1 attempts to avoid large operand values
+in common cases. Instead of using the value ID directly, operands are
+encoded as relative to the current instruction. Thus, if an operand
+is the value defined by the previous instruction, the operand
+will be encoded as 1.
+
+For example, instead of
+
+.. code-block:: none
+
+  #n = load #n-1
+  #n+1 = icmp eq #n, #const0
+  br #n+1, label #(bb1), label #(bb2)
+
+version 1 will encode the instructions as
+
+.. code-block:: none
+
+  #n = load #1
+  #n+1 = icmp eq #1, (#n+1)-#const0
+  br #1, label #(bb1), label #(bb2)
+
+Note in the example that operands which are constants also use
+the relative encoding, while operands like basic block labels
+do not use the relative encoding.
+
+Forward references will result in a negative value.
+This can be inefficient, as operands are normally encoded
+as unsigned VBRs. However, forward references are rare, except in the
+case of phi instructions. For phi instructions, operands are encoded as
+`Signed VBRs`_ to deal with forward references.
+
+In version 2, the meaning of module records ``FUNCTION``, ``GLOBALVAR``,
+``ALIAS``, ``IFUNC`` and ``COMDAT`` change such that the first two operands
+specify an offset and size of a string in a string table (see `STRTAB_BLOCK
+Contents`_), the function name is removed from the ``FNENTRY`` record in the
+value symbol table, and the top-level ``VALUE_SYMTAB_BLOCK`` may only contain
+``FNENTRY`` records.
+
+MODULE_CODE_TRIPLE Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[TRIPLE, ...string...]``
+
+The ``TRIPLE`` record (code 2) contains a variable number of values representing
+the bytes of the ``target triple`` specification string.
+
+MODULE_CODE_DATALAYOUT Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[DATALAYOUT, ...string...]``
+
+The ``DATALAYOUT`` record (code 3) contains a variable number of values
+representing the bytes of the ``target datalayout`` specification string.
+
+MODULE_CODE_ASM Record
+^^^^^^^^^^^^^^^^^^^^^^
+
+``[ASM, ...string...]``
+
+The ``ASM`` record (code 4) contains a variable number of values representing
+the bytes of ``module asm`` strings, with individual assembly blocks separated
+by newline (ASCII 10) characters.
+
+.. _MODULE_CODE_SECTIONNAME:
+
+MODULE_CODE_SECTIONNAME Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[SECTIONNAME, ...string...]``
+
+The ``SECTIONNAME`` record (code 5) contains a variable number of values
+representing the bytes of a single section name string. There should be one
+``SECTIONNAME`` record for each section name referenced (e.g., in global
+variable or function ``section`` attributes) within the module. These records
+can be referenced by the 1-based index in the *section* fields of ``GLOBALVAR``
+or ``FUNCTION`` records.
+
+MODULE_CODE_DEPLIB Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[DEPLIB, ...string...]``
+
+The ``DEPLIB`` record (code 6) contains a variable number of values representing
+the bytes of a single dependent library name string, one of the libraries
+mentioned in a ``deplibs`` declaration.  There should be one ``DEPLIB`` record
+for each library name referenced.
+
+MODULE_CODE_GLOBALVAR Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[GLOBALVAR, strtab offset, strtab size, pointer type, isconst, initid, linkage, alignment, section, visibility, threadlocal, unnamed_addr, externally_initialized, dllstorageclass, comdat]``
+
+The ``GLOBALVAR`` record (code 7) marks the declaration or definition of a
+global variable. The operand fields are:
+
+* *strtab offset*, *strtab size*: Specifies the name of the global variable.
+  See `STRTAB_BLOCK Contents`_.
+
+* *pointer type*: The type index of the pointer type used to point to this
+  global variable
+
+* *isconst*: Non-zero if the variable is treated as constant within the module,
+  or zero if it is not
+
+* *initid*: If non-zero, the value index of the initializer for this variable,
+  plus 1.
+
+.. _linkage type:
+
+* *linkage*: An encoding of the linkage type for this variable:
+
+  * ``external``: code 0
+  * ``weak``: code 1
+  * ``appending``: code 2
+  * ``internal``: code 3
+  * ``linkonce``: code 4
+  * ``dllimport``: code 5
+  * ``dllexport``: code 6
+  * ``extern_weak``: code 7
+  * ``common``: code 8
+  * ``private``: code 9
+  * ``weak_odr``: code 10
+  * ``linkonce_odr``: code 11
+  * ``available_externally``: code 12
+  * deprecated : code 13
+  * deprecated : code 14
+
+* alignment*: The logarithm base 2 of the variable's requested alignment, plus 1
+
+* *section*: If non-zero, the 1-based section index in the table of
+  `MODULE_CODE_SECTIONNAME`_ entries.
+
+.. _visibility:
+
+* *visibility*: If present, an encoding of the visibility of this variable:
+
+  * ``default``: code 0
+  * ``hidden``: code 1
+  * ``protected``: code 2
+
+.. _bcthreadlocal:
+
+* *threadlocal*: If present, an encoding of the thread local storage mode of the
+  variable:
+
+  * ``not thread local``: code 0
+  * ``thread local; default TLS model``: code 1
+  * ``localdynamic``: code 2
+  * ``initialexec``: code 3
+  * ``localexec``: code 4
+
+.. _bcunnamedaddr:
+
+* *unnamed_addr*: If present, an encoding of the ``unnamed_addr`` attribute of this
+  variable:
+
+  * not ``unnamed_addr``: code 0
+  * ``unnamed_addr``: code 1
+  * ``local_unnamed_addr``: code 2
+
+.. _bcdllstorageclass:
+
+* *dllstorageclass*: If present, an encoding of the DLL storage class of this variable:
+
+  * ``default``: code 0
+  * ``dllimport``: code 1
+  * ``dllexport``: code 2
+
+* *comdat*: An encoding of the COMDAT of this function
+
+.. _FUNCTION:
+
+MODULE_CODE_FUNCTION Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[FUNCTION, strtab offset, strtab size, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc, prologuedata, dllstorageclass, comdat, prefixdata, personalityfn]``
+
+The ``FUNCTION`` record (code 8) marks the declaration or definition of a
+function. The operand fields are:
+
+* *strtab offset*, *strtab size*: Specifies the name of the function.
+  See `STRTAB_BLOCK Contents`_.
+
+* *type*: The type index of the function type describing this function
+
+* *callingconv*: The calling convention number:
+  * ``ccc``: code 0
+  * ``fastcc``: code 8
+  * ``coldcc``: code 9
+  * ``webkit_jscc``: code 12
+  * ``anyregcc``: code 13
+  * ``preserve_mostcc``: code 14
+  * ``preserve_allcc``: code 15
+  * ``swiftcc`` : code 16
+  * ``cxx_fast_tlscc``: code 17
+  * ``x86_stdcallcc``: code 64
+  * ``x86_fastcallcc``: code 65
+  * ``arm_apcscc``: code 66
+  * ``arm_aapcscc``: code 67
+  * ``arm_aapcs_vfpcc``: code 68
+
+* isproto*: Non-zero if this entry represents a declaration rather than a
+  definition
+
+* *linkage*: An encoding of the `linkage type`_ for this function
+
+* *paramattr*: If nonzero, the 1-based parameter attribute index into the table
+  of `PARAMATTR_CODE_ENTRY`_ entries.
+
+* *alignment*: The logarithm base 2 of the function's requested alignment, plus
+  1
+
+* *section*: If non-zero, the 1-based section index in the table of
+  `MODULE_CODE_SECTIONNAME`_ entries.
+
+* *visibility*: An encoding of the `visibility`_ of this function
+
+* *gc*: If present and nonzero, the 1-based garbage collector index in the table
+  of `MODULE_CODE_GCNAME`_ entries.
+
+* *unnamed_addr*: If present, an encoding of the
+  :ref:`unnamed_addr<bcunnamedaddr>` attribute of this function
+
+* *prologuedata*: If non-zero, the value index of the prologue data for this function,
+  plus 1.
+
+* *dllstorageclass*: An encoding of the
+  :ref:`dllstorageclass<bcdllstorageclass>` of this function
+
+* *comdat*: An encoding of the COMDAT of this function
+
+* *prefixdata*: If non-zero, the value index of the prefix data for this function,
+  plus 1.
+
+* *personalityfn*: If non-zero, the value index of the personality function for this function,
+  plus 1.
+
+MODULE_CODE_ALIAS Record
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[ALIAS, strtab offset, strtab size, alias type, aliasee val#, linkage, visibility, dllstorageclass, threadlocal, unnamed_addr]``
+
+The ``ALIAS`` record (code 9) marks the definition of an alias. The operand
+fields are
+
+* *strtab offset*, *strtab size*: Specifies the name of the alias.
+  See `STRTAB_BLOCK Contents`_.
+
+* *alias type*: The type index of the alias
+
+* *aliasee val#*: The value index of the aliased value
+
+* *linkage*: An encoding of the `linkage type`_ for this alias
+
+* *visibility*: If present, an encoding of the `visibility`_ of the alias
+
+* *dllstorageclass*: If present, an encoding of the
+  :ref:`dllstorageclass<bcdllstorageclass>` of the alias
+
+* *threadlocal*: If present, an encoding of the
+  :ref:`thread local property<bcthreadlocal>` of the alias
+
+* *unnamed_addr*: If present, an encoding of the
+  :ref:`unnamed_addr<bcunnamedaddr>` attribute of this alias
+
+.. _MODULE_CODE_GCNAME:
+
+MODULE_CODE_GCNAME Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[GCNAME, ...string...]``
+
+The ``GCNAME`` record (code 11) contains a variable number of values
+representing the bytes of a single garbage collector name string. There should
+be one ``GCNAME`` record for each garbage collector name referenced in function
+``gc`` attributes within the module. These records can be referenced by 1-based
+index in the *gc* fields of ``FUNCTION`` records.
+
+.. _PARAMATTR_BLOCK:
+
+PARAMATTR_BLOCK Contents
+------------------------
+
+The ``PARAMATTR_BLOCK`` block (id 9) contains a table of entries describing the
+attributes of function parameters. These entries are referenced by 1-based index
+in the *paramattr* field of module block `FUNCTION`_ records, or within the
+*attr* field of function block ``INST_INVOKE`` and ``INST_CALL`` records.
+
+Entries within ``PARAMATTR_BLOCK`` are constructed to ensure that each is unique
+(i.e., no two indices represent equivalent attribute lists).
+
+.. _PARAMATTR_CODE_ENTRY:
+
+PARAMATTR_CODE_ENTRY Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[ENTRY, attrgrp0, attrgrp1, ...]``
+
+The ``ENTRY`` record (code 2) contains a variable number of values describing a
+unique set of function parameter attributes. Each *attrgrp* value is used as a
+key with which to look up an entry in the the attribute group table described
+in the ``PARAMATTR_GROUP_BLOCK`` block.
+
+.. _PARAMATTR_CODE_ENTRY_OLD:
+
+PARAMATTR_CODE_ENTRY_OLD Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. note::
+  This is a legacy encoding for attributes, produced by LLVM versions 3.2 and
+  earlier. It is guaranteed to be understood by the current LLVM version, as
+  specified in the :ref:`IR backwards compatibility` policy.
+
+``[ENTRY, paramidx0, attr0, paramidx1, attr1...]``
+
+The ``ENTRY`` record (code 1) contains an even number of values describing a
+unique set of function parameter attributes. Each *paramidx* value indicates
+which set of attributes is represented, with 0 representing the return value
+attributes, 0xFFFFFFFF representing function attributes, and other values
+representing 1-based function parameters. Each *attr* value is a bitmap with the
+following interpretation:
+
+* bit 0: ``zeroext``
+* bit 1: ``signext``
+* bit 2: ``noreturn``
+* bit 3: ``inreg``
+* bit 4: ``sret``
+* bit 5: ``nounwind``
+* bit 6: ``noalias``
+* bit 7: ``byval``
+* bit 8: ``nest``
+* bit 9: ``readnone``
+* bit 10: ``readonly``
+* bit 11: ``noinline``
+* bit 12: ``alwaysinline``
+* bit 13: ``optsize``
+* bit 14: ``ssp``
+* bit 15: ``sspreq``
+* bits 16-31: ``align n``
+* bit 32: ``nocapture``
+* bit 33: ``noredzone``
+* bit 34: ``noimplicitfloat``
+* bit 35: ``naked``
+* bit 36: ``inlinehint``
+* bits 37-39: ``alignstack n``, represented as the logarithm
+  base 2 of the requested alignment, plus 1
+
+.. _PARAMATTR_GROUP_BLOCK:
+
+PARAMATTR_GROUP_BLOCK Contents
+------------------------------
+
+The ``PARAMATTR_GROUP_BLOCK`` block (id 10) contains a table of entries
+describing the attribute groups present in the module. These entries can be
+referenced within ``PARAMATTR_CODE_ENTRY`` entries.
+
+.. _PARAMATTR_GRP_CODE_ENTRY:
+
+PARAMATTR_GRP_CODE_ENTRY Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[ENTRY, grpid, paramidx, attr0, attr1, ...]``
+
+The ``ENTRY`` record (code 3) contains *grpid* and *paramidx* values, followed
+by a variable number of values describing a unique group of attributes. The
+*grpid* value is a unique key for the attribute group, which can be referenced
+within ``PARAMATTR_CODE_ENTRY`` entries. The *paramidx* value indicates which
+set of attributes is represented, with 0 representing the return value
+attributes, 0xFFFFFFFF representing function attributes, and other values
+representing 1-based function parameters.
+
+Each *attr* is itself represented as a variable number of values:
+
+``kind, key [, ...], [value [, ...]]``
+
+Each attribute is either a well-known LLVM attribute (possibly with an integer
+value associated with it), or an arbitrary string (possibly with an arbitrary
+string value associated with it). The *kind* value is an integer code
+distinguishing between these possibilities:
+
+* code 0: well-known attribute
+* code 1: well-known attribute with an integer value
+* code 3: string attribute
+* code 4: string attribute with a string value
+
+For well-known attributes (code 0 or 1), the *key* value is an integer code
+identifying the attribute. For attributes with an integer argument (code 1),
+the *value* value indicates the argument.
+
+For string attributes (code 3 or 4), the *key* value is actually a variable
+number of values representing the bytes of a null-terminated string. For
+attributes with a string argument (code 4), the *value* value is similarly a
+variable number of values representing the bytes of a null-terminated string.
+
+The integer codes are mapped to well-known attributes as follows.
+
+* code 1: ``align(<n>)``
+* code 2: ``alwaysinline``
+* code 3: ``byval``
+* code 4: ``inlinehint``
+* code 5: ``inreg``
+* code 6: ``minsize``
+* code 7: ``naked``
+* code 8: ``nest``
+* code 9: ``noalias``
+* code 10: ``nobuiltin``
+* code 11: ``nocapture``
+* code 12: ``noduplicates``
+* code 13: ``noimplicitfloat``
+* code 14: ``noinline``
+* code 15: ``nonlazybind``
+* code 16: ``noredzone``
+* code 17: ``noreturn``
+* code 18: ``nounwind``
+* code 19: ``optsize``
+* code 20: ``readnone``
+* code 21: ``readonly``
+* code 22: ``returned``
+* code 23: ``returns_twice``
+* code 24: ``signext``
+* code 25: ``alignstack(<n>)``
+* code 26: ``ssp``
+* code 27: ``sspreq``
+* code 28: ``sspstrong``
+* code 29: ``sret``
+* code 30: ``sanitize_address``
+* code 31: ``sanitize_thread``
+* code 32: ``sanitize_memory``
+* code 33: ``uwtable``
+* code 34: ``zeroext``
+* code 35: ``builtin``
+* code 36: ``cold``
+* code 37: ``optnone``
+* code 38: ``inalloca``
+* code 39: ``nonnull``
+* code 40: ``jumptable``
+* code 41: ``dereferenceable(<n>)``
+* code 42: ``dereferenceable_or_null(<n>)``
+* code 43: ``convergent``
+* code 44: ``safestack``
+* code 45: ``argmemonly``
+* code 46: ``swiftself``
+* code 47: ``swifterror``
+* code 48: ``norecurse``
+* code 49: ``inaccessiblememonly``
+* code 50: ``inaccessiblememonly_or_argmemonly``
+* code 51: ``allocsize(<EltSizeParam>[, <NumEltsParam>])``
+* code 52: ``writeonly``
+
+.. note::
+  The ``allocsize`` attribute has a special encoding for its arguments. Its two
+  arguments, which are 32-bit integers, are packed into one 64-bit integer value
+  (i.e. ``(EltSizeParam << 32) | NumEltsParam``), with ``NumEltsParam`` taking on
+  the sentinel value -1 if it is not specified.
+
+.. _TYPE_BLOCK:
+
+TYPE_BLOCK Contents
+-------------------
+
+The ``TYPE_BLOCK`` block (id 17) contains records which constitute a table of
+type operator entries used to represent types referenced within an LLVM
+module. Each record (with the exception of `NUMENTRY`_) generates a single type
+table entry, which may be referenced by 0-based index from instructions,
+constants, metadata, type symbol table entries, or other type operator records.
+
+Entries within ``TYPE_BLOCK`` are constructed to ensure that each entry is
+unique (i.e., no two indices represent structurally equivalent types).
+
+.. _TYPE_CODE_NUMENTRY:
+.. _NUMENTRY:
+
+TYPE_CODE_NUMENTRY Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[NUMENTRY, numentries]``
+
+The ``NUMENTRY`` record (code 1) contains a single value which indicates the
+total number of type code entries in the type table of the module. If present,
+``NUMENTRY`` should be the first record in the block.
+
+TYPE_CODE_VOID Record
+^^^^^^^^^^^^^^^^^^^^^
+
+``[VOID]``
+
+The ``VOID`` record (code 2) adds a ``void`` type to the type table.
+
+TYPE_CODE_HALF Record
+^^^^^^^^^^^^^^^^^^^^^
+
+``[HALF]``
+
+The ``HALF`` record (code 10) adds a ``half`` (16-bit floating point) type to
+the type table.
+
+TYPE_CODE_FLOAT Record
+^^^^^^^^^^^^^^^^^^^^^^
+
+``[FLOAT]``
+
+The ``FLOAT`` record (code 3) adds a ``float`` (32-bit floating point) type to
+the type table.
+
+TYPE_CODE_DOUBLE Record
+^^^^^^^^^^^^^^^^^^^^^^^
+
+``[DOUBLE]``
+
+The ``DOUBLE`` record (code 4) adds a ``double`` (64-bit floating point) type to
+the type table.
+
+TYPE_CODE_LABEL Record
+^^^^^^^^^^^^^^^^^^^^^^
+
+``[LABEL]``
+
+The ``LABEL`` record (code 5) adds a ``label`` type to the type table.
+
+TYPE_CODE_OPAQUE Record
+^^^^^^^^^^^^^^^^^^^^^^^
+
+``[OPAQUE]``
+
+The ``OPAQUE`` record (code 6) adds an ``opaque`` type to the type table, with
+a name defined by a previously encountered ``STRUCT_NAME`` record. Note that
+distinct ``opaque`` types are not unified.
+
+TYPE_CODE_INTEGER Record
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[INTEGER, width]``
+
+The ``INTEGER`` record (code 7) adds an integer type to the type table. The
+single *width* field indicates the width of the integer type.
+
+TYPE_CODE_POINTER Record
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[POINTER, pointee type, address space]``
+
+The ``POINTER`` record (code 8) adds a pointer type to the type table. The
+operand fields are
+
+* *pointee type*: The type index of the pointed-to type
+
+* *address space*: If supplied, the target-specific numbered address space where
+  the pointed-to object resides. Otherwise, the default address space is zero.
+
+TYPE_CODE_FUNCTION_OLD Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. note::
+  This is a legacy encoding for functions, produced by LLVM versions 3.0 and
+  earlier. It is guaranteed to be understood by the current LLVM version, as
+  specified in the :ref:`IR backwards compatibility` policy.
+
+``[FUNCTION_OLD, vararg, ignored, retty, ...paramty... ]``
+
+The ``FUNCTION_OLD`` record (code 9) adds a function type to the type table.
+The operand fields are
+
+* *vararg*: Non-zero if the type represents a varargs function
+
+* *ignored*: This value field is present for backward compatibility only, and is
+  ignored
+
+* *retty*: The type index of the function's return type
+
+* *paramty*: Zero or more type indices representing the parameter types of the
+  function
+
+TYPE_CODE_ARRAY Record
+^^^^^^^^^^^^^^^^^^^^^^
+
+``[ARRAY, numelts, eltty]``
+
+The ``ARRAY`` record (code 11) adds an array type to the type table.  The
+operand fields are
+
+* *numelts*: The number of elements in arrays of this type
+
+* *eltty*: The type index of the array element type
+
+TYPE_CODE_VECTOR Record
+^^^^^^^^^^^^^^^^^^^^^^^
+
+``[VECTOR, numelts, eltty]``
+
+The ``VECTOR`` record (code 12) adds a vector type to the type table.  The
+operand fields are
+
+* *numelts*: The number of elements in vectors of this type
+
+* *eltty*: The type index of the vector element type
+
+TYPE_CODE_X86_FP80 Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[X86_FP80]``
+
+The ``X86_FP80`` record (code 13) adds an ``x86_fp80`` (80-bit floating point)
+type to the type table.
+
+TYPE_CODE_FP128 Record
+^^^^^^^^^^^^^^^^^^^^^^
+
+``[FP128]``
+
+The ``FP128`` record (code 14) adds an ``fp128`` (128-bit floating point) type
+to the type table.
+
+TYPE_CODE_PPC_FP128 Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[PPC_FP128]``
+
+The ``PPC_FP128`` record (code 15) adds a ``ppc_fp128`` (128-bit floating point)
+type to the type table.
+
+TYPE_CODE_METADATA Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[METADATA]``
+
+The ``METADATA`` record (code 16) adds a ``metadata`` type to the type table.
+
+TYPE_CODE_X86_MMX Record
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[X86_MMX]``
+
+The ``X86_MMX`` record (code 17) adds an ``x86_mmx`` type to the type table.
+
+TYPE_CODE_STRUCT_ANON Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[STRUCT_ANON, ispacked, ...eltty...]``
+
+The ``STRUCT_ANON`` record (code 18) adds a literal struct type to the type
+table. The operand fields are
+
+* *ispacked*: Non-zero if the type represents a packed structure
+
+* *eltty*: Zero or more type indices representing the element types of the
+  structure
+
+TYPE_CODE_STRUCT_NAME Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[STRUCT_NAME, ...string...]``
+
+The ``STRUCT_NAME`` record (code 19) contains a variable number of values
+representing the bytes of a struct name. The next ``OPAQUE`` or
+``STRUCT_NAMED`` record will use this name.
+
+TYPE_CODE_STRUCT_NAMED Record
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[STRUCT_NAMED, ispacked, ...eltty...]``
+
+The ``STRUCT_NAMED`` record (code 20) adds an identified struct type to the
+type table, with a name defined by a previously encountered ``STRUCT_NAME``
+record. The operand fields are
+
+* *ispacked*: Non-zero if the type represents a packed structure
+
+* *eltty*: Zero or more type indices representing the element types of the
+  structure
+
+TYPE_CODE_FUNCTION Record
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``[FUNCTION, vararg, retty, ...paramty... ]``
+
+The ``FUNCTION`` record (code 21) adds a function type to the type table. The
+operand fields are
+
+* *vararg*: Non-zero if the type represents a varargs function
+
+* *retty*: The type index of the function's return type
+
+* *paramty*: Zero or more type indices representing the parameter types of the
+  function
+
+.. _CONSTANTS_BLOCK:
+
+CONSTANTS_BLOCK Contents
+------------------------
+
+The ``CONSTANTS_BLOCK`` block (id 11) ...
+
+.. _FUNCTION_BLOCK:
+
+FUNCTION_BLOCK Contents
+-----------------------
+
+The ``FUNCTION_BLOCK`` block (id 12) ...
+
+In addition to the record types described below, a ``FUNCTION_BLOCK`` block may
+contain the following sub-blocks:
+
+* `CONSTANTS_BLOCK`_
+* `VALUE_SYMTAB_BLOCK`_
+* `METADATA_ATTACHMENT`_
+
+.. _VALUE_SYMTAB_BLOCK:
+
+VALUE_SYMTAB_BLOCK Contents
+---------------------------
+
+The ``VALUE_SYMTAB_BLOCK`` block (id 14) ...
+
+.. _METADATA_BLOCK:
+
+METADATA_BLOCK Contents
+-----------------------
+
+The ``METADATA_BLOCK`` block (id 15) ...
+
+.. _METADATA_ATTACHMENT:
+
+METADATA_ATTACHMENT Contents
+----------------------------
+
+The ``METADATA_ATTACHMENT`` block (id 16) ...
+
+.. _STRTAB_BLOCK:
+
+STRTAB_BLOCK Contents
+---------------------
+
+The ``STRTAB`` block (id 23) contains a single record (``STRTAB_BLOB``, id 1)
+with a single blob operand containing the bitcode file's string table.
+
+Strings in the string table are not null terminated. A record's *strtab
+offset* and *strtab size* operands specify the byte offset and size of a
+string within the string table.
+
+The string table is used by all preceding blocks in the bitcode file that are
+not succeeded by another intervening ``STRTAB`` block. Normally a bitcode
+file will have a single string table, but it may have more than one if it
+was created by binary concatenation of multiple bitcode files.

Added: www-releases/trunk/5.0.1/docs/_sources/BlockFrequencyTerminology.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_sources/BlockFrequencyTerminology.rst.txt?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/_sources/BlockFrequencyTerminology.rst.txt (added)
+++ www-releases/trunk/5.0.1/docs/_sources/BlockFrequencyTerminology.rst.txt Thu Dec 21 10:09:53 2017
@@ -0,0 +1,130 @@
+================================
+LLVM Block Frequency Terminology
+================================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+Block Frequency is a metric for estimating the relative frequency of different
+basic blocks.  This document describes the terminology that the
+``BlockFrequencyInfo`` and ``MachineBlockFrequencyInfo`` analysis passes use.
+
+Branch Probability
+==================
+
+Blocks with multiple successors have probabilities associated with each
+outgoing edge.  These are called branch probabilities.  For a given block, the
+sum of its outgoing branch probabilities should be 1.0.
+
+Branch Weight
+=============
+
+Rather than storing fractions on each edge, we store an integer weight.
+Weights are relative to the other edges of a given predecessor block.  The
+branch probability associated with a given edge is its own weight divided by
+the sum of the weights on the predecessor's outgoing edges.
+
+For example, consider this IR:
+
+.. code-block:: llvm
+
+   define void @foo() {
+       ; ...
+       A:
+           br i1 %cond, label %B, label %C, !prof !0
+       ; ...
+   }
+   !0 = metadata !{metadata !"branch_weights", i32 7, i32 8}
+
+and this simple graph representation::
+
+   A -> B  (edge-weight: 7)
+   A -> C  (edge-weight: 8)
+
+The probability of branching from block A to block B is 7/15, and the
+probability of branching from block A to block C is 8/15.
+
+See :doc:`BranchWeightMetadata` for details about the branch weight IR
+representation.
+
+Block Frequency
+===============
+
+Block frequency is a relative metric that represents the number of times a
+block executes.  The ratio of a block frequency to the entry block frequency is
+the expected number of times the block will execute per entry to the function.
+
+Block frequency is the main output of the ``BlockFrequencyInfo`` and
+``MachineBlockFrequencyInfo`` analysis passes.
+
+Implementation: a series of DAGs
+================================
+
+The implementation of the block frequency calculation analyses each loop,
+bottom-up, ignoring backedges; i.e., as a DAG.  After each loop is processed,
+it's packaged up to act as a pseudo-node in its parent loop's (or the
+function's) DAG analysis.
+
+Block Mass
+==========
+
+For each DAG, the entry node is assigned a mass of ``UINT64_MAX`` and mass is
+distributed to successors according to branch weights.  Block Mass uses a
+fixed-point representation where ``UINT64_MAX`` represents ``1.0`` and ``0``
+represents a number just above ``0.0``.
+
+After mass is fully distributed, in any cut of the DAG that separates the exit
+nodes from the entry node, the sum of the block masses of the nodes succeeded
+by a cut edge should equal ``UINT64_MAX``.  In other words, mass is conserved
+as it "falls" through the DAG.
+
+If a function's basic block graph is a DAG, then block masses are valid block
+frequencies.  This works poorly in practise though, since downstream users rely
+on adding block frequencies together without hitting the maximum.
+
+Loop Scale
+==========
+
+Loop scale is a metric that indicates how many times a loop iterates per entry.
+As mass is distributed through the loop's DAG, the (otherwise ignored) backedge
+mass is collected.  This backedge mass is used to compute the exit frequency,
+and thus the loop scale.
+
+Implementation: Getting from mass and scale to frequency
+========================================================
+
+After analysing the complete series of DAGs, each block has a mass (local to
+its containing loop, if any), and each loop pseudo-node has a loop scale and
+its own mass (from its parent's DAG).
+
+We can get an initial frequency assignment (with entry frequency of 1.0) by
+multiplying these masses and loop scales together.  A given block's frequency
+is the product of its mass, the mass of containing loops' pseudo nodes, and the
+containing loops' loop scales.
+
+Since downstream users need integers (not floating point), this initial
+frequency assignment is shifted as necessary into the range of ``uint64_t``.
+
+Block Bias
+==========
+
+Block bias is a proposed *absolute* metric to indicate a bias toward or away
+from a given block during a function's execution.  The idea is that bias can be
+used in isolation to indicate whether a block is relatively hot or cold, or to
+compare two blocks to indicate whether one is hotter or colder than the other.
+
+The proposed calculation involves calculating a *reference* block frequency,
+where:
+
+* every branch weight is assumed to be 1 (i.e., every branch probability
+  distribution is even) and
+
+* loop scales are ignored.
+
+This reference frequency represents what the block frequency would be in an
+unbiased graph.
+
+The bias is the ratio of the block frequency to this reference block frequency.

Added: www-releases/trunk/5.0.1/docs/_sources/BranchWeightMetadata.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_sources/BranchWeightMetadata.rst.txt?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/_sources/BranchWeightMetadata.rst.txt (added)
+++ www-releases/trunk/5.0.1/docs/_sources/BranchWeightMetadata.rst.txt Thu Dec 21 10:09:53 2017
@@ -0,0 +1,164 @@
+===========================
+LLVM Branch Weight Metadata
+===========================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+Branch Weight Metadata represents branch weights as its likeliness to be taken
+(see :doc:`BlockFrequencyTerminology`). Metadata is assigned to the
+``TerminatorInst`` as a ``MDNode`` of the ``MD_prof`` kind. The first operator
+is always a ``MDString`` node with the string "branch_weights".  Number of
+operators depends on the terminator type.
+
+Branch weights might be fetch from the profiling file, or generated based on
+`__builtin_expect`_ instruction.
+
+All weights are represented as an unsigned 32-bit values, where higher value
+indicates greater chance to be taken.
+
+Supported Instructions
+======================
+
+``BranchInst``
+^^^^^^^^^^^^^^
+
+Metadata is only assigned to the conditional branches. There are two extra
+operands for the true and the false branch.
+
+.. code-block:: none
+
+  !0 = metadata !{
+    metadata !"branch_weights",
+    i32 <TRUE_BRANCH_WEIGHT>,
+    i32 <FALSE_BRANCH_WEIGHT>
+  }
+
+``SwitchInst``
+^^^^^^^^^^^^^^
+
+Branch weights are assigned to every case (including the ``default`` case which
+is always case #0).
+
+.. code-block:: none
+
+  !0 = metadata !{
+    metadata !"branch_weights",
+    i32 <DEFAULT_BRANCH_WEIGHT>
+    [ , i32 <CASE_BRANCH_WEIGHT> ... ]
+  }
+
+``IndirectBrInst``
+^^^^^^^^^^^^^^^^^^
+
+Branch weights are assigned to every destination.
+
+.. code-block:: none
+
+  !0 = metadata !{
+    metadata !"branch_weights",
+    i32 <LABEL_BRANCH_WEIGHT>
+    [ , i32 <LABEL_BRANCH_WEIGHT> ... ]
+  }
+
+``CallInst``
+^^^^^^^^^^^^^^^^^^
+
+Calls may have branch weight metadata, containing the execution count of
+the call. It is currently used in SamplePGO mode only, to augment the
+block and entry counts which may not be accurate with sampling.
+
+.. code-block:: none
+
+  !0 = metadata !{
+    metadata !"branch_weights",
+    i32 <CALL_BRANCH_WEIGHT>
+  }
+
+Other
+^^^^^
+
+Other terminator instructions are not allowed to contain Branch Weight Metadata.
+
+.. _\__builtin_expect:
+
+Built-in ``expect`` Instructions
+================================
+
+``__builtin_expect(long exp, long c)`` instruction provides branch prediction
+information. The return value is the value of ``exp``.
+
+It is especially useful in conditional statements. Currently Clang supports two
+conditional statements:
+
+``if`` statement
+^^^^^^^^^^^^^^^^
+
+The ``exp`` parameter is the condition. The ``c`` parameter is the expected
+comparison value. If it is equal to 1 (true), the condition is likely to be
+true, in other case condition is likely to be false. For example:
+
+.. code-block:: c++
+
+  if (__builtin_expect(x > 0, 1)) {
+    // This block is likely to be taken.
+  }
+
+``switch`` statement
+^^^^^^^^^^^^^^^^^^^^
+
+The ``exp`` parameter is the value. The ``c`` parameter is the expected
+value. If the expected value doesn't show on the cases list, the ``default``
+case is assumed to be likely taken.
+
+.. code-block:: c++
+
+  switch (__builtin_expect(x, 5)) {
+  default: break;
+  case 0:  // ...
+  case 3:  // ...
+  case 5:  // This case is likely to be taken.
+  }
+
+CFG Modifications
+=================
+
+Branch Weight Metatada is not proof against CFG changes. If terminator operands'
+are changed some action should be taken. In other case some misoptimizations may
+occur due to incorrect branch prediction information.
+
+Function Entry Counts
+=====================
+
+To allow comparing different functions during inter-procedural analysis and
+optimization, ``MD_prof`` nodes can also be assigned to a function definition.
+The first operand is a string indicating the name of the associated counter.
+
+Currently, one counter is supported: "function_entry_count". The second operand
+is a 64-bit counter that indicates the number of times that this function was
+invoked (in the case of instrumentation-based profiles). In the case of
+sampling-based profiles, this operand is an approximation of how many times
+the function was invoked.
+
+For example, in the code below, the instrumentation for function foo()
+indicates that it was called 2,590 times at runtime.
+
+.. code-block:: llvm
+
+  define i32 @foo() !prof !1 {
+    ret i32 0
+  }
+  !1 = !{!"function_entry_count", i64 2590}
+
+If "function_entry_count" has more than 2 operands, the later operands are
+the GUID of the functions that needs to be imported by ThinLTO. This is only
+set by sampling based profile. It is needed because the sampling based profile
+was collected on a binary that had already imported and inlined these functions,
+and we need to ensure the IR matches in the ThinLTO backends for profile
+annotation. The reason why we cannot annotate this on the callsite is that it
+can only goes down 1 level in the call chain. For the cases where
+foo_in_a_cc()->bar_in_b_cc()->baz_in_c_cc(), we will need to go down 2 levels
+in the call chain to import both bar_in_b_cc and baz_in_c_cc.

Added: www-releases/trunk/5.0.1/docs/_sources/Bugpoint.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_sources/Bugpoint.rst.txt?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/_sources/Bugpoint.rst.txt (added)
+++ www-releases/trunk/5.0.1/docs/_sources/Bugpoint.rst.txt Thu Dec 21 10:09:53 2017
@@ -0,0 +1,216 @@
+====================================
+LLVM bugpoint tool: design and usage
+====================================
+
+.. contents::
+   :local:
+
+Description
+===========
+
+``bugpoint`` narrows down the source of problems in LLVM tools and passes.  It
+can be used to debug three types of failures: optimizer crashes, miscompilations
+by optimizers, or bad native code generation (including problems in the static
+and JIT compilers).  It aims to reduce large test cases to small, useful ones.
+For example, if ``opt`` crashes while optimizing a file, it will identify the
+optimization (or combination of optimizations) that causes the crash, and reduce
+the file down to a small example which triggers the crash.
+
+For detailed case scenarios, such as debugging ``opt``, or one of the LLVM code
+generators, see :doc:`HowToSubmitABug`.
+
+Design Philosophy
+=================
+
+``bugpoint`` is designed to be a useful tool without requiring any hooks into
+the LLVM infrastructure at all.  It works with any and all LLVM passes and code
+generators, and does not need to "know" how they work.  Because of this, it may
+appear to do stupid things or miss obvious simplifications.  ``bugpoint`` is
+also designed to trade off programmer time for computer time in the
+compiler-debugging process; consequently, it may take a long period of
+(unattended) time to reduce a test case, but we feel it is still worth it. Note
+that ``bugpoint`` is generally very quick unless debugging a miscompilation
+where each test of the program (which requires executing it) takes a long time.
+
+Automatic Debugger Selection
+----------------------------
+
+``bugpoint`` reads each ``.bc`` or ``.ll`` file specified on the command line
+and links them together into a single module, called the test program.  If any
+LLVM passes are specified on the command line, it runs these passes on the test
+program.  If any of the passes crash, or if they produce malformed output (which
+causes the verifier to abort), ``bugpoint`` starts the `crash debugger`_.
+
+Otherwise, if the ``-output`` option was not specified, ``bugpoint`` runs the
+test program with the "safe" backend (which is assumed to generate good code) to
+generate a reference output.  Once ``bugpoint`` has a reference output for the
+test program, it tries executing it with the selected code generator.  If the
+selected code generator crashes, ``bugpoint`` starts the `crash debugger`_ on
+the code generator.  Otherwise, if the resulting output differs from the
+reference output, it assumes the difference resulted from a code generator
+failure, and starts the `code generator debugger`_.
+
+Finally, if the output of the selected code generator matches the reference
+output, ``bugpoint`` runs the test program after all of the LLVM passes have
+been applied to it.  If its output differs from the reference output, it assumes
+the difference resulted from a failure in one of the LLVM passes, and enters the
+`miscompilation debugger`_.  Otherwise, there is no problem ``bugpoint`` can
+debug.
+
+.. _crash debugger:
+
+Crash debugger
+--------------
+
+If an optimizer or code generator crashes, ``bugpoint`` will try as hard as it
+can to reduce the list of passes (for optimizer crashes) and the size of the
+test program.  First, ``bugpoint`` figures out which combination of optimizer
+passes triggers the bug. This is useful when debugging a problem exposed by
+``opt``, for example, because it runs over 38 passes.
+
+Next, ``bugpoint`` tries removing functions from the test program, to reduce its
+size.  Usually it is able to reduce a test program to a single function, when
+debugging intraprocedural optimizations.  Once the number of functions has been
+reduced, it attempts to delete various edges in the control flow graph, to
+reduce the size of the function as much as possible.  Finally, ``bugpoint``
+deletes any individual LLVM instructions whose absence does not eliminate the
+failure.  At the end, ``bugpoint`` should tell you what passes crash, give you a
+bitcode file, and give you instructions on how to reproduce the failure with
+``opt`` or ``llc``.
+
+.. _code generator debugger:
+
+Code generator debugger
+-----------------------
+
+The code generator debugger attempts to narrow down the amount of code that is
+being miscompiled by the selected code generator.  To do this, it takes the test
+program and partitions it into two pieces: one piece which it compiles with the
+"safe" backend (into a shared object), and one piece which it runs with either
+the JIT or the static LLC compiler.  It uses several techniques to reduce the
+amount of code pushed through the LLVM code generator, to reduce the potential
+scope of the problem.  After it is finished, it emits two bitcode files (called
+"test" [to be compiled with the code generator] and "safe" [to be compiled with
+the "safe" backend], respectively), and instructions for reproducing the
+problem.  The code generator debugger assumes that the "safe" backend produces
+good code.
+
+.. _miscompilation debugger:
+
+Miscompilation debugger
+-----------------------
+
+The miscompilation debugger works similarly to the code generator debugger.  It
+works by splitting the test program into two pieces, running the optimizations
+specified on one piece, linking the two pieces back together, and then executing
+the result.  It attempts to narrow down the list of passes to the one (or few)
+which are causing the miscompilation, then reduce the portion of the test
+program which is being miscompiled.  The miscompilation debugger assumes that
+the selected code generator is working properly.
+
+Advice for using bugpoint
+=========================
+
+``bugpoint`` can be a remarkably useful tool, but it sometimes works in
+non-obvious ways.  Here are some hints and tips:
+
+* In the code generator and miscompilation debuggers, ``bugpoint`` only works
+  with programs that have deterministic output.  Thus, if the program outputs
+  ``argv[0]``, the date, time, or any other "random" data, ``bugpoint`` may
+  misinterpret differences in these data, when output, as the result of a
+  miscompilation.  Programs should be temporarily modified to disable outputs
+  that are likely to vary from run to run.
+
+* In the code generator and miscompilation debuggers, debugging will go faster
+  if you manually modify the program or its inputs to reduce the runtime, but
+  still exhibit the problem.
+
+* ``bugpoint`` is extremely useful when working on a new optimization: it helps
+  track down regressions quickly.  To avoid having to relink ``bugpoint`` every
+  time you change your optimization however, have ``bugpoint`` dynamically load
+  your optimization with the ``-load`` option.
+
+* ``bugpoint`` can generate a lot of output and run for a long period of time.
+  It is often useful to capture the output of the program to file.  For example,
+  in the C shell, you can run:
+
+  .. code-block:: console
+
+    $ bugpoint  ... |& tee bugpoint.log
+
+  to get a copy of ``bugpoint``'s output in the file ``bugpoint.log``, as well
+  as on your terminal.
+
+* ``bugpoint`` cannot debug problems with the LLVM linker. If ``bugpoint``
+  crashes before you see its "All input ok" message, you might try ``llvm-link
+  -v`` on the same set of input files. If that also crashes, you may be
+  experiencing a linker bug.
+
+* ``bugpoint`` is useful for proactively finding bugs in LLVM.  Invoking
+  ``bugpoint`` with the ``-find-bugs`` option will cause the list of specified
+  optimizations to be randomized and applied to the program. This process will
+  repeat until a bug is found or the user kills ``bugpoint``.
+
+What to do when bugpoint isn't enough
+=====================================
+	
+Sometimes, ``bugpoint`` is not enough. In particular, InstCombine and
+TargetLowering both have visitor structured code with lots of potential
+transformations.  If the process of using bugpoint has left you with still too
+much code to figure out and the problem seems to be in instcombine, the
+following steps may help.  These same techniques are useful with TargetLowering
+as well.
+
+Turn on ``-debug-only=instcombine`` and see which transformations within
+instcombine are firing by selecting out lines with "``IC``" in them.
+
+At this point, you have a decision to make.  Is the number of transformations
+small enough to step through them using a debugger?  If so, then try that.
+
+If there are too many transformations, then a source modification approach may
+be helpful.  In this approach, you can modify the source code of instcombine to
+disable just those transformations that are being performed on your test input
+and perform a binary search over the set of transformations.  One set of places
+to modify are the "``visit*``" methods of ``InstCombiner`` (*e.g.*
+``visitICmpInst``) by adding a "``return false``" as the first line of the
+method.
+
+If that still doesn't remove enough, then change the caller of
+``InstCombiner::DoOneIteration``, ``InstCombiner::runOnFunction`` to limit the
+number of iterations.
+
+You may also find it useful to use "``-stats``" now to see what parts of
+instcombine are firing.  This can guide where to put additional reporting code.
+
+At this point, if the amount of transformations is still too large, then
+inserting code to limit whether or not to execute the body of the code in the
+visit function can be helpful.  Add a static counter which is incremented on
+every invocation of the function.  Then add code which simply returns false on
+desired ranges.  For example:
+
+.. code-block:: c++
+
+
+  static int calledCount = 0;
+  calledCount++;
+  DEBUG(if (calledCount < 212) return false);
+  DEBUG(if (calledCount > 217) return false);
+  DEBUG(if (calledCount == 213) return false);
+  DEBUG(if (calledCount == 214) return false);
+  DEBUG(if (calledCount == 215) return false);
+  DEBUG(if (calledCount == 216) return false);
+  DEBUG(dbgs() << "visitXOR calledCount: " << calledCount << "\n");
+  DEBUG(dbgs() << "I: "; I->dump());
+
+could be added to ``visitXOR`` to limit ``visitXor`` to being applied only to
+calls 212 and 217. This is from an actual test case and raises an important
+point---a simple binary search may not be sufficient, as transformations that
+interact may require isolating more than one call.  In TargetLowering, use
+``return SDNode();`` instead of ``return false;``.
+
+Now that the number of transformations is down to a manageable number, try
+examining the output to see if you can figure out which transformations are
+being done.  If that can be figured out, then do the usual debugging.  If which
+code corresponds to the transformation being performed isn't obvious, set a
+breakpoint after the call count based disabling and step through the code.
+Alternatively, you can use "``printf``" style debugging to report waypoints.

Added: www-releases/trunk/5.0.1/docs/_sources/CMake.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_sources/CMake.rst.txt?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/_sources/CMake.rst.txt (added)
+++ www-releases/trunk/5.0.1/docs/_sources/CMake.rst.txt Thu Dec 21 10:09:53 2017
@@ -0,0 +1,784 @@
+========================
+Building LLVM with CMake
+========================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+`CMake <http://www.cmake.org/>`_ is a cross-platform build-generator tool. CMake
+does not build the project, it generates the files needed by your build tool
+(GNU make, Visual Studio, etc.) for building LLVM.
+
+If **you are a new contributor**, please start with the :doc:`GettingStarted` 
+page.  This page is geared for existing contributors moving from the 
+legacy configure/make system.
+
+If you are really anxious about getting a functional LLVM build, go to the
+`Quick start`_ section. If you are a CMake novice, start with `Basic CMake usage`_
+and then go back to the `Quick start`_ section once you know what you are doing. The
+`Options and variables`_ section is a reference for customizing your build. If
+you already have experience with CMake, this is the recommended starting point.
+
+This page is geared towards users of the LLVM CMake build. If you're looking for
+information about modifying the LLVM CMake build system you may want to see the
+:doc:`CMakePrimer` page. It has a basic overview of the CMake language.
+
+.. _Quick start:
+
+Quick start
+===========
+
+We use here the command-line, non-interactive CMake interface.
+
+#. `Download <http://www.cmake.org/cmake/resources/software.html>`_ and install
+   CMake. Version 3.4.3 is the minimum required.
+
+#. Open a shell. Your development tools must be reachable from this shell
+   through the PATH environment variable.
+
+#. Create a build directory. Building LLVM in the source
+   directory is not supported. cd to this directory:
+
+   .. code-block:: console
+
+     $ mkdir mybuilddir
+     $ cd mybuilddir
+
+#. Execute this command in the shell replacing `path/to/llvm/source/root` with
+   the path to the root of your LLVM source tree:
+
+   .. code-block:: console
+
+     $ cmake path/to/llvm/source/root
+
+   CMake will detect your development environment, perform a series of tests, and
+   generate the files required for building LLVM. CMake will use default values
+   for all build parameters. See the `Options and variables`_ section for
+   a list of build parameters that you can modify.
+
+   This can fail if CMake can't detect your toolset, or if it thinks that the
+   environment is not sane enough. In this case, make sure that the toolset that
+   you intend to use is the only one reachable from the shell, and that the shell
+   itself is the correct one for your development environment. CMake will refuse
+   to build MinGW makefiles if you have a POSIX shell reachable through the PATH
+   environment variable, for instance. You can force CMake to use a given build
+   tool; for instructions, see the `Usage`_ section, below.
+
+#. After CMake has finished running, proceed to use IDE project files, or start
+   the build from the build directory:
+
+   .. code-block:: console
+
+     $ cmake --build .
+
+   The ``--build`` option tells ``cmake`` to invoke the underlying build
+   tool (``make``, ``ninja``, ``xcodebuild``, ``msbuild``, etc.)
+
+   The underlying build tool can be invoked directly, of course, but
+   the ``--build`` option is portable.
+
+#. After LLVM has finished building, install it from the build directory:
+
+   .. code-block:: console
+
+     $ cmake --build . --target install
+
+   The ``--target`` option with ``install`` parameter in addition to
+   the ``--build`` option tells ``cmake`` to build the ``install`` target.
+
+   It is possible to set a different install prefix at installation time
+   by invoking the ``cmake_install.cmake`` script generated in the
+   build directory:
+
+   .. code-block:: console
+
+     $ cmake -DCMAKE_INSTALL_PREFIX=/tmp/llvm -P cmake_install.cmake
+
+.. _Basic CMake usage:
+.. _Usage:
+
+Basic CMake usage
+=================
+
+This section explains basic aspects of CMake
+which you may need in your day-to-day usage.
+
+CMake comes with extensive documentation, in the form of html files, and as
+online help accessible via the ``cmake`` executable itself. Execute ``cmake
+--help`` for further help options.
+
+CMake allows you to specify a build tool (e.g., GNU make, Visual Studio,
+or Xcode). If not specified on the command line, CMake tries to guess which
+build tool to use, based on your environment. Once it has identified your
+build tool, CMake uses the corresponding *Generator* to create files for your
+build tool (e.g., Makefiles or Visual Studio or Xcode project files). You can
+explicitly specify the generator with the command line option ``-G "Name of the
+generator"``. To see a list of the available generators on your system, execute
+
+.. code-block:: console
+
+  $ cmake --help
+
+This will list the generator names at the end of the help text.
+
+Generators' names are case-sensitive, and may contain spaces. For this reason,
+you should enter them exactly as they are listed in the ``cmake --help``
+output, in quotes. For example, to generate project files specifically for
+Visual Studio 12, you can execute:
+
+.. code-block:: console
+
+  $ cmake -G "Visual Studio 12" path/to/llvm/source/root
+
+For a given development platform there can be more than one adequate
+generator. If you use Visual Studio, "NMake Makefiles" is a generator you can use
+for building with NMake. By default, CMake chooses the most specific generator
+supported by your development environment. If you want an alternative generator,
+you must tell this to CMake with the ``-G`` option.
+
+.. todo::
+
+  Explain variables and cache. Move explanation here from #options section.
+
+.. _Options and variables:
+
+Options and variables
+=====================
+
+Variables customize how the build will be generated. Options are boolean
+variables, with possible values ON/OFF. Options and variables are defined on the
+CMake command line like this:
+
+.. code-block:: console
+
+  $ cmake -DVARIABLE=value path/to/llvm/source
+
+You can set a variable after the initial CMake invocation to change its
+value. You can also undefine a variable:
+
+.. code-block:: console
+
+  $ cmake -UVARIABLE path/to/llvm/source
+
+Variables are stored in the CMake cache. This is a file named ``CMakeCache.txt``
+stored at the root of your build directory that is generated by ``cmake``.
+Editing it yourself is not recommended.
+
+Variables are listed in the CMake cache and later in this document with
+the variable name and type separated by a colon. You can also specify the
+variable and type on the CMake command line:
+
+.. code-block:: console
+
+  $ cmake -DVARIABLE:TYPE=value path/to/llvm/source
+
+Frequently-used CMake variables
+-------------------------------
+
+Here are some of the CMake variables that are used often, along with a
+brief explanation and LLVM-specific notes. For full documentation, consult the
+CMake manual, or execute ``cmake --help-variable VARIABLE_NAME``.
+
+**CMAKE_BUILD_TYPE**:STRING
+  Sets the build type for ``make``-based generators. Possible values are
+  Release, Debug, RelWithDebInfo and MinSizeRel. If you are using an IDE such as
+  Visual Studio, you should use the IDE settings to set the build type.
+  Be aware that Release and RelWithDebInfo use different optimization levels on
+  most platforms.
+
+**CMAKE_INSTALL_PREFIX**:PATH
+  Path where LLVM will be installed if "make install" is invoked or the
+  "install" target is built.
+
+**LLVM_LIBDIR_SUFFIX**:STRING
+  Extra suffix to append to the directory where libraries are to be
+  installed. On a 64-bit architecture, one could use ``-DLLVM_LIBDIR_SUFFIX=64``
+  to install libraries to ``/usr/lib64``.
+
+**CMAKE_C_FLAGS**:STRING
+  Extra flags to use when compiling C source files.
+
+**CMAKE_CXX_FLAGS**:STRING
+  Extra flags to use when compiling C++ source files.
+
+.. _LLVM-specific variables:
+
+LLVM-specific variables
+-----------------------
+
+**LLVM_TARGETS_TO_BUILD**:STRING
+  Semicolon-separated list of targets to build, or *all* for building all
+  targets. Case-sensitive. Defaults to *all*. Example:
+  ``-DLLVM_TARGETS_TO_BUILD="X86;PowerPC"``.
+
+**LLVM_BUILD_TOOLS**:BOOL
+  Build LLVM tools. Defaults to ON. Targets for building each tool are generated
+  in any case. You can build a tool separately by invoking its target. For
+  example, you can build *llvm-as* with a Makefile-based system by executing *make
+  llvm-as* at the root of your build directory.
+
+**LLVM_INCLUDE_TOOLS**:BOOL
+  Generate build targets for the LLVM tools. Defaults to ON. You can use this
+  option to disable the generation of build targets for the LLVM tools.
+
+**LLVM_BUILD_EXAMPLES**:BOOL
+  Build LLVM examples. Defaults to OFF. Targets for building each example are
+  generated in any case. See documentation for *LLVM_BUILD_TOOLS* above for more
+  details.
+
+**LLVM_INCLUDE_EXAMPLES**:BOOL
+  Generate build targets for the LLVM examples. Defaults to ON. You can use this
+  option to disable the generation of build targets for the LLVM examples.
+
+**LLVM_BUILD_TESTS**:BOOL
+  Build LLVM unit tests. Defaults to OFF. Targets for building each unit test
+  are generated in any case. You can build a specific unit test using the
+  targets defined under *unittests*, such as ADTTests, IRTests, SupportTests,
+  etc. (Search for ``add_llvm_unittest`` in the subdirectories of *unittests*
+  for a complete list of unit tests.) It is possible to build all unit tests
+  with the target *UnitTests*.
+
+**LLVM_INCLUDE_TESTS**:BOOL
+  Generate build targets for the LLVM unit tests. Defaults to ON. You can use
+  this option to disable the generation of build targets for the LLVM unit
+  tests.
+
+**LLVM_APPEND_VC_REV**:BOOL
+  Embed version control revision info (svn revision number or Git revision id).
+  The version info is provided by the ``LLVM_REVISION`` macro in
+  ``llvm/include/llvm/Support/VCSRevision.h``. Developers using git who don't
+  need revision info can disable this option to avoid re-linking most binaries
+  after a branch switch. Defaults to ON.
+
+**LLVM_ENABLE_THREADS**:BOOL
+  Build with threads support, if available. Defaults to ON.
+
+**LLVM_ENABLE_CXX1Y**:BOOL
+  Build in C++1y mode, if available. Defaults to OFF.
+
+**LLVM_ENABLE_ASSERTIONS**:BOOL
+  Enables code assertions. Defaults to ON if and only if ``CMAKE_BUILD_TYPE``
+  is *Debug*.
+
+**LLVM_ENABLE_EH**:BOOL
+  Build LLVM with exception-handling support. This is necessary if you wish to
+  link against LLVM libraries and make use of C++ exceptions in your own code
+  that need to propagate through LLVM code. Defaults to OFF.
+
+**LLVM_ENABLE_EXPENSIVE_CHECKS**:BOOL
+  Enable additional time/memory expensive checking. Defaults to OFF.
+
+**LLVM_ENABLE_PIC**:BOOL
+  Add the ``-fPIC`` flag to the compiler command-line, if the compiler supports
+  this flag. Some systems, like Windows, do not need this flag. Defaults to ON.
+
+**LLVM_ENABLE_RTTI**:BOOL
+  Build LLVM with run-time type information. Defaults to OFF.
+
+**LLVM_ENABLE_WARNINGS**:BOOL
+  Enable all compiler warnings. Defaults to ON.
+
+**LLVM_ENABLE_PEDANTIC**:BOOL
+  Enable pedantic mode. This disables compiler-specific extensions, if
+  possible. Defaults to ON.
+
+**LLVM_ENABLE_WERROR**:BOOL
+  Stop and fail the build, if a compiler warning is triggered. Defaults to OFF.
+
+**LLVM_ABI_BREAKING_CHECKS**:STRING
+  Used to decide if LLVM should be built with ABI breaking checks or
+  not.  Allowed values are `WITH_ASSERTS` (default), `FORCE_ON` and
+  `FORCE_OFF`.  `WITH_ASSERTS` turns on ABI breaking checks in an
+  assertion enabled build.  `FORCE_ON` (`FORCE_OFF`) turns them on
+  (off) irrespective of whether normal (`NDEBUG`-based) assertions are
+  enabled or not.  A version of LLVM built with ABI breaking checks
+  is not ABI compatible with a version built without it.
+
+**LLVM_BUILD_32_BITS**:BOOL
+  Build 32-bit executables and libraries on 64-bit systems. This option is
+  available only on some 64-bit Unix systems. Defaults to OFF.
+
+**LLVM_TARGET_ARCH**:STRING
+  LLVM target to use for native code generation. This is required for JIT
+  generation. It defaults to "host", meaning that it shall pick the architecture
+  of the machine where LLVM is being built. If you are cross-compiling, set it
+  to the target architecture name.
+
+**LLVM_TABLEGEN**:STRING
+  Full path to a native TableGen executable (usually named ``llvm-tblgen``). This is
+  intended for cross-compiling: if the user sets this variable, no native
+  TableGen will be created.
+
+**LLVM_LIT_ARGS**:STRING
+  Arguments given to lit.  ``make check`` and ``make clang-test`` are affected.
+  By default, ``'-sv --no-progress-bar'`` on Visual C++ and Xcode, ``'-sv'`` on
+  others.
+
+**LLVM_LIT_TOOLS_DIR**:PATH
+  The path to GnuWin32 tools for tests. Valid on Windows host.  Defaults to
+  the empty string, in which case lit will look for tools needed for tests
+  (e.g. ``grep``, ``sort``, etc.) in your %PATH%. If GnuWin32 is not in your
+  %PATH%, then you can set this variable to the GnuWin32 directory so that
+  lit can find tools needed for tests in that directory.
+
+**LLVM_ENABLE_FFI**:BOOL
+  Indicates whether the LLVM Interpreter will be linked with the Foreign Function
+  Interface library (libffi) in order to enable calling external functions.
+  If the library or its headers are installed in a custom
+  location, you can also set the variables FFI_INCLUDE_DIR and
+  FFI_LIBRARY_DIR to the directories where ffi.h and libffi.so can be found,
+  respectively. Defaults to OFF.
+
+**LLVM_EXTERNAL_{CLANG,LLD,POLLY}_SOURCE_DIR**:PATH
+  These variables specify the path to the source directory for the external
+  LLVM projects Clang, lld, and Polly, respectively, relative to the top-level
+  source directory.  If the in-tree subdirectory for an external project
+  exists (e.g., llvm/tools/clang for Clang), then the corresponding variable
+  will not be used.  If the variable for an external project does not point
+  to a valid path, then that project will not be built.
+
+**LLVM_ENABLE_PROJECTS**:STRING
+  Semicolon-separated list of projects to build, or *all* for building all
+  (clang, libcxx, libcxxabi, lldb, compiler-rt, lld, polly) projects.
+  This flag assumes that projects are checked out side-by-side and not nested,
+  i.e. clang needs to be in parallel of llvm instead of nested in `llvm/tools`.
+  This feature allows to have one build for only LLVM and another for clang+llvm
+  using the same source checkout.
+
+**LLVM_EXTERNAL_PROJECTS**:STRING
+  Semicolon-separated list of additional external projects to build as part of
+  llvm. For each project LLVM_EXTERNAL_<NAME>_SOURCE_DIR have to be specified
+  with the path for the source code of the project. Example:
+  ``-DLLVM_EXTERNAL_PROJECTS="Foo;Bar"
+  -DLLVM_EXTERNAL_FOO_SOURCE_DIR=/src/foo
+  -DLLVM_EXTERNAL_BAR_SOURCE_DIR=/src/bar``.
+
+**LLVM_USE_OPROFILE**:BOOL
+  Enable building OProfile JIT support. Defaults to OFF.
+
+**LLVM_PROFDATA_FILE**:PATH
+  Path to a profdata file to pass into clang's -fprofile-instr-use flag. This
+  can only be specified if you're building with clang.
+
+**LLVM_USE_INTEL_JITEVENTS**:BOOL
+  Enable building support for Intel JIT Events API. Defaults to OFF.
+
+**LLVM_ENABLE_ZLIB**:BOOL
+  Enable building with zlib to support compression/uncompression in LLVM tools.
+  Defaults to ON.
+
+**LLVM_ENABLE_DIA_SDK**:BOOL
+  Enable building with MSVC DIA SDK for PDB debugging support. Available
+  only with MSVC. Defaults to ON.
+
+**LLVM_USE_SANITIZER**:STRING
+  Define the sanitizer used to build LLVM binaries and tests. Possible values
+  are ``Address``, ``Memory``, ``MemoryWithOrigins``, ``Undefined``, ``Thread``,
+  and ``Address;Undefined``. Defaults to empty string.
+
+**LLVM_ENABLE_LTO**:STRING
+  Add ``-flto`` or ``-flto=`` flags to the compile and link command
+  lines, enabling link-time optimization. Possible values are ``Off``,
+  ``On``, ``Thin`` and ``Full``. Defaults to OFF.
+
+**LLVM_USE_LINKER**:STRING
+  Add ``-fuse-ld={name}`` to the link invocation. The possible value depend on
+  your compiler, for clang the value can be an absolute path to your custom
+  linker, otherwise clang will prefix the name with ``ld.`` and apply its usual
+  search. For example to link LLVM with the Gold linker, cmake can be invoked
+  with ``-DLLVM_USE_LINKER=gold``.
+
+**LLVM_ENABLE_LLD**:BOOL
+  This option is equivalent to `-DLLVM_USE_LINKER=lld`, except during a 2-stage
+  build where a dependency is added from the first stage to the second ensuring
+  that lld is built before stage2 begins.
+
+**LLVM_PARALLEL_COMPILE_JOBS**:STRING
+  Define the maximum number of concurrent compilation jobs.
+
+**LLVM_PARALLEL_LINK_JOBS**:STRING
+  Define the maximum number of concurrent link jobs.
+
+**LLVM_BUILD_DOCS**:BOOL
+  Adds all *enabled* documentation targets (i.e. Doxgyen and Sphinx targets) as
+  dependencies of the default build targets.  This results in all of the (enabled)
+  documentation targets being as part of a normal build.  If the ``install`` 
+  target is run then this also enables all built documentation targets to be 
+  installed. Defaults to OFF.  To enable a particular documentation target, see 
+  see LLVM_ENABLE_SPHINX and LLVM_ENABLE_DOXYGEN.  
+
+**LLVM_ENABLE_DOXYGEN**:BOOL
+  Enables the generation of browsable HTML documentation using doxygen.
+  Defaults to OFF.
+
+**LLVM_ENABLE_DOXYGEN_QT_HELP**:BOOL
+  Enables the generation of a Qt Compressed Help file. Defaults to OFF.
+  This affects the make target ``doxygen-llvm``. When enabled, apart from
+  the normal HTML output generated by doxygen, this will produce a QCH file
+  named ``org.llvm.qch``. You can then load this file into Qt Creator.
+  This option is only useful in combination with ``-DLLVM_ENABLE_DOXYGEN=ON``;
+  otherwise this has no effect.
+
+**LLVM_DOXYGEN_QCH_FILENAME**:STRING
+  The filename of the Qt Compressed Help file that will be generated when
+  ``-DLLVM_ENABLE_DOXYGEN=ON`` and
+  ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON`` are given. Defaults to
+  ``org.llvm.qch``.
+  This option is only useful in combination with
+  ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``;
+  otherwise it has no effect.
+
+**LLVM_DOXYGEN_QHP_NAMESPACE**:STRING
+  Namespace under which the intermediate Qt Help Project file lives. See `Qt
+  Help Project`_
+  for more information. Defaults to "org.llvm". This option is only useful in
+  combination with ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``; otherwise
+  it has no effect.
+
+**LLVM_DOXYGEN_QHP_CUST_FILTER_NAME**:STRING
+  See `Qt Help Project`_ for
+  more information. Defaults to the CMake variable ``${PACKAGE_STRING}`` which
+  is a combination of the package name and version string. This filter can then
+  be used in Qt Creator to select only documentation from LLVM when browsing
+  through all the help files that you might have loaded. This option is only
+  useful in combination with ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``;
+  otherwise it has no effect.
+
+.. _Qt Help Project: http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom-filters
+
+**LLVM_DOXYGEN_QHELPGENERATOR_PATH**:STRING
+  The path to the ``qhelpgenerator`` executable. Defaults to whatever CMake's
+  ``find_program()`` can find. This option is only useful in combination with
+  ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``; otherwise it has no
+  effect.
+
+**LLVM_DOXYGEN_SVG**:BOOL
+  Uses .svg files instead of .png files for graphs in the Doxygen output.
+  Defaults to OFF.
+
+**LLVM_INSTALL_DOXYGEN_HTML_DIR**:STRING
+  The path to install Doxygen-generated HTML documentation to. This path can
+  either be absolute or relative to the CMAKE_INSTALL_PREFIX. Defaults to
+  `share/doc/llvm/doxygen-html`.
+
+**LLVM_ENABLE_SPHINX**:BOOL
+  If specified, CMake will search for the ``sphinx-build`` executable and will make
+  the ``SPHINX_OUTPUT_HTML`` and ``SPHINX_OUTPUT_MAN`` CMake options available.
+  Defaults to OFF.
+
+**SPHINX_EXECUTABLE**:STRING
+  The path to the ``sphinx-build`` executable detected by CMake.
+  For installation instructions, see
+  http://www.sphinx-doc.org/en/latest/install.html
+
+**SPHINX_OUTPUT_HTML**:BOOL
+  If enabled (and ``LLVM_ENABLE_SPHINX`` is enabled) then the targets for
+  building the documentation as html are added (but not built by default unless
+  ``LLVM_BUILD_DOCS`` is enabled). There is a target for each project in the
+  source tree that uses sphinx (e.g.  ``docs-llvm-html``, ``docs-clang-html``
+  and ``docs-lld-html``). Defaults to ON.
+
+**SPHINX_OUTPUT_MAN**:BOOL
+  If enabled (and ``LLVM_ENABLE_SPHINX`` is enabled) the targets for building
+  the man pages are added (but not built by default unless ``LLVM_BUILD_DOCS``
+  is enabled). Currently the only target added is ``docs-llvm-man``. Defaults
+  to ON.
+
+**SPHINX_WARNINGS_AS_ERRORS**:BOOL
+  If enabled then sphinx documentation warnings will be treated as
+  errors. Defaults to ON.
+
+**LLVM_INSTALL_SPHINX_HTML_DIR**:STRING
+  The path to install Sphinx-generated HTML documentation to. This path can
+  either be absolute or relative to the CMAKE_INSTALL_PREFIX. Defaults to
+  `share/doc/llvm/html`.
+
+**LLVM_INSTALL_OCAMLDOC_HTML_DIR**:STRING
+  The path to install OCamldoc-generated HTML documentation to. This path can
+  either be absolute or relative to the CMAKE_INSTALL_PREFIX. Defaults to
+  `share/doc/llvm/ocaml-html`.
+
+**LLVM_CREATE_XCODE_TOOLCHAIN**:BOOL
+  OS X Only: If enabled CMake will generate a target named
+  'install-xcode-toolchain'. This target will create a directory at
+  $CMAKE_INSTALL_PREFIX/Toolchains containing an xctoolchain directory which can
+  be used to override the default system tools. 
+
+**LLVM_BUILD_LLVM_DYLIB**:BOOL
+  If enabled, the target for building the libLLVM shared library is added.
+  This library contains all of LLVM's components in a single shared library.
+  Defaults to OFF. This cannot be used in conjunction with BUILD_SHARED_LIBS.
+  Tools will only be linked to the libLLVM shared library if LLVM_LINK_LLVM_DYLIB
+  is also ON.
+  The components in the library can be customised by setting LLVM_DYLIB_COMPONENTS
+  to a list of the desired components.
+
+**LLVM_LINK_LLVM_DYLIB**:BOOL
+  If enabled, tools will be linked with the libLLVM shared library. Defaults
+  to OFF. Setting LLVM_LINK_LLVM_DYLIB to ON also sets LLVM_BUILD_LLVM_DYLIB
+  to ON.
+
+**BUILD_SHARED_LIBS**:BOOL
+  Flag indicating if each LLVM component (e.g. Support) is built as a shared
+  library (ON) or as a static library (OFF). Its default value is OFF. On
+  Windows, shared libraries may be used when building with MinGW, including
+  mingw-w64, but not when building with the Microsoft toolchain.
+ 
+  .. note:: BUILD_SHARED_LIBS is only recommended for use by LLVM developers.
+            If you want to build LLVM as a shared library, you should use the
+            ``LLVM_BUILD_LLVM_DYLIB`` option.
+
+**LLVM_OPTIMIZED_TABLEGEN**:BOOL
+  If enabled and building a debug or asserts build the CMake build system will
+  generate a Release build tree to build a fully optimized tablegen for use
+  during the build. Enabling this option can significantly speed up build times
+  especially when building LLVM in Debug configurations.
+
+**LLVM_REVERSE_ITERATION**:BOOL
+  If enabled, all supported unordered llvm containers would be iterated in
+  reverse order. This is useful for uncovering non-determinism caused by
+  iteration of unordered containers.
+
+CMake Caches
+============
+
+Recently LLVM and Clang have been adding some more complicated build system
+features. Utilizing these new features often involves a complicated chain of
+CMake variables passed on the command line. Clang provides a collection of CMake
+cache scripts to make these features more approachable.
+
+CMake cache files are utilized using CMake's -C flag:
+
+.. code-block:: console
+
+  $ cmake -C <path to cache file> <path to sources>
+
+CMake cache scripts are processed in an isolated scope, only cached variables
+remain set when the main configuration runs. CMake cached variables do not reset
+variables that are already set unless the FORCE option is specified.
+
+A few notes about CMake Caches:
+
+- Order of command line arguments is important
+
+  - -D arguments specified before -C are set before the cache is processed and
+    can be read inside the cache file
+  - -D arguments specified after -C are set after the cache is processed and
+    are unset inside the cache file
+
+- All -D arguments will override cache file settings
+- CMAKE_TOOLCHAIN_FILE is evaluated after both the cache file and the command
+  line arguments
+- It is recommended that all -D options should be specified *before* -C
+
+For more information about some of the advanced build configurations supported
+via Cache files see :doc:`AdvancedBuilds`.
+
+Executing the test suite
+========================
+
+Testing is performed when the *check-all* target is built. For instance, if you are
+using Makefiles, execute this command in the root of your build directory:
+
+.. code-block:: console
+
+  $ make check-all
+
+On Visual Studio, you may run tests by building the project "check-all".
+For more information about testing, see the :doc:`TestingGuide`.
+
+Cross compiling
+===============
+
+See `this wiki page <http://www.vtk.org/Wiki/CMake_Cross_Compiling>`_ for
+generic instructions on how to cross-compile with CMake. It goes into detailed
+explanations and may seem daunting, but it is not. On the wiki page there are
+several examples including toolchain files. Go directly to `this section
+<http://www.vtk.org/Wiki/CMake_Cross_Compiling#Information_how_to_set_up_various_cross_compiling_toolchains>`_
+for a quick solution.
+
+Also see the `LLVM-specific variables`_ section for variables used when
+cross-compiling.
+
+Embedding LLVM in your project
+==============================
+
+From LLVM 3.5 onwards both the CMake and autoconf/Makefile build systems export
+LLVM libraries as importable CMake targets. This means that clients of LLVM can
+now reliably use CMake to develop their own LLVM-based projects against an
+installed version of LLVM regardless of how it was built.
+
+Here is a simple example of a CMakeLists.txt file that imports the LLVM libraries
+and uses them to build a simple application ``simple-tool``.
+
+.. code-block:: cmake
+
+  cmake_minimum_required(VERSION 3.4.3)
+  project(SimpleProject)
+
+  find_package(LLVM REQUIRED CONFIG)
+
+  message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION}")
+  message(STATUS "Using LLVMConfig.cmake in: ${LLVM_DIR}")
+
+  # Set your project compile flags.
+  # E.g. if using the C++ header files
+  # you will need to enable C++11 support
+  # for your compiler.
+
+  include_directories(${LLVM_INCLUDE_DIRS})
+  add_definitions(${LLVM_DEFINITIONS})
+
+  # Now build our tools
+  add_executable(simple-tool tool.cpp)
+
+  # Find the libraries that correspond to the LLVM components
+  # that we wish to use
+  llvm_map_components_to_libnames(llvm_libs support core irreader)
+
+  # Link against LLVM libraries
+  target_link_libraries(simple-tool ${llvm_libs})
+
+The ``find_package(...)`` directive when used in CONFIG mode (as in the above
+example) will look for the ``LLVMConfig.cmake`` file in various locations (see
+cmake manual for details).  It creates a ``LLVM_DIR`` cache entry to save the
+directory where ``LLVMConfig.cmake`` is found or allows the user to specify the
+directory (e.g. by passing ``-DLLVM_DIR=/usr/lib/cmake/llvm`` to
+the ``cmake`` command or by setting it directly in ``ccmake`` or ``cmake-gui``).
+
+This file is available in two different locations.
+
+* ``<INSTALL_PREFIX>/lib/cmake/llvm/LLVMConfig.cmake`` where
+  ``<INSTALL_PREFIX>`` is the install prefix of an installed version of LLVM.
+  On Linux typically this is ``/usr/lib/cmake/llvm/LLVMConfig.cmake``.
+
+* ``<LLVM_BUILD_ROOT>/lib/cmake/llvm/LLVMConfig.cmake`` where
+  ``<LLVM_BUILD_ROOT>`` is the root of the LLVM build tree. **Note: this is only
+  available when building LLVM with CMake.**
+
+If LLVM is installed in your operating system's normal installation prefix (e.g.
+on Linux this is usually ``/usr/``) ``find_package(LLVM ...)`` will
+automatically find LLVM if it is installed correctly. If LLVM is not installed
+or you wish to build directly against the LLVM build tree you can use
+``LLVM_DIR`` as previously mentioned.
+
+The ``LLVMConfig.cmake`` file sets various useful variables. Notable variables
+include
+
+``LLVM_CMAKE_DIR``
+  The path to the LLVM CMake directory (i.e. the directory containing
+  LLVMConfig.cmake).
+
+``LLVM_DEFINITIONS``
+  A list of preprocessor defines that should be used when building against LLVM.
+
+``LLVM_ENABLE_ASSERTIONS``
+  This is set to ON if LLVM was built with assertions, otherwise OFF.
+
+``LLVM_ENABLE_EH``
+  This is set to ON if LLVM was built with exception handling (EH) enabled,
+  otherwise OFF.
+
+``LLVM_ENABLE_RTTI``
+  This is set to ON if LLVM was built with run time type information (RTTI),
+  otherwise OFF.
+
+``LLVM_INCLUDE_DIRS``
+  A list of include paths to directories containing LLVM header files.
+
+``LLVM_PACKAGE_VERSION``
+  The LLVM version. This string can be used with CMake conditionals, e.g., ``if
+  (${LLVM_PACKAGE_VERSION} VERSION_LESS "3.5")``.
+
+``LLVM_TOOLS_BINARY_DIR``
+  The path to the directory containing the LLVM tools (e.g. ``llvm-as``).
+
+Notice that in the above example we link ``simple-tool`` against several LLVM
+libraries. The list of libraries is determined by using the
+``llvm_map_components_to_libnames()`` CMake function. For a list of available
+components look at the output of running ``llvm-config --components``.
+
+Note that for LLVM < 3.5 ``llvm_map_components_to_libraries()`` was
+used instead of ``llvm_map_components_to_libnames()``. This is now deprecated
+and will be removed in a future version of LLVM.
+
+.. _cmake-out-of-source-pass:
+
+Developing LLVM passes out of source
+------------------------------------
+
+It is possible to develop LLVM passes out of LLVM's source tree (i.e. against an
+installed or built LLVM). An example of a project layout is provided below.
+
+.. code-block:: none
+
+  <project dir>/
+      |
+      CMakeLists.txt
+      <pass name>/
+          |
+          CMakeLists.txt
+          Pass.cpp
+          ...
+
+Contents of ``<project dir>/CMakeLists.txt``:
+
+.. code-block:: cmake
+
+  find_package(LLVM REQUIRED CONFIG)
+
+  add_definitions(${LLVM_DEFINITIONS})
+  include_directories(${LLVM_INCLUDE_DIRS})
+
+  add_subdirectory(<pass name>)
+
+Contents of ``<project dir>/<pass name>/CMakeLists.txt``:
+
+.. code-block:: cmake
+
+  add_library(LLVMPassname MODULE Pass.cpp)
+
+Note if you intend for this pass to be merged into the LLVM source tree at some
+point in the future it might make more sense to use LLVM's internal
+``add_llvm_loadable_module`` function instead by...
+
+
+Adding the following to ``<project dir>/CMakeLists.txt`` (after
+``find_package(LLVM ...)``)
+
+.. code-block:: cmake
+
+  list(APPEND CMAKE_MODULE_PATH "${LLVM_CMAKE_DIR}")
+  include(AddLLVM)
+
+And then changing ``<project dir>/<pass name>/CMakeLists.txt`` to
+
+.. code-block:: cmake
+
+  add_llvm_loadable_module(LLVMPassname
+    Pass.cpp
+    )
+
+When you are done developing your pass, you may wish to integrate it
+into the LLVM source tree. You can achieve it in two easy steps:
+
+#. Copying ``<pass name>`` folder into ``<LLVM root>/lib/Transform`` directory.
+
+#. Adding ``add_subdirectory(<pass name>)`` line into
+   ``<LLVM root>/lib/Transform/CMakeLists.txt``.
+
+Compiler/Platform-specific topics
+=================================
+
+Notes for specific compilers and/or platforms.
+
+Microsoft Visual C++
+--------------------
+
+**LLVM_COMPILER_JOBS**:STRING
+  Specifies the maximum number of parallel compiler jobs to use per project
+  when building with msbuild or Visual Studio. Only supported for the Visual
+  Studio 2010 CMake generator. 0 means use all processors. Default is 0.

Added: www-releases/trunk/5.0.1/docs/_sources/CMakePrimer.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_sources/CMakePrimer.rst.txt?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/_sources/CMakePrimer.rst.txt (added)
+++ www-releases/trunk/5.0.1/docs/_sources/CMakePrimer.rst.txt Thu Dec 21 10:09:53 2017
@@ -0,0 +1,438 @@
+============
+CMake Primer
+============
+
+.. contents::
+   :local:
+
+.. warning::
+   Disclaimer: This documentation is written by LLVM project contributors `not`
+   anyone affiliated with the CMake project. This document may contain
+   inaccurate terminology, phrasing, or technical details. It is provided with
+   the best intentions.
+
+
+Introduction
+============
+
+The LLVM project and many of the core projects built on LLVM build using CMake.
+This document aims to provide a brief overview of CMake for developers modifying
+LLVM projects or building their own projects on top of LLVM.
+
+The official CMake language references is available in the cmake-language
+manpage and `cmake-language online documentation
+<https://cmake.org/cmake/help/v3.4/manual/cmake-language.7.html>`_.
+
+10,000 ft View
+==============
+
+CMake is a tool that reads script files in its own language that describe how a
+software project builds. As CMake evaluates the scripts it constructs an
+internal representation of the software project. Once the scripts have been
+fully processed, if there are no errors, CMake will generate build files to
+actually build the project. CMake supports generating build files for a variety
+of command line build tools as well as for popular IDEs.
+
+When a user runs CMake it performs a variety of checks similar to how autoconf
+worked historically. During the checks and the evaluation of the build
+description scripts CMake caches values into the CMakeCache. This is useful
+because it allows the build system to skip long-running checks during
+incremental development. CMake caching also has some drawbacks, but that will be
+discussed later.
+
+Scripting Overview
+==================
+
+CMake's scripting language has a very simple grammar. Every language construct
+is a command that matches the pattern _name_(_args_). Commands come in three
+primary types: language-defined (commands implemented in C++ in CMake), defined
+functions, and defined macros. The CMake distribution also contains a suite of
+CMake modules that contain definitions for useful functionality.
+
+The example below is the full CMake build for building a C++ "Hello World"
+program. The example uses only CMake language-defined functions.
+
+.. code-block:: cmake
+
+   cmake_minimum_required(VERSION 3.2)
+   project(HelloWorld)
+   add_executable(HelloWorld HelloWorld.cpp)
+
+The CMake language provides control flow constructs in the form of foreach loops
+and if blocks. To make the example above more complicated you could add an if
+block to define "APPLE" when targeting Apple platforms:
+
+.. code-block:: cmake
+
+   cmake_minimum_required(VERSION 3.2)
+   project(HelloWorld)
+   add_executable(HelloWorld HelloWorld.cpp)
+   if(APPLE)
+     target_compile_definitions(HelloWorld PUBLIC APPLE)
+   endif()
+   
+Variables, Types, and Scope
+===========================
+
+Dereferencing
+-------------
+
+In CMake variables are "stringly" typed. All variables are represented as
+strings throughout evaluation. Wrapping a variable in ``${}`` dereferences it
+and results in a literal substitution of the name for the value. CMake refers to
+this as "variable evaluation" in their documentation. Dereferences are performed
+*before* the command being called receives the arguments. This means
+dereferencing a list results in multiple separate arguments being passed to the
+command.
+
+Variable dereferences can be nested and be used to model complex data. For
+example:
+
+.. code-block:: cmake
+
+   set(var_name var1)
+   set(${var_name} foo) # same as "set(var1 foo)"
+   set(${${var_name}}_var bar) # same as "set(foo_var bar)"
+   
+Dereferencing an unset variable results in an empty expansion. It is a common
+pattern in CMake to conditionally set variables knowing that it will be used in
+code paths that the variable isn't set. There are examples of this throughout
+the LLVM CMake build system.
+
+An example of variable empty expansion is:
+
+.. code-block:: cmake
+
+   if(APPLE)
+     set(extra_sources Apple.cpp)
+   endif()
+   add_executable(HelloWorld HelloWorld.cpp ${extra_sources})
+   
+In this example the ``extra_sources`` variable is only defined if you're
+targeting an Apple platform. For all other targets the ``extra_sources`` will be
+evaluated as empty before add_executable is given its arguments.
+
+Lists
+-----
+
+In CMake lists are semi-colon delimited strings, and it is strongly advised that
+you avoid using semi-colons in lists; it doesn't go smoothly. A few examples of
+defining lists:
+
+.. code-block:: cmake
+
+   # Creates a list with members a, b, c, and d
+   set(my_list a b c d)
+   set(my_list "a;b;c;d")
+   
+   # Creates a string "a b c d"
+   set(my_string "a b c d")
+
+Lists of Lists
+--------------
+
+One of the more complicated patterns in CMake is lists of lists. Because a list
+cannot contain an element with a semi-colon to construct a list of lists you
+make a list of variable names that refer to other lists. For example:
+
+.. code-block:: cmake
+
+   set(list_of_lists a b c)
+   set(a 1 2 3)
+   set(b 4 5 6)
+   set(c 7 8 9)
+   
+With this layout you can iterate through the list of lists printing each value
+with the following code:
+
+.. code-block:: cmake
+
+   foreach(list_name IN LISTS list_of_lists)
+     foreach(value IN LISTS ${list_name})
+       message(${value})
+     endforeach()
+   endforeach()
+   
+You'll notice that the inner foreach loop's list is doubly dereferenced. This is
+because the first dereference turns ``list_name`` into the name of the sub-list
+(a, b, or c in the example), then the second dereference is to get the value of
+the list.
+
+This pattern is used throughout CMake, the most common example is the compiler
+flags options, which CMake refers to using the following variable expansions:
+CMAKE_${LANGUAGE}_FLAGS and CMAKE_${LANGUAGE}_FLAGS_${CMAKE_BUILD_TYPE}.
+
+Other Types
+-----------
+
+Variables that are cached or specified on the command line can have types
+associated with them. The variable's type is used by CMake's UI tool to display
+the right input field. The variable's type generally doesn't impact evaluation.
+One of the few examples is PATH variables, which CMake does have some special
+handling for. You can read more about the special handling in `CMake's set
+documentation
+<https://cmake.org/cmake/help/v3.5/command/set.html#set-cache-entry>`_.
+
+Scope
+-----
+
+CMake inherently has a directory-based scoping. Setting a variable in a
+CMakeLists file, will set the variable for that file, and all subdirectories.
+Variables set in a CMake module that is included in a CMakeLists file will be
+set in the scope they are included from, and all subdirectories.
+
+When a variable that is already set is set again in a subdirectory it overrides
+the value in that scope and any deeper subdirectories.
+
+The CMake set command provides two scope-related options. PARENT_SCOPE sets a
+variable into the parent scope, and not the current scope. The CACHE option sets
+the variable in the CMakeCache, which results in it being set in all scopes. The
+CACHE option will not set a variable that already exists in the CACHE unless the
+FORCE option is specified.
+
+In addition to directory-based scope, CMake functions also have their own scope.
+This means variables set inside functions do not bleed into the parent scope.
+This is not true of macros, and it is for this reason LLVM prefers functions
+over macros whenever reasonable.
+
+.. note::
+  Unlike C-based languages, CMake's loop and control flow blocks do not have
+  their own scopes.
+
+Control Flow
+============
+
+CMake features the same basic control flow constructs you would expect in any
+scripting language, but there are a few quarks because, as with everything in
+CMake, control flow constructs are commands.
+
+If, ElseIf, Else
+----------------
+
+.. note::
+  For the full documentation on the CMake if command go
+  `here <https://cmake.org/cmake/help/v3.4/command/if.html>`_. That resource is
+  far more complete.
+
+In general CMake if blocks work the way you'd expect:
+
+.. code-block:: cmake
+
+  if(<condition>)
+    message("do stuff")
+  elseif(<condition>)
+    message("do other stuff")
+  else()
+    message("do other other stuff")
+  endif()
+
+The single most important thing to know about CMake's if blocks coming from a C
+background is that they do not have their own scope. Variables set inside
+conditional blocks persist after the ``endif()``.
+
+Loops
+-----
+
+The most common form of the CMake ``foreach`` block is:
+
+.. code-block:: cmake
+
+  foreach(var ...)
+    message("do stuff")
+  endforeach()
+
+The variable argument portion of the ``foreach`` block can contain dereferenced
+lists, values to iterate, or a mix of both:
+
+.. code-block:: cmake
+
+  foreach(var foo bar baz)
+    message(${var})
+  endforeach()
+  # prints:
+  #  foo
+  #  bar
+  #  baz
+
+  set(my_list 1 2 3)
+  foreach(var ${my_list})
+    message(${var})
+  endforeach()
+  # prints:
+  #  1
+  #  2
+  #  3
+
+  foreach(var ${my_list} out_of_bounds)
+    message(${var})
+  endforeach()
+  # prints:
+  #  1
+  #  2
+  #  3
+  #  out_of_bounds
+
+There is also a more modern CMake foreach syntax. The code below is equivalent
+to the code above:
+
+.. code-block:: cmake
+
+  foreach(var IN ITEMS foo bar baz)
+    message(${var})
+  endforeach()
+  # prints:
+  #  foo
+  #  bar
+  #  baz
+
+  set(my_list 1 2 3)
+  foreach(var IN LISTS my_list)
+    message(${var})
+  endforeach()
+  # prints:
+  #  1
+  #  2
+  #  3
+
+  foreach(var IN LISTS my_list ITEMS out_of_bounds)
+    message(${var})
+  endforeach()
+  # prints:
+  #  1
+  #  2
+  #  3
+  #  out_of_bounds
+
+Similar to the conditional statements, these generally behave how you would
+expect, and they do not have their own scope.
+
+CMake also supports ``while`` loops, although they are not widely used in LLVM.
+
+Modules, Functions and Macros
+=============================
+
+Modules
+-------
+
+Modules are CMake's vehicle for enabling code reuse. CMake modules are just
+CMake script files. They can contain code to execute on include as well as
+definitions for commands.
+
+In CMake macros and functions are universally referred to as commands, and they
+are the primary method of defining code that can be called multiple times.
+
+In LLVM we have several CMake modules that are included as part of our
+distribution for developers who don't build our project from source. Those
+modules are the fundamental pieces needed to build LLVM-based projects with
+CMake. We also rely on modules as a way of organizing the build system's
+functionality for maintainability and re-use within LLVM projects.
+
+Argument Handling
+-----------------
+
+When defining a CMake command handling arguments is very useful. The examples
+in this section will all use the CMake ``function`` block, but this all applies
+to the ``macro`` block as well.
+
+CMake commands can have named arguments, but all commands are implicitly
+variable argument. If the command has named arguments they are required and must
+be specified at every call site. Below is a trivial example of providing a
+wrapper function for CMake's built in function ``add_dependencies``.
+
+.. code-block:: cmake
+
+   function(add_deps target)
+     add_dependencies(${target} ${ARGV})
+   endfunction()
+
+This example defines a new macro named ``add_deps`` which takes a required first
+argument, and just calls another function passing through the first argument and
+all trailing arguments. When variable arguments are present CMake defines them
+in a list named ``ARGV``, and the count of the arguments is defined in ``ARGN``.
+
+CMake provides a module ``CMakeParseArguments`` which provides an implementation
+of advanced argument parsing. We use this all over LLVM, and it is recommended
+for any function that has complex argument-based behaviors or optional
+arguments. CMake's official documentation for the module is in the
+``cmake-modules`` manpage, and is also available at the
+`cmake-modules online documentation
+<https://cmake.org/cmake/help/v3.4/module/CMakeParseArguments.html>`_.
+
+.. note::
+  As of CMake 3.5 the cmake_parse_arguments command has become a native command
+  and the CMakeParseArguments module is empty and only left around for
+  compatibility.
+
+Functions Vs Macros
+-------------------
+
+Functions and Macros look very similar in how they are used, but there is one
+fundamental difference between the two. Functions have their own scope, and
+macros don't. This means variables set in macros will bleed out into the calling
+scope. That makes macros suitable for defining very small bits of functionality
+only.
+
+The other difference between CMake functions and macros is how arguments are
+passed. Arguments to macros are not set as variables, instead dereferences to
+the parameters are resolved across the macro before executing it. This can
+result in some unexpected behavior if using unreferenced variables. For example:
+
+.. code-block:: cmake
+
+   macro(print_list my_list)
+     foreach(var IN LISTS my_list)
+       message("${var}")
+     endforeach()
+   endmacro()
+   
+   set(my_list a b c d)
+   set(my_list_of_numbers 1 2 3 4)
+   print_list(my_list_of_numbers)
+   # prints:
+   # a
+   # b
+   # c
+   # d
+
+Generally speaking this issue is uncommon because it requires using
+non-dereferenced variables with names that overlap in the parent scope, but it
+is important to be aware of because it can lead to subtle bugs.
+
+LLVM Project Wrappers
+=====================
+
+LLVM projects provide lots of wrappers around critical CMake built-in commands.
+We use these wrappers to provide consistent behaviors across LLVM components
+and to reduce code duplication.
+
+We generally (but not always) follow the convention that commands prefaced with
+``llvm_`` are intended to be used only as building blocks for other commands.
+Wrapper commands that are intended for direct use are generally named following
+with the project in the middle of the command name (i.e. ``add_llvm_executable``
+is the wrapper for ``add_executable``). The LLVM ``add_*`` wrapper functions are
+all defined in ``AddLLVM.cmake`` which is installed as part of the LLVM
+distribution. It can be included and used by any LLVM sub-project that requires
+LLVM.
+
+.. note::
+
+   Not all LLVM projects require LLVM for all use cases. For example compiler-rt
+   can be built without LLVM, and the compiler-rt sanitizer libraries are used
+   with GCC.
+
+Useful Built-in Commands
+========================
+
+CMake has a bunch of useful built-in commands. This document isn't going to
+go into details about them because The CMake project has excellent
+documentation. To highlight a few useful functions see:
+
+* `add_custom_command <https://cmake.org/cmake/help/v3.4/command/add_custom_command.html>`_
+* `add_custom_target <https://cmake.org/cmake/help/v3.4/command/add_custom_target.html>`_
+* `file <https://cmake.org/cmake/help/v3.4/command/file.html>`_
+* `list <https://cmake.org/cmake/help/v3.4/command/list.html>`_
+* `math <https://cmake.org/cmake/help/v3.4/command/math.html>`_
+* `string <https://cmake.org/cmake/help/v3.4/command/string.html>`_
+
+The full documentation for CMake commands is in the ``cmake-commands`` manpage
+and available on `CMake's website <https://cmake.org/cmake/help/v3.4/manual/cmake-commands.7.html>`_