[www-releases] r321287 - Add 5.0.1 docs

Tom Stellard via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 21 10:09:58 PST 2017


Added: www-releases/trunk/5.0.1/docs/TestingGuide.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/TestingGuide.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/TestingGuide.html (added)
+++ www-releases/trunk/5.0.1/docs/TestingGuide.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,664 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>LLVM Testing Infrastructure Guide — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="LLVM test-suite Guide" href="TestSuiteMakefileGuide.html" />
+    <link rel="prev" title="Code Reviews with Phabricator" href="Phabricator.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="TestSuiteMakefileGuide.html" title="LLVM test-suite Guide"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="Phabricator.html" title="Code Reviews with Phabricator"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="llvm-testing-infrastructure-guide">
+<h1>LLVM Testing Infrastructure Guide<a class="headerlink" href="#llvm-testing-infrastructure-guide" title="Permalink to this headline">¶</a></h1>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#overview" id="id5">Overview</a></li>
+<li><a class="reference internal" href="#requirements" id="id6">Requirements</a></li>
+<li><a class="reference internal" href="#llvm-testing-infrastructure-organization" id="id7">LLVM testing infrastructure organization</a><ul>
+<li><a class="reference internal" href="#regression-tests" id="id8">Regression tests</a></li>
+<li><a class="reference internal" href="#test-suite" id="id9"><code class="docutils literal"><span class="pre">test-suite</span></code></a></li>
+<li><a class="reference internal" href="#debugging-information-tests" id="id10">Debugging Information tests</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#quick-start" id="id11">Quick start</a><ul>
+<li><a class="reference internal" href="#id1" id="id12">Regression tests</a></li>
+<li><a class="reference internal" href="#id2" id="id13">Debugging Information tests</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#regression-test-structure" id="id14">Regression test structure</a><ul>
+<li><a class="reference internal" href="#writing-new-regression-tests" id="id15">Writing new regression tests</a></li>
+<li><a class="reference internal" href="#extra-files" id="id16">Extra files</a></li>
+<li><a class="reference internal" href="#fragile-tests" id="id17">Fragile tests</a></li>
+<li><a class="reference internal" href="#platform-specific-tests" id="id18">Platform-Specific Tests</a></li>
+<li><a class="reference internal" href="#constraining-test-execution" id="id19">Constraining test execution</a></li>
+<li><a class="reference internal" href="#substitutions" id="id20">Substitutions</a></li>
+<li><a class="reference internal" href="#options" id="id21">Options</a></li>
+<li><a class="reference internal" href="#other-features" id="id22">Other Features</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#test-suite-overview" id="id23"><code class="docutils literal"><span class="pre">test-suite</span></code> Overview</a><ul>
+<li><a class="reference internal" href="#test-suite-quickstart" id="id24"><code class="docutils literal"><span class="pre">test-suite</span></code> Quickstart</a></li>
+<li><a class="reference internal" href="#test-suite-makefiles" id="id25"><code class="docutils literal"><span class="pre">test-suite</span></code> Makefiles</a></li>
+</ul>
+</li>
+</ul>
+</div>
+<div class="toctree-wrapper compound">
+</div>
+<div class="section" id="overview">
+<h2><a class="toc-backref" href="#id5">Overview</a><a class="headerlink" href="#overview" title="Permalink to this headline">¶</a></h2>
+<p>This document is the reference manual for the LLVM testing
+infrastructure. It documents the structure of the LLVM testing
+infrastructure, the tools needed to use it, and how to add and run
+tests.</p>
+</div>
+<div class="section" id="requirements">
+<h2><a class="toc-backref" href="#id6">Requirements</a><a class="headerlink" href="#requirements" title="Permalink to this headline">¶</a></h2>
+<p>In order to use the LLVM testing infrastructure, you will need all of the
+software required to build LLVM, as well as <a class="reference external" href="http://python.org">Python</a> 2.7 or
+later.</p>
+<p>If you intend to run the <a class="reference internal" href="#test-suite-overview"><span class="std std-ref">test-suite</span></a>, you will also
+need a development version of zlib (zlib1g-dev is known to work on several Linux
+distributions).</p>
+</div>
+<div class="section" id="llvm-testing-infrastructure-organization">
+<h2><a class="toc-backref" href="#id7">LLVM testing infrastructure organization</a><a class="headerlink" href="#llvm-testing-infrastructure-organization" title="Permalink to this headline">¶</a></h2>
+<p>The LLVM testing infrastructure contains two major categories of tests:
+regression tests and whole programs. The regression tests are contained
+inside the LLVM repository itself under <code class="docutils literal"><span class="pre">llvm/test</span></code> and are expected
+to always pass – they should be run before every commit.</p>
+<p>The whole programs tests are referred to as the “LLVM test suite” (or
+“test-suite”) and are in the <code class="docutils literal"><span class="pre">test-suite</span></code> module in subversion. For
+historical reasons, these tests are also referred to as the “nightly
+tests” in places, which is less ambiguous than “test-suite” and remains
+in use although we run them much more often than nightly.</p>
+<div class="section" id="regression-tests">
+<h3><a class="toc-backref" href="#id8">Regression tests</a><a class="headerlink" href="#regression-tests" title="Permalink to this headline">¶</a></h3>
+<p>The regression tests are small pieces of code that test a specific
+feature of LLVM or trigger a specific bug in LLVM. The language they are
+written in depends on the part of LLVM being tested. These tests are driven by
+the <a class="reference internal" href="CommandGuide/lit.html"><span class="doc">Lit</span></a> testing tool (which is part of LLVM), and
+are located in the <code class="docutils literal"><span class="pre">llvm/test</span></code> directory.</p>
+<p>Typically when a bug is found in LLVM, a regression test containing just
+enough code to reproduce the problem should be written and placed
+somewhere underneath this directory. For example, it can be a small
+piece of LLVM IR distilled from an actual application or benchmark.</p>
+</div>
+<div class="section" id="test-suite">
+<h3><a class="toc-backref" href="#id9"><code class="docutils literal"><span class="pre">test-suite</span></code></a><a class="headerlink" href="#test-suite" title="Permalink to this headline">¶</a></h3>
+<p>The test suite contains whole programs, which are pieces of code which
+can be compiled and linked into a stand-alone program that can be
+executed. These programs are generally written in high level languages
+such as C or C++.</p>
+<p>These programs are compiled using a user specified compiler and set of
+flags, and then executed to capture the program output and timing
+information. The output of these programs is compared to a reference
+output to ensure that the program is being compiled correctly.</p>
+<p>In addition to compiling and executing programs, whole program tests
+serve as a way of benchmarking LLVM performance, both in terms of the
+efficiency of the programs generated as well as the speed with which
+LLVM compiles, optimizes, and generates code.</p>
+<p>The test-suite is located in the <code class="docutils literal"><span class="pre">test-suite</span></code> Subversion module.</p>
+</div>
+<div class="section" id="debugging-information-tests">
+<h3><a class="toc-backref" href="#id10">Debugging Information tests</a><a class="headerlink" href="#debugging-information-tests" title="Permalink to this headline">¶</a></h3>
+<p>The test suite contains tests to check quality of debugging information.
+The test are written in C based languages or in LLVM assembly language.</p>
+<p>These tests are compiled and run under a debugger. The debugger output
+is checked to validate of debugging information. See README.txt in the
+test suite for more information . This test suite is located in the
+<code class="docutils literal"><span class="pre">debuginfo-tests</span></code> Subversion module.</p>
+</div>
+</div>
+<div class="section" id="quick-start">
+<h2><a class="toc-backref" href="#id11">Quick start</a><a class="headerlink" href="#quick-start" title="Permalink to this headline">¶</a></h2>
+<p>The tests are located in two separate Subversion modules. The
+regressions tests are in the main “llvm” module under the directory
+<code class="docutils literal"><span class="pre">llvm/test</span></code> (so you get these tests for free with the main LLVM tree).
+Use <code class="docutils literal"><span class="pre">make</span> <span class="pre">check-all</span></code> to run the regression tests after building LLVM.</p>
+<p>The more comprehensive test suite that includes whole programs in C and C++
+is in the <code class="docutils literal"><span class="pre">test-suite</span></code> module. See <a class="reference internal" href="#test-suite-quickstart"><span class="std std-ref">test-suite Quickstart</span></a> for more information on running these tests.</p>
+<div class="section" id="id1">
+<h3><a class="toc-backref" href="#id12">Regression tests</a><a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h3>
+<p>To run all of the LLVM regression tests use the check-llvm target:</p>
+<div class="highlight-bash"><div class="highlight"><pre><span></span>% make check-llvm
+</pre></div>
+</div>
+<p>If you have <a class="reference external" href="http://clang.llvm.org/">Clang</a> checked out and built, you
+can run the LLVM and Clang tests simultaneously using:</p>
+<div class="highlight-bash"><div class="highlight"><pre><span></span>% make check-all
+</pre></div>
+</div>
+<p>To run the tests with Valgrind (Memcheck by default), use the <code class="docutils literal"><span class="pre">LIT_ARGS</span></code> make
+variable to pass the required options to lit. For example, you can use:</p>
+<div class="highlight-bash"><div class="highlight"><pre><span></span>% make check <span class="nv">LIT_ARGS</span><span class="o">=</span><span class="s2">"-v --vg --vg-leak"</span>
+</pre></div>
+</div>
+<p>to enable testing with valgrind and with leak checking enabled.</p>
+<p>To run individual tests or subsets of tests, you can use the <code class="docutils literal"><span class="pre">llvm-lit</span></code>
+script which is built as part of LLVM. For example, to run the
+<code class="docutils literal"><span class="pre">Integer/BitPacked.ll</span></code> test by itself you can run:</p>
+<div class="highlight-bash"><div class="highlight"><pre><span></span>% llvm-lit ~/llvm/test/Integer/BitPacked.ll
+</pre></div>
+</div>
+<p>or to run all of the ARM CodeGen tests:</p>
+<div class="highlight-bash"><div class="highlight"><pre><span></span>% llvm-lit ~/llvm/test/CodeGen/ARM
+</pre></div>
+</div>
+<p>For more information on using the <strong class="program">lit</strong> tool, see <code class="docutils literal"><span class="pre">llvm-lit</span> <span class="pre">--help</span></code>
+or the <a class="reference internal" href="CommandGuide/lit.html"><span class="doc">lit man page</span></a>.</p>
+</div>
+<div class="section" id="id2">
+<h3><a class="toc-backref" href="#id13">Debugging Information tests</a><a class="headerlink" href="#id2" title="Permalink to this headline">¶</a></h3>
+<p>To run debugging information tests simply checkout the tests inside
+clang/test directory.</p>
+<div class="highlight-bash"><div class="highlight"><pre><span></span>% <span class="nb">cd</span> clang/test
+% svn co http://llvm.org/svn/llvm-project/debuginfo-tests/trunk debuginfo-tests
+</pre></div>
+</div>
+<p>These tests are already set up to run as part of clang regression tests.</p>
+</div>
+</div>
+<div class="section" id="regression-test-structure">
+<h2><a class="toc-backref" href="#id14">Regression test structure</a><a class="headerlink" href="#regression-test-structure" title="Permalink to this headline">¶</a></h2>
+<p>The LLVM regression tests are driven by <strong class="program">lit</strong> and are located in the
+<code class="docutils literal"><span class="pre">llvm/test</span></code> directory.</p>
+<p>This directory contains a large array of small tests that exercise
+various features of LLVM and to ensure that regressions do not occur.
+The directory is broken into several sub-directories, each focused on a
+particular area of LLVM.</p>
+<div class="section" id="writing-new-regression-tests">
+<h3><a class="toc-backref" href="#id15">Writing new regression tests</a><a class="headerlink" href="#writing-new-regression-tests" title="Permalink to this headline">¶</a></h3>
+<p>The regression test structure is very simple, but does require some
+information to be set. This information is gathered via <code class="docutils literal"><span class="pre">configure</span></code>
+and is written to a file, <code class="docutils literal"><span class="pre">test/lit.site.cfg</span></code> in the build directory.
+The <code class="docutils literal"><span class="pre">llvm/test</span></code> Makefile does this work for you.</p>
+<p>In order for the regression tests to work, each directory of tests must
+have a <code class="docutils literal"><span class="pre">lit.local.cfg</span></code> file. <strong class="program">lit</strong> looks for this file to determine
+how to run the tests. This file is just Python code and thus is very
+flexible, but we’ve standardized it for the LLVM regression tests. If
+you’re adding a directory of tests, just copy <code class="docutils literal"><span class="pre">lit.local.cfg</span></code> from
+another directory to get running. The standard <code class="docutils literal"><span class="pre">lit.local.cfg</span></code> simply
+specifies which files to look in for tests. Any directory that contains
+only directories does not need the <code class="docutils literal"><span class="pre">lit.local.cfg</span></code> file. Read the <a class="reference internal" href="CommandGuide/lit.html"><span class="doc">Lit
+documentation</span></a> for more information.</p>
+<p>Each test file must contain lines starting with “RUN:” that tell <strong class="program">lit</strong>
+how to run it. If there are no RUN lines, <strong class="program">lit</strong> will issue an error
+while running a test.</p>
+<p>RUN lines are specified in the comments of the test program using the
+keyword <code class="docutils literal"><span class="pre">RUN</span></code> followed by a colon, and lastly the command (pipeline)
+to execute. Together, these lines form the “script” that <strong class="program">lit</strong>
+executes to run the test case. The syntax of the RUN lines is similar to a
+shell’s syntax for pipelines including I/O redirection and variable
+substitution. However, even though these lines may <em>look</em> like a shell
+script, they are not. RUN lines are interpreted by <strong class="program">lit</strong>.
+Consequently, the syntax differs from shell in a few ways. You can specify
+as many RUN lines as needed.</p>
+<p><strong class="program">lit</strong> performs substitution on each RUN line to replace LLVM tool names
+with the full paths to the executable built for each tool (in
+<code class="docutils literal"><span class="pre">$(LLVM_OBJ_ROOT)/$(BuildMode)/bin)</span></code>. This ensures that <strong class="program">lit</strong> does
+not invoke any stray LLVM tools in the user’s path during testing.</p>
+<p>Each RUN line is executed on its own, distinct from other lines unless
+its last character is <code class="docutils literal"><span class="pre">\</span></code>. This continuation character causes the RUN
+line to be concatenated with the next one. In this way you can build up
+long pipelines of commands without making huge line lengths. The lines
+ending in <code class="docutils literal"><span class="pre">\</span></code> are concatenated until a RUN line that doesn’t end in
+<code class="docutils literal"><span class="pre">\</span></code> is found. This concatenated set of RUN lines then constitutes one
+execution. <strong class="program">lit</strong> will substitute variables and arrange for the pipeline
+to be executed. If any process in the pipeline fails, the entire line (and
+test case) fails too.</p>
+<p>Below is an example of legal RUN lines in a <code class="docutils literal"><span class="pre">.ll</span></code> file:</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">; RUN: llvm-as < %s | llvm-dis > %t1</span>
+<span class="c">; RUN: llvm-dis < %s.bc-13 > %t2</span>
+<span class="c">; RUN: diff %t1 %t2</span>
+</pre></div>
+</div>
+<p>As with a Unix shell, the RUN lines permit pipelines and I/O
+redirection to be used.</p>
+<p>There are some quoting rules that you must pay attention to when writing
+your RUN lines. In general nothing needs to be quoted. <strong class="program">lit</strong> won’t
+strip off any quote characters so they will get passed to the invoked program.
+To avoid this use curly braces to tell <strong class="program">lit</strong> that it should treat
+everything enclosed as one value.</p>
+<p>In general, you should strive to keep your RUN lines as simple as possible,
+using them only to run tools that generate textual output you can then examine.
+The recommended way to examine output to figure out if the test passes is using
+the <a class="reference internal" href="CommandGuide/FileCheck.html"><span class="doc">FileCheck tool</span></a>. <em>[The usage of grep in RUN
+lines is deprecated - please do not send or commit patches that use it.]</em></p>
+<p>Put related tests into a single file rather than having a separate file per
+test. Check if there are files already covering your feature and consider
+adding your code there instead of creating a new file.</p>
+</div>
+<div class="section" id="extra-files">
+<h3><a class="toc-backref" href="#id16">Extra files</a><a class="headerlink" href="#extra-files" title="Permalink to this headline">¶</a></h3>
+<p>If your test requires extra files besides the file containing the <code class="docutils literal"><span class="pre">RUN:</span></code>
+lines, the idiomatic place to put them is in a subdirectory <code class="docutils literal"><span class="pre">Inputs</span></code>.
+You can then refer to the extra files as <code class="docutils literal"><span class="pre">%S/Inputs/foo.bar</span></code>.</p>
+<p>For example, consider <code class="docutils literal"><span class="pre">test/Linker/ident.ll</span></code>. The directory structure is
+as follows:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">test</span><span class="o">/</span>
+  <span class="n">Linker</span><span class="o">/</span>
+    <span class="n">ident</span><span class="o">.</span><span class="n">ll</span>
+    <span class="n">Inputs</span><span class="o">/</span>
+      <span class="n">ident</span><span class="o">.</span><span class="n">a</span><span class="o">.</span><span class="n">ll</span>
+      <span class="n">ident</span><span class="o">.</span><span class="n">b</span><span class="o">.</span><span class="n">ll</span>
+</pre></div>
+</div>
+<p>For convenience, these are the contents:</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">;;;;; ident.ll:</span>
+
+<span class="c">; RUN: llvm-link %S/Inputs/ident.a.ll %S/Inputs/ident.b.ll -S | FileCheck %s</span>
+
+<span class="c">; Verify that multiple input llvm.ident metadata are linked together.</span>
+
+<span class="c">; CHECK-DAG: !llvm.ident = !{!0, !1, !2}</span>
+<span class="c">; CHECK-DAG: "Compiler V1"</span>
+<span class="c">; CHECK-DAG: "Compiler V2"</span>
+<span class="c">; CHECK-DAG: "Compiler V3"</span>
+
+<span class="c">;;;;; Inputs/ident.a.ll:</span>
+
+<span class="nv">!llvm.ident</span> <span class="p">=</span> <span class="p">!{</span><span class="nv nv-Anonymous">!0</span><span class="p">,</span> <span class="nv nv-Anonymous">!1</span><span class="p">}</span>
+<span class="nv nv-Anonymous">!0</span> <span class="p">=</span> <span class="kt">metadata</span> <span class="p">!{</span><span class="kt">metadata</span> <span class="nv">!"Compiler V1"</span><span class="p">}</span>
+<span class="nv nv-Anonymous">!1</span> <span class="p">=</span> <span class="kt">metadata</span> <span class="p">!{</span><span class="kt">metadata</span> <span class="nv">!"Compiler V2"</span><span class="p">}</span>
+
+<span class="c">;;;;; Inputs/ident.b.ll:</span>
+
+<span class="nv">!llvm.ident</span> <span class="p">=</span> <span class="p">!{</span><span class="nv nv-Anonymous">!0</span><span class="p">}</span>
+<span class="nv nv-Anonymous">!0</span> <span class="p">=</span> <span class="kt">metadata</span> <span class="p">!{</span><span class="kt">metadata</span> <span class="nv">!"Compiler V3"</span><span class="p">}</span>
+</pre></div>
+</div>
+<p>For symmetry reasons, <code class="docutils literal"><span class="pre">ident.ll</span></code> is just a dummy file that doesn’t
+actually participate in the test besides holding the <code class="docutils literal"><span class="pre">RUN:</span></code> lines.</p>
+<div class="admonition note">
+<p class="first admonition-title">Note</p>
+<p class="last">Some existing tests use <code class="docutils literal"><span class="pre">RUN:</span> <span class="pre">true</span></code> in extra files instead of just
+putting the extra files in an <code class="docutils literal"><span class="pre">Inputs/</span></code> directory. This pattern is
+deprecated.</p>
+</div>
+</div>
+<div class="section" id="fragile-tests">
+<h3><a class="toc-backref" href="#id17">Fragile tests</a><a class="headerlink" href="#fragile-tests" title="Permalink to this headline">¶</a></h3>
+<p>It is easy to write a fragile test that would fail spuriously if the tool being
+tested outputs a full path to the input file.  For example, <strong class="program">opt</strong> by
+default outputs a <code class="docutils literal"><span class="pre">ModuleID</span></code>:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> cat example.ll
+<span class="go">define i32 @main() nounwind {</span>
+<span class="go">    ret i32 0</span>
+<span class="go">}</span>
+
+<span class="gp">$</span> opt -S /path/to/example.ll
+<span class="go">; ModuleID = '/path/to/example.ll'</span>
+
+<span class="go">define i32 @main() nounwind {</span>
+<span class="go">    ret i32 0</span>
+<span class="go">}</span>
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">ModuleID</span></code> can unexpectedly match against <code class="docutils literal"><span class="pre">CHECK</span></code> lines.  For example:</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">; RUN: opt -S %s | FileCheck</span>
+
+<span class="k">define</span> <span class="k">i32</span> <span class="vg">@main</span><span class="p">()</span> <span class="k">nounwind</span> <span class="p">{</span>
+    <span class="c">; CHECK-NOT: load</span>
+    <span class="k">ret</span> <span class="k">i32</span> <span class="m">0</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>This test will fail if placed into a <code class="docutils literal"><span class="pre">download</span></code> directory.</p>
+<p>To make your tests robust, always use <code class="docutils literal"><span class="pre">opt</span> <span class="pre">...</span> <span class="pre"><</span> <span class="pre">%s</span></code> in the RUN line.
+<strong class="program">opt</strong> does not output a <code class="docutils literal"><span class="pre">ModuleID</span></code> when input comes from stdin.</p>
+</div>
+<div class="section" id="platform-specific-tests">
+<h3><a class="toc-backref" href="#id18">Platform-Specific Tests</a><a class="headerlink" href="#platform-specific-tests" title="Permalink to this headline">¶</a></h3>
+<p>Whenever adding tests that require the knowledge of a specific platform,
+either related to code generated, specific output or back-end features,
+you must make sure to isolate the features, so that buildbots that
+run on different architectures (and don’t even compile all back-ends),
+don’t fail.</p>
+<p>The first problem is to check for target-specific output, for example sizes
+of structures, paths and architecture names, for example:</p>
+<ul class="simple">
+<li>Tests containing Windows paths will fail on Linux and vice-versa.</li>
+<li>Tests that check for <code class="docutils literal"><span class="pre">x86_64</span></code> somewhere in the text will fail anywhere else.</li>
+<li>Tests where the debug information calculates the size of types and structures.</li>
+</ul>
+<p>Also, if the test rely on any behaviour that is coded in any back-end, it must
+go in its own directory. So, for instance, code generator tests for ARM go
+into <code class="docutils literal"><span class="pre">test/CodeGen/ARM</span></code> and so on. Those directories contain a special
+<code class="docutils literal"><span class="pre">lit</span></code> configuration file that ensure all tests in that directory will
+only run if a specific back-end is compiled and available.</p>
+<p>For instance, on <code class="docutils literal"><span class="pre">test/CodeGen/ARM</span></code>, the <code class="docutils literal"><span class="pre">lit.local.cfg</span></code> is:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">config</span><span class="o">.</span><span class="n">suffixes</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'.ll'</span><span class="p">,</span> <span class="s1">'.c'</span><span class="p">,</span> <span class="s1">'.cpp'</span><span class="p">,</span> <span class="s1">'.test'</span><span class="p">]</span>
+<span class="k">if</span> <span class="ow">not</span> <span class="s1">'ARM'</span> <span class="ow">in</span> <span class="n">config</span><span class="o">.</span><span class="n">root</span><span class="o">.</span><span class="n">targets</span><span class="p">:</span>
+  <span class="n">config</span><span class="o">.</span><span class="n">unsupported</span> <span class="o">=</span> <span class="bp">True</span>
+</pre></div>
+</div>
+<p>Other platform-specific tests are those that depend on a specific feature
+of a specific sub-architecture, for example only to Intel chips that support <code class="docutils literal"><span class="pre">AVX2</span></code>.</p>
+<p>For instance, <code class="docutils literal"><span class="pre">test/CodeGen/X86/psubus.ll</span></code> tests three sub-architecture
+variants:</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">; RUN: llc -mcpu=core2 < %s | FileCheck %s -check-prefix=SSE2</span>
+<span class="c">; RUN: llc -mcpu=corei7-avx < %s | FileCheck %s -check-prefix=AVX1</span>
+<span class="c">; RUN: llc -mcpu=core-avx2 < %s | FileCheck %s -check-prefix=AVX2</span>
+</pre></div>
+</div>
+<p>And the checks are different:</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">; SSE2: @test1</span>
+<span class="c">; SSE2: psubusw LCPI0_0(%rip), %xmm0</span>
+<span class="c">; AVX1: @test1</span>
+<span class="c">; AVX1: vpsubusw LCPI0_0(%rip), %xmm0, %xmm0</span>
+<span class="c">; AVX2: @test1</span>
+<span class="c">; AVX2: vpsubusw LCPI0_0(%rip), %xmm0, %xmm0</span>
+</pre></div>
+</div>
+<p>So, if you’re testing for a behaviour that you know is platform-specific or
+depends on special features of sub-architectures, you must add the specific
+triple, test with the specific FileCheck and put it into the specific
+directory that will filter out all other architectures.</p>
+</div>
+<div class="section" id="constraining-test-execution">
+<h3><a class="toc-backref" href="#id19">Constraining test execution</a><a class="headerlink" href="#constraining-test-execution" title="Permalink to this headline">¶</a></h3>
+<p>Some tests can be run only in specific configurations, such as
+with debug builds or on particular platforms. Use <code class="docutils literal"><span class="pre">REQUIRES</span></code>
+and <code class="docutils literal"><span class="pre">UNSUPPORTED</span></code> to control when the test is enabled.</p>
+<p>Some tests are expected to fail. For example, there may be a known bug
+that the test detect. Use <code class="docutils literal"><span class="pre">XFAIL</span></code> to mark a test as an expected failure.
+An <code class="docutils literal"><span class="pre">XFAIL</span></code> test will be successful if its execution fails, and
+will be a failure if its execution succeeds.</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">; This test will be only enabled in the build with asserts.</span>
+<span class="c">; REQUIRES: asserts</span>
+<span class="c">; This test is disabled on Linux.</span>
+<span class="c">; UNSUPPORTED: -linux-</span>
+<span class="c">; This test is expected to fail on PowerPC.</span>
+<span class="c">; XFAIL: powerpc</span>
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">REQUIRES</span></code> and <code class="docutils literal"><span class="pre">UNSUPPORTED</span></code> and <code class="docutils literal"><span class="pre">XFAIL</span></code> all accept a comma-separated
+list of boolean expressions. The values in each expression may be:</p>
+<ul class="simple">
+<li>Features added to <code class="docutils literal"><span class="pre">config.available_features</span></code> by
+configuration files such as <code class="docutils literal"><span class="pre">lit.cfg</span></code>.</li>
+<li>Substrings of the target triple (<code class="docutils literal"><span class="pre">UNSUPPORTED</span></code> and <code class="docutils literal"><span class="pre">XFAIL</span></code> only).</li>
+</ul>
+<div class="line-block">
+<div class="line"><code class="docutils literal"><span class="pre">REQUIRES</span></code> enables the test if all expressions are true.</div>
+<div class="line"><code class="docutils literal"><span class="pre">UNSUPPORTED</span></code> disables the test if any expression is true.</div>
+<div class="line"><code class="docutils literal"><span class="pre">XFAIL</span></code> expects the test to fail if any expression is true.</div>
+</div>
+<p>As a special case, <code class="docutils literal"><span class="pre">XFAIL:</span> <span class="pre">*</span></code> is expected to fail everywhere.</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="c">; This test is disabled on Windows,</span>
+<span class="c">; and is disabled on Linux, except for Android Linux.</span>
+<span class="c">; UNSUPPORTED: windows, linux && !android</span>
+<span class="c">; This test is expected to fail on both PowerPC and ARM.</span>
+<span class="c">; XFAIL: powerpc || arm</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="substitutions">
+<h3><a class="toc-backref" href="#id20">Substitutions</a><a class="headerlink" href="#substitutions" title="Permalink to this headline">¶</a></h3>
+<p>Besides replacing LLVM tool names the following substitutions are performed in
+RUN lines:</p>
+<dl class="docutils">
+<dt><code class="docutils literal"><span class="pre">%%</span></code></dt>
+<dd>Replaced by a single <code class="docutils literal"><span class="pre">%</span></code>. This allows escaping other substitutions.</dd>
+<dt><code class="docutils literal"><span class="pre">%s</span></code></dt>
+<dd><p class="first">File path to the test case’s source. This is suitable for passing on the
+command line as the input to an LLVM tool.</p>
+<p class="last">Example: <code class="docutils literal"><span class="pre">/home/user/llvm/test/MC/ELF/foo_test.s</span></code></p>
+</dd>
+<dt><code class="docutils literal"><span class="pre">%S</span></code></dt>
+<dd><p class="first">Directory path to the test case’s source.</p>
+<p class="last">Example: <code class="docutils literal"><span class="pre">/home/user/llvm/test/MC/ELF</span></code></p>
+</dd>
+<dt><code class="docutils literal"><span class="pre">%t</span></code></dt>
+<dd><p class="first">File path to a temporary file name that could be used for this test case.
+The file name won’t conflict with other test cases. You can append to it
+if you need multiple temporaries. This is useful as the destination of
+some redirected output.</p>
+<p class="last">Example: <code class="docutils literal"><span class="pre">/home/user/llvm.build/test/MC/ELF/Output/foo_test.s.tmp</span></code></p>
+</dd>
+<dt><code class="docutils literal"><span class="pre">%T</span></code></dt>
+<dd><p class="first">Directory of <code class="docutils literal"><span class="pre">%t</span></code>.</p>
+<p class="last">Example: <code class="docutils literal"><span class="pre">/home/user/llvm.build/test/MC/ELF/Output</span></code></p>
+</dd>
+</dl>
+<p><code class="docutils literal"><span class="pre">%{pathsep}</span></code></p>
+<blockquote>
+<div>Expands to the path separator, i.e. <code class="docutils literal"><span class="pre">:</span></code> (or <code class="docutils literal"><span class="pre">;</span></code> on Windows).</div></blockquote>
+<p><code class="docutils literal"><span class="pre">%/s,</span> <span class="pre">%/S,</span> <span class="pre">%/t,</span> <span class="pre">%/T:</span></code></p>
+<blockquote>
+<div><p>Act like the corresponding substitution above but replace any <code class="docutils literal"><span class="pre">\</span></code>
+character with a <code class="docutils literal"><span class="pre">/</span></code>. This is useful to normalize path separators.</p>
+<blockquote>
+<div><p>Example: <code class="docutils literal"><span class="pre">%s:</span>  <span class="pre">C:\Desktop</span> <span class="pre">Files/foo_test.s.tmp</span></code></p>
+<p>Example: <code class="docutils literal"><span class="pre">%/s:</span> <span class="pre">C:/Desktop</span> <span class="pre">Files/foo_test.s.tmp</span></code></p>
+</div></blockquote>
+</div></blockquote>
+<p><code class="docutils literal"><span class="pre">%:s,</span> <span class="pre">%:S,</span> <span class="pre">%:t,</span> <span class="pre">%:T:</span></code></p>
+<blockquote>
+<div><p>Act like the corresponding substitution above but remove colons at
+the beginning of Windows paths. This is useful to allow concatenation
+of absolute paths on Windows to produce a legal path.</p>
+<blockquote>
+<div><p>Example: <code class="docutils literal"><span class="pre">%s:</span>  <span class="pre">C:\Desktop</span> <span class="pre">Files\foo_test.s.tmp</span></code></p>
+<p>Example: <code class="docutils literal"><span class="pre">%:s:</span> <span class="pre">C\Desktop</span> <span class="pre">Files\foo_test.s.tmp</span></code></p>
+</div></blockquote>
+</div></blockquote>
+<p><strong>LLVM-specific substitutions:</strong></p>
+<dl class="docutils">
+<dt><code class="docutils literal"><span class="pre">%shlibext</span></code></dt>
+<dd><p class="first">The suffix for the host platforms shared library files. This includes the
+period as the first character.</p>
+<p class="last">Example: <code class="docutils literal"><span class="pre">.so</span></code> (Linux), <code class="docutils literal"><span class="pre">.dylib</span></code> (OS X), <code class="docutils literal"><span class="pre">.dll</span></code> (Windows)</p>
+</dd>
+<dt><code class="docutils literal"><span class="pre">%exeext</span></code></dt>
+<dd><p class="first">The suffix for the host platforms executable files. This includes the
+period as the first character.</p>
+<p class="last">Example: <code class="docutils literal"><span class="pre">.exe</span></code> (Windows), empty on Linux.</p>
+</dd>
+<dt><code class="docutils literal"><span class="pre">%(line)</span></code>, <code class="docutils literal"><span class="pre">%(line+<number>)</span></code>, <code class="docutils literal"><span class="pre">%(line-<number>)</span></code></dt>
+<dd>The number of the line where this substitution is used, with an optional
+integer offset. This can be used in tests with multiple RUN lines, which
+reference test file’s line numbers.</dd>
+</dl>
+<p><strong>Clang-specific substitutions:</strong></p>
+<dl class="docutils">
+<dt><code class="docutils literal"><span class="pre">%clang</span></code></dt>
+<dd>Invokes the Clang driver.</dd>
+<dt><code class="docutils literal"><span class="pre">%clang_cpp</span></code></dt>
+<dd>Invokes the Clang driver for C++.</dd>
+<dt><code class="docutils literal"><span class="pre">%clang_cl</span></code></dt>
+<dd>Invokes the CL-compatible Clang driver.</dd>
+<dt><code class="docutils literal"><span class="pre">%clangxx</span></code></dt>
+<dd>Invokes the G++-compatible Clang driver.</dd>
+<dt><code class="docutils literal"><span class="pre">%clang_cc1</span></code></dt>
+<dd>Invokes the Clang frontend.</dd>
+<dt><code class="docutils literal"><span class="pre">%itanium_abi_triple</span></code>, <code class="docutils literal"><span class="pre">%ms_abi_triple</span></code></dt>
+<dd>These substitutions can be used to get the current target triple adjusted to
+the desired ABI. For example, if the test suite is running with the
+<code class="docutils literal"><span class="pre">i686-pc-win32</span></code> target, <code class="docutils literal"><span class="pre">%itanium_abi_triple</span></code> will expand to
+<code class="docutils literal"><span class="pre">i686-pc-mingw32</span></code>. This allows a test to run with a specific ABI without
+constraining it to a specific triple.</dd>
+</dl>
+<p>To add more substituations, look at <code class="docutils literal"><span class="pre">test/lit.cfg</span></code> or <code class="docutils literal"><span class="pre">lit.local.cfg</span></code>.</p>
+</div>
+<div class="section" id="options">
+<h3><a class="toc-backref" href="#id21">Options</a><a class="headerlink" href="#options" title="Permalink to this headline">¶</a></h3>
+<p>The llvm lit configuration allows to customize some things with user options:</p>
+<dl class="docutils">
+<dt><code class="docutils literal"><span class="pre">llc</span></code>, <code class="docutils literal"><span class="pre">opt</span></code>, …</dt>
+<dd><p class="first">Substitute the respective llvm tool name with a custom command line. This
+allows to specify custom paths and default arguments for these tools.
+Example:</p>
+<p class="last">% llvm-lit “-Dllc=llc -verify-machineinstrs”</p>
+</dd>
+<dt><code class="docutils literal"><span class="pre">run_long_tests</span></code></dt>
+<dd>Enable the execution of long running tests.</dd>
+<dt><code class="docutils literal"><span class="pre">llvm_site_config</span></code></dt>
+<dd>Load the specified lit configuration instead of the default one.</dd>
+</dl>
+</div>
+<div class="section" id="other-features">
+<h3><a class="toc-backref" href="#id22">Other Features</a><a class="headerlink" href="#other-features" title="Permalink to this headline">¶</a></h3>
+<p>To make RUN line writing easier, there are several helper programs. These
+helpers are in the PATH when running tests, so you can just call them using
+their name. For example:</p>
+<dl class="docutils">
+<dt><code class="docutils literal"><span class="pre">not</span></code></dt>
+<dd>This program runs its arguments and then inverts the result code from it.
+Zero result codes become 1. Non-zero result codes become 0.</dd>
+</dl>
+<p>To make the output more useful, <strong class="program">lit</strong> will scan
+the lines of the test case for ones that contain a pattern that matches
+<code class="docutils literal"><span class="pre">PR[0-9]+</span></code>. This is the syntax for specifying a PR (Problem Report) number
+that is related to the test case. The number after “PR” specifies the
+LLVM bugzilla number. When a PR number is specified, it will be used in
+the pass/fail reporting. This is useful to quickly get some context when
+a test fails.</p>
+<p>Finally, any line that contains “END.” will cause the special
+interpretation of lines to terminate. This is generally done right after
+the last RUN: line. This has two side effects:</p>
+<ol class="loweralpha simple">
+<li>it prevents special interpretation of lines that are part of the test
+program, not the instructions to the test case, and</li>
+<li>it speeds things up for really big test cases by avoiding
+interpretation of the remainder of the file.</li>
+</ol>
+</div>
+</div>
+<div class="section" id="test-suite-overview">
+<span id="id3"></span><h2><a class="toc-backref" href="#id23"><code class="docutils literal"><span class="pre">test-suite</span></code> Overview</a><a class="headerlink" href="#test-suite-overview" title="Permalink to this headline">¶</a></h2>
+<p>The <code class="docutils literal"><span class="pre">test-suite</span></code> module contains a number of programs that can be
+compiled and executed. The <code class="docutils literal"><span class="pre">test-suite</span></code> includes reference outputs for
+all of the programs, so that the output of the executed program can be
+checked for correctness.</p>
+<p><code class="docutils literal"><span class="pre">test-suite</span></code> tests are divided into three types of tests: MultiSource,
+SingleSource, and External.</p>
+<ul>
+<li><p class="first"><code class="docutils literal"><span class="pre">test-suite/SingleSource</span></code></p>
+<p>The SingleSource directory contains test programs that are only a
+single source file in size. These are usually small benchmark
+programs or small programs that calculate a particular value. Several
+such programs are grouped together in each directory.</p>
+</li>
+<li><p class="first"><code class="docutils literal"><span class="pre">test-suite/MultiSource</span></code></p>
+<p>The MultiSource directory contains subdirectories which contain
+entire programs with multiple source files. Large benchmarks and
+whole applications go here.</p>
+</li>
+<li><p class="first"><code class="docutils literal"><span class="pre">test-suite/External</span></code></p>
+<p>The External directory contains Makefiles for building code that is
+external to (i.e., not distributed with) LLVM. The most prominent
+members of this directory are the SPEC 95 and SPEC 2000 benchmark
+suites. The <code class="docutils literal"><span class="pre">External</span></code> directory does not contain these actual
+tests, but only the Makefiles that know how to properly compile these
+programs from somewhere else. When using <code class="docutils literal"><span class="pre">LNT</span></code>, use the
+<code class="docutils literal"><span class="pre">--test-externals</span></code> option to include these tests in the results.</p>
+</li>
+</ul>
+<div class="section" id="test-suite-quickstart">
+<span id="id4"></span><h3><a class="toc-backref" href="#id24"><code class="docutils literal"><span class="pre">test-suite</span></code> Quickstart</a><a class="headerlink" href="#test-suite-quickstart" title="Permalink to this headline">¶</a></h3>
+<p>The modern way of running the <code class="docutils literal"><span class="pre">test-suite</span></code> is focused on testing and
+benchmarking complete compilers using the
+<a class="reference external" href="http://llvm.org/docs/lnt">LNT</a> testing infrastructure.</p>
+<p>For more information on using LNT to execute the <code class="docutils literal"><span class="pre">test-suite</span></code>, please
+see the <a class="reference external" href="http://llvm.org/docs/lnt/quickstart.html">LNT Quickstart</a>
+documentation.</p>
+</div>
+<div class="section" id="test-suite-makefiles">
+<h3><a class="toc-backref" href="#id25"><code class="docutils literal"><span class="pre">test-suite</span></code> Makefiles</a><a class="headerlink" href="#test-suite-makefiles" title="Permalink to this headline">¶</a></h3>
+<p>Historically, the <code class="docutils literal"><span class="pre">test-suite</span></code> was executed using a complicated setup
+of Makefiles. The LNT based approach above is recommended for most
+users, but there are some testing scenarios which are not supported by
+the LNT approach. In addition, LNT currently uses the Makefile setup
+under the covers and so developers who are interested in how LNT works
+under the hood may want to understand the Makefile based setup.</p>
+<p>For more information on the <code class="docutils literal"><span class="pre">test-suite</span></code> Makefile setup, please see
+the <a class="reference internal" href="TestSuiteMakefileGuide.html"><span class="doc">Test Suite Makefile Guide</span></a>.</p>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="TestSuiteMakefileGuide.html" title="LLVM test-suite Guide"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="Phabricator.html" title="Code Reviews with Phabricator"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/TypeMetadata.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/TypeMetadata.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/TypeMetadata.html (added)
+++ www-releases/trunk/5.0.1/docs/TypeMetadata.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,386 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>Type Metadata — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="FaultMaps and implicit checks" href="FaultMaps.html" />
+    <link rel="prev" title="MergeFunctions pass, how it works" href="MergeFunctions.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="FaultMaps.html" title="FaultMaps and implicit checks"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="MergeFunctions.html" title="MergeFunctions pass, how it works"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="type-metadata">
+<h1>Type Metadata<a class="headerlink" href="#type-metadata" title="Permalink to this headline">¶</a></h1>
+<p>Type metadata is a mechanism that allows IR modules to co-operatively build
+pointer sets corresponding to addresses within a given set of globals. LLVM’s
+<a class="reference external" href="http://clang.llvm.org/docs/ControlFlowIntegrity.html">control flow integrity</a> implementation uses this metadata to efficiently
+check (at each call site) that a given address corresponds to either a
+valid vtable or function pointer for a given class or function type, and its
+whole-program devirtualization pass uses the metadata to identify potential
+callees for a given virtual call.</p>
+<p>To use the mechanism, a client creates metadata nodes with two elements:</p>
+<ol class="arabic simple">
+<li>a byte offset into the global (generally zero for functions)</li>
+<li>a metadata object representing an identifier for the type</li>
+</ol>
+<p>These metadata nodes are associated with globals by using global object
+metadata attachments with the <code class="docutils literal"><span class="pre">!type</span></code> metadata kind.</p>
+<p>Each type identifier must exclusively identify either global variables
+or functions.</p>
+<div class="admonition-limitation admonition">
+<p class="first admonition-title">Limitation</p>
+<p class="last">The current implementation only supports attaching metadata to functions on
+the x86-32 and x86-64 architectures.</p>
+</div>
+<p>An intrinsic, <a class="reference internal" href="LangRef.html#type-test"><span class="std std-ref">llvm.type.test</span></a>, is used to test whether a
+given pointer is associated with a type identifier.</p>
+<div class="section" id="representing-type-information-using-type-metadata">
+<h2>Representing Type Information using Type Metadata<a class="headerlink" href="#representing-type-information-using-type-metadata" title="Permalink to this headline">¶</a></h2>
+<p>This section describes how Clang represents C++ type information associated with
+virtual tables using type metadata.</p>
+<p>Consider the following inheritance hierarchy:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">struct</span> <span class="n">A</span> <span class="p">{</span>
+  <span class="k">virtual</span> <span class="kt">void</span> <span class="n">f</span><span class="p">();</span>
+<span class="p">};</span>
+
+<span class="k">struct</span> <span class="nl">B</span> <span class="p">:</span> <span class="n">A</span> <span class="p">{</span>
+  <span class="k">virtual</span> <span class="kt">void</span> <span class="n">f</span><span class="p">();</span>
+  <span class="k">virtual</span> <span class="kt">void</span> <span class="nf">g</span><span class="p">();</span>
+<span class="p">};</span>
+
+<span class="k">struct</span> <span class="n">C</span> <span class="p">{</span>
+  <span class="k">virtual</span> <span class="kt">void</span> <span class="n">h</span><span class="p">();</span>
+<span class="p">};</span>
+
+<span class="k">struct</span> <span class="nl">D</span> <span class="p">:</span> <span class="n">A</span><span class="p">,</span> <span class="n">C</span> <span class="p">{</span>
+  <span class="k">virtual</span> <span class="kt">void</span> <span class="n">f</span><span class="p">();</span>
+  <span class="k">virtual</span> <span class="kt">void</span> <span class="nf">h</span><span class="p">();</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>The virtual table objects for A, B, C and D look like this (under the Itanium ABI):</p>
+<table border="1" class="docutils" id="id1">
+<caption><span class="caption-text">Virtual Table Layout for A, B, C, D</span><a class="headerlink" href="#id1" title="Permalink to this table">¶</a></caption>
+<colgroup>
+<col width="13%" />
+<col width="13%" />
+<col width="13%" />
+<col width="13%" />
+<col width="13%" />
+<col width="13%" />
+<col width="13%" />
+<col width="13%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Class</th>
+<th class="head">0</th>
+<th class="head">1</th>
+<th class="head">2</th>
+<th class="head">3</th>
+<th class="head">4</th>
+<th class="head">5</th>
+<th class="head">6</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td>A</td>
+<td>A::offset-to-top</td>
+<td>&A::rtti</td>
+<td>&A::f</td>
+<td> </td>
+<td> </td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="row-odd"><td>B</td>
+<td>B::offset-to-top</td>
+<td>&B::rtti</td>
+<td>&B::f</td>
+<td>&B::g</td>
+<td> </td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="row-even"><td>C</td>
+<td>C::offset-to-top</td>
+<td>&C::rtti</td>
+<td>&C::h</td>
+<td> </td>
+<td> </td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="row-odd"><td>D</td>
+<td>D::offset-to-top</td>
+<td>&D::rtti</td>
+<td>&D::f</td>
+<td>&D::h</td>
+<td>D::offset-to-top</td>
+<td>&D::rtti</td>
+<td>thunk for &D::h</td>
+</tr>
+</tbody>
+</table>
+<p>When an object of type A is constructed, the address of <code class="docutils literal"><span class="pre">&A::f</span></code> in A’s
+virtual table object is stored in the object’s vtable pointer.  In ABI parlance
+this address is known as an <a class="reference external" href="https://mentorembedded.github.io/cxx-abi/abi.html#vtable-general">address point</a>. Similarly, when an object of type
+B is constructed, the address of <code class="docutils literal"><span class="pre">&B::f</span></code> is stored in the vtable pointer. In
+this way, the vtable in B’s virtual table object is compatible with A’s vtable.</p>
+<p>D is a little more complicated, due to the use of multiple inheritance. Its
+virtual table object contains two vtables, one compatible with A’s vtable and
+the other compatible with C’s vtable. Objects of type D contain two virtual
+pointers, one belonging to the A subobject and containing the address of
+the vtable compatible with A’s vtable, and the other belonging to the C
+subobject and containing the address of the vtable compatible with C’s vtable.</p>
+<p>The full set of compatibility information for the above class hierarchy is
+shown below. The following table shows the name of a class, the offset of an
+address point within that class’s vtable and the name of one of the classes
+with which that address point is compatible.</p>
+<table border="1" class="docutils" id="id2">
+<caption><span class="caption-text">Type Offsets for A, B, C, D</span><a class="headerlink" href="#id2" title="Permalink to this table">¶</a></caption>
+<colgroup>
+<col width="33%" />
+<col width="33%" />
+<col width="33%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">VTable for</th>
+<th class="head">Offset</th>
+<th class="head">Compatible Class</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td>A</td>
+<td>16</td>
+<td>A</td>
+</tr>
+<tr class="row-odd"><td>B</td>
+<td>16</td>
+<td>A</td>
+</tr>
+<tr class="row-even"><td> </td>
+<td> </td>
+<td>B</td>
+</tr>
+<tr class="row-odd"><td>C</td>
+<td>16</td>
+<td>C</td>
+</tr>
+<tr class="row-even"><td>D</td>
+<td>16</td>
+<td>A</td>
+</tr>
+<tr class="row-odd"><td> </td>
+<td> </td>
+<td>D</td>
+</tr>
+<tr class="row-even"><td> </td>
+<td>48</td>
+<td>C</td>
+</tr>
+</tbody>
+</table>
+<p>The next step is to encode this compatibility information into the IR. The way
+this is done is to create type metadata named after each of the compatible
+classes, with which we associate each of the compatible address points in
+each vtable. For example, these type metadata entries encode the compatibility
+information for the above hierarchy:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>@_ZTV1A = constant [...], !type !0
+ at _ZTV1B = constant [...], !type !0, !type !1
+ at _ZTV1C = constant [...], !type !2
+ at _ZTV1D = constant [...], !type !0, !type !3, !type !4
+
+!0 = !{i64 16, !"_ZTS1A"}
+!1 = !{i64 16, !"_ZTS1B"}
+!2 = !{i64 16, !"_ZTS1C"}
+!3 = !{i64 16, !"_ZTS1D"}
+!4 = !{i64 48, !"_ZTS1C"}
+</pre></div>
+</div>
+<p>With this type metadata, we can now use the <code class="docutils literal"><span class="pre">llvm.type.test</span></code> intrinsic to
+test whether a given pointer is compatible with a type identifier. Working
+backwards, if <code class="docutils literal"><span class="pre">llvm.type.test</span></code> returns true for a particular pointer,
+we can also statically determine the identities of the virtual functions
+that a particular virtual call may call. For example, if a program assumes
+a pointer to be a member of <code class="docutils literal"><span class="pre">!"_ZST1A"</span></code>, we know that the address can
+be only be one of <code class="docutils literal"><span class="pre">_ZTV1A+16</span></code>, <code class="docutils literal"><span class="pre">_ZTV1B+16</span></code> or <code class="docutils literal"><span class="pre">_ZTV1D+16</span></code> (i.e. the
+address points of the vtables of A, B and D respectively). If we then load
+an address from that pointer, we know that the address can only be one of
+<code class="docutils literal"><span class="pre">&A::f</span></code>, <code class="docutils literal"><span class="pre">&B::f</span></code> or <code class="docutils literal"><span class="pre">&D::f</span></code>.</p>
+</div>
+<div class="section" id="testing-addresses-for-type-membership">
+<h2>Testing Addresses For Type Membership<a class="headerlink" href="#testing-addresses-for-type-membership" title="Permalink to this headline">¶</a></h2>
+<p>If a program tests an address using <code class="docutils literal"><span class="pre">llvm.type.test</span></code>, this will cause
+a link-time optimization pass, <code class="docutils literal"><span class="pre">LowerTypeTests</span></code>, to replace calls to this
+intrinsic with efficient code to perform type member tests. At a high level,
+the pass will lay out referenced globals in a consecutive memory region in
+the object file, construct bit vectors that map onto that memory region,
+and generate code at each of the <code class="docutils literal"><span class="pre">llvm.type.test</span></code> call sites to test
+pointers against those bit vectors. Because of the layout manipulation, the
+globals’ definitions must be available at LTO time. For more information,
+see the <a class="reference external" href="http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html">control flow integrity design document</a>.</p>
+<p>A type identifier that identifies functions is transformed into a jump table,
+which is a block of code consisting of one branch instruction for each
+of the functions associated with the type identifier that branches to the
+target function. The pass will redirect any taken function addresses to the
+corresponding jump table entry. In the object file’s symbol table, the jump
+table entries take the identities of the original functions, so that addresses
+taken outside the module will pass any verification done inside the module.</p>
+<p>Jump tables may call external functions, so their definitions need not
+be available at LTO time. Note that if an externally defined function is
+associated with a type identifier, there is no guarantee that its identity
+within the module will be the same as its identity outside of the module,
+as the former will be the jump table entry if a jump table is necessary.</p>
+<p>The <a class="reference external" href="http://llvm.org/klaus/llvm/blob/master/include/llvm/Transforms/IPO/LowerTypeTests.h">GlobalLayoutBuilder</a> class is responsible for laying out the globals
+efficiently to minimize the sizes of the underlying bitsets.</p>
+<table class="docutils field-list" frame="void" rules="none">
+<col class="field-name" />
+<col class="field-body" />
+<tbody valign="top">
+<tr class="field-odd field"><th class="field-name">Example:</th><td class="field-body"></td>
+</tr>
+</tbody>
+</table>
+<div class="highlight-default"><div class="highlight"><pre><span></span>target datalayout = "e-p:32:32"
+
+ at a = internal global i32 0, !type !0
+ at b = internal global i32 0, !type !0, !type !1
+ at c = internal global i32 0, !type !1
+ at d = internal global [2 x i32] [i32 0, i32 0], !type !2
+
+define void @e() !type !3 {
+  ret void
+}
+
+define void @f() {
+  ret void
+}
+
+declare void @g() !type !3
+
+!0 = !{i32 0, !"typeid1"}
+!1 = !{i32 0, !"typeid2"}
+!2 = !{i32 4, !"typeid2"}
+!3 = !{i32 0, !"typeid3"}
+
+declare i1 @llvm.type.test(i8* %ptr, metadata %typeid) nounwind readnone
+
+define i1 @foo(i32* %p) {
+  %pi8 = bitcast i32* %p to i8*
+  %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid1")
+  ret i1 %x
+}
+
+define i1 @bar(i32* %p) {
+  %pi8 = bitcast i32* %p to i8*
+  %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid2")
+  ret i1 %x
+}
+
+define i1 @baz(void ()* %p) {
+  %pi8 = bitcast void ()* %p to i8*
+  %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid3")
+  ret i1 %x
+}
+
+define void @main() {
+  %a1 = call i1 @foo(i32* @a) ; returns 1
+  %b1 = call i1 @foo(i32* @b) ; returns 1
+  %c1 = call i1 @foo(i32* @c) ; returns 0
+  %a2 = call i1 @bar(i32* @a) ; returns 0
+  %b2 = call i1 @bar(i32* @b) ; returns 1
+  %c2 = call i1 @bar(i32* @c) ; returns 1
+  %d02 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 0)) ; returns 0
+  %d12 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 1)) ; returns 1
+  %e = call i1 @baz(void ()* @e) ; returns 1
+  %f = call i1 @baz(void ()* @f) ; returns 0
+  %g = call i1 @baz(void ()* @g) ; returns 1
+  ret void
+}
+</pre></div>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="FaultMaps.html" title="FaultMaps and implicit checks"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="MergeFunctions.html" title="MergeFunctions pass, how it works"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/Vectorizers.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/Vectorizers.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/Vectorizers.html (added)
+++ www-releases/trunk/5.0.1/docs/Vectorizers.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,502 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>Auto-Vectorization in LLVM — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="Vectorization Plan" href="Proposals/VectorizationPlan.html" />
+    <link rel="prev" title="Source Level Debugging with LLVM" href="SourceLevelDebugging.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="Proposals/VectorizationPlan.html" title="Vectorization Plan"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="SourceLevelDebugging.html" title="Source Level Debugging with LLVM"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="auto-vectorization-in-llvm">
+<h1>Auto-Vectorization in LLVM<a class="headerlink" href="#auto-vectorization-in-llvm" title="Permalink to this headline">¶</a></h1>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#the-loop-vectorizer" id="id2">The Loop Vectorizer</a><ul>
+<li><a class="reference internal" href="#usage" id="id3">Usage</a><ul>
+<li><a class="reference internal" href="#command-line-flags" id="id4">Command line flags</a></li>
+<li><a class="reference internal" href="#pragma-loop-hint-directives" id="id5">Pragma loop hint directives</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#diagnostics" id="id6">Diagnostics</a></li>
+<li><a class="reference internal" href="#features" id="id7">Features</a><ul>
+<li><a class="reference internal" href="#loops-with-unknown-trip-count" id="id8">Loops with unknown trip count</a></li>
+<li><a class="reference internal" href="#runtime-checks-of-pointers" id="id9">Runtime Checks of Pointers</a></li>
+<li><a class="reference internal" href="#reductions" id="id10">Reductions</a></li>
+<li><a class="reference internal" href="#inductions" id="id11">Inductions</a></li>
+<li><a class="reference internal" href="#if-conversion" id="id12">If Conversion</a></li>
+<li><a class="reference internal" href="#pointer-induction-variables" id="id13">Pointer Induction Variables</a></li>
+<li><a class="reference internal" href="#reverse-iterators" id="id14">Reverse Iterators</a></li>
+<li><a class="reference internal" href="#scatter-gather" id="id15">Scatter / Gather</a></li>
+<li><a class="reference internal" href="#vectorization-of-mixed-types" id="id16">Vectorization of Mixed Types</a></li>
+<li><a class="reference internal" href="#global-structures-alias-analysis" id="id17">Global Structures Alias Analysis</a></li>
+<li><a class="reference internal" href="#vectorization-of-function-calls" id="id18">Vectorization of function calls</a></li>
+<li><a class="reference internal" href="#partial-unrolling-during-vectorization" id="id19">Partial unrolling during vectorization</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#performance" id="id20">Performance</a></li>
+<li><a class="reference internal" href="#ongoing-development-directions" id="id21">Ongoing Development Directions</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-slp-vectorizer" id="id22">The SLP Vectorizer</a><ul>
+<li><a class="reference internal" href="#details" id="id23">Details</a></li>
+<li><a class="reference internal" href="#id1" id="id24">Usage</a></li>
+</ul>
+</li>
+</ul>
+</div>
+<p>LLVM has two vectorizers: The <a class="reference internal" href="#loop-vectorizer"><span class="std std-ref">Loop Vectorizer</span></a>,
+which operates on Loops, and the <a class="reference internal" href="#slp-vectorizer"><span class="std std-ref">SLP Vectorizer</span></a>. These vectorizers
+focus on different optimization opportunities and use different techniques.
+The SLP vectorizer merges multiple scalars that are found in the code into
+vectors while the Loop Vectorizer widens instructions in loops
+to operate on multiple consecutive iterations.</p>
+<p>Both the Loop Vectorizer and the SLP Vectorizer are enabled by default.</p>
+<div class="section" id="the-loop-vectorizer">
+<span id="loop-vectorizer"></span><h2><a class="toc-backref" href="#id2">The Loop Vectorizer</a><a class="headerlink" href="#the-loop-vectorizer" title="Permalink to this headline">¶</a></h2>
+<div class="section" id="usage">
+<h3><a class="toc-backref" href="#id3">Usage</a><a class="headerlink" href="#usage" title="Permalink to this headline">¶</a></h3>
+<p>The Loop Vectorizer is enabled by default, but it can be disabled
+through clang using the command line flag:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> clang ... -fno-vectorize  file.c
+</pre></div>
+</div>
+<div class="section" id="command-line-flags">
+<h4><a class="toc-backref" href="#id4">Command line flags</a><a class="headerlink" href="#command-line-flags" title="Permalink to this headline">¶</a></h4>
+<p>The loop vectorizer uses a cost model to decide on the optimal vectorization factor
+and unroll factor. However, users of the vectorizer can force the vectorizer to use
+specific values. Both ‘clang’ and ‘opt’ support the flags below.</p>
+<p>Users can control the vectorization SIMD width using the command line flag “-force-vector-width”.</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> clang  -mllvm -force-vector-width<span class="o">=</span><span class="m">8</span> ...
+<span class="gp">$</span> opt -loop-vectorize -force-vector-width<span class="o">=</span><span class="m">8</span> ...
+</pre></div>
+</div>
+<p>Users can control the unroll factor using the command line flag “-force-vector-interleave”</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> clang  -mllvm -force-vector-interleave<span class="o">=</span><span class="m">2</span> ...
+<span class="gp">$</span> opt -loop-vectorize -force-vector-interleave<span class="o">=</span><span class="m">2</span> ...
+</pre></div>
+</div>
+</div>
+<div class="section" id="pragma-loop-hint-directives">
+<h4><a class="toc-backref" href="#id5">Pragma loop hint directives</a><a class="headerlink" href="#pragma-loop-hint-directives" title="Permalink to this headline">¶</a></h4>
+<p>The <code class="docutils literal"><span class="pre">#pragma</span> <span class="pre">clang</span> <span class="pre">loop</span></code> directive allows loop vectorization hints to be
+specified for the subsequent for, while, do-while, or c++11 range-based for
+loop. The directive allows vectorization and interleaving to be enabled or
+disabled. Vector width as well as interleave count can also be manually
+specified. The following example explicitly enables vectorization and
+interleaving:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#pragma clang loop vectorize(enable) interleave(enable)</span>
+<span class="k">while</span><span class="p">(...)</span> <span class="p">{</span>
+  <span class="p">...</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The following example implicitly enables vectorization and interleaving by
+specifying a vector width and interleaving count:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#pragma clang loop vectorize_width(2) interleave_count(2)</span>
+<span class="k">for</span><span class="p">(...)</span> <span class="p">{</span>
+  <span class="p">...</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>See the Clang
+<a class="reference external" href="http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations">language extensions</a>
+for details.</p>
+</div>
+</div>
+<div class="section" id="diagnostics">
+<h3><a class="toc-backref" href="#id6">Diagnostics</a><a class="headerlink" href="#diagnostics" title="Permalink to this headline">¶</a></h3>
+<p>Many loops cannot be vectorized including loops with complicated control flow,
+unvectorizable types, and unvectorizable calls. The loop vectorizer generates
+optimization remarks which can be queried using command line options to identify
+and diagnose loops that are skipped by the loop-vectorizer.</p>
+<p>Optimization remarks are enabled using:</p>
+<p><code class="docutils literal"><span class="pre">-Rpass=loop-vectorize</span></code> identifies loops that were successfully vectorized.</p>
+<p><code class="docutils literal"><span class="pre">-Rpass-missed=loop-vectorize</span></code> identifies loops that failed vectorization and
+indicates if vectorization was specified.</p>
+<p><code class="docutils literal"><span class="pre">-Rpass-analysis=loop-vectorize</span></code> identifies the statements that caused
+vectorization to fail. If in addition <code class="docutils literal"><span class="pre">-fsave-optimization-record</span></code> is
+provided, multiple causes of vectorization failure may be listed (this behavior
+might change in the future).</p>
+<p>Consider the following loop:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#pragma clang loop vectorize(enable)</span>
+<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">Length</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">switch</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="p">{</span>
+  <span class="k">case</span> <span class="mi">0</span><span class="o">:</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span><span class="o">*</span><span class="mi">2</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span>
+  <span class="k">case</span> <span class="mi">1</span><span class="o">:</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>   <span class="k">break</span><span class="p">;</span>
+  <span class="k">default</span><span class="o">:</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+  <span class="p">}</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The command line <code class="docutils literal"><span class="pre">-Rpass-missed=loop-vectorized</span></code> prints the remark:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="go">no_switch.cpp:4:5: remark: loop not vectorized: vectorization is explicitly enabled [-Rpass-missed=loop-vectorize]</span>
+</pre></div>
+</div>
+<p>And the command line <code class="docutils literal"><span class="pre">-Rpass-analysis=loop-vectorize</span></code> indicates that the
+switch statement cannot be vectorized.</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="go">no_switch.cpp:4:5: remark: loop not vectorized: loop contains a switch statement [-Rpass-analysis=loop-vectorize]</span>
+<span class="go">  switch(A[i]) {</span>
+<span class="go">  ^</span>
+</pre></div>
+</div>
+<p>To ensure line and column numbers are produced include the command line options
+<code class="docutils literal"><span class="pre">-gline-tables-only</span></code> and <code class="docutils literal"><span class="pre">-gcolumn-info</span></code>. See the Clang <a class="reference external" href="http://clang.llvm.org/docs/UsersManual.html#options-to-emit-optimization-reports">user manual</a>
+for details</p>
+</div>
+<div class="section" id="features">
+<h3><a class="toc-backref" href="#id7">Features</a><a class="headerlink" href="#features" title="Permalink to this headline">¶</a></h3>
+<p>The LLVM Loop Vectorizer has a number of features that allow it to vectorize
+complex loops.</p>
+<div class="section" id="loops-with-unknown-trip-count">
+<h4><a class="toc-backref" href="#id8">Loops with unknown trip count</a><a class="headerlink" href="#loops-with-unknown-trip-count" title="Permalink to this headline">¶</a></h4>
+<p>The Loop Vectorizer supports loops with an unknown trip count.
+In the loop below, the iteration <code class="docutils literal"><span class="pre">start</span></code> and <code class="docutils literal"><span class="pre">finish</span></code> points are unknown,
+and the Loop Vectorizer has a mechanism to vectorize loops that do not start
+at zero. In this example, ‘n’ may not be a multiple of the vector width, and
+the vectorizer has to execute the last few iterations as scalar code. Keeping
+a scalar copy of the loop increases the code size.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">void</span> <span class="nf">bar</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">float</span><span class="o">*</span> <span class="n">B</span><span class="p">,</span> <span class="kt">float</span> <span class="n">K</span><span class="p">,</span> <span class="kt">int</span> <span class="n">start</span><span class="p">,</span> <span class="kt">int</span> <span class="n">end</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">start</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">end</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">*=</span> <span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">K</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="runtime-checks-of-pointers">
+<h4><a class="toc-backref" href="#id9">Runtime Checks of Pointers</a><a class="headerlink" href="#runtime-checks-of-pointers" title="Permalink to this headline">¶</a></h4>
+<p>In the example below, if the pointers A and B point to consecutive addresses,
+then it is illegal to vectorize the code because some elements of A will be
+written before they are read from array B.</p>
+<p>Some programmers use the ‘restrict’ keyword to notify the compiler that the
+pointers are disjointed, but in our example, the Loop Vectorizer has no way of
+knowing that the pointers A and B are unique. The Loop Vectorizer handles this
+loop by placing code that checks, at runtime, if the arrays A and B point to
+disjointed memory locations. If arrays A and B overlap, then the scalar version
+of the loop is executed.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">void</span> <span class="nf">bar</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">float</span><span class="o">*</span> <span class="n">B</span><span class="p">,</span> <span class="kt">float</span> <span class="n">K</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">*=</span> <span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">K</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="reductions">
+<h4><a class="toc-backref" href="#id10">Reductions</a><a class="headerlink" href="#reductions" title="Permalink to this headline">¶</a></h4>
+<p>In this example the <code class="docutils literal"><span class="pre">sum</span></code> variable is used by consecutive iterations of
+the loop. Normally, this would prevent vectorization, but the vectorizer can
+detect that ‘sum’ is a reduction variable. The variable ‘sum’ becomes a vector
+of integers, and at the end of the loop the elements of the array are added
+together to create the correct result. We support a number of different
+reduction operations, such as addition, multiplication, XOR, AND and OR.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">B</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="kt">unsigned</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">sum</span> <span class="o">+=</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="mi">5</span><span class="p">;</span>
+  <span class="k">return</span> <span class="n">sum</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>We support floating point reduction operations when <cite>-ffast-math</cite> is used.</p>
+</div>
+<div class="section" id="inductions">
+<h4><a class="toc-backref" href="#id11">Inductions</a><a class="headerlink" href="#inductions" title="Permalink to this headline">¶</a></h4>
+<p>In this example the value of the induction variable <code class="docutils literal"><span class="pre">i</span></code> is saved into an
+array. The Loop Vectorizer knows to vectorize induction variables.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">void</span> <span class="nf">bar</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">float</span><span class="o">*</span> <span class="n">B</span><span class="p">,</span> <span class="kt">float</span> <span class="n">K</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="if-conversion">
+<h4><a class="toc-backref" href="#id12">If Conversion</a><a class="headerlink" href="#if-conversion" title="Permalink to this headline">¶</a></h4>
+<p>The Loop Vectorizer is able to “flatten” the IF statement in the code and
+generate a single stream of instructions. The Loop Vectorizer supports any
+control flow in the innermost loop. The innermost loop may contain complex
+nesting of IFs, ELSEs and even GOTOs.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">B</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="kt">unsigned</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="k">if</span> <span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">></span> <span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
+      <span class="n">sum</span> <span class="o">+=</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="mi">5</span><span class="p">;</span>
+  <span class="k">return</span> <span class="n">sum</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="pointer-induction-variables">
+<h4><a class="toc-backref" href="#id13">Pointer Induction Variables</a><a class="headerlink" href="#pointer-induction-variables" title="Permalink to this headline">¶</a></h4>
+<p>This example uses the “accumulate” function of the standard c++ library. This
+loop uses C++ iterators, which are pointers, and not integer indices.
+The Loop Vectorizer detects pointer induction variables and can vectorize
+this loop. This feature is important because many C++ programs use iterators.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">baz</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">accumulate</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">A</span> <span class="o">+</span> <span class="n">n</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="reverse-iterators">
+<h4><a class="toc-backref" href="#id14">Reverse Iterators</a><a class="headerlink" href="#reverse-iterators" title="Permalink to this headline">¶</a></h4>
+<p>The Loop Vectorizer can vectorize loops that count backwards.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">B</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="n">n</span><span class="p">;</span> <span class="n">i</span> <span class="o">></span> <span class="mi">0</span><span class="p">;</span> <span class="o">--</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+=</span><span class="mi">1</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="scatter-gather">
+<h4><a class="toc-backref" href="#id15">Scatter / Gather</a><a class="headerlink" href="#scatter-gather" title="Permalink to this headline">¶</a></h4>
+<p>The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions
+that scatter/gathers memory.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span> <span class="n">A</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span> <span class="n">B</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">intptr_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+      <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+=</span> <span class="n">B</span><span class="p">[</span><span class="n">i</span> <span class="o">*</span> <span class="mi">4</span><span class="p">];</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>In many situations the cost model will inform LLVM that this is not beneficial
+and LLVM will only vectorize such code if forced with “-mllvm -force-vector-width=#”.</p>
+</div>
+<div class="section" id="vectorization-of-mixed-types">
+<h4><a class="toc-backref" href="#id16">Vectorization of Mixed Types</a><a class="headerlink" href="#vectorization-of-mixed-types" title="Permalink to this headline">¶</a></h4>
+<p>The Loop Vectorizer can vectorize programs with mixed types. The Vectorizer
+cost model can estimate the cost of the type conversion and decide if
+vectorization is profitable.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">B</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">,</span> <span class="kt">int</span> <span class="n">k</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">4</span> <span class="o">*</span> <span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="global-structures-alias-analysis">
+<h4><a class="toc-backref" href="#id17">Global Structures Alias Analysis</a><a class="headerlink" href="#global-structures-alias-analysis" title="Permalink to this headline">¶</a></h4>
+<p>Access to global structures can also be vectorized, with alias analysis being
+used to make sure accesses don’t alias. Run-time checks can also be added on
+pointer access to structure members.</p>
+<p>Many variations are supported, but some that rely on undefined behaviour being
+ignored (as other compilers do) are still being left un-vectorized.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">struct</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">A</span><span class="p">[</span><span class="mi">100</span><span class="p">],</span> <span class="n">K</span><span class="p">,</span> <span class="n">B</span><span class="p">[</span><span class="mi">100</span><span class="p">];</span> <span class="p">}</span> <span class="n">Foo</span><span class="p">;</span>
+
+<span class="kt">int</span> <span class="nf">foo</span><span class="p">()</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">100</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">Foo</span><span class="p">.</span><span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">Foo</span><span class="p">.</span><span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="mi">100</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="vectorization-of-function-calls">
+<h4><a class="toc-backref" href="#id18">Vectorization of function calls</a><a class="headerlink" href="#vectorization-of-function-calls" title="Permalink to this headline">¶</a></h4>
+<p>The Loop Vectorize can vectorize intrinsic math functions.
+See the table below for a list of these functions.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="26%" />
+<col width="26%" />
+<col width="47%" />
+</colgroup>
+<tbody valign="top">
+<tr class="row-odd"><td>pow</td>
+<td>exp</td>
+<td>exp2</td>
+</tr>
+<tr class="row-even"><td>sin</td>
+<td>cos</td>
+<td>sqrt</td>
+</tr>
+<tr class="row-odd"><td>log</td>
+<td>log2</td>
+<td>log10</td>
+</tr>
+<tr class="row-even"><td>fabs</td>
+<td>floor</td>
+<td>ceil</td>
+</tr>
+<tr class="row-odd"><td>fma</td>
+<td>trunc</td>
+<td>nearbyint</td>
+</tr>
+<tr class="row-even"><td> </td>
+<td> </td>
+<td>fmuladd</td>
+</tr>
+</tbody>
+</table>
+<p>The loop vectorizer knows about special instructions on the target and will
+vectorize a loop containing a function call that maps to the instructions. For
+example, the loop below will be vectorized on Intel x86 if the SSE4.1 roundps
+instruction is available.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">void</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">float</span> <span class="o">*</span><span class="n">f</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">!=</span> <span class="mi">1024</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+    <span class="n">f</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">floorf</span><span class="p">(</span><span class="n">f</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="partial-unrolling-during-vectorization">
+<h4><a class="toc-backref" href="#id19">Partial unrolling during vectorization</a><a class="headerlink" href="#partial-unrolling-during-vectorization" title="Permalink to this headline">¶</a></h4>
+<p>Modern processors feature multiple execution units, and only programs that contain a
+high degree of parallelism can fully utilize the entire width of the machine.
+The Loop Vectorizer increases the instruction level parallelism (ILP) by
+performing partial-unrolling of loops.</p>
+<p>In the example below the entire array is accumulated into the variable ‘sum’.
+This is inefficient because only a single execution port can be used by the processor.
+By unrolling the code the Loop Vectorizer allows two or more execution ports
+to be used simultaneously.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">int</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">B</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
+  <span class="kt">unsigned</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+      <span class="n">sum</span> <span class="o">+=</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
+  <span class="k">return</span> <span class="n">sum</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The Loop Vectorizer uses a cost model to decide when it is profitable to unroll loops.
+The decision to unroll the loop depends on the register pressure and the generated code size.</p>
+</div>
+</div>
+<div class="section" id="performance">
+<h3><a class="toc-backref" href="#id20">Performance</a><a class="headerlink" href="#performance" title="Permalink to this headline">¶</a></h3>
+<p>This section shows the execution time of Clang on a simple benchmark:
+<a class="reference external" href="http://llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/UnitTests/Vectorizer/">gcc-loops</a>.
+This benchmarks is a collection of loops from the GCC autovectorization
+<a class="reference external" href="http://gcc.gnu.org/projects/tree-ssa/vectorization.html">page</a> by Dorit Nuzman.</p>
+<p>The chart below compares GCC-4.7, ICC-13, and Clang-SVN with and without loop vectorization at -O3, tuned for “corei7-avx”, running on a Sandybridge iMac.
+The Y-axis shows the time in msec. Lower is better. The last column shows the geomean of all the kernels.</p>
+<img alt="_images/gcc-loops.png" src="_images/gcc-loops.png" />
+<p>And Linpack-pc with the same configuration. Result is Mflops, higher is better.</p>
+<img alt="_images/linpack-pc.png" src="_images/linpack-pc.png" />
+</div>
+<div class="section" id="ongoing-development-directions">
+<h3><a class="toc-backref" href="#id21">Ongoing Development Directions</a><a class="headerlink" href="#ongoing-development-directions" title="Permalink to this headline">¶</a></h3>
+<div class="toctree-wrapper compound">
+</div>
+<dl class="docutils">
+<dt><a class="reference internal" href="Proposals/VectorizationPlan.html"><span class="doc">Vectorization Plan</span></a></dt>
+<dd>Modeling the process and upgrading the infrastructure of LLVM’s Loop Vectorizer.</dd>
+</dl>
+</div>
+</div>
+<div class="section" id="the-slp-vectorizer">
+<span id="slp-vectorizer"></span><h2><a class="toc-backref" href="#id22">The SLP Vectorizer</a><a class="headerlink" href="#the-slp-vectorizer" title="Permalink to this headline">¶</a></h2>
+<div class="section" id="details">
+<h3><a class="toc-backref" href="#id23">Details</a><a class="headerlink" href="#details" title="Permalink to this headline">¶</a></h3>
+<p>The goal of SLP vectorization (a.k.a. superword-level parallelism) is
+to combine similar independent instructions
+into vector instructions. Memory accesses, arithmetic operations, comparison
+operations, PHI-nodes, can all be vectorized using this technique.</p>
+<p>For example, the following function performs very similar operations on its
+inputs (a1, b1) and (a2, b2). The basic-block vectorizer may combine these
+into vector operations.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">void</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="n">a1</span><span class="p">,</span> <span class="kt">int</span> <span class="n">a2</span><span class="p">,</span> <span class="kt">int</span> <span class="n">b1</span><span class="p">,</span> <span class="kt">int</span> <span class="n">b2</span><span class="p">,</span> <span class="kt">int</span> <span class="o">*</span><span class="n">A</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">A</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">a1</span><span class="o">*</span><span class="p">(</span><span class="n">a1</span> <span class="o">+</span> <span class="n">b1</span><span class="p">)</span><span class="o">/</span><span class="n">b1</span> <span class="o">+</span> <span class="mi">50</span><span class="o">*</span><span class="n">b1</span><span class="o">/</span><span class="n">a1</span><span class="p">;</span>
+  <span class="n">A</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">a2</span><span class="o">*</span><span class="p">(</span><span class="n">a2</span> <span class="o">+</span> <span class="n">b2</span><span class="p">)</span><span class="o">/</span><span class="n">b2</span> <span class="o">+</span> <span class="mi">50</span><span class="o">*</span><span class="n">b2</span><span class="o">/</span><span class="n">a2</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The SLP-vectorizer processes the code bottom-up, across basic blocks, in search of scalars to combine.</p>
+</div>
+<div class="section" id="id1">
+<h3><a class="toc-backref" href="#id24">Usage</a><a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h3>
+<p>The SLP Vectorizer is enabled by default, but it can be disabled
+through clang using the command line flag:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> clang -fno-slp-vectorize file.c
+</pre></div>
+</div>
+<p>LLVM has a second basic block vectorization phase
+which is more compile-time intensive (The BB vectorizer). This optimization
+can be enabled through clang using the command line flag:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> clang -fslp-vectorize-aggressive file.c
+</pre></div>
+</div>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="Proposals/VectorizationPlan.html" title="Vectorization Plan"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="SourceLevelDebugging.html" title="Source Level Debugging with LLVM"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/WritingAnLLVMBackend.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/WritingAnLLVMBackend.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/WritingAnLLVMBackend.html (added)
+++ www-releases/trunk/5.0.1/docs/WritingAnLLVMBackend.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,1821 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>Writing an LLVM Backend — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="How To Use Instruction Mappings" href="HowToUseInstrMappings.html" />
+    <link rel="prev" title="Vectorization Plan" href="Proposals/VectorizationPlan.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="HowToUseInstrMappings.html" title="How To Use Instruction Mappings"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="Proposals/VectorizationPlan.html" title="Vectorization Plan"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="writing-an-llvm-backend">
+<h1>Writing an LLVM Backend<a class="headerlink" href="#writing-an-llvm-backend" title="Permalink to this headline">¶</a></h1>
+<div class="toctree-wrapper compound">
+</div>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#introduction" id="id3">Introduction</a><ul>
+<li><a class="reference internal" href="#audience" id="id4">Audience</a></li>
+<li><a class="reference internal" href="#prerequisite-reading" id="id5">Prerequisite Reading</a></li>
+<li><a class="reference internal" href="#basic-steps" id="id6">Basic Steps</a></li>
+<li><a class="reference internal" href="#preliminaries" id="id7">Preliminaries</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#target-machine" id="id8">Target Machine</a></li>
+<li><a class="reference internal" href="#target-registration" id="id9">Target Registration</a></li>
+<li><a class="reference internal" href="#register-set-and-register-classes" id="id10">Register Set and Register Classes</a><ul>
+<li><a class="reference internal" href="#defining-a-register" id="id11">Defining a Register</a></li>
+<li><a class="reference internal" href="#defining-a-register-class" id="id12">Defining a Register Class</a></li>
+<li><a class="reference internal" href="#implement-a-subclass-of-targetregisterinfo" id="id13">Implement a subclass of <code class="docutils literal"><span class="pre">TargetRegisterInfo</span></code></a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#instruction-set" id="id14">Instruction Set</a><ul>
+<li><a class="reference internal" href="#instruction-operand-mapping" id="id15">Instruction Operand Mapping</a><ul>
+<li><a class="reference internal" href="#instruction-operand-name-mapping" id="id16">Instruction Operand Name Mapping</a></li>
+<li><a class="reference internal" href="#instruction-operand-types" id="id17">Instruction Operand Types</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#instruction-scheduling" id="id18">Instruction Scheduling</a></li>
+<li><a class="reference internal" href="#instruction-relation-mapping" id="id19">Instruction Relation Mapping</a></li>
+<li><a class="reference internal" href="#implement-a-subclass-of-targetinstrinfo" id="id20">Implement a subclass of <code class="docutils literal"><span class="pre">TargetInstrInfo</span></code></a></li>
+<li><a class="reference internal" href="#branch-folding-and-if-conversion" id="id21">Branch Folding and If Conversion</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#instruction-selector" id="id22">Instruction Selector</a><ul>
+<li><a class="reference internal" href="#the-selectiondag-legalize-phase" id="id23">The SelectionDAG Legalize Phase</a><ul>
+<li><a class="reference internal" href="#promote" id="id24">Promote</a></li>
+<li><a class="reference internal" href="#expand" id="id25">Expand</a></li>
+<li><a class="reference internal" href="#custom" id="id26">Custom</a></li>
+<li><a class="reference internal" href="#legal" id="id27">Legal</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#calling-conventions" id="id28">Calling Conventions</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#assembly-printer" id="id29">Assembly Printer</a></li>
+<li><a class="reference internal" href="#subtarget-support" id="id30">Subtarget Support</a></li>
+<li><a class="reference internal" href="#jit-support" id="id31">JIT Support</a><ul>
+<li><a class="reference internal" href="#machine-code-emitter" id="id32">Machine Code Emitter</a></li>
+<li><a class="reference internal" href="#target-jit-info" id="id33">Target JIT Info</a></li>
+</ul>
+</li>
+</ul>
+</div>
+<div class="section" id="introduction">
+<h2><a class="toc-backref" href="#id3">Introduction</a><a class="headerlink" href="#introduction" title="Permalink to this headline">¶</a></h2>
+<p>This document describes techniques for writing compiler backends that convert
+the LLVM Intermediate Representation (IR) to code for a specified machine or
+other languages.  Code intended for a specific machine can take the form of
+either assembly code or binary code (usable for a JIT compiler).</p>
+<p>The backend of LLVM features a target-independent code generator that may
+create output for several types of target CPUs — including X86, PowerPC,
+ARM, and SPARC.  The backend may also be used to generate code targeted at SPUs
+of the Cell processor or GPUs to support the execution of compute kernels.</p>
+<p>The document focuses on existing examples found in subdirectories of
+<code class="docutils literal"><span class="pre">llvm/lib/Target</span></code> in a downloaded LLVM release.  In particular, this document
+focuses on the example of creating a static compiler (one that emits text
+assembly) for a SPARC target, because SPARC has fairly standard
+characteristics, such as a RISC instruction set and straightforward calling
+conventions.</p>
+<div class="section" id="audience">
+<h3><a class="toc-backref" href="#id4">Audience</a><a class="headerlink" href="#audience" title="Permalink to this headline">¶</a></h3>
+<p>The audience for this document is anyone who needs to write an LLVM backend to
+generate code for a specific hardware or software target.</p>
+</div>
+<div class="section" id="prerequisite-reading">
+<h3><a class="toc-backref" href="#id5">Prerequisite Reading</a><a class="headerlink" href="#prerequisite-reading" title="Permalink to this headline">¶</a></h3>
+<p>These essential documents must be read before reading this document:</p>
+<ul class="simple">
+<li><a class="reference external" href="LangRef.html">LLVM Language Reference Manual</a> — a reference manual for
+the LLVM assembly language.</li>
+<li><a class="reference internal" href="CodeGenerator.html"><span class="doc">The LLVM Target-Independent Code Generator</span></a> — a guide to the components (classes and code
+generation algorithms) for translating the LLVM internal representation into
+machine code for a specified target.  Pay particular attention to the
+descriptions of code generation stages: Instruction Selection, Scheduling and
+Formation, SSA-based Optimization, Register Allocation, Prolog/Epilog Code
+Insertion, Late Machine Code Optimizations, and Code Emission.</li>
+<li><a class="reference internal" href="TableGen/index.html"><span class="doc">TableGen</span></a> — a document that describes the TableGen
+(<code class="docutils literal"><span class="pre">tblgen</span></code>) application that manages domain-specific information to support
+LLVM code generation.  TableGen processes input from a target description
+file (<code class="docutils literal"><span class="pre">.td</span></code> suffix) and generates C++ code that can be used for code
+generation.</li>
+<li><a class="reference internal" href="WritingAnLLVMPass.html"><span class="doc">Writing an LLVM Pass</span></a> — The assembly printer is a <code class="docutils literal"><span class="pre">FunctionPass</span></code>, as
+are several <code class="docutils literal"><span class="pre">SelectionDAG</span></code> processing steps.</li>
+</ul>
+<p>To follow the SPARC examples in this document, have a copy of <a class="reference external" href="http://www.sparc.org/standards/V8.pdf">The SPARC
+Architecture Manual, Version 8</a> for
+reference.  For details about the ARM instruction set, refer to the <a class="reference external" href="http://infocenter.arm.com/">ARM
+Architecture Reference Manual</a>.  For more about
+the GNU Assembler format (<code class="docutils literal"><span class="pre">GAS</span></code>), see <a class="reference external" href="http://sourceware.org/binutils/docs/as/index.html">Using As</a>, especially for the
+assembly printer.  “Using As” contains a list of target machine dependent
+features.</p>
+</div>
+<div class="section" id="basic-steps">
+<h3><a class="toc-backref" href="#id6">Basic Steps</a><a class="headerlink" href="#basic-steps" title="Permalink to this headline">¶</a></h3>
+<p>To write a compiler backend for LLVM that converts the LLVM IR to code for a
+specified target (machine or other language), follow these steps:</p>
+<ul class="simple">
+<li>Create a subclass of the <code class="docutils literal"><span class="pre">TargetMachine</span></code> class that describes
+characteristics of your target machine.  Copy existing examples of specific
+<code class="docutils literal"><span class="pre">TargetMachine</span></code> class and header files; for example, start with
+<code class="docutils literal"><span class="pre">SparcTargetMachine.cpp</span></code> and <code class="docutils literal"><span class="pre">SparcTargetMachine.h</span></code>, but change the file
+names for your target.  Similarly, change code that references “<code class="docutils literal"><span class="pre">Sparc</span></code>” to
+reference your target.</li>
+<li>Describe the register set of the target.  Use TableGen to generate code for
+register definition, register aliases, and register classes from a
+target-specific <code class="docutils literal"><span class="pre">RegisterInfo.td</span></code> input file.  You should also write
+additional code for a subclass of the <code class="docutils literal"><span class="pre">TargetRegisterInfo</span></code> class that
+represents the class register file data used for register allocation and also
+describes the interactions between registers.</li>
+<li>Describe the instruction set of the target.  Use TableGen to generate code
+for target-specific instructions from target-specific versions of
+<code class="docutils literal"><span class="pre">TargetInstrFormats.td</span></code> and <code class="docutils literal"><span class="pre">TargetInstrInfo.td</span></code>.  You should write
+additional code for a subclass of the <code class="docutils literal"><span class="pre">TargetInstrInfo</span></code> class to represent
+machine instructions supported by the target machine.</li>
+<li>Describe the selection and conversion of the LLVM IR from a Directed Acyclic
+Graph (DAG) representation of instructions to native target-specific
+instructions.  Use TableGen to generate code that matches patterns and
+selects instructions based on additional information in a target-specific
+version of <code class="docutils literal"><span class="pre">TargetInstrInfo.td</span></code>.  Write code for <code class="docutils literal"><span class="pre">XXXISelDAGToDAG.cpp</span></code>,
+where <code class="docutils literal"><span class="pre">XXX</span></code> identifies the specific target, to perform pattern matching and
+DAG-to-DAG instruction selection.  Also write code in <code class="docutils literal"><span class="pre">XXXISelLowering.cpp</span></code>
+to replace or remove operations and data types that are not supported
+natively in a SelectionDAG.</li>
+<li>Write code for an assembly printer that converts LLVM IR to a GAS format for
+your target machine.  You should add assembly strings to the instructions
+defined in your target-specific version of <code class="docutils literal"><span class="pre">TargetInstrInfo.td</span></code>.  You
+should also write code for a subclass of <code class="docutils literal"><span class="pre">AsmPrinter</span></code> that performs the
+LLVM-to-assembly conversion and a trivial subclass of <code class="docutils literal"><span class="pre">TargetAsmInfo</span></code>.</li>
+<li>Optionally, add support for subtargets (i.e., variants with different
+capabilities).  You should also write code for a subclass of the
+<code class="docutils literal"><span class="pre">TargetSubtarget</span></code> class, which allows you to use the <code class="docutils literal"><span class="pre">-mcpu=</span></code> and
+<code class="docutils literal"><span class="pre">-mattr=</span></code> command-line options.</li>
+<li>Optionally, add JIT support and create a machine code emitter (subclass of
+<code class="docutils literal"><span class="pre">TargetJITInfo</span></code>) that is used to emit binary code directly into memory.</li>
+</ul>
+<p>In the <code class="docutils literal"><span class="pre">.cpp</span></code> and <code class="docutils literal"><span class="pre">.h</span></code>. files, initially stub up these methods and then
+implement them later.  Initially, you may not know which private members that
+the class will need and which components will need to be subclassed.</p>
+</div>
+<div class="section" id="preliminaries">
+<h3><a class="toc-backref" href="#id7">Preliminaries</a><a class="headerlink" href="#preliminaries" title="Permalink to this headline">¶</a></h3>
+<p>To actually create your compiler backend, you need to create and modify a few
+files.  The absolute minimum is discussed here.  But to actually use the LLVM
+target-independent code generator, you must perform the steps described in the
+<a class="reference internal" href="CodeGenerator.html"><span class="doc">LLVM Target-Independent Code Generator</span></a> document.</p>
+<p>First, you should create a subdirectory under <code class="docutils literal"><span class="pre">lib/Target</span></code> to hold all the
+files related to your target.  If your target is called “Dummy”, create the
+directory <code class="docutils literal"><span class="pre">lib/Target/Dummy</span></code>.</p>
+<p>In this new directory, create a <code class="docutils literal"><span class="pre">CMakeLists.txt</span></code>.  It is easiest to copy a
+<code class="docutils literal"><span class="pre">CMakeLists.txt</span></code> of another target and modify it.  It should at least contain
+the <code class="docutils literal"><span class="pre">LLVM_TARGET_DEFINITIONS</span></code> variable. The library can be named <code class="docutils literal"><span class="pre">LLVMDummy</span></code>
+(for example, see the MIPS target).  Alternatively, you can split the library
+into <code class="docutils literal"><span class="pre">LLVMDummyCodeGen</span></code> and <code class="docutils literal"><span class="pre">LLVMDummyAsmPrinter</span></code>, the latter of which
+should be implemented in a subdirectory below <code class="docutils literal"><span class="pre">lib/Target/Dummy</span></code> (for example,
+see the PowerPC target).</p>
+<p>Note that these two naming schemes are hardcoded into <code class="docutils literal"><span class="pre">llvm-config</span></code>.  Using
+any other naming scheme will confuse <code class="docutils literal"><span class="pre">llvm-config</span></code> and produce a lot of
+(seemingly unrelated) linker errors when linking <code class="docutils literal"><span class="pre">llc</span></code>.</p>
+<p>To make your target actually do something, you need to implement a subclass of
+<code class="docutils literal"><span class="pre">TargetMachine</span></code>.  This implementation should typically be in the file
+<code class="docutils literal"><span class="pre">lib/Target/DummyTargetMachine.cpp</span></code>, but any file in the <code class="docutils literal"><span class="pre">lib/Target</span></code>
+directory will be built and should work.  To use LLVM’s target independent code
+generator, you should do what all current machine backends do: create a
+subclass of <code class="docutils literal"><span class="pre">LLVMTargetMachine</span></code>.  (To create a target from scratch, create a
+subclass of <code class="docutils literal"><span class="pre">TargetMachine</span></code>.)</p>
+<p>To get LLVM to actually build and link your target, you need to run <code class="docutils literal"><span class="pre">cmake</span></code>
+with <code class="docutils literal"><span class="pre">-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=Dummy</span></code>. This will build your
+target without needing to add it to the list of all the targets.</p>
+<p>Once your target is stable, you can add it to the <code class="docutils literal"><span class="pre">LLVM_ALL_TARGETS</span></code> variable
+located in the main <code class="docutils literal"><span class="pre">CMakeLists.txt</span></code>.</p>
+</div>
+</div>
+<div class="section" id="target-machine">
+<h2><a class="toc-backref" href="#id8">Target Machine</a><a class="headerlink" href="#target-machine" title="Permalink to this headline">¶</a></h2>
+<p><code class="docutils literal"><span class="pre">LLVMTargetMachine</span></code> is designed as a base class for targets implemented with
+the LLVM target-independent code generator.  The <code class="docutils literal"><span class="pre">LLVMTargetMachine</span></code> class
+should be specialized by a concrete target class that implements the various
+virtual methods.  <code class="docutils literal"><span class="pre">LLVMTargetMachine</span></code> is defined as a subclass of
+<code class="docutils literal"><span class="pre">TargetMachine</span></code> in <code class="docutils literal"><span class="pre">include/llvm/Target/TargetMachine.h</span></code>.  The
+<code class="docutils literal"><span class="pre">TargetMachine</span></code> class implementation (<code class="docutils literal"><span class="pre">TargetMachine.cpp</span></code>) also processes
+numerous command-line options.</p>
+<p>To create a concrete target-specific subclass of <code class="docutils literal"><span class="pre">LLVMTargetMachine</span></code>, start
+by copying an existing <code class="docutils literal"><span class="pre">TargetMachine</span></code> class and header.  You should name the
+files that you create to reflect your specific target.  For instance, for the
+SPARC target, name the files <code class="docutils literal"><span class="pre">SparcTargetMachine.h</span></code> and
+<code class="docutils literal"><span class="pre">SparcTargetMachine.cpp</span></code>.</p>
+<p>For a target machine <code class="docutils literal"><span class="pre">XXX</span></code>, the implementation of <code class="docutils literal"><span class="pre">XXXTargetMachine</span></code> must
+have access methods to obtain objects that represent target components.  These
+methods are named <code class="docutils literal"><span class="pre">get*Info</span></code>, and are intended to obtain the instruction set
+(<code class="docutils literal"><span class="pre">getInstrInfo</span></code>), register set (<code class="docutils literal"><span class="pre">getRegisterInfo</span></code>), stack frame layout
+(<code class="docutils literal"><span class="pre">getFrameInfo</span></code>), and similar information.  <code class="docutils literal"><span class="pre">XXXTargetMachine</span></code> must also
+implement the <code class="docutils literal"><span class="pre">getDataLayout</span></code> method to access an object with target-specific
+data characteristics, such as data type size and alignment requirements.</p>
+<p>For instance, for the SPARC target, the header file <code class="docutils literal"><span class="pre">SparcTargetMachine.h</span></code>
+declares prototypes for several <code class="docutils literal"><span class="pre">get*Info</span></code> and <code class="docutils literal"><span class="pre">getDataLayout</span></code> methods that
+simply return a class member.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">namespace</span> <span class="n">llvm</span> <span class="p">{</span>
+
+<span class="k">class</span> <span class="nc">Module</span><span class="p">;</span>
+
+<span class="k">class</span> <span class="nc">SparcTargetMachine</span> <span class="o">:</span> <span class="k">public</span> <span class="n">LLVMTargetMachine</span> <span class="p">{</span>
+  <span class="k">const</span> <span class="n">DataLayout</span> <span class="n">DataLayout</span><span class="p">;</span>       <span class="c1">// Calculates type size & alignment</span>
+  <span class="n">SparcSubtarget</span> <span class="n">Subtarget</span><span class="p">;</span>
+  <span class="n">SparcInstrInfo</span> <span class="n">InstrInfo</span><span class="p">;</span>
+  <span class="n">TargetFrameInfo</span> <span class="n">FrameInfo</span><span class="p">;</span>
+
+<span class="k">protected</span><span class="o">:</span>
+  <span class="k">virtual</span> <span class="k">const</span> <span class="n">TargetAsmInfo</span> <span class="o">*</span><span class="n">createTargetAsmInfo</span><span class="p">()</span> <span class="k">const</span><span class="p">;</span>
+
+<span class="k">public</span><span class="o">:</span>
+  <span class="n">SparcTargetMachine</span><span class="p">(</span><span class="k">const</span> <span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">,</span> <span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="o">&</span><span class="n">FS</span><span class="p">);</span>
+
+  <span class="k">virtual</span> <span class="k">const</span> <span class="n">SparcInstrInfo</span> <span class="o">*</span><span class="nf">getInstrInfo</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span><span class="k">return</span> <span class="o">&</span><span class="n">InstrInfo</span><span class="p">;</span> <span class="p">}</span>
+  <span class="k">virtual</span> <span class="k">const</span> <span class="n">TargetFrameInfo</span> <span class="o">*</span><span class="nf">getFrameInfo</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span><span class="k">return</span> <span class="o">&</span><span class="n">FrameInfo</span><span class="p">;</span> <span class="p">}</span>
+  <span class="k">virtual</span> <span class="k">const</span> <span class="n">TargetSubtarget</span> <span class="o">*</span><span class="nf">getSubtargetImpl</span><span class="p">()</span> <span class="k">const</span><span class="p">{</span><span class="k">return</span> <span class="o">&</span><span class="n">Subtarget</span><span class="p">;</span> <span class="p">}</span>
+  <span class="k">virtual</span> <span class="k">const</span> <span class="n">TargetRegisterInfo</span> <span class="o">*</span><span class="nf">getRegisterInfo</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span>
+    <span class="k">return</span> <span class="o">&</span><span class="n">InstrInfo</span><span class="p">.</span><span class="n">getRegisterInfo</span><span class="p">();</span>
+  <span class="p">}</span>
+  <span class="k">virtual</span> <span class="k">const</span> <span class="n">DataLayout</span> <span class="o">*</span><span class="nf">getDataLayout</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span> <span class="k">return</span> <span class="o">&</span><span class="n">DataLayout</span><span class="p">;</span> <span class="p">}</span>
+  <span class="k">static</span> <span class="kt">unsigned</span> <span class="nf">getModuleMatchQuality</span><span class="p">(</span><span class="k">const</span> <span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">);</span>
+
+  <span class="c1">// Pass Pipeline Configuration</span>
+  <span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">addInstSelector</span><span class="p">(</span><span class="n">PassManagerBase</span> <span class="o">&</span><span class="n">PM</span><span class="p">,</span> <span class="kt">bool</span> <span class="n">Fast</span><span class="p">);</span>
+  <span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">addPreEmitPass</span><span class="p">(</span><span class="n">PassManagerBase</span> <span class="o">&</span><span class="n">PM</span><span class="p">,</span> <span class="kt">bool</span> <span class="n">Fast</span><span class="p">);</span>
+<span class="p">};</span>
+
+<span class="p">}</span> <span class="c1">// end namespace llvm</span>
+</pre></div>
+</div>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">getInstrInfo()</span></code></li>
+<li><code class="docutils literal"><span class="pre">getRegisterInfo()</span></code></li>
+<li><code class="docutils literal"><span class="pre">getFrameInfo()</span></code></li>
+<li><code class="docutils literal"><span class="pre">getDataLayout()</span></code></li>
+<li><code class="docutils literal"><span class="pre">getSubtargetImpl()</span></code></li>
+</ul>
+<p>For some targets, you also need to support the following methods:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">getTargetLowering()</span></code></li>
+<li><code class="docutils literal"><span class="pre">getJITInfo()</span></code></li>
+</ul>
+<p>Some architectures, such as GPUs, do not support jumping to an arbitrary
+program location and implement branching using masked execution and loop using
+special instructions around the loop body. In order to avoid CFG modifications
+that introduce irreducible control flow not handled by such hardware, a target
+must call <cite>setRequiresStructuredCFG(true)</cite> when being initialized.</p>
+<p>In addition, the <code class="docutils literal"><span class="pre">XXXTargetMachine</span></code> constructor should specify a
+<code class="docutils literal"><span class="pre">TargetDescription</span></code> string that determines the data layout for the target
+machine, including characteristics such as pointer size, alignment, and
+endianness.  For example, the constructor for <code class="docutils literal"><span class="pre">SparcTargetMachine</span></code> contains
+the following:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">SparcTargetMachine</span><span class="o">::</span><span class="n">SparcTargetMachine</span><span class="p">(</span><span class="k">const</span> <span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">,</span> <span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="o">&</span><span class="n">FS</span><span class="p">)</span>
+  <span class="o">:</span> <span class="n">DataLayout</span><span class="p">(</span><span class="s">"E-p:32:32-f128:128:128"</span><span class="p">),</span>
+    <span class="n">Subtarget</span><span class="p">(</span><span class="n">M</span><span class="p">,</span> <span class="n">FS</span><span class="p">),</span> <span class="n">InstrInfo</span><span class="p">(</span><span class="n">Subtarget</span><span class="p">),</span>
+    <span class="n">FrameInfo</span><span class="p">(</span><span class="n">TargetFrameInfo</span><span class="o">::</span><span class="n">StackGrowsDown</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>Hyphens separate portions of the <code class="docutils literal"><span class="pre">TargetDescription</span></code> string.</p>
+<ul class="simple">
+<li>An upper-case “<code class="docutils literal"><span class="pre">E</span></code>” in the string indicates a big-endian target data model.
+A lower-case “<code class="docutils literal"><span class="pre">e</span></code>” indicates little-endian.</li>
+<li>“<code class="docutils literal"><span class="pre">p:</span></code>” is followed by pointer information: size, ABI alignment, and
+preferred alignment.  If only two figures follow “<code class="docutils literal"><span class="pre">p:</span></code>”, then the first
+value is pointer size, and the second value is both ABI and preferred
+alignment.</li>
+<li>Then a letter for numeric type alignment: “<code class="docutils literal"><span class="pre">i</span></code>”, “<code class="docutils literal"><span class="pre">f</span></code>”, “<code class="docutils literal"><span class="pre">v</span></code>”, or
+“<code class="docutils literal"><span class="pre">a</span></code>” (corresponding to integer, floating point, vector, or aggregate).
+“<code class="docutils literal"><span class="pre">i</span></code>”, “<code class="docutils literal"><span class="pre">v</span></code>”, or “<code class="docutils literal"><span class="pre">a</span></code>” are followed by ABI alignment and preferred
+alignment. “<code class="docutils literal"><span class="pre">f</span></code>” is followed by three values: the first indicates the size
+of a long double, then ABI alignment, and then ABI preferred alignment.</li>
+</ul>
+</div>
+<div class="section" id="target-registration">
+<h2><a class="toc-backref" href="#id9">Target Registration</a><a class="headerlink" href="#target-registration" title="Permalink to this headline">¶</a></h2>
+<p>You must also register your target with the <code class="docutils literal"><span class="pre">TargetRegistry</span></code>, which is what
+other LLVM tools use to be able to lookup and use your target at runtime.  The
+<code class="docutils literal"><span class="pre">TargetRegistry</span></code> can be used directly, but for most targets there are helper
+templates which should take care of the work for you.</p>
+<p>All targets should declare a global <code class="docutils literal"><span class="pre">Target</span></code> object which is used to
+represent the target during registration.  Then, in the target’s <code class="docutils literal"><span class="pre">TargetInfo</span></code>
+library, the target should define that object and use the <code class="docutils literal"><span class="pre">RegisterTarget</span></code>
+template to register the target.  For example, the Sparc registration code
+looks like this:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">Target</span> <span class="n">llvm</span><span class="o">::</span><span class="n">getTheSparcTarget</span><span class="p">();</span>
+
+<span class="k">extern</span> <span class="s">"C"</span> <span class="kt">void</span> <span class="n">LLVMInitializeSparcTargetInfo</span><span class="p">()</span> <span class="p">{</span>
+  <span class="n">RegisterTarget</span><span class="o"><</span><span class="n">Triple</span><span class="o">::</span><span class="n">sparc</span><span class="p">,</span> <span class="cm">/*HasJIT=*/</span><span class="nb">false</span><span class="o">></span>
+    <span class="n">X</span><span class="p">(</span><span class="n">getTheSparcTarget</span><span class="p">(),</span> <span class="s">"sparc"</span><span class="p">,</span> <span class="s">"Sparc"</span><span class="p">);</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>This allows the <code class="docutils literal"><span class="pre">TargetRegistry</span></code> to look up the target by name or by target
+triple.  In addition, most targets will also register additional features which
+are available in separate libraries.  These registration steps are separate,
+because some clients may wish to only link in some parts of the target — the
+JIT code generator does not require the use of the assembler printer, for
+example.  Here is an example of registering the Sparc assembly printer:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">extern</span> <span class="s">"C"</span> <span class="kt">void</span> <span class="n">LLVMInitializeSparcAsmPrinter</span><span class="p">()</span> <span class="p">{</span>
+  <span class="n">RegisterAsmPrinter</span><span class="o"><</span><span class="n">SparcAsmPrinter</span><span class="o">></span> <span class="n">X</span><span class="p">(</span><span class="n">getTheSparcTarget</span><span class="p">());</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>For more information, see “<a class="reference external" href="/doxygen/TargetRegistry_8h-source.html">llvm/Target/TargetRegistry.h</a>”.</p>
+</div>
+<div class="section" id="register-set-and-register-classes">
+<h2><a class="toc-backref" href="#id10">Register Set and Register Classes</a><a class="headerlink" href="#register-set-and-register-classes" title="Permalink to this headline">¶</a></h2>
+<p>You should describe a concrete target-specific class that represents the
+register file of a target machine.  This class is called <code class="docutils literal"><span class="pre">XXXRegisterInfo</span></code>
+(where <code class="docutils literal"><span class="pre">XXX</span></code> identifies the target) and represents the class register file
+data that is used for register allocation.  It also describes the interactions
+between registers.</p>
+<p>You also need to define register classes to categorize related registers.  A
+register class should be added for groups of registers that are all treated the
+same way for some instruction.  Typical examples are register classes for
+integer, floating-point, or vector registers.  A register allocator allows an
+instruction to use any register in a specified register class to perform the
+instruction in a similar manner.  Register classes allocate virtual registers
+to instructions from these sets, and register classes let the
+target-independent register allocator automatically choose the actual
+registers.</p>
+<p>Much of the code for registers, including register definition, register
+aliases, and register classes, is generated by TableGen from
+<code class="docutils literal"><span class="pre">XXXRegisterInfo.td</span></code> input files and placed in <code class="docutils literal"><span class="pre">XXXGenRegisterInfo.h.inc</span></code>
+and <code class="docutils literal"><span class="pre">XXXGenRegisterInfo.inc</span></code> output files.  Some of the code in the
+implementation of <code class="docutils literal"><span class="pre">XXXRegisterInfo</span></code> requires hand-coding.</p>
+<div class="section" id="defining-a-register">
+<h3><a class="toc-backref" href="#id11">Defining a Register</a><a class="headerlink" href="#defining-a-register" title="Permalink to this headline">¶</a></h3>
+<p>The <code class="docutils literal"><span class="pre">XXXRegisterInfo.td</span></code> file typically starts with register definitions for
+a target machine.  The <code class="docutils literal"><span class="pre">Register</span></code> class (specified in <code class="docutils literal"><span class="pre">Target.td</span></code>) is used
+to define an object for each register.  The specified string <code class="docutils literal"><span class="pre">n</span></code> becomes the
+<code class="docutils literal"><span class="pre">Name</span></code> of the register.  The basic <code class="docutils literal"><span class="pre">Register</span></code> object does not have any
+subregisters and does not specify any aliases.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class Register<string n> {
+  string Namespace = "";
+  string AsmName = n;
+  string Name = n;
+  int SpillSize = 0;
+  int SpillAlignment = 0;
+  list<Register> Aliases = [];
+  list<Register> SubRegs = [];
+  list<int> DwarfNumbers = [];
+}
+</pre></div>
+</div>
+<p>For example, in the <code class="docutils literal"><span class="pre">X86RegisterInfo.td</span></code> file, there are register definitions
+that utilize the <code class="docutils literal"><span class="pre">Register</span></code> class, such as:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def AL : Register<"AL">, DwarfRegNum<[0, 0, 0]>;
+</pre></div>
+</div>
+<p>This defines the register <code class="docutils literal"><span class="pre">AL</span></code> and assigns it values (with <code class="docutils literal"><span class="pre">DwarfRegNum</span></code>)
+that are used by <code class="docutils literal"><span class="pre">gcc</span></code>, <code class="docutils literal"><span class="pre">gdb</span></code>, or a debug information writer to identify a
+register.  For register <code class="docutils literal"><span class="pre">AL</span></code>, <code class="docutils literal"><span class="pre">DwarfRegNum</span></code> takes an array of 3 values
+representing 3 different modes: the first element is for X86-64, the second for
+exception handling (EH) on X86-32, and the third is generic. -1 is a special
+Dwarf number that indicates the gcc number is undefined, and -2 indicates the
+register number is invalid for this mode.</p>
+<p>From the previously described line in the <code class="docutils literal"><span class="pre">X86RegisterInfo.td</span></code> file, TableGen
+generates this code in the <code class="docutils literal"><span class="pre">X86GenRegisterInfo.inc</span></code> file:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">static</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="n">GR8</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="n">X86</span><span class="o">::</span><span class="n">AL</span><span class="p">,</span> <span class="p">...</span> <span class="p">};</span>
+
+<span class="k">const</span> <span class="kt">unsigned</span> <span class="n">AL_AliasSet</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="n">X86</span><span class="o">::</span><span class="n">AX</span><span class="p">,</span> <span class="n">X86</span><span class="o">::</span><span class="n">EAX</span><span class="p">,</span> <span class="n">X86</span><span class="o">::</span><span class="n">RAX</span><span class="p">,</span> <span class="mi">0</span> <span class="p">};</span>
+
+<span class="k">const</span> <span class="n">TargetRegisterDesc</span> <span class="n">RegisterDescriptors</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+  <span class="p">...</span>
+<span class="p">{</span> <span class="s">"AL"</span><span class="p">,</span> <span class="s">"AL"</span><span class="p">,</span> <span class="n">AL_AliasSet</span><span class="p">,</span> <span class="n">Empty_SubRegsSet</span><span class="p">,</span> <span class="n">Empty_SubRegsSet</span><span class="p">,</span> <span class="n">AL_SuperRegsSet</span> <span class="p">},</span> <span class="p">...</span>
+</pre></div>
+</div>
+<p>From the register info file, TableGen generates a <code class="docutils literal"><span class="pre">TargetRegisterDesc</span></code> object
+for each register.  <code class="docutils literal"><span class="pre">TargetRegisterDesc</span></code> is defined in
+<code class="docutils literal"><span class="pre">include/llvm/Target/TargetRegisterInfo.h</span></code> with the following fields:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">struct</span> <span class="n">TargetRegisterDesc</span> <span class="p">{</span>
+  <span class="k">const</span> <span class="kt">char</span>     <span class="o">*</span><span class="n">AsmName</span><span class="p">;</span>      <span class="c1">// Assembly language name for the register</span>
+  <span class="k">const</span> <span class="kt">char</span>     <span class="o">*</span><span class="n">Name</span><span class="p">;</span>         <span class="c1">// Printable name for the reg (for debugging)</span>
+  <span class="k">const</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">AliasSet</span><span class="p">;</span>     <span class="c1">// Register Alias Set</span>
+  <span class="k">const</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">SubRegs</span><span class="p">;</span>      <span class="c1">// Sub-register set</span>
+  <span class="k">const</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">ImmSubRegs</span><span class="p">;</span>   <span class="c1">// Immediate sub-register set</span>
+  <span class="k">const</span> <span class="kt">unsigned</span> <span class="o">*</span><span class="n">SuperRegs</span><span class="p">;</span>    <span class="c1">// Super-register set</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>TableGen uses the entire target description file (<code class="docutils literal"><span class="pre">.td</span></code>) to determine text
+names for the register (in the <code class="docutils literal"><span class="pre">AsmName</span></code> and <code class="docutils literal"><span class="pre">Name</span></code> fields of
+<code class="docutils literal"><span class="pre">TargetRegisterDesc</span></code>) and the relationships of other registers to the defined
+register (in the other <code class="docutils literal"><span class="pre">TargetRegisterDesc</span></code> fields).  In this example, other
+definitions establish the registers “<code class="docutils literal"><span class="pre">AX</span></code>”, “<code class="docutils literal"><span class="pre">EAX</span></code>”, and “<code class="docutils literal"><span class="pre">RAX</span></code>” as
+aliases for one another, so TableGen generates a null-terminated array
+(<code class="docutils literal"><span class="pre">AL_AliasSet</span></code>) for this register alias set.</p>
+<p>The <code class="docutils literal"><span class="pre">Register</span></code> class is commonly used as a base class for more complex
+classes.  In <code class="docutils literal"><span class="pre">Target.td</span></code>, the <code class="docutils literal"><span class="pre">Register</span></code> class is the base for the
+<code class="docutils literal"><span class="pre">RegisterWithSubRegs</span></code> class that is used to define registers that need to
+specify subregisters in the <code class="docutils literal"><span class="pre">SubRegs</span></code> list, as shown here:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class RegisterWithSubRegs<string n, list<Register> subregs> : Register<n> {
+  let SubRegs = subregs;
+}
+</pre></div>
+</div>
+<p>In <code class="docutils literal"><span class="pre">SparcRegisterInfo.td</span></code>, additional register classes are defined for SPARC:
+a <code class="docutils literal"><span class="pre">Register</span></code> subclass, <code class="docutils literal"><span class="pre">SparcReg</span></code>, and further subclasses: <code class="docutils literal"><span class="pre">Ri</span></code>, <code class="docutils literal"><span class="pre">Rf</span></code>,
+and <code class="docutils literal"><span class="pre">Rd</span></code>.  SPARC registers are identified by 5-bit ID numbers, which is a
+feature common to these subclasses.  Note the use of “<code class="docutils literal"><span class="pre">let</span></code>” expressions to
+override values that are initially defined in a superclass (such as <code class="docutils literal"><span class="pre">SubRegs</span></code>
+field in the <code class="docutils literal"><span class="pre">Rd</span></code> class).</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class SparcReg<string n> : Register<n> {
+  field bits<5> Num;
+  let Namespace = "SP";
+}
+// Ri - 32-bit integer registers
+class Ri<bits<5> num, string n> :
+SparcReg<n> {
+  let Num = num;
+}
+// Rf - 32-bit floating-point registers
+class Rf<bits<5> num, string n> :
+SparcReg<n> {
+  let Num = num;
+}
+// Rd - Slots in the FP register file for 64-bit floating-point values.
+class Rd<bits<5> num, string n, list<Register> subregs> : SparcReg<n> {
+  let Num = num;
+  let SubRegs = subregs;
+}
+</pre></div>
+</div>
+<p>In the <code class="docutils literal"><span class="pre">SparcRegisterInfo.td</span></code> file, there are register definitions that
+utilize these subclasses of <code class="docutils literal"><span class="pre">Register</span></code>, such as:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def G0 : Ri< 0, "G0">, DwarfRegNum<[0]>;
+def G1 : Ri< 1, "G1">, DwarfRegNum<[1]>;
+...
+def F0 : Rf< 0, "F0">, DwarfRegNum<[32]>;
+def F1 : Rf< 1, "F1">, DwarfRegNum<[33]>;
+...
+def D0 : Rd< 0, "F0", [F0, F1]>, DwarfRegNum<[32]>;
+def D1 : Rd< 2, "F2", [F2, F3]>, DwarfRegNum<[34]>;
+</pre></div>
+</div>
+<p>The last two registers shown above (<code class="docutils literal"><span class="pre">D0</span></code> and <code class="docutils literal"><span class="pre">D1</span></code>) are double-precision
+floating-point registers that are aliases for pairs of single-precision
+floating-point sub-registers.  In addition to aliases, the sub-register and
+super-register relationships of the defined register are in fields of a
+register’s <code class="docutils literal"><span class="pre">TargetRegisterDesc</span></code>.</p>
+</div>
+<div class="section" id="defining-a-register-class">
+<h3><a class="toc-backref" href="#id12">Defining a Register Class</a><a class="headerlink" href="#defining-a-register-class" title="Permalink to this headline">¶</a></h3>
+<p>The <code class="docutils literal"><span class="pre">RegisterClass</span></code> class (specified in <code class="docutils literal"><span class="pre">Target.td</span></code>) is used to define an
+object that represents a group of related registers and also defines the
+default allocation order of the registers.  A target description file
+<code class="docutils literal"><span class="pre">XXXRegisterInfo.td</span></code> that uses <code class="docutils literal"><span class="pre">Target.td</span></code> can construct register classes
+using the following class:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class RegisterClass<string namespace,
+list<ValueType> regTypes, int alignment, dag regList> {
+  string Namespace = namespace;
+  list<ValueType> RegTypes = regTypes;
+  int Size = 0;  // spill size, in bits; zero lets tblgen pick the size
+  int Alignment = alignment;
+
+  // CopyCost is the cost of copying a value between two registers
+  // default value 1 means a single instruction
+  // A negative value means copying is extremely expensive or impossible
+  int CopyCost = 1;
+  dag MemberList = regList;
+
+  // for register classes that are subregisters of this class
+  list<RegisterClass> SubRegClassList = [];
+
+  code MethodProtos = [{}];  // to insert arbitrary code
+  code MethodBodies = [{}];
+}
+</pre></div>
+</div>
+<p>To define a <code class="docutils literal"><span class="pre">RegisterClass</span></code>, use the following 4 arguments:</p>
+<ul class="simple">
+<li>The first argument of the definition is the name of the namespace.</li>
+<li>The second argument is a list of <code class="docutils literal"><span class="pre">ValueType</span></code> register type values that are
+defined in <code class="docutils literal"><span class="pre">include/llvm/CodeGen/ValueTypes.td</span></code>.  Defined values include
+integer types (such as <code class="docutils literal"><span class="pre">i16</span></code>, <code class="docutils literal"><span class="pre">i32</span></code>, and <code class="docutils literal"><span class="pre">i1</span></code> for Boolean),
+floating-point types (<code class="docutils literal"><span class="pre">f32</span></code>, <code class="docutils literal"><span class="pre">f64</span></code>), and vector types (for example,
+<code class="docutils literal"><span class="pre">v8i16</span></code> for an <code class="docutils literal"><span class="pre">8</span> <span class="pre">x</span> <span class="pre">i16</span></code> vector).  All registers in a <code class="docutils literal"><span class="pre">RegisterClass</span></code>
+must have the same <code class="docutils literal"><span class="pre">ValueType</span></code>, but some registers may store vector data in
+different configurations.  For example a register that can process a 128-bit
+vector may be able to handle 16 8-bit integer elements, 8 16-bit integers, 4
+32-bit integers, and so on.</li>
+<li>The third argument of the <code class="docutils literal"><span class="pre">RegisterClass</span></code> definition specifies the
+alignment required of the registers when they are stored or loaded to
+memory.</li>
+<li>The final argument, <code class="docutils literal"><span class="pre">regList</span></code>, specifies which registers are in this class.
+If an alternative allocation order method is not specified, then <code class="docutils literal"><span class="pre">regList</span></code>
+also defines the order of allocation used by the register allocator.  Besides
+simply listing registers with <code class="docutils literal"><span class="pre">(add</span> <span class="pre">R0,</span> <span class="pre">R1,</span> <span class="pre">...)</span></code>, more advanced set
+operators are available.  See <code class="docutils literal"><span class="pre">include/llvm/Target/Target.td</span></code> for more
+information.</li>
+</ul>
+<p>In <code class="docutils literal"><span class="pre">SparcRegisterInfo.td</span></code>, three <code class="docutils literal"><span class="pre">RegisterClass</span></code> objects are defined:
+<code class="docutils literal"><span class="pre">FPRegs</span></code>, <code class="docutils literal"><span class="pre">DFPRegs</span></code>, and <code class="docutils literal"><span class="pre">IntRegs</span></code>.  For all three register classes, the
+first argument defines the namespace with the string “<code class="docutils literal"><span class="pre">SP</span></code>”.  <code class="docutils literal"><span class="pre">FPRegs</span></code>
+defines a group of 32 single-precision floating-point registers (<code class="docutils literal"><span class="pre">F0</span></code> to
+<code class="docutils literal"><span class="pre">F31</span></code>); <code class="docutils literal"><span class="pre">DFPRegs</span></code> defines a group of 16 double-precision registers
+(<code class="docutils literal"><span class="pre">D0-D15</span></code>).</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>// F0, F1, F2, ..., F31
+def FPRegs : RegisterClass<"SP", [f32], 32, (sequence "F%u", 0, 31)>;
+
+def DFPRegs : RegisterClass<"SP", [f64], 64,
+                            (add D0, D1, D2, D3, D4, D5, D6, D7, D8,
+                                 D9, D10, D11, D12, D13, D14, D15)>;
+
+def IntRegs : RegisterClass<"SP", [i32], 32,
+    (add L0, L1, L2, L3, L4, L5, L6, L7,
+         I0, I1, I2, I3, I4, I5,
+         O0, O1, O2, O3, O4, O5, O7,
+         G1,
+         // Non-allocatable regs:
+         G2, G3, G4,
+         O6,        // stack ptr
+         I6,        // frame ptr
+         I7,        // return address
+         G0,        // constant zero
+         G5, G6, G7 // reserved for kernel
+    )>;
+</pre></div>
+</div>
+<p>Using <code class="docutils literal"><span class="pre">SparcRegisterInfo.td</span></code> with TableGen generates several output files
+that are intended for inclusion in other source code that you write.
+<code class="docutils literal"><span class="pre">SparcRegisterInfo.td</span></code> generates <code class="docutils literal"><span class="pre">SparcGenRegisterInfo.h.inc</span></code>, which should
+be included in the header file for the implementation of the SPARC register
+implementation that you write (<code class="docutils literal"><span class="pre">SparcRegisterInfo.h</span></code>).  In
+<code class="docutils literal"><span class="pre">SparcGenRegisterInfo.h.inc</span></code> a new structure is defined called
+<code class="docutils literal"><span class="pre">SparcGenRegisterInfo</span></code> that uses <code class="docutils literal"><span class="pre">TargetRegisterInfo</span></code> as its base.  It also
+specifies types, based upon the defined register classes: <code class="docutils literal"><span class="pre">DFPRegsClass</span></code>,
+<code class="docutils literal"><span class="pre">FPRegsClass</span></code>, and <code class="docutils literal"><span class="pre">IntRegsClass</span></code>.</p>
+<p><code class="docutils literal"><span class="pre">SparcRegisterInfo.td</span></code> also generates <code class="docutils literal"><span class="pre">SparcGenRegisterInfo.inc</span></code>, which is
+included at the bottom of <code class="docutils literal"><span class="pre">SparcRegisterInfo.cpp</span></code>, the SPARC register
+implementation.  The code below shows only the generated integer registers and
+associated register classes.  The order of registers in <code class="docutils literal"><span class="pre">IntRegs</span></code> reflects
+the order in the definition of <code class="docutils literal"><span class="pre">IntRegs</span></code> in the target description file.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// IntRegs Register Class...</span>
+<span class="k">static</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="n">IntRegs</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+  <span class="n">SP</span><span class="o">::</span><span class="n">L0</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">L1</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">L2</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">L3</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">L4</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">L5</span><span class="p">,</span>
+  <span class="n">SP</span><span class="o">::</span><span class="n">L6</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">L7</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I0</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I1</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I2</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I3</span><span class="p">,</span>
+  <span class="n">SP</span><span class="o">::</span><span class="n">I4</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I5</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O0</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O1</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O2</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O3</span><span class="p">,</span>
+  <span class="n">SP</span><span class="o">::</span><span class="n">O4</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O5</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O7</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">G1</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">G2</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">G3</span><span class="p">,</span>
+  <span class="n">SP</span><span class="o">::</span><span class="n">G4</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">O6</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I6</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">I7</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">G0</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">G5</span><span class="p">,</span>
+  <span class="n">SP</span><span class="o">::</span><span class="n">G6</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">G7</span><span class="p">,</span>
+<span class="p">};</span>
+
+<span class="c1">// IntRegsVTs Register Class Value Types...</span>
+<span class="k">static</span> <span class="k">const</span> <span class="n">MVT</span><span class="o">::</span><span class="n">ValueType</span> <span class="n">IntRegsVTs</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+  <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">Other</span>
+<span class="p">};</span>
+
+<span class="k">namespace</span> <span class="n">SP</span> <span class="p">{</span>   <span class="c1">// Register class instances</span>
+  <span class="n">DFPRegsClass</span>    <span class="n">DFPRegsRegClass</span><span class="p">;</span>
+  <span class="n">FPRegsClass</span>     <span class="n">FPRegsRegClass</span><span class="p">;</span>
+  <span class="n">IntRegsClass</span>    <span class="n">IntRegsRegClass</span><span class="p">;</span>
+<span class="p">...</span>
+  <span class="c1">// IntRegs Sub-register Classes...</span>
+  <span class="k">static</span> <span class="k">const</span> <span class="n">TargetRegisterClass</span><span class="o">*</span> <span class="k">const</span> <span class="n">IntRegsSubRegClasses</span> <span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+    <span class="nb">NULL</span>
+  <span class="p">};</span>
+<span class="p">...</span>
+  <span class="c1">// IntRegs Super-register Classes..</span>
+  <span class="k">static</span> <span class="k">const</span> <span class="n">TargetRegisterClass</span><span class="o">*</span> <span class="k">const</span> <span class="n">IntRegsSuperRegClasses</span> <span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+    <span class="nb">NULL</span>
+  <span class="p">};</span>
+<span class="p">...</span>
+  <span class="c1">// IntRegs Register Class sub-classes...</span>
+  <span class="k">static</span> <span class="k">const</span> <span class="n">TargetRegisterClass</span><span class="o">*</span> <span class="k">const</span> <span class="n">IntRegsSubclasses</span> <span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+    <span class="nb">NULL</span>
+  <span class="p">};</span>
+<span class="p">...</span>
+  <span class="c1">// IntRegs Register Class super-classes...</span>
+  <span class="k">static</span> <span class="k">const</span> <span class="n">TargetRegisterClass</span><span class="o">*</span> <span class="k">const</span> <span class="n">IntRegsSuperclasses</span> <span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
+    <span class="nb">NULL</span>
+  <span class="p">};</span>
+
+  <span class="n">IntRegsClass</span><span class="o">::</span><span class="n">IntRegsClass</span><span class="p">()</span> <span class="o">:</span> <span class="n">TargetRegisterClass</span><span class="p">(</span><span class="n">IntRegsRegClassID</span><span class="p">,</span>
+    <span class="n">IntRegsVTs</span><span class="p">,</span> <span class="n">IntRegsSubclasses</span><span class="p">,</span> <span class="n">IntRegsSuperclasses</span><span class="p">,</span> <span class="n">IntRegsSubRegClasses</span><span class="p">,</span>
+    <span class="n">IntRegsSuperRegClasses</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">IntRegs</span><span class="p">,</span> <span class="n">IntRegs</span> <span class="o">+</span> <span class="mi">32</span><span class="p">)</span> <span class="p">{}</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The register allocators will avoid using reserved registers, and callee saved
+registers are not used until all the volatile registers have been used.  That
+is usually good enough, but in some cases it may be necessary to provide custom
+allocation orders.</p>
+</div>
+<div class="section" id="implement-a-subclass-of-targetregisterinfo">
+<h3><a class="toc-backref" href="#id13">Implement a subclass of <code class="docutils literal"><span class="pre">TargetRegisterInfo</span></code></a><a class="headerlink" href="#implement-a-subclass-of-targetregisterinfo" title="Permalink to this headline">¶</a></h3>
+<p>The final step is to hand code portions of <code class="docutils literal"><span class="pre">XXXRegisterInfo</span></code>, which
+implements the interface described in <code class="docutils literal"><span class="pre">TargetRegisterInfo.h</span></code> (see
+<a class="reference internal" href="CodeGenerator.html#targetregisterinfo"><span class="std std-ref">The TargetRegisterInfo class</span></a>).  These functions return <code class="docutils literal"><span class="pre">0</span></code>, <code class="docutils literal"><span class="pre">NULL</span></code>, or
+<code class="docutils literal"><span class="pre">false</span></code>, unless overridden.  Here is a list of functions that are overridden
+for the SPARC implementation in <code class="docutils literal"><span class="pre">SparcRegisterInfo.cpp</span></code>:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">getCalleeSavedRegs</span></code> — Returns a list of callee-saved registers in the
+order of the desired callee-save stack frame offset.</li>
+<li><code class="docutils literal"><span class="pre">getReservedRegs</span></code> — Returns a bitset indexed by physical register
+numbers, indicating if a particular register is unavailable.</li>
+<li><code class="docutils literal"><span class="pre">hasFP</span></code> — Return a Boolean indicating if a function should have a
+dedicated frame pointer register.</li>
+<li><code class="docutils literal"><span class="pre">eliminateCallFramePseudoInstr</span></code> — If call frame setup or destroy pseudo
+instructions are used, this can be called to eliminate them.</li>
+<li><code class="docutils literal"><span class="pre">eliminateFrameIndex</span></code> — Eliminate abstract frame indices from
+instructions that may use them.</li>
+<li><code class="docutils literal"><span class="pre">emitPrologue</span></code> — Insert prologue code into the function.</li>
+<li><code class="docutils literal"><span class="pre">emitEpilogue</span></code> — Insert epilogue code into the function.</li>
+</ul>
+</div>
+</div>
+<div class="section" id="instruction-set">
+<span id="id1"></span><h2><a class="toc-backref" href="#id14">Instruction Set</a><a class="headerlink" href="#instruction-set" title="Permalink to this headline">¶</a></h2>
+<p>During the early stages of code generation, the LLVM IR code is converted to a
+<code class="docutils literal"><span class="pre">SelectionDAG</span></code> with nodes that are instances of the <code class="docutils literal"><span class="pre">SDNode</span></code> class
+containing target instructions.  An <code class="docutils literal"><span class="pre">SDNode</span></code> has an opcode, operands, type
+requirements, and operation properties.  For example, is an operation
+commutative, does an operation load from memory.  The various operation node
+types are described in the <code class="docutils literal"><span class="pre">include/llvm/CodeGen/SelectionDAGNodes.h</span></code> file
+(values of the <code class="docutils literal"><span class="pre">NodeType</span></code> enum in the <code class="docutils literal"><span class="pre">ISD</span></code> namespace).</p>
+<p>TableGen uses the following target description (<code class="docutils literal"><span class="pre">.td</span></code>) input files to
+generate much of the code for instruction definition:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">Target.td</span></code> — Where the <code class="docutils literal"><span class="pre">Instruction</span></code>, <code class="docutils literal"><span class="pre">Operand</span></code>, <code class="docutils literal"><span class="pre">InstrInfo</span></code>, and
+other fundamental classes are defined.</li>
+<li><code class="docutils literal"><span class="pre">TargetSelectionDAG.td</span></code> — Used by <code class="docutils literal"><span class="pre">SelectionDAG</span></code> instruction selection
+generators, contains <code class="docutils literal"><span class="pre">SDTC*</span></code> classes (selection DAG type constraint),
+definitions of <code class="docutils literal"><span class="pre">SelectionDAG</span></code> nodes (such as <code class="docutils literal"><span class="pre">imm</span></code>, <code class="docutils literal"><span class="pre">cond</span></code>, <code class="docutils literal"><span class="pre">bb</span></code>,
+<code class="docutils literal"><span class="pre">add</span></code>, <code class="docutils literal"><span class="pre">fadd</span></code>, <code class="docutils literal"><span class="pre">sub</span></code>), and pattern support (<code class="docutils literal"><span class="pre">Pattern</span></code>, <code class="docutils literal"><span class="pre">Pat</span></code>,
+<code class="docutils literal"><span class="pre">PatFrag</span></code>, <code class="docutils literal"><span class="pre">PatLeaf</span></code>, <code class="docutils literal"><span class="pre">ComplexPattern</span></code>.</li>
+<li><code class="docutils literal"><span class="pre">XXXInstrFormats.td</span></code> — Patterns for definitions of target-specific
+instructions.</li>
+<li><code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code> — Target-specific definitions of instruction templates,
+condition codes, and instructions of an instruction set.  For architecture
+modifications, a different file name may be used.  For example, for Pentium
+with SSE instruction, this file is <code class="docutils literal"><span class="pre">X86InstrSSE.td</span></code>, and for Pentium with
+MMX, this file is <code class="docutils literal"><span class="pre">X86InstrMMX.td</span></code>.</li>
+</ul>
+<p>There is also a target-specific <code class="docutils literal"><span class="pre">XXX.td</span></code> file, where <code class="docutils literal"><span class="pre">XXX</span></code> is the name of
+the target.  The <code class="docutils literal"><span class="pre">XXX.td</span></code> file includes the other <code class="docutils literal"><span class="pre">.td</span></code> input files, but
+its contents are only directly important for subtargets.</p>
+<p>You should describe a concrete target-specific class <code class="docutils literal"><span class="pre">XXXInstrInfo</span></code> that
+represents machine instructions supported by a target machine.
+<code class="docutils literal"><span class="pre">XXXInstrInfo</span></code> contains an array of <code class="docutils literal"><span class="pre">XXXInstrDescriptor</span></code> objects, each of
+which describes one instruction.  An instruction descriptor defines:</p>
+<ul class="simple">
+<li>Opcode mnemonic</li>
+<li>Number of operands</li>
+<li>List of implicit register definitions and uses</li>
+<li>Target-independent properties (such as memory access, is commutable)</li>
+<li>Target-specific flags</li>
+</ul>
+<p>The Instruction class (defined in <code class="docutils literal"><span class="pre">Target.td</span></code>) is mostly used as a base for
+more complex instruction classes.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class Instruction {
+  string Namespace = "";
+  dag OutOperandList;    // A dag containing the MI def operand list.
+  dag InOperandList;     // A dag containing the MI use operand list.
+  string AsmString = ""; // The .s format to print the instruction with.
+  list<dag> Pattern;     // Set to the DAG pattern for this instruction.
+  list<Register> Uses = [];
+  list<Register> Defs = [];
+  list<Predicate> Predicates = [];  // predicates turned into isel match code
+  ... remainder not shown for space ...
+}
+</pre></div>
+</div>
+<p>A <code class="docutils literal"><span class="pre">SelectionDAG</span></code> node (<code class="docutils literal"><span class="pre">SDNode</span></code>) should contain an object representing a
+target-specific instruction that is defined in <code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code>.  The
+instruction objects should represent instructions from the architecture manual
+of the target machine (such as the SPARC Architecture Manual for the SPARC
+target).</p>
+<p>A single instruction from the architecture manual is often modeled as multiple
+target instructions, depending upon its operands.  For example, a manual might
+describe an add instruction that takes a register or an immediate operand.  An
+LLVM target could model this with two instructions named <code class="docutils literal"><span class="pre">ADDri</span></code> and
+<code class="docutils literal"><span class="pre">ADDrr</span></code>.</p>
+<p>You should define a class for each instruction category and define each opcode
+as a subclass of the category with appropriate parameters such as the fixed
+binary encoding of opcodes and extended opcodes.  You should map the register
+bits to the bits of the instruction in which they are encoded (for the JIT).
+Also you should specify how the instruction should be printed when the
+automatic assembly printer is used.</p>
+<p>As is described in the SPARC Architecture Manual, Version 8, there are three
+major 32-bit formats for instructions.  Format 1 is only for the <code class="docutils literal"><span class="pre">CALL</span></code>
+instruction.  Format 2 is for branch on condition codes and <code class="docutils literal"><span class="pre">SETHI</span></code> (set high
+bits of a register) instructions.  Format 3 is for other instructions.</p>
+<p>Each of these formats has corresponding classes in <code class="docutils literal"><span class="pre">SparcInstrFormat.td</span></code>.
+<code class="docutils literal"><span class="pre">InstSP</span></code> is a base class for other instruction classes.  Additional base
+classes are specified for more precise formats: for example in
+<code class="docutils literal"><span class="pre">SparcInstrFormat.td</span></code>, <code class="docutils literal"><span class="pre">F2_1</span></code> is for <code class="docutils literal"><span class="pre">SETHI</span></code>, and <code class="docutils literal"><span class="pre">F2_2</span></code> is for
+branches.  There are three other base classes: <code class="docutils literal"><span class="pre">F3_1</span></code> for register/register
+operations, <code class="docutils literal"><span class="pre">F3_2</span></code> for register/immediate operations, and <code class="docutils literal"><span class="pre">F3_3</span></code> for
+floating-point operations.  <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code> also adds the base class
+<code class="docutils literal"><span class="pre">Pseudo</span></code> for synthetic SPARC instructions.</p>
+<p><code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code> largely consists of operand and instruction definitions
+for the SPARC target.  In <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code>, the following target
+description file entry, <code class="docutils literal"><span class="pre">LDrr</span></code>, defines the Load Integer instruction for a
+Word (the <code class="docutils literal"><span class="pre">LD</span></code> SPARC opcode) from a memory address to a register.  The first
+parameter, the value 3 (<code class="docutils literal"><span class="pre">11</span></code><sub>2</sub>), is the operation value for this
+category of operation.  The second parameter (<code class="docutils literal"><span class="pre">000000</span></code><sub>2</sub>) is the
+specific operation value for <code class="docutils literal"><span class="pre">LD</span></code>/Load Word.  The third parameter is the
+output destination, which is a register operand and defined in the <code class="docutils literal"><span class="pre">Register</span></code>
+target description file (<code class="docutils literal"><span class="pre">IntRegs</span></code>).</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def LDrr : F3_1 <3, 0b000000, (outs IntRegs:$dst), (ins MEMrr:$addr),
+                 "ld [$addr], $dst",
+                 [(set i32:$dst, (load ADDRrr:$addr))]>;
+</pre></div>
+</div>
+<p>The fourth parameter is the input source, which uses the address operand
+<code class="docutils literal"><span class="pre">MEMrr</span></code> that is defined earlier in <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code>:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def MEMrr : Operand<i32> {
+  let PrintMethod = "printMemOperand";
+  let MIOperandInfo = (ops IntRegs, IntRegs);
+}
+</pre></div>
+</div>
+<p>The fifth parameter is a string that is used by the assembly printer and can be
+left as an empty string until the assembly printer interface is implemented.
+The sixth and final parameter is the pattern used to match the instruction
+during the SelectionDAG Select Phase described in <a class="reference internal" href="CodeGenerator.html"><span class="doc">The LLVM Target-Independent Code Generator</span></a>.
+This parameter is detailed in the next section, <a class="reference internal" href="#instruction-selector"><span class="std std-ref">Instruction Selector</span></a>.</p>
+<p>Instruction class definitions are not overloaded for different operand types,
+so separate versions of instructions are needed for register, memory, or
+immediate value operands.  For example, to perform a Load Integer instruction
+for a Word from an immediate operand to a register, the following instruction
+class is defined:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def LDri : F3_2 <3, 0b000000, (outs IntRegs:$dst), (ins MEMri:$addr),
+                 "ld [$addr], $dst",
+                 [(set i32:$dst, (load ADDRri:$addr))]>;
+</pre></div>
+</div>
+<p>Writing these definitions for so many similar instructions can involve a lot of
+cut and paste.  In <code class="docutils literal"><span class="pre">.td</span></code> files, the <code class="docutils literal"><span class="pre">multiclass</span></code> directive enables the
+creation of templates to define several instruction classes at once (using the
+<code class="docutils literal"><span class="pre">defm</span></code> directive).  For example in <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code>, the <code class="docutils literal"><span class="pre">multiclass</span></code>
+pattern <code class="docutils literal"><span class="pre">F3_12</span></code> is defined to create 2 instruction classes each time
+<code class="docutils literal"><span class="pre">F3_12</span></code> is invoked:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>multiclass F3_12 <string OpcStr, bits<6> Op3Val, SDNode OpNode> {
+  def rr  : F3_1 <2, Op3Val,
+                 (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c),
+                 !strconcat(OpcStr, " $b, $c, $dst"),
+                 [(set i32:$dst, (OpNode i32:$b, i32:$c))]>;
+  def ri  : F3_2 <2, Op3Val,
+                 (outs IntRegs:$dst), (ins IntRegs:$b, i32imm:$c),
+                 !strconcat(OpcStr, " $b, $c, $dst"),
+                 [(set i32:$dst, (OpNode i32:$b, simm13:$c))]>;
+}
+</pre></div>
+</div>
+<p>So when the <code class="docutils literal"><span class="pre">defm</span></code> directive is used for the <code class="docutils literal"><span class="pre">XOR</span></code> and <code class="docutils literal"><span class="pre">ADD</span></code>
+instructions, as seen below, it creates four instruction objects: <code class="docutils literal"><span class="pre">XORrr</span></code>,
+<code class="docutils literal"><span class="pre">XORri</span></code>, <code class="docutils literal"><span class="pre">ADDrr</span></code>, and <code class="docutils literal"><span class="pre">ADDri</span></code>.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>defm XOR   : F3_12<"xor", 0b000011, xor>;
+defm ADD   : F3_12<"add", 0b000000, add>;
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code> also includes definitions for condition codes that are
+referenced by branch instructions.  The following definitions in
+<code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code> indicate the bit location of the SPARC condition code.
+For example, the 10<sup>th</sup> bit represents the “greater than” condition for
+integers, and the 22<sup>nd</sup> bit represents the “greater than” condition for
+floats.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def ICC_NE  : ICC_VAL< 9>;  // Not Equal
+def ICC_E   : ICC_VAL< 1>;  // Equal
+def ICC_G   : ICC_VAL<10>;  // Greater
+...
+def FCC_U   : FCC_VAL<23>;  // Unordered
+def FCC_G   : FCC_VAL<22>;  // Greater
+def FCC_UG  : FCC_VAL<21>;  // Unordered or Greater
+...
+</pre></div>
+</div>
+<p>(Note that <code class="docutils literal"><span class="pre">Sparc.h</span></code> also defines enums that correspond to the same SPARC
+condition codes.  Care must be taken to ensure the values in <code class="docutils literal"><span class="pre">Sparc.h</span></code>
+correspond to the values in <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code>.  I.e., <code class="docutils literal"><span class="pre">SPCC::ICC_NE</span> <span class="pre">=</span> <span class="pre">9</span></code>,
+<code class="docutils literal"><span class="pre">SPCC::FCC_U</span> <span class="pre">=</span> <span class="pre">23</span></code> and so on.)</p>
+<div class="section" id="instruction-operand-mapping">
+<h3><a class="toc-backref" href="#id15">Instruction Operand Mapping</a><a class="headerlink" href="#instruction-operand-mapping" title="Permalink to this headline">¶</a></h3>
+<p>The code generator backend maps instruction operands to fields in the
+instruction.  Operands are assigned to unbound fields in the instruction in the
+order they are defined.  Fields are bound when they are assigned a value.  For
+example, the Sparc target defines the <code class="docutils literal"><span class="pre">XNORrr</span></code> instruction as a <code class="docutils literal"><span class="pre">F3_1</span></code>
+format instruction having three operands.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def XNORrr  : F3_1<2, 0b000111,
+                   (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c),
+                   "xnor $b, $c, $dst",
+                   [(set i32:$dst, (not (xor i32:$b, i32:$c)))]>;
+</pre></div>
+</div>
+<p>The instruction templates in <code class="docutils literal"><span class="pre">SparcInstrFormats.td</span></code> show the base class for
+<code class="docutils literal"><span class="pre">F3_1</span></code> is <code class="docutils literal"><span class="pre">InstSP</span></code>.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class InstSP<dag outs, dag ins, string asmstr, list<dag> pattern> : Instruction {
+  field bits<32> Inst;
+  let Namespace = "SP";
+  bits<2> op;
+  let Inst{31-30} = op;
+  dag OutOperandList = outs;
+  dag InOperandList = ins;
+  let AsmString   = asmstr;
+  let Pattern = pattern;
+}
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">InstSP</span></code> leaves the <code class="docutils literal"><span class="pre">op</span></code> field unbound.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class F3<dag outs, dag ins, string asmstr, list<dag> pattern>
+    : InstSP<outs, ins, asmstr, pattern> {
+  bits<5> rd;
+  bits<6> op3;
+  bits<5> rs1;
+  let op{1} = 1;   // Op = 2 or 3
+  let Inst{29-25} = rd;
+  let Inst{24-19} = op3;
+  let Inst{18-14} = rs1;
+}
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">F3</span></code> binds the <code class="docutils literal"><span class="pre">op</span></code> field and defines the <code class="docutils literal"><span class="pre">rd</span></code>, <code class="docutils literal"><span class="pre">op3</span></code>, and <code class="docutils literal"><span class="pre">rs1</span></code>
+fields.  <code class="docutils literal"><span class="pre">F3</span></code> format instructions will bind the operands <code class="docutils literal"><span class="pre">rd</span></code>, <code class="docutils literal"><span class="pre">op3</span></code>, and
+<code class="docutils literal"><span class="pre">rs1</span></code> fields.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class F3_1<bits<2> opVal, bits<6> op3val, dag outs, dag ins,
+           string asmstr, list<dag> pattern> : F3<outs, ins, asmstr, pattern> {
+  bits<8> asi = 0; // asi not currently used
+  bits<5> rs2;
+  let op         = opVal;
+  let op3        = op3val;
+  let Inst{13}   = 0;     // i field = 0
+  let Inst{12-5} = asi;   // address space identifier
+  let Inst{4-0}  = rs2;
+}
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">F3_1</span></code> binds the <code class="docutils literal"><span class="pre">op3</span></code> field and defines the <code class="docutils literal"><span class="pre">rs2</span></code> fields.  <code class="docutils literal"><span class="pre">F3_1</span></code>
+format instructions will bind the operands to the <code class="docutils literal"><span class="pre">rd</span></code>, <code class="docutils literal"><span class="pre">rs1</span></code>, and <code class="docutils literal"><span class="pre">rs2</span></code>
+fields.  This results in the <code class="docutils literal"><span class="pre">XNORrr</span></code> instruction binding <code class="docutils literal"><span class="pre">$dst</span></code>, <code class="docutils literal"><span class="pre">$b</span></code>,
+and <code class="docutils literal"><span class="pre">$c</span></code> operands to the <code class="docutils literal"><span class="pre">rd</span></code>, <code class="docutils literal"><span class="pre">rs1</span></code>, and <code class="docutils literal"><span class="pre">rs2</span></code> fields respectively.</p>
+<div class="section" id="instruction-operand-name-mapping">
+<h4><a class="toc-backref" href="#id16">Instruction Operand Name Mapping</a><a class="headerlink" href="#instruction-operand-name-mapping" title="Permalink to this headline">¶</a></h4>
+<p>TableGen will also generate a function called getNamedOperandIdx() which
+can be used to look up an operand’s index in a MachineInstr based on its
+TableGen name.  Setting the UseNamedOperandTable bit in an instruction’s
+TableGen definition will add all of its operands to an enumeration in the
+llvm::XXX:OpName namespace and also add an entry for it into the OperandMap
+table, which can be queried using getNamedOperandIdx()</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>int DstIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::dst); // => 0
+int BIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::b);     // => 1
+int CIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::c);     // => 2
+int DIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::d);     // => -1
+
+...
+</pre></div>
+</div>
+<p>The entries in the OpName enum are taken verbatim from the TableGen definitions,
+so operands with lowercase names will have lower case entries in the enum.</p>
+<p>To include the getNamedOperandIdx() function in your backend, you will need
+to define a few preprocessor macros in XXXInstrInfo.cpp and XXXInstrInfo.h.
+For example:</p>
+<p>XXXInstrInfo.cpp:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#define GET_INSTRINFO_NAMED_OPS </span><span class="c1">// For getNamedOperandIdx() function</span>
+<span class="cp">#include</span> <span class="cpf">"XXXGenInstrInfo.inc"</span><span class="cp"></span>
+</pre></div>
+</div>
+<p>XXXInstrInfo.h:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#define GET_INSTRINFO_OPERAND_ENUM </span><span class="c1">// For OpName enum</span>
+<span class="cp">#include</span> <span class="cpf">"XXXGenInstrInfo.inc"</span><span class="cp"></span>
+
+<span class="k">namespace</span> <span class="n">XXX</span> <span class="p">{</span>
+  <span class="kt">int16_t</span> <span class="n">getNamedOperandIdx</span><span class="p">(</span><span class="kt">uint16_t</span> <span class="n">Opcode</span><span class="p">,</span> <span class="kt">uint16_t</span> <span class="n">NamedIndex</span><span class="p">);</span>
+<span class="p">}</span> <span class="c1">// End namespace XXX</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="instruction-operand-types">
+<h4><a class="toc-backref" href="#id17">Instruction Operand Types</a><a class="headerlink" href="#instruction-operand-types" title="Permalink to this headline">¶</a></h4>
+<p>TableGen will also generate an enumeration consisting of all named Operand
+types defined in the backend, in the llvm::XXX::OpTypes namespace.
+Some common immediate Operand types (for instance i8, i32, i64, f32, f64)
+are defined for all targets in <code class="docutils literal"><span class="pre">include/llvm/Target/Target.td</span></code>, and are
+available in each Target’s OpTypes enum.  Also, only named Operand types appear
+in the enumeration: anonymous types are ignored.
+For example, the X86 backend defines <code class="docutils literal"><span class="pre">brtarget</span></code> and <code class="docutils literal"><span class="pre">brtarget8</span></code>, both
+instances of the TableGen <code class="docutils literal"><span class="pre">Operand</span></code> class, which represent branch target
+operands:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def brtarget : Operand<OtherVT>;
+def brtarget8 : Operand<OtherVT>;
+</pre></div>
+</div>
+<p>This results in:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">namespace</span> <span class="n">X86</span> <span class="p">{</span>
+<span class="k">namespace</span> <span class="n">OpTypes</span> <span class="p">{</span>
+<span class="k">enum</span> <span class="n">OperandType</span> <span class="p">{</span>
+  <span class="p">...</span>
+  <span class="n">brtarget</span><span class="p">,</span>
+  <span class="n">brtarget8</span><span class="p">,</span>
+  <span class="p">...</span>
+  <span class="n">i32imm</span><span class="p">,</span>
+  <span class="n">i64imm</span><span class="p">,</span>
+  <span class="p">...</span>
+  <span class="n">OPERAND_TYPE_LIST_END</span>
+<span class="p">}</span> <span class="c1">// End namespace OpTypes</span>
+<span class="p">}</span> <span class="c1">// End namespace X86</span>
+</pre></div>
+</div>
+<p>In typical TableGen fashion, to use the enum, you will need to define a
+preprocessor macro:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#define GET_INSTRINFO_OPERAND_TYPES_ENUM </span><span class="c1">// For OpTypes enum</span>
+<span class="cp">#include</span> <span class="cpf">"XXXGenInstrInfo.inc"</span><span class="cp"></span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="instruction-scheduling">
+<h3><a class="toc-backref" href="#id18">Instruction Scheduling</a><a class="headerlink" href="#instruction-scheduling" title="Permalink to this headline">¶</a></h3>
+<p>Instruction itineraries can be queried using MCDesc::getSchedClass(). The
+value can be named by an enumemation in llvm::XXX::Sched namespace generated
+by TableGen in XXXGenInstrInfo.inc. The name of the schedule classes are
+the same as provided in XXXSchedule.td plus a default NoItinerary class.</p>
+</div>
+<div class="section" id="instruction-relation-mapping">
+<h3><a class="toc-backref" href="#id19">Instruction Relation Mapping</a><a class="headerlink" href="#instruction-relation-mapping" title="Permalink to this headline">¶</a></h3>
+<p>This TableGen feature is used to relate instructions with each other.  It is
+particularly useful when you have multiple instruction formats and need to
+switch between them after instruction selection.  This entire feature is driven
+by relation models which can be defined in <code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code> files
+according to the target-specific instruction set.  Relation models are defined
+using <code class="docutils literal"><span class="pre">InstrMapping</span></code> class as a base.  TableGen parses all the models
+and generates instruction relation maps using the specified information.
+Relation maps are emitted as tables in the <code class="docutils literal"><span class="pre">XXXGenInstrInfo.inc</span></code> file
+along with the functions to query them.  For the detailed information on how to
+use this feature, please refer to <a class="reference internal" href="HowToUseInstrMappings.html"><span class="doc">How To Use Instruction Mappings</span></a>.</p>
+</div>
+<div class="section" id="implement-a-subclass-of-targetinstrinfo">
+<h3><a class="toc-backref" href="#id20">Implement a subclass of <code class="docutils literal"><span class="pre">TargetInstrInfo</span></code></a><a class="headerlink" href="#implement-a-subclass-of-targetinstrinfo" title="Permalink to this headline">¶</a></h3>
+<p>The final step is to hand code portions of <code class="docutils literal"><span class="pre">XXXInstrInfo</span></code>, which implements
+the interface described in <code class="docutils literal"><span class="pre">TargetInstrInfo.h</span></code> (see <a class="reference internal" href="CodeGenerator.html#targetinstrinfo"><span class="std std-ref">The TargetInstrInfo class</span></a>).
+These functions return <code class="docutils literal"><span class="pre">0</span></code> or a Boolean or they assert, unless overridden.
+Here’s a list of functions that are overridden for the SPARC implementation in
+<code class="docutils literal"><span class="pre">SparcInstrInfo.cpp</span></code>:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">isLoadFromStackSlot</span></code> — If the specified machine instruction is a direct
+load from a stack slot, return the register number of the destination and the
+<code class="docutils literal"><span class="pre">FrameIndex</span></code> of the stack slot.</li>
+<li><code class="docutils literal"><span class="pre">isStoreToStackSlot</span></code> — If the specified machine instruction is a direct
+store to a stack slot, return the register number of the destination and the
+<code class="docutils literal"><span class="pre">FrameIndex</span></code> of the stack slot.</li>
+<li><code class="docutils literal"><span class="pre">copyPhysReg</span></code> — Copy values between a pair of physical registers.</li>
+<li><code class="docutils literal"><span class="pre">storeRegToStackSlot</span></code> — Store a register value to a stack slot.</li>
+<li><code class="docutils literal"><span class="pre">loadRegFromStackSlot</span></code> — Load a register value from a stack slot.</li>
+<li><code class="docutils literal"><span class="pre">storeRegToAddr</span></code> — Store a register value to memory.</li>
+<li><code class="docutils literal"><span class="pre">loadRegFromAddr</span></code> — Load a register value from memory.</li>
+<li><code class="docutils literal"><span class="pre">foldMemoryOperand</span></code> — Attempt to combine instructions of any load or
+store instruction for the specified operand(s).</li>
+</ul>
+</div>
+<div class="section" id="branch-folding-and-if-conversion">
+<h3><a class="toc-backref" href="#id21">Branch Folding and If Conversion</a><a class="headerlink" href="#branch-folding-and-if-conversion" title="Permalink to this headline">¶</a></h3>
+<p>Performance can be improved by combining instructions or by eliminating
+instructions that are never reached.  The <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> method in
+<code class="docutils literal"><span class="pre">XXXInstrInfo</span></code> may be implemented to examine conditional instructions and
+remove unnecessary instructions.  <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> looks at the end of a
+machine basic block (MBB) for opportunities for improvement, such as branch
+folding and if conversion.  The <code class="docutils literal"><span class="pre">BranchFolder</span></code> and <code class="docutils literal"><span class="pre">IfConverter</span></code> machine
+function passes (see the source files <code class="docutils literal"><span class="pre">BranchFolding.cpp</span></code> and
+<code class="docutils literal"><span class="pre">IfConversion.cpp</span></code> in the <code class="docutils literal"><span class="pre">lib/CodeGen</span></code> directory) call <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code>
+to improve the control flow graph that represents the instructions.</p>
+<p>Several implementations of <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> (for ARM, Alpha, and X86) can be
+examined as models for your own <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> implementation.  Since SPARC
+does not implement a useful <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code>, the ARM target implementation is
+shown below.</p>
+<p><code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> returns a Boolean value and takes four parameters:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">MachineBasicBlock</span> <span class="pre">&MBB</span></code> — The incoming block to be examined.</li>
+<li><code class="docutils literal"><span class="pre">MachineBasicBlock</span> <span class="pre">*&TBB</span></code> — A destination block that is returned.  For a
+conditional branch that evaluates to true, <code class="docutils literal"><span class="pre">TBB</span></code> is the destination.</li>
+<li><code class="docutils literal"><span class="pre">MachineBasicBlock</span> <span class="pre">*&FBB</span></code> — For a conditional branch that evaluates to
+false, <code class="docutils literal"><span class="pre">FBB</span></code> is returned as the destination.</li>
+<li><code class="docutils literal"><span class="pre">std::vector<MachineOperand></span> <span class="pre">&Cond</span></code> — List of operands to evaluate a
+condition for a conditional branch.</li>
+</ul>
+<p>In the simplest case, if a block ends without a branch, then it falls through
+to the successor block.  No destination blocks are specified for either <code class="docutils literal"><span class="pre">TBB</span></code>
+or <code class="docutils literal"><span class="pre">FBB</span></code>, so both parameters return <code class="docutils literal"><span class="pre">NULL</span></code>.  The start of the
+<code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> (see code below for the ARM target) shows the function
+parameters and the code for the simplest case.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">bool</span> <span class="n">ARMInstrInfo</span><span class="o">::</span><span class="n">AnalyzeBranch</span><span class="p">(</span><span class="n">MachineBasicBlock</span> <span class="o">&</span><span class="n">MBB</span><span class="p">,</span>
+                                 <span class="n">MachineBasicBlock</span> <span class="o">*&</span><span class="n">TBB</span><span class="p">,</span>
+                                 <span class="n">MachineBasicBlock</span> <span class="o">*&</span><span class="n">FBB</span><span class="p">,</span>
+                                 <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">MachineOperand</span><span class="o">></span> <span class="o">&</span><span class="n">Cond</span><span class="p">)</span> <span class="k">const</span>
+<span class="p">{</span>
+  <span class="n">MachineBasicBlock</span><span class="o">::</span><span class="n">iterator</span> <span class="n">I</span> <span class="o">=</span> <span class="n">MBB</span><span class="p">.</span><span class="n">end</span><span class="p">();</span>
+  <span class="k">if</span> <span class="p">(</span><span class="n">I</span> <span class="o">==</span> <span class="n">MBB</span><span class="p">.</span><span class="n">begin</span><span class="p">()</span> <span class="o">||</span> <span class="o">!</span><span class="n">isUnpredicatedTerminator</span><span class="p">(</span><span class="o">--</span><span class="n">I</span><span class="p">))</span>
+    <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>If a block ends with a single unconditional branch instruction, then
+<code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> (shown below) should return the destination of that branch in
+the <code class="docutils literal"><span class="pre">TBB</span></code> parameter.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">if</span> <span class="p">(</span><span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">B</span> <span class="o">||</span> <span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">tB</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">TBB</span> <span class="o">=</span> <span class="n">LastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">getMBB</span><span class="p">();</span>
+  <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>If a block ends with two unconditional branches, then the second branch is
+never reached.  In that situation, as shown below, remove the last branch
+instruction and return the penultimate branch in the <code class="docutils literal"><span class="pre">TBB</span></code> parameter.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">if</span> <span class="p">((</span><span class="n">SecondLastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">B</span> <span class="o">||</span> <span class="n">SecondLastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">tB</span><span class="p">)</span> <span class="o">&&</span>
+    <span class="p">(</span><span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">B</span> <span class="o">||</span> <span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">tB</span><span class="p">))</span> <span class="p">{</span>
+  <span class="n">TBB</span> <span class="o">=</span> <span class="n">SecondLastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">getMBB</span><span class="p">();</span>
+  <span class="n">I</span> <span class="o">=</span> <span class="n">LastInst</span><span class="p">;</span>
+  <span class="n">I</span><span class="o">-></span><span class="n">eraseFromParent</span><span class="p">();</span>
+  <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>A block may end with a single conditional branch instruction that falls through
+to successor block if the condition evaluates to false.  In that case,
+<code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> (shown below) should return the destination of that
+conditional branch in the <code class="docutils literal"><span class="pre">TBB</span></code> parameter and a list of operands in the
+<code class="docutils literal"><span class="pre">Cond</span></code> parameter to evaluate the condition.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">if</span> <span class="p">(</span><span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">Bcc</span> <span class="o">||</span> <span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">tBcc</span><span class="p">)</span> <span class="p">{</span>
+  <span class="c1">// Block ends with fall-through condbranch.</span>
+  <span class="n">TBB</span> <span class="o">=</span> <span class="n">LastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">getMBB</span><span class="p">();</span>
+  <span class="n">Cond</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">LastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">1</span><span class="p">));</span>
+  <span class="n">Cond</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">LastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">2</span><span class="p">));</span>
+  <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>If a block ends with both a conditional branch and an ensuing unconditional
+branch, then <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> (shown below) should return the conditional
+branch destination (assuming it corresponds to a conditional evaluation of
+“<code class="docutils literal"><span class="pre">true</span></code>”) in the <code class="docutils literal"><span class="pre">TBB</span></code> parameter and the unconditional branch destination
+in the <code class="docutils literal"><span class="pre">FBB</span></code> (corresponding to a conditional evaluation of “<code class="docutils literal"><span class="pre">false</span></code>”).  A
+list of operands to evaluate the condition should be returned in the <code class="docutils literal"><span class="pre">Cond</span></code>
+parameter.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">unsigned</span> <span class="n">SecondLastOpc</span> <span class="o">=</span> <span class="n">SecondLastInst</span><span class="o">-></span><span class="n">getOpcode</span><span class="p">();</span>
+
+<span class="k">if</span> <span class="p">((</span><span class="n">SecondLastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">Bcc</span> <span class="o">&&</span> <span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">B</span><span class="p">)</span> <span class="o">||</span>
+    <span class="p">(</span><span class="n">SecondLastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">tBcc</span> <span class="o">&&</span> <span class="n">LastOpc</span> <span class="o">==</span> <span class="n">ARM</span><span class="o">::</span><span class="n">tB</span><span class="p">))</span> <span class="p">{</span>
+  <span class="n">TBB</span> <span class="o">=</span>  <span class="n">SecondLastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">getMBB</span><span class="p">();</span>
+  <span class="n">Cond</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">SecondLastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">1</span><span class="p">));</span>
+  <span class="n">Cond</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">SecondLastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">2</span><span class="p">));</span>
+  <span class="n">FBB</span> <span class="o">=</span> <span class="n">LastInst</span><span class="o">-></span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">getMBB</span><span class="p">();</span>
+  <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>For the last two cases (ending with a single conditional branch or ending with
+one conditional and one unconditional branch), the operands returned in the
+<code class="docutils literal"><span class="pre">Cond</span></code> parameter can be passed to methods of other instructions to create new
+branches or perform other operations.  An implementation of <code class="docutils literal"><span class="pre">AnalyzeBranch</span></code>
+requires the helper methods <code class="docutils literal"><span class="pre">RemoveBranch</span></code> and <code class="docutils literal"><span class="pre">InsertBranch</span></code> to manage
+subsequent operations.</p>
+<p><code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> should return false indicating success in most circumstances.
+<code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> should only return true when the method is stumped about what
+to do, for example, if a block has three terminating branches.
+<code class="docutils literal"><span class="pre">AnalyzeBranch</span></code> may return true if it encounters a terminator it cannot
+handle, such as an indirect branch.</p>
+</div>
+</div>
+<div class="section" id="instruction-selector">
+<span id="id2"></span><h2><a class="toc-backref" href="#id22">Instruction Selector</a><a class="headerlink" href="#instruction-selector" title="Permalink to this headline">¶</a></h2>
+<p>LLVM uses a <code class="docutils literal"><span class="pre">SelectionDAG</span></code> to represent LLVM IR instructions, and nodes of
+the <code class="docutils literal"><span class="pre">SelectionDAG</span></code> ideally represent native target instructions.  During code
+generation, instruction selection passes are performed to convert non-native
+DAG instructions into native target-specific instructions.  The pass described
+in <code class="docutils literal"><span class="pre">XXXISelDAGToDAG.cpp</span></code> is used to match patterns and perform DAG-to-DAG
+instruction selection.  Optionally, a pass may be defined (in
+<code class="docutils literal"><span class="pre">XXXBranchSelector.cpp</span></code>) to perform similar DAG-to-DAG operations for branch
+instructions.  Later, the code in <code class="docutils literal"><span class="pre">XXXISelLowering.cpp</span></code> replaces or removes
+operations and data types not supported natively (legalizes) in a
+<code class="docutils literal"><span class="pre">SelectionDAG</span></code>.</p>
+<p>TableGen generates code for instruction selection using the following target
+description input files:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code> — Contains definitions of instructions in a
+target-specific instruction set, generates <code class="docutils literal"><span class="pre">XXXGenDAGISel.inc</span></code>, which is
+included in <code class="docutils literal"><span class="pre">XXXISelDAGToDAG.cpp</span></code>.</li>
+<li><code class="docutils literal"><span class="pre">XXXCallingConv.td</span></code> — Contains the calling and return value conventions
+for the target architecture, and it generates <code class="docutils literal"><span class="pre">XXXGenCallingConv.inc</span></code>,
+which is included in <code class="docutils literal"><span class="pre">XXXISelLowering.cpp</span></code>.</li>
+</ul>
+<p>The implementation of an instruction selection pass must include a header that
+declares the <code class="docutils literal"><span class="pre">FunctionPass</span></code> class or a subclass of <code class="docutils literal"><span class="pre">FunctionPass</span></code>.  In
+<code class="docutils literal"><span class="pre">XXXTargetMachine.cpp</span></code>, a Pass Manager (PM) should add each instruction
+selection pass into the queue of passes to run.</p>
+<p>The LLVM static compiler (<code class="docutils literal"><span class="pre">llc</span></code>) is an excellent tool for visualizing the
+contents of DAGs.  To display the <code class="docutils literal"><span class="pre">SelectionDAG</span></code> before or after specific
+processing phases, use the command line options for <code class="docutils literal"><span class="pre">llc</span></code>, described at
+<a class="reference internal" href="CodeGenerator.html#selectiondag-process"><span class="std std-ref">SelectionDAG Instruction Selection Process</span></a>.</p>
+<p>To describe instruction selector behavior, you should add patterns for lowering
+LLVM code into a <code class="docutils literal"><span class="pre">SelectionDAG</span></code> as the last parameter of the instruction
+definitions in <code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code>.  For example, in <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code>,
+this entry defines a register store operation, and the last parameter describes
+a pattern with the store DAG operator.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def STrr  : F3_1< 3, 0b000100, (outs), (ins MEMrr:$addr, IntRegs:$src),
+                 "st $src, [$addr]", [(store i32:$src, ADDRrr:$addr)]>;
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">ADDRrr</span></code> is a memory mode that is also defined in <code class="docutils literal"><span class="pre">SparcInstrInfo.td</span></code>:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def ADDRrr : ComplexPattern<i32, 2, "SelectADDRrr", [], []>;
+</pre></div>
+</div>
+<p>The definition of <code class="docutils literal"><span class="pre">ADDRrr</span></code> refers to <code class="docutils literal"><span class="pre">SelectADDRrr</span></code>, which is a function
+defined in an implementation of the Instructor Selector (such as
+<code class="docutils literal"><span class="pre">SparcISelDAGToDAG.cpp</span></code>).</p>
+<p>In <code class="docutils literal"><span class="pre">lib/Target/TargetSelectionDAG.td</span></code>, the DAG operator for store is defined
+below:</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def store : PatFrag<(ops node:$val, node:$ptr),
+                    (st node:$val, node:$ptr), [{
+  if (StoreSDNode *ST = dyn_cast<StoreSDNode>(N))
+    return !ST->isTruncatingStore() &&
+           ST->getAddressingMode() == ISD::UNINDEXED;
+  return false;
+}]>;
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code> also generates (in <code class="docutils literal"><span class="pre">XXXGenDAGISel.inc</span></code>) the
+<code class="docutils literal"><span class="pre">SelectCode</span></code> method that is used to call the appropriate processing method
+for an instruction.  In this example, <code class="docutils literal"><span class="pre">SelectCode</span></code> calls <code class="docutils literal"><span class="pre">Select_ISD_STORE</span></code>
+for the <code class="docutils literal"><span class="pre">ISD::STORE</span></code> opcode.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">SDNode</span> <span class="o">*</span><span class="nf">SelectCode</span><span class="p">(</span><span class="n">SDValue</span> <span class="n">N</span><span class="p">)</span> <span class="p">{</span>
+  <span class="p">...</span>
+  <span class="n">MVT</span><span class="o">::</span><span class="n">ValueType</span> <span class="n">NVT</span> <span class="o">=</span> <span class="n">N</span><span class="p">.</span><span class="n">getNode</span><span class="p">()</span><span class="o">-></span><span class="n">getValueType</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
+  <span class="k">switch</span> <span class="p">(</span><span class="n">N</span><span class="p">.</span><span class="n">getOpcode</span><span class="p">())</span> <span class="p">{</span>
+  <span class="k">case</span> <span class="n">ISD</span><span class="o">::</span><span class="nl">STORE</span><span class="p">:</span> <span class="p">{</span>
+    <span class="k">switch</span> <span class="p">(</span><span class="n">NVT</span><span class="p">)</span> <span class="p">{</span>
+    <span class="k">default</span><span class="o">:</span>
+      <span class="k">return</span> <span class="n">Select_ISD_STORE</span><span class="p">(</span><span class="n">N</span><span class="p">);</span>
+      <span class="k">break</span><span class="p">;</span>
+    <span class="p">}</span>
+    <span class="k">break</span><span class="p">;</span>
+  <span class="p">}</span>
+  <span class="p">...</span>
+</pre></div>
+</div>
+<p>The pattern for <code class="docutils literal"><span class="pre">STrr</span></code> is matched, so elsewhere in <code class="docutils literal"><span class="pre">XXXGenDAGISel.inc</span></code>,
+code for <code class="docutils literal"><span class="pre">STrr</span></code> is created for <code class="docutils literal"><span class="pre">Select_ISD_STORE</span></code>.  The <code class="docutils literal"><span class="pre">Emit_22</span></code> method
+is also generated in <code class="docutils literal"><span class="pre">XXXGenDAGISel.inc</span></code> to complete the processing of this
+instruction.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">SDNode</span> <span class="o">*</span><span class="nf">Select_ISD_STORE</span><span class="p">(</span><span class="k">const</span> <span class="n">SDValue</span> <span class="o">&</span><span class="n">N</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">SDValue</span> <span class="n">Chain</span> <span class="o">=</span> <span class="n">N</span><span class="p">.</span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
+  <span class="k">if</span> <span class="p">(</span><span class="n">Predicate_store</span><span class="p">(</span><span class="n">N</span><span class="p">.</span><span class="n">getNode</span><span class="p">()))</span> <span class="p">{</span>
+    <span class="n">SDValue</span> <span class="n">N1</span> <span class="o">=</span> <span class="n">N</span><span class="p">.</span><span class="n">getOperand</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
+    <span class="n">SDValue</span> <span class="n">N2</span> <span class="o">=</span> <span class="n">N</span><span class="p">.</span><span class="n">getOperand</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
+    <span class="n">SDValue</span> <span class="n">CPTmp0</span><span class="p">;</span>
+    <span class="n">SDValue</span> <span class="n">CPTmp1</span><span class="p">;</span>
+
+    <span class="c1">// Pattern: (st:void i32:i32:$src,</span>
+    <span class="c1">//           ADDRrr:i32:$addr)<<P:Predicate_store>></span>
+    <span class="c1">// Emits: (STrr:void ADDRrr:i32:$addr, IntRegs:i32:$src)</span>
+    <span class="c1">// Pattern complexity = 13  cost = 1  size = 0</span>
+    <span class="k">if</span> <span class="p">(</span><span class="n">SelectADDRrr</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">N2</span><span class="p">,</span> <span class="n">CPTmp0</span><span class="p">,</span> <span class="n">CPTmp1</span><span class="p">)</span> <span class="o">&&</span>
+        <span class="n">N1</span><span class="p">.</span><span class="n">getNode</span><span class="p">()</span><span class="o">-></span><span class="n">getValueType</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span> <span class="o">&&</span>
+        <span class="n">N2</span><span class="p">.</span><span class="n">getNode</span><span class="p">()</span><span class="o">-></span><span class="n">getValueType</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">)</span> <span class="p">{</span>
+      <span class="k">return</span> <span class="n">Emit_22</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">STrr</span><span class="p">,</span> <span class="n">CPTmp0</span><span class="p">,</span> <span class="n">CPTmp1</span><span class="p">);</span>
+    <span class="p">}</span>
+<span class="p">...</span>
+</pre></div>
+</div>
+<div class="section" id="the-selectiondag-legalize-phase">
+<h3><a class="toc-backref" href="#id23">The SelectionDAG Legalize Phase</a><a class="headerlink" href="#the-selectiondag-legalize-phase" title="Permalink to this headline">¶</a></h3>
+<p>The Legalize phase converts a DAG to use types and operations that are natively
+supported by the target.  For natively unsupported types and operations, you
+need to add code to the target-specific <code class="docutils literal"><span class="pre">XXXTargetLowering</span></code> implementation to
+convert unsupported types and operations to supported ones.</p>
+<p>In the constructor for the <code class="docutils literal"><span class="pre">XXXTargetLowering</span></code> class, first use the
+<code class="docutils literal"><span class="pre">addRegisterClass</span></code> method to specify which types are supported and which
+register classes are associated with them.  The code for the register classes
+are generated by TableGen from <code class="docutils literal"><span class="pre">XXXRegisterInfo.td</span></code> and placed in
+<code class="docutils literal"><span class="pre">XXXGenRegisterInfo.h.inc</span></code>.  For example, the implementation of the
+constructor for the SparcTargetLowering class (in <code class="docutils literal"><span class="pre">SparcISelLowering.cpp</span></code>)
+starts with the following code:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">addRegisterClass</span><span class="p">(</span><span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">IntRegsRegisterClass</span><span class="p">);</span>
+<span class="n">addRegisterClass</span><span class="p">(</span><span class="n">MVT</span><span class="o">::</span><span class="n">f32</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">FPRegsRegisterClass</span><span class="p">);</span>
+<span class="n">addRegisterClass</span><span class="p">(</span><span class="n">MVT</span><span class="o">::</span><span class="n">f64</span><span class="p">,</span> <span class="n">SP</span><span class="o">::</span><span class="n">DFPRegsRegisterClass</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>You should examine the node types in the <code class="docutils literal"><span class="pre">ISD</span></code> namespace
+(<code class="docutils literal"><span class="pre">include/llvm/CodeGen/SelectionDAGNodes.h</span></code>) and determine which operations
+the target natively supports.  For operations that do <strong>not</strong> have native
+support, add a callback to the constructor for the <code class="docutils literal"><span class="pre">XXXTargetLowering</span></code> class,
+so the instruction selection process knows what to do.  The <code class="docutils literal"><span class="pre">TargetLowering</span></code>
+class callback methods (declared in <code class="docutils literal"><span class="pre">llvm/Target/TargetLowering.h</span></code>) are:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">setOperationAction</span></code> — General operation.</li>
+<li><code class="docutils literal"><span class="pre">setLoadExtAction</span></code> — Load with extension.</li>
+<li><code class="docutils literal"><span class="pre">setTruncStoreAction</span></code> — Truncating store.</li>
+<li><code class="docutils literal"><span class="pre">setIndexedLoadAction</span></code> — Indexed load.</li>
+<li><code class="docutils literal"><span class="pre">setIndexedStoreAction</span></code> — Indexed store.</li>
+<li><code class="docutils literal"><span class="pre">setConvertAction</span></code> — Type conversion.</li>
+<li><code class="docutils literal"><span class="pre">setCondCodeAction</span></code> — Support for a given condition code.</li>
+</ul>
+<p>Note: on older releases, <code class="docutils literal"><span class="pre">setLoadXAction</span></code> is used instead of
+<code class="docutils literal"><span class="pre">setLoadExtAction</span></code>.  Also, on older releases, <code class="docutils literal"><span class="pre">setCondCodeAction</span></code> may not
+be supported.  Examine your release to see what methods are specifically
+supported.</p>
+<p>These callbacks are used to determine that an operation does or does not work
+with a specified type (or types).  And in all cases, the third parameter is a
+<code class="docutils literal"><span class="pre">LegalAction</span></code> type enum value: <code class="docutils literal"><span class="pre">Promote</span></code>, <code class="docutils literal"><span class="pre">Expand</span></code>, <code class="docutils literal"><span class="pre">Custom</span></code>, or
+<code class="docutils literal"><span class="pre">Legal</span></code>.  <code class="docutils literal"><span class="pre">SparcISelLowering.cpp</span></code> contains examples of all four
+<code class="docutils literal"><span class="pre">LegalAction</span></code> values.</p>
+<div class="section" id="promote">
+<h4><a class="toc-backref" href="#id24">Promote</a><a class="headerlink" href="#promote" title="Permalink to this headline">¶</a></h4>
+<p>For an operation without native support for a given type, the specified type
+may be promoted to a larger type that is supported.  For example, SPARC does
+not support a sign-extending load for Boolean values (<code class="docutils literal"><span class="pre">i1</span></code> type), so in
+<code class="docutils literal"><span class="pre">SparcISelLowering.cpp</span></code> the third parameter below, <code class="docutils literal"><span class="pre">Promote</span></code>, changes
+<code class="docutils literal"><span class="pre">i1</span></code> type values to a large type before loading.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">setLoadExtAction</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">SEXTLOAD</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i1</span><span class="p">,</span> <span class="n">Promote</span><span class="p">);</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="expand">
+<h4><a class="toc-backref" href="#id25">Expand</a><a class="headerlink" href="#expand" title="Permalink to this headline">¶</a></h4>
+<p>For a type without native support, a value may need to be broken down further,
+rather than promoted.  For an operation without native support, a combination
+of other operations may be used to similar effect.  In SPARC, the
+floating-point sine and cosine trig operations are supported by expansion to
+other operations, as indicated by the third parameter, <code class="docutils literal"><span class="pre">Expand</span></code>, to
+<code class="docutils literal"><span class="pre">setOperationAction</span></code>:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">setOperationAction</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">FSIN</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">f32</span><span class="p">,</span> <span class="n">Expand</span><span class="p">);</span>
+<span class="n">setOperationAction</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">FCOS</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">f32</span><span class="p">,</span> <span class="n">Expand</span><span class="p">);</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="custom">
+<h4><a class="toc-backref" href="#id26">Custom</a><a class="headerlink" href="#custom" title="Permalink to this headline">¶</a></h4>
+<p>For some operations, simple type promotion or operation expansion may be
+insufficient.  In some cases, a special intrinsic function must be implemented.</p>
+<p>For example, a constant value may require special treatment, or an operation
+may require spilling and restoring registers in the stack and working with
+register allocators.</p>
+<p>As seen in <code class="docutils literal"><span class="pre">SparcISelLowering.cpp</span></code> code below, to perform a type conversion
+from a floating point value to a signed integer, first the
+<code class="docutils literal"><span class="pre">setOperationAction</span></code> should be called with <code class="docutils literal"><span class="pre">Custom</span></code> as the third parameter:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">setOperationAction</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">FP_TO_SINT</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">,</span> <span class="n">Custom</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>In the <code class="docutils literal"><span class="pre">LowerOperation</span></code> method, for each <code class="docutils literal"><span class="pre">Custom</span></code> operation, a case
+statement should be added to indicate what function to call.  In the following
+code, an <code class="docutils literal"><span class="pre">FP_TO_SINT</span></code> opcode will call the <code class="docutils literal"><span class="pre">LowerFP_TO_SINT</span></code> method:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">SDValue</span> <span class="n">SparcTargetLowering</span><span class="o">::</span><span class="n">LowerOperation</span><span class="p">(</span><span class="n">SDValue</span> <span class="n">Op</span><span class="p">,</span> <span class="n">SelectionDAG</span> <span class="o">&</span><span class="n">DAG</span><span class="p">)</span> <span class="p">{</span>
+  <span class="k">switch</span> <span class="p">(</span><span class="n">Op</span><span class="p">.</span><span class="n">getOpcode</span><span class="p">())</span> <span class="p">{</span>
+  <span class="k">case</span> <span class="n">ISD</span><span class="o">::</span><span class="nl">FP_TO_SINT</span><span class="p">:</span> <span class="k">return</span> <span class="n">LowerFP_TO_SINT</span><span class="p">(</span><span class="n">Op</span><span class="p">,</span> <span class="n">DAG</span><span class="p">);</span>
+  <span class="p">...</span>
+  <span class="p">}</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>Finally, the <code class="docutils literal"><span class="pre">LowerFP_TO_SINT</span></code> method is implemented, using an FP register to
+convert the floating-point value to an integer.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">static</span> <span class="n">SDValue</span> <span class="nf">LowerFP_TO_SINT</span><span class="p">(</span><span class="n">SDValue</span> <span class="n">Op</span><span class="p">,</span> <span class="n">SelectionDAG</span> <span class="o">&</span><span class="n">DAG</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">assert</span><span class="p">(</span><span class="n">Op</span><span class="p">.</span><span class="n">getValueType</span><span class="p">()</span> <span class="o">==</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">);</span>
+  <span class="n">Op</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">.</span><span class="n">getNode</span><span class="p">(</span><span class="n">SPISD</span><span class="o">::</span><span class="n">FTOI</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">f32</span><span class="p">,</span> <span class="n">Op</span><span class="p">.</span><span class="n">getOperand</span><span class="p">(</span><span class="mi">0</span><span class="p">));</span>
+  <span class="k">return</span> <span class="n">DAG</span><span class="p">.</span><span class="n">getNode</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">BITCAST</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">,</span> <span class="n">Op</span><span class="p">);</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="legal">
+<h4><a class="toc-backref" href="#id27">Legal</a><a class="headerlink" href="#legal" title="Permalink to this headline">¶</a></h4>
+<p>The <code class="docutils literal"><span class="pre">Legal</span></code> <code class="docutils literal"><span class="pre">LegalizeAction</span></code> enum value simply indicates that an operation
+<strong>is</strong> natively supported.  <code class="docutils literal"><span class="pre">Legal</span></code> represents the default condition, so it
+is rarely used.  In <code class="docutils literal"><span class="pre">SparcISelLowering.cpp</span></code>, the action for <code class="docutils literal"><span class="pre">CTPOP</span></code> (an
+operation to count the bits set in an integer) is natively supported only for
+SPARC v9.  The following code enables the <code class="docutils literal"><span class="pre">Expand</span></code> conversion technique for
+non-v9 SPARC implementations.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">setOperationAction</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">CTPOP</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">,</span> <span class="n">Expand</span><span class="p">);</span>
+<span class="p">...</span>
+<span class="k">if</span> <span class="p">(</span><span class="n">TM</span><span class="p">.</span><span class="n">getSubtarget</span><span class="o"><</span><span class="n">SparcSubtarget</span><span class="o">></span><span class="p">().</span><span class="n">isV9</span><span class="p">())</span>
+  <span class="n">setOperationAction</span><span class="p">(</span><span class="n">ISD</span><span class="o">::</span><span class="n">CTPOP</span><span class="p">,</span> <span class="n">MVT</span><span class="o">::</span><span class="n">i32</span><span class="p">,</span> <span class="n">Legal</span><span class="p">);</span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="calling-conventions">
+<h3><a class="toc-backref" href="#id28">Calling Conventions</a><a class="headerlink" href="#calling-conventions" title="Permalink to this headline">¶</a></h3>
+<p>To support target-specific calling conventions, <code class="docutils literal"><span class="pre">XXXGenCallingConv.td</span></code> uses
+interfaces (such as <code class="docutils literal"><span class="pre">CCIfType</span></code> and <code class="docutils literal"><span class="pre">CCAssignToReg</span></code>) that are defined in
+<code class="docutils literal"><span class="pre">lib/Target/TargetCallingConv.td</span></code>.  TableGen can take the target descriptor
+file <code class="docutils literal"><span class="pre">XXXGenCallingConv.td</span></code> and generate the header file
+<code class="docutils literal"><span class="pre">XXXGenCallingConv.inc</span></code>, which is typically included in
+<code class="docutils literal"><span class="pre">XXXISelLowering.cpp</span></code>.  You can use the interfaces in
+<code class="docutils literal"><span class="pre">TargetCallingConv.td</span></code> to specify:</p>
+<ul class="simple">
+<li>The order of parameter allocation.</li>
+<li>Where parameters and return values are placed (that is, on the stack or in
+registers).</li>
+<li>Which registers may be used.</li>
+<li>Whether the caller or callee unwinds the stack.</li>
+</ul>
+<p>The following example demonstrates the use of the <code class="docutils literal"><span class="pre">CCIfType</span></code> and
+<code class="docutils literal"><span class="pre">CCAssignToReg</span></code> interfaces.  If the <code class="docutils literal"><span class="pre">CCIfType</span></code> predicate is true (that is,
+if the current argument is of type <code class="docutils literal"><span class="pre">f32</span></code> or <code class="docutils literal"><span class="pre">f64</span></code>), then the action is
+performed.  In this case, the <code class="docutils literal"><span class="pre">CCAssignToReg</span></code> action assigns the argument
+value to the first available register: either <code class="docutils literal"><span class="pre">R0</span></code> or <code class="docutils literal"><span class="pre">R1</span></code>.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>CCIfType<[f32,f64], CCAssignToReg<[R0, R1]>>
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">SparcCallingConv.td</span></code> contains definitions for a target-specific return-value
+calling convention (<code class="docutils literal"><span class="pre">RetCC_Sparc32</span></code>) and a basic 32-bit C calling convention
+(<code class="docutils literal"><span class="pre">CC_Sparc32</span></code>).  The definition of <code class="docutils literal"><span class="pre">RetCC_Sparc32</span></code> (shown below) indicates
+which registers are used for specified scalar return types.  A single-precision
+float is returned to register <code class="docutils literal"><span class="pre">F0</span></code>, and a double-precision float goes to
+register <code class="docutils literal"><span class="pre">D0</span></code>.  A 32-bit integer is returned in register <code class="docutils literal"><span class="pre">I0</span></code> or <code class="docutils literal"><span class="pre">I1</span></code>.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def RetCC_Sparc32 : CallingConv<[
+  CCIfType<[i32], CCAssignToReg<[I0, I1]>>,
+  CCIfType<[f32], CCAssignToReg<[F0]>>,
+  CCIfType<[f64], CCAssignToReg<[D0]>>
+]>;
+</pre></div>
+</div>
+<p>The definition of <code class="docutils literal"><span class="pre">CC_Sparc32</span></code> in <code class="docutils literal"><span class="pre">SparcCallingConv.td</span></code> introduces
+<code class="docutils literal"><span class="pre">CCAssignToStack</span></code>, which assigns the value to a stack slot with the specified
+size and alignment.  In the example below, the first parameter, 4, indicates
+the size of the slot, and the second parameter, also 4, indicates the stack
+alignment along 4-byte units.  (Special cases: if size is zero, then the ABI
+size is used; if alignment is zero, then the ABI alignment is used.)</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def CC_Sparc32 : CallingConv<[
+  // All arguments get passed in integer registers if there is space.
+  CCIfType<[i32, f32, f64], CCAssignToReg<[I0, I1, I2, I3, I4, I5]>>,
+  CCAssignToStack<4, 4>
+]>;
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">CCDelegateTo</span></code> is another commonly used interface, which tries to find a
+specified sub-calling convention, and, if a match is found, it is invoked.  In
+the following example (in <code class="docutils literal"><span class="pre">X86CallingConv.td</span></code>), the definition of
+<code class="docutils literal"><span class="pre">RetCC_X86_32_C</span></code> ends with <code class="docutils literal"><span class="pre">CCDelegateTo</span></code>.  After the current value is
+assigned to the register <code class="docutils literal"><span class="pre">ST0</span></code> or <code class="docutils literal"><span class="pre">ST1</span></code>, the <code class="docutils literal"><span class="pre">RetCC_X86Common</span></code> is
+invoked.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def RetCC_X86_32_C : CallingConv<[
+  CCIfType<[f32], CCAssignToReg<[ST0, ST1]>>,
+  CCIfType<[f64], CCAssignToReg<[ST0, ST1]>>,
+  CCDelegateTo<RetCC_X86Common>
+]>;
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">CCIfCC</span></code> is an interface that attempts to match the given name to the current
+calling convention.  If the name identifies the current calling convention,
+then a specified action is invoked.  In the following example (in
+<code class="docutils literal"><span class="pre">X86CallingConv.td</span></code>), if the <code class="docutils literal"><span class="pre">Fast</span></code> calling convention is in use, then
+<code class="docutils literal"><span class="pre">RetCC_X86_32_Fast</span></code> is invoked.  If the <code class="docutils literal"><span class="pre">SSECall</span></code> calling convention is in
+use, then <code class="docutils literal"><span class="pre">RetCC_X86_32_SSE</span></code> is invoked.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def RetCC_X86_32 : CallingConv<[
+  CCIfCC<"CallingConv::Fast", CCDelegateTo<RetCC_X86_32_Fast>>,
+  CCIfCC<"CallingConv::X86_SSECall", CCDelegateTo<RetCC_X86_32_SSE>>,
+  CCDelegateTo<RetCC_X86_32_C>
+]>;
+</pre></div>
+</div>
+<p>Other calling convention interfaces include:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">CCIf</span> <span class="pre"><predicate,</span> <span class="pre">action></span></code> — If the predicate matches, apply the action.</li>
+<li><code class="docutils literal"><span class="pre">CCIfInReg</span> <span class="pre"><action></span></code> — If the argument is marked with the “<code class="docutils literal"><span class="pre">inreg</span></code>”
+attribute, then apply the action.</li>
+<li><code class="docutils literal"><span class="pre">CCIfNest</span> <span class="pre"><action></span></code> — If the argument is marked with the “<code class="docutils literal"><span class="pre">nest</span></code>”
+attribute, then apply the action.</li>
+<li><code class="docutils literal"><span class="pre">CCIfNotVarArg</span> <span class="pre"><action></span></code> — If the current function does not take a
+variable number of arguments, apply the action.</li>
+<li><code class="docutils literal"><span class="pre">CCAssignToRegWithShadow</span> <span class="pre"><registerList,</span> <span class="pre">shadowList></span></code> — similar to
+<code class="docutils literal"><span class="pre">CCAssignToReg</span></code>, but with a shadow list of registers.</li>
+<li><code class="docutils literal"><span class="pre">CCPassByVal</span> <span class="pre"><size,</span> <span class="pre">align></span></code> — Assign value to a stack slot with the
+minimum specified size and alignment.</li>
+<li><code class="docutils literal"><span class="pre">CCPromoteToType</span> <span class="pre"><type></span></code> — Promote the current value to the specified
+type.</li>
+<li><code class="docutils literal"><span class="pre">CallingConv</span> <span class="pre"><[actions]></span></code> — Define each calling convention that is
+supported.</li>
+</ul>
+</div>
+</div>
+<div class="section" id="assembly-printer">
+<h2><a class="toc-backref" href="#id29">Assembly Printer</a><a class="headerlink" href="#assembly-printer" title="Permalink to this headline">¶</a></h2>
+<p>During the code emission stage, the code generator may utilize an LLVM pass to
+produce assembly output.  To do this, you want to implement the code for a
+printer that converts LLVM IR to a GAS-format assembly language for your target
+machine, using the following steps:</p>
+<ul class="simple">
+<li>Define all the assembly strings for your target, adding them to the
+instructions defined in the <code class="docutils literal"><span class="pre">XXXInstrInfo.td</span></code> file.  (See
+<a class="reference internal" href="#instruction-set"><span class="std std-ref">Instruction Set</span></a>.)  TableGen will produce an output file
+(<code class="docutils literal"><span class="pre">XXXGenAsmWriter.inc</span></code>) with an implementation of the <code class="docutils literal"><span class="pre">printInstruction</span></code>
+method for the <code class="docutils literal"><span class="pre">XXXAsmPrinter</span></code> class.</li>
+<li>Write <code class="docutils literal"><span class="pre">XXXTargetAsmInfo.h</span></code>, which contains the bare-bones declaration of
+the <code class="docutils literal"><span class="pre">XXXTargetAsmInfo</span></code> class (a subclass of <code class="docutils literal"><span class="pre">TargetAsmInfo</span></code>).</li>
+<li>Write <code class="docutils literal"><span class="pre">XXXTargetAsmInfo.cpp</span></code>, which contains target-specific values for
+<code class="docutils literal"><span class="pre">TargetAsmInfo</span></code> properties and sometimes new implementations for methods.</li>
+<li>Write <code class="docutils literal"><span class="pre">XXXAsmPrinter.cpp</span></code>, which implements the <code class="docutils literal"><span class="pre">AsmPrinter</span></code> class that
+performs the LLVM-to-assembly conversion.</li>
+</ul>
+<p>The code in <code class="docutils literal"><span class="pre">XXXTargetAsmInfo.h</span></code> is usually a trivial declaration of the
+<code class="docutils literal"><span class="pre">XXXTargetAsmInfo</span></code> class for use in <code class="docutils literal"><span class="pre">XXXTargetAsmInfo.cpp</span></code>.  Similarly,
+<code class="docutils literal"><span class="pre">XXXTargetAsmInfo.cpp</span></code> usually has a few declarations of <code class="docutils literal"><span class="pre">XXXTargetAsmInfo</span></code>
+replacement values that override the default values in <code class="docutils literal"><span class="pre">TargetAsmInfo.cpp</span></code>.
+For example in <code class="docutils literal"><span class="pre">SparcTargetAsmInfo.cpp</span></code>:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">SparcTargetAsmInfo</span><span class="o">::</span><span class="n">SparcTargetAsmInfo</span><span class="p">(</span><span class="k">const</span> <span class="n">SparcTargetMachine</span> <span class="o">&</span><span class="n">TM</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">Data16bitsDirective</span> <span class="o">=</span> <span class="s">"</span><span class="se">\t</span><span class="s">.half</span><span class="se">\t</span><span class="s">"</span><span class="p">;</span>
+  <span class="n">Data32bitsDirective</span> <span class="o">=</span> <span class="s">"</span><span class="se">\t</span><span class="s">.word</span><span class="se">\t</span><span class="s">"</span><span class="p">;</span>
+  <span class="n">Data64bitsDirective</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>  <span class="c1">// .xword is only supported by V9.</span>
+  <span class="n">ZeroDirective</span> <span class="o">=</span> <span class="s">"</span><span class="se">\t</span><span class="s">.skip</span><span class="se">\t</span><span class="s">"</span><span class="p">;</span>
+  <span class="n">CommentString</span> <span class="o">=</span> <span class="s">"!"</span><span class="p">;</span>
+  <span class="n">ConstantPoolSection</span> <span class="o">=</span> <span class="s">"</span><span class="se">\t</span><span class="s">.section </span><span class="se">\"</span><span class="s">.rodata</span><span class="se">\"</span><span class="s">,#alloc</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The X86 assembly printer implementation (<code class="docutils literal"><span class="pre">X86TargetAsmInfo</span></code>) is an example
+where the target specific <code class="docutils literal"><span class="pre">TargetAsmInfo</span></code> class uses an overridden methods:
+<code class="docutils literal"><span class="pre">ExpandInlineAsm</span></code>.</p>
+<p>A target-specific implementation of <code class="docutils literal"><span class="pre">AsmPrinter</span></code> is written in
+<code class="docutils literal"><span class="pre">XXXAsmPrinter.cpp</span></code>, which implements the <code class="docutils literal"><span class="pre">AsmPrinter</span></code> class that converts
+the LLVM to printable assembly.  The implementation must include the following
+headers that have declarations for the <code class="docutils literal"><span class="pre">AsmPrinter</span></code> and
+<code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> classes.  The <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> is a subclass of
+<code class="docutils literal"><span class="pre">FunctionPass</span></code>.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf">"llvm/CodeGen/AsmPrinter.h"</span><span class="cp"></span>
+<span class="cp">#include</span> <span class="cpf">"llvm/CodeGen/MachineFunctionPass.h"</span><span class="cp"></span>
+</pre></div>
+</div>
+<p>As a <code class="docutils literal"><span class="pre">FunctionPass</span></code>, <code class="docutils literal"><span class="pre">AsmPrinter</span></code> first calls <code class="docutils literal"><span class="pre">doInitialization</span></code> to set
+up the <code class="docutils literal"><span class="pre">AsmPrinter</span></code>.  In <code class="docutils literal"><span class="pre">SparcAsmPrinter</span></code>, a <code class="docutils literal"><span class="pre">Mangler</span></code> object is
+instantiated to process variable names.</p>
+<p>In <code class="docutils literal"><span class="pre">XXXAsmPrinter.cpp</span></code>, the <code class="docutils literal"><span class="pre">runOnMachineFunction</span></code> method (declared in
+<code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>) must be implemented for <code class="docutils literal"><span class="pre">XXXAsmPrinter</span></code>.  In
+<code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>, the <code class="docutils literal"><span class="pre">runOnFunction</span></code> method invokes
+<code class="docutils literal"><span class="pre">runOnMachineFunction</span></code>.  Target-specific implementations of
+<code class="docutils literal"><span class="pre">runOnMachineFunction</span></code> differ, but generally do the following to process each
+machine function:</p>
+<ul class="simple">
+<li>Call <code class="docutils literal"><span class="pre">SetupMachineFunction</span></code> to perform initialization.</li>
+<li>Call <code class="docutils literal"><span class="pre">EmitConstantPool</span></code> to print out (to the output stream) constants which
+have been spilled to memory.</li>
+<li>Call <code class="docutils literal"><span class="pre">EmitJumpTableInfo</span></code> to print out jump tables used by the current
+function.</li>
+<li>Print out the label for the current function.</li>
+<li>Print out the code for the function, including basic block labels and the
+assembly for the instruction (using <code class="docutils literal"><span class="pre">printInstruction</span></code>)</li>
+</ul>
+<p>The <code class="docutils literal"><span class="pre">XXXAsmPrinter</span></code> implementation must also include the code generated by
+TableGen that is output in the <code class="docutils literal"><span class="pre">XXXGenAsmWriter.inc</span></code> file.  The code in
+<code class="docutils literal"><span class="pre">XXXGenAsmWriter.inc</span></code> contains an implementation of the <code class="docutils literal"><span class="pre">printInstruction</span></code>
+method that may call these methods:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">printOperand</span></code></li>
+<li><code class="docutils literal"><span class="pre">printMemOperand</span></code></li>
+<li><code class="docutils literal"><span class="pre">printCCOperand</span></code> (for conditional statements)</li>
+<li><code class="docutils literal"><span class="pre">printDataDirective</span></code></li>
+<li><code class="docutils literal"><span class="pre">printDeclare</span></code></li>
+<li><code class="docutils literal"><span class="pre">printImplicitDef</span></code></li>
+<li><code class="docutils literal"><span class="pre">printInlineAsm</span></code></li>
+</ul>
+<p>The implementations of <code class="docutils literal"><span class="pre">printDeclare</span></code>, <code class="docutils literal"><span class="pre">printImplicitDef</span></code>,
+<code class="docutils literal"><span class="pre">printInlineAsm</span></code>, and <code class="docutils literal"><span class="pre">printLabel</span></code> in <code class="docutils literal"><span class="pre">AsmPrinter.cpp</span></code> are generally
+adequate for printing assembly and do not need to be overridden.</p>
+<p>The <code class="docutils literal"><span class="pre">printOperand</span></code> method is implemented with a long <code class="docutils literal"><span class="pre">switch</span></code>/<code class="docutils literal"><span class="pre">case</span></code>
+statement for the type of operand: register, immediate, basic block, external
+symbol, global address, constant pool index, or jump table index.  For an
+instruction with a memory address operand, the <code class="docutils literal"><span class="pre">printMemOperand</span></code> method
+should be implemented to generate the proper output.  Similarly,
+<code class="docutils literal"><span class="pre">printCCOperand</span></code> should be used to print a conditional operand.</p>
+<p><code class="docutils literal"><span class="pre">doFinalization</span></code> should be overridden in <code class="docutils literal"><span class="pre">XXXAsmPrinter</span></code>, and it should be
+called to shut down the assembly printer.  During <code class="docutils literal"><span class="pre">doFinalization</span></code>, global
+variables and constants are printed to output.</p>
+</div>
+<div class="section" id="subtarget-support">
+<h2><a class="toc-backref" href="#id30">Subtarget Support</a><a class="headerlink" href="#subtarget-support" title="Permalink to this headline">¶</a></h2>
+<p>Subtarget support is used to inform the code generation process of instruction
+set variations for a given chip set.  For example, the LLVM SPARC
+implementation provided covers three major versions of the SPARC microprocessor
+architecture: Version 8 (V8, which is a 32-bit architecture), Version 9 (V9, a
+64-bit architecture), and the UltraSPARC architecture.  V8 has 16
+double-precision floating-point registers that are also usable as either 32
+single-precision or 8 quad-precision registers.  V8 is also purely big-endian.
+V9 has 32 double-precision floating-point registers that are also usable as 16
+quad-precision registers, but cannot be used as single-precision registers.
+The UltraSPARC architecture combines V9 with UltraSPARC Visual Instruction Set
+extensions.</p>
+<p>If subtarget support is needed, you should implement a target-specific
+<code class="docutils literal"><span class="pre">XXXSubtarget</span></code> class for your architecture.  This class should process the
+command-line options <code class="docutils literal"><span class="pre">-mcpu=</span></code> and <code class="docutils literal"><span class="pre">-mattr=</span></code>.</p>
+<p>TableGen uses definitions in the <code class="docutils literal"><span class="pre">Target.td</span></code> and <code class="docutils literal"><span class="pre">Sparc.td</span></code> files to
+generate code in <code class="docutils literal"><span class="pre">SparcGenSubtarget.inc</span></code>.  In <code class="docutils literal"><span class="pre">Target.td</span></code>, shown below, the
+<code class="docutils literal"><span class="pre">SubtargetFeature</span></code> interface is defined.  The first 4 string parameters of
+the <code class="docutils literal"><span class="pre">SubtargetFeature</span></code> interface are a feature name, an attribute set by the
+feature, the value of the attribute, and a description of the feature.  (The
+fifth parameter is a list of features whose presence is implied, and its
+default value is an empty array.)</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class SubtargetFeature<string n, string a, string v, string d,
+                       list<SubtargetFeature> i = []> {
+  string Name = n;
+  string Attribute = a;
+  string Value = v;
+  string Desc = d;
+  list<SubtargetFeature> Implies = i;
+}
+</pre></div>
+</div>
+<p>In the <code class="docutils literal"><span class="pre">Sparc.td</span></code> file, the <code class="docutils literal"><span class="pre">SubtargetFeature</span></code> is used to define the
+following features.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>def FeatureV9 : SubtargetFeature<"v9", "IsV9", "true",
+                     "Enable SPARC-V9 instructions">;
+def FeatureV8Deprecated : SubtargetFeature<"deprecated-v8",
+                     "V8DeprecatedInsts", "true",
+                     "Enable deprecated V8 instructions in V9 mode">;
+def FeatureVIS : SubtargetFeature<"vis", "IsVIS", "true",
+                     "Enable UltraSPARC Visual Instruction Set extensions">;
+</pre></div>
+</div>
+<p>Elsewhere in <code class="docutils literal"><span class="pre">Sparc.td</span></code>, the <code class="docutils literal"><span class="pre">Proc</span></code> class is defined and then is used to
+define particular SPARC processor subtypes that may have the previously
+described features.</p>
+<div class="highlight-text"><div class="highlight"><pre><span></span>class Proc<string Name, list<SubtargetFeature> Features>
+  : Processor<Name, NoItineraries, Features>;
+
+def : Proc<"generic",         []>;
+def : Proc<"v8",              []>;
+def : Proc<"supersparc",      []>;
+def : Proc<"sparclite",       []>;
+def : Proc<"f934",            []>;
+def : Proc<"hypersparc",      []>;
+def : Proc<"sparclite86x",    []>;
+def : Proc<"sparclet",        []>;
+def : Proc<"tsc701",          []>;
+def : Proc<"v9",              [FeatureV9]>;
+def : Proc<"ultrasparc",      [FeatureV9, FeatureV8Deprecated]>;
+def : Proc<"ultrasparc3",     [FeatureV9, FeatureV8Deprecated]>;
+def : Proc<"ultrasparc3-vis", [FeatureV9, FeatureV8Deprecated, FeatureVIS]>;
+</pre></div>
+</div>
+<p>From <code class="docutils literal"><span class="pre">Target.td</span></code> and <code class="docutils literal"><span class="pre">Sparc.td</span></code> files, the resulting
+<code class="docutils literal"><span class="pre">SparcGenSubtarget.inc</span></code> specifies enum values to identify the features,
+arrays of constants to represent the CPU features and CPU subtypes, and the
+<code class="docutils literal"><span class="pre">ParseSubtargetFeatures</span></code> method that parses the features string that sets
+specified subtarget options.  The generated <code class="docutils literal"><span class="pre">SparcGenSubtarget.inc</span></code> file
+should be included in the <code class="docutils literal"><span class="pre">SparcSubtarget.cpp</span></code>.  The target-specific
+implementation of the <code class="docutils literal"><span class="pre">XXXSubtarget</span></code> method should follow this pseudocode:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">XXXSubtarget</span><span class="o">::</span><span class="n">XXXSubtarget</span><span class="p">(</span><span class="k">const</span> <span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">,</span> <span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="o">&</span><span class="n">FS</span><span class="p">)</span> <span class="p">{</span>
+  <span class="c1">// Set the default features</span>
+  <span class="c1">// Determine default and user specified characteristics of the CPU</span>
+  <span class="c1">// Call ParseSubtargetFeatures(FS, CPU) to parse the features string</span>
+  <span class="c1">// Perform any additional operations</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="jit-support">
+<h2><a class="toc-backref" href="#id31">JIT Support</a><a class="headerlink" href="#jit-support" title="Permalink to this headline">¶</a></h2>
+<p>The implementation of a target machine optionally includes a Just-In-Time (JIT)
+code generator that emits machine code and auxiliary structures as binary
+output that can be written directly to memory.  To do this, implement JIT code
+generation by performing the following steps:</p>
+<ul class="simple">
+<li>Write an <code class="docutils literal"><span class="pre">XXXCodeEmitter.cpp</span></code> file that contains a machine function pass
+that transforms target-machine instructions into relocatable machine
+code.</li>
+<li>Write an <code class="docutils literal"><span class="pre">XXXJITInfo.cpp</span></code> file that implements the JIT interfaces for
+target-specific code-generation activities, such as emitting machine code and
+stubs.</li>
+<li>Modify <code class="docutils literal"><span class="pre">XXXTargetMachine</span></code> so that it provides a <code class="docutils literal"><span class="pre">TargetJITInfo</span></code> object
+through its <code class="docutils literal"><span class="pre">getJITInfo</span></code> method.</li>
+</ul>
+<p>There are several different approaches to writing the JIT support code.  For
+instance, TableGen and target descriptor files may be used for creating a JIT
+code generator, but are not mandatory.  For the Alpha and PowerPC target
+machines, TableGen is used to generate <code class="docutils literal"><span class="pre">XXXGenCodeEmitter.inc</span></code>, which
+contains the binary coding of machine instructions and the
+<code class="docutils literal"><span class="pre">getBinaryCodeForInstr</span></code> method to access those codes.  Other JIT
+implementations do not.</p>
+<p>Both <code class="docutils literal"><span class="pre">XXXJITInfo.cpp</span></code> and <code class="docutils literal"><span class="pre">XXXCodeEmitter.cpp</span></code> must include the
+<code class="docutils literal"><span class="pre">llvm/CodeGen/MachineCodeEmitter.h</span></code> header file that defines the
+<code class="docutils literal"><span class="pre">MachineCodeEmitter</span></code> class containing code for several callback functions
+that write data (in bytes, words, strings, etc.) to the output stream.</p>
+<div class="section" id="machine-code-emitter">
+<h3><a class="toc-backref" href="#id32">Machine Code Emitter</a><a class="headerlink" href="#machine-code-emitter" title="Permalink to this headline">¶</a></h3>
+<p>In <code class="docutils literal"><span class="pre">XXXCodeEmitter.cpp</span></code>, a target-specific of the <code class="docutils literal"><span class="pre">Emitter</span></code> class is
+implemented as a function pass (subclass of <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>).  The
+target-specific implementation of <code class="docutils literal"><span class="pre">runOnMachineFunction</span></code> (invoked by
+<code class="docutils literal"><span class="pre">runOnFunction</span></code> in <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>) iterates through the
+<code class="docutils literal"><span class="pre">MachineBasicBlock</span></code> calls <code class="docutils literal"><span class="pre">emitInstruction</span></code> to process each instruction and
+emit binary code.  <code class="docutils literal"><span class="pre">emitInstruction</span></code> is largely implemented with case
+statements on the instruction types defined in <code class="docutils literal"><span class="pre">XXXInstrInfo.h</span></code>.  For
+example, in <code class="docutils literal"><span class="pre">X86CodeEmitter.cpp</span></code>, the <code class="docutils literal"><span class="pre">emitInstruction</span></code> method is built
+around the following <code class="docutils literal"><span class="pre">switch</span></code>/<code class="docutils literal"><span class="pre">case</span></code> statements:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">switch</span> <span class="p">(</span><span class="n">Desc</span><span class="o">-></span><span class="n">TSFlags</span> <span class="o">&</span> <span class="n">X86</span><span class="o">::</span><span class="n">FormMask</span><span class="p">)</span> <span class="p">{</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">Pseudo</span><span class="p">:</span>  <span class="c1">// for not yet implemented instructions</span>
+   <span class="p">...</span>               <span class="c1">// or pseudo-instructions</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">RawFrm</span><span class="p">:</span>  <span class="c1">// for instructions with a fixed opcode value</span>
+   <span class="p">...</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">AddRegFrm</span><span class="p">:</span> <span class="c1">// for instructions that have one register operand</span>
+   <span class="p">...</span>                 <span class="c1">// added to their opcode</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRMDestReg</span><span class="p">:</span><span class="c1">// for instructions that use the Mod/RM byte</span>
+   <span class="p">...</span>                 <span class="c1">// to specify a destination (register)</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRMDestMem</span><span class="p">:</span><span class="c1">// for instructions that use the Mod/RM byte</span>
+   <span class="p">...</span>                 <span class="c1">// to specify a destination (memory)</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRMSrcReg</span><span class="p">:</span> <span class="c1">// for instructions that use the Mod/RM byte</span>
+   <span class="p">...</span>                 <span class="c1">// to specify a source (register)</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRMSrcMem</span><span class="p">:</span> <span class="c1">// for instructions that use the Mod/RM byte</span>
+   <span class="p">...</span>                 <span class="c1">// to specify a source (memory)</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM0r</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM1r</span><span class="p">:</span>  <span class="c1">// for instructions that operate on</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM2r</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM3r</span><span class="p">:</span>  <span class="c1">// a REGISTER r/m operand and</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM4r</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM5r</span><span class="p">:</span>  <span class="c1">// use the Mod/RM byte and a field</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM6r</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM7r</span><span class="p">:</span>  <span class="c1">// to hold extended opcode data</span>
+   <span class="p">...</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM0m</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM1m</span><span class="p">:</span>  <span class="c1">// for instructions that operate on</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM2m</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM3m</span><span class="p">:</span>  <span class="c1">// a MEMORY r/m operand and</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM4m</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM5m</span><span class="p">:</span>  <span class="c1">// use the Mod/RM byte and a field</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM6m</span><span class="p">:</span> <span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRM7m</span><span class="p">:</span>  <span class="c1">// to hold extended opcode data</span>
+   <span class="p">...</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">MRMInitReg</span><span class="p">:</span> <span class="c1">// for instructions whose source and</span>
+   <span class="p">...</span>                  <span class="c1">// destination are the same register</span>
+   <span class="k">break</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The implementations of these case statements often first emit the opcode and
+then get the operand(s).  Then depending upon the operand, helper methods may
+be called to process the operand(s).  For example, in <code class="docutils literal"><span class="pre">X86CodeEmitter.cpp</span></code>,
+for the <code class="docutils literal"><span class="pre">X86II::AddRegFrm</span></code> case, the first data emitted (by <code class="docutils literal"><span class="pre">emitByte</span></code>) is
+the opcode added to the register operand.  Then an object representing the
+machine operand, <code class="docutils literal"><span class="pre">MO1</span></code>, is extracted.  The helper methods such as
+<code class="docutils literal"><span class="pre">isImmediate</span></code>, <code class="docutils literal"><span class="pre">isGlobalAddress</span></code>, <code class="docutils literal"><span class="pre">isExternalSymbol</span></code>,
+<code class="docutils literal"><span class="pre">isConstantPoolIndex</span></code>, and <code class="docutils literal"><span class="pre">isJumpTableIndex</span></code> determine the operand type.
+(<code class="docutils literal"><span class="pre">X86CodeEmitter.cpp</span></code> also has private methods such as <code class="docutils literal"><span class="pre">emitConstant</span></code>,
+<code class="docutils literal"><span class="pre">emitGlobalAddress</span></code>, <code class="docutils literal"><span class="pre">emitExternalSymbolAddress</span></code>, <code class="docutils literal"><span class="pre">emitConstPoolAddress</span></code>,
+and <code class="docutils literal"><span class="pre">emitJumpTableAddress</span></code> that emit the data into the output stream.)</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">case</span> <span class="n">X86II</span><span class="o">::</span><span class="nl">AddRegFrm</span><span class="p">:</span>
+  <span class="n">MCE</span><span class="p">.</span><span class="n">emitByte</span><span class="p">(</span><span class="n">BaseOpcode</span> <span class="o">+</span> <span class="n">getX86RegNum</span><span class="p">(</span><span class="n">MI</span><span class="p">.</span><span class="n">getOperand</span><span class="p">(</span><span class="n">CurOp</span><span class="o">++</span><span class="p">).</span><span class="n">getReg</span><span class="p">()));</span>
+
+  <span class="k">if</span> <span class="p">(</span><span class="n">CurOp</span> <span class="o">!=</span> <span class="n">NumOps</span><span class="p">)</span> <span class="p">{</span>
+    <span class="k">const</span> <span class="n">MachineOperand</span> <span class="o">&</span><span class="n">MO1</span> <span class="o">=</span> <span class="n">MI</span><span class="p">.</span><span class="n">getOperand</span><span class="p">(</span><span class="n">CurOp</span><span class="o">++</span><span class="p">);</span>
+    <span class="kt">unsigned</span> <span class="n">Size</span> <span class="o">=</span> <span class="n">X86InstrInfo</span><span class="o">::</span><span class="n">sizeOfImm</span><span class="p">(</span><span class="n">Desc</span><span class="p">);</span>
+    <span class="k">if</span> <span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">isImmediate</span><span class="p">())</span>
+      <span class="n">emitConstant</span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getImm</span><span class="p">(),</span> <span class="n">Size</span><span class="p">);</span>
+    <span class="k">else</span> <span class="p">{</span>
+      <span class="kt">unsigned</span> <span class="n">rt</span> <span class="o">=</span> <span class="n">Is64BitMode</span> <span class="o">?</span> <span class="n">X86</span><span class="o">::</span><span class="nl">reloc_pcrel_word</span>
+        <span class="p">:</span> <span class="p">(</span><span class="n">IsPIC</span> <span class="o">?</span> <span class="n">X86</span><span class="o">::</span><span class="nl">reloc_picrel_word</span> <span class="p">:</span> <span class="n">X86</span><span class="o">::</span><span class="n">reloc_absolute_word</span><span class="p">);</span>
+      <span class="k">if</span> <span class="p">(</span><span class="n">Opcode</span> <span class="o">==</span> <span class="n">X86</span><span class="o">::</span><span class="n">MOV64ri</span><span class="p">)</span>
+        <span class="n">rt</span> <span class="o">=</span> <span class="n">X86</span><span class="o">::</span><span class="n">reloc_absolute_dword</span><span class="p">;</span>  <span class="c1">// FIXME: add X86II flag?</span>
+      <span class="k">if</span> <span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">isGlobalAddress</span><span class="p">())</span> <span class="p">{</span>
+        <span class="kt">bool</span> <span class="n">NeedStub</span> <span class="o">=</span> <span class="n">isa</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getGlobal</span><span class="p">());</span>
+        <span class="kt">bool</span> <span class="n">isLazy</span> <span class="o">=</span> <span class="n">gvNeedsLazyPtr</span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getGlobal</span><span class="p">());</span>
+        <span class="n">emitGlobalAddress</span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getGlobal</span><span class="p">(),</span> <span class="n">rt</span><span class="p">,</span> <span class="n">MO1</span><span class="p">.</span><span class="n">getOffset</span><span class="p">(),</span> <span class="mi">0</span><span class="p">,</span>
+                          <span class="n">NeedStub</span><span class="p">,</span> <span class="n">isLazy</span><span class="p">);</span>
+      <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">isExternalSymbol</span><span class="p">())</span>
+        <span class="n">emitExternalSymbolAddress</span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getSymbolName</span><span class="p">(),</span> <span class="n">rt</span><span class="p">);</span>
+      <span class="k">else</span> <span class="nf">if</span> <span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">isConstantPoolIndex</span><span class="p">())</span>
+        <span class="n">emitConstPoolAddress</span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getIndex</span><span class="p">(),</span> <span class="n">rt</span><span class="p">);</span>
+      <span class="k">else</span> <span class="nf">if</span> <span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">isJumpTableIndex</span><span class="p">())</span>
+        <span class="n">emitJumpTableAddress</span><span class="p">(</span><span class="n">MO1</span><span class="p">.</span><span class="n">getIndex</span><span class="p">(),</span> <span class="n">rt</span><span class="p">);</span>
+    <span class="p">}</span>
+  <span class="p">}</span>
+  <span class="k">break</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>In the previous example, <code class="docutils literal"><span class="pre">XXXCodeEmitter.cpp</span></code> uses the variable <code class="docutils literal"><span class="pre">rt</span></code>, which
+is a <code class="docutils literal"><span class="pre">RelocationType</span></code> enum that may be used to relocate addresses (for
+example, a global address with a PIC base offset).  The <code class="docutils literal"><span class="pre">RelocationType</span></code> enum
+for that target is defined in the short target-specific <code class="docutils literal"><span class="pre">XXXRelocations.h</span></code>
+file.  The <code class="docutils literal"><span class="pre">RelocationType</span></code> is used by the <code class="docutils literal"><span class="pre">relocate</span></code> method defined in
+<code class="docutils literal"><span class="pre">XXXJITInfo.cpp</span></code> to rewrite addresses for referenced global symbols.</p>
+<p>For example, <code class="docutils literal"><span class="pre">X86Relocations.h</span></code> specifies the following relocation types for
+the X86 addresses.  In all four cases, the relocated value is added to the
+value already in memory.  For <code class="docutils literal"><span class="pre">reloc_pcrel_word</span></code> and <code class="docutils literal"><span class="pre">reloc_picrel_word</span></code>,
+there is an additional initial adjustment.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">enum</span> <span class="n">RelocationType</span> <span class="p">{</span>
+  <span class="n">reloc_pcrel_word</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span>    <span class="c1">// add reloc value after adjusting for the PC loc</span>
+  <span class="n">reloc_picrel_word</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>   <span class="c1">// add reloc value after adjusting for the PIC base</span>
+  <span class="n">reloc_absolute_word</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span> <span class="c1">// absolute relocation; no additional adjustment</span>
+  <span class="n">reloc_absolute_dword</span> <span class="o">=</span> <span class="mi">3</span> <span class="c1">// absolute relocation; no additional adjustment</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="target-jit-info">
+<h3><a class="toc-backref" href="#id33">Target JIT Info</a><a class="headerlink" href="#target-jit-info" title="Permalink to this headline">¶</a></h3>
+<p><code class="docutils literal"><span class="pre">XXXJITInfo.cpp</span></code> implements the JIT interfaces for target-specific
+code-generation activities, such as emitting machine code and stubs.  At
+minimum, a target-specific version of <code class="docutils literal"><span class="pre">XXXJITInfo</span></code> implements the following:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">getLazyResolverFunction</span></code> — Initializes the JIT, gives the target a
+function that is used for compilation.</li>
+<li><code class="docutils literal"><span class="pre">emitFunctionStub</span></code> — Returns a native function with a specified address
+for a callback function.</li>
+<li><code class="docutils literal"><span class="pre">relocate</span></code> — Changes the addresses of referenced globals, based on
+relocation types.</li>
+<li>Callback function that are wrappers to a function stub that is used when the
+real target is not initially known.</li>
+</ul>
+<p><code class="docutils literal"><span class="pre">getLazyResolverFunction</span></code> is generally trivial to implement.  It makes the
+incoming parameter as the global <code class="docutils literal"><span class="pre">JITCompilerFunction</span></code> and returns the
+callback function that will be used a function wrapper.  For the Alpha target
+(in <code class="docutils literal"><span class="pre">AlphaJITInfo.cpp</span></code>), the <code class="docutils literal"><span class="pre">getLazyResolverFunction</span></code> implementation is
+simply:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">TargetJITInfo</span><span class="o">::</span><span class="n">LazyResolverFn</span> <span class="n">AlphaJITInfo</span><span class="o">::</span><span class="n">getLazyResolverFunction</span><span class="p">(</span>
+                                            <span class="n">JITCompilerFn</span> <span class="n">F</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">JITCompilerFunction</span> <span class="o">=</span> <span class="n">F</span><span class="p">;</span>
+  <span class="k">return</span> <span class="n">AlphaCompilationCallback</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>For the X86 target, the <code class="docutils literal"><span class="pre">getLazyResolverFunction</span></code> implementation is a little
+more complicated, because it returns a different callback function for
+processors with SSE instructions and XMM registers.</p>
+<p>The callback function initially saves and later restores the callee register
+values, incoming arguments, and frame and return address.  The callback
+function needs low-level access to the registers or stack, so it is typically
+implemented with assembler.</p>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="HowToUseInstrMappings.html" title="How To Use Instruction Mappings"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="Proposals/VectorizationPlan.html" title="Vectorization Plan"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/WritingAnLLVMPass.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/WritingAnLLVMPass.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/WritingAnLLVMPass.html (added)
+++ www-releases/trunk/5.0.1/docs/WritingAnLLVMPass.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,1326 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>Writing an LLVM Pass — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="How To Use Attributes" href="HowToUseAttributes.html" />
+    <link rel="prev" title="Garbage Collection with LLVM" href="GarbageCollection.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="HowToUseAttributes.html" title="How To Use Attributes"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="GarbageCollection.html" title="Garbage Collection with LLVM"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="writing-an-llvm-pass">
+<h1>Writing an LLVM Pass<a class="headerlink" href="#writing-an-llvm-pass" title="Permalink to this headline">¶</a></h1>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#introduction-what-is-a-pass" id="id5">Introduction — What is a pass?</a></li>
+<li><a class="reference internal" href="#quick-start-writing-hello-world" id="id6">Quick Start — Writing hello world</a><ul>
+<li><a class="reference internal" href="#setting-up-the-build-environment" id="id7">Setting up the build environment</a></li>
+<li><a class="reference internal" href="#basic-code-required" id="id8">Basic code required</a></li>
+<li><a class="reference internal" href="#running-a-pass-with-opt" id="id9">Running a pass with <code class="docutils literal"><span class="pre">opt</span></code></a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#pass-classes-and-requirements" id="id10">Pass classes and requirements</a><ul>
+<li><a class="reference internal" href="#the-immutablepass-class" id="id11">The <code class="docutils literal"><span class="pre">ImmutablePass</span></code> class</a></li>
+<li><a class="reference internal" href="#the-modulepass-class" id="id12">The <code class="docutils literal"><span class="pre">ModulePass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-runonmodule-method" id="id13">The <code class="docutils literal"><span class="pre">runOnModule</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-callgraphsccpass-class" id="id14">The <code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-doinitialization-callgraph-method" id="id15">The <code class="docutils literal"><span class="pre">doInitialization(CallGraph</span> <span class="pre">&)</span></code> method</a></li>
+<li><a class="reference internal" href="#the-runonscc-method" id="id16">The <code class="docutils literal"><span class="pre">runOnSCC</span></code> method</a></li>
+<li><a class="reference internal" href="#the-dofinalization-callgraph-method" id="id17">The <code class="docutils literal"><span class="pre">doFinalization(CallGraph</span> <span class="pre">&)</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-functionpass-class" id="id18">The <code class="docutils literal"><span class="pre">FunctionPass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-doinitialization-module-method" id="id19">The <code class="docutils literal"><span class="pre">doInitialization(Module</span> <span class="pre">&)</span></code> method</a></li>
+<li><a class="reference internal" href="#the-runonfunction-method" id="id20">The <code class="docutils literal"><span class="pre">runOnFunction</span></code> method</a></li>
+<li><a class="reference internal" href="#the-dofinalization-module-method" id="id21">The <code class="docutils literal"><span class="pre">doFinalization(Module</span> <span class="pre">&)</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-looppass-class" id="id22">The <code class="docutils literal"><span class="pre">LoopPass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-doinitialization-loop-lppassmanager-method" id="id23">The <code class="docutils literal"><span class="pre">doInitialization(Loop</span> <span class="pre">*,</span> <span class="pre">LPPassManager</span> <span class="pre">&)</span></code> method</a></li>
+<li><a class="reference internal" href="#the-runonloop-method" id="id24">The <code class="docutils literal"><span class="pre">runOnLoop</span></code> method</a></li>
+<li><a class="reference internal" href="#the-dofinalization-method" id="id25">The <code class="docutils literal"><span class="pre">doFinalization()</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-regionpass-class" id="id26">The <code class="docutils literal"><span class="pre">RegionPass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-doinitialization-region-rgpassmanager-method" id="id27">The <code class="docutils literal"><span class="pre">doInitialization(Region</span> <span class="pre">*,</span> <span class="pre">RGPassManager</span> <span class="pre">&)</span></code> method</a></li>
+<li><a class="reference internal" href="#the-runonregion-method" id="id28">The <code class="docutils literal"><span class="pre">runOnRegion</span></code> method</a></li>
+<li><a class="reference internal" href="#id2" id="id29">The <code class="docutils literal"><span class="pre">doFinalization()</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-basicblockpass-class" id="id30">The <code class="docutils literal"><span class="pre">BasicBlockPass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-doinitialization-function-method" id="id31">The <code class="docutils literal"><span class="pre">doInitialization(Function</span> <span class="pre">&)</span></code> method</a></li>
+<li><a class="reference internal" href="#the-runonbasicblock-method" id="id32">The <code class="docutils literal"><span class="pre">runOnBasicBlock</span></code> method</a></li>
+<li><a class="reference internal" href="#the-dofinalization-function-method" id="id33">The <code class="docutils literal"><span class="pre">doFinalization(Function</span> <span class="pre">&)</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#the-machinefunctionpass-class" id="id34">The <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> class</a><ul>
+<li><a class="reference internal" href="#the-runonmachinefunction-machinefunction-mf-method" id="id35">The <code class="docutils literal"><span class="pre">runOnMachineFunction(MachineFunction</span> <span class="pre">&MF)</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#pass-registration" id="id36">Pass registration</a><ul>
+<li><a class="reference internal" href="#the-print-method" id="id37">The <code class="docutils literal"><span class="pre">print</span></code> method</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#specifying-interactions-between-passes" id="id38">Specifying interactions between passes</a><ul>
+<li><a class="reference internal" href="#the-getanalysisusage-method" id="id39">The <code class="docutils literal"><span class="pre">getAnalysisUsage</span></code> method</a></li>
+<li><a class="reference internal" href="#the-analysisusage-addrequired-and-analysisusage-addrequiredtransitive-methods" id="id40">The <code class="docutils literal"><span class="pre">AnalysisUsage::addRequired<></span></code> and <code class="docutils literal"><span class="pre">AnalysisUsage::addRequiredTransitive<></span></code> methods</a></li>
+<li><a class="reference internal" href="#the-analysisusage-addpreserved-method" id="id41">The <code class="docutils literal"><span class="pre">AnalysisUsage::addPreserved<></span></code> method</a></li>
+<li><a class="reference internal" href="#example-implementations-of-getanalysisusage" id="id42">Example implementations of <code class="docutils literal"><span class="pre">getAnalysisUsage</span></code></a></li>
+<li><a class="reference internal" href="#the-getanalysis-and-getanalysisifavailable-methods" id="id43">The <code class="docutils literal"><span class="pre">getAnalysis<></span></code> and <code class="docutils literal"><span class="pre">getAnalysisIfAvailable<></span></code> methods</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#implementing-analysis-groups" id="id44">Implementing Analysis Groups</a><ul>
+<li><a class="reference internal" href="#analysis-group-concepts" id="id45">Analysis Group Concepts</a></li>
+<li><a class="reference internal" href="#using-registeranalysisgroup" id="id46">Using <code class="docutils literal"><span class="pre">RegisterAnalysisGroup</span></code></a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a class="reference internal" href="#pass-statistics" id="id47">Pass Statistics</a><ul>
+<li><a class="reference internal" href="#what-passmanager-does" id="id48">What PassManager does</a><ul>
+<li><a class="reference internal" href="#the-releasememory-method" id="id49">The <code class="docutils literal"><span class="pre">releaseMemory</span></code> method</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a class="reference internal" href="#registering-dynamically-loaded-passes" id="id50">Registering dynamically loaded passes</a><ul>
+<li><a class="reference internal" href="#using-existing-registries" id="id51">Using existing registries</a></li>
+<li><a class="reference internal" href="#creating-new-registries" id="id52">Creating new registries</a></li>
+<li><a class="reference internal" href="#using-gdb-with-dynamically-loaded-passes" id="id53">Using GDB with dynamically loaded passes</a><ul>
+<li><a class="reference internal" href="#setting-a-breakpoint-in-your-pass" id="id54">Setting a breakpoint in your pass</a></li>
+<li><a class="reference internal" href="#miscellaneous-problems" id="id55">Miscellaneous Problems</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#future-extensions-planned" id="id56">Future extensions planned</a><ul>
+<li><a class="reference internal" href="#multithreaded-llvm" id="id57">Multithreaded LLVM</a></li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+</div>
+<div class="section" id="introduction-what-is-a-pass">
+<h2><a class="toc-backref" href="#id5">Introduction — What is a pass?</a><a class="headerlink" href="#introduction-what-is-a-pass" title="Permalink to this headline">¶</a></h2>
+<p>The LLVM Pass Framework is an important part of the LLVM system, because LLVM
+passes are where most of the interesting parts of the compiler exist.  Passes
+perform the transformations and optimizations that make up the compiler, they
+build the analysis results that are used by these transformations, and they
+are, above all, a structuring technique for compiler code.</p>
+<p>All LLVM passes are subclasses of the <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1Pass.html">Pass</a> class, which implement
+functionality by overriding virtual methods inherited from <code class="docutils literal"><span class="pre">Pass</span></code>.  Depending
+on how your pass works, you should inherit from the <a class="reference internal" href="#writing-an-llvm-pass-modulepass"><span class="std std-ref">ModulePass</span></a> , <a class="reference internal" href="#writing-an-llvm-pass-callgraphsccpass"><span class="std std-ref">CallGraphSCCPass</span></a>, <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a> , or <a class="reference internal" href="#writing-an-llvm-pass-looppass"><span class="std std-ref">LoopPass</span></a>, or <a class="reference internal" href="#writing-an-llvm-pass-regionpass"><span class="std std-ref">RegionPass</span></a>, or <a class="reference internal" href="#writing-an-llvm-pass-basicblockpass"><span class="std std-ref">BasicBlockPass</span></a> classes, which gives the system more
+information about what your pass does, and how it can be combined with other
+passes.  One of the main features of the LLVM Pass Framework is that it
+schedules passes to run in an efficient way based on the constraints that your
+pass meets (which are indicated by which class they derive from).</p>
+<p>We start by showing you how to construct a pass, everything from setting up the
+code, to compiling, loading, and executing it.  After the basics are down, more
+advanced features are discussed.</p>
+</div>
+<div class="section" id="quick-start-writing-hello-world">
+<h2><a class="toc-backref" href="#id6">Quick Start — Writing hello world</a><a class="headerlink" href="#quick-start-writing-hello-world" title="Permalink to this headline">¶</a></h2>
+<p>Here we describe how to write the “hello world” of passes.  The “Hello” pass is
+designed to simply print out the name of non-external functions that exist in
+the program being compiled.  It does not modify the program at all, it just
+inspects it.  The source code and files for this pass are available in the LLVM
+source tree in the <code class="docutils literal"><span class="pre">lib/Transforms/Hello</span></code> directory.</p>
+<div class="section" id="setting-up-the-build-environment">
+<span id="writing-an-llvm-pass-makefile"></span><h3><a class="toc-backref" href="#id7">Setting up the build environment</a><a class="headerlink" href="#setting-up-the-build-environment" title="Permalink to this headline">¶</a></h3>
+<p>First, configure and build LLVM.  Next, you need to create a new directory
+somewhere in the LLVM source base.  For this example, we’ll assume that you
+made <code class="docutils literal"><span class="pre">lib/Transforms/Hello</span></code>.  Finally, you must set up a build script
+that will compile the source code for the new pass.  To do this,
+copy the following into <code class="docutils literal"><span class="pre">CMakeLists.txt</span></code>:</p>
+<div class="highlight-cmake"><div class="highlight"><pre><span></span><span class="nb">add_llvm_loadable_module</span><span class="p">(</span> <span class="s">LLVMHello</span>
+  <span class="s">Hello.cpp</span>
+
+  <span class="s">PLUGIN_TOOL</span>
+  <span class="s">opt</span>
+  <span class="p">)</span>
+</pre></div>
+</div>
+<p>and the following line into <code class="docutils literal"><span class="pre">lib/Transforms/CMakeLists.txt</span></code>:</p>
+<div class="highlight-cmake"><div class="highlight"><pre><span></span><span class="nb">add_subdirectory</span><span class="p">(</span><span class="s">Hello</span><span class="p">)</span>
+</pre></div>
+</div>
+<p>(Note that there is already a directory named <code class="docutils literal"><span class="pre">Hello</span></code> with a sample “Hello”
+pass; you may play with it – in which case you don’t need to modify any
+<code class="docutils literal"><span class="pre">CMakeLists.txt</span></code> files – or, if you want to create everything from scratch,
+use another name.)</p>
+<p>This build script specifies that <code class="docutils literal"><span class="pre">Hello.cpp</span></code> file in the current directory
+is to be compiled and linked into a shared object <code class="docutils literal"><span class="pre">$(LEVEL)/lib/LLVMHello.so</span></code> that
+can be dynamically loaded by the <strong class="program">opt</strong> tool via its <a class="reference internal" href="CommandGuide/opt.html#cmdoption-load"><code class="xref std std-option docutils literal"><span class="pre">-load</span></code></a>
+option. If your operating system uses a suffix other than <code class="docutils literal"><span class="pre">.so</span></code> (such as
+Windows or Mac OS X), the appropriate extension will be used.</p>
+<p>Now that we have the build scripts set up, we just need to write the code for
+the pass itself.</p>
+</div>
+<div class="section" id="basic-code-required">
+<span id="writing-an-llvm-pass-basiccode"></span><h3><a class="toc-backref" href="#id8">Basic code required</a><a class="headerlink" href="#basic-code-required" title="Permalink to this headline">¶</a></h3>
+<p>Now that we have a way to compile our new pass, we just have to write it.
+Start out with:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf">"llvm/Pass.h"</span><span class="cp"></span>
+<span class="cp">#include</span> <span class="cpf">"llvm/IR/Function.h"</span><span class="cp"></span>
+<span class="cp">#include</span> <span class="cpf">"llvm/Support/raw_ostream.h"</span><span class="cp"></span>
+</pre></div>
+</div>
+<p>Which are needed because we are writing a <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1Pass.html">Pass</a>, we are operating on
+<a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1Function.html">Function</a>s, and we will
+be doing some printing.</p>
+<p>Next we have:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="k">namespace</span> <span class="n">llvm</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>… which is required because the functions from the include files live in the
+llvm namespace.</p>
+<p>Next we have:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">namespace</span> <span class="p">{</span>
+</pre></div>
+</div>
+<p>… which starts out an anonymous namespace.  Anonymous namespaces are to C++
+what the “<code class="docutils literal"><span class="pre">static</span></code>” keyword is to C (at global scope).  It makes the things
+declared inside of the anonymous namespace visible only to the current file.
+If you’re not familiar with them, consult a decent C++ book for more
+information.</p>
+<p>Next, we declare our pass itself:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">struct</span> <span class="nl">Hello</span> <span class="p">:</span> <span class="k">public</span> <span class="n">FunctionPass</span> <span class="p">{</span>
+</pre></div>
+</div>
+<p>This declares a “<code class="docutils literal"><span class="pre">Hello</span></code>” class that is a subclass of <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a>.  The different builtin pass subclasses
+are described in detail <a class="reference internal" href="#writing-an-llvm-pass-pass-classes"><span class="std std-ref">later</span></a>, but
+for now, know that <code class="docutils literal"><span class="pre">FunctionPass</span></code> operates on a function at a time.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">static</span> <span class="kt">char</span> <span class="n">ID</span><span class="p">;</span>
+<span class="n">Hello</span><span class="p">()</span> <span class="o">:</span> <span class="n">FunctionPass</span><span class="p">(</span><span class="n">ID</span><span class="p">)</span> <span class="p">{}</span>
+</pre></div>
+</div>
+<p>This declares pass identifier used by LLVM to identify pass.  This allows LLVM
+to avoid using expensive C++ runtime information.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span>  <span class="kt">bool</span> <span class="nf">runOnFunction</span><span class="p">(</span><span class="n">Function</span> <span class="o">&</span><span class="n">F</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
+    <span class="n">errs</span><span class="p">()</span> <span class="o"><<</span> <span class="s">"Hello: "</span><span class="p">;</span>
+    <span class="n">errs</span><span class="p">().</span><span class="n">write_escaped</span><span class="p">(</span><span class="n">F</span><span class="p">.</span><span class="n">getName</span><span class="p">())</span> <span class="o"><<</span> <span class="sc">'\n'</span><span class="p">;</span>
+    <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+  <span class="p">}</span>
+<span class="p">};</span> <span class="c1">// end of struct Hello</span>
+<span class="p">}</span>  <span class="c1">// end of anonymous namespace</span>
+</pre></div>
+</div>
+<p>We declare a <a class="reference internal" href="#writing-an-llvm-pass-runonfunction"><span class="std std-ref">runOnFunction</span></a> method,
+which overrides an abstract virtual method inherited from <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a>.  This is where we are supposed to do our
+thing, so we just print out our message with the name of each function.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">char</span> <span class="n">Hello</span><span class="o">::</span><span class="n">ID</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>We initialize pass ID here.  LLVM uses ID’s address to identify a pass, so
+initialization value is not important.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">static</span> <span class="n">RegisterPass</span><span class="o"><</span><span class="n">Hello</span><span class="o">></span> <span class="n">X</span><span class="p">(</span><span class="s">"hello"</span><span class="p">,</span> <span class="s">"Hello World Pass"</span><span class="p">,</span>
+                             <span class="nb">false</span> <span class="cm">/* Only looks at CFG */</span><span class="p">,</span>
+                             <span class="nb">false</span> <span class="cm">/* Analysis Pass */</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>Lastly, we <a class="reference internal" href="#writing-an-llvm-pass-registration"><span class="std std-ref">register our class</span></a>
+<code class="docutils literal"><span class="pre">Hello</span></code>, giving it a command line argument “<code class="docutils literal"><span class="pre">hello</span></code>”, and a name “Hello
+World Pass”.  The last two arguments describe its behavior: if a pass walks CFG
+without modifying it then the third argument is set to <code class="docutils literal"><span class="pre">true</span></code>; if a pass is
+an analysis pass, for example dominator tree pass, then <code class="docutils literal"><span class="pre">true</span></code> is supplied as
+the fourth argument.</p>
+<p>As a whole, the <code class="docutils literal"><span class="pre">.cpp</span></code> file looks like:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf">"llvm/Pass.h"</span><span class="cp"></span>
+<span class="cp">#include</span> <span class="cpf">"llvm/IR/Function.h"</span><span class="cp"></span>
+<span class="cp">#include</span> <span class="cpf">"llvm/Support/raw_ostream.h"</span><span class="cp"></span>
+
+<span class="k">using</span> <span class="k">namespace</span> <span class="n">llvm</span><span class="p">;</span>
+
+<span class="k">namespace</span> <span class="p">{</span>
+<span class="k">struct</span> <span class="nl">Hello</span> <span class="p">:</span> <span class="k">public</span> <span class="n">FunctionPass</span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">char</span> <span class="n">ID</span><span class="p">;</span>
+  <span class="n">Hello</span><span class="p">()</span> <span class="o">:</span> <span class="n">FunctionPass</span><span class="p">(</span><span class="n">ID</span><span class="p">)</span> <span class="p">{}</span>
+
+  <span class="kt">bool</span> <span class="n">runOnFunction</span><span class="p">(</span><span class="n">Function</span> <span class="o">&</span><span class="n">F</span><span class="p">)</span> <span class="k">override</span> <span class="p">{</span>
+    <span class="n">errs</span><span class="p">()</span> <span class="o"><<</span> <span class="s">"Hello: "</span><span class="p">;</span>
+    <span class="n">errs</span><span class="p">().</span><span class="n">write_escaped</span><span class="p">(</span><span class="n">F</span><span class="p">.</span><span class="n">getName</span><span class="p">())</span> <span class="o"><<</span> <span class="sc">'\n'</span><span class="p">;</span>
+    <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
+  <span class="p">}</span>
+<span class="p">};</span> <span class="c1">// end of struct Hello</span>
+<span class="p">}</span>  <span class="c1">// end of anonymous namespace</span>
+
+<span class="kt">char</span> <span class="n">Hello</span><span class="o">::</span><span class="n">ID</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+<span class="k">static</span> <span class="n">RegisterPass</span><span class="o"><</span><span class="n">Hello</span><span class="o">></span> <span class="n">X</span><span class="p">(</span><span class="s">"hello"</span><span class="p">,</span> <span class="s">"Hello World Pass"</span><span class="p">,</span>
+                             <span class="nb">false</span> <span class="cm">/* Only looks at CFG */</span><span class="p">,</span>
+                             <span class="nb">false</span> <span class="cm">/* Analysis Pass */</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>Now that it’s all together, compile the file with a simple “<code class="docutils literal"><span class="pre">gmake</span></code>” command
+from the top level of your build directory and you should get a new file
+“<code class="docutils literal"><span class="pre">lib/LLVMHello.so</span></code>”.  Note that everything in this file is
+contained in an anonymous namespace — this reflects the fact that passes
+are self contained units that do not need external interfaces (although they
+can have them) to be useful.</p>
+</div>
+<div class="section" id="running-a-pass-with-opt">
+<h3><a class="toc-backref" href="#id9">Running a pass with <code class="docutils literal"><span class="pre">opt</span></code></a><a class="headerlink" href="#running-a-pass-with-opt" title="Permalink to this headline">¶</a></h3>
+<p>Now that you have a brand new shiny shared object file, we can use the
+<strong class="program">opt</strong> command to run an LLVM program through your pass.  Because you
+registered your pass with <code class="docutils literal"><span class="pre">RegisterPass</span></code>, you will be able to use the
+<strong class="program">opt</strong> tool to access it, once loaded.</p>
+<p>To test it, follow the example at the end of the <a class="reference internal" href="GettingStarted.html"><span class="doc">Getting Started with the LLVM System</span></a> to
+compile “Hello World” to LLVM.  We can now run the bitcode file (hello.bc) for
+the program through our transformation like this (or course, any bitcode file
+will work):</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> opt -load lib/LLVMHello.so -hello < hello.bc > /dev/null
+<span class="go">Hello: __main</span>
+<span class="go">Hello: puts</span>
+<span class="go">Hello: main</span>
+</pre></div>
+</div>
+<p>The <a class="reference internal" href="CommandGuide/opt.html#cmdoption-load"><code class="xref std std-option docutils literal"><span class="pre">-load</span></code></a> option specifies that <strong class="program">opt</strong> should load your pass
+as a shared object, which makes “<code class="docutils literal"><span class="pre">-hello</span></code>” a valid command line argument
+(which is one reason you need to <a class="reference internal" href="#writing-an-llvm-pass-registration"><span class="std std-ref">register your pass</span></a>).  Because the Hello pass does not modify
+the program in any interesting way, we just throw away the result of
+<strong class="program">opt</strong> (sending it to <code class="docutils literal"><span class="pre">/dev/null</span></code>).</p>
+<p>To see what happened to the other string you registered, try running
+<strong class="program">opt</strong> with the <a class="reference internal" href="CommandGuide/opt.html#cmdoption-help"><code class="xref std std-option docutils literal"><span class="pre">-help</span></code></a> option:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> opt -load lib/LLVMHello.so -help
+<span class="go">OVERVIEW: llvm .bc -> .bc modular optimizer and analysis printer</span>
+
+<span class="go">USAGE: opt [subcommand] [options] <input bitcode file></span>
+
+<span class="go">OPTIONS:</span>
+<span class="go">  Optimizations available:</span>
+<span class="go">...</span>
+<span class="go">    -guard-widening           - Widen guards</span>
+<span class="go">    -gvn                      - Global Value Numbering</span>
+<span class="go">    -gvn-hoist                - Early GVN Hoisting of Expressions</span>
+<span class="go">    -hello                    - Hello World Pass</span>
+<span class="go">    -indvars                  - Induction Variable Simplification</span>
+<span class="go">    -inferattrs               - Infer set function attributes</span>
+<span class="go">...</span>
+</pre></div>
+</div>
+<p>The pass name gets added as the information string for your pass, giving some
+documentation to users of <strong class="program">opt</strong>.  Now that you have a working pass,
+you would go ahead and make it do the cool transformations you want.  Once you
+get it all working and tested, it may become useful to find out how fast your
+pass is.  The <a class="reference internal" href="#writing-an-llvm-pass-passmanager"><span class="std std-ref">PassManager</span></a> provides a
+nice command line option (<a class="reference internal" href="CommandGuide/llc.html#cmdoption-time-passes"><code class="xref std std-option docutils literal"><span class="pre">--time-passes</span></code></a>) that allows you to get
+information about the execution time of your pass along with the other passes
+you queue up.  For example:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> opt -load lib/LLVMHello.so -hello -time-passes < hello.bc > /dev/null
+<span class="go">Hello: __main</span>
+<span class="go">Hello: puts</span>
+<span class="go">Hello: main</span>
+<span class="go">===-------------------------------------------------------------------------===</span>
+<span class="go">                      ... Pass execution timing report ...</span>
+<span class="go">===-------------------------------------------------------------------------===</span>
+<span class="go">  Total Execution Time: 0.0007 seconds (0.0005 wall clock)</span>
+
+<span class="go">   ---User Time---   --User+System--   ---Wall Time---  --- Name ---</span>
+<span class="go">   0.0004 ( 55.3%)   0.0004 ( 55.3%)   0.0004 ( 75.7%)  Bitcode Writer</span>
+<span class="go">   0.0003 ( 44.7%)   0.0003 ( 44.7%)   0.0001 ( 13.6%)  Hello World Pass</span>
+<span class="go">   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 ( 10.7%)  Module Verifier</span>
+<span class="go">   0.0007 (100.0%)   0.0007 (100.0%)   0.0005 (100.0%)  Total</span>
+</pre></div>
+</div>
+<p>As you can see, our implementation above is pretty fast.  The additional
+passes listed are automatically inserted by the <strong class="program">opt</strong> tool to verify
+that the LLVM emitted by your pass is still valid and well formed LLVM, which
+hasn’t been broken somehow.</p>
+<p>Now that you have seen the basics of the mechanics behind passes, we can talk
+about some more details of how they work and how to use them.</p>
+</div>
+</div>
+<div class="section" id="pass-classes-and-requirements">
+<span id="writing-an-llvm-pass-pass-classes"></span><h2><a class="toc-backref" href="#id10">Pass classes and requirements</a><a class="headerlink" href="#pass-classes-and-requirements" title="Permalink to this headline">¶</a></h2>
+<p>One of the first things that you should do when designing a new pass is to
+decide what class you should subclass for your pass.  The <a class="reference internal" href="#writing-an-llvm-pass-basiccode"><span class="std std-ref">Hello World</span></a> example uses the <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a> class for its implementation, but we did
+not discuss why or when this should occur.  Here we talk about the classes
+available, from the most general to the most specific.</p>
+<p>When choosing a superclass for your <code class="docutils literal"><span class="pre">Pass</span></code>, you should choose the <strong>most
+specific</strong> class possible, while still being able to meet the requirements
+listed.  This gives the LLVM Pass Infrastructure information necessary to
+optimize how passes are run, so that the resultant compiler isn’t unnecessarily
+slow.</p>
+<div class="section" id="the-immutablepass-class">
+<h3><a class="toc-backref" href="#id11">The <code class="docutils literal"><span class="pre">ImmutablePass</span></code> class</a><a class="headerlink" href="#the-immutablepass-class" title="Permalink to this headline">¶</a></h3>
+<p>The most plain and boring type of pass is the “<a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1ImmutablePass.html">ImmutablePass</a>” class.  This pass
+type is used for passes that do not have to be run, do not change state, and
+never need to be updated.  This is not a normal type of transformation or
+analysis, but can provide information about the current compiler configuration.</p>
+<p>Although this pass class is very infrequently used, it is important for
+providing information about the current target machine being compiled for, and
+other static information that can affect the various transformations.</p>
+<p><code class="docutils literal"><span class="pre">ImmutablePass</span></code>es never invalidate other transformations, are never
+invalidated, and are never “run”.</p>
+</div>
+<div class="section" id="the-modulepass-class">
+<span id="writing-an-llvm-pass-modulepass"></span><h3><a class="toc-backref" href="#id12">The <code class="docutils literal"><span class="pre">ModulePass</span></code> class</a><a class="headerlink" href="#the-modulepass-class" title="Permalink to this headline">¶</a></h3>
+<p>The <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1ModulePass.html">ModulePass</a> class
+is the most general of all superclasses that you can use.  Deriving from
+<code class="docutils literal"><span class="pre">ModulePass</span></code> indicates that your pass uses the entire program as a unit,
+referring to function bodies in no predictable order, or adding and removing
+functions.  Because nothing is known about the behavior of <code class="docutils literal"><span class="pre">ModulePass</span></code>
+subclasses, no optimization can be done for their execution.</p>
+<p>A module pass can use function level passes (e.g. dominators) using the
+<code class="docutils literal"><span class="pre">getAnalysis</span></code> interface <code class="docutils literal"><span class="pre">getAnalysis<DominatorTree>(llvm::Function</span> <span class="pre">*)</span></code> to
+provide the function to retrieve analysis result for, if the function pass does
+not require any module or immutable passes.  Note that this can only be done
+for functions for which the analysis ran, e.g. in the case of dominators you
+should only ask for the <code class="docutils literal"><span class="pre">DominatorTree</span></code> for function definitions, not
+declarations.</p>
+<p>To write a correct <code class="docutils literal"><span class="pre">ModulePass</span></code> subclass, derive from <code class="docutils literal"><span class="pre">ModulePass</span></code> and
+overload the <code class="docutils literal"><span class="pre">runOnModule</span></code> method with the following signature:</p>
+<div class="section" id="the-runonmodule-method">
+<h4><a class="toc-backref" href="#id13">The <code class="docutils literal"><span class="pre">runOnModule</span></code> method</a><a class="headerlink" href="#the-runonmodule-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnModule</span><span class="p">(</span><span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">runOnModule</span></code> method performs the interesting work of the pass.  It
+should return <code class="docutils literal"><span class="pre">true</span></code> if the module was modified by the transformation and
+<code class="docutils literal"><span class="pre">false</span></code> otherwise.</p>
+</div>
+</div>
+<div class="section" id="the-callgraphsccpass-class">
+<span id="writing-an-llvm-pass-callgraphsccpass"></span><h3><a class="toc-backref" href="#id14">The <code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code> class</a><a class="headerlink" href="#the-callgraphsccpass-class" title="Permalink to this headline">¶</a></h3>
+<p>The <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1CallGraphSCCPass.html">CallGraphSCCPass</a> is used by
+passes that need to traverse the program bottom-up on the call graph (callees
+before callers).  Deriving from <code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code> provides some mechanics
+for building and traversing the <code class="docutils literal"><span class="pre">CallGraph</span></code>, but also allows the system to
+optimize execution of <code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code>es.  If your pass meets the
+requirements outlined below, and doesn’t meet the requirements of a
+<a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a> or <a class="reference internal" href="#writing-an-llvm-pass-basicblockpass"><span class="std std-ref">BasicBlockPass</span></a>, you should derive from
+<code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code>.</p>
+<p><code class="docutils literal"><span class="pre">TODO</span></code>: explain briefly what SCC, Tarjan’s algo, and B-U mean.</p>
+<p>To be explicit, CallGraphSCCPass subclasses are:</p>
+<ol class="arabic simple">
+<li>… <em>not allowed</em> to inspect or modify any <code class="docutils literal"><span class="pre">Function</span></code>s other than those
+in the current SCC and the direct callers and direct callees of the SCC.</li>
+<li>… <em>required</em> to preserve the current <code class="docutils literal"><span class="pre">CallGraph</span></code> object, updating it to
+reflect any changes made to the program.</li>
+<li>… <em>not allowed</em> to add or remove SCC’s from the current Module, though
+they may change the contents of an SCC.</li>
+<li>… <em>allowed</em> to add or remove global variables from the current Module.</li>
+<li>… <em>allowed</em> to maintain state across invocations of <a class="reference internal" href="#writing-an-llvm-pass-runonscc"><span class="std std-ref">runOnSCC</span></a> (including global data).</li>
+</ol>
+<p>Implementing a <code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code> is slightly tricky in some cases because it
+has to handle SCCs with more than one node in it.  All of the virtual methods
+described below should return <code class="docutils literal"><span class="pre">true</span></code> if they modified the program, or
+<code class="docutils literal"><span class="pre">false</span></code> if they didn’t.</p>
+<div class="section" id="the-doinitialization-callgraph-method">
+<h4><a class="toc-backref" href="#id15">The <code class="docutils literal"><span class="pre">doInitialization(CallGraph</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-doinitialization-callgraph-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doInitialization</span><span class="p">(</span><span class="n">CallGraph</span> <span class="o">&</span><span class="n">CG</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is allowed to do most of the things that
+<code class="docutils literal"><span class="pre">CallGraphSCCPass</span></code>es are not allowed to do.  They can add and remove
+functions, get pointers to functions, etc.  The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is
+designed to do simple initialization type of stuff that does not depend on the
+SCCs being processed.  The <code class="docutils literal"><span class="pre">doInitialization</span></code> method call is not scheduled to
+overlap with any other pass executions (thus it should be very fast).</p>
+</div>
+<div class="section" id="the-runonscc-method">
+<span id="writing-an-llvm-pass-runonscc"></span><h4><a class="toc-backref" href="#id16">The <code class="docutils literal"><span class="pre">runOnSCC</span></code> method</a><a class="headerlink" href="#the-runonscc-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnSCC</span><span class="p">(</span><span class="n">CallGraphSCC</span> <span class="o">&</span><span class="n">SCC</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">runOnSCC</span></code> method performs the interesting work of the pass, and should
+return <code class="docutils literal"><span class="pre">true</span></code> if the module was modified by the transformation, <code class="docutils literal"><span class="pre">false</span></code>
+otherwise.</p>
+</div>
+<div class="section" id="the-dofinalization-callgraph-method">
+<h4><a class="toc-backref" href="#id17">The <code class="docutils literal"><span class="pre">doFinalization(CallGraph</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-dofinalization-callgraph-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doFinalization</span><span class="p">(</span><span class="n">CallGraph</span> <span class="o">&</span><span class="n">CG</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doFinalization</span></code> method is an infrequently used method that is called
+when the pass framework has finished calling <a class="reference internal" href="#writing-an-llvm-pass-runonscc"><span class="std std-ref">runOnSCC</span></a> for every SCC in the program being compiled.</p>
+</div>
+</div>
+<div class="section" id="the-functionpass-class">
+<span id="writing-an-llvm-pass-functionpass"></span><h3><a class="toc-backref" href="#id18">The <code class="docutils literal"><span class="pre">FunctionPass</span></code> class</a><a class="headerlink" href="#the-functionpass-class" title="Permalink to this headline">¶</a></h3>
+<p>In contrast to <code class="docutils literal"><span class="pre">ModulePass</span></code> subclasses, <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1Pass.html">FunctionPass</a> subclasses do have a
+predictable, local behavior that can be expected by the system.  All
+<code class="docutils literal"><span class="pre">FunctionPass</span></code> execute on each function in the program independent of all of
+the other functions in the program.  <code class="docutils literal"><span class="pre">FunctionPass</span></code>es do not require that
+they are executed in a particular order, and <code class="docutils literal"><span class="pre">FunctionPass</span></code>es do not modify
+external functions.</p>
+<p>To be explicit, <code class="docutils literal"><span class="pre">FunctionPass</span></code> subclasses are not allowed to:</p>
+<ol class="arabic simple">
+<li>Inspect or modify a <code class="docutils literal"><span class="pre">Function</span></code> other than the one currently being processed.</li>
+<li>Add or remove <code class="docutils literal"><span class="pre">Function</span></code>s from the current <code class="docutils literal"><span class="pre">Module</span></code>.</li>
+<li>Add or remove global variables from the current <code class="docutils literal"><span class="pre">Module</span></code>.</li>
+<li>Maintain state across invocations of <a class="reference internal" href="#writing-an-llvm-pass-runonfunction"><span class="std std-ref">runOnFunction</span></a> (including global data).</li>
+</ol>
+<p>Implementing a <code class="docutils literal"><span class="pre">FunctionPass</span></code> is usually straightforward (See the <a class="reference internal" href="#writing-an-llvm-pass-basiccode"><span class="std std-ref">Hello
+World</span></a> pass for example).
+<code class="docutils literal"><span class="pre">FunctionPass</span></code>es may overload three virtual methods to do their work.  All
+of these methods should return <code class="docutils literal"><span class="pre">true</span></code> if they modified the program, or
+<code class="docutils literal"><span class="pre">false</span></code> if they didn’t.</p>
+<div class="section" id="the-doinitialization-module-method">
+<span id="writing-an-llvm-pass-doinitialization-mod"></span><h4><a class="toc-backref" href="#id19">The <code class="docutils literal"><span class="pre">doInitialization(Module</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-doinitialization-module-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doInitialization</span><span class="p">(</span><span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is allowed to do most of the things that
+<code class="docutils literal"><span class="pre">FunctionPass</span></code>es are not allowed to do.  They can add and remove functions,
+get pointers to functions, etc.  The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is designed to
+do simple initialization type of stuff that does not depend on the functions
+being processed.  The <code class="docutils literal"><span class="pre">doInitialization</span></code> method call is not scheduled to
+overlap with any other pass executions (thus it should be very fast).</p>
+<p>A good example of how this method should be used is the <a class="reference external" href="http://llvm.org/doxygen/LowerAllocations_8cpp-source.html">LowerAllocations</a> pass.  This pass
+converts <code class="docutils literal"><span class="pre">malloc</span></code> and <code class="docutils literal"><span class="pre">free</span></code> instructions into platform dependent
+<code class="docutils literal"><span class="pre">malloc()</span></code> and <code class="docutils literal"><span class="pre">free()</span></code> function calls.  It uses the <code class="docutils literal"><span class="pre">doInitialization</span></code>
+method to get a reference to the <code class="docutils literal"><span class="pre">malloc</span></code> and <code class="docutils literal"><span class="pre">free</span></code> functions that it
+needs, adding prototypes to the module if necessary.</p>
+</div>
+<div class="section" id="the-runonfunction-method">
+<span id="writing-an-llvm-pass-runonfunction"></span><h4><a class="toc-backref" href="#id20">The <code class="docutils literal"><span class="pre">runOnFunction</span></code> method</a><a class="headerlink" href="#the-runonfunction-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnFunction</span><span class="p">(</span><span class="n">Function</span> <span class="o">&</span><span class="n">F</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">runOnFunction</span></code> method must be implemented by your subclass to do the
+transformation or analysis work of your pass.  As usual, a <code class="docutils literal"><span class="pre">true</span></code> value
+should be returned if the function is modified.</p>
+</div>
+<div class="section" id="the-dofinalization-module-method">
+<span id="writing-an-llvm-pass-dofinalization-mod"></span><h4><a class="toc-backref" href="#id21">The <code class="docutils literal"><span class="pre">doFinalization(Module</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-dofinalization-module-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doFinalization</span><span class="p">(</span><span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doFinalization</span></code> method is an infrequently used method that is called
+when the pass framework has finished calling <a class="reference internal" href="#writing-an-llvm-pass-runonfunction"><span class="std std-ref">runOnFunction</span></a> for every function in the program being
+compiled.</p>
+</div>
+</div>
+<div class="section" id="the-looppass-class">
+<span id="writing-an-llvm-pass-looppass"></span><h3><a class="toc-backref" href="#id22">The <code class="docutils literal"><span class="pre">LoopPass</span></code> class</a><a class="headerlink" href="#the-looppass-class" title="Permalink to this headline">¶</a></h3>
+<p>All <code class="docutils literal"><span class="pre">LoopPass</span></code> execute on each loop in the function independent of all of the
+other loops in the function.  <code class="docutils literal"><span class="pre">LoopPass</span></code> processes loops in loop nest order
+such that outer most loop is processed last.</p>
+<p><code class="docutils literal"><span class="pre">LoopPass</span></code> subclasses are allowed to update loop nest using <code class="docutils literal"><span class="pre">LPPassManager</span></code>
+interface.  Implementing a loop pass is usually straightforward.
+<code class="docutils literal"><span class="pre">LoopPass</span></code>es may overload three virtual methods to do their work.  All
+these methods should return <code class="docutils literal"><span class="pre">true</span></code> if they modified the program, or <code class="docutils literal"><span class="pre">false</span></code>
+if they didn’t.</p>
+<p>A <code class="docutils literal"><span class="pre">LoopPass</span></code> subclass which is intended to run as part of the main loop pass
+pipeline needs to preserve all of the same <em>function</em> analyses that the other
+loop passes in its pipeline require. To make that easier,
+a <code class="docutils literal"><span class="pre">getLoopAnalysisUsage</span></code> function is provided by <code class="docutils literal"><span class="pre">LoopUtils.h</span></code>. It can be
+called within the subclass’s <code class="docutils literal"><span class="pre">getAnalysisUsage</span></code> override to get consistent
+and correct behavior. Analogously, <code class="docutils literal"><span class="pre">INITIALIZE_PASS_DEPENDENCY(LoopPass)</span></code>
+will initialize this set of function analyses.</p>
+<div class="section" id="the-doinitialization-loop-lppassmanager-method">
+<h4><a class="toc-backref" href="#id23">The <code class="docutils literal"><span class="pre">doInitialization(Loop</span> <span class="pre">*,</span> <span class="pre">LPPassManager</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-doinitialization-loop-lppassmanager-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doInitialization</span><span class="p">(</span><span class="n">Loop</span> <span class="o">*</span><span class="p">,</span> <span class="n">LPPassManager</span> <span class="o">&</span><span class="n">LPM</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is designed to do simple initialization type of
+stuff that does not depend on the functions being processed.  The
+<code class="docutils literal"><span class="pre">doInitialization</span></code> method call is not scheduled to overlap with any other
+pass executions (thus it should be very fast).  <code class="docutils literal"><span class="pre">LPPassManager</span></code> interface
+should be used to access <code class="docutils literal"><span class="pre">Function</span></code> or <code class="docutils literal"><span class="pre">Module</span></code> level analysis information.</p>
+</div>
+<div class="section" id="the-runonloop-method">
+<span id="writing-an-llvm-pass-runonloop"></span><h4><a class="toc-backref" href="#id24">The <code class="docutils literal"><span class="pre">runOnLoop</span></code> method</a><a class="headerlink" href="#the-runonloop-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnLoop</span><span class="p">(</span><span class="n">Loop</span> <span class="o">*</span><span class="p">,</span> <span class="n">LPPassManager</span> <span class="o">&</span><span class="n">LPM</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">runOnLoop</span></code> method must be implemented by your subclass to do the
+transformation or analysis work of your pass.  As usual, a <code class="docutils literal"><span class="pre">true</span></code> value
+should be returned if the function is modified.  <code class="docutils literal"><span class="pre">LPPassManager</span></code> interface
+should be used to update loop nest.</p>
+</div>
+<div class="section" id="the-dofinalization-method">
+<h4><a class="toc-backref" href="#id25">The <code class="docutils literal"><span class="pre">doFinalization()</span></code> method</a><a class="headerlink" href="#the-dofinalization-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doFinalization</span><span class="p">();</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doFinalization</span></code> method is an infrequently used method that is called
+when the pass framework has finished calling <a class="reference internal" href="#writing-an-llvm-pass-runonloop"><span class="std std-ref">runOnLoop</span></a> for every loop in the program being compiled.</p>
+</div>
+</div>
+<div class="section" id="the-regionpass-class">
+<span id="writing-an-llvm-pass-regionpass"></span><h3><a class="toc-backref" href="#id26">The <code class="docutils literal"><span class="pre">RegionPass</span></code> class</a><a class="headerlink" href="#the-regionpass-class" title="Permalink to this headline">¶</a></h3>
+<p><code class="docutils literal"><span class="pre">RegionPass</span></code> is similar to <a class="reference internal" href="#writing-an-llvm-pass-looppass"><span class="std std-ref">LoopPass</span></a>,
+but executes on each single entry single exit region in the function.
+<code class="docutils literal"><span class="pre">RegionPass</span></code> processes regions in nested order such that the outer most
+region is processed last.</p>
+<p><code class="docutils literal"><span class="pre">RegionPass</span></code> subclasses are allowed to update the region tree by using the
+<code class="docutils literal"><span class="pre">RGPassManager</span></code> interface.  You may overload three virtual methods of
+<code class="docutils literal"><span class="pre">RegionPass</span></code> to implement your own region pass.  All these methods should
+return <code class="docutils literal"><span class="pre">true</span></code> if they modified the program, or <code class="docutils literal"><span class="pre">false</span></code> if they did not.</p>
+<div class="section" id="the-doinitialization-region-rgpassmanager-method">
+<h4><a class="toc-backref" href="#id27">The <code class="docutils literal"><span class="pre">doInitialization(Region</span> <span class="pre">*,</span> <span class="pre">RGPassManager</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-doinitialization-region-rgpassmanager-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doInitialization</span><span class="p">(</span><span class="n">Region</span> <span class="o">*</span><span class="p">,</span> <span class="n">RGPassManager</span> <span class="o">&</span><span class="n">RGM</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is designed to do simple initialization type of
+stuff that does not depend on the functions being processed.  The
+<code class="docutils literal"><span class="pre">doInitialization</span></code> method call is not scheduled to overlap with any other
+pass executions (thus it should be very fast).  <code class="docutils literal"><span class="pre">RPPassManager</span></code> interface
+should be used to access <code class="docutils literal"><span class="pre">Function</span></code> or <code class="docutils literal"><span class="pre">Module</span></code> level analysis information.</p>
+</div>
+<div class="section" id="the-runonregion-method">
+<span id="writing-an-llvm-pass-runonregion"></span><h4><a class="toc-backref" href="#id28">The <code class="docutils literal"><span class="pre">runOnRegion</span></code> method</a><a class="headerlink" href="#the-runonregion-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnRegion</span><span class="p">(</span><span class="n">Region</span> <span class="o">*</span><span class="p">,</span> <span class="n">RGPassManager</span> <span class="o">&</span><span class="n">RGM</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">runOnRegion</span></code> method must be implemented by your subclass to do the
+transformation or analysis work of your pass.  As usual, a true value should be
+returned if the region is modified.  <code class="docutils literal"><span class="pre">RGPassManager</span></code> interface should be used to
+update region tree.</p>
+</div>
+<div class="section" id="id2">
+<h4><a class="toc-backref" href="#id29">The <code class="docutils literal"><span class="pre">doFinalization()</span></code> method</a><a class="headerlink" href="#id2" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doFinalization</span><span class="p">();</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doFinalization</span></code> method is an infrequently used method that is called
+when the pass framework has finished calling <a class="reference internal" href="#writing-an-llvm-pass-runonregion"><span class="std std-ref">runOnRegion</span></a> for every region in the program being
+compiled.</p>
+</div>
+</div>
+<div class="section" id="the-basicblockpass-class">
+<span id="writing-an-llvm-pass-basicblockpass"></span><h3><a class="toc-backref" href="#id30">The <code class="docutils literal"><span class="pre">BasicBlockPass</span></code> class</a><a class="headerlink" href="#the-basicblockpass-class" title="Permalink to this headline">¶</a></h3>
+<p><code class="docutils literal"><span class="pre">BasicBlockPass</span></code>es are just like <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass’s</span></a> , except that they must limit their scope
+of inspection and modification to a single basic block at a time.  As such,
+they are <strong>not</strong> allowed to do any of the following:</p>
+<ol class="arabic simple">
+<li>Modify or inspect any basic blocks outside of the current one.</li>
+<li>Maintain state across invocations of <a class="reference internal" href="#writing-an-llvm-pass-runonbasicblock"><span class="std std-ref">runOnBasicBlock</span></a>.</li>
+<li>Modify the control flow graph (by altering terminator instructions)</li>
+<li>Any of the things forbidden for <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPasses</span></a>.</li>
+</ol>
+<p><code class="docutils literal"><span class="pre">BasicBlockPass</span></code>es are useful for traditional local and “peephole”
+optimizations.  They may override the same <a class="reference internal" href="#writing-an-llvm-pass-doinitialization-mod"><span class="std std-ref">doInitialization(Module &)</span></a> and <a class="reference internal" href="#writing-an-llvm-pass-dofinalization-mod"><span class="std std-ref">doFinalization(Module &)</span></a> methods that <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass’s</span></a> have, but also have the following virtual
+methods that may also be implemented:</p>
+<div class="section" id="the-doinitialization-function-method">
+<h4><a class="toc-backref" href="#id31">The <code class="docutils literal"><span class="pre">doInitialization(Function</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-doinitialization-function-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doInitialization</span><span class="p">(</span><span class="n">Function</span> <span class="o">&</span><span class="n">F</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is allowed to do most of the things that
+<code class="docutils literal"><span class="pre">BasicBlockPass</span></code>es are not allowed to do, but that <code class="docutils literal"><span class="pre">FunctionPass</span></code>es
+can.  The <code class="docutils literal"><span class="pre">doInitialization</span></code> method is designed to do simple initialization
+that does not depend on the <code class="docutils literal"><span class="pre">BasicBlock</span></code>s being processed.  The
+<code class="docutils literal"><span class="pre">doInitialization</span></code> method call is not scheduled to overlap with any other
+pass executions (thus it should be very fast).</p>
+</div>
+<div class="section" id="the-runonbasicblock-method">
+<span id="writing-an-llvm-pass-runonbasicblock"></span><h4><a class="toc-backref" href="#id32">The <code class="docutils literal"><span class="pre">runOnBasicBlock</span></code> method</a><a class="headerlink" href="#the-runonbasicblock-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnBasicBlock</span><span class="p">(</span><span class="n">BasicBlock</span> <span class="o">&</span><span class="n">BB</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>Override this function to do the work of the <code class="docutils literal"><span class="pre">BasicBlockPass</span></code>.  This function
+is not allowed to inspect or modify basic blocks other than the parameter, and
+are not allowed to modify the CFG.  A <code class="docutils literal"><span class="pre">true</span></code> value must be returned if the
+basic block is modified.</p>
+</div>
+<div class="section" id="the-dofinalization-function-method">
+<h4><a class="toc-backref" href="#id33">The <code class="docutils literal"><span class="pre">doFinalization(Function</span> <span class="pre">&)</span></code> method</a><a class="headerlink" href="#the-dofinalization-function-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">doFinalization</span><span class="p">(</span><span class="n">Function</span> <span class="o">&</span><span class="n">F</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">doFinalization</span></code> method is an infrequently used method that is called
+when the pass framework has finished calling <a class="reference internal" href="#writing-an-llvm-pass-runonbasicblock"><span class="std std-ref">runOnBasicBlock</span></a> for every <code class="docutils literal"><span class="pre">BasicBlock</span></code> in the program
+being compiled.  This can be used to perform per-function finalization.</p>
+</div>
+</div>
+<div class="section" id="the-machinefunctionpass-class">
+<h3><a class="toc-backref" href="#id34">The <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> class</a><a class="headerlink" href="#the-machinefunctionpass-class" title="Permalink to this headline">¶</a></h3>
+<p>A <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> is a part of the LLVM code generator that executes on
+the machine-dependent representation of each LLVM function in the program.</p>
+<p>Code generator passes are registered and initialized specially by
+<code class="docutils literal"><span class="pre">TargetMachine::addPassesToEmitFile</span></code> and similar routines, so they cannot
+generally be run from the <strong class="program">opt</strong> or <strong class="program">bugpoint</strong> commands.</p>
+<p>A <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> is also a <code class="docutils literal"><span class="pre">FunctionPass</span></code>, so all the restrictions
+that apply to a <code class="docutils literal"><span class="pre">FunctionPass</span></code> also apply to it.  <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>es
+also have additional restrictions.  In particular, <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>es
+are not allowed to do any of the following:</p>
+<ol class="arabic simple">
+<li>Modify or create any LLVM IR <code class="docutils literal"><span class="pre">Instruction</span></code>s, <code class="docutils literal"><span class="pre">BasicBlock</span></code>s,
+<code class="docutils literal"><span class="pre">Argument</span></code>s, <code class="docutils literal"><span class="pre">Function</span></code>s, <code class="docutils literal"><span class="pre">GlobalVariable</span></code>s,
+<code class="docutils literal"><span class="pre">GlobalAlias</span></code>es, or <code class="docutils literal"><span class="pre">Module</span></code>s.</li>
+<li>Modify a <code class="docutils literal"><span class="pre">MachineFunction</span></code> other than the one currently being processed.</li>
+<li>Maintain state across invocations of <a class="reference internal" href="#writing-an-llvm-pass-runonmachinefunction"><span class="std std-ref">runOnMachineFunction</span></a> (including global data).</li>
+</ol>
+<div class="section" id="the-runonmachinefunction-machinefunction-mf-method">
+<span id="writing-an-llvm-pass-runonmachinefunction"></span><h4><a class="toc-backref" href="#id35">The <code class="docutils literal"><span class="pre">runOnMachineFunction(MachineFunction</span> <span class="pre">&MF)</span></code> method</a><a class="headerlink" href="#the-runonmachinefunction-machinefunction-mf-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">bool</span> <span class="nf">runOnMachineFunction</span><span class="p">(</span><span class="n">MachineFunction</span> <span class="o">&</span><span class="n">MF</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">runOnMachineFunction</span></code> can be considered the main entry point of a
+<code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>; that is, you should override this method to do the
+work of your <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>.</p>
+<p>The <code class="docutils literal"><span class="pre">runOnMachineFunction</span></code> method is called on every <code class="docutils literal"><span class="pre">MachineFunction</span></code> in a
+<code class="docutils literal"><span class="pre">Module</span></code>, so that the <code class="docutils literal"><span class="pre">MachineFunctionPass</span></code> may perform optimizations on
+the machine-dependent representation of the function.  If you want to get at
+the LLVM <code class="docutils literal"><span class="pre">Function</span></code> for the <code class="docutils literal"><span class="pre">MachineFunction</span></code> you’re working on, use
+<code class="docutils literal"><span class="pre">MachineFunction</span></code>’s <code class="docutils literal"><span class="pre">getFunction()</span></code> accessor method — but remember, you
+may not modify the LLVM <code class="docutils literal"><span class="pre">Function</span></code> or its contents from a
+<code class="docutils literal"><span class="pre">MachineFunctionPass</span></code>.</p>
+</div>
+</div>
+<div class="section" id="pass-registration">
+<span id="writing-an-llvm-pass-registration"></span><h3><a class="toc-backref" href="#id36">Pass registration</a><a class="headerlink" href="#pass-registration" title="Permalink to this headline">¶</a></h3>
+<p>In the <a class="reference internal" href="#writing-an-llvm-pass-basiccode"><span class="std std-ref">Hello World</span></a> example pass we
+illustrated how pass registration works, and discussed some of the reasons that
+it is used and what it does.  Here we discuss how and why passes are
+registered.</p>
+<p>As we saw above, passes are registered with the <code class="docutils literal"><span class="pre">RegisterPass</span></code> template.  The
+template parameter is the name of the pass that is to be used on the command
+line to specify that the pass should be added to a program (for example, with
+<strong class="program">opt</strong> or <strong class="program">bugpoint</strong>).  The first argument is the name of the
+pass, which is to be used for the <a class="reference internal" href="CommandGuide/opt.html#cmdoption-help"><code class="xref std std-option docutils literal"><span class="pre">-help</span></code></a> output of programs, as well
+as for debug output generated by the <cite>–debug-pass</cite> option.</p>
+<p>If you want your pass to be easily dumpable, you should implement the virtual
+print method:</p>
+<div class="section" id="the-print-method">
+<h4><a class="toc-backref" href="#id37">The <code class="docutils literal"><span class="pre">print</span></code> method</a><a class="headerlink" href="#the-print-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">void</span> <span class="nf">print</span><span class="p">(</span><span class="n">llvm</span><span class="o">::</span><span class="n">raw_ostream</span> <span class="o">&</span><span class="n">O</span><span class="p">,</span> <span class="k">const</span> <span class="n">Module</span> <span class="o">*</span><span class="n">M</span><span class="p">)</span> <span class="k">const</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">print</span></code> method must be implemented by “analyses” in order to print a
+human readable version of the analysis results.  This is useful for debugging
+an analysis itself, as well as for other people to figure out how an analysis
+works.  Use the opt <code class="docutils literal"><span class="pre">-analyze</span></code> argument to invoke this method.</p>
+<p>The <code class="docutils literal"><span class="pre">llvm::raw_ostream</span></code> parameter specifies the stream to write the results
+on, and the <code class="docutils literal"><span class="pre">Module</span></code> parameter gives a pointer to the top level module of the
+program that has been analyzed.  Note however that this pointer may be <code class="docutils literal"><span class="pre">NULL</span></code>
+in certain circumstances (such as calling the <code class="docutils literal"><span class="pre">Pass::dump()</span></code> from a
+debugger), so it should only be used to enhance debug output, it should not be
+depended on.</p>
+</div>
+</div>
+<div class="section" id="specifying-interactions-between-passes">
+<span id="writing-an-llvm-pass-interaction"></span><h3><a class="toc-backref" href="#id38">Specifying interactions between passes</a><a class="headerlink" href="#specifying-interactions-between-passes" title="Permalink to this headline">¶</a></h3>
+<p>One of the main responsibilities of the <code class="docutils literal"><span class="pre">PassManager</span></code> is to make sure that
+passes interact with each other correctly.  Because <code class="docutils literal"><span class="pre">PassManager</span></code> tries to
+<a class="reference internal" href="#writing-an-llvm-pass-passmanager"><span class="std std-ref">optimize the execution of passes</span></a> it
+must know how the passes interact with each other and what dependencies exist
+between the various passes.  To track this, each pass can declare the set of
+passes that are required to be executed before the current pass, and the passes
+which are invalidated by the current pass.</p>
+<p>Typically this functionality is used to require that analysis results are
+computed before your pass is run.  Running arbitrary transformation passes can
+invalidate the computed analysis results, which is what the invalidation set
+specifies.  If a pass does not implement the <a class="reference internal" href="#writing-an-llvm-pass-getanalysisusage"><span class="std std-ref">getAnalysisUsage</span></a> method, it defaults to not having any
+prerequisite passes, and invalidating <strong>all</strong> other passes.</p>
+<div class="section" id="the-getanalysisusage-method">
+<span id="writing-an-llvm-pass-getanalysisusage"></span><h4><a class="toc-backref" href="#id39">The <code class="docutils literal"><span class="pre">getAnalysisUsage</span></code> method</a><a class="headerlink" href="#the-getanalysisusage-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">void</span> <span class="nf">getAnalysisUsage</span><span class="p">(</span><span class="n">AnalysisUsage</span> <span class="o">&</span><span class="n">Info</span><span class="p">)</span> <span class="k">const</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>By implementing the <code class="docutils literal"><span class="pre">getAnalysisUsage</span></code> method, the required and invalidated
+sets may be specified for your transformation.  The implementation should fill
+in the <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1AnalysisUsage.html">AnalysisUsage</a> object with
+information about which passes are required and not invalidated.  To do this, a
+pass may call any of the following methods on the <code class="docutils literal"><span class="pre">AnalysisUsage</span></code> object:</p>
+</div>
+<div class="section" id="the-analysisusage-addrequired-and-analysisusage-addrequiredtransitive-methods">
+<h4><a class="toc-backref" href="#id40">The <code class="docutils literal"><span class="pre">AnalysisUsage::addRequired<></span></code> and <code class="docutils literal"><span class="pre">AnalysisUsage::addRequiredTransitive<></span></code> methods</a><a class="headerlink" href="#the-analysisusage-addrequired-and-analysisusage-addrequiredtransitive-methods" title="Permalink to this headline">¶</a></h4>
+<p>If your pass requires a previous pass to be executed (an analysis for example),
+it can use one of these methods to arrange for it to be run before your pass.
+LLVM has many different types of analyses and passes that can be required,
+spanning the range from <code class="docutils literal"><span class="pre">DominatorSet</span></code> to <code class="docutils literal"><span class="pre">BreakCriticalEdges</span></code>.  Requiring
+<code class="docutils literal"><span class="pre">BreakCriticalEdges</span></code>, for example, guarantees that there will be no critical
+edges in the CFG when your pass has been run.</p>
+<p>Some analyses chain to other analyses to do their job.  For example, an
+<cite>AliasAnalysis <AliasAnalysis></cite> implementation is required to <a class="reference internal" href="AliasAnalysis.html#aliasanalysis-chaining"><span class="std std-ref">chain</span></a> to other alias analysis passes.  In cases where
+analyses chain, the <code class="docutils literal"><span class="pre">addRequiredTransitive</span></code> method should be used instead of
+the <code class="docutils literal"><span class="pre">addRequired</span></code> method.  This informs the <code class="docutils literal"><span class="pre">PassManager</span></code> that the
+transitively required pass should be alive as long as the requiring pass is.</p>
+</div>
+<div class="section" id="the-analysisusage-addpreserved-method">
+<h4><a class="toc-backref" href="#id41">The <code class="docutils literal"><span class="pre">AnalysisUsage::addPreserved<></span></code> method</a><a class="headerlink" href="#the-analysisusage-addpreserved-method" title="Permalink to this headline">¶</a></h4>
+<p>One of the jobs of the <code class="docutils literal"><span class="pre">PassManager</span></code> is to optimize how and when analyses are
+run.  In particular, it attempts to avoid recomputing data unless it needs to.
+For this reason, passes are allowed to declare that they preserve (i.e., they
+don’t invalidate) an existing analysis if it’s available.  For example, a
+simple constant folding pass would not modify the CFG, so it can’t possibly
+affect the results of dominator analysis.  By default, all passes are assumed
+to invalidate all others.</p>
+<p>The <code class="docutils literal"><span class="pre">AnalysisUsage</span></code> class provides several methods which are useful in
+certain circumstances that are related to <code class="docutils literal"><span class="pre">addPreserved</span></code>.  In particular, the
+<code class="docutils literal"><span class="pre">setPreservesAll</span></code> method can be called to indicate that the pass does not
+modify the LLVM program at all (which is true for analyses), and the
+<code class="docutils literal"><span class="pre">setPreservesCFG</span></code> method can be used by transformations that change
+instructions in the program but do not modify the CFG or terminator
+instructions (note that this property is implicitly set for
+<a class="reference internal" href="#writing-an-llvm-pass-basicblockpass"><span class="std std-ref">BasicBlockPass</span></a>es).</p>
+<p><code class="docutils literal"><span class="pre">addPreserved</span></code> is particularly useful for transformations like
+<code class="docutils literal"><span class="pre">BreakCriticalEdges</span></code>.  This pass knows how to update a small set of loop and
+dominator related analyses if they exist, so it can preserve them, despite the
+fact that it hacks on the CFG.</p>
+</div>
+<div class="section" id="example-implementations-of-getanalysisusage">
+<h4><a class="toc-backref" href="#id42">Example implementations of <code class="docutils literal"><span class="pre">getAnalysisUsage</span></code></a><a class="headerlink" href="#example-implementations-of-getanalysisusage" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// This example modifies the program, but does not modify the CFG</span>
+<span class="kt">void</span> <span class="n">LICM</span><span class="o">::</span><span class="n">getAnalysisUsage</span><span class="p">(</span><span class="n">AnalysisUsage</span> <span class="o">&</span><span class="n">AU</span><span class="p">)</span> <span class="k">const</span> <span class="p">{</span>
+  <span class="n">AU</span><span class="p">.</span><span class="n">setPreservesCFG</span><span class="p">();</span>
+  <span class="n">AU</span><span class="p">.</span><span class="n">addRequired</span><span class="o"><</span><span class="n">LoopInfoWrapperPass</span><span class="o">></span><span class="p">();</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="the-getanalysis-and-getanalysisifavailable-methods">
+<span id="writing-an-llvm-pass-getanalysis"></span><h4><a class="toc-backref" href="#id43">The <code class="docutils literal"><span class="pre">getAnalysis<></span></code> and <code class="docutils literal"><span class="pre">getAnalysisIfAvailable<></span></code> methods</a><a class="headerlink" href="#the-getanalysis-and-getanalysisifavailable-methods" title="Permalink to this headline">¶</a></h4>
+<p>The <code class="docutils literal"><span class="pre">Pass::getAnalysis<></span></code> method is automatically inherited by your class,
+providing you with access to the passes that you declared that you required
+with the <a class="reference internal" href="#writing-an-llvm-pass-getanalysisusage"><span class="std std-ref">getAnalysisUsage</span></a>
+method.  It takes a single template argument that specifies which pass class
+you want, and returns a reference to that pass.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">bool</span> <span class="n">LICM</span><span class="o">::</span><span class="n">runOnFunction</span><span class="p">(</span><span class="n">Function</span> <span class="o">&</span><span class="n">F</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">LoopInfo</span> <span class="o">&</span><span class="n">LI</span> <span class="o">=</span> <span class="n">getAnalysis</span><span class="o"><</span><span class="n">LoopInfoWrapperPass</span><span class="o">></span><span class="p">().</span><span class="n">getLoopInfo</span><span class="p">();</span>
+  <span class="c1">//...</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>This method call returns a reference to the pass desired.  You may get a
+runtime assertion failure if you attempt to get an analysis that you did not
+declare as required in your <a class="reference internal" href="#writing-an-llvm-pass-getanalysisusage"><span class="std std-ref">getAnalysisUsage</span></a> implementation.  This method can be
+called by your <code class="docutils literal"><span class="pre">run*</span></code> method implementation, or by any other local method
+invoked by your <code class="docutils literal"><span class="pre">run*</span></code> method.</p>
+<p>A module level pass can use function level analysis info using this interface.
+For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="kt">bool</span> <span class="n">ModuleLevelPass</span><span class="o">::</span><span class="n">runOnModule</span><span class="p">(</span><span class="n">Module</span> <span class="o">&</span><span class="n">M</span><span class="p">)</span> <span class="p">{</span>
+  <span class="c1">//...</span>
+  <span class="n">DominatorTree</span> <span class="o">&</span><span class="n">DT</span> <span class="o">=</span> <span class="n">getAnalysis</span><span class="o"><</span><span class="n">DominatorTree</span><span class="o">></span><span class="p">(</span><span class="n">Func</span><span class="p">);</span>
+  <span class="c1">//...</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>In above example, <code class="docutils literal"><span class="pre">runOnFunction</span></code> for <code class="docutils literal"><span class="pre">DominatorTree</span></code> is called by pass
+manager before returning a reference to the desired pass.</p>
+<p>If your pass is capable of updating analyses if they exist (e.g.,
+<code class="docutils literal"><span class="pre">BreakCriticalEdges</span></code>, as described above), you can use the
+<code class="docutils literal"><span class="pre">getAnalysisIfAvailable</span></code> method, which returns a pointer to the analysis if
+it is active.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">if</span> <span class="p">(</span><span class="n">DominatorSet</span> <span class="o">*</span><span class="n">DS</span> <span class="o">=</span> <span class="n">getAnalysisIfAvailable</span><span class="o"><</span><span class="n">DominatorSet</span><span class="o">></span><span class="p">())</span> <span class="p">{</span>
+  <span class="c1">// A DominatorSet is active.  This code will update it.</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="implementing-analysis-groups">
+<h3><a class="toc-backref" href="#id44">Implementing Analysis Groups</a><a class="headerlink" href="#implementing-analysis-groups" title="Permalink to this headline">¶</a></h3>
+<p>Now that we understand the basics of how passes are defined, how they are used,
+and how they are required from other passes, it’s time to get a little bit
+fancier.  All of the pass relationships that we have seen so far are very
+simple: one pass depends on one other specific pass to be run before it can
+run.  For many applications, this is great, for others, more flexibility is
+required.</p>
+<p>In particular, some analyses are defined such that there is a single simple
+interface to the analysis results, but multiple ways of calculating them.
+Consider alias analysis for example.  The most trivial alias analysis returns
+“may alias” for any alias query.  The most sophisticated analysis a
+flow-sensitive, context-sensitive interprocedural analysis that can take a
+significant amount of time to execute (and obviously, there is a lot of room
+between these two extremes for other implementations).  To cleanly support
+situations like this, the LLVM Pass Infrastructure supports the notion of
+Analysis Groups.</p>
+<div class="section" id="analysis-group-concepts">
+<h4><a class="toc-backref" href="#id45">Analysis Group Concepts</a><a class="headerlink" href="#analysis-group-concepts" title="Permalink to this headline">¶</a></h4>
+<p>An Analysis Group is a single simple interface that may be implemented by
+multiple different passes.  Analysis Groups can be given human readable names
+just like passes, but unlike passes, they need not derive from the <code class="docutils literal"><span class="pre">Pass</span></code>
+class.  An analysis group may have one or more implementations, one of which is
+the “default” implementation.</p>
+<p>Analysis groups are used by client passes just like other passes are: the
+<code class="docutils literal"><span class="pre">AnalysisUsage::addRequired()</span></code> and <code class="docutils literal"><span class="pre">Pass::getAnalysis()</span></code> methods.  In order
+to resolve this requirement, the <a class="reference internal" href="#writing-an-llvm-pass-passmanager"><span class="std std-ref">PassManager</span></a> scans the available passes to see if any
+implementations of the analysis group are available.  If none is available, the
+default implementation is created for the pass to use.  All standard rules for
+<a class="reference internal" href="#writing-an-llvm-pass-interaction"><span class="std std-ref">interaction between passes</span></a> still
+apply.</p>
+<p>Although <a class="reference internal" href="#writing-an-llvm-pass-registration"><span class="std std-ref">Pass Registration</span></a> is
+optional for normal passes, all analysis group implementations must be
+registered, and must use the <a class="reference internal" href="#writing-an-llvm-pass-registeranalysisgroup"><span class="std std-ref">INITIALIZE_AG_PASS</span></a> template to join the
+implementation pool.  Also, a default implementation of the interface <strong>must</strong>
+be registered with <a class="reference internal" href="#writing-an-llvm-pass-registeranalysisgroup"><span class="std std-ref">RegisterAnalysisGroup</span></a>.</p>
+<p>As a concrete example of an Analysis Group in action, consider the
+<a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1AliasAnalysis.html">AliasAnalysis</a>
+analysis group.  The default implementation of the alias analysis interface
+(the <a class="reference external" href="http://llvm.org/doxygen/structBasicAliasAnalysis.html">basicaa</a> pass)
+just does a few simple checks that don’t require significant analysis to
+compute (such as: two different globals can never alias each other, etc).
+Passes that use the <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1AliasAnalysis.html">AliasAnalysis</a> interface (for
+example the <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1GVN.html">gvn</a> pass), do not
+care which implementation of alias analysis is actually provided, they just use
+the designated interface.</p>
+<p>From the user’s perspective, commands work just like normal.  Issuing the
+command <code class="docutils literal"><span class="pre">opt</span> <span class="pre">-gvn</span> <span class="pre">...</span></code> will cause the <code class="docutils literal"><span class="pre">basicaa</span></code> class to be instantiated
+and added to the pass sequence.  Issuing the command <code class="docutils literal"><span class="pre">opt</span> <span class="pre">-somefancyaa</span> <span class="pre">-gvn</span>
+<span class="pre">...</span></code> will cause the <code class="docutils literal"><span class="pre">gvn</span></code> pass to use the <code class="docutils literal"><span class="pre">somefancyaa</span></code> alias analysis
+(which doesn’t actually exist, it’s just a hypothetical example) instead.</p>
+</div>
+<div class="section" id="using-registeranalysisgroup">
+<span id="writing-an-llvm-pass-registeranalysisgroup"></span><h4><a class="toc-backref" href="#id46">Using <code class="docutils literal"><span class="pre">RegisterAnalysisGroup</span></code></a><a class="headerlink" href="#using-registeranalysisgroup" title="Permalink to this headline">¶</a></h4>
+<p>The <code class="docutils literal"><span class="pre">RegisterAnalysisGroup</span></code> template is used to register the analysis group
+itself, while the <code class="docutils literal"><span class="pre">INITIALIZE_AG_PASS</span></code> is used to add pass implementations to
+the analysis group.  First, an analysis group should be registered, with a
+human readable name provided for it.  Unlike registration of passes, there is
+no command line argument to be specified for the Analysis Group Interface
+itself, because it is “abstract”:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">static</span> <span class="n">RegisterAnalysisGroup</span><span class="o"><</span><span class="n">AliasAnalysis</span><span class="o">></span> <span class="n">A</span><span class="p">(</span><span class="s">"Alias Analysis"</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>Once the analysis is registered, passes can declare that they are valid
+implementations of the interface by using the following code:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">namespace</span> <span class="p">{</span>
+  <span class="c1">// Declare that we implement the AliasAnalysis interface</span>
+  <span class="n">INITIALIZE_AG_PASS</span><span class="p">(</span><span class="n">FancyAA</span><span class="p">,</span> <span class="n">AliasAnalysis</span> <span class="p">,</span> <span class="s">"somefancyaa"</span><span class="p">,</span>
+      <span class="s">"A more complex alias analysis implementation"</span><span class="p">,</span>
+      <span class="nb">false</span><span class="p">,</span>  <span class="c1">// Is CFG Only?</span>
+      <span class="nb">true</span><span class="p">,</span>   <span class="c1">// Is Analysis?</span>
+      <span class="nb">false</span><span class="p">);</span> <span class="c1">// Is default Analysis Group implementation?</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>This just shows a class <code class="docutils literal"><span class="pre">FancyAA</span></code> that uses the <code class="docutils literal"><span class="pre">INITIALIZE_AG_PASS</span></code> macro
+both to register and to “join” the <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1AliasAnalysis.html">AliasAnalysis</a> analysis group.
+Every implementation of an analysis group should join using this macro.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">namespace</span> <span class="p">{</span>
+  <span class="c1">// Declare that we implement the AliasAnalysis interface</span>
+  <span class="n">INITIALIZE_AG_PASS</span><span class="p">(</span><span class="n">BasicAA</span><span class="p">,</span> <span class="n">AliasAnalysis</span><span class="p">,</span> <span class="s">"basicaa"</span><span class="p">,</span>
+      <span class="s">"Basic Alias Analysis (default AA impl)"</span><span class="p">,</span>
+      <span class="nb">false</span><span class="p">,</span> <span class="c1">// Is CFG Only?</span>
+      <span class="nb">true</span><span class="p">,</span>  <span class="c1">// Is Analysis?</span>
+      <span class="nb">true</span><span class="p">);</span> <span class="c1">// Is default Analysis Group implementation?</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>Here we show how the default implementation is specified (using the final
+argument to the <code class="docutils literal"><span class="pre">INITIALIZE_AG_PASS</span></code> template).  There must be exactly one
+default implementation available at all times for an Analysis Group to be used.
+Only default implementation can derive from <code class="docutils literal"><span class="pre">ImmutablePass</span></code>.  Here we declare
+that the <a class="reference external" href="http://llvm.org/doxygen/structBasicAliasAnalysis.html">BasicAliasAnalysis</a> pass is the default
+implementation for the interface.</p>
+</div>
+</div>
+</div>
+<div class="section" id="pass-statistics">
+<h2><a class="toc-backref" href="#id47">Pass Statistics</a><a class="headerlink" href="#pass-statistics" title="Permalink to this headline">¶</a></h2>
+<p>The <a class="reference external" href="http://llvm.org/doxygen/Statistic_8h-source.html">Statistic</a> class is
+designed to be an easy way to expose various success metrics from passes.
+These statistics are printed at the end of a run, when the <a class="reference internal" href="CommandGuide/opt.html#cmdoption-stats"><code class="xref std std-option docutils literal"><span class="pre">-stats</span></code></a>
+command line option is enabled on the command line.  See the <a class="reference internal" href="ProgrammersManual.html#statistic"><span class="std std-ref">Statistics
+section</span></a> in the Programmer’s Manual for details.</p>
+<div class="section" id="what-passmanager-does">
+<span id="writing-an-llvm-pass-passmanager"></span><h3><a class="toc-backref" href="#id48">What PassManager does</a><a class="headerlink" href="#what-passmanager-does" title="Permalink to this headline">¶</a></h3>
+<p>The <a class="reference external" href="http://llvm.org/doxygen/PassManager_8h-source.html">PassManager</a> <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1PassManager.html">class</a> takes a list of
+passes, ensures their <a class="reference internal" href="#writing-an-llvm-pass-interaction"><span class="std std-ref">prerequisites</span></a>
+are set up correctly, and then schedules passes to run efficiently.  All of the
+LLVM tools that run passes use the PassManager for execution of these passes.</p>
+<p>The PassManager does two main things to try to reduce the execution time of a
+series of passes:</p>
+<ol class="arabic">
+<li><p class="first"><strong>Share analysis results.</strong>  The <code class="docutils literal"><span class="pre">PassManager</span></code> attempts to avoid
+recomputing analysis results as much as possible.  This means keeping track
+of which analyses are available already, which analyses get invalidated, and
+which analyses are needed to be run for a pass.  An important part of work
+is that the <code class="docutils literal"><span class="pre">PassManager</span></code> tracks the exact lifetime of all analysis
+results, allowing it to <a class="reference internal" href="#writing-an-llvm-pass-releasememory"><span class="std std-ref">free memory</span></a> allocated to holding analysis results
+as soon as they are no longer needed.</p>
+</li>
+<li><p class="first"><strong>Pipeline the execution of passes on the program.</strong>  The <code class="docutils literal"><span class="pre">PassManager</span></code>
+attempts to get better cache and memory usage behavior out of a series of
+passes by pipelining the passes together.  This means that, given a series
+of consecutive <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a>, it
+will execute all of the <a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPass</span></a> on the first function, then all of the
+<a class="reference internal" href="#writing-an-llvm-pass-functionpass"><span class="std std-ref">FunctionPasses</span></a> on the second
+function, etc… until the entire program has been run through the passes.</p>
+<p>This improves the cache behavior of the compiler, because it is only
+touching the LLVM program representation for a single function at a time,
+instead of traversing the entire program.  It reduces the memory consumption
+of compiler, because, for example, only one <a class="reference external" href="http://llvm.org/doxygen/classllvm_1_1DominatorSet.html">DominatorSet</a> needs to be
+calculated at a time.  This also makes it possible to implement some
+<a class="reference internal" href="#writing-an-llvm-pass-smp"><span class="std std-ref">interesting enhancements</span></a> in the future.</p>
+</li>
+</ol>
+<p>The effectiveness of the <code class="docutils literal"><span class="pre">PassManager</span></code> is influenced directly by how much
+information it has about the behaviors of the passes it is scheduling.  For
+example, the “preserved” set is intentionally conservative in the face of an
+unimplemented <a class="reference internal" href="#writing-an-llvm-pass-getanalysisusage"><span class="std std-ref">getAnalysisUsage</span></a>
+method.  Not implementing when it should be implemented will have the effect of
+not allowing any analysis results to live across the execution of your pass.</p>
+<p>The <code class="docutils literal"><span class="pre">PassManager</span></code> class exposes a <code class="docutils literal"><span class="pre">--debug-pass</span></code> command line options that
+is useful for debugging pass execution, seeing how things work, and diagnosing
+when you should be preserving more analyses than you currently are.  (To get
+information about all of the variants of the <code class="docutils literal"><span class="pre">--debug-pass</span></code> option, just type
+“<code class="docutils literal"><span class="pre">opt</span> <span class="pre">-help-hidden</span></code>”).</p>
+<p>By using the –debug-pass=Structure option, for example, we can see how our
+<a class="reference internal" href="#writing-an-llvm-pass-basiccode"><span class="std std-ref">Hello World</span></a> pass interacts with other
+passes.  Lets try it out with the gvn and licm passes:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> opt -load lib/LLVMHello.so -gvn -licm --debug-pass<span class="o">=</span>Structure < hello.bc > /dev/null
+<span class="go">ModulePass Manager</span>
+<span class="go">  FunctionPass Manager</span>
+<span class="go">    Dominator Tree Construction</span>
+<span class="go">    Basic Alias Analysis (stateless AA impl)</span>
+<span class="go">    Function Alias Analysis Results</span>
+<span class="go">    Memory Dependence Analysis</span>
+<span class="go">    Global Value Numbering</span>
+<span class="go">    Natural Loop Information</span>
+<span class="go">    Canonicalize natural loops</span>
+<span class="go">    Loop-Closed SSA Form Pass</span>
+<span class="go">    Basic Alias Analysis (stateless AA impl)</span>
+<span class="go">    Function Alias Analysis Results</span>
+<span class="go">    Scalar Evolution Analysis</span>
+<span class="go">    Loop Pass Manager</span>
+<span class="go">      Loop Invariant Code Motion</span>
+<span class="go">    Module Verifier</span>
+<span class="go">  Bitcode Writer</span>
+</pre></div>
+</div>
+<p>This output shows us when passes are constructed.
+Here we see that GVN uses dominator tree information to do its job.  The LICM pass
+uses natural loop information, which uses dominator tree as well.</p>
+<p>After the LICM pass, the module verifier runs (which is automatically added by
+the <strong class="program">opt</strong> tool), which uses the dominator tree to check that the
+resultant LLVM code is well formed. Note that the dominator tree is computed
+once, and shared by three passes.</p>
+<p>Lets see how this changes when we run the <a class="reference internal" href="#writing-an-llvm-pass-basiccode"><span class="std std-ref">Hello World</span></a> pass in between the two passes:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> opt -load lib/LLVMHello.so -gvn -hello -licm --debug-pass<span class="o">=</span>Structure < hello.bc > /dev/null
+<span class="go">ModulePass Manager</span>
+<span class="go">  FunctionPass Manager</span>
+<span class="go">    Dominator Tree Construction</span>
+<span class="go">    Basic Alias Analysis (stateless AA impl)</span>
+<span class="go">    Function Alias Analysis Results</span>
+<span class="go">    Memory Dependence Analysis</span>
+<span class="go">    Global Value Numbering</span>
+<span class="go">    Hello World Pass</span>
+<span class="go">    Dominator Tree Construction</span>
+<span class="go">    Natural Loop Information</span>
+<span class="go">    Canonicalize natural loops</span>
+<span class="go">    Loop-Closed SSA Form Pass</span>
+<span class="go">    Basic Alias Analysis (stateless AA impl)</span>
+<span class="go">    Function Alias Analysis Results</span>
+<span class="go">    Scalar Evolution Analysis</span>
+<span class="go">    Loop Pass Manager</span>
+<span class="go">      Loop Invariant Code Motion</span>
+<span class="go">    Module Verifier</span>
+<span class="go">  Bitcode Writer</span>
+<span class="go">Hello: __main</span>
+<span class="go">Hello: puts</span>
+<span class="go">Hello: main</span>
+</pre></div>
+</div>
+<p>Here we see that the <a class="reference internal" href="#writing-an-llvm-pass-basiccode"><span class="std std-ref">Hello World</span></a> pass
+has killed the Dominator Tree pass, even though it doesn’t modify the code at
+all!  To fix this, we need to add the following <a class="reference internal" href="#writing-an-llvm-pass-getanalysisusage"><span class="std std-ref">getAnalysisUsage</span></a> method to our pass:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// We don't modify the program, so we preserve all analyses</span>
+<span class="kt">void</span> <span class="nf">getAnalysisUsage</span><span class="p">(</span><span class="n">AnalysisUsage</span> <span class="o">&</span><span class="n">AU</span><span class="p">)</span> <span class="k">const</span> <span class="k">override</span> <span class="p">{</span>
+  <span class="n">AU</span><span class="p">.</span><span class="n">setPreservesAll</span><span class="p">();</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>Now when we run our pass, we get this output:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> opt -load lib/LLVMHello.so -gvn -hello -licm --debug-pass<span class="o">=</span>Structure < hello.bc > /dev/null
+<span class="go">Pass Arguments:  -gvn -hello -licm</span>
+<span class="go">ModulePass Manager</span>
+<span class="go">  FunctionPass Manager</span>
+<span class="go">    Dominator Tree Construction</span>
+<span class="go">    Basic Alias Analysis (stateless AA impl)</span>
+<span class="go">    Function Alias Analysis Results</span>
+<span class="go">    Memory Dependence Analysis</span>
+<span class="go">    Global Value Numbering</span>
+<span class="go">    Hello World Pass</span>
+<span class="go">    Natural Loop Information</span>
+<span class="go">    Canonicalize natural loops</span>
+<span class="go">    Loop-Closed SSA Form Pass</span>
+<span class="go">    Basic Alias Analysis (stateless AA impl)</span>
+<span class="go">    Function Alias Analysis Results</span>
+<span class="go">    Scalar Evolution Analysis</span>
+<span class="go">    Loop Pass Manager</span>
+<span class="go">      Loop Invariant Code Motion</span>
+<span class="go">    Module Verifier</span>
+<span class="go">  Bitcode Writer</span>
+<span class="go">Hello: __main</span>
+<span class="go">Hello: puts</span>
+<span class="go">Hello: main</span>
+</pre></div>
+</div>
+<p>Which shows that we don’t accidentally invalidate dominator information
+anymore, and therefore do not have to compute it twice.</p>
+<div class="section" id="the-releasememory-method">
+<span id="writing-an-llvm-pass-releasememory"></span><h4><a class="toc-backref" href="#id49">The <code class="docutils literal"><span class="pre">releaseMemory</span></code> method</a><a class="headerlink" href="#the-releasememory-method" title="Permalink to this headline">¶</a></h4>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">virtual</span> <span class="kt">void</span> <span class="nf">releaseMemory</span><span class="p">();</span>
+</pre></div>
+</div>
+<p>The <code class="docutils literal"><span class="pre">PassManager</span></code> automatically determines when to compute analysis results,
+and how long to keep them around for.  Because the lifetime of the pass object
+itself is effectively the entire duration of the compilation process, we need
+some way to free analysis results when they are no longer useful.  The
+<code class="docutils literal"><span class="pre">releaseMemory</span></code> virtual method is the way to do this.</p>
+<p>If you are writing an analysis or any other pass that retains a significant
+amount of state (for use by another pass which “requires” your pass and uses
+the <a class="reference internal" href="#writing-an-llvm-pass-getanalysis"><span class="std std-ref">getAnalysis</span></a> method) you should
+implement <code class="docutils literal"><span class="pre">releaseMemory</span></code> to, well, release the memory allocated to maintain
+this internal state.  This method is called after the <code class="docutils literal"><span class="pre">run*</span></code> method for the
+class, before the next call of <code class="docutils literal"><span class="pre">run*</span></code> in your pass.</p>
+</div>
+</div>
+</div>
+<div class="section" id="registering-dynamically-loaded-passes">
+<h2><a class="toc-backref" href="#id50">Registering dynamically loaded passes</a><a class="headerlink" href="#registering-dynamically-loaded-passes" title="Permalink to this headline">¶</a></h2>
+<p><em>Size matters</em> when constructing production quality tools using LLVM, both for
+the purposes of distribution, and for regulating the resident code size when
+running on the target system.  Therefore, it becomes desirable to selectively
+use some passes, while omitting others and maintain the flexibility to change
+configurations later on.  You want to be able to do all this, and, provide
+feedback to the user.  This is where pass registration comes into play.</p>
+<p>The fundamental mechanisms for pass registration are the
+<code class="docutils literal"><span class="pre">MachinePassRegistry</span></code> class and subclasses of <code class="docutils literal"><span class="pre">MachinePassRegistryNode</span></code>.</p>
+<p>An instance of <code class="docutils literal"><span class="pre">MachinePassRegistry</span></code> is used to maintain a list of
+<code class="docutils literal"><span class="pre">MachinePassRegistryNode</span></code> objects.  This instance maintains the list and
+communicates additions and deletions to the command line interface.</p>
+<p>An instance of <code class="docutils literal"><span class="pre">MachinePassRegistryNode</span></code> subclass is used to maintain
+information provided about a particular pass.  This information includes the
+command line name, the command help string and the address of the function used
+to create an instance of the pass.  A global static constructor of one of these
+instances <em>registers</em> with a corresponding <code class="docutils literal"><span class="pre">MachinePassRegistry</span></code>, the static
+destructor <em>unregisters</em>.  Thus a pass that is statically linked in the tool
+will be registered at start up.  A dynamically loaded pass will register on
+load and unregister at unload.</p>
+<div class="section" id="using-existing-registries">
+<h3><a class="toc-backref" href="#id51">Using existing registries</a><a class="headerlink" href="#using-existing-registries" title="Permalink to this headline">¶</a></h3>
+<p>There are predefined registries to track instruction scheduling
+(<code class="docutils literal"><span class="pre">RegisterScheduler</span></code>) and register allocation (<code class="docutils literal"><span class="pre">RegisterRegAlloc</span></code>) machine
+passes.  Here we will describe how to <em>register</em> a register allocator machine
+pass.</p>
+<p>Implement your register allocator machine pass.  In your register allocator
+<code class="docutils literal"><span class="pre">.cpp</span></code> file add the following include:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="cp">#include</span> <span class="cpf">"llvm/CodeGen/RegAllocRegistry.h"</span><span class="cp"></span>
+</pre></div>
+</div>
+<p>Also in your register allocator <code class="docutils literal"><span class="pre">.cpp</span></code> file, define a creator function in the
+form:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">FunctionPass</span> <span class="o">*</span><span class="nf">createMyRegisterAllocator</span><span class="p">()</span> <span class="p">{</span>
+  <span class="k">return</span> <span class="k">new</span> <span class="n">MyRegisterAllocator</span><span class="p">();</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>Note that the signature of this function should match the type of
+<code class="docutils literal"><span class="pre">RegisterRegAlloc::FunctionPassCtor</span></code>.  In the same file add the “installing”
+declaration, in the form:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">static</span> <span class="n">RegisterRegAlloc</span> <span class="nf">myRegAlloc</span><span class="p">(</span><span class="s">"myregalloc"</span><span class="p">,</span>
+                                   <span class="s">"my register allocator help string"</span><span class="p">,</span>
+                                   <span class="n">createMyRegisterAllocator</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>Note the two spaces prior to the help string produces a tidy result on the
+<a class="reference internal" href="CommandGuide/opt.html#cmdoption-help"><code class="xref std std-option docutils literal"><span class="pre">-help</span></code></a> query.</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> llc -help
+<span class="go">  ...</span>
+<span class="go">  -regalloc                    - Register allocator to use (default=linearscan)</span>
+<span class="go">    =linearscan                -   linear scan register allocator</span>
+<span class="go">    =local                     -   local register allocator</span>
+<span class="go">    =simple                    -   simple register allocator</span>
+<span class="go">    =myregalloc                -   my register allocator help string</span>
+<span class="go">  ...</span>
+</pre></div>
+</div>
+<p>And that’s it.  The user is now free to use <code class="docutils literal"><span class="pre">-regalloc=myregalloc</span></code> as an
+option.  Registering instruction schedulers is similar except use the
+<code class="docutils literal"><span class="pre">RegisterScheduler</span></code> class.  Note that the
+<code class="docutils literal"><span class="pre">RegisterScheduler::FunctionPassCtor</span></code> is significantly different from
+<code class="docutils literal"><span class="pre">RegisterRegAlloc::FunctionPassCtor</span></code>.</p>
+<p>To force the load/linking of your register allocator into the
+<strong class="program">llc</strong>/<strong class="program">lli</strong> tools, add your creator function’s global
+declaration to <code class="docutils literal"><span class="pre">Passes.h</span></code> and add a “pseudo” call line to
+<code class="docutils literal"><span class="pre">llvm/Codegen/LinkAllCodegenComponents.h</span></code>.</p>
+</div>
+<div class="section" id="creating-new-registries">
+<h3><a class="toc-backref" href="#id52">Creating new registries</a><a class="headerlink" href="#creating-new-registries" title="Permalink to this headline">¶</a></h3>
+<p>The easiest way to get started is to clone one of the existing registries; we
+recommend <code class="docutils literal"><span class="pre">llvm/CodeGen/RegAllocRegistry.h</span></code>.  The key things to modify are
+the class name and the <code class="docutils literal"><span class="pre">FunctionPassCtor</span></code> type.</p>
+<p>Then you need to declare the registry.  Example: if your pass registry is
+<code class="docutils literal"><span class="pre">RegisterMyPasses</span></code> then define:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">MachinePassRegistry</span> <span class="n">RegisterMyPasses</span><span class="o">::</span><span class="n">Registry</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>And finally, declare the command line option for your passes.  Example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">cl</span><span class="o">::</span><span class="n">opt</span><span class="o"><</span><span class="n">RegisterMyPasses</span><span class="o">::</span><span class="n">FunctionPassCtor</span><span class="p">,</span> <span class="nb">false</span><span class="p">,</span>
+        <span class="n">RegisterPassParser</span><span class="o"><</span><span class="n">RegisterMyPasses</span><span class="o">></span> <span class="o">></span>
+<span class="n">MyPassOpt</span><span class="p">(</span><span class="s">"mypass"</span><span class="p">,</span>
+          <span class="n">cl</span><span class="o">::</span><span class="n">init</span><span class="p">(</span><span class="o">&</span><span class="n">createDefaultMyPass</span><span class="p">),</span>
+          <span class="n">cl</span><span class="o">::</span><span class="n">desc</span><span class="p">(</span><span class="s">"my pass option help"</span><span class="p">));</span>
+</pre></div>
+</div>
+<p>Here the command option is “<code class="docutils literal"><span class="pre">mypass</span></code>”, with <code class="docutils literal"><span class="pre">createDefaultMyPass</span></code> as the
+default creator.</p>
+</div>
+<div class="section" id="using-gdb-with-dynamically-loaded-passes">
+<h3><a class="toc-backref" href="#id53">Using GDB with dynamically loaded passes</a><a class="headerlink" href="#using-gdb-with-dynamically-loaded-passes" title="Permalink to this headline">¶</a></h3>
+<p>Unfortunately, using GDB with dynamically loaded passes is not as easy as it
+should be.  First of all, you can’t set a breakpoint in a shared object that
+has not been loaded yet, and second of all there are problems with inlined
+functions in shared objects.  Here are some suggestions to debugging your pass
+with GDB.</p>
+<p>For sake of discussion, I’m going to assume that you are debugging a
+transformation invoked by <strong class="program">opt</strong>, although nothing described here
+depends on that.</p>
+<div class="section" id="setting-a-breakpoint-in-your-pass">
+<h4><a class="toc-backref" href="#id54">Setting a breakpoint in your pass</a><a class="headerlink" href="#setting-a-breakpoint-in-your-pass" title="Permalink to this headline">¶</a></h4>
+<p>First thing you do is start gdb on the opt process:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> gdb opt
+<span class="go">GNU gdb 5.0</span>
+<span class="go">Copyright 2000 Free Software Foundation, Inc.</span>
+<span class="go">GDB is free software, covered by the GNU General Public License, and you are</span>
+<span class="go">welcome to change it and/or distribute copies of it under certain conditions.</span>
+<span class="go">Type "show copying" to see the conditions.</span>
+<span class="go">There is absolutely no warranty for GDB.  Type "show warranty" for details.</span>
+<span class="go">This GDB was configured as "sparc-sun-solaris2.6"...</span>
+<span class="go">(gdb)</span>
+</pre></div>
+</div>
+<p>Note that <strong class="program">opt</strong> has a lot of debugging information in it, so it takes
+time to load.  Be patient.  Since we cannot set a breakpoint in our pass yet
+(the shared object isn’t loaded until runtime), we must execute the process,
+and have it stop before it invokes our pass, but after it has loaded the shared
+object.  The most foolproof way of doing this is to set a breakpoint in
+<code class="docutils literal"><span class="pre">PassManager::run</span></code> and then run the process with the arguments you want:</p>
+<div class="highlight-console"><div class="highlight"><pre><span></span><span class="gp">$</span> <span class="o">(</span>gdb<span class="o">)</span> <span class="nb">break</span> llvm::PassManager::run
+<span class="go">Breakpoint 1 at 0x2413bc: file Pass.cpp, line 70.</span>
+<span class="go">(gdb) run test.bc -load $(LLVMTOP)/llvm/Debug+Asserts/lib/[libname].so -[passoption]</span>
+<span class="go">Starting program: opt test.bc -load $(LLVMTOP)/llvm/Debug+Asserts/lib/[libname].so -[passoption]</span>
+<span class="go">Breakpoint 1, PassManager::run (this=0xffbef174, M=@0x70b298) at Pass.cpp:70</span>
+<span class="go">70      bool PassManager::run(Module &M) { return PM->run(M); }</span>
+<span class="go">(gdb)</span>
+</pre></div>
+</div>
+<p>Once the <strong class="program">opt</strong> stops in the <code class="docutils literal"><span class="pre">PassManager::run</span></code> method you are now
+free to set breakpoints in your pass so that you can trace through execution or
+do other standard debugging stuff.</p>
+</div>
+<div class="section" id="miscellaneous-problems">
+<h4><a class="toc-backref" href="#id55">Miscellaneous Problems</a><a class="headerlink" href="#miscellaneous-problems" title="Permalink to this headline">¶</a></h4>
+<p>Once you have the basics down, there are a couple of problems that GDB has,
+some with solutions, some without.</p>
+<ul class="simple">
+<li>Inline functions have bogus stack information.  In general, GDB does a pretty
+good job getting stack traces and stepping through inline functions.  When a
+pass is dynamically loaded however, it somehow completely loses this
+capability.  The only solution I know of is to de-inline a function (move it
+from the body of a class to a <code class="docutils literal"><span class="pre">.cpp</span></code> file).</li>
+<li>Restarting the program breaks breakpoints.  After following the information
+above, you have succeeded in getting some breakpoints planted in your pass.
+Next thing you know, you restart the program (i.e., you type “<code class="docutils literal"><span class="pre">run</span></code>” again),
+and you start getting errors about breakpoints being unsettable.  The only
+way I have found to “fix” this problem is to delete the breakpoints that are
+already set in your pass, run the program, and re-set the breakpoints once
+execution stops in <code class="docutils literal"><span class="pre">PassManager::run</span></code>.</li>
+</ul>
+<p>Hopefully these tips will help with common case debugging situations.  If you’d
+like to contribute some tips of your own, just contact <a class="reference external" href="mailto:sabre%40nondot.org">Chris</a>.</p>
+</div>
+</div>
+<div class="section" id="future-extensions-planned">
+<h3><a class="toc-backref" href="#id56">Future extensions planned</a><a class="headerlink" href="#future-extensions-planned" title="Permalink to this headline">¶</a></h3>
+<p>Although the LLVM Pass Infrastructure is very capable as it stands, and does
+some nifty stuff, there are things we’d like to add in the future.  Here is
+where we are going:</p>
+<div class="section" id="multithreaded-llvm">
+<span id="writing-an-llvm-pass-smp"></span><h4><a class="toc-backref" href="#id57">Multithreaded LLVM</a><a class="headerlink" href="#multithreaded-llvm" title="Permalink to this headline">¶</a></h4>
+<p>Multiple CPU machines are becoming more common and compilation can never be
+fast enough: obviously we should allow for a multithreaded compiler.  Because
+of the semantics defined for passes above (specifically they cannot maintain
+state across invocations of their <code class="docutils literal"><span class="pre">run*</span></code> methods), a nice clean way to
+implement a multithreaded compiler would be for the <code class="docutils literal"><span class="pre">PassManager</span></code> class to
+create multiple instances of each pass object, and allow the separate instances
+to be hacking on different parts of the program at the same time.</p>
+<p>This implementation would prevent each of the passes from having to implement
+multithreaded constructs, requiring only the LLVM core to have locking in a few
+places (for global resources).  Although this is a simple extension, we simply
+haven’t had time (or multiprocessor machines, thus a reason) to implement this.
+Despite that, we have kept the LLVM passes SMP ready, and you should too.</p>
+</div>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="HowToUseAttributes.html" title="How To Use Attributes"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="GarbageCollection.html" title="Garbage Collection with LLVM"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/XRay.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/XRay.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/XRay.html (added)
+++ www-releases/trunk/5.0.1/docs/XRay.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,408 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>XRay Instrumentation — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="Debugging with XRay" href="XRayExample.html" />
+    <link rel="prev" title="Global Instruction Selection" href="GlobalISel.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="XRayExample.html" title="Debugging with XRay"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="GlobalISel.html" title="Global Instruction Selection"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="xray-instrumentation">
+<h1>XRay Instrumentation<a class="headerlink" href="#xray-instrumentation" title="Permalink to this headline">¶</a></h1>
+<table class="docutils field-list" frame="void" rules="none">
+<col class="field-name" />
+<col class="field-body" />
+<tbody valign="top">
+<tr class="field-odd field"><th class="field-name">Version:</th><td class="field-body">1 as of 2016-11-08</td>
+</tr>
+</tbody>
+</table>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#introduction" id="id1">Introduction</a></li>
+<li><a class="reference internal" href="#xray-in-llvm" id="id2">XRay in LLVM</a></li>
+<li><a class="reference internal" href="#using-xray" id="id3">Using XRay</a><ul>
+<li><a class="reference internal" href="#instrumenting-your-c-c-objective-c-application" id="id4">Instrumenting your C/C++/Objective-C Application</a></li>
+<li><a class="reference internal" href="#llvm-function-attribute" id="id5">LLVM Function Attribute</a></li>
+<li><a class="reference internal" href="#xray-runtime-library" id="id6">XRay Runtime Library</a></li>
+<li><a class="reference internal" href="#flight-data-recorder-mode" id="id7">Flight Data Recorder Mode</a></li>
+<li><a class="reference internal" href="#trace-analysis-tools" id="id8">Trace Analysis Tools</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#future-work" id="id9">Future Work</a><ul>
+<li><a class="reference internal" href="#trace-analysis" id="id10">Trace Analysis</a></li>
+<li><a class="reference internal" href="#more-platforms" id="id11">More Platforms</a></li>
+</ul>
+</li>
+</ul>
+</div>
+<div class="section" id="introduction">
+<h2><a class="toc-backref" href="#id1">Introduction</a><a class="headerlink" href="#introduction" title="Permalink to this headline">¶</a></h2>
+<p>XRay is a function call tracing system which combines compiler-inserted
+instrumentation points and a runtime library that can dynamically enable and
+disable the instrumentation.</p>
+<p>More high level information about XRay can be found in the <a class="reference external" href="http://research.google.com/pubs/pub45287.html">XRay whitepaper</a>.</p>
+<p>This document describes how to use XRay as implemented in LLVM.</p>
+</div>
+<div class="section" id="xray-in-llvm">
+<h2><a class="toc-backref" href="#id2">XRay in LLVM</a><a class="headerlink" href="#xray-in-llvm" title="Permalink to this headline">¶</a></h2>
+<p>XRay consists of three main parts:</p>
+<ul>
+<li><p class="first">Compiler-inserted instrumentation points.</p>
+</li>
+<li><p class="first">A runtime library for enabling/disabling tracing at runtime.</p>
+</li>
+<li><p class="first">A suite of tools for analysing the traces.</p>
+<p><strong>NOTE:</strong> As of February 27, 2017 , XRay is only available for the following
+architectures running Linux: x86_64, arm7 (no thumb), aarch64, powerpc64le,
+mips, mipsel, mips64, mips64el.</p>
+</li>
+</ul>
+<p>The compiler-inserted instrumentation points come in the form of nop-sleds in
+the final generated binary, and an ELF section named <code class="docutils literal"><span class="pre">xray_instr_map</span></code> which
+contains entries pointing to these instrumentation points. The runtime library
+relies on being able to access the entries of the <code class="docutils literal"><span class="pre">xray_instr_map</span></code>, and
+overwrite the instrumentation points at runtime.</p>
+</div>
+<div class="section" id="using-xray">
+<h2><a class="toc-backref" href="#id3">Using XRay</a><a class="headerlink" href="#using-xray" title="Permalink to this headline">¶</a></h2>
+<p>You can use XRay in a couple of ways:</p>
+<ul class="simple">
+<li>Instrumenting your C/C++/Objective-C/Objective-C++ application.</li>
+<li>Generating LLVM IR with the correct function attributes.</li>
+</ul>
+<p>The rest of this section covers these main ways and later on how to customise
+what XRay does in an XRay-instrumented binary.</p>
+<div class="section" id="instrumenting-your-c-c-objective-c-application">
+<h3><a class="toc-backref" href="#id4">Instrumenting your C/C++/Objective-C Application</a><a class="headerlink" href="#instrumenting-your-c-c-objective-c-application" title="Permalink to this headline">¶</a></h3>
+<p>The easiest way of getting XRay instrumentation for your application is by
+enabling the <code class="docutils literal"><span class="pre">-fxray-instrument</span></code> flag in your clang invocation.</p>
+<p>For example:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">clang</span> <span class="o">-</span><span class="n">fxray</span><span class="o">-</span><span class="n">instrument</span> <span class="o">..</span>
+</pre></div>
+</div>
+<p>By default, functions that have at least 200 instructions will get XRay
+instrumentation points. You can tweak that number through the
+<code class="docutils literal"><span class="pre">-fxray-instruction-threshold=</span></code> flag:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span><span class="n">clang</span> <span class="o">-</span><span class="n">fxray</span><span class="o">-</span><span class="n">instrument</span> <span class="o">-</span><span class="n">fxray</span><span class="o">-</span><span class="n">instruction</span><span class="o">-</span><span class="n">threshold</span><span class="o">=</span><span class="mi">1</span> <span class="o">..</span>
+</pre></div>
+</div>
+<p>You can also specifically instrument functions in your binary to either always
+or never be instrumented using source-level attributes. You can do it using the
+GCC-style attributes or C++11-style attributes.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="p">[[</span><span class="n">clang</span><span class="o">::</span><span class="n">xray_always_intrument</span><span class="p">]]</span> <span class="kt">void</span> <span class="n">always_instrumented</span><span class="p">();</span>
+
+<span class="p">[[</span><span class="n">clang</span><span class="o">::</span><span class="n">xray_never_instrument</span><span class="p">]]</span> <span class="kt">void</span> <span class="n">never_instrumented</span><span class="p">();</span>
+
+<span class="kt">void</span> <span class="nf">alt_always_instrumented</span><span class="p">()</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">xray_always_intrument</span><span class="p">));</span>
+
+<span class="kt">void</span> <span class="nf">alt_never_instrumented</span><span class="p">()</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">xray_never_instrument</span><span class="p">));</span>
+</pre></div>
+</div>
+<p>When linking a binary, you can either manually link in the <a class="reference internal" href="#xray-runtime-library">XRay Runtime
+Library</a> or use <code class="docutils literal"><span class="pre">clang</span></code> to link it in automatically with the
+<code class="docutils literal"><span class="pre">-fxray-instrument</span></code> flag. Alternatively, you can statically link-in the XRay
+runtime library from compiler-rt – those archive files will take the name of
+<cite>libclang_rt.xray-{arch}</cite> where <cite>{arch}</cite> is the mnemonic supported by clang
+(x86_64, arm7, etc.).</p>
+</div>
+<div class="section" id="llvm-function-attribute">
+<h3><a class="toc-backref" href="#id5">LLVM Function Attribute</a><a class="headerlink" href="#llvm-function-attribute" title="Permalink to this headline">¶</a></h3>
+<p>If you’re using LLVM IR directly, you can add the <code class="docutils literal"><span class="pre">function-instrument</span></code>
+string attribute to your functions, to get the similar effect that the
+C/C++/Objective-C source-level attributes would get:</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="k">define</span> <span class="k">i32</span> <span class="vg">@always_instrument</span><span class="p">()</span> <span class="k">uwtable</span> <span class="s">"function-instrument"</span><span class="p">=</span><span class="s">"xray-always"</span> <span class="p">{</span>
+  <span class="c">; ...</span>
+<span class="p">}</span>
+
+<span class="k">define</span> <span class="k">i32</span> <span class="vg">@never_instrument</span><span class="p">()</span> <span class="k">uwtable</span> <span class="s">"function-instrument"</span><span class="p">=</span><span class="s">"xray-never"</span> <span class="p">{</span>
+  <span class="c">; ...</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>You can also set the <code class="docutils literal"><span class="pre">xray-instruction-threshold</span></code> attribute and provide a
+numeric string value for how many instructions should be in the function before
+it gets instrumented.</p>
+<div class="highlight-llvm"><div class="highlight"><pre><span></span><span class="k">define</span> <span class="k">i32</span> <span class="vg">@maybe_instrument</span><span class="p">()</span> <span class="k">uwtable</span> <span class="s">"xray-instruction-threshold"</span><span class="p">=</span><span class="s">"2"</span> <span class="p">{</span>
+  <span class="c">; ...</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="xray-runtime-library">
+<h3><a class="toc-backref" href="#id6">XRay Runtime Library</a><a class="headerlink" href="#xray-runtime-library" title="Permalink to this headline">¶</a></h3>
+<p>The XRay Runtime Library is part of the compiler-rt project, which implements
+the runtime components that perform the patching and unpatching of inserted
+instrumentation points. When you use <code class="docutils literal"><span class="pre">clang</span></code> to link your binaries and the
+<code class="docutils literal"><span class="pre">-fxray-instrument</span></code> flag, it will automatically link in the XRay runtime.</p>
+<p>The default implementation of the XRay runtime will enable XRay instrumentation
+before <code class="docutils literal"><span class="pre">main</span></code> starts, which works for applications that have a short
+lifetime. This implementation also records all function entry and exit events
+which may result in a lot of records in the resulting trace.</p>
+<p>Also by default the filename of the XRay trace is <code class="docutils literal"><span class="pre">xray-log.XXXXXX</span></code> where the
+<code class="docutils literal"><span class="pre">XXXXXX</span></code> part is randomly generated.</p>
+<p>These options can be controlled through the <code class="docutils literal"><span class="pre">XRAY_OPTIONS</span></code> environment
+variable, where we list down the options and their defaults below.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="25%" />
+<col width="23%" />
+<col width="20%" />
+<col width="32%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Option</th>
+<th class="head">Type</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td>patch_premain</td>
+<td><code class="docutils literal"><span class="pre">bool</span></code></td>
+<td><code class="docutils literal"><span class="pre">false</span></code></td>
+<td>Whether to patch
+instrumentation points
+before main.</td>
+</tr>
+<tr class="row-odd"><td>xray_naive_log</td>
+<td><code class="docutils literal"><span class="pre">bool</span></code></td>
+<td><code class="docutils literal"><span class="pre">true</span></code></td>
+<td>Whether to install
+the naive log
+implementation.</td>
+</tr>
+<tr class="row-even"><td>xray_logfile_base</td>
+<td><code class="docutils literal"><span class="pre">const</span> <span class="pre">char*</span></code></td>
+<td><code class="docutils literal"><span class="pre">xray-log.</span></code></td>
+<td>Filename base for the
+XRay logfile.</td>
+</tr>
+<tr class="row-odd"><td>xray_fdr_log</td>
+<td><code class="docutils literal"><span class="pre">bool</span></code></td>
+<td><code class="docutils literal"><span class="pre">false</span></code></td>
+<td>Whether to install the
+Flight Data Recorder
+(FDR) mode.</td>
+</tr>
+</tbody>
+</table>
+<p>If you choose to not use the default logging implementation that comes with the
+XRay runtime and/or control when/how the XRay instrumentation runs, you may use
+the XRay APIs directly for doing so. To do this, you’ll need to include the
+<code class="docutils literal"><span class="pre">xray_interface.h</span></code> from the compiler-rt <code class="docutils literal"><span class="pre">xray</span></code> directory. The important API
+functions we list below:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">__xray_set_handler(void</span> <span class="pre">(*entry)(int32_t,</span> <span class="pre">XRayEntryType))</span></code>: Install your
+own logging handler for when an event is encountered. See
+<code class="docutils literal"><span class="pre">xray/xray_interface.h</span></code> for more details.</li>
+<li><code class="docutils literal"><span class="pre">__xray_remove_handler()</span></code>: Removes whatever the installed handler is.</li>
+<li><code class="docutils literal"><span class="pre">__xray_patch()</span></code>: Patch all the instrumentation points defined in the
+binary.</li>
+<li><code class="docutils literal"><span class="pre">__xray_unpatch()</span></code>: Unpatch the instrumentation points defined in the
+binary.</li>
+</ul>
+<p>There are some requirements on the logging handler to be installed for the
+thread-safety of operations to be performed by the XRay runtime library:</p>
+<ul class="simple">
+<li>The function should be thread-safe, as multiple threads may be invoking the
+function at the same time. If the logging function needs to do
+synchronisation, it must do so internally as XRay does not provide any
+synchronisation guarantees outside from the atomicity of updates to the
+pointer.</li>
+<li>The pointer provided to <code class="docutils literal"><span class="pre">__xray_set_handler(...)</span></code> must be live even after
+calls to <code class="docutils literal"><span class="pre">__xray_remove_handler()</span></code> and <code class="docutils literal"><span class="pre">__xray_unpatch()</span></code> have succeeded.
+XRay cannot guarantee that all threads that have ever gotten a copy of the
+pointer will not invoke the function.</li>
+</ul>
+</div>
+<div class="section" id="flight-data-recorder-mode">
+<h3><a class="toc-backref" href="#id7">Flight Data Recorder Mode</a><a class="headerlink" href="#flight-data-recorder-mode" title="Permalink to this headline">¶</a></h3>
+<p>XRay supports a logging mode which allows the application to only capture a
+fixed amount of memory’s worth of events. Flight Data Recorder (FDR) mode works
+very much like a plane’s “black box” which keeps recording data to memory in a
+fixed-size circular queue of buffers, and have the data available
+programmatically until the buffers are finalized and flushed. To use FDR mode
+on your application, you may set the <code class="docutils literal"><span class="pre">xray_fdr_log</span></code> option to <code class="docutils literal"><span class="pre">true</span></code> in the
+<code class="docutils literal"><span class="pre">XRAY_OPTIONS</span></code> environment variable (while also optionally setting the
+<code class="docutils literal"><span class="pre">xray_naive_log</span></code> to <code class="docutils literal"><span class="pre">false</span></code>).</p>
+<p>When FDR mode is on, it will keep writing and recycling memory buffers until
+the logging implementation is finalized – at which point it can be flushed and
+re-initialised later. To do this programmatically, we follow the workflow
+provided below:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// Patch the sleds, if we haven't yet.</span>
+<span class="k">auto</span> <span class="n">patch_status</span> <span class="o">=</span> <span class="n">__xray_patch</span><span class="p">();</span>
+
+<span class="c1">// Maybe handle the patch_status errors.</span>
+
+<span class="c1">// When we want to flush the log, we need to finalize it first, to give</span>
+<span class="c1">// threads a chance to return buffers to the queue.</span>
+<span class="k">auto</span> <span class="n">finalize_status</span> <span class="o">=</span> <span class="n">__xray_log_finalize</span><span class="p">();</span>
+<span class="k">if</span> <span class="p">(</span><span class="n">finalize_status</span> <span class="o">!=</span> <span class="n">XRAY_LOG_FINALIZED</span><span class="p">)</span> <span class="p">{</span>
+  <span class="c1">// maybe retry, or bail out.</span>
+<span class="p">}</span>
+
+<span class="c1">// At this point, we are sure that the log is finalized, so we may try</span>
+<span class="c1">// flushing the log.</span>
+<span class="k">auto</span> <span class="n">flush_status</span> <span class="o">=</span> <span class="n">__xray_log_flushLog</span><span class="p">();</span>
+<span class="k">if</span> <span class="p">(</span><span class="n">flush_status</span> <span class="o">!=</span> <span class="n">XRAY_LOG_FLUSHED</span><span class="p">)</span> <span class="p">{</span>
+  <span class="c1">// maybe retry, or bail out.</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The default settings for the FDR mode implementation will create logs named
+similarly to the naive log implementation, but will have a different log
+format. All the trace analysis tools (and the trace reading library) will
+support all versions of the FDR mode format as we add more functionality and
+record types in the future.</p>
+<blockquote>
+<div><strong>NOTE:</strong> We do not however promise perpetual support for when we update the
+log versions we support going forward. Deprecation of the formats will be
+announced and discussed on the developers mailing list.</div></blockquote>
+<p>XRay allows for replacing the default FDR mode logging implementation using the
+following API:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">__xray_set_log_impl(...)</span></code>: This function takes a struct of type
+<code class="docutils literal"><span class="pre">XRayLogImpl</span></code>, which is defined in <code class="docutils literal"><span class="pre">xray/xray_log_interface.h</span></code>, part of
+the XRay compiler-rt installation.</li>
+<li><code class="docutils literal"><span class="pre">__xray_log_init(...)</span></code>: This function allows for initializing and
+re-initializing an installed logging implementation. See
+<code class="docutils literal"><span class="pre">xray/xray_log_interface.h</span></code> for details, part of the XRay compiler-rt
+installation.</li>
+</ul>
+</div>
+<div class="section" id="trace-analysis-tools">
+<h3><a class="toc-backref" href="#id8">Trace Analysis Tools</a><a class="headerlink" href="#trace-analysis-tools" title="Permalink to this headline">¶</a></h3>
+<p>We currently have the beginnings of a trace analysis tool in LLVM, which can be
+found in the <code class="docutils literal"><span class="pre">tools/llvm-xray</span></code> directory. The <code class="docutils literal"><span class="pre">llvm-xray</span></code> tool currently
+supports the following subcommands:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">extract</span></code>: Extract the instrumentation map from a binary, and return it as
+YAML.</li>
+<li><code class="docutils literal"><span class="pre">account</span></code>: Performs basic function call accounting statistics with various
+options for sorting, and output formats (supports CSV, YAML, and
+console-friendly TEXT).</li>
+<li><code class="docutils literal"><span class="pre">convert</span></code>: Converts an XRay log file from one format to another. Currently
+only converts to YAML.</li>
+<li><code class="docutils literal"><span class="pre">graph</span></code>: Generates a DOT graph of the function call relationships between
+functions found in an XRay trace.</li>
+</ul>
+<p>These subcommands use various library components found as part of the XRay
+libraries, distributed with the LLVM distribution. These are:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">llvm/XRay/Trace.h</span></code> : A trace reading library for conveniently loading
+an XRay trace of supported forms, into a convenient in-memory representation.
+All the analysis tools that deal with traces use this implementation.</li>
+<li><code class="docutils literal"><span class="pre">llvm/XRay/Graph.h</span></code> : A semi-generic graph type used by the graph
+subcommand to conveniently represent a function call graph with statistics
+associated with edges and vertices.</li>
+<li><code class="docutils literal"><span class="pre">llvm/XRay/InstrumentationMap.h</span></code>: A convenient tool for analyzing the
+instrumentation map in XRay-instrumented object files and binaries. The
+<code class="docutils literal"><span class="pre">extract</span></code> subcommand uses this particular library.</li>
+</ul>
+</div>
+</div>
+<div class="section" id="future-work">
+<h2><a class="toc-backref" href="#id9">Future Work</a><a class="headerlink" href="#future-work" title="Permalink to this headline">¶</a></h2>
+<p>There are a number of ongoing efforts for expanding the toolset building around
+the XRay instrumentation system.</p>
+<div class="section" id="trace-analysis">
+<h3><a class="toc-backref" href="#id10">Trace Analysis</a><a class="headerlink" href="#trace-analysis" title="Permalink to this headline">¶</a></h3>
+<p>We have more subcommands and modes that we’re thinking of developing, in the
+following forms:</p>
+<ul class="simple">
+<li><code class="docutils literal"><span class="pre">stack</span></code>: Reconstruct the function call stacks in a timeline.</li>
+</ul>
+</div>
+<div class="section" id="more-platforms">
+<h3><a class="toc-backref" href="#id11">More Platforms</a><a class="headerlink" href="#more-platforms" title="Permalink to this headline">¶</a></h3>
+<p>We’re looking forward to contributions to port XRay to more architectures and
+operating systems.</p>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="XRayExample.html" title="Debugging with XRay"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="GlobalISel.html" title="Global Instruction Selection"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/XRayExample.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/XRayExample.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/XRayExample.html (added)
+++ www-releases/trunk/5.0.1/docs/XRayExample.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,348 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>Debugging with XRay — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="The PDB File Format" href="PDB/index.html" />
+    <link rel="prev" title="XRay Instrumentation" href="XRay.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="PDB/index.html" title="The PDB File Format"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="XRay.html" title="XRay Instrumentation"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="debugging-with-xray">
+<h1>Debugging with XRay<a class="headerlink" href="#debugging-with-xray" title="Permalink to this headline">¶</a></h1>
+<p>This document shows an example of how you would go about analyzing applications
+built with XRay instrumentation. Here we will attempt to debug <code class="docutils literal"><span class="pre">llc</span></code>
+compiling some sample LLVM IR generated by Clang.</p>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#building-with-xray" id="id1">Building with XRay</a></li>
+<li><a class="reference internal" href="#getting-traces" id="id2">Getting Traces</a></li>
+<li><a class="reference internal" href="#the-llvm-xray-tool" id="id3">The <code class="docutils literal"><span class="pre">llvm-xray</span></code> Tool</a></li>
+<li><a class="reference internal" href="#controlling-fidelity" id="id4">Controlling Fidelity</a><ul>
+<li><a class="reference internal" href="#instruction-threshold" id="id5">Instruction Threshold</a></li>
+<li><a class="reference internal" href="#instrumentation-attributes" id="id6">Instrumentation Attributes</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#further-exploration" id="id7">Further Exploration</a></li>
+<li><a class="reference internal" href="#next-steps" id="id8">Next Steps</a></li>
+</ul>
+</div>
+<div class="section" id="building-with-xray">
+<h2><a class="toc-backref" href="#id1">Building with XRay</a><a class="headerlink" href="#building-with-xray" title="Permalink to this headline">¶</a></h2>
+<p>To debug an application with XRay instrumentation, we need to build it with a
+Clang that supports the <code class="docutils literal"><span class="pre">-fxray-instrument</span></code> option. See <a class="reference external" href="XRay.html">XRay</a>
+for more technical details of how XRay works for background information.</p>
+<p>In our example, we need to add <code class="docutils literal"><span class="pre">-fxray-instrument</span></code> to the list of flags
+passed to Clang when building a binary. Note that we need to link with Clang as
+well to get the XRay runtime linked in appropriately. For building <code class="docutils literal"><span class="pre">llc</span></code> with
+XRay, we do something similar below for our LLVM build:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ mkdir -p llvm-build && cd llvm-build
+# Assume that the LLVM sources are at ../llvm
+$ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \
+    -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument" -DCMAKE_CXX_FLAGS="-fxray-instrument" \
+# Once this finishes, we should build llc
+$ ninja llc
+</pre></div>
+</div>
+<p>To verify that we have an XRay instrumented binary, we can use <code class="docutils literal"><span class="pre">objdump</span></code> to
+look for the <code class="docutils literal"><span class="pre">xray_instr_map</span></code> section.</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ objdump -h -j xray_instr_map ./bin/llc
+./bin/llc:     file format elf64-x86-64
+
+Sections:
+Idx Name          Size      VMA               LMA               File off  Algn
+ 14 xray_instr_map 00002fc0  00000000041516c6  00000000041516c6  03d516c6  2**0
+                  CONTENTS, ALLOC, LOAD, READONLY, DATA
+</pre></div>
+</div>
+</div>
+<div class="section" id="getting-traces">
+<h2><a class="toc-backref" href="#id2">Getting Traces</a><a class="headerlink" href="#getting-traces" title="Permalink to this headline">¶</a></h2>
+<p>By default, XRay does not write out the trace files or patch the application
+before main starts. If we just run <code class="docutils literal"><span class="pre">llc</span></code> it should just work like a normally
+built binary. However, if we want to get a full trace of the application’s
+operations (of the functions we do end up instrumenting with XRay) then we need
+to enable XRay at application start. To do this, XRay checks the
+<code class="docutils literal"><span class="pre">XRAY_OPTIONS</span></code> environment variable.</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span># The following doesn't create an XRay trace by default.
+$ ./bin/llc input.ll
+
+# We need to set the XRAY_OPTIONS to enable some features.
+$ XRAY_OPTIONS="patch_premain=true" ./bin/llc input.ll
+==69819==XRay: Log file in 'xray-log.llc.m35qPB'
+</pre></div>
+</div>
+<p>At this point we now have an XRay trace we can start analysing.</p>
+</div>
+<div class="section" id="the-llvm-xray-tool">
+<h2><a class="toc-backref" href="#id3">The <code class="docutils literal"><span class="pre">llvm-xray</span></code> Tool</a><a class="headerlink" href="#the-llvm-xray-tool" title="Permalink to this headline">¶</a></h2>
+<p>Having a trace then allows us to do basic accounting of the functions that were
+instrumented, and how much time we’re spending in parts of the code. To make
+sense of this data, we use the <code class="docutils literal"><span class="pre">llvm-xray</span></code> tool which has a few subcommands
+to help us understand our trace.</p>
+<p>One of the simplest things we can do is to get an accounting of the functions
+that have been instrumented. We can see an example accounting with <code class="docutils literal"><span class="pre">llvm-xray</span>
+<span class="pre">account</span></code>:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ llvm-xray account xray-log.llc.m35qPB -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc
+Functions with latencies: 29
+   funcid      count [      min,       med,       90p,       99p,       max]       sum  function
+      187        360 [ 0.000000,  0.000001,  0.000014,  0.000032,  0.000075]  0.001596  LLLexer.cpp:446:0: llvm::LLLexer::LexIdentifier()
+       85        130 [ 0.000000,  0.000000,  0.000018,  0.000023,  0.000156]  0.000799  X86ISelDAGToDAG.cpp:1984:0: (anonymous namespace)::X86DAGToDAGISel::Select(llvm::SDNode*)
+      138        130 [ 0.000000,  0.000000,  0.000017,  0.000155,  0.000155]  0.000774  SelectionDAGISel.cpp:2963:0: llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int)
+      188        103 [ 0.000000,  0.000000,  0.000003,  0.000123,  0.000214]  0.000737  LLParser.cpp:2692:0: llvm::LLParser::ParseValID(llvm::ValID&, llvm::LLParser::PerFunctionState*)
+       88          1 [ 0.000562,  0.000562,  0.000562,  0.000562,  0.000562]  0.000562  X86ISelLowering.cpp:83:0: llvm::X86TargetLowering::X86TargetLowering(llvm::X86TargetMachine const&, llvm::X86Subtarget const&)
+      125        102 [ 0.000001,  0.000003,  0.000010,  0.000017,  0.000049]  0.000471  Verifier.cpp:3714:0: (anonymous namespace)::Verifier::visitInstruction(llvm::Instruction&)
+       90          8 [ 0.000023,  0.000035,  0.000106,  0.000106,  0.000106]  0.000342  X86ISelLowering.cpp:3363:0: llvm::X86TargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const
+      124         32 [ 0.000003,  0.000007,  0.000016,  0.000041,  0.000041]  0.000310  Verifier.cpp:1967:0: (anonymous namespace)::Verifier::visitFunction(llvm::Function const&)
+      123          1 [ 0.000302,  0.000302,  0.000302,  0.000302,  0.000302]  0.000302  LLVMContextImpl.cpp:54:0: llvm::LLVMContextImpl::~LLVMContextImpl()
+      139         46 [ 0.000000,  0.000002,  0.000006,  0.000008,  0.000019]  0.000138  TargetLowering.cpp:506:0: llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt&, llvm::APInt&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const
+</pre></div>
+</div>
+<p>This shows us that for our input file, <code class="docutils literal"><span class="pre">llc</span></code> spent the most cumulative time
+in the lexer (a total of 1 millisecond). If we wanted for example to work with
+this data in a spreadsheet, we can output the results as CSV using the
+<code class="docutils literal"><span class="pre">-format=csv</span></code> option to the command for further analysis.</p>
+<p>If we want to get a textual representation of the raw trace we can use the
+<code class="docutils literal"><span class="pre">llvm-xray</span> <span class="pre">convert</span></code> tool to get YAML output. The first few lines of that
+ouput for an example trace would look like the following:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ llvm-xray convert -f yaml -symbolize -instr_map=./bin/llc xray-log.llc.m35qPB
+---
+header:
+  version:         1
+  type:            0
+  constant-tsc:    true
+  nonstop-tsc:     true
+  cycle-frequency: 2601000000
+records:
+  - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426023268520 }
+  - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426023523052 }
+  - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426029925386 }
+  - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426030031128 }
+  - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426046951388 }
+  - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047282020 }
+  - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426047857332 }
+  - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047984152 }
+  - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048036584 }
+  - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048042292 }
+  - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048055056 }
+  - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048067316 }
+</pre></div>
+</div>
+</div>
+<div class="section" id="controlling-fidelity">
+<h2><a class="toc-backref" href="#id4">Controlling Fidelity</a><a class="headerlink" href="#controlling-fidelity" title="Permalink to this headline">¶</a></h2>
+<p>So far in our examples, we haven’t been getting full coverage of the functions
+we have in the binary. To get that, we need to modify the compiler flags so
+that we can instrument more (if not all) the functions we have in the binary.
+We have two options for doing that, and we explore both of these below.</p>
+<div class="section" id="instruction-threshold">
+<h3><a class="toc-backref" href="#id5">Instruction Threshold</a><a class="headerlink" href="#instruction-threshold" title="Permalink to this headline">¶</a></h3>
+<p>The first “blunt” way of doing this is by setting the minimum threshold for
+function bodies to 1. We can do that with the
+<code class="docutils literal"><span class="pre">-fxray-instruction-threshold=N</span></code> flag when building our binary. We rebuild
+<code class="docutils literal"><span class="pre">llc</span></code> with this option and observe the results:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ rm CMakeCache.txt
+$ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \
+    -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument -fxray-instruction-threshold=1" \
+    -DCMAKE_CXX_FLAGS="-fxray-instrument -fxray-instruction-threshold=1"
+$ ninja llc
+$ XRAY_OPTIONS="patch_premain=true" ./bin/llc input.ll
+==69819==XRay: Log file in 'xray-log.llc.5rqxkU'
+
+$ llvm-xray account xray-log.llc.5rqxkU -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc
+Functions with latencies: 36652
+ funcid      count [      min,       med,       90p,       99p,       max]       sum  function
+     75          1 [ 0.672368,  0.672368,  0.672368,  0.672368,  0.672368]  0.672368  llc.cpp:271:0: main
+     78          1 [ 0.626455,  0.626455,  0.626455,  0.626455,  0.626455]  0.626455  llc.cpp:381:0: compileModule(char**, llvm::LLVMContext&)
+ 139617          1 [ 0.472618,  0.472618,  0.472618,  0.472618,  0.472618]  0.472618  LegacyPassManager.cpp:1723:0: llvm::legacy::PassManager::run(llvm::Module&)
+ 139610          1 [ 0.472618,  0.472618,  0.472618,  0.472618,  0.472618]  0.472618  LegacyPassManager.cpp:1681:0: llvm::legacy::PassManagerImpl::run(llvm::Module&)
+ 139612          1 [ 0.470948,  0.470948,  0.470948,  0.470948,  0.470948]  0.470948  LegacyPassManager.cpp:1564:0: (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&)
+ 139607          2 [ 0.147345,  0.315994,  0.315994,  0.315994,  0.315994]  0.463340  LegacyPassManager.cpp:1530:0: llvm::FPPassManager::runOnModule(llvm::Module&)
+ 139605         21 [ 0.000002,  0.000002,  0.102593,  0.213336,  0.213336]  0.463331  LegacyPassManager.cpp:1491:0: llvm::FPPassManager::runOnFunction(llvm::Function&)
+ 139563      26096 [ 0.000002,  0.000002,  0.000037,  0.000063,  0.000215]  0.225708  LegacyPassManager.cpp:1083:0: llvm::PMDataManager::findAnalysisPass(void const*, bool)
+ 108055        188 [ 0.000002,  0.000120,  0.001375,  0.004523,  0.062624]  0.159279  MachineFunctionPass.cpp:38:0: llvm::MachineFunctionPass::runOnFunction(llvm::Function&)
+  62635         22 [ 0.000041,  0.000046,  0.000050,  0.126744,  0.126744]  0.127715  X86TargetMachine.cpp:242:0: llvm::X86TargetMachine::getSubtargetImpl(llvm::Function const&) const
+</pre></div>
+</div>
+</div>
+<div class="section" id="instrumentation-attributes">
+<h3><a class="toc-backref" href="#id6">Instrumentation Attributes</a><a class="headerlink" href="#instrumentation-attributes" title="Permalink to this headline">¶</a></h3>
+<p>The other way is to use configuration files for selecting which functions
+should always be instrumented by the compiler. This gives us a way of ensuring
+that certain functions are either always or never instrumented by not having to
+add the attribute to the source.</p>
+<p>To use this feature, you can define one file for the functions to always
+instrument, and another for functions to never instrument. The format of these
+files are exactly the same as the SanitizerLists files that control similar
+things for the sanitizer implementations. For example, we can have two
+different files like below:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span><span class="c1"># always-instrument.txt</span>
+<span class="c1"># always instrument functions that match the following filters:</span>
+<span class="n">fun</span><span class="p">:</span><span class="n">main</span>
+
+<span class="c1"># never-instrument.txt</span>
+<span class="c1"># never instrument functions that match the following filters:</span>
+<span class="n">fun</span><span class="p">:</span><span class="n">__cxx_</span><span class="o">*</span>
+</pre></div>
+</div>
+<p>Given the above two files we can re-build by providing those two files as
+arguments to clang as <code class="docutils literal"><span class="pre">-fxray-always-instrument=always-instrument.txt</span></code> or
+<code class="docutils literal"><span class="pre">-fxray-never-instrument=never-instrument.txt</span></code>.</p>
+</div>
+</div>
+<div class="section" id="further-exploration">
+<h2><a class="toc-backref" href="#id7">Further Exploration</a><a class="headerlink" href="#further-exploration" title="Permalink to this headline">¶</a></h2>
+<p>The <code class="docutils literal"><span class="pre">llvm-xray</span></code> tool has a few other subcommands that are in various stages
+of being developed. One interesting subcommand that can highlight a few
+interesting things is the <code class="docutils literal"><span class="pre">graph</span></code> subcommand. Given for example the following
+toy program that we build with XRay instrumentation, we can see how the
+generated graph may be a helpful indicator of where time is being spent for the
+application.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// sample.cc</span>
+<span class="cp">#include</span> <span class="cpf"><iostream></span><span class="cp"></span>
+<span class="cp">#include</span> <span class="cpf"><thread></span><span class="cp"></span>
+
+<span class="p">[[</span><span class="n">clang</span><span class="o">::</span><span class="n">xray_always_intrument</span><span class="p">]]</span> <span class="kt">void</span> <span class="n">f</span><span class="p">()</span> <span class="p">{</span>
+  <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o"><<</span> <span class="sc">'.'</span><span class="p">;</span>
+<span class="p">}</span>
+
+<span class="p">[[</span><span class="n">clang</span><span class="o">::</span><span class="n">xray_always_intrument</span><span class="p">]]</span> <span class="kt">void</span> <span class="n">g</span><span class="p">()</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">1</span> <span class="o"><<</span> <span class="mi">10</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o"><<</span> <span class="sc">'-'</span><span class="p">;</span>
+  <span class="p">}</span>
+<span class="p">}</span>
+
+<span class="kt">int</span> <span class="n">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">argv</span><span class="p">[])</span> <span class="p">{</span>
+  <span class="n">std</span><span class="o">::</span><span class="kr">thread</span> <span class="n">t1</span><span class="p">([]</span> <span class="p">{</span>
+    <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">1</span> <span class="o"><<</span> <span class="mi">10</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
+      <span class="n">f</span><span class="p">();</span>
+  <span class="p">});</span>
+  <span class="n">std</span><span class="o">::</span><span class="kr">thread</span> <span class="n">t2</span><span class="p">([]</span> <span class="p">{</span>
+    <span class="n">g</span><span class="p">();</span>
+  <span class="p">});</span>
+  <span class="n">t1</span><span class="p">.</span><span class="n">join</span><span class="p">();</span>
+  <span class="n">t2</span><span class="p">.</span><span class="n">join</span><span class="p">();</span>
+  <span class="n">std</span><span class="o">::</span><span class="n">cerr</span> <span class="o"><<</span> <span class="sc">'\n'</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>We then build the above with XRay instrumentation:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ clang++ -o sample -O3 sample.cc -std=c++11 -fxray-instrument -fxray-instruction-threshold=1
+$ XRAY_OPTIONS="patch_premain=true" ./sample
+</pre></div>
+</div>
+<p>We can then explore the graph rendering of the trace generated by this sample
+application. We assume you have the graphviz toosl available in your system,
+including both <code class="docutils literal"><span class="pre">unflatten</span></code> and <code class="docutils literal"><span class="pre">dot</span></code>. If you prefer rendering or exploring
+the graph using another tool, then that should be feasible as well. <code class="docutils literal"><span class="pre">llvm-xray</span>
+<span class="pre">graph</span></code> will create DOT format graphs which should be usable in most graph
+rendering applications. One example invocation of the <code class="docutils literal"><span class="pre">llvm-xray</span> <span class="pre">graph</span></code>
+command should yield some interesting insights to the workings of C++
+applications:</p>
+<div class="highlight-default"><div class="highlight"><pre><span></span>$ llvm-xray graph xray-log.sample.* -m sample -color-edges=sum -edge-label=sum \
+    | unflatten -f -l10 | dot -Tsvg -o sample.svg
+</pre></div>
+</div>
+</div>
+<div class="section" id="next-steps">
+<h2><a class="toc-backref" href="#id8">Next Steps</a><a class="headerlink" href="#next-steps" title="Permalink to this headline">¶</a></h2>
+<p>If you have some interesting analyses you’d like to implement as part of the
+llvm-xray tool, please feel free to propose them on the llvm-dev@ mailing list.
+The following are some ideas to inspire you in getting involved and potentially
+making things better.</p>
+<blockquote>
+<div><ul class="simple">
+<li>Implement a query/filtering library that allows for finding patterns in the
+XRay traces.</li>
+<li>A conversion from the XRay trace onto something that can be visualised
+better by other tools (like the Chrome trace viewer for example).</li>
+<li>Collecting function call stacks and how often they’re encountered in the
+XRay trace.</li>
+</ul>
+</div></blockquote>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="PDB/index.html" title="The PDB File Format"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="XRay.html" title="XRay Instrumentation"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/YamlIO.html
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/YamlIO.html?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/YamlIO.html (added)
+++ www-releases/trunk/5.0.1/docs/YamlIO.html Thu Dec 21 10:09:53 2017
@@ -0,0 +1,1029 @@
+
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>YAML I/O — LLVM 5 documentation</title>
+    <link rel="stylesheet" href="_static/llvm-theme.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '5',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="The Often Misunderstood GEP Instruction" href="GetElementPtr.html" />
+    <link rel="prev" title="LLVM’s Analysis and Transform Passes" href="Passes.html" />
+<style type="text/css">
+  table.right { float: right; margin-left: 20px; }
+  table.right td { border: 1px solid #ccc; }
+</style>
+
+  </head>
+  <body>
+<div class="logo">
+  <a href="index.html">
+    <img src="_static/logo.png"
+         alt="LLVM Logo" width="250" height="88"/></a>
+</div>
+
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="GetElementPtr.html" title="The Often Misunderstood GEP Instruction"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="Passes.html" title="LLVM’s Analysis and Transform Passes"
+             accesskey="P">previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+
+
+    <div class="document">
+      <div class="documentwrapper">
+          <div class="body" role="main">
+            
+  <div class="section" id="yaml-i-o">
+<h1>YAML I/O<a class="headerlink" href="#yaml-i-o" title="Permalink to this headline">¶</a></h1>
+<div class="contents local topic" id="contents">
+<ul class="simple">
+<li><a class="reference internal" href="#introduction-to-yaml" id="id1">Introduction to YAML</a></li>
+<li><a class="reference internal" href="#introduction-to-yaml-i-o" id="id2">Introduction to YAML I/O</a></li>
+<li><a class="reference internal" href="#error-handling" id="id3">Error Handling</a></li>
+<li><a class="reference internal" href="#scalars" id="id4">Scalars</a><ul>
+<li><a class="reference internal" href="#built-in-types" id="id5">Built-in types</a></li>
+<li><a class="reference internal" href="#unique-types" id="id6">Unique types</a></li>
+<li><a class="reference internal" href="#hex-types" id="id7">Hex types</a></li>
+<li><a class="reference internal" href="#scalarenumerationtraits" id="id8">ScalarEnumerationTraits</a></li>
+<li><a class="reference internal" href="#bitvalue" id="id9">BitValue</a></li>
+<li><a class="reference internal" href="#custom-scalar" id="id10">Custom Scalar</a></li>
+<li><a class="reference internal" href="#block-scalars" id="id11">Block Scalars</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#mappings" id="id12">Mappings</a><ul>
+<li><a class="reference internal" href="#no-normalization" id="id13">No Normalization</a></li>
+<li><a class="reference internal" href="#normalization" id="id14">Normalization</a></li>
+<li><a class="reference internal" href="#default-values" id="id15">Default values</a></li>
+<li><a class="reference internal" href="#order-of-keys" id="id16">Order of Keys</a></li>
+<li><a class="reference internal" href="#tags" id="id17">Tags</a></li>
+<li><a class="reference internal" href="#validation" id="id18">Validation</a></li>
+<li><a class="reference internal" href="#flow-mapping" id="id19">Flow Mapping</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#sequence" id="id20">Sequence</a><ul>
+<li><a class="reference internal" href="#flow-sequence" id="id21">Flow Sequence</a></li>
+<li><a class="reference internal" href="#utility-macros" id="id22">Utility Macros</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#document-list" id="id23">Document List</a></li>
+<li><a class="reference internal" href="#user-context-data" id="id24">User Context Data</a></li>
+<li><a class="reference internal" href="#output" id="id25">Output</a></li>
+<li><a class="reference internal" href="#input" id="id26">Input</a></li>
+</ul>
+</div>
+<div class="section" id="introduction-to-yaml">
+<h2><a class="toc-backref" href="#id1">Introduction to YAML</a><a class="headerlink" href="#introduction-to-yaml" title="Permalink to this headline">¶</a></h2>
+<p>YAML is a human readable data serialization language.  The full YAML language
+spec can be read at <a class="reference external" href="http://www.yaml.org/spec/1.2/spec.html#Introduction">yaml.org</a>.  The simplest form of
+yaml is just “scalars”, “mappings”, and “sequences”.  A scalar is any number
+or string.  The pound/hash symbol (#) begins a comment line.   A mapping is
+a set of key-value pairs where the key ends with a colon.  For example:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="c1"># a mapping</span>
+<span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+<span class="l l-Scalar l-Scalar-Plain">hat-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">7</span>
+</pre></div>
+</div>
+<p>A sequence is a list of items where each item starts with a leading dash (‘-‘).
+For example:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="c1"># a sequence</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">x86</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">x86_64</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">PowerPC</span>
+</pre></div>
+</div>
+<p>You can combine mappings and sequences by indenting.  For example a sequence
+of mappings in which one of the mapping values is itself a sequence:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="c1"># a sequence of mappings with one key's value being a sequence</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+  <span class="l l-Scalar l-Scalar-Plain">cpus</span><span class="p p-Indicator">:</span>
+   <span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">x86</span>
+   <span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">x86_64</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Bob</span>
+  <span class="l l-Scalar l-Scalar-Plain">cpus</span><span class="p p-Indicator">:</span>
+   <span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">x86</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Dan</span>
+  <span class="l l-Scalar l-Scalar-Plain">cpus</span><span class="p p-Indicator">:</span>
+   <span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">PowerPC</span>
+   <span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">x86</span>
+</pre></div>
+</div>
+<p>Sometime sequences are known to be short and the one entry per line is too
+verbose, so YAML offers an alternate syntax for sequences called a “Flow
+Sequence” in which you put comma separated sequence elements into square
+brackets.  The above example could then be simplified to :</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="c1"># a sequence of mappings with one key's value being a flow sequence</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+  <span class="l l-Scalar l-Scalar-Plain">cpus</span><span class="p p-Indicator">:</span>      <span class="p p-Indicator">[</span> <span class="nv">x86</span><span class="p p-Indicator">,</span> <span class="nv">x86_64</span> <span class="p p-Indicator">]</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Bob</span>
+  <span class="l l-Scalar l-Scalar-Plain">cpus</span><span class="p p-Indicator">:</span>      <span class="p p-Indicator">[</span> <span class="nv">x86</span> <span class="p p-Indicator">]</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Dan</span>
+  <span class="l l-Scalar l-Scalar-Plain">cpus</span><span class="p p-Indicator">:</span>      <span class="p p-Indicator">[</span> <span class="nv">PowerPC</span><span class="p p-Indicator">,</span> <span class="nv">x86</span> <span class="p p-Indicator">]</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="introduction-to-yaml-i-o">
+<h2><a class="toc-backref" href="#id2">Introduction to YAML I/O</a><a class="headerlink" href="#introduction-to-yaml-i-o" title="Permalink to this headline">¶</a></h2>
+<p>The use of indenting makes the YAML easy for a human to read and understand,
+but having a program read and write YAML involves a lot of tedious details.
+The YAML I/O library structures and simplifies reading and writing YAML
+documents.</p>
+<p>YAML I/O assumes you have some “native” data structures which you want to be
+able to dump as YAML and recreate from YAML.  The first step is to try
+writing example YAML for your data structures. You may find after looking at
+possible YAML representations that a direct mapping of your data structures
+to YAML is not very readable.  Often the fields are not in the order that
+a human would find readable.  Or the same information is replicated in multiple
+locations, making it hard for a human to write such YAML correctly.</p>
+<p>In relational database theory there is a design step called normalization in
+which you reorganize fields and tables.  The same considerations need to
+go into the design of your YAML encoding.  But, you may not want to change
+your existing native data structures.  Therefore, when writing out YAML
+there may be a normalization step, and when reading YAML there would be a
+corresponding denormalization step.</p>
+<p>YAML I/O uses a non-invasive, traits based design.  YAML I/O defines some
+abstract base templates.  You specialize those templates on your data types.
+For instance, if you have an enumerated type FooBar you could specialize
+ScalarEnumerationTraits on that type and define the enumeration() method:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">ScalarEnumerationTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">ScalarEnumerationTraits</span><span class="o"><</span><span class="n">FooBar</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">enumeration</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">FooBar</span> <span class="o">&</span><span class="n">value</span><span class="p">)</span> <span class="p">{</span>
+  <span class="p">...</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>As with all YAML I/O template specializations, the ScalarEnumerationTraits is used for
+both reading and writing YAML. That is, the mapping between in-memory enum
+values and the YAML string representation is only in one place.
+This assures that the code for writing and parsing of YAML stays in sync.</p>
+<p>To specify a YAML mappings, you define a specialization on
+llvm::yaml::MappingTraits.
+If your native data structure happens to be a struct that is already normalized,
+then the specialization is simple.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Person</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Person</span> <span class="o">&</span><span class="n">info</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span>         <span class="n">info</span><span class="p">.</span><span class="n">name</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapOptional</span><span class="p">(</span><span class="s">"hat-size"</span><span class="p">,</span>     <span class="n">info</span><span class="p">.</span><span class="n">hatSize</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>A YAML sequence is automatically inferred if you data type has begin()/end()
+iterators and a push_back() method.  Therefore any of the STL containers
+(such as std::vector<>) will automatically translate to YAML sequences.</p>
+<p>Once you have defined specializations for your data types, you can
+programmatically use YAML I/O to write a YAML document:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">Output</span><span class="p">;</span>
+
+<span class="n">Person</span> <span class="n">tom</span><span class="p">;</span>
+<span class="n">tom</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="s">"Tom"</span><span class="p">;</span>
+<span class="n">tom</span><span class="p">.</span><span class="n">hatSize</span> <span class="o">=</span> <span class="mi">8</span><span class="p">;</span>
+<span class="n">Person</span> <span class="n">dan</span><span class="p">;</span>
+<span class="n">dan</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="s">"Dan"</span><span class="p">;</span>
+<span class="n">dan</span><span class="p">.</span><span class="n">hatSize</span> <span class="o">=</span> <span class="mi">7</span><span class="p">;</span>
+<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">Person</span><span class="o">></span> <span class="n">persons</span><span class="p">;</span>
+<span class="n">persons</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">tom</span><span class="p">);</span>
+<span class="n">persons</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">dan</span><span class="p">);</span>
+
+<span class="n">Output</span> <span class="nf">yout</span><span class="p">(</span><span class="n">llvm</span><span class="o">::</span><span class="n">outs</span><span class="p">());</span>
+<span class="n">yout</span> <span class="o"><<</span> <span class="n">persons</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>This would write the following:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+  <span class="l l-Scalar l-Scalar-Plain">hat-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">8</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Dan</span>
+  <span class="l l-Scalar l-Scalar-Plain">hat-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">7</span>
+</pre></div>
+</div>
+<p>And you can also read such YAML documents with the following code:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">Input</span><span class="p">;</span>
+
+<span class="k">typedef</span> <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">Person</span><span class="o">></span> <span class="n">PersonList</span><span class="p">;</span>
+<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">PersonList</span><span class="o">></span> <span class="n">docs</span><span class="p">;</span>
+
+<span class="n">Input</span> <span class="nf">yin</span><span class="p">(</span><span class="n">document</span><span class="p">.</span><span class="n">getBuffer</span><span class="p">());</span>
+<span class="n">yin</span> <span class="o">>></span> <span class="n">docs</span><span class="p">;</span>
+
+<span class="k">if</span> <span class="p">(</span> <span class="n">yin</span><span class="p">.</span><span class="n">error</span><span class="p">()</span> <span class="p">)</span>
+  <span class="k">return</span><span class="p">;</span>
+
+<span class="c1">// Process read document</span>
+<span class="k">for</span> <span class="p">(</span> <span class="n">PersonList</span> <span class="o">&</span><span class="nl">pl</span> <span class="p">:</span> <span class="n">docs</span> <span class="p">)</span> <span class="p">{</span>
+  <span class="k">for</span> <span class="p">(</span> <span class="n">Person</span> <span class="o">&</span><span class="nl">person</span> <span class="p">:</span> <span class="n">pl</span> <span class="p">)</span> <span class="p">{</span>
+    <span class="n">cout</span> <span class="o"><<</span> <span class="s">"name="</span> <span class="o"><<</span> <span class="n">person</span><span class="p">.</span><span class="n">name</span><span class="p">;</span>
+  <span class="p">}</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>One other feature of YAML is the ability to define multiple documents in a
+single file.  That is why reading YAML produces a vector of your document type.</p>
+</div>
+<div class="section" id="error-handling">
+<h2><a class="toc-backref" href="#id3">Error Handling</a><a class="headerlink" href="#error-handling" title="Permalink to this headline">¶</a></h2>
+<p>When parsing a YAML document, if the input does not match your schema (as
+expressed in your XxxTraits<> specializations).  YAML I/O
+will print out an error message and your Input object’s error() method will
+return true. For instance the following document:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+  <span class="l l-Scalar l-Scalar-Plain">shoe-size</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">12</span>
+<span class="p p-Indicator">-</span> <span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Dan</span>
+  <span class="l l-Scalar l-Scalar-Plain">hat-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">7</span>
+</pre></div>
+</div>
+<p>Has a key (shoe-size) that is not defined in the schema.  YAML I/O will
+automatically generate this error:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="l l-Scalar l-Scalar-Plain">YAML:2:2</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">error</span><span class="p p-Indicator">:</span> <span class="l l-Scalar l-Scalar-Plain">unknown key 'shoe-size'</span>
+  <span class="l l-Scalar l-Scalar-Plain">shoe-size</span><span class="p p-Indicator">:</span>       <span class="l l-Scalar l-Scalar-Plain">12</span>
+  <span class="l l-Scalar l-Scalar-Plain">^~~~~~~~~</span>
+</pre></div>
+</div>
+<p>Similar errors are produced for other input not conforming to the schema.</p>
+</div>
+<div class="section" id="scalars">
+<h2><a class="toc-backref" href="#id4">Scalars</a><a class="headerlink" href="#scalars" title="Permalink to this headline">¶</a></h2>
+<p>YAML scalars are just strings (i.e. not a sequence or mapping).  The YAML I/O
+library provides support for translating between YAML scalars and specific
+C++ types.</p>
+<div class="section" id="built-in-types">
+<h3><a class="toc-backref" href="#id5">Built-in types</a><a class="headerlink" href="#built-in-types" title="Permalink to this headline">¶</a></h3>
+<p>The following types have built-in support in YAML I/O:</p>
+<ul class="simple">
+<li>bool</li>
+<li>float</li>
+<li>double</li>
+<li>StringRef</li>
+<li>std::string</li>
+<li>int64_t</li>
+<li>int32_t</li>
+<li>int16_t</li>
+<li>int8_t</li>
+<li>uint64_t</li>
+<li>uint32_t</li>
+<li>uint16_t</li>
+<li>uint8_t</li>
+</ul>
+<p>That is, you can use those types in fields of MappingTraits or as element type
+in sequence.  When reading, YAML I/O will validate that the string found
+is convertible to that type and error out if not.</p>
+</div>
+<div class="section" id="unique-types">
+<h3><a class="toc-backref" href="#id6">Unique types</a><a class="headerlink" href="#unique-types" title="Permalink to this headline">¶</a></h3>
+<p>Given that YAML I/O is trait based, the selection of how to convert your data
+to YAML is based on the type of your data.  But in C++ type matching, typedefs
+do not generate unique type names.  That means if you have two typedefs of
+unsigned int, to YAML I/O both types look exactly like unsigned int.  To
+facilitate make unique type names, YAML I/O provides a macro which is used
+like a typedef on built-in types, but expands to create a class with conversion
+operators to and from the base type.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">LLVM_YAML_STRONG_TYPEDEF</span><span class="p">(</span><span class="kt">uint32_t</span><span class="p">,</span> <span class="n">MyFooFlags</span><span class="p">)</span>
+<span class="n">LLVM_YAML_STRONG_TYPEDEF</span><span class="p">(</span><span class="kt">uint32_t</span><span class="p">,</span> <span class="n">MyBarFlags</span><span class="p">)</span>
+</pre></div>
+</div>
+<p>This generates two classes MyFooFlags and MyBarFlags which you can use in your
+native data structures instead of uint32_t. They are implicitly
+converted to and from uint32_t.  The point of creating these unique types
+is that you can now specify traits on them to get different YAML conversions.</p>
+</div>
+<div class="section" id="hex-types">
+<h3><a class="toc-backref" href="#id7">Hex types</a><a class="headerlink" href="#hex-types" title="Permalink to this headline">¶</a></h3>
+<p>An example use of a unique type is that YAML I/O provides fixed sized unsigned
+integers that are written with YAML I/O as hexadecimal instead of the decimal
+format used by the built-in integer types:</p>
+<ul class="simple">
+<li>Hex64</li>
+<li>Hex32</li>
+<li>Hex16</li>
+<li>Hex8</li>
+</ul>
+<p>You can use llvm::yaml::Hex32 instead of uint32_t and the only different will
+be that when YAML I/O writes out that type it will be formatted in hexadecimal.</p>
+</div>
+<div class="section" id="scalarenumerationtraits">
+<h3><a class="toc-backref" href="#id8">ScalarEnumerationTraits</a><a class="headerlink" href="#scalarenumerationtraits" title="Permalink to this headline">¶</a></h3>
+<p>YAML I/O supports translating between in-memory enumerations and a set of string
+values in YAML documents. This is done by specializing ScalarEnumerationTraits<>
+on your enumeration type and define a enumeration() method.
+For instance, suppose you had an enumeration of CPUs and a struct with it as
+a field:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">enum</span> <span class="n">CPUs</span> <span class="p">{</span>
+  <span class="n">cpu_x86_64</span>  <span class="o">=</span> <span class="mi">5</span><span class="p">,</span>
+  <span class="n">cpu_x86</span>     <span class="o">=</span> <span class="mi">7</span><span class="p">,</span>
+  <span class="n">cpu_PowerPC</span> <span class="o">=</span> <span class="mi">8</span>
+<span class="p">};</span>
+
+<span class="k">struct</span> <span class="n">Info</span> <span class="p">{</span>
+  <span class="n">CPUs</span>      <span class="n">cpu</span><span class="p">;</span>
+  <span class="kt">uint32_t</span>  <span class="n">flags</span><span class="p">;</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>To support reading and writing of this enumeration, you can define a
+ScalarEnumerationTraits specialization on CPUs, which can then be used
+as a field type:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">ScalarEnumerationTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">ScalarEnumerationTraits</span><span class="o"><</span><span class="n">CPUs</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">enumeration</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">CPUs</span> <span class="o">&</span><span class="n">value</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">enumCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"x86_64"</span><span class="p">,</span>  <span class="n">cpu_x86_64</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">enumCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"x86"</span><span class="p">,</span>     <span class="n">cpu_x86</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">enumCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"PowerPC"</span><span class="p">,</span> <span class="n">cpu_PowerPC</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Info</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Info</span> <span class="o">&</span><span class="n">info</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"cpu"</span><span class="p">,</span>       <span class="n">info</span><span class="p">.</span><span class="n">cpu</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapOptional</span><span class="p">(</span><span class="s">"flags"</span><span class="p">,</span>     <span class="n">info</span><span class="p">.</span><span class="n">flags</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>When reading YAML, if the string found does not match any of the strings
+specified by enumCase() methods, an error is automatically generated.
+When writing YAML, if the value being written does not match any of the values
+specified by the enumCase() methods, a runtime assertion is triggered.</p>
+</div>
+<div class="section" id="bitvalue">
+<h3><a class="toc-backref" href="#id9">BitValue</a><a class="headerlink" href="#bitvalue" title="Permalink to this headline">¶</a></h3>
+<p>Another common data structure in C++ is a field where each bit has a unique
+meaning.  This is often used in a “flags” field.  YAML I/O has support for
+converting such fields to a flow sequence.   For instance suppose you
+had the following bit flags defined:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">enum</span> <span class="p">{</span>
+  <span class="n">flagsPointy</span> <span class="o">=</span> <span class="mi">1</span>
+  <span class="n">flagsHollow</span> <span class="o">=</span> <span class="mi">2</span>
+  <span class="n">flagsFlat</span>   <span class="o">=</span> <span class="mi">4</span>
+  <span class="n">flagsRound</span>  <span class="o">=</span> <span class="mi">8</span>
+<span class="p">};</span>
+
+<span class="n">LLVM_YAML_STRONG_TYPEDEF</span><span class="p">(</span><span class="kt">uint32_t</span><span class="p">,</span> <span class="n">MyFlags</span><span class="p">)</span>
+</pre></div>
+</div>
+<p>To support reading and writing of MyFlags, you specialize ScalarBitSetTraits<>
+on MyFlags and provide the bit values and their names.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">ScalarBitSetTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">ScalarBitSetTraits</span><span class="o"><</span><span class="n">MyFlags</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">bitset</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MyFlags</span> <span class="o">&</span><span class="n">value</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"hollow"</span><span class="p">,</span>  <span class="n">flagHollow</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"flat"</span><span class="p">,</span>    <span class="n">flagFlat</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"round"</span><span class="p">,</span>   <span class="n">flagRound</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"pointy"</span><span class="p">,</span>  <span class="n">flagPointy</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+
+<span class="k">struct</span> <span class="n">Info</span> <span class="p">{</span>
+  <span class="n">StringRef</span>   <span class="n">name</span><span class="p">;</span>
+  <span class="n">MyFlags</span>     <span class="n">flags</span><span class="p">;</span>
+<span class="p">};</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Info</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Info</span><span class="o">&</span> <span class="n">info</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span>  <span class="n">info</span><span class="p">.</span><span class="n">name</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"flags"</span><span class="p">,</span> <span class="n">info</span><span class="p">.</span><span class="n">flags</span><span class="p">);</span>
+   <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>With the above, YAML I/O (when writing) will test mask each value in the
+bitset trait against the flags field, and each that matches will
+cause the corresponding string to be added to the flow sequence.  The opposite
+is done when reading and any unknown string values will result in a error. With
+the above schema, a same valid YAML document is:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>    <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+<span class="l l-Scalar l-Scalar-Plain">flags</span><span class="p p-Indicator">:</span>   <span class="p p-Indicator">[</span> <span class="nv">pointy</span><span class="p p-Indicator">,</span> <span class="nv">flat</span> <span class="p p-Indicator">]</span>
+</pre></div>
+</div>
+<p>Sometimes a “flags” field might contains an enumeration part
+defined by a bit-mask.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">enum</span> <span class="p">{</span>
+  <span class="n">flagsFeatureA</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>
+  <span class="n">flagsFeatureB</span> <span class="o">=</span> <span class="mi">2</span><span class="p">,</span>
+  <span class="n">flagsFeatureC</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span>
+
+  <span class="n">flagsCPUMask</span> <span class="o">=</span> <span class="mi">24</span><span class="p">,</span>
+
+  <span class="n">flagsCPU1</span> <span class="o">=</span> <span class="mi">8</span><span class="p">,</span>
+  <span class="n">flagsCPU2</span> <span class="o">=</span> <span class="mi">16</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>To support reading and writing such fields, you need to use the maskedBitSet()
+method and provide the bit values, their names and the enumeration mask.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">ScalarBitSetTraits</span><span class="o"><</span><span class="n">MyFlags</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">bitset</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MyFlags</span> <span class="o">&</span><span class="n">value</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"featureA"</span><span class="p">,</span>  <span class="n">flagsFeatureA</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"featureB"</span><span class="p">,</span>  <span class="n">flagsFeatureB</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">bitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"featureC"</span><span class="p">,</span>  <span class="n">flagsFeatureC</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">maskedBitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"CPU1"</span><span class="p">,</span>  <span class="n">flagsCPU1</span><span class="p">,</span> <span class="n">flagsCPUMask</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">maskedBitSetCase</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s">"CPU2"</span><span class="p">,</span>  <span class="n">flagsCPU2</span><span class="p">,</span> <span class="n">flagsCPUMask</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>YAML I/O (when writing) will apply the enumeration mask to the flags field,
+and compare the result and values from the bitset. As in case of a regular
+bitset, each that matches will cause the corresponding string to be added
+to the flow sequence.</p>
+</div>
+<div class="section" id="custom-scalar">
+<h3><a class="toc-backref" href="#id10">Custom Scalar</a><a class="headerlink" href="#custom-scalar" title="Permalink to this headline">¶</a></h3>
+<p>Sometimes for readability a scalar needs to be formatted in a custom way. For
+instance your internal data structure may use a integer for time (seconds since
+some epoch), but in YAML it would be much nicer to express that integer in
+some time format (e.g. 4-May-2012 10:30pm).  YAML I/O has a way to support
+custom formatting and parsing of scalar types by specializing ScalarTraits<> on
+your data type.  When writing, YAML I/O will provide the native type and
+your specialization must create a temporary llvm::StringRef.  When reading,
+YAML I/O will provide an llvm::StringRef of scalar and your specialization
+must convert that to your native data type.  An outline of a custom scalar type
+looks like:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">ScalarTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">ScalarTraits</span><span class="o"><</span><span class="n">MyCustomType</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">output</span><span class="p">(</span><span class="k">const</span> <span class="n">MyCustomType</span> <span class="o">&</span><span class="n">value</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span><span class="p">,</span>
+                     <span class="n">llvm</span><span class="o">::</span><span class="n">raw_ostream</span> <span class="o">&</span><span class="n">out</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">out</span> <span class="o"><<</span> <span class="n">value</span><span class="p">;</span>  <span class="c1">// do custom formatting here</span>
+  <span class="p">}</span>
+  <span class="k">static</span> <span class="n">StringRef</span> <span class="n">input</span><span class="p">(</span><span class="n">StringRef</span> <span class="n">scalar</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span><span class="p">,</span> <span class="n">MyCustomType</span> <span class="o">&</span><span class="n">value</span><span class="p">)</span> <span class="p">{</span>
+    <span class="c1">// do custom parsing here.  Return the empty string on success,</span>
+    <span class="c1">// or an error message on failure.</span>
+    <span class="k">return</span> <span class="n">StringRef</span><span class="p">();</span>
+  <span class="p">}</span>
+  <span class="c1">// Determine if this scalar needs quotes.</span>
+  <span class="k">static</span> <span class="kt">bool</span> <span class="n">mustQuote</span><span class="p">(</span><span class="n">StringRef</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">true</span><span class="p">;</span> <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="block-scalars">
+<h3><a class="toc-backref" href="#id11">Block Scalars</a><a class="headerlink" href="#block-scalars" title="Permalink to this headline">¶</a></h3>
+<p>YAML block scalars are string literals that are represented in YAML using the
+literal block notation, just like the example shown below:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="l l-Scalar l-Scalar-Plain">text</span><span class="p p-Indicator">:</span> <span class="p p-Indicator">|</span>
+  <span class="no">First line</span>
+  <span class="no">Second line</span>
+</pre></div>
+</div>
+<p>The YAML I/O library provides support for translating between YAML block scalars
+and specific C++ types by allowing you to specialize BlockScalarTraits<> on
+your data type. The library doesn’t provide any built-in support for block
+scalar I/O for types like std::string and llvm::StringRef as they are already
+supported by YAML I/O and use the ordinary scalar notation by default.</p>
+<p>BlockScalarTraits specializations are very similar to the
+ScalarTraits specialization - YAML I/O will provide the native type and your
+specialization must create a temporary llvm::StringRef when writing, and
+it will also provide an llvm::StringRef that has the value of that block scalar
+and your specialization must convert that to your native data type when reading.
+An example of a custom type with an appropriate specialization of
+BlockScalarTraits is shown below:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">BlockScalarTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">struct</span> <span class="n">MyStringType</span> <span class="p">{</span>
+  <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">Str</span><span class="p">;</span>
+<span class="p">};</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">BlockScalarTraits</span><span class="o"><</span><span class="n">MyStringType</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">output</span><span class="p">(</span><span class="k">const</span> <span class="n">MyStringType</span> <span class="o">&</span><span class="n">Value</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">Ctxt</span><span class="p">,</span>
+                     <span class="n">llvm</span><span class="o">::</span><span class="n">raw_ostream</span> <span class="o">&</span><span class="n">OS</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">OS</span> <span class="o"><<</span> <span class="n">Value</span><span class="p">.</span><span class="n">Str</span><span class="p">;</span>
+  <span class="p">}</span>
+
+  <span class="k">static</span> <span class="n">StringRef</span> <span class="n">input</span><span class="p">(</span><span class="n">StringRef</span> <span class="n">Scalar</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">Ctxt</span><span class="p">,</span>
+                         <span class="n">MyStringType</span> <span class="o">&</span><span class="n">Value</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">Value</span><span class="p">.</span><span class="n">Str</span> <span class="o">=</span> <span class="n">Scalar</span><span class="p">.</span><span class="n">str</span><span class="p">();</span>
+    <span class="k">return</span> <span class="nf">StringRef</span><span class="p">();</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="mappings">
+<h2><a class="toc-backref" href="#id12">Mappings</a><a class="headerlink" href="#mappings" title="Permalink to this headline">¶</a></h2>
+<p>To be translated to or from a YAML mapping for your type T you must specialize
+llvm::yaml::MappingTraits on T and implement the “void mapping(IO &io, T&)”
+method. If your native data structures use pointers to a class everywhere,
+you can specialize on the class pointer.  Examples:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="c1">// Example of struct Foo which is used by value</span>
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Foo</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Foo</span> <span class="o">&</span><span class="n">foo</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapOptional</span><span class="p">(</span><span class="s">"size"</span><span class="p">,</span>      <span class="n">foo</span><span class="p">.</span><span class="n">size</span><span class="p">);</span>
+  <span class="p">...</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+
+<span class="c1">// Example of struct Bar which is natively always a pointer</span>
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Bar</span><span class="o">*></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Bar</span> <span class="o">*&</span><span class="n">bar</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapOptional</span><span class="p">(</span><span class="s">"size"</span><span class="p">,</span>    <span class="n">bar</span><span class="o">-></span><span class="n">size</span><span class="p">);</span>
+  <span class="p">...</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<div class="section" id="no-normalization">
+<h3><a class="toc-backref" href="#id13">No Normalization</a><a class="headerlink" href="#no-normalization" title="Permalink to this headline">¶</a></h3>
+<p>The mapping() method is responsible, if needed, for normalizing and
+denormalizing. In a simple case where the native data structure requires no
+normalization, the mapping method just uses mapOptional() or mapRequired() to
+bind the struct’s fields to YAML key names.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Person</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Person</span> <span class="o">&</span><span class="n">info</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span>         <span class="n">info</span><span class="p">.</span><span class="n">name</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapOptional</span><span class="p">(</span><span class="s">"hat-size"</span><span class="p">,</span>     <span class="n">info</span><span class="p">.</span><span class="n">hatSize</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="normalization">
+<h3><a class="toc-backref" href="#id14">Normalization</a><a class="headerlink" href="#normalization" title="Permalink to this headline">¶</a></h3>
+<p>When [de]normalization is required, the mapping() method needs a way to access
+normalized values as fields. To help with this, there is
+a template MappingNormalization<> which you can then use to automatically
+do the normalization and denormalization.  The template is used to create
+a local variable in your mapping() method which contains the normalized keys.</p>
+<p>Suppose you have native data type
+Polar which specifies a position in polar coordinates (distance, angle):</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">struct</span> <span class="n">Polar</span> <span class="p">{</span>
+  <span class="kt">float</span> <span class="n">distance</span><span class="p">;</span>
+  <span class="kt">float</span> <span class="n">angle</span><span class="p">;</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>but you’ve decided the normalized YAML for should be in x,y coordinates. That
+is, you want the yaml to look like:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="l l-Scalar l-Scalar-Plain">x</span><span class="p p-Indicator">:</span>   <span class="l l-Scalar l-Scalar-Plain">10.3</span>
+<span class="l l-Scalar l-Scalar-Plain">y</span><span class="p p-Indicator">:</span>   <span class="l l-Scalar l-Scalar-Plain">-4.7</span>
+</pre></div>
+</div>
+<p>You can support this by defining a MappingTraits that normalizes the polar
+coordinates to x,y coordinates when writing YAML and denormalizes x,y
+coordinates into polar when reading YAML.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Polar</span><span class="o">></span> <span class="p">{</span>
+
+  <span class="k">class</span> <span class="nc">NormalizedPolar</span> <span class="p">{</span>
+  <span class="k">public</span><span class="o">:</span>
+    <span class="n">NormalizedPolar</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">)</span>
+      <span class="o">:</span> <span class="n">x</span><span class="p">(</span><span class="mf">0.0</span><span class="p">),</span> <span class="n">y</span><span class="p">(</span><span class="mf">0.0</span><span class="p">)</span> <span class="p">{</span>
+    <span class="p">}</span>
+    <span class="n">NormalizedPolar</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="p">,</span> <span class="n">Polar</span> <span class="o">&</span><span class="n">polar</span><span class="p">)</span>
+      <span class="o">:</span> <span class="n">x</span><span class="p">(</span><span class="n">polar</span><span class="p">.</span><span class="n">distance</span> <span class="o">*</span> <span class="n">cos</span><span class="p">(</span><span class="n">polar</span><span class="p">.</span><span class="n">angle</span><span class="p">)),</span>
+        <span class="n">y</span><span class="p">(</span><span class="n">polar</span><span class="p">.</span><span class="n">distance</span> <span class="o">*</span> <span class="n">sin</span><span class="p">(</span><span class="n">polar</span><span class="p">.</span><span class="n">angle</span><span class="p">))</span> <span class="p">{</span>
+    <span class="p">}</span>
+    <span class="n">Polar</span> <span class="n">denormalize</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="p">)</span> <span class="p">{</span>
+      <span class="k">return</span> <span class="n">Polar</span><span class="p">(</span><span class="n">sqrt</span><span class="p">(</span><span class="n">x</span><span class="o">*</span><span class="n">x</span><span class="o">+</span><span class="n">y</span><span class="o">*</span><span class="n">y</span><span class="p">),</span> <span class="n">arctan</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">));</span>
+    <span class="p">}</span>
+
+    <span class="kt">float</span>        <span class="n">x</span><span class="p">;</span>
+    <span class="kt">float</span>        <span class="n">y</span><span class="p">;</span>
+  <span class="p">};</span>
+
+  <span class="k">static</span> <span class="kt">void</span> <span class="nf">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Polar</span> <span class="o">&</span><span class="n">polar</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">MappingNormalization</span><span class="o"><</span><span class="n">NormalizedPolar</span><span class="p">,</span> <span class="n">Polar</span><span class="o">></span> <span class="n">keys</span><span class="p">(</span><span class="n">io</span><span class="p">,</span> <span class="n">polar</span><span class="p">);</span>
+
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span>    <span class="n">keys</span><span class="o">-></span><span class="n">x</span><span class="p">);</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"y"</span><span class="p">,</span>    <span class="n">keys</span><span class="o">-></span><span class="n">y</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>When writing YAML, the local variable “keys” will be a stack allocated
+instance of NormalizedPolar, constructed from the supplied polar object which
+initializes it x and y fields.  The mapRequired() methods then write out the x
+and y values as key/value pairs.</p>
+<p>When reading YAML, the local variable “keys” will be a stack allocated instance
+of NormalizedPolar, constructed by the empty constructor.  The mapRequired
+methods will find the matching key in the YAML document and fill in the x and y
+fields of the NormalizedPolar object keys. At the end of the mapping() method
+when the local keys variable goes out of scope, the denormalize() method will
+automatically be called to convert the read values back to polar coordinates,
+and then assigned back to the second parameter to mapping().</p>
+<p>In some cases, the normalized class may be a subclass of the native type and
+could be returned by the denormalize() method, except that the temporary
+normalized instance is stack allocated.  In these cases, the utility template
+MappingNormalizationHeap<> can be used instead.  It just like
+MappingNormalization<> except that it heap allocates the normalized object
+when reading YAML.  It never destroys the normalized object.  The denormalize()
+method can this return “this”.</p>
+</div>
+<div class="section" id="default-values">
+<h3><a class="toc-backref" href="#id15">Default values</a><a class="headerlink" href="#default-values" title="Permalink to this headline">¶</a></h3>
+<p>Within a mapping() method, calls to io.mapRequired() mean that that key is
+required to exist when parsing YAML documents, otherwise YAML I/O will issue an
+error.</p>
+<p>On the other hand, keys registered with io.mapOptional() are allowed to not
+exist in the YAML document being read.  So what value is put in the field
+for those optional keys?
+There are two steps to how those optional fields are filled in. First, the
+second parameter to the mapping() method is a reference to a native class.  That
+native class must have a default constructor.  Whatever value the default
+constructor initially sets for an optional field will be that field’s value.
+Second, the mapOptional() method has an optional third parameter.  If provided
+it is the value that mapOptional() should set that field to if the YAML document
+does not have that key.</p>
+<p>There is one important difference between those two ways (default constructor
+and third parameter to mapOptional). When YAML I/O generates a YAML document,
+if the mapOptional() third parameter is used, if the actual value being written
+is the same as (using ==) the default value, then that key/value is not written.</p>
+</div>
+<div class="section" id="order-of-keys">
+<h3><a class="toc-backref" href="#id16">Order of Keys</a><a class="headerlink" href="#order-of-keys" title="Permalink to this headline">¶</a></h3>
+<p>When writing out a YAML document, the keys are written in the order that the
+calls to mapRequired()/mapOptional() are made in the mapping() method. This
+gives you a chance to write the fields in an order that a human reader of
+the YAML document would find natural.  This may be different that the order
+of the fields in the native class.</p>
+<p>When reading in a YAML document, the keys in the document can be in any order,
+but they are processed in the order that the calls to mapRequired()/mapOptional()
+are made in the mapping() method.  That enables some interesting
+functionality.  For instance, if the first field bound is the cpu and the second
+field bound is flags, and the flags are cpu specific, you can programmatically
+switch how the flags are converted to and from YAML based on the cpu.
+This works for both reading and writing. For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">struct</span> <span class="n">Info</span> <span class="p">{</span>
+  <span class="n">CPUs</span>        <span class="n">cpu</span><span class="p">;</span>
+  <span class="kt">uint32_t</span>    <span class="n">flags</span><span class="p">;</span>
+<span class="p">};</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Info</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Info</span> <span class="o">&</span><span class="n">info</span><span class="p">)</span> <span class="p">{</span>
+    <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"cpu"</span><span class="p">,</span>       <span class="n">info</span><span class="p">.</span><span class="n">cpu</span><span class="p">);</span>
+    <span class="c1">// flags must come after cpu for this to work when reading yaml</span>
+    <span class="k">if</span> <span class="p">(</span> <span class="n">info</span><span class="p">.</span><span class="n">cpu</span> <span class="o">==</span> <span class="n">cpu_x86_64</span> <span class="p">)</span>
+      <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"flags"</span><span class="p">,</span>  <span class="o">*</span><span class="p">(</span><span class="n">My86_64Flags</span><span class="o">*</span><span class="p">)</span><span class="n">info</span><span class="p">.</span><span class="n">flags</span><span class="p">);</span>
+    <span class="k">else</span>
+      <span class="n">io</span><span class="p">.</span><span class="n">mapRequired</span><span class="p">(</span><span class="s">"flags"</span><span class="p">,</span>  <span class="o">*</span><span class="p">(</span><span class="n">My86Flags</span><span class="o">*</span><span class="p">)</span><span class="n">info</span><span class="p">.</span><span class="n">flags</span><span class="p">);</span>
+ <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="tags">
+<h3><a class="toc-backref" href="#id17">Tags</a><a class="headerlink" href="#tags" title="Permalink to this headline">¶</a></h3>
+<p>The YAML syntax supports tags as a way to specify the type of a node before
+it is parsed. This allows dynamic types of nodes.  But the YAML I/O model uses
+static typing, so there are limits to how you can use tags with the YAML I/O
+model. Recently, we added support to YAML I/O for checking/setting the optional
+tag on a map. Using this functionality it is even possbile to support different
+mappings, as long as they are convertible.</p>
+<p>To check a tag, inside your mapping() method you can use io.mapTag() to specify
+what the tag should be.  This will also add that tag when writing yaml.</p>
+</div>
+<div class="section" id="validation">
+<h3><a class="toc-backref" href="#id18">Validation</a><a class="headerlink" href="#validation" title="Permalink to this headline">¶</a></h3>
+<p>Sometimes in a yaml map, each key/value pair is valid, but the combination is
+not.  This is similar to something having no syntax errors, but still having
+semantic errors.  To support semantic level checking, YAML I/O allows
+an optional <code class="docutils literal"><span class="pre">validate()</span></code> method in a MappingTraits template specialization.</p>
+<p>When parsing yaml, the <code class="docutils literal"><span class="pre">validate()</span></code> method is call <em>after</em> all key/values in
+the map have been processed. Any error message returned by the <code class="docutils literal"><span class="pre">validate()</span></code>
+method during input will be printed just a like a syntax error would be printed.
+When writing yaml, the <code class="docutils literal"><span class="pre">validate()</span></code> method is called <em>before</em> the yaml
+key/values  are written.  Any error during output will trigger an <code class="docutils literal"><span class="pre">assert()</span></code>
+because it is a programming error to have invalid struct values.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">struct</span> <span class="n">Stuff</span> <span class="p">{</span>
+  <span class="p">...</span>
+<span class="p">};</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Stuff</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Stuff</span> <span class="o">&</span><span class="n">stuff</span><span class="p">)</span> <span class="p">{</span>
+  <span class="p">...</span>
+  <span class="p">}</span>
+  <span class="k">static</span> <span class="n">StringRef</span> <span class="n">validate</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Stuff</span> <span class="o">&</span><span class="n">stuff</span><span class="p">)</span> <span class="p">{</span>
+    <span class="c1">// Look at all fields in 'stuff' and if there</span>
+    <span class="c1">// are any bad values return a string describing</span>
+    <span class="c1">// the error.  Otherwise return an empty string.</span>
+    <span class="k">return</span> <span class="n">StringRef</span><span class="p">();</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="flow-mapping">
+<h3><a class="toc-backref" href="#id19">Flow Mapping</a><a class="headerlink" href="#flow-mapping" title="Permalink to this headline">¶</a></h3>
+<p>A YAML “flow mapping” is a mapping that uses the inline notation
+(e.g { x: 1, y: 0 } ) when written to YAML. To specify that a type should be
+written in YAML using flow mapping, your MappingTraits specialization should
+add “static const bool flow = true;”. For instance:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">MappingTraits</span><span class="p">;</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">IO</span><span class="p">;</span>
+
+<span class="k">struct</span> <span class="n">Stuff</span> <span class="p">{</span>
+  <span class="p">...</span>
+<span class="p">};</span>
+
+<span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">MappingTraits</span><span class="o"><</span><span class="n">Stuff</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">void</span> <span class="n">mapping</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">Stuff</span> <span class="o">&</span><span class="n">stuff</span><span class="p">)</span> <span class="p">{</span>
+    <span class="p">...</span>
+  <span class="p">}</span>
+
+  <span class="k">static</span> <span class="k">const</span> <span class="kt">bool</span> <span class="n">flow</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>Flow mappings are subject to line wrapping according to the Output object
+configuration.</p>
+</div>
+</div>
+<div class="section" id="sequence">
+<h2><a class="toc-backref" href="#id20">Sequence</a><a class="headerlink" href="#sequence" title="Permalink to this headline">¶</a></h2>
+<p>To be translated to or from a YAML sequence for your type T you must specialize
+llvm::yaml::SequenceTraits on T and implement two methods:
+<code class="docutils literal"><span class="pre">size_t</span> <span class="pre">size(IO</span> <span class="pre">&io,</span> <span class="pre">T&)</span></code> and
+<code class="docutils literal"><span class="pre">T::value_type&</span> <span class="pre">element(IO</span> <span class="pre">&io,</span> <span class="pre">T&,</span> <span class="pre">size_t</span> <span class="pre">indx)</span></code>.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">SequenceTraits</span><span class="o"><</span><span class="n">MySeq</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">size_t</span> <span class="n">size</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MySeq</span> <span class="o">&</span><span class="n">list</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
+  <span class="k">static</span> <span class="n">MySeqEl</span> <span class="o">&</span><span class="n">element</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MySeq</span> <span class="o">&</span><span class="n">list</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">index</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>The size() method returns how many elements are currently in your sequence.
+The element() method returns a reference to the i’th element in the sequence.
+When parsing YAML, the element() method may be called with an index one bigger
+than the current size.  Your element() method should allocate space for one
+more element (using default constructor if element is a C++ object) and returns
+a reference to that new allocated space.</p>
+<div class="section" id="flow-sequence">
+<h3><a class="toc-backref" href="#id21">Flow Sequence</a><a class="headerlink" href="#flow-sequence" title="Permalink to this headline">¶</a></h3>
+<p>A YAML “flow sequence” is a sequence that when written to YAML it uses the
+inline notation (e.g [ foo, bar ] ).  To specify that a sequence type should
+be written in YAML as a flow sequence, your SequenceTraits specialization should
+add “static const bool flow = true;”.  For instance:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">SequenceTraits</span><span class="o"><</span><span class="n">MyList</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">size_t</span> <span class="n">size</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MyList</span> <span class="o">&</span><span class="n">list</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
+  <span class="k">static</span> <span class="n">MyListEl</span> <span class="o">&</span><span class="n">element</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MyList</span> <span class="o">&</span><span class="n">list</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">index</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
+
+  <span class="c1">// The existence of this member causes YAML I/O to use a flow sequence</span>
+  <span class="k">static</span> <span class="k">const</span> <span class="kt">bool</span> <span class="n">flow</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+<p>With the above, if you used MyList as the data type in your native data
+structures, then when converted to YAML, a flow sequence of integers
+will be used (e.g. [ 10, -3, 4 ]).</p>
+<p>Flow sequences are subject to line wrapping according to the Output object
+configuration.</p>
+</div>
+<div class="section" id="utility-macros">
+<h3><a class="toc-backref" href="#id22">Utility Macros</a><a class="headerlink" href="#utility-macros" title="Permalink to this headline">¶</a></h3>
+<p>Since a common source of sequences is std::vector<>, YAML I/O provides macros:
+LLVM_YAML_IS_SEQUENCE_VECTOR() and LLVM_YAML_IS_FLOW_SEQUENCE_VECTOR() which
+can be used to easily specify SequenceTraits<> on a std::vector type.  YAML
+I/O does not partial specialize SequenceTraits on std::vector<> because that
+would force all vectors to be sequences.  An example use of the macros:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">MyType1</span><span class="o">></span><span class="p">;</span>
+<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">MyType2</span><span class="o">></span><span class="p">;</span>
+<span class="n">LLVM_YAML_IS_SEQUENCE_VECTOR</span><span class="p">(</span><span class="n">MyType1</span><span class="p">)</span>
+<span class="n">LLVM_YAML_IS_FLOW_SEQUENCE_VECTOR</span><span class="p">(</span><span class="n">MyType2</span><span class="p">)</span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="document-list">
+<h2><a class="toc-backref" href="#id23">Document List</a><a class="headerlink" href="#document-list" title="Permalink to this headline">¶</a></h2>
+<p>YAML allows you to define multiple “documents” in a single YAML file.  Each
+new document starts with a left aligned “—” token.  The end of all documents
+is denoted with a left aligned “…” token.  Many users of YAML will never
+have need for multiple documents.  The top level node in their YAML schema
+will be a mapping or sequence. For those cases, the following is not needed.
+But for cases where you do want multiple documents, you can specify a
+trait for you document list type.  The trait has the same methods as
+SequenceTraits but is named DocumentListTraits.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">template</span> <span class="o"><></span>
+<span class="k">struct</span> <span class="n">DocumentListTraits</span><span class="o"><</span><span class="n">MyDocList</span><span class="o">></span> <span class="p">{</span>
+  <span class="k">static</span> <span class="kt">size_t</span> <span class="n">size</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MyDocList</span> <span class="o">&</span><span class="n">list</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
+  <span class="k">static</span> <span class="n">MyDocType</span> <span class="n">element</span><span class="p">(</span><span class="n">IO</span> <span class="o">&</span><span class="n">io</span><span class="p">,</span> <span class="n">MyDocList</span> <span class="o">&</span><span class="n">list</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">index</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
+<span class="p">};</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="user-context-data">
+<h2><a class="toc-backref" href="#id24">User Context Data</a><a class="headerlink" href="#user-context-data" title="Permalink to this headline">¶</a></h2>
+<p>When an llvm::yaml::Input or llvm::yaml::Output object is created their
+constructors take an optional “context” parameter.  This is a pointer to
+whatever state information you might need.</p>
+<p>For instance, in a previous example we showed how the conversion type for a
+flags field could be determined at runtime based on the value of another field
+in the mapping. But what if an inner mapping needs to know some field value
+of an outer mapping?  That is where the “context” parameter comes in. You
+can set values in the context in the outer map’s mapping() method and
+retrieve those values in the inner map’s mapping() method.</p>
+<p>The context value is just a void*.  All your traits which use the context
+and operate on your native data types, need to agree what the context value
+actually is.  It could be a pointer to an object or struct which your various
+traits use to shared context sensitive information.</p>
+</div>
+<div class="section" id="output">
+<h2><a class="toc-backref" href="#id25">Output</a><a class="headerlink" href="#output" title="Permalink to this headline">¶</a></h2>
+<p>The llvm::yaml::Output class is used to generate a YAML document from your
+in-memory data structures, using traits defined on your data types.
+To instantiate an Output object you need an llvm::raw_ostream, an optional
+context pointer and an optional wrapping column:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Output</span> <span class="o">:</span> <span class="k">public</span> <span class="n">IO</span> <span class="p">{</span>
+<span class="k">public</span><span class="o">:</span>
+  <span class="n">Output</span><span class="p">(</span><span class="n">llvm</span><span class="o">::</span><span class="n">raw_ostream</span> <span class="o">&</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">context</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">,</span> <span class="kt">int</span> <span class="n">WrapColumn</span> <span class="o">=</span> <span class="mi">70</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>Once you have an Output object, you can use the C++ stream operator on it
+to write your native data as YAML. One thing to recall is that a YAML file
+can contain multiple “documents”.  If the top level data structure you are
+streaming as YAML is a mapping, scalar, or sequence, then Output assumes you
+are generating one document and wraps the mapping output
+with  “<code class="docutils literal"><span class="pre">---</span></code>” and trailing “<code class="docutils literal"><span class="pre">...</span></code>”.</p>
+<p>The WrapColumn parameter will cause the flow mappings and sequences to
+line-wrap when they go over the supplied column. Pass 0 to completely
+suppress the wrapping.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">Output</span><span class="p">;</span>
+
+<span class="kt">void</span> <span class="nf">dumpMyMapDoc</span><span class="p">(</span><span class="k">const</span> <span class="n">MyMapType</span> <span class="o">&</span><span class="n">info</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">Output</span> <span class="n">yout</span><span class="p">(</span><span class="n">llvm</span><span class="o">::</span><span class="n">outs</span><span class="p">());</span>
+  <span class="n">yout</span> <span class="o"><<</span> <span class="n">info</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The above could produce output like:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="nn">---</span>
+<span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+<span class="l l-Scalar l-Scalar-Plain">hat-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">7</span>
+<span class="nn">...</span>
+</pre></div>
+</div>
+<p>On the other hand, if the top level data structure you are streaming as YAML
+has a DocumentListTraits specialization, then Output walks through each element
+of your DocumentList and generates a “—” before the start of each element
+and ends with a “…”.</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">Output</span><span class="p">;</span>
+
+<span class="kt">void</span> <span class="nf">dumpMyMapDoc</span><span class="p">(</span><span class="k">const</span> <span class="n">MyDocListType</span> <span class="o">&</span><span class="n">docList</span><span class="p">)</span> <span class="p">{</span>
+  <span class="n">Output</span> <span class="n">yout</span><span class="p">(</span><span class="n">llvm</span><span class="o">::</span><span class="n">outs</span><span class="p">());</span>
+  <span class="n">yout</span> <span class="o"><<</span> <span class="n">docList</span><span class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+</div>
+<p>The above could produce output like:</p>
+<div class="highlight-yaml"><div class="highlight"><pre><span></span><span class="nn">---</span>
+<span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+<span class="l l-Scalar l-Scalar-Plain">hat-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">7</span>
+<span class="nn">---</span>
+<span class="l l-Scalar l-Scalar-Plain">name</span><span class="p p-Indicator">:</span>      <span class="l l-Scalar l-Scalar-Plain">Tom</span>
+<span class="l l-Scalar l-Scalar-Plain">shoe-size</span><span class="p p-Indicator">:</span>  <span class="l l-Scalar l-Scalar-Plain">11</span>
+<span class="nn">...</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="input">
+<h2><a class="toc-backref" href="#id26">Input</a><a class="headerlink" href="#input" title="Permalink to this headline">¶</a></h2>
+<p>The llvm::yaml::Input class is used to parse YAML document(s) into your native
+data structures. To instantiate an Input
+object you need a StringRef to the entire YAML file, and optionally a context
+pointer:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Input</span> <span class="o">:</span> <span class="k">public</span> <span class="n">IO</span> <span class="p">{</span>
+<span class="k">public</span><span class="o">:</span>
+  <span class="n">Input</span><span class="p">(</span><span class="n">StringRef</span> <span class="n">inputContent</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">context</span><span class="o">=</span><span class="nb">NULL</span><span class="p">);</span>
+</pre></div>
+</div>
+<p>Once you have an Input object, you can use the C++ stream operator to read
+the document(s).  If you expect there might be multiple YAML documents in
+one file, you’ll need to specialize DocumentListTraits on a list of your
+document type and stream in that document list type.  Otherwise you can
+just stream in the document type.  Also, you can check if there was
+any syntax errors in the YAML be calling the error() method on the Input
+object.  For example:</p>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// Reading a single document</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">Input</span><span class="p">;</span>
+
+<span class="n">Input</span> <span class="nf">yin</span><span class="p">(</span><span class="n">mb</span><span class="p">.</span><span class="n">getBuffer</span><span class="p">());</span>
+
+<span class="c1">// Parse the YAML file</span>
+<span class="n">MyDocType</span> <span class="n">theDoc</span><span class="p">;</span>
+<span class="n">yin</span> <span class="o">>></span> <span class="n">theDoc</span><span class="p">;</span>
+
+<span class="c1">// Check for error</span>
+<span class="k">if</span> <span class="p">(</span> <span class="n">yin</span><span class="p">.</span><span class="n">error</span><span class="p">()</span> <span class="p">)</span>
+  <span class="k">return</span><span class="p">;</span>
+</pre></div>
+</div>
+<div class="highlight-c++"><div class="highlight"><pre><span></span><span class="c1">// Reading multiple documents in one file</span>
+<span class="k">using</span> <span class="n">llvm</span><span class="o">::</span><span class="n">yaml</span><span class="o">::</span><span class="n">Input</span><span class="p">;</span>
+
+<span class="n">LLVM_YAML_IS_DOCUMENT_LIST_VECTOR</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">MyDocType</span><span class="o">></span><span class="p">)</span>
+
+<span class="n">Input</span> <span class="n">yin</span><span class="p">(</span><span class="n">mb</span><span class="p">.</span><span class="n">getBuffer</span><span class="p">());</span>
+
+<span class="c1">// Parse the YAML file</span>
+<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">MyDocType</span><span class="o">></span> <span class="n">theDocList</span><span class="p">;</span>
+<span class="n">yin</span> <span class="o">>></span> <span class="n">theDocList</span><span class="p">;</span>
+
+<span class="c1">// Check for error</span>
+<span class="k">if</span> <span class="p">(</span> <span class="n">yin</span><span class="p">.</span><span class="n">error</span><span class="p">()</span> <span class="p">)</span>
+  <span class="k">return</span><span class="p">;</span>
+</pre></div>
+</div>
+</div>
+</div>
+
+
+          </div>
+      </div>
+      <div class="clearer"></div>
+    </div>
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             >index</a></li>
+        <li class="right" >
+          <a href="GetElementPtr.html" title="The Often Misunderstood GEP Instruction"
+             >next</a> |</li>
+        <li class="right" >
+          <a href="Passes.html" title="LLVM’s Analysis and Transform Passes"
+             >previous</a> |</li>
+  <li><a href="http://llvm.org/">LLVM Home</a> | </li>
+  <li><a href="index.html">Documentation</a>»</li>
+ 
+      </ul>
+    </div>
+    <div class="footer" role="contentinfo">
+        © Copyright 2003-2017, LLVM Project.
+      Last updated on 2017-12-20.
+      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.5.
+    </div>
+  </body>
+</html>
\ No newline at end of file

Added: www-releases/trunk/5.0.1/docs/_images/ARM-BE-bitcastfail.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/ARM-BE-bitcastfail.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/ARM-BE-bitcastfail.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/ARM-BE-bitcastsuccess.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/ARM-BE-bitcastsuccess.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/ARM-BE-bitcastsuccess.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/ARM-BE-ld1.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/ARM-BE-ld1.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/ARM-BE-ld1.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/ARM-BE-ldr.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/ARM-BE-ldr.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/ARM-BE-ldr.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/LangImpl05-cfg.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/LangImpl05-cfg.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/LangImpl05-cfg.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/MCJIT-creation.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/MCJIT-creation.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/MCJIT-creation.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/MCJIT-dyld-load.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/MCJIT-dyld-load.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/MCJIT-dyld-load.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/MCJIT-engine-builder.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/MCJIT-engine-builder.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/MCJIT-engine-builder.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/MCJIT-load-object.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/MCJIT-load-object.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/MCJIT-load-object.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/MCJIT-load.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/MCJIT-load.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/MCJIT-load.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/MCJIT-resolve-relocations.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/MCJIT-resolve-relocations.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/MCJIT-resolve-relocations.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/gcc-loops.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/gcc-loops.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/gcc-loops.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_images/linpack-pc.png
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_images/linpack-pc.png?rev=321287&view=auto
==============================================================================
Binary file - no diff available.

Propchange: www-releases/trunk/5.0.1/docs/_images/linpack-pc.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Added: www-releases/trunk/5.0.1/docs/_sources/AMDGPUUsage.rst.txt
URL: http://llvm.org/viewvc/llvm-project/www-releases/trunk/5.0.1/docs/_sources/AMDGPUUsage.rst.txt?rev=321287&view=auto
==============================================================================
--- www-releases/trunk/5.0.1/docs/_sources/AMDGPUUsage.rst.txt (added)
+++ www-releases/trunk/5.0.1/docs/_sources/AMDGPUUsage.rst.txt Thu Dec 21 10:09:53 2017
@@ -0,0 +1,3761 @@
+=============================
+User Guide for AMDGPU Backend
+=============================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+The AMDGPU backend provides ISA code generation for AMD GPUs, starting with the
+R600 family up until the current GCN families. It lives in the
+``lib/Target/AMDGPU`` directory.
+
+LLVM
+====
+
+.. _amdgpu-target-triples:
+
+Target Triples
+--------------
+
+Use the ``clang -target <Architecture>-<Vendor>-<OS>-<Environment>`` option to
+specify the target triple:
+
+  .. table:: AMDGPU Target Triples
+     :name: amdgpu-target-triples-table
+
+     ============ ======== ========= ===========
+     Architecture Vendor   OS        Environment
+     ============ ======== ========= ===========
+     r600         amd      <empty>   <empty>
+     amdgcn       amd      <empty>   <empty>
+     amdgcn       amd      amdhsa    <empty>
+     amdgcn       amd      amdhsa    opencl
+     amdgcn       amd      amdhsa    amdgizcl
+     amdgcn       amd      amdhsa    amdgiz
+     amdgcn       amd      amdhsa    hcc
+     ============ ======== ========= ===========
+
+``r600-amd--``
+  Supports AMD GPUs HD2XXX-HD6XXX for graphics and compute shaders executed on
+  the MESA runtime.
+
+``amdgcn-amd--``
+  Supports AMD GPUs GCN 6 onwards for graphics and compute shaders executed on
+  the MESA runtime.
+
+``amdgcn-amd-amdhsa-``
+  Supports AMD GCN GPUs GFX6 onwards for compute kernels executed on HSA [HSA]_
+  compatible runtimes such as AMD's ROCm [AMD-ROCm]_.
+
+``amdgcn-amd-amdhsa-opencl``
+  Supports AMD GCN GPUs GFX6 onwards for OpenCL compute kernels executed on HSA
+  [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. See
+  :ref:`amdgpu-opencl`.
+
+``amdgcn-amd-amdhsa-amdgizcl``
+  Same as ``amdgcn-amd-amdhsa-opencl`` except a different address space mapping
+  is used (see :ref:`amdgpu-address-spaces`).
+
+``amdgcn-amd-amdhsa-amdgiz``
+  Same as ``amdgcn-amd-amdhsa-`` except a different address space mapping is
+  used (see :ref:`amdgpu-address-spaces`).
+
+``amdgcn-amd-amdhsa-hcc``
+  Supports AMD GCN GPUs GFX6 onwards for AMD HC language compute kernels
+  executed on HSA [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. See
+  :ref:`amdgpu-hcc`.
+
+.. _amdgpu-processors:
+
+Processors
+----------
+
+Use the ``clang -mcpu <Processor>`` option to specify the AMD GPU processor. The
+names from both the *Processor* and *Alternative Processor* can be used.
+
+  .. table:: AMDGPU Processors
+     :name: amdgpu-processors-table
+
+     ========== =========== ============ ===== ======= ==================
+     Processor  Alternative Target       dGPU/ Runtime Example
+                Processor   Triple       APU   Support Products
+                            Architecture
+     ========== =========== ============ ===== ======= ==================
+     **R600** [AMD-R6xx]_
+     --------------------------------------------------------------------
+     r600                   r600         dGPU
+     r630                   r600         dGPU
+     rs880                  r600         dGPU
+     rv670                  r600         dGPU
+     **R700** [AMD-R7xx]_
+     --------------------------------------------------------------------
+     rv710                  r600         dGPU
+     rv730                  r600         dGPU
+     rv770                  r600         dGPU
+     **Evergreen** [AMD-Evergreen]_
+     --------------------------------------------------------------------
+     cedar                  r600         dGPU
+     redwood                r600         dGPU
+     sumo                   r600         dGPU
+     juniper                r600         dGPU
+     cypress                r600         dGPU
+     **Northern Islands** [AMD-Cayman-Trinity]_
+     --------------------------------------------------------------------
+     barts                  r600         dGPU
+     turks                  r600         dGPU
+     caicos                 r600         dGPU
+     cayman                 r600         dGPU
+     **GCN GFX6 (Southern Islands (SI))** [AMD-Souther-Islands]_
+     --------------------------------------------------------------------
+     gfx600     - SI        amdgcn       dGPU
+                - tahiti
+     gfx601     - pitcairn  amdgcn       dGPU
+                - verde
+                - oland
+                - hainan
+     **GCN GFX7 (Sea Islands (CI))** [AMD-Sea-Islands]_
+     --------------------------------------------------------------------
+     gfx700     - bonaire   amdgcn       dGPU          - Radeon HD 7790
+                                                       - Radeon HD 8770
+                                                       - R7 260
+                                                       - R7 260X
+     \          - kaveri    amdgcn       APU           - A6-7000
+                                                       - A6 Pro-7050B
+                                                       - A8-7100
+                                                       - A8 Pro-7150B
+                                                       - A10-7300
+                                                       - A10 Pro-7350B
+                                                       - FX-7500
+                                                       - A8-7200P
+                                                       - A10-7400P
+                                                       - FX-7600P
+     gfx701     - hawaii    amdgcn       dGPU  ROCm    - FirePro W8100
+                                                       - FirePro W9100
+                                                       - FirePro S9150
+                                                       - FirePro S9170
+     gfx702                              dGPU  ROCm    - Radeon R9 290
+                                                       - Radeon R9 290x
+                                                       - Radeon R390
+                                                       - Radeon R390x
+     gfx703     - kabini    amdgcn       APU           - E1-2100
+                - mullins                              - E1-2200
+                                                       - E1-2500
+                                                       - E2-3000
+                                                       - E2-3800
+                                                       - A4-5000
+                                                       - A4-5100
+                                                       - A6-5200
+                                                       - A4 Pro-3340B
+     **GCN GFX8 (Volcanic Islands (VI))** [AMD-Volcanic-Islands]_
+     --------------------------------------------------------------------
+     gfx800     - iceland   amdgcn       dGPU          - FirePro S7150
+                                                       - FirePro S7100
+                                                       - FirePro W7100
+                                                       - Radeon R285
+                                                       - Radeon R9 380
+                                                       - Radeon R9 385
+                                                       - Mobile FirePro
+                                                         M7170
+     gfx801     - carrizo   amdgcn       APU           - A6-8500P
+                                                       - Pro A6-8500B
+                                                       - A8-8600P
+                                                       - Pro A8-8600B
+                                                       - FX-8800P
+                                                       - Pro A12-8800B
+     \                      amdgcn       APU   ROCm    - A10-8700P
+                                                       - Pro A10-8700B
+                                                       - A10-8780P
+     \                      amdgcn       APU           - A10-9600P
+                                                       - A10-9630P
+                                                       - A12-9700P
+                                                       - A12-9730P
+                                                       - FX-9800P
+                                                       - FX-9830P
+     \                      amdgcn       APU           - E2-9010
+                                                       - A6-9210
+                                                       - A9-9410
+     gfx802     - tonga     amdgcn       dGPU  ROCm    Same as gfx800
+     gfx803     - fiji      amdgcn       dGPU  ROCm    - Radeon R9 Nano
+                                                       - Radeon R9 Fury
+                                                       - Radeon R9 FuryX
+                                                       - Radeon Pro Duo
+                                                       - FirePro S9300x2
+     \          - polaris10 amdgcn       dGPU  ROCm    - Radeon RX 470
+                                                       - Radeon RX 480
+     \          - polaris11 amdgcn       dGPU  ROCm    - Radeon RX 460
+     gfx804                 amdgcn       dGPU          Same as gfx803
+     gfx810     - stoney    amdgcn       APU
+     **GCN GFX9**
+     --------------------------------------------------------------------
+     gfx900                 amdgcn       dGPU          - Radeon Vega Frontier Edition
+     gfx901                 amdgcn       dGPU  ROCm    Same as gfx900
+                                                       except XNACK is
+                                                       enabled
+     gfx902                 amdgcn       APU           *TBA*
+
+                                                       .. TODO
+                                                          Add product
+                                                          names.
+     gfx903                 amdgcn       APU           Same as gfx902
+                                                       except XNACK is
+                                                       enabled
+     ========== =========== ============ ===== ======= ==================
+
+.. _amdgpu-address-spaces:
+
+Address Spaces
+--------------
+
+The AMDGPU backend uses the following address space mappings.
+
+The memory space names used in the table, aside from the region memory space, is
+from the OpenCL standard.
+
+LLVM Address Space number is used throughout LLVM (for example, in LLVM IR).
+
+  .. table:: Address Space Mapping
+     :name: amdgpu-address-space-mapping-table
+
+     ================== ================= ================= ================= =================
+     LLVM Address Space Memory Space
+     ------------------ -----------------------------------------------------------------------
+     \                  Current Default   amdgiz/amdgizcl   hcc               Future Default
+     ================== ================= ================= ================= =================
+     0                  Private (Scratch) Generic (Flat)    Generic (Flat)    Generic (Flat)
+     1                  Global            Global            Global            Global
+     2                  Constant          Constant          Constant          Region (GDS)
+     3                  Local (group/LDS) Local (group/LDS) Local (group/LDS) Local (group/LDS)
+     4                  Generic (Flat)    Region (GDS)      Region (GDS)      Constant
+     5                  Region (GDS)      Private (Scratch) Private (Scratch) Private (Scratch)
+     ================== ================= ================= ================= =================
+
+Current Default
+  This is the current default address space mapping used for all languages
+  except hcc. This will shortly be deprecated.
+
+amdgiz/amdgizcl
+  This is the current address space mapping used when ``amdgiz`` or ``amdgizcl``
+  is specified as the target triple environment value.
+
+hcc
+  This is the current address space mapping used when ``hcc`` is specified as
+  the target triple environment value.This will shortly be deprecated.
+
+Future Default
+  This will shortly be the only address space mapping for all languages using
+  AMDGPU backend.
+
+.. _amdgpu-memory-scopes:
+
+Memory Scopes
+-------------
+
+This section provides LLVM memory synchronization scopes supported by the AMDGPU
+backend memory model when the target triple OS is ``amdhsa`` (see
+:ref:`amdgpu-amdhsa-memory-model` and :ref:`amdgpu-target-triples`).
+
+The memory model supported is based on the HSA memory model [HSA]_ which is
+based in turn on HRF-indirect with scope inclusion [HRF]_. The happens-before
+relation is transitive over the synchonizes-with relation independent of scope,
+and synchonizes-with allows the memory scope instances to be inclusive (see
+table :ref:`amdgpu-amdhsa-llvm-sync-scopes-amdhsa-table`).
+
+This is different to the OpenCL [OpenCL]_ memory model which does not have scope
+inclusion and requires the memory scopes to exactly match. However, this
+is conservatively correct for OpenCL.
+
+  .. table:: AMDHSA LLVM Sync Scopes for AMDHSA
+     :name: amdgpu-amdhsa-llvm-sync-scopes-amdhsa-table
+
+     ================ ==========================================================
+     LLVM Sync Scope  Description
+     ================ ==========================================================
+     *none*           The default: ``system``.
+
+                      Synchronizes with, and participates in modification and
+                      seq_cst total orderings with, other operations (except
+                      image operations) for all address spaces (except private,
+                      or generic that accesses private) provided the other
+                      operation's sync scope is:
+
+                      - ``system``.
+                      - ``agent`` and executed by a thread on the same agent.
+                      - ``workgroup`` and executed by a thread in the same
+                        workgroup.
+                      - ``wavefront`` and executed by a thread in the same
+                        wavefront.
+
+     ``agent``        Synchronizes with, and participates in modification and
+                      seq_cst total orderings with, other operations (except
+                      image operations) for all address spaces (except private,
+                      or generic that accesses private) provided the other
+                      operation's sync scope is:
+
+                      - ``system`` or ``agent`` and executed by a thread on the
+                        same agent.
+                      - ``workgroup`` and executed by a thread in the same
+                        workgroup.
+                      - ``wavefront`` and executed by a thread in the same
+                        wavefront.
+
+     ``workgroup``    Synchronizes with, and participates in modification and
+                      seq_cst total orderings with, other operations (except
+                      image operations) for all address spaces (except private,
+                      or generic that accesses private) provided the other
+                      operation's sync scope is:
+
+                      - ``system``, ``agent`` or ``workgroup`` and executed by a
+                        thread in the same workgroup.
+                      - ``wavefront`` and executed by a thread in the same
+                        wavefront.
+
+     ``wavefront``    Synchronizes with, and participates in modification and
+                      seq_cst total orderings with, other operations (except
+                      image operations) for all address spaces (except private,
+                      or generic that accesses private) provided the other
+                      operation's sync scope is:
+
+                      - ``system``, ``agent``, ``workgroup`` or ``wavefront``
+                        and executed by a thread in the same wavefront.
+
+     ``singlethread`` Only synchronizes with, and participates in modification
+                      and seq_cst total orderings with, other operations (except
+                      image operations) running in the same thread for all
+                      address spaces (for example, in signal handlers).
+     ================ ==========================================================
+
+AMDGPU Intrinsics
+-----------------
+
+The AMDGPU backend implements the following intrinsics.
+
+*This section is WIP.*
+
+.. TODO
+   List AMDGPU intrinsics
+
+Code Object
+===========
+
+The AMDGPU backend generates a standard ELF [ELF]_ relocatable code object that
+can be linked by ``lld`` to produce a standard ELF shared code object which can
+be loaded and executed on an AMDGPU target.
+
+Header
+------
+
+The AMDGPU backend uses the following ELF header:
+
+  .. table:: AMDGPU ELF Header
+     :name: amdgpu-elf-header-table
+
+     ========================== =========================
+     Field                      Value
+     ========================== =========================
+     ``e_ident[EI_CLASS]``      ``ELFCLASS64``
+     ``e_ident[EI_DATA]``       ``ELFDATA2LSB``
+     ``e_ident[EI_OSABI]``      ``ELFOSABI_AMDGPU_HSA``
+     ``e_ident[EI_ABIVERSION]`` ``ELFABIVERSION_AMDGPU_HSA``
+     ``e_type``                 ``ET_REL`` or ``ET_DYN``
+     ``e_machine``              ``EM_AMDGPU``
+     ``e_entry``                0
+     ``e_flags``                0
+     ========================== =========================
+
+..
+
+  .. table:: AMDGPU ELF Header Enumeration Values
+     :name: amdgpu-elf-header-enumeration-values-table
+
+     ============================ =====
+     Name                         Value
+     ============================ =====
+     ``EM_AMDGPU``                224
+     ``ELFOSABI_AMDGPU_HSA``      64
+     ``ELFABIVERSION_AMDGPU_HSA`` 1
+     ============================ =====
+
+``e_ident[EI_CLASS]``
+  The ELF class is always ``ELFCLASS64``. The AMDGPU backend only supports 64 bit
+  applications.
+
+``e_ident[EI_DATA]``
+  All AMDGPU targets use ELFDATA2LSB for little-endian byte ordering.
+
+``e_ident[EI_OSABI]``
+  The AMD GPU architecture specific OS ABI of ``ELFOSABI_AMDGPU_HSA`` is used to
+  specify that the code object conforms to the AMD HSA runtime ABI [HSA]_.
+
+``e_ident[EI_ABIVERSION]``
+  The AMD GPU architecture specific OS ABI version of
+  ``ELFABIVERSION_AMDGPU_HSA`` is used to specify the version of AMD HSA runtime
+  ABI to which the code object conforms.
+
+``e_type``
+  Can be one of the following values:
+
+
+  ``ET_REL``
+    The type produced by the AMD GPU backend compiler as it is relocatable code
+    object.
+
+  ``ET_DYN``
+    The type produced by the linker as it is a shared code object.
+
+  The AMD HSA runtime loader requires a ``ET_DYN`` code object.
+
+``e_machine``
+  The value ``EM_AMDGPU`` is used for the machine for all members of the AMD GPU
+  architecture family. The specific member is specified in the
+  ``NT_AMD_AMDGPU_ISA`` entry in the ``.note`` section (see
+  :ref:`amdgpu-note-records`).
+
+``e_entry``
+  The entry point is 0 as the entry points for individual kernels must be
+  selected in order to invoke them through AQL packets.
+
+``e_flags``
+  The value is 0 as no flags are used.
+
+Sections
+--------
+
+An AMDGPU target ELF code object has the standard ELF sections which include:
+
+  .. table:: AMDGPU ELF Sections
+     :name: amdgpu-elf-sections-table
+
+     ================== ================ =================================
+     Name               Type             Attributes
+     ================== ================ =================================
+     ``.bss``           ``SHT_NOBITS``   ``SHF_ALLOC`` + ``SHF_WRITE``
+     ``.data``          ``SHT_PROGBITS`` ``SHF_ALLOC`` + ``SHF_WRITE``
+     ``.debug_``\ *\**  ``SHT_PROGBITS`` *none*
+     ``.dynamic``       ``SHT_DYNAMIC``  ``SHF_ALLOC``
+     ``.dynstr``        ``SHT_PROGBITS`` ``SHF_ALLOC``
+     ``.dynsym``        ``SHT_PROGBITS`` ``SHF_ALLOC``
+     ``.got``           ``SHT_PROGBITS`` ``SHF_ALLOC`` + ``SHF_WRITE``
+     ``.hash``          ``SHT_HASH``     ``SHF_ALLOC``
+     ``.note``          ``SHT_NOTE``     *none*
+     ``.rela``\ *name*  ``SHT_RELA``     *none*
+     ``.rela.dyn``      ``SHT_RELA``     *none*
+     ``.rodata``        ``SHT_PROGBITS`` ``SHF_ALLOC``
+     ``.shstrtab``      ``SHT_STRTAB``   *none*
+     ``.strtab``        ``SHT_STRTAB``   *none*
+     ``.symtab``        ``SHT_SYMTAB``   *none*
+     ``.text``          ``SHT_PROGBITS`` ``SHF_ALLOC`` + ``SHF_EXECINSTR``
+     ================== ================ =================================
+
+These sections have their standard meanings (see [ELF]_) and are only generated
+if needed.
+
+``.debug``\ *\**
+  The standard DWARF sections. See :ref:`amdgpu-dwarf` for information on the
+  DWARF produced by the AMDGPU backend.
+
+``.dynamic``, ``.dynstr``, ``.dynstr``, ``.hash``
+  The standard sections used by a dynamic loader.
+
+``.note``
+  See :ref:`amdgpu-note-records` for the note records supported by the AMDGPU
+  backend.
+
+``.rela``\ *name*, ``.rela.dyn``
+  For relocatable code objects, *name* is the name of the section that the
+  relocation records apply. For example, ``.rela.text`` is the section name for
+  relocation records associated with the ``.text`` section.
+
+  For linked shared code objects, ``.rela.dyn`` contains all the relocation
+  records from each of the relocatable code object's ``.rela``\ *name* sections.
+
+  See :ref:`amdgpu-relocation-records` for the relocation records supported by
+  the AMDGPU backend.
+
+``.text``
+  The executable machine code for the kernels and functions they call. Generated
+  as position independent code. See :ref:`amdgpu-code-conventions` for
+  information on conventions used in the isa generation.
+
+.. _amdgpu-note-records:
+
+Note Records
+------------
+
+As required by ``ELFCLASS64``, minimal zero byte padding must be generated after
+the ``name`` field to ensure the ``desc`` field is 4 byte aligned. In addition,
+minimal zero byte padding must be generated to ensure the ``desc`` field size is
+a multiple of 4 bytes. The ``sh_addralign`` field of the ``.note`` section must
+be at least 4 to indicate at least 8 byte alignment.
+
+The AMDGPU backend code object uses the following ELF note records in the
+``.note`` section. The *Description* column specifies the layout of the note
+record’s ``desc`` field. All fields are consecutive bytes. Note records with
+variable size strings have a corresponding ``*_size`` field that specifies the
+number of bytes, including the terminating null character, in the string. The
+string(s) come immediately after the preceding fields.
+
+Additional note records can be present.
+
+  .. table:: AMDGPU ELF Note Records
+     :name: amdgpu-elf-note-records-table
+
+     ===== ========================== ==========================================
+     Name  Type                       Description
+     ===== ========================== ==========================================
+     "AMD" ``NT_AMD_AMDGPU_METADATA`` <metadata null terminated string>
+     "AMD" ``NT_AMD_AMDGPU_ISA``      <isa name null terminated string>
+     ===== ========================== ==========================================
+
+..
+
+  .. table:: AMDGPU ELF Note Record Enumeration Values
+     :name: amdgpu-elf-note-record-enumeration-values-table
+
+     ============================= =====
+     Name                          Value
+     ============================= =====
+     *reserved*                    0-9
+     ``NT_AMD_AMDGPU_METADATA``    10
+     ``NT_AMD_AMDGPU_ISA``         11
+     ============================= =====
+
+``NT_AMD_AMDGPU_ISA``
+  Specifies the instruction set architecture used by the machine code contained
+  in the code object.
+
+  This note record is required for code objects containing machine code for
+  processors matching the ``amdgcn`` architecture in table
+  :ref:`amdgpu-processors`.
+
+  The null terminated string has the following syntax:
+
+    *architecture*\ ``-``\ *vendor*\ ``-``\ *os*\ ``-``\ *environment*\ ``-``\ *processor*
+
+  where:
+
+    *architecture*
+      The architecture from table :ref:`amdgpu-target-triples-table`.
+
+      This is always ``amdgcn`` when the target triple OS is ``amdhsa`` (see
+      :ref:`amdgpu-target-triples`).
+
+    *vendor*
+      The vendor from table :ref:`amdgpu-target-triples-table`.
+
+      For the AMDGPU backend this is always ``amd``.
+
+    *os*
+      The OS from table :ref:`amdgpu-target-triples-table`.
+
+    *environment*
+      An environment from table :ref:`amdgpu-target-triples-table`, or blank if
+      the environment has no affect on the execution of the code object.
+
+      For the AMDGPU backend this is currently always blank.
+    *processor*
+      The processor from table :ref:`amdgpu-processors-table`.
+
+  For example:
+
+    ``amdgcn-amd-amdhsa--gfx901``
+
+``NT_AMD_AMDGPU_METADATA``
+  Specifies extensible metadata associated with the code object. See
+  :ref:`amdgpu-code-object-metadata` for the syntax of the code object metadata
+  string.
+
+  This note record is required and must contain the minimum information
+  necessary to support the ROCM kernel queries. For example, the segment sizes
+  needed in a dispatch packet. In addition, a high level language runtime may
+  require other information to be included. For example, the AMD OpenCL runtime
+  records kernel argument information.
+
+  .. TODO
+     Is the string null terminated? It probably should not if YAML allows it to
+     contain null characters, otherwise it should be.
+
+.. _amdgpu-code-object-metadata:
+
+Code Object Metadata
+--------------------
+
+The code object metadata is specified by the ``NT_AMD_AMDHSA_METADATA`` note
+record (see :ref:`amdgpu-note-records`).
+
+The metadata is specified as a YAML formatted string (see [YAML]_ and
+:doc:`YamlIO`).
+
+The metadata is represented as a single YAML document comprised of the mapping
+defined in table :ref:`amdgpu-amdhsa-code-object-metadata-mapping-table` and
+referenced tables.
+
+For boolean values, the string values of ``false`` and ``true`` are used for
+false and true respectively.
+
+Additional information can be added to the mappings. To avoid conflicts, any
+non-AMD key names should be prefixed by "*vendor-name*.".
+
+  .. table:: AMDHSA Code Object Metadata Mapping
+     :name: amdgpu-amdhsa-code-object-metadata-mapping-table
+
+     ========== ============== ========= =======================================
+     String Key Value Type     Required? Description
+     ========== ============== ========= =======================================
+     "Version"  sequence of    Required  - The first integer is the major
+                2 integers                 version. Currently 1.
+                                         - The second integer is the minor
+                                           version. Currently 0.
+     "Printf"   sequence of              Each string is encoded information
+                strings                  about a printf function call. The
+                                         encoded information is organized as
+                                         fields separated by colon (':'):
+
+                                         ``ID:N:S[0]:S[1]:...:S[N-1]:FormatString``
+
+                                         where:
+
+                                         ``ID``
+                                           A 32 bit integer as a unique id for
+                                           each printf function call
+
+                                         ``N``
+                                           A 32 bit integer equal to the number
+                                           of arguments of printf function call
+                                           minus 1
+
+                                         ``S[i]`` (where i = 0, 1, ... , N-1)
+                                           32 bit integers for the size in bytes
+                                           of the i-th FormatString argument of
+                                           the printf function call
+
+                                         FormatString
+                                           The format string passed to the
+                                           printf function call.
+     "Kernels"  sequence of    Required  Sequence of the mappings for each
+                mapping                  kernel in the code object. See
+                                         :ref:`amdgpu-amdhsa-code-object-kernel-metadata-mapping-table`
+                                         for the definition of the mapping.
+     ========== ============== ========= =======================================
+
+..
+
+  .. table:: AMDHSA Code Object Kernel Metadata Mapping
+     :name: amdgpu-amdhsa-code-object-kernel-metadata-mapping-table
+
+     ================= ============== ========= ================================
+     String Key        Value Type     Required? Description
+     ================= ============== ========= ================================
+     "Name"            string         Required  Source name of the kernel.
+     "SymbolName"      string         Required  Name of the kernel
+                                                descriptor ELF symbol.
+     "Language"        string                   Source language of the kernel.
+                                                Values include:
+
+                                                - "OpenCL C"
+                                                - "OpenCL C++"
+                                                - "HCC"
+                                                - "OpenMP"
+
+     "LanguageVersion" sequence of              - The first integer is the major
+                       2 integers                 version.
+                                                - The second integer is the
+                                                  minor version.
+     "Attrs"           mapping                  Mapping of kernel attributes.
+                                                See
+                                                :ref:`amdgpu-amdhsa-code-object-kernel-attribute-metadata-mapping-table`
+                                                for the mapping definition.
+     "Arguments"       sequence of              Sequence of mappings of the
+                       mapping                  kernel arguments. See
+                                                :ref:`amdgpu-amdhsa-code-object-kernel-argument-metadata-mapping-table`
+                                                for the definition of the mapping.
+     "CodeProps"       mapping                  Mapping of properties related to
+                                                the kernel code. See
+                                                :ref:`amdgpu-amdhsa-code-object-kernel-code-properties-metadata-mapping-table`
+                                                for the mapping definition.
+     "DebugProps"      mapping                  Mapping of properties related to
+                                                the kernel debugging. See
+                                                :ref:`amdgpu-amdhsa-code-object-kernel-debug-properties-metadata-mapping-table`
+                                                for the mapping definition.
+     ================= ============== ========= ================================
+
+..
+
+  .. table:: AMDHSA Code Object Kernel Attribute Metadata Mapping
+     :name: amdgpu-amdhsa-code-object-kernel-attribute-metadata-mapping-table
+
+     =================== ============== ========= ==============================
+     String Key          Value Type     Required? Description
+     =================== ============== ========= ==============================
+     "ReqdWorkGroupSize" sequence of              The dispatch work-group size
+                         3 integers               X, Y, Z must correspond to the
+                                                  specified values.
+
+                                                  Corresponds to the OpenCL
+                                                  ``reqd_work_group_size``
+                                                  attribute.
+     "WorkGroupSizeHint" sequence of              The dispatch work-group size
+                         3 integers               X, Y, Z is likely to be the
+                                                  specified values.
+
+                                                  Corresponds to the OpenCL
+                                                  ``work_group_size_hint``
+                                                  attribute.
+     "VecTypeHint"       string                   The name of a scalar or vector
+                                                  type.
+
+                                                  Corresponds to the OpenCL
+                                                  ``vec_type_hint`` attribute.
+     =================== ============== ========= ==============================
+
+..
+
+  .. table:: AMDHSA Code Object Kernel Argument Metadata Mapping
+     :name: amdgpu-amdhsa-code-object-kernel-argument-metadata-mapping-table
+
+     ================= ============== ========= ================================
+     String Key        Value Type     Required? Description
+     ================= ============== ========= ================================
+     "Name"            string                   Kernel argument name.
+     "TypeName"        string                   Kernel argument type name.
+     "Size"            integer        Required  Kernel argument size in bytes.
+     "Align"           integer        Required  Kernel argument alignment in
+                                                bytes. Must be a power of two.
+     "ValueKind"       string         Required  Kernel argument kind that
+                                                specifies how to set up the
+                                                corresponding argument.
+                                                Values include:
+
+                                                "ByValue"
+                                                  The argument is copied
+                                                  directly into the kernarg.
+
+                                                "GlobalBuffer"
+                                                  A global address space pointer
+                                                  to the buffer data is passed
+                                                  in the kernarg.
+
+                                                "DynamicSharedPointer"
+                                                  A group address space pointer
+                                                  to dynamically allocated LDS
+                                                  is passed in the kernarg.
+
+                                                "Sampler"
+                                                  A global address space
+                                                  pointer to a S# is passed in
+                                                  the kernarg.
+
+                                                "Image"
+                                                  A global address space
+                                                  pointer to a T# is passed in
+                                                  the kernarg.
+
+                                                "Pipe"
+                                                  A global address space pointer
+                                                  to an OpenCL pipe is passed in
+                                                  the kernarg.
+
+                                                "Queue"
+                                                  A global address space pointer
+                                                  to an OpenCL device enqueue
+                                                  queue is passed in the
+                                                  kernarg.
+
+                                                "HiddenGlobalOffsetX"
+                                                  The OpenCL grid dispatch
+                                                  global offset for the X
+                                                  dimension is passed in the
+                                                  kernarg.
+
+                                                "HiddenGlobalOffsetY"
+                                                  The OpenCL grid dispatch
+                                                  global offset for the Y
+                                                  dimension is passed in the
+                                                  kernarg.
+
+                                                "HiddenGlobalOffsetZ"
+                                                  The OpenCL grid dispatch
+                                                  global offset for the Z
+                                                  dimension is passed in the
+                                                  kernarg.
+
+                                                "HiddenNone"
+                                                  An argument that is not used
+                                                  by the kernel. Space needs to
+                                                  be left for it, but it does
+                                                  not need to be set up.
+
+                                                "HiddenPrintfBuffer"
+                                                  A global address space pointer
+                                                  to the runtime printf buffer
+                                                  is passed in kernarg.
+
+                                                "HiddenDefaultQueue"
+                                                  A global address space pointer
+                                                  to the OpenCL device enqueue
+                                                  queue that should be used by
+                                                  the kernel by default is
+                                                  passed in the kernarg.
+
+                                                "HiddenCompletionAction"
+                                                  *TBD*
+
+                                                  .. TODO
+                                                     Add description.
+
+     "ValueType"       string         Required  Kernel argument value type. Only
+                                                present if "ValueKind" is
+                                                "ByValue". For vector data
+                                                types, the value is for the
+                                                element type. Values include:
+
+                                                - "Struct"
+                                                - "I8"
+                                                - "U8"
+                                                - "I16"
+                                                - "U16"
+                                                - "F16"
+                                                - "I32"
+                                                - "U32"
+                                                - "F32"
+                                                - "I64"
+                                                - "U64"
+                                                - "F64"
+
+                                                .. TODO
+                                                   How can it be determined if a
+                                                   vector type, and what size
+                                                   vector?
+     "PointeeAlign"    integer                  Alignment in bytes of pointee
+                                                type for pointer type kernel
+                                                argument. Must be a power
+                                                of 2. Only present if
+                                                "ValueKind" is
+                                                "DynamicSharedPointer".
+     "AddrSpaceQual"   string                   Kernel argument address space
+                                                qualifier. Only present if
+                                                "ValueKind" is "GlobalBuffer" or
+                                                "DynamicSharedPointer". Values
+                                                are:
+
+                                                - "Private"
+                                                - "Global"
+                                                - "Constant"
+                                                - "Local"
+                                                - "Generic"
+                                                - "Region"
+
+                                                .. TODO
+                                                   Is GlobalBuffer only Global
+                                                   or Constant? Is
+                                                   DynamicSharedPointer always
+                                                   Local? Can HCC allow Generic?
+                                                   How can Private or Region
+                                                   ever happen?
+     "AccQual"         string                   Kernel argument access
+                                                qualifier. Only present if
+                                                "ValueKind" is "Image" or
+                                                "Pipe". Values
+                                                are:
+
+                                                - "ReadOnly"
+                                                - "WriteOnly"
+                                                - "ReadWrite"
+
+                                                .. TODO
+                                                   Does this apply to
+                                                   GlobalBuffer?
+     "ActualAcc"       string                   The actual memory accesses
+                                                performed by the kernel on the
+                                                kernel argument. Only present if
+                                                "ValueKind" is "GlobalBuffer",
+                                                "Image", or "Pipe". This may be
+                                                more restrictive than indicated
+                                                by "AccQual" to reflect what the
+                                                kernel actual does. If not
+                                                present then the runtime must
+                                                assume what is implied by
+                                                "AccQual" and "IsConst". Values
+                                                are:
+
+                                                - "ReadOnly"
+                                                - "WriteOnly"
+                                                - "ReadWrite"
+
+     "IsConst"         boolean                  Indicates if the kernel argument
+                                                is const qualified. Only present
+                                                if "ValueKind" is
+                                                "GlobalBuffer".
+
+     "IsRestrict"      boolean                  Indicates if the kernel argument
+                                                is restrict qualified. Only
+                                                present if "ValueKind" is
+                                                "GlobalBuffer".
+
+     "IsVolatile"      boolean                  Indicates if the kernel argument
+                                                is volatile qualified. Only
+                                                present if "ValueKind" is
+                                                "GlobalBuffer".
+
+     "IsPipe"          boolean                  Indicates if the kernel argument
+                                                is pipe qualified. Only present
+                                                if "ValueKind" is "Pipe".
+
+                                                .. TODO
+                                                   Can GlobalBuffer be pipe
+                                                   qualified?
+     ================= ============== ========= ================================
+
+..
+
+  .. table:: AMDHSA Code Object Kernel Code Properties Metadata Mapping
+     :name: amdgpu-amdhsa-code-object-kernel-code-properties-metadata-mapping-table
+
+     ============================ ============== ========= =====================
+     String Key                   Value Type     Required? Description
+     ============================ ============== ========= =====================
+     "KernargSegmentSize"         integer        Required  The size in bytes of
+                                                           the kernarg segment
+                                                           that holds the values
+                                                           of the arguments to
+                                                           the kernel.
+     "GroupSegmentFixedSize"      integer        Required  The amount of group
+                                                           segment memory
+                                                           required by a
+                                                           work-group in
+                                                           bytes. This does not
+                                                           include any
+                                                           dynamically allocated
+                                                           group segment memory
+                                                           that may be added
+                                                           when the kernel is
+                                                           dispatched.
+     "PrivateSegmentFixedSize"    integer        Required  The amount of fixed
+                                                           private address space
+                                                           memory required for a
+                                                           work-item in
+                                                           bytes. If
+                                                           IsDynamicCallstack
+                                                           is 1 then additional
+                                                           space must be added
+                                                           to this value for the
+                                                           call stack.
+     "KernargSegmentAlign"        integer        Required  The maximum byte
+                                                           alignment of
+                                                           arguments in the
+                                                           kernarg segment. Must
+                                                           be a power of 2.
+     "WavefrontSize"              integer        Required  Wavefront size. Must
+                                                           be a power of 2.
+     "NumSGPRs"                   integer                  Number of scalar
+                                                           registers used by a
+                                                           wavefront for
+                                                           GFX6-GFX9. This
+                                                           includes the special
+                                                           SGPRs for VCC, Flat
+                                                           Scratch (GFX7-GFX9)
+                                                           and XNACK (for
+                                                           GFX8-GFX9). It does
+                                                           not include the 16
+                                                           SGPR added if a trap
+                                                           handler is
+                                                           enabled. It is not
+                                                           rounded up to the
+                                                           allocation
+                                                           granularity.
+     "NumVGPRs"                   integer                  Number of vector
+                                                           registers used by
+                                                           each work-item for
+                                                           GFX6-GFX9
+     "MaxFlatWorkgroupSize"       integer                  Maximum flat
+                                                           work-group size
+                                                           supported by the
+                                                           kernel in work-items.
+     "IsDynamicCallStack"         boolean                  Indicates if the
+                                                           generated machine
+                                                           code is using a
+                                                           dynamically sized
+                                                           call stack.
+     "IsXNACKEnabled"             boolean                  Indicates if the
+                                                           generated machine
+                                                           code is capable of
+                                                           supporting XNACK.
+     ============================ ============== ========= =====================
+
+..
+
+  .. table:: AMDHSA Code Object Kernel Debug Properties Metadata Mapping
+     :name: amdgpu-amdhsa-code-object-kernel-debug-properties-metadata-mapping-table
+
+     =================================== ============== ========= ==============
+     String Key                          Value Type     Required? Description
+     =================================== ============== ========= ==============
+     "DebuggerABIVersion"                string
+     "ReservedNumVGPRs"                  integer
+     "ReservedFirstVGPR"                 integer
+     "PrivateSegmentBufferSGPR"          integer
+     "WavefrontPrivateSegmentOffsetSGPR" integer
+     =================================== ============== ========= ==============
+
+.. TODO
+   Plan to remove the debug properties metadata.   
+
+.. _amdgpu-symbols:
+
+Symbols
+-------
+
+Symbols include the following:
+
+  .. table:: AMDGPU ELF Symbols
+     :name: amdgpu-elf-symbols-table
+
+     ===================== ============== ============= ==================
+     Name                  Type           Section       Description
+     ===================== ============== ============= ==================
+     *link-name*           ``STT_OBJECT`` - ``.data``   Global variable
+                                          - ``.rodata``
+                                          - ``.bss``
+     *link-name*\ ``@kd``  ``STT_OBJECT`` - ``.rodata`` Kernel descriptor
+     *link-name*           ``STT_FUNC``   - ``.text``   Kernel entry point
+     ===================== ============== ============= ==================
+
+Global variable
+  Global variables both used and defined by the compilation unit.
+
+  If the symbol is defined in the compilation unit then it is allocated in the
+  appropriate section according to if it has initialized data or is readonly.
+
+  If the symbol is external then its section is ``STN_UNDEF`` and the loader
+  will resolve relocations using the definition provided by another code object
+  or explicitly defined by the runtime.
+
+  All global symbols, whether defined in the compilation unit or external, are
+  accessed by the machine code indirectly through a GOT table entry. This
+  allows them to be preemptable. The GOT table is only supported when the target
+  triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`).
+
+  .. TODO
+     Add description of linked shared object symbols. Seems undefined symbols
+     are marked as STT_NOTYPE.
+
+Kernel descriptor
+  Every HSA kernel has an associated kernel descriptor. It is the address of the
+  kernel descriptor that is used in the AQL dispatch packet used to invoke the
+  kernel, not the kernel entry point. The layout of the HSA kernel descriptor is
+  defined in :ref:`amdgpu-amdhsa-kernel-descriptor`.
+
+Kernel entry point
+  Every HSA kernel also has a symbol for its machine code entry point.
+
+.. _amdgpu-relocation-records:
+
+Relocation Records
+------------------
+
+AMDGPU backend generates ``Elf64_Rela`` relocation records. Supported
+relocatable fields are:
+
+``word32``
+  This specifies a 32-bit field occupying 4 bytes with arbitrary byte
+  alignment. These values use the same byte order as other word values in the
+  AMD GPU architecture.
+
+``word64``
+  This specifies a 64-bit field occupying 8 bytes with arbitrary byte
+  alignment. These values use the same byte order as other word values in the
+  AMD GPU architecture.
+
+Following notations are used for specifying relocation calculations:
+
+**A**
+  Represents the addend used to compute the value of the relocatable field.
+
+**G**
+  Represents the offset into the global offset table at which the relocation
+  entry’s symbol will reside during execution.
+
+**GOT**
+  Represents the address of the global offset table.
+
+**P**
+  Represents the place (section offset for ``et_rel`` or address for ``et_dyn``)
+  of the storage unit being relocated (computed using ``r_offset``).
+
+**S**
+  Represents the value of the symbol whose index resides in the relocation
+  entry.
+
+The following relocation types are supported:
+
+  .. table:: AMDGPU ELF Relocation Records
+     :name: amdgpu-elf-relocation-records-table
+
+     ==========================  =====  ==========  ==============================
+     Relocation Type             Value  Field       Calculation
+     ==========================  =====  ==========  ==============================
+     ``R_AMDGPU_NONE``           0      *none*      *none*
+     ``R_AMDGPU_ABS32_LO``       1      ``word32``  (S + A) & 0xFFFFFFFF
+     ``R_AMDGPU_ABS32_HI``       2      ``word32``  (S + A) >> 32
+     ``R_AMDGPU_ABS64``          3      ``word64``  S + A
+     ``R_AMDGPU_REL32``          4      ``word32``  S + A - P
+     ``R_AMDGPU_REL64``          5      ``word64``  S + A - P
+     ``R_AMDGPU_ABS32``          6      ``word32``  S + A
+     ``R_AMDGPU_GOTPCREL``       7      ``word32``  G + GOT + A - P
+     ``R_AMDGPU_GOTPCREL32_LO``  8      ``word32``  (G + GOT + A - P) & 0xFFFFFFFF
+     ``R_AMDGPU_GOTPCREL32_HI``  9      ``word32``  (G + GOT + A - P) >> 32
+     ``R_AMDGPU_REL32_LO``       10     ``word32``  (S + A - P) & 0xFFFFFFFF
+     ``R_AMDGPU_REL32_HI``       11     ``word32``  (S + A - P) >> 32
+     ==========================  =====  ==========  ==============================
+
+.. _amdgpu-dwarf:
+
+DWARF
+-----
+
+Standard DWARF [DWARF]_ Version 2 sections can be generated. These contain
+information that maps the code object executable code and data to the source
+language constructs. It can be used by tools such as debuggers and profilers.
+
+Address Space Mapping
+~~~~~~~~~~~~~~~~~~~~~
+
+The following address space mapping is used:
+
+  .. table:: AMDGPU DWARF Address Space Mapping
+     :name: amdgpu-dwarf-address-space-mapping-table
+
+     =================== =================
+     DWARF Address Space Memory Space
+     =================== =================
+     1                   Private (Scratch)
+     2                   Local (group/LDS)
+     *omitted*           Global
+     *omitted*           Constant
+     *omitted*           Generic (Flat)
+     *not supported*     Region (GDS)
+     =================== =================
+
+See :ref:`amdgpu-address-spaces` for infomration on the memory space terminology
+used in the table.
+
+An ``address_class`` attribute is generated on pointer type DIEs to specify the
+DWARF address space of the value of the pointer when it is in the *private* or
+*local* address space. Otherwise the attribute is omitted.
+
+An ``XDEREF`` operation is generated in location list expressions for variables
+that are allocated in the *private* and *local* address space. Otherwise no
+``XDREF`` is omitted.
+
+Register Mapping
+~~~~~~~~~~~~~~~~
+
+*This section is WIP.*
+
+.. TODO
+   Define DWARF register enumeration.
+
+   If want to present a wavefront state then should expose vector registers as
+   64 wide (rather than per work-item view that LLVM uses). Either as separate
+   registers, or a 64x4 byte single register. In either case use a new LANE op
+   (akin to XDREF) to select the current lane usage in a location
+   expression. This would also allow scalar register spilling to vector register
+   lanes to be expressed (currently no debug information is being generated for
+   spilling). If choose a wide single register approach then use LANE in
+   conjunction with PIECE operation to select the dword part of the register for
+   the current lane. If the separate register approach then use LANE to select
+   the register.
+
+Source Text
+~~~~~~~~~~~
+
+*This section is WIP.*
+
+.. TODO
+   DWARF extension to include runtime generated source text.
+
+.. _amdgpu-code-conventions:
+
+Code Conventions
+================
+
+AMDHSA
+------
+
+This section provides code conventions used when the target triple OS is
+``amdhsa`` (see :ref:`amdgpu-target-triples`).
+
+Kernel Dispatch
+~~~~~~~~~~~~~~~
+
+The HSA architected queuing language (AQL) defines a user space memory interface
+that can be used to control the dispatch of kernels, in an agent independent
+way. An agent can have zero or more AQL queues created for it using the ROCm
+runtime, in which AQL packets (all of which are 64 bytes) can be placed. See the
+*HSA Platform System Architecture Specification* [HSA]_ for the AQL queue
+mechanics and packet layouts.
+
+The packet processor of a kernel agent is responsible for detecting and
+dispatching HSA kernels from the AQL queues associated with it. For AMD GPUs the
+packet processor is implemented by the hardware command processor (CP),
+asynchronous dispatch controller (ADC) and shader processor input controller
+(SPI).
+
+The ROCm runtime can be used to allocate an AQL queue object. It uses the kernel
+mode driver to initialize and register the AQL queue with CP.
+
+To dispatch a kernel the following actions are performed. This can occur in the
+CPU host program, or from an HSA kernel executing on a GPU.
+
+1. A pointer to an AQL queue for the kernel agent on which the kernel is to be
+   executed is obtained.
+2. A pointer to the kernel descriptor (see
+   :ref:`amdgpu-amdhsa-kernel-descriptor`) of the kernel to execute is
+   obtained. It must be for a kernel that is contained in a code object that that
+   was loaded by the ROCm runtime on the kernel agent with which the AQL queue is
+   associated.
+3. Space is allocated for the kernel arguments using the ROCm runtime allocator
+   for a memory region with the kernarg property for the kernel agent that will
+   execute the kernel. It must be at least 16 byte aligned.
+4. Kernel argument values are assigned to the kernel argument memory
+   allocation. The layout is defined in the *HSA Programmer’s Language Reference*
+   [HSA]_. For AMDGPU the kernel execution directly accesses the kernel argument
+   memory in the same way constant memory is accessed. (Note that the HSA
+   specification allows an implementation to copy the kernel argument contents to
+   another location that is accessed by the kernel.)
+5. An AQL kernel dispatch packet is created on the AQL queue. The ROCm runtime
+   api uses 64 bit atomic operations to reserve space in the AQL queue for the
+   packet. The packet must be set up, and the final write must use an atomic
+   store release to set the packet kind to ensure the packet contents are
+   visible to the kernel agent. AQL defines a doorbell signal mechanism to
+   notify the kernel agent that the AQL queue has been updated. These rules, and
+   the layout of the AQL queue and kernel dispatch packet is defined in the *HSA
+   System Architecture Specification* [HSA]_.
+6. A kernel dispatch packet includes information about the actual dispatch,
+   such as grid and work-group size, together with information from the code
+   object about the kernel, such as segment sizes. The ROCm runtime queries on
+   the kernel symbol can be used to obtain the code object values which are
+   recorded in the :ref:`amdgpu-code-object-metadata`.
+7. CP executes micro-code and is responsible for detecting and setting up the
+   GPU to execute the wavefronts of a kernel dispatch.
+8. CP ensures that when the a wavefront starts executing the kernel machine
+   code, the scalar general purpose registers (SGPR) and vector general purpose
+   registers (VGPR) are set up as required by the machine code. The required
+   setup is defined in the :ref:`amdgpu-amdhsa-kernel-descriptor`. The initial
+   register state is defined in
+   :ref:`amdgpu-amdhsa-initial-kernel-execution-state`.
+9. The prolog of the kernel machine code (see
+   :ref:`amdgpu-amdhsa-kernel-prolog`) sets up the machine state as necessary
+   before continuing executing the machine code that corresponds to the kernel.
+10. When the kernel dispatch has completed execution, CP signals the completion
+    signal specified in the kernel dispatch packet if not 0.
+
+.. _amdgpu-amdhsa-memory-spaces:
+
+Memory Spaces
+~~~~~~~~~~~~~
+
+The memory space properties are:
+
+  .. table:: AMDHSA Memory Spaces
+     :name: amdgpu-amdhsa-memory-spaces-table
+
+     ================= =========== ======== ======= ==================
+     Memory Space Name HSA Segment Hardware Address NULL Value
+                       Name        Name     Size
+     ================= =========== ======== ======= ==================
+     Private           private     scratch  32      0x00000000
+     Local             group       LDS      32      0xFFFFFFFF
+     Global            global      global   64      0x0000000000000000
+     Constant          constant    *same as 64      0x0000000000000000
+                                   global*
+     Generic           flat        flat     64      0x0000000000000000
+     Region            N/A         GDS      32      *not implemented
+                                                    for AMDHSA*
+     ================= =========== ======== ======= ==================
+
+The global and constant memory spaces both use global virtual addresses, which
+are the same virtual address space used by the CPU. However, some virtual
+addresses may only be accessible to the CPU, some only accessible by the GPU,
+and some by both.
+
+Using the constant memory space indicates that the data will not change during
+the execution of the kernel. This allows scalar read instructions to be
+used. The vector and scalar L1 caches are invalidated of volatile data before
+each kernel dispatch execution to allow constant memory to change values between
+kernel dispatches.
+
+The local memory space uses the hardware Local Data Store (LDS) which is
+automatically allocated when the hardware creates work-groups of wavefronts, and
+freed when all the wavefronts of a work-group have terminated. The data store
+(DS) instructions can be used to access it.
+
+The private memory space uses the hardware scratch memory support. If the kernel
+uses scratch, then the hardware allocates memory that is accessed using
+wavefront lane dword (4 byte) interleaving. The mapping used from private
+address to physical address is:
+
+  ``wavefront-scratch-base +
+  (private-address * wavefront-size * 4) +
+  (wavefront-lane-id * 4)``
+
+There are different ways that the wavefront scratch base address is determined
+by a wavefront (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). This
+memory can be accessed in an interleaved manner using buffer instruction with
+the scratch buffer descriptor and per wave scratch offset, by the scratch
+instructions, or by flat instructions. If each lane of a wavefront accesses the
+same private address, the interleaving results in adjacent dwords being accessed
+and hence requires fewer cache lines to be fetched. Multi-dword access is not
+supported except by flat and scratch instructions in GFX9.
+
+The generic address space uses the hardware flat address support available in
+GFX7-GFX9. This uses two fixed ranges of virtual addresses (the private and
+local appertures), that are outside the range of addressible global memory, to
+map from a flat address to a private or local address.
+
+FLAT instructions can take a flat address and access global, private (scratch)
+and group (LDS) memory depending in if the address is within one of the
+apperture ranges. Flat access to scratch requires hardware aperture setup and
+setup in the kernel prologue (see :ref:`amdgpu-amdhsa-flat-scratch`). Flat
+access to LDS requires hardware aperture setup and M0 (GFX7-GFX8) register setup
+(see :ref:`amdgpu-amdhsa-m0`).
+
+To convert between a segment address and a flat address the base address of the
+appertures address can be used. For GFX7-GFX8 these are available in the
+:ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with
+Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For
+GFX9 the appature base addresses are directly available as inline constant
+registers ``SRC_SHARED_BASE/LIMIT`` and ``SRC_PRIVATE_BASE/LIMIT``. In 64 bit
+address mode the apperture sizes are 2^32 bytes and the base is aligned to 2^32
+which makes it easier to convert from flat to segment or segment to flat.
+
+HSA Image and Samplers
+~~~~~~~~~~~~~~~~~~~~~~
+
+Image and sample handles created by the ROCm runtime are 64 bit addresses of a
+hardware 32 byte V# and 48 byte S# object respectively. In order to support the
+HSA ``query_sampler`` operations two extra dwords are used to store the HSA BRIG
+enumeration values for the queries that are not trivially deducible from the S#
+representation.
+
+HSA Signals
+~~~~~~~~~~~
+
+Signal handles created by the ROCm runtime are 64 bit addresses of a structure
+allocated in memory accessible from both the CPU and GPU. The structure is
+defined by the ROCm runtime and subject to change between releases (see
+[AMD-ROCm-github]_).
+
+.. _amdgpu-amdhsa-hsa-aql-queue:
+
+HSA AQL Queue
+~~~~~~~~~~~~~
+
+The AQL queue structure is defined by the ROCm runtime and subject to change
+between releases (see [AMD-ROCm-github]_). For some processors it contains
+fields needed to implement certain language features such as the flat address
+aperture bases. It also contains fields used by CP such as managing the
+allocation of scratch memory.
+
+.. _amdgpu-amdhsa-kernel-descriptor:
+
+Kernel Descriptor
+~~~~~~~~~~~~~~~~~
+
+A kernel descriptor consists of the information needed by CP to initiate the
+execution of a kernel, including the entry point address of the machine code
+that implements the kernel.
+
+Kernel Descriptor for GFX6-GFX9
++++++++++++++++++++++++++++++++
+
+CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
+
+  .. table:: Kernel Descriptor for GFX6-GFX9
+     :name: amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table
+
+     ======= ======= =============================== ===========================
+     Bits    Size    Field Name                      Description
+     ======= ======= =============================== ===========================
+     31:0    4 bytes group_segment_fixed_size        The amount of fixed local
+                                                     address space memory
+                                                     required for a work-group
+                                                     in bytes. This does not
+                                                     include any dynamically
+                                                     allocated local address
+                                                     space memory that may be
+                                                     added when the kernel is
+                                                     dispatched.
+     63:32   4 bytes private_segment_fixed_size      The amount of fixed
+                                                     private address space
+                                                     memory required for a
+                                                     work-item in bytes. If
+                                                     is_dynamic_callstack is 1
+                                                     then additional space must
+                                                     be added to this value for
+                                                     the call stack.
+     95:64   4 bytes max_flat_workgroup_size         Maximum flat work-group
+                                                     size supported by the
+                                                     kernel in work-items.
+     96      1 bit   is_dynamic_call_stack           Indicates if the generated
+                                                     machine code is using a
+                                                     dynamically sized call
+                                                     stack.
+     97      1 bit   is_xnack_enabled                Indicates if the generated
+                                                     machine code is capable of
+                                                     suppoting XNACK.
+     127:98  30 bits                                 Reserved. Must be 0.
+     191:128 8 bytes kernel_code_entry_byte_offset   Byte offset (possibly
+                                                     negative) from base
+                                                     address of kernel
+                                                     descriptor to kernel's
+                                                     entry point instruction
+                                                     which must be 256 byte
+                                                     aligned.
+     383:192 24                                      Reserved. Must be 0.
+             bytes
+     415:384 4 bytes compute_pgm_rsrc1               Compute Shader (CS)
+                                                     program settings used by
+                                                     CP to set up
+                                                     ``COMPUTE_PGM_RSRC1``
+                                                     configuration
+                                                     register. See
+                                                     :ref:`amdgpu-amdhsa-compute_pgm_rsrc1_t-gfx6-gfx9-table`.
+     447:416 4 bytes compute_pgm_rsrc2               Compute Shader (CS)
+                                                     program settings used by
+                                                     CP to set up
+                                                     ``COMPUTE_PGM_RSRC2``
+                                                     configuration
+                                                     register. See
+                                                     :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
+     448     1 bit   enable_sgpr_private_segment     Enable the setup of the
+                     _buffer                         SGPR user data registers
+                                                     (see
+                                                     :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
+
+                                                     The total number of SGPR
+                                                     user data registers
+                                                     requested must not exceed
+                                                     16 and match value in
+                                                     ``compute_pgm_rsrc2.user_sgpr.user_sgpr_count``.
+                                                     Any requests beyond 16
+                                                     will be ignored.
+     449     1 bit   enable_sgpr_dispatch_ptr        *see above*
+     450     1 bit   enable_sgpr_queue_ptr           *see above*
+     451     1 bit   enable_sgpr_kernarg_segment_ptr *see above*
+     452     1 bit   enable_sgpr_dispatch_id         *see above*
+     453     1 bit   enable_sgpr_flat_scratch_init   *see above*
+     454     1 bit   enable_sgpr_private_segment     *see above*
+                     _size
+     455     1 bit   enable_sgpr_grid_workgroup      Not implemented in CP and
+                     _count_X                        should always be 0.
+     456     1 bit   enable_sgpr_grid_workgroup      Not implemented in CP and
+                     _count_Y                        should always be 0.
+     457     1 bit   enable_sgpr_grid_workgroup      Not implemented in CP and
+                     _count_Z                        should always be 0.
+     463:458 6 bits                                  Reserved. Must be 0.
+     511:464 4                                       Reserved. Must be 0.
+             bytes
+     512     **Total size 64 bytes.**
+     ======= ===================================================================
+
+..
+
+  .. table:: compute_pgm_rsrc1 for GFX6-GFX9
+     :name: amdgpu-amdhsa-compute_pgm_rsrc1_t-gfx6-gfx9-table
+
+     ======= ======= =============================== ===========================================================================
+     Bits    Size    Field Name                      Description
+     ======= ======= =============================== ===========================================================================
+     5:0     6 bits  granulated_workitem_vgpr_count  Number of vector registers
+                                                     used by each work-item,
+                                                     granularity is device
+                                                     specific:
+
+                                                     GFX6-9
+                                                       roundup((max-vgpg + 1)
+                                                       / 4) - 1
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.VGPRS``.
+     9:6     4 bits  granulated_wavefront_sgpr_count Number of scalar registers
+                                                     used by a wavefront,
+                                                     granularity is device
+                                                     specific:
+
+                                                     GFX6-8
+                                                       roundup((max-sgpg + 1)
+                                                       / 8) - 1
+                                                     GFX9
+                                                       roundup((max-sgpg + 1)
+                                                       / 16) - 1
+
+                                                     Includes the special SGPRs
+                                                     for VCC, Flat Scratch (for
+                                                     GFX7 onwards) and XNACK
+                                                     (for GFX8 onwards). It does
+                                                     not include the 16 SGPR
+                                                     added if a trap handler is
+                                                     enabled.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.SGPRS``.
+     11:10   2 bits  priority                        Must be 0.
+
+                                                     Start executing wavefront
+                                                     at the specified priority.
+
+                                                     CP is responsible for
+                                                     filling in
+                                                     ``COMPUTE_PGM_RSRC1.PRIORITY``.
+     13:12   2 bits  float_mode_round_32             Wavefront starts execution
+                                                     with specified rounding
+                                                     mode for single (32
+                                                     bit) floating point
+                                                     precision floating point
+                                                     operations.
+
+                                                     Floating point rounding
+                                                     mode values are defined in
+                                                     :ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.FLOAT_MODE``.
+     15:14   2 bits  float_mode_round_16_64          Wavefront starts execution
+                                                     with specified rounding
+                                                     denorm mode for half/double (16
+                                                     and 64 bit) floating point
+                                                     precision floating point
+                                                     operations.
+
+                                                     Floating point rounding
+                                                     mode values are defined in
+                                                     :ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.FLOAT_MODE``.
+     17:16   2 bits  float_mode_denorm_32            Wavefront starts execution
+                                                     with specified denorm mode
+                                                     for single (32
+                                                     bit)  floating point
+                                                     precision floating point
+                                                     operations.
+
+                                                     Floating point denorm mode
+                                                     values are defined in
+                                                     :ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.FLOAT_MODE``.
+     19:18   2 bits  float_mode_denorm_16_64         Wavefront starts execution
+                                                     with specified denorm mode
+                                                     for half/double (16
+                                                     and 64 bit) floating point
+                                                     precision floating point
+                                                     operations.
+
+                                                     Floating point denorm mode
+                                                     values are defined in
+                                                     :ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.FLOAT_MODE``.
+     20      1 bit   priv                            Must be 0.
+
+                                                     Start executing wavefront
+                                                     in privilege trap handler
+                                                     mode.
+
+                                                     CP is responsible for
+                                                     filling in
+                                                     ``COMPUTE_PGM_RSRC1.PRIV``.
+     21      1 bit   enable_dx10_clamp               Wavefront starts execution
+                                                     with DX10 clamp mode
+                                                     enabled. Used by the vector
+                                                     ALU to force DX-10 style
+                                                     treatment of NaN's (when
+                                                     set, clamp NaN to zero,
+                                                     otherwise pass NaN
+                                                     through).
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.DX10_CLAMP``.
+     22      1 bit   debug_mode                      Must be 0.
+
+                                                     Start executing wavefront
+                                                     in single step mode.
+
+                                                     CP is responsible for
+                                                     filling in
+                                                     ``COMPUTE_PGM_RSRC1.DEBUG_MODE``.
+     23      1 bit   enable_ieee_mode                Wavefront starts execution
+                                                     with IEEE mode
+                                                     enabled. Floating point
+                                                     opcodes that support
+                                                     exception flag gathering
+                                                     will quiet and propagate
+                                                     signaling-NaN inputs per
+                                                     IEEE 754-2008. Min_dx10 and
+                                                     max_dx10 become IEEE
+                                                     754-2008 compliant due to
+                                                     signaling-NaN propagation
+                                                     and quieting.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC1.IEEE_MODE``.
+     24      1 bit   bulky                           Must be 0.
+
+                                                     Only one work-group allowed
+                                                     to execute on a compute
+                                                     unit.
+
+                                                     CP is responsible for
+                                                     filling in
+                                                     ``COMPUTE_PGM_RSRC1.BULKY``.
+     25      1 bit   cdbg_user                       Must be 0.
+
+                                                     Flag that can be used to
+                                                     control debugging code.
+
+                                                     CP is responsible for
+                                                     filling in
+                                                     ``COMPUTE_PGM_RSRC1.CDBG_USER``.
+     31:26   6 bits                                  Reserved. Must be 0.
+     32      **Total size 4 bytes**
+     ======= ===================================================================================================================
+
+..
+
+  .. table:: compute_pgm_rsrc2 for GFX6-GFX9
+     :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table
+
+     ======= ======= =============================== ===========================================================================
+     Bits    Size    Field Name                      Description
+     ======= ======= =============================== ===========================================================================
+     0       1 bit   enable_sgpr_private_segment     Enable the setup of the
+                     _wave_offset                    SGPR wave scratch offset
+                                                     system register (see
+                                                     :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.SCRATCH_EN``.
+     5:1     5 bits  user_sgpr_count                 The total number of SGPR
+                                                     user data registers
+                                                     requested. This number must
+                                                     match the number of user
+                                                     data registers enabled.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.USER_SGPR``.
+     6       1 bit   enable_trap_handler             Set to 1 if code contains a
+                                                     TRAP instruction which
+                                                     requires a trap handler to
+                                                     be enabled.
+
+                                                     CP sets
+                                                     ``COMPUTE_PGM_RSRC2.TRAP_PRESENT``
+                                                     if the runtime has
+                                                     installed a trap handler
+                                                     regardless of the setting
+                                                     of this field.
+     7       1 bit   enable_sgpr_workgroup_id_x      Enable the setup of the
+                                                     system SGPR register for
+                                                     the work-group id in the X
+                                                     dimension (see
+                                                     :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.TGID_X_EN``.
+     8       1 bit   enable_sgpr_workgroup_id_y      Enable the setup of the
+                                                     system SGPR register for
+                                                     the work-group id in the Y
+                                                     dimension (see
+                                                     :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.TGID_Y_EN``.
+     9       1 bit   enable_sgpr_workgroup_id_z      Enable the setup of the
+                                                     system SGPR register for
+                                                     the work-group id in the Z
+                                                     dimension (see
+                                                     :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.TGID_Z_EN``.
+     10      1 bit   enable_sgpr_workgroup_info      Enable the setup of the
+                                                     system SGPR register for
+                                                     work-group information (see
+                                                     :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.TGID_SIZE_EN``.
+     12:11   2 bits  enable_vgpr_workitem_id         Enable the setup of the
+                                                     VGPR system registers used
+                                                     for the work-item ID.
+                                                     :ref:`amdgpu-amdhsa-system-vgpr-work-item-id-enumeration-values-table`
+                                                     defines the values.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.TIDIG_CMP_CNT``.
+     13      1 bit   enable_exception_address_watch  Must be 0.
+
+                                                     Wavefront starts execution
+                                                     with address watch
+                                                     exceptions enabled which
+                                                     are generated when L1 has
+                                                     witnessed a thread access
+                                                     an *address of
+                                                     interest*.
+
+                                                     CP is responsible for
+                                                     filling in the address
+                                                     watch bit in
+                                                     ``COMPUTE_PGM_RSRC2.EXCP_EN_MSB``
+                                                     according to what the
+                                                     runtime requests.
+     14      1 bit   enable_exception_memory         Must be 0.
+
+                                                     Wavefront starts execution
+                                                     with memory violation
+                                                     exceptions exceptions
+                                                     enabled which are generated
+                                                     when a memory violation has
+                                                     occurred for this wave from
+                                                     L1 or LDS
+                                                     (write-to-read-only-memory,
+                                                     mis-aligned atomic, LDS
+                                                     address out of range,
+                                                     illegal address, etc.).
+
+                                                     CP sets the memory
+                                                     violation bit in
+                                                     ``COMPUTE_PGM_RSRC2.EXCP_EN_MSB``
+                                                     according to what the
+                                                     runtime requests.
+     23:15   9 bits  granulated_lds_size             Must be 0.
+
+                                                     CP uses the rounded value
+                                                     from the dispatch packet,
+                                                     not this value, as the
+                                                     dispatch may contain
+                                                     dynamically allocated group
+                                                     segment memory. CP writes
+                                                     directly to
+                                                     ``COMPUTE_PGM_RSRC2.LDS_SIZE``.
+
+                                                     Amount of group segment
+                                                     (LDS) to allocate for each
+                                                     work-group. Granularity is
+                                                     device specific:
+
+                                                     GFX6:
+                                                       roundup(lds-size / (64 * 4))
+                                                     GFX7-GFX9:
+                                                       roundup(lds-size / (128 * 4))
+
+     24      1 bit   enable_exception_ieee_754_fp    Wavefront starts execution
+                     _invalid_operation              with specified exceptions
+                                                     enabled.
+
+                                                     Used by CP to set up
+                                                     ``COMPUTE_PGM_RSRC2.EXCP_EN``
+                                                     (set from bits 0..6).
+
+                                                     IEEE 754 FP Invalid
+                                                     Operation
+     25      1 bit   enable_exception_fp_denormal    FP Denormal one or more
+                     _source                         input operands is a
+                                                     denormal number
+     26      1 bit   enable_exception_ieee_754_fp    IEEE 754 FP Division by
+                     _division_by_zero               Zero
+     27      1 bit   enable_exception_ieee_754_fp    IEEE 754 FP FP Overflow
+                     _overflow
+     28      1 bit   enable_exception_ieee_754_fp    IEEE 754 FP Underflow
+                     _underflow
+     29      1 bit   enable_exception_ieee_754_fp    IEEE 754 FP Inexact
+                     _inexact
+     30      1 bit   enable_exception_int_divide_by  Integer Division by Zero
+                     _zero                           (rcp_iflag_f32 instruction
+                                                     only)
+     31      1 bit                                   Reserved. Must be 0.
+     32      **Total size 4 bytes.**
+     ======= ===================================================================================================================
+
+..
+
+  .. table:: Floating Point Rounding Mode Enumeration Values
+     :name: amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table
+
+     ===================================== ===== ===============================
+     Enumeration Name                      Value Description
+     ===================================== ===== ===============================
+     AMD_FLOAT_ROUND_MODE_NEAR_EVEN        0     Round Ties To Even
+     AMD_FLOAT_ROUND_MODE_PLUS_INFINITY    1     Round Toward +infinity
+     AMD_FLOAT_ROUND_MODE_MINUS_INFINITY   2     Round Toward -infinity
+     AMD_FLOAT_ROUND_MODE_ZERO             3     Round Toward 0
+     ===================================== ===== ===============================
+
+..
+
+  .. table:: Floating Point Denorm Mode Enumeration Values
+     :name: amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table
+
+     ===================================== ===== ===============================
+     Enumeration Name                      Value Description
+     ===================================== ===== ===============================
+     AMD_FLOAT_DENORM_MODE_FLUSH_SRC_DST   0     Flush Source and Destination
+                                                 Denorms
+     AMD_FLOAT_DENORM_MODE_FLUSH_DST       1     Flush Output Denorms
+     AMD_FLOAT_DENORM_MODE_FLUSH_SRC       2     Flush Source Denorms
+     AMD_FLOAT_DENORM_MODE_FLUSH_NONE      3     No Flush
+     ===================================== ===== ===============================
+
+..
+
+  .. table:: System VGPR Work-Item ID Enumeration Values
+     :name: amdgpu-amdhsa-system-vgpr-work-item-id-enumeration-values-table
+
+     ===================================== ===== ===============================
+     Enumeration Name                      Value Description
+     ===================================== ===== ===============================
+     AMD_SYSTEM_VGPR_WORKITEM_ID_X         0     Set work-item X dimension ID.
+     AMD_SYSTEM_VGPR_WORKITEM_ID_X_Y       1     Set work-item X and Y
+                                                 dimensions ID.
+     AMD_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z     2     Set work-item X, Y and Z
+                                                 dimensions ID.
+     AMD_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED 3     Undefined.
+     ===================================== ===== ===============================
+
+.. _amdgpu-amdhsa-initial-kernel-execution-state:
+
+Initial Kernel Execution State
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This section defines the register state that will be set up by the packet
+processor prior to the start of execution of every wavefront. This is limited by
+the constraints of the hardware controllers of CP/ADC/SPI.
+
+The order of the SGPR registers is defined, but the compiler can specify which
+ones are actually setup in the kernel descriptor using the ``enable_sgpr_*`` bit
+fields (see :ref:`amdgpu-amdhsa-kernel-descriptor`). The register numbers used
+for enabled registers are dense starting at SGPR0: the first enabled register is
+SGPR0, the next enabled register is SGPR1 etc.; disabled registers do not have
+an SGPR number.
+
+The initial SGPRs comprise up to 16 User SRGPs that are set by CP and apply to
+all waves of the grid. It is possible to specify more than 16 User SGPRs using
+the ``enable_sgpr_*`` bit fields, in which case only the first 16 are actually
+initialized. These are then immediately followed by the System SGPRs that are
+set up by ADC/SPI and can have different values for each wave of the grid
+dispatch.
+
+SGPR register initial state is defined in
+:ref:`amdgpu-amdhsa-sgpr-register-set-up-order-table`.
+
+  .. table:: SGPR Register Set Up Order
+     :name: amdgpu-amdhsa-sgpr-register-set-up-order-table
+
+     ========== ========================== ====== ==============================
+     SGPR Order Name                       Number Description
+                (kernel descriptor enable  of
+                field)                     SGPRs
+     ========== ========================== ====== ==============================
+     First      Private Segment Buffer     4      V# that can be used, together
+                (enable_sgpr_private              with Scratch Wave Offset as an
+                _segment_buffer)                  offset, to access the private
+                                                  memory space using a segment
+                                                  address.
+
+                                                  CP uses the value provided by
+                                                  the runtime.
+     then       Dispatch Ptr               2      64 bit address of AQL dispatch
+                (enable_sgpr_dispatch_ptr)        packet for kernel dispatch
+                                                  actually executing.
+     then       Queue Ptr                  2      64 bit address of amd_queue_t
+                (enable_sgpr_queue_ptr)           object for AQL queue on which
+                                                  the dispatch packet was
+                                                  queued.
+     then       Kernarg Segment Ptr        2      64 bit address of Kernarg
+                (enable_sgpr_kernarg              segment. This is directly
+                _segment_ptr)                     copied from the
+                                                  kernarg_address in the kernel
+                                                  dispatch packet.
+
+                                                  Having CP load it once avoids
+                                                  loading it at the beginning of
+                                                  every wavefront.
+     then       Dispatch Id                2      64 bit Dispatch ID of the
+                (enable_sgpr_dispatch_id)         dispatch packet being
+                                                  executed.
+     then       Flat Scratch Init          2      This is 2 SGPRs:
+                (enable_sgpr_flat_scratch
+                _init)                            GFX6
+                                                    Not supported.
+                                                  GFX7-GFX8
+                                                    The first SGPR is a 32 bit
+                                                    byte offset from
+                                                    ``SH_HIDDEN_PRIVATE_BASE_VIMID``
+                                                    to per SPI base of memory
+                                                    for scratch for the queue
+                                                    executing the kernel
+                                                    dispatch. CP obtains this
+                                                    from the runtime.
+
+                                                    This is the same offset used
+                                                    in computing the Scratch
+                                                    Segment Buffer base
+                                                    address. The value of
+                                                    Scratch Wave Offset must be
+                                                    added by the kernel machine
+                                                    code and moved to SGPRn-4
+                                                    for use as the FLAT SCRATCH
+                                                    BASE in flat memory
+                                                    instructions.
+
+                                                    The second SGPR is 32 bit
+                                                    byte size of a single
+                                                    work-item’s scratch memory
+                                                    usage. This is directly
+                                                    loaded from the kernel
+                                                    dispatch packet Private
+                                                    Segment Byte Size and
+                                                    rounded up to a multiple of
+                                                    DWORD.
+
+                                                    The kernel code must move to
+                                                    SGPRn-3 for use as the FLAT
+                                                    SCRATCH SIZE in flat memory
+                                                    instructions. Having CP load
+                                                    it once avoids loading it at
+                                                    the beginning of every
+                                                    wavefront.
+                                                  GFX9
+                                                    This is the 64 bit base
+                                                    address of the per SPI
+                                                    scratch backing memory
+                                                    managed by SPI for the queue
+                                                    executing the kernel
+                                                    dispatch. CP obtains this
+                                                    from the runtime (and
+                                                    divides it if there are
+                                                    multiple Shader Arrays each
+                                                    with its own SPI). The value
+                                                    of Scratch Wave Offset must
+                                                    be added by the kernel
+                                                    machine code and moved to
+                                                    SGPRn-4 and SGPRn-3 for use
+                                                    as the FLAT SCRATCH BASE in
+                                                    flat memory instructions.
+     then       Private Segment Size       1      The 32 bit byte size of a
+                (enable_sgpr_private              single work-item’s scratch
+                _segment_size)                    memory allocation. This is the
+                                                  value from the kernel dispatch
+                                                  packet Private Segment Byte
+                                                  Size rounded up by CP to a
+                                                  multiple of DWORD.
+
+                                                  Having CP load it once avoids
+                                                  loading it at the beginning of
+                                                  every wavefront.
+
+                                                  This is not used for
+                                                  GFX7-GFX8 since it is the same
+                                                  value as the second SGPR of
+                                                  Flat Scratch Init. However, it
+                                                  may be needed for GFX9 which
+                                                  changes the meaning of the
+                                                  Flat Scratch Init value.
+     then       Grid Work-Group Count X    1      32 bit count of the number of
+                (enable_sgpr_grid                 work-groups in the X dimension
+                _workgroup_count_X)               for the grid being
+                                                  executed. Computed from the
+                                                  fields in the kernel dispatch
+                                                  packet as ((grid_size.x +
+                                                  workgroup_size.x - 1) /
+                                                  workgroup_size.x).
+     then       Grid Work-Group Count Y    1      32 bit count of the number of
+                (enable_sgpr_grid                 work-groups in the Y dimension
+                _workgroup_count_Y &&             for the grid being
+                less than 16 previous             executed. Computed from the
+                SGPRs)                            fields in the kernel dispatch
+                                                  packet as ((grid_size.y +
+                                                  workgroup_size.y - 1) /
+                                                  workgroupSize.y).
+
+                                                  Only initialized if <16
+                                                  previous SGPRs initialized.
+     then       Grid Work-Group Count Z    1      32 bit count of the number of
+                (enable_sgpr_grid                 work-groups in the Z dimension
+                _workgroup_count_Z &&             for the grid being
+                less than 16 previous             executed. Computed from the
+                SGPRs)                            fields in the kernel dispatch
+                                                  packet as ((grid_size.z +
+                                                  workgroup_size.z - 1) /
+                                                  workgroupSize.z).
+
+                                                  Only initialized if <16
+                                                  previous SGPRs initialized.
+     then       Work-Group Id X            1      32 bit work-group id in X
+                (enable_sgpr_workgroup_id         dimension of grid for
+                _X)                               wavefront.
+     then       Work-Group Id Y            1      32 bit work-group id in Y
+                (enable_sgpr_workgroup_id         dimension of grid for
+                _Y)                               wavefront.
+     then       Work-Group Id Z            1      32 bit work-group id in Z
+                (enable_sgpr_workgroup_id         dimension of grid for
+                _Z)                               wavefront.
+     then       Work-Group Info            1      {first_wave, 14’b0000,
+                (enable_sgpr_workgroup            ordered_append_term[10:0],
+                _info)                            threadgroup_size_in_waves[5:0]}
+     then       Scratch Wave Offset        1      32 bit byte offset from base
+                (enable_sgpr_private              of scratch base of queue
+                _segment_wave_offset)             executing the kernel
+                                                  dispatch. Must be used as an
+                                                  offset with Private
+                                                  segment address when using
+                                                  Scratch Segment Buffer. It
+                                                  must be used to set up FLAT
+                                                  SCRATCH for flat addressing
+                                                  (see
+                                                  :ref:`amdgpu-amdhsa-flat-scratch`).
+     ========== ========================== ====== ==============================
+
+The order of the VGPR registers is defined, but the compiler can specify which
+ones are actually setup in the kernel descriptor using the ``enable_vgpr*`` bit
+fields (see :ref:`amdgpu-amdhsa-kernel-descriptor`). The register numbers used
+for enabled registers are dense starting at VGPR0: the first enabled register is
+VGPR0, the next enabled register is VGPR1 etc.; disabled registers do not have a
+VGPR number.
+
+VGPR register initial state is defined in
+:ref:`amdgpu-amdhsa-vgpr-register-set-up-order-table`.
+
+  .. table:: VGPR Register Set Up Order
+     :name: amdgpu-amdhsa-vgpr-register-set-up-order-table
+
+     ========== ========================== ====== ==============================
+     VGPR Order Name                       Number Description
+                (kernel descriptor enable  of
+                field)                     VGPRs
+     ========== ========================== ====== ==============================
+     First      Work-Item Id X             1      32 bit work item id in X
+                (Always initialized)              dimension of work-group for
+                                                  wavefront lane.
+     then       Work-Item Id Y             1      32 bit work item id in Y
+                (enable_vgpr_workitem_id          dimension of work-group for
+                > 0)                              wavefront lane.
+     then       Work-Item Id Z             1      32 bit work item id in Z
+                (enable_vgpr_workitem_id          dimension of work-group for
+                > 1)                              wavefront lane.
+     ========== ========================== ====== ==============================
+
+The setting of registers is is done by GPU CP/ADC/SPI hardware as follows:
+
+1. SGPRs before the Work-Group Ids are set by CP using the 16 User Data
+   registers.
+2. Work-group Id registers X, Y, Z are set by ADC which supports any
+   combination including none.
+3. Scratch Wave Offset is set by SPI in a per wave basis which is why its value
+   cannot included with the flat scratch init value which is per queue.
+4. The VGPRs are set by SPI which only supports specifying either (X), (X, Y)
+   or (X, Y, Z).
+
+Flat Scratch register pair are adjacent SGRRs so they can be moved as a 64 bit
+value to the hardware required SGPRn-3 and SGPRn-4 respectively.
+
+The global segment can be accessed either using buffer instructions (GFX6 which
+has V# 64 bit address support), flat instructions (GFX7-9), or global
+instructions (GFX9).
+
+If buffer operations are used then the compiler can generate a V# with the
+following properties:
+
+* base address of 0
+* no swizzle
+* ATC: 1 if IOMMU present (such as APU)
+* ptr64: 1
+* MTYPE set to support memory coherence that matches the runtime (such as CC for
+  APU and NC for dGPU).
+
+.. _amdgpu-amdhsa-kernel-prolog:
+
+Kernel Prolog
+~~~~~~~~~~~~~
+
+.. _amdgpu-amdhsa-m0:
+
+M0
+++
+
+GFX6-GFX8
+  The M0 register must be initialized with a value at least the total LDS size
+  if the kernel may access LDS via DS or flat operations. Total LDS size is
+  available in dispatch packet. For M0, it is also possible to use maximum
+  possible value of LDS for given target (0x7FFF for GFX6 and 0xFFFF for
+  GFX7-GFX8).
+GFX9
+  The M0 register is not used for range checking LDS accesses and so does not
+  need to be initialized in the prolog.
+
+.. _amdgpu-amdhsa-flat-scratch:
+
+Flat Scratch
+++++++++++++
+
+If the kernel may use flat operations to access scratch memory, the prolog code
+must set up FLAT_SCRATCH register pair (FLAT_SCRATCH_LO/FLAT_SCRATCH_HI which
+are in SGPRn-4/SGPRn-3). Initialization uses Flat Scratch Init and Scratch Wave
+Offset SGPR registers (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`):
+
+GFX6
+  Flat scratch is not supported.
+
+GFX7-8
+  1. The low word of Flat Scratch Init is 32 bit byte offset from
+     ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory
+     being managed by SPI for the queue executing the kernel dispatch. This is
+     the same value used in the Scratch Segment Buffer V# base address. The
+     prolog must add the value of Scratch Wave Offset to get the wave's byte
+     scratch backing memory offset from ``SH_HIDDEN_PRIVATE_BASE_VIMID``. Since
+     FLAT_SCRATCH_LO is in units of 256 bytes, the offset must be right shifted
+     by 8 before moving into FLAT_SCRATCH_LO.
+  2. The second word of Flat Scratch Init is 32 bit byte size of a single
+     work-items scratch memory usage. This is directly loaded from the kernel
+     dispatch packet Private Segment Byte Size and rounded up to a multiple of
+     DWORD. Having CP load it once avoids loading it at the beginning of every
+     wavefront. The prolog must move it to FLAT_SCRATCH_LO for use as FLAT SCRATCH
+     SIZE.
+GFX9
+  The Flat Scratch Init is the 64 bit address of the base of scratch backing
+  memory being managed by SPI for the queue executing the kernel dispatch. The
+  prolog must add the value of Scratch Wave Offset and moved to the FLAT_SCRATCH
+  pair for use as the flat scratch base in flat memory instructions.
+
+.. _amdgpu-amdhsa-memory-model:
+
+Memory Model
+~~~~~~~~~~~~
+
+This section describes the mapping of LLVM memory model onto AMDGPU machine code
+(see :ref:`memmodel`). *The implementation is WIP.*
+
+.. TODO
+   Update when implementation complete.
+
+   Support more relaxed OpenCL memory model to be controlled by environment
+   component of target triple.
+
+The AMDGPU backend supports the memory synchronization scopes specified in
+:ref:`amdgpu-memory-scopes`.
+
+The code sequences used to implement the memory model are defined in table
+:ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table`.
+
+The sequences specify the order of instructions that a single thread must
+execute. The ``s_waitcnt`` and ``buffer_wbinvl1_vol`` are defined with respect
+to other memory instructions executed by the same thread. This allows them to be
+moved earlier or later which can allow them to be combined with other instances
+of the same instruction, or hoisted/sunk out of loops to improve
+performance. Only the instructions related to the memory model are given;
+additional ``s_waitcnt`` instructions are required to ensure registers are
+defined before being used. These may be able to be combined with the memory
+model ``s_waitcnt`` instructions as described above.
+
+The AMDGPU memory model supports both the HSA [HSA]_ memory model, and the
+OpenCL [OpenCL]_ memory model. The HSA memory model uses a single happens-before
+relation for all address spaces (see :ref:`amdgpu-address-spaces`). The OpenCL
+memory model which has separate happens-before relations for the global and
+local address spaces, and only a fence specifying both global and local address
+space joins the relationships. Since the LLVM ``memfence`` instruction does not
+allow an address space to be specified the OpenCL fence has to convervatively
+assume both local and global address space was specified. However, optimizations
+can often be done to eliminate the additional ``s_waitcnt``instructions when
+there are no intervening corresponding ``ds/flat_load/store/atomic`` memory
+instructions. The code sequences in the table indicate what can be omitted for
+the OpenCL memory. The target triple environment is used to determine if the
+source language is OpenCL (see :ref:`amdgpu-opencl`).
+
+``ds/flat_load/store/atomic`` instructions to local memory are termed LDS
+operations.
+
+``buffer/global/flat_load/store/atomic`` instructions to global memory are
+termed vector memory operations.
+
+For GFX6-GFX9:
+
+* Each agent has multiple compute units (CU).
+* Each CU has multiple SIMDs that execute wavefronts.
+* The wavefronts for a single work-group are executed in the same CU but may be
+  executed by different SIMDs.
+* Each CU has a single LDS memory shared by the wavefronts of the work-groups
+  executing on it.
+* All LDS operations of a CU are performed as wavefront wide operations in a
+  global order and involve no caching. Completion is reported to a wavefront in
+  execution order.
+* The LDS memory has multiple request queues shared by the SIMDs of a
+  CU. Therefore, the LDS operations performed by different waves of a work-group
+  can be reordered relative to each other, which can result in reordering the
+  visibility of vector memory operations with respect to LDS operations of other
+  wavefronts in the same work-group. A ``s_waitcnt lgkmcnt(0)`` is required to
+  ensure synchronization between LDS operations and vector memory operations
+  between waves of a work-group, but not between operations performed by the
+  same wavefront.
+* The vector memory operations are performed as wavefront wide operations and
+  completion is reported to a wavefront in execution order. The exception is
+  that for GFX7-9 ``flat_load/store/atomic`` instructions can report out of
+  vector memory order if they access LDS memory, and out of LDS operation order
+  if they access global memory.
+* The vector memory operations access a vector L1 cache shared by all wavefronts
+  on a CU. Therefore, no special action is required for coherence between
+  wavefronts in the same work-group. A ``buffer_wbinvl1_vol`` is required for
+  coherence between waves executing in different work-groups as they may be
+  executing on different CUs.
+* The scalar memory operations access a scalar L1 cache shared by all wavefronts
+  on a group of CUs. The scalar and vector L1 caches are not coherent. However,
+  scalar operations are used in a restricted way so do not impact the memory
+  model. See :ref:`amdgpu-amdhsa-memory-spaces`.
+* The vector and scalar memory operations use an L2 cache shared by all CUs on
+  the same agent.
+* The L2 cache has independent channels to service disjoint ranges of virtual
+  addresses.
+* Each CU has a separate request queue per channel. Therefore, the vector and
+  scalar memory operations performed by waves executing in different work-groups
+  (which may be executing on different CUs) of an agent can be reordered
+  relative to each other. A ``s_waitcnt vmcnt(0)`` is required to ensure
+  synchronization between vector memory operations of different CUs. It ensures a
+  previous vector memory operation has completed before executing a subsequent
+  vector memory or LDS operation and so can be used to meet the requirements of
+  acquire and release.
+* The L2 cache can be kept coherent with other agents on some targets, or ranges
+  of virtual addresses can be set up to bypass it to ensure system coherence.
+
+Private address space uses ``buffer_load/store`` using the scratch V# (GFX6-8),
+or ``scratch_load/store`` (GFX9). Since only a single thread is accessing the
+memory, atomic memory orderings are not meaningful and all accesses are treated
+as non-atomic.
+
+Constant address space uses ``buffer/global_load`` instructions (or equivalent
+scalar memory instructions). Since the constant address space contents do not
+change during the execution of a kernel dispatch it is not legal to perform
+stores, and atomic memory orderings are not meaningful and all access are
+treated as non-atomic.
+
+A memory synchronization scope wider than work-group is not meaningful for the
+group (LDS) address space and is treated as work-group.
+
+The memory model does not support the region address space which is treated as
+non-atomic.
+
+Acquire memory ordering is not meaningful on store atomic instructions and is
+treated as non-atomic.
+
+Release memory ordering is not meaningful on load atomic instructions and is
+treated a non-atomic.
+
+Acquire-release memory ordering is not meaningful on load or store atomic
+instructions and is treated as acquire and release respectively.
+
+AMDGPU backend only uses scalar memory operations to access memory that is
+proven to not change during the execution of the kernel dispatch. This includes
+constant address space and global address space for program scope const
+variables. Therefore the kernel machine code does not have to maintain the
+scalar L1 cache to ensure it is coherent with the vector L1 cache. The scalar
+and vector L1 caches are invalidated between kernel dispatches by CP since
+constant address space data may change between kernel dispatch executions. See
+:ref:`amdgpu-amdhsa-memory-spaces`.
+
+The one execption is if scalar writes are used to spill SGPR registers. In this
+case the AMDGPU backend ensures the memory location used to spill is never
+accessed by vector memory operations at the same time. If scalar writes are used
+then a ``s_dcache_wb`` is inserted before the ``s_endpgm`` and before a function
+return since the locations may be used for vector memory instructions by a
+future wave that uses the same scratch area, or a function call that creates a
+frame at the same address, respectively. There is no need for a ``s_dcache_inv``
+as all scalar writes are write-before-read in the same thread.
+
+Scratch backing memory (which is used for the private address space) is accessed
+with MTYPE NC_NV (non-coherenent non-volatile). Since the private address space
+is only accessed by a single thread, and is always write-before-read,
+there is never a need to invalidate these entries from the L1 cache. Hence all
+cache invalidates are done as ``*_vol`` to only invalidate the volatile cache
+lines.
+
+On dGPU the kernarg backing memory is accessed as UC (uncached) to avoid needing
+to invalidate the L2 cache. This also causes it to be treated as non-volatile
+and so is not invalidated by ``*_vol``. On APU it is accessed as CC (cache
+coherent) and so the L2 cache will coherent with the CPU and other agents.
+
+  .. table:: AMDHSA Memory Model Code Sequences GFX6-GFX9
+     :name: amdgpu-amdhsa-memory-model-code-sequences-gfx6-gfx9-table
+
+     ============ ============ ============== ========== =======================
+     LLVM Instr   LLVM Memory  LLVM Memory    AMDGPU     AMDGPU Machine Code
+                  Ordering     Sync Scope     Address
+                                              Space
+     ============ ============ ============== ========== =======================
+     **Non-Atomic**
+     ---------------------------------------------------------------------------
+     load         *none*       *none*         - global   non-volatile
+                                              - generic    1. buffer/global/flat_load
+                                                         volatile
+                                                           1. buffer/global/flat_load
+                                                              glc=1
+     load         *none*       *none*         - local    1. ds_load
+     store        *none*       *none*         - global   1. buffer/global/flat_store
+                                              - generic
+     store        *none*       *none*         - local    1. ds_store
+     **Unordered Atomic**
+     ---------------------------------------------------------------------------
+     load atomic  unordered    *any*          *any*      *Same as non-atomic*.
+     store atomic unordered    *any*          *any*      *Same as non-atomic*.
+     atomicrmw    unordered    *any*          *any*      *Same as monotonic
+                                                         atomic*.
+     **Monotonic Atomic**
+     ---------------------------------------------------------------------------
+     load atomic  monotonic    - singlethread - global   1. buffer/global/flat_load
+                               - wavefront    - generic
+                               - workgroup
+     load atomic  monotonic    - singlethread - local    1. ds_load
+                               - wavefront
+                               - workgroup
+     load atomic  monotonic    - agent        - global   1. buffer/global/flat_load
+                               - system       - generic     glc=1
+     store atomic monotonic    - singlethread - global   1. buffer/global/flat_store
+                               - wavefront    - generic
+                               - workgroup
+                               - agent
+                               - system
+     store atomic monotonic    - singlethread - local    1. ds_store
+                               - wavefront
+                               - workgroup
+     atomicrmw    monotonic    - singlethread - global   1. buffer/global/flat_atomic
+                               - wavefront    - generic
+                               - workgroup
+                               - agent
+                               - system
+     atomicrmw    monotonic    - singlethread - local    1. ds_atomic
+                               - wavefront
+                               - workgroup
+     **Acquire Atomic**
+     ---------------------------------------------------------------------------
+     load atomic  acquire      - singlethread - global   1. buffer/global/ds/flat_load
+                               - wavefront    - local
+                                              - generic
+     load atomic  acquire      - workgroup    - global   1. buffer/global_load
+     load atomic  acquire      - workgroup    - local    1. ds/flat_load
+                                              - generic  2. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the load
+                                                             atomic value being
+                                                             acquired.
+
+     load atomic  acquire      - agent        - global   1. buffer/global_load
+                               - system                     glc=1
+                                                         2. s_waitcnt vmcnt(0)
+
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the load
+                                                             has completed
+                                                             before invalidating
+                                                             the cache.
+
+                                                         3. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following
+                                                             loads will not see
+                                                             stale global data.
+
+     load atomic  acquire      - agent        - generic  1. flat_load glc=1
+                               - system                  2. s_waitcnt vmcnt(0) &
+                                                            lgkmcnt(0)
+
+                                                           - If OpenCL omit
+                                                             lgkmcnt(0).
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the flat_load
+                                                             has completed
+                                                             before invalidating
+                                                             the cache.
+
+                                                         3. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     atomicrmw    acquire      - singlethread - global   1. buffer/global/ds/flat_atomic
+                               - wavefront    - local
+                                              - generic
+     atomicrmw    acquire      - workgroup    - global   1. buffer/global_atomic
+     atomicrmw    acquire      - workgroup    - local    1. ds/flat_atomic
+                                              - generic  2. waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the
+                                                             atomicrmw value
+                                                             being acquired.
+
+     atomicrmw    acquire      - agent        - global   1. buffer/global_atomic
+                               - system                  2. s_waitcnt vmcnt(0)
+
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             cache.
+
+                                                         3. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     atomicrmw    acquire      - agent        - generic  1. flat_atomic
+                               - system                  2. s_waitcnt vmcnt(0) &
+                                                            lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             cache.
+
+                                                         3. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     fence        acquire      - singlethread *none*     *none*
+                               - wavefront
+     fence        acquire      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             waitcnt. However,
+                                                             since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate. If
+                                                             fence had an
+                                                             address space then
+                                                             set to address
+                                                             space of OpenCL
+                                                             fence flag, or to
+                                                             generic if both
+                                                             local and global
+                                                             flags are
+                                                             specified.
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the
+                                                             value read by the
+                                                             fence-paired-atomic.
+
+     fence        acquire      - agent        *none*     1. s_waitcnt vmcnt(0) &
+                               - system                     lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                             However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             group/generic load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures that the
+                                                             fence-paired atomic
+                                                             has completed
+                                                             before invalidating
+                                                             the
+                                                             cache. Therefore
+                                                             any following
+                                                             locations read must
+                                                             be no older than
+                                                             the value read by
+                                                             the
+                                                             fence-paired-atomic.
+
+                                                         2. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     **Release Atomic**
+     ---------------------------------------------------------------------------
+     store atomic release      - singlethread - global   1. buffer/global/ds/flat_store
+                               - wavefront    - local
+                                              - generic
+     store atomic release      - workgroup    - global   1. s_waitcnt lgkmcnt(0)
+                                              - generic
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             store.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             store that is being
+                                                             released.
+
+                                                         2. buffer/global/flat_store
+     store atomic release      - workgroup    - local    1. ds_store
+     store atomic release      - agent        - global   1. s_waitcnt vmcnt(0) &
+                               - system       - generic     lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             store.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global have
+                                                             completed before
+                                                             performing the
+                                                             store that is being
+                                                             released.
+
+                                                         2. buffer/global/ds/flat_store
+     atomicrmw    release      - singlethread - global   1. buffer/global/ds/flat_atomic
+                               - wavefront    - local
+                                              - generic
+     atomicrmw    release      - workgroup    - global   1. s_waitcnt lgkmcnt(0)
+                                              - generic
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. buffer/global/flat_atomic
+     atomicrmw    release      - workgroup    - local    1. ds_atomic
+     atomicrmw    release      - agent        - global   1. s_waitcnt vmcnt(0) &
+                               - system       - generic     lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global and local
+                                                             have completed
+                                                             before performing
+                                                             the atomicrmw that
+                                                             is being released.
+
+                                                         2. buffer/global/ds/flat_atomic*
+     fence        release      - singlethread *none*     *none*
+                               - wavefront
+     fence        release      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             waitcnt. However,
+                                                             since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             any following store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             following
+                                                             fence-paired-atomic.
+
+     fence        release      - agent        *none*     1. s_waitcnt vmcnt(0) &
+                               - system                     lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                             However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             any following store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global have
+                                                             completed before
+                                                             performing the
+                                                             following
+                                                             fence-paired-atomic.
+
+     **Acquire-Release Atomic**
+     ---------------------------------------------------------------------------
+     atomicrmw    acq_rel      - singlethread - global   1. buffer/global/ds/flat_atomic
+                               - wavefront    - local
+                                              - generic
+     atomicrmw    acq_rel      - workgroup    - global   1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. buffer/global_atomic
+     atomicrmw    acq_rel      - workgroup    - local    1. ds_atomic
+                                                         2. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the load
+                                                             atomic value being
+                                                             acquired.
+
+     atomicrmw    acq_rel      - workgroup    - generic  1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. flat_atomic
+                                                         3. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             waitcnt.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures any
+                                                             following global
+                                                             data read is no
+                                                             older than the load
+                                                             atomic value being
+                                                             acquired.
+     atomicrmw    acq_rel      - agent        - global   1. s_waitcnt vmcnt(0) &
+                               - system                     lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. buffer/global_atomic
+                                                         3. s_waitcnt vmcnt(0)
+
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             cache.
+
+                                                         4. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     atomicrmw    acq_rel      - agent        - generic  1. s_waitcnt vmcnt(0) &
+                               - system                     lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to global have
+                                                             completed before
+                                                             performing the
+                                                             atomicrmw that is
+                                                             being released.
+
+                                                         2. flat_atomic
+                                                         3. s_waitcnt vmcnt(0) &
+                                                            lgkmcnt(0)
+
+                                                           - If OpenCL, omit
+                                                             lgkmcnt(0).
+                                                           - Must happen before
+                                                             following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures the
+                                                             atomicrmw has
+                                                             completed before
+                                                             invalidating the
+                                                             cache.
+
+                                                         4. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data.
+
+     fence        acq_rel      - singlethread *none*     *none*
+                               - wavefront
+     fence        acq_rel      - workgroup    *none*     1. s_waitcnt lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             waitcnt. However,
+                                                             since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures that all
+                                                             memory operations
+                                                             to local have
+                                                             completed before
+                                                             performing any
+                                                             following global
+                                                             memory operations.
+                                                           - Ensures that the
+                                                             preceding
+                                                             local/generic load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic)
+                                                             has completed
+                                                             before following
+                                                             global memory
+                                                             operations. This
+                                                             satisfies the
+                                                             requirements of
+                                                             acquire.
+                                                           - Ensures that all
+                                                             previous memory
+                                                             operations have
+                                                             completed before a
+                                                             following
+                                                             local/generic store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                             This satisfies the
+                                                             requirements of
+                                                             release.
+
+     fence        acq_rel      - agent        *none*     1. s_waitcnt vmcnt(0) &
+                               - system                     lgkmcnt(0)
+
+                                                           - If OpenCL and
+                                                             address space is
+                                                             not generic, omit
+                                                             lgkmcnt(0).
+                                                             However, since LLVM
+                                                             currently has no
+                                                             address space on
+                                                             the fence need to
+                                                             conservatively
+                                                             always generate
+                                                             (see comment for
+                                                             previous fence).
+                                                           - Could be split into
+                                                             separate s_waitcnt
+                                                             vmcnt(0) and
+                                                             s_waitcnt
+                                                             lgkmcnt(0) to allow
+                                                             them to be
+                                                             independently moved
+                                                             according to the
+                                                             following rules.
+                                                           - s_waitcnt vmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             global/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - s_waitcnt lgkmcnt(0)
+                                                             must happen after
+                                                             any preceding
+                                                             local/generic
+                                                             load/store/load
+                                                             atomic/store
+                                                             atomic/atomicrmw.
+                                                           - Must happen before
+                                                             the following
+                                                             buffer_wbinvl1_vol.
+                                                           - Ensures that the
+                                                             preceding
+                                                             global/local/generic
+                                                             load
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic)
+                                                             has completed
+                                                             before invalidating
+                                                             the cache. This
+                                                             satisfies the
+                                                             requirements of
+                                                             acquire.
+                                                           - Ensures that all
+                                                             previous memory
+                                                             operations have
+                                                             completed before a
+                                                             following
+                                                             global/local/generic
+                                                             store
+                                                             atomic/atomicrmw
+                                                             with an equal or
+                                                             wider sync scope
+                                                             and memory ordering
+                                                             stronger than
+                                                             unordered (this is
+                                                             termed the
+                                                             fence-paired-atomic).
+                                                             This satisfies the
+                                                             requirements of
+                                                             release.
+
+                                                         2. buffer_wbinvl1_vol
+
+                                                           - Must happen before
+                                                             any following
+                                                             global/generic
+                                                             load/load
+                                                             atomic/store/store
+                                                             atomic/atomicrmw.
+                                                           - Ensures that
+                                                             following loads
+                                                             will not see stale
+                                                             global data. This
+                                                             satisfies the
+                                                             requirements of
+                                                             acquire.
+
+     **Sequential Consistent Atomic**
+     ---------------------------------------------------------------------------
+     load atomic  seq_cst      - singlethread - global   *Same as corresponding
+                               - wavefront    - local    load atomic acquire*.
+                               - workgroup    - generic
+     load atomic  seq_cst      - agent        - global   1. s_waitcnt vmcnt(0)
+                               - system       - local
+                                              - generic    - Must happen after
+                                                             preceding
+                                                             global/generic load
+                                                             atomic/store
+                                                             atomic/atomicrmw
+                                                             with memory
+                                                             ordering of seq_cst
+                                                             and with equal or
+                                                             wider sync scope.
+                                                             (Note that seq_cst
+                                                             fences have their
+                                                             own s_waitcnt
+                                                             vmcnt(0) and so do
+                                                             not need to be
+                                                             considered.)
+                                                           - Ensures any
+                                                             preceding
+                                                             sequential
+                                                             consistent global
+                                                             memory instructions
+                                                             have completed
+                                                             before executing
+                                                             this sequentially
+                                                             consistent
+                                                             instruction. This
+                                                             prevents reordering
+                                                             a seq_cst store
+                                                             followed by a
+                                                             seq_cst load (Note
+                                                             that seq_cst is
+                                                             stronger than
+                                                             acquire/release as
+                                                             the reordering of
+                                                             load acquire
+                                                             followed by a store
+                                                             release is
+                                                             prevented by the
+                                                             waitcnt vmcnt(0) of
+                                                             the release, but
+                                                             there is nothing
+                                                             preventing a store
+                                                             release followed by
+                                                             load acquire from
+                                                             competing out of
+                                                             order.)
+
+                                                         2. *Following
+                                                            instructions same as
+                                                            corresponding load
+                                                            atomic acquire*.
+
+     store atomic seq_cst      - singlethread - global   *Same as corresponding
+                               - wavefront    - local    store atomic release*.
+                               - workgroup    - generic
+     store atomic seq_cst      - agent        - global   *Same as corresponding
+                               - system       - generic  store atomic release*.
+     atomicrmw    seq_cst      - singlethread - global   *Same as corresponding
+                               - wavefront    - local    atomicrmw acq_rel*.
+                               - workgroup    - generic
+     atomicrmw    seq_cst      - agent        - global   *Same as corresponding
+                               - system       - generic  atomicrmw acq_rel*.
+     fence        seq_cst      - singlethread *none*     *Same as corresponding
+                               - wavefront               fence acq_rel*.
+                               - workgroup
+                               - agent
+                               - system
+     ============ ============ ============== ========== =======================
+
+The memory order also adds the single thread optimization constrains defined in
+table
+:ref:`amdgpu-amdhsa-memory-model-single-thread-optimization-constraints-gfx6-gfx9-table`.
+
+  .. table:: AMDHSA Memory Model Single Thread Optimization Constraints GFX6-GFX9
+     :name: amdgpu-amdhsa-memory-model-single-thread-optimization-constraints-gfx6-gfx9-table
+
+     ============ ==============================================================
+     LLVM Memory  Optimization Constraints
+     Ordering
+     ============ ==============================================================
+     unordered    *none*
+     monotonic    *none*
+     acquire      - If a load atomic/atomicrmw then no following load/load
+                    atomic/store/ store atomic/atomicrmw/fence instruction can
+                    be moved before the acquire.
+                  - If a fence then same as load atomic, plus no preceding
+                    associated fence-paired-atomic can be moved after the fence.
+     release      - If a store atomic/atomicrmw then no preceding load/load
+                    atomic/store/ store atomic/atomicrmw/fence instruction can
+                    be moved after the release.
+                  - If a fence then same as store atomic, plus no following
+                    associated fence-paired-atomic can be moved before the
+                    fence.
+     acq_rel      Same constraints as both acquire and release.
+     seq_cst      - If a load atomic then same constraints as acquire, plus no
+                    preceding sequentially consistent load atomic/store
+                    atomic/atomicrmw/fence instruction can be moved after the
+                    seq_cst.
+                  - If a store atomic then the same constraints as release, plus
+                    no following sequentially consistent load atomic/store
+                    atomic/atomicrmw/fence instruction can be moved before the
+                    seq_cst.
+                  - If an atomicrmw/fence then same constraints as acq_rel.
+     ============ ==============================================================
+
+Trap Handler ABI
+~~~~~~~~~~~~~~~~
+
+For code objects generated by AMDGPU backend for HSA [HSA]_ compatible runtimes
+(such as ROCm [AMD-ROCm]_), the runtime installs a trap handler that supports
+the ``s_trap`` instruction with the following usage:
+
+  .. table:: AMDGPU Trap Handler for AMDHSA OS
+     :name: amdgpu-trap-handler-for-amdhsa-os-table
+
+     =================== =============== =============== =======================
+     Usage               Code Sequence   Trap Handler    Description
+                                         Inputs
+     =================== =============== =============== =======================
+     reserved            ``s_trap 0x00``                 Reserved by hardware.
+     ``debugtrap(arg)``  ``s_trap 0x01`` ``SGPR0-1``:    Reserved for HSA
+                                           ``queue_ptr`` ``debugtrap``
+                                         ``VGPR0``:      intrinsic (not
+                                           ``arg``       implemented).
+     ``llvm.trap``       ``s_trap 0x02`` ``SGPR0-1``:    Causes dispatch to be
+                                           ``queue_ptr`` terminated and its
+                                                         associated queue put
+                                                         into the error state.
+     ``llvm.debugtrap``  ``s_trap 0x03`` ``SGPR0-1``:    If debugger not
+                                           ``queue_ptr`` installed handled
+                                                         same as ``llvm.trap``.
+     debugger breakpoint ``s_trap 0x07``                 Reserved for  debugger
+                                                         breakpoints.
+     debugger            ``s_trap 0x08``                 Reserved for debugger.
+     debugger            ``s_trap 0xfe``                 Reserved for debugger.
+     debugger            ``s_trap 0xff``                 Reserved for debugger.
+     =================== =============== =============== =======================
+
+Non-AMDHSA
+----------
+
+Trap Handler ABI
+~~~~~~~~~~~~~~~~
+
+For code objects generated by AMDGPU backend for non-amdhsa OS, the runtime does
+not install a trap handler. The ``llvm.trap`` and ``llvm.debugtrap``
+instructions are handled as follows:
+
+  .. table:: AMDGPU Trap Handler for Non-AMDHSA OS
+     :name: amdgpu-trap-handler-for-non-amdhsa-os-table
+
+     =============== =============== ===========================================
+     Usage           Code Sequence   Description
+     =============== =============== ===========================================
+     llvm.trap       s_