<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    TL;DR This RFC is a joint effort by Intel and Saarland University to

    bring the divergence analysis of the Region Vectorizer [1,2,3,4,5]

    (dubbed the vectorization analysis of RV) to LLVM. The

    implementation is available on github for feedback [0]. The existing

    divergence analysis infrastructure in LLVM has conceptual

    limitations (structured control, SCEV based). The new analysis

    resolves bugs for the GPU backends from the first patch on and will

    be used by VPlan [6] in the mid term. The implementation is based on

    RV, the Region Vectorizer.<br>

    <br>

    <br>

    -- RV --<br>

    <br>

    The Region Vectorizer (github [1]) is an analysis and transformation

    framework for outer-loop and whole-function vectorization. RV

    vectorizes arbitrary, reducible control flow including nested

    divergent loops through partial control-flow linearization [2]. RV

    is being used by the Impala [7] and the PACXX [3,8] high performance

    programming frameworks and implements OpenMP #pragma omp simd and

    #pragma omp declare simd.<br>

    <br>

    <br>

    -- The Divergence Analysis --<br>

    <br>

    The Divergence Analysis determines how instructions will behave if

    executed in lockstep for multiple threads or vector lanes. The loop

    vectorizer (VPlan) uses divergence information to generate correct

    and efficient vector code. The GPU backends rely on it to

    re-structure the CFG.<br>

    An instruction is uniform if it evaluates to the same result for all

    (active) threads. Otherwise, it is is called divergent. Divergent

    branch conditions cause control to diverge at branches and can turn

    loops into divergent loops. Divergent branches induce divergence in

    phi nodes, if the branch divergence implies that a phi node could be

    reached by different threads from two distinct predecessors at once

    or from two distinct loop iterations at once (temporal divergence).<br>

    <br>

    Example A: Divergence analysis result with regard to the outer loop.<br>

    <tt>for (int thread_id = 0; thread_id < n; ++thread_id) {</tt><tt>

      // thread_id divergent<br>

    </tt><tt>  V = 0; // V uniform</tt><tt><br>

    </tt><tt>  if (A[thread_id] > 0) { // divergent branch</tt><tt><br>

    </tt><tt>    V = 1; // V uniform</tt><tt><br>

    </tt><tt>  }</tt><tt><br>

    </tt><tt>  // divergent phi for V (control induced)</tt><tt><br>

    </tt><tt>  int j = 0; // uniform</tt><tt><br>

    </tt><tt>  for (; j < m; ++j) { // uniform loop exit</tt><tt><br>

    </tt><tt>    if (B[j] > 50) { // uniform branch</tt><tt><br>

    </tt><tt>      C[j][thread_id] = 42;</tt><tt><br>

    </tt><tt>    }</tt><tt><br>

    </tt><tt>    if (D[j] < V) break; // divergent loop exit</tt><tt><br>

    </tt><tt>  }</tt><tt><br>

    </tt><tt>  x = j; // divergent (temporal; due to divergent loop)</tt><tt><br>

    </tt><tt>}</tt><br>

    <br>

    <br>

    -- Motivation --<br>

    <br>

    The LoopVectorizer and VPlan currently use SCEV to detect uniform

    values. The StructurizeCFG pass and the AMDGPU backends rely on

    LLVM’s existing DivergenceAnalysis pass for SPMD kernels. Both

    approaches have shortcomings that are resolved by the new analysis.<br>

    <br>

    <br>

    - Why the current DivergenceAnalysis is broken -<br>

    <br>

    LLVM already includes a DivergenceAnalysis for GPU kernels. It is

    used in the GPU backends and to re-structure CFGs (StructurizeCFG

    pass). However, the existing DivergenceAnalysis of LLVM is broken

    for unstructured CFGs (bug report [9]):<br>

    <tt><br>

    </tt>Example B: undetected divergence in unstructured control.<tt><br>

    </tt><tt>entry:</tt><tt><br class="kix-line-break">

    </tt><tt>  divergent branch to B and C</tt><tt><br

        class="kix-line-break">

    </tt><tt>B:</tt><tt><br class="kix-line-break">

    </tt><tt>  uniform branch to C and D</tt><tt><br

        class="kix-line-break">

    </tt><tt>C:</tt><tt><br class="kix-line-break">

    </tt><tt>  %phi.var.hidden = phi .. <- undetected DIVERGENT phi</tt><tt><br

        class="kix-line-break">

    </tt><tt>  br D</tt><tt><br class="kix-line-break">

    </tt><tt>D:</tt><tt><br class="kix-line-break">

    </tt><tt>  %phi.var = phi .. <- detected DIVERGENT phi in

      IPostDom of entry</tt><tt><br class="kix-line-break">

    </tt><tt>  Ret</tt><br>

    <br>

    The existing DivergenceAnalysis expects control to re-converge only

    at the immediate post dominator of a branch, which is not the case

    for unstructured control as shown in the example. The new analysis

    supports unstructured control flow.<br>

    <br>

    <br>

    -- Why SCEV is not enough --<br>

    <br>

    SCEV is restricted to induction variables and (piecewise) polynomial

    functions. The divergence analysis on the other hand is able to

    detect uniform instructions and branches in arbitrary programs. The

    key takeaway here is that “there are uniform values that are not

    invariant.” (<a

href="http://lists.llvm.org/pipermail/llvm-dev/2017-December/119522.html"

      style="text-decoration:none;">http://lists.llvm.org/pipermail/llvm-dev/2017-December/119522.html</a>).<br>

    <br>

    Example A: non-affine loop (Test NonAffineUniLoop.ll)<br>

    <tt>for (int i = 0; i < n; ++i) {</tt><tt><br>

    </tt><tt>  for (int len = 1; len * 2 < n; len *= 2) {</tt><tt><br>

    </tt><tt>    for (int k = 0; k < n k += 2 * len) {</tt><tt><br>

    </tt><tt>      A[k * n + i] += A[(k + len) * n + i];</tt><tt><br>

    </tt><tt>    }</tt><tt><br>

    </tt><tt>  }</tt><tt><br>

    </tt><tt>}</tt><br>

    <br>

    The variable “len” is uniform with regards to the outer i-loop. SCEV

    does not detect this:<br>

    <tt>%mul44 U: [0,-1) S: [-2147483648,2147483647)</tt><tt><br>

    </tt><tt>Exits: <<Unknown>></tt><tt><br>

    </tt><tt>LoopDispositions: { %for.body8.lr.ph: Variant, %for.body:

      Variant, %for.body8: Invariant }</tt><tt><br>

    </tt><br>

    <br>

    -- The new DivergenceAnalysis --<br>

    <br>

    The proposed DivergenceAnalysis detects divergence in unstructured,

    reducible control flow and nested divergent loops. The

    implementation provides light-weight frontends for use in VPlan

    (LoopDivergenceAnalysis) and the analysis of GPU kernels

    (GPUDivergenceAnalysis). Both frontends build on a single, shared

    implementation of the actual divergence analysis algorithm.<br>

    <br>

    Apart from solving current issues with divergence detection in LLVM,

    the new analysis will enable the adoption of advanced vectorization

    techniques in VPlan that are already implemented in RV, for example

    partial linearization [2], automatic conversion of divergent loops,

    etc).<br>

    <br>

    <br>

    --- Planned patches ---<br>

    <br>

    -- Patch set 1 (reference impl. [0]) --<br>

    o New DivergenceAnalysis class, which implements a basic divergence

    analysis (uniform/divergent) with support for unstructured,

    reducible control and nested divergent loops. The analysis operates

    on LLVM IR.GPUDivergenceAnalysis: analysis frontend for SPMD kernels

    (fixes the unstructured control bug [9]). Acts as a drop-in

    replacement for the existing DivergenceAnalysis in LLVM (opt-in flag

    -use-rv-da).<br>

    o LoopDivergenceAnalysis: frontend for loop vectorization in

    preparation of VPlan [6].<br>

    o The existing DivergenceAnalysis gets renamed to

    KernelDivergenceAnalysis.<br>

    o LIT tests for both frontends.<br>

    <br>

    -- Patch set 2 --<br>

    o Integration with VPlan to detect uniform values in outer loop

    vectorization (opt-in flag -vectorizer-use-da).<br>

    o More expressive analysis lattice (uniform/consecutive/strided…  +

    alignment). This is the full sa lattice (stride+alignment lattice)

    used in RV [3].<br>

    <br>

    -- Patch set 3 --<br>

    o Templatize the divergence analysis to operate directly on VPlans

    also (block/instruction type parametric).<br>

    <br>

    <br>

    We provide the implementation of this RFC as a fork of LLVM on

    github [0]. We welcome your feedback. This implementation will be

    the source of upcoming patches.<br>

    <br>

    <br>

    Thanks,<br>

    <br>

    Simon<br>

    <br>

    <br>

    <br>

    <br>

    [0] VPlan+RV: <a class="moz-txt-link-freetext"

      href="https://github.com/cdl-saarland/vplan-rv">https://github.com/cdl-saarland/vplan-rv</a><br>

    [1] RV on github: <a href="https://github.com/cdl-saarland/rv"

      style="text-decoration:none;">https://github.com/cdl-saarland/rv</a><br>

    [2] Partial Control-Flow Linearization, Moll, Hack, PLDI ‘18 (to

    appear): <a moz-do-not-send="true"

      href="http://compilers.cs.uni-saarland.de/papers/moll_parlin_pldi18.pdf">http://compilers.cs.uni-saarland.de/papers/moll_parlin_pldi18.pdf</a><br>

    [3] PACXXv2 + RV: An LLVM-based Portable High-Performance

    Programming Model: <a

      href="https://dl.acm.org/citation.cfm?id=3148185"

      style="text-decoration:none;">https://dl.acm.org/citation.cfm?id=3148185</a><br>

    [4] VPlan+RV SRC talk at US DevMtg ‘17: <a

      href="https://llvm.org/devmtg/2017-10/#src1"

      style="text-decoration:none;">https://llvm.org/devmtg/2017-10/#src1</a><br>

    [5] VPlan+RV Lightning talk at EuroLLVM ‘18: <a

      href="http://llvm.org/devmtg/2018-04/talks.html#Lightning_16"

      style="text-decoration:none;">http://llvm.org/devmtg/2018-04/talks.html#Lightning_16</a><br>

    [6] VPlan proposal: <a

      href="https://llvm.org/docs/Proposals/VectorizationPlan.html"

      style="text-decoration:none;">https://llvm.org/docs/Proposals/VectorizationPlan.html</a><br>

    [7] Impala: <a href="https://anydsl.github.io/Impala"

      style="text-decoration:none;">https://anydsl.github.io/Impala</a><br>

    [8] PACXX: <a href="https://github.com/pacxx"

      style="text-decoration:none;">https://github.com/pacxx</a><br>

    [9] DivergenceAnalysis bug <a

      href="https://bugs.llvm.org/show_bug.cgi?id=37185"

      style="text-decoration:none;">https://bugs.llvm.org/show_bug.cgi?id=37185</a><br>

    <br>

    <pre class="moz-signature" cols="72">-- 


Simon Moll

Researcher / PhD Student


Compiler Design Lab (Prof. Hack)

Saarland University, Computer Science

Building E1.3, Room 4.31


Tel. +49 (0)681 302-57521 : <a class="moz-txt-link-abbreviated" href="mailto:moll@cs.uni-saarland.de">moll@cs.uni-saarland.de</a>

Fax. +49 (0)681 302-3065  : <a class="moz-txt-link-freetext" href="http://compilers.cs.uni-saarland.de/people/moll">http://compilers.cs.uni-saarland.de/people/moll</a></pre>

  </body>

</html>