<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 2/4/19 9:18 PM, Robin Kruppe wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAJrduR7bEMR=utbUx5Rcdiiry9nYt2xPawG9+j+vCqBt=3_unw@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Mon, 4 Feb 2019 at 18:15,
            David Greene via llvm-dev <<a
              href="mailto:llvm-dev@lists.llvm.org"
              moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">Simon Moll <<a
              href="mailto:moll@cs.uni-saarland.de" target="_blank"
              moz-do-not-send="true">moll@cs.uni-saarland.de</a>>
            writes:<br>
            <br>
            > You are referring to the sub-vector sizes, if I am
            understanding<br>
            > correctly. I'd assume that the mask sub-vector length
            always has to be<br>
            > either 1 or the same as the data sub-vector length. For
            example, this<br>
            > is ok:<br>
            ><br>
            > %result = call <scalable 3 x float>
            @llvm.evl.fsub.v4f32(<scalable 3 x<br>
            > float> %x, <scalable 3 x float> %y,
            <scalable 1 x i1> %M, i32 %L)<br>
            <br>
            What does <scalable 1 x i1> applied to <scalable 3
            x float> mean?  I<br>
            would expect a requirement of <scalable 3 x i1>.  At
            least that's how I<br>
            understood the SVE proposal [1].  The n's in <scalable n
            x type> have to<br>
            match.<br>
          </blockquote>
          <div><br>
          </div>
          <div>I believe the idea is to allow each single mask bit to
            control multiple consecutive lanes at once, effectively
            interpreting the vector being operated on as "many short
            fixed-length vectors, concatenated" rather than a single
            long vector of scalars. This is a different interpretation
            of that type than usual, but it's not crazy, e.g. a similar
            reinterpretation of vector types seems to be the favored
            approach for adding matrix operations to LLVM IR. It
            somewhat obscures the point to discuss this only for
            scalable vectors; there's no conceptual reason why one
            couldn't do the same with fixed-size vectors.</div>
          <div><br>
          </div>
          <div>In fact, I would recommend against making almost any new
            feature or intrinsic exclusive to scalable vectors,
            including this one: there shouldn't be much extra code
            required to allow and support it, and not doing so makes the
            IR less orthogonal. For example, if a <scalable 4 x
            float> fadd with a <scalable 1 x i1> mask works,
            then <4 x float> fadd with a <1 x i1> mask, a
            <8 x float> fadd with a <2 x i1> mask, etc.
            should also be possible overloads of the same intrinsic.<br>
          </div>
        </div>
      </div>
    </blockquote>
    Yep. Doing the same for standard vector IR is on the radar:
    <a class="moz-txt-link-freetext" href="https://reviews.llvm.org/D57504#1380587">https://reviews.llvm.org/D57504#1380587</a>.<br>
    <blockquote type="cite"
cite="mid:CAJrduR7bEMR=utbUx5Rcdiiry9nYt2xPawG9+j+vCqBt=3_unw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>So far, so good. A bit odd, when I think about it, but if
            hardware out there has that capability, maybe this is a good
            way to encode it in IR (other options might work too,
            though). The crux, however, is the interaction with the
            dynamic vector length: is it in terms of the mask? the
            longer data vector? if the latter, what happens if it isn't
            divisible by the mask length? There are multiple options and
            it's not clear to me which one is "the right one", both for
            architectures with native support (hopefully the one brought
            up here won't be the only one) and for internal consistency
            of the IR. If there were an established architecture with
            this kind of feature where people have gathered lots of
            practical experience with it, we could use that to inform the
            decision (just as we have for ordinary predication and
            dynamic vector length). But I'm not aware of any
            architecture that does this other than the one Jacob and
            lkcl are working on, and as far as I know their project is
            still in the early stages.<br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>The current understanding is that the dynamic vector length
      operates at the granularity of the mask:
      <a class="moz-txt-link-freetext" href="https://reviews.llvm.org/D57504#1381211">https://reviews.llvm.org/D57504#1381211</a></p>
    <p>For unscaled (fixed-width) IR types, this means VL masks each
      scalar result; for scalable types, VL masks whole sub-vectors.
      E.g., for %L == 1 the following call produces a pair of floats
      as the result:<br>
    </p>
    <p><span class="transaction-comment"
        data-sigil="transaction-comment" data-meta="0_31">
        <div class="gmail_quote">
          <pre class="remarkup-code">   <scalable 2 x float> evl.fsub(<scalable 2 x float> %x, <scalable 2 x float> %y, <scalable 2 x i1> %M, i32 %L)

</pre>
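          <p>A minimal Python sketch of this reading of VL, as an
            illustration only. The name evl_fsub_model, the sub_len
            parameter, the use of None for inactive lanes and the choice
            of vscale = 2 are assumptions made for the example, not part
            of the proposal:</p>
<pre>
```python
def evl_fsub_model(x, y, mask, vl, sub_len):
    """Model a predicated fsub where vl counts sub-vectors of sub_len lanes.

    Hypothetical helper, not an LLVM API. Inactive lanes are shown as None
    (an assumption; the proposal may leave them undefined).
    """
    assert len(x) == len(y) == len(mask)
    assert len(x) % sub_len == 0
    result = []
    for lane, (a, b, m) in enumerate(zip(x, y, mask)):
        sub_vector = lane // sub_len  # index of the sub-vector holding this lane
        active = bool(m) and vl > sub_vector
        result.append(a - b if active else None)
    return result


# scalable 2 x float with an assumed vscale of 2: two sub-vectors of two lanes
x = [5.0, 6.0, 7.0, 8.0]
y = [1.0, 1.0, 1.0, 1.0]
mask = [True, True, True, True]
print(evl_fsub_model(x, y, mask, vl=1, sub_len=2))  # [4.0, 5.0, None, None]
```
</pre>
          <p>With vl=1 only the first sub-vector is computed, which is
            the pair of floats mentioned above; the mask can still
            disable individual lanes inside an active sub-vector.</p>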
          <p><span class="transaction-comment"
              data-sigil="transaction-comment" data-meta="0_31"><span
                class="transaction-comment"
                data-sigil="transaction-comment" data-meta="0_31"><span
                  class="transaction-comment"
                  data-sigil="transaction-comment" data-meta="0_31">I
                  agree that we should only consider the tied sub-vector
                  case for this first version and keep discussing the
                  unconstrained version. It is seductively easy to allow
                  this but impossible to take it back.</span></span></span></p>
          <pre class="remarkup-code"><span class="transaction-comment" data-sigil="transaction-comment" data-meta="0_31"><span class="transaction-comment" data-sigil="transaction-comment" data-meta="0_31"><span class="transaction-comment" data-sigil="transaction-comment" data-meta="0_31">---
</span></span></span></pre>
          <p><span class="transaction-comment"
              data-sigil="transaction-comment" data-meta="0_31"><span
                class="transaction-comment"
                data-sigil="transaction-comment" data-meta="0_31"><span
                  class="transaction-comment"
                  data-sigil="transaction-comment" data-meta="0_31">The
                  story is different when we talk only(!) about memory
                  accesses and having different vector sizes in the
                  operands and the transferred type (result type for
                  loads, value operand type for stores):</span></span></span></p>
          <p class="remarkup-code">E.g., on AVX, this call could turn
            into a 64-bit gather operation of pairs of floats:<br>
          </p>
          <pre><tt>    <16 x float> llvm.evl.gather.v16f32(<8 x float*> %Ptr, <8 x i1> mask %M, i32 vlen 8)</tt></pre>
        </div>
      </span><span class="transaction-comment"
        data-sigil="transaction-comment" data-meta="0_31">
        <div class="gmail_quote"><span class="transaction-comment"
            data-sigil="transaction-comment" data-meta="0_31">And there
            is a native 16 x 16 element load (VLD2D) on SX-Aurora, which
            may be represented as:<br>
          </span></div>
      </span><span class="transaction-comment"
        data-sigil="transaction-comment" data-meta="0_31">
        <div class="gmail_quote"><span class="transaction-comment"
            data-sigil="transaction-comment" data-meta="0_31"><span
              class="transaction-comment"
              data-sigil="transaction-comment" data-meta="0_31">
              <pre><tt>    <scalable 256 x double> llvm.evl.gather.nxv16f64(<scalable 16 x double*> %Ptr, <scalable 16 x i1> mask %M, i32 vlen 16)</tt>

</pre>
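              <p>Both memory examples above have the same shape: one
                pointer and one mask bit per transferred sub-vector. A
                rough Python sketch of that reading follows, for
                illustration only. The name evl_gather_model, the flat
                list standing in for memory, integer indices standing in
                for pointers, and None for inactive lanes are all
                assumptions, not part of the proposal:</p>
<pre>
```python
def evl_gather_model(memory, ptrs, mask, vl, sub_len):
    """Model a gather where each pointer transfers sub_len consecutive elements.

    Hypothetical helper, not an LLVM API. memory is a flat list and ptrs
    holds start indices into it. Inactive sub-vectors are filled with None
    (an assumption; the proposal may leave them undefined).
    """
    assert len(ptrs) == len(mask)
    result = []
    for i, (p, m) in enumerate(zip(ptrs, mask)):
        if m and vl > i:
            result.extend(memory[p:p + sub_len])  # one whole sub-vector
        else:
            result.extend([None] * sub_len)
    return result


# AVX-style case: 8 pointers, 2 floats each, 16 result lanes
memory = [float(i) for i in range(64)]
ptrs = [0, 8, 16, 24, 32, 40, 48, 56]
mask = [True, False, True, True, True, True, True, True]
out = evl_gather_model(memory, ptrs, mask, vl=8, sub_len=2)
print(len(out))  # 16
print(out[:6])   # [0.0, 1.0, None, None, 16.0, 17.0]
```
</pre>
              <p>The SX-Aurora case would have the same structure with
                sub_len = 16, so that 16 pointers yield 16 sub-vectors
                of 16 doubles each.</p>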
            </span></span></div>
      </span><span class="transaction-comment"
        data-sigil="transaction-comment" data-meta="0_31">
        <div class="gmail_quote"><span class="transaction-comment"
            data-sigil="transaction-comment" data-meta="0_31">- Simon<br>
          </span></div>
      </span></p>
    <pre class="moz-signature" cols="72">-- 

Simon Moll
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : <a class="moz-txt-link-abbreviated" href="mailto:moll@cs.uni-saarland.de">moll@cs.uni-saarland.de</a>
Fax. +49 (0)681 302-3065  : <a class="moz-txt-link-freetext" href="http://compilers.cs.uni-saarland.de/people/moll">http://compilers.cs.uni-saarland.de/people/moll</a></pre>
  </body>
</html>