<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 2/4/19 10:40 PM, Robin Kruppe wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAJrduR6UYt2+N+XTadYzxgTNe_pDEaUDtyemq-ot5KyK5c29BQ@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Mon, 4 Feb 2019 at 22:04,
            Simon Moll <<a href="mailto:moll@cs.uni-saarland.de"
              moz-do-not-send="true">moll@cs.uni-saarland.de</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div bgcolor="#FFFFFF">
              <div class="gmail-m_1539775417466410328moz-cite-prefix">On
                2/4/19 9:18 PM, Robin Kruppe wrote:<br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">
                  <div dir="ltr"><br>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Mon, 4 Feb 2019
                      at 18:15, David Greene via llvm-dev <<a
                        href="mailto:llvm-dev@lists.llvm.org"
                        target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">Simon Moll <<a
                        href="mailto:moll@cs.uni-saarland.de"
                        target="_blank" moz-do-not-send="true">moll@cs.uni-saarland.de</a>>
                      writes:<br>
                      <br>
                      > You are referring to the sub-vector sizes, if
I am understanding<br>
                      > correctly. I'd assume that the mask
                      sub-vector length always has to be<br>
                      > either 1 or the same as the data sub-vector
                      length. For example, this<br>
                      > is ok:<br>
                      ><br>
                      > %result = call <scalable 3 x float>
@llvm.evl.fsub.nxv3f32(<scalable 3 x<br>
                      > float> %x, <scalable 3 x float> %y,
                      <scalable 1 x i1> %M, i32 %L)<br>
                      <br>
                      What does <scalable 1 x i1> applied to
                      <scalable 3 x float> mean?  I<br>
                      would expect a requirement of <scalable 3 x
                      i1>.  At least that's how I<br>
                      understood the SVE proposal [1].  The n's in
                      <scalable n x type> have to<br>
                      match.<br>
                    </blockquote>
                    <div><br>
                    </div>
                    <div>I believe the idea is to allow each single mask
                      bit to control multiple consecutive lanes at once,
                      effectively interpreting the vector being operated
                      on as "many short fixed-length vectors,
                      concatenated" rather than a single long vector of
                      scalars. This is a different interpretation of
                      that type than usual, but it's not crazy, e.g. a
                      similar reinterpretation of vector types seems to
                      be the favored approach for adding matrix
                      operations to LLVM IR. It somewhat obscures the
point to discuss this only for scalable vectors;
                      there's no conceptual reason why one couldn't do
                      the same with fixed size vectors.</div>
                    <div><br>
                    </div>
                    <div>In fact, I would recommend against making
                      almost any new feature or intrinsic exclusive to
                      scalable vectors, including this one: there
                      shouldn't be much extra code required to allow and
                      support it, and not doing so makes the IR less
                      orthogonal. For example, if a <scalable 4 x
                      float> fadd with a <scalable 1 x i1> mask
                      works, then <4 x float> fadd with a <1 x
i1> mask, an <8 x float> fadd with a <2
                      x i1> mask, etc. should also be possible
                      overloads of the same intrinsic.<br>
                    </div>
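<div><br>
</div>
<div>As a sketch (the unprefixed intrinsic name and the exact
  overload mangling here are my assumption, not settled syntax),
  the per-lane and per-sub-vector variants would differ only in
  the mask type:</div>
<pre>    ; one mask bit per lane (the usual case)
    <8 x float> evl.fadd(<8 x float> %x, <8 x float> %y, <8 x i1> %M, i32 %L)
    ; one mask bit per sub-vector of four lanes
    <8 x float> evl.fadd(<8 x float> %x, <8 x float> %y, <2 x i1> %M, i32 %L)</pre>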
                  </div>
                </div>
              </blockquote>
              Yep. Doing the same for standard vector IR is on the
              radar: <a
                class="gmail-m_1539775417466410328moz-txt-link-freetext"
                href="https://reviews.llvm.org/D57504#1380587"
                target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D57504#1380587</a>.<br>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <div><br>
                    </div>
                    <div>So far, so good. A bit odd, when I think about
                      it, but if hardware out there has that capability,
                      maybe this is a good way to encode it in IR (other
                      options might work too, though). The crux,
                      however, is the interaction with the dynamic
                      vector length: is it in terms of the mask? the
                      longer data vector? if the latter, what happens if
                      it isn't divisible by the mask length? There are
                      multiple options and it's not clear to me which
                      one is "the right one", both for architectures
with native support (hopefully the one brought up
                      here won't be the only one) and for internal
                      consistency of the IR. If there was an established
                      architecture with this kind of feature where
                      people have gathered lots of practical experience
with it, we could use that to inform the decision
                      (just as we have for ordinary predication and
                      dynamic vector length). But I'm not aware of any
                      architecture that does this other than the one
                      Jacob and lkcl are working on, and as far as I
know their project is still in the early stages.<br>
                    </div>
                  </div>
                </div>
              </blockquote>
<p>The current understanding is that the dynamic vector
  length operates at the granularity of the mask: <a
                  class="gmail-m_1539775417466410328moz-txt-link-freetext"
                  href="https://reviews.llvm.org/D57504#1381211"
                  target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D57504#1381211</a></p>
            </div>
          </blockquote>
          <div>I do understand that this is what Jacob proposes based on
            the architecture he works on. However, it is not yet clear
            to me whether that is the most useful option overall, nor
            that it is the only option that will lead to reasonable
            codegen for their architecture. But let's leave discussion
            of the details on Phab. I just want to highlight one issue
            that is not specific to Jacob's angle, as it relates to the
            interpretation of scalable vectors more generally:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div bgcolor="#FFFFFF">
<p>For unscaled IR types, this means VL masks each scalar
  result; for scaled types, VL masks sub-vectors. E.g. for
  %L == 1 the following call produces a pair of floats as
  the result:<br>
              </p>
              <div class="gmail_quote">
                <pre class="gmail-m_1539775417466410328remarkup-code">   <scalable 2 x float> evl.fsub(<scalable 2 x float> %x, <scalable 2 x float> %y, <scalable 2 x i1> %M, i32 %L)</pre>
              </div>
            </div>
          </blockquote>
          <div>As I wrote on Phab mere minutes before you sent this
            email, I do not think this is the right interpretation for
            any architecture I know about (I do not know anything about
            the things Jacob and Luke are working on) nor from the POV
            of the scalable vector types proposal. A scalable vector is
            not conventionally "a variable-length vector of fixed-size
vectors", it is simply an ordinary "flat" vector whose
            length happens to be mostly unknown at compile time. If some
            intrinsics want to interpret it differently, that is fine,
            but that's a property of those specific intrinsics --
            similar to how proposed matrix intrinsics might interpret a
            16 element vector as a 4x4 matrix.<br>
          </div>
        </div>
      </div>
    </blockquote>
<p>On NEC SX-Aurora, the vector length is always interpreted in
  64-bit data chunks. That is one example of a real architecture
  where the vscaled interpretation of VL makes sense.<br>
    </p>
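<p>A sketch of what this looks like in the proposed intrinsics
  (the type shapes mirror the VLD2D example further down; the exact
  widths and the unprefixed intrinsic name are my assumption): a
  packed 32-bit float operation pairs up two floats per 64-bit
  chunk, and %L counts those chunks:<br>
</p>
<pre>    <scalable 512 x float> evl.fsub(<scalable 512 x float> %x, <scalable 512 x float> %y, <scalable 256 x i1> %M, i32 %L)</pre>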
    <blockquote type="cite"
cite="mid:CAJrduR6UYt2+N+XTadYzxgTNe_pDEaUDtyemq-ot5KyK5c29BQ@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div bgcolor="#FFFFFF">
              <div class="gmail_quote">
                <p><span
                    class="gmail-m_1539775417466410328transaction-comment"><span
class="gmail-m_1539775417466410328transaction-comment"><span
                        class="gmail-m_1539775417466410328transaction-comment">I
                        agree that we should only consider the tied
                        sub-vector case for this first version and keep
                        discussing the unconstrained version. It is
                        seductively easy to allow this but impossible to
                        take it back.</span></span></span></p>
                <pre class="gmail-m_1539775417466410328remarkup-code"><span class="gmail-m_1539775417466410328transaction-comment"><span class="gmail-m_1539775417466410328transaction-comment"><span class="gmail-m_1539775417466410328transaction-comment">---
</span></span></span></pre>
                <p><span
                    class="gmail-m_1539775417466410328transaction-comment"><span
class="gmail-m_1539775417466410328transaction-comment"><span
                        class="gmail-m_1539775417466410328transaction-comment">The
                        story is different when we talk only(!) about
memory accesses with different vector
                        sizes in the operands and the transferred type
                        (result type for loads, value operand type for
                        stores):</span></span></span></p>
<p class="gmail-m_1539775417466410328remarkup-code">E.g.,
  on AVX, this call could turn into a 64-bit gather
  operation loading pairs of floats:<br>
                </p>
                <pre><tt>    <16 x float> llvm.evl.gather.v16f32(<8 x float*> %Ptr, <8 x i1> mask %M, i32 vlen 8)</tt></pre>
              </div>
            </div>
          </blockquote>
          <div>Is that IR you'd expect someone to generate (or a backend
            to consume) for this operation? It seems like a rather
            unnatural or "magical" way to represent the intent (load 64b
            each from 8 pointers), at least with the way I'm thinking
            about it. I'd expect a gather of 8xi64 and a bitcast.</div>
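<div><br>
</div>
<div>That is, roughly (a sketch: the overload naming is assumed,
  and %Ptr64 is a hypothetical i64* version of the pointer
  vector):</div>
<pre>    %w = <8 x i64> evl.gather(<8 x i64*> %Ptr64, <8 x i1> %M, i32 8)
    %v = bitcast <8 x i64> %w to <16 x float></pre>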
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div bgcolor="#FFFFFF">
              <div class="gmail_quote"> </div>
              <span
                class="gmail-m_1539775417466410328transaction-comment">
                <div class="gmail_quote"><span
                    class="gmail-m_1539775417466410328transaction-comment">And
                    there is a native 16 x 16 element load (VLD2D) on
                    SX-Aurora, which may be represented as:<br>
                  </span></div>
              </span><span
                class="gmail-m_1539775417466410328transaction-comment">
                <div class="gmail_quote"><span
                    class="gmail-m_1539775417466410328transaction-comment"><span
class="gmail-m_1539775417466410328transaction-comment">
                      <pre><tt>    <scalable 256 x double> llvm.evl.gather.nxv256f64(<scalable 16 x double*> %Ptr, <scalable 16 x i1> mask %M, i32 vlen 16)</tt></pre>
                    </span></span></div>
              </span></div>
          </blockquote>
          <div>In contrast to the above I can't very well say one should
            write this as a gather of i1024, but it also seems like a
            rather specialized instruction (presumably used for blocked
            processing of matrices?) so I can't say that this on its own
            motivates me to complicate a proposed core IR construct.<br>
          </div>
        </div>
      </div>
    </blockquote>
    It actually reduces complexity by shifting it from the address
    computation into the instruction. This would cover all three cases:
    VLD2D, the <2 x float> gather on AVX, and <W x float> loads
    for this early RISC-V based architecture that Jacob and lkcl are
    working on. However, this is not a top priority and we can leave it
    out of the first version.<br>
    <blockquote type="cite"
cite="mid:CAJrduR6UYt2+N+XTadYzxgTNe_pDEaUDtyemq-ot5KyK5c29BQ@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>Cheers,</div>
          <div>Robin</div>
          <br>
        </div>
      </div>
    </blockquote>
    - Simon<br>
    <pre class="moz-signature" cols="72">-- 

Simon Moll
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : <a class="moz-txt-link-abbreviated" href="mailto:moll@cs.uni-saarland.de">moll@cs.uni-saarland.de</a>
Fax. +49 (0)681 302-3065  : <a class="moz-txt-link-freetext" href="http://compilers.cs.uni-saarland.de/people/moll">http://compilers.cs.uni-saarland.de/people/moll</a></pre>
  </body>
</html>