<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 2/4/19 10:40 PM, Robin Kruppe wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAJrduR6UYt2+N+XTadYzxgTNe_pDEaUDtyemq-ot5KyK5c29BQ@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, 4 Feb 2019 at 22:04,
Simon Moll <<a href="mailto:moll@cs.uni-saarland.de"
moz-do-not-send="true">moll@cs.uni-saarland.de</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div class="gmail-m_1539775417466410328moz-cite-prefix">On
2/4/19 9:18 PM, Robin Kruppe wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, 4 Feb 2019
at 18:15, David Greene via llvm-dev <<a
href="mailto:llvm-dev@lists.llvm.org"
target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Simon Moll <<a
href="mailto:moll@cs.uni-saarland.de"
target="_blank" moz-do-not-send="true">moll@cs.uni-saarland.de</a>>
writes:<br>
<br>
> You are referring to the sub-vector sizes, if
i am understanding<br>
> correctly. I'd assume that the mask
sub-vector length always has to be<br>
> either 1 or the same as the data sub-vector
length. For example, this<br>
> is ok:<br>
><br>
> %result = call <scalable 3 x float>
@llvm.evl.fsub.v4f32(<scalable 3 x<br>
> float> %x, <scalable 3 x float> %y,
<scalable 1 x i1> %M, i32 %L)<br>
<br>
What does <scalable 1 x i1> applied to
<scalable 3 x float> mean? I<br>
would expect a requirement of <scalable 3 x
i1>. At least that's how I<br>
understood the SVE proposal [1]. The n's in
<scalable n x type> have to<br>
match.<br>
</blockquote>
<div><br>
</div>
<div>I believe the idea is to allow each single mask
bit to control multiple consecutive lanes at once,
effectively interpreting the vector being operated
on as "many short fixed-length vectors,
concatenated" rather than a single long vector of
scalars. This is a different interpretation of
that type than usual, but it's not crazy, e.g. a
similar reinterpretation of vector types seems to
be the favored approach for adding matrix
operations to LLVM IR. It somewhat obscures the
point to discuss this only for scalable vectors;
there's no conceptual reason why one couldn't do
the same with fixed-size vectors.</div>
<div><br>
</div>
<div>In fact, I would recommend against making
almost any new feature or intrinsic exclusive to
scalable vectors, including this one: there
shouldn't be much extra code required to allow and
support it, and not doing so makes the IR less
orthogonal. For example, if a <scalable 4 x
float> fadd with a <scalable 1 x i1> mask
works, then <4 x float> fadd with a <1 x
i1> mask, an <8 x float> fadd with a <2
x i1> mask, etc. should also be possible
overloads of the same intrinsic.<br>
</div>
</div>
</div>
</blockquote>
Yep. Doing the same for standard vector IR is on the
radar: <a
class="gmail-m_1539775417466410328moz-txt-link-freetext"
href="https://reviews.llvm.org/D57504#1380587"
target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D57504#1380587</a>.<br>
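The grouped-mask interpretation above can be sketched as a small Python reference model (purely illustrative; the function and parameter names are not part of any proposal): each mask bit governs a block of len(x) // len(mask) consecutive lanes.

```python
def group_masked_fadd(x, y, mask):
    # Each mask bit governs a sub-vector of len(x) // len(mask)
    # consecutive lanes, so an 8-element fadd with a 2-bit mask
    # enables or disables lanes four at a time.
    assert len(x) == len(y) and len(x) % len(mask) == 0
    g = len(x) // len(mask)  # lanes per mask bit
    return [a + b if mask[i // g] else None  # None marks a masked-off lane
            for i, (a, b) in enumerate(zip(x, y))]
```

With mask=[True, False] on eight lanes, the first four lanes are computed and the last four are masked off, mirroring the <8 x float> fadd with <2 x i1> mask example above.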
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>So far, so good. A bit odd, when I think about
it, but if hardware out there has that capability,
maybe this is a good way to encode it in IR (other
options might work too, though). The crux,
however, is the interaction with the dynamic
vector length: is it counted in terms of the mask,
or of the longer data vector? If the latter, what
happens if it isn't divisible by the mask length? There are
multiple options and it's not clear to me which
one is "the right one", both for architectures
with native support (hopefully the one brought up
here won't be the only one) and for internal
consistency of the IR. If there was an established
architecture with this kind of feature where
people have gathered lots of practical experience
with it, we could use that to inform the decision
(just as we have for ordinary predication and
dynamic vector length). But I'm not aware of any
architecture that does this other than the one
Jacob and lkcl are working on, and as far as I
know their project is still in the early stages.<br>
</div>
</div>
</div>
</blockquote>
<p>The current understanding is that the dynamic vector
length operates at the granularity of the mask: <a
class="gmail-m_1539775417466410328moz-txt-link-freetext"
href="https://reviews.llvm.org/D57504#1381211"
target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D57504#1381211</a></p>
</div>
</blockquote>
<div>I do understand that this is what Jacob proposes based on
the architecture he works on. However, it is not yet clear
to me whether that is the most useful option overall, nor
that it is the only option that will lead to reasonable
codegen for their architecture. But let's leave discussion
of the details on Phab. I just want to highlight one issue
that is not specific to Jacob's angle, as it relates to the
interpretation of scalable vectors more generally:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>For unscaled IR types, this means VL masks each scalar
result; for scaled types, VL masks sub-vectors. E.g. for
%L == 1 the following call produces a pair of floats as
the result:<br>
</p>
<div class="gmail_quote">
<pre class="gmail-m_1539775417466410328remarkup-code"> <scalable 2 x float> evl.fsub(<scalable 2 x float> %x, <scalable 2 x float> %y, <scalable 2 x i1> %M, i32 %L)</pre>
</div>
</div>
</blockquote>
<div>As I wrote on Phab mere minutes before you sent this
email, I do not think this is the right interpretation for
any architecture I know about (I do not know anything about
the things Jacob and Luke are working on) nor from the POV
of the scalable vector types proposal. A scalable vector is
not conventionally "a variable-length vector of fixed-size
vectors"; it is simply an ordinary "flat" vector whose
length happens to be mostly unknown at compile time. If some
intrinsics want to interpret it differently, that is fine,
but that's a property of those specific intrinsics --
similar to how proposed matrix intrinsics might interpret a
16 element vector as a 4x4 matrix.<br>
</div>
</div>
</div>
</blockquote>
<p>On NEC SX-Aurora the vector length is always interpreted in 64-bit
data chunks. That is one example of a real architecture where the
vscaled interpretation of VL makes sense.<br>
</p>
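A minimal Python sketch of that chunk-granular reading (illustrative only, not the SX-Aurora ISA): both the mask and the dynamic vector length operate on 64-bit chunks, so for <scalable 2 x float> each unit of the vector length enables one pair of f32 lanes.

```python
def evl_fsub_pairs(x, y, mask, vl):
    # x and y hold f32 lanes packed two per 64-bit chunk; the mask and
    # the dynamic vector length vl are counted in chunks, so chunk i
    # is active iff i < vl and mask[i].
    assert len(x) == len(y) == 2 * len(mask)
    out = []
    for i, (a, b) in enumerate(zip(x, y)):
        chunk = i // 2
        out.append(a - b if chunk < vl and mask[chunk] else None)
    return out
```

With vl=1 this yields exactly one pair of floats, matching the %L == 1 example quoted above.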
<blockquote type="cite"
cite="mid:CAJrduR6UYt2+N+XTadYzxgTNe_pDEaUDtyemq-ot5KyK5c29BQ@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div class="gmail_quote">
<p><span
class="gmail-m_1539775417466410328transaction-comment"><span
class="gmail-m_1539775417466410328transaction-comment"><span
class="gmail-m_1539775417466410328transaction-comment">I
agree that we should only consider the tied
sub-vector case for this first version and keep
discussing the unconstrained version. It is
seductively easy to allow this but impossible to
take it back.</span></span></span></p>
<pre class="gmail-m_1539775417466410328remarkup-code"><span class="gmail-m_1539775417466410328transaction-comment"><span class="gmail-m_1539775417466410328transaction-comment"><span class="gmail-m_1539775417466410328transaction-comment">---
</span></span></span></pre>
<p><span
class="gmail-m_1539775417466410328transaction-comment"><span
class="gmail-m_1539775417466410328transaction-comment"><span
class="gmail-m_1539775417466410328transaction-comment">The
story is different when we talk only(!) about
memory accesses and having different vector
sizes in the operands and the transferred type
(result type for loads, value operand type for
stores):</span></span></span></p>
<p class="gmail-m_1539775417466410328remarkup-code">E.g.
on AVX, this call could turn into a 64-bit gather
operation of pairs of floats:<br>
</p>
<pre><tt> <16 x float> llvm.evl.gather.v16f32(<8 x float*> %Ptr, <8 x i1> mask %M, i32 vlen 8)</tt></pre>
</div>
</div>
</blockquote>
<div>Is that IR you'd expect someone to generate (or a backend
to consume) for this operation? It seems like a rather
unnatural or "magical" way to represent the intent (load 64b
each from 8 pointers), at least with the way I'm thinking
about it. I'd expect a gather of 8xi64 and a bitcast.</div>
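This i64-gather-plus-bitcast alternative can be checked for equivalence with a small Python model (memory modeled as a flat list of f32 values and pointers as indices; purely illustrative):

```python
import struct

def gather_f32_pairs(memory, ptrs):
    # Direct form: each pointer loads two consecutive f32 lanes,
    # producing 2 * len(ptrs) floats.
    return [memory[p + k] for p in ptrs for k in (0, 1)]

def gather_i64_bitcast(memory, ptrs):
    # Alternative form: gather one 64-bit chunk per pointer, then
    # reinterpret (bitcast) each chunk as a pair of f32 lanes.
    chunks = [struct.pack('<2f', memory[p], memory[p + 1]) for p in ptrs]
    return [f for c in chunks for f in struct.unpack('<2f', c)]
```

Both forms produce the same lanes; the difference is only in how the pairing is expressed in the IR.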
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div class="gmail_quote"> </div>
<span
class="gmail-m_1539775417466410328transaction-comment">
<div class="gmail_quote"><span
class="gmail-m_1539775417466410328transaction-comment">And
there is a native 16 x 16 element load (VLD2D) on
SX-Aurora, which may be represented as:<br>
</span></div>
</span><span
class="gmail-m_1539775417466410328transaction-comment">
<div class="gmail_quote"><span
class="gmail-m_1539775417466410328transaction-comment"><span
class="gmail-m_1539775417466410328transaction-comment">
<pre><tt> <scalable 256 x double> llvm.evl.gather.nxv16f64(<scalable 16 x double*> %Ptr, <scalable 16 x i1> mask %M, i32 vlen 16)</tt></pre>
</span></span></div>
</span></div>
</blockquote>
<div>In contrast to the above, I can't very well say one should
write this as a gather of i1024, but it also seems like a
rather specialized instruction (presumably used for blocked
processing of matrices?) so I can't say that this on its own
motivates me to complicate a proposed core IR construct.<br>
</div>
</div>
</div>
</blockquote>
It actually reduces complexity by shifting it from the address
computation into the instruction. This would cover all three cases:
VLD2D, the <2 x float> gather on AVX, and the <W x
float> loads for the early RISC-V-based architecture that Jacob
and lkcl are working on. However, this is not a top priority and we
can leave it out of the first version.<br>
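A single grouped-gather construct along these lines can be modeled in a few lines of Python (an illustrative sketch; in IR the group factor would fall out of the ratio between the result and pointer vector lengths):

```python
def grouped_gather(memory, ptrs, group):
    # Each of the len(ptrs) base addresses loads `group` consecutive
    # elements. group=1 is an ordinary gather, group=2 covers the AVX
    # pairs-of-f32 case, and group=16 covers a 16x16 block load like
    # VLD2D, all with the same operation.
    return [memory[p + k] for p in ptrs for k in range(group)]
```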
<blockquote type="cite"
cite="mid:CAJrduR6UYt2+N+XTadYzxgTNe_pDEaUDtyemq-ot5KyK5c29BQ@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>Cheers,</div>
<div>Robin</div>
<br>
</div>
</div>
</blockquote>
- Simon<br>
<pre class="moz-signature" cols="72">--
Simon Moll
Researcher / PhD Student
Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31
Tel. +49 (0)681 302-57521 : <a class="moz-txt-link-abbreviated" href="mailto:moll@cs.uni-saarland.de">moll@cs.uni-saarland.de</a>
Fax. +49 (0)681 302-3065 : <a class="moz-txt-link-freetext" href="http://compilers.cs.uni-saarland.de/people/moll">http://compilers.cs.uni-saarland.de/people/moll</a></pre>
</body>
</html>