<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">

Hi David,

<div class=""><br class="">

</div>

<div class="">Responses below.</div>

<div class=""><br class="">

</div>

<div class="">-Graham<br class="">

<div><br class="">

<blockquote type="cite" class="">

<div class="">On 11 Jun 2018, at 22:19, David A. Greene <<a href="mailto:dag@cray.com" class="">dag@cray.com</a>> wrote:</div>

<br class="Apple-interchange-newline">

<div class="">

<div class="">Graham Hunter <<a href="mailto:Graham.Hunter@arm.com" class="">Graham.Hunter@arm.com</a>> writes:<br class="">

<br class="">

<blockquote type="cite" class="">========<br class="">

1. Types<br class="">

========<br class="">

<br class="">

To represent a vector of unknown length a boolean `Scalable` property has been<br class="">

added to the `VectorType` class, which indicates that the number of elements in<br class="">

the vector is a runtime-determined integer multiple of the `NumElements` field.<br class="">

Most code that deals with vectors doesn't need to know the exact length, but<br class="">

does need to know relative lengths -- e.g. get a vector with the same number of<br class="">

elements but a different element type, or with half or double the number of<br class="">

elements.<br class="">

<br class="">

In order to allow code to transparently support scalable vectors, we introduce<br class="">

an `ElementCount` class with two members:<br class="">

<br class="">

- `unsigned Min`: the minimum number of elements.<br class="">

- `bool Scalable`: is the element count an unknown multiple of `Min`?<br class="">

<br class="">

For non-scalable vectors (``Scalable=false``) the scale is considered to be<br class="">

equal to one and thus `Min` represents the exact number of elements in the<br class="">

vector.<br class="">

<br class="">

The intent for code working with vectors is to use convenience methods and avoid<br class="">

directly dealing with the number of elements. If needed, calling<br class="">

`getElementCount` on a vector type instead of `getVectorNumElements` can be used<br class="">

to obtain the (potentially scalable) number of elements. Overloaded division and<br class="">

multiplication operators allow an ElementCount instance to be used in much the<br class="">

same manner as an integer for most cases.<br class="">

<br class="">

This mixture of compile-time and runtime quantities allows us to reason about the<br class="">

relationship between different scalable vector types without knowing their<br class="">

exact length.<br class="">

</blockquote>

<br class="">

How does this work in practice?  Let's say I populate a vector with a<br class="">

splat.  Presumably, that gives me a "full length" vector.  Let's say the<br class="">

type is <scalable 2 x double>.  How do I split the vector and get<br class="">

something half the width?  What is its type?  How do I split it again<br class="">

and get something a quarter of the width?  What is its type?  How do I<br class="">

use half- and quarter-width vectors?  Must I resort to predication?<br class="">

</div>

</div>

</blockquote>

<div><br class="">

</div>

<div>To split a <scalable 2 x double> in half, you'd use a shufflevector in much the</div>

<div>same way you would for fixed-length vector types.</div>

<div><br class="">

</div>

<div>e.g.</div>

<div>``</div>

<div>  %sv = call <scalable 1 x i32> @llvm.experimental.vector.stepvector.nxv1i32()</div>

<div>  %halfvec = shufflevector <scalable 2 x double> %fullvec, <scalable 2 x double> undef, <scalable 1 x i32> %sv</div>

<div>``</div>

<div><br class="">

</div>

<div>You can't split it any further than a <scalable 1 x <ty>>, since there may only be</div>

<div>one element in the actual hardware vector at runtime. The same restriction applies to</div>

<div>a <1 x <ty>>. This is why we have a minimum number of lanes in addition to the</div>

<div>scalable flag: it lets us concatenate and split vectors. SVE registers all hold the</div>

<div>same number of bytes, so the number of elements per register decreases as the</div>

<div>element type increases in size.</div>
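<div><br class="">
</div>
<div>As a rough illustration only (this is not code from the patch set), the relationship between the minimum lane count, the scalable flag and the element size could be modelled as below. The struct mirrors the `Min` and `Scalable` members described in the RFC; the helper names `lanesFor` and `runtimeElements`, and the use of a 128-bit granule as the unit that vscale multiplies, are assumptions made for this sketch.</div>
<pre class="">
```cpp
#include <cassert>

// Hypothetical stand-in for the ElementCount described in the RFC: the real
// element count is Min when Scalable is false, and vscale * Min otherwise.
struct ElementCount {
  unsigned Min;   // minimum number of elements
  bool Scalable;  // is the real count an unknown multiple of Min?

  // Splitting: only allowed while the result is still a whole number of lanes.
  ElementCount operator/(unsigned RHS) const {
    assert(RHS != 0 && Min % RHS == 0 && "cannot split below a whole lane");
    return {Min / RHS, Scalable};
  }
  // Concatenating: the minimum grows, the scalable flag is unchanged.
  ElementCount operator*(unsigned RHS) const { return {Min * RHS, Scalable}; }

  bool operator==(const ElementCount &Other) const {
    return Min == Other.Min && Scalable == Other.Scalable;
  }
};

// Assumption for this sketch: one scalable register holds vscale * 128 bits,
// so wider elements mean fewer lanes per register.
inline ElementCount lanesFor(unsigned ElementBits) {
  return {128 / ElementBits, true};
}

// Element count actually present at runtime for a given vscale.
inline unsigned runtimeElements(ElementCount EC, unsigned VScale) {
  return EC.Scalable ? EC.Min * VScale : EC.Min;
}
```
</pre>
<div>With these definitions, `lanesFor(64)` models <scalable 2 x double> and `lanesFor(32)` models <scalable 4 x float>; halving the former gives a minimum of one lane, which is why it cannot be split again.</div>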

<div><br class="">

</div>

<div>If you want to extract something other than the first part of a vector, you need to add</div>

<div>offsets based on a calculation from vscale (e.g. adding vscale * (min_elts/2) allows you</div>

<div>to reach the high half of a larger register).</div>
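<div><br class="">
</div>
<div>To make the offset arithmetic concrete, here is a small host-side sketch that computes the lane indices such a shuffle mask would contain for a given vscale. It only simulates the calculation; `halfMask` is a hypothetical helper for illustration and does not correspond to anything in the patches.</div>
<pre class="">
```cpp
#include <cassert>
#include <vector>

// Builds the lane indices that select one half of a vector whose runtime
// length is VScale * MinElts. The loop counter plays the role of stepvector
// (0, 1, 2, ...); the high half adds an offset of VScale * (MinElts / 2).
std::vector<unsigned> halfMask(unsigned MinElts, unsigned VScale, bool HighHalf) {
  assert(MinElts % 2 == 0 && "cannot halve an odd minimum element count");
  unsigned HalfLen = VScale * (MinElts / 2);  // runtime length of each half
  unsigned Offset = HighHalf ? HalfLen : 0;   // vscale * (min_elts / 2) or 0
  std::vector<unsigned> Mask(HalfLen);
  for (unsigned I = 0; I < HalfLen; ++I)
    Mask[I] = Offset + I;                     // stepvector lane plus offset
  return Mask;
}
```
</pre>
<div>For a <scalable 2 x double> with vscale equal to 4 there are eight elements at runtime, so the low half uses indices 0 to 3 and the high half uses indices 4 to 7.</div>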

<div><br class="">

</div>

<div>If you check the patch which introduces splatvector (<a href="https://reviews.llvm.org/D47775" class="">https://reviews.llvm.org/D47775</a>),</div>

<div>you can see a line which currently produces an error if changing the size of a vector</div>

<div>is required, and notes that VECTOR_SHUFFLE_VAR hasn't been implemented yet.</div>

<div><br class="">

</div>

<div>In our downstream compiler, this is an ISD node alongside VECTOR_SHUFFLE which</div>

<div>allows a shuffle with a variable mask instead of a constant.</div>

<div><br class="">

</div>

<div>If people feel it would be useful, I can prepare another patch which implements these</div>

<div>shuffles (as an intrinsic rather than a common ISD) for review now instead of later;</div>

<div>I tried to keep the initial patch set small so didn't cover all cases.</div>

<div><br class="">

</div>

<div>In terms of using predication, that's generally not required for the legal integer types;</div>

<div>normal promotion via sign or zero extension works. We've tried to reuse existing</div>

<div>mechanisms wherever possible.</div>

<div><br class="">

</div>

<div>For floating point types, we do use predication to allow the use of otherwise illegal</div>

<div>types like <scalable 1 x double>, but that's limited to the AArch64 backend and does</div>

<div>not need to be represented in IR.</div>

<div><br class="">

</div>

<br class="">

<blockquote type="cite" class="">

<div class="">

<div class="">The split question comes into play for backward compatibility.  How<br class="">

would one take a scalable vector and pass it into a NEON library?  It is<br class="">

likely that some math functions, for example, will not have SVE versions<br class="">

available.<br class="">

</div>

</div>

</blockquote>

<div><br class="">

</div>

<div>I don't believe we intend to support this; the plan is to provide libraries with</div>

<div>SVE versions of functions instead. The problem is that you don't know how</div>

<div>many NEON-size subvectors exist within an SVE vector at compile time.</div>

<div>While you could create a loop with 'vscale' number of iterations and try to</div>

<div>extract those subvectors, I suspect the IR would end up being quite messy</div>

<div>and potentially hard to recognize and optimize.</div>

<div><br class="">

</div>

<div>The other problem with calling non-SVE functions is that any live SVE</div>

<div>registers must be spilled to the stack and filled after the call, which is</div>

<div>likely to be quite expensive.</div>

<br class="">

<blockquote type="cite" class="">

<div class="">

<div class="">Is there a way to represent "double width" vectors?  In mixed-data-size<br class="">

loops it is sometimes convenient to reason about double-width vectors<br class="">

rather than having to split them (to legalize for the target<br class="">

architecture) and keep track of their parts early on.  I guess the more<br class="">

fundamental question is about how such loops should be handled.<br class="">

</div>

</div>

</blockquote>

<div><br class="">

</div>

<div>For SVE, it's fine to generate IR with types that are 'double size' or larger,</div>

<div>and just leave it to legalization at SelectionDAG level to split into multiple</div>

<div>legal size registers.</div>

<div><br class="">

</div>

<div>Again, if people would like me to create a patch to illustrate legalization sooner</div>

<div>rather than later to help understand what's needed to support these types, let</div>

<div>me know.</div>

<br class="">

<blockquote type="cite" class="">

<div class="">

<div class=""><br class="">

What do insertelement and extractelement mean for scalable vectors?<br class="">

Your examples showed insertelement at index zero.  How would I, say,<br class="">

insertelement into the upper half of the vector?  Or any arbitrary<br class="">

place?  Does insertelement at index 10 of a <scalable 2 x double> work,<br class="">

assuming vscale is large enough?  It is sometimes useful to constitute a<br class="">

vector out of various scalar pieces and insertelement is a convenient<br class="">

way to do it.<br class="">

</div>

</div>

</blockquote>

<br class="">

</div>

<div>So you can insert or extract any element known to exist (in other words, any index</div>

<div>below the minimum number of elements). Using a constant index outside</div>

<div>that range will fail, as we won't know whether the element actually exists</div>

<div>until we're running on a CPU.</div>
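<div><br class="">
</div>
<div>The rule for constant indices can be sketched as a simple check. This is only an illustration under the assumption that vscale is at least 1; `isIndexKnownInBounds` is a hypothetical name, not a function from the patches.</div>
<pre class="">
```cpp
#include <cassert>

// Hypothetical stand-in for the ElementCount described in the RFC.
struct ElementCount {
  unsigned Min;   // minimum number of elements
  bool Scalable;  // is the real count an unknown multiple of Min?
};

// True when a constant index is guaranteed to name an existing lane for every
// vscale >= 1. For fixed-length vectors Min is the exact length, so the same
// comparison applies.
inline bool isIndexKnownInBounds(ElementCount EC, unsigned Index) {
  return Index < EC.Min;
}
```
</pre>
<div>Under this check, index 1 of a <scalable 2 x double> is accepted, while index 10 is rejected even though it would exist on hardware with a large enough vscale.</div>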

<div><br class="">

</div>

<div>Our downstream compiler supports inserting and extracting arbitrary elements</div>

<div>from calculated offsets as part of our experiment on search loop vectorization,</div>

<div>but that generates the offsets based on a count of true bits within partitioned</div>

<div>predicates. I was planning on proposing new intrinsics to improve predicate use</div>

<div>within llvm at a later date.</div>

<div><br class="">

</div>

<div>We have been able to implement various types of known shuffles (like the high/low</div>

<div>half extract, zip, concatenation, etc) with vscale, stepvector, and the existing IR</div>

<div>instructions.</div>

<div><br class="">

</div>

<br class="">

</div>

</body>

</html>