[PATCH] D32737: [Constants][SVE] Represent the runtime length of a scalable vector

Sun May 21 12:40:59 PDT 2017

sanjoy added a comment.

In https://reviews.llvm.org/D32737#760375, @rengolin wrote:

> In https://reviews.llvm.org/D32737#760310, @sanjoy wrote:
>
> > I've only lightly read the spec, but it looks like the vector length can be controlled by writing to the `ZCR_ELn` registers (so, e.g. user code could make a syscall to change the vector length)?
>
>
> As I explained to Hal on his comment, that is correct but doesn't have the effect you're expecting.
>
> Vectors don't have length, they have the "idea that they may have length", and it's up to the CPU to control that.
>
> Just to be clear, the example you propose has no effect on the notion of length:
>
>   // SVE length defined at boot time to be 4
>   ...
>   add z0.s, p0/m, z0.s, z1.s // z0+=z1 only where the predicate p0 is valid, which here is "up-to" 4 vector lengths
>   ...
>   svc ... // Try to change vector length to 8, assuming this works
>   ...
>   add z0.s, p0/m, z0.s, z1.s // z0+=z1 only where the predicate p0 is valid, which here is "up-to" 8 vector lengths

Let me try to give an example.  Say we have code like (quasi-llvm syntax):

  // vector length is 4
  %v0 = load <4 x vscale x i8>, <4 x vscale x i8>* %ptr0
  svc ... // Try to change vector length to 8, assuming this works
  %ptr1 = %ptr0 + vscale
  %v1 = load <4 x vscale x i8>, <4 x vscale x i8>* %ptr1
  %v2 = add %v0, %v1

I have two questions here:

[edit: I just saw Hal's reply -- if we *disallow* mid-process changes to the vector length, then things are much simpler, but that needs to be documented.]

- What is the semantics of `add %v0, %v1`?  As far as I can tell, the two vectors have "different" vector lengths, since one was created before the resize and the other was created after.  I know the *registers* will have the same size, but as I understand it, one of the 32-byte registers will hold 16 elements, while the other will hold 32.  What worries me is that if we allow resizing operations, then things like `shufflevector` (say) will have to be ordered with respect to unknown calls.
- Computing `%ptr1` before or after the syscall gives the program different semantics, since `%ptr1` will be either 4 or 8 bytes past `%ptr0`.  Does this mean we will have to order `%ptr1` (a GEP) with respect to unknown function calls?
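
To make the two questions concrete, here is a minimal C++ sketch (not LLVM code). It assumes the resize succeeds and models `vscale` as a plain integer that is read at the moment each operation executes. The names `laneCount`, `ptr1Offset`, `Snapshot` and `run` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: number of i8 lanes in a <4 x vscale x i8> value.
inline std::size_t laneCount(std::size_t vscale) {
  assert(vscale > 0);
  return 4 * vscale;
}

// Hypothetical model of `%ptr1 = %ptr0 + vscale`: byte offset from %ptr0.
inline std::size_t ptr1Offset(std::size_t vscale) {
  assert(vscale > 0);
  return vscale;
}

struct Snapshot {
  std::size_t lanesFirstLoad;   // lanes in the vector loaded before the resize
  std::size_t lanesSecondLoad;  // lanes in the vector loaded after the resize
  std::size_t offsetPtr1;       // distance of %ptr1 from %ptr0, in bytes
};

// Evaluate the example with %ptr1 computed either before or after the
// (assumed successful) resize from vscaleBefore to vscaleAfter.
inline Snapshot run(std::size_t vscaleBefore, std::size_t vscaleAfter,
                    bool ptr1BeforeResize) {
  Snapshot s;
  s.lanesFirstLoad = laneCount(vscaleBefore);
  s.lanesSecondLoad = laneCount(vscaleAfter);
  s.offsetPtr1 = ptr1Offset(ptr1BeforeResize ? vscaleBefore : vscaleAfter);
  return s;
}
```

Under these assumptions, `run(4, 8, true)` gives 16 and 32 lanes for the two loads and an offset of 4, while `run(4, 8, false)` gives the same lane counts but an offset of 8. The result depends on where the GEP is placed relative to the call, which is exactly the ordering concern.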

>> If that's accurate, I think a constant `vscale` is not sufficient.
> 
> The main problem here is one of representation. In the ARM implementation, SVE vectors alias with SIMD vectors, so you need to be careful on how you write to them.
> 
> If you don't have a way to separate SVE from SIMD, you'll have trouble generating either code. If you separate them completely, you'll have trouble worrying about the aliasing.

I'm not sure how SIMD etc. is related to what I asked.

> Having a flag (even as boolean "i1 vscale") is enough. It needs to be a constant because of how scalar evolution will work on the predicate vectors, but I'll let Graham explain that in more depth, as I'm only "familiar" at this point.

Hm?  I was under the impression that `vscale` was supposed to help offset induction variables (and things like that) by the right amount.  How would you do that with an `i1 vscale`?

================
Comment at: lib/IR/Constants.cpp:812
+  // Free the constant and any dangling references to it.
+  getContext().pImpl->VSVConstants.erase(getType());
+}
----------------
rengolin wrote:
> sanjoy wrote:
> > rengolin wrote:
> > > aemerson wrote:
> > > > rengolin wrote:
> > > > > So, in theory, you can have vscale constants of different integer types, and this would only clear the ones that are the same as this one?
> > > > > 
> > > > > This sounds confusing.
> > > > Yes, in the same way you can have i32 undef, i64 undef etc.
> > > Right, makes sense.
> > Is there a minimum width, or is (say) an `i1 vscale` allowed?  If there isn't a minimum, I presume the semantics is that the runtime value of `vscale` will be truncated to the type width?
> The vscale does not define the vector length. That is defined by the CPU (via a status register) at runtime.
> 
> The *exact* same code can run in one process with length = 10 and another with length = 1. In theory, the same binary could run one instruction with 10 and the very next with 1 (that'd be crazy, but valid).
> 
> However, one instruction being executed by the unit will operate on identical lengths. I.e., you can't have two vectors of different sizes on the same "add". AFAIK this is not just illegal, it's theoretically impossible, given where that information comes from.
> 
> What's illegal (and probably traps) is if you set the status register to a value that is larger than the actual physical length, but that will never be generated by the compiler (which has no business setting the length at all), so it's not something the compiler should worry about.
What I meant to ask is: say I have code like:

```
for (iN i = 0; i < L; i += (iN vscale)) {
  load scaled vector from &a[i];
  ...
}
```

Does `N` have to be greater than some value for the loop above to make sense?  For instance, if the vector length in the CPU is set to `32` then `N` = `2` clearly does not make sense -- `i2 32` is just `i2 0`.  If there is such a restriction, then it needs to be documented.
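
The arithmetic can be checked with a small C++ sketch. It assumes that `iN vscale` is the runtime value truncated to `N` bits, which is my guess at the semantics and not something the patch states. `truncToWidth` and `loopStepIsUsable` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Truncate `value` to an unsigned `bits`-wide integer, the way `iN vscale`
// would behave if the runtime value were simply truncated (an assumption).
inline std::uint64_t truncToWidth(std::uint64_t value, unsigned bits) {
  assert(bits >= 1 && bits <= 64);
  if (bits == 64)
    return value;
  return value & ((std::uint64_t{1} << bits) - 1);
}

// The loop `for (iN i = 0; i < L; i += (iN vscale))` can only make progress
// if the truncated step is non-zero.
inline bool loopStepIsUsable(std::uint64_t runtimeVscale, unsigned bits) {
  return truncToWidth(runtimeVscale, bits) != 0;
}
```

With a runtime value of 32, `truncToWidth(32, 2)` is 0, so the `i2` loop never advances, while `truncToWidth(32, 8)` is 32. A non-zero step is necessary but not sufficient: `i` must also not wrap before reaching `L`.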

https://reviews.llvm.org/D32737