[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

Tue Jun 5 12:08:49 PDT 2018

Graham Hunter <Graham.Hunter at arm.com> writes:

>> Can you explain a bit about what the two integers represent?  What's the
>> "unscaled" part for?
>
> 'Unscaled' just means 'exactly this many bits', whereas 'scaled' is 'this many bits
> multiplied by vscale'.

Right, but what do they represent?  If I have <scalable 4 x i32> is "32"
"unscaled" and "4" "scaled?"  Or is "128" "scaled?"  Or something else?

I see you answered this below.

>> The name "getSizeExpressionInBits" makes me think that a Value
>> expression will be returned (something like a ConstantExpr that uses
>> vscale).  I would be surprised to get a pair of integers back.  Do
>> clients actually need constant integer values or would a ConstantExpr
>> suffice?  We could add a ConstantVScale or something to make it work.
>
> I agree the name is not ideal and I'm open to suggestions -- I was thinking of the two
> integers representing the known-at-compile-time terms in an expression:
> '(scaled_bits * vscale) + unscaled_bits'.
>
> Assuming the pair is of the form (unscaled, scaled), then for a type with a size known at
> compile time like <4 x i32> the size would be (128, 0).
>
> For a scalable type like <scalable 4 x i32> the size would be (0, 128).
>
> For a struct with, say, a <scalable 32 x i8> and an i64, it would be (64, 256).
>
> When calculating the offset for memory addresses, you just need to multiply the scaled
> part by vscale and add the unscaled as is.
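
A minimal C++ sketch of the (unscaled, scaled) pair described above. `ScalableSize`, `add` and `bitsFor` are hypothetical names used only for illustration, not an existing LLVM API, and struct padding/alignment is ignored:

```cpp
#include <cstdint>

// Hypothetical size representation: the two compile-time-known terms of
// (scaled_bits * vscale) + unscaled_bits. Not an LLVM API.
struct ScalableSize {
  uint64_t Unscaled; // bits that do not depend on vscale
  uint64_t Scaled;   // bits that get multiplied by vscale at run time
};

// Sizes add term by term, e.g. when laying out consecutive struct members
// (padding and alignment are ignored in this sketch).
constexpr ScalableSize add(ScalableSize A, ScalableSize B) {
  return {A.Unscaled + B.Unscaled, A.Scaled + B.Scaled};
}

// Once vscale is known at run time, the size collapses to a plain number.
constexpr uint64_t bitsFor(ScalableSize S, uint64_t VScale) {
  return S.Scaled * VScale + S.Unscaled;
}

// The sizes from the examples above.
constexpr ScalableSize V4xI32{128, 0};   // <4 x i32>
constexpr ScalableSize NxV4xI32{0, 128}; // <scalable 4 x i32>
constexpr ScalableSize NxV32xI8{0, 256}; // <scalable 32 x i8>
constexpr ScalableSize I64{64, 0};       // i64
constexpr ScalableSize StructTy = add(NxV32xI8, I64); // {<scalable 32 x i8>, i64}
```

With vscale == 2, for instance, the struct example works out to 256 * 2 + 64 = 576 bits.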

Ok, now I understand what you're getting at.  A ConstantExpr would
encapsulate this computation.  We already have "non-static-constant"
values for ConstantExpr like sizeof and offsetof.  I would see
VScaleConstant in that same tradition.  In your struct example,
getSizeExpressionInBits would return:

add(mul(256, vscale), 64)

Does that satisfy your needs?
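
To illustrate that the ConstantExpr form and the integer pair carry the same information, here is a sketch using a toy expression tree. `Expr`, `constant`, `vscale`, `add`, `mul`, `toPair` and `evaluate` are all hypothetical stand-ins, not LLVM's ConstantExpr classes:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical miniature expression tree standing in for a ConstantExpr
// that uses a symbolic vscale. None of these names are LLVM APIs.
struct Expr {
  enum Kind { Const, VScale, Add, Mul } K;
  uint64_t Value;                 // used only by Const
  std::shared_ptr<Expr> LHS, RHS; // used only by Add and Mul
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr constant(uint64_t V) { return std::make_shared<Expr>(Expr{Expr::Const, V, nullptr, nullptr}); }
ExprPtr vscale() { return std::make_shared<Expr>(Expr{Expr::VScale, 0, nullptr, nullptr}); }
ExprPtr add(ExprPtr A, ExprPtr B) { return std::make_shared<Expr>(Expr{Expr::Add, 0, A, B}); }
ExprPtr mul(ExprPtr A, ExprPtr B) { return std::make_shared<Expr>(Expr{Expr::Mul, 0, A, B}); }

// Evaluate once the run-time value of vscale is known.
uint64_t evaluate(const ExprPtr &E, uint64_t VS) {
  assert(VS >= 1 && "vscale is at least 1");
  switch (E->K) {
  case Expr::Const:  return E->Value;
  case Expr::VScale: return VS;
  case Expr::Add:    return evaluate(E->LHS, VS) + evaluate(E->RHS, VS);
  case Expr::Mul:    return evaluate(E->LHS, VS) * evaluate(E->RHS, VS);
  }
  return 0;
}

// Fold an expression that is linear in vscale into (unscaled, scaled)
// terms; returns nullopt for non-linear forms such as vscale * vscale.
std::optional<std::pair<uint64_t, uint64_t>> toPair(const ExprPtr &E) {
  switch (E->K) {
  case Expr::Const:  return std::make_pair(E->Value, uint64_t{0});
  case Expr::VScale: return std::make_pair(uint64_t{0}, uint64_t{1});
  case Expr::Add: {
    auto L = toPair(E->LHS), R = toPair(E->RHS);
    if (!L || !R) return std::nullopt;
    return std::make_pair(L->first + R->first, L->second + R->second);
  }
  case Expr::Mul: {
    auto L = toPair(E->LHS), R = toPair(E->RHS);
    if (!L || !R) return std::nullopt;
    // (a + b*v) * (c + d*v) stays linear in v only if b*d == 0.
    if (L->second != 0 && R->second != 0) return std::nullopt;
    return std::make_pair(L->first * R->first,
                          L->first * R->second + L->second * R->first);
  }
  }
  return std::nullopt;
}
```

Folding `add(mul(constant(256), vscale()), constant(64))` yields the pair (64, 256), and evaluating the same tree at vscale == 2 gives 576 bits. The pair form only falls out while the expression stays linear in vscale.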

Is there anything about vscale or a scalable vector that requires a
minimum bit width?  For example, is this legal?

<scalable 1 x double>

I know it won't map to an SVE type.  I'm simply curious because
traditionally Cray machines defined vectors in terms of
machine-dependent "maxvl" with an element type, so with the above vscale
would == maxvl.  Not that we make any such things anymore.  But maybe
someone else does?

>> If we went the ConstantExpr route and added ConstantExpr support to
>> ScalarEvolution, then SCEVs could be compared to do this size
>> comparison.  We have code here that adds ConstantExpr support to
>> ScalarEvolution.  We just didn't know if anyone else would be interested
>> in it since we added it solely for our Fortran frontend.
>
> We added a dedicated SCEV expression class for vscale instead; I suspect it works
> either way.

Yes, that's probably true.  A vscale SCEV is less invasive.
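
Whichever representation is used (a SCEV, a Constant or a pair), a size comparison has to hold for every possible vscale. A sketch of that test on (unscaled, scaled) pairs, assuming only that vscale >= 1 (a real target would also have an upper bound, which this sketch ignores); `ScalableSize` and `knownLE` are hypothetical names:

```cpp
#include <cstdint>

// Hypothetical (unscaled, scaled) size pair, as in the discussion above.
struct ScalableSize {
  uint64_t Unscaled;
  uint64_t Scaled;
};

// True if A is no larger than B for every vscale >= 1.
// B - A = (B.Unscaled - A.Unscaled) + (B.Scaled - A.Scaled) * v is linear
// in v, so it is non-negative for all v >= 1 exactly when it is
// non-negative at v == 1 and its slope is non-negative.
constexpr bool knownLE(ScalableSize A, ScalableSize B) {
  return A.Scaled <= B.Scaled &&
         A.Unscaled + A.Scaled <= B.Unscaled + B.Scaled;
}
```

So a <4 x i32>, (128, 0), is known to fit in a <scalable 4 x i32>, (0, 128), but not the other way round.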

> We've tried it as both an instruction and as a 'Constant', and both work fine with
> ScalarEvolution. I have not yet tried it with the intrinsic.

vscale as a Constant is interesting.  It's a target-dependent Constant
like sizeof and offsetof.  It doesn't have a statically known value and
maybe isn't "constant" across functions.  So it's a strange kind of
constant.

Ultimately whatever is easier for LLVM to analyze in the long run is
best.  Intrinsics often block optimization.  I don't know whether vscale
would be "easier" as a Constant or an Instruction.

>> As above, we could add ConstantVScale and also ConstantStepVector (or
>> ConstantIota).  They won't fold to compile-time values but the
>> expressions could be simplified.  I haven't really thought through the
>> implications of this, just brainstorming ideas.  What does your
>> downstream compiler require in terms of constant support?  What kinds of
>> queries does it need to do?
>
> It makes things a little easier to pattern match (just looking for a constant to start
> instead of having to match multiple different forms of vscale or stepvector multiplied
> and/or added in each place you're looking for them).

Ok.  Normalization could help with this but I certainly understand the
issue.
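
For reference, a sketch of the value a ConstantStepVector (ConstantIota) would stand for, expanded at run time for a <scalable MinElts x i64> type. `stepVector` is a hypothetical helper; it shows why the constant cannot fold to a compile-time value, since the lane count depends on vscale:

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical run-time expansion of a stepvector (iota) constant for
// <scalable MinElts x i64>: lane i holds the value i.
std::vector<uint64_t> stepVector(unsigned MinElts, unsigned VScale) {
  assert(VScale >= 1 && "vscale is at least 1");
  // The number of lanes is only known once vscale is.
  std::vector<uint64_t> Lanes(uint64_t(MinElts) * VScale);
  std::iota(Lanes.begin(), Lanes.end(), uint64_t{0});
  return Lanes;
}
```

With MinElts == 4 and vscale == 2 this produces the eight lanes 0 through 7.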

> The bigger reason we currently depend on them being constant is that code generation
> generally looks at a single block at a time, and there are several expressions using
> vscale that we don't want to be generated in one block and passed around in a register,
> since many of the load/store addressing forms for instructions will already scale properly.
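
As an illustration of addressing forms that already scale: assuming the usual SVE semantics, contiguous loads with a `[Xn, #imm, MUL VL]` operand multiply the immediate by the vector length in bytes, so an offset of imm * 16 * vscale bytes needs no separately computed register. The sketch below models that address computation for a full-vector load such as LD1W into .S lanes; `mulVLAddress` is a hypothetical name, and the -8..7 immediate range is specific to that instruction form:

```cpp
#include <cassert>
#include <cstdint>

// Models the effective address of an SVE "[Xn, #Imm, MUL VL]" operand for a
// load that transfers one whole vector register. In LLVM's convention a
// scalable vector is vscale 128-bit (16-byte) granules.
uint64_t mulVLAddress(uint64_t Base, int64_t Imm, uint64_t VScale) {
  assert(Imm >= -8 && Imm <= 7 && "immediate range of this form");
  assert(VScale >= 1 && VScale <= 16 && "SVE vectors are 128..2048 bits");
  const int64_t VectorBytes = int64_t(16 * VScale);
  // Unsigned wrap-around makes a negative offset subtract correctly.
  return Base + uint64_t(Imm * VectorBytes);
}
```

If the vscale-dependent offset were instead materialized in one block and passed around in a register, this folding opportunity would be lost.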

This is kind of like X86 memop folding.  If a load has multiple uses, it
won't be folded, on the theory that one load is better than many folded
loads.  If a load has exactly one use, it will fold.  There's explicit
predicate code in the X86 backend to enforce this requirement.  I
suspect if the X86 backend tried to fold a single load into multiple
places, Bad Things would happen (needed SDNodes might disappear, etc.).
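
A simplified sketch of the one-use folding rule described above. `Node` and `canFoldIntoUser` are hypothetical and only loosely modeled on the behavior described for the X86 backend, not taken from it:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified DAG node: only what the predicate needs.
struct Node {
  bool IsLoad;
  std::vector<const Node *> Users; // nodes that consume this node's value
};

// Fold a load into its consumer only when that consumer is the sole user;
// with several users, one shared load is kept instead of many folded ones.
bool canFoldIntoUser(const Node &Load, const Node &User) {
  assert(Load.IsLoad && "only loads are considered in this sketch");
  return Load.Users.size() == 1 && Load.Users.front() == &User;
}
```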

Codegen probably doesn't understand non-statically-constant
ConstantExprs, since sizeof or offsetof can be resolved by the target
before instruction selection.

> We've done this downstream by having them be Constants, but if there's a good way
> of doing them with intrinsics we'd be fine with that too.

If vscale/stepvector as Constants works, it seems fine to me.

                               -David