[llvm-dev] [RFC] Introduce a new stepvector operation
David Sherwood via llvm-dev
llvm-dev at lists.llvm.org
Wed Jan 20 08:03:54 PST 2021
As part of adding support for scalable vectorization, we need to update llvm::InnerLoopVectorizer::getStepVector for scalable vectors. Currently this just returns a constant vector with the sequence <0, 1, 2, 3, ...>, which assumes we are using fixed-length vectors. For scalable vectors we need an operation that does the same thing without having to explicitly initialise every element. Any new stepvector operation we provide could also be used for fixed-length vectors if desired.
I believe the desirable properties of the operation should be:
1. The operation requires no arguments and simply returns a vector with the numeric sequence <0, 1, 2, ...>
2. For types with a large number of elements, e.g. <vscale x 32 x i8> with vscale = 16 (512 elements, more than an i8 can count), the sequence value can exceed the limit of the element type midway through the vector. In such cases we define the operation such that those elements are undefined or poison values.
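To make property 2 concrete, here is a minimal scalar sketch in standard C++ (not LLVM code) that works out where the sequence stops being representable. The helper names laneCount and firstUnrepresentableLane are hypothetical, and the sketch assumes the limit is the unsigned range of the element type; under a signed interpretation the cut-off for i8 would be 128 rather than 256.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of lanes in <vscale x MinElts x iN>
// for a given runtime value of vscale.
inline std::uint64_t laneCount(std::uint64_t minElts, std::uint64_t vscale) {
  return minElts * vscale;
}

// Hypothetical helper: index of the first lane whose step value no longer
// fits in an unsigned element of 'bits' bits. Returns 'lanes' if every
// lane fits. Assumption: the limit is the unsigned range of the type.
inline std::uint64_t firstUnrepresentableLane(unsigned bits,
                                              std::uint64_t lanes) {
  assert(bits > 0 && bits < 64 && "sketch only handles element widths 1..63");
  const std::uint64_t limit = std::uint64_t{1} << bits; // 0 .. limit-1 fit
  return lanes < limit ? lanes : limit;
}
```

For the example above, laneCount(32, 16) is 512 and firstUnrepresentableLane(8, 512) is 256, so under this reading the upper half of the vector would hold undefined or poison values.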
A simple 'stepvector' operation (however we choose to implement it) with the properties described above can then be combined with additional 'mul' and 'add' instructions to create an arbitrary linear sequence, e.g. <0, 2, 4, 6, ...> or <1, 3, 5, 7, ...>
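The composition with 'mul' and 'add' can be modelled per lane in plain C++. This is only a scalar sketch of the semantics, not the vectorizer's implementation; stepVector and linearSequence are hypothetical names, and the lane type is fixed to 32 bits for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Scalar model of the proposed operation: <0, 1, 2, ..., lanes-1>.
inline std::vector<std::uint32_t> stepVector(std::size_t lanes) {
  // Conservative check that every lane index fits in a 32-bit element,
  // so the sketch never reaches the "undefined or poison" case.
  assert(lanes <= std::numeric_limits<std::uint32_t>::max());
  std::vector<std::uint32_t> v(lanes);
  for (std::size_t i = 0; i < lanes; ++i)
    v[i] = static_cast<std::uint32_t>(i);
  return v;
}

// start + stride * stepvector, mirroring a 'mul' by a splat of the stride
// followed by an 'add' of a splat of the start value.
inline std::vector<std::uint32_t> linearSequence(std::size_t lanes,
                                                 std::uint32_t start,
                                                 std::uint32_t stride) {
  std::vector<std::uint32_t> v = stepVector(lanes);
  for (std::uint32_t &x : v)
    x = x * stride + start;
  return v;
}
```

With four lanes, linearSequence(4, 0, 2) gives <0, 2, 4, 6> and linearSequence(4, 1, 2) gives <1, 3, 5, 7>, matching the two sequences mentioned above.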
The first possible implementation with these properties is a new llvm.stepvector() intrinsic that takes no arguments and simply returns the vector sequence <0, 1, 2, ...> of the requested type, e.g.
declare <vscale x 4 x i32> @llvm.stepvector.nxv4i32()
Introducing a new intrinsic is simple to implement and we can easily come up with an appropriate cost model - cheap for fixed width vectors or scalable vectors using SVE where we have the 'index' instruction.
However, since such an intrinsic is effectively returning a constant vector sequence we could instead implement it using a new 'stepvector' constant in a similar way to how 'zeroinitializer' works. This would be done using a new ConstantStepVector class similar to ConstantAggregateZero and would return a vector with the numeric sequence <0, 1, 2, ...>. The main advantages of using a constant over an intrinsic are:
1. It is easy to write tests in LLVM IR since 'stepvector' would work in the same way as 'zeroinitializer', e.g. "%1 = add <4 x i32> %0, stepvector"
2. Creation of the node is easy with the simple interface:
static Constant *ConstantStepVector::get(Type *Ty)
3. It is easy to do optimisations, e.g. CSE, and pattern matching in IR.
The main disadvantages are:
1. A scalable constant cannot be represented as easily in the .data section, although we can still create a constant based on the architectural maximum for vscale. It's worth pointing out that this problem also exists for zeroinitializer - we're just more likely to have cheap instructions to do the job.
2. Harder to fit into the cost model due to it being a constant.
3. There are some concerns that we might then have to support stepvector as a constant in the shufflevector operation as well, and a view that this operand should remain restricted to zeroinitializer only.
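As a rough illustration of the first disadvantage, the sketch below sizes a step table for the architectural maximum of vscale. It is plain C++ rather than LLVM code, and it assumes an SVE-like target whose vectors grow in 128-bit granules up to 2048 bits, giving a maximum vscale of 16. The names kMaxVScale, maxStepTable and lanesUsed are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: SVE-like target, 128-bit granules, at most 2048-bit vectors.
const std::size_t kMaxVScale = 2048 / 128; // 16

// Builds the table a backend could place in .data for <vscale x minElts x i32>:
// one entry per lane at the largest possible vscale.
inline std::vector<std::uint32_t> maxStepTable(std::size_t minElts) {
  std::vector<std::uint32_t> table(minElts * kMaxVScale);
  for (std::size_t i = 0; i < table.size(); ++i)
    table[i] = static_cast<std::uint32_t>(i);
  return table;
}

// Number of leading table entries actually used at a given runtime vscale.
inline std::size_t lanesUsed(std::size_t minElts, std::size_t vscale) {
  assert(vscale >= 1 && vscale <= kMaxVScale);
  return minElts * vscale;
}
```

Under these assumptions a <vscale x 4 x i32> step constant needs 64 table entries, of which a vscale = 2 implementation would read only the first 8. This shows the cost of the constant-pool fallback compared with a single instruction such as SVE's 'index'.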
Any thoughts or feedback you have would be much appreciated!