[LLVMdev] Indexed Load and Store Intrinsics - proposal

Thu Dec 18 11:56:05 PST 2014

"Demikhovsky, Elena" <elena.demikhovsky at intel.com> writes:

> Semantics:
> For i=0,1,…,N-1: if (Mask[i]) {*(BaseAddr + VectorOfIndices[i]*Scale)
> = VectorValue[i];}
> VectorValue: any float or integer vector type.
> BaseAddr: a pointer; may be zero if full address is placed in the
> index.
> VectorOfIndices: a vector of i32 or i64 signed or unsigned integer
> values.

What about the case of a gather/scatter where the BaseAddr is zero and
the indices are pointers?  Must we do a ptrtoint?  llvm.org is down at
the moment but I don't think we currently have a vector ptrtoint.

> Scale: a compile time constant 1, 2, 4 or 8.

This seems a bit too Intel-focused.  Why not allow arbitrary scales?  Or
alternatively, eliminate the Scale and do a vector multiply on
VectorOfIndices.  It should be simple enough to write matching TableGen
patterns.  We do it now for the x86 memop stuff.
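
To illustrate why dropping Scale loses nothing semantically, here is a
minimal scalar sketch in C.  The function names, the byte-pointer
signature and the use of 64-bit indices are illustrative assumptions,
not part of the proposal.  A gather with an explicit scale and a gather
over indices that were multiplied beforehand read the same addresses, so
a target could still match the multiply back into its scaled addressing
mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference for a masked gather with a Scale operand:
 * result[i] = mask[i] ? *(float *)(base + index[i] * scale) : passthru[i] */
static void gather_scaled(float *result, const uint8_t *base,
                          const int64_t *index, int64_t scale,
                          const float *passthru, const uint8_t *mask,
                          size_t n)
{
    assert(scale > 0); /* any positive scale, not only 1, 2, 4 or 8 */
    for (size_t i = 0; i < n; ++i) {
        if (mask[i])
            memcpy(&result[i], base + index[i] * scale, sizeof(float));
        else
            result[i] = passthru[i];
    }
}

/* Same operation without a Scale operand: the caller has already
 * multiplied the indices, so each entry is a byte offset from base. */
static void gather_unscaled(float *result, const uint8_t *base,
                            const int64_t *byte_offset,
                            const float *passthru, const uint8_t *mask,
                            size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (mask[i])
            memcpy(&result[i], base + byte_offset[i], sizeof(float));
        else
            result[i] = passthru[i];
    }
}
```

Calling gather_unscaled with index[i] * scale computed up front yields
the same result vector as gather_scaled, for any positive scale.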

> VectorValue, VectorOfIndices and Mask must have the same vector width.

From your example, you mean they must have the same number of vector
elements, not the same bit width, right?  I'm used to "width" meaning a
specific bit length and "vector length" meaning "number of elements."
With that terminology, I think you mean they must have the same vector
length.

> An indexed store instruction with complete or partial overlap in
> memory (i.e., two indices with same or close values) will provide the
> result equivalent to serial scalar stores from least to most
> significant vector elements.

Yep, they must be ordered.  Do we want to provide unordered scatters as
well?  Some (non-LLVM) targets have them.  We don't need to add them
right now but it's worth thinking about.
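
To make the ordering requirement concrete, here is a minimal scalar
sketch in C of the quoted store semantics.  The name scatter_ordered and
the byte-pointer signature are illustrative assumptions, not part of the
proposal.  Lanes are stored from element 0 upward, so when two active
lanes hit the same address the higher-numbered lane wins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference for the ordered, masked scatter:
 * for i = 0 .. n-1: if (mask[i]) *(float *)(base + index[i] * scale) = value[i]
 * The loop order is the ordering guarantee: lane i is stored before lane i+1. */
static void scatter_ordered(uint8_t *base, const float *value,
                            const int64_t *index, int64_t scale,
                            const uint8_t *mask, size_t n)
{
    assert(scale > 0);
    for (size_t i = 0; i < n; ++i) {
        if (mask[i])
            memcpy(base + index[i] * scale, &value[i], sizeof(float));
    }
}
```

For example, with values {10, 20, 30, 40}, indices {1, 1, 2, 1}, mask
{1, 1, 1, 0} and a scale of sizeof(float), element 1 of the destination
ends up holding 20: lane 1 overwrites lane 0, and lane 3 is masked off.
An unordered scatter would be free to leave either 10 or 20 there.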

> The new intrinsics are common for all targets, like recently
> introduced masked load and store.
> Examples:
> <16 x float> @llvm.sindex.load.v16f32.v16i32 (i8 *%ptr, <16 x i32>
> %index, i32 %scale)
> <16 x float> @llvm.masked.sindex.load.v16f32.v16i32 (i8 *%ptr, <16 x
> i32> %index, <16 x float> %passthru, <16 x i1> %mask)
> void @llvm.sindex.store.v16f32.v16i64(i8* %ptr, <16 x float> %value,
> <16 x i64> %index, i32 %scale, <16 x i1> %mask)
> Comments?

I think it's definitely a good idea to introduce them, but let's make
them a little more target-neutral if we can.

                           -David