[LLVMdev] Indexed Load and Store Intrinsics - proposal

Demikhovsky, Elena elena.demikhovsky at intel.com
Thu Dec 18 13:38:13 PST 2014


Hi Nadav,


It looks like the proposed intrinsic is very specific to the x86 implementation of gather/scatter.  Would it be possible to remove the PassThrough value from the intrinsic and define the masked-out value to be undef?  You would still be able to pattern match it if you use a maskedload + select.
[Demikhovsky, Elena] The masked load intrinsic already has a PassThrough value, and we want all of these intrinsics to be consistent.

Can we remove the masked version of the intrinsic altogether and pattern match it using the non-masked version somehow?
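[Demikhovsky, Elena] On the contrary, we could remove the non-masked version and use the masked one (with an all-true mask) instead. Using the non-masked version plus a select is not safe for gather, because the masked-out lanes would still be loaded and may fault, and it is meaningless for scatter, because a select cannot undo a store. That's why we added the masked load/store intrinsics.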
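As a rough illustration of the gather half of this argument, the C sketch below models both strategies with scalar loops. The function names and the `read_elem` access counter are hypothetical instrumentation, not an LLVM API. With the mask {1,0,1,0}, the unmasked-gather-plus-select version dereferences all four lanes, while the masked version touches only the two active ones; in real code a masked-out lane may hold an address that faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instrumentation: counts every element read, so we can see
 * which lanes each strategy actually touches. */
static size_t g_reads = 0;

static int read_elem(const int *p) {
    g_reads++;
    return *p;
}

/* Strategy 1: unmasked gather, then select.
 * Every lane is loaded, including masked-out ones; if a masked-out index
 * pointed at unmapped memory, the load would fault before the select. */
static void gather_then_select(int *out, const int *base, const int *idx,
                               const bool *mask, const int *passthru, int n) {
    for (int i = 0; i < n; i++) {
        int loaded = read_elem(&base[idx[i]]);   /* unconditional load */
        out[i] = mask[i] ? loaded : passthru[i]; /* select afterwards */
    }
}

/* Strategy 2: masked gather. Masked-out lanes are never dereferenced. */
static void masked_gather(int *out, const int *base, const int *idx,
                          const bool *mask, const int *passthru, int n) {
    for (int i = 0; i < n; i++)
        out[i] = mask[i] ? read_elem(&base[idx[i]]) : passthru[i];
}
```

Both functions produce the same vector; the difference is only in which memory they are allowed to touch.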

Can we infer the scale value based on the loaded element type?
[Demikhovsky, Elena]
In that case we would need two different intrinsics: one with a non-zero base and a vector of indices (each index relative to the base), with the scale implied, as you suggest, by the element type;
and a second one with no base and no scale, taking just a vector of pointers.
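The two forms described above could be modeled in C roughly as follows. This is a sketch under assumptions: the function names are hypothetical, masks are omitted for brevity, and `double` with 64-bit indices stands in for an arbitrary element type. Indexing a typed C pointer already multiplies by `sizeof(element)`, which is exactly a scale inferred from the element type; the second form needs neither base nor scale because each lane already carries a full address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical form 1: base + vector of indices, implicit scale.
 * base[idx[i]] addresses base + idx[i] * sizeof(double). */
static void gather_base_index(double *out, const double *base,
                              const long long *idx, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = base[idx[i]];
}

/* Hypothetical form 2: no base, no scale, just a vector of pointers. */
static void gather_ptrs(double *out, const double *const *ptrs, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = *ptrs[i];
}
```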

Semantics:
For i=0,1,…,N-1: if (Mask[i]) {VectorValue[i] = *(BaseAddr + VectorOfIndices[i]*Scale);} else {VectorValue[i] = PassThruVal[i];}
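A scalar C reference model of the load semantics above may help pin down the addressing. It assumes 32-bit elements, signed 64-bit indices, and `Scale` expressed in bytes; the function and parameter names are hypothetical and only mirror the names in the pseudocode. `memcpy` is used so that a byte-granular address (for example `Scale` = 1) does not require an aligned typed pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a masked indexed load:
 * 32-bit elements, signed 64-bit indices, Scale in bytes. */
static void indexed_masked_load_i32(int32_t *vector_value,
                                    const void *base_addr,
                                    const int64_t *indices, int64_t scale,
                                    const bool *mask,
                                    const int32_t *pass_thru, int n) {
    const unsigned char *base = (const unsigned char *)base_addr;
    for (int i = 0; i < n; i++) {
        if (mask[i]) {
            /* Address = BaseAddr + VectorOfIndices[i] * Scale (bytes). */
            memcpy(&vector_value[i], base + indices[i] * scale,
                   sizeof(int32_t));
        } else {
            /* Masked-out lane: memory is not read, PassThru is used. */
            vector_value[i] = pass_thru[i];
        }
    }
}
```

With `Scale` = 4 the indices count elements; with `Scale` = 1 the same lanes are reached by passing byte offsets instead.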

void @llvm.sindex.store (BaseAddr, VectorValue, VectorOfIndices, Scale)
void @llvm.uindex.store (BaseAddr, VectorValue, VectorOfIndices, Scale)
void @llvm.sindex.masked.store (BaseAddr, VectorValue, VectorOfIndices, Scale, Mask)
void @llvm.uindex.masked.store (BaseAddr, VectorValue, VectorOfIndices, Scale, Mask)

Semantics:
For i=0,1,…,N-1: if (Mask[i]) {*(BaseAddr + VectorOfIndices[i]*Scale) = VectorValue[i];}
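The store semantics can be modeled the same way. Again this is a hypothetical scalar sketch (32-bit elements, signed 64-bit indices, `Scale` in bytes), not LLVM code; masked-out lanes simply leave memory untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a masked indexed store:
 * 32-bit elements, signed 64-bit indices, Scale in bytes. */
static void indexed_masked_store_i32(void *base_addr,
                                     const int32_t *vector_value,
                                     const int64_t *indices, int64_t scale,
                                     const bool *mask, int n) {
    unsigned char *base = (unsigned char *)base_addr;
    for (int i = 0; i < n; i++) {
        if (mask[i]) {
            /* *(BaseAddr + VectorOfIndices[i] * Scale) = VectorValue[i] */
            memcpy(base + indices[i] * scale, &vector_value[i],
                   sizeof(int32_t));
        }
        /* else: no memory access at all for this lane */
    }
}
```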

VectorValue: any float or integer vector type.

We should also support loading and storing pointer values.


BaseAddr: a pointer; may be zero if the full address is placed in the index.
VectorOfIndices: a vector of i32 or i64 signed or unsigned integer values.
Scale: a compile time constant 1, 2, 4 or 8.

Why do we need to limit the scale values?
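To make the signed/unsigned index distinction and the restricted scale concrete, here is a small C sketch. It assumes, based on the names only, that `sindex` sign-extends each 32-bit index to pointer width and `uindex` zero-extends it; the helper names are hypothetical. For reference, 1, 2, 4 and 8 are the scale factors x86 addressing modes can encode, which is one plausible motivation for the restriction (an assumption, not stated in the proposal).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check mirroring "Scale: a compile time constant 1, 2, 4 or 8". */
static int scale_is_valid(int64_t scale) {
    return scale == 1 || scale == 2 || scale == 4 || scale == 8;
}

/* Assumed sindex behavior: sign-extend the i32 index, then scale. */
static int64_t offset_sindex(int32_t index, int64_t scale) {
    assert(scale_is_valid(scale));
    return (int64_t)index * scale;
}

/* Assumed uindex behavior: zero-extend the i32 index, then scale. */
static int64_t offset_uindex(int32_t index, int64_t scale) {
    assert(scale_is_valid(scale));
    return (int64_t)(uint32_t)index * scale;
}
```

The two variants agree for non-negative indices and diverge only when the top bit of a 32-bit index is set: index -1 with scale 4 becomes a byte offset of -4 under `sindex` but 17179869180 under `uindex`.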


VectorValue, VectorOfIndices and Mask must have the same vector width.

An indexed store with complete or partial overlap in memory (i.e., two indices with the same or close values) produces a result equivalent to serial scalar stores performed from the least to the most significant vector element.
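Under this serial rule, duplicate indices resolve deterministically: the value from the highest lane is the one left in memory. A minimal C sketch, assuming an unmasked store with the scale implied by the element type (`int32_t`) and a hypothetical function name, shows this for complete overlap; partial overlap follows the same lane order but depends on byte layout, so it is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical serial model of an unmasked indexed store: lanes are
 * written one at a time from element 0 upward, so a later lane
 * overwrites an earlier lane that targets the same element. */
static void indexed_store_serial(int32_t *base, const int32_t *value,
                                 const int64_t *indices, int n) {
    for (int i = 0; i < n; i++)
        base[indices[i]] = value[i];
}
```

With values {1, 2, 3, 4} and indices {1, 3, 1, 1}, element 1 is written by lanes 0, 2 and 3 in turn and ends up holding 4.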


Thanks,
Nadav

---------------------------------------------------------------------
Intel Israel (74) Limited
