[LLVMdev] Indexed Load and Store Intrinsics - proposal

Nadav Rotem nrotem at apple.com
Thu Dec 18 10:01:34 PST 2014


Hi Elena, 

I think that in general this proposal makes sense and is consistent with discussions that we’ve had in the past. These new intrinsics can be very useful for vectorization. I have a few comments below.

> On Dec 18, 2014, at 6:40 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
> 
> Hi,
>  
> Recent Intel architectures AVX-512 and AVX2 provide vector gather and/or scatter instructions.
> Gather/scatter instructions allow read/write access to multiple memory addresses. The addresses are specified using a base address and a vector of indices.
> We’d like Vectorizers to tap this functionality, and propose to do so by introducing new intrinsics:
>  
> VectorValue = @llvm.sindex.load (BaseAddr, VectorOfIndices, Scale)
> VectorValue = @llvm.uindex.load (BaseAddr, VectorOfIndices, Scale)
> VectorValue = @llvm.sindex.masked.load (BaseAddr, VectorOfIndices, Scale, PassThruVal, Mask)
> VectorValue = @llvm.uindex.masked.load (BaseAddr, VectorOfIndices, Scale, PassThruVal, Mask)
>  

It looks like the proposed intrinsic is very specific to the x86 implementation of gather/scatter. Would it be possible to remove the PassThruVal operand from the intrinsic and define the masked-out lanes to be undef? You would still be able to pattern match the x86 pass-through form from a masked load followed by a select.
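
To make the masked load + select idea concrete, here is a rough scalar sketch in C. It is only an illustration, not proposed IR: the function names, the fixed width of 4, the i32 element type and the one-byte-per-lane mask are all assumptions made for the example. Masked-off lanes are simply left unwritten by the gather, which stands in for undef.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { GATHER_WIDTH = 4 }; /* assumed vector width, for illustration only */

/* Masked gather with no pass-through operand: lanes whose mask bit is
   clear are left unwritten (the stand-in for undef in this sketch). */
static void masked_gather_i32(int32_t out[GATHER_WIDTH], const char *base,
                              const int64_t idx[GATHER_WIDTH], int64_t scale,
                              const uint8_t mask[GATHER_WIDTH]) {
    assert(scale > 0);
    for (int i = 0; i < GATHER_WIDTH; ++i) {
        if (mask[i]) {
            memcpy(&out[i], base + idx[i] * scale, sizeof out[i]);
        }
    }
}

/* Lane-wise select: a[i] where the mask is set, b[i] otherwise.
   Only the chosen operand is read, so undefined lanes of a are never used. */
static void select_i32(int32_t out[GATHER_WIDTH],
                       const uint8_t mask[GATHER_WIDTH],
                       const int32_t a[GATHER_WIDTH],
                       const int32_t b[GATHER_WIDTH]) {
    for (int i = 0; i < GATHER_WIDTH; ++i) {
        out[i] = mask[i] ? a[i] : b[i];
    }
}

/* The pass-through form rebuilt from the two pieces above. */
static void gather_with_passthru(int32_t out[GATHER_WIDTH], const char *base,
                                 const int64_t idx[GATHER_WIDTH], int64_t scale,
                                 const uint8_t mask[GATHER_WIDTH],
                                 const int32_t passthru[GATHER_WIDTH]) {
    int32_t loaded[GATHER_WIDTH];
    masked_gather_i32(loaded, base, idx, scale, mask);
    select_i32(out, mask, loaded, passthru);
}
```

A backend that has a native pass-through operand could match the gather + select pair and fold it into one instruction; other targets would just lower the two operations separately.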

Can we remove the masked version of the intrinsic altogether and pattern match it using the non-masked version somehow?

Can we infer the scale value based on the loaded element type?
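
As a rough illustration of what I mean by inferring the scale, here is a scalar sketch in C. The names, the width of 4 and the double element type are assumptions for the example only. The typed wrapper takes no scale operand and derives it from the element size, which is what an array-style access does anyway.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { INFER_WIDTH = 4 }; /* assumed vector width, for illustration only */

/* Explicit-scale form: address of lane i is base + idx[i] * scale bytes. */
static void gather_f64_scaled(double out[INFER_WIDTH], const char *base,
                              const int32_t idx[INFER_WIDTH], int64_t scale) {
    assert(scale > 0);
    for (int i = 0; i < INFER_WIDTH; ++i) {
        memcpy(&out[i], base + (int64_t)idx[i] * scale, sizeof out[i]);
    }
}

/* Inferred-scale form: no scale operand, the element size is used. */
static void gather_f64(double out[INFER_WIDTH], const double *base,
                       const int32_t idx[INFER_WIDTH]) {
    gather_f64_scaled(out, (const char *)base, idx, (int64_t)sizeof(double));
}
```

The inferred form covers plain array indexing. It does not cover cases where the stride differs from the element size, so an explicit scale may still be wanted for those.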

> Semantics:
> For i=0,1,…,N-1: if (Mask[i]) {VectorValue[i] = *(BaseAddr + VectorOfIndices[i]*Scale);} else {VectorValue[i] = PassThruVal[i];}
>  
> void @llvm.sindex.store (BaseAddr, VectorValue, VectorOfIndices, Scale)
> void @llvm.uindex.store (BaseAddr, VectorValue, VectorOfIndices, Scale)
> void @llvm.sindex.masked.store (BaseAddr, VectorValue, VectorOfIndices, Scale, Mask)
> void @llvm.uindex.masked.store (BaseAddr, VectorValue, VectorOfIndices, Scale, Mask)
>  
> Semantics:
> For i=0,1,…,N-1: if (Mask[i]) {*(BaseAddr + VectorOfIndices[i]*Scale) = VectorValue[i];}
>  
> VectorValue: any float or integer vector type.

We should also support loading and storing pointer values. 

> BaseAddr: a pointer; may be zero if full address is placed in the index.
> VectorOfIndices: a vector of i32 or i64 signed or unsigned integer values.
> Scale: a compile time constant 1, 2, 4 or 8.

Why do we need to limit the scale values?
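
One case where an unrestricted scale might be useful is gathering a single field out of an array of records. The sketch below is a hypothetical example in C, not something from the proposal: struct point3, the width of 4 and the function name are made up for illustration, and the record is typically 12 bytes, which is not one of 1, 2, 4 or 8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record; with 4-byte floats its size is typically 12 bytes. */
struct point3 { float x, y, z; };

enum { FIELD_WIDTH = 4 }; /* assumed vector width, for illustration only */

/* Gather with an arbitrary byte scale: lane i reads base + idx[i] * scale. */
static void gather_f32(float out[FIELD_WIDTH], const char *base,
                       const int32_t idx[FIELD_WIDTH], int64_t scale) {
    assert(scale > 0);
    for (int i = 0; i < FIELD_WIDTH; ++i) {
        memcpy(&out[i], base + (int64_t)idx[i] * scale, sizeof out[i]);
    }
}
```

Passing the address of the y field of the first record as the base and sizeof(struct point3) as the scale gathers y from the selected records. A target that only supports 1, 2, 4 or 8 in hardware could presumably multiply the indices first and use a scale of 1.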

> VectorValue, VectorOfIndices and Mask must have the same vector width.
>  
> An indexed store instruction with complete or partial overlap in memory (i.e., two indices with same or close values) will provide the result equivalent to serial scalar stores from least to most significant vector elements.
>  
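
For reference, this is how I read the overlap rule quoted above, written as a scalar sketch in C. The function name, the width of 4 and the i32 element type are assumptions for the example. Stores are issued in lane order, so when two lanes hit the same address the higher-numbered lane wins.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { SCATTER_WIDTH = 4 }; /* assumed vector width, for illustration only */

/* Unmasked scatter, performed as serial scalar stores from the least
   significant lane (i = 0) to the most significant lane. */
static void scatter_i32(char *base, const int32_t val[SCATTER_WIDTH],
                        const int64_t idx[SCATTER_WIDTH], int64_t scale) {
    assert(scale > 0);
    for (int i = 0; i < SCATTER_WIDTH; ++i) {
        memcpy(base + idx[i] * scale, &val[i], sizeof val[i]);
    }
}
```

With indices {2, 2, 0, 2} and values {1, 2, 3, 4}, element 2 of the destination ends up holding 4, the value from the last lane that targets it.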

Thanks,
Nadav
