<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

<meta name="Generator" content="Microsoft Exchange Server">

<!-- converted from rtf -->

<style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>

</head>

<body>

<font face="Calibri" size="2"><span style="font-size:11pt;">

<div>Hi,</div>

<div> </div>

<div>The recent Intel instruction set extensions AVX2 and AVX-512 provide vector gather instructions; AVX-512 also provides vector scatter instructions.</div>

<div>Gather/scatter instructions allow read/write access to multiple memory addresses. The addresses are specified using a base address and a vector of indices.</div>

<div>We’d like the vectorizers to take advantage of this functionality, and propose to do so by introducing new intrinsics:</div>

<div> </div>

<div>VectorValue = @llvm.sindex.load (BaseAddr, VectorOfIndices, Scale)</div>

<div>VectorValue = @llvm.uindex.load (BaseAddr, VectorOfIndices, Scale)</div>

<div>VectorValue = @llvm.sindex.masked.load (BaseAddr, VectorOfIndices, Scale, PassThruVal, Mask)</div>

<div>VectorValue = @llvm.uindex.masked.load (BaseAddr, VectorOfIndices, Scale, PassThruVal, Mask)</div>

<div> </div>

<div>Semantics:</div>

<div>For i=0,1,…,N-1: if (Mask[i]) {VectorValue[i] = *(BaseAddr + VectorOfIndices[i]*Scale);} else {VectorValue[i] = PassThruVal[i];}</div>
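
<div>As a rough illustration of the load semantics above, here is a scalar reference model in Python. It is only a sketch under stated assumptions: memory is modeled as a flat byte array, elements are 4-byte little-endian signed integers, and the helper name masked_indexed_load_i32 is hypothetical and not part of the proposal.</div>

```python
def masked_indexed_load_i32(memory, base, indices, scale, passthru, mask):
    """Scalar model: one 4-byte load per enabled lane, PassThruVal otherwise."""
    assert scale in (1, 2, 4, 8)
    assert len(indices) == len(passthru) == len(mask)
    result = []
    for i in range(len(indices)):
        if mask[i]:
            # Address arithmetic from the proposal: BaseAddr + Index * Scale.
            addr = base + indices[i] * scale
            result.append(int.from_bytes(memory[addr:addr + 4], "little", signed=True))
        else:
            result.append(passthru[i])
    return result


# Eight i32 values 10..17 laid out contiguously, starting at byte offset 0.
memory = bytearray(b"".join(v.to_bytes(4, "little", signed=True) for v in range(10, 18)))

gathered = masked_indexed_load_i32(
    memory, base=0, indices=[7, 0, 3, 3], scale=4,
    passthru=[-1, -1, -1, -1], mask=[True, False, True, True])
print(gathered)
```

<div>In this sketch lane 1 is masked off, so it takes its value from PassThruVal, and lanes 2 and 3 read the same address, which is harmless for a load.</div>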

<div> </div>

<div>void @llvm.sindex.store (BaseAddr, VectorValue, VectorOfIndices, Scale)</div>

<div>void @llvm.uindex.store (BaseAddr, VectorValue, VectorOfIndices, Scale)</div>

<div>void @llvm.sindex.masked.store (BaseAddr, VectorValue, VectorOfIndices, Scale, Mask)</div>

<div>void @llvm.uindex.masked.store (BaseAddr, VectorValue, VectorOfIndices, Scale, Mask)</div>

<div> </div>

<div>Semantics:</div>

<div>For i=0,1,…,N-1: if (Mask[i]) {*(BaseAddr + VectorOfIndices[i]*Scale) = VectorValue[i];}</div>

<div> </div>

<div>VectorValue: any float or integer vector type.</div>

<div>BaseAddr: a pointer; may be null (zero) if the full address is placed in the index.</div>

<div>VectorOfIndices: a vector of i32 or i64 signed or unsigned integer values.</div>

<div>Scale: a compile-time constant: 1, 2, 4 or 8.</div>

<div>VectorValue, VectorOfIndices and Mask must have the same number of vector elements.</div>
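
<div>The operand descriptions above do not spell out how sindex and uindex differ beyond the signedness of the index values. One plausible reading, assumed here and not confirmed by the proposal, is that an index narrower than a pointer is sign-extended for sindex and zero-extended for uindex before scaling. The Python sketch below (effective_address is a hypothetical helper) shows how much that choice matters for an i32 index whose bits are all ones.</div>

```python
def effective_address(base, index_bits, index_width, scale, signed):
    """Widen a raw index bit pattern as signed or unsigned, then scale and add."""
    assert scale in (1, 2, 4, 8)
    assert index_width in (32, 64)
    raw = index_bits.to_bytes(index_width // 8, "little")
    # Assumption: sindex sign-extends the index, uindex zero-extends it.
    index = int.from_bytes(raw, "little", signed=signed)
    return base + index * scale


sindex_addr = effective_address(0x1000, 0xFFFFFFFF, 32, 4, signed=True)
uindex_addr = effective_address(0x1000, 0xFFFFFFFF, 32, 4, signed=False)
print(hex(sindex_addr), hex(uindex_addr))
```

<div>Under this assumption the signed form addresses the element just below the base (0x1000 - 4), while the unsigned form addresses an element roughly 16 GB above it.</div>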

<div> </div>

<div>An indexed store with complete or partial overlap in memory (i.e., two indices with the same or close values) produces the same result as serial scalar stores performed in order from the least significant to the most significant vector element.</div>
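
<div>The ordering rule above can be sketched with a scalar model in Python. The assumptions are the same kind of simplification as any reference model: memory is a flat byte array, elements are 4-byte little-endian integers, and masked_indexed_store_i32 is a hypothetical name used only for illustration.</div>

```python
def masked_indexed_store_i32(memory, base, values, indices, scale, mask):
    """Scalar model: enabled lanes store in order, lane 0 first, last lane last."""
    assert scale in (1, 2, 4, 8)
    assert len(values) == len(indices) == len(mask)
    for i in range(len(values)):
        if mask[i]:
            addr = base + indices[i] * scale
            memory[addr:addr + 4] = values[i].to_bytes(4, "little", signed=True)


memory = bytearray(12)
# Lanes 0 and 2 both target byte 0 (complete overlap);
# lane 1 targets byte 2 and so partially overlaps both of them.
masked_indexed_store_i32(
    memory, base=0,
    values=[0x11111111, 0x22222222, 0x33333333],
    indices=[0, 2, 0], scale=1,
    mask=[True, True, True])
print(memory.hex())
```

<div>Lane 2 is stored last, so bytes 0 to 3 hold its value; only the upper two bytes written by lane 1 (bytes 4 and 5) survive, and nothing written by lane 0 remains.</div>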

<div> </div>

<div>The new intrinsics are target-independent, like the recently introduced masked load and store intrinsics.</div>

<div> </div>

<div>Examples:</div>

<div> </div>

<div><16 x float> @llvm.sindex.load.v16f32.v16i32 (i8* %ptr, <16 x i32> %index, i32 %scale)</div>

<div><16 x float> @llvm.sindex.masked.load.v16f32.v16i32 (i8* %ptr, <16 x i32> %index, i32 %scale, <16 x float> %passthru, <16 x i1> %mask)</div>

<div>void @llvm.sindex.masked.store.v16f32.v16i64 (i8* %ptr, <16 x float> %value, <16 x i64> %index, i32 %scale, <16 x i1> %mask)</div>

<div> </div>

<div>Comments?</div>

<div> </div>

<div>Thank you.</div>

<div> </div>

<div>- <b><i>Elena</i></b></div>

<div> </div>

<div> </div>

<div> </div>

</span></font>

<p>---------------------------------------------------------------------<br>

Intel Israel (74) Limited</p>


</body>

</html>