[LLVMdev] Adding masked vector load and store intrinsics

Tian, Xinmin xinmin.tian at intel.com
Fri Oct 24 11:48:28 PDT 2014

Adam, yes, there is more we need to consider, e.g. masked gather/scatter, masked arithmetic ops, etc. This proposal is the first step, and an important one, serving as a direction check with the community.

Xinmin Tian

From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Adam Nemet
Sent: Friday, October 24, 2014 10:58 AM
To: Demikhovsky, Elena
Cc: dag at cray.com; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Adding masked vector load and store intrinsics

On Oct 24, 2014, at 4:24 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:


We would like to add support for masked vector loads and stores by introducing new target-independent intrinsics. The loop vectorizer will then be enhanced to optimize loops containing conditional memory accesses by generating these intrinsics for existing targets such as AVX2 and AVX-512. The vectorizer will first ask the target about availability of masked vector loads and stores. The SLP vectorizer can potentially be enhanced to use these intrinsics as well.

The intrinsics would be legal for all targets; targets that do not support masked vector loads or stores will scalarize them.
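
To make the scalarization fallback concrete, here is a minimal Python sketch of what a target without masked stores might effectively do: one conditional scalar store per lane. This is a reference model only, not the proposed lowering; the function name scalarized_masked_store, the list-as-memory model, and the returned list of touched addresses are all assumptions made for illustration, and the alignment operand is omitted.

```python
def scalarized_masked_store(memory, base, data, mask):
    """Store data[i] to memory[base + i] only for lanes whose mask bit is set.

    Hypothetical reference model: `memory` is a plain Python list standing in
    for addressable memory. Returns the addresses that were written, so a
    caller can confirm that masked-off lanes were never accessed.
    """
    if len(data) != len(mask):
        raise ValueError("data and mask must have the same number of lanes")
    touched = []
    for lane, (value, enabled) in enumerate(zip(data, mask)):
        if enabled:  # one conditional scalar store per enabled lane
            memory[base + lane] = value
            touched.append(base + lane)
    return touched


memory = [0] * 8
touched = scalarized_masked_store(
    memory, 2, [10, 11, 12, 13], [True, False, True, False]
)
print(memory)   # [0, 0, 10, 0, 12, 0, 0, 0]
print(touched)  # [2, 4]
```

With an all-false mask the loop body never executes, so no address is accessed at all.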

I do agree that we would like to have one IR node to capture these so that they survive until ISel and that their specific semantics can be expressed.  However, can you discuss the other options (new IR instructions, target-specific intrinsics) and why you went with target-independent intrinsics?

My intuition would have been to go with target-specific intrinsics until we have something solid implemented and then potentially turn this into native IR instructions as the next step (for other targets, etc.).  I am particularly worried whether we really want to generate these for targets that don't have vector predication support.

There is also the related question of vector-predicating instructions other than loads and stores, which AVX-512 supports.  This is probably a smaller gain but should probably be part of the plan as well.


The addressed memory will not be touched for masked-off lanes. In particular, if all lanes are masked off no address will be accessed.

  call void @llvm.masked.store (i32* %addr, <16 x i32> %data, i32 4, <16 x i1> %mask)

  %data = call <8 x i32> @llvm.masked.load (i32* %addr, <8 x i32> %passthru, i32 4, <8 x i1> %mask)

where %passthru is used to fill the elements of %data that are masked-off (if any; can be zeroinitializer or undef).
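
The per-lane behaviour of the masked load described above can be sketched as a scalar reference model in Python. This is only an illustration of the stated semantics, not an implementation of the intrinsic: the name masked_load, the list-as-memory model, and the omission of the alignment operand are assumptions.

```python
def masked_load(memory, base, passthru, mask):
    """Return one value per lane: memory[base + i] where mask[i] is set,
    otherwise passthru[i].

    Hypothetical reference model: masked-off lanes never index `memory`,
    mirroring the rule that the addressed memory is not touched for them.
    """
    if len(passthru) != len(mask):
        raise ValueError("passthru and mask must have the same number of lanes")
    result = []
    for lane, enabled in enumerate(mask):
        if enabled:
            result.append(memory[base + lane])  # only enabled lanes read memory
        else:
            result.append(passthru[lane])       # masked-off lanes take passthru
    return result


# Only three elements are addressable; the fourth lane is masked off,
# so the out-of-range element is never read.
memory = [7, 8, 9]
print(masked_load(memory, 0, [0, 0, 0, 0], [True, True, True, False]))
# [7, 8, 9, 0]
```

The example shows why the masked form matters for vectorizing conditional accesses: an unmasked 4-lane load from the same address would read past the end of the array.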

Comments so far, before we dive into more details?

Thank you.

- Elena and Ayal

