[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

Sat Aug 2 14:47:28 PDT 2008

Here are some ideas that came out of brainstorming during and after
the dev meeting. Disclaimer: they're just ideas :-).

* Vector Gather/Scatter

One way to do gather/scatter would be to extend vector types to
support pointer element types, and extend load, store, and getelementptr
to operate on vectors of pointers.

A typical gather sequence would then look like this:

  %vi = load <2 x i64>* %indices             ; load a vector of indices
  %vp = gep <2 x f32*> %base, <2 x i64> %vi  ; compute an address vector
  %vx = load <2 x f32*> %vp                  ; gather

This is significantly less ugly than a build-it-yourself gather, and
would be much friendlier to masks (below) than the "individually extract
each i1 from the mask" approach.

Note that this wouldn't support multiple alignments or multiple
address spaces in a single gather/scatter. Similarly, volatile
would be all-or-nothing. These don't seem like show-stoppers though.

This would complicate analyses that look at load and store addresses,
but if we really want to do gather/scatter without messes, this might be
an acceptable tradeoff.

* Masks

While adding a mask operand to every instruction that needs it would
serve the intended purpose, it would also enlarge and complicate IR,
even in code that doesn't need masks. It's a common use-case to have a
single mask used by many adjacent instructions, so this would also be
highly redundant.

An alternative that exploits this common use-case is to add a new
applymask instruction:

  %w = applymask <2 x f32> %v, <2 x i1> %m

The semantics would be to copy %v into %w, and implicitly apply mask %m
to all users (recursively) of %w, unless overridden by another
applymask. For example:

  %p = applymask <2 x f32*> %q, <2 x i1> %m
  %x = load <2 x f32*>* %p                   ; implicitly masked by %m
  %y = add <2 x f32> %x, %w                  ; implicitly masked by %m
  %z = mul <2 x f32> %y, %y                  ; implicitly masked by %m

This approach minimizes enlargement of IR in code that doesn't use
masks, and it reduces the impact on complexity -- a pass like
instcombine that wants to combine an instruction and one of its operands
into a single instruction could safely do so without knowing anything
about masks.

The applymask instruction could apply to scalars as well as vectors.

The set of instructions requiring mask legalization would be found by
recursively traversing the uses of all applymask instructions.

extractvalue on a masked-out value yields undef. overriding a value's
mask with a new applymask instruction that de-masks any elements leaves
those elements set to undef.

It would be illegal for an instruction with multiple operands to have
multiple distinct masks. This includes PHI nodes.

Insertelement and shufflevector with masked-out values would not be
immune to the mask; assigning a value to a masked-out element of a
vector does not make that element non-undef. Code wanting to fill in
masked-out values must apply a new mask to the operands first.

Masks would not themselves be maskable. That is, the result of an
applymask could not be used (even indirectly) as the second operand
of another applymask.

Dan