[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

Mon Aug 4 14:02:32 PDT 2008

On Saturday 02 August 2008 16:47, Dan Gohman wrote:

> * Vector Gather/Scatter
>
> One way to do gather/scatter would be to extend vector types to
> support pointer element types, and extend load, store, and getelementptr
> to operate on vectors of pointers.
>
> A typical gather sequence would then look like this:
>
>   %vi = load <2 x i64>* %indices             ; load a vector of indices
>   %vp = gep <2 x f32*> %base, <2 x i64> %vi  ; compute an address vector
>   %vx = load <2 x f32*> %vp                  ; gather

This looks very good to me.  It will make vector gather/scatter code much
cleaner than the "extract data and mask bits" stuff we talked about at the
meeting.
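
To make the quoted gather sequence concrete, here is a rough scalar model in
Python.  It is only a sketch of the per-lane behavior, not proposed IR: the
names `gather`, `memory`, `bases`, and `indices` are invented for
illustration, and addresses are counted in elements rather than bytes.

```python
def gather(memory, bases, indices):
    """Scalar model of the quoted gather: one independent load per lane."""
    # Models the vector gep: each lane adds its own index to its own base.
    addresses = [b + i for b, i in zip(bases, indices)]
    # Models the vector load: each lane reads from its own address.
    return [memory[a] for a in addresses]


memory = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # flat list standing in for memory
vx = gather(memory, bases=[1, 1], indices=[3, 0])
print(vx)  # [14.0, 11.0]
```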

> Note that this wouldn't support multiple alignments or multiple
> address spaces in a single gather/scatter. Similarly, volatile
> would be all-or-nothing. These don't seem like show-stoppers though.

Nope.  If alignment is a concern, I would assume that the same alignment
is required on all elements; otherwise one could not vectorize the
gather/scatter.  One doesn't generally mix data types in a gather/scatter
operation.

> This would complicate analyses that look at load and store addresses,
> but if we really want to do gather/scatter without messes, this might be
> an acceptable tradeoff.

By "complicate" do you mean "need to look at multiple addresses from a
single instruction?"  Or is there more than that?  I'm trying to understand 
all the implications.

> While adding a mask operand to every instruction that needs it would
> serve the intended purpose, it would also enlarge and complicate IR,
> even in code that doesn't need masks. It's a common use-case to have a
> single mask used by many adjacent instructions, so this would also be
> highly redundant.

But explicit is better than implicit in my experience.  It's also the LLVM
philosophy to be as explicit as possible.

> An alternative that exploits this common use-case is to add a new
> applymask instruction:
>
>   %w = applymask <2 x f32> %v, <2 x i1> %m
>
> The semantics would be to copy %v into %w, and implicitly apply mask %m
> to all users (recursively) of %w, unless overridden by another
> applymask. For example:
>
>   %p = applymask <2 x f32*> %q, <2 x i1> %m
>   %x = load <2 x f32*>* %p                   ; implicitly masked by %m
>   %y = add <2 x f32> %x, %w                  ; implicitly masked by %m
>   %z = mul <2 x f32> %y, %y                  ; implicitly masked by %m

Yuck.  I don't like this at all.  It makes reading the IR harder because now
you need to worry about context.  Not all dependencies are readily expressed
in the instructions.  How would one express TableGen patterns for such
things?

My understanding is that we came away with a general agreement to add
mask support to operations that can trap and to memory operations.  That
would mean adding masks to floating-point arithmetic and memory operations.
As I recall, Chris expressed some interest in creating separate integer and
fp arithmetic instructions anyway, so it doesn't seem to be a lot of
additional work to add masks to the fp side, since instcombine et al. will
need to know about entirely new operations anyway.

We concluded that operation results would be undefined for vector elements 
corresponding to a zero mask bit.
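
As a rough illustration of that rule, the Python sketch below models a masked
floating-point divide.  The names `masked_fdiv` and `UNDEF` are hypothetical,
and using `None` for "undefined" is only a modeling choice; a real
implementation could leave any value in those lanes.  The point is that a
lane with a zero mask bit is never evaluated, so it cannot trap.

```python
UNDEF = None  # hypothetical stand-in for an undefined lane value


def masked_fdiv(a, b, mask):
    """Per-lane divide; lanes with a zero mask bit are skipped entirely."""
    return [x / y if m else UNDEF for x, y, m in zip(a, b, mask)]


# Lane 0 would divide by zero, but its mask bit is 0, so nothing is raised.
result = masked_fdiv([1.0, 6.0], [0.0, 3.0], [0, 1])
print(result)  # [None, 2.0]
```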

We also talked about adding a vector select, which is crucial for any code 
that uses masks.
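
One way to see why a select matters: if masked-off lanes are undefined,
something has to put defined values back into them before the result is used.
The Python sketch below models an if-converted "divide unless the divisor is
zero" under that assumption.  `vselect` and `safe_divide` are hypothetical
names, and `None` merely stands in for an undefined lane.

```python
def vselect(mask, if_true, if_false):
    """Per-lane select: take if_true where the mask bit is set, else if_false."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def safe_divide(a, b):
    """If-converted form of: r = a / b if b != 0 else 0.0, applied per lane."""
    mask = [y != 0.0 for y in b]
    # Masked divide: inactive lanes are never evaluated; None marks "undefined".
    quotient = [x / y if m else None for x, y, m in zip(a, b, mask)]
    # The select replaces every undefined lane with a defined fallback value.
    return vselect(mask, quotient, [0.0] * len(a))


print(safe_divide([1.0, 6.0], [0.0, 3.0]))  # [0.0, 2.0]
```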

                                                                -Dave