[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

Mon Aug 4 15:56:32 PDT 2008

On Aug 4, 2008, at 2:02 PM, David Greene wrote:

> On Saturday 02 August 2008 16:47, Dan Gohman wrote:
>
>> * Vector Gather/Scatter

>> This would complicate analyses that look at load and store addresses,
>> but if we really want to do gather/scatter without messes, this
>> might be an acceptable tradeoff.
>
> By "complicate" do you mean "need to look at multiple addresses from a
> single instruction?"  Or is there more than that?  I'm trying to
> understand all the implications.

I mean just that -- we have a fair amount of code built around looking
at the addresses of load and store nodes that in some cases would need
to be restructured if it is to cope with multiple addresses at a time.
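
To make the restructuring concrete, here is a minimal Python sketch, not
LLVM code. It assumes addresses are known integers and that every lane
accesses the same number of bytes, which a real analysis cannot assume;
may_overlap is a hypothetical name. The point is only that a query that
used to compare one address against one address becomes a query over
every pair of lanes, with the scalar case as the one-lane special case.

```python
from typing import Sequence


def may_overlap(addrs_a: Sequence[int], addrs_b: Sequence[int], size: int) -> bool:
    """Report whether two memory operations may touch the same bytes.

    Each operation carries one address per lane (a scalar load or store is
    a one-element sequence). Assumption for this sketch: every lane accesses
    exactly `size` bytes, so two lanes overlap when their start addresses
    are less than `size` bytes apart.
    """
    return any(abs(a - b) < size for a in addrs_a for b in addrs_b)


# Scalar load at 0 versus scalar store at 4, four bytes each: disjoint.
scalar_vs_scalar = may_overlap([0], [4], 4)

# Two-lane gather at {0, 16} versus a scalar store at 18: lane 1 overlaps.
gather_vs_store = may_overlap([0, 16], [18], 4)
```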

>
>
>> While adding a mask operand to every instruction that needs it would
>> serve the intended purpose, it would also enlarge and complicate IR,
>> even in code that doesn't need masks. It's a common use-case to have a
>> single mask used by many adjacent instructions, so this would also be
>> highly redundant.
>
> But explicit is better than implicit in my experience.  It's also the
> LLVM philosophy to be as explicit as possible.
>
>> An alternative that exploits this common use-case is to add a new
>> applymask instruction:
>>
>>  %w = applymask <2 x f32> %v, <2 x i1> %m
>>
>> The semantics would be to copy %v into %w, and implicitly apply mask
>> %m to all users (recursively) of %w, unless overridden by another
>> applymask. For example:
>>
>>  %p = applymask <2 x f32*> %q, <2 x i1> %m
>>  %x = load <2 x f32*> %p                    ; implicitly masked by %m
>>  %y = add <2 x f32> %x, %w                  ; implicitly masked by %m
>>  %z = mul <2 x f32> %y, %y                  ; implicitly masked by %m
>
> Yuck.  I don't like this at all.  It makes reading the IR harder
> because now you need to worry about context.

I don't disagree with these. I think it's a trade-off, with LLVM
design philosophy and IR cleanliness arguments on both sides.

The applymask approach leverages use-def information rather than
what can be thought of as duplicating a subset of it, making the IR
less cluttered. And, it makes it trivially straightforward to write
passes that work correctly on both masked and unmasked code.
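
As a rough illustration of what "leveraging use-def information" means
here, the following Python toy model finds the mask governing a value by
walking its operands back to the nearest applymask. It is a sketch of the
proposed semantics, not LLVM code: Inst and effective_mask are
hypothetical names, and raising an error when operands sit under
different masks is an assumption of the sketch, since that case is not
specified above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Inst:
    """A toy SSA value: a name, an opcode, and the values it uses."""
    name: str
    opcode: str
    operands: Tuple["Inst", ...] = ()
    mask: Optional[str] = None  # only meaningful when opcode == "applymask"


def effective_mask(inst: Inst) -> Optional[str]:
    """Return the mask name that implicitly governs inst, or None if unmasked."""
    if inst.opcode == "applymask":
        # An applymask overrides whatever mask its operand was under.
        return inst.mask
    masks = {m for m in (effective_mask(op) for op in inst.operands) if m is not None}
    if len(masks) > 1:
        # Assumption of this sketch: mixing masks without a new applymask is invalid.
        raise ValueError(f"%{inst.name} has operands under different masks: {sorted(masks)}")
    return masks.pop() if masks else None


# The example from the proposal, rebuilt in the toy model.
v = Inst("v", "arg")
q = Inst("q", "arg")
w = Inst("w", "applymask", (v,), mask="m")
p = Inst("p", "applymask", (q,), mask="m")
x = Inst("x", "load", (p,))
y = Inst("y", "add", (x, w))
z = Inst("z", "mul", (y, y))
```

In this model effective_mask(z) yields "m" and effective_mask(v) yields
None, and a pass that never asks for the mask sees the same add and mul
it would see in unmasked code.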

>  Not all dependencies are readily expressed
> in the instructions.  How would one express TableGen patterns for such
> things?

The syntax above is an idea for LLVM IR. SelectionDAG doesn't
necessarily have to use the same approach.

>
>
> My understanding is that we came away with a general agreement to add
> mask support to operations that can trap and to memory operations.
> That would mean adding masks to floating-point arithmetic and memory
> operations. As I recall, Chris expressed some interest in creating
> separate integer and fp arithmetic instructions anyway, so it doesn't
> seem to be a lot of additional work to add masks to the fp side since
> instcombine, et al. will need to know about entirely new operations
> anyway.

I think we all recognize the need, and in the absence of better
alternatives are willing to accept the mask operand approach. It would
have a significant impact on everyone, even those that don't use masks.
I don't want to stand in the way of progress, but this alternative
approach seems promising enough to be worth consideration.

>
>
> We concluded that operation results would be undefined for vector
> elements corresponding to a zero mask bit.
>
> We also talked about adding a vector select, which is crucial for any
> code that uses masks.

Right. This applymask idea doesn't conflict with these.
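
For readers following along, a small Python model of those two points,
using plain lists as vectors. The names UNDEF, masked_fdiv and vselect
are hypothetical, and the UNDEF sentinel is only a stand-in for "result
is undefined"; this sketches the agreed semantics and is not an
implementation.

```python
# Sentinel standing in for an undefined lane in this toy model.
UNDEF = object()


def masked_fdiv(a, b, mask):
    """Divide lane by lane, but only where the mask bit is set.

    Disabled lanes are never evaluated, so a divide by zero in a disabled
    lane cannot trap; the result in that lane is undefined.
    """
    return [x / y if m else UNDEF for x, y, m in zip(a, b, mask)]


def vselect(mask, on_true, on_false):
    """Pick each lane from on_true where the mask bit is set, else from on_false."""
    return [t if m else f for m, t, f in zip(mask, on_true, on_false)]


mask = [True, False]
quotient = masked_fdiv([1.0, 2.0], [4.0, 0.0], mask)  # lane 1 would trap if unmasked
merged = vselect(mask, quotient, [0.0, 0.0])          # replace the undefined lane
```

The select is what turns a partially undefined result back into a fully
defined vector, which is why masked code needs it.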

Dan