[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

Tue Aug 5 10:39:21 PDT 2008

On Tue, August 5, 2008 8:32 am, David Greene wrote:
> On Monday 04 August 2008 17:56, Dan Gohman wrote:
>> The applymask approach leverages use-def information rather than
>> what can be thought of as duplicating a subset of it, making the IR
>
> I don't understand what you mean by "duplicating" here.

If you look just at the case where every instruction in a
given use-def sub-DAG uses the same mask, adding that mask as
an operand to each of them largely just duplicates information
the use-def edges already carry: that the instructions are all
connected. This is the common case that applymask is aimed at.

In the case where multiple masks are used, applymask can still
cope, but the neat thing is that in this case it serves to
mark the dataflow edges where masks change.
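
To make that concrete, here is a toy Python model. It is not LLVM code:
the Value class and the arg, binop and governing_mask helpers are made up
for illustration, and applymask is simplified to "an instruction inherits
the mask its operands were wrapped with". The point is only that the mask
can be recovered from use-def edges instead of being repeated on every
instruction.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Value:
    op: str                              # "arg", "applymask", "add", "mul", ...
    operands: Tuple["Value", ...] = ()
    name: str = ""


def arg(name):
    return Value("arg", (), name)


def applymask(vec, mask):
    # Hypothetical stand-in for the proposed instruction: ties vec to mask.
    return Value("applymask", (vec, mask))


def binop(op, lhs, rhs):
    return Value(op, (lhs, rhs))


def governing_mask(v) -> Optional[Value]:
    """Recover the mask for v purely from use-def edges (None = unmasked)."""
    if v.op == "applymask":
        return v.operands[1]
    if v.op == "arg":
        return None
    masks = {governing_mask(o) for o in v.operands}
    if len(masks) != 1:
        raise ValueError("operands of one instruction disagree on the mask")
    return masks.pop()


a, b, c, m1, m2 = (arg(n) for n in ("a", "b", "c", "m1", "m2"))

# Common case: one mask for the whole sub-DAG, mentioned only at the leaves.
am, bm = applymask(a, m1), applymask(b, m1)
prod = binop("mul", binop("add", am, bm), am)

# Mask change: a new applymask sits exactly on the edge where it happens.
total = binop("add", applymask(prod, m2), applymask(c, m2))

print(governing_mask(prod).name)   # m1
print(governing_mask(total).name)  # m2
```

In this sketch the add and mul never mention m1 themselves, and the only
place m2 shows up is on the edges feeding the final add.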

>  You need some
> kind of use-def information for the masks themselves because at some
> point they need to be register-allocated.

What I'm talking about here is just in LLVM IR. I agree that we want
mask registers as operands during register allocation, and probably
also instruction selection.

>
>> less cluttered. And, it makes it trivially straightforward to write
>> passes that work correctly on both masked and unmasked code.
>
> I had a thought on this, actually.  Let's say the mask is the very last
> operand on masked instructions.  Most passes don't care about the mask
> at all.  They can just ignore it.  Since they don't look at the extra
> operand right now, there shouldn't be many changes necessary (some
> asserts may need to be fixed, etc.).
>
> Think about instcombine.  It's matching patterns.  If the matcher doesn't
> look at masks, that may be ok most of the time (mod corner cases which
> I fully appreciate can be a real pain to track down).  If we want fancy
> instcombine tricks that understand masks, we can add those later.

If masks are operands, instcombine will need to check that all the
relevant masks match before many of the transformations it does,
and it'll need to take care to put the mask operand in the
instructions it creates.

With applymask, I believe instcombine wouldn't require any
modifications, except things like "case ApplyMaskInst: break;" in
a few places. Applymask makes masks in the IR so easy to reason
about that most passes won't need to do any special reasoning.
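
A rough sketch of the difference, again as a toy Python model rather than
anything resembling the real instcombine. Instructions are plain tuples
of the form (opcode, operands...), the rule is -(a - b) -> (b - a), and
both function names are invented for this example. The first function is
the ordinary unmasked rule, which also works as-is on applymask'd
operands; the second is what the same rule has to look like if the mask
is a trailing operand.

```python
def is_sub(x):
    return isinstance(x, tuple) and x[0] == "sub"


def fold_neg_of_sub(inst):
    """Unmasked rule: -(a - b) -> (b - a). Knows nothing about masks."""
    if inst[0] == "neg" and is_sub(inst[1]):
        _, a, b = inst[1]
        return ("sub", b, a)
    return inst


def fold_neg_of_sub_mask_operand(inst):
    """Same rule when every instruction carries its mask as last operand."""
    if inst[0] == "neg" and is_sub(inst[1]):
        _, inner, outer_mask = inst
        _, a, b, inner_mask = inner
        if inner_mask == outer_mask:             # extra check before folding
            return ("sub", b, a, outer_mask)     # must re-attach the mask
    return inst


# applymask style: the masks live on the operands, the rule is unchanged.
a = ("applymask", "%a", "%m")
b = ("applymask", "%b", "%m")
print(fold_neg_of_sub(("neg", ("sub", a, b))))

# mask-operand style: the fold fires only when the two masks agree.
print(fold_neg_of_sub_mask_operand(("neg", ("sub", "%a", "%b", "%m"), "%m")))
print(fold_neg_of_sub_mask_operand(("neg", ("sub", "%a", "%b", "%m"), "%n")))
```

The mismatched-mask case in the last line is left alone, which is the
conservatively correct thing to do but is also exactly the kind of check
every such transformation would have to remember.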

>
>> >  Not all dependencies are readily expressed
>> > in the instructions.  How would one express TableGen patterns for such
>> > things?
>>
>> The syntax above is an idea for LLVM IR. SelectionDAG doesn't
>> necessarily have to use the same approach.
>
> What do you mean by "ideal for LLVM IR?"  This looks very much _not_
> ideal to me from a debugging standpoint.  It's difficult to understand.
> It took me reading through the proposal a few times to grok what you
> are talking about.

I said "idea", not "ideal" :-). But I just meant that LLVM IR
and SelectionDAG don't have to do the same thing.

>
>> I think we all recognize the need, and in the absence of better
>> alternatives are willing to accept the mask operand approach. It would
>> have a significant impact on everyone, even those that don't use masks.
>
> How do you define "significant impact?"  Compile time?  Development
> effort?  Transition pain?  All of the above?  More?

With mask operands, many passes will need to explicitly check for
masks even if they don't care and just want to be conservatively
correct.

With applymask, passes will often be able to operate on masked IR
just as aggressively as non-masked IR.

>> I don't want to stand in the way of progress, but this alternative
>> approach seems promising enough to be worth consideration.
>
> Alternatives are always welcome and worth considering.  I'm looking at
> the kind of things the LLVM community is going to want to support and
> I'm pretty sure masks are going to be a very big part of architectures
> in the future.  We're done with clock speed improvements, so we need to
> rely on architecture more.  Vectorization is a well-known technique to
> improve single thread performance and masks are critical to producing
> efficient vector code.
>
> If y'all agree with this premise, it seems to me that we want to
> support such architectures in as straightforward a way as possible so
> as to minimize future pain when we're all writing complex and beautiful
> vector hacks.  :)

I think we basically agree here :-). For me, a large part of what
motivates applymask for consideration is that it simplifies the
reasoning that optimizers must do for masked instructions.

> What can we learn from the IA64 and ARM backends?  How do they handle
> their masks (scalar predication)?  Is all the if-conversion done in
> target-specific passes?

Scalar if-conversion is in lib/CodeGen/IfConversion.cpp, but it wouldn't
be usable for vectors. If-conversion for vectors must be done as part of
the vectorization (whether that's done by the user/front-end or by the
optimizer).
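
As a toy illustration of why the two go together (plain Python lists
standing in for vectors; both function names are made up, and this is not
how any LLVM pass is written): once a loop body with a branch is
vectorized, the branch condition has to become a per-element mask, and
the two arms have to be combined under that mask.

```python
def scalar_loop(a, b):
    """Original scalar code: one branch per element."""
    out = []
    for x, y in zip(a, b):
        if x > y:
            out.append(x - y)
        else:
            out.append(x + y)
    return out


def vectorized_if_converted(a, b):
    """Whole-vector form: the branch is gone, a mask picks the arm."""
    mask = [x > y for x, y in zip(a, b)]        # vector compare
    then_v = [x - y for x, y in zip(a, b)]      # arm taken where mask is set
    else_v = [x + y for x, y in zip(a, b)]      # arm taken where mask is clear
    return [t if m else e for m, t, e in zip(mask, then_v, else_v)]


a = [5, 1, 7, 2]
b = [3, 4, 7, 9]
print(scalar_loop(a, b) == vectorized_if_converted(a, b))
```

Computing both arms for every element like this is only safe when
neither arm can trap or has side effects. Masked operations are what
would let the inactive elements be skipped instead.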

Dan