[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

Tue Aug 5 11:27:28 PDT 2008

On Tuesday 05 August 2008 12:39, Dan Gohman wrote:
> On Tue, August 5, 2008 8:32 am, David Greene wrote:
> > On Monday 04 August 2008 17:56, Dan Gohman wrote:
> >> The applymask approach leverages use-def information rather than
> >> what can be thought of as duplicating a subset of it, making the IR
> >
> > I don't understand what you mean by "duplicating" here.
>
> If you look just at the case where every instruction in a
> given use-def sub-dag uses the same mask, adding that mask as
> an operand to all of them is largely just duplicating the
> information about them all being connected. This is the common
> case that applymask is aimed at.

But why couldn't one argue the same thing for other operands and make
LLVM IR a stack machine?  It just seems strange to me to treat a
fundamental part of the IR differently simply because it came along
later.

> In the case where multiple masks are used, applymask can still
> cope, but the neat thing is that in this case it serves to
> mark the dataflow edges where masks change.

What advantage does that give over explicit masks in SSA form?  PHI nodes do 
the same thing.

> >  You need some
> > kind of use-def information for the masks themselves because at some
> > point they need to be register-allocated.
>
> What I'm talking about here is just in LLVM IR. I agree that we want
> mask registers as operands during register allocation, and probably
> also instruction selection.

So how does isel go from an implicit-masked LLVM IR to an
explicit-masked MachineInstruction?  And then wouldn't all Machine
passes need to understand mask operands?

> > Think about instcombine.  It's matching patterns.  If the matcher doesn't
> > look at masks, that may be ok most of the time (mod corner cases which
> > I fully appreciate can be a real pain to track down).  If we want fancy
> > instcombine tricks that understand masks, we can add those later.
>
> If masks are operands, instcombine will need to check if all the
> relevent masks match before many of the transformations it does,

I'm not sure about "many."  Many of instcombines tricks are based on
expression reorderings which are valid with or without masks.  We might
need to do some mask manipulation (ANDing or ORing them, for example)
but we'd have to do that anyway with applymask if the incoming masks
to expressions being combined don't match.  We'd have to introduce a new
applymask that takes the result of the logical mask operation.

So, it seems to me that instcombine needs to check and manipulate masks
either way.

We'd have to be careful about not introducing traps but again instcombine will
have to worry about that with applymask as well in the general case.

It strikes me that you're indicating that with applymask, instcombine can just
ignore the implicit mask operand on every instruction (because it isn't 
explicit and instcombine doesn't look at it right now).  I'm in fact saying 
the exact same thing, except masks would be explicit and instcombine would
just not look at those operands.

Neither solution eliminates the need for instcombine to be careful and consult
masks from time to time.

Perhaps I'm totally missing something.  Concrete examples would be helpful.

> and it'll need to take care to put the mask operand in the
> instructions it creates.

Or create a new applymask if it needs to generate new masks as the result of 
combining.

> With applymask, I believe instcombine wouldn't require any
> modifications, except things like "case ApplyMaskInst: break;" in
> a few places. Applymask makes masks in the IR so easy to reason
> about, most passes won't need to do any special reasoning.

I don't think this is quite right.  See the comment above.

> >> The syntax above is an idea for LLVM IR. SelectionDAG doesn't
> >> necessarily
> >> have to use the same approach.
> >
> > What do you mean by "ideal for LLVM IR?"  This looks very much _not_
> > ideal to
> > me from a debugging standpoint.  It's difficult to understand.  It took
> > me reading through the proposal a few times to grok what you are talking
> > about.
>
> I said "idea", not "ideal" :-). But I just meant that LLVM IR
> and SelectionDAG don't have to do the same thing.

Oops, sorry about the misread.  :)

I don't think SelectionDAG is really the issue.  The issue is how to reason 
about vector code and debug compilers.  In my experience, explicit is 
usually the better way to go for maintainability.

> > How do you define "significant impact?"  Compile time?  Development
> > effort?
> > Transition pain?  All of the above?  More?
>
> With mask operands, many passes will need to explicitly check for
> masks even if they don't care and just want to be conservatively
> correct.

If passes need to worry about masks for correctness, that doesn't change
with the representation of masks.  That's a program dataflow issue and applies
no matter how it's represented.

> With applymask, passes will often be able to operate on masked IR
> just as aggressively as non-masked IR.

And they should be able to do so with any other representation as well.  The 
compiler needs to maintain semantics, not syntax.

> > If y'all agree with this premise, it seems to me that we want to support
> > such architectures in as straightforward a way as possible so as to
> > minimize
> > future pain when we're all writing complex and beautiful vector hacks. 
> > :)
>
> I think we basically agree here :-). For me, that applymask
> simplifies the reasoning that optimizers must do for masked
> instructions is a large part of what motivates it for
> consideration.

I'm not convinced that there is a simplification of reasoning but I'm open to 
changing my mind.   Again, concrete examples would be really helpful.

I do agree that in some cases code rewrites might be avoided because existing
Instructions will retain the same number of operands they have now.  But 
that's not a "reasoning" issue.  That's a mechanics issue (how you implement
the algorithm).

Avoiding the one-time pain of converting this kind of code doesn't seem worth
the cost to maintainability to me.

The key with all of this is going to be introducing it incrementally.  The 
first step is to support vector of i1 all the way through isel.  Next is 
vector select.  Then we can start selectively adding masks to Instructions.
As we do this, if a vector operation can trap and a mask isn't available for
that Instruction yet, then we just can't vectorize that code yet.  There's no 
loss from what we have right now, which is total inability to vectorize
conditional code in a reasonable way, save for a few tricks for corner cases.

If instcombine isn't ready to handle masks on some Instructions, then we can't
vectorize in those cases.  We don't *need* to convert instcombine or any other
pass all at once.  We can do it gradually and introduce more vector capability
as we go.

> > What can we learn from the IA64 and ARM backends?  How do they handle
> > their masks (scalar predication)?  Is all the if-conversion done in
> > target-specific passes?
>
> It's in lib/CodeGen/IfConversion.cpp, but it wouldn't be usable for
> vectors. If-conversion for vectors must be done as part of the
> vectorization (whether that's the user/front-end or the optimizer).

I meant using them as examples of representation.  I know that scalar 
if-conversion is different than vector if-conversion.  I'm curious about how 
IfConversion.cpp represents the predicates it creates.  Since it's in CodeGen
I guess the MachineInstruction representation just includes them as extra
operands.

                                                            -Dave