[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

Tue Aug 12 16:17:19 PDT 2008

On Tue, 2008-08-12 at 16:47 -0500, David Greene wrote:
> On Thursday 07 August 2008 21:48, Dan Gohman wrote:
> 
> > A key rule in my initial email is that the operands of any
> > operator must be under the same mask. The Verifier can enforce
> > this constraint. One problem though is that this places an
> > interesting burden on front-ends and vectorizers. I'll try to
> > write more about this soon.
> 
> I totally missed that restriction.  I think this is error-prone.  I certainly 
> wouldn't have caught the problem in my code even if I had known about the 
> restriction. I suppose the fix in this case is to set the mask to all 1's 
> before the add.  This could get rather complicated in interesting cases.

I'm studying this problem and I have some ideas, but I don't
have any clear answers yet.

> 
> > There are things instcombine can do when the masks are different.
> > I think you mentioned earlier it could do clever tricks ANDing
> > masks and such. However, based on the assumption above, this
> > wouldn't be used very often, so a conservative implementation of
> > instcombine that doesn't do any of this is mostly good enough.
> 
> Since we don't support ANY vectorization of conditional code right now, any 
> addition is a win.  That's true for whatever implementation of masks we
> go with.
> 
> > Here's one more example, or at least a skeleton of several
> > examples. A loop to be vectorized:
> >
> >    for (...) {
> >      A
> >      if (...) {
> >        B
> >      } else {
> >        C
> >      }
> >      D
> >    }
> >
> > Assuming there's a bunch of code in B and a bunch in C, then
> > we have four bunches of code and three mask conditions -
> > A and D are unmasked, B is masked, and C is masked with the
> > inverse mask value. This code could easily have more if
> > branches, nested loops, early exits, and so on, but the main
> > idea is that there are blocks of instructions grouped together
> > under the same mask, cross-block optimization opportunities
> > exist but are limited, and that this is assumed to be
> > basically representative.
> 
> I think that's right.
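
As a rough illustration of the skeleton above, here is a sketch in C that
emulates a 4-lane vector with plain arrays. It is not LLVM IR and it is not
the syntax of either the mask-operand or the applymask proposal. The names
VLEN, loop_scalar and loop_masked, and the particular bodies picked for A,
B, C and D, are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define VLEN 4  /* hypothetical vector width, chosen for illustration */

/* Scalar reference: A; if (...) B else C; D. */
void loop_scalar(const int *in, int *out, int n) {
    for (int i = 0; i < n; i++) {
        int a = in[i] * 2;          /* A */
        int t;
        if (a > 10)
            t = a - 10;             /* B */
        else
            t = a + 1;              /* C */
        out[i] = t + in[i];         /* D */
    }
}

/* Masked form of the same loop, VLEN lanes at a time. */
void loop_masked(const int *in, int *out, int n) {
    assert(n % VLEN == 0);          /* assumption: no remainder loop */
    for (int i = 0; i < n; i += VLEN) {
        int a[VLEN], t[VLEN];
        bool mask[VLEN];
        for (int l = 0; l < VLEN; l++) {    /* A: unmasked */
            a[l] = in[i + l] * 2;
            mask[l] = a[l] > 10;
        }
        for (int l = 0; l < VLEN; l++)      /* B: under the mask */
            if (mask[l]) t[l] = a[l] - 10;
        for (int l = 0; l < VLEN; l++)      /* C: under the inverse mask */
            if (!mask[l]) t[l] = a[l] + 1;
        for (int l = 0; l < VLEN; l++)      /* D: unmasked */
            out[i + l] = t[l] + in[i + l];
    }
}
```

Every lane of t is written by exactly one of B or C, so D can read it
unmasked. Within B, every operation sees the same mask, which is the
grouping the paragraph above describes.
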
> 
> > There are some interesting cross-block cases here.
> > If you can prove that something in B and/or C can be run
> > unmasked, it could be CSE'd/LICM'd/PRE'd/etc.
> > If D has a PHI merging a value from B and C, it might be
> > nice to do a vector merge. It's interesting to look at
> > how these kinds of cases work under both mask operands
> > and applymask.
> 
> Yes.  Unfortunately, I'm out of spare cycles at the moment.
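
For the PHI case, one way to picture the "vector merge" is a lane-wise
select. The C sketch below again emulates vectors with arrays and is only an
illustration under assumptions: VLEN, vec_merge and block_d are made-up
names, and B and C are assumed to be side-effect-free, so both can run
unmasked on all lanes before D merges the results.

```c
#include <stdbool.h>

#define VLEN 4  /* hypothetical vector width, chosen for illustration */

/* Lane-wise merge: take b where the mask is set, c otherwise. */
void vec_merge(const bool *mask, const int *b, const int *c, int *d) {
    for (int l = 0; l < VLEN; l++)
        d[l] = mask[l] ? b[l] : c[l];
}

/* B and C have no side effects here, so both are computed on every
   lane without a mask; D then plays the role of the PHI. */
void block_d(const int *a, int *d) {
    bool mask[VLEN];
    int b[VLEN], c[VLEN];
    for (int l = 0; l < VLEN; l++) {
        mask[l] = a[l] > 10;
        b[l] = a[l] - 10;   /* value from B */
        c[l] = a[l] + 1;    /* value from C */
    }
    vec_merge(mask, b, c, d);
}
```

If B or C could trap or store to memory, running them unmasked would not be
safe, and the merge would have to consume values produced under their
respective masks instead.
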

I understand :-). And I should clarify for everyone following along
here that there are some more basic projects that need to be completed
before LLVM is ready to take on generalized predication, so we have
some time.

> 
> I was thinking about this more yesterday and came up with what I think is a 
> useful question to ask.
> 
> If LLVM had had masks way back when it started, what would it look like?
> 
> I realize that there are constraints on what kind of transition pain we want
> to go through but it gets back to my earlier point that core IR infrastructure
> should be first-class no matter when it is added.  And "what it would have
> looked like from the start" is a reasonable definition of "first-class."

One thing I want to clarify is that my interest in applymask is actually
not so much motivated by one-time transition pain avoidance here. I'm
more interested in keeping the IR simple and easy to understand, as much
for the benefit of code that isn't yet written as code that is. I've
gotten some good feedback and there are some outstanding problems, but
I'm hoping that generally acceptable solutions can be found.

Dan