[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

Tue Aug 12 14:47:37 PDT 2008

On Thursday 07 August 2008 21:48, Dan Gohman wrote:

> A key rule in my initial email is that the operands of any
> operator must be under the same mask. The Verifier can enforce
> this constraint. One problem though is that this places an
> interesting burden on front-ends and vectorizers. I'll try to
> write more about this soon.

I totally missed that restriction.  I think this is error-prone.  I certainly 
wouldn't have caught the problem in my code even if I had known about the 
restriction. I suppose the fix in this case is to set the mask to all 1's 
before the add.  This could get rather complicated in interesting cases.

> There are things instcombine can do when the masks are different.
> I think you mentioned earlier it could do clever tricks ANDing
> masks and such. However, based on the assumption above, this
> wouldn't be used very often, so a conservative implementation of
> instcombine that doesn't do any of this is mostly good enough.

Since we don't support ANY vectorization of conditional code right now, any 
addition is a win.  That's true for whatever implementation of masks we
go with.
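
As a rough illustration of the mask-ANDing idea mentioned in the quoted text
(a sketch only, not what instcombine does or would do): an operation guarded
by two masks in sequence touches exactly the lanes in the AND of the two
masks, so it can run as one masked operation. The 4-lane width, the bitmask
encoding, and the names masked_add, add_nested, and add_combined are all
assumptions made up for this example.

```c
#include <stdint.h>

#define LANES 4  /* assumed vector width, for illustration only */

/* dst[l] += src[l] in every lane whose bit is set in mask;
 * lanes with a clear bit are left untouched. */
void masked_add(int *dst, const int *src, uint8_t mask) {
    for (int l = 0; l < LANES; ++l) {
        if ((mask >> l) & 1u) {
            dst[l] += src[l];
        }
    }
}

/* Nested form: a lane is updated only if it passes the outer mask
 * and then the inner mask. */
void add_nested(int *dst, const int *src, uint8_t outer, uint8_t inner) {
    for (int l = 0; l < LANES; ++l) {
        if ((outer >> l) & 1u) {
            if ((inner >> l) & 1u) {
                dst[l] += src[l];
            }
        }
    }
}

/* Combined form: AND the two masks once, then do a single masked add. */
void add_combined(int *dst, const int *src, uint8_t outer, uint8_t inner) {
    masked_add(dst, src, (uint8_t)(outer & inner));
}
```

Both forms update the same lanes for any pair of masks; the combined form
just makes the single effective mask explicit.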

> Here's one more example, or at least a skeleton of several
> examples. A loop to be vectorized:
>
>    for (...) {
>      A
>      if (...) {
>        B
>      } else {
>        C
>      }
>      D
>    }
>
> Assuming there's a bunch of code in B and a bunch in C, then
> we have four bunches of code and three mask conditions -
> A and D are unmasked, B is masked, and C is masked with the
> inverse mask value. This code could easily have more if
> branches, nested loops, early exits, and so on, but the main
> idea is that there are blocks of instructions grouped together
> under the same mask, cross-block optimization opportunities
> exist but are limited, and that this is assumed to be
> basically representative.

I think that's right.
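
To make the skeleton concrete, here is a minimal sketch in plain C, with each
inner loop over LANES standing in for one vector operation. The arithmetic in
A, B, C, and D, the 4-lane width, and the function names are assumptions
invented for the example; only the block structure comes from the loop above.

```c
#include <stddef.h>

#define LANES 4  /* assumed vector width, for illustration only */

/* Scalar reference with the A / if B else C / D shape. */
void loop_scalar(const int *in, int *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        int t = in[i] + 1;       /* A: unmasked */
        int r;
        if (t > 0) {
            r = t * 2;           /* B: runs where the condition holds */
        } else {
            r = -t;              /* C: runs where it does not */
        }
        out[i] = r + 3;          /* D: unmasked */
    }
}

/* Masked form of the same loop. n must be a multiple of LANES. */
void loop_masked(const int *in, int *out, size_t n) {
    for (size_t i = 0; i < n; i += LANES) {
        int t[LANES], r[LANES];
        int b[LANES] = {0}, c[LANES] = {0};
        int mask[LANES];
        size_t l;

        for (l = 0; l < LANES; ++l) t[l] = in[i + l] + 1;      /* A */
        for (l = 0; l < LANES; ++l) mask[l] = t[l] > 0;        /* condition */
        for (l = 0; l < LANES; ++l)                            /* B under mask */
            if (mask[l]) b[l] = t[l] * 2;
        for (l = 0; l < LANES; ++l)                            /* C under inverse mask */
            if (!mask[l]) c[l] = -t[l];
        for (l = 0; l < LANES; ++l)                            /* merge feeding D */
            r[l] = mask[l] ? b[l] : c[l];
        for (l = 0; l < LANES; ++l) out[i + l] = r[l] + 3;     /* D */
    }
}
```

Everything in B shares one mask and everything in C shares its inverse, which
is the grouping the quoted text describes. The per-lane select before D plays
the role of the PHI merging the values from B and C.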

> There are some interesting cross-block cases here.
> If you can prove that something in B and/or C can be run
> unmasked, it could be CSE'd/LICM'd/PRE'd/etc.
> If D has a PHI merging a value from B and C, it might be
> nice to do a vector merge. It's interesting to look at
> how these kinds of cases work under both
> mask operands and applymask.

Yes.  Unfortunately, I'm out of spare cycles at the moment.

I was thinking about this more yesterday and came up with what I think is a 
useful question to ask.

If LLVM had had masks way back when it started, what would it look like?

I realize that there are constraints on what kind of transition pain we want
to go through, but it gets back to my earlier point that core IR infrastructure
should be first-class no matter when it is added.  And "what it would have
looked like from the start" is a reasonable definition of "first-class."

                                                 -Dave