[LLVMdev] Ideas for representing vector gather/scatter and masks in LLVM IR

Thu Aug 7 12:13:57 PDT 2008

On Tuesday 05 August 2008 13:27, David Greene wrote:

> Neither solution eliminates the need for instcombine to be careful and
> consult masks from time to time.
>
> Perhaps I'm totally missing something.  Concrete examples would be helpful.

Ok, so I took my own advice and thought about CSE and instcombine a bit.
I wrote the code by hand in a sort of pseudo-llvm language, so don't
crucify me for mistakes.  :)

CSE is an optimization that needs to pay attention to masks.  You can't
(easily) CSE an expression with different masks.

Using a mask-per-operation setup, code might look like this:

CSE Mask Example

Mask-per-operation:
%m1 = ...
%m2 = ...
%r1 = add %r2, %r3, %m1
%r4 = add %r2, %r3, %m2

We can only CSE the adds if %m1 == %m2.  I'd think it would be
sufficient to just check whether the mask registers are the same, though
you could imagine walking back through the use-def chains to do more
complex analysis.  Esentially, the mask operand becomes part of the
expression you're trying to match.  CSE definitely needs to be
changed to understand masks.

With applymask, things look something like this:

applymask:
%m1 = ...
%m2 = ...
%r5 = applymask %r2, %m1
%r6 = applymask %r3, %m1
%r7 = applymask %r2, %m2
%r8 = applymask %r3, %m2
%r1 = add %r5, %r6
%r4 = add %r7, %r8

This is interesting because CSE will not do the optimization since the
operands of the adds are different.  CSE does not have to change to
understand masks.  Value numbering will have to understand applymask to
make sure it doesn't do something bad because it doesn't care about the
names of operands but the renaming of registers required by applymask is
somewhat attractive.

Now let's look at an instcombine example:

Instcombine example: -A + -B --> -(A + B)

Mask-per-operation:
%m1 = ...
%m2 = ...
%m3 = ...
%r1 = neg %r2, %m1
%r1 = select %r1, %r2, %m1  # Define all elements
%r3 = neg %r4, %m2
%r3 = select %r3, %r4, %m2  # Define all elements
%r5 = add %r1, %r3, %m3

Since the masks on the negs are different we have to be careful in how
we do this.  In the easiest case, we simply disallow combining here,
which requires instcombine to examine the masks and, as in the CSE
example above, potentially just look at the mask register names to
determine equality.

One could imagine doing the combining this way:

%m1 = ...
%m2 = ...
%m3 = ...
# Do the combining where legal
%m4 = and %m1, %m2
%r6 = add %r2, %r4, %m4
%r7 = neg %r6, %m4
%r7 = select %r7, %r6, %m4 # Define all elements for final merge
# Take care of the rest
%m6 = xor %m1, %m4
%m7 = xor %m2, %m4
%m8 = xor %m3, %m4
%r1 = neg %r2, %m6
%r1 = select %r1, %r2, %m6 # Define all elements for add
%r3 = neg %r4, %m7
%r3 = select %r3, %r4, %m7 # Define all elements for add
%r5 = add %r1, %r3, %m8
%r5 = select %r5, %r1, %m7 # Define all elements for final merge
# Do the final merge
%r5 = select %r7, %r5, %m4

But this is obviously a loss.  We may be able to get away with
eliminating some merges (I'm being conservative about undefined elements
here) but it would still be a loss.

applymask:

%m1 = ...
%m2 = ...
%m3 = ...
%r6 = applymask %r2, %m1
%r7 = applymask %r4, %m2
%r1 = neg %r6
%r3 = neg %r7
%r8 = applymask %r1. %m3
%r9 = applymask %r3, %m3
%r5 = add %r8, %r9

This is really no different than the mask-per-operation case.
Instcombine still has to consider masks when evaluating the legality and
profitability of combining.  Here instcombine would have to walk back
through the def-use lists on the operands of the neg and add to find if
they are masked and then determine the mask value (or mask register name
for simpler equality testing).

Again, doing the combining in this case would be detrimental.  My guess
is that most combining turns out worse code when operations are masked
and the masks are not equal.

Given these two examples, I see some benefit to applymask for
transformations like CSE and other pattern-matching algorithms where the
renaming of registers that applymask requires converts previously
matching patterns into patterns that don't match.  For something like
instcombine or value numbering that is based on value flow and/or pure
semantics, I don't see any benefit.

Of course, these are just two examples.  One could repeat the exercise
for any transformation we're interested in.

I hope this is helpful in moving us along in the discussion.  Feedback
and comments are welcome!

                                                -Dave