[PATCH] Distribute binary operators through SELECT_CC

Mon Jul 15 04:18:41 PDT 2013

Hi Duncan,

Thanks for the review.

Duncan Sands <duncan.sands at gmail.com> writes:
> On 28/06/13 12:49, Richard Sandiford wrote:
>> This patch transforms something like:
>>
>>    (add (select_cc X, Y, 10, 20, CC), 10)
>>
>> into:
>>
>>    (select_cc X, Y, 20, 30, CC)
>>
>> but generalised for a tree of select_ccs.  It allows at most one of
>> the selected values to be non-constant.
>>
>> If this goes in, I'd like to follow it with patches to handle unary
>> operations and floating-point binary operations in the same way.
>>
>> The main motivation was to optimise operations on scalarised vector
>> selects.  It also changed the output of the R600/alu-shift.ll test that
>> Tom removed earlier.  Tom reckoned the new code was an improvement
>> (but defeated the original purpose of that test).
>
> while this looks OK (I have a few small nits to pick, see below), it's not good
> that the IR level optimizers don't get this.  For general transformations of
> this type, the usual is (1) teach the IR level optimizers how to do it; (2) if,
> due to type legalization or whatever, this nonetheless still pops up a lot at
> the codegen level then do it in codegen too.  In short, I'd like to see this be
> implemented at the IR level first (in which case it should probably look through
> phi nodes too).

Argh, it looks like I overreduced the testcase to the point where it already
passes. :-(  These cases are already handled at the IR level by InstCombine.
Sorry for being careless.

The original idea was to tweak the code generated by llvmpipe.
I wanted to get to the point where:

define void @f1(<4 x i32> *%ptra, <4 x i32> *%ptrb) {
  %a = load <4 x i32> *%ptra
  %b = load <4 x i32> *%ptrb
  %cmp = icmp eq <4 x i32> %a, %b
  %ext = sext <4 x i1> %cmp to <4 x i32>
  %and = and <4 x i32> %ext, <i32 2, i32 2, i32 2, i32 2>
  store <4 x i32> %and, <4 x i32> *%ptra
  ret void
}

produces the same code as:

define void @f2(<4 x i32> *%ptra, <4 x i32> *%ptrb) {
  %a = load <4 x i32> *%ptra
  %b = load <4 x i32> *%ptrb
  %cmp = icmp eq <4 x i32> %a, %b
  %res = select <4 x i1> %cmp, <4 x i32> <i32 2, i32 2, i32 2, i32 2>, <4 x i32> zeroinitializer
  store <4 x i32> %res, <4 x i32> *%ptra
  ret void
}

on targets where <4 x i32> isn't a legal type.  At the moment, the first
version selects between 0 and -1 and then has separate ANDs for each
element.  (Or, depending on the target, selects between 0 and 1,
sign extends it, and then ANDs it with 2.)  The second version selects
directly between 0 and 2.

So like you say, the case I was originally looking at was one where the
select is introduced in codegen during legalisation.

This patch was supposed to be a step towards that goal, but given the
above is probably useless standalone.  The idea was to include the SETCC
conversion as a separate patch, but I think I'll need to do it in one.

>> +/// Return null if N is a tree containing just constants and single-use selects.
>> +/// Otherwise return a node L such that replacing L with a constant would
>> +/// give such a tree.
>> +static SDNode *findNonConstLeaf(SDNode *N, unsigned Depth = 0) {
>> +  if (N->getOpcode() == ISD::Constant || N->getOpcode() == ISD::ConstantFP)
>> +    return 0;
>> +
>> +  if (N->getOpcode() == ISD::SELECT_CC && N->hasOneUse() && Depth < 6) {
>
> Please turn the recursion limit into a named constant.

OK, I agree that would be better.  The reason I'd chosen a hard-coded 6
is because that's what other DAG routines seem to do.  E.g.:

  // Don't recurse exponentially.
  if (Depth > 6) return 0;

in DAGCombiner.cpp:isNegatibleForFree,

    if (Depth > 6 || Aliases.size() == 2) {

in DAGCombiner::GatherAllAliases(), and:

  if (Depth == 6)
    return;  // Limit search depth.

in SelectionDAG::ComputeMaskedBits().  Would it be better for these 6s
to be a single named constant, or should they be separate named constants
that happen to have the same value at the moment?

>> +    SDNode *A = findNonConstLeaf(N->getOperand(2).getNode(), Depth + 1);
>> +    if (!A)
>> +      return findNonConstLeaf(N->getOperand(3).getNode(), Depth + 1);
>> +    SDNode *B = findNonConstLeaf(N->getOperand(3).getNode(), Depth + 1);
>> +    if (!B)
>> +      return A;
>
> Since you always call findNonConstLeaf on both sides, this seems neater:
>
>    SDNode *A = findNonConstLeaf(N->getOperand(2).getNode(), Depth + 1);
>    SDNode *B = findNonConstLeaf(N->getOperand(3).getNode(), Depth + 1);
>    if (!A)
>      return B;
>    if (!B)
>      return A;

Yeah, true.  I was trying to emphasise the tail call for !A, but that's the
compiler's job...

>> +SDValue DAGCombiner::distributeBinary(SDNode *N) {
>> +  for (unsigned I = 0; I < 2; I++) {
>> +    SDValue Other = N->getOperand(I ^ 1);
>> +    if (Other.getOpcode() == ISD::Constant ||
>> +        Other.getOpcode() == ISD::ConstantFP) {
>
> Since the DAG combiner should be normalizing constants onto the RHS, this loop
> shouldn't be necessary.

The idea was to cope with constant first operands in noncommutative operations,
e.g. (shl (select ..., 1, 0), 2)

Thanks,
Richard