[PATCH] D24924: [PPC] Better codegen for AND, ANY_EXT, SRL sequence

Wed Sep 28 07:27:04 PDT 2016

amehsan added a comment.

In https://reviews.llvm.org/D24924#552610, @hfinkel wrote:

> Alternatively, we might teach the BitPermutationSelector to look through extends, which would be more general. Had you looked at that?

There is potentially a more general solution as well, which I would call "bubbling up" any_ext and zero_ext. For example, consider the following selection DAG:

  Optimized type-legalized selection DAG: BB#0 '_Z3fooRK3PB2S1_:entry'
  SelectionDAG has 17 nodes:
    t0: ch = EntryToken
                t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
              t37: i32,ch = load<LD1[%arrayidx.i6](align=8), anyext from i8> t0, t2, undef:i64
                t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
              t42: i32,ch = load<LD1[%arrayidx.i37](align=8), anyext from i8> t0, t4, undef:i64
            t54: i32 = xor t37, t42
          t55: i32 = srl t54, Constant:i64<3>
        t56: i64 = any_extend t55
      t51: i64 = and t56, Constant:i64<1>
    t20: ch,glue = CopyToReg t0, Register:i64 %X3, t51
    t21: ch = PPCISD::RET_FLAG t20, Register:i64 %X3, t20:1

We can first push the any_ext before the srl (this is what I did in my first attempt), but we don't need to stop there. We can then push it before the xor and merge it into the load instructions that feed the xor. Given that zero extension is free for PPC loads, this is easier than applying a similar idea to sign_extend.
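As a sanity check on the reasoning above, here is a minimal standalone sketch (plain C++, not LLVM code) that models both DAG shapes on ordinary integers. The function names and the HighGarbage parameter are hypothetical, introduced only for illustration; HighGarbage stands in for the unspecified upper bits that any_extend may produce, and the loads are modeled as zero-extending for simplicity.

```cpp
#include <cstdint>

// Original shape: xor and srl in 32 bits, then any_extend to 64 bits, then mask.
// HighGarbage models the unspecified upper 32 bits left by any_extend.
uint64_t originalShape(uint8_t A, uint8_t B, uint32_t HighGarbage) {
  uint32_t X = uint32_t(A) ^ uint32_t(B);            // t54: xor
  uint32_t S = X >> 3;                               // t55: srl by 3
  uint64_t Ext = (uint64_t(HighGarbage) << 32) | S;  // t56: any_extend
  return Ext & 1;                                    // t51: and with 1
}

// "Bubbled up" shape: the extension is merged into the loads (zero-extending
// byte loads), and xor, srl and and are all performed in 64 bits.
uint64_t bubbledShape(uint8_t A, uint8_t B) {
  uint64_t X = uint64_t(A) ^ uint64_t(B);
  return (X >> 3) & 1;
}

// Exhaustively compare the two shapes over all byte pairs and a few
// representative garbage patterns for the upper half.
bool shapesAgree() {
  const uint32_t Garbage[] = {0u, 0xFFFFFFFFu, 0xDEADBEEFu};
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      for (uint32_t G : Garbage)
        if (originalShape(uint8_t(A), uint8_t(B), G) !=
            bubbledShape(uint8_t(A), uint8_t(B)))
          return false;
  return true;
}
```

The two shapes agree because the final mask keeps only bit 0 of the shifted value, which is bit 3 of the xor of the two loaded bytes, so whatever the extension puts in the upper bits never reaches the result.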

I also experimented a little with BitPermutationSelector, adding the following code to BitPermutationSelector::getValueBits:

  +    case ISD::ANY_EXTEND: {
  +        auto Size = V.getOperand(0).getNode()->getValueType(0).getSizeInBits();
  +        DEBUG(dbgs() << "LOC B1\n" << NumBits << "\n" << Size <<"\n" ; );
  +        const SmallVector<ValueBit, 64> *InnerBits;
  +        std::tie(Interesting, InnerBits) = getValueBits(V.getOperand(0), Size);
  +        for (unsigned i = 0; i < Size; ++i)
  +          Bits[i] = (*InnerBits)[i];
  +        for (unsigned i = Size; i < NumBits; ++i) {
  +          Bits[i] = ValueBit(ValueBit::ConstZero);
  +        }
  +        return std::make_pair(Interesting, &Bits);
  +      }
  +      break;

This works, but codegen does not currently generate correct code for i32 to i64 conversions, so more work is needed. As we discussed, this pattern might be special, given that it is generated in the target-independent codegen, so a more general solution may not be needed. I will add a note to our readme files, with a pointer to this comment, so that if we need better handling of these opcodes in the future, we can follow one of these ideas.
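For readers unfamiliar with the per-bit bookkeeping that the getValueBits experiment above relies on, the following is a toy model of the widening step only. ToyBit and widenBits are hypothetical names invented for this sketch; they are not the ValueBit type or any function from the PPC backend, and the real code tracks more information per bit.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a per-bit provenance record: each result bit is either
// a specific bit of some input value or a known constant zero.
struct ToyBit {
  enum Kind { Variable, ConstZero };
  Kind K;
  unsigned Idx;  // Bit index in the input value; only meaningful for Variable.
};

// Widen a vector of tracked bits to NumBits entries. The low bits keep the
// provenance of the narrower value; the added high bits are recorded as
// constant zero, mirroring what the experimental patch does for ANY_EXTEND.
std::vector<ToyBit> widenBits(const std::vector<ToyBit> &Inner,
                              std::size_t NumBits) {
  std::vector<ToyBit> Bits(NumBits, ToyBit{ToyBit::ConstZero, 0});
  for (std::size_t I = 0; I < Inner.size() && I < NumBits; ++I)
    Bits[I] = Inner[I];
  return Bits;
}
```

Recording the high bits as constant zero is a choice made by the experiment, since any_extend itself leaves those bits unspecified.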

https://reviews.llvm.org/D24924