[llvm-commits] PATCH: Teach the x86 backend to fold mask & shift patterns into an addressing mode computation

Mon Jan 9 00:48:46 PST 2012

I've attached an updated patch. This addresses your comments as well as a
bunch of comment Owen made to me on IRC. There was some overlap. It also
includes an extra test case that helped me catch some cases it wasn't
firing, and it factors all of the logic out into a helper function. This is
used to apply the logic whether the AND or the SRL node come first in the
DAG, I've found inputs with it both ways around (hence the extra test case)
and the transform itself is easily generalized to either ordering.

Owen expressed some concerns about whether this is doing the correct thing
WRT the topological sort of the DAG, and I'm now fairly confident in this
part. The code inserts the new nodes, in the correct sequence, each one
before the input node 'N', so they will be ordered correctly for the
remaining ISel progression. The contract of ISel is slightly violated by
assigning them all the same node ID number, so that the IDs are no longer
unique, but there are several other transforms in the same section (x86
address mode matching) that do the same thing, so this appears to be fine
in practice. My suspicion is that the IDs merely need to form a SWO or some
such, and that at least is satisfied. I could write code that would
re-number all subsequent nodes in order to have unique numbers again, but I
don't think it should be part of this patch considering the existing
precedent.

I'm also starting more extensive testing of this code to make sure it
doesn't silently miscompile anything.

On Sun, Jan 8, 2012 at 6:01 AM, Duncan Sands <baldrick at free.fr> wrote:

> Hi Chandler,
>
> > --- a/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> > +++ b/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> > @@ -4237,6 +4237,17 @@ SDValue DAGCombiner::visitZERO_EXTEND(SDNode *N) {
> >        return SDValue(N, 0);   // Return N so it doesn't get rechecked!
> >      }
> >
> > +    // fold (zext (truncate (zextload x))) -> (zext (zextload x))
> > +    if (LoadSDNode *Load =
> dyn_cast<LoadSDNode>(N0.getOperand(0).getNode())) {
> > +      EVT TruncVT = N0.getValueType(), MemVT = Load->getMemoryVT();
> > +      // This is safe as long as the truncate doesn't truncate any of
> the bits
> > +      // loaded from memory and we zero extended the rest of the bits.
> > +      if (ISD::isZEXTLoad(Load) && TruncVT.bitsGE(MemVT) && 0) {
> > +        return DAG.getNode(ISD::ZERO_EXTEND, N->getDebugLoc(), VT,
> > +                           N0.getOperand(0));
> > +      }
> > +    }
>
> this is a special case of a much more general transform: zext(trunc(x)) ->
> zext(x) if the bits dropped by the trunc are known zero [in full generality
> the result of the transform might be x if the zext is just restoring the
> bits truncated, or trunc(x) (to a different size) if the zext does not
> restore all of the truncated bits].  How about implementing the general
> transform instead?
>

Done.

>
> > --- a/lib/Target/X86/X86ISelDAGToDAG.cpp
> > +++ b/lib/Target/X86/X86ISelDAGToDAG.cpp
> > @@ -814,6 +814,106 @@ bool
> X86DAGToDAGISel::MatchAddressRecursively(SDValue N, X86ISelAddressMode &AM,
> >      break;
> >      }
> >
> > +  case ISD::SRL: {
> > +    // Try some heroics to detect shifts of masked values where the
> mask can be
> > +    // replaced by extending the shift and undoing that in the
> addressing mode
> > +    // scale. Patterns such as (shl (srl x, c1), c2) are canonicalized
> into
> > +    // (and (srl x, SHIFT), MASK) by DAGCombines that don't know the
> shl can be
> > +    // done in the addressing mode.
>
> How about an explicit example of what you want to achieve?
>

Added, although its a bit weird to have here instead of in the testcase.
Still, maybe it helps as I agree this is a bit of a weird transform.

>
> > +
> > +    // Scale must not be used already.
> > +    if (AM.IndexReg.getNode() != 0 || AM.Scale != 1) break;
> > +
> > +    if (N.getNumOperands() != 2) break;
>
> Can this check actually ever fire?  Also, might this be loading a vector
> value?
>

The check was dead, but I don't understand the comment about vector values?
This may be about to load a vector, but the input shouldn't be. None of the
existing code worries about vectors, likely because legalize has already
precluded them from being inputs to this chain?

> > +    ConstantSDNode *ShiftAmtNode =
> dyn_cast<ConstantSDNode>(N.getOperand(1));
> > +    if (!ShiftAmtNode) break;
> > +    SDValue And = N.getOperand(0);
> > +    if (And.getOpcode() != ISD::AND || And.getNumOperands() != 2) break;
>
> Can an "and" really have more than two operands?
>

Nope, nuked.

>
> > +    ConstantSDNode *MaskNode =
> dyn_cast<ConstantSDNode>(And.getOperand(1));
> > +    if (!MaskNode) break;
> > +    SDValue X = And.getOperand(0);
> > +
> > +    // We only handle up to 64-bit values here for simplicity.
> > +    if (X.getValueSizeInBits() > 64) break;
> > +
> > +    // The number of bits that would have to be shifted left is the
> number of
> > +    // zero bits in the mask *after* it is shifted right.
>
> Are you sure about that?  shl(shr x, N),M should result in M bits being
> masked.
> Thus I would have expected AMShiftAmt below to be MaskTZ (assuming
> AMShiftAmt
> is the amount the zapped shl would have shifted by, which is what your
> comment
> suggested to me; how about more explicit variable names or comments).
>

Sorry, this comment was stale, and highly misleading, as it dated from an
earlier version of th ecode.. I've removed it as there is a much better
comment now directly attached to AMShiftAmt.

>
> > +    uint64_t Mask = MaskNode->getZExtValue();
> > +    unsigned ShiftAmt = ShiftAmtNode->getZExtValue();
> > +    unsigned MaskLZ = CountLeadingZeros_64(Mask);
> > +    unsigned MaskTZ = CountTrailingZeros_64(Mask);
> > +    int AMShiftAmt = MaskTZ - ShiftAmt;
> > +
> > +    // There is nothing we can do here unless the mask is removing some
> bits.
> > +    // Also, the addressing mode can only represent shifts of 1, 2, or
> 3 bits.
> > +    if (AMShiftAmt <= 0 || AMShiftAmt > 3) break;
> > +
> > +    // We also need to ensure that mask is a continuous run of bits.
> > +    if (CountTrailingOnes_64(Mask >> MaskTZ) + MaskTZ + MaskLZ != 64)
> break;
>
> Is this the best way to check this?
>

Other code uses the same pattern to check this... If there is a better way,
I'd love to know it? It seems... oddly elegant.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120109/42cf42af/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: crazy-addr-mode2.patch
Type: text/x-patch
Size: 9902 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120109/42cf42af/attachment.bin>