[llvm-commits] patch: folding (zext (and x, cst))

Jakob Stoklund Olesen stoklund at 2pi.dk
Wed Jun 8 06:18:54 PDT 2011


On Jun 7, 2011, at 11:11 PM, Nick Lewycky wrote:

> Jakob Stoklund Olesen wrote:
>> 
>> On Jun 7, 2011, at 7:58 AM, Eli Friedman wrote:
>> 
>>> On Tue, Jun 7, 2011 at 2:46 AM, Nick Lewycky <nicholas at mxc.ca> wrote:
>>>> I have an unfinished patch. I was looking to optimize:
>>>> 
>>>>  define i32 @test1(i8 %x) nounwind readnone {
>>>>    %A = and i8 %x, -32
>>>>    %B = zext i8 %A to i32
>>>>    ret i32 %B
>>>>  }
>>>> 
>>>> which currently does a mov into %al, then the "and", then extends, into
>>>> doing a single extending mov, then an "and" in 32-bits. The rule I decided
>>>> upon is "(zext (and x, cst)) -> (and (anyext x), (zext cst))" in the DAG
>>>> combiner.
>>> 
>>> Just a comment, without reading the patch: it would be much more
>>> conservative to fold (zext (and (load x), cst)) -> (and (zextload x),
>>> (zext cst)).  The transform you're proposing is much less obviously
>>> beneficial.
>> 
>> Also make sure this does the right thing on x86 when the zext is i32 -> i64. The and implicitly zero-extends to 64 bit.
> 
> Could you elaborate? Are you saying that ISD::AND may have a wider result than its arguments?

32-bit operations on x86-64 will usually clear the high part of the destination register. That means the zero-extension from 32 to 64 bits is really a no-op:

// Any instruction that defines a 32-bit result leaves the high half of the
// register. Truncate can be lowered to EXTRACT_SUBREG. CopyFromReg may
// be copying from a truncate. And x86's cmov doesn't do anything if the
// condition is false. But any other 32-bit operation will zero-extend
// up to 64 bits.
def def32 : PatLeaf<(i32 GR32:$src), [{
  return N->getOpcode() != ISD::TRUNCATE &&
         N->getOpcode() != TargetOpcode::EXTRACT_SUBREG &&
         N->getOpcode() != ISD::CopyFromReg &&
         N->getOpcode() != X86ISD::CMOV;
}]>;

// In the case of a 32-bit def that is known to implicitly zero-extend,
// we can use a SUBREG_TO_REG.
def : Pat<(i64 (zext def32:$src)),
          (SUBREG_TO_REG (i64 0), GR32:$src, sub_32bit)>;

That means (zext (and x, cst)) simply becomes (SUBREG_TO_REG (AND32ri x,cst), sub_32bit). The SUBREG_TO_REG is emitted as a copy which is almost always coalesced away.
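
As a sanity check on the transform itself for the i32 -> i64 case, here is a minimal C++ sketch (not LLVM code) comparing the two forms. The function names and the `junk` parameter are hypothetical; `junk` models the unspecified high bits that an anyext is allowed to produce. It only spot-checks a few inputs, so treat it as an illustration rather than a proof.

```cpp
#include <cassert>
#include <cstdint>

// Original form: AND in 32 bits, then zero-extend the result to 64 bits.
constexpr uint64_t zextOfAnd(uint32_t x, uint32_t cst) {
  return static_cast<uint64_t>(x & cst);
}

// Proposed form: any-extend x, then AND with the zero-extended constant.
// `junk` stands in for the unspecified high bits an anyext may produce.
constexpr uint64_t andOfAnyext(uint32_t x, uint32_t junk, uint32_t cst) {
  const uint64_t anyext = (static_cast<uint64_t>(junk) << 32) | x;
  return anyext & static_cast<uint64_t>(cst);
}

// Spot-check a handful of inputs; returns true if both forms agree on all.
inline bool formsAgree(uint32_t cst) {
  const uint32_t samples[] = {0u,          1u,          0x1Fu,      0x20u,
                              0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t x : samples)
    for (uint32_t junk : samples)
      if (zextOfAnd(x, cst) != andOfAnyext(x, junk, cst))
        return false;
  return true;
}
```

Both forms compute the same value because the zero-extended constant has zeros in its high half, which masks off whatever the anyext left there. The difference is only in which instructions get selected.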

It looks like it will work out, but you should make sure that you are not increasing code size for this common case.
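
One place the size could grow is the immediate. As far as I understand the encoding, a 64-bit AND with an immediate takes a 32-bit immediate that is sign-extended to 64 bits (and needs a REX prefix), so a zero-extended mask with bit 31 set cannot be encoded directly. A small C++ sketch of that check, with hypothetical helper names that are not part of LLVM:

```cpp
#include <cassert>
#include <cstdint>

// True if `value` equals some 32-bit immediate sign-extended to 64 bits,
// i.e. bits 63..31 are either all zero or all one.
constexpr bool fitsSignExtendedImm32(uint64_t value) {
  const uint64_t top33 = value >> 31;
  return top33 == 0 || top33 == 0x1FFFFFFFFull;
}

// Hypothetical helper: after widening the AND to 64 bits, the mask is
// (zext cst). Would that still fit a sign-extended 32-bit immediate?
constexpr bool wideMaskFitsImm32(uint32_t cst) {
  return fitsSignExtendedImm32(static_cast<uint64_t>(cst));
}
```

Under this model a mask such as 0xFFFFFFE0 (the i32 value -32) fits the 32-bit AND but not the widened one, while a small mask such as 0xE0 fits both.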

/jakob




