[PATCH] D23253: [X86] Generalized transformation of `definstr gr8; movzx gr32, gr8` to `xor gr32, gr32; definstr gr8`

David Kreitzer via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 23 18:20:16 PDT 2016


DavidKreitzer added a comment.

> Just to clarify, the optimal result is `movzbl 2(%rdi), %r10d`, not

> 

> `movb 2(%rdi), %al; movzbl %al, %r10d` (which is the status quo)

> 

> nor

> 

> `xorl %r10d, %r10d; movb 2(%rdi), %r10d` (which is what this patch does)

> 

> Correct?


Yes, exactly.

> So the converted movzb to movzbl is smaller and has fewer micro-ops. Am I

>  missing something obvious?


AHA! Yes, you are right that this conversion is always profitable:

  mov    0x2(%edi),%al
  movzbl %al,%eax

TO

  movzbl 0x2(%edi),%eax

but the X86FixupBWInsts pass will also do this:

  mov    0x2(%edi),%al

TO

  movzbl 0x2(%edi),%eax

In other words, it will convert the movb to movzbl even if the result of the movb is never zero extended. Doing so is often a win for performance due to the partial register dependence of the movb, but movzbl does encode larger.

I assume you are saying that your new pass will only transform 8-bit moves that are subsequently zero extended? In that case, you can disregard my code size concern for now.

Ultimately, we will want to conditionally optimize the 8-bit instructions that do not feed into a zero extend. (That applies both to the movb --> movzbl transformation and to inserting xor before instructions like SETcc.) It is just a bit harder, due to the tradeoff that needs to be made. I made the same comment on Michael's FixupSetCC pass - just haven't gotten around to experimenting with it yet.

> I'll compile a nice histogram from the diff data in

>  https://reviews.llvm.org/P7213 and https://reviews.llvm.org/P7214 .


That would be very nice! Like I said, though, I am less worried about the code size impact if you are only optimizing instructions that are already being zero extended.

This does mean, however, that this pass will not be an adequate replacement for FixupBWInsts until we add the raw movb --> movzbl transform.

Thanks,
Dave


Repository:
  rL LLVM

https://reviews.llvm.org/D23253





More information about the llvm-commits mailing list