[PATCH] D23253: [X86] Generalized transformation of `definstr gr8; movzx gr32, gr8` to `xor gr32, gr32; definstr gr8`

Mon Aug 22 15:23:44 PDT 2016

DavidKreitzer added a comment.

Hi bryant,

I haven't found time to review this patch in detail yet, but here are some initial comments/questions.

Given some of your sample generated-code changes, I expected to see lots of changes in test/CodeGen/X86. Are you planning to add those changes to this patch later?

Your output code examples have a few instances of this:

+    xorl    %r10d, %r10d
+    movb    2(%rdi), %r10b

Rather than inserting an xor here, it would be preferable to convert the movb to movzbl.
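
As a rough illustration of the size side of this trade-off, the sketch below tallies the encoded length of the xor/movb pair above against a single movzbl performing the same load. The byte counts are hand-derived from the x86-64 encodings shown in the comments, not taken from the patch or from assembler output, and the constant and function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hand-derived x86-64 encodings (an assumption of this sketch; check
// them against real assembler output before relying on them):
//   xorl   %r10d, %r10d    -> 45 31 D2        REX, opcode, ModRM
//   movb   2(%rdi), %r10b  -> 44 8A 57 02     REX, opcode, ModRM, disp8
//   movzbl 2(%rdi), %r10d  -> 44 0F B6 57 02  REX, 2-byte opcode, ModRM, disp8
constexpr std::size_t kXorBytes = 3;
constexpr std::size_t kMovbBytes = 4;
constexpr std::size_t kMovzblBytes = 5;

// Size of the two-instruction sequence quoted from the patch output.
constexpr std::size_t xorThenMovbBytes() { return kXorBytes + kMovbBytes; }

// Size of the single zero-extending load suggested instead.
constexpr std::size_t movzblBytes() { return kMovzblBytes; }

// Runtime restatement of the comparison for this one example.
inline void checkExampleSizes() {
  assert(xorThenMovbBytes() == 7);
  assert(movzblBytes() == 5);
}
```

Under these encodings the single movzbl is two bytes shorter than the pair and is one instruction instead of two. A movzbl is still one byte longer than the bare movb, which is the extra byte the FixupBWInstPass comment refers to.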

Converting movb to movzbl (and movw to movzwl) is essentially what FixupBWInstPass does. The author of that pass was deliberately aggressive about converting movw to movzwl but a bit more conservative about converting movb to movzbl. Here are the relevant comments:

  // Only replace 8 bit loads with the zero extending versions if
  // in an inner most loop and not optimizing for size. This takes
  // an extra byte to encode, and provides limited performance upside.

  // Always try to replace 16 bit load with 32 bit zero extending.
  // Code size is the same, and there is sometimes a perf advantage
  // from eliminating a false dependence on the upper portion of
  // the register.
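
The policy those two comments describe can be sketched as a single predicate. This is only a model of the comments as quoted, not the pass's actual code: `LoadContext` and `shouldZeroExtendLoad` are hypothetical names, and the real pass may apply further legality checks that the comments do not mention (note the "try" in the 16-bit comment).

```cpp
#include <cassert>

// Hypothetical summary of the facts the quoted comments consult.
struct LoadContext {
  unsigned widthBits;    // width of the load being considered: 8 or 16
  bool inInnermostLoop;  // load sits in an innermost loop
  bool optForSize;       // function is being optimized for size
};

// Returns true when, per the quoted comments, the load would be a
// candidate for replacement by its 32-bit zero-extending form.
constexpr bool shouldZeroExtendLoad(const LoadContext &c) {
  return c.widthBits == 16
             ? true  // 16-bit: always a candidate, code size is unchanged
             : (c.widthBits == 8
                    // 8-bit: costs an extra byte, so only in hot code
                    ? (c.inInnermostLoop && !c.optForSize)
                    : false);  // other widths are outside this sketch
}

// Small usage example covering the cases the comments distinguish.
inline void checkPolicy() {
  assert(shouldZeroExtendLoad(LoadContext{16, false, true}));
  assert(shouldZeroExtendLoad(LoadContext{8, true, false}));
  assert(!shouldZeroExtendLoad(LoadContext{8, false, false}));
  assert(!shouldZeroExtendLoad(LoadContext{8, true, true}));
}
```

In this model, dropping the innermost-loop check for the 8-bit case amounts to removing `c.inInnermostLoop` from the condition.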

This leads to 2 questions for me.

(1) What is the code size impact of this new pass?
(2) How does the behavior of this new pass compare to simply changing X86FixupBWInsts to always optimize the 8-bit case, i.e., to not check for innermost loops?

Thanks,
Dave

Repository:
  rL LLVM

https://reviews.llvm.org/D23253