[PATCH] D129775: [x86] use zero-extending load of a byte outside of loops too

Tue Jul 19 11:01:53 PDT 2022

pcordes added a comment.

In D129775#3663121 <https://reviews.llvm.org/D129775#3663121>, @RKSimon wrote:

> LGTM - although I can't see much Os and Oz test coverage?

Good point about -Oz / -Os.  This would be a code-size regression for -Oz, since a movzbl load is 1 byte longer than the equivalent movb load (for the common cases that wouldn't end up needing a later movzx if we started with movb).

And since this doesn't fix issue #56498, for now it's *always* a code-size regression.  We still emit a redundant `movzbl %al, %eax` even after starting with a movzbl load and doing only byte operations on the low byte of the register, where the value is already correctly zero-extended.  The last test case (`zext-logicop-shift-load.ll`) shows that.
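To make the pattern concrete, here is a minimal C sketch.  It's my own hypothetical reduction, not one of the patch's tests, and the function name is made up.  The instructions and sizes in the comments assume x86-64 with the pointer in %rdi; what a compiler actually emits will vary with version and options.

```c
#include <stdint.h>

/* Hypothetical reduction (not from the patch): load a byte, do a
 * byte-only operation on it, then widen the result to 32 bits. */
unsigned and_byte_then_widen(const uint8_t *p, uint8_t mask)
{
    /* Byte load: either movb (%rdi), %al (2 bytes)
     * or movzbl (%rdi), %eax (3 bytes). */
    uint8_t b = *p;

    /* Byte-only operation on the low byte, e.g. andb %sil, %al.
     * It can't set any bit above bit 7, so a value that was
     * zero-extended by the load is still zero-extended here. */
    b &= mask;

    /* Widening on return: movzbl %al, %eax (3 bytes).  This is the
     * instruction that is redundant if the load was already movzbl. */
    return b;
}
```

With a movb load the function needs the final movzbl; with a movzbl load it shouldn't, but until the redundant one is removed we pay for both.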

This is a good first step; fixing that issue to take advantage of cases where byte values are already zero-extended can be done later.

If -Oz (and probably -Os) continue to use movb, then later code can't count on byte values being zero-extended to 64-bit.  So the byte load will need to tell the optimizer whether or not it's zero-extending, depending on options.  (And ideally an optimization pass could still fold a later zero-extension back into an earlier byte or word load, even with -Oz, because spending 1 extra byte on the load to save a multi-byte instruction is worth it for code-size.)

A temporary regression to code-size for -Oz and -Os is probably less important than a speedup for -O2 / -O3.  (Assuming we actually do get any net speedup from this change alone!)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129775/new/

https://reviews.llvm.org/D129775