[PATCH] D159140: [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND

Tobias Stadler via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 30 06:25:00 PDT 2023


tobias-stadler added a comment.

In D159140#4627276 <https://reviews.llvm.org/D159140#4627276>, @aemerson wrote:

>> We only emit the mask G_CONSTANT when necessary. Even when the G_AND is combined away later, the constant sometimes ends up being reused by other instructions instead of becoming dead.
>
> I'm a bit confused by how this could happen. Does this happen with optimizations?
>
> The wording suggests that a later transform needs a G_AND, and probably the CSEMIRBuilder returns a reference to dead one? Why would other instructions need a dead instruction?

It happens at `O3`. You can take a look at https://godbolt.org/z/G97Kv6bza (this is taken from one of the Atomics tests). Maybe "reused" is the wrong word. %47 is the mask constant emitted for the G_AND, but it is then also used in various other places right after the legalizer. I assume this is intended CSE behavior, but I'd prefer this not to happen between G_UADDE sequences, so that these patterns can easily be detected by the selector again. Getting rid of the useless G_ANDs everywhere seemed like the cleanest solution and happens to also be a win in every CTMark metric.
Some more context: When I implemented wide adds in the selector, I opted to prevent unnecessary NZCV reloads by looking at the previous instruction to determine whether NZCV will already be set correctly. This seems like a good solution because it doesn't require custom legalization or changes to the way we handle flags (I think X86 does copies to and from the flags register and lets RegAlloc do the heavy lifting). The example also hits another issue though: RegBankSelect decides to move one of the vRegs to fpr and inserts a copy, which also inhibits the optimization. I will have to look into whether I missed some more edge cases like this. I think in the worst case we can always do some more cleanup in PostSelectOptimize, since we already do liveness analysis there. Though I'd prefer to make legalized G_UADDE sequences reliably detectable by the selector, so that we can still get good codegen without going the X86 route or resorting to additional optimizations.
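
To make the "look at the previous instruction" idea concrete, here is a toy Python model. It is not LLVM code and does not reflect the actual AArch64 selector; `Inst`, `CARRY_PRODUCERS` and `carry_flag_is_live` are hypothetical names, and the assumption that only G_UADDO/G_UADDE leave the carry flag set is made purely for illustration. It shows why any instruction placed between two steps of a carry chain (for example a CSE'd G_CONSTANT or an inserted copy) defeats the check.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption for this sketch: only these opcodes leave the carry flag set.
CARRY_PRODUCERS = {"G_UADDO", "G_UADDE"}


@dataclass
class Inst:
    opcode: str
    carry_out: Optional[str] = None  # vreg holding the carry result, if any
    carry_in: Optional[str] = None   # vreg consumed as carry input, if any


def carry_flag_is_live(block: List[Inst], idx: int) -> bool:
    """Return True if block[idx] could consume the carry flag directly.

    Mirrors the heuristic described above: only the immediately preceding
    instruction is inspected, and it must produce exactly the carry vreg
    that block[idx] consumes.
    """
    inst = block[idx]
    if inst.carry_in is None or idx == 0:
        return False
    prev = block[idx - 1]
    return prev.opcode in CARRY_PRODUCERS and prev.carry_out == inst.carry_in


# Back-to-back carry chain: the flag can be reused.
adjacent = [
    Inst("G_UADDO", carry_out="%c0"),
    Inst("G_UADDE", carry_out="%c1", carry_in="%c0"),
]

# An unrelated instruction sits in the middle of the chain: the check fails
# and the carry would have to be re-materialized.
interrupted = [
    Inst("G_UADDO", carry_out="%c0"),
    Inst("G_CONSTANT"),
    Inst("G_UADDE", carry_out="%c1", carry_in="%c0"),
]
```

In this model `carry_flag_is_live(adjacent, 1)` holds while `carry_flag_is_live(interrupted, 2)` does not, which is the situation the patch tries to avoid by not emitting the extra instructions in the first place.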


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159140/new/

https://reviews.llvm.org/D159140


