[PATCH] D159140: [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND

Wed Aug 30 07:53:07 PDT 2023

aemerson added a comment.

In D159140#4628211 <https://reviews.llvm.org/D159140#4628211>, @tobias-stadler wrote:

> In D159140#4627276 <https://reviews.llvm.org/D159140#4627276>, @aemerson wrote:
>
>>> We only emit the mask G_CONSTANT when necessary. Even when the G_AND is combined away later, the constant sometimes ends up being reused by other instructions instead of becoming dead.
>>
>> I'm a bit confused by how this could happen. Does this happen with optimizations?
>>
>> The wording suggests that a later transform needs a G_AND, and probably the CSEMIRBuilder returns a reference to dead one? Why would other instructions need a dead instruction?
>
> It happens at `O3`. You can take a look at https://godbolt.org/z/G97Kv6bza (this is taken from one of the Atomics tests). Maybe "reused" is the wrong wording. %47 is the mask constant emitted for the G_AND, but it is then also used in various other places right after the legalizer. I assume this is intended CSE behavior, but I'd prefer this not to happen between G_UADDE sequences, so that the selector can still detect these patterns easily. Getting rid of the useless G_ANDs everywhere seemed like the cleanest solution, and it also happens to be a win in every CTMark metric.
> Some more context: When I implemented wide adds in the selector, I opted to prevent unnecessary NZCV reloads by looking at the previous instruction to determine whether NZCV is already set correctly. This seems like a good solution because it doesn't require custom legalization or changes to the way we handle flags (I think X86 does copies to and from the flags register and lets RegAlloc do the heavy lifting). The example also hits another issue, though: RegBankAlloc decides to move one of the vRegs to fpr and inserts a copy, which also inhibits the optimization. I will have to check whether I missed more edge cases like this. In the worst case, I think we can always do some more cleanup in PostSelectOptimize, since we already do liveness analysis there. Still, I'd prefer to make legalized G_UADDE sequences reliably detectable by the selector, so that we can get good codegen without going the X86 route or resorting to additional optimizations.
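The quoted discussion turns on emitting the mask G_CONSTANT (and its G_AND) only when necessary. The following is a minimal standalone sketch of that idea, assuming a simple known-bits model. It is not the code from this patch. `KnownBits64`, `zextMask`, `isAndRedundant`, and `needsMaskAnd` are hypothetical names, not LLVM APIs. An AND with a low-bit mask can be dropped when every bit it would clear is already known to be zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a known-bits result on a 64-bit value:
// a bit set in Zero means that bit is proven to be 0 (not LLVM's KnownBits).
struct KnownBits64 {
  uint64_t Zero = 0;
  uint64_t One = 0;
};

// Mask that a legalized zero-extension from an SrcBits-wide value would
// AND with, e.g. 0xFF when widening from s8.
inline uint64_t zextMask(unsigned SrcBits) {
  assert(SrcBits > 0 && SrcBits <= 64 && "invalid source width");
  return SrcBits == 64 ? ~uint64_t{0} : (uint64_t{1} << SrcBits) - 1;
}

// An AND with Mask is redundant when every bit it would clear
// (the bits outside Mask) is already known to be zero.
inline bool isAndRedundant(const KnownBits64 &Known, uint64_t Mask) {
  const uint64_t Cleared = ~Mask;
  return (Known.Zero & Cleared) == Cleared;
}

// Returns true only if the G_AND, and therefore its mask constant,
// would actually have to be emitted.
inline bool needsMaskAnd(const KnownBits64 &Known, unsigned SrcBits) {
  return !isAndRedundant(Known, zextMask(SrcBits));
}
```

For example, a value produced by an 8-bit zero-extending load would have `Known.Zero == ~uint64_t{0xFF}`, so `needsMaskAnd(Known, 8)` is false and no mask constant is created that CSE could later hand to unrelated instructions. For a value with no known bits, the AND is still required.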

I see what you mean now. Is there any code size impact with optimizations (-Os or -O3)? I expect there to be none but just want to check.

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D159140