[PATCH] D159140: [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND
Amara Emerson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 30 07:53:07 PDT 2023
aemerson added a comment.
In D159140#4628211 <https://reviews.llvm.org/D159140#4628211>, @tobias-stadler wrote:
> In D159140#4627276 <https://reviews.llvm.org/D159140#4627276>, @aemerson wrote:
>
>>> We only emit the mask G_CONSTANT when necessary. Even when the G_AND is combined away later, the constant sometimes ends up being reused by other instructions instead of becoming dead.
>>
>> I'm a bit confused by how this could happen. Does this happen with optimizations?
>>
>> The wording suggests that a later transform needs a G_AND, and probably the CSEMIRBuilder returns a reference to a dead one? Why would other instructions need a dead instruction?
>
> It happens at `O3`. You can take a look at https://godbolt.org/z/G97Kv6bza (this is taken from one of the Atomics tests). Maybe "reused" is the wrong wording. %47 is the mask constant emitted for the G_AND, but is then also used in various other places right after the legalizer. I assume this is intended CSE behavior, but I'd prefer this not to happen between G_UADDE sequences, so that these patterns can easily be detected by the selector again. Getting rid of the useless G_ANDs everywhere seemed like the cleanest solution and happens to also be a win in every CTMark metric.
> Some more context: When I implemented wide adds in the selector, I opted to prevent unnecessary NZCV reloads by looking at the previous instruction to determine if NZCV will already be set correctly. This seems like a good solution because it doesn't require custom legalization or changes to the way we handle flags (I think X86 does copies to and from the flags register and lets RegAlloc do the heavy lifting). The example also hits another issue though: RegBankAlloc decides to move one of the vRegs to fpr and inserts a copy, which also inhibits the optimization. I will have to look into whether I missed some more edge cases like this. I think in the worst case we can always do some more cleanup in PostSelectOptimize, since we already do liveness analysis there. Though I'd prefer to make legalized G_UADDE sequences reliably detectable by the selector, so that we can still get good codegen without going the X86 route or resorting to additional optimizations.
I see what you mean now. Is there any code size impact with optimizations (-Os or -O3)? I expect there to be none but just want to check.
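For readers following the thread, the notion of a "redundant" G_AND can be sketched in isolation. This is only an illustration under an assumption: that redundancy is decided from known-zero bits of the masked value. The thread does not show the patch's actual check, and `KnownZeroBits` and `isAndRedundant` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the result of a known-bits query.
// This is NOT LLVM's KnownBits class.
struct KnownZeroBits {
  uint64_t Zero; // bit i set => bit i of the value is known to be 0
};

// (x & Mask) == x holds for every possible x exactly when each bit
// that the mask would clear is already known to be zero.
constexpr bool isAndRedundant(KnownZeroBits Known, uint64_t Mask) {
  return (~Mask & ~Known.Zero) == 0;
}
```

For example, a boolean that is known to be 0 or 1 has every bit above bit 0 known zero, so masking it with 1 changes nothing and neither the G_AND nor its mask G_CONSTANT would need to be emitted.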
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D159140/new/
https://reviews.llvm.org/D159140