[PATCH] D110170: [InstCombine] fold cast of right-shift if high bits are not demanded

Bjorn Pettersson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 5 15:32:33 PDT 2021


bjope added a comment.

In D110170#3042818 <https://reviews.llvm.org/D110170#3042818>, @spatel wrote:

> In D110170#3041159 <https://reviews.llvm.org/D110170#3041159>, @bjope wrote:
>
>> I noticed a regression in a downstream benchmark that at least partly seems to be caused by it. Here is a reduced example: https://godbolt.org/z/M9MKjcYPG
>>
>> From what I can see there is a quite early run of InstCombine in the O3 pipeline, which basically happens directly after GlobalOpt without any CSE in between. So in such an early run of InstCombine we do trigger transforms based on "one use", which wouldn't have happened if running CSE before InstCombine. I figure that might be a more general problem and not only specific to the rewrites introduced in this patch.
>>
>> We'll analyse the regression a bit more (maybe other things are happening that contribute to the regression). But I wanted to mention the above. And it makes me a bit curious whether it is a general problem with that early InstCombine run that "one use" checks might be fooled by not having done CSE after GlobalOpt.
>
> Thanks for posting the example. That does seem like a general problem, and it's worth experimenting with the pass manager to see if reordering the passes makes things better or worse.
> I'm not sure if we have an IR pass that is responsible for seeing that we have redundant shift ops like in the example. Is that a possible trick for GVN?
> Also, I tried running the example through codegen for x86 and AArch64, and they both manage to eliminate the redundant extra shift after legalization. Is it possible that your target is missing a semi-generic SDAG transform?

The IR posted in godbolt was a bit reduced, and running the example through codegen gave the same result also for my target.
Although, the original IR looked a bit more like this example https://godbolt.org/z/s8Krzrq36 , which shows that the number of instructions in the loop increases from 16 to 19 for x86 when using opt from trunk instead of the 13.0.0 version. And afaict this patch is the main difference.
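
To illustrate the "one use" concern, here is a toy sketch in Python. It is a simplified model and not LLVM code: the instruction tuples, the value names, and the `use_counts`/`cse` helpers are all made up for this illustration, and it is not taken from the godbolt examples. It only shows that two identical shifts each look like they have a single user until a CSE-like step merges them.

```python
from collections import Counter

# Toy IR: each instruction is (name, opcode, operands).
# All names here are hypothetical and only serve the illustration.

def use_counts(insts):
    """Count how many times each value name appears as an operand."""
    counts = Counter()
    for _name, _op, operands in insts:
        counts.update(operands)
    return counts

def cse(insts):
    """Merge instructions with identical (opcode, operands) into the first one."""
    seen = {}      # (opcode, operands) -> name of the surviving instruction
    replace = {}   # removed name -> surviving name
    out = []
    for name, op, operands in insts:
        # Rewrite operands that refer to an already-removed duplicate.
        operands = tuple(replace.get(o, o) for o in operands)
        key = (op, operands)
        if key in seen:
            replace[name] = seen[key]
        else:
            seen[key] = name
            out.append((name, op, operands))
    return out

# Two identical shifts of x, each feeding a different user.
before = [
    ("s1", "lshr", ("x", "8")),
    ("s2", "lshr", ("x", "8")),
    ("t", "trunc", ("s1",)),
    ("a", "and", ("s2", "255")),
]
after = cse(before)

# Before the merge each shift appears to have one use, so a "one use"
# guarded fold could fire; after the merge the surviving shift has two.
print(use_counts(before)["s1"], use_counts(after)["s1"])
```

In this model a fold guarded by a "one use" check would be allowed on `s1` before the merge but not after it, which is the ordering sensitivity described above.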


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110170/new/

https://reviews.llvm.org/D110170


