[PATCH] D125248: [instcombine] Propagate freeze back to def

Tue May 10 07:45:35 PDT 2022

nikic added a comment.

In D125248#3502172 <https://reviews.llvm.org/D125248#3502172>, @mtrofin wrote:

> In D125248#3501979 <https://reviews.llvm.org/D125248#3501979>, @nikic wrote:
>
>> In D125248#3501917 <https://reviews.llvm.org/D125248#3501917>, @mtrofin wrote:
>>
>>> In D125248#3501446 <https://reviews.llvm.org/D125248#3501446>, @nikic wrote:
>>>
>>>> I don't think this is the right place to do this. You should either:
>>>>
>>>> 1. Remove the one-use restriction on this fold, and replace other uses with the frozen variant as well. This is on the premise that pushing freeze higher is more worthwhile than having some un-frozen uses.
>>>>
>>>> 2. Adjust the freezeDominatedUses() fold to freeze not only dominated uses. Basically canonicalize %x to freeze %x. This is on the premise that allowing more exploitation of value identity is more worthwhile than having un-frozen uses.
>>>
>>> I think in either case the `freeze %s` def needs to dominate its uses.
>>
>> Sure -- to be clear, the suggestion for the second approach is to move the freeze up so that it does dominate all uses and allows all of them to be replaced. This is basically what you're doing here, with the crucial difference that it happens for all freeze instructions, not just the ones produced by this particular transform.
>>
>> There is a tradeoff here between leaving some uses to not use freeze (and possibly allowing additional optimizations for them, as freeze is usually an optimization-blocker) and allowing folds based on value-identity (which benefit from all uses being freeze(%x), not %x).
>>
>> With that in mind, I would probably lean toward dropping the one-use check on this fold as a starting point -- it's more conservative, but should still cover your motivating case.
>
> Ah, yes, dropping the one-use check makes sense. Waiting to see if the others had any concerns with that, otherwise sgtm.
>
> Here's also the shrunk down example showing the larger picture (it's as much shrinking I could do off binilearKernel in that benchmark):
> https://gist.github.com/mtrofin/19ccc1c0fc2da8689422837ad442896c
>
> (Just comment in/out the 2 options)
>
> basically instcombine moves the freeze from cmp.fr up to div, but stops there. then gvn can't unify idx1 and 2 and then we end up with a select when we do simplifycfg. In the larger picture that's a bit more than just a select, the whole `if.else` block survives (with the subset of the instructions that couldn't be folded) and that redundant computation then causes the regression.

Thanks, I understand your case now. I also ran into another related one while working on CHR:

  define i1 @test(i32 %x) {
    %and1 = and i32 %x, 4
    %cmp1 = icmp ne i32 %and1, 0
    %x.fr = freeze i32 %x
    %and2 = and i32 %x.fr, 11
    %cmp2 = icmp eq i32 %and2, 11
    %and = and i1 %cmp1, %cmp2
    ret i1 %and
  }

Dropping the one-use limitation from the "push freeze upwards" fold would not help this case, as the freeze cannot be pushed any further. To cover this case, we do need to extend the freezeDominatedUses() transform.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125248/new/

https://reviews.llvm.org/D125248