[PATCH] D59926: Constant folding sometimes creates large constants even though a smaller constant is used multiple times

Fri Mar 29 04:12:04 PDT 2019

lebedev.ri added a comment.

In D59926#1447460 <https://reviews.llvm.org/D59926#1447460>, @ramred01 wrote:

> In D59926#1445936 <https://reviews.llvm.org/D59926#1445936>, @spatel wrote:
>
> > In D59926#1445839 <https://reviews.llvm.org/D59926#1445839>, @ramred01 wrote:
> >
> > > In D59926#1445823 <https://reviews.llvm.org/D59926#1445823>, @lebedev.ri wrote:
> > >
> > > > In D59926#1445822 <https://reviews.llvm.org/D59926#1445822>, @ramred01 wrote:
> > > >
> > > > > In D59926#1445820 <https://reviews.llvm.org/D59926#1445820>, @lebedev.ri wrote:
> > > > >
> > > > > > This sounds like a back-end problem.
> > > > > >  Have you considered/tried solving it there, instead of limiting the middle-end?
> > > > >
> > > > >
> > > > > This folding is happening in the middle-end itself.  The LLVM IR already has the folded value.
> > > >
> > > >
> > > > That is precisely my point. Is that folding not optimal for the IR?
> > >
> > >
> > > That folding would have been optimal for the IR if the constant were not to be reused.  If the constant is reused, then most architectures will do better with materializing one constant in a register and reusing it rather than materializing two constants.  Most architectures require two instructions to materialize 32-bit constants.  If, we add the additional operation of shift also, that now needs to be done, even with one reuse, it will generate 1 instruction fewer.  With more reuses and each reuse folded, that number could increase.
> >
> >
> > I think you're mixing up canonicalization with optimization (which I've also done a bunch). The goal here in instcombine is to produce canonical code. Ie, create a reduced set of easier-to-optimize forms of the infinite patterns that might have existed in the source. Canonical code often is identical to optimal code, but sometimes they diverge. That's when things get harder (and we might end up reversing an earlier transform). But we have to deal with it in the backend because -- as Roman noted -- any solution at this level would be incomplete simply because the input might already be in the form that you don't want.
>
>
> I get your point and fully agree with it.
>
> But if a certain canonicalization were to result in a form that always requires reversing that transform at a later stage, won't we be better off not performing that transform in the first place?  If a canonicalization were to result in exposing better optimization opportunities across architectures, then that makes more sense, isn't it?

Is it *always*, though?
If i understand correctly, you want to block this transform, right?
https://godbolt.org/z/IvWYKl

we have //essentially// replaced

  %4 = ashr i32 %3, 8
  %5 = icmp slt i32 %4, 4096

with

  %5 = icmp slt i32 %3, 1048832

thus that cmp does not depend on the `ashr`, thus we reduced the data dependency chain,
thus `icmp` can execute without waiting for the `ashr`.

Which is, as it can be seen from the lowest view there, unsurprisingly improves performance. (assuming i fixed-up asm for mca correctly..)

So i'm not convinced that this hammer approach is the right solution to the problem.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59926/new/

https://reviews.llvm.org/D59926