[PATCH] D38313: [InstCombine] Introducing Aggressive Instruction Combine pass

Wed Nov 29 00:54:50 PST 2017

escha added a comment.

In https://reviews.llvm.org/D38313#937267, @zvi wrote:

> In https://reviews.llvm.org/D38313#936422, @escha wrote:
>
> > Two comments on the trunc thing:
> >
> > 1. Thank you!!! As a GPU target maintainer, one of my main frustrations is how much LLVM *loves* to generate code that is needlessly too wide when smaller would do. We mostly have avoided this problem due to being float-heavy, but as integer code becomes more important, I absolutely love any chance I can get to reduce 32-bit to 16-bit and save register space accordingly.
>
>
> Sometimes it's LLVM, and sometimes it's the frontend that is required to extend small typed values before performing operations.
>
> > 2. I'm worried about this because the DAG *loves* to eliminate """redundant""" truncates and extensions, even if they're both marked as free. I've accidentally triggered infinite loops many times when trying to trick the DAG into emitting code that keeps intermediate variables small, an extreme example being something like this:
> > 
> > 
> > 
> >   ; pseudo-asm
> >   ; R1 = *b + (*a & 15);
> >   ; R2 = *c + (*a >> 16) & 15;
> >   load.32 R0, [a]
> >   load.32 R1, [b]
> >   load.32 R2, [c]
> >   shr.32 R0H, R0, 16
> >   and.16 R0L, R0L, 15
> >   and.16 R0H, R0H, 15
> >   add.32 R1, R1, R0L
> >   add.32 R2, R2, R0H
> > 
> > 
> > The DAG will usually try to turn this into this:
> > 
> >   load.32 R0, [a]
> >   load.32 R1, [b]
> >   load.32 R2, [c]
> >   shr.32 R3, R0, 16
> >   and.32 R0, R0, 15
> >   and.32 R3, R3, 15
> >   add.32 R1, R1, R0
> >   add.32 R2, R2, R3
> > 
> > 
> > this is just a hypothetical example but in general this makes me worry from past attempts at experimentation in this realm.
>
> Not sure I fully understand the concern of this patch, but if the problem is root caused to Instruction Selection, shouldn't we fix it there? If DAGCombiner's elimination of free truncates/extensions is an issue, have you considered predicating the specific combines with TLI hooks?

There *are* TLI hooks; they're just not as widely used in the DAG as they could be.

I'm warning you with regard to this patch because the DAG may inadvertently undo a lot of the optimizations you're doing here. This isn't an objection, just something that might be worth looking at later given past experiences in trying to do similar optimizations.

https://reviews.llvm.org/D38313