[PATCH] D79078: [VectorCombine] Leave reduction operation to SLP

Sat May 16 09:29:38 PDT 2020

lebedev.ri added a comment.

In D79078#2040144 <https://reviews.llvm.org/D79078#2040144>, @spatel wrote:

> In D79078#2038950 <https://reviews.llvm.org/D79078#2038950>, @lebedev.ri wrote:
>
> > In D79078#2010024 <https://reviews.llvm.org/D79078#2010024>, @spatel wrote:
> >
> > > I also sympathize with trying to solve this here rather than SLP. One of the reasons vector-combine exists is because SLP became too hard to reason about. In hindsight, we should have created a separate pass for reductions - those are not traditional SLP concerns. Just my opinion. :)
> >
> >
> > I'm not sure what you have in mind here?
> >  That *this* pass should also form such reductions?
> >  Or that we should not disturb them after SLP formed them?
> >  Or something else?
>
>
> The reduction logic is a complicated blob of code, so I don't think it belongs here. I'd split it off from SLP into its own pass, but it looks like a lot of untangling.
>  Currently, we're running this pass *before* SLP only. We could move this after SLP to make sure we are not disturbing reductions before SLP has a chance to recognize them...but I'm not sure if that would also now cause regressions. I don't have a good feel for how these passes are interacting.
>
> What does it take to cause the infinite looping that you found?

No, i mean in the case if we would be forming reductions here,
because i think we'd then have two conflicting transforms here,
and they would cause traditional instcombine/dagcombine-esque infinite combine loop.

> Looking at that 1st test - if we allow iteration in this pass, we'll end up with:
> 
>   define i32 @ext_ext_reduction(<4 x i32> %x, <4 x i32> %y) {
>     %and = and <4 x i32> %x, %y
>     %1 = shufflevector <4 x i32> %and, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
>     %2 = or <4 x i32> %1, %and
>     %3 = shufflevector <4 x i32> %and, <4 x i32> undef, <4 x i32> <i32 2, i32 undef, i32 undef, i32 undef>
>     %4 = or <4 x i32> %3, %2
>     %5 = shufflevector <4 x i32> %and, <4 x i32> undef, <4 x i32> <i32 3, i32 undef, i32 undef, i32 undef>
>     %6 = or <4 x i32> %5, %4
>     %7 = extractelement <4 x i32> %6, i64 0
>     ret i32 %7
>   }
>   
> 
> 
> And nothing knows how to form the optimal reduction from that pattern. We could say that's the real problem - source code could be in that form originally, so we just miss the reassociation optimization opportunity.

That would be the outcome i would prefer, yes.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79078/new/

https://reviews.llvm.org/D79078