[PATCH] D79078: [VectorCombine] Leave reduction operation to SLP

Sat May 16 08:57:47 PDT 2020

spatel added a comment.

In D79078#2038950 <https://reviews.llvm.org/D79078#2038950>, @lebedev.ri wrote:

> In D79078#2010024 <https://reviews.llvm.org/D79078#2010024>, @spatel wrote:
>
> > I also sympathize with trying to solve this here rather than SLP. One of the reasons vector-combine exists is because SLP became too hard to reason about. In hindsight, we should have created a separate pass for reductions - those are not traditional SLP concerns. Just my opinion. :)
>
>
> I'm not sure what you have in mind here?
>  That *this* pass should also form such reductions?
>  Or that we should not disturb them after SLP formed them?
>  Or something else?

The reduction logic is a complicated blob of code, so I don't think it belongs here. I'd split it off from SLP into its own pass, but that looks like a lot of untangling.
Currently, we only run this pass *before* SLP. We could move it after SLP so that we don't disturb reductions before SLP has a chance to recognize them, but I'm not sure whether that would cause regressions of its own. I don't have a good feel for how these passes interact.

What does it take to cause the infinite looping that you found?

Looking at the first test: if we allow iteration in this pass, we end up with:

  define i32 @ext_ext_reduction(<4 x i32> %x, <4 x i32> %y) {
    %and = and <4 x i32> %x, %y
    %1 = shufflevector <4 x i32> %and, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %2 = or <4 x i32> %1, %and
    %3 = shufflevector <4 x i32> %and, <4 x i32> undef, <4 x i32> <i32 2, i32 undef, i32 undef, i32 undef>
    %4 = or <4 x i32> %3, %2
    %5 = shufflevector <4 x i32> %and, <4 x i32> undef, <4 x i32> <i32 3, i32 undef, i32 undef, i32 undef>
    %6 = or <4 x i32> %5, %4
    %7 = extractelement <4 x i32> %6, i64 0
    ret i32 %7
  }

And nothing knows how to form the optimal reduction from that pattern. We could say that's the real problem: the source code could have been in that form originally, so we would miss the reassociation opportunity either way.
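
For comparison, here is a hand-written sketch of what the reassociated form might look like, assuming the usual log2 shuffle reduction pattern. This is not output from any pass, and the value names are made up for illustration:

  define i32 @ext_ext_reduction(<4 x i32> %x, <4 x i32> %y) {
    %and = and <4 x i32> %x, %y
    ; fold lanes 2,3 onto lanes 0,1
    %rdx.shuf = shufflevector <4 x i32> %and, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
    %bin.rdx = or <4 x i32> %and, %rdx.shuf
    ; fold lane 1 onto lane 0
    %rdx.shuf1 = shufflevector <4 x i32> %bin.rdx, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %bin.rdx2 = or <4 x i32> %bin.rdx, %rdx.shuf1
    %r = extractelement <4 x i32> %bin.rdx2, i64 0
    ret i32 %r
  }

Lane 0 of %bin.rdx2 still ORs together all four lanes of %and, but it takes two shuffle+or steps instead of three. Getting there from the linear chain above requires reassociating the 'or' operations.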

https://reviews.llvm.org/D79078