[PATCH] D111800: [VectorCombine] Add option to only run scalarization transforms.

Thu Oct 14 13:10:15 PDT 2021

spatel added a comment.

In D111800#3064694 <https://reviews.llvm.org/D111800#3064694>, @fhahn wrote:

> In D111800#3064573 <https://reviews.llvm.org/D111800#3064573>, @spatel wrote:
>
>> What if we just did better in VectorCombine?
>>
>> We'd need to chain a bunch of combines together on paper or just implement this:
>> https://llvm.org/PR52178
>> ...to know if it gets us to the minimal set of shuffles in IR and/or codegen for the 'hadd' example, but it might be enough?
>
> Yes, I think that would get us a bit further, especially on the ARM64 test case. For X86 the shuffles/add chains are a bit more difficult to tackle: it converts the 4 scalar adds to 4 vector adds which each process a single lane. I'm not sure if we will be able to cover this in VectorCombine.

I drafted a patch for PR52178, and I see that we get the first pair of adds (the %a lanes) folded into a single fadd, but we're stuck on the next pair (the %b lanes, still split across %4 and %6):

  define <4 x float> @reverse_hadd_v4f32(<4 x float> %a, <4 x float> %b) local_unnamed_addr #0 {
    %shift = shufflevector <4 x float> %a, <4 x float> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %shift1 = shufflevector <4 x float> %a, <4 x float> poison, <4 x i32> <i32 undef, i32 undef, i32 3, i32 undef>
    %1 = shufflevector <4 x float> %shift, <4 x float> %shift1, <4 x i32> <i32 undef, i32 undef, i32 6, i32 0>
    %2 = shufflevector <4 x float> %a, <4 x float> poison, <4 x i32> <i32 undef, i32 undef, i32 2, i32 0>
    %3 = fadd <4 x float> %1, %2
    %shift2 = shufflevector <4 x float> %b, <4 x float> poison, <4 x i32> <i32 undef, i32 0, i32 undef, i32 undef>
    %4 = fadd <4 x float> %shift2, %b
    %5 = shufflevector <4 x float> %3, <4 x float> %4, <4 x i32> <i32 undef, i32 5, i32 2, i32 3>
    %shift3 = shufflevector <4 x float> %b, <4 x float> poison, <4 x i32> <i32 undef, i32 undef, i32 3, i32 undef>
    %6 = fadd <4 x float> %shift3, %b
    %7 = shufflevector <4 x float> %5, <4 x float> %6, <4 x i32> <i32 6, i32 1, i32 2, i32 3>
    ret <4 x float> %7
  }
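
To sanity-check what the IR above computes, here is a small Python model of `shufflevector` and lane-wise `fadd`. The helpers `shufflevector`, `fadd`, `current_ir` and `folded_form` are hypothetical illustration code, not LLVM APIs. The model evaluates the listed instructions on concrete lanes. It then compares the result with a hypothetical fully folded form: two two-source shuffles with masks <7, 4, 3, 1> and <6, 5, 2, 0> feeding one fadd. That target is an assumption about what a "minimal set of shuffles" could look like. Whether VectorCombine can reach it, or whether a cost model would prefer it, is not established here.

```python
from typing import List, Optional

# A lane is a float, or None to model an undef/poison lane.
Lane = Optional[float]
U = None  # undef mask element / undef lane


def shufflevector(v1: List[Lane], v2: List[Lane], mask: List[Optional[int]]) -> List[Lane]:
    """Model of LLVM shufflevector: index i picks lane i of the concatenation v1 ++ v2."""
    both = v1 + v2
    return [U if m is None else both[m] for m in mask]


def fadd(x: List[Lane], y: List[Lane]) -> List[Lane]:
    """Lane-wise fadd; an undef input lane produces an undef output lane."""
    return [U if (p is None or q is None) else p + q for p, q in zip(x, y)]


def current_ir(a: List[Lane], b: List[Lane]) -> List[Lane]:
    """Instruction-by-instruction transcription of @reverse_hadd_v4f32 above."""
    poison = [U] * 4
    shift = shufflevector(a, poison, [1, U, U, U])
    shift1 = shufflevector(a, poison, [U, U, 3, U])
    t1 = shufflevector(shift, shift1, [U, U, 6, 0])
    t2 = shufflevector(a, poison, [U, U, 2, 0])
    t3 = fadd(t1, t2)                     # <u, u, a3+a2, a1+a0>
    shift2 = shufflevector(b, poison, [U, 0, U, U])
    t4 = fadd(shift2, b)                  # <u, b0+b1, u, u>
    t5 = shufflevector(t3, t4, [U, 5, 2, 3])
    shift3 = shufflevector(b, poison, [U, U, 3, U])
    t6 = fadd(shift3, b)                  # <u, u, b3+b2, u>
    return shufflevector(t5, t6, [6, 1, 2, 3])


def folded_form(a: List[Lane], b: List[Lane]) -> List[Lane]:
    """Hypothetical target: two shuffles of (a, b) plus a single fadd."""
    lhs = shufflevector(a, b, [7, 4, 3, 1])  # <b3, b0, a3, a1>
    rhs = shufflevector(a, b, [6, 5, 2, 0])  # <b2, b1, a2, a0>
    return fadd(lhs, rhs)


if __name__ == "__main__":
    # Powers of two keep every float addition exact.
    a = [1.0, 2.0, 4.0, 8.0]
    b = [16.0, 32.0, 64.0, 128.0]
    print(current_ir(a, b))   # [192.0, 48.0, 12.0, 3.0]
    print(folded_form(a, b))  # [192.0, 48.0, 12.0, 3.0]
```

The operand order in each lane of the folded form matches the original IR (for example b3 + b2 in lane 0). The model ignores NaN and fast-math subtleties.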

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D111800