[llvm-bugs] [Bug 39936] New: [X86] [BtVer2] 256-bit integer horizontal add idiom not fully expanded using PHADDD

via llvm-bugs llvm-bugs at lists.llvm.org
Mon Dec 10 08:46:43 PST 2018


https://bugs.llvm.org/show_bug.cgi?id=39936

            Bug ID: 39936
           Summary: [X86] [BtVer2] 256-bit integer horizontal add idiom
                    not fully expanded using PHADDD
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: andrea.dibiagio at gmail.com
                CC: craig.topper at gmail.com, llvm-bugs at lists.llvm.org,
                    llvm-dev at redking.me.uk, spatel+llvm at rotateright.com

This is a spin off of bug 39921.

The following code performs an integer reduction using operator ADD. The type
is __v8si, so it could be implemented in three steps of horizontal adds.

```
int foo(__v8si A) {
  __v8si Lo = __builtin_shufflevector(A, A, 0, 2, 4, 6, -1, -1, -1, -1);
  __v8si Hi = __builtin_shufflevector(A, A, 1, 3, 5, 7, -1, -1, -1, -1);
  __v8si Step = Lo + Hi;
  Lo = __builtin_shufflevector(Step, Step, 0, 2, -1, -1, -1, -1, -1, -1);
  Hi = __builtin_shufflevector(Step, Step, 1, 3, -1, -1, -1, -1, -1, -1);
  Step = Lo + Hi;
  Hi = __builtin_shufflevector(Step, Step, 1, -1, -1, -1, -1, -1, -1, -1);
  Step += Hi;
  return Step[0];
}
```

Instead, on BtVer2, we currently generate this:

        vextractf128    $1, %ymm0, %xmm1
        vshufps $136, %xmm1, %xmm0, %xmm2 # xmm2 = xmm0[0,2],xmm1[0,2]
        vshufps $221, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[1,3],xmm1[1,3]
        vpaddd  %xmm0, %xmm2, %xmm0
        vphaddd %xmm0, %xmm0, %xmm0
        vphaddd %xmm0, %xmm0, %xmm0
        vmovd   %xmm0, %eax
        retq


We could have generated this instead:

        vextractf128    $1, %ymm0, %xmm1
        vphaddd %xmm0, %xmm1, %xmm0
        vphaddd %xmm0, %xmm0, %xmm0
        vphaddd %xmm0, %xmm0, %xmm0
        vmovd   %xmm0, %eax
        retq
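
As a sanity check that the shorter sequence computes the same reduction, here is a scalar C++ sketch of it. The names `V4`, `V8`, `phaddd` and `reduceAdd` are made up for this illustration, and the model ignores 32-bit wraparound, so it is only meaningful for small inputs.

```cpp
#include <array>
#include <cstdint>

using V4 = std::array<int32_t, 4>;
using V8 = std::array<int32_t, 8>;

// Scalar model of a 128-bit PHADDD: the low two result lanes hold the
// pairwise sums of A, the high two result lanes hold the pairwise sums of B.
// (Illustrative helper; integer wraparound is not modelled.)
V4 phaddd(const V4 &A, const V4 &B) {
  return {A[0] + A[1], A[2] + A[3], B[0] + B[1], B[2] + B[3]};
}

// Model of the proposed sequence: one vextractf128 followed by three vphaddd.
int32_t reduceAdd(const V8 &A) {
  V4 Lo = {A[0], A[1], A[2], A[3]}; // low 128 bits of ymm0 (xmm0)
  V4 Hi = {A[4], A[5], A[6], A[7]}; // vextractf128 $1 (xmm1)
  V4 Step = phaddd(Lo, Hi);         // four partial sums
  Step = phaddd(Step, Step);        // two partial sums in lanes 0 and 1
  Step = phaddd(Step, Step);        // full sum in lane 0
  return Step[0];                   // vmovd
}
```

Swapping the two source operands of the first horizontal add only permutes the partial sums, so the value that ends up in lane 0 is the same either way.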


It looks like our target-specific `combineAdd()` routine is unable to
match the following DAG due to the presence of extract_subvector nodes.


      t6: v8i32 = vector_shuffle<0,2,4,6,u,u,u,u> t2, undef:v8i32
    t33: v4i32 = extract_subvector t6, Constant:i64<0>
      t7: v8i32 = vector_shuffle<1,3,5,7,u,u,u,u> t2, undef:v8i32
    t34: v4i32 = extract_subvector t7, Constant:i64<0>
  t35: v4i32 = add t33, t34


If we try to hoist the extract_subvector, instead of shrinking the binary
computation (and therefore the shuffle operands), we end up in an infinite
combine loop. That is because the DAGCombiner always attempts to sink an
extract_subvector of a binop into the operands of the binop itself.
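
The ping-pong can be pictured with a toy model in which the node has only two shapes and each rewrite undoes the other. This is only a sketch: `Form`, `hoistExtract`, `sinkExtract` and `reachesFixedPoint` are hypothetical names, not LLVM APIs, and the real combiner works on a worklist of DAG nodes rather than a single enum.

```cpp
// Two toy shapes of the same computation (illustrative only).
enum class Form {
  ExtractOfBinop, // extract_subvector (add X, Y)
  BinopOfExtracts // add (extract_subvector X), (extract_subvector Y)
};

// Hypothetical target combine: hoist the extracts above the add.
Form hoistExtract(Form F) {
  return F == Form::BinopOfExtracts ? Form::ExtractOfBinop : F;
}

// Model of the generic combine: sink the extract into the binop operands.
Form sinkExtract(Form F) {
  return F == Form::ExtractOfBinop ? Form::BinopOfExtracts : F;
}

// Apply one rewrite per step until no rule fires or MaxSteps is exhausted.
// Returns true if a fixed point was reached within MaxSteps.
bool reachesFixedPoint(Form F, unsigned MaxSteps, bool EnableHoist) {
  for (unsigned Step = 0; Step < MaxSteps; ++Step) {
    Form Next = EnableHoist ? hoistExtract(F) : F;
    if (Next == F)
      Next = sinkExtract(F);
    if (Next == F)
      return true; // neither rule changed the node
    F = Next;
  }
  return false; // still rewriting after MaxSteps
}
```

With `EnableHoist` set, the two rules keep undoing each other and no step budget is ever enough; with it cleared, the model settles on the sunk form after at most one rewrite.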


It is worth mentioning that if we compile the following (equivalent) C++ code,
then we get the optimal PHADDD sequence:

```
int foo(__v8si A) {
  __v4si Lo = __builtin_shufflevector(A, A, 0, 2, 4, 6);
  __v4si Hi = __builtin_shufflevector(A, A, 1, 3, 5, 7);
  __v4si Step = Lo + Hi;
  Lo = __builtin_shufflevector(Step, Step, 0, 2, -1, -1);
  Hi = __builtin_shufflevector(Step, Step, 1, 3, -1, -1);
  Step = Lo + Hi;
  Hi = __builtin_shufflevector(Step, Step, 1, -1, -1, -1);
  Step += Hi;
  return Step[0];
}
```

        vextractf128    $1, %ymm0, %xmm1
        vphaddd %xmm1, %xmm0, %xmm0
        vphaddd %xmm0, %xmm0, %xmm0
        vphaddd %xmm0, %xmm0, %xmm0
        vmovd   %xmm0, %eax
        retq


Note how the shuffle masks are shrunk in the IR, so in practice we only
manipulate 128-bit values (apart from the input vector A).

Does this mean that we could do something better at the IR level, rather than
complicating the existing target-specific/target-independent DAG combine rules?

Do we have a demanded-elts kind of analysis at IR level? If so, then we may be
able to realize that all those shufflevectors could be shrunk before we even
reach the code generator.
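
To make the idea concrete, here is a rough C++ sketch of what such an analysis would compute for the first `foo` above, walking backwards from `return Step[0]`. The helpers `demandedInputLanes` and `lanesDemandedOfFirstAdd` are hypothetical and written only for this illustration; they are not an existing LLVM interface. Lanes are tracked as a bitmask, and -1 in a mask stands for an undef lane.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: given a shufflevector mask (-1 marks an undef lane)
// and a bitmask of demanded result lanes, return the bitmask of demanded
// lanes of the concatenated input <Op0, Op1>.
uint32_t demandedInputLanes(const std::vector<int> &Mask,
                            uint32_t DemandedOut) {
  uint32_t DemandedIn = 0;
  for (unsigned I = 0; I < Mask.size(); ++I) {
    if (!(DemandedOut & (1u << I)))
      continue; // nobody reads this result lane
    if (Mask[I] < 0)
      continue; // undef lane, no input needed
    DemandedIn |= 1u << Mask[I];
  }
  return DemandedIn;
}

// Walk the 256-bit foo() backwards and report which lanes of the first
// 'Lo + Hi' are actually needed.
uint32_t lanesDemandedOfFirstAdd() {
  const std::vector<int> Lo2 = {0, 2, -1, -1, -1, -1, -1, -1};
  const std::vector<int> Hi2 = {1, 3, -1, -1, -1, -1, -1, -1};
  const std::vector<int> Hi3 = {1, -1, -1, -1, -1, -1, -1, -1};

  uint32_t Final = 0x1; // return Step[0]
  // Step += Hi: an add is lane-wise, so the old Step and Hi both need lane 0;
  // Hi is itself a shuffle of the old Step.
  uint32_t Second = Final | demandedInputLanes(Hi3, Final);
  // Step = Lo + Hi, where Lo and Hi are shuffles of the first sum.
  uint32_t First =
      demandedInputLanes(Lo2, Second) | demandedInputLanes(Hi2, Second);
  return First;
}
```

Under this model only lanes 0 to 3 of the first sum are demanded, which is exactly the low 128 bits, so the first add and the shuffles feeding it could in principle be narrowed to four elements.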
