[llvm-dev] Instcombine-code-sinking increases the value’s live range

Wed Oct 13 02:20:38 PDT 2021

Answering my own question :P

The original input pattern looks like this:

```c
local memory
for (...) {
  call a function (with side effects) that copies from global to local memory
  read the data from local memory and compute
}

if (...)
  return;
store the computed result back.
```

If the for loop is fully unrolled and the computation is sunk into the
basic block that stores the result back, the loads from local memory
still stay in the entry block, so the backend compiler has to find
somewhere (registers or memory) to keep all of the copied data alive
until that block runs.
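To make the effect concrete, here is a minimal C sketch of the fully unrolled loop (two iterations) before and after the arithmetic is sunk. It assumes a hypothetical `copy_to_local` helper standing in for the side-effecting copy function, and `N`, `g0`, `g1`, `before_sinking`, `after_sinking` are illustrative names. It only mirrors the shape of the IR quoted below at the source level; it is not what InstCombine actually emits.

```c
#include <string.h>

#define N 2 /* hypothetical: elements copied per iteration */

/* Hypothetical stand-in for the side-effecting copy function. */
static void copy_to_local(float *local, const float *global) {
    memcpy(local, global, N * sizeof *local);
}

/* Unrolled loop with the arithmetic left in place: only `acc`
   has to stay live across the early-exit branch (like %add27). */
static int before_sinking(const float *g0, const float *g1, int cond, float *out) {
    float local[N];
    copy_to_local(local, g0);
    float acc = local[0] * local[1];    /* like %mul22 / %add23 */
    copy_to_local(local, g1);
    acc += local[0] * local[1];         /* like %mul26 / %add27 */
    if (cond)
        return 0;                       /* early exit, nothing stored */
    *out = acc;
    return 1;
}

/* Same computation with the arithmetic sunk into the store path:
   the four loaded values now stay live across the branch. */
static int after_sinking(const float *g0, const float *g1, int cond, float *out) {
    float local[N];
    copy_to_local(local, g0);
    float a0 = local[0], b0 = local[1]; /* like %6, %s.0 */
    copy_to_local(local, g1);
    float a1 = local[0], b1 = local[1]; /* like %7, %s.1 */
    if (cond)
        return 0;
    *out = a0 * b0 + a1 * b1;           /* computation sunk here */
    return 1;
}
```

Both functions return the same result; they differ only in what must survive the early-exit branch: one accumulator in the first version, four loaded floats in the second.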

I tested this pattern on aarch64 and amdgcn; both targets spill the
data to memory.

If the for loop copies the data directly instead of calling a copy
function, both targets can generate better basic-block layouts
(aarch64: "Machine code sinking (machine-sink)" pass; amdgcn: "Code
sinking (sink)" pass).
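A rough C sketch of the two loop shapes being compared, assuming a hypothetical `copy_to_local` routine (marked `noinline` with the GCC/Clang attribute so it stays an opaque call), placeholder sizes `N` and `ITERS`, and a placeholder summation as the "compute" step. The code only shows the source-level difference; it does not by itself demonstrate the codegen results reported above, which would need to be checked by compiling each variant for the target (for example with `clang --target=aarch64-linux-gnu -O2 -S`) and comparing the assembly.

```c
#include <stddef.h>

#define N 4     /* hypothetical: elements per chunk */
#define ITERS 2 /* hypothetical: loop trip count */

/* Hypothetical opaque copy routine; noinline keeps it a real call
   with side effects on the destination buffer. */
__attribute__((noinline))
static void copy_to_local(float *dst, const float *src) {
    for (size_t i = 0; i < N; ++i)
        dst[i] = src[i];
}

/* Variant A: copy through the function call. */
static int sum_via_call(const float *global, int cond, float *out) {
    float local[N];
    float acc = 0.0f;
    for (size_t it = 0; it < ITERS; ++it) {
        copy_to_local(local, global + it * N);
        for (size_t j = 0; j < N; ++j)
            acc += local[j];          /* placeholder computation */
    }
    if (cond)
        return 0;
    *out = acc;
    return 1;
}

/* Variant B: copy directly in the loop body, no call in between. */
static int sum_direct(const float *global, int cond, float *out) {
    float local[N];
    float acc = 0.0f;
    for (size_t it = 0; it < ITERS; ++it) {
        for (size_t j = 0; j < N; ++j)
            local[j] = global[it * N + j];
        for (size_t j = 0; j < N; ++j)
            acc += local[j];          /* placeholder computation */
    }
    if (cond)
        return 0;
    *out = acc;
    return 1;
}
```

In variant B the optimizer can see every memory access, so nothing forces the loads to stay ahead of the branch; in variant A each call is a barrier it cannot reason across.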

On Wed, Sep 29, 2021 at 9:52 AM Chuang-Yu Cheng
<cycheng.buddhist at gmail.com> wrote:
>
> Hi,
>
> By default, InstCombinePass tries to sink an instruction into a
> successor basic block when possible (so that the instruction isn’t
> executed on a path where its result isn’t needed). But doing that can
> also increase a value’s live range. For example:
>
> entry:
>   ..
>   %6 = load float, ..
>   %s.0 = load float, ..
>   %mul22 = fmul float %6, %s.0
>   %add23 = fadd float %mul22, zeroinitializer
>
>   %7 = load float, ..
>   %s.1 = load float, ..
>   %mul26 = fmul float %7, %s.1
>   %add27 = fadd float %add23, %mul26
>
>   ..
>   br i1 %cmp, label %cleanup, label %if.end1
>
> if.end1:
>   %15 = load float, ..
>   %add67 = fadd float %add27, %15
>   store float %add67, ..
>   br label %cleanup
>
> cleanup:
>   ret void
>
>
> In the original input, only %add27 has a long live range (it is live
> across the branch), but after InstCombine with
> instcombine-code-sinking=true (the default), %6, %s.0, %7, and %s.1 all
> have long live ranges instead.
>
> entry:
>   ..
>   %6 = load float, ..
>   %s.0 = load float, ..
>
>   %7 = load float, ..
>   %s.1 = load float, ..
>
>   ..
>   br i1 %cmp, label %cleanup, label %if.end1
>
> if.end1:
>   %mul22 = fmul float %6, %s.0
>   %add23 = fadd float %mul22, zeroinitializer
>
>   %mul26 = fmul float %7, %s.1
>   %add27 = fadd float %add23, %mul26
>
>   %15 = load float, ..
>   %add67 = fadd float %add27, %15
>   store float %add67, ..
>   br label %cleanup
>
> cleanup:
>   ret void
>
> As a result, our custom register allocator keeps values such as %6,
> %s.0, %7, and %s.1 in registers for a long time.
>
> My questions are:
>
> Does LLVM expect the backend's instruction scheduler and register
> allocator to handle this properly?
>
> Can this be solved by LLVM’s GlobalISel?
>
> Thank you!
> CY
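The live-range claim in the quoted message can be checked with a deliberately simplified model. The sketch below is an assumption-laden toy, not how LLVM computes liveness: each named value from the two IR listings is recorded with the block that defines it and the single block that uses it, and a value counts as live across the branch when it is defined in `entry` but used in `if.end1`. Values hidden behind `..` in the listings are left out, and `ValueInfo`, `values_before`, `values_after`, and `live_across_branch` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Toy model: one defining block and one using block per value. */
typedef struct {
    const char *name;
    const char *def_block;
    const char *use_block;
} ValueInfo;

/* Named values from the first IR listing (before sinking). */
static const ValueInfo values_before[] = {
    {"%6",     "entry", "entry"},   /* used by %mul22 */
    {"%s.0",   "entry", "entry"},
    {"%mul22", "entry", "entry"},
    {"%add23", "entry", "entry"},
    {"%7",     "entry", "entry"},
    {"%s.1",   "entry", "entry"},
    {"%mul26", "entry", "entry"},
    {"%add27", "entry", "if.end1"}, /* used by %add67 */
};

/* Named values from the second IR listing (after sinking). */
static const ValueInfo values_after[] = {
    {"%6",     "entry",   "if.end1"},
    {"%s.0",   "entry",   "if.end1"},
    {"%7",     "entry",   "if.end1"},
    {"%s.1",   "entry",   "if.end1"},
    {"%mul22", "if.end1", "if.end1"},
    {"%add23", "if.end1", "if.end1"},
    {"%mul26", "if.end1", "if.end1"},
    {"%add27", "if.end1", "if.end1"},
};

/* Count values defined in entry whose use sits in if.end1, i.e. values
   that must stay live across the conditional branch. */
static size_t live_across_branch(const ValueInfo *vals, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (strcmp(vals[i].def_block, "entry") == 0 &&
            strcmp(vals[i].use_block, "if.end1") == 0)
            ++count;
    return count;
}
```

With the listings as written, the count goes from one value (`%add27`) before sinking to four (`%6`, `%s.0`, `%7`, `%s.1`) after, which matches the observation in the quoted message.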