[all-commits] [llvm/llvm-project] ccfabf: Fix subrange liveness checking at rematerialization

Tue Aug 16 10:50:28 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: ccfabfbb1f91ffb936f7d0ce877a24c1c1ebaea8
      https://github.com/llvm/llvm-project/commit/ccfabfbb1f91ffb936f7d0ce877a24c1c1ebaea8
  Author: Nicolas Miller <nicolas.miller at codeplay.com>
  Date:   2022-08-16 (Tue, 16 Aug 2022)

  Changed paths:
    M llvm/lib/CodeGen/LiveRangeEdit.cpp
    M llvm/test/CodeGen/AMDGPU/remat-dead-subreg.mir

  Log Message:
  -----------
  Fix subrange liveness checking at rematerialization

This patch fixes an issue where an instruction reading a whole register could be moved during register allocation to a point where one of that register's subregisters was dead.

The code that checks whether an instruction can be rematerialized at a given point was already checking subranges to ensure that the subregisters are live, but only when the instruction being moved used a subregister. This patch changes that so the subranges are checked even when the moved instruction uses the full register.
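The difference between the old and the new check can be sketched with a simplified liveness model, loosely inspired by the logic in `LiveRangeEdit.cpp`. Everything here is hypothetical and illustrative, not LLVM's API: `Interval`, `SubRange`, `Use`, `availableAtOld` and `availableAtNew` are made-up names, positions are plain integers, and the lane masks `0x3`/`0xC` are chosen to mirror the two halves of a 64-bit register.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified liveness model; not LLVM's LiveInterval API.
// Positions are plain integers and every segment is half-open [start, end).
struct Segment {
  int start, end;
};

struct SubRange {
  uint64_t laneMask;               // lanes of the register this subrange tracks
  std::vector<Segment> segments;
};

struct Interval {
  std::vector<Segment> segments;   // main range: live if any lane is live
  std::vector<SubRange> subRanges;
};

// An operand of the instruction being rematerialized.
// subRegMask == 0 means the operand reads the whole register.
struct Use {
  uint64_t subRegMask;
};

static bool liveAt(const std::vector<Segment>& segs, int pos) {
  for (const Segment& s : segs)
    if (s.start <= pos && pos < s.end) return true;
  return false;
}

// Every subrange that overlaps the lanes being read must be live at pos.
static bool subRangesLiveAt(const Interval& li, uint64_t lanes, int pos) {
  for (const SubRange& sr : li.subRanges)
    if ((sr.laneMask & lanes) != 0 && !liveAt(sr.segments, pos)) return false;
  return true;
}

// Old behaviour (sketch): subranges were consulted only for subregister reads.
bool availableAtOld(const Interval& li, const Use& use, int pos) {
  if (!liveAt(li.segments, pos)) return false;
  if (use.subRegMask == 0) return true;  // whole-register read: subranges skipped
  return subRangesLiveAt(li, use.subRegMask, pos);
}

// New behaviour (sketch): a whole-register read is treated as reading every lane.
bool availableAtNew(const Interval& li, const Use& use, uint64_t fullMask, int pos) {
  if (!liveAt(li.segments, pos)) return false;
  uint64_t lanes = use.subRegMask != 0 ? use.subRegMask : fullMask;
  return subRangesLiveAt(li, lanes, pos);
}
```

With a main range live over `[0,100)`, a `0x3` subrange that ends at 40 and a `0xC` subrange covering `[0,100)`, a whole-register read at position 60 passes the old check (the main range is live) but fails the new one (the low half is dead), which is the shape of the bug described above.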

This patch also adds a case that triggers the issue described above to the original test for the subrange checking.

The original subrange checking code was introduced in this revision: https://reviews.llvm.org/D115278

And I've encountered this issue on AMDGPUs while working with DPC++: https://github.com/intel/llvm/issues/6209

Essentially the greedy register allocator attempts to move the following instruction:

```
%3961:vreg_64 = V_LSHLREV_B64_e64 3, %3078:vreg_64, implicit $exec
```

from `@3440` into the body of a loop at `@16312`, but `%3078` has the following live ranges:

```
%3078 [2224r,2240r:0)[2240r,3488B:1)[16192B,38336B:1) 0@2224r 1@2240r  L0000000000000003 [2224r,3440r:0) 0@2224r  L000000000000000C [2240r,3488B:0)[16192B,38336B:0) 0@2240r
```

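So at `@16312e`, `%3078.sub1` is live but `%3078.sub0` is dead: the subrange with lane mask `L0000000000000003` (`sub0`) ends at `3440r`, while the one with `L000000000000000C` (`sub1`) is live across `[16192B,38336B)`, which contains `16312e`. Moving the instruction there leads to invalid memory accesses, because `%3078.sub0` ends up being trashed and the result of this instruction is used as part of an address calculation for a load.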
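These liveness facts can be checked mechanically against the dump. The sketch below is a hypothetical model, not LLVM's `SlotIndex` or `LiveInterval` classes: `Idx`, `Slot`, `Segment` and `wholeRegLiveAt` are illustrative names, segments are half-open, and slots at the same instruction number are assumed to be ordered `B < e < r < d` (block, early-clobber, register, dead). The query point uses the `e` slot, as in the `@16312e` above.

```cpp
#include <tuple>
#include <vector>

// Hypothetical model of a slot index as printed in live-interval dumps:
// an instruction number plus a slot letter, assumed ordered B < e < r < d
// at the same number. This is not LLVM's SlotIndex class.
enum class Slot { B, e, r, d };

struct Idx {
  int num;
  Slot slot;
  bool operator<(const Idx& o) const {
    return std::tie(num, slot) < std::tie(o.num, o.slot);
  }
};

// Half-open segment [start, end), as in "[2224r,3440r:0)".
struct Segment {
  Idx start, end;
};

static bool liveAt(const std::vector<Segment>& segs, Idx pos) {
  for (const Segment& s : segs)
    if (!(pos < s.start) && pos < s.end) return true;
  return false;
}

// %3078 from the dump in the commit message (value numbers omitted).
const std::vector<Segment> mainRange = {
    {{2224, Slot::r}, {2240, Slot::r}},
    {{2240, Slot::r}, {3488, Slot::B}},
    {{16192, Slot::B}, {38336, Slot::B}}};
const std::vector<Segment> sub0Range = {  // L0000000000000003 (sub0)
    {{2224, Slot::r}, {3440, Slot::r}}};
const std::vector<Segment> sub1Range = {  // L000000000000000C (sub1)
    {{2240, Slot::r}, {3488, Slot::B}},
    {{16192, Slot::B}, {38336, Slot::B}}};

// A whole-register read of %3078 needs the main range and both subranges live.
bool wholeRegLiveAt(Idx pos) {
  return liveAt(mainRange, pos) && liveAt(sub0Range, pos) &&
         liveAt(sub1Range, pos);
}
```

At `3440e` all three ranges are live, so the original position is fine. At `16312e` the main range and `sub1Range` are live but `sub0Range` is not: a check that only looks at the main range for a whole-register operand accepts the move, while consulting the subranges rejects it.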

On the original ticket this issue showed up on gfx906 and gfx90a but not on gfx908. This turned out to be because on gfx908, instead of the shift instruction being moved into the loop, its value is spilled into an ACC register. gfx906 doesn't have ACC registers, and on gfx90a ACC registers are used like regular vector registers, so they aren't used for spilling.

With this patch the original application from the DPC++ ticket works properly on gfx906, and the result of the shift instruction is correctly spilled instead of the instruction being moved into the loop.

Original Author: npmiller

Reviewed by: rampitec

Submitted by: rampitec

Differential Revision: https://reviews.llvm.org/D131884