[PATCH] D92405: [VirtRegRewriter] Insert missing killed flags when tracking subregister liveness

Tue Dec 1 10:18:26 PST 2020

bsaleil created this revision.
bsaleil added reviewers: MatzeB, qcolombet, tstellar, nemanjai, AMDGPU, PowerPC.
bsaleil added a project: LLVM.
Herald added subscribers: llvm-commits, kerbowa, hiraditya, tpr, nhaehnle, jvesely.
bsaleil requested review of this revision.

When using a register operand for the last time and if all its lanes are defined, virtregrewriter may fail to add the `killed` flag when tracking liveness of subregisters. On PPC, this may cause the generation of unnecessary code.
For example, for the following code:

  0B	bb.0 (%ir-block.0):
  	  liveins: $v2, $v3, $v4, $v5
  16B	  undef %14.sub_vsx1:vsrprc_with_sub_64_in_vfrc = COPY $v5
  32B	  %14.sub_vsx0:vsrprc_with_sub_64_in_vfrc = COPY $v4
  48B	  undef %8.sub_vsx1:vsrprc_with_sub_64_in_vfrc = COPY $v3
  64B	  %8.sub_vsx0:vsrprc_with_sub_64_in_vfrc = COPY $v2
  80B	  %4:g8rc_and_g8rc_nox0 = LD 0, %fixed-stack.0 :: (load 8 from %fixed-stack.0, align 16)
  160B	  %8:vsrprc_with_sub_64_in_vfrc = KILL_PAIR %8:vsrprc_with_sub_64_in_vfrc(tied-def 0)
  176B	  undef %15.sub_pair0:uaccrc = COPY %8:vsrprc_with_sub_64_in_vfrc
  256B	  %14:vsrprc_with_sub_64_in_vfrc = KILL_PAIR %14:vsrprc_with_sub_64_in_vfrc(tied-def 0)
  288B	  %15.sub_pair1:uaccrc = COPY %14:vsrprc_with_sub_64_in_vfrc
  304B	  %28:accrc = BUILD_UACC %15:uaccrc
  336B	  %28:accrc = XXMTACC %28:accrc(tied-def 0)
  340B	  SPILL_ACC %28:accrc, 0, %stack.0 :: (store 64 into %stack.0, align 16)
  352B	  ADJCALLSTACKDOWN 32, 0, implicit-def dead $r1, implicit $r1
  368B	  BL8_NOTOC @foo, <regmask $cr2 $cr3 $cr4 $f14 $f15 $f16 $f17 $f18 $f19 $f20 $f21 $f22 $f23 $f24 $f25 $f26 $f27 $f28 $f29 $f30 $f31 $r14 $r15 $r16 $r17 $r18 $r19 $r20 $r21 $r22 $r23 $r24 $r25 and 60 more...>, implicit-def dead $lr8, implicit $rm, implicit-def $r1
  384B	  ADJCALLSTACKUP 32, 0, implicit-def dead $r1, implicit $r1
  408B	  %25:accrc = RESTORE_ACC 0, %stack.0 :: (load 64 from %stack.0, align 16)
  416B	  %25:accrc = XXMFACC %25:accrc(tied-def 0)
  448B	  STXV %25.sub_vsx0:accrc, 48, %4:g8rc_and_g8rc_nox0 :: (store 16 into %ir.2 + 48)
  480B	  STXV %25.sub_vsx1:accrc, 32, %4:g8rc_and_g8rc_nox0 :: (store 16 into %ir.2 + 32, align 32)
  528B	  STXV %25.sub_pair1_then_sub_vsx0:accrc, 16, %4:g8rc_and_g8rc_nox0 :: (store 16 into %ir.2 + 16)
  536B	  undef %26.sub_pair1_then_sub_vsx1:accrc = COPY %25.sub_pair1_then_sub_vsx1:accrc
  560B	  STXVX %26.sub_pair1_then_sub_vsx1:accrc, $zero8, %4:g8rc_and_g8rc_nox0 :: (store 16 into %ir.2, align 64)
  576B	  BLR8 implicit $lr8, implicit $rm

`LiveIntervals::addKillFlags` should add a `killed` flag to the `%28` operand of the SPILL_ACC instruction (340B).
The live interval for `%28` is:

  %28 [304r,336r:0)[336r,340r:1)  0 at 304r 1 at 336r L0000000000000080 [304r,336r:0)[336r,340r:1)  0 at 304r 1 at 336r L0000000000000040 [304r,336r:0)[336r,340r:1)  0 at 304r 1 at 336r L0000000000000002 [304r,336r:0)[336r,340r:1)  0 at 304r 1 at 336r L0000000000000100 [304r,336r:0)[336r,340r:1)  0 at 304r 1 at 336r weight:INF
  RegMasks: 368r

We see that lanes `L0000000000000080`, `L0000000000000040`, `L0000000000000002` and `L0000000000000100` are all live until 340B.
The full lane mask of the accumulator registers (accrc) is `L00000000000001C2` meaning that all the lanes used for the SPILL are defined and the flag should be added.

This patch fixes two things:

1. The way the mask for defined lanes is computed. We currently compute the mask `L0000000000000000` instead of `L00000000000001C2`. To fix that, we go through all the segments before the instruction using the register operand to see which lanes are defined and killed at this instruction.

2. The way the mask for used lanes is computed. In case we use a register that is not a subregister, we currently use the conservative default mask `LFFFFFFFFFFFFFFFF` to represent used lanes. However, this mask is not necessarily accurate. As shown above, on PPC, the mask should be `L00000000000001C2` for accumulators. So we try to retrieve this mask from the register class instead of using the default mask.

I also need someone familiar with the AMDGPU backend to check if the test cases are still correct after these changes.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D92405

Files:
  llvm/lib/CodeGen/LiveIntervals.cpp
  llvm/test/CodeGen/AMDGPU/load-global-i16.ll
  llvm/test/CodeGen/AMDGPU/regbank-reassign-wave64.mir
  llvm/test/CodeGen/AMDGPU/regbank-reassign.mir
  llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll
  llvm/test/CodeGen/AMDGPU/splitkit-nolivesubranges.mir
  llvm/test/CodeGen/AMDGPU/splitkit.mir
  llvm/test/CodeGen/AMDGPU/subreg_interference.mir
  llvm/test/CodeGen/AMDGPU/virtregrewrite-undef-identity-copy.mir
  llvm/test/CodeGen/PowerPC/subreg-killed.mir

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D92405.308694.patch
Type: text/x-patch
Size: 20118 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20201201/50818c1c/attachment.bin>