[PATCH] D129735: [RISCV] Add new pass to transform undef to pseudo for vector values.

Piyou Chen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 21 02:44:14 PST 2022


BeMg added a comment.

Handle Sub-register undef+early-clobber
=======================================

Sub-registers have the same issue: the register allocator can generate code that violates the early-clobber constraint. This happens when part of a register used by an instruction with the early-clobber flag is undef. For example:

  early-clobber %12:vr = PseudoVRGATHEREI16_VV_M1_M2 %10:vr, %1:vrm2, 4, 3, implicit $vl, implicit $vtype

->

  vrgatherei16.vv v13, v9, v12

%1 is allocated as a VRM2 group starting at v12, so it occupies v12 and v13. The register allocator nevertheless assigns v13 to %12:vr, because the v13 half of %1:vrm2 is undef at the register allocation stage, so the allocator sees no conflict. This is how an undef sub-register breaks the early-clobber constraint during register allocation.
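
The violated constraint can be illustrated with a small overlap check. This is only a sketch: `RegGroup` and `overlaps` are hypothetical names that model a vector register group as a base register plus a register count. They are not part of the patch or of LLVM.

```cpp
// Hypothetical model of a vector register group: a base register number
// plus the number of consecutive registers it covers.
struct RegGroup {
  unsigned Base;   // e.g. 12 for v12
  unsigned Count;  // e.g. 2 for a VRM2 group (v12 and v13)
};

// Returns true if the two groups share at least one physical register.
// An early-clobber destination must not overlap any source group.
bool overlaps(RegGroup A, RegGroup B) {
  return A.Base < B.Base + B.Count && B.Base < A.Base + A.Count;
}
```

With the allocation above, the destination v13 (`{13, 1}`) overlaps the source group v12~v13 (`{12, 2}`), which is exactly what early-clobber forbids.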

Here we propose a fix. The concept is the same as for a fully undef register: we define the sub-register with a pseudo instruction and remove it in a later pass (after RA).

There are three steps for this approach:

1. Select the def-use chain from implicit_def to the first user with an early-clobber constraint
2. Compute the undefined sub-register index from information collected from INSERT_SUBREG and PHI nodes
3. Insert a PseudoInit and an INSERT_SUBREG for the undefined sub-register after the last INSERT_SUBREG that updates the sub-register

F25402509: 螢幕擷取畫面_20221121_063112.png <https://reviews.llvm.org/F25402509>

Here is an example of a pattern that triggers the undef+early-clobber issue.

Step 1
------

There are three def-use chains we need to care about in this program.

F25402516: 螢幕擷取畫面_20221121_064019.png <https://reviews.llvm.org/F25402516>

The pattern looks like this:

  v0 = Implicit_def
  …
  INSERT_SUBREG | COPY | PHI
  …
  early-clobber rd = Op vN
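
A rough sketch of how such a chain could be found, under simplifying assumptions: the program is a straight-line list of instructions, and `Opcode`, `Inst`, `forwardsValue` and `findChain` are hypothetical stand-ins rather than the patch's data structures or LLVM's MachineInstr API.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy IR, not LLVM's MachineInstr.
enum class Opcode { ImplicitDef, InsertSubreg, Copy, Phi, Other };

struct Inst {
  Opcode Op;
  int Def;                // virtual register defined by this instruction
  std::vector<int> Uses;  // virtual registers read by this instruction
  bool EarlyClobber;      // true if the definition is early-clobber
};

// INSERT_SUBREG, COPY and PHI pass the (partially) undef value along.
static bool forwardsValue(Opcode Op) {
  return Op == Opcode::InsertSubreg || Op == Opcode::Copy ||
         Op == Opcode::Phi;
}

// Starting at an implicit def, follow the value through forwarding
// instructions until the first early-clobber user. Returns the indices
// on the chain, or an empty vector if no such user exists.
std::vector<std::size_t> findChain(const std::vector<Inst> &Prog,
                                   std::size_t Start) {
  std::vector<std::size_t> Chain;
  if (Start >= Prog.size() || Prog[Start].Op != Opcode::ImplicitDef)
    return Chain;
  std::set<int> Tainted{Prog[Start].Def};
  Chain.push_back(Start);
  for (std::size_t I = Start + 1; I < Prog.size(); ++I) {
    const Inst &MI = Prog[I];
    bool ReadsTainted = false;
    for (int R : MI.Uses)
      if (Tainted.count(R))
        ReadsTainted = true;
    if (!ReadsTainted)
      continue;
    if (MI.EarlyClobber) {
      Chain.push_back(I);
      return Chain;
    }
    if (forwardsValue(MI.Op)) {
      Tainted.insert(MI.Def);
      Chain.push_back(I);
    }
  }
  return {};  // no early-clobber user reached
}
```

A real pass would walk def-use information across basic blocks instead of scanning a flat list; the flat scan only keeps the sketch short.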



Step 2
------

The third operand of an INSERT_SUBREG node is the sub-register index. It tells us which sub-register of the whole register this node defines. We can use this information to work out which sub-registers are still undefined.

We use the LaneBitmask for this purpose.

  LaneBitmask == 0xC for whole VRM2 register
  LaneBitmask == 0x4 for %subreg.sub_vrm1_0 
  LaneBitmask == 0x8 for %subreg.sub_vrm1_1

Suppose Step 1 gives the following def-use chain:

  %4:vrm2 = Implicit_def
  %0:vrm2 = INSERT_SUBREG %4, %subreg.sub_vrm1_0
  early-clobber %11:vr = Op %0

0xC is the VRM2 LaneBitmask, and 0x4 is already defined by the INSERT_SUBREG in the program.

0xC & ~0x4 = 0x8 -> subreg.sub_vrm1_1

In this case, subreg.sub_vrm1_1 is the sub-register that is still undefined when the early-clobber instruction uses it.
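
The mask arithmetic above can be written out as a small sketch. It uses plain integers with the mask values quoted in this comment rather than LLVM's LaneBitmask class, and `undefinedLanes` and `undefinedSubRegs` are hypothetical helper names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Mask values as quoted in this comment for a VRM2 register.
constexpr std::uint64_t VRM2Lanes = 0xC;
constexpr std::uint64_t SubVRM1_0 = 0x4;
constexpr std::uint64_t SubVRM1_1 = 0x8;

// Lanes of the whole register that no INSERT_SUBREG on the chain defined.
std::uint64_t undefinedLanes(std::uint64_t Whole,
                             const std::vector<std::uint64_t> &Inserted) {
  std::uint64_t Defined = 0;
  for (std::uint64_t M : Inserted)
    Defined |= M;
  return Whole & ~Defined;
}

// Map an undefined-lane mask back to sub-register index names.
std::vector<std::string> undefinedSubRegs(std::uint64_t Undef) {
  std::vector<std::string> Names;
  if (Undef & SubVRM1_0)
    Names.push_back("sub_vrm1_0");
  if (Undef & SubVRM1_1)
    Names.push_back("sub_vrm1_1");
  return Names;
}
```

For the chain above, `undefinedLanes(VRM2Lanes, {SubVRM1_0})` yields 0x8, which maps back to sub_vrm1_1.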

Step 3
------

We can define the missing sub-register with an INSERT_SUBREG placed between the last INSERT_SUBREG and the early-clobber user. The goal is to make sure every sub-register is defined before the early-clobber instruction uses it.

  %4:vrm2 = Implicit_def
  %0:vrm2 = INSERT_SUBREG %4, %subreg.sub_vrm1_0
  early-clobber %11:vr = Op %0

->

  %4:vrm2 = Implicit_def
  %0:vrm2 = INSERT_SUBREG %4, %subreg.sub_vrm1_0
  %21:vr = PseudoRVVInitUndefM1
  %22:vrm2 = INSERT_SUBREG %0:vrm2, %21:vr, %subreg.sub_vrm1_1
  early-clobber %11:vr = Op %22
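
The rewrite can be sketched on a toy instruction list. Everything here is an assumption made for illustration: `Inst` and `defineUndefSubReg` are hypothetical, the new instructions are placed directly before the early-clobber user (which coincides with "after the last INSERT_SUBREG" in this example), and only the M1 pseudo is modelled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction, not LLVM's MachineInstr.
struct Inst {
  std::string Opcode;
  int Def;                // virtual register defined
  std::vector<int> Uses;  // virtual registers read
  std::string SubIdx;     // sub-register index, if any
  bool EarlyClobber;
};

// Define the undefined sub-register of PartialReg with a pseudo init,
// merge it in with INSERT_SUBREG, and make the early-clobber user at
// UserIdx read the fully defined register instead.
void defineUndefSubReg(std::vector<Inst> &Prog, std::size_t UserIdx,
                       int PartialReg, const std::string &UndefSubIdx,
                       int &NextVReg) {
  int InitReg = NextVReg++;
  int FullReg = NextVReg++;
  Inst Init{"PseudoRVVInitUndefM1", InitReg, {}, "", false};
  Inst Insert{"INSERT_SUBREG", FullReg, {PartialReg, InitReg},
              UndefSubIdx, false};
  // Rewrite the user first; its index shifts once we insert.
  for (int &R : Prog[UserIdx].Uses)
    if (R == PartialReg)
      R = FullReg;
  Prog.insert(Prog.begin() + static_cast<std::ptrdiff_t>(UserIdx),
              {Init, Insert});
}
```

Called with `UserIdx` pointing at the `Op` instruction, `PartialReg = 0`, `UndefSubIdx = "sub_vrm1_1"` and `NextVReg = 21`, this reproduces the shape of the transformed program above: %21 is the pseudo init, %22 is the merged register, and `Op` reads %22.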



PHI in def-use chain
--------------------

In Step 2, a PHI is treated as another instruction that changes which sub-registers are defined. The PHINodeLaneBitRecord records the LaneBitmask from both predecessors, and the INSERT_SUBREG is inserted using this information.

F25402524: 螢幕擷取畫面_20221118_040618.png <https://reviews.llvm.org/F25402524>

F25402526: 螢幕擷取畫面_20221121_063340.png <https://reviews.llvm.org/F25402526>


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129735/new/

https://reviews.llvm.org/D129735


