[PATCH] D129735: [RISCV] Add new pass to transform undef to pseudo for vector values.
Piyou Chen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 21 02:44:14 PST 2022
BeMg added a comment.
Handle Sub-register undef+early-clobber
=======================================
Sub-registers have the same issue: the register allocator can also produce code that breaks the early-clobber constraint. This happens when part of a register operand read by an early-clobber instruction is undef. For example:
early-clobber %12:vr = PseudoVRGATHEREI16_VV_M1_M2 %10:vr, %1:vrm2, 4, 3, implicit $vl, implicit $vtype
->
vrgatherei16.vv v13, v9, v12
%1:vrm2 is assigned to v12; as a VRM2 register group it occupies both v12 and v13. The register allocator nevertheless assigns v13 to %12:vr, because v13 holds the undef upper half of %1 and is therefore not considered live during register allocation. The destination then overlaps the source group, which is how an undef sub-register breaks the early-clobber constraint at the register allocation stage.
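To make the failure mode concrete, here is a toy C++ model, not LLVM code (every name in it is hypothetical), that treats physical vector registers as bits of a 32-bit set. It assumes the allocator only checks the live parts of the source group, while the early-clobber constraint applies to the whole group:

```cpp
#include <cassert>
#include <cstdint>

// Toy model, not LLVM code: bit i of a RegSet stands for physical register vi.
using RegSet = std::uint32_t;

// Registers of a group of `lmul` registers starting at `base` whose parts are
// live; bit i of `liveParts` is set if the i-th register of the group is live.
RegSet liveRegsOfGroup(unsigned base, unsigned lmul, unsigned liveParts) {
  assert(lmul != 0 && base % lmul == 0 && base + lmul <= 32 && "bad register group");
  RegSet regs = 0;
  for (unsigned i = 0; i < lmul; ++i)
    if (liveParts & (1u << i))
      regs |= RegSet{1} << (base + i);
  return regs;
}

// Every register the group occupies, live or not.
RegSet wholeGroup(unsigned base, unsigned lmul) {
  return liveRegsOfGroup(base, lmul, (1u << lmul) - 1);
}

// Assumed allocator view: `dest` looks free if it holds no live part of the source.
bool allocatorThinksFree(unsigned dest, RegSet liveSrc) {
  return (liveSrc & (RegSet{1} << dest)) == 0;
}

// Early-clobber view: `dest` must not overlap any register of the source group.
bool violatesEarlyClobber(unsigned dest, RegSet wholeSrc) {
  return (wholeSrc & (RegSet{1} << dest)) != 0;
}
```

With only the lower half of %1 live (liveParts = 0b01), allocatorThinksFree(13, liveRegsOfGroup(12, 2, 0b01)) is true while violatesEarlyClobber(13, wholeGroup(12, 2)) is also true, which is the conflict described above. Once the upper half is defined as well (liveParts = 0b11), v13 no longer looks free in this model.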
We propose the following fix, which follows the same idea as for a fully undef register: define the undef sub-register with a pseudo instruction, and remove that pseudo in a later pass (after RA).
The approach has three steps:
1. Find the def-use chain from an IMPLICIT_DEF to the first user with an early-clobber constraint.
2. Compute the undef sub-register indices from information collected from INSERT_SUBREG and PHI nodes.
3. For each undefined sub-register, insert a PseudoInit and an INSERT_SUBREG after the last INSERT_SUBREG in the chain.
F25402509: Screenshot_20221121_063112.png <https://reviews.llvm.org/F25402509>
This example program contains the pattern that triggers the undef+early-clobber issue.
Step 1
------
This program contains three def-use chains that we need to handle.
F25402516: Screenshot_20221121_064019.png <https://reviews.llvm.org/F25402516>
Each chain has the following shape:
v0 = Implicit_def
…
INSERT_SUBREG | COPY | PHI
…
early-clobber rd = Op vN
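A minimal sketch of the Step 1 walk, assuming a straight-line toy instruction list. The Inst and Kind types and findChain are hypothetical; a real pass works on MachineInstrs across basic blocks, which this sketch ignores:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy instruction kinds (hypothetical, not LLVM's MachineInstr API).
enum class Kind { ImplicitDef, InsertSubreg, Copy, Phi, EarlyClobberUse, Other };

struct Inst {
  Kind kind;
  int def;                // virtual register defined by the instruction
  std::vector<int> uses;  // virtual registers it reads
};

// Follow the chain that starts at the IMPLICIT_DEF at index `start`. Every
// INSERT_SUBREG, COPY or PHI that reads the current value becomes the next
// link; the walk stops at the first early-clobber user of the current value.
// Returns the instruction indices of the chain, or std::nullopt if no
// early-clobber user is reached.
std::optional<std::vector<std::size_t>> findChain(const std::vector<Inst>& prog,
                                                   std::size_t start) {
  assert(start < prog.size() && prog[start].kind == Kind::ImplicitDef);
  std::vector<std::size_t> chain{start};
  int cur = prog[start].def;
  for (std::size_t i = start + 1; i < prog.size(); ++i) {
    const Inst& mi = prog[i];
    if (std::find(mi.uses.begin(), mi.uses.end(), cur) == mi.uses.end())
      continue;  // does not read the value we are tracking
    if (mi.kind == Kind::EarlyClobberUse) {
      chain.push_back(i);
      return chain;
    }
    if (mi.kind == Kind::InsertSubreg || mi.kind == Kind::Copy || mi.kind == Kind::Phi) {
      chain.push_back(i);
      cur = mi.def;  // continue from the newly defined value
    }
  }
  return std::nullopt;
}
```

For a chain shaped like the Step 2 example (%4 = IMPLICIT_DEF, %0 = INSERT_SUBREG of %4, early-clobber %11 = Op %0), findChain returns the indices {0, 1, 2}.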
Step 2
------
The third operand of an INSERT_SUBREG node is a sub-register index, which tells us which sub-register of the full register the node defines. From this information we can work out which sub-registers are still undefined.
We use LaneBitmask for this purpose:
LaneBitmask == 0xC for the whole VRM2 register
LaneBitmask == 0x4 for %subreg.sub_vrm1_0
LaneBitmask == 0x8 for %subreg.sub_vrm1_1
Suppose Step 1 finds the following def-use chain:
%4:vrm2 = Implicit_def
%0:vrm2 = INSERT_SUBREG %4, %subreg.sub_vrm1_0
early-clobber %11:vr = Op %0
0xC is the LaneBitmask of VRM2, and the lanes in 0x4 are already defined by the INSERT_SUBREG, so the undefined lanes are:
0xC & ~0x4 = 0x8 -> %subreg.sub_vrm1_1
In this case, sub_vrm1_1 is the sub-register that is still undefined when the early-clobber instruction reads %0.
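The lane arithmetic in this step can be sketched with plain integers standing in for llvm::LaneBitmask. The mask values are the ones listed above; the helper names are hypothetical, and the decoding only handles the two VRM2 halves:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Plain integers stand in for llvm::LaneBitmask (values from the text above).
using Lanes = std::uint64_t;

struct SubRegIdx {
  std::string name;
  Lanes mask;
};

constexpr Lanes kVRM2Mask = 0xC;
const std::vector<SubRegIdx> kVRM2SubRegs = {
    {"sub_vrm1_0", 0x4},
    {"sub_vrm1_1", 0x8},
};

// Lanes of the full register that no INSERT_SUBREG in the chain has written.
Lanes undefLanes(Lanes fullMask, const std::vector<Lanes>& definedByInserts) {
  Lanes defined = 0;
  for (Lanes m : definedByInserts) defined |= m;
  assert((defined & ~fullMask) == 0 && "INSERT_SUBREG outside the register");
  return fullMask & ~defined;
}

// Map undefined lanes back to the VRM2 sub-register indices they cover.
std::vector<std::string> undefSubRegs(Lanes undef) {
  std::vector<std::string> out;
  for (const SubRegIdx& s : kVRM2SubRegs)
    if ((undef & s.mask) == s.mask) out.push_back(s.name);
  return out;
}
```

undefLanes(kVRM2Mask, {0x4}) yields 0x8, and undefSubRegs(0x8) yields {"sub_vrm1_1"}, matching the example. Larger groups (VRM4, VRM8) would need the same decoding over their own sub-register indices.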
Step 3
------
Our goal is to make sure that all sub-registers are defined before the early-clobber instruction uses them. To do so, we define each undefined sub-register with an extra INSERT_SUBREG placed between the last INSERT_SUBREG and the early-clobber user:
%4:vrm2 = Implicit_def
%0:vrm2 = INSERT_SUBREG %4, %subreg.sub_vrm1_0
early-clobber %11:vr = Op %0
->
%4:vrm2 = Implicit_def
%0:vrm2 = INSERT_SUBREG %4, %subreg.sub_vrm1_0
%21:vr = PseudoRVVInitUndefM1
%22:vrm2 = INSERT_SUBREG %0:vrm2, %21:vr, %subreg.sub_vrm1_1
early-clobber %11:vr = Op %22
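A hypothetical emitter that produces the extra Step 3 instructions as MIR-like text. It assumes a VRM2 register whose holes are VR-sized (hence the hardcoded PseudoRVVInitUndefM1 and :vrm2), and it chains each new INSERT_SUBREG onto the previous value:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of the hypothetical Step 3 rewrite: the new instructions (as
// MIR-like text) and the register the early-clobber user should now read.
struct Rewrite {
  std::vector<std::string> newInsts;
  int finalReg;
};

// For each undefined sub-register, emit an init pseudo and an INSERT_SUBREG
// that chains onto the previous value. Assumes a VRM2 register whose holes
// are VR-sized, hence the hardcoded PseudoRVVInitUndefM1 and :vrm2.
Rewrite defineUndefSubRegs(int lastReg, const std::vector<std::string>& undefSubRegs,
                           int& nextVReg) {
  Rewrite r{{}, lastReg};
  for (const std::string& idx : undefSubRegs) {
    assert(!idx.empty() && "missing sub-register index");
    int init = nextVReg++;
    int merged = nextVReg++;
    r.newInsts.push_back("%" + std::to_string(init) + ":vr = PseudoRVVInitUndefM1");
    r.newInsts.push_back("%" + std::to_string(merged) + ":vrm2 = INSERT_SUBREG %" +
                         std::to_string(r.finalReg) + ":vrm2, %" +
                         std::to_string(init) + ":vr, %subreg." + idx);
    r.finalReg = merged;
  }
  return r;
}
```

Called with lastReg = 0, undef sub-registers {"sub_vrm1_1"} and nextVReg = 21, it emits "%21:vr = PseudoRVVInitUndefM1" followed by an INSERT_SUBREG of %21 into %0 at sub_vrm1_1, and returns finalReg = 22; the early-clobber user is then rewritten to read %22.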
PHI in def-use chain
--------------------
In Step 2, a PHI is treated as another instruction that changes which sub-registers are defined. PHINodeLaneBitRecord records the LaneBitmask coming from both predecessors, and the INSERT_SUBREG is inserted based on this information.
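One way to keep such a record is sketched below. PhiLaneRecord is a hypothetical stand-in, not the patch's PHINodeLaneBitRecord, and the merge rule in definedOnAllPaths (a lane counts as defined only if every predecessor defines it) is our assumption of a conservative choice:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Lanes = std::uint64_t;

// Hypothetical per-PHI record (not the patch's PHINodeLaneBitRecord): the
// lanes known to be defined on each incoming edge, keyed by predecessor block.
struct PhiLaneRecord {
  std::map<int, Lanes> definedByPred;

  void record(int predBlock, Lanes defined) { definedByPred[predBlock] |= defined; }

  // Lanes that still need an init pseudo on the edge from `predBlock`.
  Lanes undefOnEdge(int predBlock, Lanes fullMask) const {
    auto it = definedByPred.find(predBlock);
    Lanes defined = (it == definedByPred.end()) ? 0 : it->second;
    return fullMask & ~defined;
  }

  // Assumed conservative merge: a lane counts as defined after the PHI only
  // if every recorded predecessor defines it.
  Lanes definedOnAllPaths(Lanes fullMask) const {
    assert(!definedByPred.empty() && "PHI with no recorded predecessors");
    Lanes all = fullMask;
    for (const auto& entry : definedByPred) all &= entry.second;
    return all;
  }
};
```

If predecessor 1 defines only sub_vrm1_0 (0x4) and predecessor 2 defines the whole VRM2 (0xC), undefOnEdge reports 0x8 for the edge from block 1 and nothing for block 2, so in this model an init pseudo is needed only on the first edge.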
F25402524: Screenshot_20221118_040618.png <https://reviews.llvm.org/F25402524>
F25402526: Screenshot_20221121_063340.png <https://reviews.llvm.org/F25402526>
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D129735