[PATCH] D136918: [AMDGPU] Scheduler: fix RP calculation for a MBB with one successor
Valery Pykhtin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Oct 29 02:51:18 PDT 2022
vpykhtin added a comment.
There are some difficulties with LIS, however. A reduced example:
0B bb.0.entry: successors: %bb.4, %bb.1;
...
320B S_CBRANCH_VCCNZ %bb.4, implicit killed $vcc
336B bb.1: ; predecessors: %bb.0, successors: %bb.2;
352B ...
368B undef %47.sub2:vreg_96 = IMPLICIT_DEF
432B bb.2.Flow: ; predecessors: %bb.4, %bb.1, successors: %bb.3, %bb.5;
512B ...
...
624B S_BRANCH %bb.3
640B bb.3.if: ; predecessors: %bb.2, successors: %bb.5;
656B ...
672B ...
720B %47:vreg_96 = IMAGE_SAMPLE_V3_V2 %48:vreg_64, %0:sgpr_256, %1:sgpr_128, 7, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from custom "ImageResource")
832B S_BRANCH %bb.5
848B bb.4.else: ; predecessors: %bb.0, successors: %bb.2;
864B ...
...
928B %47:vreg_96 = IMAGE_SAMPLE_V3_V2 %40:vreg_64, %0:sgpr_256, %1:sgpr_128, 7, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load (s128) from custom "ImageResource")
...
1072B S_BRANCH %bb.2
1088B bb.5.endif: ; predecessors: %bb.2, %bb.3
1136B ...
1168B EXP_DONE 0, %47.sub0:vreg_96, %47.sub1:vreg_96, %47.sub2:vreg_96, %49:vgpr_32, -1, 0, 15, implicit $exec
1184B S_ENDPGM 0
Live intervals for %47
%47 [368r,432B:3)[432B,640B:1)[720r,848B:0)[928r,1088B:4)[1088B,1168r:2) 0@720r 1@432B-phi 2@1088B-phi 3@368r 4@928r
L0000000000000030 [368r,432B:3)[432B,640B:1)[720r,848B:0)[928r,1088B:4)[1088B,1168r:2) 0@720r 1@432B-phi 2@1088B-phi 3@368r 4@928r
L000000000000000C [432B,640B:1)[720r,848B:0)[928r,1088B:4)[1088B,1168r:2) 0@720r 1@432B-phi 2@1088B-phi 3@x 4@928r
L0000000000000003 [432B,640B:1)[720r,848B:0)[928r,1088B:4)[1088B,1168r:2) 0@720r 1@432B-phi 2@1088B-phi 3@x 4@928r
bb.2.Flow has two predecessors bb.4 and bb.1:
- At the end of bb.1, %47 has live lane mask 0x30 (subrange L0000000000000030 [368r,432B:3)) because of undef %47.sub2:vreg_96 = IMPLICIT_DEF.
- At the beginning of bb.2.Flow, %47 has lane mask 0x3F because of 928B %47:vreg_96 = IMAGE_SAMPLE_V3_V2 ... in bb.4.else.
In this example we track bb.1 first and then continue into bb.2.Flow with the live-outs left over from bb.1. However, the live-out mask for %47 at the end of bb.1 is 0x30, whereas the live-in mask of bb.2.Flow should be 0x3F.
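The mismatch can be illustrated with a small standalone sketch. It is not LLVM code: LaneMask is a hypothetical plain-integer stand-in for llvm::LaneBitmask, the helper names are made up for illustration, and treating the live-in at a join as the union of the predecessors' live-outs is a simplifying assumption. The mask values are the ones from the dump above.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::LaneBitmask; a plain integer is enough here.
using LaneMask = std::uint64_t;

// Lane masks of %47 taken from the subrange dump above.
constexpr LaneMask LiveOutBB1 = 0x30; // only the IMPLICIT_DEF of sub2 reaches the end of bb.1
constexpr LaneMask LiveOutBB4 = 0x3F; // the full-register def at 928B reaches the end of bb.4

// Simplified model: the live-in of a join block covers every lane that is
// live-out of any of its predecessors.
constexpr LaneMask liveInAtJoin(LaneMask PredA, LaneMask PredB) {
  return PredA | PredB;
}

// Lanes that would go uncounted if `Carried` were reused as the live-in of a
// block whose real live-in mask is `LiveIn`.
constexpr LaneMask missingLanes(LaneMask Carried, LaneMask LiveIn) {
  return LiveIn & ~Carried;
}
```

Under these assumptions, liveInAtJoin(LiveOutBB1, LiveOutBB4) is 0x3F, and missingLanes(LiveOutBB1, 0x3F) is 0x0F, i.e. the lanes covered by the 0x0C and 0x03 subranges would be dropped when the live-outs of bb.1 are carried into bb.2.Flow.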
I'm not sure whether an undef definition of a subregister should be treated as a definition of the whole register, because LIS doesn't treat it that way. However, there is a different point of view:
llvm/lib/CodeGen/RegisterPressure.cpp: line 541

  void collectOperandLanes(const MachineOperand &MO) const {
    ...
      // Treat read-undef subreg defs as definitions of the whole register.
      if (MO.isUndef())
        SubRegIdx = 0;
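The two points of view can be put side by side in a standalone sketch. This is not the RegisterPressure.cpp implementation: Lanes, FullRegLanes, Sub2Lanes and definedLanes are hypothetical names, and the function only models the effect of the quoted comment versus what the LIS subranges show for the IMPLICIT_DEF at 368B.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::LaneBitmask.
using Lanes = std::uint64_t;

constexpr Lanes FullRegLanes = 0x3F; // all lanes of %47 (vreg_96), per the dump
constexpr Lanes Sub2Lanes = 0x30;    // lanes written by the undef sub2 def

// Lanes considered defined by a subregister def.
// WholeRegOnUndef = true models the quoted comment (a read-undef subreg def
// counts as a def of the whole register); false models the LIS subranges,
// where only the written subregister's lanes get a def.
constexpr Lanes definedLanes(Lanes SubRegLanes, bool IsUndef,
                             bool WholeRegOnUndef) {
  return (IsUndef && WholeRegOnUndef) ? FullRegLanes : SubRegLanes;
}
```

In this model, definedLanes(Sub2Lanes, true, true) yields 0x3F, while definedLanes(Sub2Lanes, true, false) yields 0x30, which is exactly the gap between the expected live-in of bb.2.Flow and the live-out LIS reports for bb.1.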
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D136918/new/
https://reviews.llvm.org/D136918