[PATCH] D68205: [ModuloSchedule] Peel out prologs and epilogs, generate actual code
James Molloy via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 1 05:14:28 PDT 2019
jmolloy added a comment.
Hi Thomas,
I made an example to show how this is handled. Take this loop body, where each instruction's post-instr-symbol records its stage and cycle:
%11:intregs = S2_addasl_rrri %7, %6, 1, post-instr-symbol <mcsymbol Stage-0_Cycle-0>
%12:intregs = L2_loadruh_io %11, -4, post-instr-symbol <mcsymbol Stage-1_Cycle-0> :: (load 2 from %ir.cgep2, !tbaa !0)
%5:intregs = S2_storerh_pi %6, -2, %12, post-instr-symbol <mcsymbol Stage-2_Cycle-0> :: (store 2 into %ir.lsr.iv, !tbaa !0)
ENDLOOP0 %bb.3, implicit-def $pc, implicit-def $lc0, implicit $sa0, implicit $lc0
We generate this code, annotated:
<... prolog, boring ...>
bb.3.b2 (address-taken): // Kernel.
successors: %bb.3(0x7c000000), %bb.10(0x04000000)
%15:intregs = PHI %25, %bb.6, %11, %bb.3
%17:intregs = PHI %26, %bb.6, %12, %bb.3
%11:intregs = S2_addasl_rrri %7, %6, 1, post-instr-symbol <mcsymbol Stage-0_Cycle-0>
%12:intregs = L2_loadruh_io %15, -4, post-instr-symbol <mcsymbol Stage-1_Cycle-0> :: (load 2 from %ir.cgep2, !tbaa !0)
dead %5:intregs = S2_storerh_pi %6, -2, %17, post-instr-symbol <mcsymbol Stage-2_Cycle-0> :: (store 2 into %ir.lsr.iv, !tbaa !0)
ENDLOOP0 %bb.3, implicit-def $pc, implicit-def $lc0, implicit $sa0, implicit $lc0
J2_jump %bb.10, implicit-def $pc
bb.10.b2: // Epilog 0, runs stage 2
successors: %bb.9(0x80000000)
%40:intregs = PHI %11, %bb.3, %25, %bb.6
%41:intregs = PHI %12, %bb.3, %26, %bb.6
dead %44:intregs = S2_storerh_pi %6, -2, %41, post-instr-symbol <mcsymbol Stage-2_Cycle-0> :: (store 2 into %ir.lsr.iv, !tbaa !0)
J2_jump %bb.9, implicit-def $pc
bb.9.b2: // Start of Epilog 1, runs stage 1
successors: %bb.8(0x80000000)
%35:intregs = PHI %40, %bb.10, %20, %bb.5
%38:intregs = L2_loadruh_io %35, -4, post-instr-symbol <mcsymbol Stage-1_Cycle-0> :: (load 2 from %ir.cgep2, !tbaa !0)
J2_jump %bb.8, implicit-def $pc
bb.8.b2: // Next stage of Epilog 1, runs stage 2
successors: %bb.7(0x80000000)
dead %34:intregs = S2_storerh_pi %6, -2, %38, post-instr-symbol <mcsymbol Stage-2_Cycle-0> :: (store 2 into %ir.lsr.iv, !tbaa !0)
J2_jump %bb.7, implicit-def $pc
The key is that although epilog 1 (E1) runs stages {1,2}, we *don't* create a single block with both stages enabled, which would cause the invalid code issue you mentioned. Instead, we expand E1 into *two* blocks (bb.9 and bb.8 above): the first performs stage 1, and the second performs stage 2, consuming its input from stage 1.
That means we do generate superfluous epilog blocks, but these get merged by the control flow optimizer later.
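To make the expansion concrete, here is a rough Python sketch of the block layout only. It is not the code from the patch: the function name expand_epilogs is hypothetical, and it assumes that epilog e of an N-stage schedule runs stages N-1-e through N-1, which matches the three-stage example above but is a generalization on my part.

```python
def expand_epilogs(num_stages):
    """Return, for each epilog, the list of blocks it expands into.

    Hypothetical sketch: each block is a list of the stages it runs.
    Assumes epilog e runs stages (num_stages - 1 - e) .. (num_stages - 1).
    """
    epilogs = []
    for e in range(num_stages - 1):
        first_stage = num_stages - 1 - e
        # One block per stage, in stage order, so a later stage reads a
        # value defined in an earlier block instead of in its own block.
        blocks = [[stage] for stage in range(first_stage, num_stages)]
        epilogs.append(blocks)
    return epilogs


# Three stages, as in the example: epilog 0 is one block running stage 2,
# epilog 1 is two blocks running stage 1 and then stage 2.
layout = expand_epilogs(3)
```

For three stages this gives [[[2]], [[1], [2]]], whereas a layout that enabled both stages in a single E1 block would be [[[2]], [[1, 2]]].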
That said, I'm not guaranteeing there are no bugs here. Perhaps the testcase you're thinking of distills to something more complex than my testcase?
Cheers,
James
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D68205/new/
https://reviews.llvm.org/D68205