[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors (PR #108596)
via cfe-commits
cfe-commits at lists.llvm.org
Wed Sep 25 21:47:02 PDT 2024
https://github.com/ruiling commented:
With one trick mentioned inline, we can eliminate one more instruction. In most cases, compared with the existing solution, we would have to introduce one scalar instruction in the if-block and one SALU instruction in the flow-block.
The block prologue is an unclear concept and is not well implemented. We also don't know whether there are gaps in the LLVM core framework that would prevent the prologue from working correctly.
We originally introduced the block prologue so that the insertion point for VALU instructions would land past the `s_or_bnn exec`. SALU instructions are still allowed to be inserted into the prologue, because the `s_or_bnn exec` needs an input SGPR. Threaded-VGPR reloads, however, should always be placed after the prologue. But since we may also reload SGPRs from WWM VGPRs, I think WWM-VGPR reloads should be placed in the block prologue as well.
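The placement policy described above can be summarized as a small decision function. This is a minimal sketch under simplifying assumptions: `InsertKind` and `mayGoInPrologue` are hypothetical names, not LLVM APIs, and real code would classify a `MachineInstr` rather than an enum value.

```cpp
#include <cassert>

// Hypothetical categories for an instruction a pass wants to place at the
// top of a block. Illustrative names only; these are not LLVM APIs.
enum class InsertKind {
  SALU,               // scalar ALU, e.g. computing the SGPR input of s_or_bnn exec
  WWMVGPRReload,      // reload of a whole-wave-mode VGPR (may carry spilled SGPRs)
  ThreadedVGPRReload, // reload of an ordinary per-lane (threaded) VGPR
  VALU                // vector ALU instruction
};

// Sketch of the placement policy from the review comment: returns true if the
// instruction may sit inside the block prologue (before the exec-restoring
// s_or_bnn), false if it must go after the prologue.
bool mayGoInPrologue(InsertKind Kind) {
  switch (Kind) {
  case InsertKind::SALU:
    return true; // may produce the SGPR operand of s_or_bnn exec
  case InsertKind::WWMVGPRReload:
    return true; // proposed: SGPRs may be reloaded from it, so it must come first
  case InsertKind::ThreadedVGPRReload:
  case InsertKind::VALU:
    return false; // must run with the restored exec mask
  }
  return false; // not reached for valid enum values
}
```

Keeping the policy in one switch makes the proposed change (moving WWM-VGPR reloads into the prologue) a single-case edit.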
Another issue with the implementation is that only the instructions up to and including the `s_or_bnn exec` should count as the prologue. In the case below, for example, the last prologue instruction should be the `S_OR_B64 exec`:
    $exec = S_OR_B64 $exec, killed renamable $sgpr48_sgpr49, implicit-def $scc
    %129034:vgpr_32 = SI_SPILL_V32_RESTORE %stack.265, $sgpr32, 0, implicit $exec
    %129055:vgpr_32 = SI_SPILL_V32_RESTORE %stack.266, $sgpr32, 0, implicit $exec
    %129083:vgpr_32 = SI_SPILL_V32_RESTORE %stack.267, $sgpr32, 0, implicit $exec
    %129635:vgpr_32 = SI_SPILL_V32_RESTORE %stack.282, $sgpr32, 0, implicit $exec
Although tuning the `isSpill()` check helps in this case, there may still be other related issues. I think it would be better to fix this properly, but I am not sure it can be done easily.
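The proposed rule for where the prologue ends can be sketched on a toy instruction list. Assumptions: `Instr`, its `IsExecRestore` flag, and `prologueLength` are hypothetical and model only the property that matters here; the actual AMDGPU code works on `MachineInstr`s.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, stripped-down model of a machine instruction. Only the
// property relevant to the prologue boundary is tracked; not an LLVM API.
struct Instr {
  std::string Opcode;
  bool IsExecRestore; // true for the s_or_bnn that writes exec on reconvergence
};

// Number of leading instructions that form the block prologue under the rule
// proposed in the review: everything up to and including the first
// exec-restoring s_or_bnn. Anything after it (e.g. spill restores) is not
// prologue. Returns 0 if the block has no exec-restoring instruction.
std::size_t prologueLength(const std::vector<Instr> &Block) {
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (Block[I].IsExecRestore)
      return I + 1;
  }
  return 0;
}
```

For the MIR in the example, modeled as one `S_OR_B64` with `IsExecRestore = true` followed by four `SI_SPILL_V32_RESTORE` entries, `prologueLength` returns 1. Keying the boundary on the exec write itself, rather than on a per-instruction check such as `isSpill()`, is one way to keep the VGPR restores out of the prologue by construction.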
Meanwhile, I think it would definitely be helpful to figure out whether LiveRangeSplit works properly with the block prologue, at least for the known cases.
https://github.com/llvm/llvm-project/pull/108596