[PATCH] D136676: [AMDGPU] Speedup SIFormMemoryClauses live-in register set calculation
Valery Pykhtin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 25 14:48:33 PDT 2022
vpykhtin added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:280
- for (MachineBasicBlock &MBB : MF) {
- GCNDownwardRPTracker RPT(*LIS);
+ SmallVector<MachineInstr *, 16> FirstBBClauseMI;
+ for (auto &MBB : MF) {
----------------
vpykhtin wrote:
> arsenm wrote:
> > vpykhtin wrote:
> > > arsenm wrote:
> > > > You seem to be assuming a single clause per block. I'd expect this to handle a full clause at a time, within a single block.
> > > Not quite. I only compute the live-in set for the first clause per BB to reset the RPTracker; the tracker is then advanced to the next clause.
> > Can you do this per block, instead of calculating this for every block?
> This is the whole point of doing that all at once:
>
> 1. The slot indexes of the first instructions of all clauses are collected and sorted.
> 2. For every virtual register's LiveRange we have two sorted sequences: Segments and SlotIndexes. We need to determine which of the SlotIndexes fall into the Segments of the virtual register; the register is live at exactly those SlotIndexes.
> 3. Since both sequences are sorted, we progressively apply a two-way binary search: we look either for the next SlotIndex contained in the current Segment, or for the next Segment containing the current SlotIndex.
>
> I now realize the complexity is not what I thought before. It should be (per register):
>
> O( min( NumSegments * lg(NumSlotIndexes), NumSlotIndexes * lg(NumSegments) ) )
>
> See getLiveRegMap, LiveRange::findIndexesLiveAt.
Sorry, I mean the first instruction of the first clause per BB; the following clauses are processed using 'advance'.
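
To make the two-way search in the quoted steps concrete, here is a minimal standalone sketch. It is not the code from the patch and not the implementation of LiveRange::findIndexesLiveAt: `Segment` and `collectLiveIndexes` are hypothetical names, slot indexes are modelled as plain unsigned values, and segments are assumed to be sorted, non-overlapping, half-open intervals. It only illustrates the alternation between "find the segment for this index" and "find the first index in this segment", and does not try to reproduce the exact complexity bound above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a live range segment: half-open [Start, End).
struct Segment {
  unsigned Start;
  unsigned End;
};

// Returns the entries of Indexes that fall inside some segment.
// Assumes Segs is sorted and non-overlapping, and Indexes is sorted.
std::vector<unsigned> collectLiveIndexes(const std::vector<Segment> &Segs,
                                         const std::vector<unsigned> &Indexes) {
  assert(std::is_sorted(Indexes.begin(), Indexes.end()) &&
         "indexes must be sorted");
  std::vector<unsigned> Live;
  auto SI = Segs.begin(), SE = Segs.end();
  auto II = Indexes.begin(), IE = Indexes.end();
  while (SI != SE && II != IE) {
    // Binary search over the remaining segments: first one ending after *II.
    SI = std::upper_bound(
        SI, SE, *II,
        [](unsigned Idx, const Segment &S) { return Idx < S.End; });
    if (SI == SE)
      break;
    // Binary search over the remaining indexes: first one at or after the
    // start of that segment.
    II = std::lower_bound(II, IE, SI->Start);
    // Every index before the segment's end is live.
    for (; II != IE && *II < SI->End; ++II)
      Live.push_back(*II);
  }
  return Live;
}
```

For example, with segments [10, 20), [30, 40), [50, 60) and indexes 5, 10, 15, 20, 35, 45, 59, 60, this sketch reports 10, 15, 35 and 59 as live.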
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D136676/new/
https://reviews.llvm.org/D136676