[PATCH] D136676: [AMDGPU] Speedup SIFormMemoryClauses live-in register set calculation

Valery Pykhtin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 25 14:48:33 PDT 2022


vpykhtin added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:280
 
-  for (MachineBasicBlock &MBB : MF) {
-    GCNDownwardRPTracker RPT(*LIS);
+  SmallVector<MachineInstr *, 16> FirstBBClauseMI;
+  for (auto &MBB : MF) {
----------------
vpykhtin wrote:
> arsenm wrote:
> > vpykhtin wrote:
> > > arsenm wrote:
> > > > You seem to be assuming a single clause per block. I'd expect this to be handled a full clause at a time, within a single block.
> > > Not quite. I just compute the live-in set for the first clause per BB to reset the RPTracker; it is then advanced to the next clause.
> > Can you do this per block, instead of calculating this for every block?
> This is the whole point of doing it all at once:
> 
> 1. The slot indexes of the first instruction of every clause are collected and sorted.
> 2. For every virtual register's LiveRange we have two sorted sequences: Segments and SlotIndexes. We need to determine which of the SlotIndexes fall into Segments of the virtual register; that would mean the register is live at those SlotIndexes.
> 3. Since both sequences are sorted, we progressively use a two-way binary search: we look for either the SlotIndex that is contained by the Segment, or the Segment containing the SlotIndex.
> 
> I now realize the complexity is not what I thought before; it should be (per register):
> 
> O( min( NumSegments * lg(NumSlotIndexes), NumSlotIndexes * lg(NumSegments) ) )
> 
> See getLiveRegMap and LiveRange::findIndexesLiveAt.
Sorry, I meant the first instruction of the first clause per BB; the following clauses are processed using 'advance'.
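
To make steps 2 and 3 concrete, here is a minimal standalone sketch of the idea. It is not the code from the patch and not LLVM's LiveRange::findIndexesLiveAt: the Segment struct, the use of plain unsigned values in place of SlotIndex, and the name collectLiveIndexes are all hypothetical, chosen for illustration only. It assumes the segments are sorted, non-overlapping, and half-open, and that the indexes are sorted.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a live range segment: half-open [Start, End).
struct Segment {
  unsigned Start;
  unsigned End;
};

// Returns the elements of Indexes that are covered by some segment in Segs.
// Assumes Segs is sorted and non-overlapping, and Indexes is sorted.
std::vector<unsigned> collectLiveIndexes(const std::vector<Segment> &Segs,
                                         const std::vector<unsigned> &Indexes) {
  std::vector<unsigned> Live;
  auto SI = Segs.begin();
  auto II = Indexes.begin();
  while (SI != Segs.end() && II != Indexes.end()) {
    // Binary search forward for the first segment ending after *II.
    SI = std::upper_bound(
        SI, Segs.end(), *II,
        [](unsigned Idx, const Segment &S) { return Idx < S.End; });
    if (SI == Segs.end())
      break;
    // Binary search forward for the first index not before that segment.
    II = std::lower_bound(II, Indexes.end(), SI->Start);
    // Every index inside [Start, End) is a point where the value is live.
    for (; II != Indexes.end() && *II < SI->End; ++II)
      Live.push_back(*II);
    ++SI;
  }
  return Live;
}
```

Each search resumes from the position reached by the previous one, so neither sequence is rescanned from the start. The inner loop only visits indexes that end up in the result.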


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136676/new/

https://reviews.llvm.org/D136676


