[PATCH] D147532: [MachineLateInstrsCleanup] Improve compile time for huge functions.

Sun Apr 30 04:58:37 PDT 2023

jonpa marked an inline comment as done.
jonpa added a comment.

It seems that this would result in this loop:

  for (auto DefI = MBBDefs.begin(); DefI != MBBDefs.end();) {
    Register Reg = DefI->first;
    if (MI.modifiesRegister(Reg, TRI)) {
      MBBDefs.erase(DefI++);
      MBBKills.erase(Reg);
      continue;
    } else if (MI.findRegisterUseOperandIdx(Reg, false /*isKill*/, TRI) != -1)
      // Keep track of the last use seen so far.
      MBBKills[Reg] = &MI;
    DefI++;
  }

I find that acceptable, but the iteration logic is slightly complicated. That's why I chose to ignore the extra lookup when calling erase(Reg) in the first place:

  for (auto DefI : llvm::make_early_inc_range(MBBDefs)) {
    Register Reg = DefI.first;
    if (MI.modifiesRegister(Reg, TRI)) {
      MBBDefs.erase(Reg);
      MBBKills.erase(Reg);
    } else if (MI.findRegisterUseOperandIdx(Reg, false /*isKill*/, TRI) != -1)
      // Keep track of the last use seen so far.
      MBBKills[Reg] = &MI;
  }

I compared both these versions just now on a z15 with your test case, but found no difference:

        ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  itr:   4.8783 (  0.4%)   0.0011 (  0.0%)   4.8794 (  0.4%)   4.8833 (  0.4%)  Machine Late Instructions Cleanup Pass
  reg:   4.8384 (  0.4%)   0.0009 (  0.0%)   4.8393 (  0.4%)   4.8439 (  0.4%)  Machine Late Instructions Cleanup Pass

I guess this could mean that this test case does a lot of reuse of the earlier Candidate, so it doesn't call erase() here that much, maybe.

On SPEC/SystemZ they both average 0.52% compile time across all files, so there is no difference there either.

The situation may be different on other systems, I suppose. I was using GCC version used to build clang: 12.2.1. Do you see any compile time difference at all on your platform?

I would personally vote for the readability (ignore the erase(Reg) potential overhead), but I will let you the reviewer(s) decide :)

================
Comment at: llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp:236
+      if (MI.modifiesRegister(Reg, TRI)) {
+        MBBDefs.erase(Reg);
+        MBBKills.erase(Reg);
----------------
vpykhtin wrote:
> jonpa wrote:
> > jonpa wrote:
> > > vpykhtin wrote:
> > > > vpykhtin wrote:
> > > > > Sorry, I didn't notice this change first.
> > > > > 
> > > > > You can still use erase by iterator (it's slightly faster than another lookup) because you're using make_early_inc_range.
> > > > bad advice - DefI is already incremented after Register Reg = DefI.first. 
> > > > 
> > > > May be you can leave previous variant of iteration, you can increment the iterator like this:
> > > > 
> > > > MBBDefs.erase(DefI++) instead of DefI = MBBDefs.erase(DefI)
> > > > 
> > > > 
> > > Yeah - I went for readability at first here as I did not see any change in compile time.
> > I did not find anything to convince me 100% that this use of erase with iterator is legal, so I will have to trust you on this ;-)
> > 
> > I think the readability is ok now as well with this.
> > 
> Now you're incrementing DefI after erase and in general this is incorrect for STL containers (though it's not the case here) because erase invalidates iterators to erased elements.
> 
> MBBDefs.erase(DefI++)  works because arguments are fully evaluated before function call, so this is equivalent to:
> 
> auto DefIBefore = DefI;
> ++DefI; // Now DefI is valid after erase 
> MBBDefs.erase(DefIBefore);
Sorry,I thought you were implying that increment after erase() was fine with this SmallDenseMap...

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147532/new/

https://reviews.llvm.org/D147532