[PATCH] D82709: [MachineLICM] [PowerPC] hoisting rematerializable cheap instructions based on register pressure.

Mon Dec 21 02:54:08 PST 2020

shchenz added a comment.

@qcolombet @efriedma , sorry for the late response and the long main. I made a summary for this issue.

Before machine LICM:

  entry:
      outer-non-remat-def

  outer loop:
      inner loop:
          inner-remat-def
          inner-use
          b inner loop
      outer-use
      b outer loop

After machine LICM:

  entry:
      outer-non-remat-def
      inner-remat-def          ———>all remat definitions are hoisted out
  outer loop:
      inner loop:
          inner-use
          b inner loop
      outer-use
      b outer loop

machineLICM depends on RA(register allocator) remat the inner-remat-def down to its inner loop users. This is ok when there is level-one loop. But here we have a level-two loop, RA does not work as machine LICM expects.

If we hoist all remat definitions in machine LICM pass:
RA can make sure the loop where the inner-users are have no spills but it can not make sure the outer loop has no spills.

The process in Greedy Register Allocation after we hoist all remat instructions to entry block is like this:
1: outer defs have higher allocation priority than inner def because outer defs have larger live interval. But outer defs have smaller spill weight because inner def users are in the inner loop.
2.1: firstly RA allocates physical registers to outer def
2.2: then RA allocates physical register to inner def and evicts the assigned physical registers for outer defs when RA found there are not enough physical registers for inner defs. (outer defs have smaller spill weight).
2.3: then the outer def virtual registers are in stage split and in the second round for outer def virtual registers allocation, they are all in split stage, so RA will do spilling for them without trying to elicit any virtual registers.
2.4: for the inner-remat-def virtual registers, RA will try its best to assign physical registers to some of them and split some of them in the outer loop in the second round.
2.5: and at last, when RA allocates registers for inner-users, it will remat the use of remat-instructions down to the front of the inner-users to make sure there is no spill/reload in the inner loop.

After greedy register allocation:

  entry:
      outer-non-remat-def
      some of inner-remat-def 
  outer loop:
      some inner-remat-def       // split for inner-remat-defs
      inner loop:
          some inner-remat-def // remat when allocates for inner-users
          inner-use
          b inner loop
      ;;;;;; reload happens in outer loop
      outer-use
      b outer loop

So the issue here is, machine licm expects RA remat the inner-remat-def to the inner loop, but in fact, RA will first try to split the inner-remat-def to the outer loop and then when does allocation for inner loop register, it will remat the def to the front of the uses to make sure inner loop has no spill. There is no problem for the inner loop, we can make sure there is no spill by remat. But to avoid spill in the inner loop, we don’t need to remat all inner-remat-def splitted in the outer loop. The left inner-remat-defs in the header of the outer loop will increase outer loop register pressure for sure. MachineLICM is not aware of the register pressure increasing in the outer loop when it hoists all remat instructions to the outer loop preheader.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82709/new/

https://reviews.llvm.org/D82709