[PATCH] D137813: [RegAlloc Greedy]Account statepoints while splitting single basic block

Tue Nov 15 19:28:08 PST 2022

skatkov added a comment.

Hi Quentin, thank you for participating in this discussion and for your comments.
Today my plan is to dig into InlineSpiller and learn in detail what I'm missing :)

About the issue I'm trying to solve: I wrote a dedicated (reasonably small) test to show the problem (llvm/test/CodeGen/X86/statepoint-split-single-block.ll).

First of all, some details about the statepoint instruction (as it relates to the register allocator).
A statepoint instruction is semantically a call with some additional information (such as deopt state and gc-lives). For this additional information (some elements of which are represented as virtual registers before register allocation), all we need is to know where the corresponding value is located. It is perfectly fine for it to be in either a register or a stack location. This information will be encoded into the stack map during machine instruction lowering.
So the difference between a statepoint instruction and other instructions is that it has operands which do not require a physical register and, moreover, do not prefer one.
InlineSpiller will successfully fold any load from the stack for such operands, so the unspill is eliminated.

Now back to my test and the problem to solve.
There is an incoming argument %arg which actually has three uses and two defs:

1. def as the incoming argument
2. use in a copy to rdi, as it is an argument to the callee
3. use in a statepoint instruction as a gc-live
4. def in a statepoint, as a gc-live can be relocated
5. use in the return statement

2-3-4 are in the same block, and 3-4 are on the same instruction.
To disable region split and force a spill in the first block and an unspill in the last block, I've added calls to @nocsr, a callee which does not preserve any physical registers.
As a result, when we come to the basic block with the call, the register allocator observes three uses (2-3-4) and decides to make a split around them. We get a new live interval that is free of the constraints caused by the calls to @nocsr.
The register allocator has plenty of free callee-saved registers and assigns one of them to this live interval.
Technically we did a good job: we made a split and were able to allocate the live interval to a register.
However, the complement interval goes through the @nocsr constraints and has no option except spilling.

So we enter the basic block on the stack, unspill to the chosen register rbx, copy rbx to rdi (use 2), use rbx in 3-4, and finally spill rbx back because we exit on the stack.
However, as I said, the statepoint does not require a register. If we did not allocate a physical register to our interval and just spilled around the uses, then
we would load rdi from the stack and do nothing for the statepoint.
That is what the test shows.
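
To make the comparison above concrete, here is a toy C++ tally of what each strategy emits in the block with the call. This is not LLVM code: the struct and function names are made up for illustration, and the counts simply restate the two sequences described above (value on the stack on entry and exit, statepoint gc-live operand able to use the stack slot directly).

```cpp
// Toy model, not LLVM code: every name here is hypothetical.
struct BlockCost {
  int Loads;   // reloads from the stack slot
  int Stores;  // spills back to the stack slot
  int Copies;  // register-to-register copies
};

// Strategy A: split the block into its own live interval and assign it
// a callee-saved register (rbx in the test).
constexpr BlockCost splitAroundUses() {
  return BlockCost{/*Loads=*/1,    // unspill into rbx on block entry
                   /*Stores=*/1,   // spill rbx back, since we exit on stack
                   /*Copies=*/1};  // copy rbx -> rdi for use 2
}

// Strategy B: keep the value on the stack and spill around the uses.
constexpr BlockCost spillAroundUses() {
  return BlockCost{/*Loads=*/1,    // load rdi directly from the stack slot
                   /*Stores=*/0,   // statepoint works on the slot in place
                   /*Copies=*/0};
}

// Total number of extra instructions counted by this toy model.
constexpr int total(BlockCost C) { return C.Loads + C.Stores + C.Copies; }
```

Under these assumptions, the split costs a reload, a copy and a spill, while spilling around the uses costs a single load.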

Now, what I had in mind behind this patch. I thought it was a win-win patch because I thought about the single basic block split as follows:
we enter on the stack and exit on the stack, we do not care about the statepoint, and if only one use is then left, we do not create a new interval and just stay on the stack.
It makes sense.

It looks like I should now think about it in the following way:
a statepoint instruction does not introduce any constraints through these specific operands (or what constraints would there be?), so there is no sense in splitting it into a separate interval.
That is what is on my mind at the moment :)

Probably I should approach it from a completely different direction: when InlineSpiller spills some interval, check its siblings and, if they are perfectly fine on the stack, unassign them from their registers and spill them as well...
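
A rough sketch of that idea as a toy C++ model, just to pin down what I mean. None of this is the InlineSpiller API: the types, the function names, and especially the "fine on the stack" criterion (no use strictly needs a register) are assumptions made for illustration.

```cpp
#include <vector>

// Toy model, not LLVM code: all types and names are hypothetical.
enum class UseKind {
  NeedsRegister,   // ordinary operand that must live in a register
  CopyToPhysReg,   // copy into a physical register; can become a plain load
  FoldableOperand  // e.g. statepoint gc-live: a stack slot is acceptable
};

struct Interval {
  bool AssignedToRegister;
  std::vector<UseKind> Uses;
};

// Assumed criterion: a sibling is fine on the stack if none of its uses
// needs the value in a register.
bool isFineOnStack(const Interval &I) {
  for (UseKind K : I.Uses)
    if (K == UseKind::NeedsRegister)
      return false;
  return true;
}

// When one interval gets spilled, also unassign and spill every sibling
// that is fine on the stack. Returns how many siblings were moved.
int spillCompatibleSiblings(std::vector<Interval> &Siblings) {
  int Moved = 0;
  for (Interval &S : Siblings) {
    if (S.AssignedToRegister && isFineOnStack(S)) {
      S.AssignedToRegister = false;
      ++Moved;
    }
  }
  return Moved;
}
```

In the test above, the interval split out around uses 2-3-4 would be such a sibling: it has only a copy to rdi and statepoint gc-live operands.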

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137813/new/

https://reviews.llvm.org/D137813