[PATCH] D15302: [Greedy regalloc] Replace analyzeSiblingValues with something new [Part1]

Fri Jan 8 17:02:25 PST 2016

wmi added a comment.

Thank you for the review.

> I think it is not the right approach because this creates a (bigger) gap between the cost model of the register allocator and the actual spill cost. Indeed, the cost model of the register allocator, w.r.t. spill cost, is basically reload before the uses, spill after the definitions.

>  In other word, it is better to keep the spiller simple but be smarter on the splitting of live-ranges so that the register allocator takes the right spilling/splitting decisions. That way, sharing spills/reloads will come naturally without to do anything in the spiller plus we may get better copies placement.

> 

> This was also, I believe, what Jakob had in mind when he described a solution in PR17409.

I have some different ideas here because of the following three things:

1. I tried Jakob's solution when I just started to work on this problem, but quickly I realized a fundamental problem there compared with the existing way using InlineSpiller::analyzeSiblingValues. For Jakob's solution, it can only see the spills generated from siblings splitted from current VirtReg, .i.e, in current round of selectOrSplit. Let's say current VirtReg http://reviews.llvm.org/diffusion/L/ is splitted into siblings: R2, R3, R4 (Suppose R2 is for the remainder interval). Suppose R4 is further splitted in the next round of selectOrSplit and a new remainder interval R5 is generated. If a spill for R2 dominate a spill for R5, Jakob's solution cannot remove the spill for R5 because they are generated in different rounds of selectOrSplit. InlineSpiller::analyzeSiblingValues doesn't have the issue and that is exactly why InlineSpiller::analyzeSiblingValues has to pay so much cost to track the siblings with equal values.

  To evaluate how serious the problem above is, I did some experiment using Jakob's solution. I wrote a sanity check to caught redundent spills left over in the final stage (To make sure no later phase will clean up the redundent spills) because of the issue above, and I caught such cases in about 100 files when building llvm testsuite. And when I had the solution in this patch ready, I also used the sanity check and have ensured all those redundent spills have been cleaned up.

2. In Jakob's solution, the spills sharing work is done mostly in func SplitEditor::hoistCopiesForSize. hoistCopiesForSize is called inside SplitEditor::finish (last step of splitting) .i.e, it hoist spills and removes redundent spills after the splitting decision has been done for the current VirtReg. So for Jakob's solution, it has the same cost model issue as what you described here.

3. For my solution, the major func is called after RegAllocBase::allocatePhysRegs. That is to say it keeps the reload/spills in the original places (reload before the uses, spill after the definitions) during RegAllocBase::allocatePhysRegs, and only try to share/hoist spills after most of the regalloc work is done. So I think my solution seems to be more close to your idea here. It is like a cleanup work after regalloc, which has simpler logic and clearer impact which will be easier for performance tuning. In comparison, Jakob mentioned in PR17409 that enabling either split spill mode is going to affect the live range splitting algorithm a lot, and somebody has to track down the regressions and fix them.

> Concretely, what you should do:

>  0. Create a baseline for performance comparisons without any changes

> 

> 1. Add an option to disable the InlineSpiller::analyzeSiblingValues, say -disable-spill-analyze-sibvalue

> 2. Benchmark with (-mllvm) -disable-spill-analyze-sibvalue  (-mllvm) -split-spill-mode=size (we may want to use “speed" and improve that splitting mode instead of “size")

> 3. Investigate the regressions and/or file PRs

> 4. Fix the regressions

> 

>   At this point, the new (or size) split mode should be better or equivalent to the baseline and we can just kill analyzeSiblingValue and make that new mode the default.

Actually when I was working on the patch, I followed your steps here to improve it gradually by comparing with existing implementation and fixing regressions.

Another thing is: No matter which solution is adopted in the end, Part2 is needed because of there is no way to get DefMI for rematerialization after removing InlineSpiller::analyzeSiblingValues.

Thanks,
Wei.

Repository:
  rL LLVM

http://reviews.llvm.org/D15302