[PATCH] [MachineLICM] Sink instructions only if they are unlikely to be executed
djasper at google.com
Fri Mar 20 02:15:25 PDT 2015
Comment at: lib/CodeGen/MachineLICM.cpp:836
@@ +835,3 @@
+ // factor of the a cost model.
+ if (HeaderFreq * ColdProb < MBFI->getBlockFreq(B))
> djasper wrote:
> > qcolombet wrote:
> > > Shouldn’t we have a threshold in the other direction as well, i.e., Freq(B) < Freq(Preheader) * <some threshold>?
> > Could you elaborate what that might accomplish?
> > Also the frequency of the *pre*header doesn't really relate strongly to the frequencies within the loop AFAICT.
> Sure, but the instructions we are moving come from the pre header and we do not want them to explode the number of time they are executed.
> Let say the pre-header is executed once and the loop 1,000,000 times. Then, even in the cold path of the loop, this is still rather expensive to sink the instructions.
I think we will almost never have accurate information about that. And the cost model is also not easy. We are generally weighing up the cost of the additional computation in each loop vs. the cost of the additional live ranges of these registers and the register spills that might be a result. The latter might actually make the computation inside the loop *more* expensive as demonstrated in the changed test code. Here, we are very frequently accessing one of the struct's fields. If we hoist all of the GEPs out of the loop, we end up spilling them all onto the stack and each load becomes a load from memory. With this change, we keep the GEPs of the less frequently accessed fields inside the loops (where they are folded into the LEAs and potentially don't cause significant overhead over loading them from the stack onto which they would be spilled). We still pull out the GEP for the frequently accessed field and can actually keep that in a register using only a cheap MOV inside the loop.
I think the long-term solution might be to order all instructions we could sink by frequency and then sink them into the loop starting from the least frequently executed until there is no more register pressure. Not entirely sure how to implement that correctly yet. My hope is that this patch is an incremental step towards that.
More information about the llvm-commits