[PATCH] [MachineLICM] Sink instructions only if they are unlikely to be executed

Quentin Colombet qcolombet at apple.com
Fri Mar 20 10:05:39 PDT 2015

Comment at: lib/CodeGen/MachineLICM.cpp:835
@@ -818,8 +834,3 @@
-      if (!B) {
-        B = MI.getParent();
-        continue;
-      }
-      B = DT->findNearestCommonDominator(B, MI.getParent());
-      if (!B) {
-        CanSink = false;
+      B = B ? DT->findNearestCommonDominator(B, MI.getParent())
+            : MI.getParent();
Aborting on PHI does not make much sense.
Instead, you should look for the common dominator of the use of the PHI, e.g., by looking at the dominator of the terminator of the related incoming block. In that case, you wouldn’t insert after the first non-PHI instruction.

Comment at: lib/CodeGen/MachineLICM.cpp:836
@@ +835,3 @@
+    // factor of a cost model.
+    if (HeaderFreq * ColdProb < MBFI->getBlockFreq(B))
+      continue;
djasper wrote:
> qcolombet wrote:
> > djasper wrote:
> > > qcolombet wrote:
> > > > Shouldn’t we have a threshold in the other direction as well, i.e., Freq(B) < Freq(Preheader) * <some threshold>?
> > > Could you elaborate what that might accomplish?
> > > 
> > > Also the frequency of the *pre*header doesn't really relate strongly to the frequencies within the loop AFAICT.
> > Sure, but the instructions we are moving come from the preheader and we do not want the number of times they are executed to explode.
> > Let's say the preheader is executed once and the loop 1,000,000 times. Then, even in the cold path of the loop, sinking the instructions is still rather expensive.
> I think we will almost never have accurate information about that. And the cost model is also not easy. We are generally weighing the cost of the additional computation in each loop iteration against the cost of the additional live ranges of these registers and the register spills that might result. The latter might actually make the computation inside the loop *more* expensive, as demonstrated in the changed test code.
> 
> Here, we are very frequently accessing one of the struct's fields. If we hoist all of the GEPs out of the loop, we end up spilling them all onto the stack and each load becomes a load from memory. With this change, we keep the GEPs of the less frequently accessed fields inside the loops (where they are folded into the LEAs and potentially don't cause significant overhead over loading them from the stack onto which they would otherwise be spilled). We still pull out the GEP for the frequently accessed field and can actually keep that in a register, using only a cheap MOV inside the loop.
> I think the long-term solution might be to order all instructions we could sink by frequency and then sink them into the loop starting from the least frequently executed until there is no more register pressure. Not entirely sure how to implement that correctly yet. My hope is that this patch is an incremental step towards that.
> We are generally weighing up the cost of the additional computation in each loop vs. the cost of the additional live ranges of these registers and the register spills that might be a result.

I agree, but checking for frequencies to derive a heuristic for that sounds wrong to me.
To simplify, let us assume that the spiller would insert the spill instructions at the exact same locations as the sinking algorithm.
The first question we should ask ourselves is: are the reload instructions more expensive than the instructions we sank?
If the answer is no, then there is no point in sinking. Frequency does not help with that.

Second, when do we consider the live-range we shorten/extend?
E.g., what if you sink a = add b<kill> + c<kill>?
You end up increasing the register pressure by one in the whole loop body and pushing the instruction to a more expensive location. Frequency does not help with that either.

The bottom line is, I believe that frequency has nothing to do with the heuristic we want in order to achieve better spill placement.
Therefore, I do not think this is a step toward fixing the problem.

I am fine if you want to do some experimentation to gather ideas, but seeing gains out of such a study would be more luck than actual improvement.



More information about the llvm-commits mailing list