[PATCH] D39976: [AArch64] Consider the cost model when folding loads and stores

Evandro Menezes via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 18 11:49:51 PST 2017


evandro added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:696
+    // ... the new instruction is not more complex than both old ones.
+    if (AUops > 1 || BUops > 1)
+      return NewUops <= (AUops + BUops);
----------------
gberry wrote:
> evandro wrote:
> > junbuml wrote:
> > > Can you add more comment why you do this? Is this target independent ? 
> > I'm not a hardware designer, but AFAIK many targets hiccup when an instruction is decoded into more than one uop.  How bad the hiccup is, if at all, does depend on the target though.  Some decrease the decode bandwidth, typically by inserting a bubble whose size depends on the design.  This heuristic is an attempt at mitigating the new instruction inducing such a bubble.
> > 
> > If the new instruction has a shorter latency, then it's chosen.  One might wonder if it's still a good choice if it induces a bubble, but I could not devise a satisfying heuristic.
> > 
> > If the latency of the new instruction is the same as the combined latency of the both old ones, then the potential of inducing a bubble is considered.  If either of the old instructions had multiple uops, then even if the new one has them too it's probably no worse than before.  However, if neither of the old instructions resulted in multiple uops, the new one is chosen only if it results in fewer uops than before.  One might argue that, if bubbles when decoding into multiple uops are the norm among targets, it'd be better to choose the new instruction only if it doesn't potentially induce bubbles itself.
> > 
> > If the new instruction has a longer latency, then it's discarded.  Again, if it mitigates decode bubbles it might still be profitable, but the conditions seem hard to weigh in general.
> It seems like if you want to take this multi-uop bubble into account, you should make it explicit, and make the subtarget opt in to it.
> The simple heuristic here to me would be:
> 
> ```
> if (newLatency < oldLatency)
>   return true;
> if (newLatency > oldLatency)
>   return false;
> if (newUops < oldUops)
>   return true;
> if (newUops > oldUops)
>   return false;
> if (newMultiUopPenalty < oldMultiUopPenalty)
>   return true;
> return false;
> ```
How do you propose that the new penalty be calculated?


================
Comment at: llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:707
+  else if (UopA <= 1 && UopB <= 1) {
+    if (LatDif < -UopDif) {
+      DEBUG(dbgs() << "Evaluating instructions: replacement is faster "
----------------
gberry wrote:
> evandro wrote:
> > junbuml wrote:
> > > I'm not sure if it's okay to compare values from  computeInstrLatency() and getNumMicroOps() ? 
> > That's what the heuristic does, as briefly explained in the debug message below.
> I think Jun's point was that these values are in completely different units, so comparing them doesn't make sense.
It´s very common in process control to use one variable to effect another, as long as there´s at least an implied correlation between them.


Repository:
  rL LLVM

https://reviews.llvm.org/D39976





More information about the llvm-commits mailing list