[PATCH] Tune TTI getMaxInterleaveFactor for POWER8

Tue Feb 10 08:16:57 PST 2015

In http://reviews.llvm.org/D7503#121303, @ohsallen wrote:

> Hal,
>
> > Thanks for working on this, but I don't quite understand the logic (stacking the latency of the two pipelines seems odd to me). How did you tune this?
>
>
> I based this on the comment above the default case: to me, it seems that we can have 12 FP operations in the pipeline. Did you expect that number to be 6?
>
>   // For most things, modern systems have two execution units (and
>   // out-of-order execution).
>   return 2;
>   

Ah, okay. The logic behind the comment was to create a reasonable default. The idea is that you interleave (which, to be clear, is what is often called modulo unrolling) by 2 to fill both functional units, under the assumption that out-of-order (OoO) dispatch will take care of hiding instruction latency. Obviously, when you know something about the latency, you can do better.
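To make "interleave by 2" concrete, here is a minimal hand-written sketch of what the transformation does to a simple reduction. The function names are made up for illustration and this is not the code the vectorizer emits; it only shows the two independent dependency chains that can keep two functional units busy.

```cpp
#include <cassert>
#include <cstddef>

// Original loop: each add depends on the previous one, so only one
// add can be in flight at a time.
float sum_serial(const float *a, std::size_t n) {
  assert(a != nullptr || n == 0);
  float s = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    s += a[i];
  return s;
}

// Interleaved by 2 (hypothetical hand-written equivalent): s0 and s1
// form two independent chains that can issue to separate units.
float sum_interleaved2(const float *a, std::size_t n) {
  assert(a != nullptr || n == 0);
  float s0 = 0.0f, s1 = 0.0f;
  std::size_t i = 0;
  for (; i + 1 < n; i += 2) {
    s0 += a[i];
    s1 += a[i + 1];
  }
  if (i < n) // leftover element when n is odd
    s0 += a[i];
  return s0 + s1;
}
```

Note that splitting a floating-point sum this way reassociates the additions, so a compiler may only do it for FP reductions when reassociation is permitted (for example under fast-math).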

And so you're right, if we follow that logic, then 12x would be right. Of course, except for very simple loops, we can't unroll that much because of register pressure (and I'm not entirely sure how accurate the IR-level register use estimator will be in this regard). It is also too much for integer instructions (which I imagine have lower latency?), although maybe not for vector integer ops?

In short, I'm slightly worried about setting such a large number without supporting measurements, because by the time instruction scheduling, register allocation, and the core's out-of-order dispatch and dispatch-group formation get involved, it might not be optimal.
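For reference, the arithmetic being discussed can be sketched as follows. Both helpers are hypothetical and are not part of the TTI interface; the figures of two units and a six-cycle latency are only what the 12-versus-6 question in this thread appears to assume, not measured values, and the register cap is a crude stand-in for the IR-level register use estimate.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical: number of independent operations needed to keep
// every unit busy for the full latency of one operation.
unsigned idealInterleave(unsigned numUnits, unsigned latencyCycles) {
  assert(numUnits > 0 && latencyCycles > 0);
  return numUnits * latencyCycles;
}

// Hypothetical: limit the ideal factor by how many copies of the
// loop body fit in the register file, never going below 1.
unsigned cappedInterleave(unsigned ideal, unsigned availableRegs,
                          unsigned regsPerCopy) {
  assert(regsPerCopy > 0);
  unsigned fit = availableRegs / regsPerCopy;
  return std::max(1u, std::min(ideal, fit));
}
```

Under those assumptions, `idealInterleave(2, 6)` gives 12, while `idealInterleave(2, 1)` gives the default of 2. A loop body needing 4 registers per copy with 32 available would be capped at 8 by `cappedInterleave(12, 32, 4)`, which is the register-pressure concern in a nutshell.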

> > Would the same logic apply to the `P7`?
>
> You are right (if the logic makes sense!).

http://reviews.llvm.org/D7503
