[LLVMdev] Postponing more passes in LTO

Sean Silva chisophugis at gmail.com
Thu Jan 8 15:24:52 PST 2015


On Thu, Jan 8, 2015 at 12:55 AM, Chandler Carruth <chandlerc at google.com>
wrote:

>
> On Thu, Dec 18, 2014 at 2:16 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>> Okay, so I've cast doubt on the use of standard deviation (and actually
>> an argument could be made against the mean too for similar reasons). On a
>> more constructive note, what *is* a good small set of parameters to
>> summarize measurements of program run time? One interesting model that was
>> brought to my attention by Nick Lewycky is based on the following
>> observation: most "perturbations" of the run time of a fixed program binary
>> on a fixed chip and fixed OS are due to externalities that contend for (or
>> otherwise deprive the program of) execution resources and therefore almost
>> always make it *slower*.
>>
>> The model is then that there is a "true" execution time for the program,
>> which is a completely deterministic function of the program, chip, and OS;
>> intuitively, this is the time observed when the program is e.g. able to execute
>> by itself, on a single execution unit of the CPU, with no interrupts
>> knocking it off CPU, and no other threads using any of the execution
>> resources of this CPU. On top of this, we assume that there is a set of
>> perturbations that contend for execution resources and slow the program
>> down.
>>
>> For this model, it makes sense to approximate the "true" time by the
>> minimum of the recorded times: the perturbations only ever slow the
>> program down, so the fastest run is the one closest to the unperturbed
>> execution. If we want to use two numbers to summarize the data, it
>> probably makes sense to also characterize the extent of the variation
>> from this "true" value. In the absence of proper characterization of the
>> perturbations (such as the distribution of when they occur and how much
>> effect they have when they occur), one simple choice is the maximum. Min
>> and max might be *too* sensitive to the perturbations (especially the
>> cold disk cache on the program's first run), so a pair of percentiles
>> might be a bit more reliable, such as the 10th percentile and 90th
>> percentile.
>>
>
> I used to subscribe to this thinking. I no longer do, and I think it has
> serious problems. Let me try to explain with a hopefully entertaining
> analogy. =D
>
> Back in physics class we would analyze the behavior of basic principles of
> physics as they would act upon a brave cadre of scientifically savvy
> penguins sliding down slopes under the effect of gravity. Naturally, these
> slopes were very slick ice, and the physics problems would be solved
> "neglecting friction"[1]. That made the analysis much more tractable and
> predictable. It also makes it really inaccurate in the real world.
>
> Now, I'm not actually deriding this approach to physics. The results from
> neglecting friction are *very* informative as to the basic principles at
> work. In a very similar way, I do think the model of a "true" execution
> time is informative, but much more about the principles at work than
> about the actual performance of the software.
>

Absolutely. On the one hand, a good model is a useful predictor of reality
when its assumptions are met. On the other, looking at how reality deviates
from the model helps you understand what is causing that deviation. The
penguins will have a much easier time analyzing dirt slopes if they already
understand gravity from the frictionless slopes.

Of course, it has to actually be a *good* model for this to apply. I don't
know if what I described is a good model. However, it was the first model
for program performance that I've seen that actually seemed to be based on
a pretty sound idea: characterizing externalities as contending for (or
otherwise depriving the program of) execution resources.
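To make the summary I described above concrete, here is a toy sketch (my
own illustration, not anything in the LLVM tree; the function name and the
sample data are made up). Under the one-sided-perturbation model, the
fastest run approximates the "true" time, and a pair of percentiles
characterizes the spread more robustly than the raw extremes:

```python
def summarize(times):
    """Summarize run-time samples under the one-sided-perturbation
    model: perturbations only slow the program down, so the fastest
    run is the best approximation of the "true" time, while the
    10th/90th percentiles bound the spread with less sensitivity to
    outliers (e.g. a cold-cache first run) than min/max alone.
    Uses simple nearest-rank percentiles for clarity."""
    ordered = sorted(times)
    n = len(ordered)

    def pct(p):
        # Nearest-rank percentile: value at rank round(p% * n).
        rank = max(1, int(round(p / 100.0 * n)))
        return ordered[rank - 1]

    return {
        "fastest": ordered[0],   # closest to the model's "true" time
        "slowest": ordered[-1],  # dominated by the worst perturbation
        "p10": pct(10),
        "p90": pct(90),
    }

# Hypothetical timing samples in seconds; note the slow outlier.
samples = [1.00, 1.01, 1.01, 1.02, 1.03, 1.04, 1.07, 1.25]
print(summarize(samples))
```

The p90 here (1.07) ignores the 1.25 outlier that would dominate a plain
max, which is exactly the robustness the percentile pair is meant to buy.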


>
>
> The reality I have observed these days, especially in larger software
> systems, is that there is inherently no deterministic and representative
> execution to rely on for finding "true" execution time. And too many of the
> random fluctuations are actually by-products of the program itself. Some
> examples:
>
> - If the program runs uninterrupted on a single core, it may cause that
> core to be throttled due to the thermals generated. This can have the
> effect of lowering the overall performance more than what you would see if
> the program ran *just* under that level without repeated drops in frequency.
>
> - Different optimizations can change the thermal profile by using
> different execution units. This is especially true for vector units.
>
> - If you fully leverage the AVX unit of some intel chips, my understanding
> is that regardless of the specific thermals, those units run at a lower
> clock!
>
> - Naturally, things like ASLR or other sources of correct but
> non-deterministic layout of data in memory will cause fluctuations in the
> rate at which cache misses occur. This is probably the most common cause of
> non-external fluctuations in performance. Different generated code,
> register spill frequencies, etc., can dramatically change these cache
> collision rates.
>
> Now, with all of these factors, we have to assume that in real world
> scenarios, the programs will not always run at the maximal speed. Instead,
> there is, much as you describe in your email, some distribution of
> performance that will be observed in practice. The interesting question is
> -- can generating different code from the compiler shift the distribution
> in a meaningful way that is not correlated with the fastest measurement (or
> the N fastest measurements). And I think the above examples are all places
> where this can happen. =/
>
> As a consequence, I am increasingly in favor of using very simple but
> aggregated statistics for measuring performance and acknowledging the lack
> of precision when it is present. We can still very often precisely measure
> the localized improvements provided by a specific change, even when they
> are very small. We just won't see a reflection from them in many cases when
> measuring large systems that have a high noise floor. Hopefully, we can at
> least observe good trends in the large system to confirm we are making
> progress, and at times we can make large improvements that actually rise
> above the noise floor.
>

If I'm understanding you correctly, you are emphasizing the need to be sure
that the holistic system we ultimately care about improving is tracking the
improvements we think we're making based on more localized measurements. I
fully agree.
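In that spirit, a rough sketch of what "simple but aggregated statistics"
might look like in practice (again my own toy illustration, not an existing
tool; the data and the 2x-noise threshold are made-up assumptions): compare
the median run times of a baseline and a patched build, and report whether
the shift rises above a crudely estimated noise floor.

```python
import statistics

def compare(baseline, patched):
    """Toy comparison using simple aggregated statistics: report the
    shift in median run time and whether it clears a crude noise
    floor (here, twice the sample standard deviation of the baseline
    runs -- an arbitrary threshold chosen for illustration)."""
    base_med = statistics.median(baseline)
    new_med = statistics.median(patched)
    # Crude noise floor: sample standard deviation of the baseline runs.
    noise = statistics.stdev(baseline)
    delta = new_med - base_med
    significant = abs(delta) > 2 * noise
    return delta, noise, significant

# Hypothetical timings (seconds) before and after some change.
baseline = [1.02, 1.00, 1.05, 1.01, 1.03, 1.04, 1.02, 1.06]
patched  = [0.91, 0.90, 0.95, 0.92, 0.93, 0.96, 0.90, 0.94]
delta, noise, significant = compare(baseline, patched)
```

A change that doesn't clear the noise floor isn't necessarily absent; it
may only be visible in the localized measurement, as discussed above.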

-- Sean Silva


>
> Anyways, my somewhat rambling 2 cents on this. Sorry for the not entirely
> structured email. =]
>