[LLVMdev] RFC:LNT Improvements

Chris Matthews chris.matthews at apple.com
Wed Apr 30 07:56:08 PDT 2014


I have so many comments about this thread! I will start here.

I think having a total compile time metric is a great idea. The summary report code already does this.  The one problem with this metric is that it does not work well as the test suite evolves and tests are added and removed, so it should be computed over a fixed subset of the tests that is not going to change.  I would love to see that feature reported in the nightly reports.
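
To illustrate the fixed-subset point, here is a minimal sketch. It is not LNT code: the function name, the test names and the timings are all hypothetical. It sums compile time over a pinned list of tests, so adding a new test to the suite does not move the total.

```python
def total_compile_time(compile_times, stable_tests):
    """Sum compile times (seconds) over a fixed list of test names.

    Raises KeyError if a pinned test is missing from the run, because a
    silently shrinking subset would make totals incomparable again.
    """
    missing = [name for name in stable_tests if name not in compile_times]
    if missing:
        raise KeyError("run is missing pinned tests: %s" % ", ".join(missing))
    return sum(compile_times[name] for name in stable_tests)


# Hypothetical pinned subset and two runs; test "d" was added between them.
STABLE_TESTS = ("a", "b", "c")
run_old = {"a": 1.5, "b": 2.0, "c": 0.5}
run_new = {"a": 1.5, "b": 2.0, "c": 0.5, "d": 3.0}

old_total = total_compile_time(run_old, STABLE_TESTS)
new_total = total_compile_time(run_new, STABLE_TESTS)
naive_new_total = sum(run_new.values())  # jumps only because "d" was added
```

With these made-up numbers the pinned totals stay equal across the two runs, while the naive sum over all tests jumps purely because of the new test.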

In the past we have toyed with a total execution time metric (the sum of the execution times of all benchmarks), and it has not worked well.  Some benchmarks run for so long that they alone can swing the metric, and all the other little tests amount to nothing.  How the SPEC benchmarks do their calculations might be relevant.  They have a baseline run, and the metric is the geometric mean of the ratios of current exec time to base exec time, one ratio per benchmark. That fixes the problem of differently sized benchmarks.
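
As a rough sketch of that calculation (my own illustration with made-up benchmark names and timings, not SPEC's or LNT's actual code), the geometric mean of per-benchmark ratios can be computed like this:

```python
import math


def geomean_ratio(current, baseline):
    """Geometric mean of current/baseline time ratios.

    Only benchmarks present in both runs are used. Times must be positive.
    """
    shared = sorted(set(current) & set(baseline))
    if not shared:
        raise ValueError("no benchmarks in common")
    # Average the logs, then exponentiate: equivalent to the n-th root of
    # the product of ratios, but avoids overflow with many benchmarks.
    log_sum = sum(math.log(current[name] / baseline[name]) for name in shared)
    return math.exp(log_sum / len(shared))


# Hypothetical timings in seconds: one long benchmark, two tiny ones.
baseline = {"long_running": 100.0, "tiny_a": 0.10, "tiny_b": 0.20}
current = {"long_running": 101.0, "tiny_a": 0.12, "tiny_b": 0.24}

# Ratio of summed times: dominated by the long benchmark.
total_ratio = sum(current.values()) / sum(baseline.values())
# Geometric mean of ratios: every benchmark counts equally.
score = geomean_ratio(current, baseline)
```

In this example both tiny tests regress by 20%, yet the ratio of summed times barely moves because the long benchmark dominates. The geometric mean weights each benchmark's ratio equally, so the regression shows up.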

On Apr 30, 2014, at 7:34 AM, Tobias Grosser <tobias at grosser.es> wrote:

> On 30/04/2014 16:20, Yi Kong wrote:
>> Hi Tobias, Renato,
>> 
>> Thanks for your attention to my RFC.
> 
>> On 30 April 2014 07:50, Tobias Grosser <tobias at grosser.es> wrote:
>> >> - Show and graph total compile time
>> >>    There is no obvious way to scale up the compile time of
>> >> individual benchmarks, so total time is the best thing we can do to
>> >> minimize error.
>> >>    LNT: [PATCH 1/3] Add Total to run view and graph plot
>> >
>> > I did not see the effect of these changes in your images and also
>> > honestly do not fully understand what you are doing. What is the
>> > total compile time? Don't we already show the compile time in run
>> > view? How is the total time different to this compile time?
>> 
>> It is hard to distinguish minor improvements or regressions over a large
>> number of tests from independent machine noise. So I added a "total time"
>> analysis to the run report, which can also be graphed as a trend, hoping
>> that noise will cancel out and make such changes easier to spot.
>> (Screenshot attached)
> 
> I understand the picture, but I still don't get how to compute "total time". Is this a well known term?
> 
> When looking at the plots of our existing -O3 testers, I also look for some kind of less noisy line. The first thing coming to my mind would just be the median of the set of run samples. Are you doing something similar? Or are you computing a value across different runs?
> 
>> On 30 April 2014 07:50, Tobias Grosser <tobias at grosser.es> wrote:
>> > I am a little sceptical about this. Machines should generally not be
>> > noisy. However, if for some reason there is noise on the machine, the
>> > noise is as likely to appear during this pre-noise-test as during
>> > the actual benchmark runs, maybe during both, but maybe also only
>> > during the benchmark. So I am afraid we might often run into the
>> > situation where this test says OK but the later test is still
>> > suffering from noise.
>> 
>> I agree that measuring before each run may not be very useful. The main
>> purpose of it is for adaptive problem scaling.
> 
> I see. If it is OK with you, I would propose to first get your LNT improvements in, before we move to adaptive problem scaling.
> 
>> On 30 April 2014 07:50, Tobias Grosser <tobias at grosser.es> wrote:
>> > In general, I see such changes as a second step. First, we want to
>> > have a system in place that allows us to reliably detect if a
>> > benchmark is noisy or not, second we want to increase the number of
>> > benchmarks that are not noisy and where we can use the results.
>> Ok.
> 
> Obviously, as you already looked into this deeper, feel free to suggest different priorities if necessary.
> 
> Tobias



