[LLVMdev] Dev Meeting BOF: Performance Tracking

Chad Rosier mcrosier at codeaurora.org
Tue Aug 5 07:41:36 PDT 2014


Kristof,
Unfortunately, our merge process is less than ideal.  It has vastly improved
over the past few months (years, I hear), but we still have times where we
bring in days' or weeks' worth of commits en masse.  To that end, I've set up
a nightly performance run against the community branch, but it's still an
overwhelming amount of work to track/report/bisect regressions.  As you
guessed, this is what motivated my initial email.

> On 5 August 2014 10:30, Kristof Beyls <Kristof.Beyls at arm.com> wrote:
>> The biggest problem that we were trying to solve this year was to produce
>> data without too much noise. I think with Renato hopefully setting up
>> a chromebook (Cortex-A15) soon there will finally be an ARM architecture
>> board producing useful data and pushing it into the central database.
>
> I haven't got around to finishing that work (at least not reporting to
> Perf anyway) because of the instability issues.
>
> I think getting Perf stable is priority 0 right now in the LLVM
> benchmarking field.

I agree 110%; we don't want the bots crying wolf.  Otherwise, real issues
will fall on deaf ears.

>> I think this should be the main topic of the BoF this year: now that we
>> can produce useful data; what do we do with the data to actually improve
>> LLVM?
>
> With the benchmark LNT reporting meaningful results and warning users
> of spikes, I think we have at least the base covered.

I haven't used LNT in well over a year, but I recall Daniel Dunbar and I
having many discussions on how LNT could be improved.  (Forgive me if any of
my suggestions have already been addressed; I'm playing catch-up at the
moment.)

> Further improvements I can think of would be to:
>
> * Allow Perf/LNT to fix a set of "golden standards" based on past releases
> * Mark the levels of those standards on every graph as coloured horizontal
> lines
> * Add warning systems when the current values deviate from any past
> golden standard

I agree.  IIRC, there's functionality to set a baseline run to compare
against.  Unfortunately, I think this is too coarse.  It would be great if
the golden standard could be set on a per-benchmark basis.  That way,
upward-trending benchmarks can have their standard updated while other
benchmarks remain static.

> * Allow Perf/LNT to report on differences between two distinct bots
> * Create GCC buildbots with the same configurations/architectures and
> compare them to LLVM's
> * Mark golden standards for GCC releases, too, as a visual aid (no
> warnings)
>
> * Implement trend detection (gradual decrease of performance) and
> historical comparisons (against older releases)
> * Implement warning systems to the admin (not users) for such trends

Would it be useful to detect upward trends as well?  Per my comment above,
it would be great to update the golden standard so we're always moving in the
right direction.
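For the gradual-drift case, even something as simple as comparing a short
moving average against a longer one over the sample history might be a
reasonable start (again only a sketch; the window sizes and threshold are
made up):

    # Hypothetical gradual-trend detector; not actual LNT code.
    def detect_trend(samples, short=5, long=20, threshold=0.03):
        """samples: list of exec times for one benchmark, oldest first."""
        if len(samples) < long:
            return None
        recent = sum(samples[-short:]) / short
        history = sum(samples[-long:]) / long
        change = (recent - history) / history
        if change > threshold:
            return "regressing"   # times drifting up
        if change < -threshold:
            return "improving"    # times drifting down
        return None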

> * Improve spike detection to wait one or two more builds to make sure
> the spike was an actual regression, but then email the original blame
> list, not the current build's one.

I recall discussing this issue with Daniel.  IIRC, we considered an
eager approach where the current build would rerun the benchmark to
verify the spike.  However, I like the lazy detection approach you're
suggesting.  It avoids long-running builds when there are real
regressions.
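Concretely, the lazy approach might look something like this (purely a
sketch of the bookkeeping, not existing LNT code; the tuple layout and
thresholds are made up):

    # Hypothetical lazy spike confirmation; not actual LNT code.
    # pending: list of (benchmark, blame_list, builds_waited) for suspects
    def confirm_spikes(pending, latest_times, golden,
                       wait_builds=2, tolerance=0.05):
        confirmed, still_pending = [], []
        for bench, blame_list, waited in pending:
            now = latest_times.get(bench)
            if now is None or now <= golden[bench] * (1.0 + tolerance):
                continue                     # spike disappeared; likely noise
            if waited + 1 >= wait_builds:
                # Confirmed after waiting: email the *original* blame list,
                # not the blame list of the current build.
                confirmed.append((bench, blame_list))
            else:
                still_pending.append((bench, blame_list, waited + 1))
        return confirmed, still_pending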

> * Implement this feature on all warnings (previous runs, golden
> standards, GCC comparisons)
>
> * Renovate the list of tests and benchmarks, extending their run times
> dynamically instead of running them multiple times, getting the times
> for the core functionality instead of whole-program timing, etc.

Could we create a minimal test-suite that includes only benchmarks that
are known to have little variance and run times greater than some agreed-upon
threshold?  With that in place we could begin the performance tracking
(and hopefully adoption) sooner.
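Such a subset could even be picked mechanically from historical data by
filtering on mean run time and coefficient of variation (a sketch only;
the thresholds and data layout are made up):

    # Hypothetical filter for a low-noise benchmark subset; not actual
    # LNT or test-suite tooling.
    import statistics

    # history: dict mapping benchmark name -> list of past exec times (s)
    def select_stable_benchmarks(history, min_runtime=5.0, max_cv=0.01):
        selected = []
        for name, times in history.items():
            mean = statistics.mean(times)
            cv = (statistics.stdev(times) / mean
                  if len(times) > 1 else float("inf"))
            if mean >= min_runtime and cv <= max_cv:
                selected.append(name)
        return selected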

> I agree with Kristof that, with the world of benchmarks being what it
> is, focusing on test-suite buildbots will probably give the best
> return on investment for the community.
>
> cheers,
> --renato

Kristof/All,
I would be more than happy to contribute to this BOF in any way I can.

 Chad


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation



