[LLVMdev] GSoC Proposal: Profiling Enhancements
Alastair Murray
alastairmurray42 at gmail.com
Thu Apr 5 11:18:51 PDT 2012
Hello Everyone,
Before I get started I just want to sincerely apologise for not getting
feedback on this earlier. I've had an extremely busy week as I was
presenting a paper at the CGO conference. If anyone is able to provide
feedback in such a short time-frame then it will be gratefully received.
If not, then I just hope the work described sounds useful. I have
already submitted this proposal due to the short time until the
deadline, but I can make changes. (By the way, no need to worry about
me being so busy during the coding period, see below.)
Proposal
--------
=== Overview ===
LLVM already contains a profiling framework, but only a handful of
transforms make use of the metadata. Further, it even contains a path
profiling framework, but no transforms make use of it. This "Google
Summer of Code" proposal lays out an achievable plan to enhance
profiling in LLVM and to use profiling metadata in key transformations
where it can have a strong positive effect.
=== Why this is Useful for LLVM ===
Many profiling-data friendly transformations in LLVM do not currently
use that data. Specific examples are loop-unrolling, loop-unswitching
and inlining that can all find good performance improvements, though at
the cost of increased code-size. Using profiling data with these three
transformations would allow faster code in hot areas by applying these
more aggressively and smaller code cold areas where minimising
instruction cache cold-hits is the key performance concern.
Additionally, the path profiling framework was added to LLVM in January
2011 but hasn't been touched since (except for wide-spread API updates).
As no transforms use the the produced metadata the code never gets
tested (though the mailing lists suggest a small number of people may
still be using it for external purposes). This code is in serious
danger of suffering from "bit-rot". This proposal suggests enhancing a
transformation to use this information to improve code-quality, and as a
side-effect this stale code will be testable. Superblock formation
already has a basic implementation in LLVM (it is called
Tail-Duplication) but it does not use profiling information. It can be
improved by using either basic profiling information (for small gains)
or path profiling information (for larger gains).
The aim of this proposal is that it should result in code which really
will be integrated into LLVM. Scanning a small subset of previous
"Google Summer of Code" efforts (for many projects, not just LLVM) it
appeared that few projects were actually able to use the contributed
code. Often students would do a good job, but would not quite truly
complete the work. Once summer was over no-one had time to polish the
code, so it would be left unused. With this in mind I am proposing an
achievable but useful project which is decomposable into related
sub-goals. Time has been left in the proposal plan to produce code of
high enough quality to be committed to the LLVM repository, and to have
time to find sensible default parameters for heuristics that are
changed. Also, risk is reduced as the project is decomposable into
seven largely independent sub-goals, even if one aspect does not work,
the other six should still be usable by the LLVM project.
=== About Me ===
I am a PhD student at the University of Edinburgh, in the UK. I am very
close to the end of my studies and have a few months of time available
that line-up perfectly with "Google Summer of Code" (I won't describe
why I have the time here, but feel free to ask me via private email or
IM if you're curious). During my PhD I produced seven papers (five as
first author) based on research built on the GCC and SUIF compilers. I
have over four years of full-time experience working with compilers, and
another two years during my undergraduate degree where compilers were
part of my studies. My research has focussed on both code-generation
and middle-end transformations (my publications:
http://homepages.inf.ed.ac.uk/s0233454/pubs.html). In the past I have
also undertaken internships which involved hand-coding assembly, so I've
no shortage of experience working on optimisation.
I have not worked with LLVM before, but I am genuinely very keen to do
so. Looking at job advertisements and having just attended CGO 2012 I
noticed at both that LLVM is now used more than GCC (required for more
job postings, used in more papers). That is one reason why this
proposal touches multiple transformations within LLVM, I'd like to learn
as much as possible about LLVM while using my existing skills to produce
something useful for the project. The opportunity to have an expert
help mentor me through LLVM is an excellent opportunity. Hopefully my
general compiler experience can offset my lack of LLVM experience as it
always fun to get to use new technologies. Additionally I have worked
on multiple projects written in C++ during my studies so the programming
aspects of this proposal will not present any issues.
Contact information:
* CUT -- This is in the official submission *
=== Further Proposal Details ===
All of the below used http://llvm.org/OpenProjects.html and
http://nondot.org/sabre/LLVMNotes/ as a starting point.
The plan is to become familiar with working within the LLVM
infrastructure during the "Bonding Period". The best way to do that, of
course, is to get stuck in, so I plan to add a few small benchmark items
to "test-suite". One of the above links says that DSPStone and UTDSP
should be added. As I have worked with both of these before I will add
these to "test-suite". The process of doing so is probably the best way
to learn how the LLVM testing infrastructure works. UTDSP, however, is
non-free for commercial use, so I will only be able to add that as an
external benchmark (like SPEC).
Further to this, the path-profiling pass currently in LLVM does not seem
to have a test-case in "test", so I will create one to aid me in
learning how verification works in LLVM.
Once the coding period begins I will first want to start working with
the profiling infrastructure directly. I will modify the profiler so
that run-time instrumentation is not inserted for loops with iterations
counts that are known at compile time. This will be added to
OptimalEdgeProfiling.cpp, and assuming it has a very-low compile-time
overhead it will also be added to EdgeProfiling.cpp. Test-cases will be
created for "test", and compile-time overhead and run-time benefits will
be measured using "test-suite".
After this three transforms will be modified to exploit profiling data:
loop unrolling (LoopUnrollPass.cpp), unswitching (LoopUnswitch.cpp) and
inlining (Inliner.cpp). This will be done by modifying the cost
heuristics (e.g. in LoopUnroll::runOnLoop or Inliner::shouldInline).
Initially this will be done by raising the threshold which limits the
transformation on hot code, and lowering it on cold code (changing both
so perhaps the overall produced code size will not change much). If
required (or if there is time) more sophisticated heuristics will be
evaluated, e.g. using the caller/callee site execution ratio in the
inliner. Attempts will also be made to try and keep profiling data
correct after transformations, so e.g. a hot loop with a heavily biased
conditional branch inside it can be unswitched and have only one side
unrolled. Test-cases will be added to "test" designed so that identical
loops and functions exist but with different execution counts, so
differing behaviour on the hot/cold samples should be observed.
Performance will be evaluated using "test-suite" again, though with one
addition. To ensure that the produced code is not over-specialised to
the training profiling data an additional set of data will be used.
MiDatasets (http://ctuning.org/wiki/index.php/CTools:CBench:Downloads)
provides alternatives inputs for the MiBench suite, so that will be used
to test this.
The next large chunk of work will be to make use of the exisiting path
profiling code (PathProfiling.cpp). Superblock Formation will be done
via Tail Duplication (already implemented in TailDuplication.cpp).
Basic profiling data should provide small gains for this, but the true
expected gains come from using path-profiling to create the superblock
such that there is a "trace" through it. For the hot case should result
in good instruction locality, good branch-prediction behaviour and good
scheduling for the "trace". Path-profiling may need to be fixed or
enhanced for this, as it is currently unused it is difficult to know how
well it will work. (Note: I'm stating that it is unused based on the
fact that nothing includes PathProfileInfo.h except the path profiling
itself — it is, however, possible that tools outside the core llvm use
it though I couldn't find anything by means of an internet search). This
will be tested and evaluated in the same way as the previously modified
transformations.
In all the above cases the modifications will be made with the aim that
when profiling data is not available then compiler behaviour will remain
unchanged from its current state. This will be verified by ensuring
identical binaries are produced for "llvm-suite" (LLVM-head vs my local
LLVM without profiling).
The final changes to be made will be to "llvm-prof". It has been noted
on the mailing list that it is not currently compatible with path-based
profiling, so support for this will be added. Finally, overall
performance evaluations and code-cleanups will be performed with the aim
of having the code integrated into LLVM. This will hopefully be done
with the help of the llvm-commits mailing list.
=== Deliverables ===
* Additions to test-suite.
* Reduced profile instrumentation run-time overhead.
* Profile enhanced loop-unrolling.
* Profile enhanced loop-unswitching.
* Profile enhanced inlining.
* Tested (and fixed if required) path profiling.
* Profile and path profile enhanced tail-duplication
(superblock formation).
* llvm-prof fixed to work with path profiling.
=== Timeline ===
During "Bonding Period":
* Install/Setup LLVM (and Clang etc.) and run "test".
* Install "test-suite" and run with profiling.
* Re-read textbooks/papers concerning profiling, path-profiling and
relevant transformations.
* Read any relevant LLVM documentation.
* Add DSPStone to test-suite. (I have worked with it before.)
* Add UTDSP to test-suite as an "External" benchmark suite
(like SPEC). (I have also worked with this before.)
* Write test for path-profiling (there is currently not one
in "test").
Week 1: Eliminate profiling instrumentation for loops where
the iteration count is known at compile time.
Weeks 2-5: Have the heuristics for loop unrolling, unswitching
and inlining use profiling information. Add MiDatasets
to test-suite (this may not be suitable for the
repository though).
Weeks 6-9: Enhance Tail-Duplication to use first basic profiling
and then path profiling. Test and fix path profiling.
Week 10: Fix llvm-prof to work with path-profiling.
Week 11: Performance analysis and code-cleanup.
Week 12: Separate code into sets of independent patches for review
on llvm-commits mailing list. Write final reports.
Some aspects of weeks 12 may well be ready for review earlier than week
12, but I don't want to over-complicate the timeline.
Thank you for taking the time to read this,
Alatair Murray.
More information about the llvm-dev
mailing list