[LLVMdev] GSoC Proposal: Profiling Enhancements

Thu Apr 5 11:18:51 PDT 2012

Hello Everyone,

Before I get started I just want to sincerely apologise for not getting 
feedback on this earlier.  I've had an extremely busy week as I was 
presenting a paper at the CGO conference.  If anyone is able to provide 
feedback in such a short time-frame then it will be gratefully received. 
  If not, then I just hope the work described sounds useful.  I have 
already submitted this proposal due to the short time until the 
deadline, but I can make changes.  (By the way, no need to worry about 
me being so busy during the coding period, see below.)

Proposal
--------

=== Overview ===

LLVM already contains a profiling framework, but only a handful of 
transforms make use of the metadata.  Further, it even contains a path 
profiling framework, but no transforms make use of it.  This "Google 
Summer of Code" proposal lays out an achievable plan to enhance 
profiling in LLVM and to use profiling metadata in key transformations 
where it can have a strong positive effect.

=== Why this is Useful for LLVM ===

Many profiling-data friendly transformations in LLVM do not currently 
use that data.  Specific examples are loop-unrolling, loop-unswitching 
and inlining that can all find good performance improvements, though at 
the cost of increased code-size.  Using profiling data with these three 
transformations would allow faster code in hot areas by applying these 
more aggressively and smaller code cold areas where minimising 
instruction cache cold-hits is the key performance concern.

Additionally, the path profiling framework was added to LLVM in January 
2011 but hasn't been touched since (except for wide-spread API updates). 
  As no transforms use the the produced metadata the code never gets 
tested (though the mailing lists suggest a small number of people may 
still be using it for external purposes).  This code is in serious 
danger of suffering from "bit-rot".  This proposal suggests enhancing a 
transformation to use this information to improve code-quality, and as a 
side-effect this stale code will be testable. Superblock formation 
already has a basic implementation in LLVM (it is called 
Tail-Duplication) but it does not use profiling information.  It can be 
improved by using either basic profiling information (for small gains) 
or path profiling information (for larger gains).

The aim of this proposal is that it should result in code which really 
will be integrated into LLVM.  Scanning a small subset of previous 
"Google Summer of Code" efforts (for many projects, not just LLVM) it 
appeared that few projects were actually able to use the contributed 
code.  Often students would do a good job, but would not quite truly 
complete the work.  Once summer was over no-one had time to polish the 
code, so it would be left unused.  With this in mind I am proposing an 
achievable but useful project which is decomposable into related 
sub-goals.  Time has been left in the proposal plan to produce code of 
high enough quality to be committed to the LLVM repository, and to have 
time to find sensible default parameters for heuristics that are 
changed.  Also, risk is reduced as the project is decomposable into 
seven largely independent sub-goals, even if one aspect does not work, 
the other six should still be usable by the LLVM project.

=== About Me ===

I am a PhD student at the University of Edinburgh, in the UK.  I am very 
close to the end of my studies and have a few months of time available 
that line-up perfectly with "Google Summer of Code" (I won't describe 
why I have the time here, but feel free to ask me via private email or 
IM if you're curious).  During my PhD I produced seven papers (five as 
first author) based on research built on the GCC and SUIF compilers.  I 
have over four years of full-time experience working with compilers, and 
another two years during my undergraduate degree where compilers were 
part of my studies.  My research has focussed on both code-generation 
and middle-end transformations (my publications: 
http://homepages.inf.ed.ac.uk/s0233454/pubs.html).  In the past I have 
also undertaken internships which involved hand-coding assembly, so I've 
no shortage of experience working on optimisation.

I have not worked with LLVM before, but I am genuinely very keen to do 
so.  Looking at job advertisements and having just attended CGO 2012 I 
noticed at both that LLVM is now used more than GCC (required for more 
job postings, used in more papers).  That is one reason why this 
proposal touches multiple transformations within LLVM, I'd like to learn 
as much as possible about LLVM while using my existing skills to produce 
something useful for the project.  The opportunity to have an expert 
help mentor me through LLVM is an excellent opportunity.  Hopefully my 
general compiler experience can offset my lack of LLVM experience as it 
always fun to get to use new technologies.  Additionally I have worked 
on multiple projects written in C++ during my studies so the programming 
aspects of this proposal will not present any issues.

Contact information:
	* CUT -- This is in the official submission *

=== Further Proposal Details ===

All of the below used http://llvm.org/OpenProjects.html and 
http://nondot.org/sabre/LLVMNotes/ as a starting point.

The plan is to become familiar with working within the LLVM 
infrastructure during the "Bonding Period".  The best way to do that, of 
course, is to get stuck in, so I plan to add a few small benchmark items 
to "test-suite".  One of the above links says that DSPStone and UTDSP 
should be added.  As I have worked with both of these before I will add 
these to "test-suite".  The process of doing so is probably the best way 
to learn how the LLVM testing infrastructure works.  UTDSP, however, is 
non-free for commercial use, so I will only be able to add that as an 
external benchmark (like SPEC).

Further to this, the path-profiling pass currently in LLVM does not seem 
to have a test-case in "test", so I will create one to aid me in 
learning how verification works in LLVM.

Once the coding period begins I will first want to start working with 
the profiling infrastructure directly.  I will modify the profiler so 
that run-time instrumentation is not inserted for loops with iterations 
counts that are known at compile time.  This will be added to 
OptimalEdgeProfiling.cpp, and assuming it has a very-low compile-time 
overhead it will also be added to EdgeProfiling.cpp.  Test-cases will be 
created for "test", and compile-time overhead and run-time benefits will 
be measured using "test-suite".

After this three transforms will be modified to exploit profiling data: 
loop unrolling (LoopUnrollPass.cpp), unswitching (LoopUnswitch.cpp) and 
inlining (Inliner.cpp).  This will be done by modifying the cost 
heuristics (e.g. in LoopUnroll::runOnLoop or Inliner::shouldInline). 
Initially this will be done by raising the threshold which limits the 
transformation on hot code, and lowering it on cold code (changing both 
so perhaps the overall produced code size will not change much).  If 
required (or if there is time) more sophisticated heuristics will be 
evaluated, e.g. using the caller/callee site execution ratio in the 
inliner.  Attempts will also be made to try and keep profiling data 
correct after transformations, so e.g. a hot loop with a heavily biased 
conditional branch inside it can be unswitched and have only one side 
unrolled.  Test-cases will be added to "test" designed so that identical 
loops and functions exist but with different execution counts, so 
differing behaviour on the hot/cold samples should be observed. 
Performance will be evaluated using "test-suite" again, though with one 
addition.  To ensure that the produced code is not over-specialised to 
the training profiling data an additional set of data will be used. 
MiDatasets (http://ctuning.org/wiki/index.php/CTools:CBench:Downloads) 
provides alternatives inputs for the MiBench suite, so that will be used 
to test this.

The next large chunk of work will be to make use of the exisiting path 
profiling code (PathProfiling.cpp).  Superblock Formation will be done 
via Tail Duplication (already implemented in TailDuplication.cpp). 
Basic profiling data should provide small gains for this, but the true 
expected gains come from using path-profiling to create the superblock 
such that there is a "trace" through it.  For the hot case should result 
in good instruction locality, good branch-prediction behaviour and good 
scheduling for the "trace".  Path-profiling may need to be fixed or 
enhanced for this, as it is currently unused it is difficult to know how 
well it will work. (Note: I'm stating that it is unused based on the 
fact that nothing includes PathProfileInfo.h except the path profiling 
itself — it is, however, possible that tools outside the core llvm use 
it though I couldn't find anything by means of an internet search). This 
will be tested and evaluated in the same way as the previously modified 
transformations.

In all the above cases the modifications will be made with the aim that 
when profiling data is not available then compiler behaviour will remain 
unchanged from its current state.  This will be verified by ensuring 
identical binaries are produced for "llvm-suite" (LLVM-head vs my local 
LLVM without profiling).

The final changes to be made will be to "llvm-prof". It has been noted 
on the mailing list that it is not currently compatible with path-based 
profiling, so support for this will be added.  Finally, overall 
performance evaluations and code-cleanups will be performed with the aim 
of having the code integrated into LLVM.  This will hopefully be done 
with the help of the llvm-commits mailing list.

=== Deliverables ===

* Additions to test-suite.
* Reduced profile instrumentation run-time overhead.
* Profile enhanced loop-unrolling.
* Profile enhanced loop-unswitching.
* Profile enhanced inlining.
* Tested (and fixed if required) path profiling.
* Profile and path profile enhanced tail-duplication
   (superblock formation).
* llvm-prof fixed to work with path profiling.

=== Timeline ===

During "Bonding Period":
	* Install/Setup LLVM (and Clang etc.) and run "test".
	* Install "test-suite" and run with profiling.
	* Re-read textbooks/papers concerning profiling, path-profiling and 
relevant transformations.
	* Read any relevant LLVM documentation.
	* Add DSPStone to test-suite. (I have worked with it before.)
	* Add UTDSP to test-suite as an "External" benchmark suite
           (like SPEC). (I have also worked with this before.)
	* Write test for path-profiling (there is currently not one
           in "test").

Week 1:  Eliminate profiling instrumentation for loops where
          the iteration count is known at compile time.
Weeks 2-5:  Have the heuristics for loop unrolling, unswitching
             and inlining use profiling information. Add MiDatasets
             to test-suite (this may not be suitable for the
             repository though).
Weeks 6-9: Enhance Tail-Duplication to use first basic profiling
            and then path profiling.  Test and fix path profiling.
Week 10: Fix llvm-prof to work with path-profiling.
Week 11: Performance analysis and code-cleanup.
Week 12: Separate code into sets of independent patches for review
          on llvm-commits mailing list. Write final reports.

Some aspects of weeks 12 may well be ready for review earlier than week 
12, but I don't want to over-complicate the timeline.

Thank you for taking the time to read this,
Alatair Murray.