[llvm-dev] [GSoc 2016] Proposal - Capture Tracking Improvements

Fri Apr 22 09:32:22 PDT 2016

(I've copied the body of the text from the PDF into the email and am 
replying inline to particularly parts.)

On 04/16/2016 10:43 AM, Scott Egerton wrote:
> Hello,
>
> Attached is the proposal that I have submitted. I would be grateful for any and all feedback provided.
>
> Many Thanks,
> Scott
>
> C APTURE T RACKING I MPROVEMENTS
> ABSTRACT
> Capture tracking analysis is an analysis pass used to determine which pointers are
> “captured”. This means that a function has made a copy of a pointer that has outlived
> the function that called it.
Subtle but important clarification: "has outlived" to "may potentially 
outlive".  As with most compiler analysis, we have to be conservative 
when we can't fully analyze.
> This information is useful during the optimisation process. I
> believe that this is used to improve memory management.
Via aliasing primarily.  Yes.
> The capture tracking analysis is currently inefficient and inaccurate in cases due to
> the fact that it returns false positives and expensive functions are unnecessarily
> repeated.
We'll always have false positives.  The trick is to get a reasonably 
small number with a reasonable impact on compile time.  The universal 
problem balance in a compiler.
> This is where I would make improvements. It could be improved in a
> number of ways, as mentioned by Philip Reames on the mailing list. I would like to
> use this opportunity to take my previous experience within LLVM and apply it to other
> areas of LLVM.
>
> BENEFITS
>
> Improvements to pointer capture tracking would be greatly beneficial. Changes such
> as the removal of false positives as well as the reduction to the cost through the
> introduction of caching will be made. Caching the results of certain functions, storing
> the result, will cause performance increases, should the same data be requested
> again as the result will already be stored. As a result of this, it could become a more
> valuable tool to have within the LLVM suite. This would also cause improvements to
> the code optimisation process and potentially increase the quality of code compiled
> with LLVM. It should also use up fewer resources during the analysis pass.
>
> DELIVERABLES
>
> There are cases where “potentially captured” is returned incorrectly. As a
> starter task I would remove the known and unknown cases that cause this to
> occur. This would serve as a good introduction to LLVM compiler analysis and
> would be achievable in a short time span.
> Approximately 2 weeks.
It would be good to gather a concrete list of known false positives.  I 
can help with this, but I'd like you to do a first pass over the code 
and bug tracking system first.  Another approach is to instrument the 
analysis, print each potentially captured result, and scan through the 
output when compiling a largish piece of code.  Do you see a common 
pattern which looks worth analyzing further?

FYI, two weeks may be very optimistic.  I suspect you'll iterate on this 
a couple of times finding more and more cases each time. Getting each 
change implemented and reviewed will take a couple of days.  Thankfully, 
the analysis and review time should easily overlap.  We can also overlap 
this with the design work for the next part.
> By making changes to the current design, this could be made to be more cost
> effective than it currently is. I would do this by caching the results as
> previously suggested. This will be used to invalidate results when required.
> Approximately 6 weeks.
You will definitely need a more concrete proposal for what caching and 
invalidation scheme you're using.  I'd suggest reviewing callers of 
PointerMayBeCaptured to get a sense for what's going on.  I can also 
help with this.  (We should setup a skype call or something if your 
proposal gets approved.)

Once we have a proposal, we'll share this before you start 
implementation.  That will give us a chance to find design problems 
early.  :)
> The analysis could be made more accurate in order to recognise object sub-
> graphs which do not escape.
> Approximately 5 weeks.
I honestly doubt you'll get to this in the summer, but that's okay. Once 
we get closer to actually working on this, we'll need to drill in and 
make a more concrete proposal.  There's a bunch of algorithms which 
could be used here and picking one will take some thought.
> 1 | S COTT E GERTONThis work will link in with other ongoing work within LLVM and will assist other
> developers working on compiler analysis. I believe that I may be able to complete
> this work within 11 weeks; however I have allocated the extra time to account for any
> issues I may encounter.
I'm going to push strongly to have you contributing incrementally 
through the normal LLVM development process.  This will mean that if 
something runs long and you don't get to finish, most of the supporting 
work will already have landed.  (e.g. If we don't get to the object 
graph part, at least we'll have the caching improvements.)
> BIOGRAPHICAL INFORMATION
> I have been working on LLVM for the past year as an industrial placement student
> within the MIPS compiler team and am keen to do more LLVM work. I am a BSc
> Computer Science student at the University of Hull and will begin my final year of
> study in September. In the past I have worked on a University project which required
> me to be inventive with data structures in order to efficiently solve the given problem.
> I thoroughly enjoyed this and would be grateful for another opportunity to exercise
> my design skills.
> Due to other commitments with my current industrial placement, unfortunately I will
> be unable to work on this project during working hours (9-5 GMT) until the first week
> of June. However, I am more than happy to make up this time as the summer
> progresses.
June is fine by me.  If anything, that works better for my schedule anyways.

Philip