[llvm-dev] RFC: Pass to prune redundant profiling instrumentation

Fri Mar 11 14:54:28 PST 2016

On Fri, Mar 11, 2016 at 2:50 PM, Vedant Kumar <vsk at apple.com> wrote:

>
> > On Mar 11, 2016, at 2:25 PM, Justin Bogner <mail at justinbogner.com> wrote:
> >
> > Vedant Kumar <vsk at apple.com> writes:
> >> There have been a lot of responses. I'll try to summarize the thread
> >> and respond to some of the questions/feedback.
> > ...
> >> FE to IR Counter Remapping
> >> ==========================
> >>
> >> I have a question about this plan:
> >>
> >>> for each CFG edge:
> >>>    record which FE counters have ended up associated with it
> >>> remove FE counters
> >>> run IR instrumentation pass
> >>> emit a side table mapping IR instr counters to FE counters
> >>
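To make the quoted plan concrete, here is a minimal Python sketch of its four steps on a toy CFG. Everything in it is an assumption for illustration: the function name remap_counters, the edge and counter encodings, and in particular the stand-in for the IR instrumentation pass, which simply gives every edge its own IR counter. A real IR instrumentation pass would choose which edges to instrument.

```python
def remap_counters(edge_to_fe_counters):
    """Toy model of the FE-to-IR counter remapping plan.

    edge_to_fe_counters maps a CFG edge (src, dst) to the set of
    front-end (FE) counter ids associated with it (hypothetical encoding).
    Returns a side table: IR counter id -> list of FE counter ids.
    """
    # Step 1: for each CFG edge, record the FE counters associated with it.
    recorded = {edge: sorted(fe) for edge, fe in edge_to_fe_counters.items()}

    # Step 2: "remove FE counters". In this model that just means the
    # FE counters are no longer consulted directly, only `recorded` is.

    # Step 3: stand-in for the IR instrumentation pass. Assumption: one
    # IR counter per edge, numbered in sorted edge order.
    ir_counter_for_edge = {edge: i for i, edge in enumerate(sorted(recorded))}

    # Step 4: emit the side table mapping IR counters to FE counters.
    return {ir: recorded[edge] for edge, ir in ir_counter_for_edge.items()}


toy_cfg = {
    ("entry", "then"): {0, 1},
    ("entry", "else"): {0, 2},
    ("then", "exit"): set(),
}
side_table = remap_counters(toy_cfg)
```

With this toy input, an FE counter's value could be recovered by reading the IR counter on any edge whose side-table entry lists it.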
> >> Currently, -instrprof happens early in the pipeline. IIUC this is done to
> >> allow the optimizer to work with load+add+stores, instead of profile update
> >> intrinsics.
> >
> > It would be an interesting experiment to see what it would look like to
> > teach optimizations about the instrprof intrinsics and lower them much
> > later. I suspect knowing that these aren't just stores to random memory
> > would enable us to make good decisions in various places.
>
> Do you think we could get good enough results by attaching !invariant.load or
> AA metadata to lowered profile counter updates?
>
>
> > Of course, this might end up spreading too much special-case knowledge
> > through various optimizations and not be worth it.
> >
> >> Say we introduce a counter remapping pass like the one Sean suggested. It
> >> should be run before -instrprof so that we don't waste time lowering a bunch
> >> of instrprof_increment intrinsics which we'll have to throw away later.
> >>
> >> But that means that the CFGs that the counter remapping pass operates
> >> on won't reflect changes made by the inliner (or any other
> >> optimizations which alter the CFG), right?
> >>
> >> ISTM the pruning pass I've proposed is useful whether we're doing FE-based
> >> instrumentation _or_ late instrumentation. Since it operates on loads+stores
> >> directly, it can clean up redundant counter increments at any point in the
> >> pipeline (after -instrprof).
>
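As a rough illustration of what pruning redundant counter increments could mean, here is a small Python sketch. It is not the proposed pass: it assumes a simplified notion of redundancy in which two counters are redundant if they are incremented by 1 in exactly the same basic blocks. The function name prune_redundant_counters and the block/counter encoding are hypothetical.

```python
def prune_redundant_counters(blocks):
    """Toy pruning over a dict: block name -> list of counter ids,
    each incremented by 1 whenever that block executes (assumption).

    Returns (pruned_blocks, aliases), where aliases maps each removed
    counter to the surviving counter that always holds the same value.
    """
    # Signature of a counter: the blocks that increment it, in sorted order.
    signature = {}
    for name in sorted(blocks):
        for counter in blocks[name]:
            signature.setdefault(counter, []).append(name)

    canonical = {}  # signature -> the counter we keep
    aliases = {}    # removed counter -> kept counter
    for counter in sorted(signature):
        key = tuple(signature[counter])
        if key in canonical:
            aliases[counter] = canonical[key]
        else:
            canonical[key] = counter

    # Drop the increments of every aliased counter.
    pruned = {
        name: [c for c in counters if c not in aliases]
        for name, counters in blocks.items()
    }
    return pruned, aliases


toy_blocks = {"entry": [0, 1], "loop": [2, 3], "exit": [1, 0]}
pruned_blocks, alias_map = prune_redundant_counters(toy_blocks)
```

Because the sketch only looks at which blocks update which counters, it could in principle run on any input that exposes the counter updates, which is the property claimed above for a pass that works on loads+stores after -instrprof.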
> I'd like to add an interesting data point to back this up. Revisiting the
> std::sort example, here's what I get with -fprofile-instrument=llvm (again
> using 10^8 array elements, and averaging over 5 runs):
>
> O3:                       0.262s
> O3 + LLVMInstr:           0.705s
> O3 + LLVMInstr + Pruning: 0.644s (47 counter alias mappings created)
>
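For reference, the quoted timings can be turned into overhead figures with a few lines of Python. The variable names are mine; the three inputs are the numbers quoted above, and nothing else is measured.

```python
# Timings quoted above, in seconds (average of 5 runs).
baseline = 0.262      # O3
instrumented = 0.705  # O3 + LLVMInstr
pruned = 0.644        # O3 + LLVMInstr + Pruning

# Slowdown relative to the uninstrumented build.
slowdown_instrumented = instrumented / baseline
slowdown_pruned = pruned / baseline

# Instrumentation overhead in seconds, with and without pruning.
overhead_instrumented = instrumented - baseline
overhead_pruned = pruned - baseline

# Fraction of total run time and of instrumentation overhead removed by pruning.
runtime_reduction = (instrumented - pruned) / instrumented
overhead_reduction = (overhead_instrumented - overhead_pruned) / overhead_instrumented

print(f"slowdown: {slowdown_instrumented:.2f}x -> {slowdown_pruned:.2f}x")
print(f"run time reduced by {runtime_reduction:.1%}")
print(f"instrumentation overhead reduced by {overhead_reduction:.1%}")
```

So pruning takes the slowdown from roughly 2.7x to roughly 2.5x, which is about 9% off the instrumented run time and about 14% off the instrumentation overhead.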

There is an LLVM pipeline change for LLVM instrumentation pending. Once that is
in, the benefit shown here will probably disappear.

David

> So, it *is* possible for us to see real performance improvements by running a
> pruning pass after late IR-based instrumentation.
>
> I still think we need more numbers before moving forward, and will work on
> that.
>
> vedant