[PATCH] D11579: [InstrProfiling] Fix data race on profile counters by using AtomicRMW.

Thu Aug 6 16:00:16 PDT 2015

On Thu, Jul 30, 2015 at 10:27 PM, Xinliang David Li <davidxl at google.com>
wrote:

> On Thu, Jul 30, 2015 at 10:15 PM, Sean Silva <chisophugis at gmail.com>
> wrote:
> >
> >
> > On Thu, Jul 30, 2015 at 8:10 PM, Jake VanAdrighem <
> jvanadrighem at gmail.com>
> > wrote:
> >>
> >>
> >>
> >> On Wed, Jul 29, 2015 at 7:27 PM, Sean Silva <chisophugis at gmail.com>
> wrote:
> >>>
> >>>
> >>>
> >>> On Wed, Jul 29, 2015 at 6:55 PM, Bob Wilson <bob.wilson at apple.com>
> wrote:
> >>>>
> >>>>
> >>>> On Jul 29, 2015, at 4:56 PM, Alexey Samsonov <vonosmas at gmail.com>
> wrote:
> >>>>
> >>>> Do relaxed atomics actually introduce that much of slowdown?
> >>>>
> >>>>
> >>>> I would definitely want to see some data showing that they do not slow
> >>>> things down before we decide to do this unconditionally. We’ve
> discussed
> >>>> this issue several times in the past. My recollection is those
> discussions
> >>>> ended with an acknowledgement there is a tradeoff between speed and
> accuracy
> >>>> and that we don’t all agree on where we want to be on that spectrum.
> Adding
> >>>> options to let people choose would be one solution. Good data, on a
> variety
> >>>> of platforms, showing that it doesn’t make much difference would be
> another
> >>>> way to resolve it.
> >>>
> >>>
> >>> +1
> >>>
> >>> In my testing, the overhead of the existing instrumentation is about a
> 2x
> >>> slowdown, which is starting to get close to the range it would be very
> >>> difficult to play an instrumented game. I wouldn't want to make this
> much
> >>> slower. I'm glad to test this for you; I'll try to get around to this
> this
> >>> week.
> >>>
> >>
> >> Sean and I tested Alexey's patch on one of our large titles and got
> >> somewhere in the area of 2.5 to 3x worse performance than without
> AtomicRMW.
> >> For the game we tested, it was basically unplayable.
> >
> >
> > We also noticed that the top 100 functions (out of 10's of thousands;
> i.e.
> > <1%) accounted for over 97% of the total counts (and just the top 10
> cover
> > more than 50% IIRC). The few we manually looked into seemed like they
> would
> > be trivially inlined anyway and/or their high count otherwise doesn't
> seem
> > to really contribute much useful information to the optimizer since we
> > should already be "getting those right" (stuff like simple constructors,
> > getters, or vector operators).
>
> That matches exactly with our observation and the approach we take is
> to address this issue.
>
> >
> > Obviously this is a discussion for another thread, but there seems to be
> > enormous scope for reduction in our profiling overhead for those
> interested
> > in doing that; even just a simple file of 100 functions passed to the
> > compiler instructing those functions to not be instrumented would
> decrease
> > the profiling overhead by over 2 orders of magnitude for this title,
> based
> > on the above data.
> >
>
> yes, this will be another thread. Rong (cc'ed ) has collected lots of
> data (as well as the implementation to address this). He will share it
> early next week.
>

Was Rong going to post something this week?

-- Sean Silva

>
> David
>
>
> > -- Sean Silva
> >
> >
> >>
> >>
> >> Jake Van Adrighem
> >>
> >>>
> >>> Also, in the past David Li suggested that his findings were that not
> >>> using atomic operations "only contribute
> >>> to very small count variations"
> >>> https://groups.google.com/d/msg/llvm-dev/ScLa2xIdo9s/Ow1FPDVVRIoJ
> >>> CC'ing David in case he has more input to the discussion.
> >>>
> >>> -- Sean Silva
> >>>
> >>>
> >>>>
> >>>>
> >>>>
> >>>> You're intentionally introducing a data race, this doesn't look good
> to
> >>>> me at all. However, I'm not confident about
> >>>> what's allowed in LLVM IR - it's not C++ where any source-level data
> >>>> race is UB, but not an x86 assembly either.
> >>>>
> >>>> On Wed, Jul 29, 2015 at 4:43 PM, Justin Bogner <mail at justinbogner.com
> >
> >>>> wrote:
> >>>>>
> >>>>> Alexey Samsonov <vonosmas at gmail.com> writes:
> >>>>> > samsonov created this revision.
> >>>>> > samsonov added reviewers: dnovillo, bogner.
> >>>>> > samsonov added a subscriber: llvm-commits.
> >>>>> >
> >>>>> > Since we introduced counters for functions in COMDAT sections (e.g.
> >>>>> > inline functions from STL headers), these headers can easily be
> >>>>> > incremented concurrently by multiple threads. Replace
> load-add-store
> >>>>> > with a single "atomicrmw add" with monotonic memory ordering.
> >>>>>
> >>>>> This significantly changes the performance characteristics of this
> >>>>> code,
> >>>>> pessimizing single-threaded users and potentially making the
> >>>>> multithreaded performance issues even worse.
> >>>>>
> >>>>> It's fine to add an option to lower these to atomics, since this does
> >>>>> guarantee accuracy, but I think we need a switch to choose which kind
> >>>>> of
> >>>>> lowering we're interested in in that case.
> >>>>>
> >>>>> > http://reviews.llvm.org/D11579
> >>>>> >
> >>>>> > Files:
> >>>>> >   lib/Transforms/Instrumentation/InstrProfiling.cpp
> >>>>> >
> >>>>> > Index: lib/Transforms/Instrumentation/InstrProfiling.cpp
> >>>>> > ===================================================================
> >>>>> > --- lib/Transforms/Instrumentation/InstrProfiling.cpp
> >>>>> > +++ lib/Transforms/Instrumentation/InstrProfiling.cpp
> >>>>> > @@ -147,9 +147,9 @@
> >>>>> >    IRBuilder<> Builder(Inc->getParent(), *Inc);
> >>>>> >    uint64_t Index = Inc->getIndex()->getZExtValue();
> >>>>> >    Value *Addr = Builder.CreateConstInBoundsGEP2_64(Counters, 0,
> >>>>> > Index);
> >>>>> > -  Value *Count = Builder.CreateLoad(Addr, "pgocount");
> >>>>> > -  Count = Builder.CreateAdd(Count, Builder.getInt64(1));
> >>>>> > -  Inc->replaceAllUsesWith(Builder.CreateStore(Count, Addr));
> >>>>> > +  Builder.CreateAtomicRMW(AtomicRMWInst::Add, Addr,
> >>>>> > Builder.getInt64(1),
> >>>>> > +                          llvm::Monotonic);
> >>>>> > +  assert(Inc->use_empty() && "InstrProfIncrementInst has uses!");
> >>>>> >    Inc->eraseFromParent();
> >>>>> >  }
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Alexey Samsonov
> >>>> vonosmas at gmail.com
> >>>> _______________________________________________
> >>>> llvm-commits mailing list
> >>>> llvm-commits at cs.uiuc.edu
> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> llvm-commits mailing list
> >>>> llvm-commits at cs.uiuc.edu
> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>>
> >>>
> >>>
> >>> _______________________________________________
> >>> llvm-commits mailing list
> >>> llvm-commits at cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>
> >>
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150806/95145315/attachment.html>