[PATCH] D11579: [InstrProfiling] Fix data race on profile counters by using AtomicRMW.

Thu Aug 6 16:07:27 PDT 2015

yes -- hopefully tomorrow ..

David

On Thu, Aug 6, 2015 at 4:00 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
> On Thu, Jul 30, 2015 at 10:27 PM, Xinliang David Li <davidxl at google.com>
> wrote:
>>
>> On Thu, Jul 30, 2015 at 10:15 PM, Sean Silva <chisophugis at gmail.com>
>> wrote:
>> >
>> >
>> > On Thu, Jul 30, 2015 at 8:10 PM, Jake VanAdrighem
>> > <jvanadrighem at gmail.com>
>> > wrote:
>> >>
>> >>
>> >>
>> >> On Wed, Jul 29, 2015 at 7:27 PM, Sean Silva <chisophugis at gmail.com>
>> >> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Wed, Jul 29, 2015 at 6:55 PM, Bob Wilson <bob.wilson at apple.com>
>> >>> wrote:
>> >>>>
>> >>>>
>> >>>> On Jul 29, 2015, at 4:56 PM, Alexey Samsonov <vonosmas at gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>> Do relaxed atomics actually introduce that much of slowdown?
>> >>>>
>> >>>>
>> >>>> I would definitely want to see some data showing that they do not
>> >>>> slow
>> >>>> things down before we decide to do this unconditionally. We’ve
>> >>>> discussed
>> >>>> this issue several times in the past. My recollection is those
>> >>>> discussions
>> >>>> ended with an acknowledgement there is a tradeoff between speed and
>> >>>> accuracy
>> >>>> and that we don’t all agree on where we want to be on that spectrum.
>> >>>> Adding
>> >>>> options to let people choose would be one solution. Good data, on a
>> >>>> variety
>> >>>> of platforms, showing that it doesn’t make much difference would be
>> >>>> another
>> >>>> way to resolve it.
>> >>>
>> >>>
>> >>> +1
>> >>>
>> >>> In my testing, the overhead of the existing instrumentation is about a
>> >>> 2x
>> >>> slowdown, which is starting to get close to the range it would be very
>> >>> difficult to play an instrumented game. I wouldn't want to make this
>> >>> much
>> >>> slower. I'm glad to test this for you; I'll try to get around to this
>> >>> this
>> >>> week.
>> >>>
>> >>
>> >> Sean and I tested Alexey's patch on one of our large titles and got
>> >> somewhere in the area of 2.5 to 3x worse performance than without
>> >> AtomicRMW.
>> >> For the game we tested, it was basically unplayable.
>> >
>> >
>> > We also noticed that the top 100 functions (out of 10's of thousands;
>> > i.e.
>> > <1%) accounted for over 97% of the total counts (and just the top 10
>> > cover
>> > more than 50% IIRC). The few we manually looked into seemed like they
>> > would
>> > be trivially inlined anyway and/or their high count otherwise doesn't
>> > seem
>> > to really contribute much useful information to the optimizer since we
>> > should already be "getting those right" (stuff like simple constructors,
>> > getters, or vector operators).
>>
>> That matches exactly with our observation and the approach we take is
>> to address this issue.
>>
>> >
>> > Obviously this is a discussion for another thread, but there seems to be
>> > enormous scope for reduction in our profiling overhead for those
>> > interested
>> > in doing that; even just a simple file of 100 functions passed to the
>> > compiler instructing those functions to not be instrumented would
>> > decrease
>> > the profiling overhead by over 2 orders of magnitude for this title,
>> > based
>> > on the above data.
>> >
>>
>> yes, this will be another thread. Rong (cc'ed ) has collected lots of
>> data (as well as the implementation to address this). He will share it
>> early next week.
>
>
> Was Rong going to post something this week?
>
> -- Sean Silva
>
>>
>>
>> David
>>
>>
>> > -- Sean Silva
>> >
>> >
>> >>
>> >>
>> >> Jake Van Adrighem
>> >>
>> >>>
>> >>> Also, in the past David Li suggested that his findings were that not
>> >>> using atomic operations "only contribute
>> >>> to very small count variations"
>> >>> https://groups.google.com/d/msg/llvm-dev/ScLa2xIdo9s/Ow1FPDVVRIoJ
>> >>> CC'ing David in case he has more input to the discussion.
>> >>>
>> >>> -- Sean Silva
>> >>>
>> >>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> You're intentionally introducing a data race, this doesn't look good
>> >>>> to
>> >>>> me at all. However, I'm not confident about
>> >>>> what's allowed in LLVM IR - it's not C++ where any source-level data
>> >>>> race is UB, but not an x86 assembly either.
>> >>>>
>> >>>> On Wed, Jul 29, 2015 at 4:43 PM, Justin Bogner
>> >>>> <mail at justinbogner.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> Alexey Samsonov <vonosmas at gmail.com> writes:
>> >>>>> > samsonov created this revision.
>> >>>>> > samsonov added reviewers: dnovillo, bogner.
>> >>>>> > samsonov added a subscriber: llvm-commits.
>> >>>>> >
>> >>>>> > Since we introduced counters for functions in COMDAT sections
>> >>>>> > (e.g.
>> >>>>> > inline functions from STL headers), these headers can easily be
>> >>>>> > incremented concurrently by multiple threads. Replace
>> >>>>> > load-add-store
>> >>>>> > with a single "atomicrmw add" with monotonic memory ordering.
>> >>>>>
>> >>>>> This significantly changes the performance characteristics of this
>> >>>>> code,
>> >>>>> pessimizing single-threaded users and potentially making the
>> >>>>> multithreaded performance issues even worse.
>> >>>>>
>> >>>>> It's fine to add an option to lower these to atomics, since this
>> >>>>> does
>> >>>>> guarantee accuracy, but I think we need a switch to choose which
>> >>>>> kind
>> >>>>> of
>> >>>>> lowering we're interested in in that case.
>> >>>>>
>> >>>>> > http://reviews.llvm.org/D11579
>> >>>>> >
>> >>>>> > Files:
>> >>>>> >   lib/Transforms/Instrumentation/InstrProfiling.cpp
>> >>>>> >
>> >>>>> > Index: lib/Transforms/Instrumentation/InstrProfiling.cpp
>> >>>>> >
>> >>>>> > ===================================================================
>> >>>>> > --- lib/Transforms/Instrumentation/InstrProfiling.cpp
>> >>>>> > +++ lib/Transforms/Instrumentation/InstrProfiling.cpp
>> >>>>> > @@ -147,9 +147,9 @@
>> >>>>> >    IRBuilder<> Builder(Inc->getParent(), *Inc);
>> >>>>> >    uint64_t Index = Inc->getIndex()->getZExtValue();
>> >>>>> >    Value *Addr = Builder.CreateConstInBoundsGEP2_64(Counters, 0,
>> >>>>> > Index);
>> >>>>> > -  Value *Count = Builder.CreateLoad(Addr, "pgocount");
>> >>>>> > -  Count = Builder.CreateAdd(Count, Builder.getInt64(1));
>> >>>>> > -  Inc->replaceAllUsesWith(Builder.CreateStore(Count, Addr));
>> >>>>> > +  Builder.CreateAtomicRMW(AtomicRMWInst::Add, Addr,
>> >>>>> > Builder.getInt64(1),
>> >>>>> > +                          llvm::Monotonic);
>> >>>>> > +  assert(Inc->use_empty() && "InstrProfIncrementInst has uses!");
>> >>>>> >    Inc->eraseFromParent();
>> >>>>> >  }
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Alexey Samsonov
>> >>>> vonosmas at gmail.com
>> >>>> _______________________________________________
>> >>>> llvm-commits mailing list
>> >>>> llvm-commits at cs.uiuc.edu
>> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> >>>>
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> llvm-commits mailing list
>> >>>> llvm-commits at cs.uiuc.edu
>> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> >>>>
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> llvm-commits mailing list
>> >>> llvm-commits at cs.uiuc.edu
>> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> >>>
>> >>
>> >
>
>