[PATCH] D11579: [InstrProfiling] Fix data race on profile counters by using AtomicRMW.

Thu Jul 30 22:27:42 PDT 2015

On Thu, Jul 30, 2015 at 10:15 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>
> On Thu, Jul 30, 2015 at 8:10 PM, Jake VanAdrighem <jvanadrighem at gmail.com>
> wrote:
>>
>>
>>
>> On Wed, Jul 29, 2015 at 7:27 PM, Sean Silva <chisophugis at gmail.com> wrote:
>>>
>>>
>>>
>>> On Wed, Jul 29, 2015 at 6:55 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>>>>
>>>>
>>>> On Jul 29, 2015, at 4:56 PM, Alexey Samsonov <vonosmas at gmail.com> wrote:
>>>>
>>>> Do relaxed atomics actually introduce that much of slowdown?
>>>>
>>>>
>>>> I would definitely want to see some data showing that they do not slow
>>>> things down before we decide to do this unconditionally. We’ve discussed
>>>> this issue several times in the past. My recollection is those discussions
>>>> ended with an acknowledgement there is a tradeoff between speed and accuracy
>>>> and that we don’t all agree on where we want to be on that spectrum. Adding
>>>> options to let people choose would be one solution. Good data, on a variety
>>>> of platforms, showing that it doesn’t make much difference would be another
>>>> way to resolve it.
>>>
>>>
>>> +1
>>>
>>> In my testing, the overhead of the existing instrumentation is about a 2x
>>> slowdown, which is starting to get close to the range it would be very
>>> difficult to play an instrumented game. I wouldn't want to make this much
>>> slower. I'm glad to test this for you; I'll try to get around to this this
>>> week.
>>>
>>
>> Sean and I tested Alexey's patch on one of our large titles and got
>> somewhere in the area of 2.5 to 3x worse performance than without AtomicRMW.
>> For the game we tested, it was basically unplayable.
>
>
> We also noticed that the top 100 functions (out of 10's of thousands; i.e.
> <1%) accounted for over 97% of the total counts (and just the top 10 cover
> more than 50% IIRC). The few we manually looked into seemed like they would
> be trivially inlined anyway and/or their high count otherwise doesn't seem
> to really contribute much useful information to the optimizer since we
> should already be "getting those right" (stuff like simple constructors,
> getters, or vector operators).

That matches exactly with our observation and the approach we take is
to address this issue.

>
> Obviously this is a discussion for another thread, but there seems to be
> enormous scope for reduction in our profiling overhead for those interested
> in doing that; even just a simple file of 100 functions passed to the
> compiler instructing those functions to not be instrumented would decrease
> the profiling overhead by over 2 orders of magnitude for this title, based
> on the above data.
>

yes, this will be another thread. Rong (cc'ed ) has collected lots of
data (as well as the implementation to address this). He will share it
early next week.

David

> -- Sean Silva
>
>
>>
>>
>> Jake Van Adrighem
>>
>>>
>>> Also, in the past David Li suggested that his findings were that not
>>> using atomic operations "only contribute
>>> to very small count variations"
>>> https://groups.google.com/d/msg/llvm-dev/ScLa2xIdo9s/Ow1FPDVVRIoJ
>>> CC'ing David in case he has more input to the discussion.
>>>
>>> -- Sean Silva
>>>
>>>
>>>>
>>>>
>>>>
>>>> You're intentionally introducing a data race, this doesn't look good to
>>>> me at all. However, I'm not confident about
>>>> what's allowed in LLVM IR - it's not C++ where any source-level data
>>>> race is UB, but not an x86 assembly either.
>>>>
>>>> On Wed, Jul 29, 2015 at 4:43 PM, Justin Bogner <mail at justinbogner.com>
>>>> wrote:
>>>>>
>>>>> Alexey Samsonov <vonosmas at gmail.com> writes:
>>>>> > samsonov created this revision.
>>>>> > samsonov added reviewers: dnovillo, bogner.
>>>>> > samsonov added a subscriber: llvm-commits.
>>>>> >
>>>>> > Since we introduced counters for functions in COMDAT sections (e.g.
>>>>> > inline functions from STL headers), these headers can easily be
>>>>> > incremented concurrently by multiple threads. Replace load-add-store
>>>>> > with a single "atomicrmw add" with monotonic memory ordering.
>>>>>
>>>>> This significantly changes the performance characteristics of this
>>>>> code,
>>>>> pessimizing single-threaded users and potentially making the
>>>>> multithreaded performance issues even worse.
>>>>>
>>>>> It's fine to add an option to lower these to atomics, since this does
>>>>> guarantee accuracy, but I think we need a switch to choose which kind
>>>>> of
>>>>> lowering we're interested in in that case.
>>>>>
>>>>> > http://reviews.llvm.org/D11579
>>>>> >
>>>>> > Files:
>>>>> >   lib/Transforms/Instrumentation/InstrProfiling.cpp
>>>>> >
>>>>> > Index: lib/Transforms/Instrumentation/InstrProfiling.cpp
>>>>> > ===================================================================
>>>>> > --- lib/Transforms/Instrumentation/InstrProfiling.cpp
>>>>> > +++ lib/Transforms/Instrumentation/InstrProfiling.cpp
>>>>> > @@ -147,9 +147,9 @@
>>>>> >    IRBuilder<> Builder(Inc->getParent(), *Inc);
>>>>> >    uint64_t Index = Inc->getIndex()->getZExtValue();
>>>>> >    Value *Addr = Builder.CreateConstInBoundsGEP2_64(Counters, 0,
>>>>> > Index);
>>>>> > -  Value *Count = Builder.CreateLoad(Addr, "pgocount");
>>>>> > -  Count = Builder.CreateAdd(Count, Builder.getInt64(1));
>>>>> > -  Inc->replaceAllUsesWith(Builder.CreateStore(Count, Addr));
>>>>> > +  Builder.CreateAtomicRMW(AtomicRMWInst::Add, Addr,
>>>>> > Builder.getInt64(1),
>>>>> > +                          llvm::Monotonic);
>>>>> > +  assert(Inc->use_empty() && "InstrProfIncrementInst has uses!");
>>>>> >    Inc->eraseFromParent();
>>>>> >  }
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Alexey Samsonov
>>>> vonosmas at gmail.com
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>
>