[PATCH] D11579: [InstrProfiling] Fix data race on profile counters by using AtomicRMW.

Wed Jul 29 22:13:17 PDT 2015

I also recommend guarding this with an option -- this is what Google
GCC does too.

>
>
> +1
>
> In my testing, the overhead of the existing instrumentation is about a 2x
> slowdown, which is starting to get close to the range it would be very
> difficult to play an instrumented game. I wouldn't want to make this much
> slower. I'm glad to test this for you; I'll try to get around to this this
> week.
>
> Also, in the past David Li suggested that his findings were that not using
> atomic operations "only contribute
> to very small count variations"
> https://groups.google.com/d/msg/llvm-dev/ScLa2xIdo9s/Ow1FPDVVRIoJ
> CC'ing David in case he has more input to the discussion.

yes, this is true. Heavy data races can lead to large precision loss
to certain implementation of value profilers. Betul's value profile
patch for LLVM (under review, pending check in) does not suffer that
problem though.

Also we have been working on reducing instrumentation runtime overhead
and seen great results. RFC and patches will be sent out soon.

David

>
> -- Sean Silva
>
>
>>
>>
>>
>> You're intentionally introducing a data race, this doesn't look good to me
>> at all. However, I'm not confident about
>> what's allowed in LLVM IR - it's not C++ where any source-level data race
>> is UB, but not an x86 assembly either.
>>
>> On Wed, Jul 29, 2015 at 4:43 PM, Justin Bogner <mail at justinbogner.com>
>> wrote:
>>>
>>> Alexey Samsonov <vonosmas at gmail.com> writes:
>>> > samsonov created this revision.
>>> > samsonov added reviewers: dnovillo, bogner.
>>> > samsonov added a subscriber: llvm-commits.
>>> >
>>> > Since we introduced counters for functions in COMDAT sections (e.g.
>>> > inline functions from STL headers), these headers can easily be
>>> > incremented concurrently by multiple threads. Replace load-add-store
>>> > with a single "atomicrmw add" with monotonic memory ordering.
>>>
>>> This significantly changes the performance characteristics of this code,
>>> pessimizing single-threaded users and potentially making the
>>> multithreaded performance issues even worse.
>>>
>>> It's fine to add an option to lower these to atomics, since this does
>>> guarantee accuracy, but I think we need a switch to choose which kind of
>>> lowering we're interested in in that case.
>>>
>>> > http://reviews.llvm.org/D11579
>>> >
>>> > Files:
>>> >   lib/Transforms/Instrumentation/InstrProfiling.cpp
>>> >
>>> > Index: lib/Transforms/Instrumentation/InstrProfiling.cpp
>>> > ===================================================================
>>> > --- lib/Transforms/Instrumentation/InstrProfiling.cpp
>>> > +++ lib/Transforms/Instrumentation/InstrProfiling.cpp
>>> > @@ -147,9 +147,9 @@
>>> >    IRBuilder<> Builder(Inc->getParent(), *Inc);
>>> >    uint64_t Index = Inc->getIndex()->getZExtValue();
>>> >    Value *Addr = Builder.CreateConstInBoundsGEP2_64(Counters, 0,
>>> > Index);
>>> > -  Value *Count = Builder.CreateLoad(Addr, "pgocount");
>>> > -  Count = Builder.CreateAdd(Count, Builder.getInt64(1));
>>> > -  Inc->replaceAllUsesWith(Builder.CreateStore(Count, Addr));
>>> > +  Builder.CreateAtomicRMW(AtomicRMWInst::Add, Addr,
>>> > Builder.getInt64(1),
>>> > +                          llvm::Monotonic);
>>> > +  assert(Inc->use_empty() && "InstrProfIncrementInst has uses!");
>>> >    Inc->eraseFromParent();
>>> >  }
>>>
>>
>>
>>
>> --
>> Alexey Samsonov
>> vonosmas at gmail.com
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>