[lld] r248106 - COFF: Fix race condition.

Rui Ueyama via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 21 16:08:51 PDT 2015


On Mon, Sep 21, 2015 at 4:04 PM, Sean Silva <chisophugis at gmail.com> wrote:

>
>
> On Mon, Sep 21, 2015 at 3:56 PM, Rui Ueyama <ruiu at google.com> wrote:
>
>> I mean I would do that if there's a portable way in C++11 to get the
>> current thread ID as a small integer.
>>
>
> Or an integer *at all*. Apparently it was decided to use an opaque class
> for this with no way to access it as an integer/void*:
> http://en.cppreference.com/w/cpp/thread/thread/id
>

Yup. I don't think even POSIX provides that functionality. Everything goes
through pthread_t, which is an opaque data type.
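
That said, a portable C++11 workaround is to hand out small indices
ourselves: a global atomic counter plus a thread_local copy, so each thread
synchronizes only once. A rough sketch (getThreadIndex is a hypothetical
helper, not anything in lld or the standard library):

  #include <atomic>
  #include <cstdint>

  // Each thread grabs the next index from a global counter the first time
  // it asks; the thread_local cache makes every later call free of
  // synchronization.
  inline uint32_t getThreadIndex() {
    static std::atomic<uint32_t> Counter{0};
    thread_local uint32_t Index =
        Counter.fetch_add(1, std::memory_order_relaxed);
    return Index;
  }

  // Each thread can then mint unique IDs locally:
  //   thread_local uint64_t NextID = 1;
  //   uint64_t ID = (uint64_t(getThreadIndex()) << 32) | NextID++;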


> -- Sean Silva
>
>
>> If we had such a function, we could write something like this.
>>
>>   thread_local uint64_t NextID = 1;
>>
>>  ...
>>
>>   ID = (getThreadID() << 32) + NextID++;
>>
>> But I don't think such a function exists.
>>
>> On Mon, Sep 21, 2015 at 3:49 PM, Rui Ueyama <ruiu at google.com> wrote:
>>
>>> On Mon, Sep 21, 2015 at 3:35 PM, Sean Silva <chisophugis at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Sep 21, 2015 at 8:20 AM, Rui Ueyama <ruiu at google.com> wrote:
>>>>
>>>>> On Sun, Sep 20, 2015 at 6:42 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>>
>>>>>> We can, but I don't think we have to, because I didn't see any
>>>>>> measurable difference between this code and the old racy one.
>>>>>>
>>>>> ... in terms of speed.
>>>>>
>>>>
>>>> On x86 at a hardware level a similar amount of synchronization is
>>>> required for both cases, so this isn't surprising (regardless of whether
>>>> the access is atomic or not, the cache coherence mechanisms kick in).
>>>> Simply replacing a non-atomic access with an atomic access is not enough.
>>>> From inspection of the code, there seems to be quite a bit of work being
>>>> done between increments of this value, so it's probably not going to
>>>> matter, but if you someday change this to use fine-grained parallelism so
>>>> that it scales linearly with # cores (it is highly parallelizable, so this
>>>> should be possible), it will probably be necessary to factor this in.
>>>>
>>>> For reference, I have attached a simple test program that shows the
>>>> relative impact of the different choices.
>>>>
>>>> Sean:~/tmp % clang++ -std=c++11 -DNUM_THREADS=12 -DNUM_OPS_PER_THREAD=5000000 -O3 testsynchronization.cpp && ./a.out
>>>>
>>>> Per thread, non-atomic: 1.00x (baseline)
>>>>
>>>> Per-thread, atomic:     3.61x slower than baseline
>>>>
>>>> Shared, non-atomic:     29.47x slower than baseline
>>>>
>>>> Shared, atomic:         112.72x slower than baseline
>>>>
>>>> This is on a Xeon E5-2697 v2 (12 physical cores).
>>>>
>>>> As you can see, the effect of making the access non-atomic is very
>>>> small compared to the effect of avoiding cache coherence overhead (3-4x vs.
>>>> 30x).
>>>>
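
The attached test program is not reproduced in the text archive. A minimal
sketch of a benchmark along those lines, comparing a shared atomic counter
against per-thread counters padded to their own cache lines, might look like
this (the names and the 64-byte line size are assumptions, not the original
program; the racy non-atomic variants are omitted since a shared non-atomic
counter is a data race):

  #include <atomic>
  #include <chrono>
  #include <cstdint>
  #include <cstdio>
  #include <thread>
  #include <vector>

  #ifndef NUM_THREADS
  #define NUM_THREADS 12
  #endif
  #ifndef NUM_OPS_PER_THREAD
  #define NUM_OPS_PER_THREAD 5000000
  #endif

  // Pad each per-thread counter to its own cache line so increments from
  // different threads do not fight over the same line (false sharing).
  struct PaddedCounter {
    std::atomic<uint64_t> Value{0};
    char Padding[64 - sizeof(std::atomic<uint64_t>)];
  };

  // Run NUM_THREADS copies of Body (each given its thread index) and
  // return the elapsed wall-clock time in seconds.
  template <class BodyT> static double run(BodyT Body) {
    auto Start = std::chrono::steady_clock::now();
    std::vector<std::thread> Threads;
    for (int T = 0; T < NUM_THREADS; ++T)
      Threads.emplace_back(Body, T);
    for (std::thread &Th : Threads)
      Th.join();
    return std::chrono::duration<double>(
               std::chrono::steady_clock::now() - Start).count();
  }

  int main() {
    // Per-thread, atomic: each thread increments only its own counter.
    static PaddedCounter PerThread[NUM_THREADS];
    double PerThreadSec = run([](int T) {
      for (int I = 0; I < NUM_OPS_PER_THREAD; ++I)
        PerThread[T].Value.fetch_add(1, std::memory_order_relaxed);
    });

    // Shared, atomic: every increment bounces one cache line between cores.
    static std::atomic<uint64_t> Shared;
    double SharedSec = run([](int T) {
      (void)T; // thread index not needed here
      for (int I = 0; I < NUM_OPS_PER_THREAD; ++I)
        Shared.fetch_add(1, std::memory_order_relaxed);
    });

    std::printf("per-thread atomic: %6.3fs\nshared atomic:     %6.3fs\n",
                PerThreadSec, SharedSec);
    return 0;
  }
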
>>>
>>> Thank you for the interesting numbers. Shared atomic variables are indeed
>>> much slower than non-shared, non-atomic variables when you have a lot of
>>> concurrent writes. If I were really trying to squeeze the last drop out of
>>> multiple cores, I'd do as you suggested so that threads share almost
>>> nothing. I didn't do that simply because, in practice, the cost of
>>> updating the atomic counter is negligible in this code: we are doing so
>>> many things other than updating the atomic variable. I don't generally
>>> optimize unless doing so makes a measurable performance difference.
>>>
>>>
>>>> CC'ing Dmitry who may have more insight to give.
>>>>
>>>> -- Sean Silva
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>> 2015/09/20 18:32 "Sean Silva" <chisophugis at gmail.com>:
>>>>>>
>>>>>> It seems unfortunate for us to need any synchronization at all just
>>>>>>> to generate a unique ID. Can we just have a bunch of counters, one per
>>>>>>> thread, then initialize them to `ThreadIndex << 50` or something? (make
>>>>>>> sure they are each on their own cache line)
>>>>>>>
>>>>>>> -- Sean Silva
>>>>>>>
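
Concretely, that per-thread scheme might look something like the sketch
below; IDAllocator and the 64-byte cache-line figure are illustrative
assumptions, not existing lld code:

  #include <cstdint>
  #include <vector>

  // One ID counter per worker thread, pre-partitioned by thread index so
  // the ranges never collide, and manually padded so adjacent counters
  // never share a cache line (which would cause false sharing).
  struct IDCounter {
    uint64_t Next;
    char Padding[64 - sizeof(uint64_t)];
  };

  class IDAllocator {
  public:
    explicit IDAllocator(unsigned NumThreads) : Counters(NumThreads) {
      for (unsigned I = 0; I < NumThreads; ++I)
        Counters[I].Next = uint64_t(I) << 50; // 2^50 IDs per thread.
    }
    // Each thread passes its own index and touches only its own counter,
    // so no synchronization is needed.
    uint64_t next(unsigned ThreadIndex) {
      return Counters[ThreadIndex].Next++;
    }

  private:
    std::vector<IDCounter> Counters;
  };
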
>>>>>>> On Sat, Sep 19, 2015 at 6:44 PM, Rui Ueyama via llvm-commits <
>>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> Author: ruiu
>>>>>>>> Date: Sat Sep 19 20:44:44 2015
>>>>>>>> New Revision: 248106
>>>>>>>>
>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=248106&view=rev
>>>>>>>> Log:
>>>>>>>> COFF: Fix race condition.
>>>>>>>>
>>>>>>>> NextID is updated inside parallel_for_each, so it needs mutual
>>>>>>>> exclusion.
>>>>>>>>
>>>>>>>> Modified:
>>>>>>>>     lld/trunk/COFF/ICF.cpp
>>>>>>>>
>>>>>>>> Modified: lld/trunk/COFF/ICF.cpp
>>>>>>>> URL:
>>>>>>>> http://llvm.org/viewvc/llvm-project/lld/trunk/COFF/ICF.cpp?rev=248106&r1=248105&r2=248106&view=diff
>>>>>>>>
>>>>>>>> ==============================================================================
>>>>>>>> --- lld/trunk/COFF/ICF.cpp (original)
>>>>>>>> +++ lld/trunk/COFF/ICF.cpp Sat Sep 19 20:44:44 2015
>>>>>>>> @@ -73,7 +73,7 @@ private:
>>>>>>>>    bool forEachGroup(std::vector<SectionChunk *> &Chunks, Comparator Eq);
>>>>>>>>    bool partition(ChunkIterator Begin, ChunkIterator End, Comparator Eq);
>>>>>>>>
>>>>>>>> -  uint64_t NextID = 1;
>>>>>>>> +  std::atomic<uint64_t> NextID = { 1 };
>>>>>>>>  };
>>>>>>>>
>>>>>>>>  // Entry point to ICF.
>>>>>>>>
>>>>>>>>
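
For reference, this one-line fix is sufficient because operator++ on
std::atomic is an atomic read-modify-write. A minimal illustration (not
lld's actual code):

  #include <atomic>
  #include <cstdint>

  std::atomic<uint64_t> NextID{1};

  uint64_t assignID() {
    // operator++ is an atomic fetch_add (sequentially consistent by
    // default), so concurrent callers each observe a distinct value. With
    // a plain uint64_t, this read-modify-write would be a data race.
    return NextID++;
  }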