[lld] r248106 - COFF: Fix race condition.
Sean Silva via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 21 15:35:12 PDT 2015
On Mon, Sep 21, 2015 at 8:20 AM, Rui Ueyama <ruiu at google.com> wrote:
> On Sun, Sep 20, 2015 at 6:42 PM, Rui Ueyama <ruiu at google.com> wrote:
>
>> We can, but I think we don't have to, because I didn't see any measurable
>> difference between this code and the old racy one.
>>
> ... in terms of speed.
>
On x86, a similar amount of hardware-level synchronization is required in
both cases, so this isn't surprising: whether or not the access is atomic,
the cache coherence mechanisms kick in. Simply replacing a non-atomic access
with an atomic one is not enough. From inspection of the code, there seems
to be quite a bit of work being done between increments of this value, so it
probably won't matter here. But if you someday change this to use
fine-grained parallelism so that it scales linearly with the number of cores
(it is highly parallelizable, so this should be possible), it will probably
be necessary to factor this in.
For reference, I have attached a simple test program that shows the
relative impact of the different choices.
Sean:~/tmp % clang++ -std=c++11 -DNUM_THREADS=12
-DNUM_OPS_PER_THREAD=5000000 -O3 testsynchronization.cpp && ./a.out
Per thread, non-atomic: 1.00x (baseline)
Per-thread, atomic: 3.61x slower than baseline
Shared, non-atomic: 29.47x slower than baseline
Shared, atomic: 112.72x slower than baseline
This is on a Xeon E5-2697 v2 (12 physical cores).
As you can see, the effect of making the access non-atomic is very small
compared to the effect of avoiding cache coherence overhead (3-4x vs. 30x).
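The attached testsynchronization.cpp is not reproduced in the archive, so the
following is only a hypothetical sketch of the kind of comparison it makes:
the same number of relaxed atomic increments, once against a single shared
counter (contended cache line) and once against per-thread counters padded to
separate cache lines. All names (`PaddedCounter`, `runShared`,
`runPerThread`) are invented for illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// One counter per thread, padded to a full cache line so that increments
// by different threads never touch the same line (no false sharing).
struct alignas(64) PaddedCounter {
  std::atomic<uint64_t> Value{0};
};

// All threads hammer a single shared atomic: every increment bounces the
// same cache line between cores. This is the slow, contended case.
uint64_t runShared(int NumThreads, int OpsPerThread) {
  std::atomic<uint64_t> Shared{0};
  std::vector<std::thread> Threads;
  for (int I = 0; I < NumThreads; ++I)
    Threads.emplace_back([&] {
      for (int J = 0; J < OpsPerThread; ++J)
        Shared.fetch_add(1, std::memory_order_relaxed);
    });
  for (auto &T : Threads)
    T.join();
  return Shared.load();
}

// Each thread increments only its own padded counter, so the atomic ops
// stay core-local; the partial counts are summed once at the end.
uint64_t runPerThread(int NumThreads, int OpsPerThread) {
  static PaddedCounter Counters[64]; // assumes NumThreads <= 64
  std::vector<std::thread> Threads;
  for (int I = 0; I < NumThreads; ++I)
    Threads.emplace_back([&Counters, I, OpsPerThread] {
      for (int J = 0; J < OpsPerThread; ++J)
        Counters[I].Value.fetch_add(1, std::memory_order_relaxed);
    });
  for (auto &T : Threads)
    T.join();
  uint64_t Sum = 0;
  for (int I = 0; I < NumThreads; ++I)
    Sum += Counters[I].Value.load();
  return Sum;
}
```

Both variants compute the same total; only the cache-line traffic differs,
which is what the timings above are measuring.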
CC'ing Dmitry who may have more insight to give.
-- Sean Silva
>
>
>
>> 2015/09/20 18:32 "Sean Silva" <chisophugis at gmail.com>:
>>
>> It seems unfortunate for us to need any synchronization at all just to
>>> generate a unique ID. Can we just have a bunch of counters, one per thread,
>>> then initialize them to `ThreadIndex << 50` or something? (make sure they
>>> are each on their own cache line)
>>>
>>> -- Sean Silva
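As a hedged sketch of the suggestion above (not code from the thread): each
thread owns a counter seeded at `ThreadIndex << 50`, so the ID ranges are
disjoint and a plain, unsynchronized increment suffices; padding keeps each
counter on its own cache line. The names `IDAllocator`, `initAllocators`, and
`getNextID` are hypothetical.

```cpp
#include <cstdint>

constexpr int MaxThreads = 64;

// One counter per thread, padded to a cache line to avoid false sharing.
struct alignas(64) IDAllocator {
  uint64_t Next;
};

IDAllocator Allocators[MaxThreads];

void initAllocators() {
  // 2^50 IDs per thread before any two ranges could collide.
  for (uint64_t I = 0; I < MaxThreads; ++I)
    Allocators[I].Next = I << 50;
}

// Called only from thread ThreadIndex, so a plain increment is enough:
// no atomics, no locks, no shared cache line.
uint64_t getNextID(int ThreadIndex) {
  return Allocators[ThreadIndex].Next++;
}
```

The trade-off is that IDs are no longer dense or globally ordered, which is
fine as long as they only need to be unique.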
>>>
>>> On Sat, Sep 19, 2015 at 6:44 PM, Rui Ueyama via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>>
>>>> Author: ruiu
>>>> Date: Sat Sep 19 20:44:44 2015
>>>> New Revision: 248106
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=248106&view=rev
>>>> Log:
>>>> COFF: Fix race condition.
>>>>
>>>> NextID is updated inside parallel_for_each, so it needs mutual
>>>> exclusion.
>>>>
>>>> Modified:
>>>> lld/trunk/COFF/ICF.cpp
>>>>
>>>> Modified: lld/trunk/COFF/ICF.cpp
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/lld/trunk/COFF/ICF.cpp?rev=248106&r1=248105&r2=248106&view=diff
>>>>
>>>> ==============================================================================
>>>> --- lld/trunk/COFF/ICF.cpp (original)
>>>> +++ lld/trunk/COFF/ICF.cpp Sat Sep 19 20:44:44 2015
>>>> @@ -73,7 +73,7 @@ private:
>>>> bool forEachGroup(std::vector<SectionChunk *> &Chunks, Comparator
>>>> Eq);
>>>> bool partition(ChunkIterator Begin, ChunkIterator End, Comparator
>>>> Eq);
>>>>
>>>> - uint64_t NextID = 1;
>>>> + std::atomic<uint64_t> NextID = { 1 };
>>>> };
>>>>
>>>> // Entry point to ICF.
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>
>>>
>>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testsynchronization.cpp
Type: text/x-c++src
Size: 2392 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150921/1197f868/attachment.cpp>