[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

Kostya Serebryany kcc at google.com
Fri Apr 18 00:25:49 PDT 2014


On Fri, Apr 18, 2014 at 5:24 AM, Joerg Sonnenberger <joerg at britannica.bec.de> wrote:

> On Thu, Apr 17, 2014 at 04:21:46PM +0400, Kostya Serebryany wrote:
> > - sharded counters: each counter is represented as N counters sitting in
> > different cache lines. Every thread accesses the counter with index TID%N.
> > This solves the problem partially (better with larger values of N), but
> > again it costs RAM.
>
> I'd strongly go with this scheme, with one tweak: use the stack pointer
> as the base value, with some fudging so as not to use just the lowest bits.
> The stack pointer is typically easier to get than the TID.
>
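A minimal C11 sketch of the quoted sharded-counter scheme. It shows one logical counter spread over N cache-line-aligned shards. Each thread increments only shard TID % N, and a read sums all shards. The following are assumptions made for illustration, not taken from the thread: the names, the shard count of 8, and the 64-byte cache line. C has no portable thread ID, so the sketch uses a hypothetical stand-in for TID: each thread receives a small sequential index on first use. The sketch also uses relaxed atomic increments so that two threads sharing a shard never lose updates. Whether real profile instrumentation would pay for atomics is a separate trade-off.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NUM_SHARDS 8   /* hypothetical shard count N */
#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* Each shard occupies its own cache line, so threads that use
   different shards do not contend on the same line. */
typedef struct {
    _Alignas(CACHE_LINE) _Atomic uint64_t value;
} shard_t;

/* One logical counter = NUM_SHARDS physical counters (the RAM cost). */
typedef struct {
    shard_t shards[NUM_SHARDS];
} sharded_counter_t;

/* Hypothetical stand-in for TID: threads get 0, 1, 2, ... on first use. */
static atomic_uint next_thread_index;
static _Thread_local int my_thread_index = -1;

static unsigned thread_index(void) {
    if (my_thread_index < 0)
        my_thread_index = (int)atomic_fetch_add_explicit(
            &next_thread_index, 1u, memory_order_relaxed);
    return (unsigned)my_thread_index;
}

/* Increment: touch only this thread's shard, index TID % N. */
static void counter_inc(sharded_counter_t *c) {
    unsigned shard = thread_index() % NUM_SHARDS;
    atomic_fetch_add_explicit(&c->shards[shard].value, 1, memory_order_relaxed);
}

/* Read: the logical counter value is the sum over all shards. */
static uint64_t counter_read(sharded_counter_t *c) {
    uint64_t total = 0;
    for (unsigned i = 0; i < NUM_SHARDS; i++)
        total += atomic_load_explicit(&c->shards[i].value, memory_order_relaxed);
    return total;
}
```

With more than N threads, some threads share a shard, which is why larger N helps more but costs more memory per counter.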

Indeed, several middle bits of %sp may be used instead of TID%N.
This would depend heavily on the pthread implementation (how it allocates
stacks), though.
It may be tricky to come up with a single constant scheme that works across
all platforms.
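A minimal C11 sketch of picking the shard from middle bits of the stack address instead of TID%N. The address of a local variable stands in for %sp. The names, the 8 shards, the 64-byte cache line, and `SP_SHIFT = 23` are hypothetical. The shift assumes neighbouring thread stacks lie about 8 MiB (2^23 bytes) apart, a common Linux default. That is exactly the platform dependence mentioned above, and other stack layouts would need a different value.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NUM_SHARDS 8   /* power of two, so the mask below works */
#define CACHE_LINE 64  /* assumed cache-line size in bytes */
/* Hypothetical: assumes neighbouring thread stacks are ~8 MiB (2^23 bytes)
   apart. A different pthread stack layout needs a different shift. */
#define SP_SHIFT 23

typedef struct {
    _Alignas(CACHE_LINE) _Atomic uint64_t value;
} shard_t;

typedef struct {
    shard_t shards[NUM_SHARDS];
} sharded_counter_t;

/* Shard index from middle bits of the current stack address. The address
   of a local approximates %sp without any thread-ID lookup. Low bits are
   skipped: they change with call depth, so they would scatter one thread's
   increments across shards instead of separating threads. */
static unsigned shard_from_sp(void) {
    char probe;
    uintptr_t sp = (uintptr_t)(void *)&probe;
    return (unsigned)((sp >> SP_SHIFT) & (NUM_SHARDS - 1));
}

static void counter_inc(sharded_counter_t *c) {
    unsigned shard = shard_from_sp();
    atomic_fetch_add_explicit(&c->shards[shard].value, 1, memory_order_relaxed);
}

/* The logical value is still the sum over all shards. */
static uint64_t counter_read(sharded_counter_t *c) {
    uint64_t total = 0;
    for (unsigned i = 0; i < NUM_SHARDS; i++)
        total += atomic_load_explicit(&c->shards[i].value, memory_order_relaxed);
    return total;
}
```

Because a read sums every shard, the choice of `SP_SHIFT` only affects how evenly threads spread across shards, never the total. A wrong guess about the stack layout degrades back toward contention on a shared line rather than losing counts.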


>
> Joerg