[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
Joerg Sonnenberger
joerg at britannica.bec.de
Fri Apr 18 03:55:39 PDT 2014
On Fri, Apr 18, 2014 at 11:25:49AM +0400, Kostya Serebryany wrote:
> On Fri, Apr 18, 2014 at 5:24 AM, Joerg Sonnenberger <joerg at britannica.bec.de> wrote:
>
> > On Thu, Apr 17, 2014 at 04:21:46PM +0400, Kostya Serebryany wrote:
> > > - sharded counters: each counter represented as N counters sitting in
> > > different cache lines. Every thread accesses the counter with index
> > > TID%N.
> > > Solves the problem partially, better with larger values of N, but then
> > > again it costs RAM.
> >
> > I'd strongly go with this schema with one tweak. Use the stack pointer
> > as base value with some fudging to not just use the lowest bits. It is
> > typically easier to get.
> >
>
> Indeed, several middle bits of %sp may be used instead of TID%N.
> This would heavily depend on the pthread implementation (how it allocates
> stacks) though.
> It may be tricky to come up with the same constant scheme across all
> platforms.
Since it doesn't have to be stable, just multiply by a (random)
constant and pick the high bits?
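
A minimal sketch of that idea, not taken from any existing implementation.
The names (sharded_counter, shard_index, counter_inc, counter_read), the
shard count of 8, the 64-byte cache line, and the particular odd multiplier
are all assumptions chosen for illustration. The address of a local variable
stands in for the stack pointer, and the increment is deliberately a plain
non-atomic add, so two threads that land on the same shard can still lose
updates.

```c
#include <stdint.h>

#define SHARD_BITS 3                    /* hypothetical: N = 8 shards */
#define NUM_SHARDS (1u << SHARD_BITS)
#define CACHE_LINE 64                   /* assumed cache line size in bytes */

/* One shard per cache line, so distinct shards never share a line. */
struct shard {
    _Alignas(CACHE_LINE) uint64_t value;
};

struct sharded_counter {
    struct shard shards[NUM_SHARDS];
};

/* Multiply by an arbitrary odd 64-bit constant and keep the top
 * SHARD_BITS bits. The high bits of the product depend on all lower
 * bits of sp, so the result does not rely on how stacks are laid out. */
static inline unsigned shard_index(uintptr_t sp)
{
    uint64_t h = (uint64_t)sp * UINT64_C(0x9E3779B97F4A7C15);
    return (unsigned)(h >> (64 - SHARD_BITS));
}

/* Bump the shard selected by the caller's (approximate) stack pointer. */
static inline void counter_inc(struct sharded_counter *c)
{
    char marker;  /* address of a local approximates the stack pointer */
    c->shards[shard_index((uintptr_t)&marker)].value++;
}

/* The logical counter value is the sum over all shards. */
static inline uint64_t counter_read(const struct sharded_counter *c)
{
    uint64_t sum = 0;
    for (unsigned i = 0; i < NUM_SHARDS; i++)
        sum += c->shards[i].value;
    return sum;
}
```

Since the index only has to spread threads out, not be reproducible, the
multiplier could just as well be picked at random at startup, as long as it
is odd.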
Joerg