[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

Joerg Sonnenberger joerg at britannica.bec.de
Fri Apr 18 03:55:39 PDT 2014


On Fri, Apr 18, 2014 at 11:25:49AM +0400, Kostya Serebryany wrote:
> On Fri, Apr 18, 2014 at 5:24 AM, Joerg Sonnenberger <joerg at britannica.bec.de> wrote:
> 
> > On Thu, Apr 17, 2014 at 04:21:46PM +0400, Kostya Serebryany wrote:
> > > - sharded counters: each counter represented as N counters sitting in
> > > different cache lines. Every thread accesses the counter with index
> > > TID%N.
> > > Solves the problem partially, better with larger values of N, but then
> > > again it costs RAM.
> >
> > I'd strongly go with this scheme with one tweak. Use the stack pointer
> > as base value with some fudging to not just use the lowest bits. It is
> > typically easier to get.
> >
> 
> Indeed, several middle bits of %sp may be used instead of TID%N.
> This would heavily depend on the pthread implementation (how it allocates
> stacks) though.
> It may be tricky to come up with the same constant scheme across all
> platforms.

Since the mapping doesn't have to be stable, just multiply the stack
pointer by a (random) constant and pick the high bits?
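
A minimal sketch of that idea in C, for illustration only. The shard
count, the 64-byte cache-line size, the multiplier, and all names
(shard_index, counter_inc, counter_read) are assumptions made up for
this example, not anything the instrumentation actually emits. The
address of a local variable stands in for the stack pointer, and the
increment is a plain non-atomic add, so two threads that happen to land
on the same shard can still lose updates.

```c
#include <assert.h>
#include <stdint.h>

#define SHARD_BITS 3                   /* assumed: N = 8 shards per counter */
#define NUM_SHARDS (1u << SHARD_BITS)
#define CACHE_LINE 64                  /* assumed cache-line size in bytes */

/* One shard per cache line, so different shards never share a line. */
struct shard {
    _Alignas(CACHE_LINE) uint64_t count;
};

static struct shard counter[NUM_SHARDS];

/* Multiplicative hash: multiply by a constant, keep the top SHARD_BITS bits.
 * The constant is arbitrary (hypothetical choice); it is odd so that the
 * low bits of the input also influence the high bits of the product. */
static inline unsigned shard_index(uintptr_t sp)
{
    const uint64_t k = 0x9E3779B97F4A7C15ull;
    unsigned idx = (unsigned)(((uint64_t)sp * k) >> (64 - SHARD_BITS));
    assert(idx < NUM_SHARDS);          /* top SHARD_BITS bits always fit */
    return idx;
}

/* Bump the shard selected by the current (approximate) stack pointer. */
static inline void counter_inc(void)
{
    char marker;                       /* its address approximates %sp */
    counter[shard_index((uintptr_t)&marker)].count++;
}

/* The logical counter value is the sum over all shards. */
static uint64_t counter_read(void)
{
    uint64_t sum = 0;
    for (unsigned i = 0; i < NUM_SHARDS; i++)
        sum += counter[i].count;
    return sum;
}
```

Because the index only has to spread threads out, not identify them, the
same constant can be used on every platform no matter how the pthread
implementation lays out its stacks. How evenly threads spread still
depends on how much the stack addresses differ.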

Joerg


