[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)
Joerg Sonnenberger
joerg at britannica.bec.de
Thu Apr 17 18:24:00 PDT 2014
On Thu, Apr 17, 2014 at 04:21:46PM +0400, Kostya Serebryany wrote:
> - sharded counters: each counter represented as N counters sitting in
> different cache lines. Every thread accesses the counter with index TID%N.
> Solves the problem partially, better with larger values of N, but then
> again it costs RAM.
I'd strongly go with this scheme, with one tweak: use the stack pointer
as the base value instead of the thread ID, with some fudging so that it
doesn't just use the lowest bits. The stack pointer is typically easier
to get than the thread ID.
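Below is a minimal C++ sketch of the sharded-counter scheme with the stack-pointer tweak. Everything in it is an assumption for illustration, not LLVM's profile runtime: the names (ShardedCounter, ShardFromStack, Increment, Read, CountWithThreads) are hypothetical, and so are the shard count of 8, the 64-byte cache line, the 4 KiB shift and the multiplicative hash used as the "fudging". The address of a local variable stands in for the stack pointer, since each thread runs on its own stack.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical parameters for this sketch (not taken from LLVM).
constexpr std::size_t kNumShards = 8;       // must be a power of two
constexpr std::size_t kCacheLineSize = 64;  // typical on x86-64
static_assert((kNumShards & (kNumShards - 1)) == 0, "power of two");

// One shard, padded to its own cache line to avoid false sharing.
struct alignas(kCacheLineSize) CounterShard {
    std::atomic<std::uint64_t> value{0};
};

// One logical counter represented as kNumShards physical counters.
struct ShardedCounter {
    CounterShard shards[kNumShards];
};

// Choose a shard from the current stack address instead of the thread ID.
// Threads run on different stacks, so their stack addresses differ. The
// low bits mostly track call depth, so they are shifted away (assumed
// 4 KiB granularity) and the remaining bits are mixed before masking.
inline std::size_t ShardFromStack() {
    int local = 0;
    auto sp = reinterpret_cast<std::uintptr_t>(&local);
    std::uint64_t h = static_cast<std::uint64_t>(sp >> 12);
    h ^= h >> 17;
    h *= 0x9E3779B97F4A7C15ull;  // multiplicative hash spreads the bits
    return static_cast<std::size_t>(h >> 32) & (kNumShards - 1);
}

// Relaxed atomic add: two threads may still hash to the same shard, and
// the atomic keeps the count exact in that case.
inline void Increment(ShardedCounter& c) {
    c.shards[ShardFromStack()].value.fetch_add(1, std::memory_order_relaxed);
}

// The logical value is the sum over all shards.
inline std::uint64_t Read(const ShardedCounter& c) {
    std::uint64_t total = 0;
    for (const auto& s : c.shards) total += s.value.load(std::memory_order_relaxed);
    return total;
}

// Usage demo: several threads bump the same logical counter.
inline std::uint64_t CountWithThreads(int num_threads, int per_thread) {
    assert(num_threads > 0 && per_thread >= 0);
    ShardedCounter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) Increment(counter);
        });
    }
    for (auto& w : workers) w.join();
    return Read(counter);
}
```

Because reads sum every shard, a collision between two threads only costs some contention, never correctness, and the shard chosen by a thread does not need to stay stable across calls. With 8 shards of 64 bytes each, every logical counter occupies 512 bytes, which is the RAM cost mentioned in the quoted text. Real instrumentation might use plain increments rather than atomics. That trade-off is outside this sketch.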
Joerg
More information about the llvm-dev mailing list