[LLVMdev] multithreaded performance disaster with -fprofile-instr-generate (contention on profile counters)

Joerg Sonnenberger joerg at britannica.bec.de
Thu Apr 17 18:24:00 PDT 2014


On Thu, Apr 17, 2014 at 04:21:46PM +0400, Kostya Serebryany wrote:
> - sharded counters: each counter represented as N counters sitting in
> different cache lines. Every thread accesses the counter with index TID%N.
> Solves the problem partially, better with larger values of N, but then
> again it costs RAM.
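The sharded-counter option quoted above can be sketched in C11 as follows. This is a minimal illustration, not the actual profile runtime: `NUM_SHARDS`, the 64-byte `CACHE_LINE`, and the helpers `current_tid`, `counter_init`, `counter_inc` and `counter_read` are hypothetical names, and the "TID" here is a small per-thread index handed out on first use rather than the OS thread ID.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_SHARDS 8   /* hypothetical N */
#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* One shard per cache line, so threads on different shards never
   bounce the same line between cores. */
typedef struct {
    alignas(CACHE_LINE) _Atomic uint64_t value;
} shard_t;
static_assert(sizeof(shard_t) == CACHE_LINE, "one shard per cache line");

typedef struct {
    shard_t shards[NUM_SHARDS];
} sharded_counter_t;

static atomic_uint next_tid; /* zero-initialized: static storage */

/* Hypothetical small thread index, assigned lazily on first call. */
static unsigned current_tid(void) {
    static _Thread_local int tid = -1;
    if (tid < 0)
        tid = (int)atomic_fetch_add_explicit(&next_tid, 1, memory_order_relaxed);
    return (unsigned)tid;
}

void counter_init(sharded_counter_t *c) {
    for (unsigned i = 0; i < NUM_SHARDS; i++)
        atomic_init(&c->shards[i].value, 0);
}

/* Increment shard TID % N; still atomic because several threads
   can share a shard once there are more than N threads. */
void counter_inc(sharded_counter_t *c) {
    unsigned idx = current_tid() % NUM_SHARDS;
    atomic_fetch_add_explicit(&c->shards[idx].value, 1, memory_order_relaxed);
}

/* The logical counter value is the sum over all shards. */
uint64_t counter_read(sharded_counter_t *c) {
    uint64_t sum = 0;
    for (unsigned i = 0; i < NUM_SHARDS; i++)
        sum += atomic_load_explicit(&c->shards[i].value, memory_order_relaxed);
    return sum;
}
```

With N = 8 and 64-byte lines, one logical counter occupies 512 bytes instead of 8, which is the RAM cost mentioned above.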

I'd strongly go with this scheme, with one tweak: use the stack pointer
as the base value, with some fudging so that you don't just use the
lowest bits. It is typically easier to get than the thread ID.
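A sketch of the stack-pointer tweak, again hypothetical: the shard index comes from the address of a local variable (an approximation of the stack pointer), with the low `SP_SHIFT` bits dropped so that call depth within one thread does not matter, and the remaining bits mixed by a multiplicative (Fibonacci) hash as the "fudging". The shift of 20 assumes thread stacks are spaced well over 1 MiB apart, which is common for default pthread stacks on Linux; that value and all helper names are assumptions, not something from this thread.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_SHARDS 8   /* hypothetical N */
#define CACHE_LINE 64  /* assumed cache-line size in bytes */
#define SP_SHIFT 20    /* hypothetical: ignore the low 1 MiB of stack depth */

typedef struct {
    alignas(CACHE_LINE) _Atomic uint64_t value;
} shard_t;
static_assert(sizeof(shard_t) == CACHE_LINE, "one shard per cache line");

typedef struct {
    shard_t shards[NUM_SHARDS];
} sharded_counter_t;

/* Derive a shard index from the current stack address instead of a TID. */
static unsigned shard_from_sp(void) {
    char marker; /* lives in the current stack frame */
    uint64_t h = (uint64_t)((uintptr_t)&marker >> SP_SHIFT);
    h *= 0x9E3779B97F4A7C15ull; /* Fibonacci hashing spreads nearby stacks */
    return (unsigned)(h >> 32) % NUM_SHARDS;
}

void counter_init(sharded_counter_t *c) {
    for (unsigned i = 0; i < NUM_SHARDS; i++)
        atomic_init(&c->shards[i].value, 0);
}

void counter_inc(sharded_counter_t *c) {
    atomic_fetch_add_explicit(&c->shards[shard_from_sp()].value, 1,
                              memory_order_relaxed);
}

uint64_t counter_read(sharded_counter_t *c) {
    uint64_t sum = 0;
    for (unsigned i = 0; i < NUM_SHARDS; i++)
        sum += atomic_load_explicit(&c->shards[i].value, memory_order_relaxed);
    return sum;
}
```

Because every shard is updated atomically and reads sum all shards, a thread that occasionally lands on a different shard (for example after deep recursion crosses a 1 MiB boundary) only costs some cache-line sharing, never a lost count.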

Joerg


