As per offline discussion, Alexey will measure the overhead of the single mutex on tsan performance. If the slowdown is negligible, we can use the single mutex. Otherwise we need to think about per-thread caches with all the required machinery. http://llvm-reviews.chandlerc.com/D671