[LLVMdev] Removing contention in PassRegistry accesses to speed up compiles
Nipun Sehrawat
nipun at thoughtspot.com
Tue Feb 24 13:50:52 PST 2015
Hi,
We use LLVM libraries to compile C++ code and noticed slow downs when
multiple threads of a process were compiling at once. *perf *indicated that
most of the CPU time was spent in a spin lock, which was being
locked/unlocked from llvm::PassRegistry::getPassInfo().
We read the relevant LLVM code and found out that PassRegistry is a
ManagedStatic and is shared among all threads in case of a multi-threaded
setup. This sharing requires locking in PassRegistry's method, which
becomes source of the contention. To get rid of the contention, we made a
change to make PassRegistry thread-local and got rid of the locking. This
removed all the contention and we noticed a 2x speed up in single thread
compiles and 7x improvement when ten threads were compiling in parallel.
Please find attached the diff for this change. We are using a old version
of LLVM code (svn revision 170375), so the code might be quite outdated.
We have two questions:
1. Does the change look reasonable? Or are we missing something here?
2. When we run with 1000 threads compiling concurrently, we
deterministically run into a segfault in PassRegistry lookup. Any insights
into the segfault?
Please find attached the following files:
1. *pass_registry.txt*: Git diff of our change. Note that it is against
LLVM svn revision 170375.
2.* contention.txt*: Perf report with existing LLVM code - shows contention
in llvm::PassRegistry::getPassInfo()
3. *no_contention.txt*: Perf report of LLVM built with our change.
4. *segfault.txt*: Segfault we are encountering after our change.
5. *clang_compile.cpp*: Snippet of code we use to compile code using LLVM.
Thanks a lot,
Nipun
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150224/47a0c9d5/attachment.html>
-------------- next part --------------
Samples: 696K of event 'cycles', Event count (approx.): 427355036253
- 48.98% caching_compile [kernel.kallsyms] [k] __ticket_spin_lock
- __ticket_spin_lock
- 98.91% _raw_spin_lock
- 57.49% futex_wake
do_futex
SyS_futex
- system_call_fastpath
- 100.00% __lll_unlock_wake
+ 99.77% llvm::PassRegistry::getPassInfo(void const*) const
- 38.01% futex_wait_setup
futex_wait
do_futex
SyS_futex
- system_call_fastpath
- 100.00% __lll_lock_wait
+ 99.86% llvm::PassRegistry::getPassInfo(void const*) const
+ 4.03% try_to_wake_up
+ 0.61% _raw_spin_lock_irq
+ 4.10% caching_compile libpthread.so.0 [.] pthread_mutex_lock
+ 1.81% caching_compile caching_compiler_test_old [.] llvm::sys::MemoryFence()
+ 1.63% caching_compile libpthread.so.0 [.] __pthread_mutex_unlock_usercnt
-------------- next part --------------
Samples: 113K of event 'cycles', Event count (approx.): 70370381305
+ 10.38% caching_compile caching_compiler_test [.] common::RecordParser::MatchData(common::RecordParser::Record*, unsigned long)
+ 5.45% caching_compile caching_compiler_test [.] common::RecordParser::Stream::FileStreamAdapter::good()
+ 3.31% caching_compile caching_compiler_test [.] common::RecordParser::Stream::FileStreamAdapter::get()
+ 1.17% caching_compile caching_compiler_test [.] llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int)
+ 1.10% caching_compile caching_compiler_test [.] operator new(unsigned long)
+ 1.07% caching_compile caching_compiler_test [.] llvm::SmallPtrSetImpl::insert_imp(void const*)
+ 1.00% caching_compile caching_compiler_test [.] llvm::PMDataManager::removeNotPreservedAnalysis(llvm::Pass*)
+ 0.99% caching_compile caching_compiler_test [.] operator delete(void*)
+ 0.69% caching_compile caching_compiler_test [.] llvm::sys::MemoryFence()
+ 0.64% caching_compile caching_compiler_test [.] llvm::PMDataManager::initializeAnalysisImpl(llvm::Pass*)
+ 0.59% caching_compile libc.so.6 [.] __memcpy_ssse3_back
+ 0.57% caching_compile caching_compiler_test [.] llvm::PassRegistry::getPassInfo(void const*) const
+ 0.54% caching_compile libc.so.6 [.] __memset_sse2
+ 0.53% caching_compile caching_compiler_test [.] (anonymous namespace)::VectorLegalizer::LegalizeOp(llvm::SDValue)
+ 0.47% caching_compile caching_compiler_test [.] llvm::PMDataManager::findAnalysisPass(void const*, bool)
-------------- next part --------------
diff --git a/llvm/llvm.svnrev.170375/include/llvm/PassSupport.h b/llvm/llvm.svnrev.170375/include/llvm/PassSupport.h
index 3633f47..44b4d56 100644
--- a/llvm/llvm.svnrev.170375/include/llvm/PassSupport.h
+++ b/llvm/llvm.svnrev.170375/include/llvm/PassSupport.h
@@ -130,25 +130,14 @@ private:
PassInfo(const PassInfo &) LLVM_DELETED_FUNCTION;
};
+
+
#define CALL_ONCE_INITIALIZATION(function) \
- static volatile sys::cas_flag initialized = 0; \
- sys::cas_flag old_val = sys::CompareAndSwap(&initialized, 1, 0); \
- if (old_val == 0) { \
+ static __thread int initialized = 0; \
+ if (initialized == 0) { \
function(Registry); \
- sys::MemoryFence(); \
- TsanIgnoreWritesBegin(); \
- TsanHappensBefore(&initialized); \
- initialized = 2; \
- TsanIgnoreWritesEnd(); \
- } else { \
- sys::cas_flag tmp = initialized; \
- sys::MemoryFence(); \
- while (tmp != 2) { \
- tmp = initialized; \
- sys::MemoryFence(); \
- } \
- } \
- TsanHappensAfter(&initialized);
+ initialized = 1; \
+ }
#define INITIALIZE_PASS(passName, arg, name, cfg, analysis) \
static void* initialize##passName##PassOnce(PassRegistry &Registry) { \
diff --git a/llvm/llvm.svnrev.170375/lib/VMCore/PassRegistry.cpp b/llvm/llvm.svnrev.170375/lib/VMCore/PassRegistry.cpp
index 5d22879..8cca689 100644
--- a/llvm/llvm.svnrev.170375/lib/VMCore/PassRegistry.cpp
+++ b/llvm/llvm.svnrev.170375/lib/VMCore/PassRegistry.cpp
@@ -30,12 +30,17 @@ using namespace llvm;
// llvm_shutdown clear this map prevents successful resurrection after
// llvm_shutdown is run. Ideally we should find a solution so that we don't
// leak the map, AND can still resurrect after shutdown.
-static ManagedStatic<PassRegistry> PassRegistryObj;
+
+// Each thread gets its own PassRegistry.
+static __thread PassRegistry* PassRegistryObj = NULL;
PassRegistry *PassRegistry::getPassRegistry() {
- return &*PassRegistryObj;
+ // TODO(nipun): Handle deletion.
+ if (NULL == PassRegistryObj) {
+ PassRegistryObj = new PassRegistry();
+ }
+ return PassRegistryObj;
}
-static ManagedStatic<sys::SmartMutex<true> > Lock;
//===----------------------------------------------------------------------===//
// PassRegistryImpl
@@ -72,26 +77,23 @@ void *PassRegistry::getImpl() const {
//
PassRegistry::~PassRegistry() {
- sys::SmartScopedLock<true> Guard(*Lock);
PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(pImpl);
-
+
for (std::vector<const PassInfo*>::iterator I = Impl->ToFree.begin(),
E = Impl->ToFree.end(); I != E; ++I)
delete *I;
-
+
delete Impl;
pImpl = 0;
}
const PassInfo *PassRegistry::getPassInfo(const void *TI) const {
- sys::SmartScopedLock<true> Guard(*Lock);
PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
PassRegistryImpl::MapType::const_iterator I = Impl->PassInfoMap.find(TI);
return I != Impl->PassInfoMap.end() ? I->second : 0;
}
const PassInfo *PassRegistry::getPassInfo(StringRef Arg) const {
- sys::SmartScopedLock<true> Guard(*Lock);
PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
PassRegistryImpl::StringMapType::const_iterator
I = Impl->PassInfoStringMap.find(Arg);
@@ -103,7 +105,6 @@ const PassInfo *PassRegistry::getPassInfo(StringRef Arg) const {
//
void PassRegistry::registerPass(const PassInfo &PI, bool ShouldFree) {
- sys::SmartScopedLock<true> Guard(*Lock);
PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
bool Inserted =
Impl->PassInfoMap.insert(std::make_pair(PI.getTypeInfo(),&PI)).second;
@@ -115,24 +116,22 @@ void PassRegistry::registerPass(const PassInfo &PI, bool ShouldFree) {
for (std::vector<PassRegistrationListener*>::iterator
I = Impl->Listeners.begin(), E = Impl->Listeners.end(); I != E; ++I)
(*I)->passRegistered(&PI);
-
+
if (ShouldFree) Impl->ToFree.push_back(&PI);
}
void PassRegistry::unregisterPass(const PassInfo &PI) {
- sys::SmartScopedLock<true> Guard(*Lock);
PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
- PassRegistryImpl::MapType::iterator I =
+ PassRegistryImpl::MapType::iterator I =
Impl->PassInfoMap.find(PI.getTypeInfo());
assert(I != Impl->PassInfoMap.end() && "Pass registered but not in map!");
-
+
// Remove pass from the map.
Impl->PassInfoMap.erase(I);
Impl->PassInfoStringMap.erase(PI.getPassArgument());
}
void PassRegistry::enumerateWith(PassRegistrationListener *L) {
- sys::SmartScopedLock<true> Guard(*Lock);
PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
for (PassRegistryImpl::MapType::const_iterator I = Impl->PassInfoMap.begin(),
E = Impl->PassInfoMap.end(); I != E; ++I)
@@ -160,8 +159,6 @@ void PassRegistry::registerAnalysisGroup(const void *InterfaceID,
assert(ImplementationInfo &&
"Must register pass before adding to AnalysisGroup!");
- sys::SmartScopedLock<true> Guard(*Lock);
-
// Make sure we keep track of the fact that the implementation implements
// the interface.
ImplementationInfo->addInterfaceImplemented(InterfaceInfo);
@@ -180,20 +177,18 @@ void PassRegistry::registerAnalysisGroup(const void *InterfaceID,
InterfaceInfo->setNormalCtor(ImplementationInfo->getNormalCtor());
}
}
-
+
PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
if (ShouldFree) Impl->ToFree.push_back(&Registeree);
}
void PassRegistry::addRegistrationListener(PassRegistrationListener *L) {
- sys::SmartScopedLock<true> Guard(*Lock);
PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
Impl->Listeners.push_back(L);
}
void PassRegistry::removeRegistrationListener(PassRegistrationListener *L) {
- sys::SmartScopedLock<true> Guard(*Lock);
-
+
// NOTE: This is necessary, because removeRegistrationListener() can be called
// as part of the llvm_shutdown sequence. Since we have no control over the
// order of that sequence, we need to gracefully handle the case where the
-------------- next part --------------
*** Aborted at 1424812565 (unix time) try "date -d @1424812565" if you are using GNU date ***
PC: @ 0x51c290 llvm::PMTopLevelManager::findAnalysisPass()
*** SIGSEGV (@0x20) received by PID 50599 (TID 0x7f1fab00e700) from PID 32; stack trace: ***
@ 0x7f20d332a0d0 (unknown)
@ 0x51c290 llvm::PMTopLevelManager::findAnalysisPass()
@ 0x51f508 llvm::PMDataManager::initializeAnalysisImpl()
@ 0x5232fb llvm::FPPassManager::runOnFunction()
@ 0x523426 llvm::FunctionPassManagerImpl::run()
@ 0x52354d llvm::FunctionPassManager::run()
@ 0x5407aa llvm::JIT::jitTheFunction()
@ 0x540db8 llvm::JIT::runJITOnFunctionUnlocked()
@ 0x540fbf llvm::JIT::getPointerToFunction()
@ 0x424793 falcon::ClangCompiler::ClangModuleHandle::GetPointerToFunction()
@ 0x40f92d _ZNSt6thread5_ImplISt12_Bind_simpleIFZN6falcon29CachingCompiler_PerfTest_Test8TestBodyEvEUlvE0_vEEE6_M_runEv
@ 0x7f20d23ab784 execute_native_thread_routine
@ 0x7f20d3322dc2 start_thread
@ 0x7f20d1b413fd __clone
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clang_compiler.cpp
Type: text/x-c++src
Size: 3554 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150224/47a0c9d5/attachment.cpp>
More information about the llvm-dev
mailing list