[LLVMdev] Removing contention in PassRegistry accesses to speed up compiles

Nipun Sehrawat nipun at thoughtspot.com
Tue Feb 24 13:50:52 PST 2015


Hi,

We use LLVM libraries to compile C++ code and noticed slow downs when
multiple threads of a process were compiling at once. *perf *indicated that
most of the CPU time was spent in a spin lock, which was being
locked/unlocked from llvm::PassRegistry::getPassInfo().

We read the relevant LLVM code and found out that PassRegistry is a
ManagedStatic and is shared among all threads in case of a multi-threaded
setup. This sharing requires locking in PassRegistry's method, which
becomes source of the contention. To get rid of the contention, we made a
change to make PassRegistry thread-local and got rid of the locking. This
removed all the contention and we noticed a 2x speed up in single thread
compiles and 7x improvement when ten threads were compiling in parallel.

Please find attached the diff for this change. We are using a old version
of LLVM code (svn revision 170375), so the code might be quite outdated.

We have two questions:
1. Does the change look reasonable? Or are we missing something here?
2. When we run  with 1000 threads compiling concurrently, we
deterministically run into a segfault in PassRegistry lookup. Any insights
into the segfault?


Please find attached the following files:
1. *pass_registry.txt*: Git diff of our change. Note that it is against
LLVM svn revision 170375.
2.* contention.txt*: Perf report with existing LLVM code - shows contention
in llvm::PassRegistry::getPassInfo()
3. *no_contention.txt*: Perf report of LLVM built with our change.
4. *segfault.txt*: Segfault we are encountering after our change.
5. *clang_compile.cpp*: Snippet of code we use to compile code using LLVM.


Thanks a lot,
Nipun
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150224/47a0c9d5/attachment.html>
-------------- next part --------------
Samples: 696K of event 'cycles', Event count (approx.): 427355036253
-  48.98%  caching_compile  [kernel.kallsyms]          [k] __ticket_spin_lock
   - __ticket_spin_lock
      - 98.91% _raw_spin_lock
         - 57.49% futex_wake
              do_futex
              SyS_futex
            - system_call_fastpath
               - 100.00% __lll_unlock_wake
                  + 99.77% llvm::PassRegistry::getPassInfo(void const*) const
         - 38.01% futex_wait_setup
              futex_wait
              do_futex
              SyS_futex
            - system_call_fastpath
               - 100.00% __lll_lock_wait
                  + 99.86% llvm::PassRegistry::getPassInfo(void const*) const
         + 4.03% try_to_wake_up
      + 0.61% _raw_spin_lock_irq
+   4.10%  caching_compile  libpthread.so.0            [.] pthread_mutex_lock
+   1.81%  caching_compile  caching_compiler_test_old  [.] llvm::sys::MemoryFence()
+   1.63%  caching_compile  libpthread.so.0            [.] __pthread_mutex_unlock_usercnt
-------------- next part --------------
Samples: 113K of event 'cycles', Event count (approx.): 70370381305
+  10.38%  caching_compile  caching_compiler_test  [.] common::RecordParser::MatchData(common::RecordParser::Record*, unsigned long)
+   5.45%  caching_compile  caching_compiler_test  [.] common::RecordParser::Stream::FileStreamAdapter::good()
+   3.31%  caching_compile  caching_compiler_test  [.] common::RecordParser::Stream::FileStreamAdapter::get()
+   1.17%  caching_compile  caching_compiler_test  [.] llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int)
+   1.10%  caching_compile  caching_compiler_test  [.] operator new(unsigned long)
+   1.07%  caching_compile  caching_compiler_test  [.] llvm::SmallPtrSetImpl::insert_imp(void const*)
+   1.00%  caching_compile  caching_compiler_test  [.] llvm::PMDataManager::removeNotPreservedAnalysis(llvm::Pass*)
+   0.99%  caching_compile  caching_compiler_test  [.] operator delete(void*)
+   0.69%  caching_compile  caching_compiler_test  [.] llvm::sys::MemoryFence()
+   0.64%  caching_compile  caching_compiler_test  [.] llvm::PMDataManager::initializeAnalysisImpl(llvm::Pass*)
+   0.59%  caching_compile  libc.so.6              [.] __memcpy_ssse3_back
+   0.57%  caching_compile  caching_compiler_test  [.] llvm::PassRegistry::getPassInfo(void const*) const
+   0.54%  caching_compile  libc.so.6              [.] __memset_sse2
+   0.53%  caching_compile  caching_compiler_test  [.] (anonymous namespace)::VectorLegalizer::LegalizeOp(llvm::SDValue)
+   0.47%  caching_compile  caching_compiler_test  [.] llvm::PMDataManager::findAnalysisPass(void const*, bool)
-------------- next part --------------
diff --git a/llvm/llvm.svnrev.170375/include/llvm/PassSupport.h b/llvm/llvm.svnrev.170375/include/llvm/PassSupport.h
index 3633f47..44b4d56 100644
--- a/llvm/llvm.svnrev.170375/include/llvm/PassSupport.h
+++ b/llvm/llvm.svnrev.170375/include/llvm/PassSupport.h
@@ -130,25 +130,14 @@ private:
   PassInfo(const PassInfo &) LLVM_DELETED_FUNCTION;
 };
 
+
+
 #define CALL_ONCE_INITIALIZATION(function) \
-  static volatile sys::cas_flag initialized = 0; \
-  sys::cas_flag old_val = sys::CompareAndSwap(&initialized, 1, 0); \
-  if (old_val == 0) { \
+  static __thread int initialized = 0; \
+  if (initialized == 0) { \
     function(Registry); \
-    sys::MemoryFence(); \
-    TsanIgnoreWritesBegin(); \
-    TsanHappensBefore(&initialized); \
-    initialized = 2; \
-    TsanIgnoreWritesEnd(); \
-  } else { \
-    sys::cas_flag tmp = initialized; \
-    sys::MemoryFence(); \
-    while (tmp != 2) { \
-      tmp = initialized; \
-      sys::MemoryFence(); \
-    } \
-  } \
-  TsanHappensAfter(&initialized);
+    initialized = 1; \
+  }
 
 #define INITIALIZE_PASS(passName, arg, name, cfg, analysis) \
   static void* initialize##passName##PassOnce(PassRegistry &Registry) { \
diff --git a/llvm/llvm.svnrev.170375/lib/VMCore/PassRegistry.cpp b/llvm/llvm.svnrev.170375/lib/VMCore/PassRegistry.cpp
index 5d22879..8cca689 100644
--- a/llvm/llvm.svnrev.170375/lib/VMCore/PassRegistry.cpp
+++ b/llvm/llvm.svnrev.170375/lib/VMCore/PassRegistry.cpp
@@ -30,12 +30,17 @@ using namespace llvm;
 // llvm_shutdown clear this map prevents successful resurrection after
 // llvm_shutdown is run.  Ideally we should find a solution so that we don't
 // leak the map, AND can still resurrect after shutdown.
-static ManagedStatic<PassRegistry> PassRegistryObj;
+
+// Each thread gets its own PassRegistry.
+static __thread PassRegistry* PassRegistryObj = NULL;
 PassRegistry *PassRegistry::getPassRegistry() {
-  return &*PassRegistryObj;
+  // TODO(nipun): Handle deletion.
+  if (NULL == PassRegistryObj) {
+    PassRegistryObj = new PassRegistry();
+  }
+  return PassRegistryObj;
 }
 
-static ManagedStatic<sys::SmartMutex<true> > Lock;
 
 //===----------------------------------------------------------------------===//
 // PassRegistryImpl
@@ -72,26 +77,23 @@ void *PassRegistry::getImpl() const {
 //
 
 PassRegistry::~PassRegistry() {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(pImpl);
-  
+
   for (std::vector<const PassInfo*>::iterator I = Impl->ToFree.begin(),
        E = Impl->ToFree.end(); I != E; ++I)
     delete *I;
-  
+
   delete Impl;
   pImpl = 0;
 }
 
 const PassInfo *PassRegistry::getPassInfo(const void *TI) const {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
   PassRegistryImpl::MapType::const_iterator I = Impl->PassInfoMap.find(TI);
   return I != Impl->PassInfoMap.end() ? I->second : 0;
 }
 
 const PassInfo *PassRegistry::getPassInfo(StringRef Arg) const {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
   PassRegistryImpl::StringMapType::const_iterator
     I = Impl->PassInfoStringMap.find(Arg);
@@ -103,7 +105,6 @@ const PassInfo *PassRegistry::getPassInfo(StringRef Arg) const {
 //
 
 void PassRegistry::registerPass(const PassInfo &PI, bool ShouldFree) {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
   bool Inserted =
     Impl->PassInfoMap.insert(std::make_pair(PI.getTypeInfo(),&PI)).second;
@@ -115,24 +116,22 @@ void PassRegistry::registerPass(const PassInfo &PI, bool ShouldFree) {
   for (std::vector<PassRegistrationListener*>::iterator
        I = Impl->Listeners.begin(), E = Impl->Listeners.end(); I != E; ++I)
     (*I)->passRegistered(&PI);
-  
+
   if (ShouldFree) Impl->ToFree.push_back(&PI);
 }
 
 void PassRegistry::unregisterPass(const PassInfo &PI) {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
-  PassRegistryImpl::MapType::iterator I = 
+  PassRegistryImpl::MapType::iterator I =
     Impl->PassInfoMap.find(PI.getTypeInfo());
   assert(I != Impl->PassInfoMap.end() && "Pass registered but not in map!");
-  
+
   // Remove pass from the map.
   Impl->PassInfoMap.erase(I);
   Impl->PassInfoStringMap.erase(PI.getPassArgument());
 }
 
 void PassRegistry::enumerateWith(PassRegistrationListener *L) {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
   for (PassRegistryImpl::MapType::const_iterator I = Impl->PassInfoMap.begin(),
        E = Impl->PassInfoMap.end(); I != E; ++I)
@@ -160,8 +159,6 @@ void PassRegistry::registerAnalysisGroup(const void *InterfaceID,
     assert(ImplementationInfo &&
            "Must register pass before adding to AnalysisGroup!");
 
-    sys::SmartScopedLock<true> Guard(*Lock);
-    
     // Make sure we keep track of the fact that the implementation implements
     // the interface.
     ImplementationInfo->addInterfaceImplemented(InterfaceInfo);
@@ -180,20 +177,18 @@ void PassRegistry::registerAnalysisGroup(const void *InterfaceID,
       InterfaceInfo->setNormalCtor(ImplementationInfo->getNormalCtor());
     }
   }
-  
+
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
   if (ShouldFree) Impl->ToFree.push_back(&Registeree);
 }
 
 void PassRegistry::addRegistrationListener(PassRegistrationListener *L) {
-  sys::SmartScopedLock<true> Guard(*Lock);
   PassRegistryImpl *Impl = static_cast<PassRegistryImpl*>(getImpl());
   Impl->Listeners.push_back(L);
 }
 
 void PassRegistry::removeRegistrationListener(PassRegistrationListener *L) {
-  sys::SmartScopedLock<true> Guard(*Lock);
-  
+
   // NOTE: This is necessary, because removeRegistrationListener() can be called
   // as part of the llvm_shutdown sequence.  Since we have no control over the
   // order of that sequence, we need to gracefully handle the case where the
-------------- next part --------------
*** Aborted at 1424812565 (unix time) try "date -d @1424812565" if you are using GNU date ***
PC: @           0x51c290 llvm::PMTopLevelManager::findAnalysisPass()
*** SIGSEGV (@0x20) received by PID 50599 (TID 0x7f1fab00e700) from PID 32; stack trace: ***
    @     0x7f20d332a0d0 (unknown)
    @           0x51c290 llvm::PMTopLevelManager::findAnalysisPass()
    @           0x51f508 llvm::PMDataManager::initializeAnalysisImpl()
    @           0x5232fb llvm::FPPassManager::runOnFunction()
    @           0x523426 llvm::FunctionPassManagerImpl::run()
    @           0x52354d llvm::FunctionPassManager::run()
    @           0x5407aa llvm::JIT::jitTheFunction()
    @           0x540db8 llvm::JIT::runJITOnFunctionUnlocked()
    @           0x540fbf llvm::JIT::getPointerToFunction()
    @           0x424793 falcon::ClangCompiler::ClangModuleHandle::GetPointerToFunction()
    @           0x40f92d _ZNSt6thread5_ImplISt12_Bind_simpleIFZN6falcon29CachingCompiler_PerfTest_Test8TestBodyEvEUlvE0_vEEE6_M_runEv
    @     0x7f20d23ab784 execute_native_thread_routine
    @     0x7f20d3322dc2 start_thread
    @     0x7f20d1b413fd __clone
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clang_compiler.cpp
Type: text/x-c++src
Size: 3554 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150224/47a0c9d5/attachment.cpp>


More information about the llvm-dev mailing list