[Mlir-commits] [mlir] fix a concurrent destruction issue for FallbackTypeIDResolver::registerImplicitTypeID due to function static variable (PR #85471)
Li Deng
llvmlistbot at llvm.org
Sat Mar 16 16:15:49 PDT 2024
================
@@ -81,8 +81,11 @@ struct ImplicitTypeIDRegistry {
} // end namespace
TypeID detail::FallbackTypeIDResolver::registerImplicitTypeID(StringRef name) {
- static ImplicitTypeIDRegistry registry;
- return registry.lookupOrInsert(name);
+ // To prevent race conditions when one thread is accessing this `static`
+ // variable while other threads are destructing it; construct the `registry`
+ // on the heap.
----------------
dengl11 wrote:
Thanks for the review!
The context is that we recently started initializing and then destructing multiple TensorFlow models in parallel (inside one unit test), which uses this function under the hood. Since then we have been seeing a MemorySanitizer error reporting use of uninitialized memory, as shown below:
```
==3525==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7fe2b5fb4cf2 in std::__msan::__shared_mutex_base::lock_shared() third_party/llvm/llvm-project/libcxx/src/shared_mutex.cpp:50:40
#1 0x7feed3dfb297 in lock_shared third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/shared_mutex:209:20
#2 0x7feed3dfb297 in lock_shared third_party/llvm/llvm-project/llvm/include/llvm/Support/RWMutex.h:103:12
#3 0x7feed3dfb297 in shared_lock third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/shared_mutex:325:11
#4 0x7feed3dfb297 in lookupOrInsert third_party/llvm/llvm-project/mlir/lib/Support/TypeID.cpp:60:42
#5 0x7feed3dfb297 in mlir::detail::FallbackTypeIDResolver::registerImplicitTypeID(llvm::StringRef) third_party/llvm/llvm-project/mlir/lib/Support/TypeID.cpp:85:19
#6 0x7feed9946adf in mlir::detail::TypeIDResolver<mlir::detail::OpToOpPassAdaptor, void>::resolveTypeID() third_party/llvm/llvm-project/mlir/include/mlir/Support/TypeID.h:195:24
#7 0x7feed994a113 in get<mlir::detail::OpToOpPassAdaptor> third_party/llvm/llvm-project/mlir/include/mlir/Support/TypeID.h:233:10
#8 0x7feed994a113 in classof third_party/llvm/llvm-project/mlir/include/mlir/Pass/Pass.h:460:33
#9 0x7feed994a113 in doit third_party/llvm/llvm-project/llvm/include/llvm/Support/Casting.h:64:53
#10 0x7feed994a113 in doit third_party/llvm/llvm-project/llvm/include/llvm/Support/Casting.h:110:12
#11 0x7feed994a113 in doit third_party/llvm/llvm-project/llvm/include/llvm/Support/Casting.h:137:12
#12 0x7feed994a113 in doit third_party/llvm/llvm-project/llvm/include/llvm/Support/Casting.h:127:12
#13 0x7feed994a113 in isPossible third_party/llvm/llvm-project/llvm/include/llvm/Support/Casting.h:255:12
#14 0x7feed994a113 in doCastIfPossible third_party/llvm/llvm-project/llvm/include/llvm/Support/Casting.h:493:10
#15 0x7feed994a113 in dyn_cast<mlir::detail::OpToOpPassAdaptor, mlir::Pass> third_party/llvm/llvm-project/llvm/include/llvm/Support/Casting.h:663:10
#16 0x7feed994a113 in mlir::Pass::printAsTextualPipeline(llvm::raw_ostream&) third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:77:23
#17 0x7feed994f132 in operator() third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:390:44
#18 0x7feed994f132 in interleave<llvm::pointee_iterator<std::__msan::unique_ptr<mlir::Pass, std::__msan::default_delete<mlir::Pass> > *, mlir::Pass>, (lambda at third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:390:15), (lambda at third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:391:7), void> third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLExtras.h:2127:3
#19 0x7feed994f132 in void llvm::interleave<llvm::iterator_range<llvm::pointee_iterator<std::__msan::unique_ptr<mlir::Pass, std::__msan::default_delete<mlir::Pass>>*, mlir::Pass>>, printAsTextualPipeline(llvm::raw_ostream&, llvm::StringRef, llvm::iterator_range<llvm::pointee_iterator<std::__msan::unique_ptr<mlir::Pass, std::__msan::default_delete<mlir::Pass>>*, mlir::Pass>> const&)::$_0, printAsTextualPipeline(llvm::raw_ostream&, llvm::StringRef, llvm::iterator_range<llvm::pointee_iterator<std::__msan::unique_ptr<mlir::Pass, std::__msan::default_delete<mlir::Pass>>*, mlir::Pass>> const&)::$_1, void>(llvm::iterator_range<llvm::pointee_iterator<std::__msan::unique_ptr<mlir::Pass, std::__msan::default_delete<mlir::Pass>>*, mlir::Pass>> const&, printAsTextualPipeline(llvm::raw_ostream&, llvm::StringRef, llvm::iterator_range<llvm::pointee_iterator<std::__msan::unique_ptr<mlir::Pass, std::__msan::default_delete<mlir::Pass>>*, mlir::Pass>> const&)::$_0, printAsTextualPipeline(llvm::raw_ostream&, llvm::StringRef, llvm::iterator_range<llvm::pointee_iterator<std::__msan::unique_ptr<mlir::Pass, std::__msan::default_delete<mlir::Pass>>*, mlir::Pass>> const&)::$_1) third_party/llvm/llvm-project/llvm/include/llvm/ADT/STLExtras.h:2141:3
#20 0x7feed994f497 in printAsTextualPipeline third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:389:3
#21 0x7feed994f497 in mlir::OpPassManager::printAsTextualPipeline(llvm::raw_ostream&) const third_party/llvm/llvm-project/mlir/lib/Pass/Pass.cpp:396:3
#22 0x7feeeb0535ca in mlir::tfg::TFGGrapplerOptimizer::Impl::GetPipelineString() third_party/tensorflow/core/grappler/optimizers/tfg_optimizer_hook.cc:86:10
#23 0x7feeeb05322f in mlir::tfg::TFGGrapplerOptimizer::name() const third_party/tensorflow/core/grappler/optimizers/tfg_optimizer_hook.cc:107:48
#24 0x7feef9197cba in tensorflow::grappler::MetaOptimizer::OptimizeGraph(std::__msan::vector<std::__msan::unique_ptr<tensorflow::grappler::GraphOptimizer, std::__msan::default_delete<tensorflow::grappler::GraphOptimizer>>, std::__msan::allocator<std::__msan::unique_ptr<tensorflow::grappler::GraphOptimizer, std::__msan::default_delete<tensorflow::grappler::GraphOptimizer>>>> const&, tensorflow::grappler::Cluster*, tensorflow::grappler::GrapplerItem&&, tensorflow::GraphDef*) third_party/tensorflow/core/grappler/optimizers/meta_optimizer.cc:835:22
#25 0x7feef919f177 in tensorflow::grappler::MetaOptimizer::OptimizeGraph(tensorflow::grappler::Cluster*, tensorflow::grappler::GrapplerItem&&, tensorflow::GraphDef*) third_party/tensorflow/core/grappler/optimizers/meta_optimizer.cc:911:10
#26 0x7feef91a19c5 in tensorflow::grappler::MetaOptimizer::OptimizeConsumeItem(tensorflow::grappler::Cluster*, tensorflow::grappler::GrapplerItem&&, tensorflow::GraphDef*) third_party/tensorflow/core/grappler/optimizers/meta_optimizer.cc:1084:3
#27 0x7feef91a97fb in tensorflow::grappler::RunMetaOptimizer(tensorflow::grappler::GrapplerItem&&, tensorflow::ConfigProto const&, tensorflow::DeviceBase*, tensorflow::grappler::Cluster*, tensorflow::GraphDef*) third_party/tensorflow/core/grappler/optimizers/meta_optimizer.cc:1371:20
#28 0x7feef95f63af in tensorflow::GraphExecutionState::OptimizeGraph(tensorflow::BuildGraphOptions const&, tensorflow::Graph const&, tensorflow::FunctionLibraryDefinition const*, std::__msan::unique_ptr<tensorflow::Graph, std::__msan::default_delete<tensorflow::Graph>>*, std::__msan::unique_ptr<tensorflow::FunctionLibraryDefinition, std::__msan::default_delete<tensorflow::FunctionLibraryDefinition>>*) third_party/tensorflow/core/common_runtime/graph_execution_state.cc:817:5
#29 0x7feef95e87d5 in tensorflow::GraphExecutionState::BuildGraph(tensorflow::BuildGraphOptions const&, std::__msan::unique_ptr<tensorflow::ClientGraph, std::__msan::default_delete<tensorflow::ClientGraph>>*) third_party/tensorflow/core/common_runtime/graph_execution_state.cc:875:14
#30 0x7fefdb1bdca1 in tensorflow::DirectSession::CreateGraphs(tensorflow::BuildGraphOptions const&, std::__msan::unordered_map<std::__msan::basic_string<char, std::__msan::char_traits<char>, std::__msan::allocator<char>>, std::__msan::unique_ptr<tensorflow::Graph, std::__msan::default_delete<tensorflow::Graph>>, std::__msan::hash<std::__msan::basic_string<char, std::__msan::char_traits<char>, std::__msan::allocator<char>>>, std::__msan::equal_to<std::__msan::basic_string<char, std::__msan::char_traits<char>, std::__msan::allocator<char>>>, std::__msan::allocator<std::__msan::pair<std::__msan::basic_string<char, std::__msan::char_traits<char>, std::__msan::allocator<char>> const, std::__msan::unique_ptr<tensorflow::Graph, std::__msan::default_delete<tensorflow::Graph>>>>>*, std::__msan::unique_ptr<tensorflow::FunctionLibraryDefinition, std::__msan::default_delete<tensorflow::FunctionLibraryDefinition>>*, tensorflow::DirectSession::RunStateArgs*, absl::InlinedVector<tensorflow::DataType, 4ul, std::__msan::allocator<tensorflow::DataType>>*, absl::InlinedVector<tensorflow::DataType, 4ul, std::__msan::allocator<tensorflow::DataType>>*, long*) third_party/tensorflow/core/common_runtime/direct_session.cc:1635:5
#31 0x7fefdb1b9743 in tensorflow::DirectSession::CreateExecutors(tensorflow::CallableOptions const&, std::__msan::unique_ptr<tensorflow::DirectSession::ExecutorsAndKeys, std::__msan::default_delete<tensorflow::DirectSession::ExecutorsAndKeys>>*, std::__msan::unique_ptr<tensorflow::DirectSession::FunctionInfo, std::__msan::default_delete<tensorflow::DirectSession::FunctionInfo>>*, tensorflow::DirectSession::RunStateArgs*) third_party/tensorflow/core/common_runtime/direct_session.cc:1340:3
#32 0x7fefdb1c6061 in tensorflow::DirectSession::MakeCallable(tensorflow::CallableOptions const&, long*) third_party/tensorflow/core/common_runtime/direct_session.cc:1878:3
```
Our understanding is that "When a C++ program terminates, the destructors for function static objects and globals will be executed by whichever thread started that termination but there is no guarantee that other threads have terminated". So once the function static variable `registry` has been destructed by that thread, other threads may still be using it, which results in the use of uninitialized memory. After constructing it on the heap (so that it is never destructed), the error is gone.
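To illustrate the pattern outside of MLIR, here is a minimal sketch. `NameRegistry`, `registerName` and `checkStableIds` are hypothetical names made up for this example and are not the actual MLIR code; the sketch also uses a plain `std::mutex` rather than the reader/writer mutex that appears in the trace. The only point is that the function static is a pointer, so nothing is destructed at program termination.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a registry; not MLIR's ImplicitTypeIDRegistry.
class NameRegistry {
public:
  // Returns a stable id for `name`, inserting it on first use.
  int lookupOrInsert(const std::string &name) {
    std::lock_guard<std::mutex> guard(mutex_);
    // emplace leaves the map unchanged if `name` is already registered.
    auto result = ids_.emplace(name, static_cast<int>(ids_.size()));
    return result.first->second;
  }

private:
  std::mutex mutex_;
  std::map<std::string, int> ids_;
};

// The function static is only a pointer. Its initialization is still
// thread-safe (C++11 and later), but the pointed-to registry is
// intentionally never deleted, so no destructor runs for it at exit and
// threads that are still running keep seeing a live object.
int registerName(const std::string &name) {
  static NameRegistry *registry = new NameRegistry();
  return registry->lookupOrInsert(name);
}

// Small usage check: ids are distinct per name and stable across calls.
void checkStableIds() {
  int a = registerName("pass.a");
  int b = registerName("pass.b");
  assert(a != b);
  assert(registerName("pass.a") == a);
}
```

The trade-off in this sketch is that the registry's memory is only reclaimed by the OS at process exit, in exchange for there being no destructor that can race with late users.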
I have also reworded the comment to remove the plurals and avoid confusion. Please let me know if there is anything you think could be clearer.
Thanks!
https://github.com/llvm/llvm-project/pull/85471