[PATCH] D129755: Thread safety analysis: Support copy-elided production of scoped capabilities through arbitrary calls

Fri Oct 7 12:47:28 PDT 2022

hans added a comment.

In D129755#3843146 <https://reviews.llvm.org/D129755#3843146>, @aaronpuchert wrote:

> In D129755#3842729 <https://reviews.llvm.org/D129755#3842729>, @hans wrote:
>
>> We're hitting a false positive in grpc after this:
>>
>>   > ../../third_party/grpc/src/src/core/lib/gprpp/ref_counted_ptr.h:335:31: error: calling function 'TlsSessionKeyLoggerCache' requires holding mutex 'g_tls_session_key_log_cache_mu' exclusively [-Werror,-Wthread-safety-analysis]
>>   >   return RefCountedPtr<T>(new T(std::forward<Args>(args)...));
>>   >                               ^
>>   > ../../third_party/grpc/src/src/core/tsi/ssl/key_logging/ssl_key_logging.cc:121:26: note: in instantiation of function template specialization 'grpc_core::MakeRefCounted<tsi::TlsSessionKeyLoggerCache>' requested here
>>   >      cache = grpc_core::MakeRefCounted<TlsSessionKeyLoggerCache>();
>>   >                         ^
>>   
>>   
>>   The code looks like this:
>>   
>>   grpc_core::RefCountedPtr<TlsSessionKeyLogger> TlsSessionKeyLoggerCache::Get(
>>       std::string tls_session_key_log_file_path) {
>>     gpr_once_init(&g_cache_mutex_init, do_cache_mutex_init);
>>     GPR_DEBUG_ASSERT(g_tls_session_key_log_cache_mu != nullptr);
>>     if (tls_session_key_log_file_path.empty()) {
>>       return nullptr;
>>     }
>>     {
>>       grpc_core::MutexLock lock(g_tls_session_key_log_cache_mu);        <---------- holding the mutex
>>       grpc_core::RefCountedPtr<TlsSessionKeyLoggerCache> cache;
>>       if (g_cache_instance == nullptr) {
>>         // This will automatically set g_cache_instance.
>>         cache = grpc_core::MakeRefCounted<TlsSessionKeyLoggerCache>();      <------ line 121
>>   
>>   
>>   
>>   lock is holding a MutexLock (I assume that's an exclusive thing) on g_tls_session_key_log_cache_mu.
>>
>> See https://bugs.chromium.org/p/chromium/issues/detail?id=1372394#c4 for how to reproduce.
>>
>> I've reverted this in https://github.com/llvm/llvm-project/commit/a4afa2bde6f4db215ddd3267a8d11c04367812e5 in the meantime.
>
> This doesn't look like a false positive to me. The documentation states that no inlining <https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#no-inlining> is being done, so unless `grpc_core::MakeRefCounted` (or a specialization) has a `require_capability` attribute, the lock doesn't count as locked in the function body. That has always been the case.

Thanks for looking into this so quickly.

I'd still call it a false positive since the mutex is in fact being held. However I now understand that this is due to a pre-existing limitation in the analysis.

How would you suggest the developers work around this?

> In D129755#3842744 <https://reviews.llvm.org/D129755#3842744>, @hans wrote:
>
>> We also hit a different case: https://bugs.chromium.org/p/chromium/issues/detail?id=1372394#c6
>
> From the logs:
>
>   ../../extensions/browser/api/content_settings/content_settings_store.cc(75,49): note: mutex acquired here
>     std::unique_ptr<base::AutoLock> auto_lock(new base::AutoLock(lock_));
>                                                   ^
>
> That seems to be precisely the example that the documentation says doesn't work. The constructor of `std::unique_ptr` has no thread safety annotations and so we have a constructor call without a matching destructor call. (The destructor is called indirectly from the destructor of `unique_ptr`. We see this now because of the constructor call that doesn't initialize a local variable.
>
> The documentation also lists ways to make this work (such as having more powerful scope types).

I see. I looked through the docs but it wasn't clear what you referred too. How would one change base::AutoLock in this case? Or do the developers need to avoid unique_ptr, or is there something else?

> I'll give you some time to reply before I reland, but I can't see a bug here. For the next time, as already pointed out, give the involved people some time to reply before you revert (unless it's a crash). We have to be able to continue working on the compiler without having Google revert changes all the time because we produce new warnings.

We've seen reports here of various breakages from two different projects in short time after your patch. The general policy is to revert to green, so I think that was the right call. Typically, new warnings are introduced behind new flags so that users can adopt them incrementally, though I realize that might be tricky in this instance. In any case, perhaps we should give other projects some time to assess the impact of this before relanding?

This also seems like the kind of disruptive change we'd want to announce on Discourse, as discussed in Aaron's recent RFC <https://discourse.llvm.org/t/rfc-add-new-discourse-channel-for-potentially-breaking-disruptive-changes-for-clang/65251>. Probably should be mentioned in the "Potentially Breaking Changes" section of the release notes too.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129755/new/

https://reviews.llvm.org/D129755