[PATCH] D28836: [tsan] Provide API for libraries for race detection on custom objects

Kuba (Brecka) Mracek via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 17 17:45:14 PST 2017


kubabrecka created this revision.
kubabrecka added a project: Sanitizers.
Herald added a subscriber: mgorny.

Hi,

TSan's race detection algorithm can be applied generally to anything where modification disallows other simulatenous accesses.  Data races on memory locations are the common case, but we also already use the same algorithm to track races on file descriptors.  This patch extends race detection and proposes to provide an interface that would allow libraries/framework to track races on objects that are exposed via API.  If a mutable object is accessible via some API, it's very common that the threading rules of the API follow TSan's standard race rules.  Example:

  ArchiverRef ArchiverCreateNew();
  void        ArchiverAddFile(ArchiverRef ref, char *filename, char *data, size_t len);
  void        ArchiverRemoveAllFiles(ArchiverRef ref);
  long        ArchiverGetNumberOfFiles(ArchiverRef ref);
  char *      ArchiverGetFileData(ArchiverRef ref, char *filename, size_t *out_len);

A reasonable threading contract would be that "read-only" methods can be called simulatenously from multiple threads (for one `ArchiverRef` handle), while modifying methods must be exclusive.

One option to detect violations of this API contract is to TSanify the library itself, but this has some downsides:

1. The detected race will be on some internal state variable, possibly several functions deep (but the real bug is that user's code is breaking the API contract).
2. The bug might not be detected at all, because the two APIs just happen not to touch the same memory.  It's still a misuse of the API, and something we want to report.
3. It might not be feasible/easy to recompile the library with TSan instrumentation.

Another option is to add interceptors for the APIs of the library.  However, interceptors introduce a dependency, which is undesirable for high-level libraries, and we want TSan to link against as few libraries as possible.  Secondly, maintaining a large list of interceptors for various libraries is hard and error-prone.

The idea of this patch is to allow a non-instrumented library to call into TSan runtime, which would be weakly (dynamically) linked (if TSan is not loaded into the process, this won't do anything).  The library would tell TSan about "readonly" and "modifying" accesses to an "object" (which simply translate to a read/write of the object's pointer) and provide the caller and tag (type of object):

  static void *tag_Archiver = __tsan_register_tag("MyLibrary", "Archiver");
  
  void ArchiverAddFile(ArchiverRef ref, ...) {
    __tsan_external_write(ref, CALLER_PC, tag_Archiver);
    ...
  }
  
  long ArchiverGetNumberOfFiles(ArchiverRef ref) {
    __tsan_external_read(ref, CALLER_PC, tag_Archiver);
    ...
  }

The caller and tag are used to generate a user-friendly report:

  ==================
  WARNING: ThreadSanitizer: race on a library object
    Mutating access of object Archiver from library MyLibrary at 0x7b04000000f0 by thread T6:
      #0 ArchiverAddFile (archiver.dylib+...)
      #1 UserCodeFunc1 (usercode+...)
  
    Previous read-only access of object Archiver from library MyLibrary at 0x7b04000000f0 by thread T5:
      #0 ArchiverAddFile (archiver.dylib+...)
      #1 UserCodeFunc2 (usercode+...)
  
    Location is Archiver object from library MyLibrary of size 16 at 0x7b04000000f0 allocated by main thread:
      #0 malloc (...)
      #1 ArchiverCreateNew (archiver.dylib+...)
      #2 main (usercode+...)
  ...

So, the proposal here is to add these functions to TSan's interface:

  void *__tsan_external_register_tag(char *library_name, char *object_type);
  void  __tsan_external_assign_tag(void *addr, void *tag);
  void  __tsan_external_read(void *addr, void *caller_pc, void *tag);
  void  __tsan_external_write(void *addr, void *caller_pc, void *tag);

The `__tsan_external_register_tag` function creates a new tag, `__tsan_external_assign_tag` marks a heap-allocated memory chunk with the specified tag (this is to identify the object in the report).  The `__tsan_external_read`/`__tsan_external_read` calls are basically the same as `__tsan_read8`/`__tsan_write8`, but they identify that the call isn't from the instrumentation, but from a library.

Another common threading model in library API is that a object handle must only be used from a single thread "at any given time" (but may be passed to another thread).  This can be enforced by simply treating all APIs as modifying methods.


Repository:
  rL LLVM

https://reviews.llvm.org/D28836

Files:
  lib/tsan/CMakeLists.txt
  lib/tsan/rtl/tsan_debugging.cc
  lib/tsan/rtl/tsan_defs.h
  lib/tsan/rtl/tsan_external.cc
  lib/tsan/rtl/tsan_external.h
  lib/tsan/rtl/tsan_interface.h
  lib/tsan/rtl/tsan_report.cc
  lib/tsan/rtl/tsan_report.h
  lib/tsan/rtl/tsan_rtl.h
  lib/tsan/rtl/tsan_rtl_report.cc
  lib/tsan/rtl/tsan_suppressions.cc
  lib/tsan/rtl/tsan_sync.cc
  test/tsan/Darwin/external.cc

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D28836.84783.patch
Type: text/x-patch
Size: 18597 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170118/c7950219/attachment-0001.bin>


More information about the llvm-commits mailing list