[PATCH] D28836: [tsan] Provide API for libraries for race detection on custom objects
Kuba (Brecka) Mracek via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 17 17:45:14 PST 2017
kubabrecka created this revision.
kubabrecka added a project: Sanitizers.
Herald added a subscriber: mgorny.
Hi,
TSan's race detection algorithm can be applied generally to anything where modification disallows other simulatenous accesses. Data races on memory locations are the common case, but we also already use the same algorithm to track races on file descriptors. This patch extends race detection and proposes to provide an interface that would allow libraries/framework to track races on objects that are exposed via API. If a mutable object is accessible via some API, it's very common that the threading rules of the API follow TSan's standard race rules. Example:
ArchiverRef ArchiverCreateNew();
void ArchiverAddFile(ArchiverRef ref, char *filename, char *data, size_t len);
void ArchiverRemoveAllFiles(ArchiverRef ref);
long ArchiverGetNumberOfFiles(ArchiverRef ref);
char * ArchiverGetFileData(ArchiverRef ref, char *filename, size_t *out_len);
A reasonable threading contract would be that "read-only" methods can be called simulatenously from multiple threads (for one `ArchiverRef` handle), while modifying methods must be exclusive.
One option to detect violations of this API contract is to TSanify the library itself, but this has some downsides:
1. The detected race will be on some internal state variable, possibly several functions deep (but the real bug is that user's code is breaking the API contract).
2. The bug might not be detected at all, because the two APIs just happen not to touch the same memory. It's still a misuse of the API, and something we want to report.
3. It might not be feasible/easy to recompile the library with TSan instrumentation.
Another option is to add interceptors for the APIs of the library. However, interceptors introduce a dependency, which is undesirable for high-level libraries, and we want TSan to link against as few libraries as possible. Secondly, maintaining a large list of interceptors for various libraries is hard and error-prone.
The idea of this patch is to allow a non-instrumented library to call into TSan runtime, which would be weakly (dynamically) linked (if TSan is not loaded into the process, this won't do anything). The library would tell TSan about "readonly" and "modifying" accesses to an "object" (which simply translate to a read/write of the object's pointer) and provide the caller and tag (type of object):
static void *tag_Archiver = __tsan_register_tag("MyLibrary", "Archiver");
void ArchiverAddFile(ArchiverRef ref, ...) {
__tsan_external_write(ref, CALLER_PC, tag_Archiver);
...
}
long ArchiverGetNumberOfFiles(ArchiverRef ref) {
__tsan_external_read(ref, CALLER_PC, tag_Archiver);
...
}
The caller and tag are used to generate a user-friendly report:
==================
WARNING: ThreadSanitizer: race on a library object
Mutating access of object Archiver from library MyLibrary at 0x7b04000000f0 by thread T6:
#0 ArchiverAddFile (archiver.dylib+...)
#1 UserCodeFunc1 (usercode+...)
Previous read-only access of object Archiver from library MyLibrary at 0x7b04000000f0 by thread T5:
#0 ArchiverAddFile (archiver.dylib+...)
#1 UserCodeFunc2 (usercode+...)
Location is Archiver object from library MyLibrary of size 16 at 0x7b04000000f0 allocated by main thread:
#0 malloc (...)
#1 ArchiverCreateNew (archiver.dylib+...)
#2 main (usercode+...)
...
So, the proposal here is to add these functions to TSan's interface:
void *__tsan_external_register_tag(char *library_name, char *object_type);
void __tsan_external_assign_tag(void *addr, void *tag);
void __tsan_external_read(void *addr, void *caller_pc, void *tag);
void __tsan_external_write(void *addr, void *caller_pc, void *tag);
The `__tsan_external_register_tag` function creates a new tag, `__tsan_external_assign_tag` marks a heap-allocated memory chunk with the specified tag (this is to identify the object in the report). The `__tsan_external_read`/`__tsan_external_read` calls are basically the same as `__tsan_read8`/`__tsan_write8`, but they identify that the call isn't from the instrumentation, but from a library.
Another common threading model in library API is that a object handle must only be used from a single thread "at any given time" (but may be passed to another thread). This can be enforced by simply treating all APIs as modifying methods.
Repository:
rL LLVM
https://reviews.llvm.org/D28836
Files:
lib/tsan/CMakeLists.txt
lib/tsan/rtl/tsan_debugging.cc
lib/tsan/rtl/tsan_defs.h
lib/tsan/rtl/tsan_external.cc
lib/tsan/rtl/tsan_external.h
lib/tsan/rtl/tsan_interface.h
lib/tsan/rtl/tsan_report.cc
lib/tsan/rtl/tsan_report.h
lib/tsan/rtl/tsan_rtl.h
lib/tsan/rtl/tsan_rtl_report.cc
lib/tsan/rtl/tsan_suppressions.cc
lib/tsan/rtl/tsan_sync.cc
test/tsan/Darwin/external.cc
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D28836.84783.patch
Type: text/x-patch
Size: 18597 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170118/c7950219/attachment-0001.bin>
More information about the llvm-commits
mailing list