[libcxx-commits] [PATCH] D96842: [dfsan] Do not specialize vector<bool> for DFSan
Kostya Serebryany via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Wed Feb 17 13:52:26 PST 2021
kcc added a comment.
In D96842#2569364 <https://reviews.llvm.org/D96842#2569364>, @ldionne wrote:
> The reason why we can't make this change as-is is that it modifies the API of `std::vector` when instantiated with `bool` when the dataflow sanitizer is used. Specifically, the specialization of `vector<bool>` has a different interface, it's not only an optimization. For example `std::vector<bool>::reference` has a `.flip()` method. So if someone is doing something like `v.flip()` (which is legal), now their code won't compile when they turn on the dataflow sanitizer. That's not acceptable, and it would make us non-conforming when the dataflow sanitizer is used.
Bummer, I didn't realize that... Ok...
> Frankly, from your description, it sounds like the dataflow sanitizer is just not clever enough (yet?) to understand our code.
It's not about being clever, but about trading overhead for accuracy when it comes to bit-packed data structures.
msan maintains a bit-to-bit metadata and thus doesn't have this kind of false positives.
dfsan maintains a byte-to-byte metadata (currently, byte-to-two-bytes, but we are moving to byte-to-byte).
This enables us to maintain 8 taint bits per one application byte, and thus track up to 8 different data flows (taints).
This means a) 8x fewer runs required compared to a single taint tracer and b) ability to detect when two taints merge.
The downside is that in the presence of bit packing we over-taint, i.e. when setting a single bit with a tainted
value we taint the entire byte.
I am not aware of any good solution here, users will have to workaround in their code,
e.g. by not using bit-packed data structures in cases where data flow tracing is critical.
Valgrind/memcheck uses a secondary shadow for bit-to-bit shadow (for uninitialized),
but I don't think we can afford this in dfsan.
> This isn't specific to `std::vector` - you'll run into similar issues in a bunch of places in user code. libc++ is hardly the only place in the C++ world to do these kinds of bit manipulations. Now, this is the standard library, so we definitely want to be team players when it comes to enabling useful tooling - that's why we have things like `_LIBCPP_NO_CFI` to turn off some sanitizers in specific areas of our code where the sanitizers are known to be wrong.
> Are there ways of locally turning off the dataflow sanitizer? Is there some sort of annotation that we could use to tell the dataflow sanitizer to track at the bit level for some operations inside `vector<bool>`? Those are the kinds of solutions that would be acceptable.
> Also, independently of the dataflow sanitizer, it looks to me like we do have UB right here:
> template <class _Allocator>
> vector<bool, _Allocator>::push_back(const value_type& __x)
> if (this->__size_ == this->capacity())
> reserve(__recommend(this->__size_ + 1));
> back() = __x; // UB HERE
> As you mention in the great description for this review (thanks for that by the way), we call `back() = __x;`, but we might not have not initialized `back()` yet (if we're about to set the first bit in that byte). @miscco can you confirm that we're on the same page? If so, I'll send a review to fix this. I'm a bit surprised ubsan doesn't catch that, but I guess ubsan isn't a silver bullet.
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
More information about the libcxx-commits