[cfe-dev] [libc++][RFC] Implementing Directory Entry Caching forFilesystem

Billy O'Neal via cfe-dev cfe-dev at lists.llvm.org
Tue Jul 17 10:40:07 PDT 2018


Does anyone have an example of something that is secure without caching that is insecure with caching? All of the examples posted as problematic are still insecure.

Billy3

From: Aaron Ballman
Sent: Tuesday, July 17, 2018 4:46 AM
To: Eric Fiselier
Cc: clang developer list; Marshall Clow; Titus Winters; Billy O'Neal; Geoffrey Romer
Subject: Re: [cfe-dev] [libc++][RFC] Implementing Directory Entry Caching forFilesystem

Thank you for putting this together, I greatly appreciate it!

Out of curiosity, is the caching behavior something you expect you'll
make configurable for users of libc++ (for instance, a macro to enable
or disable caching), or are there issues preventing such an
implementation? My feeling is that we're going to have the usual push
and pull between security and performance where different users will
have different (and equally valid) use cases in mind justifying the
need for the performance provided by caching or the security red flags
enhanced by not caching. We should obviously pick a good default
behavior (and can argue over the definition of "good" in that
context), but I think no matter what option we pick, users will want
the other option to be available.

~Aaron

On Mon, Jul 16, 2018 at 10:39 PM, Eric Fiselier via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
> Hi All,
>
>
> I have a couple of questions and concerns regarding implementations of
> P0317R1 [1], which I would like help from the community answering. The paper
> changes `directory_entry` to cache a number of attributes, including file
> status,  permissions, file size, number of hard links, last write time, and
> more. For reference, this is the interface of `directory_entry` before this
> paper [2], and this is the interface after [3].
>
>
> The implementation has a lot of latitude over which attributes it caches, if
> any, and how it caches them. As I'll explain below, each different approach
> can cause non-obvious behavior, possibly leading to TOCTOU bugs and security
> vulnerabilities!
>
>
> My question for the community is this:
>
>
> Given the considerations below, what should libc++ choose to do?
>
>
> 1. Cache nothing?
>
> 2. Cache as much as possible, when possible?
>
> 3. Something in between?
>
>
> I would like to land std::filesystem before the 7.0 branch, and doing so
> requires deciding on which implementation to provide now.
>
>
> Note that this paper only considers POSIX based implementations of
> filesystem.
>
> TOCTOU Violations
>
> ============================
>
>
> Time of check to time of use, or TOCTOU, is a class of software bugs caused
> by changes in a system between the time a condition is checked, and the time
> the result is used [4]. If libc++ chooses to do any caching in
> directory_entry, it will be vulnerable to this. If libc++ chooses to do no
> caching at all, it will also be vulnerable to this but in different
> scenarios.
>
>
> Most importantly, whichever approach is taken, libc++ will bare some
> responsibility for the TOCTOU bugs it allows users to write accidently. So
> it’s imperative we consider all approaches carefully.
>
>
> Let’s take a simple example:
>
>
> void remove_symlinks_in_dir(path p) {
>
>  for (directory_entry ent : directory_iterator(p)) {
>
>    if (ent.is_symlink())
>
>      remove(ent);
>
>  }
>
> }
>
>
> Above, a TOCTOU violation may occur if another process acts on the file
> referenced by `ent`, changing the attributes between the point the
> directory_entry was created and the point at which the `is_symlink`
> condition is checked.
>
>
> At this point it's important to note that <filesystem> interface is, in
> general, vulnerable to TOCTOU --- even without directory_entry caching. For
> example:
>
>
> void remove_symlinks_in_dir(path p) {
>
>  for (directory_entry ent : directory_iterator(p)) {
>
>    if (is_symlink(ent))
>
>      remove(ent);
>
>  }
>
> }
>
>
> In the above case, a TOCTOU violation still occurs, since changes to `ent`
> can take place between the check and the call to `remove`. However, at least
> the above example makes it clear *when* the check is occuring.
>
>
> Simply eliminating caching will not prevent TOCTOU violations, but, in this
> particular case, it would make them harder to write and make it more obvious
> where they may occur. Having a cache both extends the period of time between
> the check and the use, as well as hiding when the check actually occurs.
>
>
> Perhaps the above concessions make it worthwhile to support a more efficient
> interface with caching and fewer file system accesses?
>
>
> If we do choose to cache some attributes, then, at minimum, it should be
> obvious to the user which values are cached and which are not. The following
> section will explore the feasibility of this.
>
>
> Directory Iteration VS Refresh
>
> ==================================
>
>
> The paper's intention is to allow attributes returned during directory
> iteration to be accessed by the user without additional calls to the
> underlying filesystem.
>
>
> The cache is populated two different ways:
>
> During directory iteration, the cache is populated with only the attributes
> returned by directory iteration function. With POSIX `readdir`, only the
> file type as returned by `lstat` is available.
>
> Using the `refresh` function, either called directly by the user, or
> implicitly by the constructors or modifying methods (Ex. `assign`).
> `refresh` has the ability to fully populate the cache (but not atomically in
> some cases, see the symlink considerations below).
>
>
> Note that const accessors may not update the cache.
>
>
> Therefore the state of the cache depends on whether the `directory_entry`
> was created during directory iteration or by the `directory_entry(path&)`
> constructor, and whether there have been any intervening calls to `refresh`
> or any other modifying member functions.
>
>
> The following case demonstrates a surprising TOCTOU violation, only when the
> `directory_entry` cache was populated during directory iteration, and not by
> a call to refresh:
>
>
> // Assume the filesystem supports permissions on symlinks.
>
> bool is_readable_symlink(directory_entry ent) {
>
>  if (!ent.is_symlink())
>
>    return false;
>
>  // We should have the file_status for the symlink cached, right?
>
>  file_status st = ent.symlink_status();
>
>  // Nope! Only when `refresh` has been called are the perms cached.
>
>  assert(is_symlink(st)); // May fail!
>
>  return (st.permissions() & perms::others_read) != perms::none;
>
> }
>
>
> The above example is intended to demonstrate that the different accessors
> may return a mix of cached and uncached values. Here `is_symlink()` is
> cached, but `symlink_status()` may not be.
>
> If the symlink has been replaced with a regular file, then the result of
> this function is potentially dangerous nonsense.
>
>
> So what happens when the user calls `refresh()` before accessing the
> attributes? Does this make it safer?
>
>
> The answer depends on the caching behavior of the implementation. and
> whether that behavior provides "atomic" or "non-atomic" attribute access (as
> defined in the next section).
>
>
> Atomic VS Non-Atomic Attribute Access
>
> =============================================
>
>
> As previously described, TOCTOU violations require there to be intervening
> time between the check and the use. We'll call any filesystem function, or
> set of functions, which don’t allow for TOCTOU bugs "atomic". All other
> functions or set of functions are considered "non-atomic".
>
>
> Therefore, a `directory_entry` object can be said to provide "atomic
> attribute access" across all accessors if and only if (A) all of the
> attributes are cached, and (B) the cache was populated atomically. A
> non-caching implementation of `directory_entry` never provides atomic
> attribute access.
>
>
> Let’s consider the consequences using the following example:
>
>
> bool  is_empty_regular_file(path p) {
>
>  directory_entry ent(p);   // Calls refresh()
>
>  return ent.is_regular_file() && ent.file_size() == 0;
>
> }
>
>
> In a non-caching implementation `refresh()` is a nop, and therefore opens
> the user up to TOCTOU issues occuring between the call to `is_symlink()` and
> `is_regular_file()`.
>
>
> In a fully-caching implementation, which provides atomic attribute access,
> no such problem occurs.
>
>
> In this case it seems preferable for libc++ to provide a fully-caching
> implementation with "atomic attribute access". But is this possible? The
> answer is "sometimes, but not always" as we'll see below.
>
>
> Problems with Symlinks And Refresh()
>
> ===========================================
>
>
> Above we established that if refresh caches at all, then it should do so
> "atomically", retrieving all attributes from the filesystem in a atomic
> manner. Note that for symlinks, some of the cached attributes refer to the
> symlink (is_symlink() and `symlink_status()), while the rest refer to the
> linked entity (is_regular_file(), file_size(), etc).
>
>
> When the directory entry does not refer to a symlink, a single call to
> `lstat` provides enough information to fully populate the cache atomically.
> However, when the entity is a symlink, we would need a second call to `stat`
> to determine the properties of the linked file.
>
>
> Therefore, "atomic attribute access" is not possible to guarantee.
>
>
> An implementation may choose to partially populate the caches, omitting
> attributes about the linked entity; or it may choose to fully populate the
> cache non-atomically using a non-atomic series of calls to `lstat` and
> `stat`.
>
>
> The former case obfuscates from the user which attributes are cached and
> which are not, opening them up to TOCTOU violations.
>
>
> Worse yet, the latter case, which fully populates the cache non-atomically,
> potentially commits a TOCTOU violation itself, and this violation is hidden
> from the user, preventing them from knowing it exists or being able to do
> anything about it.
>
>
> Both solutions seem fraught with problems.
>
> Conclusion
>
> ===============
>
>
> I hope to have shown the pros and cons of different directory entry caching
> implementations. Now I need your help deciding what's best for Libc++!
>
>
> Personally, I think the best solution is to revert these changes to the
> standard, since each possible implementation seems broken in one way or
> another. Directory entries should not cache, and directory iterators should
> not attempt to populate the cache incompletely. Afterwards, a better and
> safer proposal can be put forward  which to provides efficient and atomic
> access to multiple attributes for the same entity can be proposed.
>
>
> Unfortunately, it might be to late at this point to revert the paper, even
> if the committee generally agrees with these problems.
>
>
> Thoughts?
>
>
>
> [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0317r1.html
>
> [2] https://en.cppreference.com/w/cpp/experimental/fs/directory_entry
>
> [3] https://en.cppreference.com/w/cpp/filesystem/directory_entry
>
> [4] https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use
>
>
>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180717/b00eb2de/attachment.html>


More information about the cfe-dev mailing list