[cfe-dev] [libc++][RFC] Implementing Directory Entry Caching for Filesystem

Eric Fiselier via cfe-dev cfe-dev at lists.llvm.org
Tue Jul 17 11:49:12 PDT 2018


Adding CC's for some other potentially interested parties.

On Mon, Jul 16, 2018 at 8:39 PM Eric Fiselier <eric at efcs.ca> wrote:

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> * Hi All,I have a couple of questions and concerns regarding
> implementations of P0317R1
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0317r1.html> [1],
> which I would like help from the community answering. The paper changes
> `directory_entry` to cache a number of attributes, including file status,
>  permissions, file size, number of hard links, last write time, and more.
> For reference, this is the interface
> <https://en.cppreference.com/w/cpp/experimental/fs/directory_entry> of
> `directory_entry` before this paper [2], and this is the interface
> <https://en.cppreference.com/w/cpp/filesystem/directory_entry> after
> [3].The implementation has a lot of latitude over which attributes it
> caches, if any, and how it caches them. As I'll explain below, each
> different approach can cause non-obvious behavior, possibly leading to
> TOCTOU bugs and security vulnerabilities!My question for the community is
> this:Given the considerations below, what should libc++ choose to do?1.
> Cache nothing?2. Cache as much as possible, when possible?3. Something in
> between?I would like to land std::filesystem before the 7.0 branch, and
> doing so requires deciding on which implementation to provide now. Note
> that this paper only considers POSIX based implementations of
> filesystem.TOCTOU Violations============================Time of check to
> time of use, or TOCTOU, is a class of software bugs caused by changes in a
> system between the time a condition is checked, and the time the result is
> used [4]. If libc++ chooses to do any caching in directory_entry, it will
> be vulnerable to this. If libc++ chooses to do no caching at all, it will
> also be vulnerable to this but in different scenarios.Most importantly,
> whichever approach is taken, libc++ will bare some responsibility for the
> TOCTOU bugs it allows users to write accidently. So it’s imperative we
> consider all approaches carefully.Let’s take a simple example:void
> remove_symlinks_in_dir(path p) {  for (directory_entry ent :
> directory_iterator(p)) {    if (ent.is_symlink())      remove(ent);
>  }}Above, a TOCTOU violation may occur if another process acts on the file
> referenced by `ent`, changing the attributes between the point the
> directory_entry was created and the point at which the `is_symlink`
> condition is checked.At this point it's important to note that <filesystem>
> interface is, in general, vulnerable to TOCTOU --- even without
> directory_entry caching. For example:void remove_symlinks_in_dir(path p) {
>  for (directory_entry ent : directory_iterator(p)) {    if
> (is_symlink(ent))      remove(ent);  }}In the above case, a TOCTOU
> violation still occurs, since changes to `ent` can take place between the
> check and the call to `remove`. However, at least the above example makes
> it clear *when* the check is occuring.Simply eliminating caching will not
> prevent TOCTOU violations, but, in this particular case, it would make them
> harder to write and make it more obvious where they may occur. Having a
> cache both extends the period of time between the check and the use, as
> well as hiding when the check actually occurs.Perhaps the above concessions
> make it worthwhile to support a more efficient interface with caching and
> fewer file system accesses?If we do choose to cache some attributes, then,
> at minimum, it should be obvious to the user which values are cached and
> which are not. The following section will explore the feasibility of
> this.Directory Iteration VS Refresh==================================The
> paper's intention is to allow attributes returned during directory
> iteration to be accessed by the user without additional calls to the
> underlying filesystem. The cache is populated two different ways: - During
> directory iteration, the cache is populated with only the attributes
> returned by directory iteration function. With POSIX `readdir`, only the
> file type as returned by `lstat` is available.- Using the `refresh`
> function, either called directly by the user, or implicitly by the
> constructors or modifying methods (Ex. `assign`). `refresh` has the ability
> to fully populate the cache (but not atomically in some cases, see the
> symlink considerations below).Note that const accessors may not update the
> cache. Therefore the state of the cache depends on whether the
> `directory_entry` was created during directory iteration or by the
> `directory_entry(path&)` constructor, and whether there have been any
> intervening calls to `refresh` or any other modifying member functions. The
> following case demonstrates a surprising TOCTOU violation, only when the
> `directory_entry` cache was populated during directory iteration, and not
> by a call to refresh:// Assume the filesystem supports permissions on
> symlinks.bool is_readable_symlink(directory_entry ent) {  if
> (!ent.is_symlink())    return false;  // We should have the file_status for
> the symlink cached, right?  file_status st = ent.symlink_status();  //
> Nope! Only when `refresh` has been called are the perms cached.
>  assert(is_symlink(st)); // May fail!  return (st.permissions() &
> perms::others_read) != perms::none;}The above example is intended to
> demonstrate that the different accessors may return a mix of cached and
> uncached values. Here `is_symlink()` is cached, but `symlink_status()` may
> not be. If the symlink has been replaced with a regular file, then the
> result of this function is potentially dangerous nonsense.So what happens
> when the user calls `refresh()` before accessing the attributes? Does this
> make it safer?The answer depends on the caching behavior of the
> implementation. and whether that behavior provides "atomic" or "non-atomic"
> attribute access (as defined in the next section).Atomic VS Non-Atomic
> Attribute Access=============================================As previously
> described, TOCTOU violations require there to be intervening time between
> the check and the use. We'll call any filesystem function, or set of
> functions, which don’t allow for TOCTOU bugs "atomic". All other functions
> or set of functions are considered "non-atomic".Therefore, a
> `directory_entry` object can be said to provide "atomic attribute access"
> across all accessors if and only if (A) all of the attributes are cached,
> and (B) the cache was populated atomically. A non-caching implementation of
> `directory_entry` never provides atomic attribute access.Let’s consider the
> consequences using the following example:bool  is_empty_regular_file(path
> p) {  directory_entry ent(p);   // Calls refresh()  return
> ent.is_regular_file() && ent.file_size() == 0;}In a non-caching
> implementation `refresh()` is a nop, and therefore opens the user up to
> TOCTOU issues occuring between the call to `is_symlink()` and
> `is_regular_file()`.In a fully-caching implementation, which provides
> atomic attribute access, no such problem occurs.In this case it seems
> preferable for libc++ to provide a fully-caching implementation with
> "atomic attribute access". But is this possible? The answer is "sometimes,
> but not always" as we'll see below.Problems with Symlinks And
> Refresh()===========================================Above we established
> that if refresh caches at all, then it should do so "atomically",
> retrieving all attributes from the filesystem in a atomic manner. Note that
> for symlinks, some of the cached attributes refer to the symlink
> (is_symlink() and `symlink_status()), while the rest refer to the linked
> entity (is_regular_file(), file_size(), etc).When the directory entry does
> not refer to a symlink, a single call to `lstat` provides enough
> information to fully populate the cache atomically. However, when the
> entity is a symlink, we would need a second call to `stat` to determine the
> properties of the linked file.Therefore, "atomic attribute access" is not
> possible to guarantee.An implementation may choose to partially populate
> the caches, omitting attributes about the linked entity; or it may choose
> to fully populate the cache non-atomically using a non-atomic series of
> calls to `lstat` and `stat`.The former case obfuscates from the user which
> attributes are cached and which are not, opening them up to TOCTOU
> violations.Worse yet, the latter case, which fully populates the cache
> non-atomically, potentially commits a TOCTOU violation itself, and this
> violation is hidden from the user, preventing them from knowing it exists
> or being able to do anything about it.Both solutions seem fraught with
> problems.Conclusion===============I hope to have shown the pros and cons of
> different directory entry caching implementations. Now I need your help
> deciding what's best for Libc++!Personally, I think the best solution is to
> revert these changes to the standard, since each possible implementation
> seems broken in one way or another. Directory entries should not cache, and
> directory iterators should not attempt to populate the cache incompletely.
> Afterwards, a better and safer proposal can be put forward  which to
> provides efficient and atomic access to multiple attributes for the same
> entity can be proposed.Unfortunately, it might be to late at this point to
> revert the paper, even if the committee generally agrees with these
> problems.Thoughts? [1]
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0317r1.html
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0317r1.html>[2]
> https://en.cppreference.com/w/cpp/experimental/fs/directory_entry
> <https://en.cppreference.com/w/cpp/experimental/fs/directory_entry>[3]
> https://en.cppreference.com/w/cpp/filesystem/directory_entry
> <https://en.cppreference.com/w/cpp/filesystem/directory_entry>[4]
> https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use
> <https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use> *
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180717/634237d4/attachment-0001.html>


More information about the cfe-dev mailing list