[cfe-dev] [libc++][RFC] Implementing Directory Entry Caching forFilesystem
Eric Fiselier via cfe-dev
cfe-dev at lists.llvm.org
Tue Jul 17 15:32:29 PDT 2018
On Tue, Jul 17, 2018 at 3:53 PM Geoffrey Romer <gromer at google.com> wrote:
>
>
> On Tue, Jul 17, 2018 at 10:40 AM Billy O'Neal <billy.oneal at gmail.com>
> wrote:
>
>> Does anyone have an example of something that is secure without caching
>> that is insecure with caching? All of the examples posted as problematic
>> are still insecure.
>>
>
> +1. By my reading, different caching decisions can only make TOCTOU
> problems more or less acute; caching can't introduce TOCTOU problems that
> don't already exist. From that point of view, caching could actually be
> helpful, by making the TOCTOU problems more obvious. The only change we
> could meaningfully make here is to give users some conditions under which
> non-modifying operations on the same file are guaranteed to reflect a
> single self-consistent state of the file (which may or may not be the
> present state), but the benefit of that seems minimal.
>
> I think this ship sailed when we decided to standardize a TOCTOU-prone
> API; all we can do now is try to educate users about what not to use
> std::filesystem for (e.g. anything remotely related to security).
>
>
Thanks for the excellent input. I'll go ahead and implement it with caching.
/Eric
>
>
>>
>>
>> Billy3
>>
>>
>>
>> *From: *Aaron Ballman <aaron at aaronballman.com>
>> *Sent: *Tuesday, July 17, 2018 4:46 AM
>> *To: *Eric Fiselier <eric at efcs.ca>
>> *Cc: *clang developer list <cfe-dev at lists.llvm.org>; Marshall Clow
>> <mclow.lists at gmail.com>; Titus Winters <titus at google.com>; Billy O'Neal
>> <billy.oneal at gmail.com>; Geoffrey Romer <gromer at google.com>
>> *Subject: *Re: [cfe-dev] [libc++][RFC] Implementing Directory Entry
>> Caching forFilesystem
>>
>>
>>
>> Thank you for putting this together, I greatly appreciate it!
>>
>>
>>
>> Out of curiosity, is the caching behavior something you expect you'll
>>
>> make configurable for users of libc++ (for instance, a macro to enable
>>
>> or disable caching), or are there issues preventing such an
>>
>> implementation? My feeling is that we're going to have the usual push
>>
>> and pull between security and performance where different users will
>>
>> have different (and equally valid) use cases in mind justifying the
>>
>> need for the performance provided by caching or the security red flags
>>
>> enhanced by not caching. We should obviously pick a good default
>>
>> behavior (and can argue over the definition of "good" in that
>>
>> context), but I think no matter what option we pick, users will want
>>
>> the other option to be available.
>>
>>
>>
>> ~Aaron
>>
>>
>>
>> On Mon, Jul 16, 2018 at 10:39 PM, Eric Fiselier via cfe-dev
>>
>> <cfe-dev at lists.llvm.org> wrote:
>>
>> > Hi All,
>>
>> >
>>
>> >
>>
>> > I have a couple of questions and concerns regarding implementations of
>>
>> > P0317R1 [1], which I would like help from the community answering. The
>> paper
>>
>> > changes `directory_entry` to cache a number of attributes, including
>> file
>>
>> > status, permissions, file size, number of hard links, last write time,
>> and
>>
>> > more. For reference, this is the interface of `directory_entry` before
>> this
>>
>> > paper [2], and this is the interface after [3].
>>
>> >
>>
>> >
>>
>> > The implementation has a lot of latitude over which attributes it
>> caches, if
>>
>> > any, and how it caches them. As I'll explain below, each different
>> approach
>>
>> > can cause non-obvious behavior, possibly leading to TOCTOU bugs and
>> security
>>
>> > vulnerabilities!
>>
>> >
>>
>> >
>>
>> > My question for the community is this:
>>
>> >
>>
>> >
>>
>> > Given the considerations below, what should libc++ choose to do?
>>
>> >
>>
>> >
>>
>> > 1. Cache nothing?
>>
>> >
>>
>> > 2. Cache as much as possible, when possible?
>>
>> >
>>
>> > 3. Something in between?
>>
>> >
>>
>> >
>>
>> > I would like to land std::filesystem before the 7.0 branch, and doing so
>>
>> > requires deciding on which implementation to provide now.
>>
>> >
>>
>> >
>>
>> > Note that this paper only considers POSIX based implementations of
>>
>> > filesystem.
>>
>> >
>>
>> > TOCTOU Violations
>>
>> >
>>
>> > ============================
>>
>> >
>>
>> >
>>
>> > Time of check to time of use, or TOCTOU, is a class of software bugs
>> caused
>>
>> > by changes in a system between the time a condition is checked, and the
>> time
>>
>> > the result is used [4]. If libc++ chooses to do any caching in
>>
>> > directory_entry, it will be vulnerable to this. If libc++ chooses to do
>> no
>>
>> > caching at all, it will also be vulnerable to this but in different
>>
>> > scenarios.
>>
>> >
>>
>> >
>>
>> > Most importantly, whichever approach is taken, libc++ will bare some
>>
>> > responsibility for the TOCTOU bugs it allows users to write accidently.
>> So
>>
>> > it’s imperative we consider all approaches carefully.
>>
>> >
>>
>> >
>>
>> > Let’s take a simple example:
>>
>> >
>>
>> >
>>
>> > void remove_symlinks_in_dir(path p) {
>>
>> >
>>
>> > for (directory_entry ent : directory_iterator(p)) {
>>
>> >
>>
>> > if (ent.is_symlink())
>>
>> >
>>
>> > remove(ent);
>>
>> >
>>
>> > }
>>
>> >
>>
>> > }
>>
>> >
>>
>> >
>>
>> > Above, a TOCTOU violation may occur if another process acts on the file
>>
>> > referenced by `ent`, changing the attributes between the point the
>>
>> > directory_entry was created and the point at which the `is_symlink`
>>
>> > condition is checked.
>>
>> >
>>
>> >
>>
>> > At this point it's important to note that <filesystem> interface is, in
>>
>> > general, vulnerable to TOCTOU --- even without directory_entry caching.
>> For
>>
>> > example:
>>
>> >
>>
>> >
>>
>> > void remove_symlinks_in_dir(path p) {
>>
>> >
>>
>> > for (directory_entry ent : directory_iterator(p)) {
>>
>> >
>>
>> > if (is_symlink(ent))
>>
>> >
>>
>> > remove(ent);
>>
>> >
>>
>> > }
>>
>> >
>>
>> > }
>>
>> >
>>
>> >
>>
>> > In the above case, a TOCTOU violation still occurs, since changes to
>> `ent`
>>
>> > can take place between the check and the call to `remove`. However, at
>> least
>>
>> > the above example makes it clear *when* the check is occuring.
>>
>> >
>>
>> >
>>
>> > Simply eliminating caching will not prevent TOCTOU violations, but, in
>> this
>>
>> > particular case, it would make them harder to write and make it more
>> obvious
>>
>> > where they may occur. Having a cache both extends the period of time
>> between
>>
>> > the check and the use, as well as hiding when the check actually occurs.
>>
>> >
>>
>> >
>>
>> > Perhaps the above concessions make it worthwhile to support a more
>> efficient
>>
>> > interface with caching and fewer file system accesses?
>>
>> >
>>
>> >
>>
>> > If we do choose to cache some attributes, then, at minimum, it should be
>>
>> > obvious to the user which values are cached and which are not. The
>> following
>>
>> > section will explore the feasibility of this.
>>
>> >
>>
>> >
>>
>> > Directory Iteration VS Refresh
>>
>> >
>>
>> > ==================================
>>
>> >
>>
>> >
>>
>> > The paper's intention is to allow attributes returned during directory
>>
>> > iteration to be accessed by the user without additional calls to the
>>
>> > underlying filesystem.
>>
>> >
>>
>> >
>>
>> > The cache is populated two different ways:
>>
>> >
>>
>> > During directory iteration, the cache is populated with only the
>> attributes
>>
>> > returned by directory iteration function. With POSIX `readdir`, only the
>>
>> > file type as returned by `lstat` is available.
>>
>> >
>>
>> > Using the `refresh` function, either called directly by the user, or
>>
>> > implicitly by the constructors or modifying methods (Ex. `assign`).
>>
>> > `refresh` has the ability to fully populate the cache (but not
>> atomically in
>>
>> > some cases, see the symlink considerations below).
>>
>> >
>>
>> >
>>
>> > Note that const accessors may not update the cache.
>>
>> >
>>
>> >
>>
>> > Therefore the state of the cache depends on whether the
>> `directory_entry`
>>
>> > was created during directory iteration or by the
>> `directory_entry(path&)`
>>
>> > constructor, and whether there have been any intervening calls to
>> `refresh`
>>
>> > or any other modifying member functions.
>>
>> >
>>
>> >
>>
>> > The following case demonstrates a surprising TOCTOU violation, only
>> when the
>>
>> > `directory_entry` cache was populated during directory iteration, and
>> not by
>>
>> > a call to refresh:
>>
>> >
>>
>> >
>>
>> > // Assume the filesystem supports permissions on symlinks.
>>
>> >
>>
>> > bool is_readable_symlink(directory_entry ent) {
>>
>> >
>>
>> > if (!ent.is_symlink())
>>
>> >
>>
>> > return false;
>>
>> >
>>
>> > // We should have the file_status for the symlink cached, right?
>>
>> >
>>
>> > file_status st = ent.symlink_status();
>>
>> >
>>
>> > // Nope! Only when `refresh` has been called are the perms cached.
>>
>> >
>>
>> > assert(is_symlink(st)); // May fail!
>>
>> >
>>
>> > return (st.permissions() & perms::others_read) != perms::none;
>>
>> >
>>
>> > }
>>
>> >
>>
>> >
>>
>> > The above example is intended to demonstrate that the different
>> accessors
>>
>> > may return a mix of cached and uncached values. Here `is_symlink()` is
>>
>> > cached, but `symlink_status()` may not be.
>>
>> >
>>
>> > If the symlink has been replaced with a regular file, then the result of
>>
>> > this function is potentially dangerous nonsense.
>>
>> >
>>
>> >
>>
>> > So what happens when the user calls `refresh()` before accessing the
>>
>> > attributes? Does this make it safer?
>>
>> >
>>
>> >
>>
>> > The answer depends on the caching behavior of the implementation. and
>>
>> > whether that behavior provides "atomic" or "non-atomic" attribute
>> access (as
>>
>> > defined in the next section).
>>
>> >
>>
>> >
>>
>> > Atomic VS Non-Atomic Attribute Access
>>
>> >
>>
>> > =============================================
>>
>> >
>>
>> >
>>
>> > As previously described, TOCTOU violations require there to be
>> intervening
>>
>> > time between the check and the use. We'll call any filesystem function,
>> or
>>
>> > set of functions, which don’t allow for TOCTOU bugs "atomic". All other
>>
>> > functions or set of functions are considered "non-atomic".
>>
>> >
>>
>> >
>>
>> > Therefore, a `directory_entry` object can be said to provide "atomic
>>
>> > attribute access" across all accessors if and only if (A) all of the
>>
>> > attributes are cached, and (B) the cache was populated atomically. A
>>
>> > non-caching implementation of `directory_entry` never provides atomic
>>
>> > attribute access.
>>
>> >
>>
>> >
>>
>> > Let’s consider the consequences using the following example:
>>
>> >
>>
>> >
>>
>> > bool is_empty_regular_file(path p) {
>>
>> >
>>
>> > directory_entry ent(p); // Calls refresh()
>>
>> >
>>
>> > return ent.is_regular_file() && ent.file_size() == 0;
>>
>> >
>>
>> > }
>>
>> >
>>
>> >
>>
>> > In a non-caching implementation `refresh()` is a nop, and therefore
>> opens
>>
>> > the user up to TOCTOU issues occuring between the call to
>> `is_symlink()` and
>>
>> > `is_regular_file()`.
>>
>> >
>>
>> >
>>
>> > In a fully-caching implementation, which provides atomic attribute
>> access,
>>
>> > no such problem occurs.
>>
>> >
>>
>> >
>>
>> > In this case it seems preferable for libc++ to provide a fully-caching
>>
>> > implementation with "atomic attribute access". But is this possible? The
>>
>> > answer is "sometimes, but not always" as we'll see below.
>>
>> >
>>
>> >
>>
>> > Problems with Symlinks And Refresh()
>>
>> >
>>
>> > ===========================================
>>
>> >
>>
>> >
>>
>> > Above we established that if refresh caches at all, then it should do so
>>
>> > "atomically", retrieving all attributes from the filesystem in a atomic
>>
>> > manner. Note that for symlinks, some of the cached attributes refer to
>> the
>>
>> > symlink (is_symlink() and `symlink_status()), while the rest refer to
>> the
>>
>> > linked entity (is_regular_file(), file_size(), etc).
>>
>> >
>>
>> >
>>
>> > When the directory entry does not refer to a symlink, a single call to
>>
>> > `lstat` provides enough information to fully populate the cache
>> atomically.
>>
>> > However, when the entity is a symlink, we would need a second call to
>> `stat`
>>
>> > to determine the properties of the linked file.
>>
>> >
>>
>> >
>>
>> > Therefore, "atomic attribute access" is not possible to guarantee.
>>
>> >
>>
>> >
>>
>> > An implementation may choose to partially populate the caches, omitting
>>
>> > attributes about the linked entity; or it may choose to fully populate
>> the
>>
>> > cache non-atomically using a non-atomic series of calls to `lstat` and
>>
>> > `stat`.
>>
>> >
>>
>> >
>>
>> > The former case obfuscates from the user which attributes are cached and
>>
>> > which are not, opening them up to TOCTOU violations.
>>
>> >
>>
>> >
>>
>> > Worse yet, the latter case, which fully populates the cache
>> non-atomically,
>>
>> > potentially commits a TOCTOU violation itself, and this violation is
>> hidden
>>
>> > from the user, preventing them from knowing it exists or being able to
>> do
>>
>> > anything about it.
>>
>> >
>>
>> >
>>
>> > Both solutions seem fraught with problems.
>>
>> >
>>
>> > Conclusion
>>
>> >
>>
>> > ===============
>>
>> >
>>
>> >
>>
>> > I hope to have shown the pros and cons of different directory entry
>> caching
>>
>> > implementations. Now I need your help deciding what's best for Libc++!
>>
>> >
>>
>> >
>>
>> > Personally, I think the best solution is to revert these changes to the
>>
>> > standard, since each possible implementation seems broken in one way or
>>
>> > another. Directory entries should not cache, and directory iterators
>> should
>>
>> > not attempt to populate the cache incompletely. Afterwards, a better and
>>
>> > safer proposal can be put forward which to provides efficient and
>> atomic
>>
>> > access to multiple attributes for the same entity can be proposed.
>>
>> >
>>
>> >
>>
>> > Unfortunately, it might be to late at this point to revert the paper,
>> even
>>
>> > if the committee generally agrees with these problems.
>>
>> >
>>
>> >
>>
>> > Thoughts?
>>
>> >
>>
>> >
>>
>> >
>>
>> > [1]
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0317r1.html
>>
>> >
>>
>> > [2] https://en.cppreference.com/w/cpp/experimental/fs/directory_entry
>>
>> >
>>
>> > [3] https://en.cppreference.com/w/cpp/filesystem/directory_entry
>>
>> >
>>
>> > [4] https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use
>>
>> >
>>
>> >
>>
>> >
>>
>> >
>>
>> >
>>
>> > _______________________________________________
>>
>> > cfe-dev mailing list
>>
>> > cfe-dev at lists.llvm.org
>>
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>> >
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180717/23590e06/attachment.html>
More information about the cfe-dev
mailing list