[cfe-dev] [libc++][RFC] Implementing Directory Entry Caching forFilesystem

Geoffrey Romer via cfe-dev cfe-dev at lists.llvm.org
Wed Jul 18 10:46:58 PDT 2018


On Wed, Jul 18, 2018 at 9:52 AM Aaron Ballman <aaron at aaronballman.com>
wrote:

> On Tue, Jul 17, 2018 at 5:53 PM, Geoffrey Romer <gromer at google.com> wrote:
> >
> >
> > On Tue, Jul 17, 2018 at 10:40 AM Billy O'Neal <billy.oneal at gmail.com>
> wrote:
> >>
> >> Does anyone have an example of something that is secure without caching
> >> that is insecure with caching? All of the examples posted as
> problematic are
> >> still insecure.
> >
> >
> > +1. By my reading, different caching decisions can only make TOCTOU
> problems
> > more or less acute; caching can't introduce TOCTOU problems that don't
> > already exist.
>
> That is my belief as well. However, I don't find the argument to be
> persuasive, so it's a -1 from me.
>
> > From that point of view, caching could actually be helpful,
> > by making the TOCTOU problems more obvious.
>
> Caching is hidden from the user's view, and I fail to see how
> something hidden from the user's view will will make anything more
> obvious. What caching does do is extend the length of time over which
> TOCTOU bugs can occur, which is worse behavior from a security
> perspective. Users will definitely introduce exciting TOCTOU bugs into
> their code with or without the caching mechanism, but I'm worried
> about worsening the effects in practice.
>

Generally speaking, I would expect "worsening the effects" of the bugs to
make those bugs more obvious.


>
> > The only change we could
> > meaningfully make here is to give users some conditions under which
> > non-modifying operations on the same file are guaranteed to reflect a
> single
> > self-consistent state of the file (which may or may not be the present
> > state), but the benefit of that seems minimal.
> >
> > I think this ship sailed when we decided to standardize a TOCTOU-prone
> API;
> > all we can do now is try to educate users about what not to use
> > std::filesystem for (e.g. anything remotely related to security).
>
> "This doesn't introduce new security issues, it just makes existing
> ones worse" is not an argument that leaves me with warm, fuzzy
> feelings. The goal is obviously to prevent the TOCTOU bugs in the
> first place, but if we cannot achieve that, we shouldn't exacerbate
> the TOCTOU problems only because we can't achieve the ideal.
>

That's not so obvious to me. If exacerbating the TOCTOU problems makes
people more aware of those problems, that could well be a net win.
Conversely, to the extent that the caching design makes it easier to use
std::filesystem in security-sensitive contexts, that is likely to be a net
loss, because std::filesystem simply should not be used in
security-sensitive contexts.

Keep in mind, security exploits are generally not statistical in nature;
halving the time window for a TOCTOU attack does not halve your
vulnerability, because the components of the attack are not occurring at
random; they're being induced by an attacker. Shortening the time window
may have some security benefit on the margins in terms of increasing the
effort required for a successful exploit, but that's highly contingent and
uncertain. I expect the security benefits of eliminating a TOCTOU
vulnerability (e.g. by using a safer filesystem API) to dwarf the benefits
of almost any reduction in the vulnerability time window.


>
> ~Aaron
>
> >
> >
> >>
> >>
> >>
> >> Billy3
> >>
> >>
> >>
> >> From: Aaron Ballman
> >> Sent: Tuesday, July 17, 2018 4:46 AM
> >> To: Eric Fiselier
> >> Cc: clang developer list; Marshall Clow; Titus Winters; Billy O'Neal;
> >> Geoffrey Romer
> >> Subject: Re: [cfe-dev] [libc++][RFC] Implementing Directory Entry
> Caching
> >> forFilesystem
> >>
> >>
> >>
> >> Thank you for putting this together, I greatly appreciate it!
> >>
> >>
> >>
> >> Out of curiosity, is the caching behavior something you expect you'll
> >>
> >> make configurable for users of libc++ (for instance, a macro to enable
> >>
> >> or disable caching), or are there issues preventing such an
> >>
> >> implementation? My feeling is that we're going to have the usual push
> >>
> >> and pull between security and performance where different users will
> >>
> >> have different (and equally valid) use cases in mind justifying the
> >>
> >> need for the performance provided by caching or the security red flags
> >>
> >> enhanced by not caching. We should obviously pick a good default
> >>
> >> behavior (and can argue over the definition of "good" in that
> >>
> >> context), but I think no matter what option we pick, users will want
> >>
> >> the other option to be available.
> >>
> >>
> >>
> >> ~Aaron
> >>
> >>
> >>
> >> On Mon, Jul 16, 2018 at 10:39 PM, Eric Fiselier via cfe-dev
> >>
> >> <cfe-dev at lists.llvm.org> wrote:
> >>
> >> > Hi All,
> >>
> >> >
> >>
> >> >
> >>
> >> > I have a couple of questions and concerns regarding implementations of
> >>
> >> > P0317R1 [1], which I would like help from the community answering. The
> >> > paper
> >>
> >> > changes `directory_entry` to cache a number of attributes, including
> >> > file
> >>
> >> > status,  permissions, file size, number of hard links, last write
> time,
> >> > and
> >>
> >> > more. For reference, this is the interface of `directory_entry` before
> >> > this
> >>
> >> > paper [2], and this is the interface after [3].
> >>
> >> >
> >>
> >> >
> >>
> >> > The implementation has a lot of latitude over which attributes it
> >> > caches, if
> >>
> >> > any, and how it caches them. As I'll explain below, each different
> >> > approach
> >>
> >> > can cause non-obvious behavior, possibly leading to TOCTOU bugs and
> >> > security
> >>
> >> > vulnerabilities!
> >>
> >> >
> >>
> >> >
> >>
> >> > My question for the community is this:
> >>
> >> >
> >>
> >> >
> >>
> >> > Given the considerations below, what should libc++ choose to do?
> >>
> >> >
> >>
> >> >
> >>
> >> > 1. Cache nothing?
> >>
> >> >
> >>
> >> > 2. Cache as much as possible, when possible?
> >>
> >> >
> >>
> >> > 3. Something in between?
> >>
> >> >
> >>
> >> >
> >>
> >> > I would like to land std::filesystem before the 7.0 branch, and doing
> so
> >>
> >> > requires deciding on which implementation to provide now.
> >>
> >> >
> >>
> >> >
> >>
> >> > Note that this paper only considers POSIX based implementations of
> >>
> >> > filesystem.
> >>
> >> >
> >>
> >> > TOCTOU Violations
> >>
> >> >
> >>
> >> > ============================
> >>
> >> >
> >>
> >> >
> >>
> >> > Time of check to time of use, or TOCTOU, is a class of software bugs
> >> > caused
> >>
> >> > by changes in a system between the time a condition is checked, and
> the
> >> > time
> >>
> >> > the result is used [4]. If libc++ chooses to do any caching in
> >>
> >> > directory_entry, it will be vulnerable to this. If libc++ chooses to
> do
> >> > no
> >>
> >> > caching at all, it will also be vulnerable to this but in different
> >>
> >> > scenarios.
> >>
> >> >
> >>
> >> >
> >>
> >> > Most importantly, whichever approach is taken, libc++ will bare some
> >>
> >> > responsibility for the TOCTOU bugs it allows users to write
> accidently.
> >> > So
> >>
> >> > it’s imperative we consider all approaches carefully.
> >>
> >> >
> >>
> >> >
> >>
> >> > Let’s take a simple example:
> >>
> >> >
> >>
> >> >
> >>
> >> > void remove_symlinks_in_dir(path p) {
> >>
> >> >
> >>
> >> >  for (directory_entry ent : directory_iterator(p)) {
> >>
> >> >
> >>
> >> >    if (ent.is_symlink())
> >>
> >> >
> >>
> >> >      remove(ent);
> >>
> >> >
> >>
> >> >  }
> >>
> >> >
> >>
> >> > }
> >>
> >> >
> >>
> >> >
> >>
> >> > Above, a TOCTOU violation may occur if another process acts on the
> file
> >>
> >> > referenced by `ent`, changing the attributes between the point the
> >>
> >> > directory_entry was created and the point at which the `is_symlink`
> >>
> >> > condition is checked.
> >>
> >> >
> >>
> >> >
> >>
> >> > At this point it's important to note that <filesystem> interface is,
> in
> >>
> >> > general, vulnerable to TOCTOU --- even without directory_entry
> caching.
> >> > For
> >>
> >> > example:
> >>
> >> >
> >>
> >> >
> >>
> >> > void remove_symlinks_in_dir(path p) {
> >>
> >> >
> >>
> >> >  for (directory_entry ent : directory_iterator(p)) {
> >>
> >> >
> >>
> >> >    if (is_symlink(ent))
> >>
> >> >
> >>
> >> >      remove(ent);
> >>
> >> >
> >>
> >> >  }
> >>
> >> >
> >>
> >> > }
> >>
> >> >
> >>
> >> >
> >>
> >> > In the above case, a TOCTOU violation still occurs, since changes to
> >> > `ent`
> >>
> >> > can take place between the check and the call to `remove`. However, at
> >> > least
> >>
> >> > the above example makes it clear *when* the check is occuring.
> >>
> >> >
> >>
> >> >
> >>
> >> > Simply eliminating caching will not prevent TOCTOU violations, but, in
> >> > this
> >>
> >> > particular case, it would make them harder to write and make it more
> >> > obvious
> >>
> >> > where they may occur. Having a cache both extends the period of time
> >> > between
> >>
> >> > the check and the use, as well as hiding when the check actually
> occurs.
> >>
> >> >
> >>
> >> >
> >>
> >> > Perhaps the above concessions make it worthwhile to support a more
> >> > efficient
> >>
> >> > interface with caching and fewer file system accesses?
> >>
> >> >
> >>
> >> >
> >>
> >> > If we do choose to cache some attributes, then, at minimum, it should
> be
> >>
> >> > obvious to the user which values are cached and which are not. The
> >> > following
> >>
> >> > section will explore the feasibility of this.
> >>
> >> >
> >>
> >> >
> >>
> >> > Directory Iteration VS Refresh
> >>
> >> >
> >>
> >> > ==================================
> >>
> >> >
> >>
> >> >
> >>
> >> > The paper's intention is to allow attributes returned during directory
> >>
> >> > iteration to be accessed by the user without additional calls to the
> >>
> >> > underlying filesystem.
> >>
> >> >
> >>
> >> >
> >>
> >> > The cache is populated two different ways:
> >>
> >> >
> >>
> >> > During directory iteration, the cache is populated with only the
> >> > attributes
> >>
> >> > returned by directory iteration function. With POSIX `readdir`, only
> the
> >>
> >> > file type as returned by `lstat` is available.
> >>
> >> >
> >>
> >> > Using the `refresh` function, either called directly by the user, or
> >>
> >> > implicitly by the constructors or modifying methods (Ex. `assign`).
> >>
> >> > `refresh` has the ability to fully populate the cache (but not
> >> > atomically in
> >>
> >> > some cases, see the symlink considerations below).
> >>
> >> >
> >>
> >> >
> >>
> >> > Note that const accessors may not update the cache.
> >>
> >> >
> >>
> >> >
> >>
> >> > Therefore the state of the cache depends on whether the
> >> > `directory_entry`
> >>
> >> > was created during directory iteration or by the
> >> > `directory_entry(path&)`
> >>
> >> > constructor, and whether there have been any intervening calls to
> >> > `refresh`
> >>
> >> > or any other modifying member functions.
> >>
> >> >
> >>
> >> >
> >>
> >> > The following case demonstrates a surprising TOCTOU violation, only
> when
> >> > the
> >>
> >> > `directory_entry` cache was populated during directory iteration, and
> >> > not by
> >>
> >> > a call to refresh:
> >>
> >> >
> >>
> >> >
> >>
> >> > // Assume the filesystem supports permissions on symlinks.
> >>
> >> >
> >>
> >> > bool is_readable_symlink(directory_entry ent) {
> >>
> >> >
> >>
> >> >  if (!ent.is_symlink())
> >>
> >> >
> >>
> >> >    return false;
> >>
> >> >
> >>
> >> >  // We should have the file_status for the symlink cached, right?
> >>
> >> >
> >>
> >> >  file_status st = ent.symlink_status();
> >>
> >> >
> >>
> >> >  // Nope! Only when `refresh` has been called are the perms cached.
> >>
> >> >
> >>
> >> >  assert(is_symlink(st)); // May fail!
> >>
> >> >
> >>
> >> >  return (st.permissions() & perms::others_read) != perms::none;
> >>
> >> >
> >>
> >> > }
> >>
> >> >
> >>
> >> >
> >>
> >> > The above example is intended to demonstrate that the different
> >> > accessors
> >>
> >> > may return a mix of cached and uncached values. Here `is_symlink()` is
> >>
> >> > cached, but `symlink_status()` may not be.
> >>
> >> >
> >>
> >> > If the symlink has been replaced with a regular file, then the result
> of
> >>
> >> > this function is potentially dangerous nonsense.
> >>
> >> >
> >>
> >> >
> >>
> >> > So what happens when the user calls `refresh()` before accessing the
> >>
> >> > attributes? Does this make it safer?
> >>
> >> >
> >>
> >> >
> >>
> >> > The answer depends on the caching behavior of the implementation. and
> >>
> >> > whether that behavior provides "atomic" or "non-atomic" attribute
> access
> >> > (as
> >>
> >> > defined in the next section).
> >>
> >> >
> >>
> >> >
> >>
> >> > Atomic VS Non-Atomic Attribute Access
> >>
> >> >
> >>
> >> > =============================================
> >>
> >> >
> >>
> >> >
> >>
> >> > As previously described, TOCTOU violations require there to be
> >> > intervening
> >>
> >> > time between the check and the use. We'll call any filesystem
> function,
> >> > or
> >>
> >> > set of functions, which don’t allow for TOCTOU bugs "atomic". All
> other
> >>
> >> > functions or set of functions are considered "non-atomic".
> >>
> >> >
> >>
> >> >
> >>
> >> > Therefore, a `directory_entry` object can be said to provide "atomic
> >>
> >> > attribute access" across all accessors if and only if (A) all of the
> >>
> >> > attributes are cached, and (B) the cache was populated atomically. A
> >>
> >> > non-caching implementation of `directory_entry` never provides atomic
> >>
> >> > attribute access.
> >>
> >> >
> >>
> >> >
> >>
> >> > Let’s consider the consequences using the following example:
> >>
> >> >
> >>
> >> >
> >>
> >> > bool  is_empty_regular_file(path p) {
> >>
> >> >
> >>
> >> >  directory_entry ent(p);   // Calls refresh()
> >>
> >> >
> >>
> >> >  return ent.is_regular_file() && ent.file_size() == 0;
> >>
> >> >
> >>
> >> > }
> >>
> >> >
> >>
> >> >
> >>
> >> > In a non-caching implementation `refresh()` is a nop, and therefore
> >> > opens
> >>
> >> > the user up to TOCTOU issues occuring between the call to
> `is_symlink()`
> >> > and
> >>
> >> > `is_regular_file()`.
> >>
> >> >
> >>
> >> >
> >>
> >> > In a fully-caching implementation, which provides atomic attribute
> >> > access,
> >>
> >> > no such problem occurs.
> >>
> >> >
> >>
> >> >
> >>
> >> > In this case it seems preferable for libc++ to provide a fully-caching
> >>
> >> > implementation with "atomic attribute access". But is this possible?
> The
> >>
> >> > answer is "sometimes, but not always" as we'll see below.
> >>
> >> >
> >>
> >> >
> >>
> >> > Problems with Symlinks And Refresh()
> >>
> >> >
> >>
> >> > ===========================================
> >>
> >> >
> >>
> >> >
> >>
> >> > Above we established that if refresh caches at all, then it should do
> so
> >>
> >> > "atomically", retrieving all attributes from the filesystem in a
> atomic
> >>
> >> > manner. Note that for symlinks, some of the cached attributes refer to
> >> > the
> >>
> >> > symlink (is_symlink() and `symlink_status()), while the rest refer to
> >> > the
> >>
> >> > linked entity (is_regular_file(), file_size(), etc).
> >>
> >> >
> >>
> >> >
> >>
> >> > When the directory entry does not refer to a symlink, a single call to
> >>
> >> > `lstat` provides enough information to fully populate the cache
> >> > atomically.
> >>
> >> > However, when the entity is a symlink, we would need a second call to
> >> > `stat`
> >>
> >> > to determine the properties of the linked file.
> >>
> >> >
> >>
> >> >
> >>
> >> > Therefore, "atomic attribute access" is not possible to guarantee.
> >>
> >> >
> >>
> >> >
> >>
> >> > An implementation may choose to partially populate the caches,
> omitting
> >>
> >> > attributes about the linked entity; or it may choose to fully populate
> >> > the
> >>
> >> > cache non-atomically using a non-atomic series of calls to `lstat` and
> >>
> >> > `stat`.
> >>
> >> >
> >>
> >> >
> >>
> >> > The former case obfuscates from the user which attributes are cached
> and
> >>
> >> > which are not, opening them up to TOCTOU violations.
> >>
> >> >
> >>
> >> >
> >>
> >> > Worse yet, the latter case, which fully populates the cache
> >> > non-atomically,
> >>
> >> > potentially commits a TOCTOU violation itself, and this violation is
> >> > hidden
> >>
> >> > from the user, preventing them from knowing it exists or being able to
> >> > do
> >>
> >> > anything about it.
> >>
> >> >
> >>
> >> >
> >>
> >> > Both solutions seem fraught with problems.
> >>
> >> >
> >>
> >> > Conclusion
> >>
> >> >
> >>
> >> > ===============
> >>
> >> >
> >>
> >> >
> >>
> >> > I hope to have shown the pros and cons of different directory entry
> >> > caching
> >>
> >> > implementations. Now I need your help deciding what's best for Libc++!
> >>
> >> >
> >>
> >> >
> >>
> >> > Personally, I think the best solution is to revert these changes to
> the
> >>
> >> > standard, since each possible implementation seems broken in one way
> or
> >>
> >> > another. Directory entries should not cache, and directory iterators
> >> > should
> >>
> >> > not attempt to populate the cache incompletely. Afterwards, a better
> and
> >>
> >> > safer proposal can be put forward  which to provides efficient and
> >> > atomic
> >>
> >> > access to multiple attributes for the same entity can be proposed.
> >>
> >> >
> >>
> >> >
> >>
> >> > Unfortunately, it might be to late at this point to revert the paper,
> >> > even
> >>
> >> > if the committee generally agrees with these problems.
> >>
> >> >
> >>
> >> >
> >>
> >> > Thoughts?
> >>
> >> >
> >>
> >> >
> >>
> >> >
> >>
> >> > [1]
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0317r1.html
> >>
> >> >
> >>
> >> > [2] https://en.cppreference.com/w/cpp/experimental/fs/directory_entry
> >>
> >> >
> >>
> >> > [3] https://en.cppreference.com/w/cpp/filesystem/directory_entry
> >>
> >> >
> >>
> >> > [4] https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use
> >>
> >> >
> >>
> >> >
> >>
> >> >
> >>
> >> >
> >>
> >> >
> >>
> >> > _______________________________________________
> >>
> >> > cfe-dev mailing list
> >>
> >> > cfe-dev at lists.llvm.org
> >>
> >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >>
> >> >
> >>
> >>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180718/5292fe48/attachment.html>


More information about the cfe-dev mailing list