[libcxx-commits] [libcxx] [libcxx] Caches file attributes during directory iteration. (PR #93316)

Yuriy Chernyshov via libcxx-commits libcxx-commits at lists.llvm.org
Tue May 28 02:41:12 PDT 2024


georgthegreat wrote:

TL;DR: I do not think that reasoning in terms of _more TOCTOU_ versus _less TOCTOU_ is correct. An implementation either protects you from TOCTOU (none does) or it does not.

@EricWF, thanks for pointing to this issue.

**There is, of course, a TOCTOU issue with the code**, yet this PR did not introduce it, nor do we consider it a problem.

A filesystem is a naturally shared resource that is not controlled by a single process, hence one should either

* explicitly synchronize access to it or lock it against concurrent usage (I am not aware of any implementation that follows this option)

OR

* do nothing and be OK with slightly out-of-date information being returned.
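From the caller's side, the second option roughly means treating every attribute as a snapshot and tolerating entries that change or vanish between the listing and the query. A minimal sketch of that style, using only the `std::error_code` overloads of the standard API; the names `total_size` and `demo_total_size` and the scratch directory name are hypothetical and exist only for this illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: sums the sizes of regular files directly inside `dir`.
// Every value read here is a snapshot; entries that vanish or fail to be
// queried mid-scan are skipped instead of being treated as fatal.
std::uintmax_t total_size(const fs::path& dir) {
    std::uintmax_t total = 0;
    std::error_code ec;
    for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
        std::error_code entry_ec;
        if (!it->is_regular_file(entry_ec) || entry_ec) continue;
        const std::uintmax_t size = it->file_size(entry_ec);
        if (entry_ec) continue;  // e.g. the file was removed after being listed
        total += size;
    }
    return total;
}

// Hypothetical demo: builds a scratch directory holding files of 3 and 4 bytes
// plus one subdirectory, computes the total, and cleans up.
std::uintmax_t demo_total_size() {
    const fs::path dir = fs::temp_directory_path() / "toctou_snapshot_demo";
    fs::remove_all(dir);
    fs::create_directories(dir / "sub");
    {
        std::ofstream a(dir / "a.bin", std::ios::binary);
        std::ofstream b(dir / "b.bin", std::ios::binary);
        assert(a.is_open() && b.is_open());
        a << std::string(3, 'x');
        b << std::string(4, 'x');
    }  // streams are flushed and closed here
    const std::uintmax_t total = total_size(dir);
    fs::remove_all(dir);
    return total;
}
```

The result can still be stale by the time it is returned; the sketch only makes sure that a concurrent modification is not reported as a hard failure.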

**I do not think that either option is the standard library's responsibility.**
As the [standard prescribes](https://eel.is/c++draft/fs.class.directory.entry#general-2):

> Implementations should store such additional file attributes during directory iteration if their values are available and storing the values would allow the implementation to eliminate file system accesses by directory_entry observer functions

So, technically speaking, this PR makes the current implementation more conformant to the standard than it was before.

**Now, let's observe particular consequences brought by this PR.**

First of all, this PR only affects the `_LIBCPP_WIN32API` implementation.
Prior to this PR, TOCTOU only affected `file_type` (which was already cached); now it also affects `file_size()` and `last_write_time()`. This does not seem like a degradation to me.

POSIX implementations will require an additional `lstat` / `stat` syscall upon accessing the first attribute. libc++ will then cache the attributes inside `directory_entry`, making it susceptible to TOCTOU, though this happens one syscall later than in the Windows implementation. Once these values are cached, no additional syscalls or attribute invalidations will happen.

In both implementations the user can call `refresh()` to request cache invalidation.
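A minimal sketch of how a caller might use `refresh()` when up-to-date values matter. The helper names `write_bytes` and `size_after_refresh` and the scratch directory name are hypothetical; whether the value read before `refresh()` is stale depends on what the implementation cached during iteration, so the sketch makes no claim about it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: overwrites `p` with exactly `n` bytes.
void write_bytes(const fs::path& p, std::size_t n) {
    std::ofstream out(p, std::ios::binary | std::ios::trunc);
    assert(out.is_open());
    out << std::string(n, 'x');
}  // the stream is flushed and closed here

// Hypothetical demo: returns the size reported after refresh().
std::uintmax_t size_after_refresh() {
    const fs::path dir = fs::temp_directory_path() / "toctou_refresh_demo";
    fs::remove_all(dir);
    fs::create_directory(dir);
    const fs::path file = dir / "data.bin";
    write_bytes(file, 5);

    // Obtain the entry through iteration, so the implementation is allowed
    // to have cached attributes at this point.
    fs::directory_entry entry = *fs::directory_iterator(dir);

    // The file changes after the entry was produced.
    write_bytes(file, 12);

    // Calling entry.file_size() here may yield 5 (cached) or 12 (queried),
    // depending on the implementation and platform.

    entry.refresh();  // re-reads the attributes from the file system
    const std::uintmax_t size = entry.file_size();
    fs::remove_all(dir);
    return size;
}
```

After `refresh()` the reported size matches the file as it was at the time of the call, and it can of course go stale again immediately afterwards.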

**References to other implementations**

* [microsoft/STL caches](https://github.com/microsoft/STL/blob/ff0cff1ad6de63525d0a6646b49cc10667446682/stl/inc/filesystem#L2540) attributes during iteration. The cached information is TOCTOU-affected right after it is received from the `FindNextFileW` call.
* [libstdc++ does not cache anything](https://github.com/gcc-mirror/gcc/blob/07cdba6294756af350198fbb01ea8c8efeac54dd/libstdc%2B%2B-v3/include/bits/fs_dir.h#L268) thus causing a syscall to access any attribute. The received information becomes TOCTOU-affected right after finishing the syscall and returning information to the user.


https://github.com/llvm/llvm-project/pull/93316

