[libcxx-commits] [PATCH] D118134: [libc++] Fix TOCTOU issue with std::filesystem::remove_all

Louis Dionne via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Wed Jan 26 08:08:46 PST 2022


ldionne marked 10 inline comments as done.
ldionne added inline comments.


================
Comment at: libcxx/src/filesystem/directory_iterator.cpp:77
-#else
-// defined(_LIBCPP_WIN32API)
 
----------------
Quuxplusone wrote:
> I wonder if it would be simpler to just move `remove_all` into //this// .cpp file. A priori (without seeing that this diff is where they came from) it's weird to see static functions in a .h file.
I hadn't noticed they were static. I think it would be entirely fine to make them non-static and leave them in the header, but I'd rather not move `remove_all` into this file, it seems pretty weird to have that operation (and only that one) inside `directory_iterator.cpp`.

Edit: After looking more at the contents of the file, there's a bunch of `static` functions and almost everything is defined in an anonymous namespace. Honestly, it's just weird. I'll make the move as-is in a separate commit and we can take an action item to go back and refactor this later.


================
Comment at: libcxx/src/filesystem/filesystem_common.h:570
+  ec.clear();
+  if ((dir_entry_ptr = ::readdir(dir_stream)) == nullptr) {
+    if (errno)
----------------
Quuxplusone wrote:
> Seems like a perfect place for `if (struct dirent *dir_entry_ptr = ::readdir(dir_stream)) { ...` (and swap the if/else branches).
> 
> Btw, for anyone else who's confused why we aren't using `::readdir_r`, apparently `::readdir` is equally thread-safe on modern systems (at least, those modern systems documented by LWN ;)) https://lwn.net/Articles/696474/ We should expect that `dir_entry_ptr` will point to some memory located physically inside the footprint of `*dir_stream` (as opposed to some static buffer or something).
> 
> However, I now also see that this code simply moved from `directory_iterator.cpp` (and have suggested that maybe `remove_all` should move over //there// instead), in which case there's no need to drive-by refactor any of this particular function.
Right, I think I'll move the code in a separate commit to make this less confusing.


================
Comment at: libcxx/src/filesystem/filesystem_common.h:572
+    if (errno)
+      ec = capture_errno();
+    return {};
----------------
ojhunt wrote:
> (non-libc++ expert) Sorry if this is a dumb question: is capture_errno() API? is it safe to have in a header?
`filesystem_common.h` is only used inside the sources of libc++ used for building the shared library. But yeah, I think it would be fine regardless, it's just a helper function.


================
Comment at: libcxx/src/filesystem/operations.cpp:1351
+
+  ~scope_exit() { cleanup_(); }
+
----------------
Quuxplusone wrote:
> If anything below were actually to throw an //exception//, then our `count` would be wrong. But this simplifies the early-return cases even in the absence of exceptions, so I assume that's why you did it.
Yes, simplifying the cleanup (without worrying about exceptions) is the purpose of this class. I am assuming that nothing throws in the filesystem code because we are always using the `error_code` version of functions.


================
Comment at: libcxx/src/filesystem/operations.cpp:1372
+
+    intmax_t count = 0;
+    while (true) {
----------------
Quuxplusone wrote:
> Any reason for `intmax_t` here but `uintmax_t` in the function return type? I suggest consistency, but don't care which.
Sorry, that was a typo. It should have been `uintmax_t`, thanks for spotting.


================
Comment at: libcxx/src/filesystem/operations.cpp:1385
+    // Then, remove the directory itself.
+    if (::unlinkat(parent_directory, p.c_str(), AT_REMOVEDIR) == -1) {
+      ec = detail::capture_errno();
----------------
Quuxplusone wrote:
> FWIW, this line is still racey:
> https://stackoverflow.com/questions/28517236/can-posix-linux-unlink-file-entries-completely-race-free
> https://bugzilla.kernel.org/show_bug.cgi?id=93441
> The primitive that we need here is "remove-`parent_directory`'s-child-named-`p`-iff-it-still-refers-to-the-same-inode-as-`fd`", i.e., a sort of compare-exchange primitive, which Linux does not provide and which is impossible to emulate in userspace.
> 
> So, this line right here is already the state of the art, and I can't think of anything better to do than to be OK with it.
> 
> Also, at first glance I can't see any way to really "exploit" this race. Either (happy path) you remove the now-empty directory you intended to; or the attacker moves that now-empty directory out of the way and puts their own thing in its place which you fail to delete (because it's not a directory, or because it's a non-empty directory); or the attacker moves that now-empty directory out of the way and puts their own empty directory in its place and you delete it (but it was an empty directory, and it wasn't a symlink (because it was a directory), so who cares).
> Also, we will //always// have this same race on the front side of the transaction: the programmer gives us a `fs::path` and we start removing whatever's there //now//, not whatever //was// there when the programmer decided to call into us.
Right, I agree -- I hadn't thought about this race but I think we're both on the same page that it is benign (and also unavoidable without different OS APIs).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118134/new/

https://reviews.llvm.org/D118134



More information about the libcxx-commits mailing list