<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
h1
{mso-style-priority:9;
mso-style-link:"Heading 1 Char";
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:24.0pt;
font-family:"Calibri",sans-serif;
font-weight:bold;}
h2
{mso-style-priority:9;
mso-style-link:"Heading 2 Char";
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:18.0pt;
font-family:"Calibri",sans-serif;
font-weight:bold;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
span.Heading1Char
{mso-style-name:"Heading 1 Char";
mso-style-priority:9;
mso-style-link:"Heading 1";
font-family:"Calibri",sans-serif;
font-weight:bold;}
span.Heading2Char
{mso-style-name:"Heading 2 Char";
mso-style-priority:9;
mso-style-link:"Heading 2";
font-family:"Calibri",sans-serif;
font-weight:bold;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:952857282;
mso-list-template-ids:-1;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:"Courier New";
mso-bidi-font-family:"Times New Roman";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style></head><body lang=EN-US link=blue vlink="#954F72"><div class=WordSection1><p class=MsoNormal>MSVC++’s caching is as atomic as the underlying platform can be (in that we call only one API when refreshing). When enumerating a directory, FindFirstFileW/FindNextFileW return a <a href="https://docs.microsoft.com/en-us/windows/desktop/api/minwinbase/ns-minwinbase-_win32_find_dataw">WIN32_FIND_DATAW</a>, which contains all of the data that directory_entry can cache. Note that the data returned is information about a reparse point (e.g. symlink or junction), never the reparse point target. If the results indicate that a reparse point is present, most data becomes uncached, though we can still answer some questions, like is_directory, without following a reparse point. We treat IO_REPARSE_TAG_SYMLINK as a symbolic link, and IO_REPARSE_TAG_MOUNT_POINT as an implementation-defined file_type::junction. All other reparse points are treated like ordinary directories, as that is the intended behavior of hierarchical storage management, clustered FS, and similar. 
refresh() is similarly atomic in that it calls only GetFileAttributesExW, which returns a <a href="https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/ns-fileapi-_win32_file_attribute_data">WIN32_FILE_ATTRIBUTE_DATA</a>, containing everything WIN32_FIND_DATAW does except the reparse point type tag, so if we see a reparse point at all we don’t/can’t cache whether it is a symlink or not if refresh() has been called.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>><span style='font-family:"Arial",sans-serif;color:#222222'> I think the best solution is to revert these changes to the standard<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:#222222'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:#222222'>I think we have to be strongly against this, as we have already shipped this in production, and lack of caching prevents us from exposing data returned directly by our platform directory enumeration API (this is why we argued so strongly to put this in in the first place).<o:p></o:p></span></p><p class=MsoNormal><span style='font-family:"Arial",sans-serif;color:#222222'><o:p> </o:p></span></p><p class=MsoNormal>I think all stat()-like APIs are intrinsically vulnerable to the TOCTOU problem you describe. For example, even with “atomic attribute access”, your example “is_empty_regular_file” is broken before it even returns. If you built the same function out of a stat call it would be just as broken. 
I don’t think that is a problem std::filesystem can solve.</p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Billy3</p><p class=MsoNormal><o:p> </o:p></p><div style='mso-element:para-border-div;border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal style='border:none;padding:0in'><b>From: </b><a href="mailto:eric@efcs.ca">Eric Fiselier</a><br><b>Sent: </b>Monday, July 16, 2018 7:39 PM<br><b>To: </b><a href="mailto:cfe-dev@lists.llvm.org">clang developer list</a>; <a href="mailto:mclow.lists@gmail.com">Marshall Clow</a>; <a href="mailto:titus@google.com">Titus Winters</a>; <a href="mailto:billy.oneal@gmail.com">Billy O'Neal</a>; <a href="mailto:gromer@google.com">Geoffrey Romer</a><br><b>Subject: </b>[libc++][RFC] Implementing Directory Entry Caching for Filesystem</p></div><p class=MsoNormal><o:p> </o:p></p><div><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Hi All,</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>I have a couple of questions and concerns regarding implementations of </span><b><a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0317r1.html"><span style='font-family:"Arial",sans-serif;color:#1155CC;font-weight:normal'>P0317R1</span></a></b><span style='font-family:"Arial",sans-serif;color:#222222'> [1], which I would like help from the community answering. The paper changes `directory_entry` to cache a number of attributes, including file status, permissions, file size, number of hard links, last write time, and more. 
<span style='background:white'>For reference, </span></span><b><a href="https://en.cppreference.com/w/cpp/experimental/fs/directory_entry"><span style='font-family:"Arial",sans-serif;color:#1155CC;background:white;font-weight:normal'>this is the interface</span></a></b><span style='font-family:"Arial",sans-serif;color:#222222;background:white'> of `directory_entry` before this paper [2], and </span><b><a href="https://en.cppreference.com/w/cpp/filesystem/directory_entry"><span style='font-family:"Arial",sans-serif;color:#1155CC;background:white;font-weight:normal'>this is the interface</span></a></b><span style='font-family:"Arial",sans-serif;color:#222222;background:white'> after [3].</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>The implementation has a lot of latitude over which attributes it caches, if any, and how it caches them. As I'll explain below, each different approach can cause non-obvious behavior, possibly leading to TOCTOU bugs and security vulnerabilities!</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>My question for the community is this:</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Given the considerations below, what should libc++ choose to do?</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>1. Cache nothing?</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>2. 
Cache as much as possible, when possible?</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>3. Something in between?</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>I would like to land std::filesystem before the 7.0 branch, and doing so requires deciding on which implementation to provide now. </span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Note that this paper only considers POSIX-based implementations of filesystem.</span><b><o:p></o:p></b></p><h1 style='mso-margin-top-alt:20.0pt;margin-right:0in;margin-bottom:6.0pt;margin-left:0in'><span style='font-size:20.0pt;font-family:"Arial",sans-serif;color:black;font-weight:normal'>TOCTOU Violations</span></h1><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>============================</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Time of check to time of use, or TOCTOU, is a class of software bugs caused by changes in a system between the time a condition is checked and the time the result is used [4]. If libc++ chooses to do any caching in directory_entry, it will be vulnerable to this. If libc++ chooses to do no caching at all, it will also be vulnerable to this, but in different scenarios.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Most importantly, whichever approach is taken, <b>libc++ will bear some responsibility for the TOCTOU bugs it allows users to write accidentally. 
</b>So it’s imperative we consider all approaches carefully.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Let’s take a simple example:</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:blue'>void</span><span style='font-family:"Arial",sans-serif;color:black'> remove_symlinks_in_dir(path p) {</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:blue'> for</span><span style='font-family:"Arial",sans-serif;color:black'> (directory_entry ent : directory_iterator(p)) {</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:blue'> if</span><span style='font-family:"Arial",sans-serif;color:black'> (ent.is_symlink())</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'> remove(ent);</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'> }</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'>}</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Above, a TOCTOU violation may occur if another process acts on the file referenced by `ent`, changing the attributes between the point the directory_entry was created and the point at which the `is_symlink` condition is checked.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>At this point it's 
important to note that the &lt;filesystem&gt; interface is, in general, vulnerable to TOCTOU --- even without directory_entry caching. For example:</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:blue'>void</span><span style='font-family:"Arial",sans-serif;color:black'> remove_symlinks_in_dir(path p) {</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:blue'> for</span><span style='font-family:"Arial",sans-serif;color:black'> (directory_entry ent : directory_iterator(p)) {</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:blue'> if</span><span style='font-family:"Arial",sans-serif;color:black'> (is_symlink(ent))</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'> remove(ent);</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'> }</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'>}</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>In the above case, a TOCTOU violation still occurs, since changes to `ent` can take place between the check and the call to `remove`. 
However, at least the above example makes it clear *when* the check is occurring.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Simply eliminating caching will not prevent TOCTOU violations, but, in this particular case, it would make them harder to write and make it more obvious where they may occur. Having a cache both extends the period of time between the check and the use and hides when the check actually occurs.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Perhaps these concessions are worth making in exchange for a more efficient interface with caching and fewer file system accesses?</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>If we do choose to cache some attributes, then, at minimum, it should be obvious to the user which values are cached and which are not. The following section will explore the feasibility of this.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><h2 style='mso-margin-top-alt:.25in;margin-right:0in;margin-bottom:6.0pt;margin-left:0in'><span style='font-size:16.0pt;font-family:"Arial",sans-serif;color:black;font-weight:normal'>Directory Iteration VS Refresh</span></h2><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>==================================</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>The paper's intention is to allow attributes returned during directory iteration to be accessed by the user without additional calls to the underlying filesystem. 
</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>The cache is populated in two different ways:</span><b><o:p></o:p></b></p><p style='mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:47.0pt;margin-bottom:.0001pt;text-indent:-.25in;mso-list:l0 level1 lfo1;vertical-align:baseline'><![if !supportLists]><span style='font-size:10.0pt;font-family:Symbol;color:#222222'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]><span style='font-family:"Arial",sans-serif;color:#222222'>During directory iteration, the cache is populated with only the attributes returned by the directory iteration function. With POSIX `readdir`, only the file type, as `lstat` would report it, is available.<o:p></o:p></span></p><p style='mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:47.0pt;margin-bottom:.0001pt;text-indent:-.25in;mso-list:l0 level1 lfo1;vertical-align:baseline'><![if !supportLists]><span style='font-size:10.0pt;font-family:Symbol;color:#222222'><span style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'> </span></span></span><![endif]><span style='font-family:"Arial",sans-serif;color:#222222'>Using the `refresh` function, either called directly by the user, or implicitly by the constructors or modifying methods (e.g. `assign`). `refresh` has the ability to fully populate the cache (but not atomically in some cases; see the symlink considerations below).<o:p></o:p></span></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222;background:white'>Note that const accessors may not update the cache. 
</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222;background:white'>Therefore the state of the cache depends on whether the `directory_entry` was created during directory iteration or by the `directory_entry(path&)` constructor, and whether there have been any intervening calls to `refresh` or any other modifying member functions. </span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222;background:white'>The following case demonstrates a surprising TOCTOU violation, only when the `directory_entry` cache was populated during directory iteration, and not by a call to refresh:</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:green'>// Assume the filesystem supports permissions on symlinks.</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:blue'>bool</span><span style='font-family:"Arial",sans-serif;color:black'> is_readable_symlink(directory_entry ent) {</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:blue'> if</span><span style='font-family:"Arial",sans-serif;color:black'> (!ent.is_symlink())</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:blue'> return</span><span style='font-family:"Arial",sans-serif;color:black'> </span><span style='font-family:"Arial",sans-serif;color:blue'>false</span><span style='font-family:"Arial",sans-serif;color:black'>;</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:green'> // We should have the 
file_status for the symlink cached, right?</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'> file_status st = ent.symlink_status();</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:green'> // Nope! Only when `refresh` has been called are the perms cached.</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'> assert(is_symlink(st)); </span><span style='font-family:"Arial",sans-serif;color:green'>// May fail!</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:blue'> return</span><span style='font-family:"Arial",sans-serif;color:black'> (st.permissions() & perms::others_read) != perms::none;</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'>}</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>The above example is intended to demonstrate that the different accessors may return a mix of cached and uncached values. Here `is_symlink()` is cached, but `symlink_status()` may not be. </span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>If the symlink has been replaced with a regular file, then the result of this function is potentially dangerous nonsense.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>So what happens when the user calls `refresh()` before accessing the attributes? 
Does this make it safer?</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>The answer depends on the caching behavior of the implementation, and whether that behavior provides "atomic" or "non-atomic" attribute access (as defined in the next section).</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><h2 style='mso-margin-top-alt:.25in;margin-right:0in;margin-bottom:6.0pt;margin-left:0in'><span style='font-size:16.0pt;font-family:"Arial",sans-serif;color:black;font-weight:normal'>Atomic VS Non-Atomic Attribute Access</span></h2><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>=============================================</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>As previously described, TOCTOU violations require there to be intervening time between the check and the use. We'll call any filesystem function, or set of functions, that doesn’t allow for TOCTOU bugs "atomic". All other functions or sets of functions are considered "non-atomic".</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Therefore, a `directory_entry` object can be said to provide "atomic attribute access" across all accessors if and only if (A) all of the attributes are cached, and (B) the cache was populated atomically. 
A non-caching implementation of `directory_entry` never provides atomic attribute access.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Let’s consider the consequences using the following example:</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:blue'>bool</span><span style='font-family:"Arial",sans-serif;color:black'> is_empty_regular_file(path p) {</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'> directory_entry ent(p); </span><span style='font-family:"Arial",sans-serif;color:green'>// Calls refresh()</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:blue'> return</span><span style='font-family:"Arial",sans-serif;color:black'> ent.is_regular_file() && ent.file_size() == 0;</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'>}</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'>In a non-caching implementation `refresh()` is a no-op, which opens the user up to TOCTOU issues occurring between the call to `is_regular_file()` and the call to `file_size()`.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'>In a fully-caching implementation, which provides atomic attribute access, no such problem occurs.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:black'>In 
this case it seems preferable for libc++ to provide a fully-caching implementation with "atomic attribute access". But is this possible? The answer is "sometimes, but not always", as we'll see below.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><h2 style='mso-margin-top-alt:.25in;margin-right:0in;margin-bottom:6.0pt;margin-left:0in'><span style='font-size:16.0pt;font-family:"Arial",sans-serif;color:black;font-weight:normal'>Problems with Symlinks And Refresh()</span></h2><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>===========================================</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Above we established that if refresh caches at all, then it should do so "atomically", retrieving all attributes from the filesystem in an atomic manner. Note that for symlinks, some of the cached attributes refer to the symlink (`is_symlink()` and `symlink_status()`), while the rest refer to the linked entity (`is_regular_file()`, `file_size()`, etc.).</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>When the directory entry does not refer to a symlink, a single call to `lstat` provides enough information to fully populate the cache atomically. 
However, when the entity is a symlink, we would need a second call to `stat` to determine the properties of the linked file, and the filesystem may change between the two calls.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Therefore, "atomic attribute access" cannot be guaranteed.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>An implementation may choose to partially populate the cache, omitting attributes about the linked entity; or it may choose to fully populate the cache using a non-atomic series of calls to `lstat` and `stat`.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>The former case hides from the user which attributes are cached and which are not, opening them up to TOCTOU violations.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Worse yet, the latter case, which fully populates the cache non-atomically, potentially commits a TOCTOU violation itself, and this violation is hidden from the user, preventing them from knowing it exists or being able to do anything about it.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Both solutions seem fraught with problems.</span><b><o:p></o:p></b></p><h1 style='mso-margin-top-alt:20.0pt;margin-right:0in;margin-bottom:6.0pt;margin-left:0in'><span style='font-size:20.0pt;font-family:"Arial",sans-serif;color:black;font-weight:normal'>Conclusion</span></h1><p style='margin:0in;margin-bottom:.0001pt'><span 
style='font-family:"Arial",sans-serif;color:#222222'>===============</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>I hope to have shown the pros and cons of different directory entry caching implementations. Now I need your help deciding what's best for libc++!</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Personally, I think the best solution is to revert these changes to the standard, since each possible implementation seems broken in one way or another. Directory entries should not cache, and directory iterators should not attempt to populate the cache incompletely. Afterwards, a better and safer proposal, one which provides efficient and atomic access to multiple attributes for the same entity, can be put forward.</span><b><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Unfortunately, it might be too late at this point to revert the paper, even if the committee generally agrees that these are problems.</span><b><o:p></o:p></b></p><p class=MsoNormal><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>Thoughts? 
</span><b><o:p></o:p></b></p><p class=MsoNormal style='margin-bottom:12.0pt'><b><o:p> </o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>[1] </span><b><a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0317r1.html"><span style='font-family:"Arial",sans-serif;color:#1155CC;font-weight:normal'>http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0317r1.html</span></a><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>[2] </span><b><a href="https://en.cppreference.com/w/cpp/experimental/fs/directory_entry"><span style='font-family:"Arial",sans-serif;color:#1155CC;font-weight:normal'>https://en.cppreference.com/w/cpp/experimental/fs/directory_entry</span></a><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>[3] </span><b><a href="https://en.cppreference.com/w/cpp/filesystem/directory_entry"><span style='font-family:"Arial",sans-serif;color:#1155CC;font-weight:normal'>https://en.cppreference.com/w/cpp/filesystem/directory_entry</span></a><o:p></o:p></b></p><p style='margin:0in;margin-bottom:.0001pt'><span style='font-family:"Arial",sans-serif;color:#222222'>[4] </span><b><a href="https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use"><span style='font-family:"Arial",sans-serif;color:#1155CC;font-weight:normal'>https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use</span></a><o:p></o:p></b></p></div><p class=MsoNormal style='margin-bottom:12.0pt'><b><br><br></b></p><p class=MsoNormal><o:p> </o:p></p></div></body></html>