[PATCH] D155579: [Windows] Avoid using FileIndex for unique IDs on network mounts

Martin Storsjö via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 18 04:30:44 PDT 2023


mstorsjo created this revision.
mstorsjo added reviewers: aganea, thieta, kadircet, zero9178, alvinhochun, hans, aaron.ballman.
Herald added subscribers: arphaman, hiraditya.
Herald added a project: All.
mstorsjo requested review of this revision.
Herald added a subscriber: wangpc.
Herald added a project: LLVM.

The FileIndex values returned from GetFileInformationByHandle is
considered stable and uniquely identifying a file, as long as the
handle is open. When handles are closed, the indices can be reused
very soon.

When such file indices are used for LLVM's UniqueID, files are
considered duplicates as soon as the filesystem driver happens to
have used the same file index for the handle used to inspect the
file. This caused widespread, non-obvious (seemingly random)
breakage. This can happen e.g. if running on a directory that is
shared via Remote Desktop or VirtualBox.

To avoid the issue, use a hash of the canonicalized path if the
the path isn't a local path. This isn't an exact and precise
solution - many network mounts have perfectly reliable file
indices, and conversely, file indices can change on FAT if the
drive is defragmented. However in practice, this heuristic probably
is good enough and better than having the current sitaution.

This fixes https://github.com/llvm/llvm-project/issues/61401 and
https://github.com/llvm/llvm-project/issues/22079.

Performance wise, each call to getStatus(), which previously only
called GetFileInformationByHandle, now also end up calling
GetVolumePathNameW and GetDriveTypeW, and GetFinalPathNameByHandleW
if getStatus() was called with a file handle instead of a path,
or if the file was deemed to be non-local. Hopefully this extra
overhead is tolerable.

Change the equivalent() function to use getUniqueID instead of
checking individual file_status fields. The
equivalent(Twine,Twine,bool& result) function calls status() on
each path successively, without keeping the file handles open,
which also is prone to such false positives. This also gets rid
of checks of other superfluous fields in the equivalent(file_status, file_status)
function - the unique ID of a file should be enough (that is
what is done for Unix anyway).


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D155579

Files:
  llvm/include/llvm/Support/FileSystem.h
  llvm/lib/Support/Windows/Path.inc

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D155579.541456.patch
Type: text/x-patch
Size: 5802 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230718/25057d21/attachment.bin>


More information about the llvm-commits mailing list