[PATCH] D155579: [Windows] Avoid using FileIndex for unique IDs on network mounts

Martin Storsjö via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 22 14:17:34 PDT 2023


mstorsjo added a comment.

In D155579#4606800 <https://reviews.llvm.org/D155579#4606800>, @aganea wrote:

> In D155579#4604541 <https://reviews.llvm.org/D155579#4604541>, @mstorsjo wrote:
>
>> You who might have more use of nontrivial Windows build scenarios - how much impact would it be to go all in on the new path canonicalization + hash approach, i.e. ditching the file index entirely? Performance wise I would believe that it would be no significant change to before. The only thing that matters probably is how it behaves wrt symlinks/hardlinks, if those are present and in use.
>
> I guess we can try it, but we might be trading one problem for another. Fun stuff like this <https://github.com/rust-lang/rust/pull/86447/files>.

Oh, interesting. However I don't think that seem like a big blocker for us - if `VOLUME_NAME_NT` works reliably across file systems, we could just use that. We don't need a filename to display anywhere here, we just need a canonical string representation for a file.

> We would need a rigorous test plan, since this can't be tested automatically. And a strategy for applying the test plan since a single individual might not have access to all hardware resources to execute the test plan (perhaps sharing the testing across several individuals).

I don't quite see why such a rigorous test plan would be needed here, compared to how we do most other things? If it seems to work reasonably in the cases we've got access to testing and we don't know of any other specific problematic cases, we could land it, and expect to hear back if it causes something else?

> I'd like to suggest again the solution I put forward in D146490 <https://reviews.llvm.org/D146490>, which could only be enabled on a case-by-case basis, for short-lived apps like Clang or LLD. Other, or long-lived applications like ClangD would still use the current (unreliable) trunk behavior for IDs. At least that shovels all the OS/drivers problems to the OS, and leaves us with the optimization problem (of open handles), which could be manageable.

I'm quite hesitant of that solution - it's complex and increases resource consumption - and keeping files open longer than necessary always causes issues, especially on Windows. Admittedly, it's not that big of an issue for short-lived processes. Having separate solutions for short-lived and long-lived processes doesn't feel too great though.

> In D155579#4604241 <https://reviews.llvm.org/D155579#4604241>, @aganea wrote:
>
>> It seems one of root issues is the call to `GetVolumePathNameW` inside `is_local_internal`, which is absolutely terrible. It generates 12 (!!!) separate kernel calls for each folder component in the path. The deeper your files are, the longer it takes to execute.
>
> Correction, the issue is with `GetDriveTypeW` not `GetVolumePathNameW`. I realized that they do this because of reparse points, and because internally they call other APIs that each opens the path and checks for reparse points, again. For example, I don't understand why internally they call `RegSetValueExW`? (inside `GetDriveTypeW`) I wonder if there could be a different way to find whether the file is on a network drive or not.

Hmm, in my testing before, it was definitely `GetVolumePathNameW` that was slow.

FWIW, I tested symlinks and hardlinks with Clang with this patch. Symlinks do get resolved to the target file, so the path hashing approach works just as well as the file index there. For hard links, path hashing is inferior to using the file index though - the file index knows that two different paths are the same file, but `GetFinalPathNameByHandleW` doesn't canonicalize them to one single name (which is the correct behaviour for hard links - all names are equal and independent of each other).

If we consider hard links on Windows something we don't need to cater for, we could use unconditional path hashing, combined with canonicalization with `VOLUME_NAME_NT` if the default `VOLUME_NAME_DOS` is problematic, and that should be quite performant.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155579/new/

https://reviews.llvm.org/D155579



More information about the llvm-commits mailing list