[PATCH] D155579: [Windows] Avoid using FileIndex for unique IDs on network mounts

Alexandre Ganea via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 22 07:35:34 PDT 2023


aganea added a comment.

In D155579#4604541 <https://reviews.llvm.org/D155579#4604541>, @mstorsjo wrote:

> You who might have more use of nontrivial Windows build scenarios - how much impact would it be to go all in on the new path canonicalization + hash approach, i.e. ditching the file index entirely? Performance wise I would believe that it would be no significant change to before. The only thing that matters probably is how it behaves wrt symlinks/hardlinks, if those are present and in use.

I guess we can try it, but we might be trading one problem for another. Fun stuff like this <https://github.com/rust-lang/rust/pull/86447/files>. We would need a rigorous test plan, since this can't be tested automatically. And a strategy for applying the test plan since a single individual might not have access to all hardware resource to execute the test plan (perhaps sharing the testing across several individuals).

I'd like to suggest again the solution I put forward in D146490 <https://reviews.llvm.org/D146490>, which could only be enabled on a case-by-case basis, for short-lived apps like Clang or LLD. Other, or long-lived applications like ClangD would still use the current (unreliable) trunk behavior for IDs. At least that shovels all the OS/drivers problems to the OS, and leaves us with the optimization problem (of open handles), which could be manageable.

We can also try to improve your patch somehow:

In D155579#4604241 <https://reviews.llvm.org/D155579#4604241>, @aganea wrote:

> It seems one of root issues is the call to `GetVolumePathNameW` inside `is_local_internal`, which is absolutely terrible. It generates 12 (!!!) separate kernel calls for each folder component in the path. The deeper your files are, the longer it takes to execute.

Correction, the issue is with `GetDriveTypeW` not `GetVolumePathNameW`. I realized that they do this because of reparse points, and because internally they call other APIs that each opens the path and checks for reparse points, again. For example, I don't understand why internally they call `RegSetValueExW`? (inside `GetDriveTypeW`) I wonder if there could be a different way to find whether the file is on a network drive or not.

> Also worth noting that Clang calls `llvm::sys::status` twice per file:
>
> - once in `DiagnoseInputExistence`.
> - then in `FileSystemStatCache::get`.

We could also fix this, which would maybe dampen the impact of this patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155579/new/

https://reviews.llvm.org/D155579



More information about the llvm-commits mailing list