[PATCH] D155579: [Windows] Avoid using FileIndex for unique IDs on network mounts

Martin Storsjö via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 18 02:31:41 PDT 2023


mstorsjo added a comment.

In D155579#4588227 <https://reviews.llvm.org/D155579#4588227>, @mstorsjo wrote:

> In D155579#4588145 <https://reviews.llvm.org/D155579#4588145>, @aganea wrote:
>
>> @mstorsjo I think a good test for you would be to run clang-scan-deps on a large CDB .json, multithreaded, and see if there’s a difference before/after patch.
>
> That sounds like a good thing to test. Does anyone have a suitable setup with such files ready that could run the tests? I don't really run Windows (other than VMs for testing things occasionally) or do development in it, I mostly try to cross compile things - and I haven't used clang-scan-deps before either.

I set this up and managed to get some numbers now.

As test subject, I configured a build of llvm+clang, with llvm-mingw built with and without this patch. After configuring the build, I ran `tim clang-scan-deps.exe --compilation-database=compile_commands.json > out 2> err` (using https://github.com/sgraham/tim). Before the change, it took consistently `real: 0m17.156s` to run, while after the change it took around `real: 0m23.703s`. So this task became around 38% slower with this change - that's a notable slowdown.

I tried to look into what is causing so much slowdown. Overall, this path adds one call to `realPathFromHandle` (if status was called on a file descriptor without known path), which calls `GetFinalPathNameByHandleW` once or twice if the buffer wasn't large enough. (We could resize the new buffer on line 792 to `MAX_PATH` to avoid needing to do this twice in most cases.) This doesn't incur almost any extra runtime in my tests (perhaps most calls are with a path?). Then we call `is_local_internal`, which first calls `GetVolumePathNameW` followed by `GetDriveTypeW`. Here the absolute majority in runtime seems to be in `GetVolumePathNameW` - by omitting the rest of `is_local_internal` and just doing one single call to `GetVolumePathNameW`, the runtime increases from 17 to 23 seconds.

So the calls to `GetFinalPathNameByHandleW` and `GetDriveTypeW` are irrelevant for performance here, and `GetVolumePathNameW` is the only thing that causes all the extra runtime cost added here. I don't see any great ways around that. (One may be tempted to manually reimplement `GetVolumePathNameW` to just get the first segment of the path - but that breaks the cases where volumes are mounted in the system hierarchy etc.)

With that in mind, are @aaron.ballman and @aganea still ok with going ahead and landing this?

An alternative would be to just skip the call to `is_local_internal` and entirely switch to the hash based logic and never use `FileIndex`. I guess that'd mostly work, but the current patch tries to retain the use of `FileIndex` where reasonable (which probably should work better whenever e.g. symlinks or hardlinks are in use - which aren't common but do exist).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155579/new/

https://reviews.llvm.org/D155579



More information about the llvm-commits mailing list