[cfe-dev] RFC: Improving Clang's FileManager

Sam McCall via cfe-dev cfe-dev at lists.llvm.org
Tue Aug 6 03:29:52 PDT 2019


On Tue, Aug 6, 2019 at 3:19 AM Alex L <arphaman at gmail.com> wrote:

> Hi Sam!
>
> I was thinking about this as well, it would be highly risky to land a
> massive change like this at once. I don't think it would be wise for me to
> try that, especially given that this was attempted before by others.
>
> Right now I'm thinking about introducing a parallel API to `getFile`,
> which would return the FileEntryRef, and starting the migration by
> migrating only those clients that are needed for the initial use-case
> (FileManager reuse for dependency scanning). I think it would be fine if
> the migration happened on per-need basis, as long as new clients use the
> new API from the beginning. Do you think that's a good approach?
>
This sounds good to me. There's two big caveats:
 - if there are reasons existing clients can't easily be moved, these may
represent (weird) use-cases that the new API doesn't support, like the
canonicalization. So the old API may continue to get new users.
 - in practice, migrating clients between subtly different APIs may not
happen, for "ain't broke, don't fix" reasons
But it's probably still the best approach I think.

On Fri, 2 Aug 2019 at 14:12, Sam McCall <sammccall at google.com> wrote:
>
>> I'm sympathetic to wanting to change this behaviour, as it's super subtle
>> and weird.
>> However this is a pretty risky change because it's subtle and filename
>> handling often doesn't have a clearly correct answer. It'd be nice if there
>> was a way to land this patch in isolation without an API change, so it can
>> bake for a while and be more easily reverted if need be.
>> One detail: IIUC, this
>>   >  it will return a FileEntry whose name is guaranteed to be that path
>> isn't quite accurate, it stat()s the the file, assigns that to
>> InterndFileName, and the assigns that to UFE.Name. So getFile() performs
>> some canonicalization, and callers may rely on that now (clangd does).
>>
>
> Right, that's true, thanks for clarifying. Right now I don't want to
> change this particular aspect of this behavior. Do you know some of the
> scenarios when the name is different? I don't think it canonicalized
> symlinks, right?
>
Reading through the implementation again (it's been a while!), it delegates
to ether VFS.openFileForRead(N).status().name(), or VFS.status(N).name().
For RealFileSystem these propagate N directly, but it's up to the VFS.
RedirectingFileSystem seems to optionally use the original name or the
provided name. I know internally we (embarrasingly) have a build system
integration that uses this facility to do some path mapping (in this case
we need to *un*-resolve a symlink, sigh...)


> On Wed, Jul 31, 2019 at 9:30 PM Alex L via cfe-dev <cfe-dev at lists.llvm.org>
>> wrote:
>>
>>> Hi,
>>>
>>> As it stands right now, Clang's FileManager doesn't model file entries
>>> with multiple names correctly. Whenever the `getFile` method is called with
>>> a path it hasn't seen before, it will return a FileEntry whose name is
>>> guaranteed to be that path. However, this guarantee is lost whenever a
>>> symlink file that points to original path is opened using `getFile`, as the
>>> `name` of the FileEntry is overwritten with the symlink path. This happens
>>> because the FileManager uniques the FileEntries using inodes, which means
>>> that symlinks always get the same file entry as the original file.
>>>
>>> I would like to fix this issue as it's blocking clang-scan-deps from
>>> reusing the FileManager across multiple invocations. Here's my idea:
>>>
>>> - Return FileEntryRef that stores the FileEntry * and a Name.
>>> - Don't store the Name in the FileEntry (keep the RealPathName) still.
>>> - A comparison between FileEntryRef will compare the pointer values of
>>> FileEntries, so the code that compares the pointer values right now can
>>> still compare two FileEntryRefs with the equivalent result (important for
>>> modules).
>>> - DenseMaps will still mostly use the FileEntry * values to preserve the
>>> existing behavior which expects one FileEntry per inode.
>>>
>>> Additionally to fixing the issue described above, Harlan (CCed) and I
>>> would like to also perform the following minor cleanups and improvements:
>>>
>>> - Start using error_code/ErrorOr for FileManager for improved API.
>>> - Get rid of the MemorizeStatCalls stat cache, which isn't actually used
>>> right now by Clang (probably a bug)!
>>> - Cache the VFS stat value in the FileEntryRef itself, so that the
>>> FileManager can actually have proper caching that works.
>>>
>>> Please let me know if you have any feedback. I'm hoping to post a patch
>>> for FileEntryRef sometime in the next 1-2 weeks.
>>>
>>> Thanks,
>>> Alex
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190806/0350eac7/attachment.html>


More information about the cfe-dev mailing list