[cfe-dev] RFC: A virtual file system for clang
Manuel Klimek
klimek at google.com
Fri Feb 7 11:31:07 PST 2014
On Fri, Feb 7, 2014 at 8:18 PM, Ben Langmuir <blangmuir at apple.com> wrote:
>
> On Feb 7, 2014, at 11:03 AM, Manuel Klimek <klimek at google.com> wrote:
>
> ...
>>
>
>> I'd vote for making that an explicit goal; two reasons:
>> 1. I don't think it'll make a first iteration harder to implement
>> 2. saying that we'll do things like that later will almost certainly make
>> it super-hard to do later
>>
>> For us, the ability to have virtual file buffers that do not exist on
>> disk is one of the core requirements we have for all our tools; I think in
>> a more and more network-based world this will also become more necessary in
>> general in the future.
>>
>>
>> How do you imagine changing clients that currently expect to be able to
>> get a file descriptor? Do you remove that concept and provide only
>> higher-level APIs, like “getBuffer” and “getRawOstream”, or create some
>> opaque file descriptor that can be returned from openFileForReading and
>> openFileForWriting? The latter seems like it doesn’t need to be built in
>> from the start, since we can continue to have the usual file descriptor
>> APIs, and update clients later when we change what a file descriptor is.
>> That’s what I was imagining, but you may have a better idea. Also, even
>> if adding fully virtual files doesn’t make a first iteration harder to
>> implement, what about testing it?
>>
>
> Well, that's an interesting question :) So, do you want a virtual file
> system that we can plug below the file manager pretty much "as-is", and it
> just works? In that case, I'd guess we need to do the latter. I'm also not
> sure which clients we might be able to convert later.
>
> If you don't want to provide a vfs below the file manager, where do you
> want to use it? Currently, all Tooling/ stuff relies on being able to use
> the file overlaying logic to inject into PPOptions / SourceManager /
> FileManager to allow (nearly) fully file system independent replays of
> compilations. Would you propose to break this behavior as part of the
> transition?
>
>
> I definitely want to put this below the FileManager. FileManager would
> just keep a reference to an AbstractFileSystem (although I’m not sure who
> should actually own that object), that we use to represent the ‘unified’
> file system and do all of its operations through it. Any existing uses of
> FileManager should continue to work as-is. This way we can phase in the
> VFS without changing the way overriding files works now.
>
Sounds good.
>
>
>
>>> One implementation of the AbstractFileSystem interface would be a
>>> wrapper over the ‘real’ file system, which would just defer to
>>> llvm::sys::fs.
>>>
>>> class RealFileSystem : public AbstractFileSystem { … };
>>>
>>> And to provide a unified view of the file system, we can create an
>>> overlay file system, similar to [1].
>>>
>>> class OverlayFileSystem : public AbstractFileSystem { … };
>>>
>>> To support a build system providing clang with a virtual file layout, we
>>> could add an option to clang that accepts a file describing the layout of a
>>> virtual file system. In a first iteration, this could be a simple JSON
>>> file describing the mapping from virtual paths to real paths, and a
>>> corresponding class VFSFromJSONFile : public AbstractFileSystem. Later we
>>> can evolve a more efficient binary format for this. In addition we should
>>> provide functions in libclang to produce these files.
>>>
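To make the overlay idea concrete, here is a minimal sketch of how the
lookup could work. Only the class names AbstractFileSystem and
OverlayFileSystem come from the proposal; the getBuffer signature,
pushOverlay, and the InMemoryFileSystem stand-in are assumptions made for
illustration, and the RealFileSystem wrapper over llvm::sys::fs is left out
so the example stays self-contained.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: the proposal names the class but not its methods.
class AbstractFileSystem {
public:
  virtual ~AbstractFileSystem() = default;
  // Copies the file contents into Out; returns false if Path is unknown.
  virtual bool getBuffer(const std::string &Path, std::string &Out) const = 0;
};

// Hypothetical stand-in for a layer whose files never exist on disk.
class InMemoryFileSystem : public AbstractFileSystem {
  std::map<std::string, std::string> Files;

public:
  void addFile(std::string Path, std::string Contents) {
    Files[std::move(Path)] = std::move(Contents);
  }
  bool getBuffer(const std::string &Path, std::string &Out) const override {
    auto It = Files.find(Path);
    if (It == Files.end())
      return false;
    Out = It->second;
    return true;
  }
};

// Presents several layers as one file system. The most recently pushed
// layer is consulted first, so it shadows the layers below it.
class OverlayFileSystem : public AbstractFileSystem {
  std::vector<std::shared_ptr<AbstractFileSystem>> Layers;

public:
  void pushOverlay(std::shared_ptr<AbstractFileSystem> FS) {
    assert(FS && "overlay layer must not be null");
    Layers.push_back(std::move(FS));
  }
  bool getBuffer(const std::string &Path, std::string &Out) const override {
    for (auto It = Layers.rbegin(); It != Layers.rend(); ++It)
      if ((*It)->getBuffer(Path, Out))
        return true;
    return false;
  }
};
```

With a base layer and a virtual layer pushed in that order, a path present
in both resolves to the virtual contents, and a path present only in the
base layer falls through to it.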
>>
>> The rest sounds generally good.
>>
>> One concern I have that has not been brought up is the old problem of
>> making file operations relative to a directory entry that is taken once at
>> the start of an operation (imagine starting a compilation from a symlinked
>> directory, and somebody changing the link). I think this will probably not
>> be a goal for phase 1, but it would be nice if we could keep it in the back of
>> our heads ;)
>>
>>
>> Can you expand on this? Dumb question: why can’t we just ask the OS for a
>> canonical path and work from that?
>>
>
> The main problem is less whether the path is canonical than that an
> unrelated happenstance in the file system might lead to inconsistencies in
> a compile step.
>
> Imagine a source tree in src-head and one in src-branch.
> $ ln -s src-head src
> $ cd src
> $ clang file.cc &
> $ cd ..
> $ rm src
> $ ln -s src-branch src
>
> Now if the clang process takes a while to run and resolves files it opens
> via "absolute" paths rather than relative to the current working dir file
> entry, it can see inconsistent state (for example, get a header from
> src-branch instead of from src-head).
>
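The shell session above can be replayed in a few lines of POSIX code. This
is only a sketch: the function name replaySymlinkSwap, the temporary
directory template, and the header.h file are made up for the
demonstration. It captures the directory once with open(..., O_DIRECTORY),
swaps the symlink, and then compares a lookup relative to the captured
descriptor (openat) with a lookup by path.

```cpp
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <fstream>
#include <string>
#include <utility>

// Reads everything from FD and closes it; returns "" if FD is invalid.
static std::string readAllAndClose(int FD) {
  std::string Out;
  if (FD < 0)
    return Out;
  char Buf[4096];
  ssize_t N;
  while ((N = read(FD, Buf, sizeof Buf)) > 0)
    Out.append(Buf, static_cast<size_t>(N));
  close(FD);
  return Out;
}

static void writeFile(const std::string &Path, const std::string &Text) {
  std::ofstream OS(Path.c_str());
  OS << Text;
}

// Returns {contents seen via the captured directory, contents seen via path}.
std::pair<std::string, std::string> replaySymlinkSwap() {
  char Template[] = "/tmp/vfs-race-XXXXXX";
  const char *Created = mkdtemp(Template);
  if (!Created)
    return std::make_pair(std::string(), std::string());
  const std::string Root = Created;
  const std::string Link = Root + "/src";

  mkdir((Root + "/src-head").c_str(), 0700);
  mkdir((Root + "/src-branch").c_str(), 0700);
  writeFile(Root + "/src-head/header.h", "head");
  writeFile(Root + "/src-branch/header.h", "branch");

  // $ ln -s src-head src, then the "compiler" captures its directory once.
  // open() follows the symlink, so DirFD is bound to src-head itself.
  symlink("src-head", Link.c_str());
  int DirFD = open(Link.c_str(), O_RDONLY | O_DIRECTORY);
  assert(DirFD >= 0 && "could not capture the starting directory");

  // $ rm src; ln -s src-branch src  (while the compile is still running)
  unlink(Link.c_str());
  symlink("src-branch", Link.c_str());

  std::string ViaDir = readAllAndClose(openat(DirFD, "header.h", O_RDONLY));
  std::string ViaPath =
      readAllAndClose(open((Link + "/header.h").c_str(), O_RDONLY));
  close(DirFD);

  // Clean up the temporary tree.
  unlink((Root + "/src-head/header.h").c_str());
  unlink((Root + "/src-branch/header.h").c_str());
  unlink(Link.c_str());
  rmdir((Root + "/src-head").c_str());
  rmdir((Root + "/src-branch").c_str());
  rmdir(Root.c_str());
  return std::make_pair(ViaDir, ViaPath);
}
```

On a POSIX system the descriptor-relative lookup should still see the
header from src-head, while the path-based lookup sees the one from
src-branch, which is the inconsistency described above.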
>
> Hmm, haven’t thought about this one. I guess I agree this is not a
> near-term goal :)
>
I'd like to consider it design-wise at least on a straw-man level (to be
shot down ;)
If we want to support this in the future, it might affect both the
ownership and the API design question. For example, one design straw-man
would be to have interfaces for FileSystem, Directory and File, where
FileSystem can give you Directories and those again can give you Files.
That would trivially support using directory-entry based OS interfaces
where they are available, but would mean more code overhead per FileSystem
implementation.
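As a rough illustration of this first straw-man, the three interfaces might
look like the sketch below. Every method name here (openDirectory, openFile,
getBuffer, bind) and the Map* in-memory classes are assumptions for the sake
of the example, not a proposed API. Note that even a toy implementation
needs three classes, which is the per-implementation overhead mentioned
above.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

class File {
public:
  virtual ~File() = default;
  virtual std::string getBuffer() const = 0;
};

class Directory {
public:
  virtual ~Directory() = default;
  // Resolved relative to this directory object; returns null if absent.
  virtual std::shared_ptr<File> openFile(const std::string &Name) const = 0;
};

class FileSystem {
public:
  virtual ~FileSystem() = default;
  // Returns null if Path does not name a directory.
  virtual std::shared_ptr<Directory>
  openDirectory(const std::string &Path) const = 0;
};

// Hypothetical in-memory implementation, one class per interface.
class MapFile : public File {
  std::string Contents;

public:
  explicit MapFile(std::string C) : Contents(std::move(C)) {}
  std::string getBuffer() const override { return Contents; }
};

class MapDirectory : public Directory {
  std::map<std::string, std::shared_ptr<File>> Entries;

public:
  void addFile(const std::string &Name, std::string Contents) {
    Entries[Name] = std::make_shared<MapFile>(std::move(Contents));
  }
  std::shared_ptr<File> openFile(const std::string &Name) const override {
    auto It = Entries.find(Name);
    if (It == Entries.end())
      return nullptr;
    return It->second;
  }
};

class MapFileSystem : public FileSystem {
  std::map<std::string, std::shared_ptr<Directory>> Names;

public:
  // Rebinding a path plays the role of re-pointing a symlink.
  void bind(const std::string &Path, std::shared_ptr<Directory> Dir) {
    assert(Dir && "cannot bind a path to a null directory");
    Names[Path] = std::move(Dir);
  }
  std::shared_ptr<Directory>
  openDirectory(const std::string &Path) const override {
    auto It = Names.find(Path);
    if (It == Names.end())
      return nullptr;
    return It->second;
  }
};
```

A client that obtains a Directory once and resolves all files through it
keeps seeing the same tree even if the path is rebound afterwards.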
A different approach would be to use descriptors for files and directories,
and have only a single FileSystem interface that can handle the instances.
Seems slightly less "nice" from a user point of view, but potentially
simpler and less overhead for the implementation.
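A sketch of this descriptor-based variant, again with invented names
(DirDescriptor, FileDescriptor, openFileAt, TableFileSystem, addTree, bind)
that are assumptions for illustration only. There is a single interface, and
directories and files are just small opaque handles that the file system
knows how to interpret.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Opaque handles; a negative Id means "invalid".
struct DirDescriptor { int Id; };
struct FileDescriptor { int Id; };

class DescriptorFileSystem {
public:
  virtual ~DescriptorFileSystem() = default;
  virtual DirDescriptor openDirectory(const std::string &Path) = 0;
  // Resolves Name relative to an already-open directory.
  virtual FileDescriptor openFileAt(DirDescriptor Dir,
                                    const std::string &Name) = 0;
  virtual bool getBuffer(FileDescriptor File, std::string &Out) const = 0;
};

// Hypothetical in-memory implementation backed by plain tables.
class TableFileSystem : public DescriptorFileSystem {
public:
  using Tree = std::map<std::string, std::string>; // file name -> contents

  DirDescriptor addTree(Tree T) {
    Trees.push_back(std::move(T));
    return DirDescriptor{static_cast<int>(Trees.size() - 1)};
  }
  // Rebinding a path plays the role of re-pointing a symlink.
  void bind(const std::string &Path, DirDescriptor Dir) {
    assert(isValid(Dir) && "cannot bind a path to an unknown directory");
    Names[Path] = Dir.Id;
  }

  DirDescriptor openDirectory(const std::string &Path) override {
    auto It = Names.find(Path);
    return DirDescriptor{It == Names.end() ? -1 : It->second};
  }
  FileDescriptor openFileAt(DirDescriptor Dir,
                            const std::string &Name) override {
    if (!isValid(Dir))
      return FileDescriptor{-1};
    const Tree &T = Trees[static_cast<std::size_t>(Dir.Id)];
    auto It = T.find(Name);
    if (It == T.end())
      return FileDescriptor{-1};
    OpenFiles.push_back(It->second);
    return FileDescriptor{static_cast<int>(OpenFiles.size() - 1)};
  }
  bool getBuffer(FileDescriptor File, std::string &Out) const override {
    if (File.Id < 0 || static_cast<std::size_t>(File.Id) >= OpenFiles.size())
      return false;
    Out = OpenFiles[static_cast<std::size_t>(File.Id)];
    return true;
  }

private:
  bool isValid(DirDescriptor Dir) const {
    return Dir.Id >= 0 && static_cast<std::size_t>(Dir.Id) < Trees.size();
  }
  std::vector<Tree> Trees;            // directory descriptor = index
  std::map<std::string, int> Names;   // path -> directory descriptor id
  std::vector<std::string> OpenFiles; // file descriptor = index
};
```

The implementation is one class instead of three, at the price of callers
having to thread descriptors through every call themselves.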
From the other mails in this thread it sounds to me more like you want to
basically punt on those questions and just provide the interface and access
methods to get buffers for files. That might also be fine for now, but I'd
prefer if it is a conscious decision rather than an accidental one :)
Cheers,
/Manuel