[LLVMdev] [cfe-dev] LLVM & Clang file management

Sun Dec 4 08:33:23 PST 2011

On Tue, Nov 29, 2011 at 9:29 PM, Daniel Dunbar <daniel at zuster.org> wrote:
> On Mon, Nov 28, 2011 at 1:04 PM, Manuel Klimek <klimek at google.com> wrote:
>> On Mon, Nov 28, 2011 at 9:07 PM, Daniel Dunbar <daniel at zuster.org> wrote:
>>> Hi Manuel,
>>>
>>> I'm +2 on the general idea.
>>>
>>> I have had various thoughts in this direction as well (although no
>>> implementation). See:
>>>  http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-July/009903.html
>>> for my RFC from last year (focused on bug reporting, but involved
>>> defining a VFS layer).
>>
>> Cool, that sounds like another use case very similar to our replaying
>> at scale use case.
>>
>>> My one main implementation level comment is I don't think FileManager
>>> is the right API layer to abstract at (it is too specific to Clang's
>>> usage, and too hard to propagate through the rest of LLVM). My
>>> intuition is that it is better to set out to define a lower level VFS
>>> layer that is rich enough to support everything we do and the vagaries
>>> of Win32/Unix, but is otherwise minimal.
>>
>> What about FileManager is too high level / too clang specific? The
>> uniquing logic? The possibility to add in stats caches?
>> Do you think we'd want to have a CachingFileSystem on top of the VFS
>> layer? That would sound more orthogonal, on the other hand FileManager
>> is doing pretty OS-specific stuff to unique the inodes where possible.
>
> I guess I was thinking that it might be more cumbersome to move the
> other parts of LLVM / Clang that do direct file access to use
> FileManager, and would require expanding the FileManager interface
> much beyond what it currently is (e.g., there are no interfaces at all
> for output).
>
> It's mostly an intuitive guess at this point, but that led me to
> think it would be better to have the VFS be slightly lower. But this
> also depends on the design goal of the VFS, discussed a bit in the
> reply below.
>
>>> One requirement I hope any proposed VFS design will support is
>>> emulating Win32 on Unix (and vice versa), which imposes assorted API
>>> complications but I think is worth it overall.
>>
>> I'm not sure I understand what you mean by "emulating win32"? I'd
>> hope to get win32 / unix stuff hidden behind the VFS; do you expect
>> that not to be possible performance wise?
>
> I'd like to distinguish between "hidden" and virtualized. What I was
> thinking was to virtualize the interfaces so that LLVM/Clang would
> still be aware of the differences between win32 / unix (when
> necessary, like in relation to inodes), but that would all be based on
> going through a VFS layer. So one could then emulate any FS on another
> one, but the definition of the VFS would still expose the underlying
> differences between Unix/Win32/etc.
>
> Was your plan directed more at hiding? In that case I can see why you
> would want to start at the FileManager level.

Well, the answer to that question depends highly on the performance
characteristics we can get.
I usually prefer hiding, unless performance requires us to break the
abstraction.
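
To make the "hiding" option concrete, here is a minimal C++ sketch. Every
name in it (FileIdentityProvider, InodeIdentity, CaseFoldedPathIdentity) is
hypothetical and not an existing LLVM or Clang class, and both backends are
fakes that only stand in for platform behaviour. Real Win32 file identity is
more involved than case folding. Callers ask a single question, "are these
the same file?", and never see inodes or path case rules:

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <utility>

// Hypothetical "hiding" interface: the platform difference lives entirely
// behind isSameFile(); clients cannot observe how identity is decided.
class FileIdentityProvider {
public:
  virtual ~FileIdentityProvider() = default;
  virtual bool isSameFile(const std::string &A,
                          const std::string &B) const = 0;
};

// Unix-flavoured fake backend: two paths name the same file when they map
// to the same (device, inode) pair. The table is filled by hand here; a
// real backend would obtain the pair from stat().
class InodeIdentity : public FileIdentityProvider {
  std::map<std::string, std::pair<unsigned long, unsigned long>> Inodes;

public:
  void add(const std::string &Path, unsigned long Dev, unsigned long Ino) {
    assert(!Path.empty() && "path must not be empty");
    Inodes[Path] = std::make_pair(Dev, Ino);
  }

  bool isSameFile(const std::string &A,
                  const std::string &B) const override {
    auto IA = Inodes.find(A);
    auto IB = Inodes.find(B);
    return IA != Inodes.end() && IB != Inodes.end() &&
           IA->second == IB->second;
  }
};

// Win32-flavoured fake backend (a simplifying assumption): identity is the
// path after lower-casing and turning '\' into '/'.
class CaseFoldedPathIdentity : public FileIdentityProvider {
  static std::string normalize(std::string P) {
    for (char &C : P)
      C = (C == '\\')
              ? '/'
              : static_cast<char>(std::tolower(static_cast<unsigned char>(C)));
    return P;
  }

public:
  bool isSameFile(const std::string &A,
                  const std::string &B) const override {
    return normalize(A) == normalize(B);
  }
};
```

The "virtualized" alternative would instead expose the (device, inode) pair
or the path rules in the interface itself, so that clients can use them
directly when they need to.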

Ok, after browsing the implementations of PathV2 and FileSystem, this
stuff already looks pretty close to what I'd want to write anyway,
minus putting it into classes to enable run-time virtualization (and
it doesn't look like it would be too hard to switch FileManager to run
on top of FileSystem...), and splitting up FileSystem into a
FileSystem and an OperatingSystemPaths or something (how system
libraries are found, etc.).

Do you know whether there were roadblocks to the PathV1->V2
transition? (or was just somebody with enough stamina missing ;)

> I think both approaches probably can work, although hiding makes me a
> bit more nervous because I think the API design ends up being much
> harder (and more likely to incur performance tradeoffs). I'm always
> pretty leery of attempts to paper over the differences between
> platforms.
> Did that explanation make sense? If not I can sketch pseudocode to
> make it more obvious.

Sure, code always helps :)

Also, do you have any objections to just virtualizing
Support/FileSystem and basing FileManager back on top of that?

Cheers,
/Manuel

>  - Daniel
>
>>
>> Cheers,
>> /Manuel
>>
>>> I see many positive future technologies we could build if we had a
>>> good VFS layer, I'd absolutely love to see work in this direction.
>>>
>>>  - Daniel
>>>
>>> On Mon, Nov 28, 2011 at 2:49 AM, Manuel Klimek <klimek at google.com> wrote:
>>>> Hi,
>>>>
>>>> while working on tooling on top of clang/llvm we found the file system
>>>> abstractions in clang/llvm to be one of the points that could be nicer
>>>> to integrate with. I’m writing this mail to propose a strawman and get
>>>> some feedback on what you guys think the right way forward is (or
>>>> whether we should just leave things as they are).
>>>>
>>>> First, the FileManager we have in clang has helped us a lot for our
>>>> tooling - when we run clang in a mapreduce we don’t need to lay out
>>>> files on a disk, we can just map files into memory and happily clang
>>>> over them. We’re also using the same mechanism to map builtin
>>>> includes; in short, the FileManager has made it possible to do clang
>>>> at scale.
>>>>
>>>> Now we’re aware that it was not really the intention of the
>>>> FileManager to allow doing the things we do with it: not every module
>>>> in clang uses the FileManager, and the moment we hit llvm there is no
>>>> FileManager at all. For example, in case of the Driver we hack around
>>>> the fact that the header search tries to access the file system
>>>> directly in rather brittle ways, relying on implementation details and
>>>> #ifdefs.
>>>>
>>>> So why not make FileManager a more principled (and still blazing fast)
>>>> file system abstraction?
>>>> Pro:
>>>> - only one interface for developers to learn on the project (no more
>>>> PathV1 vs PathV2 vs FileManager)
>>>> - only one implementation (per-platform) for easier maintenance of the
>>>> file system platform abstraction
>>>> - one point to insert synchronization guarantees for tools / IDE
>>>> integration that wants to run clang in multiple threads at once (for
>>>> example when re-indexing on 12-ht-core machines)
>>>> - being able to replay compilations by injecting a virtual file system
>>>> that exactly “copies” the original file system’s content, which allows
>>>> easy scaling of replays, running tools against dirty edit buffers on a
>>>> lower level than the SourceManager and unit testing
>>>>
>>>> Con:
>>>> - there would be yet another try at unifying the APIs which would be
>>>> in an intermediate state while being worked on (and PathV1 vs PathV2
>>>> is already bad enough)
>>>> - making it the canonical file system interface is a lot of effort
>>>> that requires touching a lot of systems (while we’re volunteering to
>>>> do the work, it will probably eat up other people’s time, too)
>>>>
>>>> What parts (if any) of this type of transition make sense?
>>>> 1. Figure out the “correct” interface we’d want for FileManager to be
>>>> more generally useful
>>>> 2. Change FileManager to that interface
>>>> 3. Sink FileManager into llvm, so it can be used by other projects
>>>> 4. Use it throughout clang
>>>> 5. Use it throughout llvm
>>>> We don’t need to do all of them at once, and should be able to
>>>> evaluate the results along the way.
>>>>
>>>> Thoughts? If folks are generally happy, I’d start up an email thread
>>>> to drive the target design of the FileManager to get things rolling.
>>>>
>>>> /Manuel
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>>
>>>
>