[cfe-dev] Module maps, __FILE__ and lazy stat'ing of headers

Manuel Klimek klimek at google.com
Wed Aug 6 02:01:59 PDT 2014


Ok, I might be dense here, but I'm still missing the picture :)

So, I see that the idea is to have a FileName, which basically encapsulates
the path/name combination we found the file under, and a FileEntry, which
maps to the inode.
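
To check my understanding, here is a minimal sketch of that split. The types
below are hypothetical stand-ins, not the classes from Richard's patch or the
current clang ones, and the inode is passed in by the caller instead of coming
from a real stat() call, so the sketch does not depend on the host file system:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical: identity of the file itself (the "inode" side).
struct FileEntry {
  std::uint64_t Inode;
};

// Hypothetical: one lookup path (the "dentry" side) plus the file it names.
struct FileName {
  std::string Path;
  const FileEntry *Entry;
};

// Hypothetical table that interns one FileEntry per inode and one FileName
// per lookup path. std::map nodes are stable, so returned pointers stay valid.
class FileTable {
  std::map<std::uint64_t, FileEntry> Entries;
  std::map<std::string, FileName> Names;

public:
  // Assumption: the caller supplies the inode; a real implementation would
  // obtain it by stat'ing Path.
  const FileName *getFile(const std::string &Path, std::uint64_t Inode) {
    assert(!Path.empty() && "lookup path must not be empty");
    auto EntryIt = Entries.emplace(Inode, FileEntry{Inode}).first;
    auto NameIt = Names.emplace(Path, FileName{Path, &EntryIt->second}).first;
    return &NameIt->second;
  }
};
```

With this, looking up "a/x.h" and "b/x.h" for the same inode yields two
distinct FileName objects that share one FileEntry, so __FILE__ and relative
#include lookup could use the FileName while anything that only cares about
file identity could compare the FileEntry pointers.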

Questions:
1. Do we want to replace all uses of FileEntry with FileName, or are there
cases where we're only interested in the inodes (for example in
SourceManager)?
2. How would we detect whether we are on a case-insensitive file system?
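
For question 2, the only approach I can think of is a probe, sketched below
assuming a POSIX host. The helper names (sameFile, looksCaseInsensitive) and
the probe file name are made up for illustration. The idea is to create a file
with a lowercase name and check whether the uppercase spelling stats to the
same (device, inode) pair. Case sensitivity may differ between mounts, and on
some systems between directories, so the answer only applies to the directory
that was probed:

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Returns true if both paths stat successfully to the same (device, inode).
bool sameFile(const std::string &A, const std::string &B) {
  struct stat SA, SB;
  if (stat(A.c_str(), &SA) != 0 || stat(B.c_str(), &SB) != 0)
    return false;
  return SA.st_dev == SB.st_dev && SA.st_ino == SB.st_ino;
}

// Hypothetical probe: creates a lowercase-named file in Dir and checks
// whether the uppercase spelling resolves to the same file. Returns false
// if the directory is case-sensitive or the probe file cannot be created.
bool looksCaseInsensitive(const std::string &Dir) {
  assert(!Dir.empty() && "directory must not be empty");
  const std::string Pid = std::to_string(static_cast<long>(getpid()));
  const std::string Lower = Dir + "/case_probe_" + Pid + ".tmp";
  const std::string Upper = Dir + "/CASE_PROBE_" + Pid + ".TMP";
  std::FILE *F = std::fopen(Lower.c_str(), "w");
  if (!F)
    return false;
  std::fclose(F);
  const bool Result = sameFile(Lower, Upper);
  std::remove(Lower.c_str()); // clean up the probe file
  return Result;
}
```

This needs write access to the directory, which we will not always have for
header search paths, so it is probably not what we want in practice.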

Cheers,
/Manuel



On Tue, Aug 5, 2014 at 9:58 PM, Ben Langmuir <blangmuir at apple.com> wrote:

>
> On Aug 5, 2014, at 12:28 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Tue, Aug 5, 2014 at 10:57 AM, Ben Langmuir <blangmuir at apple.com> wrote:
>
>>
>> On Aug 5, 2014, at 7:51 AM, Manuel Klimek <klimek at google.com> wrote:
>>
>> As this is blocking us, I'm taking a stab at it…
>>
>>
>> One thing that stood out to me in Richard's patch:
>>
>> * Returning FileName pointers invites users to rely on the identity of
>> the FileName object, which won’t work on case-insensitive file systems and
>> is probably a bad idea in general.  I suggest we wrap FileName * in a
>> “FileRef” object that prevents comparisons or possibly forwards them to
>> comparing the inodes.
>>
>
> I don't agree: there are some uses where we really want to compare the
> canonicalized FileName pointers. For instance, this is the right behavior
> for #pragma once, and for various things in HeaderSearch. (When looking up
> a header relative to a file, it should be the FileName that is the cache
> key, not the FileEntry.)
>
>
> Perhaps the best way to deal with case-insensitive file systems is to give
> paths that differ by case the same FileName object, since they are the same
> dentry (which is the notion we're trying to capture here).
>
>
> SGTM.
>
> Likewise, on Windows, the short and long forms of the same file name
> should have the same FileName object. We may be able to make that change
> without introducing a split between FileName and FileEntry; the downside
> would be that we may potentially load the same inode multiple times if it's
> found by different paths (relative paths being the most likely offender).
>
> Ben
>>
>>
>>
>>
>> On Wed, Jul 30, 2014 at 12:04 AM, Richard Smith <richard at metafoo.co.uk>
>> wrote:
>>
>>> On Mon, Jul 28, 2014 at 11:40 PM, Richard Smith <richard at metafoo.co.uk>
>>> wrote:
>>>
>>>>
>>>> On 28 Jul 2014 17:57, "Ben Langmuir" <blangmuir at apple.com> wrote:
>>>> >
>>>> >
>>>> >> On Jul 28, 2014, at 4:37 PM, Richard Smith <richard at metafoo.co.uk>
>>>> wrote:
>>>> >>
>>>> >> On Mon, Jul 28, 2014 at 3:10 PM, Ben Langmuir <blangmuir at apple.com>
>>>> wrote:
>>>> >>>
>>>> >>>
>>>> >>> > On Jul 28, 2014, at 4:31 AM, Manuel Klimek <klimek at google.com>
>>>> wrote:
>>>> >>> >
>>>> >>> > Hi Richard,
>>>> >>> >
>>>> >>> > while working with module maps for layering checks, a problem we
>>>> noticed with __FILE__ triggered a couple of questions.
>>>> >>> >
>>>> >>> > The problem is that __FILE__ uses the path of the first 'stat'
>>>> call (as that is how FileManager::getFile() works).
>>>> >>>
>>>> >>> I was thinking about this a while ago and independently of this
>>>> issue I would really like to change this behaviour at some point.  The name
>>>> of a file is a property of how you look it up, not an intrinsic part of the
>>>> file itself.
>>>> >>
>>>> >>
>>>> >> I agree. We have incorrectly conflated the notion of the file
>>>> identity (inode) with the directory entry, and we can't tell the two
>>>> apart. This leads to weird behavior in a number of places. For instance:
>>>> >>
>>>> >>  a/
>>>> >>   x.h: #include "y.h"
>>>> >>   y.h: int a = 0;
>>>> >>  b/
>>>> >>   x.h: symlink to a/x.h
>>>> >>   y.h: int b = 0;
>>>> >>  main.c:
>>>> >>   #include "a/x.h"
>>>> >>   #include "b/x.h"
>>>> >>   int main() { return a + b; }
>>>> >>
>>>> >> On a correct compiler, this would work. For clang, it fails, because
>>>> b/x.h's #include finds a/y.h, because we use the path by which we first
>>>> found x.h as the One True Path to x.h. (This also leads to wrong __FILE__,
>>>> etc.)
>>>> >>
>>>> >> I tried fixing this ~2 years ago by splitting FileEntry into
>>>> separate dentry and inode classes, but this rapidly snowballed and exposed
>>>> the same design error being made in various other components.
>>>> >
>>>> >
>>>> > Interesting.  My motivation was keeping track of virtual and “real”
>>>> paths for the VFS, which is a special case of the above.  Maybe we can sink
>>>> the dentry/inode down to the VFS layer eventually?  Getting symlinks and
>>>> “..” entries to work in the VFS makes a lot more sense when we have explicit
>>>> dentries rather than inferring from the file path.
>>>> >
>>>> > I’d be interested in what issues you ran into here if you remember.
>>>>
>>>> I'll see if I can dig out my patch tomorrow.
>>>>
>>> Attached is my WIP patch from (as it turns out) over 2 years ago. My
>>> (very) hazy memories were that we had quite a few different places where
>>> people were using FileEntry*s for things, and neither a dentry nor an inode
>>> seemed like the "right" thing. I don't remember any more details than that.
>>> There were also places where we would need to just make a decision, such
>>> as: what should #pragma once use as its key? (I think dentry is the right
>>> answer here, since the same file found in different directories might mean
>>> different things, but that answer may break some people who use #pragma
>>> once and don't also have include guards. Conversely, it fixes some builds
>>> on content-addressed file systems.)
>>>
>>
>>
>>
>
>