[cfe-dev] [RFC] Storing relative paths in .pcm files

Tue Nov 18 17:28:03 PST 2014

On Tue, Nov 18, 2014 at 10:46 AM, Ben Langmuir <blangmuir at apple.com> wrote:

>
> On Nov 17, 2014, at 7:17 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Mon, Nov 17, 2014 at 6:59 PM, Ben Langmuir <blangmuir at apple.com> wrote:
>
>>
>> On Nov 17, 2014, at 6:27 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>
>> On Mon, Nov 17, 2014 at 6:08 PM, Ben Langmuir <blangmuir at apple.com>
>> wrote:
>>
>>> Hey Richard (& cfe-dev),
>>>
>>> Currently if one AST file imports another (e.g. module A imports module
>>> B), we store the absolute path of module B inside module A’s IMPORTS
>>> record.  When we know that both files will always be in the same directory,
>>> this wastes space and more importantly prevents moving those modules to
>>> another directory.  The latter is very handy when debugging a module bug
>>> for which someone has given you their broken module cache.
>>>
>>> When an implicitly built module imports another implicitly built module,
>>> we can rely on the modules always being in the same module cache, and I
>>> think we should switch to a relative path that is either looked up relative
>>> to the current pcm file or the (hash-specific) module cache dir.  Do you
>>> think we should do this for explicitly built modules that happen to be in
>>> the same directory?
>>
>>
>> My initial reaction is that we should preserve the path given in the
>> -fmodule-file= argument on the command line. If I use
>> -fmodule-file=x/foo.pcm and explicitly build y/bar.pcm, I think that
>> y/bar.pcm should say that it finds foo in 'x/foo.pcm'.
>>
>>
>> This makes sense to me.  In that case, we’ll probably need to store
>> another bit to distinguish “relative to CWD” from “relative to module
>> cache”, or else -fmodule-file=<some implicitly built module>.pcm might
>> choose an unexpected file.  Alternatively, we could store the ModuleKind
>> for the module when it was written (as opposed to when it was loaded), I
>> guess.
>>
>>
>> If the user then builds with -fmodule-file=z/foo.pcm
>> -fmodule-file=y/bar.pcm, we should probably ignore the path that was
>> specified for 'foo' when building 'bar'.
>>
>>
>> I assume you mean 'loading bar'.
>>
>
> Err, I mean that when loading bar, we should ignore the path for foo that
> was specified at the time bar was built.
>
>
> Ah.
>
>>> What about implicitly built modules that are imported by explicitly built
>>> modules?
>>>
>>
>> It seems tricky to make that work transparently if the modules have been
>> relocated. We shouldn't expect that explicitly-built modules are located
>> anywhere near the module cache, so I guess the best we can do is to look
>> for such files in the module cache by default (even if the module cache has
>> moved), and not bother writing out /path/to/module/cache/thing.pcm. If
>> they've been relocated, then I suppose you could explicitly import them
>> with -fmodule-file=$foo.
>>
>> However, we need to be cautious that things can change between explicit
>> module build and use, so we need to use the parameters from the explicit
>> module itself when determining the configuration hash of the implicit
>> module.
>>
>>
>> Good point, I hadn’t considered this issue.
>>
>
>> Maybe the simplest thing to do is to skip this case for now; we'd only be
>> saving the space cost of writing out the path to the module cache,
>>
>>
>> Sounds good.
>>
>
> Actually, can we skip this case?  What if the user builds a bunch of
> modules implicitly then starts using some of them explicitly with
> -fmodule-file.  Then we can’t know at build time whether to write a
> module-cache-relative path or normal path.
>

We can't know at the build time of which module? Just to make sure we're on
the same page: whether a module file is explicit or implicit is a property
of how it's loaded, not of how it's built. If it's found by -fmodule-file,
then it's explicit and we should write out its path relative to $PWD; if
it's found in the module cache implicitly, then it's implicit and we should
write out its path relative to the cache.
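
A rough sketch of that rule, in Python for brevity rather than Clang's C++. The function name and parameters are hypothetical and are not part of Clang; the only point is that the base directory is chosen by how the import was found, not by how the imported module was built.

```python
import posixpath


def stored_import_path(module_path, loaded_explicitly, cwd, cache_dir):
    """Return the path to record for an imported module file.

    Hypothetical helper: an import found via -fmodule-file (explicit) is
    recorded relative to the working directory; an import found in the
    module cache (implicit) is recorded relative to the cache directory.
    """
    base = cwd if loaded_explicitly else cache_dir
    return posixpath.relpath(module_path, base)
```

For example, with a working directory of /work and a cache directory of /cache/HASH (both made up), /work/x/foo.pcm loaded explicitly would be recorded as x/foo.pcm, while /cache/HASH/B.pcm loaded implicitly would be recorded as B.pcm.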

> That makes me think using cache-relative paths won’t be a great solution.
>
> One answer could be:
>
> 1) When we write a module import, we write out the module’s name.
> 2.1) When we load an imported module, we first check if there is an
> override from -fmodule-file for that module.
> 2.2) Otherwise, if the module is imported explicitly, we use a stored
> path, which will be absolute or relative to the working directory (as
> normal).
> 2.3) Otherwise, if the module is imported implicitly, we look up the path
> using the hash-specific module cache and the module’s name.
> 3) When we load a module explicitly, we figure out the hash-specific
> module cache directory from the time it was built (either by reconstructing
> all the options or by writing it out separately in the AST file and then
> re-loading it), and use that for any implicit imports of the current module.
>

This all makes sense to me.
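
To make the lookup order in steps 2.1 to 2.3 concrete, here is a minimal sketch in Python (not Clang's actual code). All names are hypothetical. It assumes the cache directory passed in is the hash-specific one recovered in step 3, and it assumes an implicit module's file is simply named <name>.pcm, which is a simplification for illustration.

```python
import posixpath


def resolve_import(name, stored_path, imported_explicitly,
                   overrides, cache_dir, cwd):
    """Pick the .pcm file to load for an imported module.

    overrides maps module names to paths given via -fmodule-file on the
    current command line. cache_dir is assumed to be the hash-specific
    module cache directory of the importing module (step 3).
    """
    # 2.1: a -fmodule-file override for this module always wins.
    if name in overrides:
        return overrides[name]
    # 2.2: explicit import, so use the stored path, which is absolute or
    # relative to the working directory.
    if imported_explicitly:
        return posixpath.normpath(posixpath.join(cwd, stored_path))
    # 2.3: implicit import, so look it up by name in the module cache.
    # The "<name>.pcm" file name is an assumption made for this sketch.
    return posixpath.join(cache_dir, name + ".pcm")
```

With this ordering, passing -fmodule-file=z/foo.pcm when loading bar takes precedence over the x/foo.pcm path that bar recorded when it was built.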

> Which results in:
>
> a) Any module can be moved around individually by using -fmodule-file
> b) Implicit imports of explicit modules will look for their .pcm in the
> location it was found when the explicit module was built.
> c) If there are only implicit modules, you can use -fmodules-cache-path
> and move the whole cache directory around.
>
> Thoughts?  I’m not sure how I feel about (b), but (a) and (c) seem good to
> me.
>

I'm not really sure what (b) means. But (a) and (c) seem like goodness.

>
>> and I don't think that's a big deal (at least, not compared to the 100K
>> we waste on a name lookup table for builtins and keywords in each module).
>>
>>
>> OT, but: Fixing that has been near the bottom of my TODO list for a long
>> time.  IIRC it’s not just a waste of space, because if a system module
>> defines one of those builtin names (e.g. ceil in tgmath.h) we might find
>> the wrong one because we take the first one we find that’s up to date.
>>
>
> I did some analysis of the size cost in the context of PR21397, but never
> got any production-ready changes out of it.
>
>
>