[cfe-dev] [RFC] Storing relative paths in .pcm files

Thu Nov 20 17:56:15 PST 2014

On Wed, Nov 19, 2014 at 3:48 PM, Ben Langmuir <blangmuir at apple.com> wrote:

>
> On Nov 18, 2014, at 5:28 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Tue, Nov 18, 2014 at 10:46 AM, Ben Langmuir <blangmuir at apple.com>
> wrote:
>
>>
>> On Nov 17, 2014, at 7:17 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>>
>> On Mon, Nov 17, 2014 at 6:59 PM, Ben Langmuir <blangmuir at apple.com>
>> wrote:
>>
>>>
>>> On Nov 17, 2014, at 6:27 PM, Richard Smith <richard at metafoo.co.uk>
>>> wrote:
>>>
>>> On Mon, Nov 17, 2014 at 6:08 PM, Ben Langmuir <blangmuir at apple.com>
>>> wrote:
>>>
>>>> Hey Richard (& cfe-dev),
>>>>
>>>> Currently if one AST file imports another (e.g. module A imports module
>>>> B), we store the absolute path of module B inside module A’s IMPORTS
>>>> record.  When we know that both files will always be in the same directory,
>>>> this wastes space and more importantly prevents moving those modules to
>>>> another directory.  The latter is very handy when debugging a module bug
>>>> for which someone has given you their broken module cache.
>>>>
>>>> When an implicitly built module imports another implicitly built
>>>> module, we can rely on the modules always being in the same module cache,
>>>> and I think we should switch to a relative path that is either looked up
>>>> relative to the current pcm file or the (hash-specific) module cache dir.
>>>> Do you think we should do this for explicitly built modules that happen to
>>>> be in the same directory?
>>>
>>>
>>> My initial reaction is that we should preserve the path given in the
>>> -fmodule-file= argument on the command line. If I use
>>> -fmodule-file=x/foo.pcm and explicitly build y/bar.pcm, I think that
>>> y/bar.pcm should say that it finds foo in 'x/foo.pcm’.
>>>
>>>
>>> This makes sense to me.  In that case, we’ll probably need to store
>>> another bit to distinguish “relative to CWD” from “relative to module
>>> cache”, or else -fmodule-file=<some implicitly built module>.pcm might
>>> choose an unexpected file.  Alternatively, we could store the ModuleKind
>>> for the module when it was written (as opposed to when it was loaded), I
>>> guess.
>>>
>>>
>>> If the user then builds with -fmodule-file=z/foo.pcm
>>> -fmodule-file=y/bar.pcm, we should probably ignore the path that was
>>> specified for 'foo' when building 'bar’.
>>>
>>>
>>> I assume you mean ‘loading bar'.
>>>
>>
>> Err, I mean we should ignore the path for foo that was specified at the
>> time when bar was built when loading bar.
>>
>>
>> Ah.
>>
>> What about implicitly built modules that are imported by explicitly built
>>>> modules?
>>>>
>>>
>>> It seems tricky to make that work transparently if the modules have been
>>> relocated. We shouldn't expect that explicitly-built modules are located
>>> anywhere near the module cache, so I guess the best we can do is to look
>>> for such files in the module cache by default (even if the module cache has
>>> moved), and not bother writing out /path/to/module/cache/thing.pcm. If
>>> they've been relocated, then I suppose you could explicitly import them
>>> with -fmodule-file=$foo.
>>>
>>> However, we need to be cautious that things can change between explicit
>>> module build and use, so we need to use the parameters from the explicit
>>> module itself when determining the configuration hash of the implicit
>>> module.
>>>
>>>
>>> Good point, I hadn’t considered this issue.
>>>
>>
>>> Maybe the simplest thing to do is to skip this case for now; we'd only
>>> be saving the space cost of writing out the path to the module cache,
>>>
>>>
>>> Sounds good.
>>>
>>
>> Actually, can we skip this case?  What if the user builds a bunch of
>> modules implicitly then starts using some of them explicitly with
>> -fmodule-file.  Then we can’t know at build time whether to write a
>> module-cache-relative path or normal path.
>>
>
> We can't know at the build time of which module?
>
>
> I was basically saying that because the “explicitness” of a module can
> change in subsequent loads, we cannot store just a cache-relative path for
> implicit imports unless we can reconstruct where the cache itself was at
> the time of building the importing module.  In other words, unless we have
> not skipped this case.
>
> Just to make sure we're on the same page: whether a module file is
> explicit or implicit is a property of how it's loaded, not of how it's
> built.
>
>
> What terminology do you use for “built on-demand” vs. “built with
> -emit-module”?  A module that is built on-demand is always being implicitly
> imported at the point it is originally built.
>

I'm not sure to what extent this is a useful distinction; a "built
on-demand" module is built as if by running another clang instance with
"-emit-module". If you use "-emit-module" without "-o", the .pcm file gets
put into the module cache and is indistinguishable from an implicitly-built
module. (But there's some terminology in case we need to talk about this:
"explicitly- versus implicitly-built module".)

> If it's found by -fmodule-file, then it's explicit and we should write out
> its path relative to $PWD; if it's found in the module cache implicitly,
> then it's implicit and we should write out its path relative to the cache.
>
> That makes me think using cache-relative paths won’t be a great solution.
>>
>> One answer could be:
>>
>> 1) When we write a module import, we write out the module’s name.
>> 2.1) When we load an imported module, we first check if there is an
>> override from -fmodule-file for a module.
>> 2.2) Otherwise, if the module is imported explicitly, we use a stored
>> path, which will be absolute or relative to the working directory (as
>> normal).
>> 2.3) Otherwise, if the module is imported implicitly, we lookup the path
>> using the hash-specific module cache and the module’s name.
>> 3) When we load a module explicitly, we figure out the hash-specific
>> module cache directory from the time it was built (either by reconstructing
>> all the options or by writing it out separately in the AST file and then
>> re-loading it), and use that for any implicit imports of the current module.
>>
>
> This all makes sense to me.
>
>
>> Which results in:
>>
>> a) Any module can be moved around individually by using -fmodule-file
>> b) Implicit imports of explicit modules will look for their .pcm in the
>> location it was found when the explicit module was built.
>> c) If there are only implicit modules, you can use -fmodules-cache-path
>> and move the whole cache directory around.
>>
>> Thoughts?  I’m not sure how I feel about (b), but (a) and (c) seem good
>> to me.
>>
>
> I'm not really sure what (b) means. But (a) and (c) seem like goodness.
>
>
> (b) means if you pass -fmodule-file for A, and A has an implicit import
> for B C and D, you cannot move B, C and D together by just changing
> -fmodules-cache-path.  You need to spell out -fmodule-file for each module
> individually if you want to move them.  This is only interesting in that it
> is different from (c) where if *all* of the modules are found implicitly we
> can move the whole cache at once.
>

I see. There are more general issues with this situation; if we explicitly
import module file A, and module file A implicitly imports module file B
from the module cache, and B is no longer in the cache for whatever reason,
then we have essentially no way to recover; we don't have enough
information to rebuild B, and we can't and shouldn't rebuild A. (And even
if we could rebuild B, we would need a bit-for-bit identical version in
order for A to be able to use it.) So I don't think we need to worry too
much about problems that are specific to this case.

> If there’s no objections, I’ll put together a patch for this.
>

Sounds good to me!

> Ben
>
>
>>> and I don't think that's a big deal (at least, not compared to the 100K
>>> we waste on a name lookup table for builtins and keywords in each module).
>>>
>>>
>>> OT, but: Fixing that has been near the bottom of my TODO list for a long
>>> time.  IIRC it’s not just a waste of space, because if a system module
>>> defines one of those builtin names (e.g. ceil in tgmath.h) we might find
>>> the wrong one because we take the first one we find that’s up to date.
>>>
>>
>> I did some analysis of the size cost in the context of PR21397, but never
>> got any production-ready changes out of it.
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20141120/49d4e545/attachment.html>