[cfe-dev] Patch for main file ID and stat cache lookup issues when loading AST files
dgregor at apple.com
Sat Jan 7 13:37:33 PST 2012
On Jan 6, 2012, at 9:25 PM, Tom Honermann wrote:
> On 1/5/2012 4:43 PM, Argyrios Kyrtzidis wrote:
>> On Dec 20, 2011, at 11:10 AM, Tom Honermann wrote:
>>> Attached are two patches (svn diff format) that correct two issues
>>> encountered when loading AST files generated by Clang with the
>>> '-emit-ast' option.
>>> Problem 1: Loading an AST file via ASTUnit::LoadFromASTFile() sets
>>> ASTUnit::MainFileIsAST to true so that calls to
>>> ASTUnit::isMainFileAST() return true. However,
>>> SourceManager::MainFileID does not get restored. This is addressed
>>> by clang-main-file-id.patch
>> Addressed in r147612.
> Thank you!
>>> Problem 2: When the main source file is compiled using a relative
>>> path (ie, 'clang -emit-ast file.c'), filesystem stat lookups use
>>> the relative path exactly as passed on the command line. This
>>> results in the stat cache populated by ASTWriter via
>>> MemorizeStatCalls using the relative path as the key to the stat
>>> cache for the main file. However, the main source file name is
>>> also stored in the AST file using an absolute path as the "original
>>> file". When the AST file is later read by ASTReader, the original
>>> (absolute) path is used for filesystem stat lookup resulting in a
>>> cache miss for the stat cache because the absolute path doesn't
>>> match (string comparison) the relative path used as a key in the
>>> stat cache. In this case, the real filesystem is queried which may
>>> return stat data that does not reflect the stat values from the
>>> time of the compilation if the filesystem stat values have changed.
>>> This is addressed by clang-main-file-stat-cache.patch which simply
>>> converts input file names to absolute paths before passing them on
>>> to FrontendAction::BeginSourceFile().
>> Ever since we made the AST reader to check all the files (as you
>> noticed) for robustness, the stored stat cache that the ASTWriter
>> populates became irrelevant, I'll probably remove it from the AST
>> file. This is not blocking you for something, right ?
> The change to check all the files is not always desirable. It makes
> perfect sense when one wants to ensure a PCH file is up to date, but is
> not necessarily desirable for other uses of AST files. With Clang 2.9,
> one can generate an AST file, modify/delete all the source files used to
> generate the AST file, and still use the AST file for code generation,
> analysis, etc... later on.
That's not entirely true. Some operations use the source-location information to go back to the source code and get extra information, such as when one asks the Lexer/Preprocessor to change a token-based range into a character-based range. Or if one looks at the tokens in a macro, inspecting a literal token requires one to go back to the source code. Generating debug info also needs this information.
One could try to stay away from such operations, of course, but it's hard to be certain that you've done so.
> That no longer works with 3.0 without first
> disabling validation (sort of - deleted files don't fail the load, but
> modified files do). For those like myself using
> ASTUnit::LoadFromASTFile(), there are currently no options to disable
> validation without changing the source code.
That's a fair point. Different clients may accept that the PCH file can't be used to extract complete information about the translation unit. A command-line compiler, for example, could probably delay checking the on-disk files until they are actually needed (if they are needed), trading some safety---we've never made recovery from those failures perfect---for some performance. When libclang is running in an IDE, however, we need that extra checking because we're more prone to crashing if the filesystem changes out from under us.
> For my purposes, the stat cache is important so that I can retrieve the
> file size, modification time, etc... used to create the AST file when
> analyzing the AST file on a different machine or after the source files
> have been modified or removed. In my case, missing or modified source
> files are more of a warning situation, not a hard error as I don't
> always need access to the original source.
Interesting. I guess I'd be fine with the stat cache coming back, and to allow more configurability with how strict the checking is when the AST file is initially loaded.
More information about the cfe-dev