[cfe-dev] Patch for main file ID and stat cache lookup issues when loading AST files

Tom Honermann thonermann at coverity.com
Tue Dec 20 11:10:04 PST 2011


Attached are two patches (svn diff format) that correct two issues 
encountered when loading AST files generated by Clang with the 
'-emit-ast' option.

Problem 1: Loading an AST file via ASTUnit::LoadFromASTFile() sets 
ASTUnit::MainFileIsAST to true so that calls to ASTUnit::isMainFileAST() 
return true.  However, SourceManager::MainFileID does not get restored. 
  This is addressed by clang-main-file-id.patch

Problem 2: When the main source file is compiled using a relative path 
(ie, 'clang -emit-ast file.c'), filesystem stat lookups use the relative 
path exactly as passed on the command line.  This results in the stat 
cache populated by ASTWriter via MemorizeStatCalls using the relative 
path as the key to the stat cache for the main file.  However, the main 
source file name is also stored in the AST file using an absolute path 
as the "original file".  When the AST file is later read by ASTReader, 
the original (absolute) path is used for filesystem stat lookup 
resulting in a cache miss for the stat cache because the absolute path 
doesn't match (string comparison) the relative path used as a key in the 
stat cache.  In this case, the real filesystem is queried which may 
return stat data that does not reflect the stat values from the time of 
the compilation if the filesystem stat values have changed.  This is 
addressed by clang-main-file-stat-cache.patch which simply converts 
input file names to absolute paths before passing them on to 
FrontendAction::BeginSourceFile().

A test program is attached to demonstrate these problems.  When compiled 
and linked with an unpatched version of clang, the test program will 
abort on a failed assertion:
   Assertion `!source_manager.getMainFileID().isInvalid()' failed.

Applying clang-main-file-id.patch will correct the assertion failure. 
At this point, the second problem can be demonstrated with the following 
steps:

1) Compile an example .c file with 'clang -emit-ast file.c'.  Make
    sure to use a relative path to the source file.
2) Run the test program against the .ast file generated in step 1.
    Output similar to the following should be displayed:
      main file: /path/to/file.c
          Size: 15
          Time: 1324402591: Tue Dec 20 12:36:31 2011
          Mode: 0100644
3) Change the mode of the .c file (ie, 'chmod g+w file.c'), and
    rerun the test program.  If the stat cache is working properly,
    no changes should be seen.  However, without the attached patch,
    a change to the mode is seen:
      main file: /path/to/file.c
          Size: 15
          Time: 1324402591: Tue Dec 20 12:36:31 2011
          Mode: 0100664

The reason the steps above change the file mode is because the AST 
reader will notice changes to the file size or modification stamp and 
abort the AST load if either fails to match.  Changing the file mode 
avoids the load failure, but still demonstrates the cache miss.

I suspect that stat cache lookups will remain fragile as long as the 
stat cache continues to use non-normalized paths as keys.  I haven't 
attempted to address this.

These issues were present in Clang 2.9 and 3.0 as well.

Tom.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: clang-main-file-id.patch
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20111220/ee33143b/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: clang-main-file-stat-cache.patch
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20111220/ee33143b/attachment-0001.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: testcase.cpp
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20111220/ee33143b/attachment-0002.ksh>


More information about the cfe-dev mailing list