[cfe-dev] FileManager re-factor

Chris Lattner clattner at apple.com
Tue Dec 23 14:56:36 PST 2008


On Dec 7, 2008, at 6:46 PM, John Kelley wrote:
>>> an index in UniqueDirContainer. Is the stat more expensive than
>>> hitting an llvm::StringMap to see if we've checked this exact
>>> directory yet?
>>
>> I would imagine stat() is much more expensive.
>
> I think so as well, I'll circle back to this one after the others are
> done.

Also note that clang tries to just 'open' a file in certain cases and
handle the error code instead of doing stat then doing the open on
success.

>>> We seem to use the device/inode combo to account for
>>> sym-linked directories but we don't seem to try to resolve the
>>> symlinked directory to its origin so as to avoid the FS stat on
>>> open.
>>> Is this something that should be implemented instead?
>>
>> Sounds good. It would be useful to measure the difference to verify
>> the performance improvement.
>
> I am currently using dtrace to track the number of stats performed by
> clang. I am sampling clang's 'make test' right now to get numbers but
> am wondering if there is a large-ish clang-friendly project that would
> be better suited for this task? Recommendations welcome.
>
> dtrace command line:
> 	dtrace -n 'syscall::stat:entry /execname == "clang"/
> { @[copyinstr(arg0)] = count() }' > trace-clang-stat.log

This is very interesting.  I tried out:

dtrace -n 'syscall:::entry /execname == "clang"/{ printf("%x %x", arg0, arg1); } syscall::stat:entry /execname == "clang"/{ printf("%s", (copyinstr(arg0))); } syscall::open_nocancel:entry /execname == "clang"/{ printf("%s", copyinstr(arg0)); }' | & less

to get a full trace.  I see stuff that looks like this for
Cocoa/Cocoa.h:

          stat:entry /Users/sabre/llvm/Debug/Headers/Cocoa
          stat:entry /usr/local/include/Cocoa
          stat:entry /usr/lib/gcc/i686-apple-darwin9/4.0.1/include/Cocoa
          stat:entry /usr/lib/gcc/powerpc-apple-darwin9/4.0.1/include/Cocoa
          stat:entry /usr/include/Cocoa
          stat:entry /System/Library/Frameworks/Cocoa.framework/Headers
          stat:entry /System/Library/Frameworks/Cocoa.framework/Headers/Cocoa.h
open_nocancel:entry /System/Library/Frameworks/Cocoa.framework/Headers/Cocoa.h

This pattern happens many times for each top-level framework.  Within a
framework, I see patterns like this:

          stat:entry /System/Library/Frameworks/Foundation.framework/Headers/NSArchiver.h
open_nocancel:entry /System/Library/Frameworks/Foundation.framework/Headers/NSArchiver.h
          stat:entry /System/Library/Frameworks/Foundation.framework/Headers/NSArray.h
open_nocancel:entry /System/Library/Frameworks/Foundation.framework/Headers/NSArray.h
          stat:entry /System/Library/Frameworks/Foundation.framework/Headers/NSEnumerator.h
open_nocancel:entry /System/Library/Frameworks/Foundation.framework/Headers/NSEnumerator.h

There are two things wrong with this:

1. We're doing a stat followed by an open.  We should just do an open  
and on failure, continue the search.  If the open succeeded and we  
need size info etc, clang should do an fstat on the file descriptor.

2. We end up "checking" some directories many times.  For example,  
Cocoa.h checks /Users/sabre/llvm/Debug/Headers/ 82 times for files in  
my example, and only succeeds 5 times.  It would be better to have  
DirectoryEntry get the file/dir contents of the directory after some  
number of queries using 'getdirentries'.  This would save the repeated  
negative hits to a directory.
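
To make point 1 concrete, here is a rough sketch of "open first, then
fstat the descriptor".  It is not clang's FileManager code: OpenedHeader
and openFirstMatch are hypothetical names made up for illustration, and
the only real calls are POSIX open/fstat/close.  It assumes C++17.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#include <optional>
#include <string>
#include <vector>

// Hypothetical result type: an open descriptor plus the metadata that
// would otherwise have come from a separate stat() call.
struct OpenedHeader {
  int FD;            // caller is responsible for close()
  std::string Path;
  off_t Size;
  dev_t Device;      // device/inode pair, usable for uniquing
  ino_t Inode;
};

// Try each search directory in order. A failed open() just moves on to
// the next directory; no stat() is issued before the open.
std::optional<OpenedHeader>
openFirstMatch(const std::vector<std::string> &SearchDirs,
               const std::string &Name) {
  for (const std::string &Dir : SearchDirs) {
    std::string Path = Dir + "/" + Name;
    int FD = ::open(Path.c_str(), O_RDONLY);
    if (FD < 0)
      continue; // not here (or not readable): keep searching

    // One fstat on the descriptor replaces the earlier stat on the path.
    struct stat SB;
    if (::fstat(FD, &SB) != 0 || S_ISDIR(SB.st_mode)) {
      ::close(FD); // open() succeeds on directories, so filter them out
      continue;
    }
    return OpenedHeader{FD, Path, SB.st_size, SB.st_dev, SB.st_ino};
  }
  return std::nullopt;
}
```

With this shape, a successful lookup costs one open plus one fstat, and
each miss costs a single failed open instead of a failed stat.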

The first one is probably pretty easy; the second one is more
involved.  Both should only be done with careful attention to measured
performance.
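
For point 2, one possible shape is sketched below.  Everything in it is
an assumption for illustration: CachedDir is a hypothetical stand-in
(not clang's DirectoryEntry), the threshold of 4 queries is arbitrary,
and it uses the portable opendir/readdir calls rather than
getdirentries.  It only handles single-component names and compares them
exactly, so it would need more care on a case-insensitive filesystem.

```cpp
#include <dirent.h>

#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical per-directory state: after a few lookups, read the
// directory once and answer later negative queries from memory.
class CachedDir {
  std::string Path;
  unsigned Queries = 0;
  bool Tried = false;  // a listing was attempted
  bool Listed = false; // the listing succeeded; Names is authoritative
  std::unordered_set<std::string> Names;

  // Assumed threshold: lookups to tolerate before listing the directory.
  static constexpr unsigned ListAfter = 4;

  void list() {
    Tried = true;
    DIR *D = ::opendir(Path.c_str());
    if (!D)
      return; // could not list: keep answering "maybe"
    while (struct dirent *E = ::readdir(D))
      Names.insert(E->d_name);
    ::closedir(D);
    Listed = true;
  }

public:
  explicit CachedDir(std::string P) : Path(std::move(P)) {}

  // Returns false only when the cached listing shows the name is absent.
  // A true result still has to be confirmed by actually opening the file.
  bool mayContain(const std::string &Name) {
    if (!Tried && ++Queries > ListAfter)
      list();
    return !Listed || Names.count(Name) != 0;
  }

  bool listed() const { return Listed; }
};
```

Whether this wins anything depends on directory size and hit rate, so
the threshold is exactly the kind of thing that should be tuned against
measurements.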

-Chris


