[PATCH] Fixing a driver bug where clang cannot locate source file at drive root

Rafael EspĂ­ndola rafael.espindola at gmail.com
Mon Jul 29 08:53:50 PDT 2013

I committed this as r187359.

Testing this a bit more shows that we still don't implement this
exactly like native tools. The behavior is *really* strange. The
meaning of "C:foo.c" seems to be

* C:\foo.c    if the current drive *is not* C
* .\foo.c      if the current drive *is* C

I guess we can implement this in parent_path by having it return "."
for "C:foo.c" if the current drive is C and "C:\" if it is not.

On 26 July 2013 12:59, Rafael EspĂ­ndola <rafael.espindola at gmail.com> wrote:
> On 23 July 2013 17:56, Yunzhong Gao <Yunzhong_Gao at playstation.sony.com> wrote:
>>   Hi Aaron and Rafael,
>>   Many thanks for reviewing my patch. I did a little more investigation on how
>>   the file I/O works on LLVM/Clang.
>>   When I tested on Windows 7, ::open("C:test.c", O_RDONLY) and
>>   ::stat("C:test.c", &stat_buffer) return success. On the other hand,
>>   ::stat("C:") returns failure. With a trailing back slash, ::stat("C:\\")
>>   returns success.
>>   Rafael asked whether this is a clang-specifc problem or a llvm problem.
>>   It looks like clang uses FileManager::getFile() to open a file, which will
>>   first check for existence of the enclosing directory before trying to open
>>   the file. This first check is where the trailing back slash makes the
>>   difference. It is done by calling FileSystemStatCache::get() on the directory
>>   name, which then calls library function ::stat(). Eventually it uses
>>   MemoryBuffer::getFile() to open the file.
>>   I checked the other llvm tools and they do not checking for the enclosing
>>   directory as clang does:
>>     llvm-as uses llvm::ParseAssemblyFile() and then MemoryBuffer::getFile()
>>       to open a file. It does not check for existence of the enclosing
>>       directory, so the difference between stat("C:\\") and stat("C:") does
>>       not affect llvm-as here.
>>     bugpoint, llc, lli, llvm-diff, llvm-link, llvm-jitlistener and opt use
>>       llvm::ParseIRFile() and then MemoryBuffer::getFile() to open the input
>>       file. Again they do not check for existence of the enclosing directory.
>>     llvm-bcanalyzer, llvm-cov, llvm-mc, llvm-mcmarkup, llvm-nm, llvm-dwarfdump,
>>       llvm-prof, llvm-rtdyld, obj2yaml, yaml2obj, utils/FileCheck,
>>       utils/FileUpdate, llvm-tblgen, clang-tblgen also use
>>       MemoryBuffer->getFileOrSTDIN() and then MemoryBuffer->getFile() to open
>>       the input file without checking for existence of the enclosing directory.
>>     llvm-objdump, llvm-readobj, llvm-size and macho-dump use
>>       llvm::object::createBinary() and then MemoryBuffer::getFile() to open the
>>       input file without checking for existence of the enclosing file.
>>     llvm-ar uses Memory::getFile() to open the archive file. It does
>>       not check for existence of the enclosing file.
>>     llvm-config needs to compare current directory to development tree for
>>       equality, and when doing so, always gets absolute paths first.
>>     llvm-dis uses DataStreamer::OpenFile() and then sys::fs::openFileForRead(),
>>       which calls open() on Linux and CreateFileW() on Windows.
>>     llvm-extract uses llvm::getLazyIRFileModule() and then
>>       MemoryBuffer::getFile() to open the input file.
>>     llvm-stress takes no input file; it uses raw_fd_ostream::raw_fd_ostream()
>>       to open a file for write, which calls sys::fs::openFileForWrite().
>>     llvm-symbolizer uses LLVMSymbolizer::getOrCreateModuleInfo(), then
>>       getOrCreateBinary() => object::createBinary() => MemoryBuffer::getFile()
>>       to open the input file.
>>     utils/count reads from stdin.
>>   Because none of these llvm tools check for existence of the enclosing
>>   directory before opening their input files, they do not have the same problem
>>   as clang. So to answer Rafael's question, this is indeed specific to clang.
>>   My proposed patch is in FileManager::getDirectory() because I see some codes
>>   in the beginning of this function already trying to patch the directory name.
>>   Alternatively we could also try to fix up the directory name in llvm::sys::
>>   path::parent_path() or FileManager::getDirectoryFromFile(). Doing it in
>>   getDirectoryFromFile() could take the same approach as I did in getDirectory().
>>   Doing it in parent_path() might be a bit more tricky because to grow the
>>   string, some memory allocator is needed: I would like some advice on what is
>>   the best way to handle it.
>>   To answer Aaron's question, I used path::root_name() instead of
>>   path::root_path() so that the expression "DirName == root_name(DirName)"
>>   returns different results for "C:" and "C:\", where the fix is really only
>>   needed for the "C:" case.
>>   "C:test.c" is not directly passed to path::root_name(). Before entering
>>   FileManager::getDirectory(), where my patch is intended, path::parent_path()
>>   will be run on the file path to extract the directory name first before
>>   passing the directory name to path::root_name(), which returns "C:" for
>>   both C:\ and C:. Does this address your question, Aaron?
>>   I wrote a unit test thanks to Aaron's suggestion. I am adding it to
>>   clang/unittests/Basic/FileManagerTest.cpp. Could you take a look?
> Thanks for the detailed explanation. I think this fix (with the test)
> is good for now. Aaron, any objections?
> Ideas for future improvement:
>  *  We should probably change FileManager::getDirectory to not play
> with the directory name and move the responsibility to parent_path.
> The only valid driver names are the letters from a to z? If so we
> would be able to keep the interface and just keep some global string
> constants:
> static const char * DriveRoots[] = {"a:\", "b:\"....}
> and return one of them when given something like "c:foo".
> * getUniqueID should probably return a pair of 64 bit numbers. On unix
> it should include the st_dev. On windows we could drop the xor trick.
> * FileManager::UniqueDirContainer::getDirectory could then stop using
> the filename on windows and have a common unix and windows
> implementation.
>>   Many thanks,
>>   - Gao.
> Cheers,
> Rafael

More information about the cfe-commits mailing list