[PATCH] Fixing a driver bug where clang cannot locate source file at drive root

Rafael EspĂ­ndola rafael.espindola at gmail.com
Fri Jul 26 09:59:10 PDT 2013


On 23 July 2013 17:56, Yunzhong Gao <Yunzhong_Gao at playstation.sony.com> wrote:
>   Hi Aaron and Rafael,
>
>   Many thanks for reviewing my patch. I did a little more investigation on how
>   the file I/O works on LLVM/Clang.
>
>   When I tested on Windows 7, ::open("C:test.c", O_RDONLY) and
>   ::stat("C:test.c", &stat_buffer) return success. On the other hand,
>   ::stat("C:") returns failure. With a trailing back slash, ::stat("C:\\")
>   returns success.
>
>   Rafael asked whether this is a clang-specifc problem or a llvm problem.
>   It looks like clang uses FileManager::getFile() to open a file, which will
>   first check for existence of the enclosing directory before trying to open
>   the file. This first check is where the trailing back slash makes the
>   difference. It is done by calling FileSystemStatCache::get() on the directory
>   name, which then calls library function ::stat(). Eventually it uses
>   MemoryBuffer::getFile() to open the file.
>
>   I checked the other llvm tools and they do not checking for the enclosing
>   directory as clang does:
>     llvm-as uses llvm::ParseAssemblyFile() and then MemoryBuffer::getFile()
>       to open a file. It does not check for existence of the enclosing
>       directory, so the difference between stat("C:\\") and stat("C:") does
>       not affect llvm-as here.
>     bugpoint, llc, lli, llvm-diff, llvm-link, llvm-jitlistener and opt use
>       llvm::ParseIRFile() and then MemoryBuffer::getFile() to open the input
>       file. Again they do not check for existence of the enclosing directory.
>     llvm-bcanalyzer, llvm-cov, llvm-mc, llvm-mcmarkup, llvm-nm, llvm-dwarfdump,
>       llvm-prof, llvm-rtdyld, obj2yaml, yaml2obj, utils/FileCheck,
>       utils/FileUpdate, llvm-tblgen, clang-tblgen also use
>       MemoryBuffer->getFileOrSTDIN() and then MemoryBuffer->getFile() to open
>       the input file without checking for existence of the enclosing directory.
>     llvm-objdump, llvm-readobj, llvm-size and macho-dump use
>       llvm::object::createBinary() and then MemoryBuffer::getFile() to open the
>       input file without checking for existence of the enclosing file.
>     llvm-ar uses Memory::getFile() to open the archive file. It does
>       not check for existence of the enclosing file.
>     llvm-config needs to compare current directory to development tree for
>       equality, and when doing so, always gets absolute paths first.
>     llvm-dis uses DataStreamer::OpenFile() and then sys::fs::openFileForRead(),
>       which calls open() on Linux and CreateFileW() on Windows.
>     llvm-extract uses llvm::getLazyIRFileModule() and then
>       MemoryBuffer::getFile() to open the input file.
>     llvm-stress takes no input file; it uses raw_fd_ostream::raw_fd_ostream()
>       to open a file for write, which calls sys::fs::openFileForWrite().
>     llvm-symbolizer uses LLVMSymbolizer::getOrCreateModuleInfo(), then
>       getOrCreateBinary() => object::createBinary() => MemoryBuffer::getFile()
>       to open the input file.
>     utils/count reads from stdin.
>   Because none of these llvm tools check for existence of the enclosing
>   directory before opening their input files, they do not have the same problem
>   as clang. So to answer Rafael's question, this is indeed specific to clang.
>
>   My proposed patch is in FileManager::getDirectory() because I see some codes
>   in the beginning of this function already trying to patch the directory name.
>   Alternatively we could also try to fix up the directory name in llvm::sys::
>   path::parent_path() or FileManager::getDirectoryFromFile(). Doing it in
>   getDirectoryFromFile() could take the same approach as I did in getDirectory().
>   Doing it in parent_path() might be a bit more tricky because to grow the
>   string, some memory allocator is needed: I would like some advice on what is
>   the best way to handle it.
>
>   To answer Aaron's question, I used path::root_name() instead of
>   path::root_path() so that the expression "DirName == root_name(DirName)"
>   returns different results for "C:" and "C:\", where the fix is really only
>   needed for the "C:" case.
>
>   "C:test.c" is not directly passed to path::root_name(). Before entering
>   FileManager::getDirectory(), where my patch is intended, path::parent_path()
>   will be run on the file path to extract the directory name first before
>   passing the directory name to path::root_name(), which returns "C:" for
>   both C:\ and C:. Does this address your question, Aaron?
>
>   I wrote a unit test thanks to Aaron's suggestion. I am adding it to
>   clang/unittests/Basic/FileManagerTest.cpp. Could you take a look?

Thanks for the detailed explanation. I think this fix (with the test)
is good for now. Aaron, any objections?

Ideas for future improvement:
 *  We should probably change FileManager::getDirectory to not play
with the directory name and move the responsibility to parent_path.
The only valid driver names are the letters from a to z? If so we
would be able to keep the interface and just keep some global string
constants:

static const char * DriveRoots[] = {"a:\", "b:\"....}

and return one of them when given something like "c:foo".

* getUniqueID should probably return a pair of 64 bit numbers. On unix
it should include the st_dev. On windows we could drop the xor trick.

* FileManager::UniqueDirContainer::getDirectory could then stop using
the filename on windows and have a common unix and windows
implementation.

>   Many thanks,
>   - Gao.

Cheers,
Rafael



More information about the cfe-commits mailing list