[PATCH] Fixing a driver bug where clang cannot locate source file at drive root
Rafael EspĂndola
rafael.espindola at gmail.com
Fri Jul 26 09:59:10 PDT 2013
On 23 July 2013 17:56, Yunzhong Gao <Yunzhong_Gao at playstation.sony.com> wrote:
> Hi Aaron and Rafael,
>
> Many thanks for reviewing my patch. I did a little more investigation on how
> the file I/O works on LLVM/Clang.
>
> When I tested on Windows 7, ::open("C:test.c", O_RDONLY) and
> ::stat("C:test.c", &stat_buffer) return success. On the other hand,
> ::stat("C:") returns failure. With a trailing back slash, ::stat("C:\\")
> returns success.
>
> Rafael asked whether this is a clang-specifc problem or a llvm problem.
> It looks like clang uses FileManager::getFile() to open a file, which will
> first check for existence of the enclosing directory before trying to open
> the file. This first check is where the trailing back slash makes the
> difference. It is done by calling FileSystemStatCache::get() on the directory
> name, which then calls library function ::stat(). Eventually it uses
> MemoryBuffer::getFile() to open the file.
>
> I checked the other llvm tools and they do not checking for the enclosing
> directory as clang does:
> llvm-as uses llvm::ParseAssemblyFile() and then MemoryBuffer::getFile()
> to open a file. It does not check for existence of the enclosing
> directory, so the difference between stat("C:\\") and stat("C:") does
> not affect llvm-as here.
> bugpoint, llc, lli, llvm-diff, llvm-link, llvm-jitlistener and opt use
> llvm::ParseIRFile() and then MemoryBuffer::getFile() to open the input
> file. Again they do not check for existence of the enclosing directory.
> llvm-bcanalyzer, llvm-cov, llvm-mc, llvm-mcmarkup, llvm-nm, llvm-dwarfdump,
> llvm-prof, llvm-rtdyld, obj2yaml, yaml2obj, utils/FileCheck,
> utils/FileUpdate, llvm-tblgen, clang-tblgen also use
> MemoryBuffer->getFileOrSTDIN() and then MemoryBuffer->getFile() to open
> the input file without checking for existence of the enclosing directory.
> llvm-objdump, llvm-readobj, llvm-size and macho-dump use
> llvm::object::createBinary() and then MemoryBuffer::getFile() to open the
> input file without checking for existence of the enclosing file.
> llvm-ar uses Memory::getFile() to open the archive file. It does
> not check for existence of the enclosing file.
> llvm-config needs to compare current directory to development tree for
> equality, and when doing so, always gets absolute paths first.
> llvm-dis uses DataStreamer::OpenFile() and then sys::fs::openFileForRead(),
> which calls open() on Linux and CreateFileW() on Windows.
> llvm-extract uses llvm::getLazyIRFileModule() and then
> MemoryBuffer::getFile() to open the input file.
> llvm-stress takes no input file; it uses raw_fd_ostream::raw_fd_ostream()
> to open a file for write, which calls sys::fs::openFileForWrite().
> llvm-symbolizer uses LLVMSymbolizer::getOrCreateModuleInfo(), then
> getOrCreateBinary() => object::createBinary() => MemoryBuffer::getFile()
> to open the input file.
> utils/count reads from stdin.
> Because none of these llvm tools check for existence of the enclosing
> directory before opening their input files, they do not have the same problem
> as clang. So to answer Rafael's question, this is indeed specific to clang.
>
> My proposed patch is in FileManager::getDirectory() because I see some codes
> in the beginning of this function already trying to patch the directory name.
> Alternatively we could also try to fix up the directory name in llvm::sys::
> path::parent_path() or FileManager::getDirectoryFromFile(). Doing it in
> getDirectoryFromFile() could take the same approach as I did in getDirectory().
> Doing it in parent_path() might be a bit more tricky because to grow the
> string, some memory allocator is needed: I would like some advice on what is
> the best way to handle it.
>
> To answer Aaron's question, I used path::root_name() instead of
> path::root_path() so that the expression "DirName == root_name(DirName)"
> returns different results for "C:" and "C:\", where the fix is really only
> needed for the "C:" case.
>
> "C:test.c" is not directly passed to path::root_name(). Before entering
> FileManager::getDirectory(), where my patch is intended, path::parent_path()
> will be run on the file path to extract the directory name first before
> passing the directory name to path::root_name(), which returns "C:" for
> both C:\ and C:. Does this address your question, Aaron?
>
> I wrote a unit test thanks to Aaron's suggestion. I am adding it to
> clang/unittests/Basic/FileManagerTest.cpp. Could you take a look?
Thanks for the detailed explanation. I think this fix (with the test)
is good for now. Aaron, any objections?
Ideas for future improvement:
* We should probably change FileManager::getDirectory to not play
with the directory name and move the responsibility to parent_path.
The only valid driver names are the letters from a to z? If so we
would be able to keep the interface and just keep some global string
constants:
static const char * DriveRoots[] = {"a:\", "b:\"....}
and return one of them when given something like "c:foo".
* getUniqueID should probably return a pair of 64 bit numbers. On unix
it should include the st_dev. On windows we could drop the xor trick.
* FileManager::UniqueDirContainer::getDirectory could then stop using
the filename on windows and have a common unix and windows
implementation.
> Many thanks,
> - Gao.
Cheers,
Rafael
More information about the cfe-commits
mailing list