[PATCH] Fixing a driver bug where clang cannot locate source file at drive root
Rafael EspĂndola
rafael.espindola at gmail.com
Mon Jul 29 08:53:50 PDT 2013
I committed this as r187359.
Testing this a bit more shows that we still don't implement this
exactly like native tools. The behavior is *really* strange. The
meaning of "C:foo.c" seems to be
* C:\foo.c if the current drive *is not* C
* .\foo.c if the current drive *is* C
I guess we can implement this in parent_path by having it return "."
for "C:foo.c" if the current drive is C and "C:\" if it is not.
On 26 July 2013 12:59, Rafael EspĂndola <rafael.espindola at gmail.com> wrote:
> On 23 July 2013 17:56, Yunzhong Gao <Yunzhong_Gao at playstation.sony.com> wrote:
>> Hi Aaron and Rafael,
>>
>> Many thanks for reviewing my patch. I did a little more investigation on how
>> the file I/O works on LLVM/Clang.
>>
>> When I tested on Windows 7, ::open("C:test.c", O_RDONLY) and
>> ::stat("C:test.c", &stat_buffer) return success. On the other hand,
>> ::stat("C:") returns failure. With a trailing back slash, ::stat("C:\\")
>> returns success.
>>
>> Rafael asked whether this is a clang-specifc problem or a llvm problem.
>> It looks like clang uses FileManager::getFile() to open a file, which will
>> first check for existence of the enclosing directory before trying to open
>> the file. This first check is where the trailing back slash makes the
>> difference. It is done by calling FileSystemStatCache::get() on the directory
>> name, which then calls library function ::stat(). Eventually it uses
>> MemoryBuffer::getFile() to open the file.
>>
>> I checked the other llvm tools and they do not checking for the enclosing
>> directory as clang does:
>> llvm-as uses llvm::ParseAssemblyFile() and then MemoryBuffer::getFile()
>> to open a file. It does not check for existence of the enclosing
>> directory, so the difference between stat("C:\\") and stat("C:") does
>> not affect llvm-as here.
>> bugpoint, llc, lli, llvm-diff, llvm-link, llvm-jitlistener and opt use
>> llvm::ParseIRFile() and then MemoryBuffer::getFile() to open the input
>> file. Again they do not check for existence of the enclosing directory.
>> llvm-bcanalyzer, llvm-cov, llvm-mc, llvm-mcmarkup, llvm-nm, llvm-dwarfdump,
>> llvm-prof, llvm-rtdyld, obj2yaml, yaml2obj, utils/FileCheck,
>> utils/FileUpdate, llvm-tblgen, clang-tblgen also use
>> MemoryBuffer->getFileOrSTDIN() and then MemoryBuffer->getFile() to open
>> the input file without checking for existence of the enclosing directory.
>> llvm-objdump, llvm-readobj, llvm-size and macho-dump use
>> llvm::object::createBinary() and then MemoryBuffer::getFile() to open the
>> input file without checking for existence of the enclosing file.
>> llvm-ar uses Memory::getFile() to open the archive file. It does
>> not check for existence of the enclosing file.
>> llvm-config needs to compare current directory to development tree for
>> equality, and when doing so, always gets absolute paths first.
>> llvm-dis uses DataStreamer::OpenFile() and then sys::fs::openFileForRead(),
>> which calls open() on Linux and CreateFileW() on Windows.
>> llvm-extract uses llvm::getLazyIRFileModule() and then
>> MemoryBuffer::getFile() to open the input file.
>> llvm-stress takes no input file; it uses raw_fd_ostream::raw_fd_ostream()
>> to open a file for write, which calls sys::fs::openFileForWrite().
>> llvm-symbolizer uses LLVMSymbolizer::getOrCreateModuleInfo(), then
>> getOrCreateBinary() => object::createBinary() => MemoryBuffer::getFile()
>> to open the input file.
>> utils/count reads from stdin.
>> Because none of these llvm tools check for existence of the enclosing
>> directory before opening their input files, they do not have the same problem
>> as clang. So to answer Rafael's question, this is indeed specific to clang.
>>
>> My proposed patch is in FileManager::getDirectory() because I see some codes
>> in the beginning of this function already trying to patch the directory name.
>> Alternatively we could also try to fix up the directory name in llvm::sys::
>> path::parent_path() or FileManager::getDirectoryFromFile(). Doing it in
>> getDirectoryFromFile() could take the same approach as I did in getDirectory().
>> Doing it in parent_path() might be a bit more tricky because to grow the
>> string, some memory allocator is needed: I would like some advice on what is
>> the best way to handle it.
>>
>> To answer Aaron's question, I used path::root_name() instead of
>> path::root_path() so that the expression "DirName == root_name(DirName)"
>> returns different results for "C:" and "C:\", where the fix is really only
>> needed for the "C:" case.
>>
>> "C:test.c" is not directly passed to path::root_name(). Before entering
>> FileManager::getDirectory(), where my patch is intended, path::parent_path()
>> will be run on the file path to extract the directory name first before
>> passing the directory name to path::root_name(), which returns "C:" for
>> both C:\ and C:. Does this address your question, Aaron?
>>
>> I wrote a unit test thanks to Aaron's suggestion. I am adding it to
>> clang/unittests/Basic/FileManagerTest.cpp. Could you take a look?
>
> Thanks for the detailed explanation. I think this fix (with the test)
> is good for now. Aaron, any objections?
>
> Ideas for future improvement:
> * We should probably change FileManager::getDirectory to not play
> with the directory name and move the responsibility to parent_path.
> The only valid driver names are the letters from a to z? If so we
> would be able to keep the interface and just keep some global string
> constants:
>
> static const char * DriveRoots[] = {"a:\", "b:\"....}
>
> and return one of them when given something like "c:foo".
>
> * getUniqueID should probably return a pair of 64 bit numbers. On unix
> it should include the st_dev. On windows we could drop the xor trick.
>
> * FileManager::UniqueDirContainer::getDirectory could then stop using
> the filename on windows and have a common unix and windows
> implementation.
>
>> Many thanks,
>> - Gao.
>
> Cheers,
> Rafael
More information about the cfe-commits
mailing list