[PATCH] Fixing a driver bug where clang cannot locate source file at drive root
Arthur O'Dwyer
arthur.j.odwyer at gmail.com
Mon Jul 29 11:58:21 PDT 2013
Rafael,
Is this wacky behavior a Cygwin bug, or did Microsoft's actual
behavior change sometime between Win95 and Win7, or was there a flaw
in your experiment? The usual meaning of "C:foo.c" is
* C:\<last working directory on C:>\foo.c if the current volume *is
not* C:; or
* .\foo.c if the current volume *is* C:.
I suspect that what everyone is really looking for is just the Windows
API function GetFullPathName(), although its documentation isn't very
clear about its behavior on "C:foo.c" and unfortunately I'm not in a
position to test it. There doesn't seem to be any documented way to
extract the "last working directory on C:" information from the
Windows API without also changing the working directory. But if
that's acceptable, then there's this:
TCHAR buffer[64000];
SetCurrentDirectory("C:");
GetCurrentDirectory(buffer, 64000); // I suspect this yields
"C:\<last working directory on C:>\"
HTH,
–Arthur
On Mon, Jul 29, 2013 at 8:53 AM, Rafael Espíndola
<rafael.espindola at gmail.com> wrote:
> I committed this as r187359.
>
> Testing this a bit more shows that we still don't implement this
> exactly like native tools. The behavior is *really* strange. The
> meaning of "C:foo.c" seems to be
>
> * C:\foo.c if the current drive *is not* C
> * .\foo.c if the current drive *is* C
>
> I guess we can implement this in parent_path by having it return "."
> for "C:foo.c" if the current drive is C and "C:\" if it is not.
>
>
>
> On 26 July 2013 12:59, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>> On 23 July 2013 17:56, Yunzhong Gao <Yunzhong_Gao at playstation.sony.com> wrote:
>>> Hi Aaron and Rafael,
>>>
>>> Many thanks for reviewing my patch. I did a little more investigation on how
>>> the file I/O works on LLVM/Clang.
>>>
>>> When I tested on Windows 7, ::open("C:test.c", O_RDONLY) and
>>> ::stat("C:test.c", &stat_buffer) return success. On the other hand,
>>> ::stat("C:") returns failure. With a trailing back slash, ::stat("C:\\")
>>> returns success.
>>>
>>> Rafael asked whether this is a clang-specifc problem or a llvm problem.
>>> It looks like clang uses FileManager::getFile() to open a file, which will
>>> first check for existence of the enclosing directory before trying to open
>>> the file. This first check is where the trailing back slash makes the
>>> difference. It is done by calling FileSystemStatCache::get() on the directory
>>> name, which then calls library function ::stat(). Eventually it uses
>>> MemoryBuffer::getFile() to open the file.
>>>
>>> I checked the other llvm tools and they do not checking for the enclosing
>>> directory as clang does:
>>> llvm-as uses llvm::ParseAssemblyFile() and then MemoryBuffer::getFile()
>>> to open a file. It does not check for existence of the enclosing
>>> directory, so the difference between stat("C:\\") and stat("C:") does
>>> not affect llvm-as here.
>>> bugpoint, llc, lli, llvm-diff, llvm-link, llvm-jitlistener and opt use
>>> llvm::ParseIRFile() and then MemoryBuffer::getFile() to open the input
>>> file. Again they do not check for existence of the enclosing directory.
>>> llvm-bcanalyzer, llvm-cov, llvm-mc, llvm-mcmarkup, llvm-nm, llvm-dwarfdump,
>>> llvm-prof, llvm-rtdyld, obj2yaml, yaml2obj, utils/FileCheck,
>>> utils/FileUpdate, llvm-tblgen, clang-tblgen also use
>>> MemoryBuffer->getFileOrSTDIN() and then MemoryBuffer->getFile() to open
>>> the input file without checking for existence of the enclosing directory.
>>> llvm-objdump, llvm-readobj, llvm-size and macho-dump use
>>> llvm::object::createBinary() and then MemoryBuffer::getFile() to open the
>>> input file without checking for existence of the enclosing file.
>>> llvm-ar uses Memory::getFile() to open the archive file. It does
>>> not check for existence of the enclosing file.
>>> llvm-config needs to compare current directory to development tree for
>>> equality, and when doing so, always gets absolute paths first.
>>> llvm-dis uses DataStreamer::OpenFile() and then sys::fs::openFileForRead(),
>>> which calls open() on Linux and CreateFileW() on Windows.
>>> llvm-extract uses llvm::getLazyIRFileModule() and then
>>> MemoryBuffer::getFile() to open the input file.
>>> llvm-stress takes no input file; it uses raw_fd_ostream::raw_fd_ostream()
>>> to open a file for write, which calls sys::fs::openFileForWrite().
>>> llvm-symbolizer uses LLVMSymbolizer::getOrCreateModuleInfo(), then
>>> getOrCreateBinary() => object::createBinary() => MemoryBuffer::getFile()
>>> to open the input file.
>>> utils/count reads from stdin.
>>> Because none of these llvm tools check for existence of the enclosing
>>> directory before opening their input files, they do not have the same problem
>>> as clang. So to answer Rafael's question, this is indeed specific to clang.
>>>
>>> My proposed patch is in FileManager::getDirectory() because I see some codes
>>> in the beginning of this function already trying to patch the directory name.
>>> Alternatively we could also try to fix up the directory name in llvm::sys::
>>> path::parent_path() or FileManager::getDirectoryFromFile(). Doing it in
>>> getDirectoryFromFile() could take the same approach as I did in getDirectory().
>>> Doing it in parent_path() might be a bit more tricky because to grow the
>>> string, some memory allocator is needed: I would like some advice on what is
>>> the best way to handle it.
>>>
>>> To answer Aaron's question, I used path::root_name() instead of
>>> path::root_path() so that the expression "DirName == root_name(DirName)"
>>> returns different results for "C:" and "C:\", where the fix is really only
>>> needed for the "C:" case.
>>>
>>> "C:test.c" is not directly passed to path::root_name(). Before entering
>>> FileManager::getDirectory(), where my patch is intended, path::parent_path()
>>> will be run on the file path to extract the directory name first before
>>> passing the directory name to path::root_name(), which returns "C:" for
>>> both C:\ and C:. Does this address your question, Aaron?
>>>
>>> I wrote a unit test thanks to Aaron's suggestion. I am adding it to
>>> clang/unittests/Basic/FileManagerTest.cpp. Could you take a look?
>>
>> Thanks for the detailed explanation. I think this fix (with the test)
>> is good for now. Aaron, any objections?
>>
>> Ideas for future improvement:
>> * We should probably change FileManager::getDirectory to not play
>> with the directory name and move the responsibility to parent_path.
>> The only valid driver names are the letters from a to z? If so we
>> would be able to keep the interface and just keep some global string
>> constants:
>>
>> static const char * DriveRoots[] = {"a:\", "b:\"....}
>>
>> and return one of them when given something like "c:foo".
>>
>> * getUniqueID should probably return a pair of 64 bit numbers. On unix
>> it should include the st_dev. On windows we could drop the xor trick.
>>
>> * FileManager::UniqueDirContainer::getDirectory could then stop using
>> the filename on windows and have a common unix and windows
>> implementation.
>>
>>> Many thanks,
>>> - Gao.
>>
>> Cheers,
>> Rafael
>
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
More information about the cfe-commits
mailing list