[PATCH] Fixing a driver bug where clang cannot locate source file at drive root

Arthur O'Dwyer arthur.j.odwyer at gmail.com
Mon Jul 29 11:58:21 PDT 2013


Rafael,

  Is this wacky behavior a Cygwin bug, or did Microsoft's actual
behavior change sometime between Win95 and Win7, or was there a flaw
in your experiment?  The usual meaning of "C:foo.c" is

* C:\<last working directory on C:>\foo.c    if the current volume *is
not* C:; or
* .\foo.c      if the current volume *is* C:.

I suspect that what everyone is really looking for is just the Windows
API function GetFullPathName(), although its documentation isn't very
clear about its behavior on "C:foo.c" and unfortunately I'm not in a
position to test it.  There doesn't seem to be any documented way to
extract the "last working directory on C:" information from the
Windows API without also changing the working directory.  But if
that's acceptable, then there's this:

    TCHAR buffer[64000];
    SetCurrentDirectory("C:");
    GetCurrentDirectory(buffer, 64000);  // I suspect this yields
"C:\<last working directory on C:>\"

HTH,
–Arthur


On Mon, Jul 29, 2013 at 8:53 AM, Rafael Espíndola
<rafael.espindola at gmail.com> wrote:
> I committed this as r187359.
>
> Testing this a bit more shows that we still don't implement this
> exactly like native tools. The behavior is *really* strange. The
> meaning of "C:foo.c" seems to be
>
> * C:\foo.c    if the current drive *is not* C
> * .\foo.c      if the current drive *is* C
>
> I guess we can implement this in parent_path by having it return "."
> for "C:foo.c" if the current drive is C and "C:\" if it is not.
>
>
>
> On 26 July 2013 12:59, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>> On 23 July 2013 17:56, Yunzhong Gao <Yunzhong_Gao at playstation.sony.com> wrote:
>>>   Hi Aaron and Rafael,
>>>
>>>   Many thanks for reviewing my patch. I did a little more investigation on how
>>>   the file I/O works on LLVM/Clang.
>>>
>>>   When I tested on Windows 7, ::open("C:test.c", O_RDONLY) and
>>>   ::stat("C:test.c", &stat_buffer) return success. On the other hand,
>>>   ::stat("C:") returns failure. With a trailing back slash, ::stat("C:\\")
>>>   returns success.
>>>
>>>   Rafael asked whether this is a clang-specifc problem or a llvm problem.
>>>   It looks like clang uses FileManager::getFile() to open a file, which will
>>>   first check for existence of the enclosing directory before trying to open
>>>   the file. This first check is where the trailing back slash makes the
>>>   difference. It is done by calling FileSystemStatCache::get() on the directory
>>>   name, which then calls library function ::stat(). Eventually it uses
>>>   MemoryBuffer::getFile() to open the file.
>>>
>>>   I checked the other llvm tools and they do not checking for the enclosing
>>>   directory as clang does:
>>>     llvm-as uses llvm::ParseAssemblyFile() and then MemoryBuffer::getFile()
>>>       to open a file. It does not check for existence of the enclosing
>>>       directory, so the difference between stat("C:\\") and stat("C:") does
>>>       not affect llvm-as here.
>>>     bugpoint, llc, lli, llvm-diff, llvm-link, llvm-jitlistener and opt use
>>>       llvm::ParseIRFile() and then MemoryBuffer::getFile() to open the input
>>>       file. Again they do not check for existence of the enclosing directory.
>>>     llvm-bcanalyzer, llvm-cov, llvm-mc, llvm-mcmarkup, llvm-nm, llvm-dwarfdump,
>>>       llvm-prof, llvm-rtdyld, obj2yaml, yaml2obj, utils/FileCheck,
>>>       utils/FileUpdate, llvm-tblgen, clang-tblgen also use
>>>       MemoryBuffer->getFileOrSTDIN() and then MemoryBuffer->getFile() to open
>>>       the input file without checking for existence of the enclosing directory.
>>>     llvm-objdump, llvm-readobj, llvm-size and macho-dump use
>>>       llvm::object::createBinary() and then MemoryBuffer::getFile() to open the
>>>       input file without checking for existence of the enclosing file.
>>>     llvm-ar uses Memory::getFile() to open the archive file. It does
>>>       not check for existence of the enclosing file.
>>>     llvm-config needs to compare current directory to development tree for
>>>       equality, and when doing so, always gets absolute paths first.
>>>     llvm-dis uses DataStreamer::OpenFile() and then sys::fs::openFileForRead(),
>>>       which calls open() on Linux and CreateFileW() on Windows.
>>>     llvm-extract uses llvm::getLazyIRFileModule() and then
>>>       MemoryBuffer::getFile() to open the input file.
>>>     llvm-stress takes no input file; it uses raw_fd_ostream::raw_fd_ostream()
>>>       to open a file for write, which calls sys::fs::openFileForWrite().
>>>     llvm-symbolizer uses LLVMSymbolizer::getOrCreateModuleInfo(), then
>>>       getOrCreateBinary() => object::createBinary() => MemoryBuffer::getFile()
>>>       to open the input file.
>>>     utils/count reads from stdin.
>>>   Because none of these llvm tools check for existence of the enclosing
>>>   directory before opening their input files, they do not have the same problem
>>>   as clang. So to answer Rafael's question, this is indeed specific to clang.
>>>
>>>   My proposed patch is in FileManager::getDirectory() because I see some codes
>>>   in the beginning of this function already trying to patch the directory name.
>>>   Alternatively we could also try to fix up the directory name in llvm::sys::
>>>   path::parent_path() or FileManager::getDirectoryFromFile(). Doing it in
>>>   getDirectoryFromFile() could take the same approach as I did in getDirectory().
>>>   Doing it in parent_path() might be a bit more tricky because to grow the
>>>   string, some memory allocator is needed: I would like some advice on what is
>>>   the best way to handle it.
>>>
>>>   To answer Aaron's question, I used path::root_name() instead of
>>>   path::root_path() so that the expression "DirName == root_name(DirName)"
>>>   returns different results for "C:" and "C:\", where the fix is really only
>>>   needed for the "C:" case.
>>>
>>>   "C:test.c" is not directly passed to path::root_name(). Before entering
>>>   FileManager::getDirectory(), where my patch is intended, path::parent_path()
>>>   will be run on the file path to extract the directory name first before
>>>   passing the directory name to path::root_name(), which returns "C:" for
>>>   both C:\ and C:. Does this address your question, Aaron?
>>>
>>>   I wrote a unit test thanks to Aaron's suggestion. I am adding it to
>>>   clang/unittests/Basic/FileManagerTest.cpp. Could you take a look?
>>
>> Thanks for the detailed explanation. I think this fix (with the test)
>> is good for now. Aaron, any objections?
>>
>> Ideas for future improvement:
>>  *  We should probably change FileManager::getDirectory to not play
>> with the directory name and move the responsibility to parent_path.
>> The only valid driver names are the letters from a to z? If so we
>> would be able to keep the interface and just keep some global string
>> constants:
>>
>> static const char * DriveRoots[] = {"a:\", "b:\"....}
>>
>> and return one of them when given something like "c:foo".
>>
>> * getUniqueID should probably return a pair of 64 bit numbers. On unix
>> it should include the st_dev. On windows we could drop the xor trick.
>>
>> * FileManager::UniqueDirContainer::getDirectory could then stop using
>> the filename on windows and have a common unix and windows
>> implementation.
>>
>>>   Many thanks,
>>>   - Gao.
>>
>> Cheers,
>> Rafael
>
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits




More information about the cfe-commits mailing list