[PATCH] Fix Windows path formatting when using -MD

Robinson, Paul Paul_Robinson at playstation.sony.com
Fri Apr 17 15:14:59 PDT 2015


Sent it by email, haven't gotten anything back yet.

From: Sean Silva [mailto:chisophugis at gmail.com]
Sent: Friday, April 17, 2015 3:08 PM
To: Robinson, Paul
Cc: Yaron Keren; Yung, Douglas; llvm-commits at cs.uiuc.edu
Subject: Re: [PATCH] Fix Windows path formatting when using -MD



On Fri, Apr 17, 2015 at 2:56 PM, Robinson, Paul <Paul_Robinson at playstation.sony.com<mailto:Paul_Robinson at playstation.sony.com>> wrote:
I have actually sent a bug to make, not gcc, because make is the relevant tool here, not gcc.  There is no documentation for backslash quoting in Makefiles, that I can find.  Without a specification from the make people, there's no point whining to gcc.
And if the make people provide a definitive answer, we can conform to it, independent of whatever gcc does.

Thanks. Do you have a link to the make bug?

-- Sean Silva

--paulr

From: Sean Silva [mailto:chisophugis at gmail.com<mailto:chisophugis at gmail.com>]
Sent: Friday, April 17, 2015 2:44 PM

To: Robinson, Paul
Cc: Yaron Keren; Yung, Douglas; llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>
Subject: Re: [PATCH] Fix Windows path formatting when using -MD



On Thu, Apr 16, 2015 at 2:49 PM, Robinson, Paul <Paul_Robinson at playstation.sony.com<mailto:Paul_Robinson at playstation.sony.com>> wrote:
GCC does escape backslashes, using a very specific logic that matches that matches what gnu make accepts.
Actually, that matches what some gcc developer believed that gnu make accepted at the time that was written.

Make's input format is extremely esoteric: http://git.savannah.gnu.org/cgit/make.git/tree/read.c
And if you'll note the comment on line 2235:
   Backslashes quote STOPCHAR, blanks if BLANK is nonzero, and backslash.
And cruising various bits of 'make' at random, code that handles backslashes quoting backslashes is rampant.

This is confirmed experimentally, using gcc 4.8.2 and make 3.81 that came on my Ubuntu 14.04.

~/projects/scratch/backslash$ ls -l
total 4
-rw-r--r-- 1 probinson rd  0 Apr 16 13:20 foo\\bar.h
-rw-r--r-- 1 probinson rd 72 Apr 16 13:32 t.c
~/projects/scratch/backslash$ cat t.c
#include "foo\\bar.h" /* Actual filename has two backslashes. */
int i;
~/projects/scratch/backslash$ # Prove the #include works.
~/projects/scratch/backslash$ gcc -MD -c t.c
~/projects/scratch/backslash$ ls -l
total 12
-rw-r--r-- 1 probinson rd   0 Apr 16 13:20 foo\\bar.h
-rw-r--r-- 1 probinson rd  72 Apr 16 13:32 t.c
-rw-r--r-- 1 probinson rd  47 Apr 16 13:33 t.d
-rw-r--r-- 1 probinson rd 951 Apr 16 13:33 t.o
~/projects/scratch/backslash$ # But gcc fails to escape the backslashes.
~/projects/scratch/backslash$ cat t.d
t.o: t.c /usr/include/stdc-predef.h foo\\bar.h
~/projects/scratch/backslash$ # Make is _able_ to find the right file this way...
~/projects/scratch/backslash$ make -f t.d t.o
make: `t.o' is up to date.
~/projects/scratch/backslash$ # ... but only if a better match doesn't exist.
~/projects/scratch/backslash$ echo > foo\\bar.h
~/projects/scratch/backslash$ ls -l
total 16
-rw-r--r-- 1 probinson rd   0 Apr 16 13:20 foo\\bar.h
-rw-r--r-- 1 probinson rd   1 Apr 16 13:34 foo\bar.h
-rw-r--r-- 1 probinson rd  72 Apr 16 13:32 t.c
-rw-r--r-- 1 probinson rd  47 Apr 16 13:33 t.d
-rw-r--r-- 1 probinson rd 951 Apr 16 13:33 t.o
~/projects/scratch/backslash$ # Make prefers to treat \\ as escaped \.
~/projects/scratch/backslash$ make -f t.d t.o
cc    -c -o t.o t.c
~/projects/scratch/backslash$

Now, it's entirely possible that some older version of make didn't treat \\ as escaped \ and clearly make does implement a fallback to treating all \ as normal characters in filenames.  It might be an interesting exercise to find the place within 'make' that implements that fallback, and see what sort of commentary about gcc behavior might be decorating that particular code.  (Running make with –d didn't turn up any informative messages; if foo\bar.h existed, it reported on that file, and if it didn't exist, it reported on foo\\bar.h instead without a murmur.)

I think it's pretty clearly a gcc bug that make has learned to work around.

Please file a bug with GCC if you think its current behavior is wrong. Without acknowledgement from the GCC folks that the current behavior is erroneous, it's hard to justify deviating from it, considering how long it has been in the wild and (apparently) working.

-- Sean Silva

--paulr

From: Sean Silva [mailto:chisophugis at gmail.com<mailto:chisophugis at gmail.com>]
Sent: Thursday, April 16, 2015 5:56 AM
To: Robinson, Paul
Cc: Yaron Keren; Yung, Douglas; llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>

Subject: Re: [PATCH] Fix Windows path formatting when using -MD



On Tue, Apr 14, 2015 at 3:21 AM, Robinson, Paul <Paul_Robinson at playstation.sony.com<mailto:Paul_Robinson at playstation.sony.com>> wrote:
Re not producing nmake-style depfiles, it is more persuasive to me that no version of CL since 6.0 (obsolete for a decade or so) has supported producing them, so why should Clang.
But that's not the end of it!

tl;dr:  We need to escape the backslashes too, regardless of platform.

(The long version…)

If backslashes are the escape character, they ought to be escaped…. leaving aside their semantic significance on Windows, that seems like an oversight in its own right.

I ran an experiment on Linux…

$ cat t.c
#include "foo\bar.h" /* C99 6.4.7, this is not a normal C string */
#include "bar#foo.h"
int I;
$ touch foo\\bar.h
$ touch bar\#foo.h
$ # gcc accepts this
$ gcc -c -MD t.c
$ # but doesn't escape the backslash in the dependency file
$ cat t.d
t.o: t.c /usr/include/stdc-predef.h foo\bar.h bar\#foo.h
$ # make will accept the single-backslash dependency
$ make –f t.d t.o
make: `t.o' is up to date.
$ touch foo\\bar.h
$ make –f t.d t.o
cc –c –o t.o t.c
$ # Will make tolerate an escaped backslash?
$ cat t.d
t.o: t.c /usr/include/stdc-predef.h foo\\bar.h bar\#foo.h
$ make –f t.d t.o
make: `t.o' is up to date.
$ touch foo\\bar.h
$ make –f t.d t.o
cc –c –o t.o t.c

So, gcc doesn't escape backslashes when writing the .d file, but make will tolerate this gcc bug and accept escaped or un-escaped backslashes.

GCC does escape backslashes, using a very specific logic that matches that matches what gnu make accepts. See https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libcpp/mkdeps.c;h=78bbf09a0cc8c3ba0fe974b55e1ededf02c66665;hb=HEAD#l47


In fact, further experiments (not shown here) show that make will *first* try to interpret the backslash as an escape character, and if that doesn't find a matching filename, it falls back to treating the backslash as a non-special character.

Therefore, I think it is always safe to escape backslashes, and that behavior doesn't have to be conditioned on Windows/non-Windows.

Make's input format is extremely esoteric: http://git.savannah.gnu.org/cgit/make.git/tree/read.c
It even varies by platform and build configuration. Unless we are confident that our testing has covered the build configurations and platforms that any user of clang's gcc-compatible driver is likely to encounter, I would hesitate to conclude something is "safe".

The bottom line is that unless we have a good reason, we should exactly match GCC's output format (see the source link above). A "good reason" should probably take the form of a bug report against GCC.

Shoring up our depfile output to match GCC's is probably the most immediate improvement to be made in clang's gcc-compatible depfile output. Our current behavior is close to GCC's but does not handle whitespace escaping in the same way, which causes our makefiles to be incompatible e.g. in the `#include "foo\ bar.h"` corner case, where our output is broken but GCC's works.

-- Sean Silva


Yet even further experiments (not shown here) demonstrate that Windows doesn't mind having a directory name with a *leading* space or hash (because… Windows).  Therefore, on Windows we *must* escape any backslashes, so that
#include "foo\ bar.h"
(the file <space>bar.h in subdirectory foo) ends up correctly escaped in the .d file thusly:
t.o: foo\\\ bar.h

That is, the first two backslashes will be treated as a single backslash (& directory separator on Windows) while the third backslash escapes the space so that make won't treat the space as the end of the filespec.
--paulr

From: Sean Silva [mailto:chisophugis at gmail.com<mailto:chisophugis at gmail.com>]
Sent: Monday, April 13, 2015 3:48 PM
To: Yaron Keren
Cc: Yung, Douglas; Robinson, Paul; llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>

Subject: Re: [PATCH] Fix Windows path formatting when using -MD



On Sat, Apr 11, 2015 at 9:32 AM, Yaron Keren <yaron.keren at gmail.com<mailto:yaron.keren at gmail.com>> wrote:
LLVM_ON_WIN32 is true for mingw and cygwin which always use gcc make.
For nmake vs make issue Driver::IsCLMode() could be used.

Why? No program I'm aware of passes -M* options to (clang-)cl.exe expecting nmake depfiles. If we want to add a new feature to support a new use case, we shouldn't add it as a weird implicit behavior. If we do want to add this as a feature to clang, we should just add a way to explicitly say "use nmake format for depfiles". Still, I don't know of any program that uses or would use nmake depfile output.

-- Sean Silva



2015-04-11 2:46 GMT+03:00 Yung, Douglas <douglas_yung at playstation.sony.com<mailto:douglas_yung at playstation.sony.com>>:
I got the same results as Sean. I guess the question now is if we want to support generating nmake compatible output, and if so, should we do it using a switch similar to how Intel has done it at the link Sean provided.

Douglas Yung

From: Sean Silva [mailto:chisophugis at gmail.com<mailto:chisophugis at gmail.com>]
Sent: Friday, April 10, 2015 16:32
To: Robinson, Paul
Cc: Yung, Douglas; llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>

Subject: Re: [PATCH] Fix Windows path formatting when using -MD



On Fri, Apr 10, 2015 at 11:13 AM, Robinson, Paul <Paul_Robinson at playstation.sony.com<mailto:Paul_Robinson at playstation.sony.com>> wrote:
Hi Sean,
Were you trying that on Windows?  My understanding is that Microsoft's NMAKE wants quoted filespecs and will think backslashes are directory delimiters.

I'm seeing:

foo: bar\ baz

NMAKE : fatal error U1073: don't know how to make 'bar\'

foo: "bar baz"

NMAKE : fatal error U1073: don't know how to make '"bar baz"'

(but note that '"bar baz"' is actually satisfied by a file called `bar baz`; so nmake is not interpreting the quotes as part of the file name)

I wasn't able to find a specific reference for the nmake syntax, but from these experiments, it is clear that nmake is incompatible and a gcc compatible driver that outputs nmake dependency files is broken. We obviously cannot change our gcc-compatible dependency files as that would break users (they may be using a unix make on windows).

It appears that when Intel implemented dependency file generation in their Fortran compiler, they needed to have two different modes for make and nmake. See the last two posts in https://software.intel.com/en-us/forums/topic/269853

-- Sean Silva

--paulr

From: llvm-commits-bounces at cs.uiuc.edu<mailto:llvm-commits-bounces at cs.uiuc.edu> [mailto:llvm-commits-bounces at cs.uiuc.edu<mailto:llvm-commits-bounces at cs.uiuc.edu>] On Behalf Of Yung, Douglas
Sent: Thursday, April 09, 2015 5:17 PM
To: Sean Silva
Cc: llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>
Subject: RE: [PATCH] Fix Windows path formatting when using -MD

Hi Sean, what you are saying does make sense, I’ll talk with the team that owns the tool to see. Thanks!

Douglas Yung

From: Sean Silva [mailto:chisophugis at gmail.com]
Sent: Thursday, April 09, 2015 17:11
To: Yung, Douglas
Cc: llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>
Subject: Re: [PATCH] Fix Windows path formatting when using -MD

Actually, looking at this, I think that the root cause of your problem is that your tool is not interpreting the makefile properly. This patch will break every program that properly reads makefiles. Please verify that your tool interprets the attached file "Makefile" as a file "foo" depending on a file "bar baz". This is the output of BSD make and GNU make:

Sean:~/tmp/testmake % make
make: *** No rule to make target `bar baz', needed by `foo'.  Stop.
zsh: exit 2     make
Sean:~/tmp/testmake % gnumake
gnumake: *** No rule to make target `bar baz', needed by `foo'.  Stop.
zsh: exit 2     gnumake

If instead of bar\ baz I write "bar baz" (see attached file "broken"), as your patch causes us to do, then this is the output:

Sean:~/tmp/testmake % make -f broken
make: *** No rule to make target `"bar', needed by `foo'.  Stop.
zsh: exit 2     make -f broken
Sean:~/tmp/testmake % gnumake -f broken
gnumake: *** No rule to make target `"bar', needed by `foo'.  Stop.
zsh: exit 2     gnumake -f broken

Please verify that your tool inteprets the makefile in this way. i.e. `"bar baz"` is actually two files `"bar` and `baz"`.

-- Sean Silva

On Thu, Apr 9, 2015 at 4:55 PM, Sean Silva <chisophugis at gmail.com<mailto:chisophugis at gmail.com>> wrote:


On Thu, Apr 9, 2015 at 3:33 PM, Yung, Douglas <douglas_yung at playstation.sony.com<mailto:douglas_yung at playstation.sony.com>> wrote:
Hi Sean,

The program we are using that consumes the compiler’s –MD output is a tool that we produce which parses the output to provide integration with Visual Studio.

I agree that the example you gave of OpenFile(“\”foo.txt\””) would not be valid, although I’m not sure that the compiler would even generate that on Windows. According to the MSDN documentation (https://msdn.microsoft.com/en-us/library/aa365247), a double quote is not a valid character for a filename, so the compiler should never even reach this point in the code because the compiler would have been unable to resolve the reference to begin with.


No, I mean that any tool that actually reads the makefiles produced by this patch would end up with paths like "\"foo.txt\"". That your tool is interpreting the output of this patch the way you want indicates that it is not actually correctly parsing the makefile syntax.

E.g. a makefile like this:

foo.o: "C:\Program Files\bar.h"

actually means that "foo.o" depends on "\"C:\Program Files\bar.h\"". If your tool does not interpret it like this, then your tool is not interpreting the makefile correctly.

-- Sean Silva


Douglas Yung

From: Sean Silva [mailto:chisophugis at gmail.com<mailto:chisophugis at gmail.com>]
Sent: Thursday, April 09, 2015 14:57
To: Yung, Douglas
Cc: llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>
Subject: Re: [PATCH] Fix Windows path formatting when using -MD



On Thu, Apr 9, 2015 at 12:27 PM, Yung, Douglas <douglas_yung at playstation.sony.com<mailto:douglas_yung at playstation.sony.com>> wrote:
Hi,

This patch fixes a problem on Windows when the compiler generates a dependency file using –MD or –MMD. The problem is that when emitting a path that contains spaces, the compiler emits it by escaping the spaces with a leading backspace character which is not valid in Windows. The proper fix for this is to just emit the path enclosed in double quotes which is accepted by Windows. This patch makes that change, and the compiler will then generate the double quoted paths only when the compiler is hosted on Windows. It also updates 3 tests that were affected by this change to accept this change on Windows as well as verify the change works as expected.

What program are you using that consumes makefiles and interprets quoted strings? Quoted strings are not part of Makefile syntax. This will result in programs making calls like OpenFile("\"foo.txt\"") which I don't think is valid.

-- Sean Silva


Douglas Yung

_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits





_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits





-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150417/c44cda35/attachment.html>


More information about the llvm-commits mailing list