[llvm-bugs] [Bug 39506] Regression in batch file quote handling

via llvm-bugs llvm-bugs at lists.llvm.org
Mon Dec 10 10:15:22 PST 2018


https://bugs.llvm.org/show_bug.cgi?id=39506

Reid Kleckner <rnk at google.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WONTFIX
             Status|NEW                         |RESOLVED

--- Comment #4 from Reid Kleckner <rnk at google.com> ---
So, apparently the CRT and CommandLineToArgvW have different behaviors in
obscure cases like these.

I was referred to these comments in Chromium's source code:
https://cs.chromium.org/chromium/src/chrome/install_static/install_util.cc?type=cs&q=%3E%5C+TokenizeCommandLineToArray+file:install_util%5C.cc&sq=package:chromium&g=0&l=742
  // This is baroquely complex to do properly, see e.g.
  // https://blogs.msdn.microsoft.com/oldnewthing/20100917-00/?p=12833
  // http://www.windowsinspired.com/how-a-windows-programs-splits-its-command-line-into-individual-arguments/
  // and many others. We cannot use CommandLineToArgvW() in chrome_elf, because
  // it's in shell32.dll. Previously, __wgetmainargs() in the CRT was available,
  // and it's still documented for VS 2015 at
  // https://msdn.microsoft.com/en-us/library/ff770599.aspx but unfortunately,
  // isn't actually available.
  //
  // This parsing matches CommandLineToArgvW()s for arguments, rather than the
  // CRTs. These are different only in the most obscure of cases and will not
  // matter in any practical situation. See the windowsinspired.com post above
  // for details.
  //
  // Indicates whether or not space and tab are interpreted as token separators.

When we switched to CommandLineToArgvW back in r192069
(http://reviews.llvm.org/D1834 from 2013), we didn't intend to change behavior
from the tokenization that the CRT performs when it populates argv for a
regular 'main' prototype. I believe that the recent change in r341988 brought
us back to the CRT behavior. I'm going to close wontfix to acknowledge that
this was a behavior change, but we intend to stick with it.
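To make the "obscure cases" concrete, here is a minimal sketch of the splitting rules the MSVC CRT applies to arguments after the program name, modeled on my reading of the UCRT's argv parser. `crt_split` is a hypothetical name, not an LLVM or CRT API, and CommandLineToArgvW is not modeled; the windowsinspired.com post linked above describes where it diverges. The rule that matters for batch files like the one below is the `""` case: inside a quoted region, two consecutive quotes produce one literal `"` and the region stays open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an LLVM or CRT API): split `cmd` into arguments
 * using the MSVC CRT's quote/backslash rules for arguments after the program
 * name. The real CRT parses argv[0] with simpler rules; this sketch does not.
 * Arguments are stored NUL-terminated in `buf` and pointed to by `argv`.
 * Returns the number of arguments found (at most `max_args`). */
static size_t crt_split(const char *cmd, char *buf, size_t buf_size,
                        char **argv, size_t max_args)
{
    const char *p = cmd;
    char *out = buf;
    size_t argc = 0;

    /* Output never exceeds the input length plus one terminator. */
    assert(buf_size > strlen(cmd));

    for (;;) {
        /* Unquoted spaces and tabs separate arguments. */
        while (*p == ' ' || *p == '\t')
            ++p;
        if (*p == '\0' || argc == max_args)
            break;

        argv[argc++] = out;
        bool in_quotes = false;

        while (*p != '\0' && (in_quotes || (*p != ' ' && *p != '\t'))) {
            size_t slashes = 0;
            while (*p == '\\') {
                ++slashes;
                ++p;
            }

            if (*p == '"') {
                /* 2n backslashes + '"' -> n backslashes, quote is special;
                 * 2n+1 backslashes + '"' -> n backslashes + literal '"'. */
                for (size_t i = 0; i < slashes / 2; ++i)
                    *out++ = '\\';
                if (slashes % 2 == 1) {
                    *out++ = '"';
                } else if (in_quotes && p[1] == '"') {
                    /* "" inside quotes: one literal '"', still quoted. */
                    *out++ = '"';
                    ++p;
                } else {
                    in_quotes = !in_quotes; /* the quote itself is dropped */
                }
                ++p;
            } else if (slashes > 0) {
                /* Backslashes not followed by '"' are copied literally;
                 * the character after them is handled on the next pass. */
                for (size_t i = 0; i < slashes; ++i)
                    *out++ = '\\';
            } else {
                *out++ = *p++;
            }
        }
        *out++ = '\0';
    }
    return argc;
}
```

Passing the expanded run.bat line (with %~1 replaced by 123) through `crt_split` is one way to sanity-check the expected output listed below without a Windows machine. Since this is only a model of the CRT, a real Windows run remains the authority.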

If you want to reopen, use this program with various batch file invocations and
compare what it prints to how clang understands its command line:

#include <stdio.h>
/* Print each argv entry produced by the CRT, one per line. */
int main(int argc, char **argv) {
    for (int i = 0; i < argc; ++i)
        puts(argv[i]);
}

Suppose you invoke it the same way you invoked clang:
// run.bat
foo.exe -c -x assembler-with-cpp -DFOO="""%~1""" bar.s -o bar.out.o

Usage:
C:\Work>run.bat 123

I believe (I don't have a Windows machine to test on at the moment) that it will print:

foo.exe
-c
-x
assembler-with-cpp
-DFOO=123
bar.s
-o
bar.out.o

If you get a different tokenization, we can re-evaluate.
