[all-commits] [llvm/llvm-project] 1be024: [Windows] Fix cmd line tokenization of unclosed qu...

Simon Tatham via All-commits all-commits at lists.llvm.org
Tue May 3 03:58:20 PDT 2022


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 1be024ee450f2d3cb07086f6141d50f291c1910b
      https://github.com/llvm/llvm-project/commit/1be024ee450f2d3cb07086f6141d50f291c1910b
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2022-05-03 (Tue, 03 May 2022)

  Changed paths:
    M llvm/lib/Support/CommandLine.cpp
    M llvm/lib/Support/Windows/Process.inc
    M llvm/unittests/Support/CommandLineTest.cpp

  Log Message:
  -----------
  [Windows] Fix cmd line tokenization of unclosed quotes.

When cl::TokenizeWindowsCommandLine received a command line with an
unterminated double-quoted string at the end, it would discard the
text within that string. That doesn't match the behavior of the
standard Windows C library, which will return the text in the unclosed
quoted string as an argv word.

This commit fixes that, and adds extra unit tests covering unterminated
quotes.
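
As an illustration of the behaviour described above, here is a minimal
sketch. It is not the code in CommandLine.cpp: tokenizeSketch is a
hypothetical function implementing a simplified subset of the C library
quoting rules (it ignores details such as doubled quotes inside a quoted
section). The point it shows is the final flush, which emits the pending
word even if a quote is still open at the end of the input.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified tokenizer for illustration only.
std::vector<std::string> tokenizeSketch(const std::string &cmd) {
  std::vector<std::string> args;
  std::string token;
  bool inToken = false;  // true once the current word has started
  bool inQuotes = false; // true while inside a "..." section
  for (std::size_t i = 0; i < cmd.size(); ++i) {
    char c = cmd[i];
    if (c == '\\') {
      // Measure the run of backslashes and check whether a quote follows.
      std::size_t n = 0;
      while (i + n < cmd.size() && cmd[i + n] == '\\')
        ++n;
      bool quoteNext = i + n < cmd.size() && cmd[i + n] == '"';
      if (quoteNext) {
        token.append(n / 2, '\\'); // each pair yields one backslash
        if (n % 2 == 1) {
          token += '"'; // odd count: the quote is a literal character
          i += n;       // skip the backslashes and the quote
        } else {
          i += n - 1;   // even count: the quote is handled next iteration
        }
      } else {
        token.append(n, '\\'); // no quote follows: backslashes are literal
        i += n - 1;
      }
      inToken = true;
    } else if (c == '"') {
      inQuotes = !inQuotes;
      inToken = true;
    } else if (!inQuotes && (c == ' ' || c == '\t')) {
      if (inToken) {
        args.push_back(token);
        token.clear();
        inToken = false;
      }
    } else {
      token += c;
      inToken = true;
    }
  }
  // Flush the last word even when its quote was never closed.
  if (inToken)
    args.push_back(token);
  return args;
}
```

With this sketch, the input `clang "unterminated arg` yields two words,
`clang` and `unterminated arg`, rather than dropping the second one.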

In some cases (specifically the one in Bugzilla #47579) this could
cause TokenizeWindowsCommandLine to return a zero-length list of
arguments, leading to an array overrun at the call site in
windows::GetCommandLineArguments. Added a check there, for extra
safety: now windows::GetCommandLineArguments will return an error code
instead of failing an assertion.
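
The call-site check can be pictured roughly as follows. This is a sketch
only: programName is a hypothetical helper, and the particular error
value is an assumption, not necessarily what
windows::GetCommandLineArguments returns.

```cpp
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: fetch the program name from an already tokenized
// command line, reporting an error instead of indexing an empty list.
std::error_code programName(const std::vector<std::string> &args,
                            std::string &out) {
  if (args.empty()) // tokenizer produced no words: do not touch args[0]
    return std::make_error_code(std::errc::invalid_argument);
  out = args[0];
  return std::error_code(); // default-constructed value means success
}
```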

(This change was written as part of https://reviews.llvm.org/D122914,
but split into a separate commit at the last minute at the code
reviewer's suggestion, because it's fixing an unrelated bug in the
same area. The rest of D122914 will follow in the next commit.)


  Commit: 32814df442690d4673759296d850804773a7ea5b
      https://github.com/llvm/llvm-project/commit/32814df442690d4673759296d850804773a7ea5b
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2022-05-03 (Tue, 03 May 2022)

  Changed paths:
    M llvm/include/llvm/Support/CommandLine.h
    M llvm/lib/Support/CommandLine.cpp
    M llvm/lib/Support/Windows/Process.inc
    M llvm/unittests/Support/CommandLineTest.cpp

  Log Message:
  -----------
  [Windows] Fix handling of \" in program name on cmd line.

Bugzilla #47579: if you invoke clang on Windows via a pathname in
which a quoted section closes just after a backslash, e.g.

  "C:\Program Files\Whatever\"clang.exe

then cmd.exe and CreateProcess will correctly find the binary, because
when they parse the program name at the start of the command line,
they don't regard the \ before the " as having any kind of escaping
effect. This is different from the behaviour of the Windows standard C
library when it parses the rest of the command line, which would
treat that \" as an escaped literal quote rather than as the end of
the quoted string.

But this confuses windows::GetCommandLineArguments, because the
Windows API function GetCommandLineW() will return a command line
containing that \" sequence, and cl::TokenizeWindowsCommandLine will
tokenize the whole string according to the C library's rules. So it
will misidentify where the program name stops and the arguments start.

To fix this, I've introduced a new variant function
cl::TokenizeWindowsCommandLineFull(), intended to be applied to the
string returned from GetCommandLineW(). It parses the first word of
the command line according to CreateProcess's rules, considering \ to
never be an escaping character; thereafter, it switches over to the C
library rules for the rest of the command line.
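
A minimal sketch of the first phase, assuming that the program name ends
at the first whitespace outside quotes and that a double quote only
toggles quoting. splitProgramName is a hypothetical helper, not the LLVM
API; the remainder it returns would then be tokenized according to the C
library rules.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: split off the program name, never treating a
// backslash as an escape character. Returns {program name, remainder}.
std::pair<std::string, std::string>
splitProgramName(const std::string &cmd) {
  std::string program;
  bool inQuotes = false;
  std::size_t i = 0;
  for (; i < cmd.size(); ++i) {
    char c = cmd[i];
    if (c == '"') {
      inQuotes = !inQuotes; // quotes toggle state and are dropped
      continue;
    }
    if (!inQuotes && (c == ' ' || c == '\t'))
      break;      // unquoted whitespace ends the program name
    program += c; // everything else, including '\', is copied literally
  }
  return {program, cmd.substr(i)};
}
```

Applied to the example above followed by ` -c foo.c`, the sketch returns
`C:\Program Files\Whatever\clang.exe` as the program name and leaves
` -c foo.c` for the second phase.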

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D122914


Compare: https://github.com/llvm/llvm-project/compare/0a1bcab9f3bf...32814df44269

